Bug 657

Summary: sudo hangs on poll() and never exits
Product: Sudo Reporter: Andrew <aahamlin>
Component: SudoAssignee: Todd C. Miller <Todd.Miller>
Status: RESOLVED FIXED    
Severity: normal CC: kshpitsa
Priority: low    
Version: 1.8.10   
Hardware: PC   
OS: Linux   
Bug Depends on:    
Bug Blocks: 677    
Attachments: debug log all@debug
Make sure we don't treat SIGCHLD or SIGALARM as user-generated.
Use a socketpair instead of a pipe for the signals.
Add syncronization between parent and child when executing a command.
Only handle one signal at a time in in signal_pipe_cb() before returning back to the event loop.
Don't allow sudo_ev_loopcont() to override sudo_ev_loopexit().
Instead of calling sudo_ev_loopexit() just remove the events from the base.
signal debugging diff that writes to stderr
Log of signal_debug.diff patch - all@debug
strace without pam_gnome_keyring.so
strace with pam_gnome_keyring.so
Patch to defer SIGCHLD handler installation until immediately before fork

Description Andrew 2014-08-21 13:20:34 MDT
Created attachment 416 [details]
debug log all@debug

sudo never exits. User must kill the process, e.g. su -c 'kill `pidof sudo`'

For example,
$ sudo echo hello
hello

Also seen after switching to root shell via sudo -s and then calling `exit`.

Attached log contains debug log statements, leading up to the the hang:
Aug 21 13:49:46 sudo[24375] -> sudo_ev_scan_impl @ ./event_poll.c:140

Coredump backtrace shows:
#0  0x00007fd06f9490c0 in poll () from /lib64/libc.so.6
(gdb) bt
#0  0x00007fd06f9490c0 in poll () from /lib64/libc.so.6
#1  0x00007fd0702504d5 in poll (__timeout=<optimized out>,
    __nfds=<optimized out>, __fds=<optimized out>)
    at /usr/include/bits/poll2.h:46
#2  sudo_ev_scan_impl (base=0x7fd07194aac0, flags=<optimized out>)
    at ./event_poll.c:153
#3  0x00007fd07024b407 in sudo_ev_loop (base=0x7fd07194aac0, flags=0)
    at ./event.c:268
#4  0x00007fd07023df7a in sudo_execute (
    details=details@entry=0x7fd07045b600 <command_details>,
    cstat=cstat@entry=0x7fffb97a18b0) at ./exec.c:481
#5  0x00007fd070246ebf in run_command (
    details=0x7fd07045b600 <command_details>) at ./sudo.c:1026
#6  0x00007fd07023ba7e in main (argc=<optimized out>, argv=<optimized out>,
    envp=<optimized out>) at ./sudo.c:288


NOTE: I also saw the same behavior on v1.8.6 but the hang was on select() in exec.c:376. From the history I see that that code has been refactored to poll() call shown above.
Comment 1 Todd C. Miller 2014-08-21 15:27:08 MDT
It sounds like the SIGCHLD is not getting processed for some reason.  Does this happen for longer running processes, like shells, or just short-lived ones such as echo and id?
Comment 2 Andrew 2014-08-21 17:30:07 MDT
(In reply to Todd C Miller from comment #1)
> It sounds like the SIGCHLD is not getting processed for some reason.
> Does this happen for longer running processes, like shells, or just
> short-lived ones such as echo and id?

I have seen the behavior (non-exiting of parent sudo) when attempting to exit from 'sudo -s', as well. Though I only captured logs and coredumps from the simple echo command. 

I can collect whatever you'd think helpful.

I am running Gentoo and there were updates to both pam and glibc libs earlier this summer - when I first started seeing this odd behavior. Sudo 1.8.6 had been running fine for me up until then. I am sure it has something to with a change external to sudo. Presumably, something in a dependency has adversely affected sudo. 

I have not found any bug reports in the Gentoo community from others experiencing this...
Comment 3 Todd C. Miller 2014-08-22 14:01:54 MDT
Created attachment 417 [details]
Make sure we don't treat SIGCHLD or SIGALARM as user-generated.
Comment 4 Todd C. Miller 2014-08-22 14:02:23 MDT
Created attachment 418 [details]
Use a socketpair instead of a pipe for the signals.
Comment 5 Todd C. Miller 2014-08-22 14:02:55 MDT
Created attachment 419 [details]
Add syncronization between parent and child when executing a command.
Comment 6 Todd C. Miller 2014-08-22 14:04:11 MDT
I've attached three patches you can try; they all do different things.  I've been unable to find why sudo would not be receiving SIGCHLD but if any of the patches changes the behavior please let me know.
Comment 7 Andrew 2014-08-23 19:35:32 MDT
(In reply to Todd C Miller from comment #6)
> I've attached three patches you can try; they all do different
> things.  I've been unable to find why sudo would not be receiving
> SIGCHLD but if any of the patches changes the behavior please let me
> know.

Thanks for the patches. I tried all three this afternoon. Unfortunately, there was no difference in the result. They each still seem to hang at event_poll.c:140
Comment 8 Todd C. Miller 2014-08-26 16:04:32 MDT
I'm not too surprised.  It seems more likely that a kernel change has triggered the changed behavior.  Can you verify that the problem is still present in the latest sudo beta version, 1.8.11b2?  http://www.sudo.ws/dist/beta/sudo-1.8.11b2.tar.gz
Comment 9 Todd C. Miller 2014-08-27 06:40:19 MDT
It would also be useful to know whether adding:

Defaults use_pty

to sudoers changes the behavior at all.
Comment 10 Andrew 2014-08-27 10:47:40 MDT
(In reply to Todd C Miller from comment #9)
> It would also be useful to know whether adding:
> 
> Defaults use_pty
> 
> to sudoers changes the behavior at all.

Yes. This fixed the behavior.
Comment 11 Andrew 2014-08-28 07:44:47 MDT
(In reply to Todd C Miller from comment #8)
> I'm not too surprised.  It seems more likely that a kernel change
> has triggered the changed behavior.  Can you verify that the problem
> is still present in the latest sudo beta version, 1.8.11b2? 
> http://www.sudo.ws/dist/beta/sudo-1.8.11b2.tar.gz

The problem still exists in 1.8.11b2.
Comment 12 Todd C. Miller 2014-08-28 15:07:33 MDT
The content of attachment 417 [details] has been deleted for the following reason:

Patch has no effect
Comment 13 Todd C. Miller 2014-08-28 15:07:56 MDT
The content of attachment 418 [details] has been deleted for the following reason:

Patch has no effect
Comment 14 Todd C. Miller 2014-08-28 15:08:16 MDT
The content of attachment 419 [details] has been deleted for the following reason:

Patch has no effect
Comment 15 Todd C. Miller 2014-08-28 15:08:59 MDT
Created attachment 421 [details]
Only handle one signal at a time in in signal_pipe_cb() before returning back to the event loop.
Comment 16 Todd C. Miller 2014-08-28 15:09:34 MDT
Created attachment 422 [details]
Don't allow sudo_ev_loopcont() to override sudo_ev_loopexit().
Comment 17 Todd C. Miller 2014-08-28 15:09:59 MDT
Created attachment 423 [details]
Instead of calling sudo_ev_loopexit() just remove the events from the base.
Comment 18 Todd C. Miller 2014-08-28 15:13:18 MDT
I've attached three new patches (421-423) that are relative to sudo 1.8.10p3 (I can provide patches to 1.8.11b2 also).  If you could try them out in order and let me know if the behavior in the non-use_pty case is fixed that would be great.  Please apply the patches cumulatively as the later patches may not apply without the earlier ones.

Thanks!
Comment 19 Andrew 2014-09-01 08:19:37 MDT
(In reply to Todd C Miller from comment #18)
> I've attached three new patches (421-423) that are relative to sudo
> 1.8.10p3 (I can provide patches to 1.8.11b2 also).  If you could try
> them out in order and let me know if the behavior in the non-use_pty
> case is fixed that would be great.  Please apply the patches
> cumulatively as the later patches may not apply without the earlier
> ones.
> 
> Thanks!

I tried the patches but they did not resolve the issue. 

Are there kernel settings that may influence the event handling that I should try? I am running Linux 3.12.21.
Comment 20 Todd C. Miller 2014-09-02 15:35:10 MDT
Created attachment 424 [details]
signal debugging diff that writes to stderr

Looking through the debug output again the only conclusion I can come to is that either SIGCHLD is not being delivered for some reason or the write to the signal pipe is failing.  The attached patch will write to stderr when the signal is received and report write errors on the signal pipe.

You should see output like this:

$ sudo echo hi
hi
received signal: Child exited
Comment 21 Andrew 2014-09-03 06:44:37 MDT
Created attachment 425 [details]
Log of signal_debug.diff patch - all@debug
Comment 22 Andrew 2014-09-03 06:46:06 MDT
Comment on attachment 425 [details]
Log of signal_debug.diff patch - all@debug

~ $ sudo echo hi
Password: 
hi


No signal is printed.
Comment 23 Todd C. Miller 2014-09-03 07:08:45 MDT
If sudo never receive SIGCHLD then it won't know the command has completed.  Can you verify that the echo command actually did exit and is no longer running?
Comment 24 Kyrylo Shpytsya 2014-11-16 12:48:55 MST
Created attachment 438 [details]
strace without pam_gnome_keyring.so
Comment 25 Kyrylo Shpytsya 2014-11-16 12:49:53 MST
Created attachment 439 [details]
strace with pam_gnome_keyring.so
Comment 26 Kyrylo Shpytsya 2014-11-16 13:03:31 MST
Running 1.8.11_p2 with patch from (http://www.sudo.ws/bugs/show_bug.cgi?id=676, which apparently makes no difference) on Linux kernel 3.16.0, gentoo.

The "Defaults use_pty" fixes the issue. Without it, I have -- by luck or intuition -- pinpointed that removal of pam_gnome_keyring.so from pam config fixes the problem. I have attached two straces, one with and one without pam_gnome_keyring. Traces where obtained by running 

strace -f -s 200 -o /tmp/sudo.pam_gnome_keyring.strace -t sudo sleep 3

as a root.

sudo debugging has been enabled, so the debug output can also be seen in strace (look for "/writev.*Nov")

In case with pam_gnome_keyring, the "sleep" process has been left in zombie state and I had to "kill -9" both "sudo" and "strace".
Comment 27 Todd C. Miller 2014-11-17 08:27:01 MST
Created attachment 440 [details]
Patch to defer SIGCHLD handler installation until immediately before fork

It looks like pam_gnome_keyring has a bug where it modifies the SIGCHLD handler but does not set it back to the original value before returning.  Please try the attached patch which defers setup of the SIGCHLD handler until after all PAM session modules have run.
Comment 28 Kyrylo Shpytsya 2014-11-17 09:03:47 MST
This patch fixes the problem. :)
Comment 29 Kyrylo Shpytsya 2014-11-18 11:33:31 MST
"sudo" no longer hangs on exit, but instead, I have an orphaned gnome-keyring-daemon for each "sudo" invocation. Could probably be a bug in gnome-keyring-daemon.
Comment 30 Todd C. Miller 2014-11-18 13:24:56 MST
That sounds like a bug in pam_gnome_keyring.  Looking at the code (version 3.14.0) it does wait for the gnome-keyring-daemon to exit.  However, it will not wait for the child if there is an error reading stdout and stderr.
Comment 31 Todd C. Miller 2014-11-21 08:08:28 MST
FYI, the fix will be in sudo 1.8.12 which is currently in beta.
Comment 32 Todd C. Miller 2015-02-09 15:14:37 MST
Fixed in sudo 1.8.12, available now.