Bugzilla – Bug 657
sudo hangs on poll() and never exits
Last modified: 2015-02-09 15:14:37 MST
Created attachment 416 [details] debug log all@debug sudo never exits. User must kill the process, e.g. su -c 'kill `pidof sudo`' For example, $ sudo echo hello hello Also seen after switching to root shell via sudo -s and then calling `exit`. Attached log contains debug log statements, leading up to the the hang: Aug 21 13:49:46 sudo[24375] -> sudo_ev_scan_impl @ ./event_poll.c:140 Coredump backtrace shows: #0 0x00007fd06f9490c0 in poll () from /lib64/libc.so.6 (gdb) bt #0 0x00007fd06f9490c0 in poll () from /lib64/libc.so.6 #1 0x00007fd0702504d5 in poll (__timeout=<optimized out>, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:46 #2 sudo_ev_scan_impl (base=0x7fd07194aac0, flags=<optimized out>) at ./event_poll.c:153 #3 0x00007fd07024b407 in sudo_ev_loop (base=0x7fd07194aac0, flags=0) at ./event.c:268 #4 0x00007fd07023df7a in sudo_execute ( details=details@entry=0x7fd07045b600 <command_details>, cstat=cstat@entry=0x7fffb97a18b0) at ./exec.c:481 #5 0x00007fd070246ebf in run_command ( details=0x7fd07045b600 <command_details>) at ./sudo.c:1026 #6 0x00007fd07023ba7e in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ./sudo.c:288 NOTE: I also saw the same behavior on v1.8.6 but the hang was on select() in exec.c:376. From the history I see that that code has been refactored to poll() call shown above.
It sounds like the SIGCHLD is not getting processed for some reason. Does this happen for longer running processes, like shells, or just short-lived ones such as echo and id?
(In reply to Todd C Miller from comment #1) > It sounds like the SIGCHLD is not getting processed for some reason. > Does this happen for longer running processes, like shells, or just > short-lived ones such as echo and id? I have seen the behavior (non-exiting of parent sudo) when attempting to exit from 'sudo -s', as well. Though I only captured logs and coredumps from the simple echo command. I can collect whatever you'd think helpful. I am running Gentoo and there were updates to both pam and glibc libs earlier this summer - when I first started seeing this odd behavior. Sudo 1.8.6 had been running fine for me up until then. I am sure it has something to with a change external to sudo. Presumably, something in a dependency has adversely affected sudo. I have not found any bug reports in the Gentoo community from others experiencing this...
Created attachment 417 [details] Make sure we don't treat SIGCHLD or SIGALARM as user-generated.
Created attachment 418 [details] Use a socketpair instead of a pipe for the signals.
Created attachment 419 [details] Add syncronization between parent and child when executing a command.
I've attached three patches you can try; they all do different things. I've been unable to find why sudo would not be receiving SIGCHLD but if any of the patches changes the behavior please let me know.
(In reply to Todd C Miller from comment #6) > I've attached three patches you can try; they all do different > things. I've been unable to find why sudo would not be receiving > SIGCHLD but if any of the patches changes the behavior please let me > know. Thanks for the patches. I tried all three this afternoon. Unfortunately, there was no difference in the result. They each still seem to hang at event_poll.c:140
I'm not too surprised. It seems more likely that a kernel change has triggered the changed behavior. Can you verify that the problem is still present in the latest sudo beta version, 1.8.11b2? http://www.sudo.ws/dist/beta/sudo-1.8.11b2.tar.gz
It would also be useful to know whether adding: Defaults use_pty to sudoers changes the behavior at all.
(In reply to Todd C Miller from comment #9) > It would also be useful to know whether adding: > > Defaults use_pty > > to sudoers changes the behavior at all. Yes. This fixed the behavior.
(In reply to Todd C Miller from comment #8) > I'm not too surprised. It seems more likely that a kernel change > has triggered the changed behavior. Can you verify that the problem > is still present in the latest sudo beta version, 1.8.11b2? > http://www.sudo.ws/dist/beta/sudo-1.8.11b2.tar.gz The problem still exists in 1.8.11b2.
The content of attachment 417 [details] has been deleted for the following reason: Patch has no effect
The content of attachment 418 [details] has been deleted for the following reason: Patch has no effect
The content of attachment 419 [details] has been deleted for the following reason: Patch has no effect
Created attachment 421 [details] Only handle one signal at a time in in signal_pipe_cb() before returning back to the event loop.
Created attachment 422 [details] Don't allow sudo_ev_loopcont() to override sudo_ev_loopexit().
Created attachment 423 [details] Instead of calling sudo_ev_loopexit() just remove the events from the base.
I've attached three new patches (421-423) that are relative to sudo 1.8.10p3 (I can provide patches to 1.8.11b2 also). If you could try them out in order and let me know if the behavior in the non-use_pty case is fixed that would be great. Please apply the patches cumulatively as the later patches may not apply without the earlier ones. Thanks!
(In reply to Todd C Miller from comment #18) > I've attached three new patches (421-423) that are relative to sudo > 1.8.10p3 (I can provide patches to 1.8.11b2 also). If you could try > them out in order and let me know if the behavior in the non-use_pty > case is fixed that would be great. Please apply the patches > cumulatively as the later patches may not apply without the earlier > ones. > > Thanks! I tried the patches but they did not resolve the issue. Are there kernel settings that may influence the event handling that I should try? I am running Linux 3.12.21.
Created attachment 424 [details] signal debugging diff that writes to stderr Looking through the debug output again the only conclusion I can come to is that either SIGCHLD is not being delivered for some reason or the write to the signal pipe is failing. The attached patch will write to stderr when the signal is received and report write errors on the signal pipe. You should see output like this: $ sudo echo hi hi received signal: Child exited
Created attachment 425 [details] Log of signal_debug.diff patch - all@debug
Comment on attachment 425 [details] Log of signal_debug.diff patch - all@debug ~ $ sudo echo hi Password: hi No signal is printed.
If sudo never receive SIGCHLD then it won't know the command has completed. Can you verify that the echo command actually did exit and is no longer running?
Created attachment 438 [details] strace without pam_gnome_keyring.so
Created attachment 439 [details] strace with pam_gnome_keyring.so
Running 1.8.11_p2 with patch from (http://www.sudo.ws/bugs/show_bug.cgi?id=676, which apparently makes no difference) on Linux kernel 3.16.0, gentoo. The "Defaults use_pty" fixes the issue. Without it, I have -- by luck or intuition -- pinpointed that removal of pam_gnome_keyring.so from pam config fixes the problem. I have attached two straces, one with and one without pam_gnome_keyring. Traces where obtained by running strace -f -s 200 -o /tmp/sudo.pam_gnome_keyring.strace -t sudo sleep 3 as a root. sudo debugging has been enabled, so the debug output can also be seen in strace (look for "/writev.*Nov") In case with pam_gnome_keyring, the "sleep" process has been left in zombie state and I had to "kill -9" both "sudo" and "strace".
Created attachment 440 [details] Patch to defer SIGCHLD handler installation until immediately before fork It looks like pam_gnome_keyring has a bug where it modifies the SIGCHLD handler but does not set it back to the original value before returning. Please try the attached patch which defers setup of the SIGCHLD handler until after all PAM session modules have run.
This patch fixes the problem. :)
"sudo" no longer hangs on exit, but instead, I have an orphaned gnome-keyring-daemon for each "sudo" invocation. Could probably be a bug in gnome-keyring-daemon.
That sounds like a bug in pam_gnome_keyring. Looking at the code (version 3.14.0) it does wait for the gnome-keyring-daemon to exit. However, it will not wait for the child if there is an error reading stdout and stderr.
FYI, the fix will be in sudo 1.8.12 which is currently in beta.
Fixed in sudo 1.8.12, available now.