From e8f55568de602630e0ce60f7863bdaeeea26c1c0 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Thu, 9 Sep 2021 18:57:57 -0400 Subject: t5562: use alarm() to interrupt timed child-wait MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The t5562 script occasionally takes 60 extra seconds to complete due to a race condition in the invoke-with-content-length.pl helper. The way it's supposed to work is this: - we set up a SIGCLD handler - we kick off http-backend and write to it with a set content-length, but _don't_ close the pipe - we sleep for 60 seconds, assuming that SIGCLD from http-backend finishing will interrupt us - after the sleep finishes (whetherby 60 seconds or because it was interrupted by the signal), we check a flag to see if our SIGCLD handler was called. If not, then we complain. This usually completes immediately, because the signal interrupts our sleep. But very occasionally the child process dies _before_ we hit the sleep, so we don't realize it. The test still completes successfully (because our $exited flag is set), but it takes an extra 60 seconds. There's no way to check the flag and sleep atomically. So the best we can do with this approach is to sleep in smaller chunks (say, 1 second) and check the flag incrementally. Then we waste a maximum of 1 second if we lose the race. This was proposed in: https://lore.kernel.org/git/20190218205028.32486-1-max@max630.net/ and it does work. But we can do better. Instead of blocking on sleep and waiting for the child signal to interrupt us, we can block on the child exiting and set an alarm signal to trigger the timeout. This lets us exit the script immediately when the child behaves (with no race possible), and wait a maximum of 60 seconds when it doesn't. Note one small subtlety: perl is very willing to restart the waitpid() call after the alarm is delivered, even if we've thrown an exception via die. "perldoc -f alarm" claims you can get around this with an eval/die combo (and even has some example code), but it doesn't seem to work for me with waitpid(); instead, we continue waiting until the child exits. So instead, we'll instruct the child process to exit in the alarm handler itself. In the original code this was done by calling close($out). That would continue to work, since our child is always http-backend, which should exit when its stdin closes. But we can be even more robust against a hung or confused child by sending a KILL signal, which should terminate it immediately. Reported-by: SZEDER Gábor Signed-off-by: Jeff King Signed-off-by: Junio C Hamano diff --git a/t/t5562/invoke-with-content-length.pl b/t/t5562/invoke-with-content-length.pl index 0943474..718dd9b 100644 --- a/t/t5562/invoke-with-content-length.pl +++ b/t/t5562/invoke-with-content-length.pl @@ -13,11 +13,6 @@ my $body_data; defined read($body_fh, $body_data, $body_size) or die "Cannot read $body_filename: $!"; close($body_fh); -my $exited = 0; -$SIG{"CHLD"} = sub { - $exited = 1; -}; - # write data my $pid = open(my $out, "|-", @command); { @@ -29,8 +24,13 @@ my $pid = open(my $out, "|-", @command); } print $out $body_data or die "Cannot write data: $!"; -sleep 60; # is interrupted by SIGCHLD -if (!$exited) { - close($out); +$SIG{ALRM} = sub { + kill 'KILL', $pid; die "Command did not exit after reading whole body"; +}; +alarm 60; + +my $ret = waitpid($pid, 0); +if ($ret != $pid) { + die "confusing return from waitpid: $ret"; } -- cgit v0.10.2-6-g49f6