[Buildbot-commits] [Buildbot] #1792: BuildStep timeout detection does not kill child processes
Buildbot
nobody at buildbot.net
Fri Feb 4 14:23:21 UTC 2011
#1792: BuildStep timeout detection does not kill child processes
--------------------+-----------------------
 Reporter:  cortana |       Owner:
     Type:  defect  |      Status:  new
 Priority:  major   |   Milestone:  undecided
  Version:  0.8.2   |    Keywords:
--------------------+-----------------------
I have noticed my buildslave machine becoming overloaded several times
recently. I believe this is caused by the following sequence of events:
1. 'make check' is run as part of a build
2. buildbot sends SIGKILL to the build process because it takes too long
3. only the top-level process is killed: child processes are not killed,
so the test suite continues to run!
4. buildbot kicks off another build...
The result is 8-9 copies of the test suite from improperly killed-off
builds hanging around, until I SSH in and kill all buildslave processes by
hand.
Possible solutions:
* when killing a BuildStep, send it SIGINT instead of SIGKILL. In my
case, this would have allowed make to kill off all of its child processes
properly, as if I had hit Ctrl+C in a terminal.
* to guard against buggy build systems, however, you probably want to
send a SIGINT, wait 10 seconds, then send a SIGKILL to the buildstep
*and all its child processes*. This could be done either by walking the
process tree by hand, or by using POSIX process groups or sessions.
* I believe that in modern Linux kernels, the same can be achieved with
'cgroups'. Each build would go into its own cgroup, and then the
buildslave can kill all processes in a cgroup at once.
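The second suggestion above (SIGINT, a grace period, then SIGKILL to the whole group) could be sketched roughly as follows. This is only an illustration using the Python standard library on a POSIX system, not Buildbot's actual code; the function name run_with_group_timeout and its parameters are made up for the example.

```python
import os
import signal
import subprocess


def run_with_group_timeout(argv, timeout, grace=10.0):
    """Run argv in its own process group; on timeout, SIGINT then SIGKILL the group.

    Hypothetical helper for illustration only; not part of Buildbot.
    Returns the exit status of the top-level process.
    """
    # start_new_session=True makes the child call setsid(), so it becomes
    # the leader of a new process group whose id equals its pid. Children
    # it spawns inherit that group unless they deliberately leave it.
    proc = subprocess.Popen(argv, start_new_session=True)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        pass

    pgid = proc.pid
    try:
        # Polite request first, delivered to every process in the group.
        os.killpg(pgid, signal.SIGINT)
        proc.wait(timeout=grace)
    except (ProcessLookupError, subprocess.TimeoutExpired):
        pass

    try:
        # Kill whatever is left in the group, including children that
        # ignored SIGINT or outlived the top-level process.
        os.killpg(pgid, signal.SIGKILL)
    except ProcessLookupError:
        pass  # the group is already empty
    return proc.wait()
```

For example, run_with_group_timeout(["make", "check"], timeout=1200) would give the step 20 minutes before interrupting it. A process that moves itself into a different process group or session escapes this scheme, which is where something like cgroups would be more robust.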
Workaround: increase the 'timeout' property of the 'make check' BuildStep.
--
Ticket URL: <http://trac.buildbot.net/ticket/1792>
Buildbot <http://buildbot.net/>
Buildbot: build/test automation