[Buildbot-commits] [Buildbot] #1792: BuildStep timeout detection does not kill child processes

Buildbot nobody at buildbot.net
Fri Feb 4 14:23:21 UTC 2011


#1792: BuildStep timeout detection does not kill child processes
--------------------+-----------------------
Reporter:  cortana  |      Owner:
    Type:  defect   |     Status:  new
Priority:  major    |  Milestone:  undecided
 Version:  0.8.2    |   Keywords:
--------------------+-----------------------
 I have noticed my buildslave machine becoming overloaded several times
 recently. I believe this is caused by the following sequence of events:

  1. 'make check' is run as part of a build
  2. buildbot sends SIGKILL to the build process because it takes too long
  3. only the top-level process is killed: child processes are not killed,
 so the test suite continues to run!
  4. buildbot kicks off another build...
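 Steps 2 and 3 can be reproduced outside Buildbot with a minimal Python
 sketch (POSIX only). It rests on an assumption: a shell that starts a
 background 'sleep' stands in for 'make' and its test suite. The function
 name demo_orphaned_child is hypothetical. Because SIGKILL cannot be caught,
 the parent gets no chance to clean up, and its child keeps running.

```python
import os
import signal
import subprocess
import time


def demo_orphaned_child():
    """Return True if a background child survives SIGKILL of its parent."""
    # Hypothetical stand-in for 'make check': a shell that starts a
    # background 'sleep', prints the child's PID, then waits for it.
    parent = subprocess.Popen(
        ["sh", "-c", "sleep 30 & echo $!; wait"],
        stdout=subprocess.PIPE,
        text=True,
    )
    child_pid = int(parent.stdout.readline())

    # Kill only the top-level process, as in step 2 of the report.
    parent.kill()  # SIGKILL on POSIX
    parent.wait()
    time.sleep(0.2)  # give the kernel a moment to reparent the orphan

    try:
        os.kill(child_pid, 0)  # signal 0 only checks that the PID exists
        survived = True
    except ProcessLookupError:
        survived = False
    finally:
        # Clean up the orphan so the demo itself does not leak a process.
        try:
            os.kill(child_pid, signal.SIGKILL)
        except ProcessLookupError:
            pass
    parent.stdout.close()
    return survived


if __name__ == "__main__":
    print("child survived:", demo_orphaned_child())
```

 On a typical Linux or macOS system this is expected to print
 "child survived: True", because SIGKILL is delivered only to the one PID
 it is sent to.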

 The result is 8-9 copies of the test suite, left over from improperly
 killed builds, still running until I SSH in and kill all buildslave
 processes by hand.

 Possible solutions:

  * when killing a BuildStep, send it SIGINT instead of SIGKILL. In my
 case, this would have allowed make to kill off all of its child processes
 properly, as if I had hit Ctrl+C in a terminal.
  * to guard against buggy build systems, however, you probably want to
 send SIGINT, wait 10 seconds, and then send SIGKILL to the BuildStep
 *and all its child processes*, either by walking the process tree by hand
 or by using POSIX process groups or sessions.
  * I believe that on modern Linux kernels the same can be achieved with
 control groups ('cgroups'). Each build would go into its own cgroup, and
 the buildslave could then kill every process in that cgroup at once.
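 The second suggestion (SIGINT, a grace period, then SIGKILL to the whole
 group) can be sketched in Python with POSIX process groups. This is an
 illustration under stated assumptions, not Buildbot's implementation:
 run_with_timeout and _signal_group are hypothetical helpers, the 10-second
 default grace period is taken from the suggestion above, and
 start_new_session=True (which makes the child call setsid()) is one way to
 give the step its own process group.

```python
import os
import signal
import subprocess


def _signal_group(pgid, sig):
    """Send sig to every process in group pgid; ignore an already-empty group."""
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass


def run_with_timeout(argv, timeout, grace=10.0):
    """Run argv in its own process group (POSIX only).

    If it is still running after `timeout` seconds, send SIGINT to the whole
    group, wait up to `grace` seconds, then send SIGKILL to whatever is left.
    Returns Popen.returncode (negative signal number if killed by a signal).
    """
    # start_new_session=True makes the child call setsid(), so it becomes the
    # leader of a new session and process group whose ID equals its PID.
    proc = subprocess.Popen(argv, start_new_session=True)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        pass

    pgid = proc.pid
    # Step 1: polite interrupt for the whole group, much like Ctrl+C.
    _signal_group(pgid, signal.SIGINT)
    try:
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        pass

    # Step 2: SIGKILL every remaining member, including grandchildren that
    # ignored SIGINT or outlived the group leader.
    _signal_group(pgid, signal.SIGKILL)
    proc.wait()
    return proc.returncode
```

 A call such as run_with_timeout(["make", "check"], timeout=1200) would
 then take the test suite down with make. Signalling the group with
 os.killpg() rather than a single PID is what reaches grandchildren. The
 SIGKILL step still matters: POSIX shells without job control start
 background jobs with SIGINT ignored. One limitation of this sketch is that
 if the leader exits normally, processes it left behind in the group are
 not cleaned up.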

 Workaround: increase the 'timeout' parameter of the 'make check' BuildStep.
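 For the workaround, a master.cfg fragment along these lines raises the
 limit for a single step. It assumes the 0.8.x module layout
 (buildbot.process.factory.BuildFactory, buildbot.steps.shell.ShellCommand),
 and 3600 is an arbitrary example value. As documented for ShellCommand,
 'timeout' counts seconds without output (default 1200), so raising it only
 avoids triggering the kill; it does not change how the step is killed.

```python
# master.cfg fragment (sketch, Buildbot 0.8.x layout assumed).
from buildbot.process.factory import BuildFactory
from buildbot.steps.shell import ShellCommand

f = BuildFactory()
# Allow up to an hour without output before the step is killed.
f.addStep(ShellCommand(command=["make", "check"], timeout=3600))
```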

-- 
Ticket URL: <http://trac.buildbot.net/ticket/1792>
Buildbot <http://buildbot.net/>
Buildbot: build/test automation

