[Buildbot-commits] [Buildbot] #1047: Build slave sends kill signal to wrong pid

Buildbot buildbot-devel at lists.sourceforge.net
Tue Nov 9 05:39:22 UTC 2010

#1047: Build slave sends kill signal to wrong pid
Reporter:  ixokai  |       Owner:           
    Type:  defect  |      Status:  new      
Priority:  major   |   Milestone:  undecided
 Version:  0.8.2   |    Keywords:           
 On my build slave, one currently enabled test fails consistently by
 running too long without producing output. That in itself is fine, but I
 noticed that when the slave later started another build, the process from
 the failed build had never been killed.

 This is the build in question:

 Of particular interest are these lines:

 {{{
 ./python.exe -Wd -E -bb  ./Lib/test/regrtest.py -uall -rwW -M5.1G
 == CPython 3.2a3+ (py3k:86348, Nov 8 2010, 19:35:02) [GCC 4.0.1 (Apple Inc. build 5493)]
 ==   Darwin-9.8.0-i386-64bit little-endian
 ==   /Users/pythonbuildbot/buildarea/3.x.hansen-


 [ 67/349] test_bigmem

 command timed out: 1800 seconds without output, killing pid 24014
 process killed by signal 9
 program finished with exit code -1
 }}}

 Near the top of the log, it mentions 24024, which is the pid of the
 python.exe process. I'm quite certain that is the actual pid, because I
 was monitoring the process throughout the test run and watching its
 memory usage spike.

 But at the bottom, it's killing pid 24014.

 I've reproduced this a few times, and each time the pid it tries to kill
 is exactly 10 less than the actual pid of the process.

 Since the test this fails on is all about consuming huge amounts of
 memory, the fact that the slave leaves the process running and then goes
 on to start new ones is problematic. Several processes running at once,
 each trying to chew up gigs of RAM, is bad :)
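
 To illustrate the kind of mismatch described above, here is a minimal
 sketch. It is not buildbot-slave code and is not a confirmed diagnosis
 of this ticket. It assumes a POSIX system and Python 3 (the slave in
 this report runs Python 2.5.1, which lacks `start_new_session`). The
 names `WRAPPER` and `demo` are hypothetical. The sketch shows that the
 pid recorded for a spawned command can differ from the pid of a process
 that command starts, and that signalling the process group reaches both.

```python
import os
import signal
import subprocess
import sys

# Hypothetical stand-in for a build command that launches the real work
# in a child process and then waits (for example, a shell or make).
WRAPPER = (
    "import subprocess, sys, time\n"
    "child = subprocess.Popen(\n"
    "    [sys.executable, '-c', 'import time; time.sleep(60)'])\n"
    "print(child.pid, flush=True)\n"
    "time.sleep(60)\n"
)


def demo():
    """Spawn a wrapper plus child, then kill both via the process group."""
    # start_new_session=True makes the wrapper the leader of a new
    # process group, so its pid doubles as the group id.
    proc = subprocess.Popen(
        [sys.executable, "-c", WRAPPER],
        stdout=subprocess.PIPE,
        start_new_session=True,
    )
    wrapper_pid = proc.pid
    # The wrapper reports the pid of the process doing the actual work.
    child_pid = int(proc.stdout.readline())
    # The child inherits the wrapper's process group.
    same_group = os.getpgid(child_pid) == os.getpgid(wrapper_pid)
    # Killing only wrapper_pid would leave child_pid running;
    # killpg signals every process in the group.
    os.killpg(os.getpgid(wrapper_pid), signal.SIGKILL)
    returncode = proc.wait()
    proc.stdout.close()
    return wrapper_pid, child_pid, same_group, returncode


if __name__ == "__main__":
    wrapper_pid, child_pid, same_group, returncode = demo()
    print("recorded pid:", wrapper_pid)
    print("child pid:", child_pid)
    print("same process group:", same_group)
    print("wrapper exit status:", returncode)
```

 Whether the slave in this report is recording the pid of an intermediate
 process is an assumption. The constant offset of 10 reported here has not
 been explained.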

 This slave is running Mac OS X 10.5.8, buildbot-slave 0.8.2, Twisted
 10.1.0, Python 2.5.1.

