[Buildbot-commits] [Buildbot] #1047: Build slave sends kill signal to wrong pid

Buildbot buildbot-devel at lists.sourceforge.net
Tue Nov 9 05:56:10 UTC 2010


#1047: Build slave sends kill signal to wrong pid
-------------------+--------------------------------------------------------
Reporter:  ixokai  |       Owner:           
    Type:  defect  |      Status:  new      
Priority:  major   |   Milestone:  undecided
 Version:  0.8.2   |    Keywords:           
-------------------+--------------------------------------------------------

Old description:

> On my build slave, there's a test enabled currently that fails
> consistently by taking too long without output-- that's fine, however I
> noticed that when the slave started up another build later, the previous
> failed process never got killed.
>
> This is the build in question:
> http://www.python.org/dev/buildbot/builders/AMD64%20Leopard%20Bigmem%203.x/builds/147/steps/test/logs/stdio
>
> Of particular interest are these lines:
>
> {{{./python.exe -Wd -E -bb  ./Lib/test/regrtest.py -uall -rwW -M5.1G
> == CPython 3.2a3+ (py3k:86348, Nov 8 2010, 19:35:02) [GCC 4.0.1 (Apple
> Inc. build 5493)]
> ==   Darwin-9.8.0-i386-64bit little-endian
> ==   /Users/pythonbuildbot/buildarea/3.x.hansen-
> osx-x86-2/build/build/test_python_24024
>
> ...
>
> [ 67/349] test_bigmem
>
> command timed out: 1800 seconds without output, killing pid 24014
> process killed by signal 9
> program finished with exit code -1
> elapsedTime=1933.903858}}}
>
> If you notice in the top few lines, it mentions 24024 -- which is the pid
> of the python.exe process. I'm quite certain it is the actual pid, as
> while this entire test-run was going on I was monitoring it and watching
> its memory usage spike.
>
> But on the bottom, it's killing pid 24014.
>
> I've done this a few times, and noticed that each time the pid it tries
> to kill is exactly 10 less than the actual pid of the process.
>
> Since the test that this is failing on is all about consuming huge
> amounts of memory-- the fact that the slave leaves the process running
> then goes on to start new ones is problematic. Several processes running
> trying to chew up gigs of ram is bad :)
>
> This slave is running Mac OSX 10.5.8, buildbot-slave 0.8.2, twisted
> 10.1.0, python 2.5.1.

New description:

 On my build slave, a currently enabled test fails consistently by going
 too long without output. That's fine; however, I noticed that when the
 slave later started another build, the previous failed process had never
 been killed.

 This is the build in question:
 http://www.python.org/dev/buildbot/builders/AMD64%20Leopard%20Bigmem%203.x/builds/147/steps/test/logs/stdio

 Of particular interest are these lines:

 {{{
 ./python.exe -Wd -E -bb  ./Lib/test/regrtest.py -uall -rwW -M5.1G
 == CPython 3.2a3+ (py3k:86348, Nov 8 2010, 19:35:02) [GCC 4.0.1 (Apple Inc. build 5493)]
 ==   Darwin-9.8.0-i386-64bit little-endian
 ==   /Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86-2/build/build/test_python_24024

 ...

 [ 67/349] test_bigmem

 command timed out: 1800 seconds without output, killing pid 24014
 process killed by signal 9
 program finished with exit code -1
 elapsedTime=1933.903858
 }}}

 Note that the top few lines mention 24024, which is the pid of the
 python.exe process. I'm quite certain it is the actual pid, because I was
 monitoring the process throughout the test run and watching its memory
 usage spike.

 But at the bottom, it's killing pid 24014.

 I've done this a few times, and noticed that each time the pid it tries to
 kill is exactly 10 less than the actual pid of the process.

 Since the failing test is all about consuming huge amounts of memory, the
 fact that the slave leaves the process running and then goes on to start
 new ones is problematic. Several processes running at once, each trying to
 chew up gigs of RAM, is bad :)

 This slave is running Mac OS X 10.5.8, buildbot-slave 0.8.2, Twisted
 10.1.0, Python 2.5.1.

--

Comment(by dustin):

 I suspect that what's happening is Buildbot is trying to kill the parent
 shell process, since you're running "./python.exe ....", which is executed
 via {{{sh -c}}}.  It seems that the child Python process removes itself
 from the session so that it doesn't get the kill signal?  Can you verify
 what process is (or was) at the PID it tries to kill?
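
 To illustrate the suspicion above: when a command is started through
 sh -c, the pid handed back to the caller is the shell's, so a plain kill
 on that pid need not reach the child. The following is a minimal sketch
 in modern Python, not Buildbot's actual code; the helper names
 spawn_in_own_group and kill_group are hypothetical. It assumes a POSIX
 system and works around the problem by making the shell a process group
 leader and signalling the whole group.

```python
import os
import signal
import subprocess


def spawn_in_own_group(command):
    """Run `command` via `sh -c` in a new session (hypothetical helper).

    start_new_session=True calls setsid() in the child before exec, so the
    shell becomes the leader of a fresh process group whose id equals
    proc.pid. Children of the shell inherit that group unless they
    deliberately move themselves out of it.
    """
    return subprocess.Popen(command, shell=True, start_new_session=True)


def kill_group(proc, sig=signal.SIGKILL):
    """Signal every process in the group led by `proc`, then reap the shell.

    Note that proc.pid is the pid of the shell, not of the command the
    shell launched; os.kill(proc.pid, sig) alone would hit only the shell.
    """
    os.killpg(proc.pid, sig)
    proc.wait()
```

 For example, proc = spawn_in_own_group("sleep 30; true") followed by
 kill_group(proc) signals both the shell and the sleep it started. A child
 that calls setsid() or setpgid() itself would still escape the group
 kill, which is the case the question above is trying to rule in or out.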

-- 
Ticket URL: <http://buildbot.net/trac/ticket/1047#comment:1>
Buildbot <http://buildbot.net/>
Buildbot: build/test automation

