[Buildbot-devel] spurious SIGHUP when running under Ubuntu hardy heron (buildbot 0.7.6)

Charles Lepple clepple at gmail.com
Fri Mar 7 01:21:21 UTC 2008


I have a Buildbot master running 0.7.5 on a Debian system, and I was
able to build successfully using an Ubuntu Gutsy Gibbon (7.10) box
running Buildbot 0.7.5 as a slave (alongside a handful of other 0.7.5
and 0.7.6 slaves).

Part of the root filesystem on the Ubuntu box got trashed, and so I
figured hey, I'll throw another hard drive in the box and install a
Hardy Heron pre-release. I copied over /var/lib/buildbot from the old
filesystem, and started the buildslave.

Whenever I fired off a build on the Hardy 0.7.6 buildslave, the source
checkout step would fail with "program finished with exit code -1". I
usually saw something like this in twistd.log:

2008/03/06 10:04 -0400 [-] command finished with signal 1, exit code None
2008/03/06 10:04 -0400 [-] _checkAbandoned [Failure instance: Traceback: <class 'buildbot.slave.commands.AbandonChain'>: -1
        /usr/lib/python2.5/site-packages/buildbot/slave/commands.py:156:processEnded
        /usr/lib/python2.5/site-packages/buildbot/slave/commands.py:447:finished
        /usr/lib/python2.5/site-packages/twisted/internet/defer.py:239:callback
        /usr/lib/python2.5/site-packages/twisted/internet/defer.py:304:_startRunCallbacks
        --- <exception caught here> ---
        /usr/lib/python2.5/site-packages/twisted/internet/defer.py:317:_runCallbacks
        /usr/lib/python2.5/site-packages/buildbot/slave/commands.py:672:_abandonOnFailure
        ]
2008/03/06 10:04 -0400 [-]  abandoning chain -1
2008/03/06 10:04 -0400 [-] SlaveBuilder.commandComplete <buildbot.slave.commands.SVN instance at 0x82c4cec>

Signal 1 is SIGHUP on Linux, so I ran the buildslave under strace and
caught the following:

10774 close(2 <unfinished ...>
10758 <... select resumed> )            = 1 (in [7], left {18, 972000})
10758 read(7, 0x8498ac4, 8192)          = -1 EIO (Input/output error)
10758 close(7)                          = 0
10758 gettimeofday({1204849231, 195637}, NULL) = 0
10758 gettimeofday({1204849231, 195684}, NULL) = 0
10758 select(6, [4 5], [], [], {18, 969442} <unfinished ...>
10774 <... close resumed> )             = 0
10774 --- SIGHUP (Hangup) @ 0 (0) ---
10758 <... select resumed> )            = ? ERESTARTNOHAND (To be restarted)
10758 --- SIGCHLD (Child exited) @ 0 (0) ---
10758 sigreturn()                       = ? (mask now [])
10758 waitpid(10774, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGHUP}], WNOHANG) = 10774

(10758 is the buildbot slave, and 10774 is the buildstep child process
- in this case, 'rm'.)
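
For anyone reading the waitpid() line above, here is a minimal Python
sketch (not Buildbot code) of how a wait status decodes into the
"signal 1, exit code None" pair that shows up in twistd.log. It assumes
a POSIX system and Python 3. The helper names describe_wait_status and
child_killed_by_sighup are made up for illustration, and the child sends
SIGHUP to itself instead of receiving it from a terminal.

```python
import os
import signal


def describe_wait_status(status):
    """Decode a raw waitpid() status into a (signal, exit_code) pair."""
    if os.WIFSIGNALED(status):
        # Killed by a signal: there is no exit code to report.
        return {"signal": os.WTERMSIG(status), "exit_code": None}
    if os.WIFEXITED(status):
        # Normal exit: report the exit code, no signal.
        return {"signal": None, "exit_code": os.WEXITSTATUS(status)}
    return {"signal": None, "exit_code": None}


def child_killed_by_sighup():
    """Fork a child that dies from SIGHUP and decode its wait status."""
    pid = os.fork()
    if pid == 0:
        # Child: make sure SIGHUP has its default (terminate) action,
        # even if the parent was started under nohup.
        signal.signal(signal.SIGHUP, signal.SIG_DFL)
        os.kill(os.getpid(), signal.SIGHUP)
        os._exit(0)  # only reached if the signal did not kill us
    # Parent: reap the child, as the slave does in the strace output.
    _, status = os.waitpid(pid, 0)
    return describe_wait_status(status)
```

Calling child_killed_by_sighup() should report the SIGHUP signal number
and no exit code, which matches the shape of the log line.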

Debian and Ubuntu have a variable in their startup scripts to let the
user prefix the buildbot command with other tools such as "nice". I
set it to "nohup nice" and things seem to build properly now.
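
Some background on why the "nohup" prefix helps: nohup sets SIGHUP to be
ignored before it execs the command, and an ignored disposition is
inherited across fork and exec. Below is a rough sketch of the same idea
applied to a single child through subprocess's preexec_fn. This is an
illustration only, not how Buildbot launches its commands; run_child and
CHILD_CODE are hypothetical names, and the child simply sends SIGHUP to
itself so the effect is visible.

```python
import signal
import subprocess
import sys

# Child program: send SIGHUP to itself, then exit normally if it survives.
CHILD_CODE = "import os, signal; os.kill(os.getpid(), signal.SIGHUP)"


def run_child(ignore_sighup):
    """Run the child with SIGHUP ignored or left at its default action.

    Returns the subprocess return code: 0 if the child survived, or the
    negated signal number if it was killed by a signal.
    """
    disposition = signal.SIG_IGN if ignore_sighup else signal.SIG_DFL

    def set_disposition():
        # Runs in the child between fork and exec, like nohup does.
        signal.signal(signal.SIGHUP, disposition)

    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        preexec_fn=set_disposition,
    )
    return result.returncode
```

With ignore_sighup=True the child should exit with code 0; with
ignore_sighup=False it should be reported as killed by SIGHUP. Note that
preexec_fn is documented as unsafe in multi-threaded programs, so this is
a demonstration rather than something to paste into a slave.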

However, I am curious as to what changed, and whether there is
something simple that could be added to buildbot to work around this.
It doesn't seem like the slave side uses SIGHUP for much, if anything,
but I am somewhat mystified by the details of signal handling across
process groups, etc.

The versions of Python and Twisted are almost the same between gutsy
and hardy, with the only major difference being the buildbot version
number (0.7.5 vs 0.7.6). FWIW, I have "usepty = 1" in the slave
buildbot.tac.
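
A possible mechanism, offered as an assumption and not a diagnosis: when
a child runs on a pty, the slave side of the pty is its controlling
terminal, and on Linux closing the master side hangs up that terminal,
which delivers SIGHUP to the session leader. That would fit the
close(7) followed by SIGHUP in the strace output, but nothing here
confirms that fd 7 was the pty master. The sketch below reproduces only
the general pty behaviour with the standard pty module; it does not use
Buildbot or Twisted, and signal_after_master_close is a made-up name.

```python
import os
import pty
import signal
import time


def signal_after_master_close():
    """Close a pty master while the child is still running.

    Returns the number of the signal that killed the child, or None if
    the child exited on its own.
    """
    pid, master_fd = pty.fork()
    if pid == 0:
        # Child: session leader with the pty slave as controlling tty.
        try:
            signal.signal(signal.SIGHUP, signal.SIG_DFL)
            os.write(1, b"r")  # tell the parent we are up and running
            time.sleep(30)     # still busy when the master goes away
        finally:
            os._exit(0)
    # Parent: wait for the child's ready byte, then drop the master side
    # without waiting for the child to finish.
    os.read(master_fd, 1)
    os.close(master_fd)
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return os.WTERMSIG(status)
    return None
```

On a Linux box this should return the SIGHUP signal number well before
the 30 second sleep would have finished.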

Any ideas?

-- 
- Charles Lepple



