[Buildbot-devel] Another twisted server is running

Brian Warner warner-buildbot at lothar.com
Fri Oct 13 07:12:16 UTC 2006


"Roy S. Rapoport" <buildbot-devel at ols.inorganic.org> writes:

> I suspect some sort of part of restarting that takes much longer when
> you've got 157 builders than normal, and buildbot's not waiting long
> enough.  Any suggestions as to what I can do? Is there any information I
> can submit to help debug this issue?

Ooh, yeah :).

You can verify that this is happening by looking at twistd.log. When the
'buildbot stop' or 'buildbot restart' command begins, you should see a
"Received SIGTERM" line in the log. When the process finally exits, you
should see a "Server Shut Down" message in the log. If you can measure the
time that elapses between these two messages, and it is greater than the 5
second timeout built into 'buildbot stop' and 'buildbot restart', then you've
spotted the problem.

(The standard timestamp format in twistd.log doesn't include seconds, but you
can probably eyeball it with 'tail -f' and a stopwatch. You might also try
using 'multilog', from djb's daemontools package, and something like 'tail -f
-s 0.1 twistd.log |multilog t ./logs' and then 'cat logs/current
|tai64nlocal', but that's kind of overkill.)
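
If a stopwatch feels too manual, the same measurement can be sketched in a few
lines of Python that follow the log and time the gap with their own clock.
This is only an illustration, not part of buildbot: the function names
(shutdown_seconds, follow) are made up here, and the two marker strings are
simply the ones quoted above, so adjust them if your log words them
differently.

```python
import time

# Marker strings as quoted in this message; adjust if your log differs.
START_MARKER = "Received SIGTERM"
END_MARKER = "Server Shut Down"


def shutdown_seconds(lines, clock=time.monotonic):
    """Seconds between the first START_MARKER line and the next END_MARKER line.

    `lines` is any iterable of log lines, consumed as they arrive.
    Returns None if the pair of markers is never seen.
    """
    started = None
    for line in lines:
        if started is None and START_MARKER in line:
            started = clock()
        elif started is not None and END_MARKER in line:
            return clock() - started
    return None


def follow(path, poll=0.1):
    """Yield complete lines appended to `path` from now on, like 'tail -f'."""
    with open(path) as f:
        f.seek(0, 2)  # jump to the current end of the file
        buf = ""
        while True:
            chunk = f.readline()
            if not chunk:
                time.sleep(poll)  # nothing new yet
                continue
            buf += chunk
            if buf.endswith("\n"):  # only hand out finished lines
                yield buf
                buf = ""

# Usage, started in the buildmaster directory before running 'buildbot stop':
#   print(shutdown_seconds(follow("twistd.log")))
```

The clock starts when the script sees the first marker, so the result includes
up to one polling interval of error, which is fine for deciding whether you are
over 5 seconds.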

The easiest way to deal with this is to edit buildbot/scripts/runner.py (grep
for 'def stop'), replacing the "while timer < 5" with a larger number. I'd be
willing to accept a patch that made this timeout configurable if you think it
would help.
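
For anyone thinking about that patch, here is roughly the shape I have in mind
for a wait loop with the limit passed in rather than hardcoded. It is a sketch
under assumptions, not the code that is in runner.py today: wait_for_exit,
pid_alive and their parameters are hypothetical names, and pid_alive relies on
POSIX signal-0 semantics.

```python
import os
import time


def pid_alive(pid):
    """Return True if a process with this pid exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without signalling
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def wait_for_exit(is_running, timeout=5.0, poll=1.0, sleep=time.sleep):
    """Poll until is_running() returns False or `timeout` seconds have passed.

    Returns True if the process was seen to exit, False on timeout.
    """
    waited = 0.0
    while waited < timeout:
        if not is_running():
            return True
        sleep(poll)
        waited += poll
    return not is_running()  # one last check after the final sleep

# Hypothetical usage, with the pid read from twistd.pid:
#   wait_for_exit(lambda: pid_alive(pid), timeout=60)
```

Keeping 5 seconds as the default would leave existing setups unchanged, while a
master with many builders could ask for a longer wait.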

I'd be interested to know what's making it take so long. I think there is
some build-status saving going on at shutdown time, but not a whole lot else,
so I'm kind of surprised that even 157 builders can't be shut down in less
than 5 seconds. You might use something like 'ls -ltr' or 'find . -cmin -1
-ls' to find all the buildmaster files that were changed in the last minute
and see if there are a lot of them or not.
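
If you would rather get a count than scroll through 'find' output, a rough
Python equivalent is sketched below. The function name recently_changed and
its parameters are made up for illustration. It looks at regular directory
entries only (find would list directories too) and uses the inode change time,
which is what -cmin tests.

```python
import os
import time


def recently_changed(root, minutes=1.0, now=None):
    """Return (ctime, path) pairs, oldest first, for files under `root`
    whose inode change time falls within the last `minutes`."""
    now = time.time() if now is None else now
    cutoff = now - minutes * 60
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                ctime = os.lstat(path).st_ctime
            except OSError:
                continue  # file vanished between listing and stat
            if ctime >= cutoff:
                hits.append((ctime, path))
    return sorted(hits)

# Usage, run in the buildmaster directory right after a shutdown:
#   changed = recently_changed(".")
#   print(len(changed), "files changed in the last minute")
```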

cheers,
 -Brian



