[Buildbot-devel] buildslave connects briefly after graceful shutdown?
Dustin J. Mitchell
dustin at v.igoro.us
Wed Jan 9 04:43:21 UTC 2013
On Mon, Jan 7, 2013 at 7:30 PM, Dan Kegel <dank at kegel.com> wrote:
> Or triggers that weird 'building but not building' state that I've been
> complaining about, which requires restarting the master to recover from.
Definitely worth investigating. If you can reproduce reliably, can
you add some prints to bot.py to figure out what's going on?
The code looks like this:
    log.msg("slave shutting down on command from master")
    # there's no good way to learn that the PB response has been delivered,
    # so we'll just wait a bit, in hopes the master hears back.  Masters are
    # resilient to slaves dropping their connections, so there is no harm
    # if this timeout is too short.
    reactor.callLater(0.2, reactor.stop)
which should have you thinking "OMG GIANT HACK".
Without this delay, the slave never sends the packet indicating that
the RPC shutdown call has completed, so the master gets a
ConnectionLost exception. This isn't a big problem, as the comment
indicates, but it makes it tricky to distinguish a graceful shutdown
from a slave failure.
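
To make the race concrete, here is a toy model of it. This is only a sketch, not Buildbot code: master_observes, stop_delay and ack_flush_time are made-up names, and it assumes the RPC response is lost whenever the slave stops before it has finished writing that response.

```python
def master_observes(stop_delay, ack_flush_time):
    """Model what the master sees after sending the shutdown RPC.

    stop_delay     -- seconds the slave waits before stopping its event loop
    ack_flush_time -- seconds the slave needs to write out the RPC response
    Both parameters are hypothetical; real values depend on load and network.
    """
    if stop_delay >= ack_flush_time:
        # The response left the slave before it stopped.
        return "shutdown acknowledged"
    # The slave stopped first, so the response was never sent.
    return "ConnectionLost"


if __name__ == "__main__":
    # Compare no delay at all with the 200ms delay.
    for delay in (0.0, 0.2):
        print(delay, master_observes(delay, ack_flush_time=0.05))
```

With no delay the master always lands in the ConnectionLost branch. With a fixed delay the outcome depends on whether the response happens to get out in time, which is why a graceful shutdown can still look like a slave failure.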
I can see how this would explain being able to start a build on a
graceful'd slave. That's partly the fault of the protocol design
(where the slave doesn't have a good way to know if it's in the middle
of a build, and only barely knows if it's in the middle of a step),
and partly the fault of the 200ms hack shown above.
I'm not sure how this would explain the reconnect, though: the slave
doesn't *drop* the connection until it exits (even after reactor.stop,
as far as I understand, the TCP socket is still open).