[Buildbot-devel] Catching an exception when a slave is expected to disconnect

Dustin J. Mitchell dustin at v.igoro.us
Fri Jun 3 05:15:59 UTC 2011


On Thu, Jun 2, 2011 at 12:14 PM, Charles Lepple <clepple at gmail.com> wrote:
> I have a test case in one of my builders where I would like to run
> some code on a slave, but at the end of the build, I want to reboot
> the slave system (to ensure a clean slate for the next test).

Mozilla does this too.

> Through some combination of scheduling the reboot for some time in the
> future, and having a MasterShellCommand run "sleep" while the slave
> disconnects and reboots, things kinda-sorta worked in 0.8.2
> (master-side; I think it was 0.8.3 on the slave side).
>
> After upgrading to 0.8.3 on the master, the disconnection turns into
> an exception, which results in the same build being retried after the
> slave reconnects. (I certainly see this as being useful in the general
> case, but we would like to somehow override this such that
> disconnection is expected, or at least not a problem, in the reboot
> step.)
>
> Before I dive too deeply into the master-side disconnection handling
> code, has anyone else encountered a similar situation?

Mozilla's still using 0.8.2, but we've seen some similar problems.
Among other things, Twisted leaks memory when slaves disconnect in this
situation, and the disconnect also seems to occasionally "wedge" a
slave such that the master will no longer reconfig.

My plan (https://bugzilla.mozilla.org/show_bug.cgi?id=660080) is to
change this to something like what Ben Clifford suggested.  Basically,
use the graceful-shutdown support in the master to shut down the
slave, after setting some kind of flag on the slave to say "hey, when
you shut down, reboot."  This should let the master/slave disconnect
cleanly, while still getting the reboot behavior.
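
To make the ordering concrete, here is a rough sketch of that flow in
plain Python.  It is only a toy model under my own assumptions: the
Slave and Master classes and every method name below are hypothetical
and are not Buildbot APIs.  It just shows the sequence: set the flag,
request a graceful shutdown, let the running build finish, disconnect
cleanly, then reboot.

```python
# Toy model of "graceful shutdown, then reboot". All names here are
# hypothetical illustrations, not part of Buildbot.

class Slave:
    def __init__(self, name):
        self.name = name
        self.reboot_on_shutdown = False  # the "when you shut down, reboot" flag
        self.connected = True
        self.busy = False
        self.events = []

    def set_reboot_flag(self):
        self.reboot_on_shutdown = True

    def shutdown(self):
        # Disconnect cleanly first, so the master never sees a lost connection.
        self.connected = False
        self.events.append("disconnected cleanly")
        if self.reboot_on_shutdown:
            # A real slave would invoke the OS reboot command here.
            self.events.append("reboot")


class Master:
    def __init__(self):
        self.graceful_pending = set()

    def request_graceful_shutdown(self, slave):
        # Remember the request; act on it only once the slave is idle.
        self.graceful_pending.add(slave.name)
        if not slave.busy:
            self._finish(slave)

    def build_finished(self, slave):
        slave.busy = False
        if slave.name in self.graceful_pending:
            self._finish(slave)

    def _finish(self, slave):
        self.graceful_pending.discard(slave.name)
        slave.shutdown()


def run_build_then_reboot(master, slave):
    slave.busy = True                        # a build is running
    slave.set_reboot_flag()                  # e.g. done by the last build step
    master.request_graceful_shutdown(slave)  # deferred while the slave is busy
    master.build_finished(slave)             # build ends, shutdown proceeds
    return slave.events
```

Because the disconnect happens only after the build has finished, the
master in this model has nothing to retry when the slave comes back.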

Dustin



