[Buildbot-devel] Catching an exception when a slave is expected to disconnect

Charles Lepple clepple at gmail.com
Fri Jun 3 10:15:21 UTC 2011


On Jun 3, 2011, at 1:15 AM, "Dustin J. Mitchell" <dustin at v.igoro.us> wrote:

> On Thu, Jun 2, 2011 at 12:14 PM, Charles Lepple <clepple at gmail.com> wrote:
>> I have a test case in one of my builders where I would like to run
>> some code on a slave, but at the end of the build, I want to reboot
>> the slave system (to ensure a clean slate for the next test).
> 
> Mozilla does this too..
> 
>> Through some combination of scheduling the reboot for some time in the
>> future, and having a MasterShellCommand run "sleep" while the slave
>> disconnects and reboots, things kinda-sorta worked in 0.8.2
>> (master-side; I think it was 0.8.3 on the slave side).
>> 
>> After upgrading to 0.8.3 on the master, the disconnection turns into
>> an exception, which results in the same build being retried after the
>> slave reconnects. (I certainly see this as being useful in the general
>> case, but we would like to somehow override this such that
>> disconnection is expected, or at least not a problem, in the reboot
>> step.)
>> 
>> Before I dive too deeply into the master-side disconnection handling
>> code, has anyone else encountered a similar situation?
> 
> Mozilla's still using 0.8.2, but we've seen some similar problems.
> Among others, Twisted has a memory leak when slaves disconnect in this
> situation, and it also seems to occasionally "wedge" a slave such that
> the master will no longer reconfig.

Hmm, that might be an acceptable tradeoff if we need any post-0.8.2 features, since we often restart the master rather than reconfiguring it.

> 
> My plan (https://bugzilla.mozilla.org/show_bug.cgi?id=660080) is to
> change this to something like what Ben Clifford suggested.  Basically,
> use the graceful-shutdown support in the master to shut down the
> slave, after setting some kind of flag on the slave to say "hey, when
> you shut down, reboot."  This should let the master/slave disconnect
> cleanly, while still getting the reboot behavior.
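
For illustration, the slave-side half of that "flag, then reboot on shutdown" idea could look roughly like the sketch below. It is only a sketch under stated assumptions: the flag file, the function names, and the way the slave process and the reboot are invoked are all hypothetical, and none of it is taken from Buildbot or from the Mozilla bug. It shows a wrapper that runs the slave until it exits and reboots only if a build step left a flag file behind.

```python
import os


def consume_reboot_flag(flag_path):
    """Return True if a reboot was requested, clearing the flag as we go.

    flag_path is a hypothetical file that a build step would create
    (e.g. by touching it) before the master asks the slave to shut down.
    """
    if os.path.exists(flag_path):
        os.remove(flag_path)  # clear it so the next boot starts clean
        return True
    return False


def run_then_maybe_reboot(run_slave, reboot, flag_path):
    """Run the slave until it exits, then reboot only if the flag was set.

    run_slave and reboot are caller-supplied callables (assumptions):
    run_slave should block until the slave process has shut down, and
    reboot should trigger whatever reboot mechanism the platform uses.
    """
    run_slave()
    if consume_reboot_flag(flag_path):
        reboot()
        return True
    return False
```

Because the slave has already disconnected cleanly by the time the flag is checked, the master would never see the reboot as a lost connection. How the slave is actually launched and how the machine is rebooted are deliberately left to the caller here.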

We looked at ssh at one point, but it seemed easier to have the TCP connection go from slave to master. That way the master just starts a build when the slave connects, rather than monkeying with timeouts and retries to accommodate the different boot times of the buildslaves.

I haven't looked much at the graceful-shutdown support yet; I thought it was just for shutting down the master itself? Then again, that was based only on a quick read of the 0.8.3 docs on authorization, so I'm sure I missed something.

Thanks,
- Charles Lepple




