[Buildbot-devel] Catching an exception when a slave is expected to disconnect
Charles Lepple
clepple at gmail.com
Fri Jun 3 10:15:21 UTC 2011
On Jun 3, 2011, at 1:15 AM, "Dustin J. Mitchell" <dustin at v.igoro.us> wrote:
> On Thu, Jun 2, 2011 at 12:14 PM, Charles Lepple <clepple at gmail.com> wrote:
>> I have a test case in one of my builders where I would like to run
>> some code on a slave, but at the end of the build, I want to reboot
>> the slave system (to ensure a clean slate for the next test).
>
> Mozilla does this too..
>
>> Through some combination of scheduling the reboot for some time in the
>> future, and having a MasterShellCommand run "sleep" while the slave
>> disconnects and reboots, things kinda-sorta worked in 0.8.2
>> (master-side; I think it was 0.8.3 on the slave side).
>>
>> After upgrading to 0.8.3 on the master, the disconnection turns into
>> an exception, which results in the same build being retried after the
>> slave reconnects. (I certainly see this as being useful in the general
>> case, but we would like to somehow override this such that
>> disconnection is expected, or at least not a problem, in the reboot
>> step.)
>>
>> Before I dive too deeply into the master-side disconnection handling
>> code, has anyone else encountered a similar situation?
>
> Mozilla's still using 0.8.2, but we've seen some similar problems.
> Among others, Twisted has a memory leak when slaves disconnect in this
> situation, and it also seems to occasionally "wedge" a slave such that
> the master will no longer reconfig.
Hmm, that might be an acceptable tradeoff if we need any post-0.8.2 features - we often restart the master rather than reconfigure.
>
> My plan (https://bugzilla.mozilla.org/show_bug.cgi?id=660080) is to
> change this to something like what Ben Clifford suggested. Basically,
> use the graceful-shutdown support in the master to shut down the
> slave, after setting some kind of flag on the slave to say "hey, when
> you shut down, reboot." This should let the master/slave disconnect
> cleanly, while still getting the reboot behavior.
We looked at ssh at one point, but it seemed easier to have the TCP connection go from slave to master - that way the master just starts a build when the slave connects, rather than monkeying with timeouts and retries to accommodate different boot times of the buildslaves.
I haven't looked much at the graceful-shutdown support yet. I thought it was just for shutting down the master itself? Then again, that was just based on a quick read of the 0.8.3 docs on authorizations, so I'm sure I missed something.
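For what it's worth, here is a rough sketch of how I picture the slave-side half of the "flag, then reboot on shutdown" idea. None of this is Buildbot API: the flag path, the function name, and the reboot command are all hypothetical, and it assumes some wrapper script calls this only after the slave process has exited cleanly.

```python
import os
import subprocess

# Hypothetical flag path; a build step running on the slave would create
# this file to say "reboot once the slave has shut down".
REBOOT_FLAG = "/var/run/buildslave/reboot-requested"


def reboot_if_flagged(flag_path=REBOOT_FLAG,
                      reboot_cmd=("shutdown", "-r", "now"),
                      run=subprocess.call):
    """Run reboot_cmd if the flag file exists.

    Returns True if a reboot was requested, False otherwise.
    The 'run' parameter exists so the reboot call can be stubbed out.
    """
    if not os.path.exists(flag_path):
        return False
    # Clear the flag first so the slave does not reboot again on next exit.
    os.remove(flag_path)
    run(list(reboot_cmd))
    return True
```

The point of clearing the flag before rebooting is that a stale flag would otherwise trigger a reboot after every later shutdown, expected or not.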
Thanks,
- Charles Lepple