Jason Edgecombe jason at rampaginggeek.com
Thu Mar 12 00:06:47 UTC 2015

On 03/11/2015 02:07 PM, Mikhail Sobolev wrote:
> Hi Jason,
> On Tue, Mar 10, 2015 at 07:54:21PM -0400, Jason Edgecombe wrote:
>> On 03/10/2015 02:33 PM, Mikhail Sobolev wrote:
>>> On Tue, Mar 10, 2015 at 09:32:00AM -0400, Jason Edgecombe wrote:
>>>> We maintain a buildbot farm of volunteer slaves. A subset of the slaves
>>>> use Gerrit as a changesource. Occasionally, one of the gerrit slaves is
>>>> down, and can be down for hours until the slave admin can fix it. During
>>>> the outage, gerrit changes are blocked from being built and receiving
>>>> status updates on the builds. Is there a way to dynamically exclude the
>>>> offline builders from the gerrit pool?
>>> Could you please elaborate a bit?  My understanding is that if a build slave is
>>> down, it won't get any jobs, so how do gerrit changes get blocked?
>>>> For reference, my buildbot config file is at
>>>> https://github.com/edgester/afsbotcfg/blob/master/master.cfg
>>> Thanks for the link.  I'm looking at it now to see if I understand the problem
>>> better.
>> We use a summary callback in the Buildbot config, so that only one
>> comment is posted to gerrit when all of the slaves are done. This is
>> fine, but new gerrit changes will typically trigger a build on all of
>> the gerrit builders, even the ones that are down at the time of submission.
> Let me see if I understand the problem correctly.
> There's a number of platforms that you'd like to check the things against.  So
> a build is created for each of those platforms.  However when for one of the
> platforms all Gerrit capable build slaves are down, the things get stuck.
> If this is the case, then I do not think it's possible to work around this: a
> build for each platform has to be performed.
When a build is submitted or started, buildbot already knows which
builders are online. Is there a way to instruct the buildmaster to
submit the change only to the builders that are online at that time? I
understand that changes that are currently building, and possibly those
already queued, are hosed, but I would like newly submitted (and possibly
queued) changes to behave as if the downed builder were not in the build list.
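To make the request concrete, here is a plain-Python sketch of the behaviour I have in mind. It does not use any Buildbot API: `online_builders`, `summary_ready`, and the builder/slave names are hypothetical, and a real config would have to get the set of connected slaves from the buildmaster itself. The second function shows why a downed builder blocks us: the combined Gerrit comment only goes out once every triggered builder has finished.

```python
"""Hypothetical sketch: trigger only builders that have a connected slave.

None of these names are Buildbot APIs. The builder-to-slaves mapping and
the set of connected slaves are assumed inputs that a real master.cfg
would need to obtain from the buildmaster's own state.
"""
from typing import Dict, Iterable, List, Set


def online_builders(builder_slaves: Dict[str, List[str]],
                    connected: Set[str]) -> List[str]:
    """Return the (sorted) builders with at least one connected slave."""
    return sorted(name for name, slaves in builder_slaves.items()
                  if any(s in connected for s in slaves))


def summary_ready(triggered: Iterable[str], finished: Set[str]) -> bool:
    """True once every triggered builder has finished, i.e. when a single
    combined comment could be posted to Gerrit."""
    return all(name in finished for name in triggered)


if __name__ == "__main__":
    # Hypothetical builder and slave names, for illustration only.
    builder_slaves = {
        "linux-gerrit": ["slave-a", "slave-b"],
        "solaris-gerrit": ["slave-c"],
        "macos-gerrit": ["slave-d"],
    }
    connected = {"slave-a", "slave-d"}  # slave-c is down
    finished = {"linux-gerrit", "macos-gerrit"}

    targets = online_builders(builder_slaves, connected)
    print(targets)  # ['linux-gerrit', 'macos-gerrit']
    # Triggering only online builders lets the summary complete.
    print(summary_ready(targets, finished))  # True
    # Triggering every builder leaves the summary waiting on solaris-gerrit.
    print(summary_ready(list(builder_slaves), finished))  # False
```

The filtering happens once, when the change arrives, so a builder whose slave comes back later simply gets included in the next change without anyone reconfiguring the master.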

The problem is that one of the admins must intervene and reconfigure the
buildmaster to ignore the downed slave in order to restore partial
service. Since the admins, like myself, are volunteers, possibly in
different time zones, other developers may wait as long as half a day
for an admin to get a chance to intervene.


