[Buildbot-devel] Ignoring offline builders/slaves
Jason Edgecombe
jason at rampaginggeek.com
Thu Mar 12 00:06:47 UTC 2015
On 03/11/2015 02:07 PM, Mikhail Sobolev wrote:
> Hi Jason,
>
> On Tue, Mar 10, 2015 at 07:54:21PM -0400, Jason Edgecombe wrote:
>> On 03/10/2015 02:33 PM, Mikhail Sobolev wrote:
>>> On Tue, Mar 10, 2015 at 09:32:00AM -0400, Jason Edgecombe wrote:
>>>> We maintain a buildbot farm of volunteer slaves. A subset of the slaves
>>>> use Gerrit as a changesource. Occasionally, one of the gerrit slaves is
>>>> down, and can be down for hours until the slave admin can fix it. During
>>>> the outage, gerrit changes are blocked from being built and receiving
>>>> status updates on the builds. Is there a way to dynamically exclude the
>>>> offline builders from the gerrit pool?
>>> Could you please elaborate a bit? My understanding is that if a build slave is
>>> down, it won't get any jobs, how gerrit changes get blocked?
>>>
>>>> For reference, my buildbot config file is at
>>>> https://github.com/edgester/afsbotcfg/blob/master/master.cfg
>>> Thanks for the link. I'm looking at it now to see if I understand the problem
>>> better.
>> We use a summary callback in the Buildbot config, so that only one
>> comment is posted to gerrit when all of the slaves are done. This is
>> fine, but new gerrit changes will typically trigger a build on all of
>> the gerrit builders, even the ones that are down at the time of submission.
> Let me see if I understand the problem correctly.
>
> There's a number of platforms that you'd like to check the things against. So
> a build is created for each of those platforms. However when for one of the
> platforms all Gerrit capable build slaves are down, the things get stuck.
>
> If this is the case, then I do not think it's possible to work around this: a
> build for each platform has to be performed.
>
When a build is submitted or started, buildbot already knows which
builders are online. Is there a way to instruct the buildmaster to
submit the change to all builders that are online at that time? I
understand that changes that are currently building, and possibly those
in-queue are hosed, but I would like for newly-submitted (and possibly
queued) changes to act as if the downed builder is not in the build list.
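To illustrate the behaviour being asked for, here is a minimal sketch in
plain Python. It is not Buildbot API: the function, the builder names and
the slave names are all hypothetical, and it simply assumes the master can
produce a builder-to-slaves mapping and a set of currently connected slaves.

```python
def online_builders(builder_slaves, connected_slaves):
    """Return the builders that have at least one connected slave.

    builder_slaves: dict mapping builder name -> list of slave names
    connected_slaves: set of slave names that are currently connected
    """
    return sorted(
        name
        for name, slaves in builder_slaves.items()
        if any(slave in connected_slaves for slave in slaves)
    )


# Hypothetical farm: all names are invented for illustration only.
builder_slaves = {
    "gerrit-linux": ["linux-1", "linux-2"],
    "gerrit-solaris": ["solaris-1"],
    "gerrit-macos": ["macos-1"],
}
connected_slaves = {"linux-2", "macos-1"}

# Only these builders would receive a newly submitted change;
# "gerrit-solaris" is skipped because its only slave is offline.
targets = online_builders(builder_slaves, connected_slaves)
print(targets)
```

The summary comment would then wait only for the builders in that list,
rather than for every configured gerrit builder.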
The problem is that one of the admins must intervene to reconfigure the
buildmaster to ignore the downed slave in order to restore partial
service. Since the admins, like myself, are volunteers and possibly in
different timezones, other developers may be waiting for half a day
until an admin gets a chance to intervene.
Thanks,
Jason