[users at bb.net] More multi-master anecdotes and collapsing questions.

Pierre Tardy tardyp at gmail.com
Sun Jul 9 11:16:24 UTC 2017


Hi Neil,

In my opinion, the "master election" process for schedulers is not perfect.
It would need a bit more work to really be up to date with state-of-the-art
master election best practices.
So indeed, for the moment I would advise having a single master run the
schedulers.

For example:
https://docs.google.com/presentation/d/1CA932aTgicnOpIReOhZcqijHE9BGapXfpkIqx9nSRnw/edit#slide=id.g2219c85b58_1_532

Scality has a single "frontend" master which runs the www, the schedulers,
and the hooks. This is not an "HA" setup, but in any case Buildbot's
scheduler election has a 10-minute recovery timer for when a master fails.
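
A minimal sketch of how a shared master.cfg could keep schedulers on one
master only. The role names, the BB_MASTER_ROLE environment variable and the
helper function are all hypothetical (none of them are Buildbot features),
and plain dicts stand in for real scheduler objects so the snippet runs on
its own:

```python
import os

# Hypothetical role names; these are not part of Buildbot itself.
FRONTEND = "frontend"
BUILD_MASTER = "builds"


def scheduler_specs_for(role, all_specs):
    """Return the scheduler specs that a master with this role should load.

    Only the frontend master gets schedulers. Every other master gets an
    empty list, so no scheduler is ever defined on more than one master.
    """
    if role == FRONTEND:
        return list(all_specs)
    return []


# Hypothetical environment variable selecting this master's role.
role = os.environ.get("BB_MASTER_ROLE", BUILD_MASTER)

# Placeholder specs; a real config would build scheduler objects instead.
all_specs = [
    {"name": "main-branch", "builderNames": ["build-linux"]},
    {"name": "nightly", "builderNames": ["build-linux", "build-windows"]},
]

# In a real master.cfg the resulting list would go into c['schedulers'].
selected = scheduler_specs_for(role, all_specs)
```

With this layout the other masters still load the same builders, but they
never take part in scheduler election because they have nothing to elect.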

As for claims and collapsing, we recently found some annoying bugs in those
areas:
https://github.com/buildbot/buildbot/pull/3411
https://github.com/buildbot/buildbot/pull/3152


On Wed, Jul 5, 2017 at 6:20 PM Neil Gilmore <ngilmore at grammatech.com> wrote:

> Hi everyone.
>
> Well, now that I can (reliably) release locks through Twisted's manhole,
> things are a bit brighter. We have a somewhat rare problem in which all
> of a worker's builders are 'acquiring locks'. Doesn't happen often, but
> it keeps things from running. Remember, we can't use the worker
> configuration to limit builds.
>
> But we're having another problem that seems to be getting a bit worse. I
> seem to recall Pierre saying that in a multi-master configuration, if
> there was a scheduler that existed on multiple masters, that scheduler
> would only be active on a single master. Other masters might activate
> that scheduler if the first master went away. So there should only be
> one master's scheduler scheduling particular builds.
>
> Well, that isn't happening for us. It's not a problem most of the time,
> because the builds do collapse, most of the time. Except when they don't.
>
> For example, last weekend we had 3 builds schedule and build for the
> same sourcestamp (according to the debug information in the UI). The
> builds were scheduled within 3 seconds of each other. However, they were
> claimed many hours apart. It appears that the first build completed
> before the second was claimed, etc. Is this how it ought to go? I
> haven't quite cracked the submitted/claimed/started timing.
>
> We had a similar claiming problem last week where a build went unclaimed
> for 44 days. So when it popped up, it appeared that we had gone back in
> time (as the revision was quite old at that time).
>
> Do I just need to figure out how to not put schedulers on more than 1
> master?
>
> Neil Gilmore
> grammatech.com