[users at bb.net] More multi-master anecdotes and collapsing questions.
tardyp at gmail.com
Sun Jul 9 11:16:24 UTC 2017
In my view, the "master election" process for schedulers is not perfect. It
would need a bit more work to really be up to date with state-of-the-art
master election best practices.
So indeed, I would at the moment advise having a single master doing the
Scality have a single "frontend" master which does the www, schedulers,
and hooks. This is not an "HA" setup, but in any case the buildbot
scheduler election has a 10-minute recovery timer for when a master fails.
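To make the effect of that recovery timer concrete, here is a minimal toy model of lease-style scheduler ownership. It is only a sketch under stated assumptions: `SchedulerLease`, `try_activate`, and the master and scheduler names are hypothetical, and this is a generic lease pattern, not Buildbot's actual election code. The only figure taken from the message above is the 10-minute timeout.

```python
RECOVERY_TIMEOUT = 10 * 60  # seconds; the 10-minute figure mentioned above


class SchedulerLease:
    """Toy lease table: at most one active master per scheduler name.

    Hypothetical illustration only, not Buildbot's implementation.
    """

    def __init__(self, timeout=RECOVERY_TIMEOUT):
        self.timeout = timeout
        self.owners = {}  # scheduler name -> (master name, last refresh time)

    def try_activate(self, scheduler, master, now):
        """Return True if `master` owns `scheduler` after this call."""
        owner = self.owners.get(scheduler)
        if owner is not None:
            holder, last_seen = owner
            # Another master holds a lease that has not yet expired.
            if holder != master and now - last_seen < self.timeout:
                return False
        # Free, expired, or already ours: take or refresh the lease.
        self.owners[scheduler] = (master, now)
        return True


lease = SchedulerLease()
print(lease.try_activate("nightly", "master-a", now=0))    # master-a takes the lease
print(lease.try_activate("nightly", "master-b", now=30))   # refused, lease still live
# Suppose master-a dies right after t=0 and never refreshes its lease.
print(lease.try_activate("nightly", "master-b", now=599))  # still refused
print(lease.try_activate("nightly", "master-b", now=600))  # lease expired, master-b takes over
```

In this model a scheduler whose master has died stays inactive until the full timeout has elapsed, which is why such a setup gives eventual recovery rather than true high availability.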
As for claims and collapsing, we have found recently some annoying bugs
On Wed, Jul 5, 2017 at 6:20 PM Neil Gilmore <ngilmore at grammatech.com> wrote:
> Hi everyone.
> Well, now that I can (reliably) release locks through Twisted's manhole,
> things are a bit brighter. We have a somewhat rare problem in which all
> of a worker's builders are 'acquiring locks'. Doesn't happen often, but
> it keeps things from running. Remember, we can't use the worker
> configuration to limit builds.
> But we're having another problem that seems to be getting a bit worse. I
> seem to recall Pierre saying that in a multi-master configuration, if
> there was a scheduler that existed on multiple masters, that scheduler
> would only be active on a single master. Other masters might activate
> that scheduler if the first master went away. So there should only be
> one master's scheduler scheduling particular builds.
> Well, that isn't happening for us. It's not a problem most of the time,
> because the builds do collapse, most of the time. Except when they don't.
> For example, last weekend we had 3 builds schedule and build for the
> same sourcestamp (according to the debug information in the UI). The
> builds were scheduled within 3 seconds of each other. However, they were
> claimed many hours apart. It appears that the first build completed
> before the second was claimed, etc. Is this how it ought to go? I
> haven't quite cracked the submitted/claimed/started timing.
> We had a similar claiming problem last week where a build went unclaimed
> for 44 days. So when it popped up, it appeared that we had gone back in
> time (as the revision was quite old at that time).
> Do I just need to figure out how to not put schedulers on more than 1
> Neil Gilmore