[users at bb.net] 0.9.0rc1/2, multi-master, and schedulers.

Pierre Tardy tardyp at gmail.com
Thu Aug 25 09:31:16 UTC 2016


Hi Neil

On Wed, 24 Aug 2016 at 22:14, Neil Gilmore <ngilmore at grammatech.com>
wrote:

> Hi everyone,
>
> Thanks for the advice previously. It's helped a lot. I'll be restarting
> our master tomorrow with bits of new code. In particular the new log code,
> a UI fix that came from rc2 (and ended up with the benefit of the cancel
> queue button, unfortunately useful in our case), and a fix for exceptions
> we were getting during reconfig (a problem with __cmp__() in
> ComparableMixin). I never hunted it down; it may have been things we were
> attaching to various items. The error was that cmp() couldn't handle sets.
> We're currently using that code on a second master, and we no longer get
> exceptions when we reconfig using the same master.cfg.
>

Do you still have the problem of the master not continuing the build after
a command has terminated on the worker?


>
> I've been getting ready to attempt using multi-master, and I ran across
> possibly conflicting documentation.
>
> Here (
> http://docs.buildbot.net/latest/manual/cfg-global.html#multi-master-mode),
> it says,
>
> "Ensure that each named scheduler runs on only one master. If the same
> scheduler runs on multiple masters, it will trigger duplicate builds and
> may produce other undesirable behaviors."
>

Indeed, this part of the docs dates from Buildbot eight, and I forgot to
update it. You can ignore most of that information; it is outdated.

The up-to-date documentation on multimaster is here:
http://docs.buildbot.net/latest/manual/concepts.html#multimaster

As I said, this is experimental. It needs a proper docs update beyond
just the concepts page, and probably some bug fixes.

We did not want to delay nine any further while waiting for multimaster to
be completely sorted out and documented.



> Here (
> http://docs.buildbot.net/latest/manual/cfg-schedulers.html#scheduler-resiliency),
> it says:
> "In a multi-master configuration, schedulers with the same name can be
> configured on multiple masters."
>
> If these aren't contradictory, where is my understanding failing?
> Alternately, if they are conflicting, which is correct? Or is there another
> answer?
>

The latter information is correct.

In nine, there is a new concept of clustered services: services that run
on only one master at a time. The masters compete to run those services,
and the first master to claim a service runs it. Schedulers and change
sources are all clustered services.
The database acts as the arbitrator (hence multimaster cannot work with
SQLite):
https://github.com/buildbot/buildbot/blob/master/master/buildbot/db/model.py#L254
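
To illustrate only the arbitration idea: in the sketch below the database
decides who owns a service through a primary-key constraint, so a second
master's claim simply fails. The table, the service names and the
try_claim()/release() helpers are hypothetical, not Buildbot's actual schema
(model.py above has the real one). It uses Python's built-in sqlite3 in
memory purely because that needs no server; as said, a real multimaster
setup needs a shared database such as PostgreSQL.

```python
import sqlite3

# Hypothetical table, for illustration only: one row per claimed service.
# The PRIMARY KEY on service_name is what makes the database the arbiter:
# a second INSERT for an already-claimed service raises IntegrityError.
SCHEMA = """
CREATE TABLE service_claims (
    service_name TEXT PRIMARY KEY,
    master_name  TEXT NOT NULL
)
"""


def try_claim(conn, service_name, master_name):
    """Return True if master_name now owns the service, False if another master does."""
    try:
        with conn:  # commits on success, rolls back if the INSERT fails
            conn.execute(
                "INSERT INTO service_claims (service_name, master_name) VALUES (?, ?)",
                (service_name, master_name),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def release(conn, master_name):
    """On clean shutdown, drop every claim held by master_name so others can take over."""
    with conn:
        conn.execute("DELETE FROM service_claims WHERE master_name = ?", (master_name,))


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
services = ["nightly-scheduler", "git-poller"]  # hypothetical service names

# Symmetric setup: master1 starts first and wins every claim.
print([try_claim(conn, s, "master1") for s in services])  # [True, True]
print([try_claim(conn, s, "master2") for s in services])  # [False, False]

# master1 stops cleanly; master2 retries and takes everything over.
release(conn, "master1")
print([try_claim(conn, s, "master2") for s in services])  # [True, True]
```

Note that nothing in this claim step spreads services across masters,
which mirrors the lack of load balancing in a symmetric configuration.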


What is not implemented is load balancing between masters. Basically, if
you run a symmetric multimaster configuration (as per concepts.rst), the
first master to start will take all the schedulers and change sources.

If that first master stops normally, the second master will take over.
So high availability is present (with a slight delay while the second
master reacts to the first master's shutdown event), but load balancing
is not.

While writing this email, I am already spotting some issues. If the first
master crashes and does not send a ('master', masterid, 'stopped') event,
then a keep-alive timeout mechanism applies, and the second master will
notice that the first master is dead only after this timeout. This means
that during that time, the schedulers and change sources will not update
anything.
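
The keep-alive gap can be sketched roughly as follows, assuming each master
periodically records a heartbeat timestamp and the others presume it dead
once that timestamp is older than a timeout. The MasterRecord type, the
function names and the 600-second value are hypothetical, not Buildbot's
actual ones; the point is only that a crashed master's services stay idle
until its last heartbeat plus the timeout.

```python
from dataclasses import dataclass


@dataclass
class MasterRecord:
    # Hypothetical row shape: a master name plus the time (in seconds)
    # of its most recent keep-alive heartbeat.
    name: str
    last_active: float


def masters_presumed_dead(records, now, timeout):
    """Names of masters whose last heartbeat is more than `timeout` seconds old."""
    return [r.name for r in records if now - r.last_active > timeout]


def detection_time(last_active, timeout):
    """Earliest moment another master can notice the crash and take over
    its clustered services (schedulers, change sources)."""
    return last_active + timeout


TIMEOUT = 600.0  # hypothetical keep-alive timeout, in seconds

records = [
    MasterRecord("master1", last_active=1000.0),  # crashed right after t=1000
    MasterRecord("master2", last_active=1450.0),  # still heartbeating
]

print(masters_presumed_dead(records, now=1500.0, timeout=TIMEOUT))  # []
print(masters_presumed_dead(records, now=1601.0, timeout=TIMEOUT))  # ['master1']
print(detection_time(1000.0, TIMEOUT))  # 1600.0
```

Between t=1000 and t=1600 in this example, nothing runs master1's
schedulers or change sources, which is the window described above.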

The ForceScheduler, though, should not be affected by this problem.
ForceSchedulers have to be configured on the same masters as the web
servers, and the web server will use the local ForceScheduler to trigger
its builds, even if it is not technically started as a clustered service.
Again, that is the theory; this has not really been tested.


> It would also appear that some other documentation needs some updating.
>
> Here (
> http://docs.buildbot.net/latest/manual/cfg-global.html#database-specification),
> the first database specification given matches closely with what buildbot
> create-master produces, c['db'] = { 'db_url' : "",}
>
> The rest use a form like c['db_url'] = "sqlite:///state.sqlite".
>
> My guess is that the second is an older form? I could be wrong, never
> having tested the other form.
>
> I've been able to get 2 masters talking to the same database (postgres in
> our case), but I'm still working on getting all the changes that need to be
> made down pat before using our own master.cfg. That's just me getting up to
> speed.
>

What you seem to have missed is that for multimaster to work you need a
common message queue. At the moment, only crossbar.io is implemented:
http://docs.buildbot.net/latest/manual/cfg-global.html#mq-specification

Messages are important so that the other master knows that a new build
request has been added to the database.

If you don't configure a multimaster-capable MQ, builds will not start
immediately on the second master. They will only start when some other
event happens on that second master (such as a worker connecting or
disconnecting, or a build finishing).
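
For reference, a minimal sketch of the two settings that have to be shared
between masters, written in the c['db'] dictionary form that create-master
generates. The database URL, router URL and realm are placeholders, and the
c['mq'] key names are as I read them on the MQ specification page linked
above, so check them against your version. The multimaster_problems()
helper is purely hypothetical, not something Buildbot provides.

```python
# Hypothetical master.cfg excerpt for one of two symmetric masters.
# Host names, credentials, router URL and realm are placeholders.
c = BuildmasterConfig = {}

# Every master points at the same (non-SQLite) database.
c['db'] = {
    'db_url': "postgresql://buildbot:secret@dbhost/buildbot",
}

# Every master connects to the same crossbar.io router through the WAMP MQ.
c['mq'] = {
    'type': 'wamp',
    'router_url': 'ws://crossbar.example.com:8080/ws',
    'realm': 'realm1',
}


def multimaster_problems(cfg):
    """Hypothetical sanity check: list settings that would break the
    multimaster setup described in this thread."""
    problems = []
    db_url = cfg.get('db', {}).get('db_url', '')
    if db_url.startswith('sqlite'):
        problems.append("SQLite cannot be shared between masters")
    # 'simple' is the local, single-master MQ used when c['mq'] is not set.
    if cfg.get('mq', {}).get('type', 'simple') != 'wamp':
        problems.append("MQ is local-only; other masters will not see new "
                        "build requests immediately")
    return problems


print(multimaster_problems(c))  # []
print(len(multimaster_problems({'db': {'db_url': 'sqlite:///state.sqlite'}})))  # 2
```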


> We like the cancel queue button, even though we got it accidentally. It
> came in with a change we needed to the builders page. We needed to be able
> to see the results of the last build regardless of when it built. Since the
> source for the pages isn't in the regular source tarball, the guy who did
> the change used the rc2 version, and we got a new feature we liked for
> free. On the other hand, people seem to have liked the old waterfall better.
>

If I understand correctly, you are running rc1 for the Python code and rc2
for the UI?
That should be fine, but I would recommend updating everything to rc2, as
a number of bugs have been fixed. No new features have been added on this
stable branch, so I expect the risk of regression to be limited.