[users at bb.net] 0.9.0rc1/2, multi-master, and schedulers.
Neil Gilmore
ngilmore at grammatech.com
Thu Aug 25 15:59:59 UTC 2016
Hi Pierre,
As always, thanks for the reply. I've trimmed a bit to try to keep
things clear.
On 8/25/2016 4:31 AM, Pierre Tardy wrote:
> Do you still have the problem with master not continuing the build
> after command has terminated on the worker?
Not at the moment, though since I just restarted the master, I wouldn't
expect that particular problem just yet.
Our primary master currently has 5 workers which have one or more
builders acquiring locks, which is usually a bad sign (I'll look in on
them later). And it has one worker that has build requests queued on
multiple builders, but no running builds. That machine seems to be
having communications problems, though, so it's probably not a buildbot
problem.
On our second master, which runs just a few builders on a few workers,
we have a buildrequest that probably would never build, even though that
worker isn't building anything else. The previous buildrequest sat for
20 hours or so (I cancelled the queue and forced a new build). I stopped
its worker and let our usual process start it back up, and it's running
now.
It's just a gut feeling, but I think that there's a single basic problem
somewhere that's manifesting itself in a few different ways. As you said
previously, I think a deferred is getting lost somewhere.
> The recent documentation on multimaster is there.
> http://docs.buildbot.net/latest/manual/concepts.html#multimaster
>
Thanks for the pointer.
> This later information is correct.
>
I thought so, but it's better to get confirmation.
> In nine, there is the new concept of clustered services, which are
> services that run on only one master. Masters compete to run
> those services, and the first master that claims a service will
> run it. Schedulers and change sources are all clustered services.
> The database acts as the arbitrator (hence multimaster cannot work
> with SQLite).
> https://github.com/buildbot/buildbot/blob/master/master/buildbot/db/model.py#L254
>
>
> What is not implemented is load balancing between masters. Basically, if
> you run a symmetric multimaster configuration (as per concepts.rst), the
> first master that starts will take all the schedulers and change
> sources.
(snipped the rest)
That's what I figured.
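A minimal sketch of the "first master to claim a service wins" arbitration described above. It uses Python's built-in sqlite3 purely as an in-memory stand-in to show the unique-key trick (a real multimaster setup cannot use SQLite, as noted). The table name `service_claims` and the helpers `try_claim` and `owner` are hypothetical and only loosely modeled on the linked model.py; they are not Buildbot's actual schema or API.

```python
import sqlite3

# Hypothetical table: one row per clustered service, keyed by service id.
# The PRIMARY KEY constraint is what lets the database act as the arbitrator.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE service_claims ("
    "serviceid INTEGER PRIMARY KEY, masterid INTEGER NOT NULL)"
)

def try_claim(conn, serviceid, masterid):
    """Return True if masterid won the claim, False if another master holds it."""
    try:
        with conn:  # commits on success, rolls back if the INSERT fails
            conn.execute(
                "INSERT INTO service_claims (serviceid, masterid) VALUES (?, ?)",
                (serviceid, masterid),
            )
        return True
    except sqlite3.IntegrityError:
        # Duplicate primary key: the service is already claimed.
        return False

def owner(conn, serviceid):
    """Return the id of the master holding serviceid, or None if unclaimed."""
    row = conn.execute(
        "SELECT masterid FROM service_claims WHERE serviceid = ?", (serviceid,)
    ).fetchone()
    return row[0] if row else None

# Master 1 starts first and claims both schedulers (ids 10 and 11);
# master 2 starts later and gets nothing, i.e. there is no load balancing.
results = {(m, s): try_claim(conn, s, m) for m in (1, 2) for s in (10, 11)}
for (m, s), won in results.items():
    print(f"master {m} claiming service {s}: {'won' if won else 'already taken'}")
```

This mirrors why a symmetric configuration ends up with every scheduler on whichever master starts first: each later master's insert simply fails.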
> What you seem to have missed is that for multimaster to work you need
> a common message queue. At the moment, only crossbar.io
> is implemented:
> http://docs.buildbot.net/latest/manual/cfg-global.html#mq-specification
>
Yes, I missed it.
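For reference, a sketch of what the relevant master.cfg fragment might look like for each master: a shared server database plus the crossbar.io router reached over WAMP. The hostnames, credentials, and router URL are placeholders, and the exact set of `c['mq']` keys may differ between versions, so check the mq-specification page linked above before relying on them.

```python
# Sketch of a multimaster master.cfg fragment (placeholder values throughout).
c = BuildmasterConfig = {}

# Shared database: must be a server database such as PostgreSQL, not SQLite,
# because it arbitrates which master runs each clustered service.
c['db'] = {
    'db_url': 'postgresql://buildbot:password@db.example.com/buildbot',
}

# Shared message queue: a crossbar.io router, reached over WAMP.
# router_url and realm are placeholders for your own crossbar deployment.
c['mq'] = {
    'type': 'wamp',
    'router_url': 'ws://crossbar.example.com:8080/ws',
    'realm': 'realm1',
}
```

Every master would carry the same `db` and `mq` settings so that they all see the same build requests and the same event stream.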
> Messages are important so that the other masters are aware that a new
> buildrequest has been written to the database.
>
I'm not a database guy, per se, but wouldn't any database you'd want to
run multi-master on be able to notify the other masters? Postgres, for
example, has NOTIFY and LISTEN. I'm not much of a SQLAlchemy guy,
either, but a cursory search shows an Event API.
> If you don't configure a multimaster-capable mq, then builds will not
> start instantly on the second master. They will only start when another
> event happens on that second master (like a worker connecting or
> disconnecting, or a build finishing).
>
That may be acceptable. With the volume we have, things finish pretty
often. And with some of these builds taking a long time, the wait may be
insignificant to us.
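A toy model of the behavior Pierre describes, to make the "wait for the next local event" delay concrete. This is not Buildbot code: `ToyMaster`, `on_local_event`, and the list standing in for the database are all hypothetical. Without a shared message queue, the second master never hears that a request was inserted; it only notices when something local makes it re-scan.

```python
class ToyMaster:
    """Hypothetical master that only polls the shared DB on local events."""

    def __init__(self, name, db):
        self.name = name
        self.db = db        # shared "database": a plain list of pending requests
        self.started = []   # requests this master has picked up
        self.events = []    # local events that triggered a scan

    def on_local_event(self, event):
        # A worker (dis)connection or finished build triggers a scan
        # of the shared queue; nothing else does in this model.
        self.events.append(event)
        while self.db:
            self.started.append(self.db.pop(0))


db = []
master2 = ToyMaster("master2", db)

db.append("buildrequest-1")   # inserted by another master; no message sent
pending_before = list(db)     # master2 has not noticed the request yet
master2.on_local_event("build finished")
print(master2.started)
```

With many builds finishing frequently, as in the setup described above, the extra latency is bounded by the time until the next such event on that master.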
>
> If I understand correctly, you are running rc1 for the Python code and
> rc2 for the UI?
> That should be fine, but I would recommend updating everything to rc2,
> as a number of bugs have been fixed. No new features have been added on
> this stable branch, so I expect the risk of regression to be
> limited.
>
Not exactly, though I can't be certain, as I didn't do that work. As far
as I know, only the builders page is rc2 (plus our change to show the last
build), and the cancel-queue feature just came along with it.
Neil Gilmore
grammatech.com