<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Hi Pierre,<br>
<br>
As always, thanks for the reply. I've trimmed a bit to try to keep
things clear.<br>
<br>
<div class="moz-cite-prefix">On 8/25/2016 4:31 AM, Pierre Tardy
wrote:<br>
</div>
<blockquote
cite="mid:CAJ+soVdm2A00YcGtK5CnZ5D+akxFh9zRn5t8i-QsU9+JEQPWAg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_quote">Do you still have the problem with
master not continuing the build after command has terminated
on the worker?</div>
</div>
</blockquote>
<br>
Not at the moment, though since I just restarted the master, I
wouldn't expect that particular problem just yet.<br>
<br>
Our primary master currently has 5 workers which have one or more
builders acquiring locks, which is usually a bad sign (I'll look in
on them later). And it has one worker that has build requests queued
on multiple builders, but no running builds. That machine seems to
be having communications problems, though, so it's probably not a
buildbot problem.<br>
<br>
On our second master, which runs just a few builders on a few
workers, we have a buildrequest that probably would never build,
even though that worker isn't building anything else. The previous
buildrequest sat for 20 hours or so (I cancelled the queue and
forced a new build). I stopped its worker and let our usual process
start it back up, and it's running now.<br>
<br>
It's just a gut feeling, but I think that there's a single basic
problem somewhere that's manifesting itself in a few different ways.
As you said previously, I think a deferred is getting lost
somewhere.<br>
<br>
<blockquote
cite="mid:CAJ+soVdm2A00YcGtK5CnZ5D+akxFh9zRn5t8i-QsU9+JEQPWAg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_quote">The recent documentation on multimaster
is there.
<div><span style="line-height:1.5"><a moz-do-not-send="true"
href="http://docs.buildbot.net/latest/manual/concepts.html#multimaster">http://docs.buildbot.net/latest/manual/concepts.html#multimaster</a></span></div>
<div><br>
</div>
</div>
</div>
</blockquote>
<br>
Thanks for the pointer.<br>
<br>
<blockquote
cite="mid:CAJ+soVdm2A00YcGtK5CnZ5D+akxFh9zRn5t8i-QsU9+JEQPWAg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_quote">This latter information is correct.
<div><br>
</div>
</div>
</div>
</blockquote>
<br>
I thought so, but it's better to get confirmation.<br>
<br>
<blockquote
cite="mid:CAJ+soVdm2A00YcGtK5CnZ5D+akxFh9zRn5t8i-QsU9+JEQPWAg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div>In nine, there is the new concept of clustered services,
which are services that run on only one master. Masters
compete to run those services, and the first master to
claim a service runs it. Schedulers and
change sources are all clustered services.</div>
<div>The database will act as an arbitrator (hence multimaster
cannot work with sqlite)</div>
<div><a moz-do-not-send="true"
href="https://github.com/buildbot/buildbot/blob/master/master/buildbot/db/model.py#L254">https://github.com/buildbot/buildbot/blob/master/master/buildbot/db/model.py#L254</a><br>
</div>
<div><br>
</div>
<div><br>
</div>
<div>What is not implemented is load balancing between masters.
Basically, if you run a symmetric multimaster configuration (as
per concepts.rst), the first master to start will
take all the schedulers and change sources.</div>
</div>
</div>
</blockquote>
<br>
(snipped the rest)<br>
<br>
That's what I figured.<br>
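<br>
To check my understanding of "the database will act as an arbitrator",
here is a minimal sketch of first-claim-wins using a uniqueness
constraint. It is an illustration only, not Buildbot's actual code: the
table, columns, and the try_claim helper are names I made up, and an
in-memory SQLite database stands in for the shared database purely so
the snippet runs on its own.<br>
<pre>
```python
import sqlite3

# Illustration only: an in-memory SQLite database stands in for the shared
# database. The table and column names are hypothetical.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE service_claims ("
    " service_name TEXT PRIMARY KEY,"
    " master_name TEXT NOT NULL)"
)


def try_claim(conn, service_name, master_name):
    """Return True if this master claimed the service, False if it was already claimed."""
    try:
        # The connection context manager commits on success and rolls back on error.
        with conn:
            conn.execute(
                "INSERT INTO service_claims (service_name, master_name) VALUES (?, ?)",
                (service_name, master_name),
            )
        return True
    except sqlite3.IntegrityError:
        # The primary key rejected a second claim on the same service.
        return False


first = try_claim(db, "nightly-scheduler", "master-a")
second = try_claim(db, "nightly-scheduler", "master-b")
owner = db.execute(
    "SELECT master_name FROM service_claims WHERE service_name = ?",
    ("nightly-scheduler",),
).fetchone()[0]
```
</pre>
The point of the sketch is that the database, not the masters, decides
who wins: whichever master inserts its row first holds the service, and
every later attempt is rejected by the constraint.<br>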
<br>
<blockquote
cite="mid:CAJ+soVdm2A00YcGtK5CnZ5D+akxFh9zRn5t8i-QsU9+JEQPWAg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div>What you seem to have missed is that for multimaster to
work you need a common message queue. At the moment, only <a
moz-do-not-send="true" href="http://crossbar.io">crossbar.io</a>
is implemented</div>
<div><a moz-do-not-send="true"
href="http://docs.buildbot.net/latest/manual/cfg-global.html#mq-specification">http://docs.buildbot.net/latest/manual/cfg-global.html#mq-specification</a><br>
</div>
<div><br>
</div>
</div>
</div>
</blockquote>
<br>
Yes, I missed it. <br>
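<br>
For the archives, a minimal sketch of what I take the mq setting to look
like in master.cfg, going by the mq-specification page linked above. The
router URL and realm are placeholders I have assumed, not our real
setup.<br>
<pre>
```python
# master.cfg fragment (sketch). The router URL and realm are placeholders.
c = BuildmasterConfig = {}

# 'wamp' selects the crossbar.io-backed message queue.
c['mq'] = {
    'type': 'wamp',
    'router_url': 'ws://localhost:8080/ws',  # assumed crossbar router address
    'realm': 'realm1',                       # assumed realm name
}
```
</pre>
Presumably every master in the cluster would need to point at the same
router and realm for the messages to reach the others.<br>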
<br>
<blockquote
cite="mid:CAJ+soVdm2A00YcGtK5CnZ5D+akxFh9zRn5t8i-QsU9+JEQPWAg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div>Messages are important so that the other master is aware
that a new buildrequest has been sent to the database</div>
<div><br>
</div>
</div>
</div>
</blockquote>
<br>
I'm not a database guy, per se, but wouldn't any database you'd want
to run multi-master on be able to notify the other masters?
Postgres, for example, has NOTIFY and LISTEN. I'm not much of a
SQLAlchemy guy, either, but a cursory search shows an Event API.<br>
<br>
<blockquote
cite="mid:CAJ+soVdm2A00YcGtK5CnZ5D+akxFh9zRn5t8i-QsU9+JEQPWAg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div>If you don't configure a multimaster-capable mq, then
builds will not start instantly on the second master. They will
only start when another event happens on that second master
(like a new worker (dis)connection or a build finishing)</div>
<div><br>
</div>
</div>
</div>
</blockquote>
<br>
That may be acceptable. With the volume we have, things finish
pretty often. And with some of these builds taking a long time, the
wait may be insignificant to us.<br>
<br>
<blockquote
cite="mid:CAJ+soVdm2A00YcGtK5CnZ5D+akxFh9zRn5t8i-QsU9+JEQPWAg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_quote"><br>
<div>If I understand correctly, you are running rc1 for python
code, and rc2 for UI?</div>
<div>That should be fine, but I would recommend updating the
whole thing to rc2, as a number of bugs have been fixed. No new
features have been added on this stable branch, so I expect
the risk of regression to be limited</div>
<div><br>
</div>
</div>
</div>
</blockquote>
<br>
Not exactly, though I can't be certain as I didn't do that work. As
far as I know, only the builders page is rc2 + our change to show
the last build, and the cancel queue just came along with it.<br>
<br>
Neil Gilmore<br>
grammatech.com<br>
</body>
</html>