<div dir="ltr">Hi Neil,<br><br><div class="gmail_quote"><div dir="ltr">On Wed, Aug 24, 2016 at 22:14, Neil Gilmore <<a href="mailto:ngilmore@grammatech.com">ngilmore@grammatech.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  <div bgcolor="#FFFFFF" text="#000000">

    Hi everyone,<br>

    <br>

    Thanks for the advice previously. It's helped a lot. I'll be
    restarting our master tomorrow with bits of new code, in particular
    the new log code, a UI fix that came from rc2 (which brought along
    the cancel queue button, unfortunately useful in our case), and a
    fix for exceptions we were getting during reconfig (a problem with
    __cmp__() in ComparableMixin). I never hunted it down; it may have
    been things we were attaching to various items. The error was that
    cmp() couldn't handle sets. We're currently using that code on a
    second master, and we don't get exceptions when we reconfig using
    the same master.cfg.<br></div></blockquote><div><br></div><div>Do you still have the problem with the master not continuing the build after a command has terminated on the worker?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">

    <br>

    I've been getting ready to attempt using multi-master, and I ran

    across possibly conflicting documentation.<br>

    <br>

    Here

(<a href="http://docs.buildbot.net/latest/manual/cfg-global.html#multi-master-mode" target="_blank">http://docs.buildbot.net/latest/manual/cfg-global.html#multi-master-mode</a>),

    it says,<br>

    <br>

    "Ensure that each named scheduler runs on only one master.

    If the same scheduler runs on multiple masters, it will trigger

    duplicate builds and may produce other undesirable behaviors."<br></div></blockquote><div><br></div><div>Indeed, this part of the doc dates from Buildbot eight, and I forgot to update it. You can ignore most of this information; it is outdated.</div><div><br></div><div>The up-to-date documentation on multimaster is here:</div><div><span style="line-height:1.5"><a href="http://docs.buildbot.net/latest/manual/concepts.html#multimaster">http://docs.buildbot.net/latest/manual/concepts.html#multimaster</a></span></div><div><br></div><div><div>As I said, this is experimental: it needs a proper documentation update beyond just the concepts page, and probably some bug fixes.</div><div><br></div><div>We did not want to delay nine any further while waiting for multimaster to be completely sorted out and documented.</div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">

    <br>

    Here

(<a href="http://docs.buildbot.net/latest/manual/cfg-schedulers.html#scheduler-resiliency" target="_blank">http://docs.buildbot.net/latest/manual/cfg-schedulers.html#scheduler-resiliency</a>),

    it says:<br>

    "In a multi-master configuration, schedulers with the same name can

    be configured on multiple masters."<br>

    <br>

    If these aren't contradictory, where is my understanding failing?

    Alternately, if they are conflicting, which is correct? Or is there

    another answer?<br></div></blockquote><div><br></div><div>The latter information is correct.</div><div><br></div><div>In nine, there is a new concept of clustered services: services that run on only one master. Masters compete to run those services, and the first master to claim a service runs it. Schedulers and change sources are all clustered services.</div><div>The database acts as the arbitrator (hence multimaster cannot work with SQLite):</div><div><a href="https://github.com/buildbot/buildbot/blob/master/master/buildbot/db/model.py#L254">https://github.com/buildbot/buildbot/blob/master/master/buildbot/db/model.py#L254</a><br></div><div><br></div><div><br></div><div>What is not implemented is load balancing between masters. Basically, if you run a symmetric multimaster configuration (as per concepts.rst), the first master to start will take all the schedulers and change sources.</div><div><br></div><div>If that first master stops normally, the second master will take over.</div><div>So high availability is there (with a slight delay while the second master reacts to the first master's shutdown event), but load balancing is not.</div><div><br></div><div>While writing this email, I am already finding some issues. If the first master crashes and does not send a ('master', masterid, 'stopped') event, there is a keep-alive timeout mechanism, and the second master will notice that the first master is dead only after this timeout.</div><div>This means that during that time, the schedulers and change sources will not update anything.</div><div><br></div><div>The ForceScheduler, though, should not be affected by this problem.</div><div>ForceSchedulers have to be configured on the same masters as the web servers, and the web server will use the local ForceScheduler to trigger its builds, even though it is not technically started as a clustered service.
Again, that is the theory; this has not really been tested.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">

    <br>

    It would also appear that some other documentation needs some

    updating.<br>

    <br>

    Here

(<a href="http://docs.buildbot.net/latest/manual/cfg-global.html#database-specification" target="_blank">http://docs.buildbot.net/latest/manual/cfg-global.html#database-specification</a>),

    the first database specification given matches closely with what

    buildbot create-master produces, c['db'] = { 'db_url' : "",}<br>

    <br>

    The rest use a form like c['db_url'] = "sqlite:///state.sqlite".<br>
    <br>
    My guess is that the second is an older form? I could be wrong,
    never having tested the other form.<br>

    <br>

    I've been able to get 2 masters talking to the same database

    (postgres in our case), but I'm still working on getting all the

    changes that need to be made down pat before using our own

    master.cfg. That's just me getting up to speed.<br></div></blockquote><div><br></div><div>What you seem to have missed is that, for multimaster to work, you need a common message queue. At the moment, only <a href="http://crossbar.io">crossbar.io</a> is implemented:</div><div><a href="http://docs.buildbot.net/latest/manual/cfg-global.html#mq-specification">http://docs.buildbot.net/latest/manual/cfg-global.html#mq-specification</a><br></div><div><br></div><div>Messages are important so that the other master is aware that a new build request has been written to the database.</div><div><br></div><div>If you don't configure a multimaster-capable MQ, builds will not start immediately on the second master. They will only start when some other event happens on that second master (such as a worker connecting or disconnecting, or a build finishing).</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">

    <br>

    We like the cancel queue button, even though we got it accidentally.

    It came in with a change we needed to the builders page. We needed

    to be able to see the results of the last build regardless of when

    it built. Since the source for the pages isn't in the regular source

    tarball, the guy who did the change used the rc2 version, and we got

    a new feature we liked for free. On the other hand, people seem to

    have liked the old waterfall better.<br></div></blockquote><div> </div><div>If I understand correctly, you are running rc1 for the Python code and rc2 for the UI?</div><div>That should be fine, but I would recommend updating everything to rc2, as a number of bugs have been fixed. No new features have been added on this stable branch, so I expect the risk of regression to be limited.</div><div><br></div></div></div>
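<div><br></div><div>To make the clustered-service mechanism described above more concrete, here is a minimal Python sketch of database-arbitrated claiming with a keep-alive timeout. It is not Buildbot's actual code. The tables (masters, service_claims), the helpers (try_claim, owner, expire_dead_masters) and the 60-second KEEPALIVE_TIMEOUT are all hypothetical. It uses an in-memory sqlite3 database only so that it runs standalone; as explained above, a real multimaster setup needs a shared server database such as PostgreSQL. A unique key on the service name lets exactly one master win a claim. Claims held by a master whose keep-alive has expired are released so that another master can take over.</div>

```python
import sqlite3

# HYPOTHETICAL schema for illustration only; it is not Buildbot's real model.
# The PRIMARY KEY on service_name is what makes the database the arbitrator:
# at most one row, and therefore one owning master, per clustered service.
SCHEMA = """
CREATE TABLE masters (
    masterid INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    last_active INTEGER NOT NULL  -- keep-alive timestamp, in seconds
);
CREATE TABLE service_claims (
    service_name TEXT PRIMARY KEY,
    masterid INTEGER NOT NULL REFERENCES masters(masterid)
);
"""

KEEPALIVE_TIMEOUT = 60  # hypothetical timeout, in seconds


def try_claim(conn, service_name, masterid):
    """Try to claim a clustered service; True if this master now owns it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO service_claims (service_name, masterid) VALUES (?, ?)",
                (service_name, masterid),
            )
        return True
    except sqlite3.IntegrityError:
        # The unique key rejected the row: another master claimed it first.
        return False


def owner(conn, service_name):
    """Return the masterid that owns service_name, or None if unclaimed."""
    row = conn.execute(
        "SELECT masterid FROM service_claims WHERE service_name = ?",
        (service_name,),
    ).fetchone()
    return row[0] if row else None


def expire_dead_masters(conn, now):
    """Release all claims held by masters whose keep-alive is too old.

    This models the crash case: no 'stopped' event was sent, so the other
    masters only notice once last_active falls behind the timeout.
    """
    with conn:
        dead = [
            masterid
            for (masterid,) in conn.execute(
                "SELECT masterid FROM masters WHERE last_active < ?",
                (now - KEEPALIVE_TIMEOUT,),
            )
        ]
        for masterid in dead:
            conn.execute("DELETE FROM service_claims WHERE masterid = ?", (masterid,))
    return dead


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    with conn:
        conn.execute("INSERT INTO masters VALUES (1, 'master-a', 1000)")
        conn.execute("INSERT INTO masters VALUES (2, 'master-b', 1100)")

    # Symmetric configuration: both masters try to claim the same scheduler.
    print(try_claim(conn, "nightly", 1))  # True: first master wins
    print(try_claim(conn, "nightly", 2))  # False: already claimed

    # master-a crashed at t=1000; master-b is still sending keep-alives.
    print(expire_dead_masters(conn, now=1030))  # []: still within the timeout
    print(expire_dead_masters(conn, now=1100))  # [1]: master-a timed out
    print(try_claim(conn, "nightly", 2))  # True: master-b takes over
```

<div>In this sketch, the "nightly" service has no running owner between master-a's crash and the expiry of the timeout. This mirrors the gap described above, during which schedulers and change sources do not update anything.</div>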