[Buildbot-devel] Multi-master setup (or not)

Edward Armes edward.armes at gmail.com
Wed Nov 26 09:37:30 UTC 2014


Hi all,

I see Buildbot's exact problem in this case as being semi-unique. Most
systems follow a pattern in which there is a central master that can be
written to and duplicate masters that act as readers (i.e. the traditional
distributed DB approach). Git seems to take the approach that every copy
is a master and the true master exists only by user convention. However, I
get the feeling that neither of these approaches would work for Buildbot?
Have you looked into how Zookeeper approaches this problem? I imagine they
may have had to solve similar problems.
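
To make the Zookeeper idea a bit more concrete: ZooKeeper has ephemeral
nodes, which disappear when the session that created them ends. Below is a
minimal sketch of the same idea applied to the stale poller described in the
quoted message, where a claim only counts while its owning master keeps
sending heartbeats. Everything here is an assumption for illustration: the
LeaseRegistry class, its method names, the ttl value and the master/poller
names are hypothetical, it is an in-memory stand-in rather than Buildbot's
schema, and it does not use ZooKeeper itself.

```python
import time


class LeaseRegistry:
    """In-memory stand-in for the shared database (hypothetical)."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl        # seconds a master counts as alive without a heartbeat
        self.clock = clock    # injectable so the example is deterministic
        self.last_seen = {}   # master name -> time of its last heartbeat
        self.owner = {}       # object name (builder, poller, ...) -> master name

    def heartbeat(self, master):
        """Record that the given master is still running."""
        self.last_seen[master] = self.clock()

    def is_alive(self, master):
        seen = self.last_seen.get(master)
        return seen is not None and self.clock() - seen <= self.ttl

    def claim(self, master, obj):
        """Claim obj unless a different, still-alive master already holds it."""
        current = self.owner.get(obj)
        if current is not None and current != master and self.is_alive(current):
            return False
        self.owner[obj] = master
        return True

    def active_objects(self):
        """Objects whose owning master is alive; all others are stale."""
        return sorted(o for o, m in self.owner.items() if self.is_alive(m))


# Simulated clock: a one-element list so the lambda can see updates.
now = [0.0]
reg = LeaseRegistry(ttl=30, clock=lambda: now[0])

reg.heartbeat("master-a")
reg.claim("master-a", "svn-poller")

reg.heartbeat("master-b")
blocked = reg.claim("master-b", "svn-poller")   # master-a is still alive

now[0] = 60.0                                   # master-a has gone silent
stale = reg.active_objects()                    # its claim no longer counts
reg.heartbeat("master-b")
taken = reg.claim("master-b", "svn-poller")     # the claim can be taken over
```

The point of the design is that nobody has to delete a dead master's entries
explicitly: an entry is treated as inactive as soon as its owner's lease
runs out, so another master (or the web interface) can tell live data from
leftovers.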

Edward

On Tue, Nov 25, 2014 at 11:09 PM, Dustin J. Mitchell <dustin at v.igoro.us>
wrote:

> The fact that Buildbot's configuration is code makes this tricky.
> Otherwise, we could just load the configuration into the DB and have
> all masters pull from there.  Configuration as code is one of
> Buildbot's advantages over other tools, so I don't want to lose that.
>
> I'd love to have more of the smart people on this list looking at the
> problem and thinking about the right solution.  There's time for some
> simple modifications to the model, like you suggest, to make it into
> nine.  There's also an argument to be made (as Pierre did) that the
> model isn't fundamentally broken, but just has a buggy implementation
> and needs some utilities built up.
>
> In particular, I'd love to hear how other software has solved similar
> problems -- we've reinvented enough wheels already here at Buildbot
> HQ!
>
> Dustin
>
> On Tue, Nov 25, 2014 at 2:43 PM, Benoît Allard
> <benoit.allard at greenbone.net> wrote:
> > Hi there,
> >
> > [TL;DR: I propose introducing a supervisor to manage the master(s).]
> >
> > The build properties PRs are flowing in [0] (more review welcome !), so
> > it's time to start tackling the next bigger trouble I have with the
> > current development branch, namely the absence of a hierarchy among
> > masters.
> >
> > Let me explain.
> >
> > I believed for years (indeed !) that I didn't need to bother with
> > the multi-master stuff: I don't have hundreds of slaves, nor more
> > than a dozen repositories to take care of, so why should I care ?
> > Well, that's what I thought until I realised that even without using
> > this feature, it has bitten me quite a few times already since I started
> > experimenting with the current development branch.
> >
> > In the current development branch ('nine'), all the data is stored in
> > a common database. Each master (one in most cases) is responsible
> > for its own configuration (the master.cfg + dependencies), and as such,
> > registers it in the db: its slaves, its builders, its schedulers, ...
> > The masters then populate the database with sourcestamps,
> > buildrequests, buildsets, builds, and all the rest.
> >
> > Nothing was wrong until I tried to reconfigure my master, and my old
> > builders (I had renamed some of them) were still visible on the
> > waterfall page. A few reconfigs/restarts later, half of that waterfall
> > page (and builder list) is taken up by builders that are not defined
> > anywhere in my configuration any more. I'm afraid of modifying my
> > configuration any further ! Looking further, old slaves (actually the
> > current one, but with a different username/password) are still present
> > (and linked to my master !), although they do not exist in any
> > configuration ! Same for change sources; I guess you get the picture.
> >
> > I didn't immediately realise the size of the trouble I had met. I
> > opened an issue [1], and expected an easy answer like ... "Yes, sure,
> > you just forgot to ..." or something similar. The answer I got was quite
> > different: it explained that this was a consequence of the current
> > design, that the slaves / builders / change sources / ... could have
> > switched master, or could have belonged to a master that is not up at
> > that moment, so no one was in a position to delete their entries from
> > the database. I had just hit a design flaw.
> >
> > A few days later, my SVNPoller stopped polling [2], and nothing could
> > bring it back to life: restart, reconfig, delete from configuration /
> > reinsert, nothing ... The point was that in the (common) database, the
> > poller was still marked as active on a master, so my (one and only)
> > master didn't try to start it ! I was hit by the same design trouble.
> >
> > Few weeks later (now), I haven't met any other manifestation of this
> > trouble. But I know, it's still there ...
> >
> > Hope you got the picture now.
> >
> > The good news is, I have an idea how to solve it. I'm just not sure if
> > it's the best one, it involves quite a few modifications, and comes at a
> > price ...
> >
> > I've been wondering how other distributed systems do it.
> >
> > Are there any other distributed systems that rely on a common database
> > and are able to identify active vs. inactive stuff ? I don't know, and so
> > far, I've not met any. If you know of any, please speak up, I'd
> > be interested to know how they manage their data.
> >
> > Back in eight, the trouble was not that big: the database was only
> > there to pass information from schedulers to builders. Neither
> > schedulers, nor builders, nor ... were put in the db; they belonged to
> > the personal data of the master that was responsible for them. If that
> > master disappeared, so did that information. The old builders (and
> > builds) did not disappear from the disk, but they were no longer visible
> > in the web interface, as the master knew which information to show.
> >
> > My idea is quite simple (in theory). I believe the main trouble is that
> > no one has authority over all the masters, hence I propose introducing a
> > 'supervisor' that would be the only one to know about the configuration
> > and would manage the master(s). The configuration would probably gain some
> > 'sections' (one per master), so that the supervisor knows what part to
> > send to which master. For instance, the master responsible for the web
> > interface would get a list of active identities (schedulers, slaves,
> > builders, ...) and just show them.
> >
> > I'm convinced that this solution could completely solve the trouble
> > I've identified. However, it's not an easy one: it involves quite a few
> > modifications (not **too** many, the goal is to keep it as small as
> > possible - KISS), and they come at a price, namely time ...
> >
> > Do you have another / better idea ?
> >
> > Thanks for reading so far.
> >
> > Best Regards,
> > Ben.
> >
> > [0] #1380, #1382, #1384, #1385, #1886, #1887 (and a few more to come)
> > [1] TRAC-2959
> > [2] TRAC-3012
> >
> > _______________________________________________
> > Buildbot-devel mailing list
> > Buildbot-devel at lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/buildbot-devel