[Buildbot-devel] Multi-master setup (or not)
Pierre Tardy
tardyp at gmail.com
Tue Nov 25 22:30:42 UTC 2014
Hi Benoit
I think the multimaster problem we are trying to solve is very generic,
probably a little bit too generic, and we might be paying the price for
unnecessary use cases.
I can see only one reason for having different configurations between
masters in multimaster mode: separating two kinds of master.
- slave masters + schedulers: should have write access to the db
- web ui masters: should only have read access to the db, for security
reasons
For me those two kinds of master should share their config, for obvious
maintainability reasons.
I think for your particular problem we should have a way to clean old
stuff out of the db. I already wrote master.Master.doMasterHouseKeeping(),
which is responsible for finishing work left unfinished by a master crash
or similar.
I think we could have a "buildbot housekeeping" command that would clean
old stuff out of the db. This would remove builders and builds that have no
master attached to them and that have no builds more recent than
<parameter>.
This script would be pretty simple, and should be a matter of a few lines
of data API yields.
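A rough sketch of what such a housekeeping pass might look like, written
against a deliberately simplified SQLite schema rather than Buildbot's real
schema or data API. The table and column names, the housekeeping() function
and its cutoff parameter are all hypothetical, and the sketch assumes the
intent is to drop builders that have no master attached and no build
finished since the cutoff:

```python
import sqlite3

# Hypothetical, simplified schema for illustration only.
# This is NOT Buildbot's real database schema.
SCHEMA = """
CREATE TABLE builders (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE builder_masters (builderid INTEGER NOT NULL,
                              masterid INTEGER NOT NULL);
CREATE TABLE builds (id INTEGER PRIMARY KEY,
                     builderid INTEGER NOT NULL,
                     complete_at INTEGER);
"""


def housekeeping(conn, cutoff):
    """Delete builders that no master claims and that have no recent build.

    A build counts as recent if it is unfinished (complete_at IS NULL) or
    finished at or after `cutoff`. Returns the names of removed builders.
    """
    stale = conn.execute(
        """
        SELECT b.id, b.name FROM builders AS b
        WHERE NOT EXISTS (SELECT 1 FROM builder_masters AS bm
                          WHERE bm.builderid = b.id)
          AND NOT EXISTS (SELECT 1 FROM builds AS bd
                          WHERE bd.builderid = b.id
                            AND (bd.complete_at IS NULL
                                 OR bd.complete_at >= ?))
        ORDER BY b.name
        """,
        (cutoff,),
    ).fetchall()
    with conn:  # commit all deletions together, or roll back on error
        for builder_id, _name in stale:
            conn.execute("DELETE FROM builds WHERE builderid = ?",
                         (builder_id,))
            conn.execute("DELETE FROM builders WHERE id = ?", (builder_id,))
    return [name for _id, name in stale]


def demo():
    """Build a tiny in-memory database and run one housekeeping pass."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO builders VALUES (?, ?)",
        [(1, "active"), (2, "renamed-old"), (3, "orphan-recent")],
    )
    # Only builder 1 is still attached to a master.
    conn.execute("INSERT INTO builder_masters VALUES (1, 1)")
    conn.executemany(
        "INSERT INTO builds VALUES (?, ?, ?)",
        [(1, 1, 100), (2, 2, 100), (3, 3, 900)],
    )
    removed = housekeeping(conn, cutoff=500)
    remaining = [row[0] for row in
                 conn.execute("SELECT name FROM builders ORDER BY name")]
    return removed, remaining
```

In this toy data set only "renamed-old" is removed: "active" is still
attached to a master, and "orphan-recent" has a build newer than the cutoff.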
Pierre
On Tue Nov 25 2014 at 8:45:19 PM, Benoît Allard <benoit.allard at greenbone.net>
wrote:
> Hi there,
>
> [TL;DR: I propose introducing a supervisor to manage the master(s).]
>
> The build properties PRs are flowing in [0] (more reviews welcome!), so
> it's time to start tackling the next bigger trouble I have with the
> current development branch, namely the absence of a hierarchy among
> masters.
>
> Let me explain.
>
> I believed for years (indeed!) that I didn't need to bother with this
> multi-master stuff: I don't have hundreds of slaves, nor more than a
> dozen repositories to take care of, so why should I care?
> Well, that's what I thought until I realised that, even without using
> this feature, it had bitten me quite a few times already since I started
> experimenting with the current development branch.
>
> In the current development branch ('nine'), all the data is stored in
> a common database. Each master (one in most cases) is responsible
> for its own configuration (the master.cfg + dependencies), and as such
> registers it in the db: its slaves, its builders, its schedulers, ...
> The masters then populate the database with sourcestamps,
> buildrequests, buildsets, builds, and all the rest.
>
> Nothing was wrong until I tried to reconfigure my master, and my old
> builders (I had renamed some of them) were still to be seen on the
> waterfall page. A few reconfigs/restarts later, half of that waterfall
> page (and builder list) is taken up by builders that are not defined
> anywhere in my configuration any more. I'm afraid of modifying my
> configuration any further! Looking further, old slaves (actually the
> current one, but with a different username/password) are still present
> (and linked to my master!), although they do not exist in any
> configuration! Same for change sources; I guess you get the picture.
>
> I didn't immediately realise the size of the trouble I had met. I
> opened an issue [1], and expected an easy answer like ... "Yes, sure,
> you just forgot to ..." or something similar. The answer I got was quite
> different: it explained that this was a consequence of the current
> design, that the slaves / builders / change sources / ... could have
> switched master, or could have belonged to a master that was not up at
> that moment, so no one was in a position to delete their entries from
> the database. I had just hit a design flaw.
>
> A few days later, my SVNPoller stopped polling [2], and nothing could
> bring it back to life: restart, reconfig, delete from configuration /
> reinsert, nothing ... The point was that in the (common) database, the
> poller was still marked as active on a master, so my (one and only)
> master didn't try to start it! I was hit by the same design trouble.
>
> A few weeks later (now), I haven't met any other manifestation of this
> trouble. But I know it's still there ...
>
> Hope you got the picture now.
>
> The good news is, I have an idea how to solve it. I'm just not sure if
> it's the best one, it involves quite a few modifications, and comes at a
> price ...
>
> I've been wondering how other distributed systems handle this.
>
> Are there any other distributed systems that rely on a common database
> and are able to identify active vs. inactive stuff? I don't know, and so
> far I've not met any. If you know of any, please speak up, I'd
> be interested to know how they manage their data.
>
> Back in eight, the trouble was not that big: the database was only
> there to pass information from schedulers to builders. Neither
> schedulers, nor builders, nor ... were put in the db; they belonged to
> the personal data of the master that was responsible for them. If that
> master disappeared, so did that information. The old builders (and
> builds) did not disappear from the disk, but they were not visible any
> more in the web interface, as the master knew which information to show.
>
> My idea is quite simple (in theory). I believe the main trouble is that
> no one has authority over all the masters: hence I propose introducing a
> 'supervisor' that would be the only one to know about the configuration,
> and would manage the master(s). The configuration would probably gain
> some 'sections' (one per master), so that the supervisor knows what part
> to send to which master. For instance, the master responsible for the web
> interface would get a list of active identities (schedulers, slaves,
> builders, ...) and just show them.
>
> I'm convinced that this solution could completely solve the trouble
> I've identified. However, it's not an easy one: it involves quite a few
> modifications (not **too** many, the goal is to keep it as small as
> possible - KISS), and they come at a price, namely time ...
>
> Do you have another / better idea?
>
> Thanks for reading so far.
>
> Best Regards,
> Ben.
>
> [0] #1380, #1382, #1384, #1385, #1886, #1887 (and a few more to come)
> [1] TRAC-2959
> [2] TRAC-3012
>