[Buildbot-devel] Multi-master setup (or not)
Dustin J. Mitchell
dustin at v.igoro.us
Sun Nov 30 16:02:45 UTC 2014
ZooKeeper is primarily synchronizing *data*, not configuration, so it's
a little different. I see a few tricky things with that idea.
First, if the leader can change without warning, where does a user go
to modify the configuration?
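To make the first concern concrete: with an elected leader, "the master to edit" is not a fixed host. Every participant, including a user's tooling, has to compute it at runtime. The sketch below is a minimal illustration and not ZooKeeper's actual algorithm (ZooKeeper uses its own atomic broadcast protocol, Zab). It assumes a hypothetical map from master names to last-heartbeat timestamps and picks the live master with the smallest name. `current_leader` and `HEARTBEAT_TIMEOUT` are illustrative names only.

```python
import time
from typing import Dict, Optional

# Hypothetical: seconds a master may go without a heartbeat before it is
# treated as dead. Not an actual Buildbot or ZooKeeper setting.
HEARTBEAT_TIMEOUT = 60.0


def current_leader(heartbeats: Dict[str, float],
                   now: Optional[float] = None,
                   timeout: float = HEARTBEAT_TIMEOUT) -> Optional[str]:
    """Return the live master with the smallest name, or None if none is live.

    `heartbeats` maps each master's name to the Unix time of its last
    heartbeat. Everyone applying this rule to the same data agrees on the
    leader, but the answer changes as soon as that master stops checking in.
    """
    if now is None:
        now = time.time()
    live = [name for name, seen in heartbeats.items() if now - seen <= timeout]
    return min(live) if live else None


if __name__ == "__main__":
    beats = {"master-a": 1000.0, "master-b": 1050.0}
    print(current_leader(beats, now=1055.0))  # master-a
    print(current_leader(beats, now=1070.0))  # master-b (master-a timed out)
```

A user would have to resolve the leader like this before every configuration change, and the answer may already be stale by the time the change arrives. That is exactly the difficulty raised above.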
Second, what form does the configuration take? Ansible has an
interesting model of shipping Python code (as text) to remote nodes
and writing it to a temporary file, but that requires that each module
be in a single file. A Buildbot configuration might span multiple
files and call on additional modules. Perhaps we could implement
some mechanism for synchronizing an entire directory, and then do so
for an entire virtualenv? That assumes compatible Python versions on
all masters, but that's not too bad. Alternatively, we could require
that the configuration be checked into git, and specify only a git URL
for the masters' configuration. Then a master can commit and push
that configuration when it changes, and the other masters can simply
'git pull'. Or we could abandon the use of Python as a configuration
language altogether.
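The "synchronize an entire directory" option needs one building block: each master must be able to tell whether its copy of the configuration matches the expected one before reloading. Below is a minimal sketch that assumes the configuration lives in a single directory tree. `config_digest` is a hypothetical helper, not part of Buildbot. It only detects divergence and does not transfer any files.

```python
import hashlib
import os
import tempfile


def config_digest(root: str) -> str:
    """Return a SHA-256 hex digest covering every file under `root`.

    Each file's relative path and the hash of its contents are fed in a
    deterministic order, so identical trees give identical digests on every
    master. Empty directories do not affect the result.
    """
    h = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # in-place sort makes os.walk descend deterministically
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            h.update(rel.encode("utf-8") + b"\0")
            with open(path, "rb") as f:
                h.update(hashlib.sha256(f.read()).digest())
    return h.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        for d in (a, b):
            with open(os.path.join(d, "master.cfg"), "w") as f:
                f.write("c = BuildmasterConfig = {}\n")
        print(config_digest(a) == config_digest(b))  # True
        with open(os.path.join(b, "master.cfg"), "a") as f:
            f.write("# local edit\n")
        print(config_digest(a) == config_digest(b))  # False
```

The digest covers helper modules as well as master.cfg, which matters because a configuration can span several files. Syncing a whole virtualenv this way would work the same but hash far more data.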
All in all, these are *major* changes to Buildbot. If the project
were otherwise stable, this might be an interesting experiment, but
nine is already changing everything else. I'm not at all convinced
that such changes are justified by the problem. From my perspective,
we have a cosmetic issue (displaying inactive builders) and some bugs
in the housekeeping code. It seems like the best fix is to address the
cosmetic issue in the UI (so, don't display inactive objects) and fix
the housekeeping bugs.
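The two fixes could fit together roughly as follows. A periodic housekeeping task treats masters that stopped checking in as dead and drops whatever they had claimed. A poller left marked active on a vanished master then becomes free to start again. Meanwhile, the UI lists only builders still attached to some master. This is a minimal sketch against a hypothetical, simplified SQLite schema. The table layout, `expire_stale_masters`, `active_builder_names` and the 600-second timeout are all assumptions, not Buildbot's code.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only; Buildbot's real
# tables differ.
SCHEMA = """
CREATE TABLE masters (id INTEGER PRIMARY KEY, name TEXT, last_active REAL);
CREATE TABLE builders (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE builder_masters (builderid INTEGER, masterid INTEGER);
CREATE TABLE changesource_masters (changesourceid INTEGER PRIMARY KEY,
                                   masterid INTEGER);
"""


def expire_stale_masters(conn, now, timeout=600.0):
    """Remove masters that stopped checking in, and release their claims.

    A master whose last_active is older than `timeout` seconds is treated as
    dead: its builder links and change-source claims are deleted, so a live
    master is free to claim (and start) those objects again.
    """
    stale = [row[0] for row in conn.execute(
        "SELECT id FROM masters WHERE last_active < ?", (now - timeout,))]
    for masterid in stale:
        conn.execute("DELETE FROM builder_masters WHERE masterid = ?",
                     (masterid,))
        conn.execute("DELETE FROM changesource_masters WHERE masterid = ?",
                     (masterid,))
        conn.execute("DELETE FROM masters WHERE id = ?", (masterid,))
    conn.commit()
    return stale


def active_builder_names(conn):
    """Builders attached to at least one master: what the UI would list."""
    rows = conn.execute(
        "SELECT DISTINCT b.name FROM builders b "
        "JOIN builder_masters bm ON bm.builderid = b.id ORDER BY b.name")
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO masters VALUES (?, ?, ?)",
                     [(1, "old-master", 0.0), (2, "master", 1000.0)])
    conn.executemany("INSERT INTO builders VALUES (?, ?)",
                     [(1, "old-name"), (2, "runtests")])
    conn.executemany("INSERT INTO builder_masters VALUES (?, ?)",
                     [(1, 1), (2, 2)])
    conn.execute("INSERT INTO changesource_masters VALUES (1, 1)")
    print(active_builder_names(conn))              # ['old-name', 'runtests']
    print(expire_stale_masters(conn, now=1000.0))  # [1]
    print(active_builder_names(conn))              # ['runtests']
```

The hard part in practice is choosing the timeout. If it is too short, a slow master loses its claims while still running. If it is too long, dead objects linger in the UI for a while.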
Dustin
On Wed, Nov 26, 2014 at 6:28 AM, Benoît Allard
<benoit.allard at greenbone.net> wrote:
> On 11/26/2014 10:37 AM, Edward Armes wrote:
>> Hi all,
>>
>> I see Buildbot's problem in this case as semi-unique, as most
>> systems follow a pattern in which there is a central master that
>> can be written to and duplicate masters that act as readers (i.e.
>> the traditional distributed DB approach). Git seems to take the
>> approach that every copy is a master and the true master exists
>> only by user convention. However, I get the feeling that neither
>> of these approaches would work for Buildbot. Have you looked into
>> how ZooKeeper approaches this problem? I imagine they may have had
>> to solve similar problems.
>>
>>
>
> Thanks for the hint! Let's look together at ZooKeeper's Overview
> page [0].
>
> ZooKeeper has an elected server that acts as the leader and doesn't
> accept direct connections from clients (not sure about that one).
> All ZooKeeper servers know about each other, and they all share the
> same configuration. All servers hold a copy of the in-memory
> database. The between-server protocol takes care of replacing
> leaders on failure and syncing followers with the leader.
>
> Let's consider ZooKeeper's database as our 'configuration', and our
> database as an implementation detail.
>
> To summarize and put it in Buildbot's terms: each master has its own
> copy of the configuration, and one (randomly?) chosen master acts
> as the leader while all the others are followers. Masters
> communicate with each other directly, and the leader is responsible
> for propagating configuration updates to the others. That protocol
> also takes care of electing a new leader upon failure (in less than
> 200ms!).
>
> Sounds very reliable! But not trivial to implement. Besides, how do
> we keep the configuration up to date between masters? Update the
> in-memory model via the protocol, plus synchronize the files?
>
> Regards,
> Ben.
>
> [0] http://zookeeper.apache.org/doc/trunk/zookeeperOver.html
>
>> Edward
>>
>> On Tue, Nov 25, 2014 at 11:09 PM, Dustin J. Mitchell
>> <dustin at v.igoro.us> wrote:
>>
>>> The fact that Buildbot's configuration is code makes this
>>> tricky. Otherwise, we could just load the configuration into the
>>> DB and have all masters pull from there. Configuration as code
>>> is one of Buildbot's advantages over other tools, so I don't want
>>> to lose that.
>>>
>>> I'd love to have more of the smart people on this list looking at
>>> the problem and thinking about the right solution. There's time
>>> for some simple modifications to the model, like you suggest, to
>>> make it into nine. There's also an argument to be made (as
>>> Pierre did) that the model isn't fundamentally broken, but just
>>> has a buggy implementation and needs some utilities built up.
>>>
>>> In particular, I'd love to hear how other software has solved
>>> similar problems -- we've reinvented enough wheels already here
>>> at Buildbot HQ!
>>>
>>> Dustin
>>>
>>> On Tue, Nov 25, 2014 at 2:43 PM, Benoît Allard
>>> <benoit.allard at greenbone.net> wrote:
>>>> Hi there,
>>>>
>>>> [TL;DR: I propose introducing a supervisor to manage the
>>>> master(s).]
>>>>
>>>> The build properties PRs are flowing in [0] (more reviews
>>>> welcome!), so it's time to start tackling the next big problem
>>>> I have with the current development branch, namely the absence
>>>> of a master hierarchy.
>>>>
>>>> Let me explain.
>>>>
>>>> For years (indeed!) I believed I didn't need to bother with
>>>> that multi-master stuff: I don't have hundreds of slaves, nor
>>>> more than a dozen repositories to take care of, so why should I
>>>> care? Well, that's what I thought until I realised that, even
>>>> without using this feature, it has bitten me quite a few times
>>>> since I started experimenting with the current development
>>>> branch.
>>>>
>>>> In the current development branch ('nine'), all data is stored
>>>> in a common database. Each master (only one in most cases) is
>>>> responsible for its own configuration (master.cfg plus its
>>>> dependencies) and, as such, registers it in the db: its slaves,
>>>> its builders, its schedulers, ... The masters then populate the
>>>> database with sourcestamps, buildrequests, buildsets, builds,
>>>> and everything else.
>>>>
>>>> Nothing went wrong until I reconfigured my master and my old
>>>> builders (I had renamed some of them) were still visible on the
>>>> waterfall page. A few reconfigs/restarts later, half of the
>>>> waterfall page (and builder list) is taken up by builders that
>>>> are no longer defined anywhere in my configuration. I'm afraid
>>>> to modify my configuration any further! Looking further, old
>>>> slaves (actually the current ones, but with a different
>>>> username/password) are still present (and linked to my
>>>> master!), although they don't exist in any configuration. Same
>>>> for change sources; I guess you get the picture.
>>>>
>>>> I didn't immediately realise the size of the problem I had hit.
>>>> I opened an issue [1] and expected an easy answer like ...
>>>> "Yes, sure, you just forgot to ..." or something similar. The
>>>> answer I got was quite different: it explained that this was a
>>>> consequence of the current design, that the slaves / builders /
>>>> change sources / ... could have switched masters, or could
>>>> belong to a master that is not up at the moment, so no one is in
>>>> a position to delete their entries from the database. I had just
>>>> hit a design flaw.
>>>>
>>>> A few days later, my SVNPoller stopped polling [2], and nothing
>>>> could bring it back to life: restart, reconfig, removing it from
>>>> the configuration and reinserting it, nothing ... The point was
>>>> that in the (common) database, the poller was still marked as
>>>> active on a master, so my one and only master didn't try to start
>>>> it! I was hit by the same design problem.
>>>>
>>>> A few weeks later (now), I haven't hit any other manifestation of
>>>> this problem. But I know it's still there ...
>>>>
>>>> I hope you get the picture now.
>>>>
>>>> The good news is, I have an idea of how to solve it. I'm just not
>>>> sure it's the best one: it involves quite a few modifications,
>>>> and comes at a price ...
>>>>
>>>> I've been wondering: how do other distributed systems do it?
>>>>
>>>> Are there any other distributed systems that rely on a common
>>>> database and are able to distinguish active from inactive
>>>> objects? I don't know, and so far I haven't come across any. If
>>>> you know of any, please speak up; I'd be interested to know how
>>>> they manage their data.
>>>>
>>>> Back in eight, the problem was not that big: the database was
>>>> only there to pass information from schedulers to builders.
>>>> Neither schedulers, nor builders, nor ... were put in the db;
>>>> they belonged to the private data of the master responsible for
>>>> them. If that master disappeared, so did that information. The
>>>> old builders (and builds) did not disappear from the disk, but
>>>> they were no longer visible in the web interface, as the master
>>>> knew which information to show.
>>>>
>>>> My idea is quite simple (in theory). I believe the main problem
>>>> is that no one has authority over all the masters, hence I
>>>> propose introducing a 'supervisor' that would be the only one to
>>>> know about the configuration, and would manage the master(s). The
>>>> configuration would probably gain some 'sections' (one per
>>>> master), so that the supervisor knows which part to send to
>>>> which master. For instance, the master responsible for the web
>>>> interface would get a list of active entities (schedulers,
>>>> slaves, builders, ...) and just show them.
>>>>
>>>> I'm convinced that this solution could completely solve the
>>>> problem I've identified. However, it's not an easy one: it
>>>> involves quite a few modifications (not **too** many; the goal
>>>> is to keep them as small as possible, KISS), and they come at a
>>>> price, namely time ...
>>>>
>>>> Do you have another / better idea?
>>>>
>>>> Thanks for reading so far.
>>>>
>>>> Best Regards, Ben.
>>>>
>>>> [0] #1380, #1382, #1384, #1385, #1886, #1887 (and a few more to
>>>>     come)
>>>> [1] TRAC-2959
>>>> [2] TRAC-3012
>>>>
>>>>
>>>> _______________________________________________
>>>> Buildbot-devel mailing list
>>>> Buildbot-devel at lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/buildbot-devel
>
>
> --
> Benoît Allard (B30A05B0)|Greenbone Networks GmbH|http://greenbone.net
> Neuer Graben 17, 49074 Osnabrück, Germany | AG Osnabrück, HR B 202460
> Executive Directors: Lukas Grunwald, Dr. Jan-Oliver Wagner