[users at bb.net] Buildbot scaling limits

Pierre Tardy tardyp at gmail.com
Thu Mar 23 21:01:51 UTC 2017


Hi Finn,

A while ago I started a scalability study of Buildbot nine, which I have not
yet had time to finish.

Profiles of Buildbot under load show that the biggest load contributor is
the compression and storage of logs in the SQL database; see
https://github.com/tardyp/buildbot_profiler

To assess how much load the system can keep up with, I started up to 300
workers and a variable number of masters.
Each builder was configured to output 40,000 lines of log as fast as
possible.
I used an experimental Marathon setup that we have at work, and while
running those tests I actually ran into stability issues with our nixy
load-balancing system.
Buildbot itself, however, ran flawlessly in this setup.

At maximum load, PostgreSQL peaked at 80% CPU, while the 4 masters were at
full CPU load.

During that test, the whole system was able to ingest 50,000 lines of logs
per second (20 MB/s). For a log-processing company like Splunk this may
sound low, but it is where we currently stand.
I haven't been able to run with more masters and workers yet, as the
Marathon setup I used for this test wasn't big enough.
I still need to publish the numbers officially, but that will take a bit
more time, as I also want to capture CPU and network throughput stats
during the test.
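A few derived numbers follow directly from the figures above. The Python
sketch below is only back-of-the-envelope arithmetic, and its assumptions
are not part of the test report: 20 MB/s is taken as raw log text in
decimal megabytes, the load is assumed to be split evenly across the 4
masters, and every worker is assumed to run one 40,000-line build at the
same time. The constant and function names are hypothetical.

```python
"""Back-of-the-envelope arithmetic on the load-test figures quoted above.

Assumptions (not stated in the test report): 20 MB/s means raw log text in
decimal megabytes, load is spread evenly over the 4 masters, and every
worker runs exactly one 40,000-line build at the same time.
"""

OBSERVED_LINES_PER_S = 50_000      # whole-system log ingestion rate
OBSERVED_BYTES_PER_S = 20_000_000  # 20 MB/s, decimal megabytes assumed
MASTERS = 4                        # masters at full CPU during the peak
WORKERS = 300                      # maximum number of workers started
LINES_PER_BUILD = 40_000           # lines of log emitted by each builder


def average_line_size(bytes_per_s: float, lines_per_s: float) -> float:
    """Mean size of one log line, in bytes."""
    return bytes_per_s / lines_per_s


def per_master_rate(lines_per_s: float, masters: int) -> float:
    """Lines per second each master handles if load is evenly split."""
    return lines_per_s / masters


def drain_time(workers: int, lines_per_build: int, lines_per_s: float) -> float:
    """Seconds needed to ingest one build's worth of logs from every worker."""
    return workers * lines_per_build / lines_per_s


if __name__ == "__main__":
    size = average_line_size(OBSERVED_BYTES_PER_S, OBSERVED_LINES_PER_S)
    rate = per_master_rate(OBSERVED_LINES_PER_S, MASTERS)
    secs = drain_time(WORKERS, LINES_PER_BUILD, OBSERVED_LINES_PER_S)
    print(f"average line size: {size:.0f} bytes")
    print(f"per-master rate:   {rate:.0f} lines/s")
    print(f"one full round:    {secs:.0f} s")
```

Under those assumptions this works out to about 400 bytes per line, 12,500
lines/s per master, and 240 s to ingest one 40,000-line build from each of
300 workers.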

From a theoretical point of view:
- Buildbot is horizontally scalable: you can create as many masters as you
want.
- Buildbot requires an SQL database to store its data; for me this is the
major brake on horizontal scalability. I haven't managed to fully load
PostgreSQL yet, though.
- Crossbar is also a single point of failure, but given the number of
messages that Buildbot generates and the performance numbers that Crossbar
publishes, I don't think it will be a problem.
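To make the architecture above concrete, here is a minimal sketch of the
settings that tie several masters together in a Buildbot 0.9.x master.cfg,
assuming a shared PostgreSQL database and a Crossbar.io WAMP router. The
hostnames, credentials and realm name are placeholders, and a real
master.cfg also needs workers, builders, schedulers and so on. Each master
would carry the same three settings.

```python
# master.cfg fragment (Buildbot 0.9.x); only the multi-master settings are shown.
# Hostnames, credentials and the realm name below are placeholders.
c = BuildmasterConfig = {}

# Tell this master that other masters share the same database.
c['multiMaster'] = True

# Shared SQL database: every master stores build state and logs here,
# which is why it becomes the main scaling bottleneck.
c['db'] = {
    'db_url': 'postgresql://buildbot:secret@db.example.com/buildbot',
}

# Shared message queue: masters exchange events through one Crossbar.io router.
c['mq'] = {
    'type': 'wamp',
    'router_url': 'ws://crossbar.example.com:8080/ws',
    'realm': 'realm1',
}
```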


I know there are some other people on this list who run multi-master in
production and can give more input.
From what I can tell from the feedback they have given here, multi-master
itself was not really the main issue. I remember a lot of discussion about
live reconfiguration of schedulers, which we finally fixed in 0.9.4.

We also have the usage statistics that we have been collecting since 0.9.2.

Here is the number of master installations, sorted by number of builders
(only installations that publish their stats are included):

  Builders  Installations
       114              3
       102              2
        98              1
        94             13
        93              1
        90              1
        84              9
        82             21
        81              1
  (columns: plugins.buildbot/config/BuilderConfig, descending;
   unique count of installid.raw)

And here is the number of master installations, sorted by number of
workers attached. Again, this only covers installations that publish their
stats, which is why my tests don't appear here.

  Workers  Installations
       52              5
       49              5
       46             13
       27             35
       26             15
       22              5
       14             17
       13             24
       10              6
  (columns: plugins.buildbot/worker/base/Worker, descending;
   unique count of installid.raw)

We don't upload data about multi-master setups, so I can't tell how many
multi-master instances are out there.
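Installations only show up in these statistics if they report usage data
to buildbot.net. As far as I know this is controlled by the
buildbotNetUsageData key in master.cfg; the fragment below is a minimal
sketch that opts in with the basic report, and everything else in the
config is omitted.

```python
# master.cfg fragment; all other settings omitted.
c = BuildmasterConfig = {}

# 'basic' sends an anonymous summary (plugins used, counts such as builders
# and workers) to buildbot.net; 'full' sends a more detailed report;
# None disables reporting entirely.
c['buildbotNetUsageData'] = 'basic'
```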


I would love to see more people using Buildbot at large scale, and to help
fix the performance issues.

Regards,
Pierre

On Thu, Mar 23, 2017 at 19:26, Finn Herzfeld <fherzfeld at splunk.com> wrote:

> Hey all,
>
> We’re evaluating using Buildbot here at Splunk, we think it’s really cool
> and could potentially solve a lot of our problems. One of the questions
> that’s come up is how large it can scale in a multi-master environment. How
> large have others scaled it? What resources are going to be constrained
> when scaling it? What issues are people running into with the current state
> of multi-master? I see that it is considered experimental.
>
>
>
> Thanks,
>
> Finn Herzfeld
> _______________________________________________
> users mailing list
> users at buildbot.net
> https://lists.buildbot.net/mailman/listinfo/users