[users at bb.net] Buildbot scaling limits

Thu Mar 23 21:28:19 UTC 2017

Glad this helps.

Since you are looking into this, I would be interested in similar numbers
for other CI systems, to see where we stand.
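
To make such a comparison easier, the figures quoted below can be reduced to
a few normalized numbers. This is only a rough sketch: it assumes "20MB/s"
means decimal megabytes, that the 4 masters shared the load evenly, and that
all 300 workers ran one 40000-line build at the same time. None of these
assumptions is confirmed by the measurements themselves.

```python
# Figures taken from the quoted message below.
LINES_PER_SECOND = 50_000        # log lines absorbed by the whole system
BYTES_PER_SECOND = 20 * 10**6    # "20MB/s", ASSUMING decimal megabytes
MASTERS = 4                      # masters at full CPU load
WORKERS = 300                    # ASSUMING all 300 workers were active
LINES_PER_BUILD = 40_000         # lines emitted by each test build


def normalized_figures():
    """Derive per-line and per-master numbers from the raw figures."""
    return {
        # Average size of one log line on the wire.
        "bytes_per_line": BYTES_PER_SECOND / LINES_PER_SECOND,
        # ASSUMES the load was spread evenly over the masters.
        "lines_per_second_per_master": LINES_PER_SECOND / MASTERS,
        # Time needed to absorb one build from every worker.
        "seconds_per_round": WORKERS * LINES_PER_BUILD / LINES_PER_SECOND,
    }


if __name__ == "__main__":
    for name, value in normalized_figures().items():
        print(f"{name}: {value:g}")
```

Under those assumptions a log line averages 400 bytes, each master handles
12500 lines per second, and one round of builds on all workers takes 240
seconds to absorb.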

Pierre

On Thu, Mar 23, 2017 at 22:21, Finn Herzfeld <fherzfeld at splunk.com> wrote:

> Thanks, these are exactly the sort of numbers I was looking for. Very much
> appreciated.
>
>
>
> From: Pierre Tardy <tardyp at gmail.com>
> Date: Thursday, March 23, 2017 at 2:01 PM
> To: Finn Herzfeld <fherzfeld at splunk.com>, "users at buildbot.net"
> <users at buildbot.net>
> Subject: Re: [users at bb.net] Buildbot scaling limits
>
>
>
> Hi Finn,
>
>
>
> A while ago I started a scalability study on Buildbot nine, which I have
> not yet had time to finish.
>
>
>
> Looking at profiles of Buildbot under load, we can see that the biggest
> load contributor is the compression and storage of the logs in the SQL
> database:
>
> https://github.com/tardyp/buildbot_profiler
>
>
>
> To assess how much load the system can keep up with, I started up to
> 300 workers and a variable number of masters.
>
> The builders are configured to output 40000 lines of log as fast as
> possible.
>
> I used an experimental Marathon setup that we have at work, and I actually
> ran into stability issues with our nixy load-balancing system while running
> those tests.
>
> But Buildbot did run flawlessly in this setup.
>
>
>
> At maximum load, I saw PostgreSQL running at 80% CPU at most, while 4
> masters were at full CPU load.
>
>
>
> During that test, the whole system was able to absorb 50000 lines of logs
> per second (20MB/s). To a log processing company this might sound rather
> low, but this is our current state.
>
> I wasn't able to run with more masters and workers yet, as the Marathon
> setup I used for this test wasn't big enough.
>
> I really need to publish the numbers officially, though. But this will
> take a bit more time, as I also want to capture the CPU and network
> throughput stats during the test.
>
>
>
> From the theoretical point of view:
>
> - Buildbot is horizontally scalable, and you can create as many masters as
> you want.
>
> - Buildbot requires a SQL database to store the data. This is for me the
> major brake on horizontal scalability. I have not yet managed to fully
> load PostgreSQL, though.
>
> - There is also crossbar as a single point of failure, but given the
> number of messages that Buildbot generates and the performance numbers
> that crossbar publishes, I don't think it will be a problem.
>
>
>
>
>
> I know that there are some other people on this list who run multimaster
> in production, and who can give more input.
>
> From what I can tell of the feedback he has been given here, the
> multimaster process was not very much the issue he had. I can remember a
> lot of discussion about live reconfiguration of schedulers, which we
> finally fixed in 0.9.4.
>
>
>
> We also have the usage statistics that we have been collecting since 0.9.2.
>
>
>
> Here is the number of masters running Buildbot, sorted by number of
> builders (for people who publish their stats):
>
> plugins.buildbot/config/BuilderConfig | Unique count of
> (descending)                          | installid.raw
> --------------------------------------+----------------
>                                   114 |              3
>                                   102 |              2
>                                    98 |              1
>                                    94 |             13
>                                    93 |              1
>                                    90 |              1
>                                    84 |              9
>                                    82 |             21
>                                    81 |              1
>
>
> And here is the number of master installations sorted by number of
> workers attached. Again, this only covers the people who publish their
> stats, which is why you don't see my tests here.
>
> plugins.buildbot/worker/base/Worker | Unique count of
> (descending)                        | installid.raw
> ------------------------------------+----------------
>                                  52 |              5
>                                  49 |              5
>                                  46 |             13
>                                  27 |             35
>                                  26 |             15
>                                  22 |              5
>                                  14 |             17
>                                  13 |             24
>                                  10 |              6
>
>
> We don't upload data about multi-master, so I can't tell how many
> multimaster instances we have.
>
>
>
>
>
> I would love to see more people using Buildbot at large scale, and helping
> to fix the performance issues.
>
>
>
> Regards,
>
> Pierre
>
>
>
> On Thu, Mar 23, 2017 at 19:26, Finn Herzfeld <fherzfeld at splunk.com>
> wrote:
>
> Hey all,
>
> We’re evaluating Buildbot here at Splunk. We think it’s really cool
> and could potentially solve a lot of our problems. One of the questions
> that’s come up is how large it can scale in a multi-master environment. How
> large have others scaled it? What resources are going to be constrained
> when scaling it? What issues are people running into with the current state
> of multi-master? I see that it is considered experimental.
>
>
>
> Thanks,
>
> Finn Herzfeld
>
> _______________________________________________
> users mailing list
> users at buildbot.net
> https://lists.buildbot.net/mailman/listinfo/users
>
>
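
For reference, the layout described in the quoted message (several masters
sharing one SQL database and one crossbar router) roughly corresponds to the
following master.cfg fragment on each master. This is a minimal sketch, not
the configuration used in the test: the hostnames, credentials and realm
name are placeholders.

```python
# Multi-master part of a master.cfg. Hostnames, credentials and the realm
# name below are PLACEHOLDERS, not values from the message above.
c = BuildmasterConfig = {}

# Declare that this master is one of several cooperating masters.
c['multiMaster'] = True

# All masters point at the same SQL database (PostgreSQL in the test above).
c['db'] = {
    'db_url': 'postgresql://buildbot:secret@db.example.com/buildbot',
}

# All masters exchange messages through the same crossbar (WAMP) router.
c['mq'] = {
    'type': 'wamp',
    'router_url': 'ws://crossbar.example.com:8080/ws',
    'realm': 'buildbot',
}
```

Workers, builders and schedulers are configured as usual on top of this and
are left out here.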