[users at bb.net] Buildbot scaling limits

Finn Herzfeld fherzfeld at splunk.com
Thu Mar 23 21:21:08 UTC 2017


Thanks, these are exactly the sort of numbers I was looking for. Very much appreciated.

From: Pierre Tardy <tardyp at gmail.com>
Date: Thursday, March 23, 2017 at 2:01 PM
To: Finn Herzfeld <fherzfeld at splunk.com>, "users at buildbot.net" <users at buildbot.net>
Subject: Re: [users at bb.net] Buildbot scaling limits

Hi Finn,

A while ago I started a scalability study on Buildbot nine, which I have not yet had time to finish completely.

Profiles of Buildbot under load show that the largest load contributor is the compression and storage of the logs in the SQL database:
https://github.com/tardyp/buildbot_profiler

To assess how much load the system can keep up with, I started up to 300 workers and a variable number of masters.
The builders are configured to output 40000 lines of log as fast as possible.
I used an experimental Marathon setup that we have at work, and I actually ran into stability issues with our nixy load-balancing system while running those tests.
Buildbot itself, however, ran flawlessly in this setup.
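
To illustrate the kind of workload described above, here is a minimal sketch of a script that a build step could run to emit log lines as fast as possible. The function name, the line format, and the use of a Python script at all are assumptions made for illustration; the actual command used in the tests is not shown in this message.

```python
import time


def emit_log_lines(count, stream):
    """Write `count` numbered lines to `stream` without any delay.

    Returns the elapsed wall-clock time in seconds.
    The line format is a made-up placeholder, not the one used in the tests.
    """
    start = time.perf_counter()
    for i in range(count):
        stream.write(f"log line {i:05d}: synthetic build output\n")
    stream.flush()
    return time.perf_counter() - start
```

Calling `emit_log_lines(40000, sys.stdout)` from a step's command would reproduce the 40000-line volume mentioned above, although the bytes per line would differ from the real test.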

At maximum load, I saw PostgreSQL peak at 80% CPU, while 4 masters were at full CPU load.

During that test, the whole system was able to ingest 50000 lines of log per second (20 MB/s). To a log processing company this might not sound like much, but this is our current state.
I wasn't able to run with more masters and workers yet, as the Marathon setup I used for this test wasn't big enough.
I really need to publish the numbers officially, though. That will take a bit more time, as I also want to capture the CPU and network throughput stats during the test.
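
As a rough back-of-the-envelope check on the figures above, the following sketch derives the implied average line size and how long the measured rate would take to absorb one build from every worker. It assumes 1 MB = 10**6 bytes and that all 300 workers each run exactly one 40000-line build; neither assumption is stated in the test description.

```python
# Figures quoted in this message.
WORKERS = 300
LINES_PER_BUILD = 40_000
LINES_PER_SECOND = 50_000
BYTES_PER_SECOND = 20 * 10**6  # assumption: 1 MB = 10**6 bytes

# Implied average size of one log line, in bytes.
avg_bytes_per_line = BYTES_PER_SECOND / LINES_PER_SECOND

# Assumption: every worker runs exactly one build.
total_lines = WORKERS * LINES_PER_BUILD

# Time needed to ingest all of those lines at the measured rate.
drain_seconds = total_lines / LINES_PER_SECOND

print(f"average line size: {avg_bytes_per_line:.0f} bytes")
print(f"total lines: {total_lines}")
print(f"time to ingest: {drain_seconds:.0f} s")
```

Under those assumptions the lines average 400 bytes each, and 12 million lines take about 240 seconds to ingest.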

From a theoretical point of view:
- Buildbot is horizontally scalable, and you can create as many masters as you want.
- Buildbot requires a SQL database to store the data; for me this is the major brake on horizontal scalability. I have not yet managed to fully load PostgreSQL, though.
- Crossbar is also a single point of failure, but given the number of messages that Buildbot generates and the performance numbers that Crossbar publishes, I don't think it will be a problem.
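
The three components listed above (several masters, one shared SQL database, one Crossbar router) map onto a master.cfg roughly as in the sketch below. The key names follow the Buildbot 0.9 multi-master documentation as I recall it; the database URL, router URL, and realm are placeholder values, not the ones used in these tests.

```python
# Minimal multi-master fragment of a master.cfg (which is plain Python).
# All hostnames, credentials and the realm name are hypothetical placeholders.
c = BuildmasterConfig = {}

# Tell this master that other masters share the same database.
c['multiMaster'] = True

# Every master points at the same SQL database.
c['db'] = {
    'db_url': 'postgresql://buildbot:secret@db.example.com/buildbot',
}

# Masters exchange messages through a Crossbar (WAMP) router.
c['mq'] = {
    'type': 'wamp',
    'router_url': 'ws://crossbar.example.com:8080/ws',
    'realm': 'buildbot',
}
```

Each master would carry the same `db` and `mq` settings, while builders, workers and schedulers can be split between masters.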


I know that there are other people on this list who run multi-master in production and who can give more input.
From what I can tell from the feedback given here, the multi-master process itself was not much of an issue. I remember a lot of discussion about live reconfiguration of schedulers, which we finally fixed in 0.9.4.

We also have the usage statistics that we have been collecting since 0.9.2.

Here is the number of master installations, sorted by number of builders (for people who publish their stats):

plugins.buildbot/config/BuilderConfig: Descending    Unique count of installid.raw
114                                                  3
102                                                  2
98                                                   1
94                                                   13
93                                                   1
90                                                   1
84                                                   9
82                                                   21
81                                                   1

And here is the number of master installations sorted by number of workers attached. Again, this covers only the people who publish their stats, which is why you don't see my tests here:

plugins.buildbot/worker/base/Worker: Descending      Unique count of installid.raw
52                                                   5
49                                                   5
46                                                   13
27                                                   35
26                                                   15
22                                                   5
14                                                   17
13                                                   24
10                                                   6
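
For anyone who wants to read these two listings cumulatively, here is a small sketch that counts how many reporting installations sit at or above a given size. It uses only the rows shown in this message, so the totals cover the displayed rows and not every installation that reports stats; the helper name is made up for illustration.

```python
# (size, number of installations) pairs copied from the two listings above.
builder_rows = [(114, 3), (102, 2), (98, 1), (94, 13), (93, 1),
                (90, 1), (84, 9), (82, 21), (81, 1)]
worker_rows = [(52, 5), (49, 5), (46, 13), (27, 35), (26, 15),
               (22, 5), (14, 17), (13, 24), (10, 6)]


def installs_at_least(rows, threshold):
    """Sum the installation counts of all rows whose size is >= threshold."""
    return sum(installs for size, installs in rows if size >= threshold)


print(installs_at_least(builder_rows, 90))  # installations with 90+ builders
print(installs_at_least(worker_rows, 40))   # installations with 40+ workers
```

Among the rows shown, 21 installations report 90 or more builders and 23 report 40 or more workers.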

We don't upload any data about multi-master, so I can't tell how many multi-master instances there are.


I would love to see more people using Buildbot at large scale and helping to fix the performance issues.

Regards,
Pierre

On Thu, 23 Mar 2017 at 19:26, Finn Herzfeld <fherzfeld at splunk.com> wrote:
Hey all,
We’re evaluating Buildbot here at Splunk; we think it’s really cool and could potentially solve a lot of our problems. One of the questions that has come up is how large it can scale in a multi-master environment. How large have others scaled it? What resources are going to be constrained when scaling it? What issues are people running into with the current state of multi-master? I see that it is considered experimental.

Thanks,
Finn Herzfeld
_______________________________________________
users mailing list
users at buildbot.net
https://lists.buildbot.net/mailman/listinfo/users