[Buildbot-devel] Scaling buildbot

Vitali Lovich vlovich at gmail.com
Fri Jun 12 05:57:56 UTC 2015


I have not seen any benefit from configuring any of the caching parameters, & sqlite does not seem to be a bottleneck at this time.
I have seen a drastic benefit from switching to a unix socket, but that could be because our buildmaster runs on OS X 10.9 & I think there was some kind of bug in the TCP stack that buildbot/nginx was triggering.

A simple starting point would be to take a repeatable operation that’s slow & instrument it with printfs that show how long each step takes.
Then keep narrowing it down until you have isolated the step that’s slow.  Unfortunately, I personally have not found any decent Python profiling tools that let me profile buildbot performance.
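
To make the printf approach concrete, here is a minimal sketch of a timing helper. It is not buildbot code: the names timed, timings & slow_step are made up for illustration, & slow_step just stands in for whatever operation you suspect. Wrap the whole operation first, then nest narrower blocks inside it until one of them accounts for most of the time.

```python
import time
from contextlib import contextmanager

# Hypothetical helper, not part of buildbot: collects (label, seconds) pairs.
timings = []


@contextmanager
def timed(label, sink=timings, clock=time.time):
    """Print and record how long the enclosed block took."""
    start = clock()
    try:
        yield
    finally:
        elapsed = clock() - start
        sink.append((label, elapsed))
        print("%s took %.3f s" % (label, elapsed))


def slow_step():
    # Stand-in for the suspect operation (assumption for this example).
    time.sleep(0.05)


# Wrap the whole operation, then nest narrower blocks to narrow it down.
with timed("whole operation"):
    with timed("step 1"):
        slow_step()
    with timed("step 2"):
        sum(range(1000))
```

Inner blocks finish first, so they are printed & recorded before the outer one. Comparing the inner numbers against the outer total tells you which step to split further.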

As with all performance issues, unless you measure you will never know why & by how much you are improving things.
Relying on rules of thumb can be misleading, as there’s nothing to say that the rule of thumb is data-driven & not an artifact of some particular configuration.

FWIW here is our setup.

1 buildmaster (0.8.10) running on sqlite on an OS X VM
4 slaves (0.8.12) running directly on the HW.  2 super-beefy machines, 1 middling machine, and a few mac minis.
13 builders & I doubt that will ever grow (each project contains a travis.yaml-like file so the complexity of the build system is independent of the complexity of an individual project).

On a quiet day it handles about 40 builds across 2 git repos.  One is hosted on gitlab using web hooks & the other just uses a custom git post-receive hook.
On a more active day we’re probably up to about 80 builds.

In front of buildbot I have an nginx reverse proxy forwarding all the requests to buildbot.

-Vitali

> On Jun 11, 2015, at 9:23 PM, Dan Kegel <dank at kegel.com> wrote:
> 
> On Thu, Jun 11, 2015 at 3:50 PM, Jim Rowan <jmr at computing.com> wrote:
>> Use your favorite sysadmin tools to figure out if it’s cpu, disk, or memory bound…  (Is this a small machine?  Does it have enough memory?  Is everything on local disk, or is NFS involved?   Is your disk “fast”?)
> 
> No NFS.  The box is a bunch of Xeons running vmware instances.  This
> instance has 8 GB of RAM.  ps says:
> 
> $ ps augxw
> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> buildbot  4239 30.5  5.8 1705764 475356 ?      Ssl  12:58 151:04
> python twistd --pidfile
> /home/buildbot/master-state/sandbox/g-speak/twistd.pid ...
> 
> That process' cpu usage bounces around a lot; in 8 hours, it racked up
> about 2 hours of CPU time.
> When there are git zombies, twistd uses 100% of CPU.
> 
> cat /proc/cpuinfo says there are four cores of type
> Intel(R) Xeon(R) CPU           X5450  @ 3.00GHz
> which probably isn't too far off from the truth.
> 
> I think disk is fast, but, um, I haven't measured it.
> 
> ------------------------------------------------------------------------------
> _______________________________________________
> Buildbot-devel mailing list
> Buildbot-devel at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/buildbot-devel




