[users at bb.net] Scaling buildbot. 7 seconds to fetch waterfall. Is 1400 builders too much for sqlite?

Dan Kegel dank at kegel.com
Thu Feb 18 18:36:12 UTC 2016


On Thu, Feb 18, 2016 at 10:01 AM, Pierre Tardy <tardyp at gmail.com> wrote:
> The waterfall or builder page is for me really not useful when you have that
> many builders (in either eight or nine).
> I think you would probably require a specific dashboard in order to split
> your builder matrix.

We already split the waterfall into sixteen categories (by project area
and real vs. try).  Even the smallest category takes 3 seconds to fetch,
which is about 10x too long.

> Buying new cores is not something I would recommend, as buildbot is
> fundamentally monothreaded.

When doing several fetches in parallel, I see the reported CPU usage of
twistd vary between 200% and 325%, so there's at least some
parallelism going on.
And assigning two more cores is easy... I'll try.
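
For anyone wanting to reproduce the parallel-fetch measurement, here is a
minimal sketch that times several page fetches issued concurrently. The host
name, port and category names in the usage part are hypothetical placeholders,
and the helper names are invented for this example; watch twistd in top while
it runs.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_page(url):
    """Download one page and return its size in bytes."""
    with urlopen(url, timeout=60) as resp:
        return len(resp.read())


def time_fetches(urls, fetch=fetch_page, workers=4):
    """Fetch all urls concurrently and return {url: elapsed_seconds}."""
    def timed(url):
        start = time.perf_counter()
        fetch(url)
        return url, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(timed, urls))


if __name__ == "__main__":
    # Hypothetical master URL and category names; substitute real ones.
    base = "http://buildmaster.example:8010/waterfall?category="
    urls = [base + name for name in ("area1", "area1-try", "area2")]
    for url, seconds in sorted(time_fetches(urls).items()):
        print("%6.2fs  %s" % (seconds, url))
```

If the per-page times barely change as the number of workers goes up, the
master is serving the pages with some concurrency; if they grow roughly
linearly, the fetches are being serialized.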

> On buildbot eight, the database is really about the buildrequests and
> changes. I don't think that switching to mysql will help at all with
> loading the waterfall.

That's good to know.

> The best for you is to run https://pypi.python.org/pypi/statprof/ over
> manhole (http://docs.buildbot.net/latest/manual/cfg-global.html#manhole);
> that will definitely tell you where buildbot's code is hanging.

I hesitate to touch the master install, but maybe I can try that when I install
nine.
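
To illustrate the idea behind statistical profiling without installing
anything, here is a stdlib-only sketch. It is not statprof and does not use
statprof's API; the StackSampler class is invented for this example. It
assumes it is created from the thread to be profiled (in a manhole session
that would presumably be the reactor thread) and samples that thread's stack
from a background thread.

```python
import collections
import sys
import threading
import time


class StackSampler(object):
    """Periodically record which function the target thread is executing."""

    def __init__(self, interval=0.01):
        self.interval = interval
        self.counts = collections.Counter()
        # Profile the thread that creates the sampler.
        self.target_id = threading.current_thread().ident
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run)
        self._thread.daemon = True

    def _run(self):
        while not self._stop.is_set():
            # sys._current_frames() maps thread ids to their current frame.
            frame = sys._current_frames().get(self.target_id)
            if frame is not None:
                code = frame.f_code
                self.counts[(code.co_filename, code.co_name)] += 1
            time.sleep(self.interval)

    def start(self):
        self._thread.start()

    def stop(self):
        """Stop sampling and return the ten most frequently seen functions."""
        self._stop.set()
        self._thread.join()
        return self.counts.most_common(10)
```

Typical use would be to create and start a sampler, fetch a slow page, then
call stop() and look at which functions dominate the counts. Only the
innermost Python frame is recorded, so callers higher up the stack are not
attributed.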

> You can increase the buildCacheSize, that may help to trade cpu against
> memory.

Tried that just now:
+        self['caches'] = {
+            'Builds' : 1500,      # formerly c['buildCacheSize']
+        }

Didn't seem to help; /builders still takes 5-6 seconds to fetch.  (Setting both
yielded error "cannot specify c['caches'] and c['buildCacheSize']"
so I guess I put it in the right place.)
twistd's resident working set is 461MB, golly.
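
The either/or behaviour of the two settings can be sketched as follows. This
is a re-creation for illustration, not Buildbot's actual validation code; the
function name is invented, and a plain dict stands in for the master config.

```python
def effective_build_cache_size(config):
    """Return the build cache size from a config dict, or None if unset.

    Mirrors the observed rule that the newer 'caches' dict and the older
    'buildCacheSize' key cannot both be given.
    """
    if 'caches' in config and 'buildCacheSize' in config:
        raise ValueError(
            "cannot specify c['caches'] and c['buildCacheSize']")
    if 'caches' in config:
        return config['caches'].get('Builds')
    return config.get('buildCacheSize')


# Newer style: one dict holding all cache sizes.
c = {'caches': {'Builds': 1500}}
print(effective_build_cache_size(c))
```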

> As for nine, we are approaching a release, cancel/stop have been working for
> 6+ months.
> We have to see how ui will work with that many builders. For sure it will
> never hang the master process for 7 seconds, but we might have to work
> together in order to optimize some parts.

I've been holding off upgrading partly because I'll need to redo how I do
ephemeral slaves^Wworkers, but maybe now's the time, even
if it doesn't help this performance problem.

Another alternative is to split the master, but then I'd have to split the
slave sets, too, which would hurt build times.
- Dan

