[users at bb.net] Scaling buildbot. 7 seconds to fetch waterfall. Is 1400 builders too much for sqlite?

Dan Kegel dank at kegel.com
Thu Feb 18 17:37:47 UTC 2016


To recap, my site is having response time problems with buildbot 0.8.8
(yeah, it's old).
We have 1356 builders, and used to have 190 gitpollers.
Gitpoller overhead was killing us, so we've slowly been migrating our
git repos to gitlab in order to use webhooks. We're now down to 72
gitpollers, and buildmaster cpu load is usually low.
Developers are happier than ever with the quick response to checkins,
and with the gitlab merge request -> buildbot try build gateway I threw together
(which finally made buildbot try builds usable by mere mortals!).

But the waterfall takes 7 seconds to fetch.  Even the /builders page
takes 5 to 6 seconds to fetch, which these days is an eternity.
Doing both at once takes 15 seconds (even on my 4-core Xeon VM).
Developers avoid the waterfall at all costs, and go straight to
individual builder pages.
But they don't have the right builder in their browser completion list all
the time, so they asked me to create a static builders.html page without status.
I now generate that on each reconfigure, and it loads in 5 milliseconds.
This made them a little happier.
I think they'd be happier still if I added static links to the builders from the
gitlab page for each project, so they could avoid the buildbot UI even more,
and stay in happy, beautiful gitlab land as long as possible.
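
For anyone who wants to do the same, here is a minimal sketch of that kind
of static page generator. It is not my actual script: the function name
render_builders_page and the example URL are made up for illustration, it
is written as standalone Python 3 rather than as part of master.cfg, and it
assumes builder pages live at <base>/builders/<name> as in the 0.8.x web
status.

```python
from html import escape
from urllib.parse import quote


def render_builders_page(builder_names, base_url):
    """Return a static HTML page with one link per builder, no status."""
    base = base_url.rstrip("/")
    items = []
    for name in sorted(builder_names):
        # Builder names may contain spaces or slashes, so quote them fully.
        href = "%s/builders/%s" % (base, quote(name, safe=""))
        items.append('<li><a href="%s">%s</a></li>'
                     % (escape(href, quote=True), escape(name)))
    return ("<!DOCTYPE html>\n"
            "<html><head><title>Builders</title></head>\n"
            "<body>\n<ul>\n%s\n</ul>\n</body></html>\n" % "\n".join(items))


if __name__ == "__main__":
    # Hypothetical builder names and buildmaster URL.
    names = ["linux-x86_64-release", "win32 debug"]
    with open("builders.html", "w") as f:
        f.write(render_builders_page(names, "http://buildmaster.example.com:8010"))
```

The same list of links could presumably be pasted into each gitlab project
page, filtered down to that project's builders.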

Hmm, maybe I should assign another few cores to the buildmaster and
see what happens.

I wonder if my use of sqlite is part of the problem.  Has anyone
with > 1000 builders noticed a radical decrease in time to fetch the
waterfall upon switching from sqlite to mysql?
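
The switch itself looks like a small master.cfg change. A sketch, assuming
the c['db'] setting from the 0.8.x docs; the user, password, host, and
database name are placeholders, and I have not tried this here:

```python
# master.cfg fragment (sketch). A real master.cfg already defines `c`;
# it is defined here only so the fragment stands alone.
c = BuildmasterConfig = {}

c['db'] = {
    # The default is "sqlite:///state.sqlite".
    # Placeholder credentials and host. max_idle is the connection
    # recycling parameter the docs suggest for MySQL.
    'db_url': "mysql://buildbot:changeme@dbhost.example.com/buildbot?max_idle=300",
}
```

As far as I know the new database also needs its schema created with
"buildbot upgrade-master" before the master is restarted, and that step
does not carry over the existing sqlite data.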

And how's nine coming along?  Last I heard, it still lacked a 'cancel
build' button, which would be an issue here.
- Dan

