[Buildbot-devel] Practical limit to number of builders?

Dustin J. Mitchell dustin at v.igoro.us
Tue Oct 28 01:37:53 UTC 2014


The git polling is almost entirely accomplished by forking 'git' processes
-- there's *very* little processing done by Buildbot itself.  And the
waterfall shouldn't multi-thread at all: it reads almost entirely from
pickles, not the database, and thus shouldn't even get any parallelism from
concurrent database queries.

Note that build pickles are cached pretty heavily.  Is it possible that the
difference you're observing has to do with whether the cache is hot or cold?
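
One way to check the hot-versus-cold hypothesis is to time several consecutive waterfall loads and compare the first against the rest. The following is only a minimal sketch: the URL (host, port 8010, and the `/waterfall` path) and the helper names `time_calls` and `fetch_waterfall` are assumptions for illustration, not anything taken from this thread or from Buildbot itself.

```python
import time
import urllib.request

# Assumed location of the waterfall page; adjust to the actual master.
WATERFALL_URL = "http://localhost:8010/waterfall"


def time_calls(fn, repeats=3):
    """Call fn() `repeats` times and return the wall-clock seconds of each call."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


def fetch_waterfall(url=WATERFALL_URL):
    """Fetch the page once and return the number of bytes received."""
    with urllib.request.urlopen(url) as resp:
        return len(resp.read())
```

Running `time_calls(fetch_waterfall)` once while git polling is active and once while the master is idle would give two sets of timings. If the first load in each set is much slower than the later ones regardless of polling, a cold pickle cache is the more likely explanation; if every load is slow while polling is active, the polling itself is the better suspect.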

Dustin

On Mon Oct 27 2014 at 2:00:25 PM Dan Kegel <dank at kegel.com> wrote:

> On Fri, Oct 3, 2014 at 6:33 PM, Mikhail Sobolev <mss at mawhrin.net> wrote:
> >> > What kind of database do you use?
> >>
> >> sqlite.  Think that could be the problem?
> > This is definitely one thing to check: sqlite is pretty OK for basic
> > needs, and your needs do not seem to be that basic.
>
> I looked at this a bit more.  When rendering the waterfall:
>
> If 'top' shows the system is idle before clicking 'reload' on the
> waterfall page, the render finishes "quickly" (about ten seconds), and
> twisted uses 330% CPU (so it is multithreading nicely?).
> (This is so even if a du is keeping the disk busy.)
>
> If 'top' shows buildbot is doing git polling (i.e. about 100% CPU use
> in twistd and 3-10 'git' instances and/or zombies), the render finishes
> "slowly" (about 35 seconds).
> Fewer git instances -> render finishes faster.
>
> So git polling appears to be slowing down the waterfall significantly.
>
> Is there a more efficient way to do large numbers of git polls?
>
> I also profiled the system a bit to check whether sqlite was slow, using
> $ perf record -e cpu-clock -v -a -g sleep 20
> $ perf report
> while restarting the master:
>
> Events: 79K cpu-clock
> +  24.62%         twistd  [kernel.kallsyms]         [k] __ticket_spin_lock
> +  22.23%        swapper  [kernel.kallsyms]         [k] native_safe_halt
> +  10.01%         twistd  [kernel.kallsyms]         [k] _raw_spin_unlock_irqrestore
> +   9.36%         twistd  libsqlite3.so.0.8.6       [.] 0x3f55d
> +   3.90%         twistd  python                    [.] 0x16ffcc
> +   3.60%         twistd  [kernel.kallsyms]         [k] finish_task_switch
>
> Using -g seems to show that both __ticket_spin_lock and
> _raw_spin_unlock_irqrestore are futex-related
> (which makes sense if twisted is using epoll, I guess, but still seems
> kind of high).
>
> Anyway, it doesn't seem offhand that sqlite is my problem...
> - Dan
>
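
The perf figures quoted above can be tallied per shared object to compare sqlite against kernel time. This is a rough sketch that covers only the six lines shown in the quoted report (not the full profile); the regular expression and the function name `share_by_object` are assumptions made for illustration.

```python
import re
from collections import defaultdict

# The six lines of 'perf report' output quoted in this thread.
PERF_LINES = """\
+  24.62%         twistd  [kernel.kallsyms]         [k] __ticket_spin_lock
+  22.23%        swapper  [kernel.kallsyms]         [k] native_safe_halt
+  10.01%         twistd  [kernel.kallsyms]         [k] _raw_spin_unlock_irqrestore
+   9.36%         twistd  libsqlite3.so.0.8.6       [.] 0x3f55d
+   3.90%         twistd  python                    [.] 0x16ffcc
+   3.60%         twistd  [kernel.kallsyms]         [k] finish_task_switch
"""

# Columns: percentage, command, shared object, [k] or [.], symbol.
LINE_RE = re.compile(r"^\+?\s*([\d.]+)%\s+(\S+)\s+(\S+)\s+\[(.)\]\s+(\S+)")


def share_by_object(report, command):
    """Sum the sample percentages per shared object for one command."""
    totals = defaultdict(float)
    for line in report.splitlines():
        match = LINE_RE.match(line)
        if match and match.group(2) == command:
            totals[match.group(3)] += float(match.group(1))
    return dict(totals)


totals = share_by_object(PERF_LINES, "twistd")
for obj, pct in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{pct:6.2f}%  {obj}")
```

Among the listed twistd samples, kernel symbols add up to 24.62 + 10.01 + 3.60 = 38.23%, against 9.36% for libsqlite3, which is consistent with the observation that sqlite does not look like the main cost here.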