[Buildbot-devel] Practical limit to number of builders?

Dan Kegel dank at kegel.com
Fri Oct 31 20:35:12 UTC 2014


I wonder if it's the sheer number of parallel git processes.  I should
try adding a wrapper around git to limit the number running at once to
one or two and see if that helps.
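
A minimal sketch of such a wrapper, assuming Python 3 on Linux. The
paths REAL_GIT and LOCK_DIR, the slot count, and all function names are
hypothetical placeholders, not anything Buildbot provides. Each git
invocation takes an exclusive flock on one of MAX_SLOTS lock files and
then execs the real git, so the kernel frees the slot when git exits:

```python
"""Hypothetical 'git' wrapper: allow at most MAX_SLOTS git processes at once."""
import fcntl
import os
import sys
import time

REAL_GIT = "/usr/bin/git"        # assumption: location of the real git binary
LOCK_DIR = "/tmp/git-throttle"   # assumption: any directory the master can write to
MAX_SLOTS = 2                    # "one or two" concurrent git processes


def try_acquire_slot(lock_dir, max_slots):
    """Return an fd holding an exclusive flock on a free slot, or None if all are busy."""
    os.makedirs(lock_dir, exist_ok=True)
    for i in range(max_slots):
        path = os.path.join(lock_dir, "slot%d.lock" % i)
        fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fd
        except BlockingIOError:
            # Slot is held by another git process; try the next one.
            os.close(fd)
    return None


def acquire_slot(lock_dir, max_slots, poll_interval=0.1):
    """Block (by polling) until one of the slots becomes free."""
    while True:
        fd = try_acquire_slot(lock_dir, max_slots)
        if fd is not None:
            return fd
        time.sleep(poll_interval)


def main(argv):
    fd = acquire_slot(LOCK_DIR, MAX_SLOTS)
    # Python opens fds close-on-exec by default; keep this one open across
    # exec so the lock is held for as long as the real git is running.
    os.set_inheritable(fd, True)
    os.execv(REAL_GIT, [REAL_GIT] + argv[1:])
```

To try it, one would save this as an executable script named 'git',
add a final line calling main(sys.argv), and put it ahead of the real
git in the PATH seen by the master. Child processes that git spawns
inherit the lock fd too, so a slot stays busy until all of them exit.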

On Mon, Oct 27, 2014 at 6:37 PM, Dustin J. Mitchell <dustin at v.igoro.us> wrote:
> The git polling is almost entirely accomplished by forking 'git' processes
> -- there's *very* little processing done by Buildbot itself.  And the
> waterfall shouldn't multi-thread at all: it reads almost entirely from
> pickles, not the database, and thus shouldn't even get any parallelism from
> concurrent database queries.
>
> Note that build pickles are cached pretty heavily.  Is it possible that the
> difference you're observing has to do with whether the cache is hot or cold?
>
> Dustin
>
> On Mon Oct 27 2014 at 2:00:25 PM Dan Kegel <dank at kegel.com> wrote:
>>
>> On Fri, Oct 3, 2014 at 6:33 PM, Mikhail Sobolev <mss at mawhrin.net> wrote:
>> >> > What kind of database do you use?
>> >>
>> >> sqlite.  Think that could be the problem?
>> > This is definitely one thing to check: sqlite is pretty OK for basic
>> > needs, and your needs do not seem to be that basic.
>>
>> I looked at this a bit more.  When rendering the waterfall:
>>
>> If 'top' shows system is idle before clicking 'reload' on waterfall page,
>> the render finishes "quickly" (about ten seconds), and twisted uses
>> 330% CPU (so it's multithreading nicely?).
>> (This is so even if a du is keeping the disk busy.)
>>
>> If 'top' shows buildbot doing git polling (i.e. about 100% CPU use in
>> twistd and 3-10 'git' instances and/or zombies), the render finishes
>> "slowly" (about 35 seconds).
>> Fewer git instances -> render finishes faster.
>>
>> So git polling appears to be slowing down the waterfall significantly.
>>
>> Is there a more efficient way to do large numbers of git polls?
>>
>> I also profiled the system a bit to check whether sqlite was slow, using
>> $ perf record -e cpu-clock -v -a -g sleep 20
>> $ perf report
>> while restarting the master:
>>
>> Events: 79K cpu-clock
>> +  24.62%         twistd  [kernel.kallsyms]         [k] __ticket_spin_lock
>> +  22.23%        swapper  [kernel.kallsyms]         [k] native_safe_halt
>> +  10.01%         twistd  [kernel.kallsyms]         [k] _raw_spin_unlock_irqrestore
>> +   9.36%         twistd  libsqlite3.so.0.8.6       [.] 0x3f55d
>> +   3.90%         twistd  python                    [.] 0x16ffcc
>> +   3.60%         twistd  [kernel.kallsyms]         [k] finish_task_switch
>>
>> Using -g seems to show that both the __ticket_spin_lock and
>> _raw_spin_unlock_irqrestore samples are futex-related
>> (which makes sense if twisted is using epoll, I guess, but still seems
>> kind of high).
>>
>> Anyway, it doesn't seem offhand that sqlite is my problem...
>> - Dan