[Buildbot-devel] Practical limit to number of builders?
dank at kegel.com
Fri Oct 31 20:35:12 UTC 2014
I wonder if it's the sheer number of parallel git processes. I should
try adding a wrapper around git to limit the number running at once to
one or two and see if that helps.
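Something like the following is what I have in mind. It is only a rough sketch, assuming Linux and Python 3, with the real git at /usr/bin/git. The lock directory, slot count, and function names are made up for illustration and are not part of Buildbot. It would be installed as an executable named 'git' earlier in the master's PATH than the real one.

```python
#!/usr/bin/env python3
"""Hypothetical 'git' wrapper that caps how many git processes run at once."""
import fcntl
import os
import subprocess
import sys
import time

REAL_GIT = "/usr/bin/git"        # assumption: absolute path to the real git
LOCK_DIR = "/tmp/git-throttle"   # assumption: any directory the master can write to
MAX_SLOTS = 2                    # "one or two" git processes at a time


def try_acquire_slot(lock_dir, max_slots):
    """Return an open, locked file for a free slot, or None if all are busy."""
    os.makedirs(lock_dir, exist_ok=True)
    for slot in range(max_slots):
        handle = open(os.path.join(lock_dir, "slot%d.lock" % slot), "w")
        try:
            # Non-blocking exclusive lock; fails at once if the slot is taken.
            fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return handle
        except BlockingIOError:
            handle.close()
    return None


def acquire_slot(lock_dir, max_slots, poll_interval=0.1):
    """Poll until one of the slots becomes free, then return its locked file."""
    while True:
        handle = try_acquire_slot(lock_dir, max_slots)
        if handle is not None:
            return handle
        time.sleep(poll_interval)


def main(argv):
    slot = acquire_slot(LOCK_DIR, MAX_SLOTS)
    try:
        # Run the real git with the original arguments and pass on its exit code.
        return subprocess.call([REAL_GIT] + argv)
    finally:
        slot.close()  # closing the file releases the flock


# Only act as the wrapper when installed under the name 'git'.
if __name__ == "__main__" and os.path.basename(sys.argv[0]) == "git":
    sys.exit(main(sys.argv[1:]))
```

The locks are released by the kernel if a wrapper dies, so a crashed git should not leave a slot stuck. Waiting callers just poll, so this adds a little latency per poll but should keep the number of simultaneous git processes at MAX_SLOTS or below.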
On Mon, Oct 27, 2014 at 6:37 PM, Dustin J. Mitchell <dustin at v.igoro.us> wrote:
> The git polling is almost entirely accomplished by forking 'git' processes
> -- there's *very* little processing done by Buildbot itself. And the
> waterfall shouldn't multi-thread at all: it reads almost entirely from
> pickles, not the database, and thus shouldn't even get any parallelism from
> concurrent database queries.
> Note that build pickles are cached pretty heavily. Is it possible that the
> difference you're observing has to do with whether the cache is hot or cold?
> On Mon Oct 27 2014 at 2:00:25 PM Dan Kegel <dank at kegel.com> wrote:
>> On Fri, Oct 3, 2014 at 6:33 PM, Mikhail Sobolev <mss at mawhrin.net> wrote:
>> >> > What kind of database do you use?
>> >> sqlite. Think that could be the problem?
>> > This is definitely one thing to check: sqlite is pretty OK for basic
>> > needs, and your needs do not seem to be that basic.
>> I looked at this a bit more. When rendering the waterfall:
>> If 'top' shows system is idle before clicking 'reload' on waterfall page,
>> the render finishes "quickly" (about ten seconds), and twisted uses
>> 330% CPU (so it's multithreading nicely?).
>> (This is so even if a du is keeping the disk busy.)
>> If 'top' shows buildbot is doing git polling (i.e. about 100% CPU use
>> in twistd and 3-10 'git' instances and/or zombies), the render finishes
>> "slowly" (about 35 seconds).
>> Fewer git instances -> render finishes faster.
>> So git polling appears to be slowing down the waterfall significantly.
>> Is there a more efficient way to do large numbers of git polls?
>> I also profiled the system a bit to check whether sqlite was slow, using
>> $ perf record -e cpu-clock -v -a -g sleep 20
>> $ perf report
>> while restarting the master:
>> Events: 79K cpu-clock
>> + 24.62% twistd [kernel.kallsyms] [k] __ticket_spin_lock
>> + 22.23% swapper [kernel.kallsyms] [k] native_safe_halt
>> + 10.01% twistd [kernel.kallsyms] [k]
>> + 9.36% twistd libsqlite3.so.0.8.6 [.] 0x3f55d
>> + 3.90% twistd python [.] 0x16ffcc
>> + 3.60% twistd [kernel.kallsyms] [k] finish_task_switch
>> Using -g seems to show that both the ticket_spin_lock and
>> raw_spin_lock_irqrestore are futex-related
>> (which makes sense if twisted is using epoll, I guess, but still seems
>> kind of high).
>> Anyway, it doesn't seem offhand that sqlite is my problem...
>> - Dan