[users at bb.net] buildbot CPU usage
Pierre Tardy
tardyp at gmail.com
Tue Aug 2 17:24:46 UTC 2016
How many projects are you polling? I'll see if I can make a PoC of a
builder which runs statprof.
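
In the meantime, here is a rough sketch of the idea, not the actual PoC. It assumes the profiler object exposes start(), stop() and display(fp), which is how I remember the statprof module working, and that the scheduler behaves like Twisted's reactor.callLater. The function name and its parameters are made up for illustration.

```python
def start_profile_window(profiler, call_later, seconds, out_path):
    """Start sampling now; stop and dump a text report after `seconds`.

    profiler   -- object with start(), stop() and display(fp), e.g. the
                  statprof module (assumed API, check your version)
    call_later -- scheduler taking (delay, callable), e.g. Twisted's
                  reactor.callLater, so the reactor is never blocked
    """
    profiler.start()

    def _finish():
        # Runs later from the scheduler: stop sampling, then write the report.
        profiler.stop()
        with open(out_path, "w") as report:
            profiler.display(report)

    # Return whatever handle the scheduler gives back so it can be cancelled.
    return call_later(seconds, _finish)
```

A master-side step in a dedicated builder could call something like start_profile_window(statprof, reactor.callLater, 180, some_report_path) and finish immediately. Scheduling the stop instead of sleeping matters here: a sleep in the reactor thread would block the very thread we want to observe.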
On Tue, Aug 2, 2016 at 18:53, Francesco Di Mizio <francescodimizio at gmail.com>
wrote:
> Thanks for the kind replies both of you.
>
> @Pierre:
> Not sure I get what you mean. Given the context, for a step to be CPU
> demanding it should be a master side step right? I happen to not have any.
> What would you be profiling with statprof?
> I'd really appreciate if you could elaborate on your idea.
>
> Really all I can think of is the poller. I'll keep looking into it.
>
>
>
> On Tue, Aug 2, 2016 at 6:36 PM, Dan Kegel <dank at kegel.com> wrote:
>
>> With gitpoller, it was easy to see; whenever the number of git
>> sessions from the poller went over 0 or so, web gui performance was
>> poor.
>> And if it went over 10, well, you could kiss the gui goodbye for
>> several minutes.
>>
>> One countermeasure was to randomize the polling intervals, a la
>>
>> # Assumes e.g. "import random" and "from buildbot.plugins import changes"
>> interval = 6  # minutes
>> self['change_source'].append(
>>     # Fuzz the interval to avoid slamming the git server
>>     # and hitting the MaxStartups or MaxSessions limits.
>>     # If you hit them, twistd.log will have lots of
>>     # "ssh_exchange_identification: Connection closed by remote host" errors.
>>     # See http://trac.buildbot.net/ticket/2480
>>     changes.GitPoller(repourl, branches=branchnames,
>>                       workdir='gitpoller-workdir-' + name,
>>                       pollinterval=interval * 60 + random.uniform(-10, 10)))
>>
>> That made life just barely bearable, at least while the number of
>> projects polled stayed under 50 or so.
>> What really helped was not using pollers anymore, and switching to
>> gitlab's webhooks.
>> We're at 190 now, of which 57 are still using gitpoller, and it's
>> almost ok. (I really have to move the last 57 onto gitlab. Or, well,
>> since they're not critical, increase the polling interval...)
>>
>> On Tue, Aug 2, 2016 at 9:13 AM, Pierre Tardy <tardyp at gmail.com> wrote:
>> > Hi,
>> >
>> > Pollers indeed usually don't scale well since they, hmm, poll.
>> > What you are describing here hints that the Twisted reactor thread is
>> > always busy, which should not happen if you only start 10 builds.
>> > You might have some custom steps which are doing something heavily
>> > CPU-bound in the main thread.
>> > What I usually do is use statprof:
>> > https://pypi.python.org/pypi/statprof/
>> >
>> > in order to know what the CPU is doing.
>> > You could create a builder which you can trigger whenever you need, and
>> > which would start the profiling, wait a few minutes, and then save the
>> > profile to a file.
>> >
>> >
>> >
>> > On Tue, Aug 2, 2016 at 17:53, Francesco Di Mizio
>> > <francescodimizio at gmail.com> wrote:
>> >>
>> >> Hey Dan,
>> >>
>> >> I am using a p4 poller. Maybe it's suffering from the same problems?
>> >>
>> >> On Tue, Aug 2, 2016 at 5:45 PM, Francesco Di Mizio
>> >> <francescodimizio at gmail.com> wrote:
>> >>>
>> >>> I'd like to provide a bit more context. Right after restarting the
>> >>> master and kicking off 10 builds, CPU was at 110-120%. This made the
>> >>> UI unusable and basically all the services were stuck, including the
>> >>> REST API.
>> >>> After 3-4 minutes like this, and WITH all 10 builds still running,
>> >>> the CPU usage went down to 5%, stayed there for 5 minutes, and all
>> >>> was smooth and quick again. From then on it keeps oscillating; I've
>> >>> seen spikes of 240% :(
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> On Tue, Aug 2, 2016 at 4:12 PM, Francesco Di Mizio
>> >>> <francescodimizio at gmail.com> wrote:
>> >>>>
>> >>>> Sometimes it goes up to 140%. I was not able to relate this to any
>> >>>> particular build condition - it seems like it can happen any time
>> >>>> and is not related to how many builds are going on.
>> >>>>
>> >>>> I usually realize the server got into this state because the web UI
>> >>>> gets stuck. As soon as the CPU% goes back to normal values (2-3%
>> >>>> most times), the web UI finishes loading almost instantly.
>> >>>>
>> >>>> Any pointers as to what might be causing this? The only reason I can
>> >>>> think of is too many people trying to access the web UI
>> >>>> simultaneously - could that be right?
>> >>>>
>> >>>
>> >>
>> >> _______________________________________________
>> >> users mailing list
>> >> users at buildbot.net
>> >> https://lists.buildbot.net/mailman/listinfo/users
>> >
>> >
>>
>
>
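
On Dan's interval fuzzing quoted above: the same idea can be pulled out into a small helper so every poller gets its own offset. This is only a sketch; the helper name and the jitter_seconds parameter are mine, and the defaults just mirror the values in his snippet.

```python
import random


def fuzzed_interval(minutes, jitter_seconds=10, rng=random):
    """Return a poll interval in seconds, shifted by up to +/- jitter_seconds.

    The offset is drawn once per call, so a poller configured with the
    result keeps a fixed period that differs slightly from its neighbours.
    """
    return minutes * 60 + rng.uniform(-jitter_seconds, jitter_seconds)


# One independent interval per (hypothetical) project name.
intervals = {name: fuzzed_interval(6) for name in ("proj-a", "proj-b", "proj-c")}
```

The result would be passed as pollinterval when constructing each poller. Whether +/- 10 seconds is enough spread for a given server is something to tune, not something this sketch decides.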