[users at bb.net] buildbot CPU usage

Tue Aug 2 17:28:25 UTC 2016

Just one. Here is what the poller looks like:

s = changes.P4Source(
    p4port=config.p4_server,
    p4user=config.p4_user,
    p4passwd=config.p4_password,
    p4base='//XXXXXXX/',
    pollInterval=10,
    pollAtLaunch=False,
    split_file=lambda branchfile: branchfile.split('/', 1),
    encoding='cp437'
)

On Tue, Aug 2, 2016 at 7:24 PM, Pierre Tardy <tardyp at gmail.com> wrote:

> How many projects are you polling? I'll see if I can make a PoC of a
> builder which runs statprof.
>
> On Tue, Aug 2, 2016 at 6:53 PM, Francesco Di Mizio
> <francescodimizio at gmail.com> wrote:
>
>> Thanks for the kind replies both of you.
>>
>> @Pierre:
>> Not sure I get what you mean. Given the context, for a step to be CPU
>> demanding it would have to be a master-side step, right? I happen not to
>> have any. What would you be profiling with statprof?
>> I'd really appreciate it if you could elaborate on your idea.
>>
>> Really all I can think of is the poller. I'll keep looking into it.
>>
>>
>>
>> On Tue, Aug 2, 2016 at 6:36 PM, Dan Kegel <dank at kegel.com> wrote:
>>
>>> With gitpoller, it was easy to see; whenever the number of git
>>> sessions from the poller went over 0 or so, web gui performance was
>>> poor.
>>> And if it went over 10, well, you could kiss the gui goodbye for
>>> several minutes.
>>>
>>> One countermeasure was to randomize the polling intervals, a la
>>>
>>>             interval = 6  # minutes
>>>             self['change_source'].append(
>>>                 # Fuzz the interval to avoid slamming the git server and
>>>                 # hitting the MaxStartups or MaxSessions limits.
>>>                 # If you hit them, twistd.log will have lots of
>>>                 # "ssh_exchange_identification: Connection closed by remote host" errors.
>>>                 # See http://trac.buildbot.net/ticket/2480
>>>                 changes.GitPoller(repourl, branches=branchnames,
>>>                                   workdir='gitpoller-workdir-' + name,
>>>                                   pollinterval=interval * 60 + random.uniform(-10, 10)))
>>>
>>> That made life just barely bearable, at least as long as the number of
>>> projects polled was under 50 or so.
>>> What really helped was not using pollers anymore, and switching to
>>> GitLab's webhooks.
>>> We're at 190 now, of which 57 are still using gitpoller, and it's
>>> almost ok.  (I really have to move the last 57 onto GitLab.  Or, well,
>>> since they're not critical, increase the polling interval...)
>>>
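The fuzzing idea above can be sketched in isolation. This is only an illustration: the helper name fuzzed_interval, its jitter_seconds parameter, and the project names are hypothetical, and the values would be passed to whatever poller interval argument your configuration uses.

```python
import random


def fuzzed_interval(minutes, jitter_seconds=10.0):
    """Return a poll interval in seconds, offset by up to +/- jitter_seconds.

    The offset is drawn once, when the configuration is loaded, so each
    poller keeps a fixed but slightly different period.
    """
    return minutes * 60 + random.uniform(-jitter_seconds, jitter_seconds)


# Hypothetical project names, standing in for the repositories being polled.
projects = ["alpha", "beta", "gamma"]

# One slightly different interval per poller, all close to 6 minutes.
intervals = {name: fuzzed_interval(6) for name in projects}
```

Because each poller gets its own period, pollers that start together drift apart over time instead of all hitting the server in the same second.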
>>> On Tue, Aug 2, 2016 at 9:13 AM, Pierre Tardy <tardyp at gmail.com> wrote:
>>> > Hi,
>>> >
>>> > Pollers usually do not scale well because they, hmm, poll.
>>> > What you are describing hints that the Twisted reactor thread is
>>> > always busy, which should not happen if you only start 10 builds.
>>> > You might have some custom steps which are doing something heavily
>>> > CPU-bound in the main thread.
>>> > What I usually do is use statprof:
>>> > https://pypi.python.org/pypi/statprof/
>>> >
>>> > in order to know what the CPU is doing.
>>> > You could create a builder which you can trigger whenever you need, and
>>> > which would start the profiling, wait a few minutes, and then save the
>>> > profile to a file.
>>> >
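The start / wait / save flow described above might look roughly like the sketch below. It is not the statprof-based builder itself: statprof is not assumed to be installed, so the standard library's cProfile stands in (a deterministic profiler with higher overhead than a sampling one). The TimedProfile class, busy_work, and the output file name are hypothetical, and in a real master the wait between start and stop would have to be non-blocking, which is not shown here.

```python
import cProfile
import os
import pstats
import tempfile


class TimedProfile:
    """Start profiling the current thread now, save the results later."""

    def __init__(self):
        self._profiler = cProfile.Profile()

    def start(self):
        # Begin collecting call statistics for the calling thread.
        self._profiler.enable()

    def stop_and_save(self, path):
        # Stop collecting and write the raw stats to a file for later analysis.
        self._profiler.disable()
        self._profiler.dump_stats(path)
        return path


def busy_work(n=20000):
    """Stand-in for whatever keeps the CPU busy while profiling is on."""
    return sum(i * i for i in range(n))


def demo():
    path = os.path.join(tempfile.mkdtemp(), "master.prof")
    prof = TimedProfile()
    prof.start()
    busy_work()  # in practice: wait a few minutes while the master runs
    prof.stop_and_save(path)
    # Load the saved file back to confirm that it contains call data.
    stats = pstats.Stats(path)
    return path, stats.total_calls
```

The saved file can then be inspected offline, for example with pstats.Stats(path).sort_stats("cumulative").print_stats(20).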
>>> >
>>> >
>>> > On Tue, Aug 2, 2016 at 5:53 PM, Francesco Di Mizio
>>> > <francescodimizio at gmail.com> wrote:
>>> >>
>>> >> Hey Dan,
>>> >>
>>> >> I am using a p4 poller. Maybe it's suffering from the same problems?
>>> >>
>>> >> On Tue, Aug 2, 2016 at 5:45 PM, Francesco Di Mizio
>>> >> <francescodimizio at gmail.com> wrote:
>>> >>>
>>> >>> I'd like to provide a bit more context. Right after restarting the
>>> >>> master and kicking off 10 builds, CPU was at 110-120%. This made the
>>> >>> UI unusable and basically all the services were stuck, including the
>>> >>> REST API.
>>> >>> After 3-4 minutes like this, and WITH all 10 builds still running,
>>> >>> the CPU usage went down to 5%, stayed there for 5 minutes and all was
>>> >>> smooth and quick again. From then on it kept oscillating; I've seen
>>> >>> spikes of 240% :(
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Tue, Aug 2, 2016 at 4:12 PM, Francesco Di Mizio
>>> >>> <francescodimizio at gmail.com> wrote:
>>> >>>>
>>> >>>> Sometimes it goes up to 140%. I was not able to relate this to a
>>> >>>> particular build condition - it seems like it can happen any time
>>> >>>> and is not related to how many builds are going on.
>>> >>>>
>>> >>>> I usually realize the server got into this state because the web UI
>>> >>>> gets stuck. As soon as the CPU% goes back to normal values (2-3%
>>> >>>> most times) the web UI finishes loading instantly.
>>> >>>>
>>> >>>> Any pointers as to what might be causing this? The only reason I can
>>> >>>> think of is too many people trying to access the web UI
>>> >>>> simultaneously - might I be right?
>>> >>>>
>>> >>>
>>> >>
>>> >> _______________________________________________
>>> >> users mailing list
>>> >> users at buildbot.net
>>> >> https://lists.buildbot.net/mailman/listinfo/users
>>> >
>>>
>>
>>