[users at bb.net] buildbot CPU usage

Francesco Di Mizio francescodimizio at gmail.com
Tue Aug 9 12:53:20 UTC 2016


I tried the profiler option that comes with twistd:

twistd --nodaemon --profile=statsObj --profiler=profile -y ./buildbot.tac

It does not seem to work. I am not sure whether the problem is in bbot or in twisted itself.
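
For reference, one way to check whether a profiler run produced usable data is to load the stats file with the standard pstats module. The sketch below is only an illustration in modern Python 3 (the master in this thread runs Python 2.7). The busy() workload and the helper names profile_to_file() and top_functions() are made up for the example, and it profiles a toy function rather than the buildbot master. As far as I know, twistd only writes a file that pstats can load when --savestats is passed alongside --profile; without it the file holds the profiler's text report.

```python
import cProfile
import io
import pstats


def busy(n):
    """Toy CPU-bound workload standing in for whatever is being profiled."""
    return sum(i * i for i in range(n))


def profile_to_file(func, path, *args):
    """Run func(*args) under cProfile and dump the raw stats to path."""
    prof = cProfile.Profile()
    prof.runcall(func, *args)
    prof.dump_stats(path)


def top_functions(path, limit=5):
    """Return a text report of the `limit` most expensive entries by cumulative time."""
    buf = io.StringIO()
    stats = pstats.Stats(path, stream=buf)
    stats.sort_stats("cumulative").print_stats(limit)
    return buf.getvalue()
```

If top_functions() raises while loading the file, the file is most likely a text report (or empty) rather than a saved stats object.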

On Tue, Aug 9, 2016 at 10:16 AM, Francesco Di Mizio <
francescodimizio at gmail.com> wrote:

>
> 2016-08-08 23:55:01+0000 [-] P4 poll failed on atx-p4-buildproxy.rsi.global:1666, //starcitizen/
> Traceback (most recent call last):
>  File "/usr/local/lib/python2.7/dist-packages/buildbot/changes/base.py", line 65, in doPoll
>    d = defer.maybeDeferred(self.poll)
>  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 150, in maybeDeferred
>    result = f(*args, **kw)
>  File "/usr/local/lib/python2.7/dist-packages/buildbot/changes/p4poller.py", line 162, in poll
>    d = self._poll()
>  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1274, in unwindGenerator
>    return _inlineCallbacks(None, gen, Deferred())
> --- <exception caught here> ---
>  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1128, in _inlineCallbacks
>    result = g.send(result)
>  File "/usr/local/lib/python2.7/dist-packages/buildbot/changes/p4poller.py", line 232, in _poll
>    result = yield self._get_process_output(args)
>  File "/usr/local/lib/python2.7/dist-packages/buildbot/changes/p4poller.py", line 170, in _get_process_output
>    d = utils.getProcessOutput(self.p4bin, args, env)
>  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/utils.py", line 128, in getProcessOutput
>    reactor)
>  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/utils.py", line 28, in _callProtocolWithDeferred
>    reactor.spawnProcess(p, executable, (executable,)+tuple(args), env, path)
>  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/posixbase.py", line 340, in spawnProcess
>    processProtocol, uid, gid, childFDs)
>  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/process.py", line 731, in __init__
>    self._fork(path, uid, gid, executable, args, environment, fdmap=fdmap)
>  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/process.py", line 405, in _fork
>    self.pid = os.fork()
> exceptions.OSError: [Errno 12] Cannot allocate memory
>
> I'll run one day with the p4 poller disabled and see how it goes.
>
> On Tue, Aug 2, 2016 at 7:28 PM, Francesco Di Mizio <
> francescodimizio at gmail.com> wrote:
>
>> Just one. Here is what the poller looks like
>>
>> s = changes.P4Source(
>>     p4port=config.p4_server,
>>     p4user=config.p4_user,
>>     p4passwd=config.p4_password,
>>     p4base='//XXXXXXX/',
>>     pollInterval=10,
>>     pollAtLaunch=False,
>>     split_file=lambda branchfile: branchfile.split('/', 1),
>>     encoding='cp437'
>> )
>>
>>
>>
>>
>>
>> On Tue, Aug 2, 2016 at 7:24 PM, Pierre Tardy <tardyp at gmail.com> wrote:
>>
>>> How many projects are you polling? I'll see if I can make a PoC of a
>>> builder which runs statprof
>>>
>>> On Tue, Aug 2, 2016 at 18:53, Francesco Di Mizio <
>>> francescodimizio at gmail.com> wrote:
>>>
>>>> Thanks for the kind replies both of you.
>>>>
>>>> @Pierre:
>>>> Not sure I get what you mean. Given the context, for a step to be CPU
>>>> demanding it should be a master side step right? I happen to not have any.
>>>> What would you be profiling with statprof?
>>>> I'd really appreciate if you could elaborate on your idea.
>>>>
>>>> Really all I can think of is the poller. I'll keep looking into it.
>>>>
>>>>
>>>>
>>>> On Tue, Aug 2, 2016 at 6:36 PM, Dan Kegel <dank at kegel.com> wrote:
>>>>
>>>>> With gitpoller, it was easy to see; whenever the number of git
>>>>> sessions from the poller went over 0 or so, web gui performance was
>>>>> poor.
>>>>> And if it went over 10, well, you could kiss the gui goodbye for
>>>>> several minutes.
>>>>>
>>>>> One countermeasure was to randomize the polling intervals, a la
>>>>>
>>>>>             interval = 6  # minutes
>>>>>             self['change_source'].append(
>>>>>                 # Fuzz the interval to avoid slamming the git server and
>>>>>                 # hitting the MaxStartups or MaxSessions limits.
>>>>>                 # If you hit them, twistd.log will have lots of
>>>>>                 # "ssh_exchange_identification: Connection closed by remote host" errors.
>>>>>                 # See http://trac.buildbot.net/ticket/2480
>>>>>                 changes.GitPoller(repourl, branches=branchnames,
>>>>>                                   workdir='gitpoller-workdir-' + name,
>>>>>                                   pollinterval=interval*60 + random.uniform(-10, 10)))
>>>>>
>>>>> That made life just barely bearable, at least while the number of projects
>>>>> polled was under 50 or so.
>>>>> What really helped was not using pollers anymore, and switching to
>>>>> gitlab's webhooks.
>>>>> We're at 190 now, of which 57 are still using gitpoller, and it's
>>>>> almost ok.  (I really have
>>>>> to move the last 57 onto gitlab.  Or, well, since they're not
>>>>> critical, increase the polling
>>>>> interval...)
>>>>>
>>>>> On Tue, Aug 2, 2016 at 9:13 AM, Pierre Tardy <tardyp at gmail.com> wrote:
>>>>> > Hi,
>>>>> >
>>>>> > Pollers indeed usually do not scale well as they, hmm, poll.
>>>>> > What you are describing here hints that the twisted reactor thread is
>>>>> > always busy, which should not happen if you only start 10 builds.
>>>>> > You might have some custom steps which are doing something heavily
>>>>> > CPU-bound in the main thread.
>>>>> > What I usually do is use statprof:
>>>>> > https://pypi.python.org/pypi/statprof/
>>>>> >
>>>>> > in order to know what the CPU is doing.
>>>>> > You could create a builder which you can trigger whenever you need, and
>>>>> > which would start the profiling, wait a few minutes, and then save the
>>>>> > profiling results to a file.
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Tue, Aug 2, 2016 at 17:53, Francesco Di Mizio
>>>>> > <francescodimizio at gmail.com> wrote:
>>>>> >>
>>>>> >> Hey Dan,
>>>>> >>
>>>>> >> I am using a p4 poller. Maybe it's suffering from the same problems?
>>>>> >>
>>>>> >> On Tue, Aug 2, 2016 at 5:45 PM, Francesco Di Mizio
>>>>> >> <francescodimizio at gmail.com> wrote:
>>>>> >>>
>>>>> >>> I'd like to provide a bit more context. Right after restarting the
>>>>> >>> master and kicking off 10 builds, CPU was at 110-120%. This made the UI
>>>>> >>> unusable and basically all the services were stuck, including the REST API.
>>>>> >>> After 3-4 minutes like this, and WITH all the 10 builds still running, the
>>>>> >>> CPU usage went down to 5%, stayed there for 5 minutes and all was smooth
>>>>> >>> and quick again. From then on it keeps oscillating; I've seen spikes of
>>>>> >>> 240% :(
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> On Tue, Aug 2, 2016 at 4:12 PM, Francesco Di Mizio
>>>>> >>> <francescodimizio at gmail.com> wrote:
>>>>> >>>>
>>>>> >>>> Sometimes it goes up to 140%. I was not able to relate this to a
>>>>> >>>> particular build condition - it seems like it can happen any time and is
>>>>> >>>> not related to how many builds are going on.
>>>>> >>>>
>>>>> >>>> I usually realize the server got into this state because the web UI gets
>>>>> >>>> stuck. As soon as the CPU% goes back to normal values (2-3% most times) the
>>>>> >>>> web UI finishes loading instantly.
>>>>> >>>>
>>>>> >>>> Any pointers as to what might be causing this? The only reason I can think
>>>>> >>>> of is too many people trying to access the web UI simultaneously - could
>>>>> >>>> that be right?
>>>>> >>>>
>>>>> >>>
>>>>> >>
>>>>> >> _______________________________________________
>>>>> >> users mailing list
>>>>> >> users at buildbot.net
>>>>> >> https://lists.buildbot.net/mailman/listinfo/users
>>>>> >
>>>>> >
>>>>>
>>>>
>>>>
>>
>
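
The interval-fuzzing countermeasure in Dan's quoted GitPoller snippet can be factored into a small helper. This is a minimal sketch, not buildbot API: fuzzed_interval() and its parameters are hypothetical names, and the default jitter of 10 seconds simply mirrors the random.uniform(-10, 10) used in the quoted snippet.

```python
import random


def fuzzed_interval(minutes, jitter_seconds=10.0, rng=random):
    """Return a polling interval in seconds with a random offset applied.

    The offset is drawn uniformly from [-jitter_seconds, +jitter_seconds],
    so pollers configured with the same nominal interval drift apart
    instead of all firing at the same moment.
    """
    base_seconds = minutes * 60
    return base_seconds + rng.uniform(-jitter_seconds, jitter_seconds)
```

The result would then be passed wherever the poller expects its interval, for example pollinterval=fuzzed_interval(6) in the quoted GitPoller call.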