[users at bb.net] buildbot CPU usage

Francesco Di Mizio francescodimizio at gmail.com
Tue Aug 9 08:16:26 UTC 2016


2016-08-08 23:55:01+0000 [-] P4 poll failed on atx-p4-buildproxy.rsi.global:1666, //starcitizen/
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/buildbot/changes/base.py", line 65, in doPoll
    d = defer.maybeDeferred(self.poll)
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 150, in maybeDeferred
    result = f(*args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/buildbot/changes/p4poller.py", line 162, in poll
    d = self._poll()
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1274, in unwindGenerator
    return _inlineCallbacks(None, gen, Deferred())
--- <exception caught here> ---
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1128, in _inlineCallbacks
    result = g.send(result)
  File "/usr/local/lib/python2.7/dist-packages/buildbot/changes/p4poller.py", line 232, in _poll
    result = yield self._get_process_output(args)
  File "/usr/local/lib/python2.7/dist-packages/buildbot/changes/p4poller.py", line 170, in _get_process_output
    d = utils.getProcessOutput(self.p4bin, args, env)
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/utils.py", line 128, in getProcessOutput
    reactor)
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/utils.py", line 28, in _callProtocolWithDeferred
    reactor.spawnProcess(p, executable, (executable,)+tuple(args), env, path)
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/posixbase.py", line 340, in spawnProcess
    processProtocol, uid, gid, childFDs)
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/process.py", line 731, in __init__
    self._fork(path, uid, gid, executable, args, environment, fdmap=fdmap)
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/process.py", line 405, in _fork
    self.pid = os.fork()
exceptions.OSError: [Errno 12] Cannot allocate memory

I'll run for a day with the P4 poller disabled and see how it goes.
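
Since the traceback ends with os.fork() failing on ENOMEM, one thing that
may be worth watching in the meantime is how large the master process itself
gets. A minimal sketch, assuming a Unix host: the helper name peak_rss_mib is
made up for illustration, and it measures the process that calls it, so it
would have to run inside the master to say anything about the master.

```python
import resource
import sys


def peak_rss_mib():
    """Return the peak resident set size of the calling process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024.0 * 1024.0 if sys.platform == "darwin" else 1024.0
    return peak / divisor


if __name__ == "__main__":
    print("peak RSS: %.1f MiB" % peak_rss_mib())
```

If that number keeps climbing between polls, the failed fork is probably a
symptom rather than the cause.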

On Tue, Aug 2, 2016 at 7:28 PM, Francesco Di Mizio <
francescodimizio at gmail.com> wrote:

> Just one. Here is what the poller looks like
>
> s = changes.P4Source(
>     p4port=config.p4_server,
>     p4user=config.p4_user,
>     p4passwd=config.p4_password,
>     p4base='//XXXXXXX/',
>     pollInterval=10,
>     pollAtLaunch=False,
>     split_file=lambda branchfile: branchfile.split('/', 1),
>     encoding='cp437'
> )
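
For a sense of scale: the traceback at the top shows each poll spawning a p4
process through getProcessOutput, so pollInterval=10 means at least one fork
every ten seconds. A back-of-the-envelope sketch (the helper name is
hypothetical, and it counts only one process per poll, ignoring any extra p4
calls made for new changes):

```python
def min_spawns_per_day(poll_interval_seconds):
    """Lower bound on child processes per day: one per poll."""
    seconds_per_day = 24 * 60 * 60
    return seconds_per_day // poll_interval_seconds


print(min_spawns_per_day(10))  # 8640
print(min_spawns_per_day(60))  # 1440
```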
>
>
>
>
>
> On Tue, Aug 2, 2016 at 7:24 PM, Pierre Tardy <tardyp at gmail.com> wrote:
>
>> How many projects are you polling? I'll see if I can make a PoC of a
>> builder which runs statprof.
>>
>> On Tue, Aug 2, 2016 at 6:53 PM, Francesco Di Mizio <
>> francescodimizio at gmail.com> wrote:
>>
>>> Thanks for the kind replies both of you.
>>>
>>> @Pierre:
>>> Not sure I get what you mean. Given the context, for a step to be
>>> CPU-demanding it would have to be a master-side step, right? I don't
>>> happen to have any. What would you be profiling with statprof?
>>> I'd really appreciate it if you could elaborate on your idea.
>>>
>>> Really all I can think of is the poller. I'll keep looking into it.
>>>
>>>
>>>
>>> On Tue, Aug 2, 2016 at 6:36 PM, Dan Kegel <dank at kegel.com> wrote:
>>>
>>>> With gitpoller, it was easy to see; whenever the number of git
>>>> sessions from the poller went over 0 or so, web gui performance was
>>>> poor.
>>>> And if it went over 10, well, you could kiss the gui goodbye for
>>>> several minutes.
>>>>
>>>> One countermeasure was to randomize the polling intervals, a la
>>>>
>>>>             interval = 6  # minutes
>>>>             self['change_source'].append(
>>>>                 # Fuzz the interval to avoid slamming the git server and
>>>>                 # hitting the MaxStartups or MaxSessions limits.
>>>>                 # If you hit them, twistd.log will have lots of
>>>>                 # "ssh_exchange_identification: Connection closed by remote host" errors.
>>>>                 # See http://trac.buildbot.net/ticket/2480
>>>>                 changes.GitPoller(repourl, branches=branchnames,
>>>>                                   workdir='gitpoller-workdir-' + name,
>>>>                                   pollinterval=interval * 60 + random.uniform(-10, 10)))
>>>>
>>>> That made life just barely bearable, at least while the number of
>>>> projects polled was under 50 or so.
>>>> What really helped was not using pollers anymore, and switching to
>>>> gitlab's webhooks.
>>>> We're at 190 now, of which 57 are still using gitpoller, and it's
>>>> almost ok.  (I really have to move the last 57 onto gitlab.  Or, well,
>>>> since they're not critical, increase the polling interval...)
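
The fuzzing idea above, pulled out into a standalone helper so it can be
tried without a master. A sketch only: jittered_interval and its parameters
are names made up here, not part of buildbot.

```python
import random


def jittered_interval(minutes, jitter_seconds=10.0, rng=random):
    """Return a poll interval in seconds, offset by up to +/- jitter_seconds."""
    return minutes * 60 + rng.uniform(-jitter_seconds, jitter_seconds)


if __name__ == "__main__":
    # Each poller gets its own slightly different interval, so the pollers
    # drift apart instead of all firing in the same second.
    for name in ("proj-a", "proj-b", "proj-c"):
        print("%s %.2f" % (name, jittered_interval(6)))
```

The returned value would be passed as the poller's interval, as in the
snippet quoted above.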
>>>>
>>>> On Tue, Aug 2, 2016 at 9:13 AM, Pierre Tardy <tardyp at gmail.com> wrote:
>>>> > Hi,
>>>> >
>>>> > Pollers indeed usually don't scale well as they, hmm, poll.
>>>> > What you are describing hints that the twisted reactor thread is
>>>> > always busy, which should not happen if you only start 10 builds.
>>>> > You might have some custom steps which are doing something heavily
>>>> > CPU-bound in the main thread.
>>>> > What I usually do is use statprof:
>>>> > https://pypi.python.org/pypi/statprof/
>>>> >
>>>> > in order to know what the CPU is doing.
>>>> > You could create a builder which you can trigger whenever you need, and
>>>> > which would start the profiling, wait a few minutes, and then save the
>>>> > profile to a file.
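
A rough sketch of the "profile, then save to a file" idea. statprof is what
is suggested above; to avoid a third-party install this example uses the
standard library's cProfile instead, which is a deterministic rather than a
statistical profiler, so its overhead differs. profile_call, busy_work and
the output file name are illustrative names only. Note that cProfile only
sees the thread that enabled it, so in a master it would have to be enabled
from the reactor thread, and it wraps a single call here rather than a
timed window.

```python
import cProfile
import os
import pstats
import tempfile


def profile_call(func, out_path):
    """Run func() under cProfile and dump the stats to out_path."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
        profiler.dump_stats(out_path)
    return out_path


def busy_work():
    # Stand-in for whatever keeps the CPU busy.
    return sum(i * i for i in range(200000))


if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "master-profile.prof")
    profile_call(busy_work, path)
    # Show the ten most expensive entries by cumulative time.
    pstats.Stats(path).sort_stats("cumulative").print_stats(10)
```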
>>>> >
>>>> >
>>>> >
>>>> > On Tue, Aug 2, 2016 at 5:53 PM, Francesco Di Mizio <
>>>> > francescodimizio at gmail.com> wrote:
>>>> >>
>>>> >> Hey Dan,
>>>> >>
>>>> >> I am using a p4 poller. Maybe it's suffering from the same problems?
>>>> >>
>>>> >> On Tue, Aug 2, 2016 at 5:45 PM, Francesco Di Mizio
>>>> >> <francescodimizio at gmail.com> wrote:
>>>> >>>
>>>> >>> I'd like to provide a bit more context. Right after restarting the
>>>> >>> master and kicking off 10 builds, CPU was at 110-120%. This made the
>>>> >>> UI unusable and basically all the services were stuck, including the
>>>> >>> REST API.
>>>> >>> After 3-4 minutes like this, and WITH all the 10 builds still running,
>>>> >>> the CPU usage went down to 5%, stayed there for 5 minutes and all was
>>>> >>> smooth and quick again. From then on it kept oscillating; I've seen
>>>> >>> spikes of 240% :(
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> On Tue, Aug 2, 2016 at 4:12 PM, Francesco Di Mizio
>>>> >>> <francescodimizio at gmail.com> wrote:
>>>> >>>>
>>>> >>>> Sometimes it goes up to 140%. I was not able to relate this to a
>>>> >>>> particular build condition - it seems like it can happen any time
>>>> >>>> and is not related to how many builds are going on.
>>>> >>>>
>>>> >>>> I usually realize the server got into this state because the web
>>>> >>>> UI gets stuck. As soon as the CPU% goes back to normal values (2-3%
>>>> >>>> most times) the web UI finishes loading instantly.
>>>> >>>>
>>>> >>>> Any pointers as to what might be causing this? The only reason I can
>>>> >>>> think of is too many people trying to access the web UI
>>>> >>>> simultaneously - might I be right?
>>>> >>>>
>>>> >>>
>>>> >>
>>>> >> _______________________________________________
>>>> >> users mailing list
>>>> >> users at buildbot.net
>>>> >> https://lists.buildbot.net/mailman/listinfo/users
>>>> >
>>>> >
>>>>
>>>
>>>
>