[users at bb.net] buildbot CPU usage

Francesco Di Mizio francescodimizio at gmail.com
Wed Aug 17 09:11:44 UTC 2016


Sending again this email with a link to download the profile as prob it was
too big as attachment
http://www.filedropper.com/profile_6

I was able to run twistd with cprofile in production for one day. It was
taken with twists 15.5.0 and python 2.7.10.
If you could take a look it'd be great ;)

On Wed, Aug 17, 2016 at 11:03 AM, Francesco Di Mizio <
francescodimizio at gmail.com> wrote:

> Hey Pierre,
>
> I was able to run twistd with cprofile in production for one day. It was
> taken with twists 15.5.0 and python 2.7.10.
> If you could take a look it'd be great ;)
>
> Francesco
>
> On Fri, Aug 12, 2016 at 4:26 PM, Francesco Di Mizio <
> francescodimizio at gmail.com> wrote:
>
>> If I knew how to reproduce the slowness I wouldnt have to rpofile ;)
>>
>> Jokes aside, I tried to nuke the log file and indeed it gets re-created
>> every 10 secs but the content is just always the same even though the
>> master is actually running and I see it hitting the CPU spikes.I get the
>> same things on my test env: 50 smaples and then that's all. I am 100% that
>> the log I attached before did NOT include any spikes. That's just the
>> result of a few seconds of buildbot running. Then you know... somehow it
>> doesnt update.
>>
>> If you think it can make a difference I can try the new snippet (is it
>> incomplete? It's missing dump_stats function)
>>
>>
>>
>>
>> On Fri, Aug 12, 2016 at 3:48 PM, Pierre Tardy <tardyp at gmail.com> wrote:
>>
>>> Hy Francesco,
>>>
>>> This is weird, as on my environment I can see the sample count growing.
>>> It only grows if the process is not idle, though
>>> So I have to use the master ui to actually make the profile change on my
>>> test environment.
>>> Your profile looks like fairly idle. I can see pprint and write to
>>> logfile which might be not scalable if the data start growing do you have
>>> some heavy debug log in your prod?
>>>
>>> Here is an updated more simple snippet, but this should not change many
>>> thing.
>>> You can improve the TimerService '10' value to something bigger in order
>>> to dump less often.
>>> Are you sure you did reproduce the slowness while profiling?
>>>
>>>
>>> import statprof
>>> from  twisted.application.internet import TimerService
>>> from buildbot.util.service import BuildbotService
>>> from twisted.internet import reactor
>>>
>>> class BuildbotTimerService(TimerService, BuildbotService):
>>>     name = "timer"
>>>     def __init__(self):
>>>         BuildbotService.__init__(self)
>>>         TimerService.__init__(self, 10, self.dump_stats)
>>>         reactor.callFromThread(statprof.start)
>>>
>>>     def dump_stats(self):
>>>         statprof.stop()
>>>         with open("profile.txt", "w") as f:
>>>             statprof.display(f)
>>>             statprof.display()
>>>         statprof.start()
>>>
>>> c['services'] = [ BuildbotTimerService() ]
>>>
>>>
>>> Le ven. 12 août 2016 à 14:21, Francesco Di Mizio <
>>> francescodimizio at gmail.com> a écrit :
>>>
>>>> Sure thing. it's attached.
>>>> Please note the sampe count has been stuck at 51 for over 1 hour even
>>>> if the master is running.
>>>>
>>>> On Fri, Aug 12, 2016 at 12:38 PM, Pierre Tardy <tardyp at gmail.com>
>>>> wrote:
>>>>
>>>>> The first lines are the important ones. Can you send the full file?
>>>>>
>>>>> Le ven. 12 août 2016 12:23, Francesco Di Mizio <
>>>>> francescodimizio at gmail.com> a écrit :
>>>>>
>>>>>> It does work but statprof seems to stop getting samples. Last few
>>>>>> lines from it:
>>>>>>
>>>>>> 2016-08-12 10:19:43+0000 [-]   0.00      0.06      0.00
>>>>>>  banana.py:167:gotItem
>>>>>> 2016-08-12 10:19:43+0000 [-]   0.00      0.01      0.00
>>>>>>  policies.py:718:setTimeout
>>>>>> 2016-08-12 10:19:43+0000 [-]   0.00      0.01      0.00
>>>>>>  encoder.py:434:_iterencode
>>>>>> 2016-08-12 10:19:43+0000 [-] ---
>>>>>> 2016-08-12 10:19:43+0000 [-] Sample count: 51
>>>>>> 2016-08-12 10:19:43+0000 [-] Total time: 0.510000 seconds
>>>>>>
>>>>>> The server is running.
>>>>>>
>>>>>> On Thu, Aug 11, 2016 at 10:46 AM, Francesco Di Mizio <
>>>>>> francescodimizio at gmail.com> wrote:
>>>>>>
>>>>>>> Pierre many thanks for that. I do not remember why the force
>>>>>>> scheduler api didnt suite my needs, that was my first approach.
>>>>>>> I will try the new snippet when I get to work in a few hours.
>>>>>>>
>>>>>>> On Wed, Aug 10, 2016 at 11:18 PM, Pierre Tardy <tardyp at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> After carefully looking at your code, it does not look to have
>>>>>>>> obvious performance flaw.
>>>>>>>> I think it would have been cheaper for you to just use the
>>>>>>>> forcescheduler REST api, but if that code works for you this is totally
>>>>>>>> fine.
>>>>>>>>
>>>>>>>> I saw that you did not manage to get sample in your prod as well,
>>>>>>>> so I did look at my code again, and now I am able to see the samples.
>>>>>>>>
>>>>>>>> statprof does not like to stop before start
>>>>>>>>
>>>>>>>> Please try again with this code:
>>>>>>>>
>>>>>>>>
>>>>>>>> import statprof
>>>>>>>> from  twisted.application.internet import TimerService
>>>>>>>> from buildbot.util.service import BuildbotService
>>>>>>>> class BuildbotTimerService(TimerService, BuildbotService):
>>>>>>>>     name = "timer"
>>>>>>>>     def __init__(self):
>>>>>>>>         BuildbotService.__init__(self)
>>>>>>>>         TimerService.__init__(self, 10, self.dump_stats)
>>>>>>>>         self.started = False
>>>>>>>>     def dump_stats(self):
>>>>>>>>         if not self.started:
>>>>>>>>             statprof.start()
>>>>>>>>             self.started = True
>>>>>>>>             return
>>>>>>>>         statprof.stop()
>>>>>>>>         with open("profile.txt", "w") as f:
>>>>>>>>             statprof.display(f)
>>>>>>>>             statprof.display()
>>>>>>>>         statprof.start()
>>>>>>>>
>>>>>>>> c['services'] = [ BuildbotTimerService() ]
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Le mer. 10 août 2016 à 14:14, Francesco Di Mizio <
>>>>>>>> francescodimizio at gmail.com> a écrit :
>>>>>>>>
>>>>>>>>> Many thanks. Find it attached.
>>>>>>>>>
>>>>>>>>> It's a tiny http server. People use a web page to send an HTTP
>>>>>>>>> request to it to kick off builds.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Aug 10, 2016 at 1:55 PM, Pierre Tardy <tardyp at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I'll take a look.
>>>>>>>>>>
>>>>>>>>>> Le mer. 10 août 2016 12:45, Francesco Di Mizio <
>>>>>>>>>> francescodimizio at gmail.com> a écrit :
>>>>>>>>>>
>>>>>>>>>>> I am starting to think it may be due to my own custom scheduler.
>>>>>>>>>>> If I uploaded the code somewhere could anybody take a look? it's a 150
>>>>>>>>>>> lines py script.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Aug 10, 2016 at 12:10 PM, Francesco Di Mizio <
>>>>>>>>>>> francescodimizio at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks for that Pierre. I have it on on production and the
>>>>>>>>>>>> performances do not degrade indeed.
>>>>>>>>>>>>
>>>>>>>>>>>> buildbot at e2123a50d1fb:~$ cat profile.txt
>>>>>>>>>>>> No samples recorded.
>>>>>>>>>>>>
>>>>>>>>>>>> I need to study statprof a bit now ;)
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Aug 10, 2016 at 11:20 AM, Vasily <vassnlit at gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> One can also use Intel® VTune™, you know... :-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Vasily
>>>>>>>>>>>>> 09 авг. 2016 г. 23:56 пользователь "Pierre Tardy" <
>>>>>>>>>>>>> tardyp at gmail.com> написал:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> You can add following snippet to your master.cfg. It looks
>>>>>>>>>>>>>> statprof does not work on osx, so I cannot test it to the end, but I know
>>>>>>>>>>>>>> similar did work on my prod
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> import statprof
>>>>>>>>>>>>>> from  twisted.application.internet import TimerService
>>>>>>>>>>>>>> from buildbot.util.service import BuildbotService
>>>>>>>>>>>>>> class BuildbotTimerService(TimerService, BuildbotService):
>>>>>>>>>>>>>>     name = "timer"
>>>>>>>>>>>>>>     def __init__(self):
>>>>>>>>>>>>>>         BuildbotService.__init__(self)
>>>>>>>>>>>>>>         TimerService.__init__(self, 10, self.dump_stats)
>>>>>>>>>>>>>>     def dump_stats(self):
>>>>>>>>>>>>>>         statprof.stop()
>>>>>>>>>>>>>>         with open("profile.txt", "w") as f:
>>>>>>>>>>>>>>             statprof.display(f)
>>>>>>>>>>>>>>         statprof.start()
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> c['services'] = [ BuildbotTimerService() ]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Le mar. 9 août 2016 à 21:44, Francesco Di Mizio <
>>>>>>>>>>>>>> francescodimizio at gmail.com> a écrit :
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Pierre,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> if you can enlighten on how you intend to use statprof, I
>>>>>>>>>>>>>>> will be happy to give it a try.
>>>>>>>>>>>>>>> This problem here is on top of my list right now as it's
>>>>>>>>>>>>>>> making my prod env unusable.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Aug 9, 2016 at 8:25 PM, Pierre Tardy <
>>>>>>>>>>>>>>> tardyp at gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Please note that cprofile is very intrusive and will lead
>>>>>>>>>>>>>>>> to about 3x slower code.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So I would not run that in prod. This is why I prefer
>>>>>>>>>>>>>>>> statprof which is using a non intrusive method to get the stats
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Le mar. 9 août 2016 19:15, Francesco Di Mizio <
>>>>>>>>>>>>>>>> francescodimizio at gmail.com> a écrit :
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On a side note I have managed to get twistd to dump a blob
>>>>>>>>>>>>>>>>> I can then load and analyze. Got to see if I can run this in production.
>>>>>>>>>>>>>>>>> If anybody is curious the following worked for me:
>>>>>>>>>>>>>>>>> twistd --savestats -n --profiler=cprofile
>>>>>>>>>>>>>>>>>  --profile=/crcdata/profile.stats -y ./buildbot.tac
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Aug 9, 2016 at 5:21 PM, Francesco Di Mizio <
>>>>>>>>>>>>>>>>> francescodimizio at gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Well that's expected to an extent - p4 poller is running
>>>>>>>>>>>>>>>>>> 'p4 changes' every pollinterval seconds.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Anyway just as a test I've tried to disable the poller
>>>>>>>>>>>>>>>>>> and still at times I am seeing the CPU spiking up to above 100%. I believe
>>>>>>>>>>>>>>>>>> I really need to profile this somehow.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue, Aug 9, 2016 at 5:08 PM, Dan Kegel <dank at kegel.com
>>>>>>>>>>>>>>>>>> > wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Also watch 'top' and see if poller processes are hogging
>>>>>>>>>>>>>>>>>>> resources.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> users mailing list
>>>>>>>>>>>>>> users at buildbot.net
>>>>>>>>>>>>>> https://lists.buildbot.net/mailman/listinfo/users
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.buildbot.net/pipermail/users/attachments/20160817/2041651b/attachment.html>


More information about the users mailing list