[users at bb.net] Performance issues in 2.10

Pierre Tardy tardyp at gmail.com
Thu Jan 14 09:07:32 UTC 2021


(Back to mailing list)

Thanks for this data. Here is how you would read it:


[image: image.png]
In the left, you get the main thread activity. You can say that because the
bottom frame(s) are about startReactor.
In the right, you get what happens in the threads. You can say that because
we see threading.py in the last bottom frame.

You can see that the reactor thread does pretty much nothing:
[image: image.png]
doPoll is just the reactor waiting for events (that's 1 third of the
samples)
then the left part is about getting sourcestams, and waiting the dbthread.
then the right part is log management, which is usually the major part of a
buildbot load.

On the thread side, this is pretty similar picture. We can see that most of
the load is managing source stamps.
[image: image.png]
So frankly, I wouldn't say this is an obvious profile of a blocked buildbot
master.

The reactor does pretty much nothing. Reading 300 sourcestamps requests by
second from the db shouldn't be that much of a problem with modern HW.
We can see that the profiler itself has difficulties to run in time, as we
see some profiler stacks.
BTW, I gave wrong advice last time, I told you to put 500HZ frequency while
I meant 500ms timer. we should rather put a 2HZ to have 2 sample per second
and per thread. This may avoid to disrupt too much the measurment
(Heisenberg principle).

So I thought about memory or disk starvation. Do you have any other load in
the master that would starve CPU ressources?
I can see that there is 28GB memory free so no memory starvation. but the
profiler do not output the loadaverage.
Do we have db connection pool or threadpool starvation?
Do we have disk io or network io starvation?

Regards
Pierre


Le jeu. 14 janv. 2021 à 00:09, Yngve N. Pettersen <yngve at vivaldi.com> a
écrit :

> Hi,
>
> I finally got a trace of the problem area.
>
> I wasn't able to make sense of the profile, maybe you can? zip attached
>
> The area of interest is the ~400 second segment towards the end, which
> matches the bump in the PNG.
>
> This happened while reloading the Build>Builders page after loading a
> build and then returning to the previous point in the history. Loading of
> the page took about 5 minutes of the 7 minutes of high load.
>
> The Python process hit ~120% CPU
>
>
> On Wed, 13 Jan 2021 18:35:26 +0100, Pierre Tardy <tardyp at gmail.com> wrote:
>
> >> Regarding the profiler, I also had some problems running it for a longer
> > time, so if you find a fix, please share it with me.
> > It was a bytes versus string issue indeed in the python parts.
> >
> > I did struggle to build the project again as expected, but managed to
> > hack
> > something eventually.
> >
> > see https://pypi.org/project/buildbot-profiler/1.3.1/
> >
> > I did verify the profile duration is now working.
> >
> > Regards
> > Pierre
> >
> >
> > Le mer. 13 janv. 2021 à 12:49, Vlad Bogolin <vlad at mariadb.org> a écrit :
> >
> >> Hi,
> >>
> >> So the changes issue I was referring to doesn't seem to be fixed (I
> >> checked the latest Buildbot code) and looked back over my changes. The
> >> problem that I identified is here
> >>
> https://github.com/buildbot/buildbot/blob/9f1cac1d3bb61baa0b6c836cc18812a64cfa9c2b/master/buildbot/data/resultspec.py#L320
> >> because is some cases there would be an unmatched_filter or
> >> unmatched_order. *So if this is indeed the case, you should see the
> >> warning in the buildbot log file:*
> >>
> >> "Warning: limited data api query is not backed by db because of
> >> following
> >> filters..." as defined here
> >>
> https://github.com/buildbot/buildbot/blob/9f1cac1d3bb61baa0b6c836cc18812a64cfa9c2b/master/buildbot/data/resultspec.py#L322
> >>
> >> There are two reasons why this happens. One, an incomplete definition of
> >> the fieldMapping for the data/changes.py and this would be the fix:
> >>
> >> +++ b/master/buildbot/data/changes.py
> >> @@ -42,6 +42,19 @@ class FixerMixin:
> >>          return change
> >>      fieldMapping = {
> >>          'changeid': 'changes.id',
> >> +        'author': 'changes.author',
> >> +        'committer': 'changes.committer',
> >> +        'comments': 'changes.comments',
> >> +        'branch': 'changes.branch',
> >> +        'revision': 'changes.revision',
> >> +        'revlink': 'changes.revlink',
> >> +        'when_timestamp': 'changes.when_timestamp',
> >> +        'category': 'changes.category',
> >> +        'repository': 'changes.repository',
> >> +        'codebase': 'changes.codebase',
> >> +        'project': 'changes.project',
> >> +        'sourcestampid': 'changes.sourcestampid',
> >> +        'parent_changeids': 'changes.parent_changeids',
> >>      }
> >>
> >> and two not having all the columns in the select statement in order to
> >> be
> >> able to check for matched/unmatched filtering or ordering as it is done
> >> here
> >>
> https://github.com/buildbot/buildbot/blob/9f1cac1d3bb61baa0b6c836cc18812a64cfa9c2b/master/buildbot/data/resultspec.py#L269
> .
> >> My solution was to get all the fields from the changes table from the
> >> database (don't know if this is the best approach) as you can see here:
> >>
> >> +++ b/master/buildbot/db/changes.py
> >> @@ -249,7 +249,7 @@ class
> >> ChangesConnectorComponent(base.DBConnectorComponent):
> >>          def thd(conn):
> >>              # get the changeids from the 'changes' table
> >>              changes_tbl = self.db.model.changes
> >> -            q = sa.select([changes_tbl.c.changeid])
> >> +            q = changes_tbl.select()
> >>
> >> However, since only the changeid would be needed, don't sure if this
> >> would
> >> be the right approach. Also, I would suspect some missing tests
> because
> >> I
> >> would expect this to be a pretty common use case.
> >>
> >> Regarding the profiler, I also had some problems running it for a longer
> >> time, so if you find a fix, please share it with me.
> >>
> >> Cheers,
> >> Vlad
> >>
> >> On Wed, Jan 13, 2021 at 1:33 PM Pierre Tardy <tardyp at gmail.com> wrote:
> >>
> >>>
> >>> I am not sure why this wouldn't work. I vaguely recall there was an
> >>> issue
> >>> there, but can't figure it out staring at the code.
> >>> As this is quite ancient, I am not sure the JS part will even build
> >>> anymore :-/
> >>>
> >>> You can change the default values at that line in the python
> >>>
> >>>
> https://github.com/tardyp/buildbot_profiler/blob/master/buildbot_profiler/api.py#L193
> >>> I think you will be able to force them by editing this file inside your
> >>> virtualenv..
> >>>
> >>>
> >>> Regards
> >>> Pierre
> >>>
> >>>
> >>> Le mer. 13 janv. 2021 à 11:46, Yngve N. Pettersen <yngve at vivaldi.com>
> a
> >>> écrit :
> >>>
> >>>> > I insist on the buildbot profiler. What I was saying before is
> that
> >>>> you
> >>>> > need to hit the record button before the problem appears, and put a
> >>>> large
> >>>> > enough record time to be sure to catch a spike.
> >>>> > Then, you will be able to zoom to the cpu spike and catch the issue
> >>>> > precisely.
> >>>> >
> >>>> > If the spike is in the order of minutes like you said, you can
> >>>> configure
> >>>> > it
> >>>> > like this and get enough samples to get enough evidence to where the
> >>>> code
> >>>> > is actually spending time:
> >>>> >
> >>>> > ProfilerService(frequency=500, gatherperiod=60 * 60, mode='virtual',
> >>>> > basepath=None, wantBuilds=100
> >>>>
> >>>> I tried configuring this with the settings dropdown in the WebGUI
> >>>> plugin,
> >>>> but AFAICT it is not working, it only gathers info for 30 seconds.
> >>>>
> >>>> I guess I must be holding it incorrectly.
> >>>>
> >>>> > This will record for one hour, and mitigate the memory used if you
> >>>> worry
> >>>> > about it.
> >>>>
> >>>> --
> >>>> Sincerely,
> >>>> Yngve N. Pettersen
> >>>> Vivaldi Technologies AS
> >>>>
> >>>
> >>
> >> --
> >> Vlad
> >>
>
>
> --
> Sincerely,
> Yngve N. Pettersen
> Vivaldi Technologies AS
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.buildbot.net/pipermail/users/attachments/20210114/c57a4ad0/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 72308 bytes
Desc: not available
URL: <http://lists.buildbot.net/pipermail/users/attachments/20210114/c57a4ad0/attachment-0003.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 110992 bytes
Desc: not available
URL: <http://lists.buildbot.net/pipermail/users/attachments/20210114/c57a4ad0/attachment-0004.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 46950 bytes
Desc: not available
URL: <http://lists.buildbot.net/pipermail/users/attachments/20210114/c57a4ad0/attachment-0005.png>


More information about the users mailing list