<div dir="ltr">> Could this lead to such a long queue of deferred function calls that the server gets bogged down handling them, not being able to even forward data that are ready to the web front-end?<div><br></div><div>That kind of massive queue would only build up when DeferredLists or gatherResults are used, which would make the requests run in parallel.</div><div>But in the code you suggested, everything is nice and serial, which takes more time, but should not block everything.</div><div><br></div><div>Indeed, there are a lot of so-called N+1 requests in the change and sourcestamp code.</div><div>This is indeed not optimal, and leads to longer processing, but those requests are trivial and should not generate that much load.</div><div>Refactoring everything to use more table joins would break the API layering and complicate the code, so this hasn't been done yet, as those requests are not supposed to generate massive load.</div><div><br></div><div><br></div><div>Regards<br clear="all"><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">Pierre</div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, 13 Jan 2021 at 15:42, Yngve N. Pettersen <<a href="mailto:yngve@vivaldi.com">yngve@vivaldi.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi,<br>
<br>
I've been thinking a bit.<br>
<br>
I am not familiar with the deferred system and don't know its deep<br>
details, so please take the following speculation with a large grain of<br>
salt.<br>
<br>
What I started wondering about is, what happens with the deferring system <br>
if the following happens?<br>
<br>
- The list of builders is requested<br>
- for each builder the 1000 most recent builds are requested<br>
- for each build the change entries are requested<br>
- for each build or change entry the source stamps are requested<br>
<br>
Could this lead to such a long queue of deferred function calls that the <br>
server gets bogged down handling them, not being able to even forward data <br>
that are ready to the web front-end?<br>
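For intuition, the difference between queuing all of those requests at once and chaining them one after another can be sketched in stdlib asyncio terms (as a stand-in for Twisted's Deferreds; the fetch function is made up):

```python
import asyncio

async def fetch(item):
    # Stand-in for one small DB request.
    await asyncio.sleep(0)
    return item * 2

async def parallel(items):
    # Like gatherResults/DeferredList: everything is queued at once,
    # which is where a long backlog of pending calls could build up.
    return await asyncio.gather(*(fetch(i) for i in items))

async def serial(items):
    # One at a time: slower overall, but never holds a large queue.
    return [await fetch(i) for i in items]

print(asyncio.run(parallel([1, 2, 3])))  # [2, 4, 6]
print(asyncio.run(serial([1, 2, 3])))    # [2, 4, 6]
```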
<br>
I currently suspect that there is a missing restriction (either number of<br>
builds or time range) on requesting information about the individual<br>
builds.<br>
<br>
Requesting a list of 1000 builds for each builder is probably OK as the<br>
query should be quick, but the question is how many of those builds one<br>
should request deep details of.<br>
<br>
Another, related aspect of what I saw in the DB log is that there were<br>
many requests for single items, especially for source stamps and changes.<br>
Maybe they could be gathered together into a single request for X number<br>
of records? That does require keeping a separate list/queue with the items<br>
that have been requested, and might be complicated to implement.<br>
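As a sketch of that batching idea (using a made-up mini-schema and function name, not Buildbot's actual tables or API), the N single-row lookups can be collapsed into one IN query:

```python
import sqlite3

# Hypothetical mini-schema standing in for the sourcestamps table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sourcestamps (id INTEGER PRIMARY KEY, revision TEXT)")
conn.executemany("INSERT INTO sourcestamps VALUES (?, ?)",
                 [(i, f"rev{i}") for i in range(1, 6)])

def get_sourcestamps_batched(conn, ssids):
    """One query for X ids instead of X single-row queries (avoids N+1)."""
    placeholders = ",".join("?" * len(ssids))
    rows = conn.execute(
        f"SELECT id, revision FROM sourcestamps WHERE id IN ({placeholders})",
        list(ssids)).fetchall()
    return {ssid: rev for ssid, rev in rows}

print(get_sourcestamps_batched(conn, [1, 3, 5]))  # {1: 'rev1', 3: 'rev3', 5: 'rev5'}
```

The trade-off noted above still applies: some layer has to collect the pending ids before issuing the batched query.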
<br>
On Wed, 13 Jan 2021 11:32:37 +0100, Povilas Kanapickas <<a href="mailto:povilas@radix.lt" target="_blank">povilas@radix.lt</a>> <br>
wrote:<br>
<br>
> Hi,<br>
><br>
> Looking into diff between 2.7 and 2.10, pretty much the only difference<br>
> in the existing data and DB APIs is related to changes. So I think it's<br>
> very likely there's something not obvious we didn't consider.<br>
><br>
> I will build a test case and see if I can reproduce the problem. Long<br>
> term I think we need to have tests to prevent this kind of issue. I'll<br>
> think about how this can be done.<br>
><br>
> Yngve, thanks a lot for your work investigating this issue.<br>
><br>
> Cheers,<br>
> Povilas<br>
><br>
> On 1/13/21 12:26 PM, Pierre Tardy wrote:<br>
>> Hello,<br>
>><br>
>> getPrevSuccessfulBuild is called by getChanges for a build, which in<br>
>> turn is called by the /builds/NN/changes REST API.<br>
>> The bug Vlad was referring to was a perf issue on the /changes API,<br>
>> which was fixed a while back.<br>
>><br>
>> Indeed, this algorithm is far from optimized, but I don't see why this<br>
>> would lead to main thread blocking. Looking at the code, I see that<br>
>> there are no big loops that do not yield to the main reactor loop.<br>
>><br>
>> I insist on the buildbot profiler. What I was saying before is that you<br>
>> need to hit the record button before the problem appears, and put a<br>
>> large enough record time to be sure to catch a spike.<br>
>> Then, you will be able to zoom to the cpu spike and catch the issue<br>
>> precisely.<br>
>><br>
>> If the spike is in the order of minutes like you said, you can configure<br>
>> it like this and get enough samples to pin down where the code is<br>
>> actually spending time:<br>
>><br>
>> ProfilerService(frequency=500, gatherperiod=60 * 60, mode='virtual',<br>
>> basepath=None, wantBuilds=100)<br>
>><br>
>> This will record for one hour, and mitigate the memory used if you worry<br>
>> about it.<br>
>><br>
>> Pierre<br>
>><br>
>><br>
>> On Wed, 13 Jan 2021 at 11:01, Yngve N. Pettersen <<a href="mailto:yngve@vivaldi.com" target="_blank">yngve@vivaldi.com</a><br>
>> <mailto:<a href="mailto:yngve@vivaldi.com" target="_blank">yngve@vivaldi.com</a>>> wrote:<br>
>><br>
>><br>
>> Hi again,<br>
>><br>
>> I was just able to get a partial database log from a freeze incident<br>
>> when<br>
>> refreshing the Builds->Builders page.<br>
>><br>
>> It looks like Vlad is on the right track.<br>
>><br>
>> There are a *lot* of individual source stamp requests, but also <br>
>> requests<br>
>> to the builds and change tables.<br>
>><br>
>> An interesting part of the builds request is this request:<br>
>><br>
>> SELECT builds.id, builds.number,<br>
>> builds.builderid, builds.buildrequestid,<br>
>> builds.workerid, builds.masterid, builds.started_at, <br>
>> builds.complete_at,<br>
>> builds.state_string, builds.results<br>
>> FROM builds<br>
>> WHERE builds.builderid = 51 AND builds.number < 46 AND<br>
>> builds.results = 0 ORDER BY builds.complete_at DESC<br>
>> LIMIT 1000 OFFSET 0<br>
>><br>
>> which appears to then be followed by a lot of changes and source stamp<br>
>> requests.<br>
>><br>
>> The log contains a lot of these requests; according to the DB graph,<br>
>> 200 to 400 per second.<br>
>><br>
>> The 1000 limit appears to come from<br>
>> db.builds.BuildsConnectorComponent.getPrevSuccessfulBuild(), but that<br>
>> value seems to have been that way for a while, so the problem is likely<br>
>> caused by something else. This function did show up at the beginning<br>
>> of my traces related to these freezes.<br>
>><br>
>><br>
>> One possibility that I can think of is that several of these pages,<br>
>> or the functions they are using, are no longer restricting how far back<br>
>> in the build history they fetch build information.<br>
>> E.g. the Builders page is only supposed to show a couple of days of<br>
>> builds for each builder, so there should be no need to fetch data for<br>
>> 1000 builds (making sure you have the build ids is one thing; fetching<br>
>> all the associated data even for builds that are not to be displayed<br>
>> is something else).<br>
>><br>
>> BTW, I have noticed that another page, the waterfall, is not<br>
>> displaying anything, even after waiting for a very long time.<br>
>><br>
>><br>
>> On Wed, 13 Jan 2021 01:34:57 +0100, Yngve N. Pettersen<br>
>> <<a href="mailto:yngve@vivaldi.com" target="_blank">yngve@vivaldi.com</a> <mailto:<a href="mailto:yngve@vivaldi.com" target="_blank">yngve@vivaldi.com</a>>><br>
>> wrote:<br>
>><br>
>> > Hi,<br>
>> ><br>
>> > Thanks for that info.<br>
>> ><br>
>> > In my case the problem is apparently something that happens now<br>
>> and then.<br>
>> ><br>
>> > As mentioned, I have seen it on the Builds->Builders and<br>
>> > Builds->Workers pages, neither of which includes any changelog<br>
>> > access AFAIK.<br>
>> ><br>
>> > I have also seen it occasionally on individual build pages, which<br>
>> > have a list of steps with logs, and a changelog panel.<br>
>> ><br>
>> > Just a few minutes ago I saw this freeze/spike happen while the<br>
>> > buildbot manager was completely idle, since all active tasks had<br>
>> > completed, and I had paused all workers since I needed to restart<br>
>> > the manager (due to the hanging build).<br>
>> ><br>
>> > I have also had reports about the Grid view and Console pages<br>
>> > displaying this issue, but have not seen it myself.<br>
>> ><br>
>> > At present I have enabled logging in the postgresql server, so<br>
>> > maybe I can figure out what requests are handled during the spike.<br>
>> ><br>
>> ><br>
>> ><br>
>> > On Wed, 13 Jan 2021 00:38:34 +0100, Vlad Bogolin <<a href="mailto:vlad@mariadb.org" target="_blank">vlad@mariadb.org</a><br>
>> > <mailto:<a href="mailto:vlad@mariadb.org" target="_blank">vlad@mariadb.org</a>>> wrote:<br>
>> ><br>
>> >> Hi,<br>
>> >><br>
>> >> I have experienced some similar interface freezes while trying to <br>
>> >> configure<br>
>> >> our version of buildbot. I now remember two cases:<br>
>> >><br>
>> >> 1) A "changes" API problem where it seemed that the "limit"<br>
>> >> argument was ignored in some cases, which translated into a full<br>
>> >> changes table scan. This was reproducible when hitting the<br>
>> >> "Builds > Last Changes" dashboard, and then all the other pages<br>
>> >> were frozen. There are other requests to changes, so this may be<br>
>> >> related to the Builds page too. Also, this only happened when the<br>
>> >> number of changes in the db was high. I was planning on submitting<br>
>> >> a proper fix, but we are running a custom version of 2.7.1 where<br>
>> >> I implemented a fast workaround and did not manage to submit a<br>
>> >> proper fix (I hope to be able to do it next week).<br>
>> >><br>
>> >> 2) We experienced the same issue as you describe when a lot of<br>
>> >> logs were coming in (which seems to be your case too) and the<br>
>> >> master process was overwhelmed when multiple builds were running<br>
>> >> at the same time (constant CPU usage around ~120%). We solved the<br>
>> >> issue by switching to multi-master and limiting the amount of<br>
>> >> logs, but if you say that this was not an issue in 2.7 I would<br>
>> >> really be interested in finding out what the root cause is<br>
>> >> (I thought it was the high amount of logs). You can test this<br>
>> >> hypothesis by limiting the number of running builds and seeing if<br>
>> >> the issue keeps reproducing.<br>
>> >><br>
>> >> What worked for me in order to find the "changes" API problem was<br>
>> >> visiting each dashboard and seeing whether the freeze occurs or not.<br>
>> >><br>
>> >> Hope this helps!<br>
>> >><br>
>> >> Cheers,<br>
>> >> Vlad<br>
>> >><br>
>> >> On Wed, Jan 13, 2021 at 12:40 AM Yngve N. Pettersen<br>
>> <<a href="mailto:yngve@vivaldi.com" target="_blank">yngve@vivaldi.com</a> <mailto:<a href="mailto:yngve@vivaldi.com" target="_blank">yngve@vivaldi.com</a>>><br>
>> >> wrote:<br>
>> >><br>
>> >>> On Tue, 12 Jan 2021 22:13:50 +0100, Pierre Tardy<br>
>> >>> <<a href="mailto:tardyp@gmail.com" target="_blank">tardyp@gmail.com</a> <mailto:<a href="mailto:tardyp@gmail.com" target="_blank">tardyp@gmail.com</a>>> wrote:<br>
>> >>><br>
>> >>> > Thanks for the update.<br>
>> >>> ><br>
>> >>> > Some random thoughts...<br>
>> >>> ><br>
>> >>> > You should probably leave the profiler open until you get the<br>
>> >>> > performance spike.<br>
>> >>> > If you are inside the spike when starting, indeed, you won't<br>
>> >>> > be able to start the profiler, but if it is started before the<br>
>> >>> > spike it will for sure detect exactly where the code is.<br>
>> >>><br>
>> >>> I did have the profiler open in this latest case; as far as I<br>
>> >>> could tell it still didn't start recording until after the spike<br>
>> >>> ended (there was no progress information in the recorder line).<br>
>> >>><br>
>> >>> The two major items showing up were<br>
>> >>><br>
>> >>> /buildbot/db/builds.py+91:getPrevSuccessfulBuild<br>
>> >>> /buildbot/db/pool.py+190:__thd<br>
>> >>><br>
>> >>> but I think they were recorded after the spike.<br>
>> >>><br>
>> >>> I am planning to activate more detailed logging in the<br>
>> >>> postgresql server, but have not done that yet (I probably need<br>
>> >>> to shut down and restart buildbot when I do).<br>
>> >>><br>
>> >>><br>
>> >>> BTW, I suspect that this issue can also cause trouble for builds<br>
>> >>> whose steps end at the time the problem is occurring; I just<br>
>> >>> noticed a task that is still running more than 4 hours after it<br>
>> >>> started a step that should have been killed after 20 minutes if<br>
>> >>> it was hanging. It should have ended at about the time one of<br>
>> >>> the hangs was occurring. And it is impossible to stop the task<br>
>> >>> for some reason; even shutting down the worker process did not<br>
>> >>> work. AFAIK the only way to fix the issue is to shut the<br>
>> >>> buildbot manager down.<br>
>> >>><br>
>> >>> > Statistical profiling uses timer interrupts, which will preempt<br>
>> >>> > anything that is running and take a call stack trace.<br>
>> >>> ><br>
>> >>> > Waiting for a repro; if, from the db log, you manage to get<br>
>> >>> > info on what kind of db data that is, maybe we can narrow down<br>
>> >>> > the usual suspects..<br>
>> >>> ><br>
>> >>> > If there are lots of short selects like you said, usually you<br>
>> >>> > would have a back and forth from the reactor thread to the db<br>
>> >>> > thread, so it sounds weird.<br>
>> >>> > What could be leading to your behavior is that whatever is<br>
>> >>> > halting the processing, everything is queued up in between,<br>
>> >>> > and unqueued when it is finished, which could lead to a spike<br>
>> >>> > of db actions at the end of the event.<br>
>> >>><br>
>> >>> The DB actions were going on for the entire 3 minutes that the<br>
>> >>> spike lasted; it is not a burst at either end, but a ~180 second<br>
>> >>> long continuous sequence (or barrage) of approximately<br>
>> >>> 70,000-90,000 transactions, if I am interpreting the graph data<br>
>> >>> correctly.<br>
>> >>><br>
>> >>> > Regards<br>
>> >>> > Pierre<br>
>> >>> ><br>
>> >>> ><br>
>> >>> > On Tue, 12 Jan 2021 at 21:49, Yngve N. Pettersen<br>
>> >>> > <<a href="mailto:yngve@vivaldi.com" target="_blank">yngve@vivaldi.com</a> <mailto:<a href="mailto:yngve@vivaldi.com" target="_blank">yngve@vivaldi.com</a>>><br>
>> >>> > wrote:<br>
>> >>> ><br>
>> >>> >> Hi again,<br>
>> >>> >><br>
>> >>> >> A bit of an update.<br>
>> >>> >><br>
>> >>> >> I have not been able to locate the issue using the profiler.<br>
>> >>> >><br>
>> >>> >> It seems that when Buildbot gets into the problematic mode,<br>
>> >>> >> the profiler is not able to work at all. It only starts<br>
>> >>> >> collecting after the locked mode is resolved.<br>
>> >>> >><br>
>> >>> >> It does seem like the locked mode occurs when Buildbot is<br>
>> >>> >> fetching a lot of data from the DB and then spends a lot of<br>
>> >>> >> time processing that data, without yielding to other<br>
>> >>> >> processing needs.<br>
>> >>> >><br>
>> >>> >> Looking at the monitoring of the server, it also appears that<br>
>> >>> >> buildbot is fetching a lot of data. During the most recent<br>
>> >>> >> instance, the returned tuples count in the graph for the<br>
>> >>> >> server indicates three minutes of, on average, 25,000 tuples<br>
>> >>> >> returned per second, with spikes to 80K and 100K.<br>
>> >>> >><br>
>> >>> >> The number of open connections rose to 6 or 7, and the<br>
>> >>> >> transaction count was 400-500 per second during the whole<br>
>> >>> >> time (rolled back transactions, which I assume is just one or<br>
>> >>> >> more selects).<br>
>> >>> >><br>
>> >>> >> IMO this makes it look like, while requesting these data,<br>
>> >>> >> Buildbot is *synchronously* querying the DB and processing<br>
>> >>> >> the returned data, not yielding. It might also be that it is<br>
>> >>> >> requesting more data than it needs, and also requesting other<br>
>> >>> >> data earlier than it is actually needed.<br>
>> >>> >><br>
>> >>> >><br>
>> >>> >><br>
>> >>> >> On Tue, 12 Jan 2021 12:48:40 +0100, Yngve N. Pettersen<br>
>> >>> >> <<a href="mailto:yngve@vivaldi.com" target="_blank">yngve@vivaldi.com</a> <mailto:<a href="mailto:yngve@vivaldi.com" target="_blank">yngve@vivaldi.com</a>>><br>
>> >>> >><br>
>> >>> >> wrote:<br>
>> >>> >><br>
>> >>> >> > Hi,<br>
>> >>> >> ><br>
>> >>> >> > IIRC the only real processing in our system that might be<br>
>> >>> >> > heavy is done via logobserver.LineConsumerLogObserver in a<br>
>> >>> >> > class (now) derived from ShellCommandNewStyle, so if that<br>
>> >>> >> > is the issue, and deferToThread is the solution, then if it<br>
>> >>> >> > isn't already done, my suggestion would be to implement<br>
>> >>> >> > that inside the code handling the log observers.<br>
>> >>> >> ><br>
>> >>> >> > I've tested the profiler a little, but haven't seen any<br>
>> >>> >> > samples within our code so far, just inside buildbot: quite<br>
>> >>> >> > a lot of log DB actions, and also some TLS activity.<br>
>> >>> >> ><br>
>> >>> >> > The performance issue for those pages seems to be a bit<br>
>> >>> >> > flaky; at present it's not happening AFAICT.<br>
>> >>> >> ><br>
>> >>> >> > On Tue, 12 Jan 2021 10:59:42 +0100, Pierre Tardy<br>
>> >>> >> > <<a href="mailto:tardyp@gmail.com" target="_blank">tardyp@gmail.com</a> <mailto:<a href="mailto:tardyp@gmail.com" target="_blank">tardyp@gmail.com</a>>><br>
>> >>> >> > wrote:<br>
>> >>> >> ><br>
>> >>> >> >> Hello,<br>
>> >>> >> >><br>
>> >>> >> >> A lot of things happened between 2.7 and 2.10, although I<br>
>> >>> >> >> don't see anything that could impact performance that much<br>
>> >>> >> >> (maybe the new reporter framework, but I'm really not<br>
>> >>> >> >> convinced).<br>
>> >>> >> >> If you see that the db is underutilized, this must be<br>
>> >>> >> >> classic reactor starvation.<br>
>> >>> >> >> With asynchronous systems like buildbot, you shouldn't do<br>
>> >>> >> >> any heavy computation in the main event loop thread; that<br>
>> >>> >> >> must be done in a thread via deferToThread and co.<br>
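To illustrate the offloading pattern described above (a stand-in sketch using Python's stdlib asyncio rather than Twisted's deferToThread; heavy_parse is a made-up example of CPU-bound work):

```python
import asyncio

def heavy_parse(log_text):
    # CPU-bound work that would starve the event loop if run inline.
    return sum(1 for line in log_text.splitlines() if "ERROR" in line)

async def main():
    loop = asyncio.get_running_loop()
    # Offload to a worker thread; analogous to Twisted's deferToThread.
    # The event loop stays free to service other requests meanwhile.
    return await loop.run_in_executor(None, heavy_parse, "ok\nERROR a\nERROR b\n")

print(asyncio.run(main()))  # 2
```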
>> >>> >> >><br>
>> >>> >> >> Those are the common issues you can have with performance<br>
>> >>> >> >> independantly from upgrade regressions:<br>
>> >>> >> >><br>
>> >>> >> >> 1) Custom steps:<br>
>> >>> >> >> A lot of the time, we see people struggling with<br>
>> >>> >> >> performance when they just have some custom step doing<br>
>> >>> >> >> heavy computation that blocks the main thread constantly,<br>
>> >>> >> >> preventing all the very quick tasks from running in<br>
>> >>> >> >> parallel.<br>
>> >>> >> >><br>
>> >>> >> >> 2) Too many logs:<br>
>> >>> >> >> In this case, there is not much to do besides reducing the<br>
>> >>> >> >> log amount. This would be the time to switch to a<br>
>> >>> >> >> multi-master setup, where you put 2 masters for builds and<br>
>> >>> >> >> one master for the web UI.<br>
>> >>> >> >> You can put those on the same machine/VM, no problem; the<br>
>> >>> >> >> only work is to have separate processes that each have<br>
>> >>> >> >> their own event queue. You can use docker-compose or<br>
>> >>> >> >> kubernetes in order to more easily create such a<br>
>> >>> >> >> deployment. We don't have anything readily usable for<br>
>> >>> >> >> that, but several people have done and documented it, for<br>
>> >>> >> >> example<br>
>> >>> >> >> <a href="https://github.com/pop/buildbot-on-kubernetes" rel="noreferrer" target="_blank">https://github.com/pop/buildbot-on-kubernetes</a><br>
>> >>> >> >><br>
>> >>> >> >><br>
>> >>> >> >> I have developed the buildbot profiler in order to quickly<br>
>> >>> >> >> find those. You just have to install it as a plugin and<br>
>> >>> >> >> start a profile whenever buildbot feels slow.<br>
>> >>> >> >> It is a statistical profiler, so it will not significantly<br>
>> >>> >> >> change the actual performance, so it is safe to run in<br>
>> >>> >> >> production.<br>
>> >>> >> >><br>
>> >>> >> >> <a href="https://pypi.org/project/buildbot-profiler/" rel="noreferrer" target="_blank">https://pypi.org/project/buildbot-profiler/</a><br>
>> >>> >> >><br>
>> >>> >> >><br>
>> >>> >> >> Regards,<br>
>> >>> >> >> Pierre<br>
>> >>> >> >><br>
>> >>> >> >><br>
>> >>> >> >> On Tue, 12 Jan 2021 at 01:29, Yngve N. Pettersen<br>
>> >>> >> >> <<a href="mailto:yngve@vivaldi.com" target="_blank">yngve@vivaldi.com</a> <mailto:<a href="mailto:yngve@vivaldi.com" target="_blank">yngve@vivaldi.com</a>>><br>
>> >>> >> >> wrote:<br>
>> >>> >> >><br>
>> >>> >> >>> Hello all,<br>
>> >>> >> >>><br>
>> >>> >> >>> We have just upgraded our buildbot system from 2.7 to <br>
>> 2.10.<br>
>> >>> >> >>><br>
>> >>> >> >>> However, I am noticing performance issues when loading<br>
>> >>> >> >>> these pages:<br>
>> >>> >> >>><br>
>> >>> >> >>> Builds->Builders<br>
>> >>> >> >>> Builds->Workers<br>
>> >>> >> >>> individual builds<br>
>> >>> >> >>><br>
>> >>> >> >>> Loading these can take several minutes, although there<br>
>> >>> >> >>> are periods of immediate responses.<br>
>> >>> >> >>><br>
>> >>> >> >>> What I am seeing on the buildbot manager machine is that<br>
>> >>> >> >>> the Python3 process hits 90-100% CPU for the entire<br>
>> >>> >> >>> period.<br>
>> >>> >> >>><br>
>> >>> >> >>> The Python version is 3.6.9 running on Ubuntu 18.04<br>
>> >>> >> >>><br>
>> >>> >> >>> As far as I can tell, the Postgresql database is mostly<br>
>> >>> >> >>> idle during this period. I did do a full vacuum a few<br>
>> >>> >> >>> hours ago, in case that was the issue.<br>
>> >>> >> >>><br>
>> >>> >> >>> There are about 40 builders and 30 workers in the system;<br>
>> >>> >> >>> only about 10-15 of these have had 10-20 builds in the<br>
>> >>> >> >>> past few days, although most of them have active<br>
>> >>> >> >>> histories of 3000 builds (which does make me wonder if<br>
>> >>> >> >>> the problem could be a lack of limiting of the DB<br>
>> >>> >> >>> queries; at present I have not inspected the DB queries).<br>
>> >>> >> >>><br>
>> >>> >> >>> The individual builds can have very large log files in<br>
>> >>> >> >>> the build steps, in many cases tens of thousands of<br>
>> >>> >> >>> lines (we _are_ talking about a Chromium based project).<br>
>> >>> >> >>><br>
>> >>> >> >>> Our changes to the builders and workers JS code are<br>
>> >>> >> >>> minimal (we are using a custom build of www-base): just<br>
>> >>> >> >>> using different information for the build labels (build<br>
>> >>> >> >>> version number) and grouping the builders, which should<br>
>> >>> >> >>> not be causing any performance issues. (We have larger<br>
>> >>> >> >>> changes in the individual builder view, where we include<br>
>> >>> >> >>> Git commit messages, and I have so far not seen any<br>
>> >>> >> >>> performance issues there.)<br>
>> >>> >> >>><br>
>> >>> >> >>> BTW: The line plots for build time and successes on<br>
>> >>> >> >>> builders seem to be MIA. Not sure if that is an upstream<br>
>> >>> >> >>> issue, or due to something in our www-base build.<br>
>> >>> >> >>><br>
>> >>> >> >>> Do you have any suggestions for where to look for the<br>
>> >>> >> >>> cause of the problem?<br>
>> >>> >> >>><br>
>> >>> >> >>><br>
>> >>> >> >>> --<br>
>> >>> >> >>> Sincerely,<br>
>> >>> >> >>> Yngve N. Pettersen<br>
>> >>> >> >>> Vivaldi Technologies AS<br>
>> >>> >> >>> _______________________________________________<br>
>> >>> >> >>> users mailing list<br>
>> >>> >> >>> <a href="mailto:users@buildbot.net" target="_blank">users@buildbot.net</a> <mailto:<a href="mailto:users@buildbot.net" target="_blank">users@buildbot.net</a>><br>
>> >>> >> >>> <a href="https://lists.buildbot.net/mailman/listinfo/users" rel="noreferrer" target="_blank">https://lists.buildbot.net/mailman/listinfo/users</a><br>
>> >>> >> >>><br>
>> >>> >> ><br>
>> >>> >> ><br>
>> >>> >><br>
>> >>> >><br>
>> >>> >> --<br>
>> >>> >> Sincerely,<br>
>> >>> >> Yngve N. Pettersen<br>
>> >>> >> Vivaldi Technologies AS<br>
>> >>> >><br>
>> >>><br>
>> >>><br>
>> >>> --<br>
>> >>> Sincerely,<br>
>> >>> Yngve N. Pettersen<br>
>> >>> Vivaldi Technologies AS<br>
>> >>> _______________________________________________<br>
>> >>> users mailing list<br>
>> >>> <a href="mailto:users@buildbot.net" target="_blank">users@buildbot.net</a> <mailto:<a href="mailto:users@buildbot.net" target="_blank">users@buildbot.net</a>><br>
>> >>> <a href="https://lists.buildbot.net/mailman/listinfo/users" rel="noreferrer" target="_blank">https://lists.buildbot.net/mailman/listinfo/users</a><br>
>> >>><br>
>> ><br>
>> ><br>
>><br>
>><br>
>> --<br>
>> Sincerely,<br>
>> Yngve N. Pettersen<br>
>> Vivaldi Technologies AS<br>
>><br>
>><br>
>> _______________________________________________<br>
>> users mailing list<br>
>> <a href="mailto:users@buildbot.net" target="_blank">users@buildbot.net</a><br>
>> <a href="https://lists.buildbot.net/mailman/listinfo/users" rel="noreferrer" target="_blank">https://lists.buildbot.net/mailman/listinfo/users</a><br>
>><br>
><br>
<br>
<br>
-- <br>
Sincerely,<br>
Yngve N. Pettersen<br>
Vivaldi Technologies AS<br>
</blockquote></div>