<div dir="ltr">> Could this lead to such a long queue of deferred function calls that the server gets bogged down handling them, not being able to even forward data that are ready to the web front-end?<div><br></div><div>That kind of massive queue would only build up when DeferredLists or gatherResults are used, which would make the requests run in parallel.</div><div>But in the code you suggested, everything is nice and serial, which takes more time, but should not block everything.</div><div><br></div><div>Indeed, there are a lot of so-called N+1 requests in the change and sourcestamp code.</div><div>This is indeed not optimal, and leads to longer processing, but those requests are trivial and should not generate that much load.</div><div>Refactoring everything to use more table joins would break the API layering and complicate the code, so this hasn't been done yet, as those requests are not supposed to generate massive load.</div><div><br></div><div><br></div><div>Regards<br clear="all"><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">Pierre</div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, 13 Jan 2021 at 15:42, Yngve N. Pettersen <<a href="mailto:yngve@vivaldi.com">yngve@vivaldi.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi,<br>
<br>
I've been thinking a bit.<br>
<br>
I am not familiar with the deferred system and don't know its deep<br>
details, so please take the following speculation with a large grain of<br>
salt.<br>
<br>
What I started wondering about is, what happens with the deferring system <br>
if the following happens?<br>
<br>
- The list of builders is requested<br>
- for each builder the 1000 most recent builds are requested<br>
- for each build the change entries are requested<br>
- for each build or change entry the source stamps are requested<br>
<br>
Could this lead to such a long queue of deferred function calls that the <br>
server gets bogged down handling them, not being able to even forward data <br>
that are ready to the web front-end?<br>
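For intuition, the difference between queuing all of those requests at once and chaining them one after another can be sketched in stdlib asyncio terms (as a stand-in for Twisted's Deferreds; the fetch function is made up):

```python
import asyncio

async def fetch(item):
    # Stand-in for one small DB request.
    await asyncio.sleep(0)
    return item * 2

async def parallel(items):
    # Like gatherResults/DeferredList: everything is queued at once,
    # which is where a long backlog of pending calls could build up.
    return await asyncio.gather(*(fetch(i) for i in items))

async def serial(items):
    # One at a time: slower overall, but never holds a large queue.
    return [await fetch(i) for i in items]

print(asyncio.run(parallel([1, 2, 3])))  # [2, 4, 6]
print(asyncio.run(serial([1, 2, 3])))    # [2, 4, 6]
```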
<br>
I currently suspect that there is a missing restriction (either number of<br>
builds or time range) on requesting information about the individual<br>
builds.<br>
<br>
Requesting a list of 1000 builds for each builder is probably OK as the<br>
query should be quick, but the question is how many of those builds one<br>
should request deep details of.<br>
<br>
Another, related aspect of what I saw in the DB log is that there were<br>
many requests for single items, especially for source stamps and changes.<br>
Maybe they could be gathered together into a single request for X number<br>
of records? That does require keeping a separate list/queue with the items<br>
that have been requested, and might be complicated to implement.<br>
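As a sketch of that batching idea (using a made-up mini-schema and function name, not Buildbot's actual tables or API), the N single-row lookups can be collapsed into one IN query:

```python
import sqlite3

# Hypothetical mini-schema standing in for the sourcestamps table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sourcestamps (id INTEGER PRIMARY KEY, revision TEXT)")
conn.executemany("INSERT INTO sourcestamps VALUES (?, ?)",
                 [(i, f"rev{i}") for i in range(1, 6)])

def get_sourcestamps_batched(conn, ssids):
    """One query for X ids instead of X single-row queries (avoids N+1)."""
    placeholders = ",".join("?" * len(ssids))
    rows = conn.execute(
        f"SELECT id, revision FROM sourcestamps WHERE id IN ({placeholders})",
        list(ssids)).fetchall()
    return {ssid: rev for ssid, rev in rows}

print(get_sourcestamps_batched(conn, [1, 3, 5]))  # {1: 'rev1', 3: 'rev3', 5: 'rev5'}
```

The trade-off noted above still applies: some layer has to collect the pending ids before issuing the batched query.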
<br>
On Wed, 13 Jan 2021 11:32:37 +0100, Povilas Kanapickas <<a href="mailto:povilas@radix.lt" target="_blank">povilas@radix.lt</a>> <br>
wrote:<br>
<br>
> Hi,<br>
><br>
> Looking into diff between 2.7 and 2.10, pretty much the only difference<br>
> in the existing data and DB APIs is related to changes. So I think it's<br>
> very likely there's something not obvious we didn't consider.<br>
><br>
> I will build a test case and see if I can reproduce the problem. Long<br>
> term I think we need to have tests to prevent this kind of issue. I'll<br>
> think about how this can be done.<br>
><br>
> Yngve, thanks a lot for your work investigating this issue.<br>
><br>
> Cheers,<br>
> Povilas<br>
><br>
> On 1/13/21 12:26 PM, Pierre Tardy wrote:<br>
>> Hello,<br>
>><br>
>> getPrevSuccessfulBuild is called by getChanges for a build, which in<br>
>> turn is called by the /builds/NN/changes REST API.<br>
>> The bug Vlad was referring to was a perf issue on the /changes API,<br>
>> which was fixed a while back.<br>
>><br>
>> Indeed, this algorithm is far from optimized, but I don't see why this<br>
>> would lead to main thread blocking. Looking at the code, I see that<br>
>> there are no big loops that do not yield to the main reactor loop.<br>
>><br>
>> I insist on the buildbot profiler. What I was saying before is that you<br>
>> need to hit the record button before the problem appears, and put a<br>
>> large enough record time to be sure to catch a spike.<br>
>> Then, you will be able to zoom to the cpu spike and catch the issue<br>
>> precisely.<br>
>><br>
>> If the spike is in the order of minutes like you said, you can configure<br>
>> it like this and get enough samples to pin down where the code is<br>
>> actually spending time:<br>
>><br>
>> ProfilerService(frequency=500, gatherperiod=60 * 60, mode='virtual',<br>
>> basepath=None, wantBuilds=100)<br>
>><br>
>> This will record for one hour, and mitigate the memory used if you worry<br>
>> about it.<br>
>><br>
>> Pierre<br>
>><br>
>><br>
>> On Wed, 13 Jan 2021 at 11:01, Yngve N. Pettersen <<a href="mailto:yngve@vivaldi.com" target="_blank">yngve@vivaldi.com</a><br>
>> <mailto:<a href="mailto:yngve@vivaldi.com" target="_blank">yngve@vivaldi.com</a>>> wrote:<br>
>><br>
>><br>
>> Hi again,<br>
>><br>
>> I was just able to get a partial database log from a freeze incident<br>
>> when<br>
>> refreshing the Builds->Builders page.<br>
>><br>
>> It looks like Vlad is on the right track.<br>
>><br>
>> There are a *lot* of individual source stamp requests, but also <br>
>> requests<br>
>> to the builds and change tables.<br>
>><br>
>> An interesting part of the builds request is this request:<br>
>><br>
>> SELECT builds.id, builds.number,<br>
>> builds.builderid, builds.buildrequestid,<br>
>> builds.workerid, builds.masterid, builds.started_at, <br>
>> builds.complete_at,<br>
>> builds.state_string, builds.results<br>
>> FROM builds<br>
>> WHERE builds.builderid = 51 AND builds.number < 46 AND<br>
>> builds.results = 0 ORDER BY builds.complete_at DESC<br>
>> LIMIT 1000 OFFSET 0<br>
>><br>
>> which appears to then be followed by a lot of changes and source stamp<br>
>> requests.<br>
>><br>
>> The log contains a lot of these requests; according to the DB graph,<br>
>> 200 to 400 per second.<br>
>><br>
>> The 1000 limit appears to come from<br>
>> db.builds.BuildsConnectorComponent.getPrevSuccessfulBuild(), but that<br>
>> value seems to have been that way for a while, so the problem is likely<br>
>> caused by something else. This function did show up at the beginning<br>
>> of my traces related to these freezes.<br>
>><br>
>><br>
>> One possibility that I can think of is that several of these pages,<br>
>> or the functions they are using, are no longer restricting how far back<br>
>> in the build history they fetch build information.<br>
>> E.g. the Builders page is only supposed to show a couple of days of<br>
>> builds for each builder, so there should be no need to fetch data for<br>
>> 1000 builds (making sure you have the build ids is one thing; fetching<br>
>> all the associated data even for builds that are not to be displayed<br>
>> is something else).<br>
>><br>
>> BTW, I have noticed that another page, the waterfall, is not<br>
>> displaying anything, even after waiting for a very long time.<br>
>><br>
>><br>
>> On Wed, 13 Jan 2021 01:34:57 +0100, Yngve N. Pettersen<br>
>> <<a href="mailto:yngve@vivaldi.com" target="_blank">yngve@vivaldi.com</a> <mailto:<a href="mailto:yngve@vivaldi.com" target="_blank">yngve@vivaldi.com</a>>><br>
>> wrote:<br>
>><br>
>> > Hi,<br>
>> ><br>
>> > Thanks for that info.<br>
>> ><br>
>> > In my case the problem is apparently something that happens now<br>
>> and then.<br>
>> ><br>
>> > As mentioned, I have seen it on the Builds->Builders and<br>
>> > Builds->Workers pages, neither of which includes any changelog<br>
>> > access AFAIK.<br>
>> ><br>
>> > I have also seen it occasionally on individual build pages, which<br>
>> > have a list of steps with logs, and a changelog panel.<br>
>> ><br>
>> > Just a few minutes ago I saw this freeze/spike happen while the<br>
>> > buildbot manager was completely idle, since all active tasks had<br>
>> > completed, and I had paused all workers since I needed to restart<br>
>> > the manager (due to the hanging build).<br>
>> ><br>
>> > I have also had reports about the Grid view and Console pages<br>
>> > displaying this issue, but have not seen it myself.<br>
>> ><br>
>> > At present I have enabled logging in the postgresql server, so<br>
>> > maybe I can figure out what requests are handled during the spike.<br>
>> ><br>
>> ><br>
>> ><br>
>> > On Wed, 13 Jan 2021 00:38:34 +0100, Vlad Bogolin <<a href="mailto:vlad@mariadb.org" target="_blank">vlad@mariadb.org</a><br>
>> > <mailto:<a href="mailto:vlad@mariadb.org" target="_blank">vlad@mariadb.org</a>>> wrote:<br>
>> ><br>
>> >> Hi,<br>
>> >><br>
>> >> I have experienced some similar interface freezes while trying to <br>
>> >> configure<br>
>> >> our version of buildbot. I now remember two cases:<br>
>> >><br>
>> >> 1) A "changes" API problem where it seemed that the "limit"<br>
>> >> argument was ignored in some cases, which translated into a full<br>
>> >> changes table scan. This was reproducible when hitting the<br>
>> >> "Builds > Last Changes" dashboard, and then all the other pages<br>
>> >> were frozen. There are other requests to changes, so this may be<br>
>> >> related to the Builds page too. Also, this only happened when the<br>
>> >> number of changes in the db was high. I was planning on submitting<br>
>> >> a proper fix, but we are running a custom version of 2.7.1 where<br>
>> >> I implemented a fast workaround and did not manage to submit a<br>
>> >> proper fix (I hope to be able to do it next week).<br>
>> >><br>
>> >> 2) We experienced the same issue as you describe when a lot of<br>
>> >> logs were coming in (which seems to be your case too) and the<br>
>> >> master process was overwhelmed when multiple builds were running<br>
>> >> at the same time (constant CPU usage around ~120%). We solved the<br>
>> >> issue by switching to multi-master and limiting the amount of<br>
>> >> logs, but if you say that this was not an issue in 2.7 I would<br>
>> >> really be interested in finding out what the root cause is<br>
>> >> (I thought it was the high amount of logs). You can test this<br>
>> >> hypothesis by limiting the number of running builds and seeing if<br>
>> >> the issue keeps reproducing.<br>
>> >><br>
>> >> What worked for me in order to find the "changes" API problem was<br>
>> >> visiting each dashboard and seeing whether the freeze occurs or not.<br>
>> >><br>
>> >> Hope this helps!<br>
>> >><br>
>> >> Cheers,<br>
>> >> Vlad<br>
>> >><br>
>> >> On Wed, Jan 13, 2021 at 12:40 AM Yngve N. Pettersen<br>
>> <<a href="mailto:yngve@vivaldi.com" target="_blank">yngve@vivaldi.com</a> <mailto:<a href="mailto:yngve@vivaldi.com" target="_blank">yngve@vivaldi.com</a>>><br>
>> >> wrote:<br>
>> >><br>
>> >>> On Tue, 12 Jan 2021 22:13:50 +0100, Pierre Tardy<br>
>> >>> <<a href="mailto:tardyp@gmail.com" target="_blank">tardyp@gmail.com</a> <mailto:<a href="mailto:tardyp@gmail.com" target="_blank">tardyp@gmail.com</a>>> wrote:<br>
>> >>><br>
>> >>> > Thanks for the update.<br>
>> >>> ><br>
>> >>> > Some random thoughts...<br>
>> >>> ><br>
>> >>> > You should probably leave the profiler open until you get the<br>
>> >>> > performance spike.<br>
>> >>> > If you are inside the spike when starting, indeed, you won't<br>
>> >>> > be able to start the profiler, but if it is started before the<br>
>> >>> > spike it will for sure detect exactly where the code is.<br>
>> >>><br>
>> >>> I did have the profiler open in this latest case; as far as I<br>
>> >>> could tell it still didn't start recording until after the spike<br>
>> >>> ended (there was no progress information in the recorder line).<br>
>> >>><br>
>> >>> The two major items showing up were<br>
>> >>><br>
>> >>> /buildbot/db/builds.py+91:getPrevSuccessfulBuild<br>
>> >>> /buildbot/db/pool.py+190:__thd<br>
>> >>><br>
>> >>> but I think they were recorded after the spike.<br>
>> >>><br>
>> >>> I am planning to activate more detailed logging in the<br>
>> >>> postgresql server, but have not done that yet (I probably need<br>
>> >>> to shut down and restart buildbot when I do).<br>
>> >>><br>
>> >>><br>
>> >>> BTW, I suspect that this issue can also cause trouble for builds<br>
>> >>> whose steps end at the time the problem is occurring; I just<br>
>> >>> noticed a task that is still running more than 4 hours after it<br>
>> >>> started a step that should have been killed after 20 minutes if<br>
>> >>> it was hanging. It should have ended at about the time one of<br>
>> >>> the hangs was occurring. And it is impossible to stop the task<br>
>> >>> for some reason; even shutting down the worker process did not<br>
>> >>> work. AFAIK the only way to fix the issue is to shut the<br>
>> >>> buildbot manager down.<br>
>> >>><br>
>> >>> > Statistical profiling uses timer interrupts, which will preempt<br>
>> >>> > anything that is running and take a call stack trace.<br>
>> >>> ><br>
>> >>> > Waiting for a repro; if, from the db log, you manage to get<br>
>> >>> > info on what kind of db data that is, maybe we can narrow down<br>
>> >>> > the usual suspects..<br>
>> >>> ><br>
>> >>> > If there are lots of short selects like you said, usually you<br>
>> >>> > would have a back and forth from the reactor thread to the db<br>
>> >>> > thread, so it sounds weird.<br>
>> >>> > What could be leading to your behavior is that whatever is<br>
>> >>> > halting the processing, everything is queued up in between,<br>
>> >>> > and unqueued when it is finished, which could lead to a spike<br>
>> >>> > of db actions at the end of the event.<br>
>> >>><br>
>> >>> The DB actions were going on for the entire 3 minutes that the<br>
>> >>> spike lasted; it is not a burst at either end, but a ~180 second<br>
>> >>> long continuous sequence (or barrage) of approximately<br>
>> >>> 70,000-90,000 transactions, if I am interpreting the graph data<br>
>> >>> correctly.<br>
>> >>><br>
>> >>> > Regards<br>
>> >>> > Pierre<br>
>> >>> ><br>
>> >>> ><br>
>> >>> > On Tue, 12 Jan 2021 at 21:49, Yngve N. Pettersen<br>
>> >>> > <<a href="mailto:yngve@vivaldi.com" target="_blank">yngve@vivaldi.com</a> <mailto:<a href="mailto:yngve@vivaldi.com" target="_blank">yngve@vivaldi.com</a>>><br>
>> >>> > wrote:<br>
>> >>> ><br>
>> >>> >> Hi again,<br>
>> >>> >><br>
>> >>> >> A bit of an update.<br>
>> >>> >><br>
>> >>> >> I have not been able to locate the issue using the profiler.<br>
>> >>> >><br>
>> >>> >> It seems that when Buildbot gets into the problematic mode,<br>
>> >>> >> the profiler is not able to work at all. It only starts<br>
>> >>> >> collecting after the locked mode is resolved.<br>
>> >>> >><br>
>> >>> >> It does seem like the locked mode occurs when Buildbot is<br>
>> >>> >> fetching a lot of data from the DB and then spends a lot of<br>
>> >>> >> time processing that data, without yielding to other<br>
>> >>> >> processing needs.<br>
>> >>> >><br>
>> >>> >> Looking at the monitoring of the server, it also appears that<br>
>> >>> >> buildbot is fetching a lot of data. During the most recent<br>
>> >>> >> instance, the returned tuples count in the graph for the<br>
>> >>> >> server indicates three minutes of, on average, 25,000 tuples<br>
>> >>> >> returned per second, with spikes to 80K and 100K.<br>
>> >>> >><br>
>> >>> >> The number of open connections rose to 6 or 7, and the<br>
>> >>> >> transaction count was 400-500 per second during the whole<br>
>> >>> >> time (rolled back transactions, which I assume is just one or<br>
>> >>> >> more selects).<br>
>> >>> >><br>
>> >>> >> IMO this makes it look like, while requesting these data,<br>
>> >>> >> Buildbot is *synchronously* querying the DB and processing<br>
>> >>> >> the returned data, not yielding. It might also be that it is<br>
>> >>> >> requesting more data than it needs, and also requesting other<br>
>> >>> >> data earlier than it is actually needed.<br>
>> >>> >><br>
>> >>> >><br>
>> >>> >><br>
>> >>> >> On Tue, 12 Jan 2021 12:48:40 +0100, Yngve N. Pettersen<br>
>> >>> >> <<a href="mailto:yngve@vivaldi.com" target="_blank">yngve@vivaldi.com</a> <mailto:<a href="mailto:yngve@vivaldi.com" target="_blank">yngve@vivaldi.com</a>>><br>
>> >>> >><br>
>> >>> >> wrote:<br>
>> >>> >><br>
>> >>> >> > Hi,<br>
>> >>> >> ><br>
>> >>> >> > IIRC the only real processing in our system that might be<br>
>> >>> >> > heavy is done via logobserver.LineConsumerLogObserver in a<br>
>> >>> >> > class (now) derived from ShellCommandNewStyle, so if that<br>
>> >>> >> > is the issue, and deferToThread is the solution, then if it<br>
>> >>> >> > isn't already done, my suggestion would be to implement<br>
>> >>> >> > that inside the code handling the log observers.<br>
>> >>> >> ><br>
>> >>> >> > I've tested the profiler a little, but haven't seen any<br>
>> >>> >> > samples within our code so far, just inside buildbot: quite<br>
>> >>> >> > a lot of log DB actions, and also some TLS activity.<br>
>> >>> >> ><br>
>> >>> >> > The performance issue for those pages seems to be a bit<br>
>> >>> >> > flaky; at present it's not happening AFAICT.<br>
>> >>> >> ><br>
>> >>> >> > On Tue, 12 Jan 2021 10:59:42 +0100, Pierre Tardy<br>
>> >>> >> > <<a href="mailto:tardyp@gmail.com" target="_blank">tardyp@gmail.com</a> <mailto:<a href="mailto:tardyp@gmail.com" target="_blank">tardyp@gmail.com</a>>><br>
>> >>> >> > wrote:<br>
>> >>> >> ><br>
>> >>> >> >> Hello,<br>
>> >>> >> >><br>
>> >>> >> >> A lot of things happened between 2.7 and 2.10, although I<br>
>> >>> >> >> don't see anything that could impact performance that much<br>
>> >>> >> >> (maybe the new reporter framework, but I'm really not<br>
>> >>> >> >> convinced).<br>
>> >>> >> >> If you see that the db is underutilized, this must be<br>
>> >>> >> >> classic reactor starvation.<br>
>> >>> >> >> With asynchronous systems like buildbot, you shouldn't do<br>
>> >>> >> >> any heavy computation in the main event loop thread; that<br>
>> >>> >> >> must be done in a thread via deferToThread and co.<br>
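To illustrate the offloading pattern described above (a stand-in sketch using Python's stdlib asyncio rather than Twisted's deferToThread; heavy_parse is a made-up example of CPU-bound work):

```python
import asyncio

def heavy_parse(log_text):
    # CPU-bound work that would starve the event loop if run inline.
    return sum(1 for line in log_text.splitlines() if "ERROR" in line)

async def main():
    loop = asyncio.get_running_loop()
    # Offload to a worker thread; analogous to Twisted's deferToThread.
    # The event loop stays free to service other requests meanwhile.
    return await loop.run_in_executor(None, heavy_parse, "ok\nERROR a\nERROR b\n")

print(asyncio.run(main()))  # 2
```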
>> >>> >> >><br>
>> >>> >> >> Those are the common issues you can have with performance<br>
>> >>> >> >> independantly from upgrade regressions:<br>
>> >>> >> >><br>
>> >>> >> >> 1) Custom steps:<br>
>> >>> >> >> A lot of the time, we see people struggling with<br>
>> >>> >> >> performance when they just have some custom step doing<br>
>> >>> >> >> heavy computation that blocks the main thread constantly,<br>
>> >>> >> >> preventing all the very quick tasks from running in<br>
>> >>> >> >> parallel.<br>
>> >>> >> >><br>
>> >>> >> >> 2) Too many logs:<br>
>> >>> >> >> In this case, there is not much to do besides reducing the<br>
>> >>> >> >> log amount. This would be the time to switch to a<br>
>> >>> >> >> multi-master setup, where you put 2 masters for builds and<br>
>> >>> >> >> one master for the web UI.<br>
>> >>> >> >> You can put those on the same machine/VM, no problem; the<br>
>> >>> >> >> only work is to have separate processes that each have<br>
>> >>> >> >> their own event queue. You can use docker-compose or<br>
>> >>> >> >> kubernetes in order to more easily create such a<br>
>> >>> >> >> deployment. We don't have anything readily usable for<br>
>> >>> >> >> that, but several people have done and documented it, for<br>
>> >>> >> >> example<br>
>> >>> >> >> <a href="https://github.com/pop/buildbot-on-kubernetes" rel="noreferrer" target="_blank">https://github.com/pop/buildbot-on-kubernetes</a><br>
>> >>> >> >><br>
>> >>> >> >><br>
>> >>> >> >> I have developed the buildbot profiler in order to quickly<br>
>> >>> >> >> find those. You just have to install it as a plugin and<br>
>> >>> >> >> start a profile whenever buildbot feels slow.<br>
>> >>> >> >> It is a statistical profiler, so it will not significantly<br>
>> >>> >> >> change the actual performance, so it is safe to run in<br>
>> >>> >> >> production.<br>
>> >>> >> >><br>
>> >>> >> >> <a href="https://pypi.org/project/buildbot-profiler/" rel="noreferrer" target="_blank">https://pypi.org/project/buildbot-profiler/</a><br>
>> >>> >> >><br>
>> >>> >> >><br>
>> >>> >> >> Regards,<br>
>> >>> >> >> Pierre<br>
>> >>> >> >><br>
>> >>> >> >><br>
>> >>> >> >> On Tue, 12 Jan 2021 at 01:29, Yngve N. Pettersen<br>
>> >>> >> >> <<a href="mailto:yngve@vivaldi.com" target="_blank">yngve@vivaldi.com</a> <mailto:<a href="mailto:yngve@vivaldi.com" target="_blank">yngve@vivaldi.com</a>>><br>
>> >>> >> >> wrote:<br>
>> >>> >> >><br>
>> >>> >> >>> Hello all,<br>
>> >>> >> >>><br>
>> >>> >> >>> We have just upgraded our buildbot system from 2.7 to <br>
>> 2.10.<br>
>> >>> >> >>><br>
>> >>> >> >>> However, I am noticing performance issues when loading<br>
>> >>> >> >>> these pages:<br>
>> >>> >> >>><br>
>> >>> >> >>> Builds->Builders<br>
>> >>> >> >>> Builds->Workers<br>
>> >>> >> >>> individual builds<br>
>> >>> >> >>><br>
>> >>> >> >>> Loading these can take several minutes, although there<br>
>> >>> >> >>> are periods of immediate responses.<br>
>> >>> >> >>><br>
>> >>> >> >>> What I am seeing on the buildbot manager machine is that<br>
>> >>> >> >>> the Python3 process hits 90-100% CPU for the entire<br>
>> >>> >> >>> period.<br>
>> >>> >> >>><br>
>> >>> >> >>> The Python version is 3.6.9 running on Ubuntu 18.04<br>
>> >>> >> >>><br>
>> >>> >> >>> As far as I can tell, the Postgresql database is mostly<br>
>> >>> >> >>> idle during this period. I did do a full vacuum a few<br>
>> >>> >> >>> hours ago, in case that was the issue.<br>
>> >>> >> >>><br>
>> >>> >> >>> There are about 40 builders and 30 workers in the system;<br>
>> >>> >> >>> only about 10-15 of these have had 10-20 builds in the<br>
>> >>> >> >>> past few days, although most of them have active<br>
>> >>> >> >>> histories of 3000 builds (which does make me wonder if<br>
>> >>> >> >>> the problem could be a lack of limiting of the DB<br>
>> >>> >> >>> queries; at present I have not inspected the DB queries).<br>
>> >>> >> >>><br>
>> >>> >> >>> The individual builds can have very large log files in<br>
>> >>> >> >>> the build steps, in many cases tens of thousands of<br>
>> >>> >> >>> lines (we _are_ talking about a Chromium based project).<br>
>> >>> >> >>><br>
>> >>> >> >>> Our changes to the builders and workers JS code are<br>
>> >>> >> >>> minimal (we are using a custom build of www-base): just<br>
>> >>> >> >>> using different information for the build labels (build<br>
>> >>> >> >>> version number) and grouping the builders, which should<br>
>> >>> >> >>> not be causing any performance issues. (We have larger<br>
>> >>> >> >>> changes in the individual builder view, where we include<br>
>> >>> >> >>> Git commit messages, and I have so far not seen any<br>
>> >>> >> >>> performance issues there.)<br>
>> >>> >> >>><br>
>> >>> >> >>> BTW: The line plots for build time and successes on<br>
>> >>> >> >>> builders seem to be MIA. Not sure if that is an upstream<br>
>> >>> >> >>> issue, or due to something in our www-base build.<br>
>> >>> >> >>><br>
>> >>> >> >>> Do you have any suggestions for where to look for the<br>
>> >>> >> >>> cause of the problem?<br>
>> >>> >> >>><br>
>> >>> >> >>><br>
>> >>> >> >>> --<br>
>> >>> >> >>> Sincerely,<br>
>> >>> >> >>> Yngve N. Pettersen<br>
>> >>> >> >>> Vivaldi Technologies AS<br>
>> >>> >> >>> _______________________________________________<br>
>> >>> >> >>> users mailing list<br>
>> >>> >> >>> <a href="mailto:users@buildbot.net" target="_blank">users@buildbot.net</a> <mailto:<a href="mailto:users@buildbot.net" target="_blank">users@buildbot.net</a>><br>
>> >>> >> >>> <a href="https://lists.buildbot.net/mailman/listinfo/users" rel="noreferrer" target="_blank">https://lists.buildbot.net/mailman/listinfo/users</a><br>
>> >>> >> >>><br>
>> >>> >> ><br>
>> >>> >> ><br>
>> >>> >><br>
>> >>> >><br>
>> >>> >> --<br>
>> >>> >> Sincerely,<br>
>> >>> >> Yngve N. Pettersen<br>
>> >>> >> Vivaldi Technologies AS<br>
>> >>> >><br>
>> >>><br>
>> >>><br>
>> >>> --<br>
>> >>> Sincerely,<br>
>> >>> Yngve N. Pettersen<br>
>> >>> Vivaldi Technologies AS<br>
>> >>> _______________________________________________<br>
>> >>> users mailing list<br>
>> >>> <a href="mailto:users@buildbot.net" target="_blank">users@buildbot.net</a> <mailto:<a href="mailto:users@buildbot.net" target="_blank">users@buildbot.net</a>><br>
>> >>> <a href="https://lists.buildbot.net/mailman/listinfo/users" rel="noreferrer" target="_blank">https://lists.buildbot.net/mailman/listinfo/users</a><br>
>> >>><br>
>> ><br>
>> ><br>
>><br>
>><br>
>> --<br>
>> Sincerely,<br>
>> Yngve N. Pettersen<br>
>> Vivaldi Technologies AS<br>
>><br>
>><br>
>> _______________________________________________<br>
>> users mailing list<br>
>> <a href="mailto:users@buildbot.net" target="_blank">users@buildbot.net</a><br>
>> <a href="https://lists.buildbot.net/mailman/listinfo/users" rel="noreferrer" target="_blank">https://lists.buildbot.net/mailman/listinfo/users</a><br>
>><br>
><br>
<br>
<br>
-- <br>
Sincerely,<br>
Yngve N. Pettersen<br>
Vivaldi Technologies AS<br>
</blockquote></div>