[Buildbot-devel] RFC: Assigning builds when load is high

Jared Grubb jared.grubb at gmail.com
Thu May 7 01:12:47 UTC 2015


> On May 6, 2015, at 13:59, Vitali Lovich <vlovich at gmail.com> wrote:
> 
> The problem with #2 is that you won’t actually use your compute cluster since you’ll be waiting for a particular buildslave even though other buildslaves may be idle.

No, you misunderstand me … under #2, if no buildslave can acquire the locks, then the build request is not assigned to any buildslave (under #1, it is assigned to a random one, since they’re all busy) … when a buildslave goes idle, the loop checks again.

So to put it another way, if you had one builder with an exclusive slave lock, #2 guarantees each buildslave has at most one build assigned… the other build requests “float” until something frees up. Under #1, all builds will get assigned to something, but will get stuck in “Waiting For Lock”.
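
A rough sketch of the two choices, to make the difference concrete. This is not the actual BRD code; `assign`, `can_acquire_locks` and `fallback_to_random` are made-up names for illustration only.

```python
import random

def assign(request, slaves, can_acquire_locks, fallback_to_random, rng=random):
    """Return the buildslave to give `request` to, or None to leave it unassigned.

    All names here are hypothetical; this only models the decision, not buildbot.
    """
    # Buildslaves that could take the locks right now.
    ready = [s for s in slaves if can_acquire_locks(s, request)]
    if ready:
        return rng.choice(ready)
    if fallback_to_random and slaves:
        # Option #1: assign anyway; the build then sits in "Waiting For Lock".
        return rng.choice(slaves)
    # Option #2: leave the request floating until a buildslave frees up.
    return None

slaves = ["slave1", "slave2"]
all_busy = lambda slave, request: False

print(assign("quick-build", slaves, all_busy, fallback_to_random=True))   # one of the busy slaves
print(assign("quick-build", slaves, all_busy, fallback_to_random=False))  # None
```

With option #2 the caller would simply retry the unassigned request the next time a buildslave goes idle.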

> The approach I’ve found that works better is implementing a prioritization that knows about which jobs are likely to be quick & which aren’t so that quick jobs are picked for completion first.
> This does make it a domain-specific problem unfortunately but is tractable.  Ping me offline if you want to discuss the details for our setup.
> 
> If buildbot wants to properly solve scheduling I think there are a few moving parts where a revamped ETA is crucial:
> 
> 1. ETA needs to be implemented properly & robustly.  That means being able to provide a buildslave-specific ETA for each build step + take into account domain-specific dimensions.
> In other words, the user has to be able to provide a set of properties that must match for the ETA samples so that if a builder is shared between projects the ETA is still correct or if a build request is for a clean build vs incremental.
> Similarly, some kind of fallback mechanism is likely necessary since if I’m building a branch it likely needs to use the master ETA as a baseline if we don’t have anything more up-to-date for the branch itself.

ETA is gone in nine, and hopefully will re-emerge under the new “metrics” stuff that is coming. (I hope?)
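
For what it's worth, the property-matched ETA with a fallback described in point 1 above could look roughly like the sketch below. Nothing here is an existing buildbot API; `estimate_eta`, the sample layout and the `fallback` argument are all assumptions made for the example.

```python
from statistics import mean

def estimate_eta(samples, wanted, match_keys, fallback=None):
    """Average the durations of past samples whose properties match `wanted`.

    samples:    list of {"props": {...}, "duration": seconds} (hypothetical layout)
    match_keys: property names that must be equal for a sample to count
    fallback:   property overrides to retry with if nothing matches,
                e.g. {"branch": "master"} as a baseline for a new branch
    """
    def matching(props):
        return [s["duration"] for s in samples
                if all(s["props"].get(k) == props.get(k) for k in match_keys)]

    durations = matching(wanted)
    if not durations and fallback is not None:
        durations = matching({**wanted, **fallback})
    return mean(durations) if durations else None

samples = [
    {"props": {"branch": "master", "clean": True}, "duration": 100.0},
    {"props": {"branch": "master", "clean": True}, "duration": 200.0},
    {"props": {"branch": "master", "clean": False}, "duration": 20.0},
]
# A clean build on a branch with no history falls back to master's clean builds.
eta = estimate_eta(samples, {"branch": "feature", "clean": True},
                   ["branch", "clean"], fallback={"branch": "master"})
print(eta)
```

Keeping clean and incremental builds in separate buckets is the point of `match_keys`: mixing them would give an average that is wrong for both.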

> 2. ETA needs to have a guess function that given a buildslave & domain-specific dimensions returns how long the build *would* take (including accounting for any locks we might need to acquire up-front).
> 3. The ETA would need to be available for locks based on current load by using ETA of completion of the builds/buildsteps holding the lock (read-only lock would be the max of the ETA of the things holding the lock).
> 4. The queue would need to take the ETA for a given BR for each buildslave & then try to use the buildslave that minimizes the ETA (regardless of any current locks being held).
> 
> This way, if you add a machine that is 10x faster than the rest, you’ll have jobs queue up on it leaving your slower machines idle until it’s faster to overflow to other machines.

I don’t know any good reason to assign builds to buildslaves if they’ll just block. The only advantage is that there’s then something to Cancel early if you wanted to, I guess.
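
For comparison, the greedy selection in points 2-4 above could be sketched as below. This is only a model of the idea, with made-up names (`lock_wait`, `pick_slave`) and made-up numbers; it assumes per-buildslave build times and lock holder ETAs are already known.

```python
def lock_wait(holder_etas):
    """Time until a lock frees up: the max ETA of whatever currently holds it."""
    return max(holder_etas, default=0.0)

def pick_slave(build_time, holders):
    """Return (slave, eta) minimizing lock wait plus estimated build time.

    build_time: {slave: estimated seconds for this build request on that slave}
    holders:    {slave: [ETAs of the builds currently holding its lock]}
    """
    def eta(slave):
        return lock_wait(holders.get(slave, [])) + build_time[slave]

    best = min(build_time, key=eta)
    return best, eta(best)

# A machine that is 10x faster still wins while its queue is short...
build_time = {"fast": 60.0, "slow": 600.0}
print(pick_slave(build_time, {"fast": [300.0], "slow": []}))
# ...and the request overflows to the idle slow machine once the wait is too long.
print(pick_slave(build_time, {"fast": [600.0], "slow": []}))
```

Note that under this scheme a request is still assigned to a busy buildslave whenever waiting for it is estimated to be quicker than running elsewhere.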

> 
> This isn’t optimal from a total queue scheduling perspective since it’s greedy instead of co-operative, but it will likely behave per user expectations (i.e. use all the available capacity so that jobs finish as quickly as possible).
> 
> -Vitali
> 
>> On May 6, 2015, at 1:15 PM, Jared Grubb <jared.grubb at gmail.com> wrote:
>> 
>> Many months ago, I made a change in buildbot to enhance the way that buildslaves and builds get assigned. In particular, we added a “canStartBuild” functor that lets you adjust how these mappings happen.
>> 
>> There was a design decision I made that I’m starting to regret (and have disabled in my buildbot).
>> 
>> Question:
>> - The BRD (BuildRequestDistributor) attempts to pick buildslaves that can acquire builder locks. If no buildslave qualifies (i.e. high load), we have two choices:
>>   1. pick a random buildslave that would work otherwise
>>   2. give up and wait until a buildslave can acquire the locks needed
>> 
>> Currently, the BRD does #1, however, I’ve seen this cause problems when quick builds get stuck behind long builds … and so I’ll see my set of buildslaves go idle except for one, which will have a few builds on it, all stuck behind one long build. If we did #2, then the short builds would get assigned immediately as the next buildslave goes idle.
>> 
>> I am thinking that #2 should be the default behavior — or at least be opt-in configurable.
>> 
>> Note this applies to both eight and nine and is a fairly trivial patch either way.
>> 
>> Anyone have any thoughts or comments?
>> 
>> Jared
> 
