[Buildbot-devel] Lock question: Multiple builders, load-balanced slaves

Brian Warner warner-buildbot at lothar.com
Sun Nov 5 04:45:10 UTC 2006


Stefan Seefeld <seefeld at sympatico.ca> writes:

> as you know I have submitted an RFE a long time ago
> (https://sourceforge.net/tracker/index.php?func=detail&aid=1524611&group_id=73177&atid=537004)
> which you have never replied to.

I'm really sorry about that... I think it's a great idea. I've been trying to
get my head around how we can integrate it with the slave-availability ideas
that have been kicking around here. SF#1251484 (pausing builders) is slightly
related. I remember putting a couple of ideas here:

 https://sourceforge.net/mailarchive/message.php?msg_id=36487840
 https://sourceforge.net/mailarchive/message.php?msg_id=17021841

> Thus, the only means to coordinate builds is by measuring current loads and
> dispatch builds to the least loaded machines, i.e. do proper load
> balancing.

I agree.

I think that the buildmaster needs to be aware of the "availability status"
of each buildslave. This would include both the slave's load average and a
buildslave-admin-controlled state which would let the slave admin express
things like "I'm using my workstation right now: please only use it as a
buildslave if you can't use somebody else's instead", or "I'm going to shut
down my computer soon, please don't schedule any new builds, and tell me when
you've stopped using my machine". When the buildmaster decides to start a new
build, the Builder should be able to look at the availability status of all
its buildslaves and compare them to find the best one, where the definition
of "best" can be controlled by subclassing the Builder.
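
As a rough illustration of that selection step, here is a minimal standalone
sketch. It is not the Buildbot API: SlaveInfo, SlaveChooser, and
LeastLoadedChooser are hypothetical names, the load_average field is an
assumption about what a slave might report, and it is written in modern
Python rather than in the style of the patch below. Only the four mood
constants are taken from the patch.

```python
from dataclasses import dataclass

# Mood constants, in the same order as the patch: larger numbers indicate
# more willingness to do work.
OFFLINE, GOING_ON_HOLIDAY, RELUCTANT, EAGER = range(4)


@dataclass
class SlaveInfo:
    """Hypothetical snapshot of one slave's availability status."""
    name: str
    availability: int
    load_average: float


class SlaveChooser:
    """Hypothetical stand-in for the part of a Builder that picks a slave."""

    def usable(self, slave):
        # Only slaves that are at least RELUCTANT accept new builds.
        return slave.availability >= RELUCTANT

    def rank(self, slave):
        # Default notion of "best": most willing first, then least loaded.
        return (slave.availability, -slave.load_average)

    def choose(self, slaves):
        candidates = [s for s in slaves if self.usable(s)]
        if not candidates:
            return None
        return max(candidates, key=self.rank)


class LeastLoadedChooser(SlaveChooser):
    """A subclass that redefines "best" as lowest load, ignoring mood."""

    def rank(self, slave):
        return -slave.load_average
```

Overriding rank() is the only change a subclass needs in this sketch, which
is the kind of hook "subclassing the Builder" would provide.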

So one piece is the API between the master and the slave to request or report
this availability status (master polls the slave? slave sends updates
spontaneously?). Another is the slave-side control mechanism (how does the
slave admin control the slave's availability: touch a file? SIGUSR1? local
unix socket? local web server?). The third piece is the master-side code that
compares statuses and chooses the best buildslave.
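
To make the "touch a file" option concrete, here is a small sketch of what
the slave side might do before reporting to the master. Everything in it is
an assumption for illustration: the marker file names ("holiday",
"reluctant"), the control directory, and the shape of the report are all
hypothetical, and nothing here exists in Buildbot. It is written in modern
Python and uses only os.path.exists() and os.getloadavg() from the standard
library.

```python
import os

# Mood constants, in the same order as the patch below.
OFFLINE, GOING_ON_HOLIDAY, RELUCTANT, EAGER = range(4)

# Hypothetical marker files the slave admin can touch in a control
# directory. They are checked least-willing first, so "holiday" wins if
# both files are present.
MARKERS = [("holiday", GOING_ON_HOLIDAY), ("reluctant", RELUCTANT)]


def read_mood(control_dir):
    """Return the mood selected by marker files, or EAGER if none exist."""
    for filename, mood in MARKERS:
        if os.path.exists(os.path.join(control_dir, filename)):
            return mood
    return EAGER


def availability_report(control_dir):
    """Build the status a slave could send to the master (or return when
    the master polls)."""
    try:
        load = os.getloadavg()[0]  # 1-minute load average
    except (AttributeError, OSError):
        load = None  # not available on this platform
    return {"mood": read_mood(control_dir), "load": load}
```

With this scheme, "touch holiday" in the control directory would ask the
master to stop scheduling new builds, and removing the file would make the
slave EAGER again on the next report.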

I have a nascent tree where I'm starting to add this support. I've attached
what little code I've written (in the form of a patch against trunk) below.

It's too close to the next release to get this into 0.7.5, but I'm eager to
get it into the next one after that. Please let me know what you think.

thanks,
 -Brian

diff -rN -u old-slave-availability/buildbot/master.py new-slave-availability/buildbot/master.py
--- old-slave-availability/buildbot/master.py	2006-11-04 20:38:47.000000000 -0800
+++ new-slave-availability/buildbot/master.py	2006-11-04 20:38:47.000000000 -0800
@@ -32,14 +32,24 @@
 ########################################
 
 
-
+# slave moods. Larger numbers indicate more willingness to do work.
+(OFFLINE,
+ GOING_ON_HOLIDAY,
+ RELUCTANT,
+ EAGER) = range(4)
 
 class BotPerspective(NewCredPerspective):
     """This is the master-side representative for a remote buildbot slave.
     There is exactly one for each slave described in the config file (the
     c['bots'] list). When buildbots connect in (.attach), they get a
     reference to this instance. The BotMaster object is stashed as the
-    .service attribute."""
+    .service attribute.
+
+    @ivar availability: an int, indicating to what extent this slave is
+                        willing to participate in builds. One of OFFLINE,
+                        GOING_ON_HOLIDAY, RELUCTANT, or EAGER.
+
+    """
 
     slave_commands = None
 
@@ -48,6 +58,10 @@
         self.slave_status = SlaveStatus(name)
         self.builders = [] # list of b.p.builder.Builder instances
         self.slave = None # a RemoteReference to the Bot, when connected
+        self.availability = OFFLINE
+
+    def getAvailability(self):
+        return self.availability
 
     def addBuilder(self, builder):
         """Called to add a builder after the slave has connected.
@@ -216,6 +230,7 @@
         log.err(why)
 
     def _attached4(self, res):
+        self.availability = EAGER
         return self.sendBuilderList()
 
     def sendBuilderList(self):
diff -rN -u old-slave-availability/buildbot/process/builder.py new-slave-availability/buildbot/process/builder.py
--- old-slave-availability/buildbot/process/builder.py	2006-11-04 20:38:47.000000000 -0800
+++ new-slave-availability/buildbot/process/builder.py	2006-11-04 20:38:47.000000000 -0800
@@ -7,6 +7,7 @@
 from twisted.internet import reactor, defer
 
 from buildbot import interfaces, sourcestamp
+from buildbot.master import OFFLINE, GOING_ON_HOLIDAY, RELUCTANT, EAGER
 from buildbot.twcompat import implements
 from buildbot.status.progress import Expectations
 from buildbot.util import now
@@ -38,6 +39,11 @@
             return oldversion
         return self.remoteCommands.get(command)
 
+    def getAvailability(self):
+        if not self.slave:
+            return OFFLINE
+        return self.slave.getAvailability()
+
     def attached(self, slave, remote, commands):
         self.slave = slave
         self.remote = remote
@@ -437,16 +443,23 @@
         if not self.buildable:
             self.updateBigStatus()
             return # nothing to do
-        # find the first idle slave
-        for sb in self.slaves:
-            if sb.state == IDLE:
-                break
-        else:
+        # find all the idle slaves
+        idle_slaves = [sb for sb in self.slaves if sb.state == IDLE]
+        # but we'll only use the ones that are actually willing to take on
+        # new jobs
+        useable_slaves = [sb for sb in idle_slaves
+                          if sb.getAvailability() >= RELUCTANT]
+        # and we prefer EAGER slaves
+        useable_slaves.sort(lambda a,b: cmp(a.getAvailability(),
+                                            b.getAvailability()))
+        if not useable_slaves:
             log.msg("%s: want to start build, but we don't have a remote"
                     % self)
             self.updateBigStatus()
             return
 
+        sb = useable_slaves[-1]
+
         # there is something to build, and there is a slave on which to build
         # it. Grab the oldest request, see if we can merge it with anything
         # else.





