[Buildbot] #2895: Builds take a long time to get started on latent buildslaves

Buildbot trac trac at buildbot.net
Thu Sep 18 14:03:43 UTC 2014


#2895: Builds take a long time to get started on latent buildslaves
------------------------+------------------------
Reporter:  agateau      |       Owner:
    Type:  enhancement  |      Status:  new
Priority:  major        |   Milestone:  undecided
 Version:  0.8.6p1      |  Resolution:
Keywords:  ec2          |
------------------------+------------------------

Comment (by dustin):

 So you had a `print` or `log.msg` both before and after that statement,
 and the time difference between those was four minutes?

 The code for `release` is simple:

 http://twistedmatrix.com/trac/browser/tags/releases/twisted-14.0.0/twisted/internet/defer.py#L1331
 {{{
 #!python
             def release(self):
                 """
                 Release the lock.  If there is a waiting list, then the
 first
                 L{Deferred} in that waiting list will be called back.

                 Should be called by whomever did the L{acquire}() when the
 shared
                 resource is free.
                 """
                 assert self.locked, "Tried to release an unlocked lock"
                 self.locked = False
                 if self.waiting:
                     # someone is waiting to acquire lock
                     self.locked = True
                     d = self.waiting.pop(0)
                     d.callback(self)
 }}}

 I could see that taking four minutes for the *first* call if there are 872
 waiting acquire() calls; each will get a callback, and that may make a
 subsequent callback, and so on.  This would assume that none of those
 callbacks waits for a reactor iteration.

 But the builder sorting *does* wait for a reactor iteration, as
 `getOldestRequestTime` calls
 `self.master.db.buildrequests.getBuildRequests` which defers the DB
 operation to a thread.  Once that DB operation is queued, the Python stack
 unwinds and the `d.callback(self)` above completes, and the lock is
 released.

 I suspect it's the build sorting that's taking so long, and since the
 sorting is occurring for each of the 109 builders, that could add up.

 So: what, exactly, are you seeing that makes it appear that the `release`
 is taking four minutes?

--
Ticket URL: <http://trac.buildbot.net/ticket/2895#comment:8>
Buildbot <http://buildbot.net/>
Buildbot: build/test automation


More information about the bugs mailing list