[Buildbot-devel] buildslave keepalive and clock jumps

David Bolen db3l.net at gmail.com
Fri Sep 7 05:43:40 UTC 2007


I'm running a buildslave under FreeBSD inside a VMware image that uses
a vmware-guestd package to keep the guest clock in sync with the host,
since otherwise the FreeBSD clock runs too slowly. Under load, however,
the clock can jump in quite large increments in order to sync up
(sometimes by as much as a minute or two).

The current slave behavior is to schedule two independent timers at
the same time: one sends a keepalive to the master shortly before the
keepalive deadline (at keepaliveInterval - keepaliveTimeout), and the
other checks for inactivity at the deadline itself (keepaliveInterval).
The problem is that if your clock is subject to jumps, the scheduled
gap between them can shrink, to nothing in some cases, and the two can
end up firing back to back.  In that case the slave times out the
master before there is even time to get a response back.  In theory,
scheduling the keepalive ahead of the overall activity timeout should
leave room for the reply, but that falls apart if you have skips in
time.
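
To make the failure concrete, here is a minimal sketch of the two-timer
scheme.  It does not use Twisted: JumpyClock is a hypothetical toy
scheduler keyed on wall-clock time, and the 600/30 second values and
the jump to t=620 are just illustrative assumptions.

```python
import heapq


class JumpyClock:
    """Toy wall-clock scheduler (hypothetical stand-in, not Twisted)."""

    def __init__(self):
        self.now = 0
        self._pending = []  # heap of (deadline, sequence, callback)
        self._seq = 0

    def callLater(self, delay, callback):
        # Deadlines are absolute wall-clock times fixed at scheduling time.
        heapq.heappush(self._pending, (self.now + delay, self._seq, callback))
        self._seq += 1

    def set_time(self, wall_time):
        # Move the wall clock and fire everything that is now due.
        self.now = wall_time
        while self._pending and self._pending[0][0] <= self.now:
            _, _, callback = heapq.heappop(self._pending)
            callback()


def simulate_original(keepalive_interval=600, keepalive_timeout=30,
                      jump_to=620):
    clock = JumpyClock()
    events = []
    # Both timers are scheduled up front, 30 seconds apart.
    clock.callLater(keepalive_interval - keepalive_timeout,
                    lambda: events.append(("doKeepalive", clock.now)))
    clock.callLater(keepalive_interval,
                    lambda: events.append(("checkActivity", clock.now)))
    clock.set_time(500)      # normal progress, nothing due yet
    clock.set_time(jump_to)  # clock is stepped past both deadlines
    return events


events = simulate_original()
print(events)
```

In this sketch both callbacks run in the same pass at t=620, so the
activity check fires with no time left for the master's reply.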

A small change seems to handle this better: if the activity check is
only scheduled once the keepalive has actually been sent, then its
deadline is relative to the moment the keepalive goes out.  Should
clock updates delay execution of the keepalive, the check slides
forward with it, so a properly working master always has enough time
to respond.

For example, the attached diff, from a 0.7.5 installation, moves
the activityTimer setup into doKeepalive, where it is scheduled
keepaliveTimeout seconds after the keepalive is sent.
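
The effect of that move can be sketched outside of Twisted.  JumpyClock
below is a hypothetical toy scheduler keyed on wall-clock time, and the
600/30 second values and the jump to t=620 are illustrative assumptions
rather than anything taken from bot.py.

```python
import heapq


class JumpyClock:
    """Toy wall-clock scheduler (hypothetical stand-in, not Twisted)."""

    def __init__(self):
        self.now = 0
        self._pending = []  # heap of (deadline, sequence, callback)
        self._seq = 0

    def callLater(self, delay, callback):
        heapq.heappush(self._pending, (self.now + delay, self._seq, callback))
        self._seq += 1

    def set_time(self, wall_time):
        # Move the wall clock and fire everything that is now due.
        self.now = wall_time
        while self._pending and self._pending[0][0] <= self.now:
            _, _, callback = heapq.heappop(self._pending)
            callback()


def simulate_patched(keepalive_interval=600, keepalive_timeout=30,
                     jump_to=620):
    clock = JumpyClock()
    events = []

    def check_activity():
        events.append(("checkActivity", clock.now))

    def do_keepalive():
        events.append(("doKeepalive", clock.now))
        # The activity check is armed only now, relative to the send.
        clock.callLater(keepalive_timeout, check_activity)

    clock.callLater(keepalive_interval - keepalive_timeout, do_keepalive)
    clock.set_time(500)      # normal progress, nothing due yet
    clock.set_time(jump_to)  # clock is stepped past the keepalive deadline
    after_jump = list(events)
    clock.set_time(jump_to + keepalive_timeout)
    return after_jump, events


after_jump, events = simulate_patched()
print(after_jump)
print(events)
```

Here only the keepalive fires at the jump; the check is pushed out to
t=650, which leaves the full keepaliveTimeout for a response.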

This does perhaps open up more opportunity for overall slippage of
the keepalive timeout window, but the slippage should be minimal
outside of actual time loss on the system, as nothing within the
doKeepalive routine should block or take a significant amount of time.
In exchange, keepalive checking becomes more robust when clock changes
affect the timers so that they are no longer spaced as far apart as
when first scheduled.

-- David

*** bot.py.orig	Thu Sep  6 23:40:25 2007
--- bot.py	Fri Sep  7 01:35:00 2007
***************
*** 406,414 ****
          # arrange to send a keepalive a little while before our deadline
          when = self.keepaliveInterval - self.keepaliveTimeout
          self.keepaliveTimer = reactor.callLater(when, self.doKeepalive)
-         # and check for activity too
-         self.activityTimer = reactor.callLater(self.keepaliveInterval,
-                                                self.checkActivity)
  
      def stopTimers(self):
          if self.keepaliveTimer:
--- 406,411 ----
***************
*** 429,434 ****
--- 426,435 ----
          d = self.perspective.callRemote("keepalive")
          d.addCallback(self.activity)
          d.addErrback(self.keepaliveLost)
+         # Establish timer for activity check - do this now so it is relative
+         # to the keepalive and ensures some time for a response.
+         self.activityTimer = reactor.callLater(self.keepaliveTimeout,
+                                                self.checkActivity)
  
      def keepaliveLost(self, f):
          log.msg("BotFactory.keepaliveLost")






