[Buildbot] #2935: Buildbot gives up on EC2 spot instance requests before EC2 does

Buildbot trac trac at buildbot.net
Thu Oct 9 06:55:21 UTC 2014


#2935: Buildbot gives up on EC2 spot instance requests before EC2 does
---------------------+-----------------------
Reporter:  bgilbert  |      Owner:
    Type:  defect    |     Status:  new
Priority:  major     |  Milestone:  undecided
 Version:  0.8.9     |   Keywords:  ec2
---------------------+-----------------------
 When Buildbot receives a spot request status code other than
 `pending-evaluation`, `pending-fulfillment`, or `fulfilled`, it concludes
 that the spot request has failed and gives up on it.  However,
 [http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-instances-bid-status.html#spot-instances-bid-status-lifecycle several status codes]
 are non-terminal, and EC2 may still fulfill the request at a later time.
 Buildbot knows that `price-too-low` is non-terminal, and so cancels the
 request when giving up on it, but does not do this for other non-terminal
 status codes, such as the `capacity-oversubscribed` seen in the log below.

 As a result, EC2 may launch instances that are not tracked by Buildbot.
 These will remain running and costing money until the spot price exceeds
 the bid price, at which point EC2 will automatically terminate the
 instance.  To avoid this, Buildbot needs to cancel spot requests when it
 gives up on them.
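
 As a rough illustration of the proposed behaviour, here is a minimal
 sketch of a wait loop that cancels the spot request on every give-up
 path, not only for `price-too-low`.  It is not the code in
 `buildbot/buildslave/ec2.py`: `wait_for_request`, `SpotRequestFailed`,
 `FakeRequest`, and the `refresh()` / `status_code` interface are
 hypothetical names chosen for the example, and the request object merely
 stands in for whatever the EC2 client library returns.

```python
# Status codes the ticket says Buildbot already treats as "keep waiting".
WAITING = {"pending-evaluation", "pending-fulfillment"}
FULFILLED = "fulfilled"


class SpotRequestFailed(Exception):
    """Hypothetical stand-in for LatentBuildSlaveFailedToSubstantiate."""


def wait_for_request(request, max_polls=3):
    """Poll a spot request; cancel it before giving up on it.

    `request` is assumed to expose `id`, `status_code`, `refresh()` and
    `cancel()` (an illustrative interface, not a real library API).
    """
    for _ in range(max_polls):
        request.refresh()
        if request.status_code == FULFILLED:
            return request
        if request.status_code not in WAITING:
            break  # e.g. capacity-oversubscribed: EC2 may still fulfill it
    # Giving up, whether on an unexpected status or on a timeout: cancel
    # unconditionally so EC2 cannot launch an untracked instance later.
    request.cancel()
    raise SpotRequestFailed(request.id, request.status_code)


class FakeRequest:
    """In-memory request that replays a fixed sequence of status codes."""

    def __init__(self, request_id, codes):
        self.id = request_id
        self._codes = iter(codes)
        self.status_code = None
        self.cancelled = False

    def refresh(self):
        # Advance to the next code; keep the last one once exhausted.
        self.status_code = next(self._codes, self.status_code)

    def cancel(self):
        self.cancelled = True


if __name__ == "__main__":
    req = FakeRequest("sir-example", ["pending-evaluation",
                                      "capacity-oversubscribed"])
    try:
        wait_for_request(req)
    except SpotRequestFailed as exc:
        print("gave up on", exc.args[0], "cancelled:", req.cancelled)
```

 Cancelling on every exit path avoids having to keep a list of
 non-terminal status codes in sync with EC2; the assumption here is that
 cancelling a request that has already reached a terminal state is
 harmless, which should be checked against the EC2 documentation.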

 {{{
 2014-10-09 01:17:00-0400 [-] EC2LatentBuildSlave el6-amd64 requesting spot instance
 2014-10-09 01:18:06-0400 [-] EC2LatentBuildSlave el6-amd64 has waited 1 minutes for spot request sir-022rlcrg
 2014-10-09 01:18:37-0400 [-] EC2LatentBuildSlave el6-amd64 failed to fulfill spot request sir-022rlcrg with status capacity-oversubscribed
 2014-10-09 01:18:37-0400 [-] Buildslave el6-amd64 detached from testsuite-el6-amd64
 2014-10-09 01:18:37-0400 [-] while preparing slavebuilder:
         Traceback (most recent call last):
           File "/home/buildbot/env/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 577, in _runCallbacks
             current.result = callback(current.result, *args, **kw)
           File "/home/buildbot/env/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1155, in gotResult
             _inlineCallbacks(r, g, deferred)
           File "/home/buildbot/env/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1097, in _inlineCallbacks
             result = result.throwExceptionIntoGenerator(g)
           File "/home/buildbot/env/local/lib/python2.7/site-packages/twisted/python/failure.py", line 389, in throwExceptionIntoGenerator
             return g.throw(self.type, self.value, self.tb)
         --- <exception caught here> ---
           File "/home/buildbot/env/local/lib/python2.7/site-packages/buildbot/process/builder.py", line 335, in _startBuildFor
             ready = yield slavebuilder.prepare(self.builder_status, build)
           File "/home/buildbot/env/local/lib/python2.7/site-packages/twisted/python/threadpool.py", line 196, in _worker
             result = context.call(ctx, function, *args, **kwargs)
           File "/home/buildbot/env/local/lib/python2.7/site-packages/twisted/python/context.py", line 118, in callWithContext
             return self.currentContext().callWithContext(ctx, func, *args, **kw)
           File "/home/buildbot/env/local/lib/python2.7/site-packages/twisted/python/context.py", line 81, in callWithContext
             return func(*args,**kw)
           File "/home/buildbot/env/local/lib/python2.7/site-packages/buildbot/buildslave/ec2.py", line 364, in _request_spot_instance
             request = self._wait_for_request(reservations[0])
           File "/home/buildbot/env/local/lib/python2.7/site-packages/buildbot/buildslave/ec2.py", line 443, in _wait_for_request
             request.id, request.status)
         buildbot.interfaces.LatentBuildSlaveFailedToSubstantiate: (u'sir-022rlcrg', <Status: capacity-oversubscribed>)

 2014-10-09 01:18:37-0400 [-] slave <Build testsuite-el6-amd64> can't build <LatentSlaveBuilder builder='testsuite-el6-amd64'> after all; re-queueing the request

 [...]

 2014-10-09 01:32:37-0400 [Broker,32,127.0.0.1] slave 'el6-amd64' attaching from IPv4Address(TCP, '127.0.0.1', 59401)
 2014-10-09 01:32:37-0400 [Broker,32,127.0.0.1] Slave el6-amd64 received connection while not trying to substantiate.  Disconnecting.
 }}}

--
Ticket URL: <http://trac.buildbot.net/ticket/2935>
Buildbot <http://buildbot.net/>
Buildbot: build/test automation

