[Buildbot-commits] [SPAM] Re: [Buildbot] #664: sporadic connection lost problems

Buildbot buildbot-devel at lists.sourceforge.net
Sun Dec 27 23:48:40 UTC 2009


#664: sporadic connection lost problems
--------------------+-------------------------------------------------------
Reporter:  ddunbar  |        Owner:  ddunbar 
    Type:  defect   |       Status:  assigned
Priority:  major    |    Milestone:  0.8.+   
 Version:           |   Resolution:          
Keywords:           |  
--------------------+-------------------------------------------------------
Changes (by ddunbar):

  * owner:  => ddunbar
  * status:  new => assigned


Comment:

 Based on some TCP logging and IRC discussion, I believe this is an
 approximate picture of what happens. It explains why the problem only
 occurs with multiple builders, and why it occurs more often with my
 long-running builder:

 1. The slave is busy with the long-running build.
 2. The master is busy with a number of builds; my particular master is
 pretty heavily loaded and is frequently maxing the CPU.
 3. A source change arrives which triggers a number of builds, including
 one on the slave with the long-running builder.
 4. Because the slave has multiple builders, a new build is scheduled and
 the master initiates a ping to the slave to make sure it is available. If
 the slave had only one builder, the build would just stay pending and the
 bug wouldn't show up.
 5. The master is busy doing other work, and the slave is busy feeding it
 more data. At some point the master's TCP buffer fills up. If the slave's
 response to the ping is delayed behind the other data, it won't get to the
 master in time.
 6. The master gets a ping timeout and closes the slave connection, killing
 an otherwise active build.
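
 The timing in steps 5 and 6 can be sketched with a rough model. This is
 only an illustration, not Buildbot code: the function names, the byte
 counts, the drain rate and the 30 second timeout are all assumptions made
 up for the example. It treats the connection as a single FIFO stream, so
 the ping reply can only be read after everything queued ahead of it.

```python
def ping_reply_delay(backlog_bytes, drain_rate_bytes_per_s):
    """Seconds until a reply queued behind backlog_bytes reaches the master.

    Hypothetical model: the master reads the stream in order at a fixed
    rate, so the reply waits for the whole backlog to be drained first.
    """
    if drain_rate_bytes_per_s <= 0:
        raise ValueError("drain rate must be positive")
    return backlog_bytes / drain_rate_bytes_per_s


def ping_times_out(backlog_bytes, drain_rate_bytes_per_s, timeout_s):
    """True if the reply would arrive after the (assumed) ping timeout."""
    return ping_reply_delay(backlog_bytes, drain_rate_bytes_per_s) > timeout_s


if __name__ == "__main__":
    # Assumed numbers: a loaded master draining 10 kB/s, 30 s ping timeout.
    for backlog in (100_000, 1_000_000):
        late = ping_times_out(backlog, 10_000, 30)
        print(backlog, "bytes queued ahead of the reply, times out:", late)
```

 Under these assumed numbers, the larger backlog is enough to push the
 reply past the timeout even though the slave answered the ping promptly,
 which matches the picture above of a healthy build being dropped.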

-- 
Ticket URL: <http://buildbot.net/trac/ticket/664#comment:2>
Buildbot <http://buildbot.net/>
Buildbot: build/test automation

