[Buildbot-devel] Non-fatal disconnects?

Markus Kramer markus.kramer at mainconcept.de
Fri Feb 16 07:19:04 UTC 2007


Hi,

I am seeing similar behaviour with our Windows buildbot slaves when
using the original code, and I am pretty sure this is not a firewall
issue, as all our buildslaves are located within their own subnet.

From my observations, the disconnects mostly happen when a new build is
due to start on the Windows slave. I had a look at the code and found
that the buildbot master performs a ping operation over the cmd channel
before starting a new build, which the client should answer within a
few seconds. While all the other buildslaves (Linux, OSX, ...) seem
able to answer that ping request within that time, the Windows slave
takes much longer to do so. In some cases we have 2 to 3 other builds
running on the same slave, and it seems that it sometimes takes up to
5 minutes to answer the ping request. If the ping request is not
answered in time, the master declares the slave dead and disconnects
it, even though all the builds are working just fine.

As a temporary workaround I have set the timeout for the ping request
made when a new build starts to 10 minutes on our system, and since
then we have never had any disconnects of the Windows slave. However,
this seriously harms dead-slave detection. That is OK for my
environment, where all the bots sit within their own subnet on the same
switch, but in general I am still wondering why it takes so long for
the Windows slaves to react to any cmd request. My impression is that
the Windows slave sometimes only reacts to commands (e.g. fetching the
build log of a build in progress on that machine) when one of the other
build steps either produces new output on stdio or completes, for any
of the builds already running on that slave.
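
To make the trade-off concrete, here is a minimal sketch of a
ping-with-timeout check like the one described above. It is not
Buildbot's actual code: the function names are hypothetical, the slave
is simulated with a sleep, and fractions of a second stand in for the
seconds and minutes mentioned above.

```python
import asyncio


# Hypothetical stand-in for a slave; not part of Buildbot.
async def slave_answers_ping(delay_seconds: float) -> str:
    """Simulate a slave that needs `delay_seconds` before it answers."""
    await asyncio.sleep(delay_seconds)
    return "pong"


async def ping_before_build(delay_seconds: float, timeout_seconds: float) -> bool:
    """Return True if the ping is answered in time, False otherwise.

    False corresponds to the master declaring the slave dead.
    """
    try:
        await asyncio.wait_for(
            slave_answers_ping(delay_seconds), timeout=timeout_seconds
        )
        return True
    except asyncio.TimeoutError:
        return False


def slave_survives(delay_seconds: float, timeout_seconds: float) -> bool:
    """Synchronous wrapper so the check can be called from plain code."""
    return asyncio.run(ping_before_build(delay_seconds, timeout_seconds))


if __name__ == "__main__":
    # A slow (busy) slave with a short timeout is dropped ...
    print(slave_survives(delay_seconds=0.5, timeout_seconds=0.05))
    # ... while the same slave survives once the timeout is raised.
    print(slave_survives(delay_seconds=0.05, timeout_seconds=0.5))
```

Raising the timeout keeps a slow but healthy slave connected, at the
cost that a slave which is really dead is only noticed after the same,
longer interval.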

Best regards,

Markus Kramer



Carl Dionne schrieb:
> I also see this problem quite frequently, in a windows environment.  In
> my case, however, I know that this is not due to a firewall problem
> since it's all on the same LAN.  
>
> It would be nice to ensure that builds can continue if the connection
> is restored, but from what I have seen, I believe that in some
> situations a false disconnection is detected in the first place.  It
> would be really nice to solve that one.  I haven't found anything
> useful in the logs, however.
>
> Carl 
>
> -----Original Message-----
> From: buildbot-devel-bounces at lists.sourceforge.net
> [mailto:buildbot-devel-bounces at lists.sourceforge.net] On Behalf Of
> Jean-Paul Calderone
> Sent: February 15, 2007 11:28 AM
> To: buildbot-devel at lists.sourceforge.net
> Subject: Re: [Buildbot-devel] Non-fatal disconnects?
>
> On Thu, 15 Feb 2007 15:53:53 +0000, Simon Marlow
> <simonmarhaskell at gmail.com> wrote:
>
>> Hi there,
>>
>> The GHC project has just set up a buildbot:
>> http://darcs.haskell.org/buildbot/all/.  The single biggest problem we
>> have is that some of our clients get accidentally disconnected during a
>> build, probably the main reason being flaky firewall setups.  There's
>> not a lot we can do about this, unfortunately.  Builds take several
>> hours, making it all the more likely that a disconnect occurs during a
>> build.
>> I wonder if it would be possible to make transient disconnections
>> non-fatal, such that a disconnection wouldn't interrupt an ongoing
>> build, as long as the connection was restored within a suitably short
>> time period?  I think this would be our #1 feature request as far as
>> improving our buildbot experience goes.
>
> I second this feature request. :)
>
> Jean-Paul
>
> ------------------------------------------------------------------------
> -
> Take Surveys. Earn Cash. Influence the Future of IT Join
> SourceForge.net's Techsay panel and you'll get the chance to share your
> opinions on IT & business topics through brief surveys-and earn cash
> http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDE
> V
> _______________________________________________
> Buildbot-devel mailing list
> Buildbot-devel at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/buildbot-devel
>


-- 

Markus Kramer


MainConcept AG
Elisabethstrasse 1
52062 Aachen
Germany
Phone: +49-(0)241/40108-0
Fax  : +49-(0)241/40108-10

markus.kramer at mainconcept.de
http://www.mainconcept.com

Amtsgericht Aachen: HRB 8938
