[Buildbot-devel] Non-fatal disconnects?
Markus Kramer
markus.kramer at mainconcept.de
Fri Feb 16 07:19:04 UTC 2007
Hi,
I am seeing similar behaviour with our Windows buildbot slaves when
using the original code, and I am pretty sure this is not a firewall
issue, as all our buildslaves are located within their own subnet.

From my observations, the disconnects mostly happen when a new build is
about to start on the Windows slave. I had a look at the code and found
that the buildbot master actually performs a ping operation over the cmd
channel before starting a new build, which should be answered by the
client within a few seconds. While all the other buildslaves (Linux,
OS X, ...) seem to be able to answer that ping request within that time,
the Windows slave takes much longer to do so. In some cases we have 2 to
3 other builds running on the same slave, and it seems that it sometimes
takes up to 5 minutes to answer that ping request. If the ping request
is not answered in time, the master declares that slave dead and
disconnects it, even though all the builds are working just fine.

As a temporary workaround I have set the timeout for that ping request
at the start of a new build to 10 minutes on our system, and since then
we have never had any disconnects of the Windows slave. However, this
seriously weakens dead slave detection, which is OK for my environment,
with all the bots within their own subnet on the same switch. In
general, though, I am still wondering why it takes so long for the
Windows slaves to react to any cmd request. My impression is that the
Windows slave sometimes only reacts to cmds (e.g. a request for the
build log of a build in progress on that machine) when one of the builds
already running on that slave either produces new output on stdio or
completes a buildstep.
Best regards,
Markus Kramer
Carl Dionne schrieb:
> I also see this problem quite frequently, in a windows environment. In
> my case, however, I know that this is not due to a firewall problem
> since it's all on the same LAN.
>
> It would be nice to ensure that builds can continue if the connection
> is restored, but from what I have seen, I believe that in some
> situations a false disconnection is detected in the first place. It
> would be really nice to solve that one. I haven't found anything useful
> in the logs, however.
>
> Carl
>
> -----Original Message-----
> From: buildbot-devel-bounces at lists.sourceforge.net
> [mailto:buildbot-devel-bounces at lists.sourceforge.net] On Behalf Of
> Jean-Paul Calderone
> Sent: February 15, 2007 11:28 AM
> To: buildbot-devel at lists.sourceforge.net
> Subject: Re: [Buildbot-devel] Non-fatal disconnects?
>
> On Thu, 15 Feb 2007 15:53:53 +0000, Simon Marlow
> <simonmarhaskell at gmail.com> wrote:
>
>> Hi there,
>>
>> The GHC project has just set up a buildbot:
>> http://darcs.haskell.org/buildbot/all/. The single biggest problem we
>> have is that some of our clients get accidentally disconnected during a
>> build, probably the main reason being flaky firewall setups. There's
>> not a lot we can do about this, unfortunately. Builds take several
>> hours, making it all the more likely that a disconnect occurs during a
>> build.
>>
>> I wonder if it would be possible to make transient disconnections
>> non-fatal, such that a disconnection wouldn't interrupt an ongoing
>> build, as long as the connection was restored within a suitably short
>> time period? I think this would be our #1 feature request as far as
>> improving our buildbot experience goes.
>
>
> I second this feature request. :)
>
> Jean-Paul
>
> _______________________________________________
> Buildbot-devel mailing list
> Buildbot-devel at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/buildbot-devel
>
--
Markus Kramer
MainConcept AG
Elisabethstrasse 1
52062 Aachen
Germany
Phone: +49-(0)241/40108-0
Fax : +49-(0)241/40108-10
markus.kramer at mainconcept.de
http://www.mainconcept.com
Amtsgericht Aachen: HRB 8938