[Buildbot-devel] Win32 Slave Stability Issues?

Markus Kramer markus.kramer at mainconcept.de
Sun Dec 3 15:15:53 UTC 2006


OK, I found some time to take our buildbot down over the weekend and
debug this problem.
I traced the master-slave communication with Wireshark and could see
that it is actually the master that closes the connection.
After inspecting the logs more closely, I found that at the start of
every build the master pings the slave with a 10-second timeout, and
if that ping fails it declares the slave dead. For some reason, Windows
buildslaves show very erratic reply times and hardly ever manage to
answer that ping within 10 seconds when already under load. As a
workaround I had to raise the timeout to 5 minutes. It seems that
Windows buildslaves delay replies on the command channel for quite some
time (my impression is that the next reply is only delivered once one
of the build steps already running on that builder completes, or when
no build steps are in progress).
Is there any way to inspect the slaves and see what they are actually
doing that prevents them from answering pings?
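
To make the suspected behaviour concrete, here is a minimal sketch in
plain Python. It is not Buildbot or Twisted code, and every name in it
(slave_loop, ping, demo) is made up for illustration. It only assumes
what I described above: a slave that handles one message at a time, and
a master that gives up on a ping after a fixed timeout. If a blocking
build step is handled before the ping, the reply arrives late and the
master would declare the slave dead.

```python
import queue
import threading
import time


def slave_loop(inbox, outbox):
    """Hypothetical slave: handles one message at a time, in arrival order."""
    while True:
        kind, arg = inbox.get()
        if kind == "stop":
            break
        if kind == "build":
            # Stands in for a build step that blocks the message loop.
            time.sleep(arg)
        elif kind == "ping":
            outbox.put("pong")


def ping(inbox, outbox, timeout):
    """Return the round-trip time in seconds, or None if the reply is late."""
    start = time.monotonic()
    inbox.put(("ping", None))
    try:
        outbox.get(timeout=timeout)
    except queue.Empty:
        return None
    return time.monotonic() - start


def demo(build_seconds=1.0, timeout=0.3):
    inbox, outbox = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=slave_loop, args=(inbox, outbox))
    worker.start()

    idle = ping(inbox, outbox, timeout)      # slave is idle: answered in time
    inbox.put(("build", build_seconds))      # slave becomes busy
    busy = ping(inbox, outbox, timeout)      # queued behind the build step

    # The reply is not lost, only delayed until the build step finishes.
    late = outbox.get(timeout=build_seconds + 5.0)

    inbox.put(("stop", None))
    worker.join()
    return idle, busy, late


if __name__ == "__main__":
    idle, busy, late = demo()
    print("idle ping answered in time:", idle is not None)
    print("busy ping answered in time:", busy is not None)
    print("late reply eventually received:", late)
```

Whether the real buildslave blocks in this way on Windows is exactly
what I do not know yet; the sketch only shows that such blocking would
be enough to explain the timeouts.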

Best regards,

Markus Kramer




Carl Dionne wrote:
> I'm experiencing a similar problem on windows, but haven't had the time
> to debug it either. 
>
> -----Original Message-----
> From: buildbot-devel-bounces at lists.sourceforge.net
> [mailto:buildbot-devel-bounces at lists.sourceforge.net] On Behalf Of
> Markus Kramer
> Sent: November 29, 2006 8:59 AM
> To: buildbot-devel at lists.sourceforge.net
> Subject: [Buildbot-devel] Win32 Slave Stability Issues?
>
> Hi,
>
> I am currently working on implementing a buildbot system for all our
> company's products, which are available for multiple operating systems
> ranging from Windows to different kinds of Unix/Linux and Mac OS X.
> For that I have deployed a Linux-based buildbot master server and
> various client systems, triggering different builds depending on
> checkins to the Subversion repository.
> Most of the platforms are working just fine; however, I am
> experiencing a lot of problems with our WIN32 buildslave. Everything
> works fine for that one as well if buildbot has to do only one build at
> a time, but for some checkins about 5 to 6 builds are triggered on
> the same buildslave. Whenever the WIN32 buildbot starts a third build
> job in parallel (sometimes even the second one) it disconnects for
> about 20 seconds, and when it comes back all columns related to that
> slave are marked with "slave lost" messages. The interrupt message for
> that case is:
>
> [Failure instance: Traceback (failure with no frames): 
> twisted.internet.error.ConnectionLost: Connection to the other side was
> lost in a non-clean fashion.
>
> ]
>
>
> Did anybody else experience similar issues with WIN32 slaves? I am
> willing to help with debugging, but have very little experience in
> programming Python/Twisted. I already changed the Windows system from
> XP to Server 2003 and moved to different hardware, but this seems to
> have no influence.
>
> Thanks in advance for any help on this,
>
> Markus Kramer




