[Buildbot-devel] Win32 Slave Stability Issues?
markus.kramer at mainconcept.de
Sun Dec 3 15:15:53 UTC 2006
OK, I found some time to take our buildbot down over the weekend
and debug the problem a little.
I traced the master-slave communication with Wireshark and could see
that it is actually the master that closes the connection.
After inspecting the logs more closely, I found that at the start of
every build the master pings the slave with a 10-second timeout,
and if that ping fails it declares the slave dead. For some reason
Windows buildslaves show very erratic reply times and can hardly
answer that ping within 10 seconds when already under load.
I had to raise the timeout to 5 minutes. It seems that
Windows buildslaves delay replies on the command channel for quite some
time (my impression is that the next reply is only delivered once one
of the build steps already running on that builder completes, or when
no build steps are in progress).
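The behaviour described above can be modelled with a small sketch. This is not Buildbot or Twisted code: it uses plain asyncio, and the names slave_reply, ping and slave_alive are hypothetical. It only illustrates the assumed failure mode, where the slave's reply arrives later than the master's ping timeout and the slave is therefore treated as dead.

```python
import asyncio


async def slave_reply(delay):
    # Hypothetical stand-in for the slave: the ping reply only becomes
    # available after `delay` seconds (e.g. once a running step finishes).
    await asyncio.sleep(delay)
    return "pong"


async def ping(delay, timeout):
    # Master side: wait for the reply, but give up after `timeout` seconds.
    try:
        await asyncio.wait_for(slave_reply(delay), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        # No reply in time: the master would declare the slave dead.
        return False


def slave_alive(reply_delay, timeout):
    # Convenience wrapper so the check can be called from ordinary code.
    return asyncio.run(ping(reply_delay, timeout))
```

With a reply delay longer than the timeout, slave_alive returns False; raising the timeout above the delay makes the same slave count as alive, which matches the effect of going from 10 seconds to 5 minutes.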
Is there any method to inspect slaves and see what they are actually
doing that prevents them from answering pings?
Carl Dionne wrote:
> I'm experiencing a similar problem on windows, but haven't had the time
> to debug it either.
> -----Original Message-----
> From: buildbot-devel-bounces at lists.sourceforge.net
> [mailto:buildbot-devel-bounces at lists.sourceforge.net] On Behalf Of
> Markus Kramer
> Sent: November 29, 2006 8:59 AM
> To: buildbot-devel at lists.sourceforge.net
> Subject: [Buildbot-devel] Win32 Slave Stability Issues?
> I am currently working on implementing a buildbot system for all our
> company's products, which are available for multiple operating systems
> ranging from Windows to different kinds of Unix/Linux and Mac OS X.
> For that I have deployed a Linux-based buildbot master server and
> various client systems, triggering different builds depending on
> checkins to the Subversion repository.
> Most of the platforms are working just fine; however, I am experiencing a
> lot of problems with our WIN32 buildslave. Everything works fine
> for that one as well if buildbot has to do only one build at a time, but
> for some checkins about 5 to 6 builds are triggered on
> the same buildslave. Whenever the WIN32 buildbot starts a third build job
> in parallel (sometimes even the second one) it disconnects for about
> 20 seconds, and when it comes back all columns related
> to that slave are marked with "slave lost" messages. The interrupt message for that
> case is:
> [Failure instance: Traceback (failure with no frames):
> twisted.internet.error.ConnectionLost: Connection to the other side was
> lost in a non-clean fashion.
> Did anybody else experience similar issues with WIN32 slaves? I am
> willing to help with debugging, but have only very little experience in
> programming Python/Twisted. I already changed the Windows system from WIN/XP
> to Server 2003 and to different hardware, but this seems to have no
> Thanks in advance for any help on this,
> Markus Kramer