[Buildbot-devel] Defunct processes in buildslave after ShellCommand completion

Justin Mason jm at jmason.org
Sat Dec 30 15:06:23 UTC 2006


For what it's worth, we've run into what seems to be another problem on
the SpamAssassin buildbots (there are two) in our Solaris zone: a slave
will hang indefinitely.  We haven't figured it out yet -- instead we just
restart the buildbot infrastructure nightly to work around it.

By the way, are there any plans to persist build history (status, log
files, etc.) across restarts, so that the old history and statuses are
still available via the HTTP UI afterwards?  Right now, after a restart,
it's all lost.

--j.

John Pye writes:
> A followup on this problem from a week ago:
> 
> I found that I needed to set usepty=0 in the buildbot.tac file for the
> Solaris buildslave. This fixed my problem of commands timing out / not
> returning an exit code.
> 
> Found here -- obviously not a new problem:
> http://agiletesting.blogspot.com/2006/03/running-buildbot-on-various-platforms.html
> 
> Cheers
> JP
> 
> John Pye wrote:
> > Hi there
> >
> > I've just seen this problem as well. I wonder if there might be a fix
> > that doesn't involve upgrading Python?
> >
> > I'm thinking it could be a problem with the 'popen' routines in Python,
> > which were improved a lot with the release of 2.4.
> >
> > Cheers
> > JP
> >
> > On Thu, 16 Nov 2006 09:12:35 -0500, "Mitch Oliver"
> > <mitch.oliver at gmail.com> said:
> >   
> >> I wanted to reply to this as I found a solution.  Upgrading to Python
> >> 2.5 has resolved the issue.  It appears that the issue itself is in
> >> Twisted and not buildbot.
> >>
> >> On 11/10/06, Mitch Oliver <mitch.oliver at gmail.com> wrote:
> >>     
> >>> After a ShellCommand completes I frequently end up with defunct
> >>> processes in my process list, and the commands always end up with the
> >>> following exception in my buildmaster waterfall:
> >>>   command timed out: 1200 seconds without output, killing pid [pid]
> >>>   SIGKILL failed to kill process
> >>>   using fake rc=-1
> >>>   program finished with exit code -1
> >>>
> >>>   remoteFailed: [Failure instance: Traceback from remote host --
> >>> Traceback (most recent call last):
> >>>   Failure: buildbot.slave.commands.TimeoutError: SIGKILL failed to kill process
> >>>   ]
> >>>
> >>> This occurs using buildbot 0.7.4 in Python 2.3.3 with Twisted 2.4.0 on
> >>> Solaris 10 (Sparc).
> >>>
> >>> Has anyone else run into this problem?  I saw a message in the
> >>> archives about something similar on NetBSD, but the thread seems to
> >>> have died.
> >>>
> >>> Thanks,
> >>> Mitch Oliver
> >>>
> >>>       
> >> -------------------------------------------------------------------------
> >> Take Surveys. Earn Cash. Influence the Future of IT
> >> Join SourceForge.net's Techsay panel and you'll get the chance to share
> >> your
> >> opinions on IT & business topics through brief surveys - and earn cash
> >> http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
> >> _______________________________________________
> >> Buildbot-devel mailing list
> >> Buildbot-devel at lists.sourceforge.net
> >> https://lists.sourceforge.net/lists/listinfo/buildbot-devel
> >>     
> 
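
For anyone finding this thread later, the usepty change John Pye
describes comes down to one assignment in the slave's buildbot.tac. The
excerpt below is only a sketch of that edit. The variable name is the
one quoted in the thread, and the comments are my own reading of what
the setting does. Edit the file your slave already has rather than
copying this.

```python
# buildbot.tac on the Solaris buildslave (excerpt, rest of file unchanged)

# 0 = run build commands over plain pipes instead of a pseudo-terminal.
# The thread reports this stopped commands from timing out and failing
# to return an exit code on Solaris.
usepty = 0
```

The slave has to be restarted to pick up the change.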

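The "seconds without output, killing pid" and defunct-process symptoms
in Mitch's report can be pictured with a small standalone sketch. This
is not buildbot's or Twisted's code. It assumes a POSIX system and a
modern Python 3 (not the 2.3 from the thread), and the function name
run_with_idle_timeout is made up for illustration. It runs a command
over pipes (no pty), kills it if it stays silent too long, and then
reaps it, since an exited child that is never waited on is what shows
up as "defunct" in the process list.

```python
import os
import select
import signal
import subprocess
import sys


def run_with_idle_timeout(argv, idle_timeout):
    """Run argv over pipes; SIGKILL it after idle_timeout silent seconds.

    Returns (exit_status, output_bytes, timed_out).
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    fd = proc.stdout.fileno()
    chunks = []
    timed_out = False
    while True:
        # Wait for output, but no longer than the idle window.
        ready, _, _ = select.select([fd], [], [], idle_timeout)
        if not ready:
            timed_out = True
            proc.kill()  # sends SIGKILL on POSIX
            break
        data = os.read(fd, 4096)
        if not data:  # EOF: the child closed its end of the pipe
            break
        chunks.append(data)
    # wait() collects the exit status; skipping it would leave the
    # finished child behind as a defunct (zombie) process.
    status = proc.wait()
    proc.stdout.close()
    return status, b"".join(chunks), timed_out


if __name__ == "__main__":
    # A child that prints nothing, so the idle timeout fires.
    quiet = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_idle_timeout(quiet, idle_timeout=1.0))
```

On POSIX a child killed by a signal reports a negative exit status, so
a timed-out command here is distinguishable from one that failed on its
own.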