[Buildbot-devel] Perplexing issue on Windows

Mark Roddy markroddy at gmail.com
Thu Jul 31 21:13:57 UTC 2008


Oh I didn't realize.  In that case I can only blindly guess:
* Check the hanging process and make sure it isn't doing something that
keeps it from handling the KILL before the timeout.
* Run the buildbot tests to make sure everything passes on your system.
* There could be a problem with Cygwin Python on x64.  You might want
to try running without Cygwin and see if you have the same problems.
I recommend UnxUtils if you need the bash environment for your build.
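
For the "Permission denied" failures in the traceback quoted below, it
may help to reproduce the clobber step outside buildbot and list every
path that refuses to go away, not just the first one.  Here is a rough
sketch, not buildbot's own code: find_undeletable is a hypothetical
helper name, it is written in current Python syntax (it would need
"except OSError, exc" on 2.4), and it assumes you point it at a copy of
the build directory that you are happy to lose.

```python
import os
import stat


def find_undeletable(root):
    """Try to delete everything under root.

    Returns a list of (path, error) pairs for entries that could not be
    removed, so locked files and directories can be inspected one by one.
    """
    failures = []
    # Walk bottom-up so directories are emptied before we try to remove them.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Make the file writable first; a read-only file is one
                # common reason a delete fails on Windows.
                os.chmod(path, stat.S_IWRITE)
                os.remove(path)
            except OSError as exc:
                failures.append((path, exc))
        for name in dirnames:
            path = os.path.join(dirpath, name)
            try:
                os.rmdir(path)
            except OSError as exc:
                failures.append((path, exc))
    try:
        os.rmdir(root)
    except OSError as exc:
        failures.append((root, exc))
    return failures
```

Anything it returns is a path worth looking up in procexp to see which
process still holds it.  An empty list means the tree was removed
cleanly, which would point back at the hung build process rather than
at file permissions.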

Good luck!

-Mark

On 7/31/08, Daniel <e_list1 at earthlink.net> wrote:
> On Jul 31, 2008, at 16:38, Mark Roddy wrote:
>
> > On 7/31/08, Daniel <e_list1 at earthlink.net> wrote:
> >
> > > On Jul 30, 2008, at 15:19, Daniel wrote:
> > >
> > > > Greetings, all.
> > > >
> > > > I have a new build slave that I have never been able to get to work.
> > > > I am moving a buildbot slave from one Windows XP 64-bit system to
> > > > another, and the new slave locks up every time it tries to build
> > > > something.  Specifically what happens is: one build might work (or it
> > > > might not), but subsequent builds always (yes, always) fail.  There
> > > > are 2 builds on this slave, and both fail with the same two problems.
> > > >
> > > > The first time they fail it is always due to a 1200-second timeout,
> > > > with an error that buildbot could not kill processes.  Every
> > > > subsequent failure (that is, every failure prior to my restarting the
> > > > buildbot slave on that system) is an OS error that a specific file
> > > > could not be removed.  If I check with procexp, I can see that the
> > > > named file is still in use by a running shell process.  However, even if
> > > > I kill that process, the build does not complete.  The only solution I
> > > > have found is to quit Cygwin, which sometimes takes a reboot, since at
> > > > that point Cygwin is usually hung and often refuses to die.
> > > >
> > > > The builds still (and always) run fine on the original slave system.
> > > > All of my other build slaves (Linux and OS X) are without issue.
> > > >
> > > > I was running 0.7.7 but have upgraded to 0.7.8 and I see the same
> > > > results on that version.  I am running buildbot in Cygwin on XP
> > > > 64-bit.
> > > >
> > > > I would appreciate any guidance you can provide on how to figure out
> > > > what's going on here, and how to get past it.
> > > >
> > > > Thanks!
> > > >
> > > > Daniel
> > > >
> > >
> > >
> > > Anyone have any ideas on this one?  Since I posted last, I have
> > > completely wiped and reinstalled Buildbot, Python (2.4.4), the Python
> > > Win32 extensions, and Twisted.  Results have not changed.
> > >
> > > The first error is:
> > >
> > > > command timed out: 1200 seconds without output, killing pid 2720
> > > > SIGKILL failed to kill process
> > > > using fake rc=-1
> > > > program finished with exit code -1
> > > >
> > > > remoteFailed: [Failure instance: Traceback from remote host --
> > > > Traceback (most recent call last):
> > > > Failure: buildbot.slave.commands.TimeoutError: SIGKILL failed to kill process
> > > > ]
> > > >
> > >
> > > Subsequent errors are:
> > >
> > > > remoteFailed: [Failure instance: Traceback from remote host --
> > > > Traceback (most recent call last):
> > > >   File "C:\PYTHON24\Lib\site-packages\buildbot\slave\commands.py", line 1468, in _didLogin
> > > >     return SourceBase.start(self)
> > > >   File "C:\PYTHON24\Lib\site-packages\buildbot\slave\commands.py", line 1216, in start
> > > >     d.addCallback(self.doClobber, self.workdir)
> > > >   File "C:\Python24\Lib\site-packages\twisted\internet\defer.py", line 195, in addCallback
> > > >     callbackKeywords=kw)
> > > >   File "C:\Python24\Lib\site-packages\twisted\internet\defer.py", line 186, in addCallbacks
> > > >     self._runCallbacks()
> > > > --- <exception caught here> ---
> > > >   File "C:\Python24\Lib\site-packages\twisted\internet\defer.py", line 328, in _runCallbacks
> > > >     self.result = callback(self.result, *args, **kw)
> > > >   File "C:\PYTHON24\Lib\site-packages\buildbot\slave\commands.py", line 1374, in doClobber
> > > >     rmdirRecursive(d)
> > > >   File "C:\PYTHON24\Lib\site-packages\buildbot\slave\commands.py", line 90, in rmdirRecursive
> > > >     os.rmdir(dir)
> > > > exceptions.OSError: [Errno 13] Permission denied: 'c:\\buildbot\\sanity-90-xp64\\config3'
> > > > ]
> > > >
> > >
> > > As I mentioned, I can see that the file config3 is held open by a
> > > process, but even if I kill that process the builds will not complete.
> > >
> > > > And, actually, now the builds never succeed - they always fail with
> > > these errors.
> > >
> > > I sincerely appreciate any assistance you can provide.
> > >
> > >
> > > Thanks.
> > >
> > > Daniel
> > >
> > >
> >
> > I seem to remember something similar the first time I set up a build
> > slave on xp.  What happened was that I had the service set to run as
> > some user, but started the slave myself to make sure it would work.
> > Then when I ran it as a service it died as all the files created were
> > owned by my user name and not the user the service was running as.
> > Not sure if this is your issue or not, but I thought I'd share just
> > in case.
> >
> > -Mark
> >
>
>  Mark,
>
>  Thanks - I should have mentioned that I am running this from the command
> line - same way that I was running it on the old system.  I'm not using the
> buildbot service on either system.
>
>  Daniel
>



