[Buildbot-devel] Perplexing issue on Windows

Daniel e_list1 at earthlink.net
Thu Jul 31 20:22:39 UTC 2008

On Jul 30, 2008, at 15:19, Daniel wrote:
> Greetings, all.
> I have a new build slave that I have never been able to get to work.
> I am moving a buildbot slave from one Windows XP 64-bit system to
> another, and the new slave locks up every time it tries to build
> something.  Specifically what happens is: one build might work (or it
> might not), but subsequent builds always (yes, always) fail.  There
> are 2 builds on this slave, and both fail with the same two problems.
> The first time they fail it is always due to a 1200-second timeout,
> with an error that buildbot could not kill processes.  Every
> subsequent failure (that is, every failure prior to my restarting the
> buildbot slave on that system) is an OS error that a specific file
> could not be removed.  If I check with procexp, I can see that the
> named file is still in use by a running shell process.  However, even if
> I kill that process, the build does not complete.  The only solution I
> have found is to quit Cygwin, which sometimes takes a reboot, since at
> that point Cygwin is usually hung and often refuses to die.
> The builds still (and always) run fine on the original slave system.
> All of my other build slaves (Linux and OS X) are without issue.
> I was running 0.7.7 but have upgraded to 0.7.8 and I see the same
> results on that version.  I am running buildbot in Cygwin on XP
> 64-bit.
> I would appreciate any guidance you can provide on how to figure out
> what's going on here, and how to get past it.
> Thanks!
> Daniel

Anyone have any ideas on this one?  Since I posted last, I have  
completely wiped and reinstalled Buildbot, Python (2.4.4), the Python  
Win32 extensions, and Twisted.  Results have not changed.

The first error is:
> command timed out: 1200 seconds without output, killing pid 2720
> SIGKILL failed to kill process
> using fake rc=-1
> program finished with exit code -1
> remoteFailed: [Failure instance: Traceback from remote host --  
> Traceback (most recent call last):
> Failure: buildbot.slave.commands.TimeoutError: SIGKILL failed to kill process
> ]

Subsequent errors are:
> remoteFailed: [Failure instance: Traceback from remote host --  
> Traceback (most recent call last):
>   File "C:\PYTHON24\Lib\site-packages\buildbot\slave\commands.py", line 1468, in _didLogin
>     return SourceBase.start(self)
>   File "C:\PYTHON24\Lib\site-packages\buildbot\slave\commands.py", line 1216, in start
>     d.addCallback(self.doClobber, self.workdir)
>   File "C:\Python24\Lib\site-packages\twisted\internet\defer.py", line 195, in addCallback
>     callbackKeywords=kw)
>   File "C:\Python24\Lib\site-packages\twisted\internet\defer.py", line 186, in addCallbacks
>     self._runCallbacks()
> --- <exception caught here> ---
>   File "C:\Python24\Lib\site-packages\twisted\internet\defer.py", line 328, in _runCallbacks
>     self.result = callback(self.result, *args, **kw)
>   File "C:\PYTHON24\Lib\site-packages\buildbot\slave\commands.py", line 1374, in doClobber
>     rmdirRecursive(d)
>   File "C:\PYTHON24\Lib\site-packages\buildbot\slave\commands.py", line 90, in rmdirRecursive
>     os.rmdir(dir)
> exceptions.OSError: [Errno 13] Permission denied: 'c:\\buildbot\\sanity-90-xp64\\config3'
> ]

As I mentioned, I can see that the file config3 is held open by a  
process, but even if I kill that process the builds will not complete.
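
To narrow down which paths are actually blocking the clobber, something
like the following could be run by hand on the slave, outside of
Buildbot.  This is only a rough sketch: find_undeletable is a
hypothetical helper of my own naming, not part of Buildbot, and it is
written for a current Python rather than 2.4.4.  It really does delete
whatever it can, so it should only be pointed at a disposable workdir.

```python
import os


def find_undeletable(root):
    """Delete everything under root, bottom-up, and report what refused.

    Returns a list of (path, error) pairs for every file or directory
    that could not be removed.  Hypothetical diagnostic helper; it is
    not Buildbot's rmdirRecursive.
    """
    failures = []
    # topdown=False visits the deepest directories first, so each
    # directory is normally empty by the time we try to remove it.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                os.remove(path)
            except OSError as exc:
                # e.g. [Errno 13] Permission denied on a file in use
                failures.append((path, exc))
        try:
            os.rmdir(dirpath)
        except OSError as exc:
            # Fails if the directory is held open or still has contents.
            failures.append((dirpath, exc))
    return failures
```

Calling find_undeletable on the build directory from the traceback
above and printing each returned pair should show whether config3 is
the only thing still locked, or whether other paths are stuck as well.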

And, actually, now the builds never succeed - they always fail with
these errors.

I sincerely appreciate any assistance you can provide.



