[Buildbot-devel] Re: buildbot hangs...
warner at lothar.com
Sat Jul 10 23:44:31 UTC 2004
[forwarded to the mailing list for the benefit of others]
> Using the latest version 0.4.3, the following appears in the updating
> step, and then it just hangs...
> cvs operation
> command '['rm', '-rf', '/home/ed/BuildBot/lenny/lenny_tip/build']' in dir /home/ed/BuildBot/lenny/lenny_tip/. [None]
> This only happens on one system, a SunOS...
> Any thoughts?
I've seen problems on NFS-mounted filesystems under SunOS where a delete would
stall if any process still had a file open inside the directory being removed.
Usually this was a test case that got stuck (and then orphaned somehow);
sometimes it was a daemon process launched from the working directory that was
later blown away. The program didn't even need to open a file: the "current
directory" reference (which every process holds from the moment it starts)
seemed to be enough to prevent the delete from completing.
This was the reason I started running the test suites in a PTY, as it made it
easier to automatically kill off all the descendants of the test program when
they hung.
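As a rough illustration of the idea (not Buildbot's actual code), here is a
minimal Python sketch that runs a command with a PTY as its stdio and in its
own session, so that a hung run can be killed together with everything it
spawned. The function name run_in_pty and the timeout parameter are made up
for this example, and it relies on an explicit process-group kill rather than
on any terminal hangup behaviour.

```python
import os
import pty
import signal
import subprocess

def run_in_pty(argv, timeout):
    """Run argv on a fresh PTY; on timeout, kill it and its descendants.

    Hypothetical helper for illustration only. Output written to the PTY is
    not read here; a real runner would drain `master` while waiting.
    """
    master, slave = pty.openpty()
    # start_new_session=True calls setsid() in the child, so the child leads
    # a new process group whose id equals its pid.
    proc = subprocess.Popen(argv, stdin=slave, stdout=slave, stderr=slave,
                            start_new_session=True)
    os.close(slave)  # the child keeps its own copy of the slave end
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Signal the whole group, which includes children the test spawned
        # (unless they moved themselves into another group or session).
        os.killpg(proc.pid, signal.SIGKILL)
        return proc.wait()
    finally:
        os.close(master)
```

A call such as run_in_pty(["make", "check"], timeout=600) would then either
return the exit status or, after ten minutes, kill the whole group so nothing
is left holding the build directory open.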
The other workaround was to move the about-to-be-deleted directory to a
(unique) temporary name, then spawn a 'rm -rf .del-tmp-2 &' and forget about
it. If it hangs, the next build will create a new temp name, and that one can
hang too, etc. Eventually you do a 'ps' and notice the lingering processes,
kill them, and *foom* all the leftover directories finish deleting.
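The rename-then-delete workaround could look roughly like the Python sketch
below. It is only an approximation of what I described: the helper name
delete_in_background is invented, and it uses tempfile.mkdtemp to reserve the
unique ".del-tmp-" name instead of a counter.

```python
import os
import subprocess
import tempfile

def delete_in_background(path):
    """Move `path` aside under a unique name, then remove it without waiting.

    Hypothetical helper for illustration only. Returns the temporary holder
    directory and the Popen object for the background rm.
    """
    path = os.path.abspath(path)
    parent = os.path.dirname(path)
    # Create the holder next to the original so the rename stays on the same
    # filesystem and does not turn into a copy.
    holder = tempfile.mkdtemp(prefix=".del-tmp-", dir=parent)
    os.rename(path, os.path.join(holder, os.path.basename(path)))
    # Fire and forget: if rm stalls on NFS, the caller is not blocked and the
    # original name is already free for the next build.
    proc = subprocess.Popen(["rm", "-rf", holder],
                            stdin=subprocess.DEVNULL,
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL,
                            start_new_session=True)
    return holder, proc
```

The build step can recreate the original directory right after the call
returns; any rm that hangs just sits there until the process holding the
directory is killed.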
You might be able to ask SunOS to not behave this way (maybe a sysctl knob
somewhere), but I wouldn't even know where to begin to look. Running the
tests on a local (non-NFS) disk might help.
Hope that helps,