[Buildbot-devel] buildbot slave hangs on SunOS?

Ed Hartnett ed at unidata.ucar.edu
Fri Jan 30 16:00:24 UTC 2004


Brian Warner <warner-buildbot at lothar.com> writes:

> > 	  File "/opt/lib/python2.2/site-packages/twisted/internet/default.py", line 162, in spawnProcess
> > 	    return process.PTYProcess(self, executable, args, env, path, processProtocol, uid, gid, usePTY)
> > 	  File "/opt/lib/python2.2/site-packages/twisted/internet/process.py", line 509, in __init__
> > 	    stderr.flush()
> > 	exceptions.IOError: [Errno 9] Bad file number
> > 	
> > 2004/01/27 09:22 MST [-] Malformed file descriptor found.  Preening lists.
> > 2004/01/27 09:22 MST [-] bad descriptor <twisted.internet.tcp.Client to ('rodney', 8007) at 82a2f74>
> > 2004/01/27 09:22 MST [-] bad descriptor <twisted.internet.tcp.Client to ('rodney', 8007) at 82a2f74>
> 
> Eww. That's weird.
> 
> It looks like stderr.flush() is only called when the child process (after
> fork) hits some other error, like execvpe failing because it couldn't
> find the command or something. There should have been a message written to
> stdout or stderr with the exception. It wouldn't be written to the log, but
> rather to the stdout of the twistd process, which probably goes away when
> twistd daemonizes. The error in .flush() might be related to twistd closing
> fd 1, but then I don't know why the original os.fdopen didn't fail, or why
> you're able to see the log message at all.
> 
> Try running that buildslave in the foreground (twistd -n) and watch the
> stdout/stderr to see if it emits more information.
> 
> If that doesn't help, maybe it's something more esoteric. Does the Twisted
> test suite pass on that system? (From the top of the Twisted source tree, run
> './bin/trial -v twisted.test'; it takes maybe 5 minutes.)
> 
> If so, try editing slavecommand.py and changing 'usePTY = 1' (around line
> 170) to 'usePTY = 0'. That will use regular pipes instead of a PTY for the
> child process. PTYs are one of those funky system-dependent things, and maybe
> Solaris does it just differently enough that it exposes a bug.
> 
> Also, if possible, try running the slave under python2.3 instead of 2.2;
> maybe there's a difference in the behavior of the built-in python libraries
> that are responsible for creating PTYs.
> 
> weird,
>  -Brian
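
As an aside on the "[Errno 9] Bad file number" from stderr.flush() above,
here is a minimal sketch of how a buffered file object can be created and
written to without complaint and only fail at flush() once its descriptor has
been closed underneath it. It is written for present-day Python 3, not the
Python 2.2 and Twisted code in the traceback, and the function name and the
pipe are made up for illustration. It does not claim to be what actually
happened inside process.py.

```python
import errno
import os


def flush_after_descriptor_closed():
    """Return the errno raised by flush() once the fd is closed behind the file object."""
    # Hypothetical stand-in for the child's stderr: the write end of a pipe.
    read_end, write_end = os.pipe()
    stream = os.fdopen(write_end, "w")

    # The write succeeds because the text only lands in a userspace buffer.
    stream.write("message from the child\n")

    # Simulate some other code closing the descriptor (e.g. while daemonizing).
    os.close(write_end)

    try:
        stream.flush()  # first real write to the fd, so this is where it fails
        result = None
    except OSError as exc:
        result = exc.errno
    finally:
        os.close(read_end)
        # Point the same descriptor number at /dev/null so the file object
        # can flush its leftover buffer and close cleanly.
        fd = os.open(os.devnull, os.O_WRONLY)
        if fd != write_end:
            os.dup2(fd, write_end)
            os.close(fd)
        stream.close()
    return result


if __name__ == "__main__":
    print(flush_after_descriptor_closed() == errno.EBADF)
```

The point of the sketch is only that the error surfaces at flush() time
rather than when the file object is created, which is one way an fdopen could
appear to succeed while a later flush() fails.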

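To make the "regular pipes instead of a PTY" distinction concrete, here is a
rough sketch using only the Python 3 standard library (subprocess and pty). It
is not Twisted's spawnProcess and not the code in slavecommand.py; the helper
names and the tiny child command are assumptions for illustration. The child
just reports whether its stdout looks like a terminal.

```python
import os
import pty
import subprocess
import sys

# Hypothetical child command: prints True if stdout is a terminal, else False.
CHILD = [sys.executable, "-c", "import sys; print(sys.stdout.isatty())"]


def run_with_pipes():
    """Run the child with stdout/stderr connected to an ordinary pipe."""
    proc = subprocess.run(CHILD, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    return proc.stdout.decode().strip()


def run_with_pty():
    """Run the child with stdin/stdout/stderr connected to a pseudo-terminal."""
    master, slave = pty.openpty()
    try:
        try:
            proc = subprocess.Popen(CHILD, stdin=slave, stdout=slave, stderr=slave)
        finally:
            os.close(slave)  # the parent only needs the master side
        chunks = []
        while True:
            try:
                data = os.read(master, 1024)
            except OSError:
                break  # some platforms raise EIO once the child side is closed
            if not data:
                break  # others report a plain end-of-file
            chunks.append(data)
        proc.wait()
    finally:
        os.close(master)
    return b"".join(chunks).decode().strip()


if __name__ == "__main__":
    print("pipes:", run_with_pipes())
    print("pty:  ", run_with_pty())
```

The end-of-output handling in run_with_pty() already has to cover two
different platform behaviors, which is a small illustration of the
"system-dependent" point made above.
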
OK, I upgraded to Python 2.3.3 and also reinstalled Twisted and
Buildbot.

Now the slave gets further in the process, but it still hangs. The last
thing it logs is the cvs checkout command:

laraine.unidata.ucar.edu% twistd -n -f buildbot.tap 
2004/01/30 08:57 MST [-] Log opened.
2004/01/30 08:57 MST [-] twistd 1.1.1 (/opt/bin/python 2.3.3) starting up
2004/01/30 08:57 MST [-] reactor class: twisted.internet.default.SelectReactor
2004/01/30 08:57 MST [-] Removing stale pidfile twistd.pid
2004/01/30 08:57 MST [-] Loading buildbot.tap...
2004/01/30 08:57 MST [-] Loaded.
2004/01/30 08:57 MST [-] set uid/gid 4178/2000
2004/01/30 08:57 MST [-] Starting factory <buildbot.bot.BotFactory instance at 0x2e07d8>
2004/01/30 08:57 MST [Broker,client] message from master: attached
2004/01/30 08:57 MST [Broker,client] setBuilderList [('laraine_tip', 'laraine_tip'), ('laraine_3_5_1', 'laraine_3_5_1')]
2004/01/30 08:57 MST [Broker,client] builder 'laraine_tip' message from master: attached
2004/01/30 08:57 MST [Broker,client] builder 'laraine_3_5_1' message from master: attached
2004/01/30 08:58 MST [Broker,client] startBuild
2004/01/30 08:58 MST [Broker,client]  startCommand:cvs [id 6 325182]
2004/01/30 08:58 MST [Broker,client]   command 'rm -rf /home/ed/BuildBot/laraine/laraine_tip/build' in dir /home/ed/BuildBot/laraine/laraine_tip/. [None]
2004/01/30 08:58 MST [-] command finished with signal None, exit code 0
2004/01/30 08:58 MST [-]   command 'cvs -d /upc/share/CVS -z3 checkout -r HEAD -d build netcdf-3' in dir /home/ed/BuildBot/laraine/laraine_tip/. [None]




