[Buildbot-devel] Unreliable behaviour of buildslave on Solaris

Al Nikolov a.nikolov at drweb.com
Wed Apr 29 14:04:58 UTC 2009


http://buildbot.net/trac/ticket/555

> I'm sure there have been many discussions about this issue, but I can't find any tickets related to it here. 
>  Buildslave on Solaris (Solaris 10 in my case) wrongly detects command timeouts in a substantial number of cases. 
>  After days of googling, I see that other people have suffered from this issue and suggest running buildslave on Solaris with usePTY=0, which is said to fix the problem mysteriously, except that it actually doesn't: the log below was produced with the PTY disabled. 
> 2009-04-21 12:36:29+0400 [-] ShellCommand._startCommand
> 2009-04-21 12:36:29+0400 [-]  /opt/csw/bin/cvs -d :ext:xxx at xxx:/CVS -z3 checkout -d build -r xxx xxx
> 2009-04-21 12:36:29+0400 [-]   in dir /export/home/builder/bbslave/xxx-solaris (timeout 1200 secs)
> 2009-04-21 12:36:29+0400 [-]   watching logfiles {}
> 2009-04-21 12:36:29+0400 [-]   argv: ['/opt/csw/bin/cvs', '-d', ':ext:xxx at xxx:/CVS', '-z3', 'checkout', '-d', 'build', '-r', 'unix-5_0-branch', 'unified-updater']
> 2009-04-21 12:36:29+0400 [-]  environment: {'TERM': 'xterm', 'SHELL': '/usr/bin/bash', 'TZ': 'Europe/Moscow', 'MC_SID': '1679', 'SHLVL': '2', 'SSH_TTY': '/dev/pts/4', 'OLDPWD': '/usr/perl5/5.6.1', 'PWD': '/export/home/builder/bbslave/xxx-solaris', 'HISTCONTROL': 'ignorespace', 'EDITOR': 'mcedit', 'SSH_CLIENT': '192.168.150.1 45300 22', 'CVS_RSH': 'ssh', 'LOGNAME': 'root', 'USER': 'root', 'MC_TMPDIR': '/tmp/mc-root', 'MAIL': '/var/mail//root', 'SSH_CONNECTION': '192.168.150.1 45300 192.168.150.5 22', 'HOME': '/', '_': '/usr/bin/buildbot', 'PATH': '/usr/sbin:/usr/bin:/usr/local/bin:/usr/openwin/bin:/usr/ucb:/opt/csw/bin/:/opt/csw/gcc4/bin/:/opt/csw/i386-pc-solaris2.8/bin:'}
> 2009-04-21 12:36:29+0400 [-]   closing stdin
> 2009-04-21 12:36:29+0400 [-]   using PTY: False
> 2009-04-21 12:36:43+0400 [-] command timed out: 1200 seconds without output, killing pid 23174
> 2009-04-21 12:36:43+0400 [-] trying os.kill(-pid, 9)
> 2009-04-21 12:36:43+0400 [-] trying process.signalProcess('KILL')
> 2009-04-21 12:36:43+0400 [-]  signal KILL sent successfully
> 2009-04-21 12:36:43+0400 [-] command finished with signal 9, exit code None, elapsedTime: 13.902941
> 
>  As you can see, after only 14 seconds of execution it reports a timeout of 1200 seconds (20 minutes) without output. 
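To make the contradiction concrete, here is a minimal sketch of what "1200 seconds without output" is supposed to mean: a watchdog whose deadline is pushed back every time output arrives. This is not buildslave's real implementation; the class `InactivityWatchdog`, its methods, and the `FakeClock` helper are hypothetical names used only for illustration. Under these semantics, a command that has run for 14 seconds in total cannot yet have been silent for 1200 seconds, so the kill in the log is premature.

```python
import time
from typing import Callable


class InactivityWatchdog:
    """Hypothetical model of a "N seconds without output" timeout.

    Not buildslave's actual code: it only models the intended semantics,
    where each chunk of output resets the inactivity window.
    """

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_output = clock()  # the window starts when the command starts

    def saw_output(self) -> None:
        # Any stdout/stderr chunk pushes the deadline back.
        self.last_output = self.clock()

    def silent_for(self) -> float:
        return self.clock() - self.last_output

    def expired(self) -> bool:
        # Killing is only justified after a full `timeout` of silence.
        return self.silent_for() >= self.timeout


class FakeClock:
    """Hypothetical controllable clock so the behavior can be checked deterministically."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


if __name__ == "__main__":
    clock = FakeClock()
    wd = InactivityWatchdog(timeout=1200, clock=clock)
    clock.now = 13.9       # roughly when the log above reports the timeout
    print(wd.expired())    # False: only ~14 s have passed
    clock.now = 1300.0
    print(wd.expired())    # True: 1300 s of silence >= 1200 s
```

Reading time through an injectable `clock` is only a convenience for testing the sketch; it says nothing about how the real slave measures time.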

Any ideas why this happens? I need a quick workaround.
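To measure how often this happens before trying workarounds, the slave log can be scanned for timeouts that fired too early. The sketch below is a hypothetical helper (`find_premature_timeouts` is not part of buildbot); its two regular expressions are copied from the log lines quoted above and assume that format does not vary. The check relies on one observation: `elapsedTime` is the command's total run time, so if it is smaller than the timeout, the command cannot have been silent for the full timeout and the kill was premature.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Patterns taken from the buildslave log lines quoted in this thread.
TIMEOUT_RE = re.compile(r"command timed out: (\d+) seconds without output")
FINISHED_RE = re.compile(r"command finished with .*elapsedTime: ([\d.]+)")


def find_premature_timeouts(lines: Iterable[str]) -> List[Tuple[int, float]]:
    """Return (timeout, elapsed) pairs where a timeout fired before `timeout` seconds."""
    results = []
    pending = None  # timeout value from the most recent "timed out" line
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            pending = int(m.group(1))
            continue
        m = FINISHED_RE.search(line)
        if m and pending is not None:
            elapsed = float(m.group(1))
            # Total run time below the timeout means the silence window
            # could never have been reached.
            if elapsed < pending:
                results.append((pending, elapsed))
            pending = None
    return results


if __name__ == "__main__":
    # Pass one or more slave log files on the command line; no args does nothing.
    for path in sys.argv[1:]:
        with open(path) as f:
            for timeout, elapsed in find_premature_timeouts(f):
                print(f"{path}: {timeout}s timeout fired after only {elapsed:.1f}s")
```

Running it over the slave's log files gives a count of premature kills, which is more useful in a bug report than "a substantial number of cases".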