[Buildbot-devel] Unreliable behaviour of buildslave on Solaris
Al Nikolov
a.nikolov at drweb.com
Wed Apr 29 14:04:58 UTC 2009
http://buildbot.net/trac/ticket/555
> I'm sure there have been many discussions about this issue, but I can't find any tickets related to it here.
> The buildslave on Solaris (Solaris 10 in my case) tends to wrongly detect command timeouts in a substantial number of cases.
> After days of googling, I see that other people have hit this issue and suggest running the buildslave on Solaris with usePTY=0, which supposedly fixes the problem, except that it actually doesn't.
> 2009-04-21 12:36:29+0400 [-] ShellCommand._startCommand
> 2009-04-21 12:36:29+0400 [-] /opt/csw/bin/cvs -d :ext:xxx@xxx:/CVS -z3 checkout -d build -r xxx xxx
> 2009-04-21 12:36:29+0400 [-] in dir /export/home/builder/bbslave/xxx-solaris (timeout 1200 secs)
> 2009-04-21 12:36:29+0400 [-] watching logfiles {}
> 2009-04-21 12:36:29+0400 [-] argv: ['/opt/csw/bin/cvs', '-d', ':ext:xxx@xxx:/CVS', '-z3', 'checkout', '-d', 'build', '-r', 'unix-5_0-branch', 'unified-updater']
> 2009-04-21 12:36:29+0400 [-] environment: {'TERM': 'xterm', 'SHELL': '/usr/bin/bash', 'TZ': 'Europe/Moscow', 'MC_SID': '1679', 'SHLVL': '2', 'SSH_TTY': '/dev/pts/4', 'OLDPWD': '/usr/perl5/5.6.1', 'PWD': '/export/home/builder/bbslave/xxx-solaris', 'HISTCONTROL': 'ignorespace', 'EDITOR': 'mcedit', 'SSH_CLIENT': '192.168.150.1 45300 22', 'CVS_RSH': 'ssh', 'LOGNAME': 'root', 'USER': 'root', 'MC_TMPDIR': '/tmp/mc-root', 'MAIL': '/var/mail//root', 'SSH_CONNECTION': '192.168.150.1 45300 192.168.150.5 22', 'HOME': '/', '_': '/usr/bin/buildbot', 'PATH': '/usr/sbin:/usr/bin:/usr/local/bin:/usr/openwin/bin:/usr/ucb:/opt/csw/bin/:/opt/csw/gcc4/bin/:/opt/csw/i386-pc-solaris2.8/bin:'}
> 2009-04-21 12:36:29+0400 [-] closing stdin
> 2009-04-21 12:36:29+0400 [-] using PTY: False
> 2009-04-21 12:36:43+0400 [-] command timed out: 1200 seconds without output, killing pid 23174
> 2009-04-21 12:36:43+0400 [-] trying os.kill(-pid, 9)
> 2009-04-21 12:36:43+0400 [-] trying process.signalProcess('KILL')
> 2009-04-21 12:36:43+0400 [-] signal KILL sent successfully
> 2009-04-21 12:36:43+0400 [-] command finished with signal 9, exit code None, elapsedTime: 13.902941
>
> As you can see, after only 14 seconds of execution it reports a bogus 20-minute (1200-second) timeout.
Any ideas why this happens? I need a quick workaround.
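For anyone who wants to double-check the numbers, the gap between the command start and the kill can be computed straight from the log timestamps. This is only a rough sketch: the helper name `parse` and the constants are my own illustration, not part of Buildbot, and the values are copied by hand from the excerpt above.

```python
from datetime import datetime

# Timestamps copied by hand from the log excerpt above.
STARTED = "2009-04-21 12:36:29+0400"    # ShellCommand._startCommand
TIMED_OUT = "2009-04-21 12:36:43+0400"  # "command timed out" line
CONFIGURED_TIMEOUT = 1200               # seconds, from "(timeout 1200 secs)"


def parse(stamp):
    # Both stamps carry the same +0400 offset, so it is dropped here
    # and only the first 19 characters (date and time) are parsed.
    return datetime.strptime(stamp[:19], "%Y-%m-%d %H:%M:%S")


elapsed = (parse(TIMED_OUT) - parse(STARTED)).total_seconds()
premature = elapsed < CONFIGURED_TIMEOUT

print("%d s elapsed, timeout is %d s, premature: %s"
      % (elapsed, CONFIGURED_TIMEOUT, premature))
```

The log timestamps only have one-second resolution, so this gives 14 seconds where the slave itself reports elapsedTime 13.902941. Either way it is nowhere near 1200.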