[Buildbot-commits] [Buildbot] #2147: buildslave losing connection to buildmaster

Buildbot nobody at buildbot.net
Thu Nov 17 05:58:06 UTC 2011


#2147: buildslave losing connection to buildmaster
-------------------------+--------------------
Reporter:  mariamarcano  |       Owner:
    Type:  defect        |      Status:  new
Priority:  major         |   Milestone:  0.8.3
 Version:  0.8.3         |  Resolution:
Keywords:                |
-------------------------+--------------------
Changes (by dustin):

 * keywords:  buildslave losing connection to buildmaster =>
 * type:  undecided => defect
 * milestone:  undecided => 0.8.3


Old description:

> I Started getting the following issue when running the builds:
> this is what i see on the buildstep / buildbot waterfall:
>
> remoteFailed: [Failure instance: Traceback (failure with no frames):
> <class 'twisted.internet.error.ConnectionLost'>: Connection to the other
> side was lost in a non-clean fashion.
> ]
>
> builds were running fine, maybe related to adding more slaves to the
> master?.
>
> Currently using Slave:
>
> C:\Python27\Scripts>buildslave --version
>
> Buildslave version: 0.8.3
>
> Twisted version: 10.2.0
>
> Master:
>
> c:\Python27\Scripts>buildbot --version
>
> Buildbot version: 0.8.3p1
>
> Twisted version: 10.2.0
>

> This is what I see on the slave twistd.log:
>
> 2011-11-15 10:36:35-0800 [Broker,client] RunProcess._startCommand
>
> 2011-11-15 10:36:35-0800 [Broker,client]  'xunit.console.exe'
> 'XXTests.dll' '/html' 'xunit-output.html'
>
> 2011-11-15 10:36:35-0800 [Broker,client]   in dir
> C:\Source\xx\AutomatedTests\XXTests\bin\Release (timeout 21600 secs)
>
> 2011-11-15 10:36:35-0800 [Broker,client]   watching logfiles {'xunit
> log': 'C:\\temp\\XX_log.txt', 'xunit-output.html': 'xunit-output.html'}
>
> 2011-11-15 10:36:35-0800 [Broker,client]   argv: ['xunit.console.exe',
> 'XXTests.dll', '/html', 'xunit-output.html']
>
> 2011-11-15 10:36:35-0800 [Broker,client]  environment: {..}
>
> 2011-11-15 10:36:35-0800 [Broker,client]   closing stdin
>
> 2011-11-15 10:36:35-0800 [Broker,client]   using PTY: False
>
> 2011-11-15 10:46:24-0800 [-] sending app-level keepalive
>
> 2011-11-15 10:56:24-0800 [-] sending app-level keepalive
>
> 2011-11-15 11:06:24-0800 [-] sending app-level keepalive
>
> 2011-11-15 11:16:24-0800 [-] sending app-level keepalive
>
> 2011-11-15 11:26:24-0800 [-] sending app-level keepalive
>
> 2011-11-15 11:36:24-0800 [-] sending app-level keepalive
>
> 2011-11-15 11:46:24-0800 [-] sending app-level keepalive
>
> 2011-11-15 11:56:24-0800 [-] sending app-level keepalive
>
> 2011-11-15 12:06:24-0800 [-] sending app-level keepalive
>
> 2011-11-15 12:06:54-0800 [-] BotFactory.checkActivity: nothing from
> master for 630 secs
>
> 2011-11-15 12:06:54-0800 [Broker,client] BotFactory.keepaliveLost
>
> 2011-11-15 12:06:54-0800 [Broker,client] lost remote
>
> 2011-11-15 12:06:54-0800 [Broker,client] lost remote step
>
> 2011-11-15 12:06:54-0800 [Broker,client] stopCommand: halting current
> command <buildslave.commands.shell.SlaveShellCommand instance at
> 0x01BF2670>
>
> 2011-11-15 12:06:54-0800 [Broker,client] command interrupted, killing pid
> 2624
>
> 2011-11-15 12:06:54-0800 [Broker,client] trying
> process.signalProcess('KILL')
>
> 2011-11-15 12:06:54-0800 [Broker,client]  signal KILL sent successfully
>
> 2011-11-15 12:06:54-0800 [Broker,client] Lost connection to
> buildbot.app.com:9989
>
> 2011-11-15 12:06:54-0800 [Broker,client] <twisted.internet.tcp.Connector
> instance at 0x010D3328> will retry in 1 seconds
>
> 2011-11-15 12:06:54-0800 [Broker,client] Stopping factory
> <buildslave.bot.BotFactory instance at 0x011EF9E0>
>
> 2011-11-15 12:06:56-0800 [-] Starting factory <buildslave.bot.BotFactory
> instance at 0x011EF9E0>
>
> 2011-11-15 12:06:56-0800 [-] Connecting to buildbot.app.com:9989
>
> 2011-11-15 12:06:59-0800 [-] we tried to kill the process, and it
> wouldn't die.. finish anyway
>
> 2011-11-15 12:06:59-0800 [-] RunProcess.failed: command failed: SIGKILL
> failed to kill process
>
> 2011-11-15 12:06:59-0800 [-] SlaveBuilder.commandFailed None
>
> 2011-11-15 12:06:59-0800 [-] Unhandled Error
>         Traceback (most recent call last):
>         Failure: exceptions.RuntimeError: SIGKILL failed to kill process
>
> 2011-11-15 12:07:29-0800 [Broker,client] message from master: attached
>
> 2011-11-15 12:07:29-0800 [Broker,client] SlaveBuilder.remote_print(rc38
> -app-selenium-tests): message from master: attached
>
> 2011-11-15 12:07:29-0800 [Broker,client] Connected to
> buildbot.app.com:9989; slave is ready
>
> 2011-11-15 12:07:29-0800 [Broker,client] sending application-level
> keepalives every 600 seconds
>
> 2011-11-15 12:16:59-0800 [-] sending app-level keepalive
>
> 2011-11-15 12:26:59-0800 [-] sending app-level keepalive
>
> 2011-11-15 12:34:39-0800 [-] command finished with signal None, exit code
> 1, elapsedTime: 7084.500000
>
> 2011-11-15 12:34:39-0800 [-] Hey, command <RunProcess
> '['xunit.console.exe', 'XXTests.dll', '/html', 'xunit-output.html']'>
> finished twice
>
> 2011-11-15 12:36:59-0800 [-] sending app-level keepalive
>
> 2011-11-15 12:46:59-0800 [-] sending app-level keepalive

New description:

 I started getting the following issue when running builds. This is what I
 see on the build step in the Buildbot waterfall:

 {{{
 remoteFailed: [Failure instance: Traceback (failure with no frames):
 <class 'twisted.internet.error.ConnectionLost'>: Connection to the other
 side was lost in a non-clean fashion.
 ]
 }}}

 Builds were running fine before this. Could it be related to adding more
 slaves to the master?

 Versions currently in use on the slave:

 {{{
 C:\Python27\Scripts>buildslave --version
 Buildslave version: 0.8.3
 Twisted version: 10.2.0
 }}}

 Master:

 {{{
 c:\Python27\Scripts>buildbot --version
 Buildbot version: 0.8.3p1
 Twisted version: 10.2.0
 }}}

 This is what I see in the slave's twistd.log:

 {{{
 2011-11-15 10:36:35-0800 [Broker,client] RunProcess._startCommand

 2011-11-15 10:36:35-0800 [Broker,client]  'xunit.console.exe'
 'XXTests.dll' '/html' 'xunit-output.html'

 2011-11-15 10:36:35-0800 [Broker,client]   in dir
 C:\Source\xx\AutomatedTests\XXTests\bin\Release (timeout 21600 secs)

 2011-11-15 10:36:35-0800 [Broker,client]   watching logfiles {'xunit log':
 'C:\\temp\\XX_log.txt', 'xunit-output.html': 'xunit-output.html'}

 2011-11-15 10:36:35-0800 [Broker,client]   argv: ['xunit.console.exe',
 'XXTests.dll', '/html', 'xunit-output.html']

 2011-11-15 10:36:35-0800 [Broker,client]  environment: {..}

 2011-11-15 10:36:35-0800 [Broker,client]   closing stdin

 2011-11-15 10:36:35-0800 [Broker,client]   using PTY: False

 2011-11-15 10:46:24-0800 [-] sending app-level keepalive

 2011-11-15 10:56:24-0800 [-] sending app-level keepalive

 2011-11-15 11:06:24-0800 [-] sending app-level keepalive

 2011-11-15 11:16:24-0800 [-] sending app-level keepalive

 2011-11-15 11:26:24-0800 [-] sending app-level keepalive

 2011-11-15 11:36:24-0800 [-] sending app-level keepalive

 2011-11-15 11:46:24-0800 [-] sending app-level keepalive

 2011-11-15 11:56:24-0800 [-] sending app-level keepalive

 2011-11-15 12:06:24-0800 [-] sending app-level keepalive

 2011-11-15 12:06:54-0800 [-] BotFactory.checkActivity: nothing from master
 for 630 secs

 2011-11-15 12:06:54-0800 [Broker,client] BotFactory.keepaliveLost

 2011-11-15 12:06:54-0800 [Broker,client] lost remote

 2011-11-15 12:06:54-0800 [Broker,client] lost remote step

 2011-11-15 12:06:54-0800 [Broker,client] stopCommand: halting current
 command <buildslave.commands.shell.SlaveShellCommand instance at
 0x01BF2670>

 2011-11-15 12:06:54-0800 [Broker,client] command interrupted, killing pid
 2624

 2011-11-15 12:06:54-0800 [Broker,client] trying
 process.signalProcess('KILL')

 2011-11-15 12:06:54-0800 [Broker,client]  signal KILL sent successfully

 2011-11-15 12:06:54-0800 [Broker,client] Lost connection to
 buildbot.app.com:9989

 2011-11-15 12:06:54-0800 [Broker,client] <twisted.internet.tcp.Connector
 instance at 0x010D3328> will retry in 1 seconds

 2011-11-15 12:06:54-0800 [Broker,client] Stopping factory
 <buildslave.bot.BotFactory instance at 0x011EF9E0>

 2011-11-15 12:06:56-0800 [-] Starting factory <buildslave.bot.BotFactory
 instance at 0x011EF9E0>

 2011-11-15 12:06:56-0800 [-] Connecting to buildbot.app.com:9989

 2011-11-15 12:06:59-0800 [-] we tried to kill the process, and it wouldn't
 die.. finish anyway

 2011-11-15 12:06:59-0800 [-] RunProcess.failed: command failed: SIGKILL
 failed to kill process

 2011-11-15 12:06:59-0800 [-] SlaveBuilder.commandFailed None

 2011-11-15 12:06:59-0800 [-] Unhandled Error
         Traceback (most recent call last):
         Failure: exceptions.RuntimeError: SIGKILL failed to kill process

 2011-11-15 12:07:29-0800 [Broker,client] message from master: attached

 2011-11-15 12:07:29-0800 [Broker,client] SlaveBuilder.remote_print(rc38
 -app-selenium-tests): message from master: attached

 2011-11-15 12:07:29-0800 [Broker,client] Connected to
 buildbot.app.com:9989; slave is ready

 2011-11-15 12:07:29-0800 [Broker,client] sending application-level
 keepalives every 600 seconds

 2011-11-15 12:16:59-0800 [-] sending app-level keepalive

 2011-11-15 12:26:59-0800 [-] sending app-level keepalive

 2011-11-15 12:34:39-0800 [-] command finished with signal None, exit code
 1, elapsedTime: 7084.500000

 2011-11-15 12:34:39-0800 [-] Hey, command <RunProcess
 '['xunit.console.exe', 'XXTests.dll', '/html', 'xunit-output.html']'>
 finished twice

 2011-11-15 12:36:59-0800 [-] sending app-level keepalive

 2011-11-15 12:46:59-0800 [-] sending app-level keepalive
 }}}
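
To make the timing in the log above easier to check, here is a minimal Python sketch that measures the gaps between a few of the quoted events. It is only an illustration, not part of Buildbot: the names `parse_events`, `seconds_between` and `SAMPLE` are hypothetical, and `SAMPLE` holds just the first physical line of four entries copied from the log, since the mail wrapped the longer ones.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in the quoted twistd.log lines, e.g.
# "2011-11-15 12:06:54-0800 [-] ...". The UTC offset is matched but ignored,
# because every line in the excerpt uses the same offset.
LINE_RE = re.compile(
    r"^\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})[+-]\d{4} \[[^\]]*\] (.*)$"
)

def parse_events(lines):
    """Return (timestamp, message) pairs for lines that start with a timestamp."""
    events = []
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            events.append((stamp, match.group(2)))
    return events

def seconds_between(events, start_text, end_text):
    """Seconds from the first message containing start_text to the first containing end_text."""
    start = next(ts for ts, msg in events if start_text in msg)
    end = next(ts for ts, msg in events if end_text in msg)
    return (end - start).total_seconds()

# Hypothetical sample: first physical line of four entries from the log above.
SAMPLE = [
    "2011-11-15 10:36:35-0800 [Broker,client] RunProcess._startCommand",
    "2011-11-15 12:06:24-0800 [-] sending app-level keepalive",
    "2011-11-15 12:06:54-0800 [-] BotFactory.checkActivity: nothing from master",
    "2011-11-15 12:34:39-0800 [-] command finished with signal None, exit code",
]

events = parse_events(SAMPLE)
# Time from command start until the slave gave up on the master.
print(seconds_between(events, "_startCommand", "checkActivity"))     # 5419.0
# Time from command start until the command actually finished.
print(seconds_between(events, "_startCommand", "command finished"))  # 7084.0
```

The second figure agrees with the `elapsedTime: 7084.500000` reported in the log, which suggests the xunit process kept running for about 28 minutes after the KILL attempt at 12:06:54.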

--

-- 
Ticket URL: <http://trac.buildbot.net/ticket/2147#comment:2>
Buildbot <http://buildbot.net/>
Buildbot: build/test automation

