[Buildbot-devel] "Connection lost in a non-clean fashion" / duplicate slave

Harry Percival harry at pythonanywhere.com
Mon Nov 17 16:02:32 UTC 2014


PS: having inspected the logs on the slave, it looks like one of the
keepalives sent by the slave might be failing?

    2014-11-17 09:02:14+0000 [-] sending app-level keepalive
    2014-11-17 09:12:14+0000 [-] sending app-level keepalive
    2014-11-17 09:18:50+0000 [Broker,client] SlaveBuilder._ackFailed: SlaveBuilder.sendUpdate
    2014-11-17 09:18:50+0000 [Broker,client] Unhandled Error
         Traceback (most recent call last):
         Failure: twisted.spread.pb.PBConnectionLost: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion.
         ]

    2014-11-17 09:18:50+0000 [Broker,client] SlaveBuilder._ackFailed: SlaveBuilder.sendUpdate
    2014-11-17 09:18:50+0000 [Broker,client] Unhandled Error
         Traceback (most recent call last):
         Failure: twisted.spread.pb.PBConnectionLost: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion.
         ]

    2014-11-17 09:18:50+0000 [Broker,client] SlaveBuilder._ackFailed: SlaveBuilder.sendUpdate
    2014-11-17 09:18:50+0000 [Broker,client] Unhandled Error
         Traceback (most recent call last):
         Failure: twisted.spread.pb.PBConnectionLost: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion.
         ]

    2014-11-17 09:18:50+0000 [Broker,client] lost remote
    2014-11-17 09:18:50+0000 [Broker,client] lost remote step
    2014-11-17 09:18:50+0000 [Broker,client] stopCommand: halting current command <buildslave.commands.shell.SlaveShellCommand instance at 0x0000000002DE6C08>
    2014-11-17 09:18:50+0000 [Broker,client] command interrupted, attempting to kill
    2014-11-17 09:18:50+0000 [Broker,client] using TASKKILL PID /F /T to kill pid 3356
    2014-11-17 09:18:51+0000 [Broker,client] taskkill'd pid 3356
    2014-11-17 09:18:51+0000 [Broker,client] Lost connection to integration.company.com:9886
    2014-11-17 09:18:51+0000 [Broker,client] <twisted.internet.tcp.Connector instance at 0x0000000002BD6108> will retry in 2 seconds
    2014-11-17 09:18:51+0000 [Broker,client] Stopping factory <buildslave.bot.BotFactory instance at 0x0000000002BC6E48>
    2014-11-17 09:18:51+0000 [-] command finished with signal None, exit code 1, elapsedTime: 2062.839000
    2014-11-17 09:18:51+0000 [-] would sendStatus but not .running
    2014-11-17 09:18:51+0000 [-] SlaveBuilder.commandComplete None
    2014-11-17 09:18:54+0000 [-] Starting factory <buildslave.bot.BotFactory instance at 0x0000000002BC6E48>
    2014-11-17 09:18:54+0000 [-] Connecting to integration.company.com:9886
    2014-11-17 09:18:54+0000 [Broker,client] message from master: master already has a connection named 'redacted' - checking its liveness
    2014-11-17 09:19:04+0000 [Broker,client] message from master: attached
    2014-11-17 09:19:04+0000 [Broker,client] SlaveBuilder.remote_print(google chrome stress): message from master: attached
    2014-11-17 09:19:04+0000 [Broker,client] Connected to integration.company.com:9886; slave is ready
    2014-11-17 09:19:04+0000 [Broker,client] sending application-level keepalives every 600 seconds

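One way to check the keepalive question is to measure, from the slave log, how long before the first failure the last keepalive went out. A minimal sketch, assuming each twistd.log entry starts with a 24-character timestamp such as `2014-11-17 09:02:14+0000` and that the marker strings below (copied from this excerpt) identify a lost connection; `parse_line` and `keepalive_gap` are hypothetical helpers, not part of buildslave. Wrapped continuation lines are skipped.

```python
from datetime import datetime

# Timestamp prefix of each twistd.log entry, e.g. "2014-11-17 09:02:14+0000".
TS_FORMAT = "%Y-%m-%d %H:%M:%S%z"
TS_LEN = 24

# Assumed markers for "the connection is gone", taken from the excerpt above;
# other buildslave versions may word these differently.
LOSS_MARKERS = ("_ackFailed", "lost remote", "Lost connection")
KEEPALIVE_MSG = "sending app-level keepalive"


def parse_line(line):
    """Return (timestamp, message), or None for wrapped/continuation lines."""
    line = line.strip()
    try:
        ts = datetime.strptime(line[:TS_LEN], TS_FORMAT)
    except ValueError:
        return None
    return ts, line[TS_LEN:].strip()


def keepalive_gap(lines):
    """Return (last_keepalive, first_loss, gap), or None if either is missing."""
    last_keepalive = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, msg = parsed
        if any(marker in msg for marker in LOSS_MARKERS):
            if last_keepalive is None:
                return None
            return last_keepalive, ts, ts - last_keepalive
        if msg.endswith(KEEPALIVE_MSG):
            last_keepalive = ts
    return None


if __name__ == "__main__":
    sample = """\
2014-11-17 09:02:14+0000 [-] sending app-level keepalive
2014-11-17 09:12:14+0000 [-] sending app-level keepalive
2014-11-17 09:18:50+0000 [Broker,client] SlaveBuilder._ackFailed:
SlaveBuilder.sendUpdate
2014-11-17 09:18:51+0000 [Broker,client] Lost connection to integration.company.com:9886
""".splitlines()
    last, lost, gap = keepalive_gap(sample)
    print(gap)  # 0:06:36
```

On this excerpt the gap is 6 minutes 36 seconds (09:12:14 to 09:18:50), shorter than the 600-second interval the slave announces on reconnect, and the first entry signalling trouble is `SlaveBuilder._ackFailed: SlaveBuilder.sendUpdate`.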


-- 
Harry Percival
Developer
harry at pythonanywhere.com

PythonAnywhere - a fully browser-based Python development and hosting environment
<http://www.pythonanywhere.com/>

PythonAnywhere LLP
17a Clerkenwell Road, London EC1M 5RD, UK
VAT No.: GB 893 5643 79
Registered in England and Wales as company number OC378414.
Registered address: 28 Ely Place, 3rd Floor, London EC1N 6TD, UK

On 17/11/14 09:28, Harry Percival wrote:
> Hi there,
>
> We've been running a build farm with a Linux master and Windows slaves
> for many years. We have been experimenting with moving the slaves to
> Azure, and I'm seeing a lot of errors saying:
>
>       remoteFailed: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion.
>
> These abort the build. In twistd.log I'm seeing these messages at
> around the same time:
>
>       <timestamp> [Broker,<n>,<ip>] duplicate slave <slavename>; delaying new slave (IPv4Address(TCP, '<ip>', <port>)) and pinging old (IPv4Address(TCP, '<port>', <other-port>))
>       <timestamp+10s> [Broker,<n-1>,<ip>] BuildSlave.detached(<slavename>)
>
> What could be happening here?
>
> The slaves are running Windows Server 2012R2 Datacenter.
>
> I've had a look at this:
> https://mariadb.com/kb/en/mariadb/development/tools/buildbot/buildbot-setup/buildbot-setup-buildbot-setup-for-windows/
> and tried changing the keepalive variable in buildbot.tac, with no apparent change.
>
> Thanks for any help!
>
> Harry
>
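The master's "duplicate slave ... delaying new slave ... and pinging old" message and the slave's "master already has a connection named ... checking its liveness" describe the same event from both ends. A simplified model of that decision, assuming the rule is "keep the old connection if it answers a ping in time, otherwise drop it and accept the new one"; this is an illustration, not Buildbot's implementation, and `Connection`, `SlaveRegistry` and `ping_timeout` are hypothetical names. A dead peer is modelled as a ping that never completes.

```python
import asyncio


class Connection:
    """Hypothetical stand-in for one slave's network connection."""

    def __init__(self, name, alive):
        self.name = name
        self.alive = alive
        self.closed = False

    async def ping(self):
        # Model a dead peer as a ping that never completes; the caller's
        # timeout is what eventually gives up on it.
        if not self.alive:
            await asyncio.Event().wait()
        return True

    def close(self):
        self.closed = True


class SlaveRegistry:
    """Keeps at most one connection per slave name (simplified model)."""

    def __init__(self, ping_timeout):
        self.ping_timeout = ping_timeout  # seconds; hypothetical value
        self.connections = {}

    async def attach(self, conn):
        """Return True if conn is accepted, False if it is rejected."""
        old = self.connections.get(conn.name)
        if old is not None:
            # Duplicate name: hold the new connection and check the old one.
            try:
                await asyncio.wait_for(old.ping(), timeout=self.ping_timeout)
            except (asyncio.TimeoutError, ConnectionError):
                old.close()  # old peer did not answer: detach it
            else:
                conn.close()  # old peer is alive: keep it, refuse the newcomer
                return False
        self.connections[conn.name] = conn
        return True


async def demo():
    registry = SlaveRegistry(ping_timeout=0.1)
    stale = Connection("redacted", alive=False)
    await registry.attach(stale)
    fresh = Connection("redacted", alive=True)
    accepted = await registry.attach(fresh)
    print(accepted, stale.closed)  # True True


if __name__ == "__main__":
    asyncio.run(demo())
```

In both logs the old connection is detached about ten seconds after the duplicate is noticed (09:18:54 to 09:19:04 on the slave), which would fit a liveness check of roughly that length, though that is only an inference from the timestamps.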
