[users at bb.net] Issue connecting slave to master

Colin Chargy Colin.Chargy at bentley.com
Thu Jun 22 15:06:25 UTC 2017


Hi Jim,
Thanks for the input. I tested what you suggested. The same slave folder works fine on another computer, and another slave folder (from another computer) doesn’t work on this one.

I’m assuming that both slaves on that machine are sharing the same python installation, and therefore the same buildslave code?  So the only thing unique is the actual slavedir and the name/password?
Yes

And the master that it’s talking to has other working slaves on different machines?
Yes, plenty, and none of them has these issues.

Any other ideas?

Regards,
Colin Chargy

From: Jim Rowan [mailto:jmr at computing.com]
Sent: Tuesday, June 20, 2017 21:35
To: Colin Chargy <Colin.Chargy at bentley.com>
Cc: users at buildbot.net
Subject: Re: [users at bb.net] Issue connecting slave to master



Wow... so it’s apparently something specific to this particular slave on this particular machine, or to the combination of those with this particular master. Have you tried instantiating and starting the slave on a different Windows 10 machine? Or changing buildslave.tac to use a different (and working) slavename that is defined on that same master? (I think you said you did test this.)

I’m assuming that both slaves on that machine are sharing the same python installation, and therefore the same buildslave code?  So the only thing unique is the actual slavedir and the name/password?

And the master that it’s talking to has other working slaves on different machines?


On Jun 20, 2017, at 10:13 AM, Colin Chargy <Colin.Chargy at bentley.com> wrote:

Hi,
Thanks for your input. It doesn’t change anything. ☹

Best regards,
Colin Chargy

From: Jim Rowan [mailto:jmr at computing.com]
Sent: Tuesday, June 20, 2017 17:11
To: Colin Chargy <Colin.Chargy at bentley.com>
Cc: Pierre Tardy <tardyp at gmail.com>; users at buildbot.net
Subject: Re: [users at bb.net] Issue connecting slave to master

It’s a bit of a wild guess, but what happens if you stop the second (working) slave that is on the same machine before trying to start this one?

On Jun 20, 2017, at 8:10 AM, Colin Chargy <Colin.Chargy at bentley.com> wrote:

Hi Pierre,
I tried with the following versions:
$ buildslave --version
Buildslave version: 0.8.8
Twisted version: 12.3.0

These are now exactly the same versions as on the master, and the behavior continues…

Is there anything else I could try?

I’ll ask the admin of the server to update Twisted.

Best regards,
Colin Chargy

From: Pierre Tardy [mailto:tardyp at gmail.com]
Sent: Tuesday, June 20, 2017 15:02
To: Colin Chargy <Colin.Chargy at bentley.com>; users at buildbot.net
Subject: Re: [users at bb.net] FW: Issue connecting slave to master

Oh, I did not realize how old the Twisted version was. You can indeed try downgrading it on the worker.

I see no reason not to upgrade Twisted on the master, though.

Pierre

On Tue, Jun 20, 2017 at 2:45 PM Colin Chargy <Colin.Chargy at bentley.com> wrote:
Hi Pierre,
I tested what you suggested:
$ buildslave --version
Buildslave version: 0.8.8
Twisted version: 17.5.0

This does not change the behavior. Should I test with another Twisted version?

Regards,
Colin

From: Pierre Tardy [mailto:tardyp at gmail.com]
Sent: Tuesday, June 20, 2017 14:15
To: Colin Chargy <Colin.Chargy at bentley.com>; users at buildbot.net
Subject: Re: [users at bb.net] FW: Issue connecting slave to master

Colin,
It’s a bit harder for me to help you efficiently, as 0.8.8 is quite an old version. I imagine upgrading is not an option.

It might be an incompatibility in the slave version string. We usually try to keep new master versions compatible with old slave versions, but we do not always take care to support running new slaves against an older master.
Did you try downgrading your slave version to 0.8.8?

Pierre

On Tue, Jun 20, 2017 at 11:53 AM Colin Chargy <Colin.Chargy at bentley.com> wrote:
Hi Pierre,
Thanks for your reply.
Indeed, I’ve seen in the failedToGetPerspective docstring that it can fail because of a wrong login or password. However, the slave name and password seem correct (i.e. they are the same in the slave’s .tac file and in the server config). We also tested several login/password pairs to see whether that changes anything (with no luck). The TCP dump seems to show that the last things sent are the host name and the slave info, which are the default ones (I tried modifying them, with no luck). What happens after/inside failedToGetPerspective? Does the connection change port, settings, or anything else at this point?
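
The comparison between the slave’s .tac file and the server config described above can be scripted. What follows is only a minimal sketch, not part of Buildbot: the helper names read_tac_settings and credential_problems are hypothetical, and the variable names buildmaster_host, port, slavename and passwd are assumed to match the stock buildslave.tac template, so check them against your own file. The expected name and password have to be copied by hand from the master configuration.

```python
import ast


def read_tac_settings(tac_source,
                      names=("buildmaster_host", "port", "slavename", "passwd")):
    """Collect simple top-level literal assignments from buildslave.tac text.

    The file is parsed, never executed, so Buildbot/Twisted need not be
    importable. The variable names are an assumption about the template.
    """
    settings = {}
    for node in ast.parse(tac_source).body:
        if isinstance(node, ast.Assign) and len(node.targets) == 1:
            target = node.targets[0]
            if isinstance(target, ast.Name) and target.id in names:
                try:
                    settings[target.id] = ast.literal_eval(node.value)
                except ValueError:
                    pass  # value is not a plain literal; leave it out
    return settings


def credential_problems(settings, expected_name, expected_password):
    """Return human-readable differences between the .tac values and the
    name/password taken (by hand) from the master configuration."""
    problems = []
    name = settings.get("slavename")
    passwd = settings.get("passwd")
    if name != expected_name:
        problems.append("slavename differs: %r != %r" % (name, expected_name))
    if passwd != expected_password:
        problems.append("passwd differs from the expected password")
    # Stray whitespace is easy to miss when comparing by eye.
    for key, value in (("slavename", name), ("passwd", passwd)):
        if isinstance(value, str) and value != value.strip():
            problems.append("%s has leading or trailing whitespace" % key)
    return problems
```

To use it, read the text of buildslave.tac from the slave directory, pass it to read_tac_settings, and give the result to credential_problems together with the name and password from the master config. An empty list means the two sides agree as far as this check can tell.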

I should probably add some information about our setup: the server runs 2 Buildbot masters, and the slave computer also runs 2 Buildbot slaves (one for each master). We have other computers that work that way without any problem. Of course, we checked that each slave is connecting to the correct master. Only one of the slave/master pairs fails (and, as already said, only on this computer).

Best regards,
Colin Chargy

From: Pierre Tardy [mailto:tardyp at gmail.com]
Sent: Tuesday, June 20, 2017 11:41
To: Colin Chargy <Colin.Chargy at bentley.com>; users at buildbot.net
Subject: Re: [users at bb.net] FW: Issue connecting slave to master

Hi Colin
Could that be a problem with your slave password?


    def failedToGetPerspective(self, why):
        """The login process failed, most likely because of an authorization
        failure (bad password), but it is also possible that we lost the new
        connection before we managed to send our credentials.
        """
        log.msg("ReconnectingPBClientFactory.failedToGetPerspective")
        if why.check(pb.PBConnectionLost):
            log.msg("we lost the brand-new connection")
            # retrying might help here, let clientConnectionLost decide
            return
        # probably authorization
        self.stopTrying()  # logging in harder won't help
        log.err(why)
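
The method quoted above has two branches: a lost connection (which logs "we lost the brand-new connection" and lets the factory retry) and everything else, which is treated as a probable authorization failure and stops the retries. A slave log can be checked against those two branches with a few lines of Python. This is a minimal sketch only; the function name classify_login_failures is hypothetical, and the two marker strings are taken from the quoted code, not from any documented log format.

```python
def classify_login_failures(log_text):
    """Count failedToGetPerspective events in a slave log by branch.

    "connection_lost": the event is followed by the message logged in the
    PBConnectionLost branch of the quoted method.
    "other": anything else, which the quoted method treats as a probable
    authorization failure.
    """
    lines = log_text.splitlines()
    counts = {"connection_lost": 0, "other": 0}
    for index, line in enumerate(lines):
        if "failedToGetPerspective" not in line:
            continue
        # Look at the same line and the next one, since some log viewers
        # join consecutive entries.
        following = lines[index + 1] if index + 1 < len(lines) else ""
        if "we lost the brand-new connection" in line + "\n" + following:
            counts["connection_lost"] += 1
        else:
            counts["other"] += 1
    return counts
```

If every event falls under "connection_lost", the slave never reached the branch that suspects the password, which points at the connection being dropped rather than at the credentials themselves.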


On Tue, Jun 20, 2017 at 9:18 AM Colin Chargy <Colin.Chargy at bentley.com> wrote:
Hi everyone,
Before I start describing my issue, let me say that we have dozens of slaves (Windows, Mac and Linux platforms, all working perfectly right now); only one is problematic.
We are facing an issue with a slave connecting to the master. Here is the log on the slave side (see the enclosed twisted.log for the complete log):
[Broker,client] message from master: attached
[Broker,client] ReconnectingPBClientFactory.failedToGetPerspective
[Broker,client] we lost the brand-new connection
[Broker,client] Lost connection to 192.168.0.1:9989
[Broker,client] <twisted.internet.tcp.Connector instance at 0x03471918> will retry in 3 seconds

And then it starts all over again.
On the server side, the following log is produced:
2017-06-19 16:11:27+0200 [Broker,9423,192.168.0.254] slave 'lrttestauto-test' attaching from IPv4Address(TCP, '192.168.0.254', 35524)
2017-06-19 16:11:27+0200 [Broker,9423,192.168.0.254] Starting buildslave keepalive timer for 'lrttestauto-test'
2017-06-19 16:11:27+0200 [Broker,9423,192.168.0.254] Peer will receive following PB traceback:
2017-06-19 16:11:27+0200 [Broker,9423,192.168.0.254] Unhandled Error
        Traceback (most recent call last):
        Failure: twisted.spread.pb.PBConnectionLost: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion.
        ]

I've checked that the login and password are correct. The Buildbot versions are the following:
On the server side (Debian):
Buildbot version: 0.8.8
Twisted version: 12.3.0

On the slave side (Windows 10, buildslave installed via pip):
Buildslave version: 0.8.14
Twisted version: 17.5.0

I've enclosed the slave log, the slave tac file and a tcpdump showing data transfer between slave and server (I've tried to debug it with Wireshark with no luck).

What can I do to debug or solve this issue?
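
One low-level check that can complement a tcpdump is to open a plain TCP connection to the master's slave port from the problematic machine and see whether the connection stays open, is closed by the peer, or fails outright. This is a minimal sketch using only the standard library; the function name probe_tcp is hypothetical, it does not speak the PB protocol, and it says nothing about authentication. It only shows what happens at the socket level.

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Connect to host:port and report what happens within `timeout` seconds."""
    result = {"connected": False, "first_bytes": b"",
              "closed_by_peer": False, "error": None}
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        result["error"] = str(exc)  # refused, unreachable, timed out, ...
        return result
    result["connected"] = True
    try:
        data = sock.recv(64)
        if data:
            result["first_bytes"] = data      # peer sent something first
        else:
            result["closed_by_peer"] = True   # orderly close by the peer
    except socket.timeout:
        pass  # connection stayed open, peer sent nothing within the timeout
    except OSError as exc:
        result["error"] = str(exc)            # e.g. connection reset
    finally:
        sock.close()
    return result
```

For the setup in this message the call would be probe_tcp("192.168.0.1", 9989), using the address and port from the slave log. Running it on both the failing machine and a working one makes any difference at the network level visible without Buildbot in the loop.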

Best regards,
Colin Chargy
_______________________________________________
users mailing list
users at buildbot.net
https://lists.buildbot.net/mailman/listinfo/users