[users at bb.net] Issue connecting slave to master

Pierre Tardy tardyp at gmail.com
Fri Jun 30 09:01:06 UTC 2017


I think at that point I would start thinking about swapping network
cables.. :)


On Fri, Jun 30, 2017 at 10:32 AM Colin Chargy <Colin.Chargy at bentley.com>
wrote:

> Hi Jim,
>
> Thanks for the input.
>
> I did copy the slave folder from slave to slave.
>
> I did try to reboot the slave with no luck.
>
> I don’t know the details about the network. We do have an IT team to
> handle it. All other slaves are with the same network setup.
>
> I don’t know of any other software running on the computer.
>
>
>
> Apart from tcpdump/Wireshark, does anyone know another way to debug this?
> Do buildbot or twisted have a verbose mode?
>
> I’m going to try to open a direct SSH port-forwarding tunnel between the
> master and the slave to see if that changes anything (it might help us
> understand the cause of the issue). I’ll keep you posted.
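
One cheap check that needs neither tcpdump nor Wireshark is a plain TCP connect from the slave machine to the master's port. The sketch below is only an illustration: `probe_tcp` is a hypothetical helper name, it assumes Python 3, and it only tells you whether the port accepts connections. It would not reproduce a connection that is dropped after login, which is what the logs in this thread show.

```python
import socket


def probe_tcp(host, port, timeout=5.0):
    """Try a plain TCP connect and return a short description of the result."""
    try:
        # create_connection resolves the host and opens the socket in one call
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except socket.timeout:
        return "timed out"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:
        # any other network-level error (unreachable host, bad name, ...)
        return "failed: {}".format(exc)
```

For the setup described in this thread, the call would be `probe_tcp("192.168.0.1", 9989)`, run from the slave machine; the same call against the local end of an SSH tunnel gives a point of comparison.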
>
>
>
> Regards,
>
> Colin Chargy
>
>
>
> *From:* Jim Rowan [mailto:jmr at computing.com]
> *Sent:* Friday, June 23, 2017 19:54
> *To:* Colin Chargy <Colin.Chargy at bentley.com>
> *Cc:* users at buildbot.net
> *Subject:* Re: [users at bb.net] Issue connecting slave to master
>
>
>
> hmmm.    :).   Some thoughts/questions interleaved below:
>
>
>
>
>
> On Jun 22, 2017, at 10:06 AM, Colin Chargy <Colin.Chargy at bentley.com>
> wrote:
>
>
>
> Hi Jim,
>
> Thanks for the input. I tested what you suggested. The same slave folder
> on another computer works fine, and another slave folder (from another
> computer) doesn’t work on this one.
>
>
>
> I’m not positive what you’re saying — I think what you did above was to
> copy the actual slave folders in question from machine to machine, and then
> try to start them up?   If so, that’s fine … just trying to fully
> understand.
>
>
>
> In both of the tests you mention above, are the slaves talking to the
> “second” master — the one that doesn’t work?
>
>
>
> If so, I think that pretty much proves that it’s something about *this
> machine*, and seems to me to be almost certainly external to buildbot.
>
>
>
> I have a few somewhat-wild-guess things to look at:
>
>
>
> 1.) If you haven’t already, reboot the slave.
>
>
>
> 2.) I notice the slave's address is  192.168.0.254 and the master’s
> address is 192.168.0.1.   Assuming a /24 network, those are by convention
> both a bit special — people might configure either one of them as gateways
> to other subnets.   Although that isn’t technically a problem, it makes me
> suspicious.   Is this indeed on a /24 subnet?  Are you sure that no other
> system is using these addresses?   Do both machines have the correct subnet
> mask?  (Some of these might be answered by your tcpdump file, but I didn’t
> crack it open..)
>
>
>
> 3.) Is there some other software running on this particular slave machine
> that makes it unusual compared to the others?
>
> For grins, don’t start anything else after a reboot, and just try to run
> this one slave.  (By hand, if you are normally starting it as a service.)
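
The subnet questions in point 2 can be checked offline with Python's standard `ipaddress` module. This is a minimal sketch, assuming Python 3 and the /24 prefix guessed above; `check_same_subnet` is a hypothetical helper name, and the real prefix length has to come from each machine's network configuration.

```python
import ipaddress


def check_same_subnet(addr_a, addr_b, prefix_len=24):
    """Report subnet facts for two IPv4 addresses under an assumed prefix."""
    result = {"networks": [], "reserved": []}
    for addr in (addr_a, addr_b):
        iface = ipaddress.ip_interface("{}/{}".format(addr, prefix_len))
        net = iface.network
        result["networks"].append(str(net))
        # the network and broadcast addresses are not usable as host addresses
        if iface.ip in (net.network_address, net.broadcast_address):
            result["reserved"].append(addr)
    result["same_subnet"] = result["networks"][0] == result["networks"][1]
    return result
```

With the addresses from this thread, `check_same_subnet("192.168.0.254", "192.168.0.1")` reports both hosts in 192.168.0.0/24. Passing `prefix_len=25` instead puts them in different networks, which is the kind of mismatch a wrong subnet mask on one machine would cause.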
>
>
>
>
>
> I’m assuming that both slaves on that machine are sharing the same python
> installation, and therefore the same buildslave code?  So the only thing
> unique is the actual slavedir and the name/password?
>
> Yes
>
>
>
> And the master that it’s talking to has other working slaves on different
> machines?
>
> Yes, plenty, and none of them has these issues.
>
>
>
> Any other ideas?
>
>
>
> Regards,
>
> Colin Chargy
>
>
>
> *From:* Jim Rowan [mailto:jmr at computing.com <jmr at computing.com>]
> *Sent:* Tuesday, June 20, 2017 21:35
> *To:* Colin Chargy <Colin.Chargy at bentley.com>
> *Cc:* users at buildbot.net
> *Subject:* Re: [users at bb.net] Issue connecting slave to master
>
>
>
>
>
>
>
> Wow .. so it’s apparently something specific to this particular slave on
> this particular machine, or the tuple of those with the particular master.
>   Have you tried instantiating and starting the slave on a different
> windows 10 machine?   Or changing buildslave.tac to use a different (and
> working) slavename that is defined on that same master?  (I think you said
> you did test this.).
>
>
>
> I’m assuming that both slaves on that machine are sharing the same python
> installation, and therefore the same buildslave code?  So the only thing
> unique is the actual slavedir and the name/password?
>
>
>
> And the master that it’s talking to has other working slaves on different
> machines?
>
>
>
>
>
> On Jun 20, 2017, at 10:13 AM, Colin Chargy <Colin.Chargy at bentley.com>
> wrote:
>
>
>
> Hi,
>
> Thanks for your input. It doesn’t change anything. ☹
>
>
>
> Best regards,
>
> Colin Chargy
>
>
>
> *From:* Jim Rowan [mailto:jmr at computing.com <jmr at computing.com>]
> *Sent:* Tuesday, June 20, 2017 17:11
> *To:* Colin Chargy <Colin.Chargy at bentley.com>
> *Cc:* Pierre Tardy <tardyp at gmail.com>; users at buildbot.net
> *Subject:* Re: [users at bb.net] Issue connecting slave to master
>
>
>
> It’s a bit of a wild guess, but what happens if you stop the second
> (working) slave that is on the same machine before trying to start this one?
>
>
>
> On Jun 20, 2017, at 8:10 AM, Colin Chargy <Colin.Chargy at bentley.com>
> wrote:
>
>
>
> Hi Pierre,
>
> I tried with the following version :
>
> $ buildslave --version
>
> Buildslave version: 0.8.8
>
> Twisted version: 12.3.0
>
>
>
> It’s now exactly the same as on the master, and the behavior continues…
>
>
>
> Anything else I could try ?
>
>
>
> I’ll ask the admin of the server to update twisted.
>
>
>
> Best regards,
>
> Colin Chargy
>
>
>
> *From:* Pierre Tardy [mailto:tardyp at gmail.com <tardyp at gmail.com>]
> *Sent:* Tuesday, June 20, 2017 15:02
> *To:* Colin Chargy <Colin.Chargy at bentley.com>; users at buildbot.net
> *Subject:* Re: [users at bb.net] FW: Issue connecting slave to master
>
>
>
> Oh, I did not notice the very old twisted version. You can indeed try
> downgrading on the worker.
>
>
>
> I see no reason not to upgrade twisted on master, though
>
>
>
> Pierre
>
>
>
> On Tue, Jun 20, 2017 at 2:45 PM Colin Chargy <Colin.Chargy at bentley.com>
> wrote:
>
> Hi Pierre,
>
> I tested what you suggested :
>
> $ buildslave --version
>
> Buildslave version: 0.8.8
>
> Twisted version: 17.5.0
>
>
>
> This does not change the behavior. Should I test with another twisted
> version ?
>
>
>
> Regards,
>
> Colin
>
>
>
> *From:* Pierre Tardy [mailto:tardyp at gmail.com]
> *Sent:* Tuesday, June 20, 2017 14:15
> *To:* Colin Chargy <Colin.Chargy at bentley.com>; users at buildbot.net
> *Subject:* Re: [users at bb.net] FW: Issue connecting slave to master
>
>
>
> Colin,
>
> It's a bit harder for me to help you efficiently, as 0.8.8 is quite an old
> version. I imagine upgrading is not an option..
>
>
>
> It might be an incompatibility of the slave version string. We usually try
> to keep new master versions compatible with old slave versions, but
> we might not always take care to support running new slaves with an older
> master.
>
> Did you try downgrading your slave version to 0.8.8?
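
To spot the "newer slave, older master" case from the version strings reported in this thread, the components have to be compared as numbers, not as text. The sketch below is only an illustration, not part of buildbot: `parse_version` and `slave_newer_than_master` are hypothetical helper names, and it assumes purely numeric dotted versions such as 0.8.8 and 0.8.14.

```python
def parse_version(text):
    """Turn '0.8.14' into (0, 8, 14) so that components compare numerically."""
    return tuple(int(part) for part in text.split("."))


def slave_newer_than_master(slave_version, master_version):
    """True when the slave version is strictly newer than the master version."""
    return parse_version(slave_version) > parse_version(master_version)
```

Here `slave_newer_than_master("0.8.14", "0.8.8")` is true, while a plain string comparison gets it wrong, because "0.8.14" sorts before "0.8.8" character by character.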
>
>
>
> Pierre
>
>
>
> On Tue, Jun 20, 2017 at 11:53 AM Colin Chargy <Colin.Chargy at bentley.com>
> wrote:
>
> Hi Pierre,
>
> Thanks for your reply.
>
> Indeed, I’ve seen in the failedToGetPerspective doc that it could fail
> with a wrong login or password. However, the slave name and password seem
> correct (i.e. the same in the slave .tac file and in the server config). We
> also tested multiple login/password pairs to see if that changed anything
> (with no luck). The TCP dump seems to show that the last things sent are
> the host name and slave info, which are the default ones (I tried
> modifying them, with no luck). What happens after/inside
> failedToGetPerspective? Does the connection change port, settings or
> anything else at this point?
>
>
>
> I should probably add some info about our setup: the server runs 2 buildbot
> masters and the slave computer also runs 2 buildbot slaves (one for each
> master). We do have other computers that work that way without any problem.
> Of course, we checked that each slave is connecting to the correct master.
> Only one of the slave/master pairs fails (and, as already said, only on
> this computer).
>
>
>
> Best regards,
>
> Colin Chargy
>
>
>
> *From:* Pierre Tardy [mailto:tardyp at gmail.com]
> *Sent:* Tuesday, June 20, 2017 11:41
> *To:* Colin Chargy <Colin.Chargy at bentley.com>; users at buildbot.net
> *Subject:* Re: [users at bb.net] FW: Issue connecting slave to master
>
>
>
> Hi Colin
>
> Could that be a problem with your slave password?
>
>
>
>
>
>     def failedToGetPerspective(self, why):
>         """The login process failed, most likely because of an authorization
>         failure (bad password), but it is also possible that we lost the new
>         connection before we managed to send our credentials.
>         """
>         log.msg("ReconnectingPBClientFactory.failedToGetPerspective")
>         if why.check(pb.PBConnectionLost):
>             log.msg("we lost the brand-new connection")
>             # retrying might help here, let clientConnectionLost decide
>             return
>         # probably authorization
>         self.stopTrying()  # logging in harder won't help
>         log.err(why)
>
>
>
>
>
> On Tue, Jun 20, 2017 at 9:18 AM Colin Chargy <Colin.Chargy at bentley.com>
> wrote:
>
> Hi everyone,
> Before I start describing my issue, let me say that we have dozens of slaves
> (Windows, Mac and Linux, all working perfectly right now); only one is
> problematic:
> We are facing an issue with slave connection to master. Here is the log on
> the slave side (see enclosed twisted.log for complete log) :
> [Broker,client] message from master: attached
> [Broker,client] ReconnectingPBClientFactory.failedToGetPerspective
> [Broker,client] we lost the brand-new connection
> [Broker,client] Lost connection to 192.168.0.1:9989
> [Broker,client] <twisted.internet.tcp.Connector instance at 0x03471918> will retry in 3 seconds
>
> And then it starts all over again.
> On the server side, the following log is produced :
> 2017-06-19 16:11:27+0200 [Broker,9423,192.168.0.254] slave 'lrttestauto-test' attaching from IPv4Address(TCP, '192.168.0.254', 35524)
> 2017-06-19 16:11:27+0200 [Broker,9423,192.168.0.254] Starting buildslave keepalive timer for 'lrttestauto-test'
> 2017-06-19 16:11:27+0200 [Broker,9423,192.168.0.254] Peer will receive following PB traceback:
> 2017-06-19 16:11:27+0200 [Broker,9423,192.168.0.254] Unhandled Error
>         Traceback (most recent call last):
>         Failure: twisted.spread.pb.PBConnectionLost: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion.
>         ]
>
> I've checked that the login and password are correct. The Buildbot
> versions are the following:
> On the server-side (which is a Debian):
> Buildbot version: 0.8.8
> Twisted version: 12.3.0
>
> On the slave side (which is a Windows 10, buildslave installed via pip):
> Buildslave version: 0.8.14
> Twisted version: 17.5.0
>
> I've enclosed the slave log, the slave tac file and a tcpdump showing data
> transfer between slave and server (I've tried to debug it with Wireshark
> with no luck).
>
> What can I do to debug or to solve this issue ?
>
> Best regards,
> Colin Chargy
> _______________________________________________
> users mailing list
> users at buildbot.net
> https://lists.buildbot.net/mailman/listinfo/users