[users at bb.net] Multi-master 0.9.3 anecdotes.

Pierre Tardy tardyp at gmail.com
Fri Feb 3 21:17:57 UTC 2017


Hi Neil,

The timer starts when the worker is first configured, but only if
notify_on_missing is set.

That may be why you do not see the bug with your older workers.
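
To make the suspected sequence concrete, here is a minimal sketch in plain Python. It is not Buildbot's or Twisted's actual code: `FakeDelayedCall`, `ToyWorker`, `attach_unguarded` and `attach_guarded` are hypothetical names made up for illustration, and the local `AlreadyCalled` class only stands in for twisted.internet.error.AlreadyCalled. It assumes the worker keeps a reference to its timer after the timer has fired, which would explain the traceback but is not confirmed here.

```python
class AlreadyCalled(Exception):
    """Local stand-in for twisted.internet.error.AlreadyCalled."""


class FakeDelayedCall:
    """Toy one-shot timer handle (hypothetical, not Twisted's class)."""

    def __init__(self, func):
        self.func = func
        self.called = False
        self.cancelled = False

    def fire(self):
        # Simulates the timeout elapsing and the callback running.
        self.called = True
        self.func()

    def active(self):
        # True only while the call is still pending.
        return not (self.called or self.cancelled)

    def cancel(self):
        # Cancelling after the call has run is an error.
        if self.called:
            raise AlreadyCalled("Tried to cancel an already-called event.")
        self.cancelled = True


class ToyWorker:
    """Hypothetical worker model, for illustration only."""

    def __init__(self, notify_on_missing=None):
        self.notified = False
        self.missing_timer = None
        # The timer is only started when notify_on_missing is set.
        if notify_on_missing:
            self.missing_timer = FakeDelayedCall(self._missing)

    def _missing(self):
        # Assumption: the stale timer reference is not cleared here.
        self.notified = True

    def attach_unguarded(self):
        # Cancels whatever timer reference is present, fired or not.
        if self.missing_timer is not None:
            self.missing_timer.cancel()
        self.missing_timer = None

    def attach_guarded(self):
        # Only cancels a timer that is still pending.
        if self.missing_timer is not None and self.missing_timer.active():
            self.missing_timer.cancel()
        self.missing_timer = None
```

In this model, a worker created without notify_on_missing never has a timer, so attaching late cannot fail. A worker created with it fails in `attach_unguarded` once the timer has fired, while `attach_guarded` succeeds. The guard is just one possible way to avoid the error and is not necessarily what the PR does.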

Pierre

On Fri, Feb 3, 2017 at 21:59, Neil Gilmore <ngilmore at grammatech.com>
wrote:

> Hi Andrej,
>
> Thanks for the reply.
>
> I don't see missing_timeout in our master.cfg anywhere. But I do see this:
>
> c['workers'] = [Worker(host, '<password>',
> notify_on_missing=bots_email[host]) for host in bots_list]
>
> Let's see if I understood you. The default missing_timeout is 60
> minutes. If I start the master and wait 60 minutes, then start the
> worker, the worker won't attach?
>
> In our case, we're not even adding the worker to master.cfg until well
> after that 60 minutes (a couple days after). We're adding new workers.
> Do you figure this could be the same problem?
>
> What happens with a default notify_on_missing? I figure I can try the
> patch in your PR when we restart the masters.
>
> Neil Gilmore
> raito at raito.com
>
> On 2/3/2017 2:42 PM, Andrej Rode wrote:
> > Hi Neil,
> >
> >> 2017-02-03T12:39:09-0500 [Broker,28906,10.233.216.43] worker '<name>'
> >> attaching from IPv4Address(TCP, '<ip>', 35642)
> >> 2017-02-03T12:39:09-0500 [Broker,28906,10.233.216.43] Got workerinfo
> >> from '<name>'
> >> 2017-02-03T12:39:09-0500 [-] bot attached
> >> 2017-02-03T12:39:09-0500 [-] worker <name> cannot attach
> >>          Traceback (most recent call last):
> >>          Failure: twisted.internet.error.AlreadyCalled: Tried to cancel
> >> an already-called event.
> > I had the same problem, but with a single-master setup. By any chance
> > are you using a non-default `missing_timeout` and/or `notify_on_missing`
> > on your workers?
> >
> > For my issue I have a PR up [0], and now I can detach and attach workers
> > as I like. But it is still not clear why we even run into problems here.
> >
> > I found that attaching a worker more than `missing_timeout` after the
> > master starts triggers this problem on my setup. (The default
> > `missing_timeout` is 60 minutes.)
> >
> > Cheers,
> > Andrej
> >
> > [0] https://github.com/buildbot/buildbot/pull/2708
> > _______________________________________________
> > users mailing list
> > users at buildbot.net
> > https://lists.buildbot.net/mailman/listinfo/users
>