[users at bb.net] More anecdotes.

Pierre Tardy tardyp at gmail.com
Thu Sep 29 08:54:36 UTC 2016


Hi Neil,

Good to know your multi-master setup is nearly done!
Indeed, a scripted multi-master setup would be great!
A docker-compose setup or an Ansible playbook would be perfect, I think.


For the other part of the anecdote, I think we are still chasing the same
bug, right?
At some point, somehow, you got a worker that is in a bad state and won't
accept any more builds.

Moving the worker to the other master is indeed a workaround. Another one is
to change the name of the worker.
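
A rough sketch of the rename workaround in master.cfg, assuming a worker
that used to be called "worker1" with a single builder attached. Every name
and the password below are made up for illustration, and the worker name in
the worker's own buildbot.tac would have to be changed to match before
restarting it:

```python
# master.cfg fragment (sketch only); all names and the password are hypothetical.
from buildbot.plugins import util, worker

c = BuildmasterConfig = {}

# Keep the worker name in one place so that a rename is a one-line change.
WORKER_NAME = "worker1-renamed"  # was "worker1" before the workaround

c['workers'] = [worker.Worker(WORKER_NAME, "example-password")]

c['builders'] = [
    util.BuilderConfig(
        name="example-builder",
        workernames=[WORKER_NAME],
        factory=util.BuildFactory(),  # real build steps omitted
    ),
]
```

The idea is only that the master should see the renamed worker as a new one
and not reuse whatever state it holds for the old name.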

Obviously the best long-term option is to debug the problem, but it doesn't
look easy to reproduce or to debug :-(
Hacking the database? Naah, don't. Really.
It won't even help, as the corrupted state is most likely in the Python
objects.

Pierre

On Wed, 28 Sep 2016 at 22:33, Neil Gilmore <ngilmore at grammatech.com>
wrote:

> Hi everyone,
>
> Congrats on rc4.
>
> More anecdotes from rc1. I got tangled up a bit trying to get
> multi-master working. I'm still not sure why all the parts would build
> one day, then not the next (in this case, it was setuptools). Nor why
> crossbar requires libffi to be installed on one machine but not the
> other. Nor why SQLAlchemy will be downloaded and installed automatically
> but not psycopg2. These troubles seem to have straightened themselves
> out, and I have multi-master buildbots in sandboxes on 2 different
> machines. There's light at the end of the tunnel, I hope.
>
> As a side note, Pierre, I ended up scripting the whole install/build/run
> thing. That may have to do for a tutorial.
>
> I got asked for help with a builder. Seems it was taking inordinately
> long to do a build, and the user tried cancelling, forcing, etc. There are
> 3 builders for this worker. 1 doesn't use locks, but the other 2 do.
> It's pretty common for our workers to have a builder that doesn't lock,
> and the rest do.
>
> The current situation is that the builder in question shows not 1, but
> 2 builds building. Sort of: the current build is shown as acquiring
> locks. The older of the two builds is clearly stalled.
>
> The other builder for the worker is proceeding well (but its builds take
> about 3 days). Obviously, it was able to get the lock. But it has
> started another build after finishing the first one. So it appears that
> it got the lock again before the original builder (unless there's
> something else going on).
>
> I also had a different worker's build stall, so I moved that worker to
> our alternate master. Unfortunately, it's a trick that only works once.
> If I move it back, it'll still be stalled. Is there any way to remove a
> no longer active worker from the database? I tried once, but I messed it
> up and had to start with an empty database. I didn't try again.
>
> Neil Gilmore
> grammatech.com