<div dir="ltr">Hi Neil,<div><br></div><div>Good to know your multimaster setup is nearly done!</div><div>Indeed, a scripted multimaster setup would be great!</div><div>A docker-compose setup or an Ansible playbook would be perfect, I think.</div><div><br></div><div><br></div><div>For the other part of the anecdote, I think we are still chasing the same bug, right?</div><div>At some point, somehow, you got a worker that is in a bad state and won't accept any more builds.</div><div><br></div><div>Moving the worker to the other master is indeed a workaround. Another one is to change the name of the worker.</div><div><br></div><div>Obviously the best long-term option is to debug the problem, but it doesn't look easy to reproduce or to debug :-(</div><div>Hacking the database? Naah... don't. Really.</div><div>It won't even help, as most likely the corrupted state is in the Python objects.</div><div><br></div><div>Pierre</div></div><br><div class="gmail_quote"><div dir="ltr">On Wed, 28 Sep 2016 at 22:33, Neil Gilmore <<a href="mailto:ngilmore@grammatech.com">ngilmore@grammatech.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi everyone,<br>

<br>

Congrats on rc4.<br>

<br>

More anecdotes from rc1. I got tangled up a bit trying to get<br>

multi-master working. I'm still not sure why all the parts would build<br>

one day, then not the next (in this case, it was setuptools). Nor why<br>

crossbar requires libffi to be installed on one machine but not the<br>

other. Nor why SQLAlchemy will be downloaded and installed automatically<br>

but not psycopg2. These troubles seem to have straightened themselves<br>

out, and I have multi-master buildbots in sandboxes on 2 different<br>

machines. There's light at the end of the tunnel, I hope.<br>

<br>

As a side note, Pierre, I ended up scripting the whole install/build/run<br>

thing. That may have to do for a tutorial.<br>

<br>

I got asked for help with a builder. Seems it was taking inordinately<br>

long to do a build, and the user tried cancelling, forcing, etc. There's<br>

3 builders for this worker. 1 doesn't use locks, but the other 2 do.<br>

It's pretty common for our workers to have a builder that doesn't lock,<br>

and the rest do.<br>

<br>

The current situation is that the builder in question shows<br>

not 1, but 2 builds building. Sort of: the current build is shown as<br>

acquiring locks. The older building build is clearly stalled.<br>

<br>

The other builder for the worker is proceeding well (but its builds take<br>

about 3 days). Obviously, it was able to get the lock. But it has<br>

started another build after finishing the first one. So it appears that<br>

it got the lock again before the original builder (unless there's<br>

something else going on).<br>

<br>

I also had a different worker's build stall, so I moved that worker to<br>

our alternate master. Unfortunately, it's a trick that only works once.<br>

If I move it back, it'll still be stalled. Is there any way to remove a<br>

no longer active worker from the database? I tried once, but I messed it<br>

up and had to start with an empty database. I didn't try again.<br>

<br>

Neil Gilmore<br>

<a href="http://grammatech.com" rel="noreferrer" target="_blank">grammatech.com</a><br>


</blockquote></div>