[users at bb.net] More from the land of multi-master.

Thu Oct 5 16:06:09 UTC 2017

On Thu, Oct 5, 2017 at 4:13 PM Neil Gilmore <ngilmore at grammatech.com> wrote:

> Pierre,
>
> As always, thanks for the reply and advice.
>
> Note that I've clipped items that were addressed and that I have no more
> comments on.
>
>
> On 10/5/2017 3:45 AM, Pierre Tardy wrote:
>
> On Wed, Oct 4, 2017 at 5:12 PM Neil Gilmore <ngilmore at grammatech.com>
> wrote:
>> We have also been getting a lot of errors apparently tied to build
>> collapsing, which we have turned on globally. If you've been following
>> along with the anecdotes, you'll know that we've also slightly modified
>> the circumstances under which a build will be collapsed to ignore
>> revision (in our case, we always want to use the latest -- we don't care
>> about building anything 'intermediate'). We'd been getting a lot of
>> 'tried to complete N buildrequests, but only completed M' warnings.
>
> We have also seen other people hit those issues. I made a fix in 0.9.10,
> but it looks like people are still complaining about it, without much
> clue about what is wrong beyond what was fixed.
> The known problem was that the N buildrequests were not actually unique
> buildrequests: the list contained duplicates.
> So those warnings should be pretty harmless beyond the noise.
>
>
> Even though the transaction involving those buildrequests cancels the
> transaction, so that the original work of marking requests doesn't happen?
> Or would that just mean the requests don't get skipped?
>

No, no transaction is canceled with this issue. This is only the upper
layer freaking out because the db layer reports fewer updated rows than
it expects.
The requests will get skipped as expected. This is just a spurious warning
message.
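
To illustrate why a duplicated id list produces that mismatch, here is a
minimal sketch using Python's sqlite3 module. It is not Buildbot's actual
code: the table layout, the complete_requests() helper, and make_db() are
all hypothetical and exist only for this example. It shows that an UPDATE
over a list containing duplicate ids touches each row once, so the row count
is lower than the length of the list even though every request was marked.

```python
import sqlite3


def make_db(n):
    """Create an in-memory table with n pending requests (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE buildrequests ("
        "id INTEGER PRIMARY KEY, complete INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO buildrequests (id) VALUES (?)",
        [(i,) for i in range(1, n + 1)],
    )
    return conn


def complete_requests(conn, brids):
    """Mark the given ids complete; return (expected, actually_updated)."""
    if not brids:
        return 0, 0
    placeholders = ",".join("?" for _ in brids)
    cur = conn.execute(
        "UPDATE buildrequests SET complete = 1 "
        f"WHERE complete = 0 AND id IN ({placeholders})",
        list(brids),
    )
    conn.commit()
    # rowcount counts rows modified, not list entries matched
    return len(brids), cur.rowcount


# A list with duplicates: 5 entries, but only 3 distinct rows.
dup_ids = [1, 2, 2, 3, 3]
expected, updated = complete_requests(make_db(3), dup_ids)
if expected != updated:
    print(f"tried to complete {expected} requests, but only completed {updated}")

# Deduplicating first (order preserved) makes the two counts agree.
unique_ids = list(dict.fromkeys(dup_ids))
expected2, updated2 = complete_requests(make_db(3), unique_ids)
```

In both runs all three rows end up marked complete; only the comparison
against the length of the input list differs, which is why the warning is
noise rather than a sign of lost work.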

>
> The builder we have to start workers does not use the custom steps, though
> we have collapsing turned on globally. I have not seen that builder having
> any skipped requests. It appears to be running normally since yesterday.
>
So everything is back to normal?

> I've used the manhole before, but not for this. I've had to use it in the
> past to manually finish stuck builds, and to manually release locks when
> necessary (though I haven't had to do that in a long time).
>
> But we don't leave the manhole open, which means that I reconfig when I'm
> going to use it (and since we use the same master.cfg for all the masters,
> the manhole would try, and probably fail, to open for all of them). Lately,
> that hasn't been a good option, because when we were having the CPU spikes,
> the reconfig would never finish (it might run for 24 hours or more until we
> were going to restart the master anyway). It might work now, though, since
> we seem to have solved the CPU problem.
>
Sounds good!