<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Pierre,<br>
<br>
Thanks for the information and advice. I'll be looking more closely
at it in a few days. I understand about the changes. <br>
<br>
Currently, I'm working on moving to a multi-master setup. I'm told
that on 0.8.6p1 we occasionally had trouble with builds
stalling for unknown reasons. It's been a lot worse with nine, and
we're trying to find ways to deal with it. It may be a scaling
issue. If nothing else, using multi-master will mean that taking
down a master is less disruptive than it is now.<br>
<br>
We have several builders per worker, but the workers don't always
have enough resources to have a build active for every builder. So
we have a lock that makes sure only one build is really doing
anything at a time. Unfortunately, we've run into situations where,
according to the logs, a command completes but the builder never sends
another command to the worker. This makes things look worse than
they are, as a build may be sitting doing nothing while the others
are acquiring locks. Restarting the worker doesn't help.<br>
<br>
We also have seen an odd problem, likely from the same cause, where
a compilation finishes (ours sometimes take hours or a day or two),
but some part of the system drops the ball. This leaves things in a
state where one side thinks things finished fine, but the other says
they didn't. Sorry I don't have more details on that one. I've been
nearly drowning just trying to get people their builds.<br>
<br>
We also seem to have a situation where, even though there are build
requests queued and none active, the master takes quite a long while
to start a build, if it ever does. Restarting the worker sometimes helps.<br>
<br>
It may be some sort of scaling issue. The logs I see say 489
schedulers, 195 workers (currently 205, if I remember). I'm not sure
if that's big or not.<br>
<br>
In order to try to mitigate some issues, I'm running two completely
separate masters, and I move workers that need to produce critical
builds from one to the other. It's far from ideal, but we did need
to get builds done, and that worked for the short term.<br>
<br>
Again, thank you for your time.<br>
<br>
Neil Gilmore<br>
<a moz-do-not-send="true" href="http://grammatech.com"
rel="noreferrer" target="_blank">grammatech.com</a><br>
<br>
<div class="moz-cite-prefix">On 8/22/2016 3:52 AM, Pierre Tardy
wrote:<br>
</div>
<blockquote
cite="mid:CAJ+soVfQuE_X_--ujfnJk6t+fDDLTJgTm3ycD4fVYfDCKhRM4w@mail.gmail.com"
type="cite">
<div dir="ltr">Hi Neil,
<div><br>
</div>
<div>In Buildbot nine, some big changes had to be made in order
to make the log API asynchronous. Because logs are now
written to the database, there can be significant latency, hence the
need for an asynchronous API.</div>
<div><br>
</div>
<div>We tried our best to keep the synchronous APIs backward
compatible, but we couldn't support all of them. Only the write
APIs are supported, with some restrictions. The read APIs (like
getText) are not supported, even though they have not been properly
cleaned out of the code yet.</div>
<div><br>
</div>
<div>For the use case you describe, I would do the log
shortening in the email reporter module rather than in the
steps.</div>
<div><br>
</div>
<div>Here is some example code showing how to fetch log
content:</div>
<div><a moz-do-not-send="true"
href="https://github.com/buildbot/buildbot/blob/master/master/buildbot/test/util/integration.py#L188">https://github.com/buildbot/buildbot/blob/master/master/buildbot/test/util/integration.py#L188</a><br>
</div>
<div>
<div><br>
</div>
<div>You can use a resultSpec to limit the number of
lines you fetch (offset and limit are in lines):</div>
<div>
<div>from buildbot.data import resultspec</div>
</div>
<div>first_100_lines = yield self.master.data.get(("logs",
log['logid'], "contents"),
resultSpec=resultspec.ResultSpec(offset=0, limit=100))</div>
</div>
<div><br>
</div>
<div>You can also use a LogObserver that stops concatenating
when it reaches the 64k limit. This shouldn't have a larger memory
footprint than you have right now (getText() will load the
entire log in memory!)</div>
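<div><br>
</div>
<div>A minimal sketch of that capped-observer idea, written as plain
Python so it runs without a master. The class name CappedLogCollector
and the getHead() helper are hypothetical; in a real setup the class
would presumably subclass buildbot.process.logobserver.LogObserver,
on the assumption that it receives each stdout chunk through
outReceived(), and be attached to the step's stdio log.</div>
<pre>
```python
class CappedLogCollector(object):
    """Keep stdout chunks until a size cap is reached, then drop the rest.

    Hypothetical stand-in for a LogObserver subclass; only the capping
    logic is shown here.
    """

    def __init__(self, cap=65536):
        self.cap = cap      # maximum number of characters to keep
        self.parts = []     # chunks kept so far
        self.size = 0       # total length of the kept chunks

    def outReceived(self, data):
        remaining = self.cap - self.size
        if remaining == 0:
            return  # cap already reached; ignore further output
        piece = data[:remaining]  # truncate the chunk that crosses the cap
        self.parts.append(piece)
        self.size += len(piece)

    def getHead(self):
        # Join once at the end instead of growing one big string.
        return "".join(self.parts)


collector = CappedLogCollector(cap=10)
for chunk in ["hello ", "world, ", "this is dropped"]:
    collector.outReceived(chunk)
print(collector.getHead())
```
</pre>
<div>With a cap of 10, only "hello worl" is kept, so memory use
stays bounded by the cap no matter how large the full log grows.</div>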
<div><br>
</div>
<div>Thanks for testing out nine, and letting us know your
issues!</div>
<div><br>
</div>
<div>Pierre</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr">On Fri, Aug 19, 2016 at 16:41, Neil Gilmore &lt;<a
moz-do-not-send="true" href="mailto:ngilmore@grammatech.com">ngilmore@grammatech.com</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">Hi everyone,<br>
<br>
We've been trying to move from primarily 0.8.6p1(?) to
0.9.0rc1. We're<br>
having some problems.<br>
<br>
Among them is a custom build step that no longer works. It
looks<br>
substantially like this:<br>
<br>
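class MyCustomStep(ShellCommand):<br>
&nbsp;&nbsp;&nbsp;&nbsp;name = "errorlog"<br>
&nbsp;&nbsp;&nbsp;&nbsp;haltOnFailure = 1<br>
&nbsp;&nbsp;&nbsp;&nbsp;description = ["checking for errors"]<br>
&nbsp;&nbsp;&nbsp;&nbsp;descriptionDone = ["done checking for errors"]<br>
<br>
&nbsp;&nbsp;&nbsp;&nbsp;OFFprogressMetrics = ('output',)<br>
&nbsp;&nbsp;&nbsp;&nbsp;# things to track: number of files compiled, number of directories<br>
&nbsp;&nbsp;&nbsp;&nbsp;# traversed (assuming 'make' is being used)<br>
<br>
&nbsp;&nbsp;&nbsp;&nbsp;def createSummary(self, cmd):<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;self.addCompleteLog('stdio_head',cmd.getText()[0:65536])<br>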
<br>
We use this step to create a shorter log that we can use in
emails<br>
without choking our systems.<br>
<br>
cmd.getText() never returns any text.<br>
<br>
cmd is a SyncLogFileWrapper, and I've tried writing nearly
everything in<br>
it to twistd.log via log.msg. It never seems to have any<br>
finishDeferreds, or chunks, and always gets marked finished.<br>
<br>
I thought that maybe I needed to call waitUntilFinished().<br>
Unfortunately, this function raises an exception in the code from the
0.9.0rc1 tarball,<br>
and the error is in both the GitHub master and 0.9.0 branches<br>
(finishDeferreds is spelled finishDefereds). Fixing it didn't
help, as<br>
there were no finishDeferreds out there to finish anyway.<br>
<br>
That, combined with the fact that the only place where the log
argument<br>
to createSummary() is used is in the integration tests, leads
me to<br>
think that this hasn't been tested too much, if at all.<br>
<br>
I've thought about using the data API, but I'd need a log id.<br>
SyncLogFileWrapper doesn't seem to have one. I'm currently
working on<br>
getting it from the step, because I can at least get the step
id, and I<br>
hope I can work my way down from there.<br>
<br>
I think I'd rather not try using a LogObserver, as I don't
think we want<br>
to be accumulating large logs in memory.<br>
<br>
Has anyone out there successfully gotten text from the log
argument of<br>
createSummary() in 0.9.0rc1?<br>
<br>
Thank you for your time.<br>
<br>
Neil Gilmore<br>
<a moz-do-not-send="true" href="http://grammatech.com"
rel="noreferrer" target="_blank">grammatech.com</a><br>
</blockquote>
</div>
</blockquote>
<br>
</body>
</html>