[users at bb.net] buildbot-0.9.0rc1, createSummary, and SyncLogFileWrapper troubles.

Neil Gilmore ngilmore at grammatech.com
Mon Aug 22 16:56:46 UTC 2016


Pierre,

Thanks for the information and advice. I'll be looking more closely at 
it in a few days. I understand about the changes.

Currently, I'm working on moving to a multi-master setup. I'm told that 
we occasionally had trouble with builds stalling for unknown reasons 
when we were on 0.8.6p1. It has been a lot worse with nine, and we're 
trying to find ways to deal with it. It may be a scaling issue. If 
nothing else, using multi-master will make taking down a master less 
disruptive than it is now.

We have several builders per worker, but the workers don't always have 
enough resources to run a build for every builder at once. So we have a 
lock that makes sure only one build is really doing anything at a time. 
Unfortunately, we've run into situations where, according to the logs, 
a command completes but the builder never sends another command to the 
worker. This makes things look worse than they are, as a build may be 
sitting doing nothing while the others are acquiring locks. Restarting 
the worker doesn't help.
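
For reference, a lock like the one described above might look roughly 
like the following master.cfg fragment. This is a minimal sketch, not 
our actual configuration: it assumes a per-worker lock (WorkerLock with 
maxCount=1) shared by every builder on the worker, and the lock, 
builder, and worker names are hypothetical.

```python
# master.cfg fragment (sketch). All names below are hypothetical.
from buildbot.plugins import util

# A WorkerLock is counted per worker: with maxCount=1, at most one build on
# a given worker can hold it at a time, whichever builder the build belongs to.
one_build_per_worker = util.WorkerLock("one_build_per_worker", maxCount=1)

factory = util.BuildFactory()  # real build steps omitted

c = BuildmasterConfig = {}
c['builders'] = [
    util.BuilderConfig(
        name=builder_name,
        workernames=["worker1"],
        factory=factory,
        # every builder on the worker competes for the same lock
        locks=[one_build_per_worker.access('exclusive')],
    )
    for builder_name in ("project-a", "project-b", "project-c")
]
```

Taking the lock at the builder level means a build waits for it before 
any of its steps run; step-level locks are the alternative when only 
part of a build needs to be serialized.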

We have also seen an odd problem, likely with the same cause, where a 
compilation finishes (ours sometimes take hours, or even a day or two), 
but some part of the system drops the ball. This leaves things in a 
state where one side thinks things finished fine, but the other says it 
didn't. Sorry I don't have more details on that one. I've been nearly 
drowning just trying to get people their builds.

We also seem to have a situation where, even though there are build 
requests queued and none active, the master takes quite a long while to 
start a build, if it starts one at all. Restarting the worker sometimes 
helps.

It may be some sort of scaling issue. The logs I see report 489 
schedulers and 195 workers (currently 205, if I remember correctly). 
I'm not sure whether that counts as big or not.

In order to try to mitigate some issues, I'm running two completely 
separate masters, and I move workers that need to produce critical 
builds from one to the other. It's far from ideal, but we did need to 
get builds done, and that worked for the short term.

Again, thank you for your time.

Neil Gilmore
grammatech.com <http://grammatech.com>

On 8/22/2016 3:52 AM, Pierre Tardy wrote:
> Hi Neil,
>
> In buildbot nine, some big changes had to be made in order to make the 
> log API asynchronous. Because logs are now written to the database, 
> there can be significant latency, hence the need for an asynchronous API.
>
> We tried our best to keep backward compatibility for the synchronous 
> API, but we couldn't support all of it. Only the write APIs are 
> supported, with some restrictions. Read APIs (like getText) are not 
> supported, even though they have not been properly cleaned out yet.
>
> For the use case you describe, I would do the log shortening in the 
> email reporter module rather than in the steps.
>
> You can find some example code showing how to fetch log content here:
> https://github.com/buildbot/buildbot/blob/master/master/buildbot/test/util/integration.py#L188
>
> You can use resultSpec to limit the number of lines you fetch (offset 
> and limit are counted in lines):
>
>     from buildbot.data import resultspec
>
>     # inside a method decorated with @defer.inlineCallbacks
>     first_100_lines = yield self.master.data.get(
>         ("logs", log['logid'], "contents"),
>         resultSpec=resultspec.ResultSpec(offset=0, limit=100))
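
A minimal sketch of paging through a log with offset/limit until a size 
cap is reached, as one way an email reporter could build a short 
excerpt. fetch_page is a hypothetical synchronous stand-in for the 
master.data.get(...) call above; the real call returns a Deferred, and 
converting its result into a list of lines is an assumption not shown 
here. The cap is counted in characters, mirroring the 
getText()[0:65536] slice in the original step.

```python
def log_head(fetch_page, max_chars=65536, page_size=100):
    """Return whole lines from the start of a log, up to max_chars in total.

    fetch_page(offset, limit) is a hypothetical callable that returns a
    list of lines (each ending in "\\n"), or an empty list past the end.
    """
    kept = []
    used = 0
    offset = 0
    while True:
        lines = fetch_page(offset, page_size)
        if not lines:
            # log exhausted before reaching the cap
            return "".join(kept)
        for line in lines:
            if used + len(line) > max_chars:
                # stop at a line boundary rather than cutting a line in half
                return "".join(kept)
            kept.append(line)
            used += len(line)
        offset += len(lines)
```

Only one page of lines is held at a time besides the kept excerpt, so 
memory use is bounded by max_chars plus one page.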
>
> You can also use a log observer that stops concatenating when it 
> reaches the 64k limit. This shouldn't have a larger memory footprint 
> than what you have right now (getText() loads the entire log into memory!)
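
A minimal sketch of the capping logic described above, kept independent 
of buildbot so it runs on its own. CappedText is a hypothetical helper; 
in a real step it would presumably be fed from a log observer's 
outReceived() hook (for example a LogObserver subclass registered with 
addLogObserver('stdio', ...)), but that wiring is an assumption and is 
not shown.

```python
class CappedText(object):
    """Accumulate text up to max_chars characters, then drop the rest."""

    def __init__(self, max_chars=65536):
        self.max_chars = max_chars
        self._chunks = []
        self._used = 0
        self.truncated = False

    def feed(self, data):
        # Called once per chunk of output; never stores more than max_chars.
        room = self.max_chars - self._used
        if room <= 0:
            if data:
                self.truncated = True
            return
        if len(data) > room:
            data = data[:room]
            self.truncated = True
        self._chunks.append(data)
        self._used += len(data)

    def text(self):
        return "".join(self._chunks)
```

Unlike reading the whole log and slicing it, memory use stays bounded 
by max_chars however long the log grows.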
>
> Thanks for testing out nine, and letting us know your issues!
>
> Pierre
>
> On Fri, Aug 19, 2016 at 16:41, Neil Gilmore <ngilmore at grammatech.com 
> <mailto:ngilmore at grammatech.com>> wrote:
>
>     Hi everyone,
>
>     We've been trying to move from primarily 0.8.6p1(?) to 0.9.0rc1. We're
>     having some problems.
>
>     Among them is a custom build step that no longer works. It looks
>     substantially like this:
>
>     from buildbot.steps.shell import ShellCommand
>
>     class MyCustomStep(ShellCommand):
>         name = "errorlog"
>         haltOnFailure = True
>         description = ["checking for errors"]
>         descriptionDone = ["done checking for errors"]
>
>         OFFprogressMetrics = ('output',)
>         # things to track: number of files compiled, number of directories
>         # traversed (assuming 'make' is being used)
>
>         def createSummary(self, cmd):
>             # keep only the first 64k characters for use in emails
>             self.addCompleteLog('stdio_head', cmd.getText()[0:65536])
>
>     We use this step to create a shorter log that we can use in emails
>     without choking our systems.
>
>     cmd.getText() never returns any text.
>
>     cmd is a SyncLogFileWrapper, and I've tried writing nearly everything
>     in it to twistd.log via log.msg. It never seems to have any
>     finishDeferreds or chunks, and it always gets marked finished.
>
>     I thought that maybe I needed to call waitUntilFinished().
>     Unfortunately, this function raises an exception in the code from
>     the 0.9.0rc1 tarball, and the error is present in both the GitHub
>     master and the 0.9.0 branch (finishDeferreds is spelled
>     finishDefereds). Fixing it didn't help, as there were no
>     finishDeferreds to finish anyway.
>
>     That, combined with the fact that the only place where the log
>     argument to createSummary() is used is in the integration tests,
>     leads me to think that this hasn't been tested much, if at all.
>
>     I've thought about using the data API, but I'd need a log ID, and
>     SyncLogFileWrapper doesn't seem to have one. I'm currently working on
>     getting it from the step, because I can at least get the step ID,
>     and I hope I can work my way down from there.
>
>     I think I'd rather not use a LogObserver, as I don't think we want
>     to accumulate large logs in memory.
>
>     Has anyone out there successfully gotten text from the log argument
>     of createSummary in 0.9.0rc1?
>
>     Thank you for your time.
>
>     Neil Gilmore
>     grammatech.com <http://grammatech.com>
>


