[Buildbot-devel] Maximum log size

Chris AtLee chris at atlee.ca
Tue Sep 22 16:23:30 UTC 2009


Hey Axel (and list this time),

> I'm curious, what can go so wrong that we're running into this problem? This
> might be interesting as defense in depth, but the fix is probably elsewhere.

We've run into this in a few places, all with the same root cause: code
coverage tests and debug unit tests.  In both cases the browser would
enter an infinite loop on exit.  Because both of these are debug
builds, they constantly generate output, which prevented the regular
buildbot timeout (which only fires after a period with no output) from
kicking in and killing the job.
They also generated many gigabytes of log data, which caused a
few problems:
- Compressing the logs afterwards took too long (several hours per file).
I think I filed a bug upstream to allow for gzip compression, which is
_much_ faster.
- Many of the status handlers (the tinderbox mailer, for example) and
output parsers want to load the entire output into memory and then do
their work.  This would cause the machine to start swapping, and
eventually buildbot would crash with an out-of-memory exception or be
killed off by the kernel.

I added the maxTime parameter to build steps to prevent them from
running forever, but even with a reasonable maxTime, the process can
still generate around a gigabyte of data.

It would be nice to have the status handlers process log data as a
stream instead of loading it all at once, but that's a much bigger
issue to fix right now.
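
To illustrate what a more streaming-friendly handler could look like,
here is a minimal sketch that scans a log in fixed-size chunks.  The
function name and the "warning" match are made up for illustration and
are not part of any Buildbot API; log_file is assumed to be any binary
file-like object.

```python
import io


def count_warnings(log_file, chunk_size=64 * 1024):
    """Count lines containing b"warning" without loading the whole log.

    Hypothetical helper for illustration only, not a Buildbot API.
    """
    count = 0
    leftover = b""  # partial line carried over from the previous chunk
    while True:
        chunk = log_file.read(chunk_size)
        if not chunk:
            break
        lines = (leftover + chunk).split(b"\n")
        leftover = lines.pop()  # last piece may be an incomplete line
        count += sum(1 for line in lines if b"warning" in line)
    # Handle a final line that has no trailing newline.
    if b"warning" in leftover:
        count += 1
    return count
```

Memory use is bounded by the chunk size plus the longest single line,
rather than by the size of the log.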

> For your patch:
>
> I think the config variable should be called maxLogSize instead of
> logMaxSize.

Hmm, I chose logMaxSize to line up with logCompressionLimit.

> I don't think that you should worry about header chunks, it's more about
> stdout vs stderr.

Ok, so logMaxSize should not count data in the HEADER channel?
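
If so, the accounting could look roughly like the sketch below.  This
is an illustration under stated assumptions, not the actual patch: the
class, the channel names and the attribute names are all hypothetical.
Whole chunks are dropped once the limit has been reached, and HEADER
chunks are always kept and never counted.

```python
# Illustrative channel names; stand-ins for the real channel constants.
HEADER, STDOUT, STDERR = "header", "stdout", "stderr"


class SizeLimitedLog:
    """Hypothetical log that stops recording output past max_size."""

    def __init__(self, max_size=None):
        self.max_size = max_size  # None means no limit
        self.size = 0             # bytes of non-header data recorded
        self.chunks = []
        self.truncated = False

    def add(self, channel, data):
        """Record a chunk; return False if it was dropped."""
        if channel == HEADER:
            # Headers are always logged and do not count toward the limit.
            self.chunks.append((channel, data))
            return True
        if self.max_size is not None and self.size >= self.max_size:
            # Already at or over the limit: drop the whole chunk.
            self.truncated = True
            return False
        # Chunks are never split, so the log may overshoot by one chunk.
        self.chunks.append((channel, data))
        self.size += len(data)
        return True
```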

> If we drop chunks and let the command run on, we should drop chunks in the
> middle of the log, IMHO.

Yeah, I kind of agree, but I couldn't come up with a good way of
specifying how large the head/tail should be in the config, and you'd
also have to keep a buffer for the current tail in memory.  If the
tail isn't too large, that would be fine.
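
A minimal sketch of the head/tail idea, assuming two hypothetical
settings (head_size and tail_size, in bytes) and a plain in-memory
buffer for the tail.  None of these names come from the patch.

```python
from collections import deque


class HeadTailLog:
    """Hypothetical log that keeps the first and last part of the output."""

    def __init__(self, head_size, tail_size):
        self.head_size = head_size
        self.tail_size = tail_size
        self.head = []
        self.head_bytes = 0
        self.tail = deque()      # in-memory buffer for the current tail
        self.tail_bytes = 0
        self.dropped_bytes = 0   # data discarded from the middle

    def add(self, data):
        if self.head_bytes < self.head_size:
            # Still filling the head; chunks are never split.
            self.head.append(data)
            self.head_bytes += len(data)
            return
        self.tail.append(data)
        self.tail_bytes += len(data)
        # Evict the oldest tail chunks while over budget, but always
        # keep at least the newest chunk.
        while len(self.tail) > 1 and self.tail_bytes > self.tail_size:
            old = self.tail.popleft()
            self.tail_bytes -= len(old)
            self.dropped_bytes += len(old)

    def chunks(self):
        """Return head, a marker for the dropped middle, then tail."""
        out = list(self.head)
        if self.dropped_bytes:
            out.append("[... %d bytes dropped ...]\n" % self.dropped_bytes)
        out.extend(self.tail)
        return out
```

The memory cost is roughly tail_size plus one chunk, which is why a
small tail would be fine.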

> I think I favour having at least the option to kill the command if the log
> grows beyond the limit, in our case, that's probably what we want anyway.
> Dropping output sounds like an error condition either way, and should be
> signaled as a build failure. You could add an option abortOnLargeLog =
> WARN/ERROR/EXCEPTION.

Ok, I can look at adding support for a flag like that as well.  Should
it add a message to that effect to the header channel of the stdio
log, or to another warnings/error log?
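
For discussion, here is one way such a flag could map onto step
results.  Everything in this sketch is an assumption: the result names
are local stand-ins rather than Buildbot's own constants, and the rule
that WARN lets the command keep running while ERROR and EXCEPTION kill
it is just one possible reading of the suggestion.

```python
# Local stand-ins for step results; not imported from Buildbot.
WARNINGS, FAILURE, EXCEPTION = "warnings", "failure", "exception"

# Hypothetical mapping for an abortOnLargeLog-style option.
POLICY_RESULTS = {"WARN": WARNINGS, "ERROR": FAILURE, "EXCEPTION": EXCEPTION}


def on_log_limit(log_size, max_size, policy):
    """Decide what to do when a log grows past max_size.

    Returns None while under the limit, otherwise a tuple
    (kill_command, result, message).
    """
    if log_size <= max_size:
        return None
    result = POLICY_RESULTS[policy]
    # Assumption: only WARN lets the command keep running.
    kill_command = policy != "WARN"
    message = "log size %d exceeds limit %d\n" % (log_size, max_size)
    return kill_command, result, message
```

The returned message could go either to the header channel of the
stdio log or to a separate log; the sketch leaves that to the caller.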

Thanks for the ideas!
Chris

On Tue, Sep 22, 2009 at 12:08 PM, Axel Hecht <l10n.moz at googlemail.com> wrote:
> Hey Chris,
>
> I'm curious, what can go so wrong that we're running into this problem? This
> might be interesting as defense in depth, but the fix is probably elsewhere.
>
> For your patch:
>
> I think the config variable should be called maxLogSize instead of
> logMaxSize.
>
> I don't think that you should worry about header chunks, it's more about
> stdout vs stderr.
>
> If we drop chunks and let the command run on, we should drop chunks in the
> middle of the log, IMHO.
>
> I think I favour having at least the option to kill the command if the log
> grows beyond the limit, in our case, that's probably what we want anyway.
> Dropping output sounds like an error condition either way, and should be
> signaled as a build failure. You could add an option abortOnLargeLog =
> WARN/ERROR/EXCEPTION.
>
> Axel
>
> 2009/9/15 Chris AtLee <chris at atlee.ca>
>>
>> Hi everyone,
>>
>> We've been running into a few problems where runaway processes
>> generate HUGE log files (several hundred MB, sometimes a few GB),
>> which causes the master to become unresponsive and then eventually
>> crash or be killed by the oom-killer in the kernel.
>>
>> I'm looking at one way of addressing this, which is to be able to set
>> an upper limit on the size of the log.  I've got a basic
>> implementation working [1], but ran into a few things where I wasn't
>> sure what the right thing to do was.
>>
>> - Should the HEADER channel be included in the maximum size?
>> Currently I'm including it in the total size of the log, but not
>> preventing additional entries made to the HEADER channel from being
>> logged.  I've considered adding another counter that keeps track of
>> everything in the log except headers, so that the header channel
>> doesn't count towards the maximum log size.
>>
>> - I'm not truncating individual chunks.  If the log file is under the
>> maximum size, the entire chunk will get logged, otherwise the chunk
>> won't get logged at all, with the exception of HEADER chunks.
>>
>> - If the log is truncated, should the step be terminated at that time?
>>  I would guess no, but I'd like to know what the community thinks.
>>
>> - Should some portion of the tail of the log be kept even if the log is
>> truncated?
>>
>> Cheers,
>> Chris
>>
>> [1] http://github.com/catlee/buildbot/commits/maxlogsize