[devel at bb.net] deadlock upon errors on log.addRawLines

Ion Alberdi nolaridebi at gmail.com
Thu Apr 28 15:25:47 UTC 2016


Great!
I mean, if we're in the "yield addStdout" situation, the issue might be
solved by https://github.com/buildbot/buildbot/pull/2166 alone, as the
caller of addStdout, etc. will be aware of the 'panic' situation and
can take appropriate measures (discard the error, signal it, etc.).
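
For illustration, a minimal sketch of the retry idea (option 2 in the quoted
message below) combined with a caller that is made aware of the failure. It is
plain synchronous Python, not Twisted or buildbot code: DeadlockError,
add_with_retry and the write callable are hypothetical names standing in for
the MySQL 1213 error and the db write. In a real Twisted step the delay would
have to be non-blocking (for example via twisted.internet.task.deferLater).

```python
import random
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for the MySQL error 1213 OperationalError."""


def add_with_retry(write, lines, attempts=3, max_sleep=0.5, sleep=time.sleep):
    """Call write(lines); on DeadlockError wait a random delay and retry.

    The last error is re-raised once all attempts are used up, so the
    caller can decide whether to discard it or signal it.
    """
    for attempt in range(1, attempts + 1):
        try:
            return write(lines)
        except DeadlockError:
            if attempt == attempts:
                raise  # give up: let the caller handle the 'panic' case
            sleep(random.uniform(0, max_sleep))
```

A caller would wrap the call in try/except DeadlockError and either drop the
chunk or mark the step as failed, depending on how much log loss is tolerable.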



2016-04-28 17:17 GMT+02:00 Pierre Tardy <tardyp at gmail.com>:

> Hi Ion,
>
> For me, the db.logs module should never fail. If there is a need for a
> retry, it shall probably be implemented in the db layer.
>
> I think it would make sense to first get a better idea of the root cause
> of the deadlock.
> Vladimir @rutsky has implemented some powerful SQL logging; maybe it
> would make sense to activate it.
>
> I agree that there can be cases of unrecoverable errors which prevent the
> logs from being written (I can think of a full disk on the SQL server).
> In this case, this is more of a panic issue. There are several options
> I can think of:
>
> - stop the step in exception status.
>    If we can't write logs, will we be able to change the step status to
> exception?
> - panic the master, and stop it.
>    We let an upper orchestration layer handle the availability issue.
> - retry forever until some db admin restores the SQL server.
>    This has the advantage of not failing any build.
>
> About addStdout: in buildbot nine and buildbot 0.8.12, for "new style"
> steps, addLog and addStdout already return a Deferred, which only fires
> when the write has been done.
> http://docs.buildbot.net/latest/manual/new-style-steps.html
> So, yes, you should yield the addStdout, and make sure your steps are
> considered "newStyle" (which means they have a run() method instead of a
> start() method).
>
> Pierre
>
> On Thu, Apr 28, 2016 at 13:54, Ion Alberdi <nolaridebi at gmail.com> wrote:
>
>> Hello all,
>> As the error might require more analysis, I am copying here the issue
>> mentioned on IRC.
>>
>> From my understanding,
>> when an error appears in the buildbot<->db conversation to add a log line:
>>
>> https://github.com/buildbot/buildbot/blob/master/master/buildbot/process/log.py#L76
>>
>> the step calling addStdout will not be able to finish, as it will wait
>> for the lock to be released, forever.
>>
>> I see two solutions for now:
>> 1. return the deferred in log.addStdout calls, and let the developer
>>     handle the issue if there is any. It has the drawback of changing
>>     the way steps are implemented:
>>     before:
>>         log.addStdout
>>         log.addStdout
>>     after:
>>         yield log.addStdout
>>         yield log.addStdout
>>
>> 2. implement a retry mechanism in addRawLines (with a random sleep
>>     between attempts) and raise an error if the issue is not solved
>>     (which the developer will handle or not).
>>     This aims at reducing the number of discarded logs to a tolerable
>>     level.
>>
>> gracinet (correct me if I'm wrong) prefers 2; input is welcome :)
>>
>>
>> P.S:
>> I'm currently testing solution 2 in a use case that stresses the logs
>> long enough
>> to make the db (mysql innodb) report the following issues:
>>
>> - sqlalchemy.exc.OperationalError: (OperationalError) (1213, 'Deadlock
>> found when trying to get lock; try restarting transaction') None None
>> - sqlalchemy.exc.OperationalError: (OperationalError) (1213, 'Deadlock
>> found when trying to get lock; try restarting transaction') 'INSERT INTO
>> logchunks (logid, first_line, last_line, content, compressed) VALUES (%s,
>> %s, %s, %s, %s)' (924L, 315L, 315L,
>> 'x\xdaKU\x80\x02\xbf|\x85\x92\xcc\xdcT\x85\xaa\xfc\xbcT\x85\xcc\xbc\xb4|
>> \xa1P\x92\x91Y\xac\x90\x92X\x92\xaa\xa0ad`h\xa6k`\xa2kd\xa1```\x05F\x9a:\n\xc5\xa9%%\x99y\xe9\n%\xf9\n\xa1!\xce\x00BM\x15\x96',
>> 1)
>>
>>
>>
>> --
>> Ion
>
>


-- 
Ion