<div dir="ltr">Great!<div>I mean, if we go with the </div><div>- yield addStdout approach, </div><div><div>the issue might be solved with <a href="https://github.com/buildbot/buildbot/pull/2166">https://github.com/buildbot/buildbot/pull/2166</a> alone, since</div></div><div>- the caller of addStdout, etc. will be aware of the 'panic' situation and can take appropriate measures (discard the error, signal it, etc.)</div><div><br></div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">2016-04-28 17:17 GMT+02:00 Pierre Tardy <span dir="ltr"><<a href="mailto:tardyp@gmail.com" target="_blank">tardyp@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi Ion,<div><br></div><div>For me, the db.logs module should never fail. If a retry is needed, it should probably be implemented in the db layer.</div><div><br></div><div>I think it would make sense to first get a better idea of the root cause of the deadlock.</div><div>Vladimir @rutsky has implemented some powerful SQL logging; it may make sense to activate it.</div><div><br></div><div>I agree that there can be cases of unrecoverable errors which prevent the logs from being written (for example, a full disk on the SQL server).</div><div>In this case, it is more of a panic issue. 
There are several options I can think of:</div><div><br></div><div>- stop the step in exception status.</div><div> But if we can't write logs, will we be able to change the step status to exception?</div><div>- panic the master, and stop it.</div><div> We let an upper orchestration layer handle the availability issue.</div><div>- retry forever until some db admin restores the SQL server.</div><div> This has the advantage of not failing any build.</div><div><br></div><div>About addStdout: in Buildbot nine and Buildbot 0.8.12, for "new style" steps, addLog and addStdout already return a deferred, which only completes when the write has been done.</div><div><a href="http://docs.buildbot.net/latest/manual/new-style-steps.html" target="_blank">http://docs.buildbot.net/latest/manual/new-style-steps.html</a><br></div><div>So, yes, you should yield the addStdout call, and make sure your steps are considered "new style" (which means they have a run() method instead of a start() method).</div><div><br></div><div>Pierre</div></div><br><div class="gmail_quote"><div><div class="h5"><div dir="ltr">On Thu, Apr 28, 
2016 at 13:54, Ion Alberdi <<a href="mailto:nolaridebi@gmail.com" target="_blank">nolaridebi@gmail.com</a>> wrote:<br></div></div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5"><div dir="ltr">Hello to all,<div>As the error might require more analysis, I am copying here the issue mentioned on IRC.</div><div><br></div><div>From my understanding, </div><div>when an error occurs in the buildbot<->db conversation while adding a log line:</div><div><a href="https://github.com/buildbot/buildbot/blob/master/master/buildbot/process/log.py#L76" target="_blank">https://github.com/buildbot/buildbot/blob/master/master/buildbot/process/log.py#L76</a></div><div><br></div><div>the step calling addStdout will not be able to finish, as it will wait </div><div>for the lock to be released, forever.</div><div><br></div><div>I see two solutions for now:</div><div>1. return the deferred from log.addStdout calls, and let the developer</div><div> handle the issue if there is any. It has the drawback of changing</div><div> the way steps are implemented:</div><div> before: </div><div> log.addStdout</div><div> log.addStdout</div><div> after: </div><div> yield log.addStdout</div><div> yield log.addStdout</div><div><br></div><div>2. implement a retry mechanism in addRawLines (with a random sleep between attempts)</div><div> and raise an error if the issue is not solved (which the developer may or may not handle). 
</div><div> This aims at reducing the number of discarded logs to a tolerable level.</div><div><br></div><div>gracinet (correct me if I'm wrong) prefers 2; input is welcome :)</div><div><br></div><div><br></div><div>P.S.: </div><div>I'm currently testing solution 2 in a use case that stresses the logs long enough</div><div>to make the db (MySQL InnoDB) report the following issues:</div><div><br></div><div>- sqlalchemy.exc.OperationalError: (OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction') None None<br></div><div><div><span style="white-space:pre-wrap">- </span>sqlalchemy.exc.OperationalError: (OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction') 'INSERT INTO logchunks (logid, first_line, last_line, content, compressed) VALUES (%s, %s, %s, %s, %s)' (924L, 315L, 315L, 'x\xdaKU\x80\x02\xbf|\x85\x92\xcc\xdcT\x85\xaa\xfc\xbcT\x85\xcc\xbc\xb4| \xa1P\x92\x91Y\xac\x90\x92X\x92\xaa\xa0ad`h\xa6k`\xa2kd\xa1```\x05F\x9a:\n\xc5\xa9%%\x99y\xe9\n%\xf9\n\xa1!\xce\x00BM\x15\x96', 1)</div></div><div><br></div><div><br clear="all"><div><br></div>-- <br><div>Ion</div>
</div></div></div></div>
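<div>For illustration only, solution 2 (retry with a random sleep, then raise) could look roughly like the sketch below. This is not Buildbot code: TransientDBError, write_with_retry and flaky_write are hypothetical names, and the blocking time.sleep is used only to keep the example self-contained. Buildbot's db layer is Twisted-based, so a real version would need a non-blocking delay instead.</div>
<pre>
```python
import random
import time


class TransientDBError(Exception):
    """Hypothetical stand-in for a retryable error such as MySQL deadlock 1213."""


def write_with_retry(write, attempts=5, max_delay=0.5, sleep=time.sleep):
    """Call write() until it succeeds or the attempts are used up.

    A random pause between tries makes it less likely that competing
    writers collide again at the same moment.
    """
    for attempt in range(1, attempts + 1):
        try:
            return write()
        except TransientDBError:
            if attempt == attempts:
                # Out of attempts: re-raise so the caller can decide what to do.
                raise
            sleep(random.uniform(0, max_delay))


# Usage: a fake writer that fails twice with a "deadlock", then succeeds.
calls = {"count": 0}


def flaky_write():
    calls["count"] += 1
    if calls["count"] in (1, 2):
        raise TransientDBError("deadlock")
    return "written"


# The sleep is replaced by a no-op so the example runs instantly.
result = write_with_retry(flaky_write, sleep=lambda seconds: None)
print(result, calls["count"])
```
</pre>
<div>The number of attempts and the maximum delay are arbitrary here. If every attempt fails, the last error propagates to the caller, which matches the "raise an error if the issue is not solved" part of the proposal.</div>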
</blockquote></div>
</blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature">Ion</div>
</div>