<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    Pierre,<br>

    <br>

    Thanks for the information and advice. I'll be looking more closely

    at it in a few days. I understand about the changes. <br>

    <br>

    Currently, I'm working on moving to a multi-master setup. I'm told
    that on 0.8.6p1 we occasionally had builds stall for unknown
    reasons. It's been a lot worse with nine, and we're trying to find
    ways to deal with it. It may be a scaling issue. If nothing else,
    multi-master will make taking down a master less disruptive than it
    is now.<br>

    <br>

    We have several builders per worker, but the workers don't always
    have enough resources to have a build active for every builder. So
    we have a lock that makes sure only one build is really doing
    anything at a time. Unfortunately, we've run into situations where,
    according to the logs, a command completes but the builder never
    sends another command to the worker. This makes things look worse
    than they are, as one build may be sitting idle while the others
    wait to acquire the lock. Restarting the worker doesn't help.<br>

    <br>

    We have also seen an odd problem, likely from the same cause, where
    a compilation finishes (ours sometimes take hours, or even a day or
    two), but some part of the system drops the ball. This leaves things
    in a state where one side thinks the build finished fine, but the
    other says it didn't. Sorry I don't have more details on that one.
    I've been nearly drowning just trying to get people their builds.<br>

    <br>

    We also seem to have a situation where, even though there are build
    requests queued and none active, the master takes a long while to
    start a build, if it ever does. Restarting the worker sometimes
    helps.<br>

    <br>

    It may be some sort of scaling issue. The logs I see say 489

    schedulers, 195 workers (currently 205, if I remember). I'm not sure

    if that's big or not.<br>

    <br>

    In order to try to mitigate some issues, I'm running two completely

    separate masters, and I move workers that need to produce critical

    builds from one to the other. It's far from ideal, but we did need

    to get builds done, and that worked for the short term.<br>

    <br>

    Again, thank you for your time.<br>

    <br>

    Neil Gilmore<br>

    <a moz-do-not-send="true" href="http://grammatech.com"

      rel="noreferrer" target="_blank">grammatech.com</a><br>

    <br>

    <div class="moz-cite-prefix">On 8/22/2016 3:52 AM, Pierre Tardy

      wrote:<br>

    </div>

    <blockquote

cite="mid:CAJ+soVfQuE_X_--ujfnJk6t+fDDLTJgTm3ycD4fVYfDCKhRM4w@mail.gmail.com"

      type="cite">

      <div dir="ltr">Hi Neil,

        <div><br>

        </div>

        <div>In Buildbot nine, some big changes had to be made in order
          to make the log API asynchronous. As logs are now written to
          the database, there can be significant latency, hence the
          need for an asynchronous API.</div>

        <div><br>

        </div>

        <div>We tried our best to keep the synchronous API backward
          compatible, but we couldn't support all of it. Only the write
          APIs are supported, with some restrictions. The read APIs
          (like getText) are not supported, even though they have not
          been properly cleaned out yet.</div>

        <div><br>

        </div>

        <div>For the use case you describe, I would do the log
          shortening in the email reporter module rather than in the
          steps.</div>

        <div><br>

        </div>

        <div>Here is some example code showing how to fetch log
          content:</div>

        <div><a moz-do-not-send="true"

href="https://github.com/buildbot/buildbot/blob/master/master/buildbot/test/util/integration.py#L188">https://github.com/buildbot/buildbot/blob/master/master/buildbot/test/util/integration.py#L188</a><br>

        </div>

        <div>

          <div><br>

          </div>

          <div>You can use a resultSpec to limit the number of lines
            you fetch (offset and limit are expressed in lines):</div>

          <div>

            <div>from buildbot.data import resultspec</div>

          </div>

          <div>first_100_lines = yield self.master.data.get(("logs",

            log['logid'], "contents"),

            resultSpec=resultspec.ResultSpec(offset=0, limit=100))</div>

        </div>

        <div><br>

        </div>

        <div>You can also use a LogObserver that stops concatenating
          when it reaches the 64k limit. This shouldn't have a larger
          memory footprint than you have right now (getText() loads the
          entire log into memory!)</div>
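        <div><br>
        </div>
        <div>A minimal sketch of that idea, written as a standalone
          class so that it runs without Buildbot. The name HeadCollector
          and its getHead() method are hypothetical; in a real step the
          same logic would presumably live in the outReceived() method
          of a LogObserver subclass:</div>
<pre>
```python
class HeadCollector:
    """Keeps only the first `limit` characters of the text fed to it."""

    def __init__(self, limit=65536):
        self.limit = limit   # maximum number of characters to keep
        self.chunks = []     # pieces accepted so far
        self.size = 0        # total length of the accepted pieces

    def outReceived(self, data):
        # Work out how much room is left under the limit.
        remaining = self.limit - self.size
        if remaining == 0:
            # Limit reached: drop everything else instead of buffering it.
            return
        piece = data[:remaining]
        self.chunks.append(piece)
        self.size += len(piece)

    def getHead(self):
        # Join once at the end rather than re-concatenating on every chunk.
        return "".join(self.chunks)


if __name__ == "__main__":
    collector = HeadCollector(limit=10)
    for chunk in ["hello ", "world!!!", "ignored"]:
        collector.outReceived(chunk)
    print(collector.getHead())
```
</pre>
        <div>Memory use stays bounded by the limit however large the
          log grows, since anything past the limit is dropped as it
          arrives.</div>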

        <div><br>

        </div>

        <div>Thanks for testing out nine, and letting us know your

          issues!</div>

        <div><br>

        </div>

        <div>Pierre</div>

      </div>

      <br>

      <div class="gmail_quote">

        <div dir="ltr">On Fri, Aug 19, 2016 at 16:41, Neil Gilmore &lt;<a
            moz-do-not-send="true" href="mailto:ngilmore@grammatech.com">ngilmore@grammatech.com</a>&gt;
          wrote:<br>

        </div>

        <blockquote class="gmail_quote" style="margin:0 0 0

          .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi everyone,<br>

          <br>

          We've been trying to move from primarily 0.8.6p1(?) to

          0.9.0rc1. We're<br>

          having some problems.<br>

          <br>

          Among them is a custom build step that no longer works. It

          looks<br>

          substantially like this:<br>

          <br>

          class MyCustomStep(ShellCommand):<br>

               name = "errorlog"<br>

               haltOnFailure = 1<br>

               description = ["checking for errors"]<br>

               descriptionDone = ["done checking for errors"]<br>

          <br>

               OFFprogressMetrics = ('output',)<br>

               # things to track: number of files compiled, number of

          directories<br>

               # traversed (assuming 'make' is being used)<br>

          <br>

          &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;def createSummary(self, cmd):<br>
          &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;self.addCompleteLog('stdio_head', cmd.getText()[0:65536])<br>

          <br>

          We use this step to create a shorter log that we can use in

          emails<br>

          without choking our systems.<br>

          <br>

          cmd.getText() never returns any text.<br>

          <br>

          cmd is a SyncLogFileWrapper, and I've tried writing nearly

          everything in<br>

          it to twistd.log via log.msg. It never seems to have any<br>

          finishDeferreds, or chunks, and always gets marked finished.<br>

          <br>

          I thought that maybe I needed to call waitUntilFinished().<br>
          Unfortunately, this function raises an exception in the code
          from the 0.9.0rc1 tarball,<br>
          and the error is in both the GitHub master and the 0.9.0
          branch<br>
          (finishDeferreds is misspelled as finishDefereds). Fixing it
          didn't help, as<br>
          there were no finishDeferreds out there to finish anyway.<br>

          <br>

          That, combined with the fact that the only place where the log

          argument<br>

          to createSummary() is used is in the integration tests, leads

          me to<br>

          think that this hasn't been tested too much, if at all.<br>

          <br>

          I've thought about using the data API, but I'd need a log id.<br>

          SyncLogFileWrapper doesn't seem to have one. I'm currently

          working on<br>

          getting it from the step, because I can at least get the step

          id, and I<br>

          hope I can work my way down from there.<br>

          <br>

          I think I'd rather not try using a LogObserver, as I don't

          think we want<br>

          to be accumulating large logs in memory.<br>

          <br>

          Has anyone out there successfully gotten text from the log
          argument of<br>
          createSummary() in 0.9.0rc1?<br>

          <br>

          Thank you for your time.<br>

          <br>

          Neil Gilmore<br>

          <a moz-do-not-send="true" href="http://grammatech.com"

            rel="noreferrer" target="_blank">grammatech.com</a><br>


        </blockquote>

      </div>

    </blockquote>

    <br>

  </body>

</html>