[Buildbot-devel] Slave not recognizing process end

"Martin v. Löwis" martin at v.loewis.de
Mon Jan 4 08:08:44 UTC 2010


> I'm also not extremely familiar with the code, but let's say that
> there's a timeout (which I think there is) which tries to kill the
> process.  To avoid the exception, the callback for handling process
> termination (with or without error) should cancel the timeout.

And indeed, it does.

> Thus, no
> attempt is made by the timeout-related code to kill the process.  If
> there are other bits of code which would try to kill the process,
> perhaps they can have their event sources aborted as well.

And they do.

However, there is *still* a race condition.

> The main reason this is better than just handling the exception is that
> it fits right in with the process tracking that BuildBot really should
> already be doing.  If it is going off and killing a process that exited
> already, then the obvious question to ask is why it didn't notice the
> process already exited, and what consequences this has for any related
> status view and other internal state tracking.

That is indeed the question, and there is apparently also a bug lurking
there: in my case, the process terminated well before buildbot tried to
kill it. Normally, buildbot would recognize the child's termination,
but in this specific case (under yet-to-be-determined circumstances),
it did not.

However, *independent* of this possible bug, there is a race condition
where killing the child may fail even if you follow the guidelines
you outline above.
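
To illustrate one way such a race can show up, here is a minimal sketch.
It is not buildbot's code: it assumes a POSIX system and uses plain
os.kill and subprocess rather than the Twisted process machinery, and
kill_if_running is a hypothetical helper name. The point is only that the
child can disappear between the decision to kill it and the kill itself,
so the kill has to tolerate "no such process" in addition to any
bookkeeping done beforehand.

```python
import os
import signal
import subprocess
import sys


def kill_if_running(pid, sig=signal.SIGTERM):
    """Hypothetical helper: signal pid, tolerating a child that is gone.

    Returns True if the signal was delivered, False if the process no
    longer existed when the kill was attempted.
    """
    try:
        os.kill(pid, sig)
    except ProcessLookupError:
        # The child exited and was reaped before the kill was issued.
        # A check made earlier cannot exclude this, because the child
        # may exit after the check and before the kill.
        return False
    return True


# A child that exits at once and has been reaped: the kill finds nothing.
done = subprocess.Popen([sys.executable, "-c", "pass"])
done.wait()
already_gone = kill_if_running(done.pid)

# A child that is still running: the kill is delivered.
busy = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
delivered = kill_if_running(busy.pid)
busy.wait()
```

Handling the error this way does not replace proper tracking of the
child's state; it only covers the window that the tracking cannot close.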

Regards,
Martin




