[Buildbot-commits] [Buildbot] #2454: SiGHUP doesn't always work

Sun Aug 11 03:23:33 UTC 2013

#2454: SiGHUP doesn't always work
----------------------------+----------------------
Reporter:  virgilg          |       Owner:
    Type:  support-request  |      Status:  new
Priority:  major            |   Milestone:  ongoing
 Version:  0.8.7p1          |  Resolution:
Keywords:                   |
----------------------------+----------------------

Comment (by dustin):

 Replying to [comment:12 virgilg]:
 > I can repro this as follows:
 >
 > 1) save the attached script as e.g. test.py[[BR]]
 > 2) comment and uncomment line 12: time.sleep(1)
 >
 > With time.sleep(1) uncommented, one cannot deliver a SIG<MUMBLE> to the
 > process, since the MainThread is too busy inside the while True: loop and
 > python is not really multithreaded.
 >
 > With time.sleep(1) commented out, handler() will get a chance to run.

 I can't reproduce this.  In either case - sleep or not - I see "Interrupt
 received" when I send it a SIGINT.  That holds both on Mountain Lion
 (using the system Python) and on Linux.

 Given the way signal handling works, that's not terribly surprising.  UNIX
 will deliver a signal to the main thread of a process immediately, either
 by pre-empting the process (if it's currently running), by scheduling it
 (if it's ready to run), or by returning early from a syscall (if it's
 currently in a syscall).  In any of those cases, Python sets an internal
 flag for the signal, sets the global is_tripped, adds a pending call to
 PyErr_CheckSignals, and if necessary uses a self-pipe to wake up the main
 thread.  Python checks for pending calls every few Python opcodes.  A
 tight Python loop (`while True: pass`) is still executing opcodes, so the
 pending call gets checked, and the Python-level handler runs.  You indicated
 that the loop *with* `time.sleep(1)` was not interruptible.  In that case,
 the signal is most likely delivered during the sleep, which is either a
 syscall that will be awakened, or a select() that includes the selfpipe,
 so the signal should *still* be delivered immediately, which is what I'm
 seeing.
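
 A rough sketch of that behavior, for anyone who wants to check it on their
 own machine.  This is not the script attached to the ticket: it assumes
 Python 3 on a UNIX-like system, and it uses SIGALRM from `signal.setitimer`
 so that the process can signal itself.  The names `handler`,
 `signal_during_tight_loop`, and `signal_during_sleep` are made up for the
 illustration.

```python
import signal
import time

received = []  # monotonic timestamps at which the handler ran


def handler(signum, frame):
    # The Python-level handler runs in the main thread, between bytecodes.
    received.append(time.monotonic())


def signal_during_tight_loop(delay=0.1, timeout=5.0):
    """Arm a timer, spin without sleeping, and report whether the handler ran."""
    received.clear()
    signal.setitimer(signal.ITIMER_REAL, delay)  # SIGALRM after `delay` seconds
    deadline = time.monotonic() + timeout
    while not received and time.monotonic() < deadline:
        pass  # busy loop: still executing bytecode, so pending signals get checked
    signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer if we timed out
    return bool(received)


def signal_during_sleep(delay=0.1, nap=0.5):
    """Arm a timer, sleep past it, and report whether the handler ran mid-sleep."""
    received.clear()
    start = time.monotonic()
    signal.setitimer(signal.ITIMER_REAL, delay)
    time.sleep(nap)  # the signal arrives while the main thread is sleeping
    signal.setitimer(signal.ITIMER_REAL, 0)
    # The handler should have run before the sleep was due to finish.
    return bool(received) and (received[0] - start) < nap


# Signal handlers can only be installed from the main thread.
signal.signal(signal.SIGALRM, handler)

if __name__ == "__main__":
    print("handler ran during tight loop:", signal_during_tight_loop())
    print("handler ran during sleep:", signal_during_sleep())
```

 If either function reports False on a particular machine, that points at
 the interpreter build or the platform rather than at the loop itself.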

 Since we're seeing different behavior from a very simple Python script, I
 think we should look toward the version and build of Python that we're
 using.  If we get those to match and we're *still* seeing different
 behavior, then we'll have to look more closely at the reproduction recipe,
 and try to find a suitable machine we both have access to.
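
 To compare interpreters, something like the following could be run on both
 machines and the output pasted into the ticket.  It is only a sketch using
 the standard `platform` and `sys` modules; the function name
 `python_build_info` is made up for the illustration.

```python
import platform
import sys


def python_build_info():
    """Collect the interpreter details worth comparing between two machines."""
    return {
        "implementation": platform.python_implementation(),
        "version": platform.python_version(),
        "build": platform.python_build(),        # (build number, build date)
        "compiler": platform.python_compiler(),
        "platform": platform.platform(),
        "executable": sys.executable,
    }


if __name__ == "__main__":
    for key, value in python_build_info().items():
        print("%s: %s" % (key, value))
```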

 > I see it process a ton of "events" that never finish:
 > events_company.com/11305023
 > Where do these events come from? What generates them?
 > events_company.com/state is not the droid I'm looking for, is it?

 This is a separate issue, which I think you filed a different bug on.
 Your events shouldn't be backing up into the disk storage like that.  But
 let's focus on this bug for now.

 If I accept for a moment your hypothesis that "tight" Python loops
 preclude UNIX signals, then I see how this would be related.  But I don't
 know what "tight" means - if I loop over a list with two elements, then
 for a moment the CPU is just as busy as it would be if I were looping over
 1,000,000 elements.  If a signal's delivered during that time, is it
 ignored?  What if it's delivered while a multiplication operation is
 taking place?  What defines "tight"?

 > The other 6 threads are all stuck in threading.py:

 Yep, that's the `Condition` class.  Those are all worker threads helpfully
 waiting for work.
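
 For illustration, here is a small sketch of why idle workers show up that
 way in a stack dump.  It assumes Python 3 and a plain `queue.Queue` feeding
 the workers, which is not necessarily how the thread pool in your master is
 built; `worker` and `idle_worker_frames` are hypothetical names.

```python
import os
import queue
import sys
import threading
import time
import traceback


def worker(tasks):
    while True:
        item = tasks.get()  # while idle, this blocks inside Condition.wait()
        if item is None:    # None is used here as a "shut down" sentinel
            break


def idle_worker_frames(count=6, settle=0.3):
    """Start idle workers and return (file, function) of each innermost frame."""
    tasks = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks,), daemon=True)
        for _ in range(count)
    ]
    for thread in threads:
        thread.start()
    time.sleep(settle)  # give every worker time to block on the empty queue

    frames = sys._current_frames()  # maps thread id -> current frame
    innermost = []
    for thread in threads:
        top = traceback.extract_stack(frames[thread.ident])[-1]
        innermost.append((os.path.basename(top.filename), top.name))

    for _ in threads:
        tasks.put(None)  # wake each worker so it can exit
    for thread in threads:
        thread.join()
    return innermost


if __name__ == "__main__":
    for filename, function in idle_worker_frames():
        print(filename, function)
```

 Each idle worker's innermost Python frame is in threading.py, which is
 the same picture as in your dump: nothing is stuck, there is just no work
 queued.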

 I think that we should talk through some medium other than Trac.  I'll try
 to send you an email at the address in Trac, but if that fails, please get
 in touch.

-- 
Ticket URL: <http://trac.buildbot.net/ticket/2454#comment:14>
Buildbot <http://buildbot.net/>
Buildbot: build/test automation