[users at bb.net] How to Check Worker Status?

Michał Łyszczek michal.lyszczek at bofc.pl
Thu Feb 1 19:39:14 UTC 2018

1 February 2018 19:19:43 CET Chris Spencer:
> I'm having a problem with workers randomly stopping. From the worker's
> logs, I'm seeing:
> 2018-01-26 01:22:33-0500 [-] sending app-level keepalive
> 2018-01-26 01:32:33-0500 [-] sending app-level keepalive
> 2018-01-26 01:42:33-0500 [-] sending app-level keepalive
> 2018-01-26 01:52:33-0500 [-] sending app-level keepalive
> 2018-01-26 02:00:00-0500 [-] Received SIGTERM, shutting down.
> 2018-01-26 02:00:00-0500 [HangCheckProtocol,client] Lost connection to
> 2018-01-26 02:00:00-0500 [-] Stopping factory
> <buildbot_worker.pb.BotFactory instance at 0x7f50af441950>
> 2018-01-26 02:00:00-0500 [-] Main loop terminated.
> 2018-01-26 02:00:00-0500 [-] Server Shut Down.

It doesn't look random to me. Look at the time: exactly 02:00:00. It might be a
coincidence, since we see only one log here. But look at this line:

> 2018-01-26 02:00:00-0500 [-] Received SIGTERM, shutting down.

Someone is sending SIGTERM to your application, at an exact time. Does this
SIGTERM always arrive at the same time? Maybe some cron job is killing user
processes?

Grep your worker log for "SIGTERM" and check the dates to see whether they
really are random or not.
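
If you prefer a script over eyeballing grep output, here is a minimal sketch
that counts SIGTERM shutdowns per time of day. It assumes the timestamp format
shown in your log above; the function name sigterm_times and the log path
"twistd.log" are my own placeholders, so adjust them to your setup.

```python
import re
from collections import Counter

# Matches lines like:
#   2018-01-26 02:00:00-0500 [-] Received SIGTERM, shutting down.
# (timestamp layout assumed from the log excerpt quoted in this thread)
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2})[+-]\d{4} .*Received SIGTERM"
)


def sigterm_times(lines):
    """Return a Counter mapping time of day -> number of SIGTERM shutdowns."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[match.group(2)] += 1
    return counts


def report(path="twistd.log"):  # hypothetical path, point it at your worker log
    """Print each time of day at which a SIGTERM was logged, most common first."""
    with open(path, errors="replace") as log:
        for time_of_day, count in sigterm_times(log).most_common():
            print(time_of_day, count)
```

If nearly every hit lands on the same time of day, something scheduled is the
likely sender.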

On Linux, in C, you can check which PID sent SIGTERM to your application. I 
don't think Python delivers siginfo_t to signal handlers, but I think it should 
be possible to read it with ctypes, maybe? Someone more proficient in Python 
should verify this. If that is true, you could add code to your signal handler 
to see who sent you the SIGTERM and then try to fix this.
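
As a rough sketch of the idea, and not something I have tried inside a worker:
as far as I know, Python 3.3 and later expose siginfo through
signal.sigwaitinfo() and signal.sigtimedwait() on Linux, so ctypes may not be
needed there. This does not go through a normal signal handler; the signal has
to be blocked first and then waited for explicitly. The helper names below
(block_sigterm, wait_for_sigterm, describe_sender) are made up for
illustration, and this will not work on Python 2.

```python
import os
import signal


def block_sigterm():
    """Block SIGTERM in this thread so it stays pending until we wait for it."""
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGTERM})


def wait_for_sigterm(timeout=None):
    """Wait for SIGTERM; return (sender_pid, sender_uid), or None on timeout."""
    if timeout is None:
        info = signal.sigwaitinfo({signal.SIGTERM})
    else:
        info = signal.sigtimedwait({signal.SIGTERM}, timeout)
        if info is None:
            return None
    return info.si_pid, info.si_uid


def describe_sender(pid):
    """Best-effort command line of the sender, read from /proc (Linux only)."""
    try:
        with open("/proc/%d/cmdline" % pid, "rb") as f:
            raw = f.read()
    except OSError:
        return "<unknown, sender may have already exited>"
    return raw.replace(b"\0", b" ").decode(errors="replace").strip()
```

A small test program could call block_sigterm() at startup, then
wait_for_sigterm() and log describe_sender(pid) before exiting. In a real
daemon the wait would probably have to live in its own thread, with SIGTERM
blocked in every thread, and I have not checked how that interacts with
Twisted's own signal handling.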

Best Regards
Michal Lyszczek