<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Hi Pierre,<br>
<br>
Yes, I'm describing multiple symptoms here. But the message queues
turned out to be the problem, even though we saw nothing about them in the logs.<br>
<br>
5. The integrity errors look like this (not a failing disk in this
case):<br>
2017-03-07T07:36:04-0500 [-] Got fatal Exception on DB<br>
Traceback (most recent call last):<br>
Failure: sqlalchemy.exc.IntegrityError: (IntegrityError)
update or delete on table "changes" violates foreign key constraint
"changes_parent_changeids_fkey" on table "changes"<br>
DETAIL: Key (changeid)=(5983) is still referenced from
table "changes".<br>
'DELETE FROM changes WHERE changes.changeid IN
(%(changeid_1)s, %(changeid_2)s, %(changeid_3)s, %(changeid_4)s,
%(changeid_5)s, %(changeid_6)s, %(changeid_7)s, %(changeid_8)s,
%(changeid_9)s, %(changeid_10)s, %(changeid_11)s, %(changeid_12)s,
%(changeid_13)s, %(changeid_14)s, %(changeid_15)s, %(changeid_16)s,
%(changeid_17)s, %(changeid_18)s, %(changeid_19)s, %(changeid_20)s,
%(changeid_21)s, %(changeid_22)s, %(changeid_23)s, %(changeid_24)s,
%(changeid_25)s, %(changeid_26)s, %(changeid_27)s, %(changeid_28)s,
%(changeid_29)s, %(changeid_30)s, %(changeid_31)s, %(changeid_32)s,
%(changeid_33)s, %(changeid_34)s, %(changeid_35)s, %(changeid_36)s,
%(changeid_37)s, %(changeid_38)s, %(changeid_39)s, %(changeid_40)s,
%(changeid_41)s, %(changeid_42)s, %(changeid_43)s, %(changeid_44)s,
%(changeid_45)s, %(changeid_46)s, %(changeid_47)s, %(changeid_48)s,
%(changeid_49)s, %(changeid_50)s, %(changeid_51)s, %(changeid_52)s,
%(changeid_53)s, %(changeid_54)s, %(changeid_55)s, %(changeid_56)s,
%(changeid_57)s, %(changeid_58)s, %(changeid_59)s, %(changeid_60)s,
%(changeid_61)s, %(changeid_62)s, %(changeid_63)s, %(changeid_64)s,
%(changeid_65)s, %(changeid_66)s, %(changeid_67)s, %(changeid_68)s,
%(changeid_69)s, %(changeid_70)s, %(changeid_71)s, %(changeid_72)s,
%(changeid_73)s, %(changeid_74)s, %(changeid_75)s, %(changeid_76)s,
%(changeid_77)s, %(changeid_78)s, %(changeid_79)s, %(changeid_80)s,
%(changeid_81)s, %(changeid_82)s, %(changeid_83)s, %(changeid_84)s,
%(changeid_85)s, %(changeid_86)s, %(changeid_87)s, %(changeid_88)s,
%(changeid_89)s, %(changeid_90)s, %(changeid_91)s, %(changeid_92)s,
%(changeid_93)s, %(changeid_94)s, %(changeid_95)s, %(changeid_96)s,
%(changeid_97)s, %(changeid_98)s, %(changeid_99)s,
%(changeid_100)s)' {'changeid_100': 5903, 'changeid_29': 5974,
'changeid_28': 5975, 'changeid_27': 5976, 'changeid_26': 5977,
'changeid_25': 5978, 'changeid_24': 5979, 'changeid_23': 5980,
'changeid_22': 5981, 'changeid_21': 5982, 'changeid_20': 5983,
'changeid_89': 5914, 'changeid_88': 5915, 'changeid_81': 5922,
'changeid_80': 5923, 'changeid_83': 5920, 'changeid_82': 5921,
'changeid_85': 5918, 'changeid_84': 5919, 'changeid_87': 5916,
'changeid_86': 5917, 'changeid_38': 5965, 'changeid_39': 5964,
'changeid_34': 5969, 'changeid_35': 5968, 'changeid_36': 5967,
'changeid_37': 5966, 'changeid_30': 5973, 'changeid_31': 5972,
'changeid_32': 5971, 'changeid_33': 5970, 'changeid_98': 5905,
'changeid_99': 5904, 'changeid_96': 5907, 'changeid_97': 5906,
'changeid_94': 5909, 'changeid_95': 5908, 'changeid_92': 5911,
'changeid_93': 5910, 'changeid_90': 5913, 'changeid_91': 5912,
'changeid_63': 5940, 'changeid_62': 5941, 'changeid_61': 5942,
'changeid_60': 5943, 'changeid_67': 5936, 'changeid_66': 5937,
'changeid_65': 5938, 'changeid_64': 5939, 'changeid_69': 5934,
'changeid_68': 5935, 'changeid_16': 5987, 'changeid_17': 5986,
'changeid_14': 5989, 'changeid_15': 5988, 'changeid_12': 5991,
'changeid_13': 5990, 'changeid_10': 5993, 'changeid_11': 5992,
'changeid_18': 5985, 'changeid_19': 5984, 'changeid_70': 5933,
'changeid_71': 5932, 'changeid_72': 5931, 'changeid_73': 5930,
'changeid_74': 5929, 'changeid_75': 5928, 'changeid_76': 5927,
'changeid_77': 5926, 'changeid_78': 5925, 'changeid_79': 5924,
'changeid_45': 5958, 'changeid_44': 5959, 'changeid_47': 5956,
'changeid_46': 5957, 'changeid_41': 5962, 'changeid_40': 5963,
'changeid_43': 5960, 'changeid_42': 5961, 'changeid_49': 5954,
'changeid_48': 5955, 'changeid_4': 5999, 'changeid_5': 5998,
'changeid_6': 5997, 'changeid_7': 5996, 'changeid_1': 6002,
'changeid_2': 6001, 'changeid_3': 6000, 'changeid_8': 5995,
'changeid_9': 5994, 'changeid_58': 5945, 'changeid_59': 5944,
'changeid_52': 5951, 'changeid_53': 5950, 'changeid_50': 5953,
'changeid_51': 5952, 'changeid_56': 5947, 'changeid_57': 5946,
'changeid_54': 5949, 'changeid_55': 5948}<br>
<br>
2017-03-07T07:36:04-0500 [-] while pruning changes<br>
Traceback (most recent call last):<br>
File "/usr/lib/python2.7/threading.py", line 551, in
__bootstrap_inner<br>
self.run()<br>
File "/usr/lib/python2.7/threading.py", line 504, in run<br>
self.__target(*self.__args, **self.__kwargs)<br>
File
"/usr/local/lib/python2.7/dist-packages/Twisted-16.3.0-py2.7-linux-x86_64.egg/twisted/_threads/_threadworker.py",
line 46, in work<br>
task()<br>
File
"/usr/local/lib/python2.7/dist-packages/Twisted-16.3.0-py2.7-linux-x86_64.egg/twisted/_threads/_team.py",
line 190, in doWork<br>
task()<br>
--- &lt;exception caught here&gt; ---<br>
File
"/usr/local/lib/python2.7/dist-packages/Twisted-16.3.0-py2.7-linux-x86_64.egg/twisted/python/threadpool.py",
line 246, in inContext<br>
result = inContext.theWork()<br>
File
"/usr/local/lib/python2.7/dist-packages/Twisted-16.3.0-py2.7-linux-x86_64.egg/twisted/python/threadpool.py",
line 262, in &lt;lambda&gt;<br>
inContext.theWork = lambda: context.call(ctx, func,
*args, **kw)<br>
File
"/usr/local/lib/python2.7/dist-packages/Twisted-16.3.0-py2.7-linux-x86_64.egg/twisted/python/context.py",
line 118, in callWithContext<br>
return self.currentContext().callWithContext(ctx, func,
*args, **kw)<br>
File
"/usr/local/lib/python2.7/dist-packages/Twisted-16.3.0-py2.7-linux-x86_64.egg/twisted/python/context.py",
line 81, in callWithContext<br>
return func(*args,**kw)<br>
File
"/usr/local/lib/python2.7/dist-packages/buildbot-0.9.3-py2.7.egg/buildbot/db/pool.py",
line 180, in __thd<br>
rv = callable(arg, *args, **kwargs)<br>
File
"/usr/local/lib/python2.7/dist-packages/buildbot-0.9.3-py2.7.egg/buildbot/db/changes.py",
line 338, in thd<br>
table.delete(table.c.changeid.in_(batch)))<br>
File
"build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py", line 662,
in execute<br>
<br>
File
"build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py", line 761,
in _execute_clauseelement<br>
<br>
File
"build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py", line 874,
in _execute_context<br>
<br>
File
"build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py", line 1024,
in _handle_dbapi_exception<br>
<br>
File
"build/bdist.linux-x86_64/egg/sqlalchemy/util/compat.py", line 195,
in raise_from_cause<br>
<br>
File
"build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py", line 867,
in _execute_context<br>
<br>
File
"build/bdist.linux-x86_64/egg/sqlalchemy/engine/default.py", line
324, in do_execute<br>
<br>
sqlalchemy.exc.IntegrityError: (IntegrityError) update or
delete on table "changes" violates foreign key constraint
"changes_parent_changeids_fkey" on table "changes"<br>
DETAIL: Key (changeid)=(5983) is still referenced from
table "changes".<br>
'DELETE FROM changes WHERE changes.changeid IN
(%(changeid_1)s, %(changeid_2)s, %(changeid_3)s, %(changeid_4)s,
%(changeid_5)s, %(changeid_6)s, %(changeid_7)s, %(changeid_8)s,
%(changeid_9)s, %(changeid_10)s, %(changeid_11)s, %(changeid_12)s,
%(changeid_13)s, %(changeid_14)s, %(changeid_15)s, %(changeid_16)s,
%(changeid_17)s, %(changeid_18)s, %(changeid_19)s, %(changeid_20)s,
%(changeid_21)s, %(changeid_22)s, %(changeid_23)s, %(changeid_24)s,
%(changeid_25)s, %(changeid_26)s, %(changeid_27)s, %(changeid_28)s,
%(changeid_29)s, %(changeid_30)s, %(changeid_31)s, %(changeid_32)s,
%(changeid_33)s, %(changeid_34)s, %(changeid_35)s, %(changeid_36)s,
%(changeid_37)s, %(changeid_38)s, %(changeid_39)s, %(changeid_40)s,
%(changeid_41)s, %(changeid_42)s, %(changeid_43)s, %(changeid_44)s,
%(changeid_45)s, %(changeid_46)s, %(changeid_47)s, %(changeid_48)s,
%(changeid_49)s, %(changeid_50)s, %(changeid_51)s, %(changeid_52)s,
%(changeid_53)s, %(changeid_54)s, %(changeid_55)s, %(changeid_56)s,
%(changeid_57)s, %(changeid_58)s, %(changeid_59)s, %(changeid_60)s,
%(changeid_61)s, %(changeid_62)s, %(changeid_63)s, %(changeid_64)s,
%(changeid_65)s, %(changeid_66)s, %(changeid_67)s, %(changeid_68)s,
%(changeid_69)s, %(changeid_70)s, %(changeid_71)s, %(changeid_72)s,
%(changeid_73)s, %(changeid_74)s, %(changeid_75)s, %(changeid_76)s,
%(changeid_77)s, %(changeid_78)s, %(changeid_79)s, %(changeid_80)s,
%(changeid_81)s, %(changeid_82)s, %(changeid_83)s, %(changeid_84)s,
%(changeid_85)s, %(changeid_86)s, %(changeid_87)s, %(changeid_88)s,
%(changeid_89)s, %(changeid_90)s, %(changeid_91)s, %(changeid_92)s,
%(changeid_93)s, %(changeid_94)s, %(changeid_95)s, %(changeid_96)s,
%(changeid_97)s, %(changeid_98)s, %(changeid_99)s,
%(changeid_100)s)' {'changeid_100': 5903, 'changeid_29': 5974,
'changeid_28': 5975, 'changeid_27': 5976, 'changeid_26': 5977,
'changeid_25': 5978, 'changeid_24': 5979, 'changeid_23': 5980,
'changeid_22': 5981, 'changeid_21': 5982, 'changeid_20': 5983,
'changeid_89': 5914, 'changeid_88': 5915, 'changeid_81': 5922,
'changeid_80': 5923, 'changeid_83': 5920, 'changeid_82': 5921,
'changeid_85': 5918, 'changeid_84': 5919, 'changeid_87': 5916,
'changeid_86': 5917, 'changeid_38': 5965, 'changeid_39': 5964,
'changeid_34': 5969, 'changeid_35': 5968, 'changeid_36': 5967,
'changeid_37': 5966, 'changeid_30': 5973, 'changeid_31': 5972,
'changeid_32': 5971, 'changeid_33': 5970, 'changeid_98': 5905,
'changeid_99': 5904, 'changeid_96': 5907, 'changeid_97': 5906,
'changeid_94': 5909, 'changeid_95': 5908, 'changeid_92': 5911,
'changeid_93': 5910, 'changeid_90': 5913, 'changeid_91': 5912,
'changeid_63': 5940, 'changeid_62': 5941, 'changeid_61': 5942,
'changeid_60': 5943, 'changeid_67': 5936, 'changeid_66': 5937,
'changeid_65': 5938, 'changeid_64': 5939, 'changeid_69': 5934,
'changeid_68': 5935, 'changeid_16': 5987, 'changeid_17': 5986,
'changeid_14': 5989, 'changeid_15': 5988, 'changeid_12': 5991,
'changeid_13': 5990, 'changeid_10': 5993, 'changeid_11': 5992,
'changeid_18': 5985, 'changeid_19': 5984, 'changeid_70': 5933,
'changeid_71': 5932, 'changeid_72': 5931, 'changeid_73': 5930,
'changeid_74': 5929, 'changeid_75': 5928, 'changeid_76': 5927,
'changeid_77': 5926, 'changeid_78': 5925, 'changeid_79': 5924,
'changeid_45': 5958, 'changeid_44': 5959, 'changeid_47': 5956,
'changeid_46': 5957, 'changeid_41': 5962, 'changeid_40': 5963,
'changeid_43': 5960, 'changeid_42': 5961, 'changeid_49': 5954,
'changeid_48': 5955, 'changeid_4': 5999, 'changeid_5': 5998,
'changeid_6': 5997, 'changeid_7': 5996, 'changeid_1': 6002,
'changeid_2': 6001, 'changeid_3': 6000, 'changeid_8': 5995,
'changeid_9': 5994, 'changeid_58': 5945, 'changeid_59': 5944,
'changeid_52': 5951, 'changeid_53': 5950, 'changeid_50': 5953,
'changeid_51': 5952, 'changeid_56': 5947, 'changeid_57': 5946,
'changeid_54': 5949, 'changeid_55': 5948}<br>
<br>
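For anyone who finds this thread later, here is a minimal sketch of what I believe the error amounts to: a row in the pruning batch is still named as a parent by a row outside the batch. It uses an in-memory SQLite database rather than our PostgreSQL one, the column name parent_changeids is inferred from the constraint name in the log, change 6003 is made up, and prune and prune_detaching are hypothetical helpers, not Buildbot code.<br>
<br>
<pre>
```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE changes ("
    " changeid INTEGER PRIMARY KEY,"
    " parent_changeids INTEGER REFERENCES changes (changeid))"
)
# Assumption: change 6003 (made up) is newer than the pruning batch
# and still names change 5983 as its parent.
conn.execute("INSERT INTO changes VALUES (5983, NULL)")
conn.execute("INSERT INTO changes VALUES (6003, 5983)")
conn.commit()


def prune(conn, batch):
    """Hypothetical helper: delete a batch of changes in one statement."""
    marks = ", ".join("?" for _ in batch)
    conn.execute("DELETE FROM changes WHERE changeid IN (%s)" % marks, batch)


def prune_detaching(conn, batch):
    """Hypothetical workaround: clear parent references first, then delete."""
    marks = ", ".join("?" for _ in batch)
    conn.execute(
        "UPDATE changes SET parent_changeids = NULL"
        " WHERE parent_changeids IN (%s)" % marks,
        batch,
    )
    conn.execute("DELETE FROM changes WHERE changeid IN (%s)" % marks, batch)


try:
    prune(conn, [5983])
    outcome = "deleted"
except sqlite3.IntegrityError:
    # Same class of failure as in the log: the parent row is still referenced.
    conn.rollback()
    outcome = "IntegrityError"

prune_detaching(conn, [5983])
conn.commit()
remaining = [row[0] for row in conn.execute("SELECT changeid FROM changes")]
print(outcome, remaining)
```
</pre>
Clearing the parent reference before deleting is only one possible way around the constraint, and it does lose the parent link on the surviving change.<br>
<br>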
Thanks!<br>
<br>
Neil Gilmore<br>
grammatech.com<br>
<br>
<div class="moz-cite-prefix">On 3/7/2017 3:49 AM, Pierre Tardy
wrote:<br>
</div>
<blockquote
cite="mid:CAJ+soVcTcT09M3S5ofTXR9oYXEz0Fi=2PMq1sdp6SuuDhfvupQ@mail.gmail.com"
type="cite">
<div dir="ltr">Hi Neil,
<div>I am not sure exactly how I can help on this as you are
describing lots of symptoms.</div>
<div><br>
</div>
<div>What comes to mind right now is a problem with the
message queue. In the multimaster tests I am doing, I found
that a disconnection of the message queue is not currently
recovered from, which could explain why builds do not start (a
master will not check for new requests unless it receives a
message).</div>
<div><br>
</div>
<div>However, when the mq fails, I can see evidence of it in the
logs, but you don't mention any issue in the logs.</div>
<div><br>
</div>
<div>The database integrity errors also look bad. What kind of
errors are they? We have already had some reports of those that
were due to a failing disk. Could that be the case?</div>
<div><br>
</div>
<div>Regards</div>
<div>Pierre</div>
<div><br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr">On Mon, Mar 6, 2017 at 10:36 PM Neil Gilmore &lt;<a
moz-do-not-send="true" href="mailto:ngilmore@grammatech.com">ngilmore@grammatech.com</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">Hi everyone,<br
class="gmail_msg">
<br class="gmail_msg">
Well, things ran OK for a couple weeks. But we had some
problems<br class="gmail_msg">
starting last weekend. At least some failure emails don't seem
to be<br class="gmail_msg">
getting sent out. And a problem we'd been having a bit of got
a lot worse.<br class="gmail_msg">
<br class="gmail_msg">
For whatever reason, queued builds don't seem to want to
start.<br class="gmail_msg">
Sometimes for hours. Even forced builds. This doesn't seem to
be a<br class="gmail_msg">
locking problem, though I'll be having a look at that side in
a bit. But<br class="gmail_msg">
we'll have builds sitting for hours before they start. If they
start.<br class="gmail_msg">
Some of our people get antsy and cancel the current queue then
force a<br class="gmail_msg">
build. But sometimes those wait, too.<br class="gmail_msg">
<br class="gmail_msg">
And we're having trouble getting the masters to deal with new
revisions<br class="gmail_msg">
from svn. Everything else looks OK (postcommit hooks, etc.). I'm
just not<br class="gmail_msg">
sure what's going on.<br class="gmail_msg">
<br class="gmail_msg">
Reconfig hasn't helped, nor has restarting one of the masters.<br
class="gmail_msg">
<br class="gmail_msg">
We are getting integrity errors in our database, too.<br
class="gmail_msg">
<br class="gmail_msg">
Except for the database problem, the rest looks like network
connection<br class="gmail_msg">
stuff, perhaps, though we haven't had any problems there for a
while.<br class="gmail_msg">
<br class="gmail_msg">
Neil Gilmore<br class="gmail_msg">
<a moz-do-not-send="true" href="http://grammatech.com"
rel="noreferrer" class="gmail_msg" target="_blank">grammatech.com</a><br
class="gmail_msg">
</blockquote>
</div>
</blockquote>
<br>
</body>
</html>