[users at bb.net] More 0.9.0rc2 multi-master anecdotes.
Neil Gilmore
ngilmore at grammatech.com
Tue Mar 7 16:59:21 UTC 2017
Hi Pierre,
Yes, I'm describing multiple symptoms here. But the message queues were
the problem, despite not seeing anything in the logs.
5. The integrity errors look like this (not a failing disk in this case.):
2017-03-07T07:36:04-0500 [-] Got fatal Exception on DB
Traceback (most recent call last):
Failure: sqlalchemy.exc.IntegrityError: (IntegrityError) update
or delete on table "changes" violates foreign key constraint
"changes_parent_changeids_fkey" on table "changes"
DETAIL: Key (changeid)=(5983) is still referenced from table
"changes".
'DELETE FROM changes WHERE changes.changeid IN
(%(changeid_1)s, %(changeid_2)s, %(changeid_3)s, %(changeid_4)s,
%(changeid_5)s, %(changeid_6)s, %(changeid_7)s, %(changeid_8)s,
%(changeid_9)s, %(changeid_10)s, %(changeid_11)s, %(changeid_12)s,
%(changeid_13)s, %(changeid_14)s, %(changeid_15)s, %(changeid_16)s,
%(changeid_17)s, %(changeid_18)s, %(changeid_19)s, %(changeid_20)s,
%(changeid_21)s, %(changeid_22)s, %(changeid_23)s, %(changeid_24)s,
%(changeid_25)s, %(changeid_26)s, %(changeid_27)s, %(changeid_28)s,
%(changeid_29)s, %(changeid_30)s, %(changeid_31)s, %(changeid_32)s,
%(changeid_33)s, %(changeid_34)s, %(changeid_35)s, %(changeid_36)s,
%(changeid_37)s, %(changeid_38)s, %(changeid_39)s, %(changeid_40)s,
%(changeid_41)s, %(changeid_42)s, %(changeid_43)s, %(changeid_44)s,
%(changeid_45)s, %(changeid_46)s, %(changeid_47)s, %(changeid_48)s,
%(changeid_49)s, %(changeid_50)s, %(changeid_51)s, %(changeid_52)s,
%(changeid_53)s, %(changeid_54)s, %(changeid_55)s, %(changeid_56)s,
%(changeid_57)s, %(changeid_58)s, %(changeid_59)s, %(changeid_60)s,
%(changeid_61)s, %(changeid_62)s, %(changeid_63)s, %(changeid_64)s,
%(changeid_65)s, %(changeid_66)s, %(changeid_67)s, %(changeid_68)s,
%(changeid_69)s, %(changeid_70)s, %(changeid_71)s, %(changeid_72)s,
%(changeid_73)s, %(changeid_74)s, %(changeid_75)s, %(changeid_76)s,
%(changeid_77)s, %(changeid_78)s, %(changeid_79)s, %(changeid_80)s,
%(changeid_81)s, %(changeid_82)s, %(changeid_83)s, %(changeid_84)s,
%(changeid_85)s, %(changeid_86)s, %(changeid_87)s, %(changeid_88)s,
%(changeid_89)s, %(changeid_90)s, %(changeid_91)s, %(changeid_92)s,
%(changeid_93)s, %(changeid_94)s, %(changeid_95)s, %(changeid_96)s,
%(changeid_97)s, %(changeid_98)s, %(changeid_99)s, %(changeid_100)s)'
{'changeid_100': 5903, 'changeid_29': 5974, 'changeid_28': 5975,
'changeid_27': 5976, 'changeid_26': 5977, 'changeid_25': 5978,
'changeid_24': 5979, 'changeid_23': 5980, 'changeid_22': 5981,
'changeid_21': 5982, 'changeid_20': 5983, 'changeid_89': 5914,
'changeid_88': 5915, 'changeid_81': 5922, 'changeid_80': 5923,
'changeid_83': 5920, 'changeid_82': 5921, 'changeid_85': 5918,
'changeid_84': 5919, 'changeid_87': 5916, 'changeid_86': 5917,
'changeid_38': 5965, 'changeid_39': 5964, 'changeid_34': 5969,
'changeid_35': 5968, 'changeid_36': 5967, 'changeid_37': 5966,
'changeid_30': 5973, 'changeid_31': 5972, 'changeid_32': 5971,
'changeid_33': 5970, 'changeid_98': 5905, 'changeid_99': 5904,
'changeid_96': 5907, 'changeid_97': 5906, 'changeid_94': 5909,
'changeid_95': 5908, 'changeid_92': 5911, 'changeid_93': 5910,
'changeid_90': 5913, 'changeid_91': 5912, 'changeid_63': 5940,
'changeid_62': 5941, 'changeid_61': 5942, 'changeid_60': 5943,
'changeid_67': 5936, 'changeid_66': 5937, 'changeid_65': 5938,
'changeid_64': 5939, 'changeid_69': 5934, 'changeid_68': 5935,
'changeid_16': 5987, 'changeid_17': 5986, 'changeid_14': 5989,
'changeid_15': 5988, 'changeid_12': 5991, 'changeid_13': 5990,
'changeid_10': 5993, 'changeid_11': 5992, 'changeid_18': 5985,
'changeid_19': 5984, 'changeid_70': 5933, 'changeid_71': 5932,
'changeid_72': 5931, 'changeid_73': 5930, 'changeid_74': 5929,
'changeid_75': 5928, 'changeid_76': 5927, 'changeid_77': 5926,
'changeid_78': 5925, 'changeid_79': 5924, 'changeid_45': 5958,
'changeid_44': 5959, 'changeid_47': 5956, 'changeid_46': 5957,
'changeid_41': 5962, 'changeid_40': 5963, 'changeid_43': 5960,
'changeid_42': 5961, 'changeid_49': 5954, 'changeid_48': 5955,
'changeid_4': 5999, 'changeid_5': 5998, 'changeid_6': 5997,
'changeid_7': 5996, 'changeid_1': 6002, 'changeid_2': 6001,
'changeid_3': 6000, 'changeid_8': 5995, 'changeid_9': 5994,
'changeid_58': 5945, 'changeid_59': 5944, 'changeid_52': 5951,
'changeid_53': 5950, 'changeid_50': 5953, 'changeid_51': 5952,
'changeid_56': 5947, 'changeid_57': 5946, 'changeid_54': 5949,
'changeid_55': 5948}
2017-03-07T07:36:04-0500 [-] while pruning changes
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 551, in
__bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 504, in run
self.__target(*self.__args, **self.__kwargs)
File
"/usr/local/lib/python2.7/dist-packages/Twisted-16.3.0-py2.7-linux-x86_64.egg/twisted/_threads/_threadworker.py",
line 46, in work
task()
File
"/usr/local/lib/python2.7/dist-packages/Twisted-16.3.0-py2.7-linux-x86_64.egg/twisted/_threads/_team.py",
line 190, in doWork
task()
--- <exception caught here> ---
File
"/usr/local/lib/python2.7/dist-packages/Twisted-16.3.0-py2.7-linux-x86_64.egg/twisted/python/threadpool.py",
line 246, in inContext
result = inContext.theWork()
File
"/usr/local/lib/python2.7/dist-packages/Twisted-16.3.0-py2.7-linux-x86_64.egg/twisted/python/threadpool.py",
line 262, in <lambda>
inContext.theWork = lambda: context.call(ctx, func, *args,
**kw)
File
"/usr/local/lib/python2.7/dist-packages/Twisted-16.3.0-py2.7-linux-x86_64.egg/twisted/python/context.py",
line 118, in callWithContext
return self.currentContext().callWithContext(ctx, func,
*args, **kw)
File
"/usr/local/lib/python2.7/dist-packages/Twisted-16.3.0-py2.7-linux-x86_64.egg/twisted/python/context.py",
line 81, in callWithContext
return func(*args,**kw)
File
"/usr/local/lib/python2.7/dist-packages/buildbot-0.9.3-py2.7.egg/buildbot/db/pool.py",
line 180, in __thd
rv = callable(arg, *args, **kwargs)
File
"/usr/local/lib/python2.7/dist-packages/buildbot-0.9.3-py2.7.egg/buildbot/db/changes.py",
line 338, in thd
table.delete(table.c.changeid.in_(batch)))
File
"build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py", line 662, in
execute
File
"build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py", line 761, in
_execute_clauseelement
File
"build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py", line 874, in
_execute_context
File
"build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py", line 1024, in
_handle_dbapi_exception
File
"build/bdist.linux-x86_64/egg/sqlalchemy/util/compat.py", line 195, in
raise_from_cause
File
"build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py", line 867, in
_execute_context
File
"build/bdist.linux-x86_64/egg/sqlalchemy/engine/default.py", line 324,
in do_execute
sqlalchemy.exc.IntegrityError: (IntegrityError) update or
delete on table "changes" violates foreign key constraint
"changes_parent_changeids_fkey" on table "changes"
DETAIL: Key (changeid)=(5983) is still referenced from table
"changes".
'DELETE FROM changes WHERE changes.changeid IN
(%(changeid_1)s, %(changeid_2)s, %(changeid_3)s, %(changeid_4)s,
%(changeid_5)s, %(changeid_6)s, %(changeid_7)s, %(changeid_8)s,
%(changeid_9)s, %(changeid_10)s, %(changeid_11)s, %(changeid_12)s,
%(changeid_13)s, %(changeid_14)s, %(changeid_15)s, %(changeid_16)s,
%(changeid_17)s, %(changeid_18)s, %(changeid_19)s, %(changeid_20)s,
%(changeid_21)s, %(changeid_22)s, %(changeid_23)s, %(changeid_24)s,
%(changeid_25)s, %(changeid_26)s, %(changeid_27)s, %(changeid_28)s,
%(changeid_29)s, %(changeid_30)s, %(changeid_31)s, %(changeid_32)s,
%(changeid_33)s, %(changeid_34)s, %(changeid_35)s, %(changeid_36)s,
%(changeid_37)s, %(changeid_38)s, %(changeid_39)s, %(changeid_40)s,
%(changeid_41)s, %(changeid_42)s, %(changeid_43)s, %(changeid_44)s,
%(changeid_45)s, %(changeid_46)s, %(changeid_47)s, %(changeid_48)s,
%(changeid_49)s, %(changeid_50)s, %(changeid_51)s, %(changeid_52)s,
%(changeid_53)s, %(changeid_54)s, %(changeid_55)s, %(changeid_56)s,
%(changeid_57)s, %(changeid_58)s, %(changeid_59)s, %(changeid_60)s,
%(changeid_61)s, %(changeid_62)s, %(changeid_63)s, %(changeid_64)s,
%(changeid_65)s, %(changeid_66)s, %(changeid_67)s, %(changeid_68)s,
%(changeid_69)s, %(changeid_70)s, %(changeid_71)s, %(changeid_72)s,
%(changeid_73)s, %(changeid_74)s, %(changeid_75)s, %(changeid_76)s,
%(changeid_77)s, %(changeid_78)s, %(changeid_79)s, %(changeid_80)s,
%(changeid_81)s, %(changeid_82)s, %(changeid_83)s, %(changeid_84)s,
%(changeid_85)s, %(changeid_86)s, %(changeid_87)s, %(changeid_88)s,
%(changeid_89)s, %(changeid_90)s, %(changeid_91)s, %(changeid_92)s,
%(changeid_93)s, %(changeid_94)s, %(changeid_95)s, %(changeid_96)s,
%(changeid_97)s, %(changeid_98)s, %(changeid_99)s, %(changeid_100)s)'
{'changeid_100': 5903, 'changeid_29': 5974, 'changeid_28': 5975,
'changeid_27': 5976, 'changeid_26': 5977, 'changeid_25': 5978,
'changeid_24': 5979, 'changeid_23': 5980, 'changeid_22': 5981,
'changeid_21': 5982, 'changeid_20': 5983, 'changeid_89': 5914,
'changeid_88': 5915, 'changeid_81': 5922, 'changeid_80': 5923,
'changeid_83': 5920, 'changeid_82': 5921, 'changeid_85': 5918,
'changeid_84': 5919, 'changeid_87': 5916, 'changeid_86': 5917,
'changeid_38': 5965, 'changeid_39': 5964, 'changeid_34': 5969,
'changeid_35': 5968, 'changeid_36': 5967, 'changeid_37': 5966,
'changeid_30': 5973, 'changeid_31': 5972, 'changeid_32': 5971,
'changeid_33': 5970, 'changeid_98': 5905, 'changeid_99': 5904,
'changeid_96': 5907, 'changeid_97': 5906, 'changeid_94': 5909,
'changeid_95': 5908, 'changeid_92': 5911, 'changeid_93': 5910,
'changeid_90': 5913, 'changeid_91': 5912, 'changeid_63': 5940,
'changeid_62': 5941, 'changeid_61': 5942, 'changeid_60': 5943,
'changeid_67': 5936, 'changeid_66': 5937, 'changeid_65': 5938,
'changeid_64': 5939, 'changeid_69': 5934, 'changeid_68': 5935,
'changeid_16': 5987, 'changeid_17': 5986, 'changeid_14': 5989,
'changeid_15': 5988, 'changeid_12': 5991, 'changeid_13': 5990,
'changeid_10': 5993, 'changeid_11': 5992, 'changeid_18': 5985,
'changeid_19': 5984, 'changeid_70': 5933, 'changeid_71': 5932,
'changeid_72': 5931, 'changeid_73': 5930, 'changeid_74': 5929,
'changeid_75': 5928, 'changeid_76': 5927, 'changeid_77': 5926,
'changeid_78': 5925, 'changeid_79': 5924, 'changeid_45': 5958,
'changeid_44': 5959, 'changeid_47': 5956, 'changeid_46': 5957,
'changeid_41': 5962, 'changeid_40': 5963, 'changeid_43': 5960,
'changeid_42': 5961, 'changeid_49': 5954, 'changeid_48': 5955,
'changeid_4': 5999, 'changeid_5': 5998, 'changeid_6': 5997,
'changeid_7': 5996, 'changeid_1': 6002, 'changeid_2': 6001,
'changeid_3': 6000, 'changeid_8': 5995, 'changeid_9': 5994,
'changeid_58': 5945, 'changeid_59': 5944, 'changeid_52': 5951,
'changeid_53': 5950, 'changeid_50': 5953, 'changeid_51': 5952,
'changeid_56': 5947, 'changeid_57': 5946, 'changeid_54': 5949,
'changeid_55': 5948}
Thanks!
Neil Gilmore
grammatech.com
On 3/7/2017 3:49 AM, Pierre Tardy wrote:
> Hi Neil,
> I am not sure exactly how I can help on this as you are describing
> lots of symptoms.
>
> What goes to my mind right now is a problem with the message queue. In
> the multimaster tests I am doing, I figured out that a disconnection
> of the message queue is not recovered right now, which could explain
> why build do not start (the master will not check for new requests
> unless they receive a message)
>
> However, when the mq fails, I can see evidence of it in the logs, but
> you don't mention any issue in the logs.
>
> Database integrity errors looks bad also, what kind of errors is that?
> We already had some reports of those which were due to a failing disk.
> Could that be the case?
>
> Regards
> Pierre
>
>
> On Mon, Mar 6, 2017 at 10:36 PM Neil Gilmore <ngilmore at grammatech.com
> <mailto:ngilmore at grammatech.com>> wrote:
>
> Hi everyone,
>
> Well, things ran OK for a couple weeks. But we had some problems
> starting last weekend. At least some failure emails don't seem to be
> getting sent out. And a problem we'd been having a bit of got a
> lot worse.
>
> For whatever reason, queued builds don't seem to want to start.
> Sometimes for hours. Even forced builds. This doesn't seem to be a
> locking problem, though I'll be having a look at that side in a
> bit. But
> we'll have builds sitting for hours before they start. If they start.
> Some of our people get antsy and cancel the current queue then force a
> build. But sometimes those wait, too.
>
> And we're having trouble getting the masters to deal with new
> revisions
> fro svn. Everything else looks OK (postcommit hooks, etc.) I'm
> just not
> sure what's going on.
>
> Reconfig hasn't helped, nor has restarting one of the masters.
>
> We are getting integrity errors in our database, too.
>
> Except for the database problem, the rest looks like network
> connection
> stuff, perhaps, though we haven't had any problems there for a while.
>
> Neil Gilmore
> grammatech.com <http://grammatech.com>
> _______________________________________________
> users mailing list
> users at buildbot.net <mailto:users at buildbot.net>
> https://lists.buildbot.net/mailman/listinfo/users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.buildbot.net/pipermail/users/attachments/20170307/0f87eda0/attachment.html>
More information about the users
mailing list