[users at bb.net] More 0.9.0rc2 multi-master anecdotes.

Neil Gilmore ngilmore at grammatech.com
Tue Mar 7 16:59:21 UTC 2017


Hi Pierre,

Yes, I'm describing multiple symptoms here. But the message queues were 
the problem, despite not seeing anything in the logs.

5. The integrity errors look like this (not a failing disk in this case.):
2017-03-07T07:36:04-0500 [-] Got fatal Exception on DB
         Traceback (most recent call last):
         Failure: sqlalchemy.exc.IntegrityError: (IntegrityError) update 
or delete on table "changes" violates foreign key constraint 
"changes_parent_changeids_fkey" on table "changes"
         DETAIL:  Key (changeid)=(5983) is still referenced from table 
"changes".
          'DELETE FROM changes WHERE changes.changeid IN 
(%(changeid_1)s, %(changeid_2)s, %(changeid_3)s, %(changeid_4)s, 
%(changeid_5)s, %(changeid_6)s, %(changeid_7)s, %(changeid_8)s, 
%(changeid_9)s, %(changeid_10)s, %(changeid_11)s, %(changeid_12)s, 
%(changeid_13)s, %(changeid_14)s, %(changeid_15)s, %(changeid_16)s, 
%(changeid_17)s, %(changeid_18)s, %(changeid_19)s, %(changeid_20)s, 
%(changeid_21)s, %(changeid_22)s, %(changeid_23)s, %(changeid_24)s, 
%(changeid_25)s, %(changeid_26)s, %(changeid_27)s, %(changeid_28)s, 
%(changeid_29)s, %(changeid_30)s, %(changeid_31)s, %(changeid_32)s, 
%(changeid_33)s, %(changeid_34)s, %(changeid_35)s, %(changeid_36)s, 
%(changeid_37)s, %(changeid_38)s, %(changeid_39)s, %(changeid_40)s, 
%(changeid_41)s, %(changeid_42)s, %(changeid_43)s, %(changeid_44)s, 
%(changeid_45)s, %(changeid_46)s, %(changeid_47)s, %(changeid_48)s, 
%(changeid_49)s, %(changeid_50)s, %(changeid_51)s, %(changeid_52)s, 
%(changeid_53)s, %(changeid_54)s, %(changeid_55)s, %(changeid_56)s, 
%(changeid_57)s, %(changeid_58)s, %(changeid_59)s, %(changeid_60)s, 
%(changeid_61)s, %(changeid_62)s, %(changeid_63)s, %(changeid_64)s, 
%(changeid_65)s, %(changeid_66)s, %(changeid_67)s, %(changeid_68)s, 
%(changeid_69)s, %(changeid_70)s, %(changeid_71)s, %(changeid_72)s, 
%(changeid_73)s, %(changeid_74)s, %(changeid_75)s, %(changeid_76)s, 
%(changeid_77)s, %(changeid_78)s, %(changeid_79)s, %(changeid_80)s, 
%(changeid_81)s, %(changeid_82)s, %(changeid_83)s, %(changeid_84)s, 
%(changeid_85)s, %(changeid_86)s, %(changeid_87)s, %(changeid_88)s, 
%(changeid_89)s, %(changeid_90)s, %(changeid_91)s, %(changeid_92)s, 
%(changeid_93)s, %(changeid_94)s, %(changeid_95)s, %(changeid_96)s, 
%(changeid_97)s, %(changeid_98)s, %(changeid_99)s, %(changeid_100)s)' 
{'changeid_100': 5903, 'changeid_29': 5974, 'changeid_28': 5975, 
'changeid_27': 5976, 'changeid_26': 5977, 'changeid_25': 5978, 
'changeid_24': 5979, 'changeid_23': 5980, 'changeid_22': 5981, 
'changeid_21': 5982, 'changeid_20': 5983, 'changeid_89': 5914, 
'changeid_88': 5915, 'changeid_81': 5922, 'changeid_80': 5923, 
'changeid_83': 5920, 'changeid_82': 5921, 'changeid_85': 5918, 
'changeid_84': 5919, 'changeid_87': 5916, 'changeid_86': 5917, 
'changeid_38': 5965, 'changeid_39': 5964, 'changeid_34': 5969, 
'changeid_35': 5968, 'changeid_36': 5967, 'changeid_37': 5966, 
'changeid_30': 5973, 'changeid_31': 5972, 'changeid_32': 5971, 
'changeid_33': 5970, 'changeid_98': 5905, 'changeid_99': 5904, 
'changeid_96': 5907, 'changeid_97': 5906, 'changeid_94': 5909, 
'changeid_95': 5908, 'changeid_92': 5911, 'changeid_93': 5910, 
'changeid_90': 5913, 'changeid_91': 5912, 'changeid_63': 5940, 
'changeid_62': 5941, 'changeid_61': 5942, 'changeid_60': 5943, 
'changeid_67': 5936, 'changeid_66': 5937, 'changeid_65': 5938, 
'changeid_64': 5939, 'changeid_69': 5934, 'changeid_68': 5935, 
'changeid_16': 5987, 'changeid_17': 5986, 'changeid_14': 5989, 
'changeid_15': 5988, 'changeid_12': 5991, 'changeid_13': 5990, 
'changeid_10': 5993, 'changeid_11': 5992, 'changeid_18': 5985, 
'changeid_19': 5984, 'changeid_70': 5933, 'changeid_71': 5932, 
'changeid_72': 5931, 'changeid_73': 5930, 'changeid_74': 5929, 
'changeid_75': 5928, 'changeid_76': 5927, 'changeid_77': 5926, 
'changeid_78': 5925, 'changeid_79': 5924, 'changeid_45': 5958, 
'changeid_44': 5959, 'changeid_47': 5956, 'changeid_46': 5957, 
'changeid_41': 5962, 'changeid_40': 5963, 'changeid_43': 5960, 
'changeid_42': 5961, 'changeid_49': 5954, 'changeid_48': 5955, 
'changeid_4': 5999, 'changeid_5': 5998, 'changeid_6': 5997, 
'changeid_7': 5996, 'changeid_1': 6002, 'changeid_2': 6001, 
'changeid_3': 6000, 'changeid_8': 5995, 'changeid_9': 5994, 
'changeid_58': 5945, 'changeid_59': 5944, 'changeid_52': 5951, 
'changeid_53': 5950, 'changeid_50': 5953, 'changeid_51': 5952, 
'changeid_56': 5947, 'changeid_57': 5946, 'changeid_54': 5949, 
'changeid_55': 5948}

2017-03-07T07:36:04-0500 [-] while pruning changes
         Traceback (most recent call last):
           File "/usr/lib/python2.7/threading.py", line 551, in 
__bootstrap_inner
             self.run()
           File "/usr/lib/python2.7/threading.py", line 504, in run
             self.__target(*self.__args, **self.__kwargs)
           File 
"/usr/local/lib/python2.7/dist-packages/Twisted-16.3.0-py2.7-linux-x86_64.egg/twisted/_threads/_threadworker.py", 
line 46, in work
             task()
           File 
"/usr/local/lib/python2.7/dist-packages/Twisted-16.3.0-py2.7-linux-x86_64.egg/twisted/_threads/_team.py", 
line 190, in doWork
             task()
         --- <exception caught here> ---
           File 
"/usr/local/lib/python2.7/dist-packages/Twisted-16.3.0-py2.7-linux-x86_64.egg/twisted/python/threadpool.py", 
line 246, in inContext
             result = inContext.theWork()
           File 
"/usr/local/lib/python2.7/dist-packages/Twisted-16.3.0-py2.7-linux-x86_64.egg/twisted/python/threadpool.py", 
line 262, in <lambda>
             inContext.theWork = lambda: context.call(ctx, func, *args, 
**kw)
           File 
"/usr/local/lib/python2.7/dist-packages/Twisted-16.3.0-py2.7-linux-x86_64.egg/twisted/python/context.py", 
line 118, in callWithContext
             return self.currentContext().callWithContext(ctx, func, 
*args, **kw)
           File 
"/usr/local/lib/python2.7/dist-packages/Twisted-16.3.0-py2.7-linux-x86_64.egg/twisted/python/context.py", 
line 81, in callWithContext
             return func(*args,**kw)
           File 
"/usr/local/lib/python2.7/dist-packages/buildbot-0.9.3-py2.7.egg/buildbot/db/pool.py", 
line 180, in __thd
             rv = callable(arg, *args, **kwargs)
           File 
"/usr/local/lib/python2.7/dist-packages/buildbot-0.9.3-py2.7.egg/buildbot/db/changes.py", 
line 338, in thd
             table.delete(table.c.changeid.in_(batch)))
           File 
"build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py", line 662, in 
execute

           File 
"build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py", line 761, in 
_execute_clauseelement

           File 
"build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py", line 874, in 
_execute_context

           File 
"build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py", line 1024, in 
_handle_dbapi_exception

           File 
"build/bdist.linux-x86_64/egg/sqlalchemy/util/compat.py", line 195, in 
raise_from_cause

           File 
"build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py", line 867, in 
_execute_context

           File 
"build/bdist.linux-x86_64/egg/sqlalchemy/engine/default.py", line 324, 
in do_execute

         sqlalchemy.exc.IntegrityError: (IntegrityError) update or 
delete on table "changes" violates foreign key constraint 
"changes_parent_changeids_fkey" on table "changes"
         DETAIL:  Key (changeid)=(5983) is still referenced from table 
"changes".
          'DELETE FROM changes WHERE changes.changeid IN 
(%(changeid_1)s, %(changeid_2)s, %(changeid_3)s, %(changeid_4)s, 
%(changeid_5)s, %(changeid_6)s, %(changeid_7)s, %(changeid_8)s, 
%(changeid_9)s, %(changeid_10)s, %(changeid_11)s, %(changeid_12)s, 
%(changeid_13)s, %(changeid_14)s, %(changeid_15)s, %(changeid_16)s, 
%(changeid_17)s, %(changeid_18)s, %(changeid_19)s, %(changeid_20)s, 
%(changeid_21)s, %(changeid_22)s, %(changeid_23)s, %(changeid_24)s, 
%(changeid_25)s, %(changeid_26)s, %(changeid_27)s, %(changeid_28)s, 
%(changeid_29)s, %(changeid_30)s, %(changeid_31)s, %(changeid_32)s, 
%(changeid_33)s, %(changeid_34)s, %(changeid_35)s, %(changeid_36)s, 
%(changeid_37)s, %(changeid_38)s, %(changeid_39)s, %(changeid_40)s, 
%(changeid_41)s, %(changeid_42)s, %(changeid_43)s, %(changeid_44)s, 
%(changeid_45)s, %(changeid_46)s, %(changeid_47)s, %(changeid_48)s, 
%(changeid_49)s, %(changeid_50)s, %(changeid_51)s, %(changeid_52)s, 
%(changeid_53)s, %(changeid_54)s, %(changeid_55)s, %(changeid_56)s, 
%(changeid_57)s, %(changeid_58)s, %(changeid_59)s, %(changeid_60)s, 
%(changeid_61)s, %(changeid_62)s, %(changeid_63)s, %(changeid_64)s, 
%(changeid_65)s, %(changeid_66)s, %(changeid_67)s, %(changeid_68)s, 
%(changeid_69)s, %(changeid_70)s, %(changeid_71)s, %(changeid_72)s, 
%(changeid_73)s, %(changeid_74)s, %(changeid_75)s, %(changeid_76)s, 
%(changeid_77)s, %(changeid_78)s, %(changeid_79)s, %(changeid_80)s, 
%(changeid_81)s, %(changeid_82)s, %(changeid_83)s, %(changeid_84)s, 
%(changeid_85)s, %(changeid_86)s, %(changeid_87)s, %(changeid_88)s, 
%(changeid_89)s, %(changeid_90)s, %(changeid_91)s, %(changeid_92)s, 
%(changeid_93)s, %(changeid_94)s, %(changeid_95)s, %(changeid_96)s, 
%(changeid_97)s, %(changeid_98)s, %(changeid_99)s, %(changeid_100)s)' 
{'changeid_100': 5903, 'changeid_29': 5974, 'changeid_28': 5975, 
'changeid_27': 5976, 'changeid_26': 5977, 'changeid_25': 5978, 
'changeid_24': 5979, 'changeid_23': 5980, 'changeid_22': 5981, 
'changeid_21': 5982, 'changeid_20': 5983, 'changeid_89': 5914, 
'changeid_88': 5915, 'changeid_81': 5922, 'changeid_80': 5923, 
'changeid_83': 5920, 'changeid_82': 5921, 'changeid_85': 5918, 
'changeid_84': 5919, 'changeid_87': 5916, 'changeid_86': 5917, 
'changeid_38': 5965, 'changeid_39': 5964, 'changeid_34': 5969, 
'changeid_35': 5968, 'changeid_36': 5967, 'changeid_37': 5966, 
'changeid_30': 5973, 'changeid_31': 5972, 'changeid_32': 5971, 
'changeid_33': 5970, 'changeid_98': 5905, 'changeid_99': 5904, 
'changeid_96': 5907, 'changeid_97': 5906, 'changeid_94': 5909, 
'changeid_95': 5908, 'changeid_92': 5911, 'changeid_93': 5910, 
'changeid_90': 5913, 'changeid_91': 5912, 'changeid_63': 5940, 
'changeid_62': 5941, 'changeid_61': 5942, 'changeid_60': 5943, 
'changeid_67': 5936, 'changeid_66': 5937, 'changeid_65': 5938, 
'changeid_64': 5939, 'changeid_69': 5934, 'changeid_68': 5935, 
'changeid_16': 5987, 'changeid_17': 5986, 'changeid_14': 5989, 
'changeid_15': 5988, 'changeid_12': 5991, 'changeid_13': 5990, 
'changeid_10': 5993, 'changeid_11': 5992, 'changeid_18': 5985, 
'changeid_19': 5984, 'changeid_70': 5933, 'changeid_71': 5932, 
'changeid_72': 5931, 'changeid_73': 5930, 'changeid_74': 5929, 
'changeid_75': 5928, 'changeid_76': 5927, 'changeid_77': 5926, 
'changeid_78': 5925, 'changeid_79': 5924, 'changeid_45': 5958, 
'changeid_44': 5959, 'changeid_47': 5956, 'changeid_46': 5957, 
'changeid_41': 5962, 'changeid_40': 5963, 'changeid_43': 5960, 
'changeid_42': 5961, 'changeid_49': 5954, 'changeid_48': 5955, 
'changeid_4': 5999, 'changeid_5': 5998, 'changeid_6': 5997, 
'changeid_7': 5996, 'changeid_1': 6002, 'changeid_2': 6001, 
'changeid_3': 6000, 'changeid_8': 5995, 'changeid_9': 5994, 
'changeid_58': 5945, 'changeid_59': 5944, 'changeid_52': 5951, 
'changeid_53': 5950, 'changeid_50': 5953, 'changeid_51': 5952, 
'changeid_56': 5947, 'changeid_57': 5946, 'changeid_54': 5949, 
'changeid_55': 5948}

Thanks!

Neil Gilmore
grammatech.com

On 3/7/2017 3:49 AM, Pierre Tardy wrote:
> Hi Neil,
> I am not sure exactly how I can help on this as you are describing 
> lots of symptoms.
>
> What goes to my mind right now is a problem with the message queue. In 
> the multimaster tests I am doing, I figured out that a disconnection 
> of the message queue is not recovered right now, which could explain 
> why build do not start (the master will not check for new requests 
> unless they receive a message)
>
> However, when the mq fails, I can see evidence of it in the logs, but 
> you don't mention any issue in the logs.
>
> Database integrity errors looks bad also, what kind of errors is that? 
> We already had some reports of those which were due to a failing disk. 
> Could that be the case?
>
> Regards
> Pierre
>
>
> On Mon, Mar 6, 2017 at 10:36 PM Neil Gilmore <ngilmore at grammatech.com 
> <mailto:ngilmore at grammatech.com>> wrote:
>
>     Hi everyone,
>
>     Well, things ran OK for a couple weeks. But we had some problems
>     starting last weekend. At least some failure emails don't seem to be
>     getting sent out. And a problem we'd been having a bit of got a
>     lot worse.
>
>     For whatever reason, queued builds don't seem to want to start.
>     Sometimes for hours. Even forced builds. This doesn't seem to be a
>     locking problem, though I'll be having a look at that side in a
>     bit. But
>     we'll have builds sitting for hours before they start. If they start.
>     Some of our people get antsy and cancel the current queue then force a
>     build. But sometimes those wait, too.
>
>     And we're having trouble getting the masters to deal with new
>     revisions
>     fro svn. Everything else looks OK (postcommit hooks, etc.) I'm
>     just not
>     sure what's going on.
>
>     Reconfig hasn't helped, nor has restarting one of the masters.
>
>     We are getting integrity errors in our database, too.
>
>     Except for the database problem, the rest looks like network
>     connection
>     stuff, perhaps, though we haven't had any problems there for a while.
>
>     Neil Gilmore
>     grammatech.com <http://grammatech.com>
>     _______________________________________________
>     users mailing list
>     users at buildbot.net <mailto:users at buildbot.net>
>     https://lists.buildbot.net/mailman/listinfo/users
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.buildbot.net/pipermail/users/attachments/20170307/0f87eda0/attachment.html>


More information about the users mailing list