[Buildbot-devel] schedulerdb progress report: week 1

Fri Sep 18 01:35:24 UTC 2009

Kristian Nielsen wrote:
> Some comments/suggestions:
> 
>> TABLES = [
>>     # the schema here is defined as version 1
>>     """CREATE TABLE version
>>     (
>>     version INTEGER -- contains one row, currently set to 1
>>     );""",
>>     """INSERT INTO version VALUES (1);""",
>>
>>     # last_access is used for logging, to record the last time that
>>     # each client (or rather class of clients) touched the DB. The
>>     # idea is that if something gets weird, you can check this and
>>     # discover that you have an older tool (which uses a different
>>     # schema) mucking things up.
>>     """CREATE TABLE last_access
>>     (
>>     who VARCHAR(256), -- like 'buildbot-0.8.0'
>>     writing INTEGER, -- 1 if you are writing, 0 if you are reading
>>     last_access TIMESTAMP     -- seconds since epoch
>>     );""",
> 
> Sounds like you intend here PRIMARY KEY (who, writing) ?

Sounds reasonable. I'm having some slight problems getting a syntax that
is accepted by both SQLite and MySQL, though. Is it:

 CREATE TABLE last_access
 (
  who VARCHAR(256),
  writing INTEGER,
  PRIMARY KEY (who, writing),
  last_access TIMESTAMP,
 )

like that? I think sqlite complained about it, but I didn't investigate
very deeply.

>>     """CREATE TABLE changes_nextid (next_changeid INTEGER);""",
>>     """INSERT INTO changes_nextid VALUES (0);""",
> 
> Hm, what is the purpose of this table? Is it something specific to
> sqlite to get AUTOINCREMENT working, not needed in other databases?

Yeah. In talking with John (who has far more SQL experience than I'll
ever have), it sounded like this was the best (most portable) way to
implement the usual "allocate the next highest number" operation. Most
databases have a special form (like SQLite's implicit rowid) to let you
insert a new row with an AUTOINCREMENT value and simultaneously find out
what autoincremented value was created. But I'm told that having a
separate table for the counter is the only way to get it done in basic
works-on-all-databases SQL, without those vendor-specific extensions.

>>     """CREATE TABLE changes
>>     (
>>     changeid INTEGER PRIMARY KEY AUTOINCREMENT,
>>     number INTEGER UNIQUE,
> 
> Why 'number' ? How is it different from changeid?

Good point, I've unified those. Yeah, 'number' was for migrating old
Changes from the pickle files. I got rid of the AUTOINCREMENT and copy
changeid from the old pickle (or create a new unique value for new
changes).

>>     author VARCHAR(1024),
>>     comments VARCHAR(1024), -- too short?
> 
> Should be TEXT/BLOB.

Is "TEXT" standard SQL? From what I read, it's a MySQL extension. I'd
certainly prefer to use something without an arbitrary length limit like
that.

> Possibly you could put it in a different (changeid,comments) table,
> though that's probably premature optimisation.

Yes, I don't expect there to be a lot of commonality between checking
comments :).

>>     revlink VARCHAR(256),
> 
> What is 'revlink'?

You know, I'm not sure. It's a property of the Change that gets recorded
in the pickle, so I copied it into the schema. I think that both revlink
and the "links" list (stored in the "change_links" table) were intended
to point to github/launchpad/cvsview pages. I don't know why we now have
two of them (one singular, the other a list).

>>     """CREATE TABLE change_links
>>     (
>>     changeid INTEGER,
>>     link VARCHAR[1024]
>>     );""",
> 
> Is the link unique, or can there be many. In other words what is the
> primary key?

I think that each change (really each revision) would have a distinct
link (i.e. no two changes would reference the same link). The fact that
"links" is a list suggests that some users might have multiple links
that describe a single change, perhaps two different views in a
github-like service.

I'm not confident enough of that distinctness to have the schema force
them to be unique, though.

>>     """CREATE TABLE change_files
>>     (
>>     changeid INTEGER,
>>     filename VARCHAR[1024]
>>     );""",
> 
> Ok, files are not unique, so add 'idx INTEGER' and use PRIMARY KEY
> (changeid, idx).

Ok, with a separate table that maps 'idx' to 'filename', right? I think
I understand that. I never know how to trade off the
normalization/uniqueness benefits against the additional query syntax
that will be required to get back the filenames. For example, this
change_files table is mainly used to get back a list of filenames for a
given changeid. What would be the SQL syntax to do that lookup with the
intermediate idx<->filename table?

>>     """CREATE TABLE change_properties
>>     (
>>     changeid INTEGER,
>>     property_name VARCHAR[256],
>>     property_value VARCHAR[1024] -- too short?
>>     );""",
>>     ]
> 
> PRIMARY KEY (changeid, property_name) here seems to make sense.

Ok, right, so each property_name has one and only one value, that sounds
good.

good stuff, thanks!
 -Brian