[Buildbot-devel] schedulerdb progress report: week 1

Fri Sep 18 07:16:35 UTC 2009

Brian Warner <warner at lothar.com> writes:

> Kristian Nielsen wrote:
>> Some comments/suggestions:
>> 
>>> TABLES = [
>>>     # the schema here is defined as version 1
>>>     """CREATE TABLE version
>>>     (
>>>     version INTEGER -- contains one row, currently set to 1
>>>     );""",
>>>     """INSERT INTO version VALUES (1);""",
>>>
>>>     # last_access is used for logging, to record the last time that
>>>     # each client (or rather class of clients) touched the DB. The
>>>     # idea is that if something gets weird, you can check this and
>>>     # discover that you have an older tool (which uses a different
>>>     # schema) mucking things up.
>>>     """CREATE TABLE last_access
>>>     (
>>>     who VARCHAR(256), -- like 'buildbot-0.8.0'
>>>     writing INTEGER, -- 1 if you are writing, 0 if you are reading
>>>     last_access TIMESTAMP     -- seconds since epoch
>>>     );""",
>> 
>> Sounds like you intend here PRIMARY KEY (who, writing) ?
>
> Sounds reasonable. I'm having some slight problems getting a syntax that
> is accepted by both SQLite and MySQL, though. Is it:
>
>  CREATE TABLE last_access
>  (
>   who VARCHAR(256),
>   writing INTEGER,
>   PRIMARY KEY (who, writing),
>   last_access TIMESTAMP,
>  )
>
> like that? I think sqlite complained about it, but I didn't investigate
> very deeply.

Yes, but the last trailing comma is not allowed (and I usually but constraints
like PRIMARY KEY at the end, after columns).

But more generally, I think it is the wrong approach to go through all kinds
of hoops to get the exact same syntax working with all databases. Allow to add
special variants for the CREATE statements for specific tables, to allow for
flexibility. Many tables can probably be made to work fine across databases
with the same syntax, of course.

>>>     """CREATE TABLE changes_nextid (next_changeid INTEGER);""",
>>>     """INSERT INTO changes_nextid VALUES (0);""",
>> 
>> Hm, what is the purpose of this table? Is it something specific to
>> sqlite to get AUTOINCREMENT working, not needed in other databases?
>
> Yeah. In talking with John (who has far more SQL experience than I'll
> ever have), it sounded like this was the best (most portable) way to
> implement the usual "allocate the next highest number" operation. Most
> databases have a special form (like SQLite's implicit rowid) to let you
> insert a new row with an AUTOINCREMENT value and simultaneously find out
> what autoincremented value was created. But I'm told that having a
> separate table for the counter is the only way to get it done in basic
> works-on-all-databases SQL, without those vendor-specific extensions.

Yes, not all databases have AUTOINCREMENT (and it is actually AUTO_INCREMENT
in MySQL). Still, you also had AUTOINCREMENT in the table :-).

Actually, there is another way that I often prefer:

    SELECT 1+MAX(id) FROM mytable FOR UPDATE;
    INSERT INTO mytable SET id=<result of select>, otherfield=42

or in a single statement:

    INSERT INTO mytable(id, otherfield) SELECT IFNULL(1+MAX(id), 0), 42 FROM mytable;

But I guess your plan was to drop the AUTOINCREMENT from the primary table and
use the extra table to track it instead, that should work fine as well.

>>>     author VARCHAR(1024),
>>>     comments VARCHAR(1024), -- too short?
>> 
>> Should be TEXT/BLOB.
>
> Is "TEXT" standard SQL? From what I read, it's a MySQL extension. I'd
> certainly prefer to use something without an arbitrary length limit like
> that.

Right, so in Oracle it is called CLOB... But then in Oracle VARCHAR is called
VARCAR2 :-)

Anyway, users could just ALTER TABLE themselves, not a bit deal at this point
I guess.

>>>     """CREATE TABLE change_files
>>>     (
>>>     changeid INTEGER,
>>>     filename VARCHAR[1024]
>>>     );""",
>> 
>> Ok, files are not unique, so add 'idx INTEGER' and use PRIMARY KEY
>> (changeid, idx).
>
> Ok, with a separate table that maps 'idx' to 'filename', right? I think

No, that's not what I meant. My suggestion is this:

    CREATE TABLE change_files
    (
    changeid INTEGER,
    idx INTEGER,
    filename VARCHAR(1024),
    PRIMARY KEY (changeid, idx)
    );

Sample data:

    changeid    idx    filename
         103      0    README
         104      0    foo.h
         104      1    foo.cc
         104      2    man/foo.1
         105      0    README
         105      1    INSTALL

So idx is only unique within a single changeid.

> I understand that. I never know how to trade off the
> normalization/uniqueness benefits against the additional query syntax
> that will be required to get back the filenames. For example, this
> change_files table is mainly used to get back a list of filenames for a
> given changeid. What would be the SQL syntax to do that lookup with the
> intermediate idx<->filename table?

    SELECT filename FROM changes_files WHERE changeid = ? ORDER BY idx;

Doesn't get much simpler than that ;-)

 - Kristian.