Tempest banner

Tempest

9 devlogs
160h 21m 15s

A relational database written in Rust with a novel approach to query languages. Tempest’s Typed Query Language (TQL) interface will have first-class support for declarative schemas by providing integrated migration tooling.

Demo Repository


Johann

Implemented Interactive REPL-Shell

You can now run tempest repl <DATA_DIR> to create or open a database at the specified path.
After doing so, you are dropped into an interactive shell. The shell has several built-in commands, all starting with a .:

  • .help | .h show help menu
  • .quit | .q terminate the REPL session
  • .databases | .dbs list all databases
  • .tables <database> list all tables, optionally scoped to a database
  • .types <database> list all types, optionally scoped to a database

Aside from the built-in commands, you can execute valid TQL statements. If you make a syntax error, the shell shows nice error messages using the span data I collect inside the parser. The formatted error report is rendered with the ariadne crate, so I don’t have to do all that ugly string interpolation myself.

Here is a list of queries you can try out:

  • create database main;: create a new database called main
  • create type main.User { id: Int64, username: String };: create a type called User inside of the database main. The fields are id, a 64-bit integer, and username, a String.
  • create table main.users : main.User { primary key (id) };: create a table called users inside of the database main with the type User from the same database and with the field id as the primary key.
  • insert into main.users { id: 42, username: "John" };: insert an entry into the table users inside of the database main. The entry has the value 42 for the field id, and the value "John" for the field username.
  • select * from main.users;: select every column of all the entries from the table users inside of the database main. Alternatively, you can supply specific fields, instead of *, to only get out those fields.

Not sure what I’ll be doing next, but it will be fun!

0
Johann

Select Queries Now Working

I improved the Storage implementation to support seek(search_key) on the StorageIterator trait: it continuously advances until it reaches a key that is logically greater than or equal to search_key. “Logically” here means the comparison ignores the suffix data that is internal to the engine - specifically an HlcTimestamp, a 64-bit timestamp that will come in handy later when implementing migrations and going distributed!
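The idea can be sketched roughly like this - a minimal, hypothetical version of the trait and a vector-backed iterator, where names, the 8-byte suffix size, and the key layout are all my assumptions, not Tempest’s real API:

```rust
// Assumed key layout: user-key bytes followed by an 8-byte timestamp suffix.
const TS_SUFFIX_LEN: usize = 8;

/// Strip the internal timestamp suffix to get the logical (user-visible) key.
fn logical_key(stored: &[u8]) -> &[u8] {
    &stored[..stored.len().saturating_sub(TS_SUFFIX_LEN)]
}

struct VecIterator {
    entries: Vec<Vec<u8>>, // sorted stored keys (user key + timestamp suffix)
    pos: usize,
}

impl VecIterator {
    /// Advance until the *logical* key (suffix stripped) is >= search_key.
    fn seek(&mut self, search_key: &[u8]) {
        while self.pos < self.entries.len()
            && logical_key(&self.entries[self.pos]) < search_key
        {
            self.pos += 1;
        }
    }

    fn current(&self) -> Option<&[u8]> {
        self.entries.get(self.pos).map(|e| logical_key(e))
    }
}

fn main() {
    let mut it = VecIterator {
        entries: vec![
            [b"apple".as_slice(), &[0u8; 8][..]].concat(),
            [b"banana".as_slice(), &[0u8; 8][..]].concat(),
            [b"cherry".as_slice(), &[0u8; 8][..]].concat(),
        ],
        pos: 0,
    };
    it.seek(b"banana");
    if let Some(k) = it.current() {
        println!("{}", String::from_utf8_lossy(k));
    }
}
```

The key point is that the comparison runs on the stripped key, so callers never see or reason about the internal timestamp bytes.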

Next up is implementing the REPL, so y’all can finally try out writing some TQL for yourself!


The attachment shows you a test that verifies that the engine is correctly getting “John” from the setup of the test in the previous devlog. It also shows projections, i.e. selecting just the username column.

Was finally able to merge in the feature branch for executing simple queries: https://github.com/Vimthusiast/tempest/commit/9389ad4

Attachment
0
Johann

Query Planning and Execution in the Engine

Implemented the first execution of queries! What’s working right now is creating databases, types, and tables, and also inserting into tables. Next up are selects, to complete the first vertical slice!

The attachment shows the (currently sparse) debug outputs of the following commands:

create database main;

create type main.User {
    id: Int64,
    username: String,
};


create table main.users : main.User {
    primary key (id),
};

insert into main.users {
    id: 4,
    username: "John",
};

Note that you cannot see the insert output, since the logging is filtered to just the tempest_engine module, while the storage layer sits inside the tempest_kv crate.

Also notice how there are IDs for everything? FieldIds, DatabaseIds, TypeIds? That’s so I can later support renaming without breaking old code: the IDs stay stable, so a rename only has to change the name. I still have to think about how I will implement schema migrations without having to rewrite EVERYTHING, which would be bad. I guess I have to track the old schema state for every migration, so that when decoding a row I can look up the right strategy and apply the migrations lazily.
But that’s way too complex for the duration of Flavortown and will be implemented much, much later.
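To illustrate the stable-ID idea, here is a hypothetical toy catalog - the TableId type and the catalog shape are illustrative, not Tempest’s real types - where a rename is purely a metadata edit because row keys reference the ID, never the name:

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct TableId(u32);

struct Catalog {
    by_name: HashMap<String, TableId>,
    next_id: u32,
}

impl Catalog {
    fn new() -> Self {
        Catalog { by_name: HashMap::new(), next_id: 0 }
    }

    fn create_table(&mut self, name: &str) -> TableId {
        let id = TableId(self.next_id);
        self.next_id += 1; // IDs are allocated once and never reused
        self.by_name.insert(name.to_string(), id);
        id
    }

    /// Rename is a single mapping update: the id (and thus every encoded
    /// row key that references it) stays exactly the same.
    fn rename_table(&mut self, old: &str, new: &str) -> Option<TableId> {
        let id = self.by_name.remove(old)?;
        self.by_name.insert(new.to_string(), id);
        Some(id)
    }
}

fn main() {
    let mut cat = Catalog::new();
    let id = cat.create_table("users");
    let renamed = cat.rename_table("users", "accounts").unwrap();
    println!("rename kept the same id: {}", id == renamed);
}
```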

Next up is of course selecting, before moving on to the REPL for the fully interactive experience!

Attachment
0
Johann

Completed The Core TQL Parser

I managed to finish implementing the core of the TQL parser, which gives us a typed representation - called an Abstract Syntax Tree, or AST for short - of the following syntax examples, which can later be passed to the engine for execution:

create database main;

create type User {
    id       : Int64,
    username : String,
};

create table mydb.users : User {
    primary key (id),
};

insert into mydb.users { id: 1, username: "john" };
insert into mydb.users { id: 2, username: "bob" };

select * from mydb.users;

select id, username from mydb.users;

It gracefully recovers from errors as much as possible, to show us everything that is wrong in our code, not just the first error at the top (see the example attachment with a garbage statement).
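A common way to get this behavior is statement-level synchronization: on a parse error, record it and skip to the next `;`, so later statements are still checked. Here is a deliberately tiny toy sketch of that idea - the statement grammar and names are made up for illustration and have nothing to do with the real TQL parser internals:

```rust
#[derive(Debug, PartialEq)]
enum Stmt {
    CreateDatabase(String),
}

// Toy single-statement parser: only understands `create database <name>`.
fn parse_stmt(src: &str) -> Result<Stmt, String> {
    let src = src.trim();
    if let Some(rest) = src.strip_prefix("create database ") {
        Ok(Stmt::CreateDatabase(rest.trim().to_string()))
    } else {
        Err(format!("unrecognized statement: {src:?}"))
    }
}

// `;` is the synchronization point: an error in one statement never
// prevents parsing the statements after it, so ALL errors get reported.
fn parse_program(src: &str) -> (Vec<Stmt>, Vec<String>) {
    let mut stmts = Vec::new();
    let mut errors = Vec::new();
    for piece in src.split(';').map(str::trim).filter(|p| !p.is_empty()) {
        match parse_stmt(piece) {
            Ok(s) => stmts.push(s),
            Err(e) => errors.push(e),
        }
    }
    (stmts, errors)
}

fn main() {
    let (stmts, errors) =
        parse_program("create database main; garbage!!; create database other;");
    println!("{} statements parsed, {} errors collected", stmts.len(), errors.len());
}
```

A real parser synchronizes on tokens rather than splitting text, but the recovery principle is the same.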

Next up is implementing the actual database engine itself! It will take in a produced AST, create a query plan for it, run it, and return to us what happened: AST -> Plan -> Execution -> Result!


The attachments show our tests that verify the parser is working correctly. The actual usage of all of these ~1.1K LoC will become really visible once I wire up the Engine to the REPL shell - neither really exists yet, hehe ;)

Relevant commits/diff for this Devlog: https://github.com/Vimthusiast/tempest/compare/b1c19ea..61fd23d

Attachment
Attachment
Attachment
Attachment
Attachment
0
mahad

Tagged your project as well cooked!

🔥 mahad marked your project as well cooked! As a prize for your nicely cooked project, look out for a bonus prize in the mail :)

Johann

The Time Sync Issue

So uhm… considering this devlog is 123 HOURS long, it’s kind of tough to explain, but I’ll try.
The other two devlogs were from this week, but now, thanks to @mahad, I was able to import the data all the way back to the end of January, when I started this project. The thing is, I had not connected my WakaTime client with HackaTime… so nothing got tracked 😭. Now I’ve got the remaining hours here as well :D

…this devlog got a bit long, so if you wanna read a project summary for up until now, you can do so in this gist: https://gist.github.com/Vimthusiast/f8ee05a3df1a98abc55bab584a41c72a

The code has been growing steadily, and now I’m quite sure it has a mind of its own.

Attachment
0
Johann

Row Encoder/Decoder

Implemented the RowEncoder and the RowDecoder! They take in the specified TableSchema and are able to encode the values efficiently into the key and value parts for our storage layer.
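As a rough illustration of the key/value split - all names and the length-prefixed cell format below are my own simplifications, not the real RowEncoder - primary-key columns could go into the storage key while every other column goes into the value:

```rust
struct ColumnDef {
    name: &'static str,
    is_primary: bool,
}

struct TableSchema {
    columns: Vec<ColumnDef>,
}

/// Encode a row (cells as strings, for simplicity) into (key, value) bytes.
/// Each cell is length-prefixed so decoding can split them back apart.
fn encode_row(schema: &TableSchema, row: &[&str]) -> (Vec<u8>, Vec<u8>) {
    let mut key = Vec::new();
    let mut value = Vec::new();
    for (col, cell) in schema.columns.iter().zip(row) {
        let dst = if col.is_primary { &mut key } else { &mut value };
        dst.extend_from_slice(&(cell.len() as u32).to_be_bytes());
        dst.extend_from_slice(cell.as_bytes());
    }
    (key, value)
}

fn main() {
    let schema = TableSchema {
        columns: vec![
            ColumnDef { name: "id", is_primary: true },
            ColumnDef { name: "username", is_primary: false },
        ],
    };
    let (key, value) = encode_row(&schema, &["42", "John"]);
    println!("key = {key:?}, value = {value:?}");
}
```

Note that a plain length prefix is fine for the value side, but key-side cells would need an order-preserving encoding for range scans to sort correctly.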

Next up is TQL language parsing for our REPL!


The attachment contains the example output of encoding and decoding a few rows using a simple schema definition (currently schema is just in code, soon it will be runtime editable through TQL!).

Relevant commit: https://github.com/Vimthusiast/tempest/commit/8a59a92

Attachment
Attachment
0
Johann

Catalog (+ Value Encoding)

Implemented the catalog today — Tempest’s registry of every database and table, with full journal-backed persistence.

The catalog tracks two things: DatabaseSchema (name, set of table IDs) and TableSchema (name, columns, primary key, parent database). Both get stable numeric IDs (DatabaseId(u32), TableId(u32)) allocated at creation time and never reused - even after a drop. Renaming a table is a single metadata write with zero row rewrites, because names never appear in data keys.

Edits are versioned from day one: CatalogEdit::V1(CatalogEditV1). If the format needs to evolve later, old journals stay readable. The edit set for now is CreateDatabase, CreateTable, and Snapshot. A snapshot deliberately omits dropped entries (drops themselves will come later) rather than recording the drop, so the journal never needs to reason about create/drop cycles. All of this results in a more compact catalog file after our internal journal file rotation.
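The versioning pattern looks roughly like this - a sketch with made-up variant payloads, not the real definitions - where the outer enum carries the format version, so decoding dispatches on the version tag first and a future V2 can be added without breaking old journal files:

```rust
#[derive(Debug, PartialEq)]
enum CatalogEditV1 {
    CreateDatabase(String),
    CreateTable(String),
    // Snapshot(...) would also live here
}

#[derive(Debug, PartialEq)]
enum CatalogEdit {
    V1(CatalogEditV1),
    // V2(CatalogEditV2) can be added later; V1 records stay decodable
}

// Applying an edit matches on the version first, then on the edit kind.
fn apply(edit: &CatalogEdit, log: &mut Vec<String>) {
    match edit {
        CatalogEdit::V1(CatalogEditV1::CreateDatabase(name)) => {
            log.push(format!("create database {name}"))
        }
        CatalogEdit::V1(CatalogEditV1::CreateTable(name)) => {
            log.push(format!("create table {name}"))
        }
    }
}

fn main() {
    let mut log = Vec::new();
    apply(&CatalogEdit::V1(CatalogEditV1::CreateDatabase("main".into())), &mut log);
    println!("{log:?}");
}
```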

Also implemented raw and lexical encoding/decoding for TempestValue, the typed representation of data in the engine. Lexical encoding is order-preserving (important for key sorting), while raw encoding is compact (used for values). It is important - for example with strings - that the lexical encoding is implemented so that “banana” sorts before “car”. With normal length prefixes, the byte-based ordering would mix this up and put “car” first, because “car” has length 3 and “banana” has length 6. Therefore we use null terminators for byte sequences, which doesn’t mess up the lexicographic ordering.
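The two orderings can be demonstrated in a few lines. This sketch assumes strings contain no 0x00 bytes (a real implementation would escape embedded nulls); comparing the encoded Vec<u8> values byte-by-byte then matches comparing the original strings:

```rust
// Order-preserving: raw bytes followed by a 0x00 terminator.
// Assumes no embedded 0x00 bytes; real encoders escape them.
fn encode_lexical(s: &str) -> Vec<u8> {
    let mut out = s.as_bytes().to_vec();
    out.push(0x00);
    out
}

// The counter-example from the text: the length byte sorts first,
// so "car" (len 3) incorrectly orders before "banana" (len 6).
fn encode_length_prefixed(s: &str) -> Vec<u8> {
    let mut out = vec![s.len() as u8];
    out.extend_from_slice(s.as_bytes());
    out
}

fn main() {
    println!(
        "lexical keeps order: {}",
        encode_lexical("banana") < encode_lexical("car")
    );
    println!(
        "length-prefixed breaks it: {}",
        encode_length_prefixed("car") < encode_length_prefixed("banana")
    );
}
```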

Next up is key encoding! The bridge that turns a TableId + primary key values into the actual bytes that hit the KV layer.


The video attachment shows the catalog usage. The value encoding/decoding is not used anywhere yet, but I will need it soon.

Relevant commits/diff for this devlog: https://github.com/Vimthusiast/tempest/compare/1f3b0b0..6e45cef

0
Johann

SST Compaction + File GC

Built size-tiered compaction and file garbage collection for Tempest’s storage engine today.

Without compaction, SST files accumulate on disk indefinitely. Every read has to scan all of them, so read amplification grows unboundedly. The fix: once L0 exceeds a file count threshold (default 4), merge all SSTs into one using the existing MergingIterator + DeduplicatingIterator infrastructure I already had. The deduplicator handles MVCC automatically: the newest version of each key wins, and tombstones are preserved. (A naive first pass at STCS for the MVP.)

The initially tricky part was crash safety. The manifest update (new SST in, old SSTs out) is the atomic commit point. Crash before it, and the old SSTs are still valid while the new one is just an orphan. Crash after it, and the new SST is the source of truth. No partial state is possible. It was easy thanks to my Journal<T> abstraction.

That orphan case is what file GC handles. On startup, Tempest scans the SST and WAL directories, diffs against the manifest, and spawns a background task to delete anything not accounted for. During normal compaction, the exact list of removed filenums is passed directly to the GC task - no scan needed, no race possible. Note that I still have to implement GC of old manifest/journal files, but they’re so small and few, that it does not really matter yet.
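The startup scan boils down to a set difference. Here is a minimal sketch of that step - the function name and file-name inputs are illustrative, not the real GC code - diffing the files on disk against the set the manifest says are live:

```rust
use std::collections::HashSet;

/// Any file on disk that the manifest does not account for is an orphan
/// (e.g. an SST written by a compaction that crashed before the manifest
/// commit point) and can safely be deleted.
fn find_orphans(on_disk: &[&str], manifest: &[&str]) -> Vec<String> {
    let live: HashSet<&str> = manifest.iter().copied().collect();
    let mut orphans: Vec<String> = on_disk
        .iter()
        .copied()
        .filter(|f| !live.contains(f))
        .map(|f| f.to_string())
        .collect();
    orphans.sort(); // deterministic output for logging/tests
    orphans
}

fn main() {
    // 000004.sst was written by a crashed compaction, so the manifest
    // never listed it; the scan flags it for the background deleter.
    let on_disk = ["000001.sst", "000002.sst", "000004.sst"];
    let manifest = ["000001.sst", "000002.sst"];
    println!("orphans: {:?}", find_orphans(&on_disk, &manifest));
}
```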

All of this is tested: basic functionality/correctness, deduplication, tombstone preservation, SST count reduction, and recovery after compaction.

The KV layer is now complete enough to build on. Next up: key encoding and the Value type, so basically the bridge between raw bytes and the typed world of TQL and the schema layer.


In the attachment you can see the log/tracing output of some tests I wrote that verify our compaction and file GC are working (I marked the relevant lines).

Relevant Commit: https://github.com/Vimthusiast/tempest/commit/1f3b0b08

0