db_engine — a relational database engine in modern C++17

A from-scratch, single-file relational database engine written in C++17, supporting a SQL-like dialect over a paged, B+ tree–indexed file format. Inspired by Connor Stack's Let's Build a Simple Database and SQLite's overall architecture.

This is the v2 evolution of an earlier C implementation. It moves the project from a hard-coded single-table schema to a dynamic, multi-table catalog with typed values, real SQL parsing, WHERE expressions over any column, ORDER BY / LIMIT, aggregates, and an EXPLAIN plan visualizer.

Highlights

Hand-written SQL frontend. A streaming lexer (zero-copy via std::string_view) feeds a recursive-descent parser that produces typed AST nodes and a compiled bytecode Program. Parse errors carry line/column info.
Bytecode VM with a query planner. Eighteen opcodes; the parser-side planner picks OP_FIND_KEY for primary-key equality lookups and OP_REWIND + OP_FILTER scans otherwise. Try EXPLAIN <stmt> to see exactly what the engine will run.
Real B+ tree index. Leaf and internal node layouts with on-page binary headers, leaf splits, internal splits, root-split-induced tree height growth, parent pointer maintenance, and a leaf sibling chain for ordered scans.
Dynamic, persisted schema. A typed system catalog lives on page 0 with a 4-byte magic and version; foreign files are rejected cleanly. Up to four user tables, each with its own B+ tree.
Typed values. std::variant<std::monostate, int64_t, std::string> for in-memory values; fixed-width on-disk layout (length-prefixed TEXT) keeps cell sizing predictable.
Modern C++ idioms throughout. RAII for the pager, database, and program; std::unique_ptr for AST ownership; std::sort for ORDER BY; enum class opcodes; exceptions caught at the REPL boundary.
Tested under AddressSanitizer + UBSan in CI. 30 doctest unit assertions + 74 shell-based integration checks across 10 test files.

Architecture

┌────────────────────────────────────────────────────────────────┐
│  REPL  (src/main.cpp)                                          │
│  reads a line, dispatches meta-commands (.schema, .stats, …)   │
│  or hands SQL to the frontend.                                 │
└────────────────────────────────────────────────────────────────┘
            │
            ▼
┌────────────────────────────────────────────────────────────────┐
│  Frontend                                                      │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────────┐  │
│  │ Lexer        │ -> │ Parser       │ -> │ Planner          │  │
│  │ string_view  │    │ recursive    │    │ pk-eq fast path  │  │
│  │ tokens       │    │ descent      │    │ vs scan+filter   │  │
│  └──────────────┘    └──────────────┘    └──────────────────┘  │
│              produces  Program { code[], predicate, … }        │
└────────────────────────────────────────────────────────────────┘
            │
            ▼
┌────────────────────────────────────────────────────────────────┐
│  Bytecode VM  (src/vm.cpp)                                     │
│  switch over enum class Opcode { OPEN, REWIND, NEXT,           │
│    FIND_KEY, FILTER, RESULT, COLLECT, SORT_EMIT,               │
│    AGG_COUNT / MIN / MAX / SUM, AGG_EMIT,                      │
│    INSERT, UPDATE, DELETE, CLOSE, HALT }                       │
│  Holds: aggregate accumulators, collected rows for ORDER BY    │
└────────────────────────────────────────────────────────────────┘
            │
            ▼
┌────────────────────────────────────────────────────────────────┐
│  Storage                                                       │
│  ┌──────────────┐   ┌──────────────┐   ┌─────────────────────┐ │
│  │ Cursor       │ ← │ B+ Tree      │ ← │ Pager (RAII)        │ │
│  │ leaf+cell    │   │ leaf/intern. │   │ open/read/write/    │ │
│  │ iteration    │   │ splits, root │   │ lseek + page cache  │ │
│  │              │   │ height grow  │   │ + cache stats       │ │
│  └──────────────┘   └──────────────┘   └─────────────────────┘ │
└────────────────────────────────────────────────────────────────┘
            │
            ▼
   ┌──────────────────────────────────────────────────────────┐
   │  File on disk                                            │
   │  page 0: Catalog (magic, version, tables[], root pages)  │
   │  page N: B+ tree leaves / internals (per table)          │
   └──────────────────────────────────────────────────────────┘

The on-disk catalog is a static_assert-enforced trivially-copyable POD; the in-memory representation uses STL containers. The row codec is the bridge.

Quick start

make                         # build (gcc / g++17)
make test                    # build + run unit + integration tests
make asan                    # rebuild under AddressSanitizer + UBSan and re-test
make clean
./db_engine mydata.db        # open or create a database file

Example session

db > CREATE TABLE users (id INT PRIMARY KEY, name TEXT(32), age INT);
Executed.
db > INSERT INTO users VALUES (1, 'alice', 30);
db > INSERT INTO users VALUES (2, 'bob', 25);
db > INSERT INTO users VALUES (3, 'carol', 35);
db > SELECT * FROM users WHERE age > 26 ORDER BY age DESC;
(3, carol, 35)
(1, alice, 30)
db > SELECT COUNT(*) FROM users;
(3)
db > EXPLAIN SELECT * FROM users WHERE id = 2;
Program table=users predicate=id = INT(2)
0 OP_OPEN
1 OP_FIND_KEY 2 4 0
2 OP_RESULT
3 OP_CLOSE
4 OP_HALT
db > EXPLAIN SELECT * FROM users WHERE age > 30;
Program table=users predicate=age > INT(30)
0 OP_OPEN
1 OP_REWIND 5 0 0
2 OP_FILTER 4 0 0
3 OP_RESULT
4 OP_NEXT 2 0 0
5 OP_CLOSE
6 OP_HALT

Notice the planner difference: an equality lookup on the primary key compiles to OP_FIND_KEY (B+ tree descent — O(log n)), while a non-PK predicate compiles to a OP_REWIND / OP_NEXT loop with an in-VM OP_FILTER (full leaf-chain scan — O(n)).

Supported SQL

CREATE TABLE name (col TYPE [PRIMARY KEY], ...);   -- TYPE = INT | TEXT(n)
DROP TABLE name;

INSERT INTO name VALUES (v1, v2, ...);

SELECT * | col[, col...] | agg(...) FROM name
  [WHERE expr]
  [ORDER BY col [ASC|DESC]]
  [LIMIT n];

UPDATE name SET col = v [, col = v]... [WHERE expr];
DELETE FROM name [WHERE expr];

EXPLAIN <any-statement>;

expr supports = != < <= > >= AND OR over any column.
Aggregates: COUNT(*), MIN(col), MAX(col), SUM(col), AVG(col) (INT only for SUM/AVG).
Exactly one PRIMARY KEY column is required per table; it must be INT.

Meta-commands

Command	Purpose
`.help`	List commands
`.exit`	Flush and quit
`.tables`	List user tables
`.schema [name]`	Pretty-print schema
`.schema --json`	Dump catalog as JSON (no library — handwritten)
`.stats`	Pages, rows per table, cache hits/misses, dirty writes
`.btree [name]`	Print the tree structure of a table
`.constants`	Sizing constants

What changed from v1

	v1 (C)	v2 (C++17)
Schema	Hard-coded `(id, username, email)` row	Dynamic `CREATE TABLE` with INT and TEXT(n)
Tables	One	Up to four, with their own B+ tree each
Values	Fixed C struct	`std::variant<monostate, int64_t, std::string>`
Parsing	`strtok`-style splitter	Lexer (`std::string_view`) + recursive-descent parser with line/col errors
`WHERE`	PK only	Any column; `= != < <= > >= AND OR`
`SELECT`	`*` only	Column projection, ORDER BY (any column, ASC/DESC), LIMIT
Aggregates	None	`COUNT(*) MIN MAX SUM AVG`
Plan visibility	None	`EXPLAIN` prints opcodes; planner picks PK fast path vs scan
Errors	Return codes	Exceptions caught at REPL boundary
Memory model	Manual `free`	RAII; `unique_ptr` for AST
Build	Make	Make (primary) + CMakeLists.txt provided
Tests	16 shell checks	30 doctest assertions + 74 integration checks
CI	None	GitHub Actions: build+test and ASan

Source layout

include/dbe/      typed value, schema, catalog, pager, btree,
                  cursor, lexer, expr, parser, bytecode, vm, db
src/              one .cpp per header, plus main.cpp (REPL)
tests/unit/       doctest-based C++ tests
tests/integration/ shell-driven end-to-end tests
.github/workflows ci.yml — build+test and asan jobs on Ubuntu

About 3,300 lines of C++ across the engine.

Limitations & future work

Deliberately scoped out for v2; each is a possible follow-up:

No WAL/journal. A crash mid-write can corrupt the file. A rollback journal (sidecar .journal, the SQLite-3 model) is sketched out but not implemented.
No LRU eviction. The page cache is unbounded; the working set must fit in memory.
No secondary indexes. Non-PK WHERE always scans.
No JOINs, no GROUP BY, no transactions.
DROP TABLE leaks pages. A free-page list would reclaim them.
Fixed-width rows. Variable-length records (SQLite's record format) would let TEXT be unbounded; this is the single largest follow-up.
No NULL surfaced through SQL. The variant has a monostate slot but no syntax produces it yet.

The architecture was designed so each of these can be added without rewriting the layers below it.

License

Personal/educational. No warranty.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
include/dbe		include/dbe
src		src
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CMakeLists.txt		CMakeLists.txt
Makefile		Makefile
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

db_engine — a relational database engine in modern C++17

Highlights

Architecture

Quick start

Example session

Supported SQL

Meta-commands

What changed from v1

Source layout

Limitations & future work

License

About

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

db_engine — a relational database engine in modern C++17

Highlights

Architecture

Quick start

Example session

Supported SQL

Meta-commands

What changed from v1

Source layout

Limitations & future work

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages