
Commit 50d6324

add infrastructure for migrations (#60)
* add infrastructure for migrations
* changed pg validation version to 14
1 parent 851d60d commit 50d6324

5 files changed

Lines changed: 2021 additions & 0 deletions

File tree

.claude/commands/make-migration.md

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
# Create Migration

Generate a new migration file based on changes to `sql/schema.sql`.

## Arguments

- `$ARGUMENTS` - The migration name (e.g., "add_user_table", "update_claim_indexes")

## Workflow

1. **Read the current schema**: Read `sql/schema.sql` to understand the current desired state.

2. **Read existing migrations**: Read all files in `src/postgres/migrations/` to understand what has already been migrated.

3. **Determine the changes**: Compare schema.sql against what the migrations would produce. Identify:
   - New tables, columns, indexes, or constraints to add
   - Modified functions or triggers
   - Any DROP statements needed (be careful with these)

4. **Generate the migration SQL**: Create SQL that transforms the database from the current migrated state to the new schema.sql state.
   - For new tables/indexes: Use `CREATE TABLE IF NOT EXISTS`, `CREATE INDEX IF NOT EXISTS`
   - For function updates: Use `CREATE OR REPLACE FUNCTION`
   - For existing queues that need new indexes: Include a `DO $$ ... END $$` block that applies changes to existing queue tables

5. **Create the migration file**: Generate a timestamped migration file:
   - Filename format: `YYYYMMDDHHMMSS_<name>.sql`
   - Place in: `src/postgres/migrations/`
   - Use current UTC time for the timestamp

6. **Run validation**: Execute `./scripts/validate-schema` to verify the migration produces the correct schema.
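
The UTC timestamp naming from step 5 can be sketched in a few lines of Python; the `migration_filename` helper is hypothetical (not part of this command), shown only to illustrate the format:

```python
from datetime import datetime, timezone


def migration_filename(name: str) -> str:
    """Build a 'YYYYMMDDHHMMSS_<name>.sql' filename from the current UTC time."""
    # Current UTC time, formatted as a sortable 14-digit timestamp
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{stamp}_{name}.sql"


print(migration_filename("add_user_table"))
```

Because the timestamp is zero-padded and most-significant-first, a plain lexical sort of the directory yields chronological migration order.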
## Example

If the user has added a new index to `ensure_queue_tables` in schema.sql:

```sql
-- New migration: 20260115143022_add_new_index.sql

-- Update ensure_queue_tables to include the new index for future queues
create or replace function durable.ensure_queue_tables (p_queue_name text)
returns void
language plpgsql
as $$
begin
  -- ... (full function with new index)
end;
$$;

-- Apply the new index to existing queues
do $$
declare
  v_queue text;
begin
  for v_queue in select queue_name from durable.queues loop
    execute format(
      'create index if not exists %I on durable.%I (...)',
      ('t_' || v_queue) || '_new_idx',
      't_' || v_queue
    );
  end loop;
end;
$$;
```

## Important Notes

- Always use `IF NOT EXISTS` for idempotent migrations
- For function changes, the full function must be included (not just the diff)
- The `DO $$ ... END $$` block for existing queues should NOT be in schema.sql (it's migration-only logic)
- Run validation after creating the migration to ensure schema.sql matches
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
# Validate Schema

Run the schema validation script to verify that `sql/schema.sql` matches the result of applying all migrations.

## How It Works

The validation script (`scripts/validate-schema`) uses testcontainers to:

1. Start two PostgreSQL 14 containers
2. Apply `sql/schema.sql` directly to container A
3. Apply all migrations in `src/postgres/migrations/` to container B
4. Dump both schemas using `pg_dump --schema-only --schema=durable`
5. Compare the normalized dumps
6. Report pass/fail with a diff on failure
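
The "normalized" comparison in step 5 matters because raw `pg_dump` output carries session noise (`SET` statements, comments, blank lines) that would produce spurious diffs. A minimal sketch of the idea, simplified from what the actual script does:

```python
import difflib


def normalize(dump: str) -> list[str]:
    """Keep only schema-object lines; drop SET/SELECT/comment/blank noise."""
    return [
        line
        for line in dump.splitlines()
        if line.strip() and not line.startswith(("SET ", "SELECT ", "--"))
    ]


# Two dumps that differ only in session configuration and comments
a = "SET client_encoding = 'UTF8';\n-- header comment\nCREATE TABLE t (id int);\n"
b = "CREATE TABLE t (id int);\n"

diff = list(difflib.unified_diff(normalize(a), normalize(b), "schema.sql", "migrations"))
print("match" if not diff else "\n".join(diff))
```

With the noise stripped, the two dumps above normalize to the same single `CREATE TABLE` line, so the diff is empty.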
## Running Validation

```bash
./scripts/validate-schema
```

## Requirements

- Docker must be running
- `uv` must be installed (the script uses inline dependencies)

## When to Run

- After creating a new migration with `/make-migration`
- Before committing schema changes
- CI runs this automatically on pull requests

## Troubleshooting

If validation fails, the output will show a unified diff between:
- `schema.sql` - What the schema file defines
- `migrations` - What applying all migrations produces

Common causes of failure:
- Forgot to update schema.sql after adding a migration
- Migration has different SQL than what's in schema.sql
- Migration includes logic that shouldn't be in schema.sql (like `DO $$ ... END $$` blocks for existing queues)

.github/workflows/ci.yml

Lines changed: 6 additions & 0 deletions
@@ -32,6 +32,12 @@ jobs:
     steps:
       - uses: actions/checkout@v4

+      - name: Install uv
+        uses: astral-sh/setup-uv@v4
+
+      - name: Validate schema matches migrations
+        run: ./scripts/validate-schema
+
       - name: Install Rust toolchain
         uses: dtolnay/rust-toolchain@stable
         with:

scripts/validate-schema

Lines changed: 223 additions & 0 deletions
@@ -0,0 +1,223 @@
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.11"
# dependencies = ["psycopg>=3.2.0", "testcontainers>=4.0.0"]
# ///
"""
Validates that sql/schema.sql matches the result of applying all migrations.

This script:
1. Starts two PostgreSQL 14 containers (to match TensorZero's minimum supported version)
2. Container A: Applies sql/schema.sql directly
3. Container B: Applies all migrations in src/postgres/migrations/ in timestamp order
4. Dumps both schemas using pg_dump --schema-only --schema=durable
5. Compares the dumps (excluding the _sqlx_migrations table)
6. Reports pass/fail with diff on failure
"""

import difflib
import re
import subprocess
import sys
from pathlib import Path

import psycopg
from testcontainers.postgres import PostgresContainer


def get_project_root() -> Path:
    """Find the project root by looking for Cargo.toml."""
    current = Path(__file__).resolve().parent
    while current != current.parent:
        if (current / "Cargo.toml").exists():
            return current
        current = current.parent
    raise RuntimeError("Could not find project root (no Cargo.toml found)")


def get_migrations(project_root: Path) -> list[Path]:
    """Get all migration files sorted by timestamp."""
    migrations_dir = project_root / "src" / "postgres" / "migrations"
    if not migrations_dir.exists():
        raise RuntimeError(f"Migrations directory not found: {migrations_dir}")

    migrations = sorted(migrations_dir.glob("*.sql"))
    if not migrations:
        raise RuntimeError(f"No migration files found in {migrations_dir}")

    return migrations


def get_psycopg_url(container: PostgresContainer) -> str:
    """Get a psycopg-compatible connection URL from a testcontainer."""
    # testcontainers returns a SQLAlchemy-style URL, we need to convert it
    host = container.get_container_host_ip()
    port = container.get_exposed_port(5432)
    return f"postgresql://{container.username}:{container.password}@{host}:{port}/{container.dbname}"


def apply_schema(conn: psycopg.Connection, schema_path: Path) -> None:
    """Apply the schema.sql file to a database."""
    sql = schema_path.read_text()
    conn.execute(sql)
    conn.commit()


def apply_migrations(conn: psycopg.Connection, migrations: list[Path]) -> None:
    """Apply all migrations to a database in order."""
    for migration in migrations:
        sql = migration.read_text()
        conn.execute(sql)
        conn.commit()


def dump_schema(container: PostgresContainer) -> str:
    """Dump the durable schema from a database using pg_dump."""
    result = subprocess.run(
        [
            "docker",
            "exec",
            container.get_wrapped_container().id,
            "pg_dump",
            "-U",
            container.username,
            "-d",
            container.dbname,
            "--schema-only",
            "--schema=durable",
            "--no-owner",
            "--no-privileges",
            "--no-comments",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def normalize_dump(dump: str) -> str:
    r"""Normalize a pg_dump output for comparison.

    Removes:
    - SET statements and other session configuration
    - Comments
    - Empty lines
    - The _sqlx_migrations table and related objects
    - pg_dump session markers (\\restrict, \\unrestrict)
    """
    lines = dump.split("\n")
    normalized = []
    skip_until_semicolon = False

    for line in lines:
        # Skip SET statements
        if line.startswith("SET "):
            continue

        # Skip SELECT statements (like pg_catalog.set_config)
        if line.startswith("SELECT "):
            continue

        # Skip comments
        if line.startswith("--"):
            continue

        # Skip empty lines
        if not line.strip():
            continue

        # Skip pg_dump session markers
        if line.startswith("\\restrict") or line.startswith("\\unrestrict"):
            continue

        # Skip _sqlx_migrations table and related objects
        if "_sqlx_migrations" in line:
            skip_until_semicolon = True
            continue

        if skip_until_semicolon:
            if ";" in line:
                skip_until_semicolon = False
            continue

        normalized.append(line)

    return "\n".join(normalized)


def main() -> int:
    project_root = get_project_root()
    schema_path = project_root / "sql" / "schema.sql"
    migrations = get_migrations(project_root)

    print(f"Project root: {project_root}")
    print(f"Schema file: {schema_path}")
    print(f"Found {len(migrations)} migrations:")
    for m in migrations:
        print(f"  - {m.name}")
    print()

    if not schema_path.exists():
        print(f"ERROR: Schema file not found: {schema_path}", file=sys.stderr)
        return 1

    # Use PostgreSQL 14 to match production
    postgres_image = "postgres:14-alpine"

    print("Starting PostgreSQL containers...")

    with (
        PostgresContainer(postgres_image) as schema_container,
        PostgresContainer(postgres_image) as migrations_container,
    ):
        print("Containers started.")
        print()

        # Apply schema.sql to container A
        print("Applying schema.sql to container A...")
        with psycopg.connect(get_psycopg_url(schema_container)) as conn:
            apply_schema(conn, schema_path)
        print("Schema applied.")

        # Apply migrations to container B
        print("Applying migrations to container B...")
        with psycopg.connect(get_psycopg_url(migrations_container)) as conn:
            apply_migrations(conn, migrations)
        print("Migrations applied.")
        print()

        # Dump both schemas
        print("Dumping schemas...")
        schema_dump = dump_schema(schema_container)
        migrations_dump = dump_schema(migrations_container)

        # Normalize for comparison
        schema_normalized = normalize_dump(schema_dump)
        migrations_normalized = normalize_dump(migrations_dump)

        # Compare
        if schema_normalized == migrations_normalized:
            print("SUCCESS: schema.sql matches migrations")
            return 0

        print("FAILURE: schema.sql does not match migrations")
        print()
        print("Diff (schema.sql vs migrations):")
        print("-" * 60)

        diff = difflib.unified_diff(
            schema_normalized.split("\n"),
            migrations_normalized.split("\n"),
            fromfile="schema.sql",
            tofile="migrations",
            lineterm="",
        )
        for line in diff:
            print(line)

        return 1


if __name__ == "__main__":
    sys.exit(main())
