Merged
76 changes: 76 additions & 0 deletions benchmarks-website/AGENTS.md
@@ -0,0 +1,76 @@
<!--
SPDX-License-Identifier: Apache-2.0
SPDX-FileCopyrightText: Copyright the Vortex contributors
-->

# AGENTS.md — `benchmarks-website/`

Read [`README.md`](README.md) first for the architecture and the v2/v3
side-by-side situation. Then this file. The root [`CLAUDE.md`](../CLAUDE.md)
covers Rust style, test layout, commit conventions.

## Don't touch the v2 site

Until the cutover PR lands, the top-level v2 files
(`server.js`, `src/`, `index.html`, `vite.config.js`, `package.json`,
`package-lock.json`, `public/`, the top-level `Dockerfile`,
`docker-compose.yml`, `ec2-init.txt`), the `benchmarks-website` service
in `docker-compose.yml`, and the `publish-benchmarks-website.yml` workflow
are production. Don't edit them as part of unrelated work.

## v3 specifics

- **Wire shapes are a coordinated change.** [`server/src/records.rs`](server/src/records.rs),
[`vortex-bench/src/v3.rs`](../vortex-bench/src/v3.rs), and (until cutover)
[`migrate/src/classifier.rs`](migrate/src/classifier.rs) must agree.
Bumping a shape means changing all three plus the snapshot fixtures in
one commit.
- **`measurement_id` is server-internal.** Never put it on the wire. It is
a deterministic hash over `commit_sha` plus the dim tuple, computed in
[`server/src/db.rs`](server/src/db.rs) and reused by the migrator via
the same crate.
- **Don't write a server-side classifier for live ingest.** The emitter
produces v3-shape records directly; the migrator's classifier only
exists to translate v2 records once and goes away after cutover.
- **Don't reach for WASM.** SSR + a thin hydration script in
[`server/static/chart-init.js`](server/static/chart-init.js) is the
whole client.
- **Don't re-introduce a server-side commit cap.** `?n=all` is the default
for HTML routes; visual downsampling happens client-side via LTTB on the
visible commit range only.
- **Don't refetch on every scope change.** The chart fetches its full
history once. Pan, zoom, slider, and the range strip rebuild in place
via the in-memory LTTB pass on the cached payload. The single exception
is the inline-payload zoom-out path: when the user zooms past the first
group's inlined `LANDING_INLINE_N` window for the first time,
`chart-init.js` lazy-fetches `?n=all` once and replaces the payload.
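
The `measurement_id` rule above can be sketched in a few lines. This is a
hypothetical illustration only: the real computation lives in
`server/src/db.rs`, and the hasher choice, field order, and function name
here are all assumptions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Deterministic id over commit_sha plus the dim tuple. Everything about
// this sketch (hasher, ordering, signature) is assumed, not the real code.
fn measurement_id(commit_sha: &str, dims: &[(&str, &str)]) -> u64 {
    let mut h = DefaultHasher::new();
    commit_sha.hash(&mut h);
    for (key, value) in dims {
        key.hash(&mut h);
        value.hash(&mut h);
    }
    h.finish()
}
```

Because the server and the migrator share the same crate-level function,
historical and live rows for the same `(commit, dims)` collapse to one id
without ever shipping it over the wire.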

## Footguns we have already hit

- **Reverse predecessor walk in the tooltip.** `payload.commits[]` is
sorted oldest-first by SQL — `commits[0]` is the oldest, `commits[N-1]`
is the newest. For per-row delta the predecessor of `commits[idx]` is
at `idx - 1`. We caught a regression where a "fix" flipped this to
`idx + 1`; the original walk-backward direction is right.
- **`pointer-events: auto` on the tooltip host.** The tooltip is
positioned at the cursor; making it pointer-interactive causes a
flicker loop. Keep it `pointer-events: none` and offset via
`transform: translate(12px, 12px)`.
- **`change` events on the slider.** Use `input` events with a small
throttle; `change` only fires on release and feels broken.
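
The predecessor walk in the first footgun is easiest to keep straight as
code. A Rust sketch for illustration only — the shipping code is
JavaScript in `chart-init.js`, and the names here are made up:

```rust
// commits are sorted oldest-first, so the per-row delta for index `idx`
// compares against `idx - 1`; the oldest commit has no predecessor.
fn delta_vs_predecessor(values: &[f64], idx: usize) -> Option<f64> {
    let prev = idx.checked_sub(1)?; // walk backward, never forward
    Some(values[idx] - values[prev])
}
```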

## Local dev

```bash
INGEST_BEARER_TOKEN=dev cargo run -p vortex-bench-server
cargo nextest run -p vortex-bench-server -p vortex-bench-migrate
INSTA_UPDATE=auto cargo nextest run -p vortex-bench-server # update snapshots
```

For the migrator end-to-end against the real S3 dump:

```bash
cargo run -p vortex-bench-migrate -- run --output ./bench.duckdb
VORTEX_BENCH_DB=./bench.duckdb INGEST_BEARER_TOKEN=dev \
cargo run -p vortex-bench-server
```
108 changes: 108 additions & 0 deletions benchmarks-website/README.md
@@ -0,0 +1,108 @@
<!--
SPDX-License-Identifier: Apache-2.0
SPDX-FileCopyrightText: Copyright the Vortex contributors
-->

# bench.vortex.dev

The website behind `bench.vortex.dev`. The directory currently houses **two
implementations side by side**, run together until the v3 cutover lands:

- **v2** (top-level files: `server.js`, `src/`, `index.html`, `vite.config.js`,
`package.json`, `Dockerfile`, `docker-compose.yml`, `ec2-init.txt`,
`public/`). The Node + React stack that has shipped to production for the
life of the site. Built and published by
[`.github/workflows/publish-benchmarks-website.yml`](../.github/workflows/publish-benchmarks-website.yml).
- **v3** (`server/` + `migrate/`). A single Rust binary —
[`vortex-bench-server`](server/) — that owns a DuckDB file on local disk,
serves the API, and renders the HTML. Compiles all static assets
(`chart.umd.js`, `chart-init.js`, `style.css`) into the binary so deploys
are one file plus a database. Container image at
`ghcr.io/vortex-data/vortex/vortex-bench-server:latest`.
[`migrate/`](migrate/) is a one-shot tool that loads v2's S3 dataset into a
v3 DuckDB; it is throwaway and goes away after cutover.

Live results are produced by
[`.github/workflows/bench.yml`](../.github/workflows/bench.yml) and
[`.github/workflows/sql-benchmarks.yml`](../.github/workflows/sql-benchmarks.yml),
which run in CI after every push to `develop`. Until cutover the same payload is
emitted to both stacks (v2 via the legacy `--gh-json` path appended to a public
S3 bucket; v3 via `--gh-json-v3` POSTed to `/api/ingest`).

## v3 architecture in one paragraph

`axum` (HTTP) + `maud` (compile-time HTML) + embedded `duckdb-rs` over a single
local DB file. Five fact tables (`query_measurements`, `compression_times`,
`compression_sizes`, `random_access_times`, `vector_search_runs`) plus a
`commits` dim table — see [`server/src/schema.rs`](server/src/schema.rs) for
the column contracts. Three HTML routes (`/`, `/chart/{slug}`,
`/group/{slug}`) and four JSON routes (`GET /api/groups`,
`GET /api/chart/{slug}`, `GET /api/group/{slug}`, `GET /health`), plus a
bearer-gated `POST /api/ingest`. Charts render inline on the landing page via
SSR + lazy hydration; visual downsampling (LTTB at most
`MAX_VISIBLE_POINTS = 500`) is client-side in
[`server/static/chart-init.js`](server/static/chart-init.js).
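
The downsampling pass can be sketched as a textbook LTTB
(Largest-Triangle-Three-Buckets): keep both endpoints, then per bucket
keep the point forming the largest triangle with the previously kept
point and the next bucket's average. This Rust rendition is for
illustration only — the shipping implementation is JavaScript in
`chart-init.js` and may differ in detail:

```rust
// Downsample to `threshold` points; below the threshold, pass through.
fn lttb(data: &[(f64, f64)], threshold: usize) -> Vec<(f64, f64)> {
    let n = data.len();
    if threshold >= n || threshold < 3 {
        return data.to_vec();
    }
    let mut sampled = Vec::with_capacity(threshold);
    let every = (n - 2) as f64 / (threshold - 2) as f64;
    let mut a = 0usize; // index of the last point we kept
    sampled.push(data[0]);
    for i in 0..threshold - 2 {
        // Average of the *next* bucket: the fixed third triangle vertex.
        let avg_start = ((i as f64 + 1.0) * every) as usize + 1;
        let avg_end = (((i as f64 + 2.0) * every) as usize + 1).min(n);
        let len = (avg_end - avg_start) as f64;
        let (sx, sy) = data[avg_start..avg_end]
            .iter()
            .fold((0.0, 0.0), |(x, y), p| (x + p.0, y + p.1));
        let (avg_x, avg_y) = (sx / len, sy / len);

        // Current bucket: keep the point with the largest triangle area
        // (the 0.5 factor is dropped; it doesn't change the argmax).
        let range_start = (i as f64 * every) as usize + 1;
        let range_end = ((i as f64 + 1.0) * every) as usize + 1;
        let (pax, pay) = data[a];
        let (mut max_area, mut max_idx) = (-1.0, range_start);
        for j in range_start..range_end {
            let area = ((pax - avg_x) * (data[j].1 - pay)
                - (pax - data[j].0) * (avg_y - pay))
                .abs();
            if area > max_area {
                max_area = area;
                max_idx = j;
            }
        }
        sampled.push(data[max_idx]);
        a = max_idx;
    }
    sampled.push(data[n - 1]);
    sampled
}
```

Running this over the visible commit range on every pan/zoom is what lets
`?n=all` stay the default without the chart drawing thousands of points.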

For the per-module crate map and the request-flow walkthrough, see the
`//!` doc on [`server/src/lib.rs`](server/src/lib.rs). The producer side of
the ingest contract lives in
[`vortex-bench/src/v3.rs`](../vortex-bench/src/v3.rs); the historical-data
side in [`migrate/src/classifier.rs`](migrate/src/classifier.rs).

## Local dev

```bash
# v3 server (DuckDB lives at ./bench.duckdb by default).
INGEST_BEARER_TOKEN=dev cargo run -p vortex-bench-server
# server logs: "bench server listening addr=127.0.0.1:3000 db=bench.duckdb"

# v3 historical migrator (writes a fully populated DuckDB the server can open).
cargo run -p vortex-bench-migrate -- run --output ./bench.duckdb
```

Ingest fixture data via the snapshot tests' envelopes (see
[`server/tests/common/mod.rs`](server/tests/common/mod.rs)) or by hand-rolling
a JSONL file and POSTing through `scripts/post-ingest.py`.

```bash
cargo nextest run -p vortex-bench-server -p vortex-bench-migrate
INSTA_UPDATE=auto cargo nextest run -p vortex-bench-server # update snapshots
```

For the v2 stack:

```bash
cd benchmarks-website
npm install
npm run dev
```

## Deployment

`docker-compose.yml` runs both stacks side by side: v2 on `:80` and v3 on
`:3001`. `watchtower` polls GHCR every 60s so a fresh image push lands
automatically. v3 reads `INGEST_BEARER_TOKEN` from
`/etc/vortex-bench/secrets.env`, persists DuckDB to
`/opt/benchmarks-website/data/bench.duckdb`, and binds `0.0.0.0:3000` inside
the container so the host's `:3001` port mapping can forward to it.

The v3 server is throwaway-friendly: every request runs against the local
DuckDB file, and a fresh boot reapplies the schema DDL idempotently. The
migrator deletes the target file (and its `.wal`) before populating it, so
re-running `vortex-bench-migrate run --output ...` is safe.

## Cutover plan (in flight)

The work to flip `bench.vortex.dev` from v2 to v3 is tracked outside this
repo. The relevant code-side bits:

- v3 runs alongside v2 on the same EC2 host today (v2 on `:80`, v3 on
`:3001`) and is fed by CI's dual-write `--gh-json-v3` path.
- v2 keeps shipping unchanged until DNS flips. **Do not touch the top-level
v2 files unless you are doing the cleanup PR opened post-flip.**
- The v2 cleanup PR removes everything top-level under `benchmarks-website/`
that belongs to v2 (`server.js`, `src/`, `index.html`, `vite.config.js`,
`package.json`, `package-lock.json`, `public/`, the top-level `Dockerfile`,
`docker-compose.yml`, `ec2-init.txt`, and the
`publish-benchmarks-website.yml` workflow). The v3 tree under `server/` and
`migrate/` is untouched.
20 changes: 17 additions & 3 deletions benchmarks-website/migrate/src/classifier.rs
@@ -103,14 +103,22 @@ const ENGINE_RENAMES: &[(&str, &str)] = &[
("lance", "lance"),
];

/// One entry of `QUERY_SUITES`.
/// One entry of [`QUERY_SUITES`].
#[derive(Debug, Clone, Copy)]
pub struct QuerySuite {
/// Lowercase suite prefix used to match v2 record names (e.g. `tpch`).
pub prefix: &'static str,
/// Human-readable suite name as v2 served it from `/api/metadata`.
pub display_name: &'static str,
/// Uppercase prefix v2's `formatQuery` produced (e.g. `TPC-H`).
pub query_prefix: &'static str,
/// Override for the dataset key v2 records use inside their `dataset`
/// object. Falls back to `prefix` when `None`.
pub dataset_key: Option<&'static str>,
/// True if the suite's group name fans out by `(storage, scale_factor)`
/// (e.g. `TPC-H (NVMe) (SF=1)`); false collapses to a single group.
pub fan_out: bool,
/// True if v2 deliberately ignored this suite (no live group is rendered).
pub skip: bool,
}

@@ -300,8 +308,12 @@ pub fn get_group(record: &V2Record) -> Option<V2Group> {
/// `(group, chartName, seriesName)` triple after rename / skip rules.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct V2Classification {
/// Group the v2 server would place this record in.
pub group: V2Group,
/// Chart name v2 displayed for this record (uppercase, separators
/// normalized).
pub chart: String,
/// Series name after v2's `ENGINE_RENAMES` was applied.
pub series: String,
}

@@ -751,8 +763,10 @@ fn bin_query(cls: &V2Classification, record: &V2Record) -> Option<V3Bin> {
_ => "nvme".to_string(),
};

// ClickBench's "flavor" lives in dataset_variant per benchmark-mapping.md
// - we don't have it from a v2 name string, so we leave it None.
// ClickBench's "flavor" lives in `dataset_variant`, but v2 record names
// never encoded it — leave it `None` so historical and live rows merge
// (the live emitter does the same; see `vortex-bench/src/v3.rs`'s
// `benchmark_dataset_dims` for the matching shape).
Some(V3Bin::Query {
dataset: suite.prefix.to_string(),
dataset_variant: None,
2 changes: 2 additions & 0 deletions benchmarks-website/migrate/src/commits.rs
@@ -91,5 +91,7 @@ fn optional_field(field: &Option<String>) -> Option<String> {
/// Per-call warning bag returned to the caller for logging.
#[derive(Debug, Default)]
pub struct UpsertOutcome {
/// Human-readable warnings — typically one per missing required field on
/// the v2 commit (timestamp, tree_id, url).
pub warnings: Vec<String>,
}
32 changes: 32 additions & 0 deletions benchmarks-website/migrate/src/v2.rs
@@ -19,19 +19,32 @@ use serde::Deserialize;
/// optional because different benches emit different subsets.
#[derive(Debug, Clone, Deserialize)]
pub struct V2Record {
/// Slash-separated benchmark identifier (e.g. `tpch_q01/datafusion:vortex-file-compressed`).
/// The classifier parses this string to recover dim values.
pub name: String,
/// 40-hex commit SHA. Present on every well-formed v2 record.
#[serde(default)]
pub commit_id: Option<String>,
/// v2 unit string (`ns`, `bytes`, `ratio`, ...). Not used for routing —
/// the classifier picks the v3 fact table from the `name` prefix instead.
#[serde(default)]
pub unit: Option<String>,
/// Polymorphic value — emitters wrote both numbers and stringified
/// numbers. Use [`value_as_f64`] to normalize.
#[serde(default)]
pub value: Option<serde_json::Value>,
/// Storage backend the run targeted (`S3` or `NVMe`, mixed case in v2).
#[serde(default)]
pub storage: Option<String>,
/// Polymorphic dataset block — sometimes a string, sometimes an object
/// keyed by suite name with a `scale_factor` inside (use
/// [`dataset_scale_factor`]).
#[serde(default)]
pub dataset: Option<serde_json::Value>,
/// Per-iteration runtimes; same numeric polymorphism as `value`.
#[serde(default)]
pub all_runtimes: Option<Vec<serde_json::Value>>,
/// Host environment triple block.
#[serde(default)]
pub env_triple: Option<V2EnvTriple>,
}
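
The numeric polymorphism called out on `value` and `all_runtimes` is the
crux of this struct. The real `value_as_f64` works over
`serde_json::Value`; this dependency-free sketch only mirrors the
normalization rule, and the enum and names are illustrative rather than
the crate's API:

```rust
// v2 emitters wrote both JSON numbers and stringified numbers; both must
// normalize to f64 before landing in a v3 fact table.
#[derive(Debug, Clone)]
enum RawValue {
    Num(f64),
    Str(String),
}

fn value_as_f64(v: &RawValue) -> Option<f64> {
    match v {
        RawValue::Num(n) => Some(*n),
        RawValue::Str(s) => s.trim().parse::<f64>().ok(),
    }
}
```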
@@ -101,10 +114,13 @@ pub fn runtime_as_i64(value: &serde_json::Value) -> Option<i64> {
/// stored it as an object; we serialize it back out as `arch-os-env`.
#[derive(Debug, Clone, Deserialize)]
pub struct V2EnvTriple {
/// Host CPU architecture (e.g. `x86_64`).
#[serde(default)]
pub architecture: Option<String>,
/// Operating system name (e.g. `linux`).
#[serde(default)]
pub operating_system: Option<String>,
/// Host environment label (e.g. `gnu`).
#[serde(default)]
pub environment: Option<String>,
}
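
The `arch-os-env` serialization mentioned in the struct's doc comment
amounts to joining the three fields. A sketch — the real method on
`V2EnvTriple` sits in the collapsed hunk below, and treating a missing
field as `unknown` is an assumption:

```rust
// Join architecture, operating system, and environment back into the
// `arch-os-env` string form (e.g. `x86_64-linux-gnu`).
fn env_triple_string(
    architecture: Option<&str>,
    operating_system: Option<&str>,
    environment: Option<&str>,
) -> String {
    [architecture, operating_system, environment]
        .iter()
        .map(|f| f.unwrap_or("unknown"))
        .collect::<Vec<_>>()
        .join("-")
}
```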
@@ -122,26 +138,36 @@ impl V2EnvTriple {
/// One JSONL line of `commits.json`.
#[derive(Debug, Clone, Deserialize)]
pub struct V2Commit {
/// 40-hex commit SHA (the v2 schema named this `id`, not `commit_sha`).
pub id: String,
/// RFC 3339 commit timestamp; required for the v3 row but tolerated as
/// missing in the source dump.
#[serde(default)]
pub timestamp: Option<String>,
/// Full commit message.
#[serde(default)]
pub message: Option<String>,
/// Author block.
#[serde(default)]
pub author: Option<V2Person>,
/// Committer block.
#[serde(default)]
pub committer: Option<V2Person>,
/// Git tree SHA.
#[serde(default)]
pub tree_id: Option<String>,
/// GitHub commit URL.
#[serde(default)]
pub url: Option<String>,
}

/// Author or committer block on a v2 commit record.
#[derive(Debug, Clone, Deserialize)]
pub struct V2Person {
/// Display name.
#[serde(default)]
pub name: Option<String>,
/// Email address.
#[serde(default)]
pub email: Option<String>,
}
Expand All @@ -150,12 +176,18 @@ pub struct V2Person {
/// `scripts/capture-file-sizes.py`.
#[derive(Debug, Clone, Deserialize)]
pub struct V2FileSize {
/// 40-hex commit SHA.
pub commit_id: String,
/// Compression dataset name (`benchmark` is the v2 field name).
pub benchmark: String,
/// TPC SF as a string when relevant.
#[serde(default)]
pub scale_factor: Option<String>,
/// Format the file was produced in.
pub format: String,
/// Path of the underlying file (e.g. `lineitem.parquet`); informational.
pub file: String,
/// Size in bytes; summed across files in the same `(commit, dataset, format)`.
pub size_bytes: i64,
}
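
The summing rule on `size_bytes` can be pinned down with a small
aggregation sketch. Rows are `(commit_id, benchmark, format, size_bytes)`
tuples here; the key shape and function name are assumptions for
illustration, not the migrator's actual code:

```rust
use std::collections::HashMap;

// Sum size_bytes per (commit, dataset, format), as the field's doc
// comment above describes.
fn sum_sizes(
    rows: &[(&str, &str, &str, i64)],
) -> HashMap<(String, String, String), i64> {
    let mut totals = HashMap::new();
    for (commit, benchmark, format, size) in rows {
        *totals
            .entry((commit.to_string(), benchmark.to_string(), format.to_string()))
            .or_insert(0) += *size;
    }
    totals
}
```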

8 changes: 8 additions & 0 deletions benchmarks-website/migrate/src/verify.rs
@@ -24,18 +24,26 @@ use crate::classifier::QUERY_SUITES;
/// Result of one `verify` run.
#[derive(Debug, Default)]
pub struct VerifyReport {
/// Group display names present in both v2 and v3.
pub matched_groups: Vec<String>,
/// Group display names that exist in v3 but not v2.
pub only_in_v3: Vec<String>,
/// Group display names that exist in v2 but not v3 — these gate the CLI's
/// non-zero exit.
pub only_in_v2: Vec<String>,
/// Per-group chart-count diffs for groups present on both sides.
pub chart_diffs: Vec<ChartDiff>,
}

/// One group's chart-count divergence between v2 and v3, captured when the
/// group is structurally present on both sides but the counts differ.
#[derive(Debug, Clone)]
pub struct ChartDiff {
/// Group display name.
pub group: String,
/// Number of charts v2 reported for this group.
pub v2_count: usize,
/// Number of charts the migrated v3 DuckDB has for this group.
pub v3_count: usize,
}
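
The three group buckets on `VerifyReport` are plain set differences. A
sketch of how `verify` might populate them — the actual queries and
ordering live in the elided body, and the signature here is illustrative:

```rust
use std::collections::BTreeSet;

// Partition group names into matched / only-in-v3 / only-in-v2. Per the
// doc comments above, only the only-in-v2 bucket gates a non-zero exit.
fn diff_groups(
    v2: &BTreeSet<String>,
    v3: &BTreeSet<String>,
) -> (Vec<String>, Vec<String>, Vec<String>) {
    let matched = v2.intersection(v3).cloned().collect();
    let only_in_v3 = v3.difference(v2).cloned().collect();
    let only_in_v2 = v2.difference(v3).cloned().collect();
    (matched, only_in_v3, only_in_v2)
}
```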
