docs: durable-execution feasibility study by NikolayS · Pull Request #259 · NikolayS/PgQue

NikolayS · 2026-05-30T08:43:24Z

What

Refreshes the durable-execution feasibility draft and workflow spec after review of Microsoft pg_durable and after checking PR #259 for internal mismatches.

What changed

Aligns the PR thesis around PgQue-native event-sourced durable execution over the rotating log.
Adds pg_durable as fresh prior art and records the product boundary: workflow durability in Postgres, workflow code in app repositories.
Fixes the stale workflow architecture placeholder and updates the generated HTML / JSON artifacts.
Replaces broad "exactly-once workflows" language with the honest contract: at-least-once step execution, exactly-once transactional handoff, idempotent external effects.
Downgrades throughput from an asserted target to a benchmark hypothesis.
Adds blueprints/workflows/HOT_PATH_BENCHMARK.md, the first gate for the batching question: compare mutable workflow_status updates vs PgQue continuation events, plus dedup and wf_live variants.
Sharpens the workflow_id capability model: raw ids exist in protected hot queue rows / ev_extra1; lower-trust audit, DLQ, metrics, and error/export surfaces must hash or truncate.
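The workflow_id rule in the last item can be illustrated with a minimal sketch. It assumes a keyed hash (HMAC-SHA-256) truncated to a short prefix; the PR text only says "hash or truncate", so the keyed construction, the function name `redact_workflow_id`, and the 16-character length are hypothetical choices for illustration, not something the spec prescribes.

```python
import hashlib
import hmac


def redact_workflow_id(workflow_id: str, key: bytes, length: int = 16) -> str:
    """Return a stable, non-reversible token for a raw workflow id.

    Hypothetical helper: the keyed hash and the default length are
    assumptions, not part of the PgQue spec.
    """
    digest = hmac.new(key, workflow_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncate so lower-trust surfaces (audit, DLQ, metrics, exports)
    # never carry the full digest, let alone the raw id.
    return digest[:length]


if __name__ == "__main__":
    key = b"example-only-key"  # assumption: a secret managed outside the queue rows
    print(redact_workflow_id("wf-42", key))
```

The same raw id always maps to the same token under one key, so audit and DLQ rows can still be correlated with each other without exposing the id held in the protected hot queue rows.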

Why

The old PR body and generated spec bundle contradicted the revised feasibility doc, and the draft predated pg_durable, which is now the most relevant fresh prior art for Postgres-native durable execution.

The new hot-path benchmark doc exists because the whole workflow idea lives or dies on one narrow claim: N workflow step-events in one PgQue batch should append N successors and advance the subscription once, without recreating per-workflow update churn.
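As a rough illustration of that claim, the sketch below models the batch shape with SQLite standing in for Postgres. The `events` and `subscription` tables and the `process_batch` function are hypothetical stand-ins, not PgQue's schema or API; the point is only that N successors are appended and the consumer cursor is written once, all inside one transaction.

```python
import sqlite3


def make_db(n: int) -> sqlite3.Connection:
    """Create an in-memory log with n initial step-events (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " workflow_id TEXT NOT NULL, step INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE subscription (consumer TEXT PRIMARY KEY, last_id INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO subscription VALUES ('worker', 0)")
    conn.executemany(
        "INSERT INTO events (workflow_id, step) VALUES (?, 0)",
        [(f"wf-{i}",) for i in range(n)],
    )
    conn.commit()
    return conn


def process_batch(conn, consumer: str, batch_size: int, crash: bool = False) -> int:
    """Append one successor per consumed event, then advance the cursor once."""
    with conn:  # one transaction: commit on success, roll back on exception
        (pos,) = conn.execute(
            "SELECT last_id FROM subscription WHERE consumer = ?", (consumer,)
        ).fetchone()
        rows = conn.execute(
            "SELECT id, workflow_id, step FROM events WHERE id > ? ORDER BY id LIMIT ?",
            (pos, batch_size),
        ).fetchall()
        if not rows:
            return 0
        # N appends: each step enqueues its successor instead of updating a row.
        conn.executemany(
            "INSERT INTO events (workflow_id, step) VALUES (?, ?)",
            [(wf, step + 1) for _, wf, step in rows],
        )
        if crash:
            raise RuntimeError("simulated worker crash before the batch is finished")
        # One cursor write per batch, regardless of N.
        conn.execute(
            "UPDATE subscription SET last_id = ? WHERE consumer = ?",
            (rows[-1][0], consumer),
        )
    return len(rows)
```

If the worker fails before the cursor moves, the appended successors roll back with it, so the batch is redelivered (at-least-once step execution) while the handoff itself is never half-applied. Whether the real engine keeps this shape under load is what the benchmark is meant to test.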

Verification

Documentation only.

git diff --check
rg scan for stale active claims / architecture placeholder

Current status

Keep as draft. The feasibility doc is useful now; the next real gate is the hot-path benchmark/prototype, not merging a claim.

Evaluate whether PgQue should extend into a durable-workflow engine (DBOS/absurd-style) on Postgres, and the adoption odds if so. Synthesizes deep research on DBOS, absurd, Temporal, Restate, Rivet, and Gadget Silo, grounded against SPECx 2.3 positioning and the PgQ engine constraints. Key finding: the durable layer needs SKIP-LOCKED claim/lease semantics, a second concurrency model beside PgQ rotation, so the zero-bloat differentiator does not transfer. Recommends a thin transactional-durable-enqueue + experimental checkpointed-steps path rather than a head-on Temporal/DBOS competitor.

Earlier draft concluded the zero-bloat differentiator does not transfer to a workflow layer, assuming a mutable workflow_status row updated per step (the DBOS/absurd strategy). That was wrong. Model workflow state transitions as appended events over the rotating log (continuation-passing): each step enqueues its successor instead of mutating a row. Transitions become appends, not UPDATEs, so zero-bloat carries through. Exactly-once handoff falls out of insert_event + finish_batch in one transaction; sleep/timers use the rotating send_at from PR #237; exclusivity is structural via cooperative consumers; the only mutable state is a current-state projection bounded by concurrency. Verdict flips from 'do not compete' to 'compete on a substrate SKIP-LOCKED systems cannot match for high-throughput durable workflows'. Remaining real risk: awaitEvent/join semantics.

Event-sourced durable-execution layer authored with samospec (all-Claude panel). Ships SPEC.md, self-contained HTML brief (BRIEF.html/index.html), and auxiliary artifacts under blueprints/workflows/. .nojekyll added for GitHub Pages.

Map the durable-workflow design to pgque's real primitives and verify the keystone against sql/pgque.sql: insert_event + finish_batch compose atomically in the caller transaction (exactly-once handoff), finish_batch is one subscription UPDATE per batch (amortization), ev_extra1..4 are settable+indexable (workflow_id lookup). Flags the retry_queue DELETE-bloat constraint (route sleeps through rotating send_at, PR #237), gives the new coordination DDL, concrete awaitEvent/emit + join SQL, a bloat audit, and the pgque gaps to close (promote send_at, ev_extra1 index, durable.sql).

NikolayS

REV for PR #259 (docs: durable-execution feasibility study)

Verdict: do not merge this as a public blueprint yet. It is docs-only, but a few claims are still stronger than the evidence and one generated artifact leaked into the markdown.

Findings:

Blocking — benchmark result now contradicts the acceptance/kill criteria and headline.
- blueprints/workflows/HOT_PATH_BENCHMARK.md:127 defines viability as workload B having materially better throughput than A, and blueprints/workflows/HOT_PATH_BENCHMARK.md:144 says to stop if B does not clearly beat A on dead-tuple growth and sustained throughput.
- blueprints/DURABLE_EXECUTION_FEASIBILITY.md:300 still says “zero-bloat at high step-throughput” and blueprints/DURABLE_EXECUTION_FEASIBILITY.md:304 says a million agent iterations leave zero dead tuples and “append+rotate structurally beats update+vacuum.”
- But the 2026-06-06 1M-transition hot-path run against this PR branch showed: A mutable baseline 186,770 tps / ~815k dead tuples; B PgQue continuation 52,310 tps / 0 event-table dead tuples / ~2k subscription dead tuples. So the tuple-churn claim survives, but the throughput-win claim does not. Before merge, update the docs to say exactly that, or keep the PR as a draft until a revised benchmark proves a throughput claim. As written, it invites us to publish the part that the benchmark just failed.
Medium — stale feasibility framing conflicts with SPEC v0.6’s “hypothesis, not promise” posture.
- blueprints/DURABLE_EXECUTION_FEASIBILITY.md:326 still has the older “up to a few thousand workflow transitions/sec per database; concede hyperscale to Temporal” framing, while blueprints/workflows/SPEC.md:517 says throughput is a benchmark hypothesis and blueprints/workflows/SPEC.md:526 says v0.6 downgraded asserted throughput claims.
- Pick one posture. My vote: remove the numeric ceiling and the “concede to Temporal” sentence entirely until benchmark data is in the repo, then frame measured numbers with workload/hardware caveats.
Low — generated wrapper leaked into a markdown file.
- blueprints/workflows/IMPLEMENTATION_RESEARCH.md:292 contains a literal </content> line. That should not ship.
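The arithmetic behind the blocking finding can be checked with a short sketch. It uses only the figures quoted above (the dead-tuple counts are the approximate ~815k and ~2k values) and assumes the simplest reading of the kill criteria, a strict "B beats A" comparison on both axes; the actual threshold for "clearly beat" is not stated here, and the names `RUNS` and `compare` are hypothetical.

```python
TRANSITIONS = 1_000_000

# Figures from the 2026-06-06 run quoted in the review; dead tuples are approximate.
RUNS = {
    "A": {"tps": 186_770, "dead_tuples": 815_000},  # mutable baseline
    "B": {"tps": 52_310, "dead_tuples": 2_000},     # PgQue continuation (0 event-table + ~2k subscription)
}


def compare(a: dict, b: dict, transitions: int) -> dict:
    """Summarize A vs B on throughput and dead-tuple growth."""
    return {
        "throughput_ratio_a_over_b": a["tps"] / b["tps"],
        "dead_per_transition_a": a["dead_tuples"] / transitions,
        "dead_per_transition_b": b["dead_tuples"] / transitions,
        "b_wins_dead_tuples": b["dead_tuples"] < a["dead_tuples"],
        "b_wins_throughput": b["tps"] > a["tps"],
    }


result = compare(RUNS["A"], RUNS["B"], TRANSITIONS)
# Assumed gate: B must win on both axes (strict comparison, threshold is an assumption).
passes_gate = result["b_wins_dead_tuples"] and result["b_wins_throughput"]

if __name__ == "__main__":
    print(f"A/B throughput ratio: {result['throughput_ratio_a_over_b']:.2f}")
    print(f"dead tuples per transition: A={result['dead_per_transition_a']:.3f}, "
          f"B={result['dead_per_transition_b']:.3f}")
    print(f"passes gate: {passes_gate}")
```

On these numbers A is roughly 3.57x faster than B, while B leaves about 0.002 dead tuples per transition against about 0.815 for A, so B wins on churn and loses on throughput.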

Notes:

git diff --check origin/main...HEAD is clean.
BRIEF.html and index.html are byte-identical.
The core spec is much better than the earlier draft: it now scopes exactly-once to handoff, calls throughput a hypothesis, handles pg_durable, and has the right caution around await/join-heavy dead tuples. The stale/overconfident bits are mostly in the feasibility and benchmark wrapper docs.
GitHub Pages is still not serving this brief (pgque.dev/blueprints/workflows/ returns 404), so nothing is publicly published from the PR branch yet.

claude added 3 commits May 30, 2026 08:42

NikolayS temporarily deployed to github-pages May 30, 2026 09:48 — with GitHub Pages Inactive

spec(workflows): iterate to v0.2.0 + refresh brief

aafaf7b

NikolayS temporarily deployed to github-pages May 30, 2026 09:50 — with GitHub Pages Inactive

spec(workflows): iterate to v0.3.0 + refresh brief

9e7751b

NikolayS temporarily deployed to github-pages May 30, 2026 10:08 — with GitHub Pages Inactive

spec(workflows): iterate to v0.5.0 + refresh brief

9fd8472

NikolayS temporarily deployed to github-pages May 30, 2026 12:19 — with GitHub Pages Inactive

spec(workflows): iterate to v0.5.0 + refresh brief

53b94e2

NikolayS temporarily deployed to github-pages May 30, 2026 12:30 — with GitHub Pages Inactive

NikolayS temporarily deployed to github-pages June 2, 2026 12:22 — with GitHub Pages Inactive

NikolayS added 2 commits June 6, 2026 00:55

docs: refresh durable workflow study

40d53b0

docs: add workflow hot-path benchmark

adb723b

NikolayS commented Jun 7, 2026

NikolayS wants to merge 10 commits into main from claude/hn-discussion-study-JLOve


2 participants
