docs: durable-execution feasibility study#259

Draft
NikolayS wants to merge 10 commits into main from claude/hn-discussion-study-JLOve

NikolayS (Owner) commented May 30, 2026

What

Refreshes the durable-execution feasibility draft and workflow spec after reviewing Microsoft's pg_durable and checking PR #259 for internal mismatches.

What changed

  • Aligns the PR thesis around PgQue-native event-sourced durable execution over the rotating log.
  • Adds pg_durable as fresh prior art and records the product boundary: workflow durability in Postgres, workflow code in app repositories.
  • Fixes the stale workflow architecture placeholder and updates the generated HTML / JSON artifacts.
  • Replaces broad "exactly-once workflows" language with the honest contract: at-least-once step execution, exactly-once transactional handoff, idempotent external effects.
  • Downgrades throughput from an asserted target to a benchmark hypothesis.
  • Adds blueprints/workflows/HOT_PATH_BENCHMARK.md, the first gate for the batching question: compare mutable workflow_status updates vs PgQue continuation events, plus dedup and wf_live variants.
  • Sharpens the workflow_id capability model: raw ids exist in protected hot queue rows / ev_extra1; lower-trust audit, DLQ, metrics, and error/export surfaces must hash or truncate.
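
To illustrate the last bullet, here is a minimal sketch of how a lower-trust surface (audit, DLQ, metrics, error/export) might hash and truncate a raw workflow_id before emitting it. The function name `redact_workflow_id`, the keyed HMAC, and the 12-character length are assumptions made for illustration; the spec may settle on a different scheme.

```python
import hashlib
import hmac


def redact_workflow_id(workflow_id: str, key: bytes, length: int = 12) -> str:
    """Return a keyed, truncated digest of a raw workflow_id.

    Hypothetical helper: the raw id stays in the protected queue rows,
    while lower-trust outputs only ever see this shortened digest.
    """
    digest = hmac.new(key, workflow_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


if __name__ == "__main__":
    # The key below is a placeholder; a real deployment would load a secret.
    print(redact_workflow_id("wf-123", key=b"example-key"))
```

A keyed digest is used here rather than a bare hash so that someone who sees the redacted value cannot confirm a guessed id without the key. Truncation trades collision resistance for shorter labels.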

Why

The old PR body and generated spec bundle contradicted the revised feasibility doc, and the draft predated pg_durable, which is now the most relevant fresh prior art for Postgres-native durable execution.

The new hot-path benchmark doc exists because the whole workflow idea lives or dies on one narrow claim: N workflow step-events in one PgQue batch should append N successors and advance the subscription once, without recreating per-workflow update churn.
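
The claim can be pictured with a toy in-memory model. This is a sketch, not PgQue code: `ToyQueue`, `take_batch`, and `commit_batch` are hypothetical names that only loosely mirror the batch-oriented consumption described here. It counts writes to show the shape of the argument: a batch of N step-events produces N appends and a single cursor update, whereas a mutable-status design would issue N row UPDATEs, each leaving a dead row version behind in Postgres until vacuum.

```python
from dataclasses import dataclass, field


@dataclass
class ToyQueue:
    """Append-only event log with one subscription cursor (illustrative only)."""
    log: list = field(default_factory=list)  # events are (workflow_id, step) tuples
    cursor: int = 0                          # subscription position in the log
    inserts: int = 0                         # appended events
    cursor_updates: int = 0                  # subscription advances

    def take_batch(self) -> list:
        # Everything past the cursor forms the next batch.
        return self.log[self.cursor:]

    def commit_batch(self, batch: list, successors: list) -> None:
        # Append all successors, then advance the cursor once for the batch.
        self.log.extend(successors)
        self.inserts += len(successors)
        self.cursor += len(batch)
        self.cursor_updates += 1


def run_step(event: tuple) -> tuple:
    """Pretend to execute a step and return its continuation event."""
    workflow_id, step = event
    return (workflow_id, step + 1)


def process_one_batch(queue: ToyQueue) -> int:
    batch = queue.take_batch()
    successors = [run_step(ev) for ev in batch]
    queue.commit_batch(batch, successors)
    return len(batch)


if __name__ == "__main__":
    q = ToyQueue(log=[(i, 0) for i in range(100)])
    process_one_batch(q)
    print(q.inserts, q.cursor_updates)
```

The model says nothing about throughput; it only shows why per-batch cursor updates bound the update churn. Whether that translates into a performance win is exactly what the benchmark is meant to test.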

Verification

Documentation only.

  • git diff --check
  • rg scan for stale active claims / architecture placeholder

Current status

Keep as draft. The feasibility doc is useful now; the next real gate is the hot-path benchmark/prototype, not merging a claim.

claude added 3 commits May 30, 2026 08:42
Evaluate whether PgQue should extend into a durable-workflow engine
(DBOS/absurd-style) on Postgres, and the adoption odds if so.

Synthesizes deep research on DBOS, absurd, Temporal, Restate, Rivet,
and Gadget Silo, grounded against SPECx 2.3 positioning and the PgQ
engine constraints. Key finding: the durable layer needs SKIP-LOCKED
claim/lease semantics, a second concurrency model beside PgQ rotation,
so the zero-bloat differentiator does not transfer. Recommends a thin
transactional-durable-enqueue + experimental checkpointed-steps path
rather than a head-on Temporal/DBOS competitor.

Earlier draft concluded the zero-bloat differentiator does not transfer
to a workflow layer, assuming a mutable workflow_status row updated per
step (the DBOS/absurd strategy). That was wrong.

Model workflow state transitions as appended events over the rotating
log (continuation-passing): each step enqueues its successor instead of
mutating a row. Transitions become appends, not UPDATEs, so zero-bloat
carries through. Exactly-once handoff falls out of insert_event +
finish_batch in one transaction; sleep/timers use the rotating send_at
from PR #237; exclusivity is structural via cooperative consumers; the
only mutable state is a current-state projection bounded by concurrency.

Verdict flips from 'do not compete' to 'compete on a substrate
SKIP-LOCKED systems cannot match for high-throughput durable workflows'.
Remaining real risk: awaitEvent/join semantics.

Event-sourced durable-execution layer authored with samospec (all-Claude
panel). Ships SPEC.md, self-contained HTML brief (BRIEF.html/index.html),
and auxiliary artifacts under blueprints/workflows/. .nojekyll added for
GitHub Pages.
@NikolayS NikolayS temporarily deployed to github-pages May 30, 2026 09:48 — with GitHub Pages Inactive
@NikolayS NikolayS temporarily deployed to github-pages May 30, 2026 09:50 — with GitHub Pages Inactive
@NikolayS NikolayS temporarily deployed to github-pages May 30, 2026 10:08 — with GitHub Pages Inactive
@NikolayS NikolayS temporarily deployed to github-pages May 30, 2026 12:19 — with GitHub Pages Inactive
@NikolayS NikolayS temporarily deployed to github-pages May 30, 2026 12:30 — with GitHub Pages Inactive
Map the durable-workflow design to pgque's real primitives and verify
the keystone against sql/pgque.sql: insert_event + finish_batch compose
atomically in the caller transaction (exactly-once handoff), finish_batch
is one subscription UPDATE per batch (amortization), ev_extra1..4 are
settable+indexable (workflow_id lookup). Flags the retry_queue DELETE-bloat
constraint (route sleeps through rotating send_at, PR #237), gives the new
coordination DDL, concrete awaitEvent/emit + join SQL, a bloat audit, and
the pgque gaps to close (promote send_at, ev_extra1 index, durable.sql).
@NikolayS NikolayS temporarily deployed to github-pages June 2, 2026 12:22 — with GitHub Pages Inactive
NikolayS (Owner, Author) left a comment

REV for PR #259 (docs: durable-execution feasibility study)

Verdict: do not merge this as a public blueprint yet. It is docs-only, but a few claims are still stronger than the evidence and one generated artifact leaked into the markdown.

Findings:

  1. Blocking — benchmark result now contradicts the acceptance/kill criteria and headline.

    • blueprints/workflows/HOT_PATH_BENCHMARK.md:127 defines viability as workload B having materially better throughput than A, and blueprints/workflows/HOT_PATH_BENCHMARK.md:144 says to stop if B does not clearly beat A on dead-tuple growth and sustained throughput.
    • blueprints/DURABLE_EXECUTION_FEASIBILITY.md:300 still says “zero-bloat at high step-throughput” and blueprints/DURABLE_EXECUTION_FEASIBILITY.md:304 says a million agent iterations leave zero dead tuples and “append+rotate structurally beats update+vacuum.”
    • But the 2026-06-06 1M-transition hot-path run against this PR branch showed: A mutable baseline 186,770 tps / ~815k dead tuples; B PgQue continuation 52,310 tps / 0 event-table dead tuples / ~2k subscription dead tuples. So the tuple-churn claim survives, but the throughput-win claim does not: B ran at roughly 28% of A's throughput. Before merge, update the docs to say exactly that, or keep the PR in draft until a revised benchmark proves a throughput claim. As written, it invites us to publish the very claim the benchmark just refuted.
  2. Medium — stale feasibility framing conflicts with SPEC v0.6’s “hypothesis, not promise” posture.

    • blueprints/DURABLE_EXECUTION_FEASIBILITY.md:326 still has the older “up to a few thousand workflow transitions/sec per database; concede hyperscale to Temporal” framing, while blueprints/workflows/SPEC.md:517 says throughput is a benchmark hypothesis and blueprints/workflows/SPEC.md:526 says v0.6 downgraded asserted throughput claims.
    • Pick one posture. My vote: remove the numeric ceiling and the “concede to Temporal” sentence entirely until benchmark data is in the repo, then frame measured numbers with workload/hardware caveats.
  3. Low — generated wrapper leaked into a markdown file.

    • blueprints/workflows/IMPLEMENTATION_RESEARCH.md:292 contains a literal </content> line. That should not ship.
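
For finding 1, the arithmetic behind the verdict can be checked with a small sketch. The figures are the ones quoted above (dead-tuple counts are approximate); the `Run` type and `gate` function are hypothetical, and the strict "B greater than A" comparison is an assumed stand-in for the benchmark doc's "materially better" wording, which this review does not quantify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    name: str
    tps: int
    dead_tuples: int  # approximate totals as reported in the review


# Figures from the 2026-06-06 1M-transition hot-path run quoted in finding 1.
A = Run("A mutable baseline", tps=186_770, dead_tuples=815_000)
B = Run("B PgQue continuation", tps=52_310, dead_tuples=2_000)


def gate(a: Run, b: Run) -> dict:
    """Compare workload B against baseline A on both acceptance dimensions."""
    return {
        "throughput_ratio_b_over_a": round(b.tps / a.tps, 2),
        "dead_tuple_ratio_b_over_a": round(b.dead_tuples / a.dead_tuples, 4),
        "b_wins_throughput": b.tps > a.tps,            # assumed weakest reading
        "b_wins_dead_tuples": b.dead_tuples < a.dead_tuples,
    }


if __name__ == "__main__":
    for key, value in gate(A, B).items():
        print(f"{key}: {value}")
```

Under these numbers B wins on dead tuples and loses on throughput, which is the split the docs should state.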

Notes:

  • git diff --check origin/main...HEAD is clean.
  • BRIEF.html and index.html are byte-identical.
  • The core spec is much better than the earlier draft: it now scopes exactly-once to handoff, calls throughput a hypothesis, handles pg_durable, and has the right caution around await/join-heavy dead tuples. The stale/overconfident bits are mostly in the feasibility and benchmark wrapper docs.
  • GitHub Pages is still not serving this brief (pgque.dev/blueprints/workflows/ returns 404), so nothing is publicly published from the PR branch yet.
