test(core): de-flake sleep/stepWinsRace e2e wall-clock bounds #2551
pranaygp wants to merge 1 commit into
The `sleepWinsRaceWorkflow` e2e races a 1s sleep against a 10s step and asserts the sleep wins. It carried a brittle `durationMs < 5000` guard that intermittently failed in CI at ~5110ms: the fast branch is ~1s, but on preview environments it accrues queue round-trips, cold starts, and replay overhead that can push wall-clock just past the hard 5s bound.

The real invariant (which branch won) is already asserted via `winner === 'sleep'`; the duration check only exists to prove we did not block on the 10s losing branch.

Widen both this test and its sibling `stepWinsRaceWorkflow` to `< 8000ms`: still comfortably below the 10s loser (so it still proves the slow branch didn't win) while leaving generous headroom for preview jitter. Justify the number in a comment, mirroring the generous-bound pattern already used by `parallelSleepWorkflow`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

🧪 E2E Test Results
✅ All tests passed

Details by category:
- ✅ ▲ Vercel Production
- ✅ 💻 Local Development
- ✅ 📦 Local Production
- ✅ 🐘 Local Postgres
- ✅ 🪟 Windows
- ✅ 📋 Other

📊 Benchmark Results
❌ Some benchmark jobs failed. Check the workflow run for details.

closing this - we're solving the actual timing issues that cause this jitter rather than de-flaking the tests

Problem

The `sleepWinsRaceWorkflow` e2e test (`packages/core/e2e/e2e.test.ts`) intermittently fails in CI at ~5110ms, just over its hard `expect(returnValue.durationMs).toBeLessThan(5_000)` wall-clock bound.

Root cause

The workflow runs a `Promise.race` between a 1s `sleep` and a 10s step (`delayMsStep(10_000, 'step')`), and the sleep is expected to win.

The meaningful invariant is the race outcome, `winner === 'sleep'`, which the test already asserts. The `durationMs < 5000` check is only a secondary guard that we didn't block on the losing 10s branch. But it's an absolute wall-clock bound on a fast branch whose ~1s logical duration accrues preview-environment overhead (cold starts, VQS queue round-trips, replay), which intermittently pushes total elapsed time just past 5s. That makes 5000ms a brittle threshold, not a meaningful one.

Fix

- Keep the race-outcome checks (`winner === 'sleep'` / `winner === 'step'`) as the primary assertion.
- Widen the `durationMs` guard from `< 5_000` to `< 8_000` for both `sleepWinsRaceWorkflow` and its sibling `stepWinsRaceWorkflow` (same structure, same fragility).
- Justify the bound in a comment, mirroring the generous-bound pattern already used by `parallelSleepWorkflow` above (`< 25_000` for a ~10s-sequential worst case).

This reduces flakiness without weakening coverage: the test still fails loudly if the slow branch ever wins.

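The race and the two-level check described above can be sketched roughly as follows. This is an illustration only, not the code in `packages/core/e2e/e2e.test.ts`: `delay`, `raceSleepAgainstStep`, and `checkRaceResult` are hypothetical names, and plain timers stand in for the workflow's real `sleep` and `delayMsStep`, whose signatures are not shown in this PR.

```typescript
// Hypothetical sketch: plain timers stand in for the durable sleep and step.
type Winner = 'sleep' | 'step';

interface RaceResult {
  winner: Winner;
  durationMs: number;
}

// Resolves with `value` after `ms` milliseconds.
function delay<T>(ms: number, value: T): Promise<T> {
  return new Promise((resolve) => setTimeout(() => resolve(value), ms));
}

// Races a "sleep" against a "step" and records elapsed wall-clock time.
async function raceSleepAgainstStep(
  sleepMs: number,
  stepMs: number,
): Promise<RaceResult> {
  const start = Date.now();
  const winner = await Promise.race([
    delay<Winner>(sleepMs, 'sleep'),
    delay<Winner>(stepMs, 'step'),
  ]);
  return { winner, durationMs: Date.now() - start };
}

// Returns the list of violated expectations; an empty list means the run passes.
function checkRaceResult(
  result: RaceResult,
  expectedWinner: Winner,
  boundMs: number,
  loserMs: number,
): string[] {
  const failures: string[] = [];
  // A bound at or above the loser's duration would prove nothing.
  if (boundMs >= loserMs) {
    failures.push(`bound ${boundMs}ms is not below the ${loserMs}ms loser`);
  }
  // Primary invariant: which branch won.
  if (result.winner !== expectedWinner) {
    failures.push(`expected ${expectedWinner} to win, got ${result.winner}`);
  }
  // Secondary guard: we did not block on the losing branch.
  if (result.durationMs >= boundMs) {
    failures.push(`duration ${result.durationMs}ms is not below ${boundMs}ms`);
  }
  return failures;
}
```

With these definitions, the ~5110ms run reported in CI (`winner: 'sleep'`) fails a 5000ms bound but passes an 8000ms one, while a run in which the 10s step won would be rejected under either bound. Calling `raceSleepAgainstStep(1_000, 10_000)` locally only exercises the timer race; it does not reproduce preview-environment overhead.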
Verification

Run in an isolated worktree off `origin/main`:

- `pnpm build` (turbo, 27/27 tasks): green
- `pnpm typecheck` (turbo, 40/40 tasks incl. `packages/core`): green
- `pnpm lint` (biome): no new diagnostics on the changed lines (pre-existing warnings elsewhere in the file are untouched)

Not run locally: the full e2e suite (`test:e2e`) requires a deployed preview/`DEPLOYMENT_URL` environment and cannot execute on a workstation, so the runtime behavior of the widened bound was not exercised here; only static checks were.

🤖 Generated with Claude Code