feat(run-ops): webapp write path — trigger/batch minting, idempotency routing, run lifecycle#4118
Conversation
|
WalkthroughThis PR migrates webapp write and read paths for run triggering, batch processing, idempotency-key resolution, and alert/session serialization from direct Prisma access to a ChangesCohort: RunStore/control-plane seam migration
Sequence Diagram(s)sequenceDiagram
participant Caller
participant TriggerTaskService
participant ResolveMintKind
participant RunStore
participant PrismaNew
participant PrismaLegacy
Caller->>TriggerTaskService: call(triggerRequest)
TriggerTaskService->>ResolveMintKind: resolveRunIdMintKind / resolveInheritedMintKind
ResolveMintKind-->>TriggerTaskService: mintKind (ksuid | cuid)
TriggerTaskService->>TriggerTaskService: mintRunFriendlyId(environment, parentRunFriendlyId)
TriggerTaskService->>RunStore: findRun(parentRunId) [if parentRunId provided]
RunStore->>PrismaNew: query (if NEW residency)
RunStore->>PrismaLegacy: query (if LEGACY residency)
RunStore-->>TriggerTaskService: parentRun | null
TriggerTaskService-->>Caller: created run (friendlyId, depth, parentTaskRunId)
Compact MetadataEstimated code review effort: 5/5 (very high — 60+ files touched, high-risk data-routing logic, extensive but distributed test coverage) Poem: 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
413a945 to
99643f8
Compare
515b897 to
cb97148
Compare
@trigger.dev/build
trigger.dev
@trigger.dev/core
@trigger.dev/python
@trigger.dev/react-hooks
@trigger.dev/redis-worker
@trigger.dev/rsc
@trigger.dev/schema-to-json
@trigger.dev/sdk
commit: |
|
Addressed the outside-diff note on |
26871d5 to
cdc4eb9
Compare
c59d9c5 to
d5d7fa1
Compare
cdc4eb9 to
e0b35d5
Compare
0db90f0 to
d5415e8
Compare
8024e36 to
f9b9b0b
Compare
aa55b6b to
3153bc4
Compare
f9b9b0b to
0937b15
Compare
3153bc4 to
d561590
Compare
0937b15 to
729daf1
Compare
d561590 to
9e7c367
Compare
729daf1 to
bd6fc79
Compare
9e7c367 to
e23432d
Compare
bd6fc79 to
a7e0846
Compare
e23432d to
8dff8b2
Compare
a7e0846 to
4119616
Compare
891d81a to
5140cbc
Compare
d087c25 to
b554794
Compare
b554794 to
071cdc1
Compare
f8f3096 to
4bda37a
Compare
… routing, run lifecycle Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e in TriggerFailedTaskService TriggerFailedTaskService read the parent run via the ambient module-singleton store while the engine wrote the run through its own store, so a ksuid parent's row was not found and parentTaskRunId came back null. Add an optional injected runStore (defaults to the shared singleton, preserving production behaviour) and resolve the parent through it at both call sites, mirroring triggerTask.server.ts. Align the three affected webapp tests to read through the same store the engine wrote to: triggerFailedTask.test.ts passes engine.runStore; performTaskRunAlerts routing passes a passthrough store over the seeded container; triggerTask.test.ts stubs the run-ops db handles and pins split mode off so the idempotency dedup uses the container client. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…id-shape only Migration is deferred, so child/batch residency is a pure id-shape check. Remove the isKnownMigrated (and mint-only isSplitEnabled) deps from the mint sites (triggerTask, triggerFailedTask, batchTriggerV3) and call the now- synchronous resolveInheritedMintKind(parentFriendlyId) with no deps arg. Read paths: drop the isKnownMigrated re-probe-avoidance from the ClickHouse runs hydrate (probe all missing on legacy), the runsRepository readThrough options type, resolveWaitpointThroughReadThrough deps, and the BulkActionV2 batch seam adapter — keeping the genuine cross-seam fallback that reads NEW first for unclassifiable/legacy-candidate ids. Delete the injected-marker test cases; the remaining residency tests assert pure id-shape inheritance. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s and test names Review hygiene only: remove the NEW-1 label, Test X: name prefixes, and [TEST-NEWSEED] comment label. No product logic or test behavior changed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…o safe run-ops clients The read-through concern defaulted both newClient and legacyReplica to $replica (control-plane), so a bare caller that omits `deps` — the waitpoints wait route — never queried the dedicated run-ops replica. A co-located, NEW-resident waitpoint minted by streams.input().wait() lives on the run-ops-new DB, so the read missed, returned null, and the route 404'd (re-serialized to 500). Match the deps the complete/callback routes pass: default newClient to runOpsNewReplica, legacyReplica to $replica, and splitEnabled to runOpsSplitReadEnabled — mirroring readThroughRun's own self-defaulting. This immunizes any bare caller (present or future) against the control-plane pin, without touching the wait route. The wait/complete/callback call sites live on a higher branch and are unchanged; complete/callback keep their explicit deps (now redundant but harmless). Adds a heteroRunOps regression case driving the concern with no `deps` via the `defaults` DI seam: proves the old $replica default misses a NEW-resident waitpoint (null) while the safe run-ops default finds it. No mocks; the fallback is exercised against real PG14/PG17 containers. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rvice prisma to the resolved store - Block the idempotent parent run's waitpoint via the residency-resolved dedup client instead of the fallback prisma, so the write lands on the store that owns the parent run. - Pass the caller-provided _prisma into WithRunEngine so a custom store isn't silently overridden by the module singleton. - Throw when a run-backed alert's environment can't be resolved instead of marking it SENT, so a transient replica miss doesn't permanently suppress the alert. - Pin splitEnabled:false in the waitpoint passthrough test so it exercises single-DB behaviour rather than relying on ksuid residency. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The write-path split added static `runOpsLegacyPrisma`/`runOpsNewPrisma` imports to idempotencyKeys.server.ts, which this test loads. vitest validates every named import against the `~/db.server` mock, so the mock now errored on the missing run-ops singletons. Add the four run-ops exports (empty stubs, same boundary pattern as the batchTriggerV3 residency test) and pin isSplitEnabled() to false so the dedup routing deterministically returns the injected fake prisma regardless of the ambient RUN_OPS_SPLIT_ENABLED. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…setup Worker/engine/marqs/pubsub/socket singletons each construct an ioredis client at import time (singleton() + no lazyConnect), so any test importing the service graph opened real Redis connections on import. In CI there is no Redis, so these accumulate infinite-retry clients across a shard and take the suite down (locally they pass only because dev Redis is up). Globally mock the eager-Redis modules to no-op stubs in test/setup.ts: commonWorker, batchTriggerWorker, legacyRunEngineWorker, alertsWorker, the RunEngine and MarQS singletons, devPubSub and the socket.io server. Only these singletons are mocked — never the run store (~/v3/runStore.server, ~/db.server), which store-routing/residency tests need real against testcontainer Postgres. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…yConnect + stub runtime Redis singletons The setup-file mocks of the six eager worker/engine singletons were not enough: CI shards still flooded ECONNREFUSED/maxRetries. Two further classes of env-Redis usage survived them, reproduced locally by running the failing shards with REDIS_PORT pointed at a dead port: 1. Import-time construction: ~15 more singletons (platform cache, billing-limit reconcile queue, alerts rate limiter, DevPresence, auto-increment counter, s2 token cache, v1 streams cache, ...) build ioredis clients at module import, and ioredis dials on construction. A global ioredis mock now forces lazyConnect: true so clients only dial on first command — testcontainer-backed tests are unaffected (their first command connects as before). 2. Runtime commands inside code under test: tracePubSub.publish() (eventRepository writes), alertsRateLimiter.check() (deliverAlert) and the task metadata cache each issue commands against env-configured Redis mid-test; every command burns ~20 reconnect cycles before its error surfaces, which times the tests out. These three modules are now stubbed (metadata cache pinned to its Noop implementation, which is what CI's unset env resolves to anyway). Verified: webapp shards 2/5/6/8 (the ones failing on the pr06+ stack) run green with Redis pointed at a dead port, and shards 2/8 stay green against live Redis (store-routing suites still exercise the real run store). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…in CI CI runners have no .env, no REDIS_HOST/REDIS_PORT, and no Postgres at localhost:5432, which surfaced two failure layers that local runs mask (the dev stack answers on both): - suites transitively importing triggerTaskV1.server failed to collect because autoIncrementCounter.server.ts throws at import when REDIS_HOST/REDIS_PORT are unset (shards 2/5/6). Default the pair in test/setup.ts — the global ioredis lazyConnect mock means nothing dials. - TriggerFailedTaskService.call() resolved its event repository via getEventRepository → global prisma (feature-flag read + Prisma event repo), so in CI the swallowed connect error returned null friendlyIds (shard 8). Allow injecting the repository/store pair and bind the test to an EventRepository over the testcontainer DB. - once the cancelDevSessionRuns suite could collect, findLatestSession's hardwired global $replica was the next masked layer; give it an injectable client (defaulting to $replica) and pass the service's _replica through. Verified by replaying the exact CI env locally (.env hidden, workflow env vars, dead localhost DB, GITHUB_ACTIONS set): all four failing suites and full shards 2/5/6/8 reproduce the CI failures before and pass after. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…method access Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…am regressions Two regression tests for the write-path read seams: - runsRepository: paginating the full keyset over interleaved cuid/ksuid runs enumerates every id once, no empty page, in ClickHouse (created_at DESC, run_id DESC) order -- fails if hydration reverts to lexical id desc across the id-space seam. - runReader: a NEW-resident (ksuid) run's terminal metadata hydrates through the owning store, never a generic legacy replica. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
4bda37a to
ea22f52
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 24f52900-ef0f-43d1-973c-1bc57696ef16
📒 Files selected for processing (2)
.server-changes/run-ops-split-webapp-write-path.mdapps/webapp/app/models/runtimeEnvironment.server.ts
📜 Review details
⏰ Context from checks skipped due to timeout. (11)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (4, 10)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (7, 10)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (10, 10)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (9, 10)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (6, 10)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (1, 10)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (5, 10)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (8, 10)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (2, 10)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (3, 10)
- GitHub Check: typecheck / typecheck
⚠️ CI failures not shown inline (4)
GitHub Actions: 📝 CLAUDE.md Audit / audit: feat(run-ops): webapp write path — trigger/batch minting, idempotency routing, run lifecycle
Conclusion: failure
##[group]Run anthropics/claude-code-action@428971d2ecd6e3a7cb0ee0da2a3a8b33fdb3678d
with:
anthropic_***REDACTED***
use_sticky_comment: true
allowed_bots: devin-ai-integration[bot]
claude_args: --max-turns 25
--model claude-opus-4-8
--allowedTools "Read,Glob,Grep,Bash(git diff:*)"
prompt: You are reviewing a PR to check whether any CLAUDE.md files or .claude/rules/ files need updating.
## Your task
1. Run `git diff origin/main...HEAD --name-only` to see which files changed in this PR.
2. For each changed directory, check if there's a CLAUDE.md in that directory or a parent directory.
3. Determine if any CLAUDE.md or .claude/rules/ file should be updated based on the changes. Consider:
- New files/directories that aren't covered by existing documentation
- Changed architecture or patterns that contradict current CLAUDE.md guidance
- New dependencies, services, or infrastructure that Claude should know about
- Renamed or moved files that are referenced in CLAUDE.md
- Changes to build commands, test patterns, or development workflows
## Response format
If NO updates are needed, respond with exactly:
✅ CLAUDE.md files look current for this PR.
If updates ARE needed, respond with a short list:
📝 **CLAUDE.md updates suggested:**
- `path/to/CLAUDE.md`: [what should be added/changed]
- `.claude/rules/file.md`: [what should be added/changed]
Keep suggestions specific and brief. Only flag things that would actually mislead Claude in future sessions.
Do NOT suggest updates for trivial changes (bug fixes, small refactors within existing patterns).
Do NOT suggest creating new CLAUDE.md files - only updates to existing ones.
trigger_phrase: `@claude`
label_trigger: claude
branch_prefix: claude/
use_bedrock: false
use_vertex: false
use_foundry: false
classify_inline_comments: true
use_commit_signing: false
bot_id: 41898282
bot_name: claude[bot]
track_progress: false
include_fix_links: true
display_report: false...
GitHub Actions: 📝 CLAUDE.md Audit / 0_audit.txt: feat(run-ops): webapp write path — trigger/batch minting, idempotency routing, run lifecycle
Conclusion: failure
alt@3.0.0-beta.46 -> `@trigger.dev/yalt`@3.0.0-beta.46
* [new tag] `@trigger.dev/yalt`@3.0.0-beta.47 -> `@trigger.dev/yalt`@3.0.0-beta.47
* [new tag] `@trigger.dev/yalt`@3.0.0-beta.48 -> `@trigger.dev/yalt`@3.0.0-beta.48
* [new tag] `@trigger.dev/yalt`@3.0.0-beta.49 -> `@trigger.dev/yalt`@3.0.0-beta.49
* [new tag] `@trigger.dev/yalt`@3.0.0-beta.5 -> `@trigger.dev/yalt`@3.0.0-beta.5
* [new tag] `@trigger.dev/yalt`@3.0.0-beta.50 -> `@trigger.dev/yalt`@3.0.0-beta.50
* [new tag] `@trigger.dev/yalt`@3.0.0-beta.51 -> `@trigger.dev/yalt`@3.0.0-beta.51
* [new tag] `@trigger.dev/yalt`@3.0.0-beta.52 -> `@trigger.dev/yalt`@3.0.0-beta.52
* [new tag] `@trigger.dev/yalt`@3.0.0-beta.53 -> `@trigger.dev/yalt`@3.0.0-beta.53
* [new tag] `@trigger.dev/yalt`@3.0.0-beta.55 -> `@trigger.dev/yalt`@3.0.0-beta.55
* [new tag] `@trigger.dev/yalt`@3.0.0-beta.6 -> `@trigger.dev/yalt`@3.0.0-beta.6
* [new tag] `@trigger.dev/yalt`@3.0.0-beta.7 -> `@trigger.dev/yalt`@3.0.0-beta.7
* [new tag] build-alert-hotfix.rc1 -> build-alert-hotfix.rc1
* [new tag] build-alert-hotfix.rc2 -> build-alert-hotfix.rc2
* [new tag] build-arm-builds-rc.1 -> build-arm-builds-rc.1
* [new tag] build-arm-builds-rc.2 -> build-arm-builds-rc.2
* [new tag] build-arm-builds-rc.3 -> build-arm-builds-rc.3
* [new tag] build-batchid-carryover-rc.0 -> build-batchid-carryover-rc.0
* [new tag] build-batching-rc.1 -> build-batching-rc.1
* [new tag] build-batching-rc.2 -> build-batching-rc.2
* [new tag] build-billing-0.0.1 -> build-billing-0.0.1
* [new tag] build-billing-0.0.2 -> build-billing-0.0.2
* [new tag] build-billing-0.0.3 -> build-billing-0.0.3
* [new tag] build-buildinfo-rc.0 -> b...
GitHub Actions: 🔎 REVIEW.md Drift Audit / audit: feat(run-ops): webapp write path — trigger/batch minting, idempotency routing, run lifecycle
Conclusion: failure
##[group]Run anthropics/claude-code-action@428971d2ecd6e3a7cb0ee0da2a3a8b33fdb3678d
with:
anthropic_***REDACTED***
use_sticky_comment: true
allowed_bots: devin-ai-integration[bot]
claude_args: --max-turns 30
--allowedTools "Read,Glob,Grep,Bash(git diff:*)"
prompt: You are auditing this PR for drift against `.claude/REVIEW.md`.
## Context
`.claude/REVIEW.md` is the repo's source of truth for what AI / agent code reviewers should treat as critical findings (rolling-deploy safety, hot-table indexes, recovery-path queries, testcontainers usage, Lua versioning, etc.). It is consumed by review agents to calibrate severity. If REVIEW.md goes stale, every future agent review degrades.
## Strategy — read this first
You have a hard turn budget. Spend it on signal, not coverage. The audit is allowed to miss things; it is NOT allowed to time out.
1. Read `.claude/REVIEW.md` once, in full.
2. Run `git diff origin/main...HEAD --name-only` to get the list of changed files. Do NOT read the diff content yet.
3. Scan the file-list for relevance to REVIEW.md scope. Relevance signals: changes to Prisma schema, Redis / queue / Lua code, hot tables, recovery / restart loops, new packages, deletions of paths REVIEW.md cites. Skim everything else.
4. Open at most **5 files** total — only the ones most likely to surface a real signal. If nothing in the file-list looks relevant to any REVIEW.md rule, do NOT read any files; go straight to the verdict.
5. Form a verdict and stop. Do not exhaust the turn budget exploring.
Large PRs (>50 files changed) are a strong signal to be MORE selective, not more thorough. Pick 3-5 files at most.
## What to look for
- **Stale references** — does any REVIEW.md rule cite a file, directory, function, table, Prisma model, or package name that has been removed or renamed in this PR (or is already gone from `main`)?
- **Contradictions** — does code in this PR clearly violate a current REVIEW.md rule? (Don't re-review the PR. Only flag if REVIE...
GitHub Actions: 🔎 REVIEW.md Drift Audit / 0_audit.txt: feat(run-ops): webapp write path — trigger/batch minting, idempotency routing, run lifecycle
Conclusion: failure
y-run-engine.fix3 -> build-legacy-run-engine.fix3
* [new tag] build-manual-checkpoints.rc1 -> build-manual-checkpoints.rc1
* [new tag] build-metadata-upgrade-logging.rc1 -> build-metadata-upgrade-logging.rc1
* [new tag] build-metadata-upgrade-logging.rc2 -> build-metadata-upgrade-logging.rc2
* [new tag] build-metadata-upgrade-logging.rc3 -> build-metadata-upgrade-logging.rc3
* [new tag] build-new-build-system.rc.1 -> build-new-build-system.rc.1
* [new tag] build-otel-upgrade-rc.0 -> build-otel-upgrade-rc.0
* [new tag] build-otel-upgrade-rc.1 -> build-otel-upgrade-rc.1
* [new tag] build-pre-pull-deployments-rc.1 -> build-pre-pull-deployments-rc.1
* [new tag] build-prod-rescue-rc.1 -> build-prod-rescue-rc.1
* [new tag] build-rate-limiter-fix-rc.1 -> build-rate-limiter-fix-rc.1
* [new tag] build-re2.rc0 -> build-re2.rc0
* [new tag] build-realtime-v2-stream-fix -> build-realtime-v2-stream-fix
* [new tag] build-realtime-v2-stream-fix-2 -> build-realtime-v2-stream-fix-2
* [new tag] build-realtime-v2-stream-fix-3 -> build-realtime-v2-stream-fix-3
* [new tag] build-realtime-v2-stream-fix-4 -> build-realtime-v2-stream-fix-4
* [new tag] build-realtime-v2-stream-fix-5 -> build-realtime-v2-stream-fix-5
* [new tag] build-realtimestreams-dedupe -> build-realtimestreams-dedupe
* [new tag] build-registry-maintenance-rc.1 -> build-registry-maintenance-rc.1
* [new tag] build-registry-maintenance-rc.2 -> build-registry-maintenance-rc.2
* [new tag] build-remote-ecr-rc.0 -> build-remote-ecr-rc.0
* [new tag] build-reschedule-hotfix.rc1 -> build-reschedule-hotfix.rc1
* [new tag] build-resume-fixes.rc1 -> build-resume-fixes.rc1
* [new tag] ...
🧰 Additional context used
📓 Path-based instructions (8)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead
Files:
apps/webapp/app/models/runtimeEnvironment.server.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use zod for validation in packages/core and apps/webapp
Files:
apps/webapp/app/models/runtimeEnvironment.server.ts
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use function declarations instead of default exports
Files:
apps/webapp/app/models/runtimeEnvironment.server.ts
**/*.ts
📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)
**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries
Files:
apps/webapp/app/models/runtimeEnvironment.server.ts
apps/webapp/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)
apps/webapp/**/*.{ts,tsx}: Access environment variables through theenvexport ofenv.server.tsinstead of directly accessingprocess.env
Use subpath exports from@trigger.dev/corepackage instead of importing from the root@trigger.dev/corepathUse named constants for sentinel/placeholder values (e.g.
const UNSET_VALUE = '__unset__') instead of raw string literals scattered across comparisons
Files:
apps/webapp/app/models/runtimeEnvironment.server.ts
apps/webapp/**/*.server.ts
📄 CodeRabbit inference engine (apps/webapp/CLAUDE.md)
apps/webapp/**/*.server.ts: Never userequest.signalfor detecting client disconnects. UsegetRequestAbortSignal()fromapp/services/httpAsyncStorage.server.tsinstead, which is wired directly to Expressres.on('close')and fires reliably
Access environment variables viaenvexport fromapp/env.server.ts. Never useprocess.envdirectly
Always usefindFirstinstead offindUniquein Prisma queries.findUniquehas an implicit DataLoader that batches concurrent calls and has active bugs even in Prisma 6.x (uppercase UUIDs returning null, composite key SQL correctness issues, 5-10x worse performance).findFirstis never batched and avoids this entire class of issues
Files:
apps/webapp/app/models/runtimeEnvironment.server.ts
**/*.{ts,tsx,js,jsx,mts,cts,mjs,cjs}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx,js,jsx,mts,cts,mjs,cjs}: Usepnpm run typecheckfor changes in apps (apps/*) and internal packages (internal-packages/*), and never usebuildto verify those changes.
Use Vitest for tests, and never mock anything; use testcontainers instead.
Prefer static imports over dynamicimport(), and only use dynamic imports for unresolved circular dependencies, genuine code-splitting needs, or conditional runtime loading.
Files:
apps/webapp/app/models/runtimeEnvironment.server.ts
**/*.{ts,tsx,js,jsx,mts,cts,mjs,cjs,md,mdx}
📄 CodeRabbit inference engine (CLAUDE.md)
Always import from
@trigger.dev/sdkwhen writing Trigger.dev tasks; never use@trigger.dev/sdk/v3or deprecatedclient.defineJob.
Files:
apps/webapp/app/models/runtimeEnvironment.server.ts
🧠 Learnings (14)
📚 Learning: 2026-03-22T13:26:12.060Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/code/TextEditor.tsx:81-86
Timestamp: 2026-03-22T13:26:12.060Z
Learning: In the triggerdotdev/trigger.dev codebase, do not flag `navigator.clipboard.writeText(...)` calls for `missing-await`/`unhandled-promise` issues. These clipboard writes are intentionally invoked without `await` and without `catch` handlers across the project; keep that behavior consistent when reviewing TypeScript/TSX files (e.g., usages like in `apps/webapp/app/components/code/TextEditor.tsx`).
Applied to files:
apps/webapp/app/models/runtimeEnvironment.server.ts
📚 Learning: 2026-03-22T19:24:14.403Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts:200-204
Timestamp: 2026-03-22T19:24:14.403Z
Learning: In the triggerdotdev/trigger.dev codebase, webhook URLs are not expected to contain embedded credentials/secrets (e.g., fields like `ProjectAlertWebhookProperties` should only hold credential-free webhook endpoints). During code review, if you see logging or inclusion of raw webhook URLs in error messages, do not automatically treat it as a credential-leak/secrets-in-logs issue by default—first verify the URL does not contain embedded credentials (for example, no username/password in the URL, no obvious secret/token query params or fragments). If the URL is credential-free per this project’s conventions, allow the logging.
Applied to files:
apps/webapp/app/models/runtimeEnvironment.server.ts
📚 Learning: 2026-05-18T08:21:27.694Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3632
File: apps/webapp/sentry.server.ts:4-21
Timestamp: 2026-05-18T08:21:27.694Z
Learning: When handling Prisma error P1001 ("Can't reach database server") in TypeScript, don’t assume a single error shape. Prisma can surface P1001 via two different error classes/fields: `PrismaClientKnownRequestError` exposes it as `err.code === "P1001"` (common during mid-query connection drops), while `PrismaClientInitializationError` exposes it as `err.errorCode === "P1001"` (common on client startup failure). Therefore, predicates should use `err.code === "P1001" || err.errorCode === "P1001"`. Do not flag `err.code === "P1001"` as “unreachable/never matches,” as it is expected in production.
Applied to files:
apps/webapp/app/models/runtimeEnvironment.server.ts
📚 Learning: 2026-05-18T08:21:27.694Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3632
File: apps/webapp/sentry.server.ts:4-21
Timestamp: 2026-05-18T08:21:27.694Z
Learning: When handling Prisma errors for P1001 ("Can't reach database server"), do not assume it only appears under a single property name. Prisma may surface P1001 via either `PrismaClientKnownRequestError` (`err.code === "P1001"`, e.g., mid-query connection drops) or `PrismaClientInitializationError` (`err.errorCode === "P1001"`, e.g., client startup connection failure). To reliably detect the condition, check `err.code === "P1001" || err.errorCode === "P1001"`, and avoid review rules that would incorrectly flag `err.code === "P1001"` as unreachable/never-matching.
Applied to files:
apps/webapp/app/models/runtimeEnvironment.server.ts
📚 Learning: 2026-06-13T19:53:13.759Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3937
File: packages/trigger-sdk/skills/realtime-and-frontend/SKILL.md:258-260
Timestamp: 2026-06-13T19:53:13.759Z
Learning: When reviewing code that uses `trigger.dev/react-hooks`’s `useRealtimeRun`, preserve the call signature where the first argument is the full realtime handle object (not `handle.id`). This is intentional to maintain type-safety and is consistent with the official docs; do not suggest changing the first argument from the handle object to `handle.id`.
Applied to files:
apps/webapp/app/models/runtimeEnvironment.server.ts
📚 Learning: 2026-06-17T17:13:49.929Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3948
File: apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.bulk-actions.$bulkActionParam/route.tsx:48-62
Timestamp: 2026-06-17T17:13:49.929Z
Learning: In triggerdotdev/trigger.dev, within `dashboardLoader`/`dashboardAction` (or similar context resolver code) whenever you resolve an organization ID from an organization slug for RBAC/enterprise authorization scope, always read from the primary Prisma client (`prisma`), not `$replica`. Using `$replica` can hit replica-lag and cause the RBAC lookup/authorization to run without the correct org scope (bypassing intended role enforcement). Implement the slug→org lookup with `prisma.organization.findFirst(...)` (or equivalent primary-client query) and add an inline comment documenting why the primary client is required (replica lag could lead to unscoped RBAC checks).
Applied to files:
apps/webapp/app/models/runtimeEnvironment.server.ts
📚 Learning: 2026-06-23T13:04:21.413Z
Learnt from: carderne
Repo: triggerdotdev/trigger.dev PR: 4023
File: apps/webapp/app/services/upsertBranch.server.ts:14-18
Timestamp: 2026-06-23T13:04:21.413Z
Learning: In TypeScript, it’s valid to `import { type X }` and then use `typeof X` in a type-only position, e.g. `type Alias = z.infer<typeof X>`. The `type` modifier suppresses the runtime import, but the type checker still has the full exported type so `z.infer<typeof X>` can resolve correctly. In code reviews, don’t flag this as a TypeScript compile error as long as `typeof X` is used in a type context (e.g., with `z.infer`, `type` aliases, generics), not as a runtime value.
Applied to files:
apps/webapp/app/models/runtimeEnvironment.server.ts
📚 Learning: 2026-05-05T09:38:02.512Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3523
File: apps/webapp/app/routes/api.v3.batches.ts:178-181
Timestamp: 2026-05-05T09:38:02.512Z
Learning: When reviewing code that catches `ServiceValidationError` in `*.server.ts` files, do not blindly forward `error.status` to HTTP responses, because SVEs may be thrown with non-default statuses (e.g., 400/500) and forwarding them can cause client-visible behavioral regressions (e.g., surfacing 500s to clients). Prefer a safe default response status of `error.status ?? 422`, but only after confirming via the reachable call graph that the caught `ServiceValidationError` instances are expected to carry those non-default statuses; otherwise, normalize to `422` to avoid unexpected client-visible 5xx behavior.
Applied to files:
apps/webapp/app/models/runtimeEnvironment.server.ts
📚 Learning: 2026-05-12T21:04:05.815Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3542
File: apps/webapp/app/components/sessions/v1/SessionStatus.tsx:1-3
Timestamp: 2026-05-12T21:04:05.815Z
Learning: In this Remix + TypeScript codebase, do not flag a server/client boundary violation when a file imports only types from a module matching `*.server`.
Specifically, it’s safe to import types using `import type { Foo } from "*.server"` or `import { type Foo } from "*.server"` because TypeScript erases type-only imports at compile time and they emit no JavaScript, so they won’t cross the Remix server/client bundle boundary.
Only raise the boundary concern for value imports (e.g., `import { Foo }` without `type`, or `import Foo`), since those produce JavaScript output.
Applied to files:
apps/webapp/app/models/runtimeEnvironment.server.ts
📚 Learning: 2026-06-25T18:21:51.905Z
Learnt from: carderne
Repo: triggerdotdev/trigger.dev PR: 4039
File: apps/webapp/app/routes/invite-revoke.tsx:0-0
Timestamp: 2026-06-25T18:21:51.905Z
Learning: During the Zod v4 migration in the triggerdotdev/trigger.dev webapp, ensure any imports from `conform-to/zod` use the Zod-4 subpath: `conform-to/zod/v4` (e.g., `import { parseWithZod } from "conform-to/zod/v4"`). Do not import from the package root `conform-to/zod`, because it is the Zod 3 implementation and may load Zod-3-only symbols (e.g., `ZodBranded`, `ZodEffects`), which can throw at module load (notably with `zod4.4.3`). This should be enforced across `apps/webapp/**/*` where helpers like `parseWithZod` and `conformZodMessage` are used.
Applied to files:
apps/webapp/app/models/runtimeEnvironment.server.ts
📚 Learning: 2026-07-03T17:10:21.498Z
Learnt from: 0ski
Repo: triggerdotdev/trigger.dev PR: 4148
File: apps/webapp/app/models/orgMember.server.ts:149-168
Timestamp: 2026-07-03T17:10:21.498Z
Learning: In triggerdotdev/trigger.dev, `User.email` (Prisma schema: `internal-packages/database/prisma/schema.prisma`) currently does NOT use `citext` and does NOT have a `lower(email)` functional unique index. Therefore, do not introduce Prisma queries like `where: { email: { equals: <value>, mode: "insensitive" } }` (or any case-insensitive lookup) against `User.email`, because it can force sequential scans of the `users` table under load. During review, ensure email is normalized (e.g., lowercased/trimmed) before both writes and subsequent lookups, and if true case-insensitive behavior/uniqueness is required, implement it via a separate app-wide migration (e.g., switch to `citext` and/or add a functional unique index with backfill) rather than bolting it onto individual feature PRs.
Applied to files:
apps/webapp/app/models/runtimeEnvironment.server.ts
📚 Learning: 2026-06-04T18:16:35.386Z
Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 3836
File: apps/supervisor/src/backpressure/backpressureMonitor.ts:3-5
Timestamp: 2026-06-04T18:16:35.386Z
Learning: When reviewing TypeScript in this repo, apply the rule “prefer type aliases over interfaces” only to data/object shapes and union/intersection type modeling. If an interface is being used as a behavioral contract for collaborators to implement (e.g., method-shape interfaces that define required behavior, such as `BackpressureLogger` / `BackpressureSignalSource` in `apps/supervisor/src/backpressure/backpressureMonitor.ts`), keep it as an `interface` and do not flag it as a type-alias-vs-interface violation.
Applied to files:
apps/webapp/app/models/runtimeEnvironment.server.ts
📚 Learning: 2026-06-09T17:58:04.699Z
Learnt from: 0ski
Repo: triggerdotdev/trigger.dev PR: 3879
File: apps/webapp/app/models/vercelIntegration.server.ts:619-630
Timestamp: 2026-06-09T17:58:04.699Z
Learning: In this codebase, outbound raw `fetch` calls should typically rely on Node/undici’s default request timeout (about ~300s) rather than adding a per-call `AbortController` + `setTimeout` wrapper inside individual functions (e.g. in files like `apps/webapp/app/models/vercelIntegration.server.ts`). During code review, do not flag the absence of a per-call timeout on a single `fetch` as an issue; if per-call timeouts are needed, they should be implemented via a codebase-wide convention (e.g., a shared fetch wrapper or documented pattern) rather than ad-hoc per-function changes.
Applied to files:
apps/webapp/app/models/runtimeEnvironment.server.ts
📚 Learning: 2026-05-14T14:54:39.095Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3545
File: .server-changes/agent-view-sessions.md:10-10
Timestamp: 2026-05-14T14:54:39.095Z
Learning: In the `trigger.dev` repository, do not flag inconsistent dot vs slash notation in route/path strings inside `.server-changes/*.md` files. These markdown files are consumed verbatim into the changelog, so the mixed notation (e.g., `resources.orgs.../runs.$runParam/...`) is intentional and should be preserved as-is.
Applied to files:
.server-changes/run-ops-split-webapp-write-path.md
🪛 LanguageTool
.server-changes/run-ops-split-webapp-write-path.md
[grammar] ~6-~6: Ensure spelling is correct
Context: ...ea: webapp type: feature --- Route the webapp write path — trigger/batch run minting,...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🔇 Additional comments (2)
.server-changes/run-ops-split-webapp-write-path.md (1)
6-6: LGTM!apps/webapp/app/models/runtimeEnvironment.server.ts (1)
361-375: LGTM!
There was a problem hiding this comment.
Caution
Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 24f52900-ef0f-43d1-973c-1bc57696ef16
📒 Files selected for processing (2)
.server-changes/run-ops-split-webapp-write-path.mdapps/webapp/app/models/runtimeEnvironment.server.ts
📜 Review details
🔇 Additional comments (2)
.server-changes/run-ops-split-webapp-write-path.md (1)
6-6: LGTM!apps/webapp/app/models/runtimeEnvironment.server.ts (1)
361-375: LGTM!
🛑 Comments failed to post (1)
apps/webapp/app/models/runtimeEnvironment.server.ts (1)
5-5: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash # Inspect the resolver implementation to confirm tx-forwarding and return shape fd -t f 'controlPlaneResolver.server.ts' -x cat -n {} # Check downstream consumers of findEnvironmentFromRun for reliance on nested relations rg -nP -A5 -B5 '\bfindEnvironmentFromRun\s*\(' --type=tsRepository: triggerdotdev/trigger.dev
Length of output: 18773
🏁 Script executed:
#!/bin/bash set -euo pipefail echo '--- apps/webapp/app/models/runtimeEnvironment.server.ts (relevant section) ---' sed -n '240,340p' apps/webapp/app/models/runtimeEnvironment.server.ts echo echo '--- type definitions / exports in apps/webapp/app/models/runtimeEnvironment.server.ts ---' rg -n "type EnvironmentFromRun|interface EnvironmentFromRun|export type EnvironmentFromRun|export async function findEnvironmentFromRun|resolveAuthenticatedEnv" apps/webapp/app/models/runtimeEnvironment.server.ts echo echo '--- downstream consumers of EnvironmentFromRun.environment ---' rg -n -A3 -B3 "EnvironmentFromRun|environment\." apps/webapp/app/models -g '*.ts' -g '*.tsx' echo echo '--- callsites of findEnvironmentFromRun ---' rg -n -A4 -B4 "\bfindEnvironmentFromRun\s*\(" apps/webapp -g '*.ts' -g '*.tsx'Repository: triggerdotdev/trigger.dev
Length of output: 18779
🏁 Script executed:
#!/bin/bash set -euo pipefail echo '--- runEngineHandlers usage around findEnvironmentFromRun result ---' sed -n '597,660p' apps/webapp/app/v3/runEngineHandlers.server.ts echo echo '--- authenticated environment shape helpers ---' rg -n -A4 -B4 "type AuthenticatedEnvironment|authIncludeBase|authIncludeWithParent|toAuthenticated" apps/webapp/app/models/runtimeEnvironment.server.tsRepository: triggerdotdev/trigger.dev
Length of output: 6348
Forward
txinto the environment lookup.findRun()respects the caller’s transaction, butcontrolPlaneResolver.resolveAuthenticatedEnv()always uses its own client, so a passed-intxcan still read the environment from a different connection/replica and break read-after-write consistency.
|
Clarifying the We are not globally mocking Redis. The What it actually fixes is a connection storm, not test correctness. ~20+ modules construct ioredis clients at import time (eager singletons: workers, cache stores, pub/sub, rate limiters, …). As the webapp suite has grown, importing the service graph across many test files × forked workers means all those eager clients dial simultaneously. CI has no Redis service, so that's a flood of Separately, a handful of pure-infra singletons (the workers, engine, marqs, socket.io, trace pub/sub, rate limiter, task-metadata cache) are stubbed to no-ops — but only ones no test exercises. Every test that needs real infra builds its own client from the testcontainer ( |
|
Follow-up (non-blocking): the cleaner long-term fix is to make those eager clients |
…#4119) ## What Extends the ClickHouse runs-replication service to fan in from multiple Postgres sources (the control-plane DB and the run-ops DB) instead of a single source, plus the admin operations to run and observe it. - **Multi-source fan-in** (`services/runsReplicationService.server.ts`, new `runsReplicationInstance.server.ts`, `runsReplicationGlobal.server.ts`): factors the replication service into per-source instances and a coordinator so a single ClickHouse target is fed from more than one Postgres source. - **Admin ops** (`routes/admin.api.v1.runs-replication.status.ts`, `admin.api.v1.runs-replication.backfill.ts`, `v3/services/adminWorker.server.ts`): adds a status endpoint reporting per-source replication state and updates the backfill entrypoint for the multi-source shape. ## Why PR7 of the run-ops split stack, and the final piece: once run state can live in a separate run-ops DB (earlier PRs), the analytics replication into ClickHouse has to consume both sources so runs remain queryable regardless of residency. Behavior-changing for the replication service internals; the ClickHouse-facing output is unchanged (still one runs stream), and single-source operation is preserved when the split is not enabled. ## Tests New vitest coverage: `runsReplicationInstance.test.ts` (per-source instance behavior) and `runsReplicationService.part8`/`part9` suites exercising the multi-source coordinator. Testcontainers-backed (ClickHouse + Postgres); no mocks. ## Notes Draft, **stacked on #4118** (`runops/pr06-write-path`). Review that first; this diff is against it. Server-change / changeset note to be added at stack-assembly time. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
Routes the webapp write path through the run-ops split seam: trigger/batch minting, idempotency-key resolution, and the run-lifecycle services now determine residency and dispatch writes to the correct store.
runEngine/services/triggerTask.server.ts,batchTrigger.server.ts,createBatch.server.ts,streamBatchItems.server.ts,v3/services/batchTriggerV3.server.ts): mint ids with the run-ops-aware minting and route creation/streaming through the store; batch children inherit the parent's residency.runEngine/concerns/idempotencyKeys.server.ts+ newidempotencyResidency.server.ts): idempotency-key lookup/dedup is residency-aware so a keyed retrigger resolves against the store that owns the original run.createCheckpoint,createTaskRunAttempt,enqueueDelayedRun,expireEnqueuedRun,finalizeTaskRun,resumeBatchRun,cancelDevSessionRuns,executeTasksWaitingForDeploy,triggerFailedTask): resolve their target run through the store rather than a fixed client.runsRepository+clickhouseRunsRepository,BulkActionV2+ batch read-through, realtimesessions/runReader, alertsdeliverAlert/performTaskRunAlerts): route through the read-through resolver.9535ae63d— resolves the parent run through an injectable run store inTriggerFailedTaskService.bf8f7c881— drops the "known-migrated" concept from write-path and read repos; residency is id-shape only.515b897ea— self-defaultsresolveWaitpointThroughReadThroughto the safe run-ops clients.Why
PR6 of the run-ops split stack. This is the write-path counterpart to the read foundation in the previous PRs: with it in place, both reads and writes route through the seam. Additive when the split is disabled (id-shape resolution collapses to the control-plane client); behavior-changing on the minting, idempotency, and lifecycle paths when enabled.
Tests
Large new/expanded vitest suite under
apps/webapp/test/and colocated service tests: trigger-task and batch-trigger store routing, residency inheritance, idempotency dedup residency + legacy-authority, bulk-action read routing, cancel-dev-session routing, alerts store routing, runs-repository read-through, realtime session/run-reader read-through and stream-registration routing, and the waitpoint read-through default. Testcontainers-backed; no mocks.Notes
Draft, stacked on #4117 (
runops/pr05-webapp-foundation). Review that first; this diff is against it.Server-change / changeset note to be added at stack-assembly time.
🤖 Generated with Claude Code