Executive Summary
Over the last 24h, health is good at the GitHub Actions level, but telemetry carries a masked agent-failure signal. Sentry ingested 23,265 spans for github/gh-aw (org github, project gh-aw) with intact trace continuity. Telemetry recorded no cancellations and no timeouts.
The one high-signal finding: 9 distinct runs across 8 workflows recorded gh-aw.run.status:failure (37 spans), yet all 6 of those runs that I cross-checked against GitHub Actions concluded success. These are agent-level failures (agent conclusion = failure, or agent_output.json carried errors) that are handled gracefully and never surface as a red workflow run, so they are invisible to anyone watching the Actions tab.
Separately, several core attributes are missing on the emit/export path: native Sentry span.status, release, and gen_ai.response.finish_reasons are absent across all spans, and the errors/logs datasets are empty. This means failures are detectable only via the custom gh-aw.run.status attribute, and truncation/runaway-token detection is currently not possible from traces. Token usage shows no runaway (max 112k, avg 25k).
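The cross-check behind the masked-failure finding amounts to a join between the telemetry-side run status and the GitHub Actions conclusion. The following is a minimal sketch under stated assumptions: the helper name findMaskedFailures and both input shapes are hypothetical and are not part of gh-aw, and the inputs are assumed to have been fetched already.

```javascript
// Hypothetical helper (not part of gh-aw): flags runs whose agent-level
// status is "failure" while the GitHub Actions conclusion is "success".
// Both input shapes are assumptions made for illustration.
function findMaskedFailures(agentRuns, actionsConclusions) {
  const masked = [];
  const unverified = [];
  for (const run of agentRuns) {
    if (run.status !== "failure") continue;
    const conclusion = actionsConclusions.get(run.runId);
    if (conclusion === undefined) {
      unverified.push(run.runId); // no Actions data fetched for this run
    } else if (conclusion === "success") {
      masked.push(run.runId); // agent failed, workflow stayed green
    }
  }
  return { masked, unverified };
}

// Two run IDs from this report plus one placeholder passing run.
const agentRuns = [
  { runId: "26666085009", status: "failure" }, // cross-checked in this report
  { runId: "26662945830", status: "failure" }, // not among the 6 cross-checked
  { runId: "placeholder-run", status: "success" }, // made up for illustration
];
const actionsConclusions = new Map([["26666085009", "success"]]);

const result = findMaskedFailures(agentRuns, actionsConclusions);
console.log(result);
```

Keeping an explicit unverified bucket separates confirmed masked failures from failed runs whose Actions conclusion was never looked up.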
Top Reliability Findings
| Priority | Workflow | Problem | Evidence | Next Action |
| --- | --- | --- | --- | --- |
| P1 | PR Sous Chef (×2), PR Code Quality Reviewer, Smoke Copilot, LintMonster, Daily SPDD Spec Planner, Dependabot Campaign, Daily Copilot PR Merged Report, GitHub API Consumption Report Agent | Agent-level gh-aw.run.status:failure masked by green Actions runs | 9 distinct failed runs / 37 spans in 24h; GH Actions conclusion = success for all 6 cross-checked runs (26666085009, 26666424409, 26635441220, 26616608644, 26621553598, 26650707241) | Surface agent failures in run summary / alerting; decide whether agent failure should fail (or annotate) the workflow run |
| P3 | All workflows | Native Sentry span.status empty despite OTLP status.code=2 emitted on failures | has:span.status → 0 spans; emit side sets statusCode=2 (send_otlp_span.cjs:1820,1856) | Verify OTLP status.code → Sentry span status mapping in the exporter path |
| P3 | All workflows | errors and logs datasets empty | count() over errors = 0, logs = 0 (24h) | Confirm whether error/log export is intended; if so, treat spans as sole signal and document it |
| P5 | All gen_ai spans | gen_ai.response.finish_reasons not emitted → truncation/runaway undetectable | has:gen_ai.response.finish_reasons → 0; finish_reasons:length → 0 | Emit finish reasons when runtime metrics include stopReason |
| P5 | All workflows | release null on every span → no version correlation for regressions | has:release → 23,300 spans all null | Confirm service.version resource attr → Sentry release mapping (backend-dependent) |
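Two of the findings above concern what the emitter attaches to each span: the OTLP status code and gen_ai.response.finish_reasons. The sketch below is not the code in send_otlp_span.cjs. It is a hypothetical illustration of the behavior the findings describe, assuming a helper named buildSpanFields and a flat attribute map rather than the OTLP JSON wire encoding. The numeric status codes are the ones defined by the OpenTelemetry protocol; treating every non-failure status as UNSET is an assumption.

```javascript
// OTLP span status codes as defined by the OpenTelemetry protocol.
const OTLP_STATUS = { UNSET: 0, OK: 1, ERROR: 2 };

// Hypothetical helper: derives span status and attributes from run results.
// The input shape ({ runStatus, stopReason, errorMessage }) is an assumption.
function buildSpanFields({ runStatus, stopReason, errorMessage }) {
  const attributes = { "gh-aw.run.status": runStatus };

  // Emit finish reasons only when the runtime reported a stop reason,
  // mirroring the "gated on stopReason" behavior noted in this report.
  if (stopReason) {
    attributes["gen_ai.response.finish_reasons"] = [stopReason];
  }

  // Failed runs carry status code 2 (ERROR); everything else is left UNSET
  // (an assumption, since this report only observes success/failure).
  const status =
    runStatus === "failure"
      ? { code: OTLP_STATUS.ERROR, message: errorMessage || "agent failure" }
      : { code: OTLP_STATUS.UNSET };

  return { attributes, status };
}

const failed = buildSpanFields({ runStatus: "failure", stopReason: "length" });
const passed = buildSpanFields({ runStatus: "success" });
console.log(failed.status.code, failed.attributes);
console.log(passed.status.code, passed.attributes);
```

If the backend mapped status code 2 into its native span status, failure queries would no longer depend solely on the custom gh-aw.run.status attribute.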
Representative Traces
Confirmed agent-failure class — PR Code Quality Reviewer, run §26666085009
- Trace fe43fc931bc7945e417ea9446346915d (Sentry)
- Continuity intact: gh-aw.pre_activation.setup gen_ai spans + gateway.request http spans share one trace and gh-aw.run.id. gh-aw.run.status transitions success → success → failure (final failure span at 22:48:13Z), gh-aw.run.attempt=1, model gpt-5-mini.
- GitHub Actions conclusion for this run: success (the agent-level failure did not fail the workflow).
Other failed-run traces (one per workflow):
- PR Sous Chef: run §26666424409, trace 9a4b3a826b3b613421fbd8c083834bf6
- PR Sous Chef: run §26626991873, trace b63bcbc7674f1f96fc47270022de6936
- Smoke Copilot: run 26662945830, trace 8c045220e5f37eceebb61a91d1582670
- Daily SPDD Spec Planner: run 26650707241, trace e92b35d53749987ce4342aa39907b9a5
- Daily Copilot PR Merged Report: run 26648920021, trace a76fc0e382f85506ebc5983ffcd7ce20
- GitHub API Consumption Report Agent: run 26635441220, trace d4e1a41b0d6d323e35d8f57d3ea14a67
- Dependabot Campaign: run 26621553598, trace d12405372bfc4d329fc8eb783edc878e
- LintMonster: run 26616608644, trace b7ac3451e1a35d3317d335a136d2ddf4
Token outlier (not a confirmed problem): trace 04c66c1762536e8e9de01a67836f155c carries the max gen_ai.usage.total_tokens=112,165 at 14:08Z — within model context limits and with no finish_reasons:length, so not a truncation/runaway event.
Recommendations
- Make agent-level failures visible. 9 runs failed at the agent layer, and all 6 that were cross-checked reported green in Actions. Smallest fix: include gh-aw.run.status + first error message in the run summary/footer; then decide policy on whether agent failure should annotate or fail the run.
- Fix the OTLP status.code → Sentry span.status mapping. Failures are emitted with statusCode=2 (actions/setup/js/send_otlp_span.cjs:1820,1856) but span.status is empty in Sentry, forcing all failure queries onto the custom gh-aw.run.status attribute. Verify the exporter writes span status.
- Emit gen_ai.response.finish_reasons when runtime metrics include stopReason, so truncation/runaway can be detected from traces instead of inferred.
- Confirm release/service.version correlation (and whether errors/logs export is intended) so regressions can be tied to a CLI/service version.
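For the first recommendation, the summary text could be as small as two lines. The sketch below is a hypothetical formatter, not existing gh-aw code: the function name formatAgentStatusSummary, its arguments, and the sample error string are all assumptions made for illustration.

```javascript
// Hypothetical formatter (not existing gh-aw code): renders the agent-level
// status and, on failure, the first recorded error message.
function formatAgentStatusSummary(runStatus, errors) {
  const lines = [`Agent status: ${runStatus}`];
  if (runStatus === "failure") {
    const firstError = errors.length > 0 ? errors[0] : "(no error message recorded)";
    lines.push(`First error: ${firstError}`);
  }
  return lines.join("\n") + "\n";
}

// The error string here is a placeholder, not taken from any real run.
const failureSummary = formatAgentStatusSummary("failure", ["example error message"]);
const successSummary = formatAgentStatusSummary("success", []);
console.log(failureSummary);
console.log(successSummary);

// In a GitHub Actions step, text like this can be appended to the file named
// by the GITHUB_STEP_SUMMARY environment variable to appear in the job summary.
```

Printing the status even on success keeps the summary format stable, so a failure line stands out without changing whether the workflow run itself fails.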
Notes
- Tooling: This Sentry MCP build exposes list_events (no search_events/get_trace_details). Per the otel-queries skill, trace continuity was validated via list_events filtered by trace:<id>. Aggregate queries require the sort field to appear in fields (e.g. add count() when sorting -count()).
- Confirmed vs inconclusive: The 9 agent failures are confirmed per emit semantics (gh-aw.run.status derives from agentConclusion/agent_output.json errors, send_otlp_span.cjs:1839-1858), but they are not GitHub Actions run failures; all 6 cross-checked runs concluded success. Treat them as masked agent-layer failures, not broken workflow runs.
- Instrumentation gaps cross-checked on emit side (actions/setup/js/send_otlp_span.cjs): gh-aw.workflow.name present and correct; OTLP status.code/gh-aw.run.status present (but native span.status not mapped into Sentry); finish_reasons gated on stopReason and currently absent; release mapped from resource service.version and null in this backend.
- No timeouts or cancellations appeared in gh-aw.run.status (only success/failure), and the errors/logs datasets returned 0 events. This is stated explicitly as a companion-check result, not a skipped check.
- Distinct-run accounting: count_unique(gh-aw.run.id) = 387 success runs, 9 failure runs (~2.3% agent-failure rate). Failure spans map to 9 unique traces/run IDs, verified individually.
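The agent-failure rate quoted above follows directly from the two distinct-run counts. As a small worked check (the function name agentFailureRate is hypothetical):

```javascript
// Failure rate = failed runs / (successful runs + failed runs).
function agentFailureRate(successRuns, failureRuns) {
  const total = successRuns + failureRuns;
  return total === 0 ? 0 : failureRuns / total;
}

// Counts from this report: 387 success runs, 9 failure runs.
const rate = agentFailureRate(387, 9); // 9 / 396
const ratePercent = `${(rate * 100).toFixed(1)}%`;
console.log(ratePercent); // prints "2.3%"
```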
Generated by 🚨 Daily Reliability Review · opus48 1.9M · ◷