[reliability] Daily Reliability Review - 2026-05-29

### Executive Summary

Overall health for the last 24h is **healthy at the GitHub Actions level but with a masked agent-failure signal**. Sentry ingested **23,265 spans** for `github/gh-aw` (org `github`, project `gh-aw`) with intact trace continuity. There were **no cancellations and no timeouts** in telemetry.

The one high-signal finding: **9 distinct runs across 8 workflows recorded `gh-aw.run.status:failure`** (37 spans), yet all 6 of those runs that I cross-checked against GitHub Actions concluded **`success`**. These are agent-level failures (agent conclusion = failure, or `agent_output.json` carried errors) that are handled gracefully and never surface as a red workflow run, so they are invisible to anyone watching the Actions tab.
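
The cross-check described above can be sketched as a small join between telemetry status and Actions conclusions. This is a minimal illustration, not the query that produced this report: the record shapes and the `findMaskedFailures` name are assumptions, and only the three run IDs are taken from this review.

```javascript
// Hypothetical record shapes: neither is a real Sentry or GitHub API payload.
// telemetryRuns: [{ runId, workflow, status }] derived from `gh-aw.run.status`.
// actionsConclusions: Map of runId -> GitHub Actions conclusion.
function findMaskedFailures(telemetryRuns, actionsConclusions) {
  const masked = [];
  const unchecked = [];
  for (const run of telemetryRuns) {
    if (run.status !== "failure") continue;
    const conclusion = actionsConclusions.get(run.runId);
    if (conclusion === undefined) {
      unchecked.push(run.runId); // not cross-checked, so no claim is made
    } else if (conclusion === "success") {
      masked.push(run.runId); // agent failed, workflow stayed green
    }
  }
  return { masked, unchecked };
}

const telemetryRuns = [
  { runId: "26666085009", workflow: "PR Code Quality Reviewer", status: "failure" },
  { runId: "26666424409", workflow: "PR Sous Chef", status: "failure" },
  { runId: "26626991873", workflow: "PR Sous Chef", status: "failure" },
];
const actionsConclusions = new Map([
  ["26666085009", "success"],
  ["26666424409", "success"],
]);

const maskedReport = findMaskedFailures(telemetryRuns, actionsConclusions);
console.log(maskedReport);
```

Keeping `unchecked` separate from `masked` mirrors the distinction this review draws between the 6 cross-checked runs and the remaining failed runs.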

Separately, several core attributes are missing on the emit/export path: native Sentry `span.status`, `release`, and `gen_ai.response.finish_reasons` are absent across all spans, and the `errors`/`logs` datasets are empty. This means failures are detectable *only* via the custom `gh-aw.run.status` attribute, and truncation/runaway-token detection is currently not possible from traces. Token usage shows no runaway (max 112k, avg 25k).

### Top Reliability Findings

| Priority | Workflow | Problem | Evidence | Next Action |
| --- | --- | --- | --- | --- |
| P1 | PR Sous Chef (×2), PR Code Quality Reviewer, Smoke Copilot, LintMonster, Daily SPDD Spec Planner, Dependabot Campaign, Daily Copilot PR Merged Report, GitHub API Consumption Report Agent | Agent-level `gh-aw.run.status:failure` masked by green Actions runs | 9 distinct failed runs / 37 spans in 24h; GH Actions conclusion = `success` for all 6 cross-checked runs (`26666085009`, `26666424409`, `26635441220`, `26616608644`, `26621553598`, `26650707241`) | Surface agent failures in run summary / alerting; decide whether agent failure should fail (or annotate) the workflow run |
| P3 | All workflows | Native Sentry `span.status` empty despite OTLP status.code=2 emitted on failures | `has:span.status` → 0 spans; emit side sets `statusCode=2` (`send_otlp_span.cjs:1820,1856`) | Verify OTLP `status.code` → Sentry span status mapping in the exporter path |
| P3 | All workflows | `errors` and `logs` datasets empty | `count()` over `errors` = 0, `logs` = 0 (24h) | Confirm whether error/log export is intended; if not, treat spans as sole signal and document it |
| P5 | All gen_ai spans | `gen_ai.response.finish_reasons` not emitted → truncation/runaway undetectable | `has:gen_ai.response.finish_reasons` → 0; `finish_reasons:length` → 0 | Emit finish reasons when runtime metrics include `stopReason` |
| P5 | All workflows | `release` null on every span → no version correlation for regressions | `has:release` → 23,300 spans all null | Confirm `service.version` resource attr → Sentry `release` mapping (backend-dependent) |
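
One way to verify the `span.status` finding is to compare what was emitted with what was ingested. The sketch below assumes OTLP/JSON span objects, where `status.code` is an integer (0 = unset, 1 = ok, 2 = error per the OpenTelemetry trace protocol). The helper names, the `ingested` map, the span IDs, and the status labels are all hypothetical; how Sentry actually stores the mapped value is not confirmed here.

```javascript
// OTLP status codes as defined by the OpenTelemetry trace protocol.
const OTLP_STATUS = { 0: "unset", 1: "ok", 2: "error" };

// Hypothetical helper: label the status an OTLP span was emitted with.
function emittedStatus(otlpSpan) {
  const code = otlpSpan.status?.code ?? 0;
  return OTLP_STATUS[code] ?? "unset";
}

// Hypothetical check: return IDs of spans emitted as errors whose ingested
// counterpart carries no native status. `ingested` maps spanId -> status.
function findUnmappedErrors(otlpSpans, ingested) {
  return otlpSpans
    .filter((s) => emittedStatus(s) === "error" && !ingested.get(s.spanId))
    .map((s) => s.spanId);
}

// Sample data for illustration only; these span IDs are made up.
const emittedSpans = [
  { spanId: "span-a", status: { code: 2, message: "agent failure" } },
  { spanId: "span-b", status: { code: 1 } },
  { spanId: "span-c" },
];
const ingestedStatuses = new Map([
  ["span-a", undefined],
  ["span-b", "ok"],
]);

const unmapped = findUnmappedErrors(emittedSpans, ingestedStatuses);
console.log(unmapped); // [ 'span-a' ]
```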

### Representative Traces
<details>
<summary>View representative traces</summary>

**Confirmed agent-failure class** — PR Code Quality Reviewer, run [§26666085009](https://github.com/github/gh-aw/actions/runs/26666085009)
- Trace `fe43fc931bc7945e417ea9446346915d` ([Sentry](https://github.sentry.io/explore/traces/trace/fe43fc931bc7945e417ea9446346915d))
- Continuity intact: `gh-aw.pre_activation.setup` gen_ai spans + `gateway.request` http spans share one trace and `gh-aw.run.id`. `gh-aw.run.status` transitions `success → success → failure` (final failure span at `22:48:13Z`), `gh-aw.run.attempt=1`, model `gpt-5-mini`.
- GitHub Actions conclusion for this run: **`success`** (the agent-level failure did not fail the workflow).

**Other failed-run traces (one per run):**
- PR Sous Chef — run [§26666424409](https://github.com/github/gh-aw/actions/runs/26666424409), trace `9a4b3a826b3b613421fbd8c083834bf6`
- PR Sous Chef — run [§26626991873](https://github.com/github/gh-aw/actions/runs/26626991873), trace `b63bcbc7674f1f96fc47270022de6936`
- Smoke Copilot — run `26662945830`, trace `8c045220e5f37eceebb61a91d1582670`
- Daily SPDD Spec Planner — run `26650707241`, trace `e92b35d53749987ce4342aa39907b9a5`
- Daily Copilot PR Merged Report — run `26648920021`, trace `a76fc0e382f85506ebc5983ffcd7ce20`
- GitHub API Consumption Report Agent — run `26635441220`, trace `d4e1a41b0d6d323e35d8f57d3ea14a67`
- Dependabot Campaign — run `26621553598`, trace `d12405372bfc4d329fc8eb783edc878e`
- LintMonster — run `26616608644`, trace `b7ac3451e1a35d3317d335a136d2ddf4`

**Token outlier (not a confirmed problem):** trace `04c66c1762536e8e9de01a67836f155c` carries the max `gen_ai.usage.total_tokens=112,165` at `14:08Z`. This is within model context limits, so there is no sign of a runaway. Because `gen_ai.response.finish_reasons` is not emitted on any span, the absence of `finish_reasons:length` does not by itself rule out truncation.

</details>

### Recommendations

1. **Make agent-level failures visible.** 9 runs failed at the agent layer, and all 6 that were cross-checked reported green in Actions. Smallest fix: include `gh-aw.run.status` and the first error message in the run summary/footer; then decide whether an agent failure should annotate or fail the run.
2. **Fix the OTLP `status.code` → Sentry `span.status` mapping.** Failures are emitted with `statusCode=2` (`actions/setup/js/send_otlp_span.cjs:1820,1856`) but `span.status` is empty in Sentry, forcing all failure queries onto the custom `gh-aw.run.status` attribute. Verify the exporter writes span status.
3. **Emit `gen_ai.response.finish_reasons`** when runtime metrics include `stopReason`, so truncation/runaway can be detected from traces instead of inferred.
4. **Confirm `release`/`service.version` correlation** (and whether `errors`/`logs` export is intended) so regressions can be tied to a CLI/service version.
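
As an illustration of recommendation 3, the following sketch builds the `gen_ai.response.finish_reasons` attribute in OTLP/JSON attribute encoding only when a stop reason is available. It is a hypothetical helper, not the code in `send_otlp_span.cjs`; the `runtimeMetrics` shape is assumed from the `stopReason` field named above.

```javascript
// Hypothetical helper, not the implementation in send_otlp_span.cjs.
// Returns a list of OTLP/JSON attributes to append to a gen_ai span.
function finishReasonAttributes(runtimeMetrics) {
  const stopReason = runtimeMetrics?.stopReason;
  // Emit nothing when the runtime did not report a stop reason.
  if (typeof stopReason !== "string" || stopReason === "") return [];
  return [
    {
      key: "gen_ai.response.finish_reasons",
      // The semantic convention defines this attribute as a string array.
      value: { arrayValue: { values: [{ stringValue: stopReason }] } },
    },
  ];
}

const withReason = finishReasonAttributes({ stopReason: "length" });
const withoutReason = finishReasonAttributes({});
console.log(JSON.stringify(withReason), withoutReason.length);
```

Returning an empty list instead of a placeholder value keeps `has:gen_ai.response.finish_reasons` meaningful as a coverage query.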

### Notes
<details>
<summary>View notes</summary>

- **Tooling:** This Sentry MCP build exposes `list_events` (no `search_events`/`get_trace_details`). Per the otel-queries skill, trace continuity was validated via `list_events` filtered by `trace:<id>`. Aggregate queries require the sort field to appear in `fields` (e.g. add `count()` when sorting `-count()`).
- **Confirmed vs inconclusive:** The 9 agent failures are *confirmed* per emit semantics (`gh-aw.run.status` derives from `agentConclusion`/`agent_output.json` errors, `send_otlp_span.cjs:1839-1858`), but are *not* GitHub Actions run failures — all 6 cross-checked runs concluded `success`. Treat them as masked agent-layer failures, not broken workflow runs.
- **Instrumentation gaps cross-checked on emit side** (`actions/setup/js/send_otlp_span.cjs`): `gh-aw.workflow.name` present and correct; OTLP `status.code`/`gh-aw.run.status` present (but native `span.status` not mapped into Sentry); `finish_reasons` gated on `stopReason` and currently absent; `release` mapped from resource `service.version` and null in this backend.
- **No timeouts or cancellations** appeared in `gh-aw.run.status` (only `success`/`failure`). The `errors`/`logs` datasets returned 0 events; this companion check was run and is reported explicitly rather than skipped.
- **Distinct-run accounting:** `count_unique(gh-aw.run.id)` = 387 success runs, 9 failure runs (~2.3% agent-failure rate). Failure spans map to 9 unique traces/run IDs, verified individually.
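
The failure rate above can be reproduced with the arithmetic below. It assumes the 387 success runs and 9 failure runs are disjoint sets; if the failed runs were also counted among the 387 (their earlier spans carry `success`), the rate would be 9/387, which still rounds to 2.3%.

```javascript
// Counts taken from the distinct-run accounting above.
const successRuns = 387;
const failureRuns = 9;

// Assumption: no run ID appears in both counts.
const totalRuns = successRuns + failureRuns;
const failureRatePct = (failureRuns / totalRuns) * 100;

// Alternative reading: failed runs are already included in the 387.
const overlapRatePct = (failureRuns / successRuns) * 100;

console.log(`${failureRatePct.toFixed(1)}%`, `${overlapRatePct.toFixed(1)}%`);
```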

**References:**
- [§26666085009](https://github.com/github/gh-aw/actions/runs/26666085009)
- [§26666424409](https://github.com/github/gh-aw/actions/runs/26666424409)
- [§26635441220](https://github.com/github/gh-aw/actions/runs/26635441220)

</details>

> Generated by [🚨 Daily Reliability Review](https://github.com/github/gh-aw/actions/runs/26666983015) · opus48 1.9M · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fdaily-reliability-review%22&type=issues)
> - [x] expires on May 31, 2026, 11:24 PM UTC
