
[reliability] Daily Reliability Review - 2026-05-29 #35816

@github-actions


Executive Summary

Over the last 24h, health is green at the GitHub Actions level, but that view masks an agent-failure signal. Sentry ingested 23,265 spans for github/gh-aw (org github, project gh-aw) with intact trace continuity. Telemetry shows no cancellations and no timeouts.

The one high-signal finding: 9 distinct runs across 8 workflows recorded gh-aw.run.status:failure (37 spans), yet all 6 of those runs that I cross-checked against GitHub Actions concluded success. These are agent-level failures (agent conclusion = failure, or agent_output.json carried errors) that are handled gracefully and never surface as a red workflow run, so they are invisible to anyone watching only the Actions tab.
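The cross-check described above can be sketched as a small filter. This is a minimal illustration, not the tooling used for this report: the `RunRecord` type, its field names, and the two `hypothetical-*` run IDs are assumptions made for the example; only run 26666085009 comes from the findings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    run_id: str
    agent_status: str        # final gh-aw.run.status value seen for the run
    actions_conclusion: str  # conclusion reported by GitHub Actions


def find_masked_failures(runs):
    """Return runs whose agent failed while the workflow run stayed green."""
    return [
        r for r in runs
        if r.agent_status == "failure" and r.actions_conclusion == "success"
    ]


runs = [
    RunRecord("26666085009", "failure", "success"),     # masked agent failure
    RunRecord("hypothetical-1", "success", "success"),  # healthy run
    RunRecord("hypothetical-2", "failure", "failure"),  # visible failure
]

masked = find_masked_failures(runs)
print([r.run_id for r in masked])
```

Only the first record is reported, because it is the only one where the two signals disagree in the masked direction.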

Separately, several core attributes are missing on the emit/export path: native Sentry span.status, release, and gen_ai.response.finish_reasons are absent across all spans, and the errors/logs datasets are empty. As a result, failures are detectable only via the custom gh-aw.run.status attribute, and truncation/runaway detection via finish reasons is currently not possible from traces. Token counts alone show no runaway (max 112k, avg 25k).
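One way to quantify gaps like these is to compute, per attribute, the fraction of spans that carry a non-empty value. The sketch below assumes spans have already been fetched and flattened into plain dictionaries; the sample spans are hypothetical and only mimic the pattern reported here (custom status present, native attributes absent).

```python
def attribute_coverage(spans, keys):
    """Return, per key, the fraction of spans with a non-empty value."""
    total = len(spans)
    coverage = {}
    for key in keys:
        present = sum(1 for s in spans if s.get(key) not in (None, "", []))
        coverage[key] = present / total if total else 0.0
    return coverage


# Hypothetical spans shaped like the reported data.
sample_spans = [
    {"gh-aw.run.status": "success", "release": None},
    {"gh-aw.run.status": "failure", "release": None, "span.status": ""},
]

keys = [
    "gh-aw.run.status",
    "span.status",
    "release",
    "gen_ai.response.finish_reasons",
]
for key, fraction in attribute_coverage(sample_spans, keys).items():
    print(f"{key}: {fraction:.0%}")
```

Treating `None`, empty strings, and empty lists as missing is a design choice for this sketch; a backend may represent absent attributes differently.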

Top Reliability Findings

| Priority | Workflow | Problem | Evidence | Next Action |
| --- | --- | --- | --- | --- |
| P1 | PR Sous Chef (×2), PR Code Quality Reviewer, Smoke Copilot, LintMonster, Daily SPDD Spec Planner, Dependabot Campaign, Daily Copilot PR Merged Report, GitHub API Consumption Report Agent | Agent-level gh-aw.run.status:failure masked by green Actions runs | 9 distinct failed runs / 37 spans in 24h; GH Actions conclusion = success for all 6 cross-checked runs (26666085009, 26666424409, 26635441220, 26616608644, 26621553598, 26650707241) | Surface agent failures in run summary / alerting; decide whether agent failure should fail (or annotate) the workflow run |
| P3 | All workflows | Native Sentry span.status empty despite OTLP status.code=2 emitted on failures | has:span.status → 0 spans; emit side sets statusCode=2 (send_otlp_span.cjs:1820,1856) | Verify OTLP status.code → Sentry span status mapping in the exporter path |
| P3 | All workflows | errors and logs datasets empty | count() over errors = 0, logs = 0 (24h) | Confirm whether error/log export is intended; if so, treat spans as sole signal and document it |
| P5 | All gen_ai spans | gen_ai.response.finish_reasons not emitted → truncation/runaway undetectable | has:gen_ai.response.finish_reasons → 0; finish_reasons:length → 0 | Emit finish reasons when runtime metrics include stopReason |
| P5 | All workflows | release null on every span → no version correlation for regressions | has:release → 23,300 spans all null | Confirm service.version resource attr → Sentry release mapping (backend-dependent) |

Representative Traces


Confirmed agent-failure class: PR Code Quality Reviewer, run 26666085009

  • Trace fe43fc931bc7945e417ea9446346915d (Sentry)
  • Continuity intact: the gh-aw.pre_activation.setup span, the gen_ai spans, and the gateway.request http spans share one trace and one gh-aw.run.id. gh-aw.run.status transitions success → success → failure (final failure span at 22:48:13Z), gh-aw.run.attempt=1, model gpt-5-mini.
  • GitHub Actions conclusion for this run: success (the agent-level failure did not fail the workflow).
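The continuity and status-transition checks above can be expressed as a short summary function. This is a sketch under assumptions: the dictionary keys and the two earlier timestamps are hypothetical, and only the trace ID, run ID, status sequence, and final timestamp are taken from this trace.

```python
def check_trace(spans):
    """Summarize one trace: ID continuity and the run-status sequence.

    Each span is a dict with 'trace_id', 'run_id', 'timestamp' (same-day
    HH:MM:SSZ strings, so lexicographic order is time order) and 'run_status'.
    """
    trace_ids = {s["trace_id"] for s in spans}
    run_ids = {s["run_id"] for s in spans}
    ordered = sorted(spans, key=lambda s: s["timestamp"])
    return {
        "continuous": len(trace_ids) == 1 and len(run_ids) == 1,
        "status_sequence": [s["run_status"] for s in ordered],
        "final_status": ordered[-1]["run_status"] if ordered else None,
    }


TRACE = "fe43fc931bc7945e417ea9446346915d"
RUN = "26666085009"
spans = [
    # The first two timestamps are hypothetical placeholders.
    {"trace_id": TRACE, "run_id": RUN, "timestamp": "22:41:00Z", "run_status": "success"},
    {"trace_id": TRACE, "run_id": RUN, "timestamp": "22:48:13Z", "run_status": "failure"},
    {"trace_id": TRACE, "run_id": RUN, "timestamp": "22:44:30Z", "run_status": "success"},
]

print(check_trace(spans))
```

Sorting by timestamp before reading the final status matters because query results are not guaranteed to arrive in time order.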

Other failed-run traces (one per run):

  • PR Sous Chef: run 26666424409, trace 9a4b3a826b3b613421fbd8c083834bf6
  • PR Sous Chef: run 26626991873, trace b63bcbc7674f1f96fc47270022de6936
  • Smoke Copilot: run 26662945830, trace 8c045220e5f37eceebb61a91d1582670
  • Daily SPDD Spec Planner: run 26650707241, trace e92b35d53749987ce4342aa39907b9a5
  • Daily Copilot PR Merged Report: run 26648920021, trace a76fc0e382f85506ebc5983ffcd7ce20
  • GitHub API Consumption Report Agent: run 26635441220, trace d4e1a41b0d6d323e35d8f57d3ea14a67
  • Dependabot Campaign: run 26621553598, trace d12405372bfc4d329fc8eb783edc878e
  • LintMonster: run 26616608644, trace b7ac3451e1a35d3317d335a136d2ddf4

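Token outlier (not a confirmed problem): trace 04c66c1762536e8e9de01a67836f155c carries the max gen_ai.usage.total_tokens=112,165 at 14:08Z. That is within model context limits, and no finish_reasons:length was recorded, so there is no sign of a truncation/runaway event. Because finish_reasons is not emitted on any span, its absence here is weak evidence rather than proof.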

Recommendations

  1. Make agent-level failures visible. 9 runs failed at the agent layer but reported green in Actions. Smallest fix: include gh-aw.run.status + first error message in the run summary/footer; then decide policy on whether agent failure should annotate or fail the run.
  2. Fix the OTLP status.code → Sentry span.status mapping. Failures are emitted with statusCode=2 (actions/setup/js/send_otlp_span.cjs:1820,1856) but span.status is empty in Sentry, forcing all failure queries onto the custom gh-aw.run.status attribute. Verify the exporter writes span status.
  3. Emit gen_ai.response.finish_reasons when runtime metrics include stopReason, so truncation/runaway can be detected from traces instead of inferred.
  4. Confirm release/service.version correlation (and whether errors/logs export is intended) so regressions can be tied to a CLI/service version.
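Recommendations 2 and 3 both concern what the emitter writes into each span. The sketch below shows the shape of the relevant OTLP/JSON fragments, written in Python for brevity even though the real emitter is send_otlp_span.cjs. The function name and its parameters are hypothetical, and passing stopReason through unchanged as the finish reason is an assumption; the actual value mapping may differ.

```python
STATUS_CODE_ERROR = 2  # OTLP Status.code value meaning "error"


def build_span_fields(agent_conclusion, stop_reason=None):
    """Build the status and attribute fragments of an OTLP/JSON span."""
    failed = agent_conclusion == "failure"
    attributes = [
        {
            "key": "gh-aw.run.status",
            "value": {"stringValue": "failure" if failed else "success"},
        },
    ]
    # Emit finish reasons only when the runtime reported a stop reason.
    if stop_reason is not None:
        attributes.append({
            "key": "gen_ai.response.finish_reasons",
            "value": {"arrayValue": {"values": [{"stringValue": stop_reason}]}},
        })
    # An empty status object leaves the span status unset.
    status = {"code": STATUS_CODE_ERROR} if failed else {}
    return {"status": status, "attributes": attributes}


print(build_span_fields("failure", stop_reason="length"))
```

If the backend maps status.code correctly, a span built this way should be queryable by native span status as well as by the custom attribute; that mapping is the part recommendation 2 asks to verify.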

Notes

  • Tooling: This Sentry MCP build exposes list_events (no search_events/get_trace_details). Per the otel-queries skill, trace continuity was validated via list_events filtered by trace:<id>. Aggregate queries require the sort field to appear in fields (e.g. add count() when sorting -count()).
  • Confirmed vs inconclusive: The 9 agent failures are confirmed per emit semantics (gh-aw.run.status derives from agentConclusion/agent_output.json errors, send_otlp_span.cjs:1839-1858), but are not GitHub Actions run failures — all 6 cross-checked runs concluded success. Treat them as masked agent-layer failures, not broken workflow runs.
  • Instrumentation gaps cross-checked on emit side (actions/setup/js/send_otlp_span.cjs): gh-aw.workflow.name present and correct; OTLP status.code/gh-aw.run.status present (but native span.status not mapped into Sentry); finish_reasons gated on stopReason and currently absent; release mapped from resource service.version and null in this backend.
  • No timeouts or cancellations appeared in gh-aw.run.status (only success/failure), and errors/logs datasets returned 0 events — stated explicitly as a companion-check result, not skipped.
  • Distinct-run accounting: count_unique(gh-aw.run.id) = 387 success runs, 9 failure runs (~2.3% agent-failure rate). Failure spans map to 9 unique traces/run IDs, verified individually.
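The ~2.3% figure can be reproduced from the two run counts. The sketch assumes the success and failure counts are disjoint sets of runs; since failed runs also emitted success spans earlier in their traces, the `disjoint=False` branch covers the alternative reading in which the 387 already includes them. The function name and flag are illustrative only.

```python
def agent_failure_rate(success_runs, failure_runs, disjoint=True):
    """Share of runs that failed at the agent layer.

    disjoint=True treats the two counts as separate runs; disjoint=False
    assumes failed runs are already included in success_runs.
    """
    total = success_runs + failure_runs if disjoint else success_runs
    return failure_runs / total if total else 0.0


print(f"{agent_failure_rate(387, 9):.1%}")                  # 9 / 396
print(f"{agent_failure_rate(387, 9, disjoint=False):.1%}")  # 9 / 387
```

Both readings round to 2.3%, so the headline rate does not depend on which interpretation of the counts is correct.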


Generated by 🚨 Daily Reliability Review · opus48 1.9M ·

  • expires on May 31, 2026, 11:24 PM UTC
