[otel-advisor] OTel improvement: surface token usage from agent-stdio.log when firewall-proxy logs are absent

### 📡 OTel Instrumentation Improvement: surface token usage from `agent-stdio.log` when firewall-proxy logs are absent

**Analysis Date**: 2026-05-22
**Priority**: High
**Effort**: Small (< 2h)

### Problem

`gen_ai.usage.total_tokens` (and the sibling `gen_ai.usage.input_tokens` / `output_tokens` / cache attributes) are missing from the **majority of `gh-aw.agent.conclusion` spans** in production telemetry. Token data is currently sourced **only** from `/tmp/gh-aw/agent_usage.json`, which is written by `parse_token_usage.cjs` from the firewall proxy log at `/tmp/gh-aw/sandbox/firewall*/logs/api-proxy-logs/token-usage.jsonl`. When the firewall proxy isn't active for a given engine — or when its log path differs from the two hard-coded paths — `agent_usage.json` is never written, and **no token data ever reaches OTel**. The Claude / Copilot stream-json `result` event already carries the same usage data on disk in `agent-stdio.log`, but `readAgentRuntimeMetrics()` in `actions/setup/js/send_otlp_span.cjs` reads only `num_turns`, `total_cost_usd`, `stop_reason`, and `model` from that event — it ignores the `usage` block entirely.

<details>
<summary>Why This Matters (DevOps Perspective)</summary>

Without `gen_ai.usage.*` on at least 2 of every 3 agent.conclusion spans, the following operational questions cannot be answered from telemetry alone:

- "Which workflow consumed the most input/output tokens this week?" (`sum(gen_ai.usage.total_tokens) by gh-aw.workflow.name` omits ~72% of spans across all engines, so it undercounts.)
- "Are we approaching token quotas for a model?" — model-level token aggregations are unreliable.
- "What's the cost-per-trigger for our scheduled workflows?" — cost dashboards built on `gen_ai.usage.*` show only a sliver of real usage.
- "Is engine X significantly more expensive than engine Y?" — engine comparisons are skewed (pi/gemini show 0 tokens, biasing them as "free").

For an oncall engineer triaging a cost spike or quota incident, the missing data forces them to fall back to GitHub job logs and per-workflow firewall artifacts, dramatically increasing MTTR.

</details>

<details>
<summary>Current Behavior</summary>

In `actions/setup/js/send_otlp_span.cjs`, the agent stdio parser reads only `num_turns`, `total_cost_usd`, `stop_reason`, and `model` from `{"type": "result", ...}` events:

```javascript
// actions/setup/js/send_otlp_span.cjs lines 1555–1567
if (parsed.type !== "result") {
  return;
}

if (typeof parsed.num_turns === "number" && parsed.num_turns >= 0) {
  metrics.turns = parsed.num_turns;
}
if (typeof parsed.total_cost_usd === "number" && Number.isFinite(parsed.total_cost_usd) && parsed.total_cost_usd >= 0) {
  metrics.estimatedCostUsd = parsed.total_cost_usd;
}
if (typeof parsed.stop_reason === "string" && parsed.stop_reason) {
  metrics.stopReason = parsed.stop_reason;
}
```

The Claude / Copilot `result` event additionally carries a `usage` object that the parser ignores:

```jsonc
{
  "type": "result",
  "subtype": "success",
  "num_turns": 12,
  "total_cost_usd": 0.42,
  "usage": {
    "input_tokens": 4120,
    "output_tokens": 870,
    "cache_creation_input_tokens": 1500,
    "cache_read_input_tokens": 2200
  }
}
```
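
As a rough illustration of how that `usage` block could be pulled out of `agent-stdio.log`, here is a minimal, self-contained sketch. `extractUsageFromStdio` and `sampleLog` are hypothetical names invented for this example; the real parser lives in `readAgentRuntimeMetrics()` and may tokenize the log differently. The sketch assumes one JSON event per line, mixed with plain-text noise.

```javascript
// Hypothetical helper for illustration; not the actual code in send_otlp_span.cjs.
// Scans newline-delimited log text and returns token counts from the last
// `result` event that carries a `usage` object.
function extractUsageFromStdio(logText) {
  const isCount = value => typeof value === "number" && Number.isFinite(value) && value >= 0;
  let usage = {};

  for (const line of logText.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("{")) continue; // skip plain-text log noise

    let parsed;
    try {
      parsed = JSON.parse(trimmed);
    } catch {
      continue; // tolerate truncated or malformed JSON lines
    }
    if (parsed.type !== "result" || !parsed.usage || typeof parsed.usage !== "object") continue;

    const u = parsed.usage;
    const next = {};
    if (isCount(u.input_tokens)) next.inputTokens = u.input_tokens;
    if (isCount(u.output_tokens)) next.outputTokens = u.output_tokens;
    if (isCount(u.cache_read_input_tokens)) next.cacheReadTokens = u.cache_read_input_tokens;
    if (isCount(u.cache_creation_input_tokens)) next.cacheWriteTokens = u.cache_creation_input_tokens;
    usage = next; // a later result event replaces an earlier one
  }
  return usage;
}

// Sample log content assembled from the event shape shown above.
const sampleLog = [
  "starting agent...",
  '{"type":"assistant","message":"working"}',
  '{"type":"result","subtype":"success","num_turns":12,"usage":{"input_tokens":4120,"output_tokens":870,"cache_creation_input_tokens":1500,"cache_read_input_tokens":2200}}',
].join("\n");

console.log(extractUsageFromStdio(sampleLog));
```

Skipping malformed lines instead of throwing keeps a truncated log from discarding an otherwise valid `result` event.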

Downstream, `sendJobConclusionSpan` only reads token data from `/tmp/gh-aw/agent_usage.json`:

```javascript
// actions/setup/js/send_otlp_span.cjs lines 2019–2040
const agentUsage = readJSONIfExists("/tmp/gh-aw/agent_usage.json") || {};
const usageAttrs = [];
if (typeof agentUsage.input_tokens === "number" && agentUsage.input_tokens > 0) {
  usageAttrs.push(buildAttr("gen_ai.usage.input_tokens", agentUsage.input_tokens));
}
// ...same for output_tokens, cache_read_tokens, cache_write_tokens, total_tokens
```

When `agent_usage.json` is absent (no firewall proxy log), `usageAttrs` stays empty and no `gen_ai.usage.*` attribute is emitted.

</details>

<details>
<summary>Proposed Change</summary>

Extend `readAgentRuntimeMetrics()` to also extract the `usage` block, and use it as a fallback in `sendJobConclusionSpan` when `agent_usage.json` is missing or has zero counts.

```javascript
// 1) In readAgentRuntimeMetrics (actions/setup/js/send_otlp_span.cjs):
// extend AgentRuntimeMetrics with optional usage fields
// @property {number | undefined} inputTokens
// @property {number | undefined} outputTokens
// @property {number | undefined} cacheReadTokens
// @property {number | undefined} cacheWriteTokens

// after the `if (parsed.type !== "result") return;` guard, add:
if (parsed.usage && typeof parsed.usage === "object") {
  const u = parsed.usage;
  if (typeof u.input_tokens === "number" && u.input_tokens >= 0) {
    metrics.inputTokens = u.input_tokens;
  }
  if (typeof u.output_tokens === "number" && u.output_tokens >= 0) {
    metrics.outputTokens = u.output_tokens;
  }
  if (typeof u.cache_read_input_tokens === "number" && u.cache_read_input_tokens >= 0) {
    metrics.cacheReadTokens = u.cache_read_input_tokens;
  }
  if (typeof u.cache_creation_input_tokens === "number" && u.cache_creation_input_tokens >= 0) {
    metrics.cacheWriteTokens = u.cache_creation_input_tokens;
  }
}

// 2) In sendJobConclusionSpan, after `const agentUsage = readJSONIfExists(...) || {};`:
// fall back to runtimeMetrics fields when agent_usage.json lacks the value.
const usage = {
  input_tokens: agentUsage.input_tokens || runtimeMetrics.inputTokens,
  output_tokens: agentUsage.output_tokens || runtimeMetrics.outputTokens,
  cache_read_tokens: agentUsage.cache_read_tokens || runtimeMetrics.cacheReadTokens,
  cache_write_tokens: agentUsage.cache_write_tokens || runtimeMetrics.cacheWriteTokens,
};

// then use `usage.*` in place of `agentUsage.*` when building usageAttrs.
```

The fallback path is non-destructive: when the firewall log is present, `agent_usage.json` wins (preserving today's behavior); when it's absent, the stream-json `result` event fills the gap.
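
The precedence rule can also be sketched as a standalone function that is easy to unit test. `pickCount` and `resolveUsage` are hypothetical names, not existing helpers in `send_otlp_span.cjs`. How `total_tokens` should be recomputed is not specified in this issue, so the sketch assumes it is the sum of input and output tokens whenever `agent_usage.json` does not supply its own total; that assumption should be checked against `parse_token_usage.cjs` before adopting it.

```javascript
// Hypothetical helper: returns the first positive finite number, else undefined.
function pickCount(primary, fallback) {
  const valid = value => typeof value === "number" && Number.isFinite(value) && value > 0;
  if (valid(primary)) return primary;
  if (valid(fallback)) return fallback;
  return undefined;
}

// Hypothetical helper: agent_usage.json values win; stream-json values fill gaps.
function resolveUsage(agentUsage = {}, runtimeMetrics = {}) {
  const usage = {
    input_tokens: pickCount(agentUsage.input_tokens, runtimeMetrics.inputTokens),
    output_tokens: pickCount(agentUsage.output_tokens, runtimeMetrics.outputTokens),
    cache_read_tokens: pickCount(agentUsage.cache_read_tokens, runtimeMetrics.cacheReadTokens),
    cache_write_tokens: pickCount(agentUsage.cache_write_tokens, runtimeMetrics.cacheWriteTokens),
  };

  // ASSUMPTION: total = input + output when agent_usage.json has no total_tokens.
  const summed = (usage.input_tokens || 0) + (usage.output_tokens || 0);
  usage.total_tokens = pickCount(agentUsage.total_tokens, summed);
  return usage;
}

// Firewall log absent: every value comes from the stream-json result event.
console.log(resolveUsage({}, { inputTokens: 4120, outputTokens: 870, cacheReadTokens: 2200, cacheWriteTokens: 1500 }));

// Both sources present: agent_usage.json wins.
console.log(resolveUsage({ input_tokens: 100, output_tokens: 50, total_tokens: 150 }, { inputTokens: 4120, outputTokens: 870 }));
```

Fields that neither source provides stay `undefined`, so the existing `typeof ... === "number"` checks would continue to skip them when building `usageAttrs`.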

</details>

<details>
<summary>Expected Outcome</summary>

After this change:

- **In Grafana / Tempo / Sentry**: `sum(gen_ai.usage.total_tokens) by gh-aw.workflow.name` becomes meaningful. Coverage for engines that emit a stream-json `result` event (Claude, Copilot) should rise from 28–34% toward 95%+ on successful runs.
- **In the local `/tmp/gh-aw/otel.jsonl` mirror**: agent-job spans on machines without the firewall proxy will carry usage attributes for the first time, enabling offline cost analysis from artifact downloads alone.
- **For on-call engineers**: a single Sentry / Grafana query (`sum(gen_ai.usage.total_tokens)`) answers "which workflow burned tokens?" without cross-referencing per-job firewall artifacts.

</details>

<details>
<summary>Implementation Steps</summary>

- [ ] Extend the `AgentRuntimeMetrics` typedef and `readAgentRuntimeMetrics()` parser in `actions/setup/js/send_otlp_span.cjs` to capture `usage.input_tokens`, `usage.output_tokens`, `usage.cache_read_input_tokens`, `usage.cache_creation_input_tokens` from `{"type": "result", ...}` events.
- [ ] In `sendJobConclusionSpan`, prefer `agent_usage.json` values when present (truthy) but fall back to `runtimeMetrics.*Tokens` when they are missing or zero. Recompute `totalTokens` from the resolved values.
- [ ] Update `actions/setup/js/send_otlp_span.test.cjs` with two new cases:
 1. `agent_usage.json` absent + `agent-stdio.log` contains a `result` event with a `usage` block → conclusion span carries `gen_ai.usage.input_tokens` / `output_tokens` / `total_tokens`.
 2. Both sources present → `agent_usage.json` wins (regression guard).
- [ ] Run `cd actions/setup/js && npx vitest run send_otlp_span` to confirm tests pass.
- [ ] Run `make fmt` and `make test-unit` from the repo root.
- [ ] Open a PR referencing this issue.

</details>

<details>
<summary>Evidence from Live OTel Data (Sentry/Grafana)</summary>

**Sentry — `github / gh-aw`, dataset `spans`, last 7 days**, grouped by `gh-aw.engine.id`:

| engine | spans | spans with `gen_ai.usage.total_tokens > 0` | missing % |
|---|---:|---:|---:|
| copilot | 1,073 | 297 | **72%** |
| claude | 324 | 111 | **66%** |
| codex | 112 | 26 | **77%** |
| pi | 22 | 0 | **100%** |
| gemini | 16 | 0 | **100%** |
| **total** | **1,547** | **434** | **~72%** |

Query:
```
span.name:gh-aw.agent.conclusion
fields: gh-aw.engine.id, count(), count_if(gen_ai.usage.total_tokens, greater, 0)
statsPeriod: 7d
```
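
The "missing %" column can be reproduced from the two count columns. The sketch below hard-codes the counts from the table above; `missingPercent` is a hypothetical helper written for this check, and rounding to the nearest whole percent is an assumption about how the table was produced.

```javascript
// Counts copied from the Sentry table above.
const rows = [
  { engine: "copilot", spans: 1073, withTokens: 297 },
  { engine: "claude", spans: 324, withTokens: 111 },
  { engine: "codex", spans: 112, withTokens: 26 },
  { engine: "pi", spans: 22, withTokens: 0 },
  { engine: "gemini", spans: 16, withTokens: 0 },
];

// Hypothetical helper: share of spans lacking token data, as a whole percent.
function missingPercent(spans, withTokens) {
  if (spans === 0) return 0;
  return Math.round(((spans - withTokens) / spans) * 100);
}

// Sum both count columns to get the "total" row.
const totals = rows.reduce(
  (acc, row) => ({ spans: acc.spans + row.spans, withTokens: acc.withTokens + row.withTokens }),
  { spans: 0, withTokens: 0 }
);

for (const row of [...rows, { engine: "total", ...totals }]) {
  console.log(`${row.engine}: ${missingPercent(row.spans, row.withTokens)}% missing`);
}
```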

**Grafana / Tempo (`grafanacloud-traces`)** confirms the attribute keys: the `span`-scope tag list includes `gh-aw.engine.id`, `gh-aw.workflow.name`, and `gh-aw.action_minutes`, but **does not include** `gh-aw.turns`, `gh-aw.estimated_cost_usd`, or `gen_ai.response.model`. This suggests the existing `result`-event derived attributes are also missing. Token data extracted from the same event would be emitted through the separate `usageAttrs` path proposed above, but it still depends on the `result` event being parsed, so if that parsing turns out to be the cause it needs follow-up debugging as well.

**Representative trace**: `1e395bf7dd92c4e6eee4162ff0b78906` (`gh-aw.activation.setup` → `gh-aw.agent.setup` → `gh-aw.agent.conclusion`). Engine `copilot`, workflow `Daily MCP Tool Concurrency Analysis`. The `gh-aw.agent.conclusion` span carries `gh-aw.run.status=success`, `gh-aw.engine.id=copilot`, `gen_ai.system=github_models` — but no `gen_ai.usage.*` attributes.

</details>

<details>
<summary>Related Files</summary>

- `actions/setup/js/send_otlp_span.cjs` — `readAgentRuntimeMetrics()` (lines 1533–1612), `sendJobConclusionSpan()` token-attribute block (lines 2019–2040)
- `actions/setup/js/send_otlp_span.test.cjs` — add new vitest cases for the fallback path
- `actions/setup/js/parse_token_usage.cjs` — unchanged; remains the preferred source when firewall logs exist
- `actions/setup/js/action_conclusion_otlp.cjs` — unchanged; sends the enriched span

</details>

---

*Generated by the [Daily OTel Instrumentation Advisor](https://github.com/github/gh-aw/actions/runs/26281891692) workflow*

> Generated by [📊 Daily OTel Instrumentation Advisor](https://github.com/github/gh-aw/actions/runs/26281891692) · ● 50.7M · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fdaily-otel-instrumentation-advisor%22&type=issues)
> - [x] expires on May 29, 2026, 10:32 AM UTC
