[otel-advisor] OTel improvement: normalize INPUT_JOB_NAME in conclusion post step so gh-aw.agent.conclusion spans carry GenAI attributes #34952

### 📡 OTel Instrumentation Improvement: Restore agent-conclusion GenAI attributes by mirroring the setup-step env normalization

**Analysis Date**: 2026-05-26 
**Priority**: High 
**Effort**: Small (< 2h)

### Problem

Every `gh-aw.agent.conclusion` span emitted in the last 24 hours — across Matt Pocock Skills Reviewer, Test Quality Sentinel, Design Decision Gate, PR Sous Chef, Issue Monster, Daily Malicious Code Scan, Spec Librarian, CI Coach, Smoke CI, and PR Code Quality Reviewer — is missing the attributes that distinguish an agent (LLM) job from a non-agent job:

| Attribute | Expected on `agent` job? | Observed on `gh-aw.agent.conclusion`? |
|---|---|---|
| `gh-aw.job.name` | yes | ❌ missing |
| `gen_ai.operation.name` | yes (`chat`) | ❌ missing |
| `gen_ai.response.model` | yes | ❌ missing |
| `gen_ai.response.finish_reasons` | yes (always, with `unknown` fallback) | ❌ missing |
| `gen_ai.usage.input_tokens` / `output_tokens` / `total_tokens` | yes | ❌ missing |
| `gh-aw.turns`, `gh-aw.estimated_cost_usd` | yes | ❌ missing |
| dedicated `gh-aw.agent.agent` (LLM) sub-span | yes | ❌ zero such spans in last 14 days |
| `gh-aw.effective_tokens` (NOT gated on jobName) | yes | ✅ present |
| `gen_ai.request.model` (NOT gated on jobName) | yes | ✅ present |

Every missing field above is gated on `if (jobName === "agent")` (or its derivative `hasDedicatedAgentSpan`) inside `sendJobConclusionSpan` in `actions/setup/js/send_otlp_span.cjs` (`jobName` is read at line 1790; the gated branches are at lines 1881, 1918-1931, 2124, 2165). Every field still present is **not** gated on `jobName`. This is a deterministic signature: `sendJobConclusionSpan` is reading `jobName = ""` in the agent job's post step, while the matching `gh-aw.agent.setup` span on the same job correctly carries `gh-aw.job.name: agent`. Comparing traces from 2026-05-21 and 2026-05-26 confirms this is a regression: token data was present on the earlier date and is absent on the later one.
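
The signature can be reproduced in isolation. The sketch below is a simplified stand-in, not the code in `send_otlp_span.cjs`: `buildAttr` is a minimal hypothetical version, `conclusionAttributes` and its `sample` argument are names invented for illustration, and the token counts are placeholder values. It only shows that an empty `jobName` drops exactly the gated attributes while the ungated ones survive.

```javascript
// Hypothetical minimal attribute builder (the real buildAttr may differ).
function buildAttr(key, value) {
  return { key, value };
}

// Illustrative only: mirrors the gating pattern described above.
function conclusionAttributes(jobName, sample) {
  const attributes = [];

  // Ungated: emitted regardless of jobName.
  if (sample.model) attributes.push(buildAttr("gen_ai.request.model", sample.model));
  if (sample.effectiveTokens > 0) {
    attributes.push(buildAttr("gh-aw.effective_tokens", sample.effectiveTokens));
  }

  // Gated: skipped entirely when jobName is "".
  if (jobName) attributes.push(buildAttr("gh-aw.job.name", jobName));
  if (jobName === "agent") {
    attributes.push(buildAttr("gen_ai.operation.name", "chat"));
    attributes.push(buildAttr("gen_ai.usage.total_tokens", sample.totalTokens));
  }

  return attributes.map(attr => attr.key);
}

// Placeholder values, not taken from a real run.
const sample = { model: "example-model", effectiveTokens: 1000, totalTokens: 800 };

console.log(conclusionAttributes("", sample)); // only the two ungated keys
console.log(conclusionAttributes("agent", sample)); // ungated keys plus the gated ones
```

With `jobName = ""` only `gen_ai.request.model` and `gh-aw.effective_tokens` remain, which matches the pattern in the table above.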

<details>
<summary>Why This Matters (DevOps Perspective)</summary>

Agent-job observability is the highest-value telemetry this repo emits: it is the only signal that tells an on-call engineer whether the LLM stopped early, ran for too many turns, burned tokens beyond budget, or chose the wrong model. With these fields absent on `gh-aw.agent.conclusion`:

- Backend dashboards filtering on `gen_ai.operation.name:chat` exclude every gh-aw agent span and show zero LLM activity.
- `sum(gen_ai.usage.total_tokens)` by workflow returns nothing for agent runs (only the duplicated copy on `safe_outputs.conclusion` survives, which inflates downstream-job tokens and undercounts agent tokens).
- The dedicated `gh-aw.agent.agent` sub-span — designed to measure pure LLM latency excluding setup overhead — is never emitted, so p95 LLM duration alerting is impossible.
- Alerts keyed on `gen_ai.response.finish_reasons:length` (truncated outputs) silently never fire.

For incident triage, this is the difference between “LLM failed” being one query away vs. requiring artifact downloads.

</details>

<details>
<summary>Current Behavior</summary>

The setup step normalizes the env var so that `sendJobSetupSpan` (which reads `process.env.INPUT_JOB_NAME` directly) always sees the value, even on runner versions that preserve the hyphen form `INPUT_JOB-NAME`:

```javascript
// actions/setup/js/action_setup_otlp.cjs:81-87
const inputJobName = getActionInput("JOB_NAME");
if (inputJobName) {
  process.env.INPUT_JOB_NAME = inputJobName;
}
if (inputParentSpanId) {
  process.env.INPUT_PARENT_SPAN_ID = inputParentSpanId;
}
```

The conclusion post step does **not** do the same normalization:

```javascript
// actions/setup/js/action_conclusion_otlp.cjs:72-86
async function run() {
  const endpoints = process.env.GH_AW_OTLP_ENDPOINTS;
  const spanName = buildSpanName(getActionInput("JOB_NAME")); // ← resolves hyphen form for the span NAME
  const startMs = parseJobStartMs(process.env.GITHUB_AW_OTEL_JOB_START_MS);

  if (endpoints) {
    console.log(`[otlp] sending conclusion span "${spanName}" to configured endpoints`);
  } else {
    console.log("[otlp] GH_AW_OTLP_ENDPOINTS not set, skipping OTLP export (will attempt JSONL mirror)");
  }

  await sendOtlpSpan.sendJobConclusionSpan(spanName, { startMs });
  // ← spanName="gh-aw.agent.conclusion" is correct, but inside sendJobConclusionSpan
  // `const jobName = process.env.INPUT_JOB_NAME || ""` returns "" when only the
  // hyphen form is set, so every `if (jobName === "agent")` branch is skipped.
}
```

Inside `sendJobConclusionSpan` (`send_otlp_span.cjs:1790`):

```javascript
const jobName = process.env.INPUT_JOB_NAME || ""; // ← only checks underscore form
// ...
if (jobName) attributes.push(buildAttr("gh-aw.job.name", jobName));
if (jobName === "agent") {
  attributes.push(buildAttr("gen_ai.operation.name", "chat"));
  // gen_ai.response.model, gen_ai.response.finish_reasons, etc.
}
const hasDedicatedAgentSpan = jobName === "agent" && /* ... */;
if (!hasDedicatedAgentSpan && jobName === "agent") {
  attributes.push(...usageAttrs);
}
```

The setup span survives this same pattern only because `action_setup_otlp.cjs` already promotes the hyphen form into the underscore form before invoking `sendJobSetupSpan`.
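
To make the difference between the two read paths concrete, here is a minimal sketch of a resolver that accepts both forms. `resolveActionInput` is a hypothetical name, and the lookup order (underscore form first, then hyphen form) is an assumption; the real `getActionInput` in `action_input_utils.cjs` may be implemented differently.

```javascript
// Hypothetical resolver: checks INPUT_<NAME> with underscores, then with hyphens.
// The lookup order is an assumption, not taken from action_input_utils.cjs.
function resolveActionInput(name, env) {
  const underscoreKey = `INPUT_${name}`;
  const hyphenKey = `INPUT_${name.replace(/_/g, "-")}`;
  return env[underscoreKey] || env[hyphenKey] || "";
}

// Simulated environment where only the hyphen form is set.
const runnerEnv = { "INPUT_JOB-NAME": "agent" };

// Direct read of the underscore form, as sendJobConclusionSpan does today.
const directRead = runnerEnv.INPUT_JOB_NAME || "";

// Resolver-based read, as buildSpanName's argument is obtained today.
const resolved = resolveActionInput("JOB_NAME", runnerEnv);

console.log(JSON.stringify(directRead), JSON.stringify(resolved)); // prints: "" "agent"
```

This is why the span name comes out right while the attributes gated on `jobName` do not: the two values are obtained through different paths.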

</details>

<details>
<summary>Proposed Change</summary>

Mirror the setup-step normalization in `actions/setup/js/action_conclusion_otlp.cjs`:

```javascript
// actions/setup/js/action_conclusion_otlp.cjs (inside run(), before sendJobConclusionSpan)
async function run() {
  const endpoints = process.env.GH_AW_OTLP_ENDPOINTS;

  // Normalize to the canonical underscore form so sendJobConclusionSpan
  // (which reads process.env.INPUT_JOB_NAME directly) always finds the value,
  // matching the normalization done by action_setup_otlp.cjs at setup time.
  const inputJobName = getActionInput("JOB_NAME");
  if (inputJobName) {
    process.env.INPUT_JOB_NAME = inputJobName;
  }

  const spanName = buildSpanName(inputJobName);
  const startMs = parseJobStartMs(process.env.GITHUB_AW_OTEL_JOB_START_MS);
  // ...rest unchanged
  await sendOtlpSpan.sendJobConclusionSpan(spanName, { startMs });
}
```

This is the minimal, behavior-preserving fix and stays consistent with the existing setup-step pattern. A follow-up cleanup could replace the direct `process.env.INPUT_JOB_NAME` reads inside `sendJobSetupSpan` and `sendJobConclusionSpan` with `getActionInput("JOB_NAME")` so the two paths can't drift again — but that is a larger refactor and not required for the regression fix.

</details>

<details>
<summary>Expected Outcome</summary>

After this change, `gh-aw.agent.conclusion` spans should once again carry the jobName-gated attributes, and the dedicated `gh-aw.agent.agent` LLM span should be emitted again. Concretely:

- In Grafana / Sentry / Honeycomb / Datadog: queryable `gen_ai.operation.name:chat`, `gen_ai.usage.total_tokens`, `gen_ai.response.finish_reasons:length` (truncation detection), `gen_ai.response.model`, `gh-aw.turns`, `gh-aw.estimated_cost_usd`, and `gh-aw.job.name:agent` for filtering. Sum-over-workflow dashboards will stop missing agent-job tokens.
- A dedicated `gh-aw.agent.agent` span (CLIENT kind) per agent execution, allowing p95 LLM-latency alerting separately from total job duration.
- In the JSONL mirror (`/tmp/gh-aw/otel.jsonl`): the same attributes survive, so artifact-only debugging is also restored.
- For on-call: a single query `span.name:gh-aw.agent.conclusion AND gen_ai.response.finish_reasons:length` becomes sufficient to detect every truncated agent response across all workflows.

</details>

<details>
<summary>Implementation Steps</summary>

- [ ] Edit `actions/setup/js/action_conclusion_otlp.cjs`: copy the 4-line `INPUT_JOB_NAME` normalization from `action_setup_otlp.cjs:81-84` into the top of `run()`, before constructing `spanName`.
- [ ] Extend `actions/setup/js/action_conclusion_otlp.test.cjs`: add a test that sets only `process.env["INPUT_JOB-NAME"] = "agent"` (hyphen form), calls `run()`, and asserts that `process.env.INPUT_JOB_NAME === "agent"` after the call. This locks in the contract that `sendJobConclusionSpan` will see the canonical form.
- [ ] Run `cd actions/setup/js && npx vitest run action_conclusion_otlp.test.cjs send_otlp_span.test.cjs` to confirm tests pass.
- [ ] Run `make fmt` to ensure formatting.
- [ ] Open a PR referencing this issue.
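
The shape of the hyphen-form test can be sketched without the real module. In the sketch below, `runStandIn` is a hypothetical stand-in for the normalization part of `run()` (the real `run()` is async and also sends the span), and the check uses a plain `throw` rather than vitest's `expect`, so it runs under plain Node. The point it illustrates is saving and restoring `process.env` around the assertion so the test does not leak state into other tests.

```javascript
// Hypothetical stand-in for the normalization performed at the top of run().
// Assumed resolution order: underscore form first, then hyphen form.
function runStandIn(env) {
  const inputJobName = env.INPUT_JOB_NAME || env["INPUT_JOB-NAME"] || "";
  if (inputJobName) {
    env.INPUT_JOB_NAME = inputJobName;
  }
  return inputJobName;
}

function testHyphenFormIsNormalized() {
  const keys = ["INPUT_JOB_NAME", "INPUT_JOB-NAME"];
  const saved = keys.map(key => process.env[key]);
  try {
    // Arrange: only the hyphen form is set.
    delete process.env.INPUT_JOB_NAME;
    process.env["INPUT_JOB-NAME"] = "agent";

    // Act.
    runStandIn(process.env);

    // Assert: the canonical underscore form is now populated.
    if (process.env.INPUT_JOB_NAME !== "agent") {
      throw new Error("INPUT_JOB_NAME was not normalized from the hyphen form");
    }
    return true;
  } finally {
    // Restore the environment exactly as it was before the test.
    keys.forEach((key, i) => {
      if (saved[i] === undefined) delete process.env[key];
      else process.env[key] = saved[i];
    });
  }
}

console.log(testHyphenFormIsNormalized() ? "ok" : "failed");
```

In the actual test file the same arrange/act/assert steps would wrap the real `run()`, with the span sender mocked in whatever way the existing tests already do.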

</details>

<details>
<summary>Evidence from Live OTel Data (Sentry/Grafana)</summary>

**Grafana Tempo — full trace `4655a6296a6d36b45a1a62a328bbe244` (Matt Pocock Skills Reviewer, run 26455647757, 2026-05-26T14:50:27Z):**

The `gh-aw.agent.conclusion` span (id=`0a9f22a916c26aa2`, parent=`b867e64153c0916a` which is `gh-aw.agent.setup`) has:

- `gen_ai.request.model: claude-sonnet-4.6` ✅ (set unconditionally from `awInfo.model`)
- `gh-aw.effective_tokens: 1219039` ✅ (set unconditionally when > 0)
- `gh-aw.action_minutes: 3.874` ✅
- `gh-aw.run.status: success` ✅
- `gh-aw.job.name`: ❌ **absent** (gated on `if (jobName)`)
- `gen_ai.operation.name`: ❌ **absent** (gated on `if (jobName === "agent")`)
- `gen_ai.response.model`: ❌ **absent**
- `gen_ai.response.finish_reasons`: ❌ **absent**
- `gen_ai.usage.*`: ❌ **absent**
- `gh-aw.turns`, `gh-aw.estimated_cost_usd`: ❌ **absent**

The matching `gh-aw.agent.setup` span (id=`b867e64153c0916a`) in the same trace **does** carry `gh-aw.job.name: agent`, confirming the asymmetry is in the post-step path only.
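
The same check can be scripted against exported span data. The sketch below assumes each span object exposes `attributes` in the OTLP/JSON shape (an array of `{ key, value }` entries); whether the JSONL mirror or a given backend export uses exactly this layout is an assumption to verify. `missingGatedKeys` and `GATED_KEYS` are illustrative names, and the sample span holds only a subset of the attributes listed above.

```javascript
// Attribute keys this issue reports as gated on jobName.
const GATED_KEYS = [
  "gh-aw.job.name",
  "gen_ai.operation.name",
  "gen_ai.response.model",
  "gen_ai.response.finish_reasons",
  "gen_ai.usage.input_tokens",
  "gen_ai.usage.output_tokens",
  "gen_ai.usage.total_tokens",
  "gh-aw.turns",
  "gh-aw.estimated_cost_usd",
];

// Assumes span.attributes follows the OTLP/JSON shape: [{ key, value: {...} }].
function missingGatedKeys(span) {
  const present = new Set((span.attributes || []).map(attr => attr.key));
  return GATED_KEYS.filter(key => !present.has(key));
}

// Sample modeled on the conclusion span described above (subset of attributes).
const observedSpan = {
  name: "gh-aw.agent.conclusion",
  attributes: [
    { key: "gen_ai.request.model", value: { stringValue: "claude-sonnet-4.6" } },
    { key: "gh-aw.effective_tokens", value: { intValue: "1219039" } },
    { key: "gh-aw.run.status", value: { stringValue: "success" } },
  ],
};

console.log(missingGatedKeys(observedSpan)); // lists every gated key for this sample
```

One caveat when using such a check after the fix: per the gating shown under Current Behavior, usage attributes are pushed onto the conclusion span only when there is no dedicated agent span, so `gen_ai.usage.*` keys reported as missing here may simply live on the `gh-aw.agent.agent` span instead.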

**Sentry spans dataset — query `span.name:gh-aw.agent.conclusion timestamp:>2026-05-25`** returns 14+ spans across multiple workflows (Matt Pocock Skills Reviewer, Test Quality Sentinel, Design Decision Gate, PR Sous Chef, Issue Monster, Daily Malicious Code Scan, Spec Librarian, CI Coach, Smoke CI). **All** are missing the jobName-gated attributes above.

**Regression evidence:** Sentry query `span.name:gh-aw.agent.conclusion gen_ai.usage.total_tokens:>0` returns matches from 2026-05-21 (`gen_ai.usage.total_tokens` present, e.g. 682735, 2116561, 1269362) but **zero** matches from 2026-05-25 onward.

**Cross-backend consistency:** Sentry and Grafana Tempo agree on the attribute set — the omission is at the emit side, not an ingestion gap.

**Connectivity checks performed:** Sentry `whoami` ✅, Sentry `find_organizations` ✅ (github), Sentry `find_projects` ✅ (gh-aw, project id 4511347087179777), Grafana `list_datasources` ✅, Grafana `tempo_traceql-search` ✅, Grafana `tempo_get-trace` ✅. The Sentry MCP build available here does not expose `search_events`, so `list_events` with explicit `fields` was used throughout — this is a tool limitation, not a data gap.

</details>

<details>
<summary>Related Files</summary>

- `actions/setup/js/action_conclusion_otlp.cjs` (the fix: add the 4-line normalization in `run()`)
- `actions/setup/js/action_setup_otlp.cjs` (reference implementation — lines 81-84)
- `actions/setup/js/send_otlp_span.cjs` (consumer that reads `process.env.INPUT_JOB_NAME` directly at line 1790; jobName-gated branches at lines 1881, 1918-1931, 2124, 2165)
- `actions/setup/js/action_conclusion_otlp.test.cjs` (extend with a hyphen-form normalization test)
- `actions/setup/js/action_input_utils.cjs` (provides `getActionInput` which already handles both forms)

</details>

---

*Generated by the [Daily OTel Instrumentation Advisor](https://github.com/github/gh-aw/actions/runs/26456717308) workflow*

> Generated by [📊 Daily OTel Instrumentation Advisor](https://github.com/github/gh-aw/actions/runs/26456717308) · opus47 34.3M · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fdaily-otel-instrumentation-advisor%22&type=issues)
> - [x] expires on Jun 2, 2026, 3:21 PM UTC
