[otel-advisor] OTel improvement: normalize INPUT_JOB_NAME in conclusion post step so gh-aw.agent.conclusion spans carry GenAI attributes #34952

@github-actions

📡 OTel Instrumentation Improvement: Restore agent-conclusion GenAI attributes by mirroring the setup-step env normalization

Analysis Date: 2026-05-26
Priority: High
Effort: Small (< 2h)

Problem

Every gh-aw.agent.conclusion span emitted in the last 24 hours — across Matt Pocock Skills Reviewer, Test Quality Sentinel, Design Decision Gate, PR Sous Chef, Issue Monster, Daily Malicious Code Scan, Spec Librarian, CI Coach, Smoke CI, and PR Code Quality Reviewer — is missing the attributes that distinguish an agent (LLM) job from a non-agent job:

| Attribute | Expected on agent job? | Observed on gh-aw.agent.conclusion? |
| --- | --- | --- |
| gh-aw.job.name | yes | ❌ missing |
| gen_ai.operation.name | yes (chat) | ❌ missing |
| gen_ai.response.model | yes | ❌ missing |
| gen_ai.response.finish_reasons | yes (always, with unknown fallback) | ❌ missing |
| gen_ai.usage.input_tokens / output_tokens / total_tokens | yes | ❌ missing |
| gh-aw.turns, gh-aw.estimated_cost_usd | yes | ❌ missing |
| dedicated gh-aw.agent.agent (LLM) sub-span | yes | ❌ zero such spans in last 14 days |
| gh-aw.effective_tokens (NOT gated on jobName) | yes | ✅ present |
| gen_ai.request.model (NOT gated on jobName) | yes | ✅ present |

Every missing field above is gated on if (jobName === "agent") (or its derivative hasDedicatedAgentSpan) inside sendJobConclusionSpan (actions/setup/js/send_otlp_span.cjs:1790,1918-1931,2124,2165). Every field still present is not gated on jobName. This is a deterministic signature: sendJobConclusionSpan is reading jobName = "" for the agent-job's post step, while the matching gh-aw.agent.setup span on the same job correctly carries gh-aw.job.name: agent. Comparison of 2026-05-21 vs 2026-05-26 traces confirms this is a regression — token data used to flow through.

Why This Matters (DevOps Perspective)

Agent-job observability is the highest-value telemetry this repo emits: it is the only signal that tells an on-call engineer whether the LLM stopped early, ran for too many turns, burned tokens beyond budget, or chose the wrong model. With these fields absent on gh-aw.agent.conclusion:

  • Backend dashboards filtering on gen_ai.operation.name:chat exclude every gh-aw agent span and show zero LLM activity.
  • sum(gen_ai.usage.total_tokens) by workflow returns nothing for agent runs (only the duplicated copy on safe_outputs.conclusion survives, which inflates downstream-job tokens and undercounts agent tokens).
  • The dedicated gh-aw.agent.agent sub-span — designed to measure pure LLM latency excluding setup overhead — is never emitted, so p95 LLM duration alerting is impossible.
  • Alerts keyed on gen_ai.response.finish_reasons:length (truncated outputs) silently never fire.

For incident triage, this is the difference between “LLM failed” being one query away vs. requiring artifact downloads.

Current Behavior

The setup step normalizes the env var so that sendJobSetupSpan (which reads process.env.INPUT_JOB_NAME directly) always sees the value, even on runner versions that preserve the hyphen form INPUT_JOB-NAME:

// actions/setup/js/action_setup_otlp.cjs:81-87
const inputJobName = getActionInput("JOB_NAME");
if (inputJobName) {
  process.env.INPUT_JOB_NAME = inputJobName;
}
if (inputParentSpanId) {
  process.env.INPUT_PARENT_SPAN_ID = inputParentSpanId;
}

The conclusion post step does not do the same normalization:

// actions/setup/js/action_conclusion_otlp.cjs:72-86
async function run() {
  const endpoints = process.env.GH_AW_OTLP_ENDPOINTS;
  const spanName = buildSpanName(getActionInput("JOB_NAME")); // ← resolves hyphen form for the span NAME
  const startMs = parseJobStartMs(process.env.GITHUB_AW_OTEL_JOB_START_MS);

  if (endpoints) {
    console.log(`[otlp] sending conclusion span "${spanName}" to configured endpoints`);
  } else {
    console.log("[otlp] GH_AW_OTLP_ENDPOINTS not set, skipping OTLP export (will attempt JSONL mirror)");
  }

  await sendOtlpSpan.sendJobConclusionSpan(spanName, { startMs });
  // ← spanName="gh-aw.agent.conclusion" is correct, but inside sendJobConclusionSpan
  //   `const jobName = process.env.INPUT_JOB_NAME || ""` returns "" when only the
  //   hyphen form is set, so every `if (jobName === "agent")` branch is skipped.
}

Inside sendJobConclusionSpan (send_otlp_span.cjs:1790):

const jobName = process.env.INPUT_JOB_NAME || ""; // ← only checks underscore form
// ...
if (jobName) attributes.push(buildAttr("gh-aw.job.name", jobName));
if (jobName === "agent") {
  attributes.push(buildAttr("gen_ai.operation.name", "chat"));
  // gen_ai.response.model, gen_ai.response.finish_reasons, etc.
}
const hasDedicatedAgentSpan = jobName === "agent" && /* ... */;
if (!hasDedicatedAgentSpan && jobName === "agent") {
  attributes.push(...usageAttrs);
}

The setup span survives this same pattern only because action_setup_otlp.cjs already promotes the hyphen form into the underscore form before invoking sendJobSetupSpan.
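
The mismatch described above can be reproduced in isolation. The following is a minimal sketch, not the repository's code: lookupActionInput is a hypothetical stand-in for getActionInput (whose real implementation is not shown in this issue), and the env object simulates a runner that only sets the hyphen form.

```javascript
// Hypothetical stand-in for getActionInput from action_input_utils.cjs.
// Assumption: the real helper checks both the underscore and hyphen forms.
function lookupActionInput(env, name) {
  const underscoreKey = `INPUT_${name}`;
  const hyphenKey = `INPUT_${name.replace(/_/g, "-")}`;
  return env[underscoreKey] || env[hyphenKey] || "";
}

// Simulated environment on a runner that preserves the hyphen form.
const env = { "INPUT_JOB-NAME": "agent" };

// Direct read, as sendJobConclusionSpan does: misses the hyphen form.
const directRead = env.INPUT_JOB_NAME || "";

// Both-forms lookup, as used when building the span name.
const resolved = lookupActionInput(env, "JOB_NAME");

console.log(JSON.stringify({ directRead, resolved }));
// prints {"directRead":"","resolved":"agent"}

// The jobName gate therefore evaluates differently on the two paths.
console.log(directRead === "agent", resolved === "agent");
// prints false true
```

This is why the span name is correct while every branch gated on jobName === "agent" is skipped.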

Proposed Change

Mirror the setup-step normalization in actions/setup/js/action_conclusion_otlp.cjs:

// actions/setup/js/action_conclusion_otlp.cjs (inside run(), before sendJobConclusionSpan)
async function run() {
  const endpoints = process.env.GH_AW_OTLP_ENDPOINTS;

  // Normalize to the canonical underscore form so sendJobConclusionSpan
  // (which reads process.env.INPUT_JOB_NAME directly) always finds the value,
  // matching the normalization done by action_setup_otlp.cjs at setup time.
  const inputJobName = getActionInput("JOB_NAME");
  if (inputJobName) {
    process.env.INPUT_JOB_NAME = inputJobName;
  }

  const spanName = buildSpanName(inputJobName);
  const startMs = parseJobStartMs(process.env.GITHUB_AW_OTEL_JOB_START_MS);
  // ...rest unchanged
  await sendOtlpSpan.sendJobConclusionSpan(spanName, { startMs });
}

This is the minimal, behavior-preserving fix and stays consistent with the existing setup-step pattern. A follow-up cleanup could replace the direct process.env.INPUT_JOB_NAME reads inside sendJobSetupSpan and sendJobConclusionSpan with getActionInput("JOB_NAME") so the two paths can't drift again — but that is a larger refactor and not required for the regression fix.

Expected Outcome

After this change, gh-aw.agent.conclusion spans (and the dedicated gh-aw.agent.agent LLM span) will once again carry the jobName-gated attributes. In practice:

  • In Grafana / Sentry / Honeycomb / Datadog: queryable gen_ai.operation.name:chat, gen_ai.usage.total_tokens, gen_ai.response.finish_reasons:length (truncation detection), gen_ai.response.model, gh-aw.turns, gh-aw.estimated_cost_usd, and gh-aw.job.name:agent for filtering. Sum-over-workflow dashboards will stop missing agent-job tokens.
  • A dedicated gh-aw.agent.agent span (CLIENT kind) per agent execution, allowing p95 LLM-latency alerting separately from total job duration.
  • In the JSONL mirror (/tmp/gh-aw/otel.jsonl): the same attributes survive, so artifact-only debugging is also restored.
  • For on-call: a single query span.name:gh-aw.agent.conclusion AND gen_ai.response.finish_reasons:length becomes sufficient to detect every truncated agent response across all workflows.

Implementation Steps

  • Edit actions/setup/js/action_conclusion_otlp.cjs: copy the 3-line INPUT_JOB_NAME normalization from action_setup_otlp.cjs:81-84 into the top of run(), before constructing spanName.
  • Extend actions/setup/js/action_conclusion_otlp.test.cjs: add a test that sets only process.env["INPUT_JOB-NAME"] = "agent" (hyphen form), calls run(), and asserts that process.env.INPUT_JOB_NAME === "agent" after the call. This locks in the contract that sendJobConclusionSpan will see the canonical form.
  • Run cd actions/setup/js && npx vitest run action_conclusion_otlp.test.cjs send_otlp_span.test.cjs to confirm tests pass.
  • Run make fmt to ensure formatting.
  • Open a PR referencing this issue.
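
The test described in the second step could look roughly like the sketch below. It is written in plain Node so it runs without vitest, and everything in it is a stand-in: runConclusionStep mimics the proposed run(), getActionInputStub mimics getActionInput, and the span-name format is assumed from the gh-aw.agent.conclusion example in this issue. The env object is passed in explicitly rather than mutating process.env.

```javascript
// Hypothetical stand-in for getActionInput (both-forms lookup is an assumption).
function getActionInputStub(env, name) {
  return env[`INPUT_${name}`] || env[`INPUT_${name.replace(/_/g, "-")}`] || "";
}

// Hypothetical stand-in for the proposed run(): normalize first, then send.
async function runConclusionStep(env, sendSpan) {
  const inputJobName = getActionInputStub(env, "JOB_NAME");
  if (inputJobName) {
    env.INPUT_JOB_NAME = inputJobName; // promote hyphen form to underscore form
  }
  // Assumed span-name format, based on "gh-aw.agent.conclusion".
  const spanName = `gh-aw.${inputJobName}.conclusion`;
  await sendSpan(spanName, env);
}

async function testHyphenFormIsNormalized() {
  const env = { "INPUT_JOB-NAME": "agent" }; // only the hyphen form is set
  const seen = { spanName: null, jobName: null };

  // Fake consumer that reads the env var directly, like sendJobConclusionSpan.
  const fakeSend = async (spanName, e) => {
    seen.spanName = spanName;
    seen.jobName = e.INPUT_JOB_NAME || "";
  };

  await runConclusionStep(env, fakeSend);
  return { normalized: env.INPUT_JOB_NAME, seenJobName: seen.jobName, spanName: seen.spanName };
}

testHyphenFormIsNormalized().then((result) => console.log(result));
```

In the actual test file, the same assertions would be made against process.env and the real run(), with the environment restored after each test.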

Evidence from Live OTel Data (Sentry/Grafana)

Grafana Tempo — full trace 4655a6296a6d36b45a1a62a328bbe244 (Matt Pocock Skills Reviewer, run 26455647757, 2026-05-26T14:50:27Z):

The gh-aw.agent.conclusion span (id=0a9f22a916c26aa2, parent=b867e64153c0916a which is gh-aw.agent.setup) has:

  • gen_ai.request.model: claude-sonnet-4.6 ✅ (set unconditionally from awInfo.model)
  • gh-aw.effective_tokens: 1219039 ✅ (set unconditionally when > 0)
  • gh-aw.action_minutes: 3.874
  • gh-aw.run.status: success
  • gh-aw.job.name: ❌ absent (gated on if (jobName))
  • gen_ai.operation.name: ❌ absent (gated on if (jobName === "agent"))
  • gen_ai.response.model: ❌ absent
  • gen_ai.response.finish_reasons: ❌ absent
  • gen_ai.usage.*: ❌ absent
  • gh-aw.turns, gh-aw.estimated_cost_usd: ❌ absent

The matching gh-aw.agent.setup span (id=b867e64153c0916a) in the same trace does carry gh-aw.job.name: agent, confirming the asymmetry is in the post-step path only.
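
A check like the one above can be automated once spans are available as JSON. The sketch below assumes an OTLP/JSON-style span whose attributes are an array of { key, value } objects; the exact layout of the JSONL mirror is not shown in this issue, so the span shape and the missingAgentAttributes helper are illustrative only. The attribute list follows the table in the Problem section.

```javascript
// Attributes this issue reports as gated on jobName === "agent" (or on jobName being set).
const AGENT_GATED_ATTRIBUTES = [
  "gh-aw.job.name",
  "gen_ai.operation.name",
  "gen_ai.response.model",
  "gen_ai.response.finish_reasons",
  "gen_ai.usage.input_tokens",
  "gen_ai.usage.output_tokens",
  "gen_ai.usage.total_tokens",
  "gh-aw.turns",
  "gh-aw.estimated_cost_usd",
];

// Hypothetical helper: returns the gated attribute keys absent from a span.
function missingAgentAttributes(span) {
  const present = new Set((span.attributes || []).map((attr) => attr.key));
  return AGENT_GATED_ATTRIBUTES.filter((key) => !present.has(key));
}

// Span modeled on the observed conclusion span (assumed OTLP/JSON shape).
const observedSpan = {
  name: "gh-aw.agent.conclusion",
  attributes: [
    { key: "gen_ai.request.model", value: { stringValue: "claude-sonnet-4.6" } },
    { key: "gh-aw.effective_tokens", value: { intValue: "1219039" } },
    { key: "gh-aw.run.status", value: { stringValue: "success" } },
  ],
};

console.log(missingAgentAttributes(observedSpan));
```

For the observed span, all nine gated keys are reported as missing. After the fix, the list should be empty or contain only the usage fields that move to the dedicated gh-aw.agent.agent span.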

Sentry spans dataset — query span.name:gh-aw.agent.conclusion timestamp:>2026-05-25 returns 14+ spans across multiple workflows (Matt Pocock Skills Reviewer, Test Quality Sentinel, Design Decision Gate, PR Sous Chef, Issue Monster, Daily Malicious Code Scan, Spec Librarian, CI Coach, Smoke CI). All are missing the jobName-gated attributes above.

Regression evidence: Sentry query span.name:gh-aw.agent.conclusion gen_ai.usage.total_tokens:>0 returns matches from 2026-05-21 (gen_ai.usage.total_tokens present, e.g. 682735, 2116561, 1269362) but zero matches from 2026-05-25 onward.

Cross-backend consistency: Sentry and Grafana Tempo agree on the attribute set — the omission is at the emit side, not an ingestion gap.

Connectivity checks performed: Sentry whoami ✅, Sentry find_organizations ✅ (github), Sentry find_projects ✅ (gh-aw, project id 4511347087179777), Grafana list_datasources ✅, Grafana tempo_traceql-search ✅, Grafana tempo_get-trace ✅. The Sentry MCP build available here does not expose search_events, so list_events with explicit fields was used throughout — this is a tool limitation, not a data gap.

Related Files

  • actions/setup/js/action_conclusion_otlp.cjs (the fix — add 3-line normalization in run())
  • actions/setup/js/action_setup_otlp.cjs (reference implementation — lines 81-84)
  • actions/setup/js/send_otlp_span.cjs (consumer that reads process.env.INPUT_JOB_NAME directly at line 1790; jobName-gated branches at lines 1881, 1918-1931, 2124, 2165)
  • actions/setup/js/action_conclusion_otlp.test.cjs (extend with a hyphen-form normalization test)
  • actions/setup/js/action_input_utils.cjs (provides getActionInput which already handles both forms)

Generated by the Daily OTel Instrumentation Advisor workflow

Generated by 📊 Daily OTel Instrumentation Advisor · opus47 34.3M ·

  • expires on Jun 2, 2026, 3:21 PM UTC
