[processor/genainormalizer] OpenInference input/output messages are not normalized due to flattened indexed attribute format #48421

@VinozzZ

Component(s)

processor/genainormalizer

What happened?

Description

The processor's OpenInference source mapping for llm.input_messages and llm.output_messages is silently a no-op for real OpenInference data. The OpenInference spec defines messages as flattened indexed span attributes (for example llm.input_messages.0.message.role), but the processor only performs exact string key matching against llm.input_messages and llm.output_messages, which never appear as literal attribute keys in the wire data.
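
To illustrate the mismatch, here is a minimal Python sketch. It is not the processor's actual Go code: the attribute dict is a subset of the span shown in the log output of this issue, and `exact_match` and `prefix_match` are hypothetical helper names used only for this illustration.

```python
# Subset of the span attributes as they appear on the wire (see Log output).
attrs = {
    "llm.input_messages.0.message.role": "user",
    "llm.input_messages.0.message.content": "What is the weather like in San Francisco in Fahrenheit?",
    "llm.output_messages.0.message.role": "assistant",
}


def exact_match(attrs, key):
    """Mimic an exact string key lookup, as described above."""
    return attrs.get(key)


def prefix_match(attrs, prefix):
    """Collect every attribute whose key starts with '<prefix>.'."""
    return {k: v for k, v in attrs.items() if k.startswith(prefix + ".")}


# The literal key is never present, so the exact lookup finds nothing.
print(exact_match(attrs, "llm.input_messages"))
# A prefix-based lookup does see the flattened indexed keys.
print(sorted(prefix_match(attrs, "llm.input_messages")))
```

The exact lookup returns nothing because the key only ever occurs with an index and field suffix, which is why the mapping never fires.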

Steps to Reproduce

Run any OpenInference-instrumented LLM call (e.g. using openinference-instrumentation-anthropic) through a collector with genainormalizerprocessor configured. Observe that the output span still contains the original indexed attributes and no gen_ai.input.messages / gen_ai.output.messages are produced.

Expected Result

After normalization, gen_ai.input.messages and gen_ai.output.messages should be correctly populated on the span.
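
One way to get there is to regroup the flattened indexed attributes into an ordered list of messages before writing the target attribute. The Python sketch below shows only that regrouping step, under assumptions: `collect_messages` is a hypothetical name, the key patterns are inferred from the log output in this issue rather than from the full OpenInference spec, and the resulting dicts would still need to be reshaped to whatever structure the GenAI spec expects for gen_ai.input.messages.

```python
import json
import re

# Matches keys such as "llm.input_messages.1.message.tool_calls.0.tool_call.id".
_MESSAGE_KEY = re.compile(r"^(llm\.(?:input|output)_messages)\.(\d+)\.message\.(.+)$")
_TOOL_CALL_KEY = re.compile(r"^tool_calls\.(\d+)\.tool_call\.(.+)$")


def collect_messages(attrs, prefix):
    """Group flattened indexed attributes under `prefix` into a list of messages."""
    by_index = {}
    for key, value in attrs.items():
        m = _MESSAGE_KEY.match(key)
        if not m or m.group(1) != prefix:
            continue
        msg = by_index.setdefault(int(m.group(2)), {})
        field = m.group(3)
        tc = _TOOL_CALL_KEY.match(field)
        if tc:
            # Nested tool calls carry their own index; group them the same way.
            calls = msg.setdefault("tool_calls", {})
            calls.setdefault(int(tc.group(1)), {})[tc.group(2)] = value
        else:
            msg[field] = value

    messages = []
    for i in sorted(by_index):
        msg = by_index[i]
        if "tool_calls" in msg:
            msg["tool_calls"] = [msg["tool_calls"][j] for j in sorted(msg["tool_calls"])]
        messages.append(msg)
    return messages


# Attributes copied from the log output in this issue (subset).
attrs = {
    "llm.input_messages.0.message.role": "user",
    "llm.input_messages.0.message.content": "What is the weather like in San Francisco in Fahrenheit?",
    "llm.input_messages.1.message.role": "assistant",
    "llm.input_messages.1.message.tool_calls.0.tool_call.id": "toolu_01RiJ897npopZ9HHSVRkZHn3",
    "llm.input_messages.1.message.tool_calls.0.tool_call.function.name": "get_weather",
    "llm.input_messages.2.message.role": "user",
    "llm.input_messages.2.message.tool_call_id": "toolu_01RiJ897npopZ9HHSVRkZHn3",
    "llm.output_messages.0.message.role": "assistant",
}

messages = collect_messages(attrs, "llm.input_messages")
print(json.dumps(messages, indent=2))
```

Sorting by the numeric index keeps message order stable even if the attribute map is iterated in arbitrary order. Tool call fields are left with their dotted names (for example `function.name`) in this sketch.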

OpenInference also emits input.value and output.value as JSON strings carrying the complete LLM request/response payloads. A possible implementation approach is to map input.value to gen_ai.input.messages and output.value to gen_ai.output.messages, since these are already present as single string attributes. However, input.value and output.value contain more than just the messages, so some cleanup will be needed to match what the GenAI spec expects.
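
As a rough illustration of that cleanup, the Python sketch below pulls only the message-related parts out of the two JSON strings. It is a sketch under assumptions: the function names are hypothetical, the payload shapes are taken from the Anthropic example in this issue (other providers may lay out input.value and output.value differently), and the sample strings are shortened versions of the logged values.

```python
import json


def messages_from_input_value(raw):
    """Return the 'messages' list from an input.value JSON string, or None."""
    try:
        payload = json.loads(raw)
    except (TypeError, ValueError):
        return None
    if not isinstance(payload, dict):
        return None
    messages = payload.get("messages")
    return messages if isinstance(messages, list) else None


def message_from_output_value(raw):
    """Keep only role and content from an output.value JSON string, or None."""
    try:
        payload = json.loads(raw)
    except (TypeError, ValueError):
        return None
    if not isinstance(payload, dict) or "role" not in payload:
        return None
    # Drop id, model, usage and other response metadata.
    return {"role": payload["role"], "content": payload.get("content")}


# Shortened versions of the input.value / output.value attributes in this issue.
input_value = (
    '{"model": "claude-sonnet-4-6", "max_tokens": 1024, '
    '"messages": [{"role": "user", "content": "What is the weather like?"}]}'
)
output_value = (
    '{"id": "msg_01H1outA4PrykmfEZYpTAEke", "role": "assistant", '
    '"content": [{"type": "text", "text": "Sunny."}], '
    '"usage": {"input_tokens": 729, "output_tokens": 36}}'
)

print(messages_from_input_value(input_value))
print(message_from_output_value(output_value))
```

Returning None on malformed or unexpected payloads would let a caller fall back to the flattened llm.input_messages.* attributes instead of emitting a partial result.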

Collector version

dev-build on main

Environment information

Environment

OS: macOS
Compiler (if manually compiled): go 1.26.1

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: localhost:4317
      http:
        endpoint: localhost:4318

processors:
  gen_ai_normalizer:
    sources:
      - name: openinference
        remove_originals: true

exporters:
  debug/before:
    verbosity: detailed
  debug/after:
    verbosity: detailed

service:
  pipelines:
    traces/before:
      receivers: [otlp]
      exporters: [debug/before]
    traces/after:
      receivers: [otlp]
      processors: [gen_ai_normalizer]
      exporters: [debug/after]

Log output

Resource SchemaURL: 
Resource attributes:
     -> telemetry.sdk.language: Str(python)
     -> telemetry.sdk.name: Str(opentelemetry)
     -> telemetry.sdk.version: Str(1.41.1)
     -> service.name: Str(unknown_service)
ScopeSpans #0
ScopeSpans SchemaURL: https://opentelemetry.io/schemas/1.40.0
InstrumentationScope openinference.instrumentation.anthropic 1.0.4
Span #0
    Trace ID       : 4f9df586f6f7dd5bf27a47f3382746aa
    Parent ID      : 
    ID             : b0f87e21212efc1e
    Name           : messages.create
    Kind           : Internal
    Start time     : 2026-05-15 19:55:48.158054 +0000 UTC
    End time       : 2026-05-15 19:55:49.294986 +0000 UTC
    Status code    : Ok
    Status message : 
    DroppedAttributesCount: 0
    DroppedEventsCount: 0
    DroppedLinksCount: 0
Attributes:
     -> gen_ai.provider.name: Str(anthropic)
     -> llm.system: Str(anthropic)
     -> input.value: Str({"model": "claude-sonnet-4-6", "max_tokens": 1024, "tools": [{"name": "get_weather", "description": "Get the current weather in a given location", "input_schema": {"type": "object", "properties": {"location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"}, "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "The unit of temperature, either \"celsius\" or \"fahrenheit\""}}, "required": ["location"]}}], "messages": [{"role": "user", "content": "What is the weather like in San Francisco in Fahrenheit?"}, {"content": ["ToolUseBlock(id='toolu_01RiJ897npopZ9HHSVRkZHn3', caller=DirectCaller(type='direct'), input={'location': 'San Francisco, CA', 'unit': 'fahrenheit'}, name='get_weather', type='tool_use')"], "role": "assistant"}, {"content": [{"tool_use_id": "toolu_01RiJ897npopZ9HHSVRkZHn3", "content": "{\"weather\": \"sunny\", \"temperature\": \"75\"}", "type": "tool_result", "is_error": false}], "role": "user"}]})
     -> input.mime_type: Str(application/json)
     -> llm.input_messages.0.message.role: Str(user)
     -> llm.input_messages.0.message.content: Str(What is the weather like in San Francisco in Fahrenheit?)
     -> llm.input_messages.1.message.role: Str(assistant)
     -> llm.input_messages.1.message.tool_calls.0.tool_call.id: Str(toolu_01RiJ897npopZ9HHSVRkZHn3)
     -> llm.input_messages.1.message.tool_calls.0.tool_call.function.name: Str(get_weather)
     -> llm.input_messages.1.message.tool_calls.0.tool_call.function.arguments: Str({"location": "San Francisco, CA", "unit": "fahrenheit"})
     -> llm.input_messages.2.message.role: Str(user)
     -> llm.input_messages.2.message.tool_call_id: Str(toolu_01RiJ897npopZ9HHSVRkZHn3)
     -> llm.input_messages.2.message.content: Str({"weather": "sunny", "temperature": "75"})
     -> llm.tools.0.tool.json_schema: Str({"name": "get_weather", "description": "Get the current weather in a given location", "input_schema": {"type": "object", "properties": {"location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"}, "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "The unit of temperature, either \"celsius\" or \"fahrenheit\""}}, "required": ["location"]}})
     -> llm.invocation_parameters: Str({"max_tokens": 1024})
     -> gen_ai.request.model: Str(claude-sonnet-4-6)
     -> llm.output_messages.0.message.role: Str(assistant)
     -> llm.output_messages.0.message.content: Str(The weather in San Francisco, CA is currently **sunny** with a temperature of **75°F**. It sounds like a beautiful day! 🌞)
     -> gen_ai.usage.input_tokens: Int(729)
     -> gen_ai.usage.output_tokens: Int(36)
     -> output.value: Str({"id":"msg_01H1outA4PrykmfEZYpTAEke","container":null,"content":[{"citations":null,"text":"The weather in San Francisco, CA is currently **sunny** with a temperature of **75°F**. It sounds like a beautiful day! 🌞","type":"text"}],"model":"claude-sonnet-4-6","role":"assistant","stop_details":null,"stop_reason":"end_turn","stop_sequence":null,"type":"message","usage":{"cache_creation":{"ephemeral_1h_input_tokens":0,"ephemeral_5m_input_tokens":0},"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"inference_geo":"us","input_tokens":729,"output_tokens":36,"server_tool_use":null,"service_tier":"standard"}})
     -> output.mime_type: Str(application/json)
     -> gen_ai.operation.name: Str(chat)

Additional context

No response
