Component(s)
processor/genainormalizer
What happened?
Description
The processor's OpenInference source mapping for llm.input_messages and llm.output_messages is silently a no-op for real OpenInference data. The OpenInference spec defines messages as flattened indexed span attributes, but the processor only performs exact string key matching against llm.input_messages (and llm.output_messages), which never appear as literal attribute keys in the wire data.
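The mismatch can be illustrated with a minimal sketch. It does not use the processor's actual code: `hasExactKey` and `countIndexedKeys` are hypothetical helpers, and a plain `map[string]string` stands in for the span's attribute map.

```go
package main

import (
	"fmt"
	"strings"
)

// hasExactKey mimics an exact-key lookup: it succeeds only when the
// attribute map contains the literal key.
func hasExactKey(attrs map[string]string, key string) bool {
	_, ok := attrs[key]
	return ok
}

// countIndexedKeys counts the attribute keys that start with "<prefix>.",
// which is how the flattened OpenInference attributes appear on the wire.
func countIndexedKeys(attrs map[string]string, prefix string) int {
	n := 0
	for k := range attrs {
		if strings.HasPrefix(k, prefix+".") {
			n++
		}
	}
	return n
}

func main() {
	// A subset of the attributes from the log output in this report.
	attrs := map[string]string{
		"llm.input_messages.0.message.role":    "user",
		"llm.input_messages.0.message.content": "What is the weather like in San Francisco in Fahrenheit?",
		"llm.output_messages.0.message.role":   "assistant",
	}
	fmt.Println(hasExactKey(attrs, "llm.input_messages"))      // false
	fmt.Println(countIndexedKeys(attrs, "llm.input_messages")) // 2
}
```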
Steps to Reproduce
Run any OpenInference-instrumented LLM call (e.g. using openinference-instrumentation-anthropic) through a collector with genainormalizerprocessor configured. Observe that the output span still contains the original indexed attributes and no gen_ai.input.messages / gen_ai.output.messages are produced.
Expected Result
After normalization, gen_ai.input.messages and gen_ai.output.messages should be correctly populated on the span.
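One way to get there is to regroup the indexed attributes by message index. The following is only a sketch under stated assumptions: `collectMessages`, `message`, and `part` are hypothetical names, the role/parts output shape is an assumption about what the GenAI spec expects, and only `message.role` and `message.content` are handled (tool calls and tool call IDs are ignored here).

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// part and message are a simplified, assumed output shape.
type part struct {
	Type    string `json:"type"`
	Content string `json:"content"`
}

type message struct {
	Role  string `json:"role"`
	Parts []part `json:"parts"`
}

// collectMessages groups "<prefix>.<index>.message.<field>" attributes by
// index and returns the messages in index order.
func collectMessages(attrs map[string]string, prefix string) []message {
	byIndex := map[int]*message{}
	for k, v := range attrs {
		rest, ok := strings.CutPrefix(k, prefix+".")
		if !ok {
			continue // not an indexed attribute under this prefix
		}
		idxStr, field, ok := strings.Cut(rest, ".")
		if !ok {
			continue
		}
		idx, err := strconv.Atoi(idxStr)
		if err != nil {
			continue
		}
		m := byIndex[idx]
		if m == nil {
			m = &message{Parts: []part{}}
			byIndex[idx] = m
		}
		switch field {
		case "message.role":
			m.Role = v
		case "message.content":
			m.Parts = append(m.Parts, part{Type: "text", Content: v})
		}
	}
	idxs := make([]int, 0, len(byIndex))
	for i := range byIndex {
		idxs = append(idxs, i)
	}
	sort.Ints(idxs)
	out := make([]message, 0, len(idxs))
	for _, i := range idxs {
		out = append(out, *byIndex[i])
	}
	return out
}

func main() {
	attrs := map[string]string{
		"llm.input_messages.0.message.role":    "user",
		"llm.input_messages.0.message.content": "What is the weather like in San Francisco in Fahrenheit?",
		"llm.input_messages.1.message.role":    "assistant",
	}
	b, err := json.Marshal(collectMessages(attrs, "llm.input_messages"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

A real implementation would also need to map the `tool_calls` and `tool_call_id` attributes and decide what to do with gaps in the index sequence.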
OpenInference also emits input.value and output.value as JSON strings carrying the complete LLM request/response payloads. A possible implementation approach is to map input.value → gen_ai.input.messages and output.value → gen_ai.output.messages, since these are already present as single string attributes. However, input.value and output.value contain more than just the messages, so some cleanup will be needed to match what the GenAI spec expects.
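As a rough sketch of the first cleanup step, the `messages` array can be pulled out of the `input.value` payload with the standard library. `extractMessages` is a hypothetical helper, not part of the processor, and it assumes the payload is a JSON object with a top-level `messages` field, as in the Anthropic request shown in the log output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// extractMessages returns only the "messages" array from an input.value
// payload, dropping fields such as "model", "max_tokens", and "tools".
func extractMessages(inputValue string) (json.RawMessage, error) {
	var payload struct {
		Messages json.RawMessage `json:"messages"`
	}
	if err := json.Unmarshal([]byte(inputValue), &payload); err != nil {
		return nil, err
	}
	if len(payload.Messages) == 0 {
		return nil, fmt.Errorf("input.value has no \"messages\" field")
	}
	return payload.Messages, nil
}

func main() {
	// Shortened version of the input.value attribute from this report.
	inputValue := `{"model": "claude-sonnet-4-6", "max_tokens": 1024, "messages": [{"role": "user", "content": "What is the weather like in San Francisco in Fahrenheit?"}]}`
	msgs, err := extractMessages(inputValue)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(msgs))
}
```

The extracted array is still in the provider's own message format, so it would need a further mapping step before it matches the GenAI spec. The output.value payload has no `messages` field at all in this example; the response's top-level `role` and `content` would have to be wrapped into a message instead.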
Collector version
dev-build on main
Environment information
Environment
OS: macOS
Compiler (if manually compiled): go 1.26.1
OpenTelemetry Collector configuration
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: localhost:4317
      http:
        endpoint: localhost:4318
processors:
  gen_ai_normalizer:
    sources:
      - name: openinference
    remove_originals: true
exporters:
  debug/before:
    verbosity: detailed
  debug/after:
    verbosity: detailed
service:
  pipelines:
    traces/before:
      receivers: [otlp]
      exporters: [debug/before]
    traces/after:
      receivers: [otlp]
      processors: [gen_ai_normalizer]
      exporters: [debug/after]
Log output
Resource SchemaURL:
Resource attributes:
-> telemetry.sdk.language: Str(python)
-> telemetry.sdk.name: Str(opentelemetry)
-> telemetry.sdk.version: Str(1.41.1)
-> service.name: Str(unknown_service)
ScopeSpans #0
ScopeSpans SchemaURL: https://opentelemetry.io/schemas/1.40.0
InstrumentationScope openinference.instrumentation.anthropic 1.0.4
Span #0
Trace ID : 4f9df586f6f7dd5bf27a47f3382746aa
Parent ID :
ID : b0f87e21212efc1e
Name : messages.create
Kind : Internal
Start time : 2026-05-15 19:55:48.158054 +0000 UTC
End time : 2026-05-15 19:55:49.294986 +0000 UTC
Status code : Ok
Status message :
DroppedAttributesCount: 0
DroppedEventsCount: 0
DroppedLinksCount: 0
Attributes:
-> gen_ai.provider.name: Str(anthropic)
-> llm.system: Str(anthropic)
-> input.value: Str({"model": "claude-sonnet-4-6", "max_tokens": 1024, "tools": [{"name": "get_weather", "description": "Get the current weather in a given location", "input_schema": {"type": "object", "properties": {"location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"}, "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "The unit of temperature, either \"celsius\" or \"fahrenheit\""}}, "required": ["location"]}}], "messages": [{"role": "user", "content": "What is the weather like in San Francisco in Fahrenheit?"}, {"content": ["ToolUseBlock(id='toolu_01RiJ897npopZ9HHSVRkZHn3', caller=DirectCaller(type='direct'), input={'location': 'San Francisco, CA', 'unit': 'fahrenheit'}, name='get_weather', type='tool_use')"], "role": "assistant"}, {"content": [{"tool_use_id": "toolu_01RiJ897npopZ9HHSVRkZHn3", "content": "{\"weather\": \"sunny\", \"temperature\": \"75\"}", "type": "tool_result", "is_error": false}], "role": "user"}]})
-> input.mime_type: Str(application/json)
-> llm.input_messages.0.message.role: Str(user)
-> llm.input_messages.0.message.content: Str(What is the weather like in San Francisco in Fahrenheit?)
-> llm.input_messages.1.message.role: Str(assistant)
-> llm.input_messages.1.message.tool_calls.0.tool_call.id: Str(toolu_01RiJ897npopZ9HHSVRkZHn3)
-> llm.input_messages.1.message.tool_calls.0.tool_call.function.name: Str(get_weather)
-> llm.input_messages.1.message.tool_calls.0.tool_call.function.arguments: Str({"location": "San Francisco, CA", "unit": "fahrenheit"})
-> llm.input_messages.2.message.role: Str(user)
-> llm.input_messages.2.message.tool_call_id: Str(toolu_01RiJ897npopZ9HHSVRkZHn3)
-> llm.input_messages.2.message.content: Str({"weather": "sunny", "temperature": "75"})
-> llm.tools.0.tool.json_schema: Str({"name": "get_weather", "description": "Get the current weather in a given location", "input_schema": {"type": "object", "properties": {"location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"}, "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "The unit of temperature, either \"celsius\" or \"fahrenheit\""}}, "required": ["location"]}})
-> llm.invocation_parameters: Str({"max_tokens": 1024})
-> gen_ai.request.model: Str(claude-sonnet-4-6)
-> llm.output_messages.0.message.role: Str(assistant)
-> llm.output_messages.0.message.content: Str(The weather in San Francisco, CA is currently **sunny** with a temperature of **75°F**. It sounds like a beautiful day! 🌞)
-> gen_ai.usage.input_tokens: Int(729)
-> gen_ai.usage.output_tokens: Int(36)
-> output.value: Str({"id":"msg_01H1outA4PrykmfEZYpTAEke","container":null,"content":[{"citations":null,"text":"The weather in San Francisco, CA is currently **sunny** with a temperature of **75°F**. It sounds like a beautiful day! 🌞","type":"text"}],"model":"claude-sonnet-4-6","role":"assistant","stop_details":null,"stop_reason":"end_turn","stop_sequence":null,"type":"message","usage":{"cache_creation":{"ephemeral_1h_input_tokens":0,"ephemeral_5m_input_tokens":0},"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"inference_geo":"us","input_tokens":729,"output_tokens":36,"server_tool_use":null,"service_tier":"standard"}})
-> output.mime_type: Str(application/json)
-> gen_ai.operation.name: Str(chat)
Additional context
No response