Spec ID: OVOS-TRANSFORM-1 · Version: 1 · Status: Draft
This document defines transformer plugins as an architectural pattern of voice operating systems: ordered black-box chains of components, inserted at well-defined points in the utterance lifecycle, that enrich, normalize, translate, or otherwise mutate the artifacts flowing through the assistant. The spec identifies six injection points that are the natural homes for this kind of work in a voice operating system's utterance lifecycle (§2), defines the per-type contract for each (§3), and specifies the shared chain abstraction — ordering, error handling, cancellation, registration — that any orchestrator implementing chains follows (§4, §6, §7, §8).
An orchestrator MAY implement transformer chains at any subset of the six injection points (none, some, or all). For each chain it does implement, this spec defines what the chain looks like and what it MUST do. The spec does not require any specific chain to be implemented; it defines the design pattern and the contract, not a feature list.
It builds on three companion specifications:
- the Bus Message Specification (OVOS-MSG-1): the envelope and the session carrier (§4) in which per-session transformer overrides live;
- the Utterance Lifecycle and Pipeline Specification (OVOS-PIPELINE-1): the per-utterance flow into which the six transformer chains insert (§2 of this spec extends OVOS-PIPELINE-1 §6);
- the Intent Definition Specification (OVOS-INTENT-3): the Match shape an intent transformer (§3.4) consumes and emits.
The key words MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY are used as in RFC 2119.
A transformer is a black-box component that consumes one artifact at a specific point in the utterance lifecycle, optionally mutates it, and produces an artifact of the same shape for the next stage to consume. What a transformer does internally is unconstrained — anything from a regex substitution to a language-model rewrite to an audio DSP filter qualifies, provided its IO conforms to the contract of its type (§3).
A transformer chain is an ordered set of transformers of one type that the orchestrator runs at an injection point. Unlike a pipeline plugin (OVOS-PIPELINE-1 §3) — which decides whether to claim an utterance — every transformer in a chain always runs when its injection point is reached. There is no claim-or-decline, no first-result-wins, no early exit (except utterance cancellation per §8). Whatever the last transformer returns is what the orchestrator passes to the next lifecycle stage.
Per OVOS-PIPELINE-1 §2, the orchestrator MAY be implemented as multiple cooperating processes. The six transformer chains partition naturally along the audio-boundary split named there: the audio chain (§3.1) with the audio-input service; the utterance / metadata / intent chains (§3.2–§3.4) with the utterance-handling service; the dialog and TTS chains (§3.5, §3.6) with the audio-output service. Under a split, no single process holds a global view of loaded transformers — the introspection surface (§6) is broadcast-query / scatter-response specifically to accommodate this. A single-process implementation is equally conformant; the wire shape is the same either way.
A transformer is identified by a (type, transformer_id) pair.
- type is exactly one of the six values defined by §2: audio, utterance, metadata, intent, dialog, tts. The type fixes the injection point at which the transformer runs and the IO contract it conforms to (§3).
- transformer_id is an opaque deployment-unique string within its type.

The orchestrator's loaded transformers are partitioned by type into per-type registries (transformer_id → transformer instance). When the orchestrator is split across multiple processes, each process holds the slice of those registries relevant to the chains it implements; the union across processes is the full loaded set.
Constraints on transformer_id strings:
- Non-empty.
- Must match the topic-name syntax of OVOS-MSG-1 §2.1 (ASCII letters, digits, ".", "_", "-"; no whitespace).
- Must not contain ":" (the dispatch-topic separator of OVOS-PIPELINE-1 §7).
- Unique within its type's registry. A single deployment MAY load transformers with the same transformer_id across different types; the six type registries are independent.
A transformer MAY appear in a chain at most once for its
type; a chain is an ordered set of distinct transformer_ids
within a single type.
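
Informative sketch. The identifier constraints and the six independent registries can be illustrated in Python as follows. The character class reflects only the parenthetical summary given above (the full OVOS-MSG-1 §2.1 grammar is not reproduced here), and the names is_valid_transformer_id and Registries are hypothetical, introduced for this example only.

```python
import re

# Assumption: the allowed alphabet is exactly ASCII letters, digits, '.', '_', '-'.
# ':' and whitespace are rejected because they are not in the character class;
# the '+' quantifier rejects the empty string.
_TRANSFORMER_ID_RE = re.compile(r"[A-Za-z0-9._-]+")

TRANSFORMER_TYPES = ("audio", "utterance", "metadata", "intent", "dialog", "tts")


def is_valid_transformer_id(transformer_id) -> bool:
    """Return True when the value satisfies the syntactic constraints."""
    return (
        isinstance(transformer_id, str)
        and _TRANSFORMER_ID_RE.fullmatch(transformer_id) is not None
    )


class Registries:
    """Six independent per-type registries: transformer_id -> instance."""

    def __init__(self):
        self._by_type = {t: {} for t in TRANSFORMER_TYPES}

    def register(self, type_, transformer_id, instance):
        if type_ not in self._by_type:
            raise ValueError(f"unknown transformer type: {type_!r}")
        if not is_valid_transformer_id(transformer_id):
            raise ValueError(f"invalid transformer_id: {transformer_id!r}")
        if transformer_id in self._by_type[type_]:
            # Uniqueness is scoped to the type's own registry.
            raise ValueError(f"duplicate {type_} transformer_id: {transformer_id!r}")
        self._by_type[type_][transformer_id] = instance

    def get(self, type_, transformer_id):
        return self._by_type[type_].get(transformer_id)
```

Because each type has its own mapping, registering the same identifier under audio and under tts succeeds, while registering it twice under audio is rejected.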
This specification defines the shared chain model (§1, §4, §7),
the six injection points in the utterance lifecycle and the
per-type IO contracts (§2, §3), the per-session override mechanism
(§5), the broadcast-query / scatter-response introspection
surface (§6), the utterance cancellation plugin contract
(§8), the language disambiguation hierarchy for
Message.context (§7.1), conformance (§9), and the non-goals
(§10).
It does not define:
- What any individual transformer does internally — transformers are black boxes; only the IO contract at the injection point is normative.
- How transformers are loaded, discovered, configured, or instantiated — deployment concerns.
- Slot value typing schemas. Intent transformers (§3.4) are the canonical home for system-type entity injection (dates, numbers, durations, etc.), but the typed value formats themselves are deferred to a future text-normalization specification (OVOS-INTENT-1 §5.3).
- Streaming / end-to-end pipeline shapes. The §2 flow diagram describes the canonical staged flow most transformers depend on (mic → STT → text → intent → speak → TTS → playback); implementations that collapse stages (streaming STT, end-to-end speech-to-speech models) MAY omit hooks that have no corresponding artifact in their flow, provided the conformance rules of §9 are met for every chain they do implement.
For the design rationale behind each injection point and why transformer chains are the right architectural primitive for cross-cutting concerns, see appendix/rationale.md §4.7.
This specification claims six Message.context keys, one per
transformer type:
| Type (§2) | Context key |
|---|---|
| audio | audio_transformer_ids |
| utterance | utterance_transformer_ids |
| metadata | metadata_transformer_ids |
| intent | intent_transformer_ids |
| dialog | dialog_transformer_ids |
| tts | tts_transformer_ids |
Each key, when present, holds an ordered list of transformer_id
strings (§1.1) belonging to the corresponding type's registry.
The list records the chain of transformers of that type that
touched the Message, in order of touch. The last element is
the current-attribution transformer; the full list records chain
provenance. The plural key name signals the list shape; the
singular <type>_transformer_id naming is not used by this
specification.
Stamp rule. On every Message a transformer places on the
bus by authorial action — a fresh emission, or
Message.reply(...) / Message.response(...) derivation it
performs and emits (OVOS-MSG-1 §5) — and on every Message it
modifies in place within its execution window before the
Message proceeds, the transformer MUST ensure that its own
transformer_id is the last element of the corresponding
<type>_transformer_ids list.
Message.forward(...) (OVOS-MSG-1 §5.1) preserves context
unchanged and is propagation, not authorial assertion. A
transformer that .forwards a Message it did not modify MUST
NOT append its own transformer_id for that derivation — the
inherited list rides through untouched. If the transformer
modifies the Message in place and then .forwards the modified
Message, the modify-in-place clause applies and the stamp
obligation fires.
Operationally, on every touch the transformer appends its
own transformer_id to the list (creating the list if absent or
empty). The append fires once per execution window. The six
<type>_transformer_ids keys coexist on a single Message with
each other and with the component-identity keys claimed by other
specifications — context["skill_id"] (OVOS-INTENT-4 §3.1) and
context["pipeline_id"] (OVOS-PIPELINE-1 §3.1), both single
strings. Attribution consumers that need to pick a single
emitter apply the precedence rule codified in OVOS-CONTEXT-1 §5.2
(most-specific by lifecycle position, reading the last element of
the list-valued keys).
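
Informative sketch. A minimal helper for the stamp rule, assuming Message.context is exposed as a plain dict. The function name stamp is hypothetical. The guard against a repeated trailing id is one possible reading of "the append fires once per execution window", not a normative requirement.

```python
def stamp(context: dict, transformer_type: str, transformer_id: str) -> None:
    """Ensure transformer_id is the last element of <type>_transformer_ids."""
    key = f"{transformer_type}_transformer_ids"
    existing = context.get(key)
    # Create the list if absent; copy it otherwise so a list object shared
    # with a parent Message is not mutated behind that Message's back.
    ids = list(existing) if isinstance(existing, list) else []
    # Assumption: a trailing entry equal to our own id means this execution
    # window already stamped, so the append is skipped.
    if not ids or ids[-1] != transformer_id:
        ids.append(transformer_id)
    context[key] = ids
```

Called twice by the same transformer in one window, the helper leaves a single trailing entry; a later transformer of the same type appends after it, so the list order is the order of touch.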
<type>_transformer_ids is the transformer chain's
self-attribution. It is distinct from any
data["transformer_id"] (singular) a topic's payload schema may
carry as the subject of the Message — for example, the
transformer_id payload field in
ovos.transformer.{type}.list responses (§6) identifies the
transformer the entry describes, not who emitted the response.
The orchestrator (or any component that loads transformers)
SHOULD intercept / decorate the transformer's emit pathway and
its return-value handling at load time so non-compliant
transformer code cannot emit a Message or hand back a modified
Message whose <type>_transformer_ids list does not end with the
transformer's own id. The orchestrator's own bus emissions on
behalf of a transformer — the cancel_by stamping of §8.1, for
example — are made by the orchestrator from its own runtime
knowledge of which transformer caused the event; those emissions
carry the orchestrator's own attribution discipline, not the
transformer's.
A consumer that needs to attribute a transformer's action
MUST read the corresponding <type>_transformer_ids list
directly (typically the last element for current attribution, the
full list for chain provenance); it MUST NOT infer the
transformer from source,
from data fields, or from the topic name.
This specification identifies six injection points in the utterance lifecycle of OVOS-PIPELINE-1 §6 where transformer chains are the right architectural primitive. Each injection point exists because the lifecycle, at that exact moment, holds an artifact in a state that makes a particular class of work possible there and nowhere else. §3 covers each in detail; this section is the catalogue.
The six injection points, in lifecycle order:
mic audio
│
├─ audio-transformer chain (§3.1)
│
STT → text
│
├─ utterance-transformer chain (§3.2)
│
├─ metadata-transformer chain (§3.3)
│
intent-context decay (OVOS-CONTEXT-1 §4)
│
match round (OVOS-PIPELINE-1 §6)
│
├─ intent-transformer chain (§3.4)
│
dispatch + handler trio (OVOS-PIPELINE-1 §7, §8)
│
skill emits speak()
│
├─ dialog-transformer chain (§3.5)
│
TTS → wav file
│
├─ tts-transformer chain (§3.6)
│
playback
An orchestrator MAY implement transformer chains at any subset of these injection points (none, some, or all). Each chain it implements MUST conform to the per-type contract of the matching §3 subsection; each chain it does not implement is simply a no-op at that point in the lifecycle. Implementations whose architecture omits an upstream artifact entirely (a streaming STT that produces no discrete "STT → text" boundary, an end-to-end speech-to-speech model that bypasses intermediate text) MAY likewise omit the chains for artifacts they don't materialise.
Each implemented chain is run to completion before the next stage
of the lifecycle proceeds. A chain whose transformers all raise
still produces the input unchanged (§7) and the lifecycle
continues. A chain or stage MAY be aborted early by utterance
cancellation (§8) — the only sanctioned way to short-circuit the
lifecycle before its natural terminal events; cancellation
preserves OVOS-PIPELINE-1 §9.5's universal ovos.utterance.handled
invariant.
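
Informative sketch. The run-to-completion behaviour described above, with a raising transformer skipped and the prior artifact carried forward, might look like the following. The (transformer_id, callable) pair shape and the name run_chain are assumptions of this example; §7 is the normative source for error handling, and cancellation (§8) is not modelled here.

```python
import logging

LOG = logging.getLogger("transformer_chain")


def run_chain(chain, artifact):
    """Run every transformer in order; a raising transformer is skipped.

    chain is an iterable of (transformer_id, callable) pairs, where each
    callable takes the current artifact and returns the next one.
    """
    current = artifact
    for transformer_id, transform in chain:
        try:
            current = transform(current)
        except Exception:
            # Keep the prior step's artifact and continue with the next
            # transformer; there is no early exit on failure.
            LOG.exception("transformer %s raised; output discarded", transformer_id)
    return current
```

If every transformer raises, the function returns the input artifact. A transformer that mutates its input in place before raising can still leave partial changes behind; an implementation concerned about that could hand each transformer a copy.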
For each of the six injection points, this section defines the chain's input artifact, what the chain MAY/MUST change, and any type-specific conformance rules. Design rationale for each injection point — why each is the only point in the lifecycle where its class of work is possible — is in appendix/rationale.md §4.7.
Four of the six per-type contracts (§3.1 audio, §3.2 utterance,
§3.5 dialog, §3.6 TTS) operate on an artifact whose content
language can be authoritative. The orchestrator threads this
language through the chain as a parameter named lang,
alongside the artifact and Message.context. The parameter is
bidirectional — it appears in both the input and the output
of each transformer call, so a transformer that mutates the
artifact's language can mutate lang in lockstep.
- Source at chain start. The orchestrator sources the initial lang from Message.data.lang of the Message whose artifact the chain is processing. data.lang is owned by the topic's spec; its presence is an authoritative declaration that the artifact is in that language.
- Optional, no orchestrator-side synthesis. lang is OPTIONAL. The orchestrator MUST pass it through when Message.data.lang is present, and MUST NOT synthesize a value when it is absent; in particular, it MUST NOT fall back to session.lang, to any per-utterance signal field (stt_lang, request_lang, detected_lang), or to a deployment default. An absent lang parameter is a faithful signal that the content language is not authoritatively known.
- Consumer-side resolution. A transformer that needs a language and receives lang: None MAY consult Message.context.session to read the user-preference signal (session.lang) or any per-utterance signal field, or fall back to its own default; the choice is the transformer's.
- Output lang (transformer mutation). Each transformer call returns a lang value alongside the modified artifact and context. The returned lang MAY differ from the input lang: pass-through (unchanged), set/detect (new value replacing None), translate (destination language), or clear (None).
- Threading across the chain. The orchestrator threads the output (artifact, lang) of each transformer into the input of the next.
- Writeback to data.lang. After the chain finishes, the orchestrator MUST reflect the final output lang into the artifact-bearing Message's data.lang: set data.lang to the final value when non-None; unset data.lang when the final value is None and the field was present on entry.
- Metadata (§3.3) and intent (§3.4) transformers do not receive lang as a parameter. Intent transformers receive a Match whose Match.lang (OVOS-PIPELINE-1 §4.1) already names the language; metadata transformers operate on Message.context only and read whichever language signal their policy calls for.
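
Informative sketch. The sourcing, threading, and writeback rules for lang can be illustrated as below. The call shape (artifact, lang, context) -> (artifact, lang, context) and the name run_with_lang are assumptions of this example, and Message.data is modelled as a plain dict. Error handling (§7) and cancellation (§8) are left out to keep the focus on lang.

```python
from typing import Any, Callable, Optional, Tuple

# Hypothetical transformer call shape used only in this sketch.
Transformer = Callable[[Any, Optional[str], dict], Tuple[Any, Optional[str], dict]]


def run_with_lang(chain, artifact, data: dict, context: dict):
    """Thread (artifact, lang, context) through the chain, then write back."""
    # Source at chain start: data.lang only. No fallback to session.lang,
    # per-utterance signal fields, or a deployment default.
    lang = data.get("lang")
    had_lang_on_entry = "lang" in data

    for transform in chain:
        # The output of each call becomes the input of the next.
        artifact, lang, context = transform(artifact, lang, context)

    # Writeback: set when non-None; unset when None and present on entry.
    if lang is not None:
        data["lang"] = lang
    elif had_lang_on_entry:
        del data["lang"]
    return artifact, context
```

A detector that replaces None with a value, a translator that returns the destination language, and a transformer that clears lang all go through the same three steps; only the final value reaches data.lang.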
Injection point. Pre-STT. Operate on raw audio chunks from the microphone or any other audio source feeding the assistant.
Input. A binary audio chunk, the optional lang parameter
(§3.0), and a metadata object carrying at minimum the audio's
sample rate, sample width, and channel count; the metadata
object is otherwise extensible.
Output. A binary audio chunk, the (possibly mutated) lang
value per §3.0, and an updated metadata object.
Permitted mutations. A transformer MAY rewrite the audio buffer (noise reduction, gain control, format conversion) and MAY add or modify metadata keys (detected language, loudness, voice activity score). When the transformer changes the audio's physical format (sample rate, sample width, channel count), it MUST update the corresponding metadata fields to match; conversely it SHOULD NOT modify those physical-format metadata fields without having actually changed the audio.
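
Informative sketch. An audio transformer that changes the physical format and updates the metadata to match, here a stereo-to-mono downmix of 16-bit samples. The metadata key names (sample_rate, sample_width, channels) are hypothetical; this specification only requires that the metadata object carries those three properties. The sketch assumes native-endian samples and a platform where array type code "h" is two bytes wide.

```python
from array import array


def downmix_to_mono(chunk: bytes, lang, metadata: dict):
    """Average interleaved 16-bit stereo frames into a mono chunk."""
    if metadata.get("channels") != 2 or metadata.get("sample_width") != 2:
        # Not the format this transformer handles: pass everything through
        # and leave the physical-format fields untouched.
        return chunk, lang, metadata

    samples = array("h")
    samples.frombytes(chunk)
    mono = array(
        "h",
        ((samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples) - 1, 2)),
    )
    # The audio changed from two channels to one, so the metadata must say so.
    updated = dict(metadata, channels=1)
    return mono.tobytes(), lang, updated
```

The pass-through branch shows the converse rule: when the audio is not changed, the format fields are returned as received.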
Injection point. Post-STT, pre-intent. Operate on the candidate transcription list.
Input. A non-empty list of candidate utterance strings, the
optional lang parameter (§3.0), and the full Message.context
object (OVOS-MSG-1 §2.3) for the in-flight utterance — same
surface §3.3 describes, including the session carrier and
everything other transformers and other specifications have
written into it.
By convention utterances[0] is the primary candidate — the
canonical STT transcription, or the result of whatever upstream
chain step elected one. Later indices are alternative candidates
(STT n-best alternatives, paraphrases added by an earlier
transformer, normalized variants). Plugins that operate on a
single text MAY target utterances[0] only; plugins that produce
alternatives extend the list. Downstream matchers MAY try any
candidate.
Output. A possibly modified list of utterance strings, the
(possibly mutated) lang value per §3.0, and a possibly mutated
Message.context.
Permitted mutations. A transformer MAY rewrite, expand, or
contract the candidate list (add a paraphrase, drop an invalid
transcription). Mutation MAY be performed in place on the input
list or by returning a new list; both are conformant. It MAY also
mutate Message.context per the same permissive rules of §3.3 —
utterance transformers legitimately need to write metadata they
derived from the text (detected language, confidence rescoring),
and may mutate session-internal fields when the result of their
work warrants it (e.g. a translation transformer that normalizes
session.lang to the internal language after translating). The
§3.3 coordination guidance on companion-spec reserved keys
applies here equally.
Empty-list semantics. A transformer MAY return an empty list. Two distinct outcomes share this shape: (1) no plausible transcription: empty list without the §8.1 cancellation signal; downstream stages treat it as silence and the lifecycle terminates with complete_intent_failure followed by ovos.utterance.handled per OVOS-PIPELINE-1 §9; (2) cancellation: empty list returned together with canceled: true and cancel_reason per §8.1; the orchestrator terminates via the §8.2 path, emitting ovos.utterance.cancelled followed by ovos.utterance.handled. A transformer that wants the cancellation outcome MUST set the §8.1 keys; returning an empty list alone is the no-transcription case.
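
Informative sketch. How an orchestrator might tell the two empty-list outcomes apart after the utterance chain returns. The function name and return labels are hypothetical. The sketch assumes the §8.1 keys (canceled, cancel_reason) are read from Message.context; §8.1 is the normative source for where they live.

```python
def classify_utterance_outcome(utterances: list, context: dict) -> str:
    """Decide which lifecycle path follows the utterance-transformer chain."""
    if context.get("canceled") is True:
        # §8.2 path: ovos.utterance.cancelled, then ovos.utterance.handled.
        return "cancelled"
    if not utterances:
        # Treated as silence: complete_intent_failure, then
        # ovos.utterance.handled (OVOS-PIPELINE-1 §9).
        return "no_transcription"
    # Candidates remain: proceed to the next lifecycle stage.
    return "continue"
```

The cancellation check comes first because an empty list alone never signals cancellation; only the §8.1 keys do.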
Injection point. Post-utterance, pre-intent. The
metadata-transformer chain operates directly on the Message.context
object (OVOS-MSG-1 §2.3) for the in-flight utterance — including
the session carrier (OVOS-MSG-1 §4), accumulated context from prior
transformers, and any other context keys other specifications have
populated. A metadata transformer's defining trait is that its only
input and its only output is Message.context; it has no
artifact-specific input the way audio (§3.1), utterance (§3.2),
intent (§3.4), dialog (§3.5), or TTS (§3.6) transformers do.
Input. The full Message.context object (OVOS-MSG-1 §2.3) for
the in-flight utterance: routing keys (§3 of MSG-1), the session
carrier (§4 of MSG-1, which itself carries session.intent_context
(OVOS-CONTEXT-1 §2), session.pipeline (OVOS-PIPELINE-1 §5), the
six per-session transformer overrides (§5 of this spec), and any
other normative or non-normative internal session fields), plus
any top-level metadata keys earlier transformers or other
specifications have written.
Output. A Message.context object — in practice the input
mutated in place, or a returned replacement of the same shape.
Permitted mutations. A metadata transformer MAY mutate
Message.context however it sees fit. That is its purview, by
design: the chain exists to give a deployer a single in-process
place to manipulate per-message context unrestricted. This
includes:
- adding, updating, or removing top-level keys in Message.context;
- mutating session-internal fields directly: writing entries to session.intent_context (OVOS-CONTEXT-1 §2), reordering or replacing session.pipeline (OVOS-PIPELINE-1 §5), mutating the active-handler list session.active_handlers or the response-mode holder session.response_mode (OVOS-CONVERSE-1 §3.3 explicitly cites the metadata-transformer hook as the recommended position for such mutations, and §5.3 there fixes the cancellation semantics when a transformer mutation removes or replaces the current response-mode holder), changing session.lang (OVOS-MSG-1 §4.2), overriding the six per-session transformer chains (§5 of this spec) for this utterance, or any other field on session;
- adjusting routing keys source/destination (OVOS-MSG-1 §3). Routing-key mutation is a load-bearing change that affects every downstream forward/reply/response derivation and is the attachment point layer-2 substrates build on (OVOS-MSG-1 §3.4). A metadata transformer SHOULD NOT mutate source or destination unless the transformer's deliberate role is re-routing this lifecycle (e.g. an authorization-rewrite transformer); a transformer that mutates routing keys MUST understand the OVOS-MSG-1 §5 derivation consequences for every emission downstream of this stage.
The spec does not police what a metadata transformer mutates.
A deployer who loaded a particular metadata transformer has
implicitly authorized whatever it does to Message.context. A
consumer trying to attribute an unexpected context key to its
source uses the introspection surface of §6 (the set of loaded
metadata transformers) and the chain order — these together name
the universe of candidates deterministically.
Informative — mutations with cross-spec consequences. Mutating certain reserved keys has effects that spec readers should be aware of even though they are not prohibited:
- Mutating session.intent_context directly bypasses OVOS-CONTEXT-1 §5 bus-event stamping: no origin is stamped because the mutation does not ride the §5 bus events.
- Mutating session.pipeline (OVOS-PIPELINE-1 §5) changes which pipeline plugins are consulted for this utterance, a powerful per-utterance routing primitive that is also easy to misuse.
- Mutating session-level language signals (OVOS-SESSION-1 §3.2) changes how subsequent stages localize.
- Mutating source/destination (OVOS-MSG-1 §3) changes routing for downstream Message derivations (forward/reply/response).
Injection point. Post-match, pre-handler-dispatch. Operate on
the Match object that a pipeline plugin produced
(OVOS-PIPELINE-1 §4.1) before the orchestrator emits the dispatch
Message (OVOS-PIPELINE-1 §7). Two things happen in this window —
engine-side session mutation per OVOS-CONTEXT-1 §5.3 and the
intent-transformer chain of this section — and the engine-side
mutation MUST happen first. The orchestrator accepts the match,
allows the matching engine to write any context entries it intends
to per CONTEXT-1 §5.3, and only then runs the intent-transformer
chain over the resulting Match. This ordering lets an intent
transformer read context the matching engine just wrote (for
example, to enrich a capture based on a freshly-promoted entry).
Input. The Match produced by the pipeline plugin that
claimed the utterance (skill_id, intent_name, captures,
utterance; OVOS-PIPELINE-1 §4.1), together with the
post-engine-mutation session.intent_context snapshot.
Output. A Match of the same shape, possibly with an enriched
captures map.
Permitted mutations. A transformer MAY add entries to
Match.captures and MAY overwrite existing entries it itself
produced earlier in the chain. It SHOULD NOT delete or
overwrite capture entries produced by the matching engine or by an
earlier transformer in the chain, unless deletion is the
transformer's deployer-configured purpose (PII redaction,
content filtering, profanity censoring). It MUST NOT change
Match.skill_id or Match.intent_name — those identify the
dispatch topic (OVOS-PIPELINE-1 §7), and changing them would route
the handler elsewhere than the engine that matched intended.
Orchestrator enforcement of identity invariants. If a
transformer returns a Match whose skill_id or intent_name
differs from its input, the orchestrator MUST treat the return
as a shape violation per §7 — discard the transformer's output and
proceed with the prior step's Match unchanged. This is the
orchestrator-side safety net for the MUST NOT above.
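
Informative sketch. The identity-invariant check can be written as a guard around each intent transformer call. The Match dataclass below is a minimal stand-in carrying only the fields named in this section, not the normative OVOS-PIPELINE-1 §4.1 shape, and run_intent_chain is a hypothetical name. Handing each transformer a deep copy is a design choice of this sketch so that a rejected result cannot leak in-place edits.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Match:
    # Stand-in for the OVOS-PIPELINE-1 §4.1 Match; fields as listed above.
    skill_id: str
    intent_name: str
    captures: dict = field(default_factory=dict)
    utterance: str = ""


def run_intent_chain(chain, match: Match) -> Match:
    """Run intent transformers, discarding any result that changes identity."""
    current = match
    for transform in chain:
        candidate = transform(copy.deepcopy(current))
        if (
            not isinstance(candidate, Match)
            or candidate.skill_id != current.skill_id
            or candidate.intent_name != current.intent_name
        ):
            # Shape violation: discard the output, keep the prior step's Match.
            continue
        current = candidate
    return current
```

A transformer that adds a capture is accepted; one that rewrites skill_id is ignored in full, including any captures it added in the same call.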
Injection point. Post-skill, pre-TTS. Operate on the rendered
dialog string a skill emitted (typically via a speak event),
before it becomes synthesized audio.
Input. The dialog string, the optional lang parameter
(§3.0), and the full Message.context object (OVOS-MSG-1 §2.3)
carrying the session and any per-message
context written by earlier lifecycle stages. Same surface §3.3
describes.
Output. A possibly modified dialog string, the (possibly
mutated) lang value per §3.0, and a possibly mutated
Message.context.
Permitted mutations. A transformer MAY rewrite the dialog
string entirely (translation, persona, simplification, length cap).
It MAY also mutate Message.context per the same permissive rules
of §3.3 — common cases include setting a voice_id hint for a
downstream TTS transformer, restoring session.lang to the user's
preferred language after a temporary mid-lifecycle override, or
writing the rewriter's choices into context for downstream
observability. The §3.3 coordination guidance on companion-spec
reserved keys applies here equally.
Injection point. Post-TTS, pre-playback. Operate on the synthesized audio file the TTS engine produced, before the playback subsystem consumes it.
Input. A path or handle to the synthesized audio, the
optional lang parameter (§3.0), and the full
Message.context object (OVOS-MSG-1 §2.3). Same surface
§3.3 describes.
Output. A path or handle to the (possibly replaced)
synthesized audio, the (possibly mutated) lang value per §3.0,
and a possibly mutated Message.context.
Permitted mutations. A transformer MAY replace the audio with
a transformed version (pitch shift, reverb, EQ, tempo, format
conversion, watermarking, insertion of jingles or earcons). It
SHOULD NOT silently re-synthesize the speech in a different
language or with different content — translation and rewriting are
dialog-transformer (§3.5) concerns, performed against the text
before TTS; performing them again on the synthesized audio
defeats the staging. The transformer MAY also mutate
Message.context per the same permissive rules of §3.3 — for
example writing playback metadata (final audio format, duration,
applied effects) for observability.
A chain runs in ascending priority order: a transformer with
priority = 1 runs before one with priority = 50, which in turn
runs before one with priority = 100. Lower number = earlier in the
chain. This matches the natural "stages count up" reading and the
existing fallback-skill ordering convention elsewhere in OVOS.
Each transformer plugin declares an integer priority. The
default is 50 — the middle of the band — so plugins with no
opinion sit between explicitly-early and explicitly-late
transformers.
Two ordering mechanisms are defined; deployers choose:
- Priority-based (default). The orchestrator sorts the loaded set ascending by priority and runs the resulting chain. Ties are broken in a stable but unspecified order; chain authors who care about relative ordering between two transformers SHOULD give them distinct priorities.
- Explicit deployer order. Deployer configuration supplies an ordered list of transformer_ids for the chain. The orchestrator runs them in that order, ignoring declared priorities. Explicit order wins over priority. Transformers loaded but absent from the explicit list are not run at this hook.
The orchestrator MUST support both mechanisms and MUST apply explicit order when configured.
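
Informative sketch. Both ordering mechanisms in one function. The name order_chain and the input shape (a mapping from transformer_id to declared priority, with None meaning "no opinion") are assumptions of this example. Treating a configured but empty explicit list as "run nothing", and silently skipping explicit entries that are not loaded, are also assumptions rather than rules stated in this section.

```python
DEFAULT_PRIORITY = 50


def order_chain(priorities: dict, explicit_order=None) -> list:
    """Return the transformer_ids of one type in execution order.

    priorities maps transformer_id -> declared integer priority (or None).
    explicit_order, when not None, is the deployer-configured ordered list.
    """
    if explicit_order is not None:
        # Explicit order wins over priority; loaded transformers that are
        # absent from the list are not run at this hook.
        return [tid for tid in explicit_order if tid in priorities]

    def effective(tid):
        declared = priorities[tid]
        return DEFAULT_PRIORITY if declared is None else declared

    # sorted() is stable, so ties keep the mapping's iteration order here;
    # callers should not rely on any particular tie-break.
    return sorted(priorities, key=effective)
```

With priorities of 100, 1, and unspecified, the resulting order is the priority-1 transformer, then the default-50 one, then the priority-100 one.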
This specification claims twelve session fields under OVOS-SESSION-1 §2.1: six preference fields naming a per-type chain ordering (§5.1) and six policy fields naming a per-type denylist (§5.2). The composition rule of §5.3 layers them.
All six preference fields propagate unchanged per OVOS-MSG-1 §4.1 and are session-scoped; in the absence of a field, the deployer-configured default chain for that type is used.
Six session fields, one per injection point, expressing the session origin's preferred chain for that type:
| Field | Chain | Wire type | Deployment default (absence) |
|---|---|---|---|
| session.audio_transformers | §3.1 | array of string (transformer_id) | the deployer-configured audio chain for this orchestrator process |
| session.utterance_transformers | §3.2 | array of string (transformer_id) | the deployer-configured utterance chain |
| session.metadata_transformers | §3.3 | array of string (transformer_id) | the deployer-configured metadata chain |
| session.intent_transformers | §3.4 | array of string (transformer_id) | the deployer-configured intent chain |
| session.dialog_transformers | §3.5 | array of string (transformer_id) | the deployer-configured dialog chain |
| session.tts_transformers | §3.6 | array of string (transformer_id) | the deployer-configured TTS chain |
Each field is OPTIONAL on the wire. An omitted or empty field
resolves at consumption to the deployment default for that
hook per OVOS-SESSION-1 §2.1. An empty array ([]) is
wire-equivalent to omission for every field in the table above. Per
the canonical wire-weight rule of OVOS-SESSION-1 §3.4, a producer
SHOULD omit any of these fields whose value matches the
deployment default — including the empty-array case where the
deployment default is to run no transformers of that type — rather
than emit a redundant value.
The fields are a preference channel: any session origin (local, remote, layer-2-attached, programmatic) MAY populate them to request a specific chain ordering. The orchestrator narrows the request by what is loaded and what policy permits, per §5.3.
Different sessions may carry different chains. This is how a deployment provides differentiated behaviour per participant — for example, a remote-peer session may request restricted chains tailored to its participant. Whether the preference is honoured is a policy decision (§5.3).
The plugin instances stay process-wide. Per-session chains are per-session orderings over the loaded set, not per-session instantiation.
Six session fields, one per injection point, expressing the policy channel for transformer selection:
| Field | Chain |
|---|---|
| session.blacklisted_audio_transformers | §3.1 |
| session.blacklisted_utterance_transformers | §3.2 |
| session.blacklisted_metadata_transformers | §3.3 |
| session.blacklisted_intent_transformers | §3.4 |
| session.blacklisted_dialog_transformers | §3.5 |
| session.blacklisted_tts_transformers | §3.6 |
Each field is an unordered array of transformer_id strings of
the corresponding type's registry. Wire type, propagation, and
absence semantics match the chain-ordering fields of §5.1: array
of string, propagates unchanged, OPTIONAL on the wire, []
wire-equivalent to omission, SHOULD-omit per OVOS-SESSION-1 §3.4
when no transformer is to be denied.
A transformer whose transformer_id is listed in the corresponding
blacklisted_<type>_transformers for this session MUST NOT be
invoked by the orchestrator for that injection point on that
session — even if the same transformer_id is requested in the
corresponding <type>_transformers chain-ordering field of §5.1.
Policy overrides preference (§5.3).
Filtering is orchestrator-only — a single-tier rule. When the
orchestrator composes the effective chain for the injection point
(per §5.3), it skips any denied transformer_id as if it were not
loaded. No transform call is made; no bus event is emitted for
the skip. The filtering is observable only as a non-invocation. The
two-tier shape used by PIPELINE-1 §5.3 / §5.4 for skill / intent
denylists has no analogue here because transformers do not return
match candidates — the orchestrator drives the chain directly.
Unknown transformer_ids in the denylist are harmless and
MUST NOT cause the utterance to abort — they simply match
nothing.
For each of the six injection points, the orchestrator composes the effective chain for an utterance in a fixed three-stage order, mirroring OVOS-PIPELINE-1 §5.5:
1. Preference. Start from the corresponding <type>_transformers field if set and non-empty; otherwise start from the deployer-configured default chain for that injection point (§4).
2. Availability. Drop any transformer_id that does not correspond to a transformer loaded for this type. Unknown identifiers do not abort the utterance and do not trigger fallback to the deployer default; the remaining known identifiers are the effective ordered set.
3. Policy. Drop any transformer_id listed in the corresponding blacklisted_<type>_transformers, even if it was explicitly requested in step 1. Policy overrides preference.
The result is the ordered list of transformers the orchestrator invokes at that injection point for this utterance.
If every requested transformer_id is dropped by availability or
policy, the effective chain is empty for that injection point and
the orchestrator simply runs no transformers at that stage — the
artifact passes through unmodified to the next lifecycle stage.
This is consistent with §9's null-implementation conformance:
running zero transformers at a chain is always valid.
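The three-stage composition above can be sketched as a small pure function. This is a minimal illustration, not a reference implementation: the function name, its parameters, and the example transformer_ids are hypothetical, and the deployer default chain is assumed to arrive already ordered per §4.

```python
from typing import Iterable, List, Optional, Set


def compose_effective_chain(
    requested: Optional[List[str]],
    default_chain: List[str],
    loaded: Set[str],
    denylist: Iterable[str],
) -> List[str]:
    """Return the ordered transformer_ids to invoke at one injection point."""
    denied = set(denylist)
    # Stage 1, preference: the session field wins only if set and non-empty.
    preferred = requested if requested else default_chain
    # Stage 2, availability: unknown ids are dropped, with no fallback to the default.
    available = [tid for tid in preferred if tid in loaded]
    # Stage 3, policy: the denylist overrides preference.
    return [tid for tid in available if tid not in denied]


# Hypothetical ids, for illustration only.
loaded = {"normalizer", "translator", "profanity_filter"}
default = ["normalizer", "translator"]

print(compose_effective_chain(["translator", "ghost"], default, loaded, ["normalizer"]))
print(compose_effective_chain(["ghost"], default, loaded, []))
```

The second call shows the empty-chain case: the only requested id is unknown, so nothing runs at that stage and the artifact would pass through unmodified.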
The intended separation of concerns mirrors PIPELINE-1 §5.6:
- Any session origin MAY populate <type>_transformers to request a preferred chain. No authorization implied.
- Only policy (the denylists of §5.2, typically populated by the orchestrator owner or by a layer-2 substrate that owns the session) can refuse a transformer the preference layer asked for. The two channels are layered, not alternatives.
This is the same authorization surface OVOS-PIPELINE-1 §5.6 describes for pipeline plugins, extended to the transformer chains: a layer-2 substrate that grants per-peer permissions populates the relevant denylists from the peer's grant, and the orchestrator's §5.3 composition enforces the policy without any per-hop re-authorization.
The orchestrator's loaded transformers may be split across multiple cooperating orchestrator processes (§1) — typically along the audio-input / utterance-handling / audio-output boundary. No single process holds the global picture. Introspection therefore follows a broadcast-query / scatter-response pattern: the requester emits a query; every orchestrator process that has loaded transformers of the queried type responds with its own local slice; the requester aggregates if it wants a global picture. Deployments that run the orchestrator as a single process answer fully from one reply.
Six per-type query/response topic pairs, one per chain type:
| Topic | Reply | Scope |
|---|---|---|
| ovos.transformer.audio.list | ovos.transformer.audio.list.response | Audio chain (§3.1) |
| ovos.transformer.utterance.list | ovos.transformer.utterance.list.response | Utterance chain (§3.2) |
| ovos.transformer.metadata.list | ovos.transformer.metadata.list.response | Metadata chain (§3.3) |
| ovos.transformer.intent.list | ovos.transformer.intent.list.response | Intent chain (§3.4) |
| ovos.transformer.dialog.list | ovos.transformer.dialog.list.response | Dialog chain (§3.5) |
| ovos.transformer.tts.list | ovos.transformer.tts.list.response | TTS chain (§3.6) |
There is deliberately no aggregate "give me everything" query; a consumer that wants all six types issues six queries.
Each query takes no payload. Each .response (OVOS-MSG-1 §5.3
reply convention) carries one orchestrator process's own slice:
| Field | Type | Required | Meaning |
|---|---|---|---|
| loaded | array of strings | yes | The transformer_ids this responding process has loaded for this type. |
| priorities | object (string→integer) | yes | The declared priority of every transformer_id in loaded. Priorities are intrinsic to the plugin and always returned. |
A .response carries only the responder's local view. It
does not report a global chain order — chain composition is
the §4 priority order plus the §5 per-session override applied
across the union of responses, and any aggregating consumer (a
developer tool, a monitoring service) is responsible for
combining the slices.
Response aggregation. A requester that wants the full picture
collects responses arriving on the corresponding .response topic
within an implementation-defined window. The bus is async; there
is no completeness signal. A requester that needs guaranteed
completeness must keep its own roster of expected responders
(via service-discovery means out of scope here) and time out
non-responders.
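As an illustration of the aggregation step, the sketch below merges several per-process slices into one view for a single chain type. The function name and the example ids are hypothetical. Sorting by ascending priority reflects the §4 default only; the tie-break on transformer_id is an assumption of this sketch, and neither an explicit deployer-configured order nor per-session overrides are applied here.

```python
from typing import Dict, Iterable, List, Tuple


def aggregate_slices(responses: Iterable[dict]) -> Tuple[List[str], Dict[str, int]]:
    """Merge per-process .response payloads into one view for a chain type."""
    priorities: Dict[str, int] = {}
    for payload in responses:
        for tid in payload["loaded"]:
            # Each slice reports a priority for every id it lists as loaded.
            priorities[tid] = payload["priorities"][tid]
    # Assumption: ascending priority, ties broken by transformer_id.
    ordered = sorted(priorities, key=lambda tid: (priorities[tid], tid))
    return ordered, priorities


# Two hypothetical slices collected within the requester's window.
slices = [
    {"loaded": ["vad_trim"], "priorities": {"vad_trim": 10}},
    {"loaded": ["denoise", "lang_detect"], "priorities": {"denoise": 5, "lang_detect": 50}},
]
ordered, merged = aggregate_slices(slices)
print(ordered)
```

How long the requester waits before calling such a function is the implementation-defined window; the function itself only sees whatever slices arrived in time.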
Pull-query is the source of truth. Each orchestrator process
MUST subscribe to the relevant ovos.transformer.{type}.list
topics — one per chain it implements — and respond with its
local slice. A consumer that needs accurate state MUST query
and MUST NOT assume any prior announcement reached it — load
ordering between producers and consumers on the bus is not
guaranteed (a consumer that starts after a producer's announcement
fired has missed it; the bus is async and has no catch-up channel
for missed broadcasts).
Optional load-time announcements. On load, an orchestrator
process MAY volunteer a one-shot announcement on the
corresponding .response topic, with the same shape it would
return to a pull query. This is a convenience for consumers that
happen to be listening already (a monitoring service subscribed
before the orchestrator process came online). Announcements are
not normative and consumers MUST NOT rely on receiving
them. Processes that do not announce are fully conformant;
consumers that ignore announcements and only act on query
responses are equally so.
A process that comes online answers subsequent queries; one that goes offline simply disappears from subsequent aggregations.
A transformer that raises is treated as if it returned its input unchanged. The orchestrator MUST catch the exception, SHOULD log it, and MUST proceed to the next transformer in the chain. A single transformer's bug MUST NOT abort the utterance — same posture as OVOS-PIPELINE-1 §6.2 for pipeline plugin exceptions. Logging is SHOULD rather than MUST because logging policy is a deployment concern (embedded targets, regulated environments) and the catch-and-proceed behaviour is the load-bearing contract.
A transformer that returns an output of the wrong shape — wrong type, missing required field, list shrunk to empty for a non-empty input — is treated the same as a raised exception: the orchestrator SHOULD log and MUST proceed with the prior transformer's output as if this transformer had returned its input unchanged.
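A minimal sketch of the catch-and-proceed rule for exceptions and shape violations follows. Everything named here is an assumption for illustration: the `transform(artifact, context) -> (artifact, context)` call shape, the `run_chain` and `valid_utterances` helpers, and the example transformers. Deep-copying the inputs is a design choice of this sketch (so a failing transformer's in-place edits cannot leak), not something the specification requires, and it may be too costly for audio artifacts.

```python
import copy
import logging
from typing import Any, Callable, Dict, List, Tuple

log = logging.getLogger("transformer_chain")

# Assumed call shape: transform(artifact, context) -> (artifact, context).
Transform = Callable[[Any, Dict[str, Any]], Tuple[Any, Dict[str, Any]]]


def run_chain(
    chain: List[Tuple[str, Transform]],
    artifact: Any,
    context: Dict[str, Any],
    is_valid: Callable[[Any, Any], bool],
) -> Tuple[Any, Dict[str, Any]]:
    """Run every transformer in order; a failing one counts as a no-op."""
    for transformer_id, transform in chain:
        try:
            # Copies keep a failing transformer's in-place edits from leaking.
            new_artifact, new_context = transform(
                copy.deepcopy(artifact), copy.deepcopy(context)
            )
        except Exception:
            log.exception("transformer %s raised; skipping it", transformer_id)
            continue
        if not isinstance(new_context, dict) or not is_valid(artifact, new_artifact):
            log.warning("transformer %s returned a bad shape; skipping it", transformer_id)
            continue
        artifact, context = new_artifact, new_context
    return artifact, context


def valid_utterances(before: Any, after: Any) -> bool:
    """Shape check for an utterance chain: list of str, not shrunk to empty."""
    if not isinstance(after, list) or not all(isinstance(u, str) for u in after):
        return False
    return bool(after) or not before


def boom(utterances, context):
    raise RuntimeError("plugin bug")


chain = [
    ("lower", lambda u, c: ([x.lower() for x in u], c)),
    ("boom", boom),                      # raises: skipped
    ("empty", lambda u, c: ([], c)),     # shape violation: skipped
    ("tag", lambda u, c: (u, {**c, "tag.seen": True})),
]
result = run_chain(chain, ["Hello World"], {}, valid_utterances)
print(result)
```

The sketch leaves out cancellation and timeouts; an empty utterance list that accompanies a cancellation signal would need to be recognised before the shape check rejects it.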
Timeouts and per-transformer execution limits are implementation-defined. Deployers concerned about a slow transformer blocking the lifecycle SHOULD configure timeouts at the orchestrator level; this specification does not prescribe a default.
Concurrency. A transformer instance is process-wide and MAY
be invoked concurrently by the orchestrator for utterances in
different sessions. Transformers MUST be re-entrant: any
per-utterance state lives in the artifact and context passed
through transform, not in the transformer instance. Implementations
that need per-instance state (loaded models, caches, opened sockets)
MUST guard it for concurrent access.
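The re-entrancy requirement can be illustrated with a hypothetical utterance transformer that keeps a shared cache on the instance. The class name, the cache, and the normalization rule are all invented for this sketch; the only point is that per-utterance data stays in local variables while the per-instance cache is touched only under a lock.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class CachedNormalizer:
    """Hypothetical utterance transformer with a lock-guarded shared cache."""

    def __init__(self):
        self._cache = {}                 # per-instance state, shared across sessions
        self._lock = threading.Lock()    # guards every access to the cache

    @staticmethod
    def _normalize(text):
        return " ".join(text.lower().split())

    def transform(self, utterances, context):
        out = []                         # per-utterance state stays local
        for text in utterances:
            with self._lock:
                cached = self._cache.get(text)
            if cached is None:
                cached = self._normalize(text)
                with self._lock:
                    self._cache.setdefault(text, cached)
            out.append(cached)
        return out, context


plugin = CachedNormalizer()
inputs = [["  Turn ON the   Lights "], ["What TIME is it"]] * 50
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda u: plugin.transform(u, {"session": {}})[0], inputs))
print(results[:2])
```

The lock is released while normalizing, so two threads may occasionally compute the same value; `setdefault` keeps the first one stored, which is harmless here because the computation is deterministic.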
No rollback on partial chain failure. Side effects a transformer performs through other bus events (intent context mutations per OVOS-CONTEXT-1 §5, telemetry emissions, external HTTP calls) MUST NOT be rolled back by the orchestrator if a later transformer in the chain raises or signals cancellation (§8). The chain is a best-effort enrichment pipeline, not a transaction. A transformer that needs all-or-nothing semantics must implement them internally (e.g. stage its mutations and apply them only at chain end via a final commit step).
Mid-lifecycle session mutations propagate via Message.context.
When a transformer mutates the session carrier inside
Message.context (session.lang, session.pipeline,
session.intent_context, etc., per §3.2 / §3.3 / §3.5 permissions), the
mutated session rides forward as part of Message.context to
every downstream stage that reads it. Downstream consumers
MUST read live session values from the in-flight
Message.context rather than caching session state from an
earlier observation; this is what makes mid-lifecycle session
mutation work uniformly across transformer chains, intent
matching, dispatch (OVOS-PIPELINE-1 §7), and skill handlers.
Cross-transformer coordination via context keys. Transformers
that need to coordinate (a bidirectional translator's input half
signalling its output half; a metadata transformer writing a hint
a later intent transformer will consume) communicate through
top-level keys in Message.context. To avoid collisions between
unrelated plugins, transformers SHOULD namespace their
ad-hoc coordination keys with their transformer_id (or a
related stable identifier) as a prefix —
e.g. <transformer_id>.output_lang rather than
bare output_lang. The spec defines no central registry for
context-key names; namespacing is the discipline that makes the
absence of a registry safe.
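A small sketch of the namespacing discipline, using the bidirectional-translator case mentioned above. The helper name, the transformer_id, and the language value are hypothetical; the only convention taken from the text is the `<transformer_id>.<name>` key shape.

```python
def context_key(transformer_id: str, name: str) -> str:
    """Build a namespaced coordination key such as '<transformer_id>.output_lang'."""
    return f"{transformer_id}.{name}"


TRANSLATOR_ID = "example-translator"  # hypothetical transformer_id


def translate_in(utterances, context):
    """Input half: record which language the output half should answer in."""
    context = dict(context)
    context[context_key(TRANSLATOR_ID, "output_lang")] = "pt-PT"
    return utterances, context


def pending_output_lang(context):
    """Output half: read the hint, or None if the input half never ran."""
    return context.get(context_key(TRANSLATOR_ID, "output_lang"))


_, ctx = translate_in(["olá"], {"output_lang": "set by an unrelated plugin"})
print(pending_output_lang(ctx))
```

The unrelated bare `output_lang` key survives untouched, which is the collision the prefix is meant to avoid.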
Several injection points are natural producers of session-level language signals defined by OVOS-SESSION-1 §3.2:
- §3.1 audio transformers are the natural source for session.detected_lang derived from acoustic features. An audio language detector writes session.detected_lang after running.
- §3.2 utterance transformers MAY refine session.detected_lang from text characteristics (script, function-word density). They MAY also overwrite session.lang directly per §3.2's mutation permissions if a confident classification warrants persisting the change beyond this utterance.
- §3.3 metadata transformers are the catch-all for any further language-classification refinement; the chain runs after utterance transformers so it sees the cumulative signal.
How a downstream consumer consolidates the available language signals into a single value for any given operation is not prescribed by this specification — see OVOS-SESSION-1 §3.2.7 for the informative default ordering. Transformers that produce signals MUST NOT assume any particular consolidation policy on the part of consumers; they populate the appropriate session field and leave consumption to the operation that needs it.
The lifecycle MAY be aborted early — before reaching its natural terminal events — by a transformer in any of the six chains signalling utterance cancellation. Cancellation is the only sanctioned short-circuit defined by this specification.
Cancellation is always signalled by a transformer plugin. There is no bus event a third party can send to request it; the orchestrator owns the cancellation machinery and exposes the signal only as a plugin contract. A deployment that wants out-of-band cancellation (a hardware stop button, a caller-side abort signal, a barge-in from another channel) ships an appropriately scoped transformer that watches for the trigger and sets the cancellation signal from inside the chain — keeping the trigger surface a deployment concern and the contract a plugin concern.
A transformer MAY signal cancellation by setting two reserved keys in the context object it returns:

    "canceled": true,
    "cancel_reason": "<short string describing why>"

Both keys MUST be present together when cancellation is being
signalled. canceled is the boolean flag the orchestrator
recognises; cancel_reason is a short string identifying the
cancellation reason. A context with canceled: true but no
cancel_reason, or with cancel_reason set but canceled absent
or false, is treated as a §7 shape violation; the orchestrator
SHOULD log and MUST proceed as if the transformer returned
its input unchanged.
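The pairing rule can be sketched as a three-way classification of a returned context. The function name and its return labels are hypothetical. Two details are assumptions of this sketch rather than statements of the specification: a non-boolean `canceled` value and a non-string `cancel_reason` are both treated as violations.

```python
def classify_cancel_signal(context: dict) -> str:
    """Return 'cancel', 'violation', or 'none' for a transformer's returned context."""
    canceled = context.get("canceled", False) is True
    reason = context.get("cancel_reason")
    if canceled and isinstance(reason, str):
        return "cancel"          # both keys present together
    if canceled or reason is not None:
        return "violation"       # one key without the other: §7 shape violation
    return "none"


print(classify_cancel_signal({"canceled": True, "cancel_reason": "stop_word"}))
print(classify_cancel_signal({"cancel_reason": "stop_word"}))
```

On `'violation'` the orchestrator would fall back to the transformer's input, exactly as it does for any other shape violation.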
[Informative] cancel_reason vocabulary. Downstream consumers
of ovos.utterance.cancelled — analytics, audit, transcript
viewers, end-user diagnostics — benefit when the reason field
draws from a stable shared vocabulary rather than free-form
strings. This specification mints the following reserved values;
a transformer SHOULD use one of them when its reason fits:
| Value | Meaning |
|---|---|
| stop_word | A stop / cancel keyword was detected in the utterance. |
| transcription_invalid | STT output was deemed unusable (garbage, low confidence, validation failure). |
| policy_block | A content / safety / authorization policy refused the utterance or response. |
| parental_control | A parental-control or restricted-mode guard refused. |
| other | Universal fallback for reasons that don't fit a reserved value. |
A transformer with a more specific reason than any of the above
MAY emit a free-form string; deployers are encouraged to
coordinate vocabulary across their loaded transformers. A
transformer that doesn't want to think about vocabulary
SHOULD use other. The orchestrator MUST NOT rewrite
or normalize cancel_reason; it propagates whatever value the
transformer set.
A transformer MAY additionally set other top-level context keys carrying plugin-specific cancellation metadata (the matched cue, a confidence score, a sentinel identifying the cancellation source) — those are not part of this specification and transformers SHOULD namespace them per §7's coordination guidance.
The orchestrator MUST stamp a third key automatically when it observes a cancellation signal:

    "cancel_by": "<emitting transformer_id>"

The value is stamped from the transformer that produced the signal (the orchestrator knows which one), not from any value the transformer included in the payload. This parallels OVOS-CONTEXT-1 §5.2's origin-stamping rule and serves the same purpose: a transformer cannot impersonate another transformer's cancellation.
When canceled: true is observed alongside an empty utterance
list (§3.2) or any other artifact, the cancellation flag is the
signal — the empty list is a convention, not the trigger.
On observing the signal:
- The orchestrator MUST stop running the current chain — no further transformers in this chain are invoked.
- It MUST skip every subsequent injection-point chain in §2 that has not yet started, including any chain belonging to a downstream stage the orchestrator implements.
- It MUST terminate the lifecycle per §8.2.
The orchestrator MUST NOT strip or modify the canceled /
cancel_reason / cancel_by keys between transformers — a later
observer of the cancelled utterance's Messages (debugger,
analytics) sees that it was cancelled, why, and by whom.
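The stop-and-skip behaviour and the cancel_by stamp can be sketched together. This is a simplified illustration with hypothetical names: it threads one artifact through every chain (real chains carry different artifact types, with lifecycle stages in between) and omits the error handling of §7.

```python
from typing import Any, Callable, Dict, List, Tuple

Transform = Callable[[Any, Dict[str, Any]], Tuple[Any, Dict[str, Any]]]
Chain = List[Tuple[str, Transform]]


def run_until_cancelled(chains: List[Tuple[str, Chain]], artifact, context):
    """Run chains in order; stop everything at the first cancellation signal."""
    invoked = []
    for _chain_name, chain in chains:
        for transformer_id, transform in chain:
            invoked.append(transformer_id)
            artifact, context = transform(artifact, context)
            if context.get("canceled") is True:
                # Stamp from the orchestrator's own knowledge of the caller,
                # overwriting anything the transformer put in cancel_by.
                context = {**context, "cancel_by": transformer_id}
                return artifact, context, invoked
    return artifact, context, invoked


def stop_word(utterances, context):
    if "stop" in utterances:
        return [], {**context, "canceled": True, "cancel_reason": "stop_word",
                    "cancel_by": "someone-else"}  # impersonation attempt
    return utterances, context


chains = [
    ("utterance", [("stop-word-guard", stop_word), ("normalizer", lambda u, c: (u, c))]),
    ("metadata", [("lang-hint", lambda u, c: (u, c))]),
]
_, cancelled_ctx, invoked = run_until_cancelled(chains, ["stop"], {})
print(invoked, cancelled_ctx["cancel_by"])
```

Neither the rest of the utterance chain nor the metadata chain runs after the signal, and the stamped value names the transformer that actually returned it.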
On cancellation, the orchestrator MUST terminate the lifecycle with:
ovos.utterance.cancelled (new; defined here)
ovos.utterance.handled (OVOS-PIPELINE-1 §9.5)
emitted in that order. ovos.utterance.cancelled carries the
cancel_reason and orchestrator-stamped cancel_by from the §8.1
signal that triggered the cancellation. ovos.utterance.handled
preserves the universal end-marker invariant of OVOS-PIPELINE-1
§9.5.
The orchestrator MUST NOT emit complete_intent_failure
(OVOS-PIPELINE-1 §9.3) on the cancellation path — failure and
cancellation are distinct outcomes; an observer that wants to
count "user gave up" or "policy blocked it" separately from
"matcher found nothing" needs them distinguishable on the bus.
The orchestrator MUST NOT dispatch any handler whose match preceded the cancellation in the same dispatch sequence. An intent transformer (§3.4) runs after the orchestrator accepted the match but before dispatch (OVOS-PIPELINE-1 §6); an intent transformer that cancels preempts the dispatch entirely.
Side effects performed by earlier transformers in the same lifecycle (intent context mutations per OVOS-CONTEXT-1 §5, telemetry emissions, external HTTP calls) are not rolled back by cancellation — consistent with §7's no-rollback rule. The cancellation aborts what hasn't run yet; it does not unwind what has.
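The termination sequence described above can be sketched as follows. The `finish_cancelled` function and the `emit` callback are hypothetical, and the message shape (a dict with `type`, `data`, and `context`) is an assumption of this sketch; it places the cancellation keys in the message context, where they are required to survive, and leaves `data` empty.

```python
def finish_cancelled(context: dict, emit) -> None:
    """Emit the two terminal events for a cancelled utterance, in order."""
    if context.get("canceled") is not True:
        raise ValueError("not a cancelled utterance")
    # No complete_intent_failure here: cancellation is not a match failure.
    for topic in ("ovos.utterance.cancelled", "ovos.utterance.handled"):
        # The canceled / cancel_reason / cancel_by keys ride along unmodified.
        emit({"type": topic, "data": {}, "context": dict(context)})


sent = []
finish_cancelled(
    {"canceled": True, "cancel_reason": "policy_block", "cancel_by": "safety-guard"},
    sent.append,
)
print([m["type"] for m in sent])
```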
An orchestrator MAY implement transformer chains at any subset of the six injection points of §2 (including none). The conformance rules below apply per chain — for each chain the orchestrator implements, all of the corresponding obligations bind; for chains the orchestrator does not implement, no obligations arise.
An orchestrator that implements one or more transformer chains MUST, for each chain it implements:
- run the chain to completion at its injection point before the next stage of the lifecycle proceeds (§1, §2);
- order the chain by §4 — ascending priority by default, or the explicit deployer-configured order when one is present;
- apply per-session chain overrides (§5) when the session carries a non-empty corresponding session.*_transformers field, falling back to the deployer-configured chain otherwise;
- catch transformer exceptions and shape violations and proceed with the prior transformer's output; logging them is a SHOULD, not a MUST (§7);
- inspect the context object after every transformer for the canceled flag (§8.1) and, when it is set, stamp cancel_by from the emitting transformer's transformer_id and terminate the lifecycle per §8.2, skipping every subsequent chain in §2 of this spec that has not yet started;
- on any cancellation, emit ovos.utterance.cancelled followed by ovos.utterance.handled (§8.2), carrying cancel_reason and the stamped cancel_by. On the cancellation path the orchestrator MUST NOT emit complete_intent_failure, MUST NOT strip the canceled / cancel_reason / cancel_by keys from Message.context on the terminal events or downstream derivations, and MUST NOT dispatch a Match that was reached before cancellation.
When the orchestrator is implemented as a single process, the introspection obligations of §6 are met by that process. When the orchestrator is split (§1) across cooperating processes — typically along the audio-input / utterance-handling / audio-output boundary — each process that implements one or more chains MUST meet the per-process introspection obligations below for the chains it implements. The composition of all such per-process responses is the orchestrator's full view.
Additionally, an orchestrator that implements the intent
transformer chain (§3.4) MUST enforce the §3.4 identity
invariants on transformer output, treating skill_id /
intent_name changes as §7 shape violations.
An orchestrator that implements none of the six chains is a conformant null-implementation of this specification — it has no obligations under §9 and exposes none of the artefacts (per-type queries, override fields, cancellation handling) that depend on implemented chains. Such an orchestrator simply does not offer transformer extensibility at the points this specification covers.
Each orchestrator process that implements one or more chains MUST:
- subscribe to the relevant ovos.transformer.{type}.list query topics (one per chain it implements) and respond on the corresponding .response topic (§6) with its own local slice of loaded transformer_ids and their declared priorities, never inventing entries for transformers it has not loaded.
Each orchestrator process MAY:
- volunteer a one-shot load-time announcement on the corresponding .response topic (§6) with the same shape it would return to a pull query. Announcements are not normative; consumers MUST NOT rely on receiving them.
Consumers of the introspection surface MUST:
- query ovos.transformer.{type}.list (one per chain type they care about) when they need accurate state, and MUST NOT assume any prior announcement reached them (load ordering between producer and consumer is not guaranteed, per §6).
A transformer (the plugin itself) MUST:
- conform to its type's IO contract (§3): consume the input shape, produce the output shape, observe the type's MAY/MUST NOT rules on permitted mutations;
- be re-entrant — the host may invoke it concurrently for utterances in different sessions, and any per-instance state must be guarded for concurrent access (§7);
- declare an integer priority (§4); the value 50 is the conventional middle-of-the-band default;
- when signalling cancellation (§8.1), set both canceled: true and cancel_reason: <reason> in the returned context; the orchestrator will stamp cancel_by from the emitting transformer's transformer_id.
A transformer MAY:
- read and mutate session.intent_context (OVOS-CONTEXT-1 §2) directly on the session object it holds in hand. The direct-mutation pathway is normatively permitted for any transformer type by OVOS-CONTEXT-1 §5.3: the orchestrator is the carrier of writes, not the bus. When mutating, the transformer MUST use the key-shape rules of OVOS-CONTEXT-1 §3 and §5.3 (private entries prefixed <skill_id>:, where <skill_id> for a transformer is its own transformer_id or, when the transformer is writing on behalf of a specific skill, that skill's skill_id). Mutations made via the bus (ovos.context.set/.unset/.clear, OVOS-CONTEXT-1 §5) are also permitted; the choice between direct and bus is the transformer's, with the trade-offs catalogued in OVOS-CONTEXT-1 §5.3;
- access the bus for side-effects unrelated to the transformer's IO (logging, telemetry, cross-session signals), but SHOULD NOT make the transformer's output depend on bus responses fetched synchronously inside transform, as this serializes the lifecycle on the bus's responsiveness. Every such bus emission MUST ensure the appropriate <type>_transformer_ids list in Message.context ends with the transformer's own id per §1.3.
An observer that sees Message.context carrying canceled: true or cancel_reason:
- MUST NOT attempt to cancel the utterance by emitting bus events — cancellation is a transformer-plugin contract only (§8);
- MAY read cancel_reason and cancel_by for audit, analytics, or observational purposes.
- Slot value typing schemas. Intent transformers (§3.4) are where typed system entities are injected, but the typed value formats themselves (date encoding, number representation, duration units) are deferred to a future text-normalization specification (OVOS-INTENT-1 §5.3). This spec defines the injection pathway; the future spec will define what gets injected.
- Behavioural contracts for any specific transformer type beyond the IO shape and the canonical use-case list. Whether an utterance transformer normalizes contractions, translates, validates STT — that is per-plugin behaviour, not spec-level contract. This spec covers only the frame every transformer runs in.
- Cross-transformer coordination protocols. Transformers do not see each other's prior outputs except through the artifact they pass forward. There is no shared scratch space, no transformer-to-transformer messaging, no inheritance hierarchy. Coordination, when it is needed, happens through the artifact (the utterance list, the context object, the Match).
- Loading, discovery, instantiation, configuration management. Deployment concerns; out of scope.
- Mandating any specific chain be implemented. This spec defines the architectural pattern and the per-chain contract; it does not require any orchestrator to implement any particular chain. A null-implementation that runs no chains is conformant (§9). Which chains a given orchestrator implements is a deployment decision.
- Out-of-band cancellation channels. Cancellation is exclusively a transformer-plugin contract (§8); the orchestrator owns the cancellation machinery and exposes the trigger only via the §8.1 context flag. Deployments that want hardware buttons, peer signals, or barge-in to cancel an in-flight utterance ship a thin transformer that watches for the trigger and sets the cancellation signal from within the chain. The bus has no third-party cancel topic.
- Hot reload of transformer chains. Whether and how an orchestrator can swap a transformer chain at runtime is an implementation concern.
- Timeouts and execution limits per transformer. Recommended for production deployments (§7) but not specified.
- Wire-level invocation messages across orchestrator processes. When the orchestrator is split across cooperating processes (§1), one process may invoke a transformer loaded by another process. This specification defines the introspection surface (§6) and the IO contracts (§3) any invocation MUST satisfy, but does not prescribe a specific transformer.{type}.invoke request / response topic shape. A single-process orchestrator needs no such surface; a split orchestrator requires one, and deployments adopt whatever request / response convention fits their substrate.
- Utterance Lifecycle and Pipeline Specification (OVOS-PIPELINE-1): the per-utterance flow into which §2 of this spec inserts the six transformer hooks; the Match shape §3.4 consumes.
- Bus Message Specification (OVOS-MSG-1): the session carrier (§4), the shared identifier-component rule (§2.1.1) bounding transformer_id, and the .response reply convention (§5.3) the §6 query events follow.
- Session Specification (OVOS-SESSION-1): the wire shape of session, the registry mechanism under which this specification claims the six per-session transformer-override fields (§5), and the deployment-default fallback rule for omitted fields.
- Intent Context Specification (OVOS-CONTEXT-1): the context-mutation pathways transformers may use. Both the bus events (§5) and the direct-session-mutation pathway (§5.3) are available; the choice is the transformer's per the conformance rules of §9 of this spec.
- Intent Definition Specification (OVOS-INTENT-3): the intent and Match model that §3.4 operates on; §7 capture-map shape.
- Sentence Template Grammar Specification (OVOS-INTENT-1): §5.3 deferred slot value typing, for which §3.4 of this spec is the agreed injection home.