Commit 52ef5aa

AUDIO-1: audio output service specification
1 parent 54c6a0e commit 52ef5aa

4 files changed

Lines changed: 467 additions & 0 deletions

CHANGELOG.md

Lines changed: 14 additions & 0 deletions
@@ -7,6 +7,20 @@ status quo, `2` once it is not backwards compatible. Entries are grouped under
 the spec's current class. Every pull request that alters normative content adds
 an entry here.

+## OVOS-AUDIO-1 — Audio Output Service
+
+### 2
+
+- The audio output service: the rendering pipeline (dialog-transformer
+  chain, TTS synthesis, TTS-transformer chain, playback queue), the
+  sequential playback queue shared by speech (`ovos.utterance.speak`) and
+  sound effects (`ovos.audio.queue` / `ovos.audio.play_sound`), the
+  remote-client rendering mode (`ovos.utterance.speak.b64` →
+  `ovos.audio.speech`), output lifecycle signals
+  (`ovos.audio.output.started` / `.ended`), the speaking-status query
+  (`ovos.audio.is_speaking`), stop integration (`ovos.audio.stop`,
+  `ovos.stop`), and the `listen`-triggered `ovos.mic.listen` follow-up.
+
 ## OVOS-INTENT-1 — Sentence Template Grammar

 ### 2

appendix/divergences.md

Lines changed: 15 additions & 0 deletions
@@ -196,6 +196,21 @@ defined by any spec** and should be removed or replaced:
 - **`ovos.utterance.speak`** (PIPELINE-1 §9.6). The NL output
   exit point; symmetric to `ovos.utterance.handle`. No current
   equivalent — TTS trigger is currently implicit.
+- **`ovos.utterance.speak.b64`** (AUDIO-1 §3.4). Variant of
+  `ovos.utterance.speak` for remote-client delivery: the audio
+  output service runs the same TTS pipeline but emits synthesised
+  audio as base64 via `ovos.audio.speech` instead of queuing for
+  local playback. Used by bridges serving satellites without TTS
+  (BRIDGE-1 §4.2.4).
+- **`ovos.audio.speech`** (AUDIO-1 §4.3). Base64-encoded
+  synthesised audio broadcast; emitted in response to
+  `ovos.utterance.speak.b64`. Carries a `listen` flag. Remote
+  clients (e.g. satellites relayed by a bridge) decode and play
+  the audio themselves.
+- **`ovos.audio.queue`** / **`ovos.audio.play_sound`** (AUDIO-1
+  §4.1, §4.2). Sound-effect playback topics. Payloads accept
+  either a `uri` or inline base64 `audio` field, enabling
+  cross-host audio delivery without shared filesystem access.
 - **`ovos.intent.list` / `ovos.intent.describe`** (INTENT-4
   §10). Introspection topics served from the orchestrator's
   passive registration index.
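
The `uri` / inline base64 `audio` alternative described for `ovos.audio.queue` and `ovos.audio.play_sound` can be illustrated with a small receiver-side helper. This is a minimal sketch, not the spec's normative behaviour: only the field names `uri` and `audio` come from the text above; the function name `resolve_audio_source` and the rule that exactly one of the two fields must be present are assumptions made for the example.

```python
import base64
import binascii
from typing import Union


def resolve_audio_source(payload: dict) -> Union[str, bytes]:
    """Return a URI string or decoded audio bytes from a sound-effect payload.

    Assumption: exactly one of `uri` / `audio` is present. The excerpt does
    not state a precedence rule, so this sketch rejects ambiguous payloads.
    """
    uri = payload.get("uri")
    audio_b64 = payload.get("audio")

    # Reject payloads that carry both fields or neither.
    if (uri is None) == (audio_b64 is None):
        raise ValueError("payload must carry exactly one of 'uri' or 'audio'")

    if uri is not None:
        # The receiver resolves the URI itself (local file, HTTP, ...).
        return uri

    try:
        # Inline audio needs no shared filesystem: decode and play directly.
        return base64.b64decode(audio_b64, validate=True)
    except binascii.Error as exc:
        raise ValueError("'audio' is not valid base64") from exc
```

A sender on another host would populate `audio` with `base64.b64encode(raw_bytes).decode("ascii")`, which is what makes delivery possible without shared filesystem access.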

appendix/rationale.md

Lines changed: 20 additions & 0 deletions
@@ -449,3 +449,23 @@ subscribed to `<own_skill_id>:stop`. The pipeline plugin matches
 and selects; the skill stops. Stop is one of the few cases in
 the spec set where the pipeline / skill split is not
 substitutable.
+
+
+### 4.9 Audio output service (AUDIO-1)
+
+**Sentence segmentation as a latency-reduction technique (AUDIO-1 §3.2).**
+When a TTS engine synthesises a long utterance as a single unit, the
+user must wait for the entire synthesis to complete before hearing
+anything. An implementation can reduce perceived latency by splitting
+the utterance at sentence boundaries, synthesising each sentence
+independently, and enqueuing each segment as soon as it is ready —
+so the first sentence begins playing while later sentences are still
+being synthesised.
+
+This is an internal implementation strategy: no other bus participant
+observes whether the TTS engine segments or not. The visible contract
+is unchanged — `ovos.audio.output.started` fires when the first
+audio begins, `ovos.audio.output.ended` fires when the last audio
+completes. The `listen` flag is honoured after all audio for the
+originating utterance has played, regardless of how many internal
+segments were used.
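
The segmentation strategy described in this rationale can be sketched as a producer thread that synthesises sentences while the caller plays them in order. This is an illustration under stated assumptions, not the reference implementation: `split_sentences`, `speak`, and the `synthesize` / `play` / `emit` callables are hypothetical names, the sentence-boundary regex is deliberately naive, and emitting `ovos.mic.listen` for the `listen` flag follows the changelog wording rather than any normative text shown here.

```python
import queue
import re
import threading
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    # Naive rule (assumption): split after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def speak(
    text: str,
    synthesize: Callable[[str], bytes],  # hypothetical TTS call
    play: Callable[[bytes], None],       # hypothetical blocking playback call
    emit: Callable[[str], None],         # hypothetical bus emitter
    listen: bool = False,
) -> None:
    segments = split_sentences(text)
    if not segments:
        return

    ready: queue.Queue = queue.Queue()
    done = object()  # sentinel marking the end of synthesis

    def producer() -> None:
        try:
            for sentence in segments:
                # Each segment is enqueued as soon as it is synthesised.
                ready.put(synthesize(sentence))
        finally:
            ready.put(done)  # always unblock the consumer

    threading.Thread(target=producer, daemon=True).start()

    started = False
    while True:
        audio = ready.get()
        if audio is done:
            break
        if not started:
            # One 'started' signal, no matter how many segments follow.
            emit("ovos.audio.output.started")
            started = True
        play(audio)  # later sentences keep synthesising meanwhile

    if started:
        emit("ovos.audio.output.ended")
        if listen:
            # Honoured only after all audio for the utterance has played.
            emit("ovos.mic.listen")
```

From the bus's point of view the sequence is identical whether the text contains one sentence or ten: a single `started`, a single `ended`, and then the optional listen follow-up.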
