Spec ID: OVOS-OCP-1 · Version: 1 · Status: Draft
This specification defines the OVOS Virtual Media Player — a single logical media player, scoped to a session, that every media voice command targets. It is the contract by which an orchestrator turns "play jazz", "pause", "next", "louder", and "stop the music" into observable playback state, and by which that state is mirrored to and from the host operating system over open standards (MPRIS).
OCP stands for OVOS Common Playback: common because one player arbitrates all media for a session regardless of which application, provider, or output device ultimately serves a track — the same way the intent stack gives one utterance one handler.
Dependencies: OVOS-MSG-1 (envelope and the context.session carrier),
OVOS-SESSION-1 (session field registry), OVOS-SESSION-2 (session
assignment and mutation boundaries), OVOS-PIPELINE-1 (the media pipeline
that matches playback and control utterances and dispatches into this
surface), OVOS-STOP-1 (global stop cascade, of which media stop is one
subscriber).
The key words MUST, MUST NOT, SHOULD, SHOULD NOT, MAY are used as in RFC 2119.
This specification defines: the Virtual Media Player abstraction and its session scoping; the player and media state model; the bus surface that requests playback and transport control and that reports state; the distinction between playback requests (start something) and control requests (act on what is already playing); and the MPRIS bridge by which the virtual player is exported to the host OS and by which externally-initiated, standards-compliant players become controllable by voice.
It does not define: how candidate media is discovered or ranked (a provider concern), how a URI is turned into bytes on a speaker (a backend concern), stream-extraction formats, the GUI rendering of now-playing, the NLU that classifies an utterance as media (OVOS-PIPELINE-1), or the on-disk/programming-language shape of any implementation. These are implementer concerns; this spec fixes only the observable contract.
A conformant orchestrator MUST present, per session, exactly one Virtual Media Player: a single addressable target that owns the session's now-playing track, its playback queue, and its transport state. "Virtual" because it is not a device and not an application — it is an arbitration point. Behind it, any number of playback backends, remote devices, or external OS players may do the actual work; in front of it, all voice commands and all status consumers see one coherent player.
This mirrors the rest of the platform: as the intent stack gives one utterance one winning handler (OVOS-PIPELINE-1), the media stack gives one session one player. A request is never required to name a backend; it names the player, and the player routes.
Two request classes target the player and MUST be distinguished:
- Playback requests — "play X", "open X". The player acquires media (via the pipeline and providers) and begins playback. §4.2.
- Control requests — "pause", "resume", "next", "previous", "stop", "shuffle", seek, repeat. The player acts on its existing now-playing / queue without acquiring new media. §4.3.
A control request MUST NOT require the player to have been started by OVOS: if the player is bridged to an external OS source (§6), control requests act on that source.
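One way to keep the two request classes apart is a lookup on the message type. The sketch below assumes the message names listed in the bus surface of this specification; the function name `request_class` and the returned labels are illustrative and not part of the contract.

```python
# Illustrative partition of the ovos.common_play.* requests named in this spec.
PLAYBACK_REQUESTS = {
    "ovos.common_play.play",
    "ovos.common_play.search",
}
CONTROL_REQUESTS = {
    "ovos.common_play.pause",
    "ovos.common_play.resume",
    "ovos.common_play.stop",
    "ovos.common_play.next",
    "ovos.common_play.previous",
    "ovos.common_play.seek",
}


def request_class(msg_type: str) -> str:
    """Classify a message type as 'playback', 'control', or 'other'.

    Hypothetical helper: playback requests acquire new media, control
    requests act on whatever is already the session's now-playing.
    """
    if msg_type in PLAYBACK_REQUESTS:
        return "playback"
    if msg_type in CONTROL_REQUESTS:
        return "control"
    return "other"
```

Keeping the partition in data rather than in branching code makes it easy to audit against the message tables.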
The player exposes three orthogonal state axes. Each axis has a fixed enumeration; an implementation MUST NOT report a value outside its axis, and SHOULD treat unknown received values as the axis's neutral member.
| PlayerState | Meaning |
|---|---|
| STOPPED | No active playback. |
| PLAYING | A track is advancing. |
| PAUSED | A track is loaded and held at a position. |
MediaState describes the loaded media, independent of whether it is
advancing: at minimum NO_MEDIA, LOADING, LOADED, BUFFERING,
END_OF_MEDIA, INVALID_MEDIA. It is advisory for consumers (e.g. a GUI
spinner) and MUST NOT be conflated with PlayerState.
The player MUST track a loop mode (at minimum NONE, REPEAT,
REPEAT_TRACK) and a boolean shuffle mode. Both are control-request
targets (§4.3).
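The axes above map naturally onto enumerations. The following is a minimal sketch, not a normative encoding: it assumes the wire values are the member names as strings, and it assumes STOPPED, NO_MEDIA, and NONE are the neutral members, which this specification does not itself name. The helper `coerce` is hypothetical.

```python
from enum import Enum


class PlayerState(str, Enum):
    STOPPED = "STOPPED"
    PLAYING = "PLAYING"
    PAUSED = "PAUSED"


class MediaState(str, Enum):
    NO_MEDIA = "NO_MEDIA"
    LOADING = "LOADING"
    LOADED = "LOADED"
    BUFFERING = "BUFFERING"
    END_OF_MEDIA = "END_OF_MEDIA"
    INVALID_MEDIA = "INVALID_MEDIA"


class LoopMode(str, Enum):
    NONE = "NONE"
    REPEAT = "REPEAT"
    REPEAT_TRACK = "REPEAT_TRACK"


# Assumption: which member counts as "neutral" on each axis.
NEUTRAL = {
    PlayerState: PlayerState.STOPPED,
    MediaState: MediaState.NO_MEDIA,
    LoopMode: LoopMode.NONE,
}


def coerce(axis, value):
    """Map a received value onto its axis, falling back to the neutral member."""
    try:
        return axis(value)
    except ValueError:
        return NEUTRAL[axis]
```

Keeping PlayerState and MediaState as separate types makes it harder to conflate the two axes by accident.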
The player is the single writer of its own state. It emits a state event when state changes (§4.4); it MUST NOT derive its authoritative state by subscribing to its own emitted events.
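The single-writer rule can be sketched as a player that mutates a private field and then emits, never the reverse. The class name, the `emit` callback signature, and the `{"state": ...}` payload shape are assumptions made for illustration only.

```python
from typing import Callable, Dict


class VirtualPlayer:
    """Hypothetical player that owns its state and reports changes."""

    def __init__(self, emit: Callable[[str, Dict[str, str]], None]) -> None:
        self._emit = emit          # outbound bus hook (assumed signature)
        self._state = "STOPPED"    # authoritative value, written only here

    @property
    def state(self) -> str:
        return self._state

    def _set_state(self, new_state: str) -> None:
        if new_state == self._state:
            return                 # no transition, so no state event
        self._state = new_state    # write first, then announce
        self._emit("ovos.common_play.player.state", {"state": new_state})

    def play(self) -> None:
        self._set_state("PLAYING")

    def pause(self) -> None:
        self._set_state("PAUSED")
```

Note that the class has no handler for `ovos.common_play.player.state`: the event is an output of the player, never an input to it.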
All player traffic is namespaced under ovos.common_play. and carried
in the OVOS-MSG-1 envelope. Every message MUST carry context.session
(OVOS-SESSION-1); the player it addresses is the one owning that
session_id (§5).
The ovos.common_play. prefix is reserved for the Virtual Media
Player. Components other than the player and its pipeline MUST NOT
emit playback-mutating messages under this prefix; they observe state
(§4.4) and issue requests (§4.2, §4.3).
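As a rough illustration of the namespacing and session requirements, the sketch below builds and inspects a request. The envelope shape is an assumption: OVOS-MSG-1 defines the real envelope, and the `type` / `data` / `context` keys, the `phrase` payload field, and both helper names are illustrative only.

```python
PREFIX = "ovos.common_play."


def make_request(msg_type: str, data: dict, session_id: str = "default") -> dict:
    """Build a player request that always carries context.session."""
    if not msg_type.startswith(PREFIX):
        raise ValueError(f"{msg_type!r} is outside the {PREFIX} namespace")
    return {
        "type": msg_type,
        "data": dict(data),
        "context": {"session": {"session_id": session_id}},
    }


def session_id_of(message: dict) -> str:
    """Return the session whose player the message addresses."""
    return message["context"]["session"]["session_id"]


# Usage: a playback request for a hypothetical satellite session.
request = make_request("ovos.common_play.search", {"phrase": "jazz"}, "kitchen")
```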
| Message | Meaning |
|---|---|
| ovos.common_play.play | Begin playback of a resolved result / queue. |
| ovos.common_play.search | Acquire candidate media for a phrase (the pipeline's discovery step); …search.start / …search.end bracket it. |
A playback request MAY name a preferred output (a backend alias) in the utterance; absent that, the player selects an output by its configured preference order. Output selection is described here for information only; the selection policy is implementation-defined.
| Message | Acts on |
|---|---|
| ovos.common_play.pause | now-playing → PAUSED |
| ovos.common_play.resume | now-playing → PLAYING |
| ovos.common_play.stop | now-playing → STOPPED (an OVOS-STOP-1 subscriber, §7) |
| ovos.common_play.next | advance the queue |
| ovos.common_play.previous | retreat the queue |
| ovos.common_play.seek | move the position within now-playing |
Control requests are safe to issue when no media is present: issuing
pause with nothing playing is a no-op, not an error.
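The control table and the no-op rule can be expressed as a small pure function. This is a simplified sketch: it models only the PlayerState axis, treats "no media" as `now_playing is None`, and leaves queue and position handling out. The function name and the `now_playing` parameter are hypothetical.

```python
from typing import Optional

# Control requests that change PlayerState, per the table above.
STATE_CHANGING = {
    "ovos.common_play.pause": "PAUSED",
    "ovos.common_play.resume": "PLAYING",
    "ovos.common_play.stop": "STOPPED",
}


def apply_control(state: str, now_playing: Optional[str], msg_type: str) -> str:
    """Return the PlayerState that results from a control request."""
    if now_playing is None:
        return state  # nothing loaded: a no-op, not an error
    # next / previous / seek move within the queue or track and leave
    # PlayerState unchanged in this simplified model.
    return STATE_CHANGING.get(msg_type, state)
```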
The player MUST announce state transitions so that GUIs, satellites, MPRIS exporters, and the pipeline's per-session tracking stay coherent:
| Message | Carries |
|---|---|
| ovos.common_play.player.state | the §3.1 value |
| ovos.common_play.media.state | the §3.2 value |
| ovos.common_play.track.state | now-playing track transitions |
A consumer MUST NOT assume it can read player state synchronously; the state reports are the contract.
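A consumer such as a GUI can therefore keep a local mirror that is updated only by state reports. The sketch below assumes a dict-shaped message with `type` and `data` keys and a `state` payload field; none of these names are fixed by this specification.

```python
from typing import Dict, Optional


class StateMirror:
    """Hypothetical consumer-side cache built purely from state reports."""

    AXES = {
        "ovos.common_play.player.state": "player",
        "ovos.common_play.media.state": "media",
        "ovos.common_play.track.state": "track",
    }

    def __init__(self) -> None:
        self.last: Dict[str, Optional[str]] = {}

    def on_message(self, message: dict) -> None:
        axis = self.AXES.get(message["type"])
        if axis is None:
            return  # not a state report; ignore
        self.last[axis] = message["data"].get("state")
```

Until the first report arrives the mirror is empty, which is the honest answer: the consumer does not yet know the player's state.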
The Virtual Media Player is per session (OVOS-SESSION-2). A request's
context.session.session_id selects the player instance; co-located
single-user setups use session_id: "default".
An orchestrator serving multiple concurrent sessions (e.g. a hub serving
satellites) MUST keep each session's now-playing, queue, and transport
state isolated: a pause for session A MUST NOT affect session B.
State reports (§4.4) MUST carry the originating session so consumers
can demultiplex.
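Session isolation amounts to keying all player state by `session_id`. A minimal sketch follows, assuming the same illustrative envelope shape (`type`, `data`, `context.session.session_id`) and modelling only the PlayerState axis; `PlayerRegistry` is a hypothetical name.

```python
from typing import Dict, Optional


def session_id_of(message: dict) -> str:
    return message["context"]["session"]["session_id"]


class PlayerRegistry:
    """Hypothetical per-session store of PlayerState."""

    def __init__(self) -> None:
        self._state: Dict[str, str] = {}

    def state(self, session_id: str) -> str:
        return self._state.get(session_id, "STOPPED")

    def handle(self, message: dict) -> Optional[dict]:
        """Apply a request to its own session; return a report on change."""
        sid = session_id_of(message)
        old = self.state(sid)
        new = old
        if message["type"] == "ovos.common_play.play":
            new = "PLAYING"
        elif message["type"] == "ovos.common_play.pause" and old == "PLAYING":
            new = "PAUSED"
        if new == old:
            return None
        self._state[sid] = new
        # The report carries the originating session so consumers can demultiplex.
        return {
            "type": "ovos.common_play.player.state",
            "data": {"state": new},
            "context": {"session": {"session_id": sid}},
        }
```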
The Virtual Media Player SHOULD integrate with the host operating
system through MPRIS (the freedesktop org.mpris.MediaPlayer2
D-Bus interface). The bridge has two independent roles; an implementation
MAY provide either or both, and each MUST be separately
configurable.
The virtual player MAY publish itself on the session bus as an MPRIS
MediaPlayer2 (e.g. org.mpris.MediaPlayer2.OCP), mapping §3 state and §4
transport onto the MPRIS PlaybackStatus, LoopStatus, Metadata,
Position, and the Play/Pause/Next/Previous/Stop/Seek methods.
This makes OVOS playback visible to and controllable by ordinary desktop
media keys and applets, which need no knowledge of OVOS.
A Role A exporter MUST report only MPRIS-valid strings (e.g.
PlaybackStatus ∈ {Playing, Paused, Stopped}, LoopStatus ∈ {None, Track, Playlist}) and MUST degrade gracefully — log and continue — when no
session D-Bus is available (headless hosts).
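The two Role A obligations, reporting only MPRIS-valid strings and surviving a missing session bus, might look like the sketch below. The mapping of REPEAT to `Playlist` and REPEAT_TRACK to `Track` is an assumption, as is the fallback to `Stopped` / `None` for unmapped values. `connect` is a hypothetical callable standing in for whatever D-Bus library opens the session bus; no real D-Bus API is used here.

```python
import logging
from typing import Callable, Dict, Optional

LOG = logging.getLogger("ocp.mpris")

PLAYBACK_STATUS = {"PLAYING": "Playing", "PAUSED": "Paused", "STOPPED": "Stopped"}
LOOP_STATUS = {"NONE": "None", "REPEAT_TRACK": "Track", "REPEAT": "Playlist"}


def to_mpris(player_state: str, loop_mode: str) -> Dict[str, str]:
    """Translate player state to MPRIS property values, never leaking others."""
    return {
        "PlaybackStatus": PLAYBACK_STATUS.get(player_state, "Stopped"),
        "LoopStatus": LOOP_STATUS.get(loop_mode, "None"),
    }


def start_exporter(connect: Callable[[], object]) -> Optional[object]:
    """Try to open the session bus; log and continue if that fails."""
    try:
        return connect()
    except Exception as exc:  # e.g. a headless host with no session bus
        LOG.warning("MPRIS export disabled: %s", exc)
        return None
```

Returning `None` rather than raising lets the rest of the player start normally when the exporter cannot.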
The virtual player MAY discover and control other MPRIS players on
the host (org.mpris.MediaPlayer2.*). This is the key consequence of the
common-playback model: media playback that OVOS did not initiate is still
controllable by voice, provided the source speaks an open standard. "Pause
the music" can pause a browser tab or a desktop player; "next" can skip the
system's current player.
Role B is off by default and gated by configuration, because it acts on software OVOS does not own. When enabled, the player MUST maintain an ignore-list (at minimum its own export name from Role A) and SHOULD scope control to the most recently active external player to avoid ambiguous broadcast.
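Choosing the single external player to control can be sketched as a filter plus a maximum. The sketch assumes the implementation already tracks a last-activity timestamp per bus name; how those timestamps are obtained is out of scope, and `pick_external_player` is a hypothetical name.

```python
from typing import Dict, Optional, Set

MPRIS_PREFIX = "org.mpris.MediaPlayer2."


def pick_external_player(
    last_active: Dict[str, float], ignore: Set[str]
) -> Optional[str]:
    """Return the most recently active MPRIS bus name not on the ignore-list.

    last_active maps bus name -> timestamp of last observed activity
    (an assumed bookkeeping structure, not defined by this spec).
    """
    candidates = {
        name: stamp
        for name, stamp in last_active.items()
        if name.startswith(MPRIS_PREFIX) and name not in ignore
    }
    if not candidates:
        return None
    return max(candidates, key=candidates.get)
```

Placing the player's own Role A export name in `ignore` keeps it from discovering and controlling itself.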
When both OVOS-initiated playback and external MPRIS sources are present, the virtual player is the single arbiter of "the current media" for control requests (§4.3). The arbitration policy (prefer own playback, prefer most recently active, etc.) is implementation-defined, but the player MUST present one coherent answer per control request — a control request MUST NOT act on two players at once unless the user explicitly requested a global action.
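As one possible arbitration policy (the specification leaves the choice open), the sketch below prefers the player's own playback and otherwise falls back to the most recently active external source. The function name, the `"self"` label, and the `global_action` flag are illustrative assumptions.

```python
from typing import List


def control_targets(
    own_active: bool, external: List[str], global_action: bool = False
) -> List[str]:
    """Resolve which players a control request acts on.

    external is assumed to be ordered most recently active first.
    """
    if global_action:
        # Only an explicit user request may fan out to several players.
        return (["self"] if own_active else []) + list(external)
    if own_active:
        return ["self"]
    return list(external[:1])
```

Outside the explicit global case the result never contains more than one target, which is the property the arbitration rule requires.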
Media stop is one subscriber to the OVOS-STOP-1 cascade. When the stop
pipeline dispatches to the media player's …:stop (the player being an
active handler) or broadcasts a global stop, the player MUST transition
now-playing to STOPPED (§3.1) and MUST scope the effect to the
inbound session_id (OVOS-STOP-1 §6). A global stop MUST NOT stop
another session's playback.
A conformant orchestrator MUST:
- present exactly one player per session (§2, §5) and keep sessions isolated (§5);
- distinguish playback requests from control requests, and honour control requests against externally-sourced media when bridged (§2, §4.3, §6.2);
- be the single writer of its own state and announce transitions on ovos.common_play.player.state / …media.state / …track.state (§3.3, §4.4);
- treat control requests as no-ops when no media is present (§4.3);
- scope a stop to the inbound session (§7).

A conformant orchestrator SHOULD:
- integrate with the host OS via MPRIS Role A (§6.1);
- degrade gracefully without a session D-Bus (§6.1).

A conformant orchestrator MAY:
- control external MPRIS players via Role B, off by default and gated by configuration (§6.2);
- accept a user-named output preference in a playback request (§4.2).
- OVOS-PIPELINE-1 — the media pipeline that classifies playback vs. control utterances and dispatches into the §4 surface.
- OVOS-STOP-1 — global stop cascade; media stop is a subscriber (§7).
- OVOS-SESSION-1 / OVOS-SESSION-2 — the context.session carrier and per-session ownership that scope the player (§5).
- OVOS-MSG-1 — envelope and session carrier for all §4 traffic.
- OVOS-AUDIO-1 — audio output service (TTS/dialog); distinct from media playback, which this spec owns.