🐙 Polpo Voice

Polpo Voice is a native macOS application that adds full bidirectional voice interaction to Anthropic's Claude Code. You speak prompts through Apple's Speech framework and hear Claude's replies in ElevenLabs voices, all from a floating menu bar cockpit. €69 lifetime, no subscription, speech recognition runs locally.

Talk to Claude Code hands-free. Dictate, get the answer spoken back, switch voices on the fly, all from a single floating panel that stays above every other window without stealing focus.

Live UI: STT listening, LOOP auto-send active, George voice selected, real-time waveform.

What it does

Polpo Voice closes the loop between you and Claude Code without your hands ever leaving the coffee:

1. You speak. The Mac captures the dictation through your microphone (or your AirPods).
2. The transcribed text is auto-pasted into the Claude Code terminal.
3. After a short silence, Return is pressed automatically and your prompt is sent.
4. Claude replies as usual.
5. The reply is intercepted, cleaned of code blocks and markdown, and spoken back to you with a natural ElevenLabs voice (default: George, warm British storyteller).
6. You walk around. You answer back. Repeat.

You never touched the keyboard.
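The cleaning step in the loop above (stripping code blocks and markdown before the reply is spoken) can be sketched as follows. This is a minimal illustration, not the code in stop_tts_response.py: the function name `clean_for_speech` and the exact set of patterns are assumptions.

```python
import re

# Hypothetical patterns; the real daemon may strip more or less than this.
FENCE = re.compile(r"`{3}.*?`{3}", re.DOTALL)            # fenced code blocks
INLINE_CODE = re.compile(r"`([^`]*)`")                   # inline code: keep the text
LINK = re.compile(r"\[([^\]]+)\]\([^)]+\)")              # [label](url): keep the label
LINE_MARKER = re.compile(r"^[ \t]*(#{1,6}|[-*+]|\d+\.)[ \t]+", re.MULTILINE)
EMPHASIS = re.compile(r"\*{1,2}([^*\n]+)\*{1,2}")        # *italic* and **bold**


def clean_for_speech(reply: str) -> str:
    """Return a plain-text version of a markdown reply, suitable for TTS."""
    text = FENCE.sub(" ", reply)          # code is dropped entirely, not read aloud
    text = INLINE_CODE.sub(r"\1", text)
    text = LINK.sub(r"\1", text)
    text = LINE_MARKER.sub("", text)      # heading hashes, bullets, list numbers
    text = EMPHASIS.sub(r"\1", text)
    return re.sub(r"\s+", " ", text).strip()
```

Dropping fenced code completely, rather than reading it aloud, matches the description above; whether inline code should be spoken is a design choice this sketch answers with "yes".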

Why it exists

Coding by voice is finally good enough — Apple Speech is sub-200ms, ElevenLabs voices are uncanny, AirPods Pro have great mics. The missing piece was a glue layer that made all of this work for a single specific workflow: talking with an AI coding agent in a terminal.

Polpo Voice is that glue. No subscription, no cloud sync, no telemetry. Just one binary, one floating panel, one payment, forever.

Why not just a dictation app?

Because Polpo Voice isn't a dictation app. Dictation apps go one way — voice in, text out. Polpo Voice closes the loop: voice in, and Claude's answer spoken back to you. You can stand up, walk to the window, and have a full conversation with your coding agent without looking at the screen.

Wispr Flow, Monologue, Superwhisper and friends are excellent at voice → text everywhere. None of them read the reply back. None of them are built for the one workflow Polpo Voice obsesses over: talking with Claude Code in a terminal.

| | 🐙 Polpo Voice | Cloud dictation apps (Wispr Flow, Monologue …) |
| --- | --- | --- |
| Built for | Bidirectional voice chat with Claude Code | General voice → text in any app |
| Speaks the reply back (TTS) | ✅ (the whole point) | ❌ |
| Auto-send loop | ✅ silence → Return, hands-free | ❌ |
| Audio leaves your Mac | ❌ STT is local (Apple Speech + local Whisper) | ☁️ typically cloud-processed |
| Price | €69 once, forever | ~$15/mo / ~$144 a year |
| Telemetry | None | Varies |

If you want to dictate emails everywhere, buy a dictation app. If you want to talk with Claude Code — speak, listen, walk around, repeat — this is the only thing that does it.

Highlights

Floating panel that stays above every other window, on every Space, never steals focus from your terminal.
4 live status hearts — TTS speaking, STT listening, autosend on/off, dialog loop on/off (each interactive).
60-bar live waveform at 30 FPS with threshold and ambient floor annotations — see your voice in real time.
ElevenLabs voice picker that fetches your account library on demand, marks router-mapped voices with a star, plays a sample inline.
Audio device pills (input / output) that show the active microphone and speaker by name and let you switch via dropdown — bypasses System Settings entirely.
Smart audio policy — refuses iPhone Continuity microphone, prefers AirPods then MacBook, prefers HomePod for output.
Copyable transcription panel with text selection and an explicit copy button.
Resizable, position and size persist between launches.
Brand identity — every accent is the deep red of the Polpo, every micro-label is in plain Italian for clarity.
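The smart audio policy listed above can be pictured as a simple preference order over device names. The sketch below is an assumption-heavy illustration: matching by substring of the device name, the function names `pick_input` and `pick_output`, and the fallback to the first remaining device are all hypothetical and not taken from the app's source.

```python
from typing import List, Optional


def pick_input(devices: List[str]) -> Optional[str]:
    """Choose a microphone: never an iPhone, AirPods first, then the MacBook."""
    allowed = [name for name in devices if "iphone" not in name.lower()]
    for preferred in ("airpods", "macbook"):
        for name in allowed:
            if preferred in name.lower():
                return name
    # Assumed fallback: any other non-iPhone device, or nothing at all.
    return allowed[0] if allowed else None


def pick_output(devices: List[str]) -> Optional[str]:
    """Choose a speaker: a HomePod if present, else the first device listed."""
    for name in devices:
        if "homepod" in name.lower():
            return name
    return devices[0] if devices else None
```

For example, given an iPhone microphone, the MacBook microphone and a pair of AirPods, `pick_input` returns the AirPods; given only the iPhone microphone, it returns `None`.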

Pricing

€69 lifetime, one-time payment. No subscription. All future updates included. License key checked once, cached locally. (Anchor price €79 — early adopters and beta testers pay €69.)

Requirements

macOS 15 (Sequoia) or later — uses MenuBarExtra-era APIs, symbolEffect, textSelection, Canvas, NSPanel floating modes.
Apple Silicon. Built with -target arm64-apple-macos15.
Claude Code installed and configured.
ElevenLabs API key in ~/claude_voice/.env (ELEVENLABS_API_KEY=…).
The companion daemons that handle STT (stt_bar.py), autosend (jarvis_autosend.py), and the Stop hook (stop_tts_response.py) — bundled in the full Polpo Voice installer (the standalone JarvisToggle.app in this repo is the cockpit only).
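To check that the ElevenLabs key is where the daemons expect it, a few lines of Python are enough. This is a sketch under assumptions: the helper names `parse_env` and `load_api_key` are hypothetical, and the parser only handles plain `KEY=value` lines, which may be simpler than what the bundled daemons accept.

```python
from pathlib import Path
from typing import Dict

# Path taken from the requirement above.
DEFAULT_ENV = Path.home() / "claude_voice" / ".env"


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")  # drop optional quotes
    return values


def load_api_key(env_path: Path = DEFAULT_ENV) -> str:
    """Return ELEVENLABS_API_KEY or raise if it is missing or empty."""
    env = parse_env(env_path.read_text(encoding="utf-8"))
    key = env.get("ELEVENLABS_API_KEY", "")
    if not key:
        raise RuntimeError(f"ELEVENLABS_API_KEY missing in {env_path}")
    return key
```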

Install

Build from source

git clone https://github.com/mattiacalastri/jarvis-toggle.git
cd jarvis-toggle
./build.sh
cp -r JarvisToggle.app /Applications/
open /Applications/JarvisToggle.app

Pre-built binary (coming soon)

Download the latest signed and notarized Polpo Voice.app from Releases, drag into /Applications, double-click to launch.

After first launch macOS will prompt for:

Microphone access — required for STT.
Accessibility access — required for autosend keystroke listener.
Apple Events — required for Return simulation.

Grant all three. Add the app to System Settings → General → Login Items so it starts automatically with the Mac.

Architecture

Polpo Voice is a 4-layer system communicating via plain files in ~/.local/run/jarvis/:

┌─────────────────────────────────────────────────────────────────┐
│              ~/.local/run/jarvis/  (file contract)              │
│  stt_state · stt_disabled · stt_engine · stt_diag · stt_last    │
│  stt_bar.pid · voice_speaking.lock · jarvis_autosend_state      │
│  jarvis_voice_loop_state · stt_levels.bin · voices.json         │
│  voice_selected                                                 │
└────┬─────────────────┬──────────────────┬──────────────┬────────┘
     │                 │                  │              │
 ┌───▼────┐    ┌───────▼────────┐  ┌──────▼─────┐  ┌────▼─────┐
 │ stt_bar│    │ jarvis_autosend│  │stop_tts_   │  │  Polpo   │
 │  .py   │    │   (pynput)     │  │response.py │  │  Voice   │
 │PRODUCER│    │  AUTOSEND      │  │ DIALOG     │  │   .app   │
 └────────┘    └────────────────┘  └────────────┘  └──────────┘

Producers and consumers are completely decoupled — the file contract is the only protocol. You can swap any producer or consumer for your own implementation.
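Because the contract is just files, a replacement producer or consumer needs very little code. The sketch below shows one way to publish and read a value; the helper names `publish` and `read_value` are hypothetical, and writing through a temporary file followed by `os.replace` is an assumption about good practice (so a reader never sees a half-written file), not a documented requirement of Polpo Voice.

```python
import os
import tempfile
from pathlib import Path

RUN_DIR = Path.home() / ".local" / "run" / "jarvis"


def publish(name: str, value: str, run_dir: Path = RUN_DIR) -> None:
    """Write a contract file atomically: temp file first, then rename over it."""
    run_dir.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=run_dir, prefix=f".{name}.")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(value)
    os.replace(tmp, run_dir / name)  # atomic on the same filesystem


def read_value(name: str, run_dir: Path = RUN_DIR, default: str = "") -> str:
    """Read a contract file, returning `default` if the producer has not written it."""
    try:
        return (run_dir / name).read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return default
```

A custom STT producer could call `publish("stt_state", "recording")`, and any consumer would pick the new state up on its next read.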

File contract reference

| File | Producer | Consumer | Purpose |
| --- | --- | --- | --- |
| `stt_state` | stt_bar.py | Polpo Voice | idle / recording / transcribing / off |
| `stt_disabled` | jarvis_toggle.sh | all | flag presence = STT off |
| `stt_bar.pid` | stt_bar.py | Polpo Voice | process liveness check |
| `stt_engine.txt` | stt_bar.py | Polpo Voice | apple / whisper |
| `stt_diag.txt` | stt_bar.py | Polpo Voice | threshold + cal_mean + cal_max |
| `stt_last` | stt_bar.py | Polpo Voice | last transcription |
| `stt_levels.bin` | stt_bar.py | Polpo Voice | 60 × float32 LE = waveform |
| `voice_speaking.lock` | voice_briefing.py | Polpo Voice | OUT heart active |
| `jarvis_autosend_state` | Polpo Voice | jarvis_autosend.py | LOOP on/off |
| `jarvis_voice_loop_state` | Polpo Voice | stop_tts_response.py | DIALOG on/off |
| `voices.json` | dump_voices.py | Polpo Voice | live ElevenLabs library |
| `voice_selected` | Polpo Voice | voice_briefing.py | active voice key |
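A custom consumer could read three of the entries above as sketched here: the waveform (60 little-endian float32 values), the `stt_disabled` flag, and the PID file. The function names are hypothetical, and two details are assumptions rather than documented behavior: that `stt_bar.pid` holds a decimal process ID, and that a missing or wrongly sized `stt_levels.bin` should be treated as a flat waveform.

```python
import os
import struct
from pathlib import Path
from typing import List

RUN_DIR = Path.home() / ".local" / "run" / "jarvis"
BAR_COUNT = 60  # from the table: 60 x float32, little-endian


def read_levels(run_dir: Path = RUN_DIR) -> List[float]:
    """Decode stt_levels.bin; fall back to silence on a missing or partial file."""
    try:
        raw = (run_dir / "stt_levels.bin").read_bytes()
    except FileNotFoundError:
        return [0.0] * BAR_COUNT
    if len(raw) != BAR_COUNT * 4:
        return [0.0] * BAR_COUNT
    return list(struct.unpack(f"<{BAR_COUNT}f", raw))


def stt_enabled(run_dir: Path = RUN_DIR) -> bool:
    """The mere presence of stt_disabled means STT is off."""
    return not (run_dir / "stt_disabled").exists()


def producer_alive(run_dir: Path = RUN_DIR) -> bool:
    """Liveness check: signal 0 tests for the process without affecting it."""
    try:
        pid = int((run_dir / "stt_bar.pid").read_text().strip())
    except (FileNotFoundError, ValueError):
        return False
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```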

Privacy

No telemetry, no remote logging, no analytics.
The microphone audio is processed locally by Apple Speech Framework. Whisper as fallback runs locally too (MLX).
Only the text of your dictation is sent to Anthropic via your normal Claude Code session.
Only the text of Claude's reply is sent to ElevenLabs for TTS rendering.
The license key check is one HTTPS call per launch. After that, fully offline.

License

MIT — see LICENSE. Use it, fork it, sell forks, attribute the original.

Credits

Built by Mattia Calastri under the Astra Digital banner. Voice rendering by ElevenLabs. STT by Apple Speech Framework / MLX Whisper.

🐙 The Polpo's voice tentacle.
