Polpo Voice is a native macOS application that adds full bidirectional voice interaction to Anthropic's Claude Code. You speak prompts via Apple Speech Framework and hear Claude's replies in ElevenLabs voices, all from a floating menu bar cockpit. €69 lifetime, no subscription, speech recognition runs locally.
Talk to Claude Code hands-free. Dictate, get the answer spoken back, switch voices on the fly, all from a single floating panel that stays above every other window without stealing focus.
Polpo Voice closes the loop between you and Claude Code without your hands ever leaving the coffee:
- You speak. The Mac captures the dictation through your microphone (or your AirPods).
- The transcribed text is auto-pasted into the Claude Code terminal.
- After a short silence, Return is pressed automatically — your prompt is sent.
- Claude replies as usual.
- The reply is intercepted, cleaned of code blocks and markdown, and spoken back to you with a natural ElevenLabs voice (default: George, warm British storyteller).
- You walk around. You answer back. Repeat.
You never touched the keyboard.
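The cleaning step in the loop above (code blocks and markdown removed before the reply is spoken) could look roughly like this. It is a minimal sketch, not the code in `stop_tts_response.py`: the function name `clean_for_speech` and the exact set of rules are assumptions made for illustration.

```python
import re

TICKS = "`" * 3  # a markdown code fence, built here to keep this example readable

FENCE = re.compile(TICKS + r".*?" + TICKS, re.DOTALL)       # fenced code blocks
INLINE_CODE = re.compile(r"`([^`]*)`")                      # `inline code`
LINK = re.compile(r"\[([^\]]+)\]\([^)]*\)")                 # [label](url)
HEADING = re.compile(r"^[ ]{0,3}#{1,6}\s*", re.MULTILINE)   # leading # marks
BULLET = re.compile(r"^[ \t]*[-*+][ \t]+", re.MULTILINE)    # list markers
EMPHASIS = re.compile(r"(\*\*|\*)(.+?)\1")                  # **bold** and *italic*


def clean_for_speech(reply: str) -> str:
    """Return a plain-text version of a markdown reply, suitable for TTS."""
    text = FENCE.sub(" ", reply)          # code is dropped, not read aloud
    text = INLINE_CODE.sub(r"\1", text)   # keep the identifier, drop the backticks
    text = LINK.sub(r"\1", text)          # keep the label, drop the URL
    text = HEADING.sub("", text)
    text = BULLET.sub("", text)
    text = EMPHASIS.sub(r"\2", text)
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    sample = "## Done\nI fixed **the bug** in `parse()`:\n" + TICKS + "python\nx = 1\n" + TICKS
    print(clean_for_speech(sample))
```

Underscore emphasis is deliberately left alone so that identifiers such as `stt_bar` are not mangled.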
Coding by voice is finally good enough — Apple Speech is sub-200ms, ElevenLabs voices are uncanny, AirPods Pro have great mics. The missing piece was a glue layer that made all of this work for a single specific workflow: talking with an AI coding agent in a terminal.
Polpo Voice is that glue. No subscription, no cloud sync, no telemetry. Just one binary, one floating panel, one payment, forever.
Polpo Voice isn't a dictation app. Dictation apps go one way: voice in, text out. Polpo Voice closes the loop: voice in, and Claude's answer spoken back to you. You can stand up, walk to the window, and have a full conversation with your coding agent without looking at the screen.
Wispr Flow, Monologue, Superwhisper and friends are excellent at voice → text everywhere. None of them read the reply back. None of them are built for the one workflow Polpo Voice obsesses over: talking with Claude Code in a terminal.
| | 🐙 Polpo Voice | Cloud dictation apps (Wispr Flow, Monologue …) |
|---|---|---|
| Built for | Bidirectional voice chat with Claude Code | General voice → text in any app |
| Speaks the reply back (TTS) | ✅ — the whole point | ❌ |
| Auto-send loop | ✅ silence → Return, hands-free | ❌ |
| Audio leaves your Mac | ❌ STT is local (Apple Speech + local Whisper) | ☁️ typically cloud-processed |
| Price | €69 once, forever | ~$15/mo / ~$144 a year |
| Telemetry | None | Varies |
If you want to dictate emails everywhere, buy a dictation app. If you want to talk with Claude Code — speak, listen, walk around, repeat — this is the only thing that does it.
- Floating panel that stays above every other window, on every Space, never steals focus from your terminal.
- 4 live status hearts — TTS speaking, STT listening, autosend on/off, dialog loop on/off (each interactive).
- 60-bar live waveform at 30 FPS with threshold and ambient floor annotations — see your voice in real time.
- ElevenLabs voice picker that fetches your account library on demand, marks router-mapped voices with a star, plays a sample inline.
- Audio device pills (input / output) that show the active microphone and speaker by name and let you switch via dropdown — bypasses System Settings entirely.
- Smart audio policy — refuses iPhone Continuity microphone, prefers AirPods then MacBook, prefers HomePod for output.
- Copyable transcription panel with text selection and an explicit copy button.
- Resizable; position and size persist between launches.
- Brand identity — every accent is the deep red of the Polpo, every micro-label is in plain Italian for clarity.
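The smart audio policy in the list above can be pictured as an ordered preference list plus a refusal list. The sketch below is only an illustration of that idea: matching devices by lowercase name fragments, the constant names, and the fallback to the first remaining device are all assumptions, not the app's actual logic.

```python
from typing import Optional, Sequence

# Assumed name fragments; real device names vary per machine.
INPUT_PREFERENCE = ("airpods", "macbook")  # AirPods first, then the MacBook mic
OUTPUT_PREFERENCE = ("homepod",)           # HomePod preferred for output
REFUSED_INPUTS = ("iphone",)               # iPhone Continuity microphone is refused


def pick_device(
    names: Sequence[str],
    preferred: Sequence[str],
    refused: Sequence[str] = (),
) -> Optional[str]:
    """Pick a device name: drop refused ones, then follow the preference order."""
    allowed = [n for n in names if not any(r in n.lower() for r in refused)]
    for fragment in preferred:
        for name in allowed:
            if fragment in name.lower():
                return name
    # Assumption: fall back to the first remaining device, or None if nothing is left.
    return allowed[0] if allowed else None


if __name__ == "__main__":
    mics = ["iPhone Microphone", "MacBook Pro Microphone", "AirPods Pro"]
    print(pick_device(mics, INPUT_PREFERENCE, REFUSED_INPUTS))
```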
€69 lifetime, one-time payment. No subscription. All future updates included. License key checked once, cached locally. (Anchor price €79 — early adopters and beta testers pay €69.)
- macOS 15 (Sequoia) or later: uses `MenuBarExtra`-era APIs, `symbolEffect`, `textSelection`, `Canvas`, `NSPanel` floating modes.
- Apple Silicon. Built with `-target arm64-apple-macos15`.
- Claude Code installed and configured.
- ElevenLabs API key in `~/claude_voice/.env` (`ELEVENLABS_API_KEY=…`).
- The companion daemons that handle STT (`stt_bar.py`), autosend (`jarvis_autosend.py`), and the Stop hook (`stop_tts_response.py`), bundled in the full Polpo Voice installer (the standalone `JarvisToggle.app` in this repo is the cockpit only).
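To check that the API key requirement is met, a script can read the `.env` file directly. This is a hedged sketch: the helper names `parse_env` and `read_elevenlabs_key` are hypothetical, and how the bundled daemons actually load the key is not documented here.

```python
from pathlib import Path
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Surrounding quotes are optional in many .env files; strip them if present.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def read_elevenlabs_key(path: str = "~/claude_voice/.env") -> str:
    """Return ELEVENLABS_API_KEY from the .env file, or raise if it is missing."""
    env = parse_env(Path(path).expanduser().read_text(encoding="utf-8"))
    key = env.get("ELEVENLABS_API_KEY", "")
    if not key:
        raise RuntimeError(f"ELEVENLABS_API_KEY not found in {path}")
    return key
```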
```shell
git clone https://github.com/mattiacalastri/jarvis-toggle.git
cd jarvis-toggle
./build.sh
cp -r JarvisToggle.app /Applications/
open /Applications/JarvisToggle.app
```

Alternatively, download the latest signed and notarized Polpo Voice.app from Releases, drag it into /Applications, and double-click to launch.
After first launch macOS will prompt for:
- Microphone access — required for STT.
- Accessibility access — required for autosend keystroke listener.
- Apple Events — required for Return simulation.
Grant all three. Add the app to System Settings → General → Login Items so it starts automatically with the Mac.
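The Accessibility and Apple Events permissions exist to support autosend: Return is pressed once dictation has gone quiet for a short time. The timing logic behind that can be sketched as a small debounce. Everything here is an assumption for illustration (the class name `SilenceTrigger`, the 1.5 second default, the polling style); it is not taken from `jarvis_autosend.py`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SilenceTrigger:
    """Fire once when no activity has been seen for `silence_seconds`."""

    silence_seconds: float = 1.5            # assumed value, tune to taste
    last_activity: Optional[float] = None   # None means nothing is pending

    def on_activity(self, now: float) -> None:
        """Record that text arrived (for example, a pasted transcription)."""
        self.last_activity = now

    def should_send(self, now: float) -> bool:
        """Return True exactly once per burst, after the silence window passes."""
        if self.last_activity is None:
            return False
        if now - self.last_activity < self.silence_seconds:
            return False
        self.last_activity = None  # reset so Return is not sent twice
        return True
```

In a daemon, `on_activity` would be called from the keystroke listener and `should_send` from a timer using `time.monotonic()`; the Return keystroke itself is left out because it depends on the permissions above.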
Polpo Voice is a 4-layer system communicating via plain files in ~/.local/run/jarvis/:
```
┌─────────────────────────────────────────────────────────────────┐
│  ~/.local/run/jarvis/ (file contract)                           │
│  stt_state · stt_disabled · stt_engine · stt_diag · stt_last    │
│  stt_bar.pid · voice_speaking.lock · jarvis_autosend_state      │
│  jarvis_voice_loop_state · stt_levels.bin · voices.json         │
│  voice_selected                                                 │
└────┬─────────────────┬──────────────────┬──────────────┬────────┘
     │                 │                  │              │
 ┌───▼────┐    ┌───────▼────────┐  ┌──────▼─────┐   ┌────▼─────┐
 │ stt_bar│    │ jarvis_autosend│  │stop_tts_   │   │  Polpo   │
 │  .py   │    │    (pynput)    │  │response.py │   │  Voice   │
 │PRODUCER│    │    AUTOSEND    │  │   DIALOG   │   │   .app   │
 └────────┘    └────────────────┘  └────────────┘   └──────────┘
```
Producers and consumers are completely decoupled — the file contract is the only protocol. You can swap any producer or consumer for your own implementation.
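If you do swap in your own producer or consumer, the main thing to get right is that a reader never sees a half-written file. One common way to do that is to write to a temporary file and rename it into place. The sketch below assumes plain UTF-8 text files in the run directory; the helper names are hypothetical, and whether the bundled daemons write atomically is not stated here.

```python
import os
import tempfile
from pathlib import Path

RUN_DIR = Path.home() / ".local" / "run" / "jarvis"


def write_state(name: str, value: str, run_dir=RUN_DIR) -> None:
    """Write a state file atomically: temp file first, then rename over the target."""
    run_dir = Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=run_dir, prefix=f".{name}.")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(value)
    os.replace(tmp_path, run_dir / name)  # atomic within one filesystem


def read_state(name: str, default: str = "", run_dir=RUN_DIR) -> str:
    """Read a state file, returning `default` when the producer has not written it."""
    try:
        return (Path(run_dir) / name).read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return default


def flag_is_set(name: str, run_dir=RUN_DIR) -> bool:
    """Presence-only flags (such as stt_disabled) carry no content."""
    return (Path(run_dir) / name).exists()
```

A custom producer would call `write_state("stt_state", "recording")`, and any consumer can poll `read_state("stt_state", default="off")` without knowing who wrote it.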
| File | Producer | Consumer | Purpose |
|---|---|---|---|
| `stt_state` | `stt_bar.py` | Polpo Voice | idle / recording / transcribing / off |
| `stt_disabled` | `jarvis_toggle.sh` | all | flag presence = STT off |
| `stt_bar.pid` | `stt_bar.py` | Polpo Voice | process liveness check |
| `stt_engine.txt` | `stt_bar.py` | Polpo Voice | apple / whisper |
| `stt_diag.txt` | `stt_bar.py` | Polpo Voice | threshold + cal_mean + cal_max |
| `stt_last` | `stt_bar.py` | Polpo Voice | last transcription |
| `stt_levels.bin` | `stt_bar.py` | Polpo Voice | 60 × float32 LE = waveform |
| `voice_speaking.lock` | `voice_briefing.py` | Polpo Voice | OUT heart active |
| `jarvis_autosend_state` | Polpo Voice | `jarvis_autosend.py` | LOOP on/off |
| `jarvis_voice_loop_state` | Polpo Voice | `stop_tts_response.py` | DIALOG on/off |
| `voices.json` | `dump_voices.py` | Polpo Voice | live ElevenLabs library |
| `voice_selected` | Polpo Voice | `voice_briefing.py` | active voice key |
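Two entries in the contract are easy to consume from a script: the waveform file and the PID file. The sketch below assumes `stt_levels.bin` holds exactly 60 little-endian float32 values with no header (240 bytes), as the table suggests, and that `stt_bar.pid` contains a decimal process ID. The function names are hypothetical.

```python
import os
import struct
from pathlib import Path
from typing import List, Optional

BAR_COUNT = 60
LEVELS_FORMAT = f"<{BAR_COUNT}f"              # "<" = little-endian, "f" = float32
LEVELS_SIZE = struct.calcsize(LEVELS_FORMAT)  # 60 * 4 = 240 bytes


def decode_levels(payload: bytes) -> Optional[List[float]]:
    """Decode one waveform frame, or return None if the size is unexpected."""
    if len(payload) != LEVELS_SIZE:
        return None  # file may be mid-write or use another layout
    return list(struct.unpack(LEVELS_FORMAT, payload))


def read_levels(path) -> Optional[List[float]]:
    """Read and decode the waveform file; None when it does not exist yet."""
    try:
        return decode_levels(Path(path).read_bytes())
    except FileNotFoundError:
        return None


def pid_is_alive(pid: int) -> bool:
    """Liveness check: signal 0 tests for existence without sending a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

A consumer drawing at 30 FPS would call `read_levels` on each frame and keep the previous frame whenever it returns `None`.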
- No telemetry, no remote logging, no analytics.
- The microphone audio is processed locally by Apple Speech Framework. Whisper as fallback runs locally too (MLX).
- Only the text of your dictation is sent to Anthropic via your normal Claude Code session.
- Only the text of Claude's reply is sent to ElevenLabs for TTS rendering.
- The license key check is one HTTPS call per launch. After that, fully offline.
MIT — see LICENSE. Use it, fork it, sell forks, attribute the original.
Built by Mattia Calastri under the Astra Digital banner. Voice rendering by ElevenLabs. STT by Apple Speech Framework / MLX Whisper.
🐙 Tentacolo vocale del Polpo (the Polpo's voice tentacle).

