mattiacalastri/jarvis-toggle

🐙 Polpo Voice

Polpo Voice is a native macOS application that adds full bidirectional voice interaction to Anthropic's Claude Code. You speak prompts, transcribed by Apple's Speech framework, and hear Claude's replies in ElevenLabs voices, all from a floating menu bar cockpit. €69 lifetime, no subscription; speech recognition runs locally.

Talk to Claude Code hands-free. Dictate, get the answer spoken back, switch voices on the fly, all from a single floating panel that stays above every other window without stealing focus.


Live UI: STT listening, LOOP auto-send active, voice George selected, real-time waveform.


What it does

Polpo Voice closes the loop between you and Claude Code without your hands ever leaving your coffee:

  1. You speak. The Mac captures the dictation through your microphone (or your AirPods).
  2. The transcribed text is auto-pasted into the Claude Code terminal.
  3. After a short silence, Return is pressed automatically — your prompt is sent.
  4. Claude replies as usual.
  5. The reply is intercepted, cleaned of code blocks and markdown, and spoken back to you with a natural ElevenLabs voice (default: George, warm British storyteller).
  6. You walk around. You answer back. Repeat.

You never touch the keyboard.
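Step 5 is where most of the text processing happens. The repo does not show how stop_tts_response.py cleans a reply, so what follows is only a minimal sketch, assuming a regex pass in Python; `clean_for_tts` is a hypothetical name. It drops fenced code blocks, unwraps inline code and links, and strips headings, list markers and asterisk emphasis before the text would be handed to ElevenLabs.

```python
import re

# Hypothetical cleanup pass; the real stop_tts_response.py may work differently.
FENCED_CODE = re.compile(r"`{3}.*?`{3}", re.DOTALL)   # whole fenced blocks are not read aloud
INLINE_CODE = re.compile(r"`([^`]*)`")                # keep the text, drop the backticks
LINK = re.compile(r"\[([^\]]+)\]\([^)]*\)")           # [label](url) -> label
HEADING = re.compile(r"^[ \t]*#{1,6}[ \t]*", re.MULTILINE)
LIST_MARKER = re.compile(r"^[ \t]*(?:[-*+]|\d+\.)[ \t]+", re.MULTILINE)
EMPHASIS = re.compile(r"(\*{1,2})(.+?)\1")            # *italic* / **bold** -> plain text


def clean_for_tts(reply: str) -> str:
    """Turn a Markdown reply into plain prose suitable for speech synthesis."""
    text = FENCED_CODE.sub(" ", reply)
    text = INLINE_CODE.sub(r"\1", text)
    text = LINK.sub(r"\1", text)
    text = HEADING.sub("", text)
    text = LIST_MARKER.sub("", text)
    text = EMPHASIS.sub(r"\2", text)
    # Collapse newlines and runs of spaces so the voice does not pause oddly.
    return re.sub(r"\s+", " ", text).strip()


fence = "`" * 3
reply = (
    "I patched **two** bugs in `stt_bar.py`:\n"
    + fence + "python\nprint('debug')\n" + fence
    + "\n- See [the docs](https://example.com)."
)
print(clean_for_tts(reply))  # I patched two bugs in stt_bar.py: See the docs.
```

A real implementation would likely also skip tables and long URLs; regexes are used here only because they keep the sketch dependency-free.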

Why it exists

Coding by voice is finally good enough: Apple Speech responds in under 200 ms, ElevenLabs voices are uncanny, and AirPods Pro have great mics. The missing piece was a glue layer that makes all of this work for one specific workflow: talking with an AI coding agent in a terminal.

Polpo Voice is that glue. No subscription, no cloud sync, no telemetry. Just one binary, one floating panel, one payment, forever.

Why not just a dictation app?

Because Polpo Voice isn't a dictation app. Dictation apps go one way — voice in, text out. Polpo Voice closes the loop: voice in, and Claude's answer spoken back to you. You can stand up, walk to the window, and have a full conversation with your coding agent without looking at the screen.

Wispr Flow, Monologue, Superwhisper and friends are excellent at voice → text everywhere. None of them read the reply back. None of them are built for the one workflow Polpo Voice obsesses over: talking with Claude Code in a terminal.

|  | 🐙 Polpo Voice | Cloud dictation apps (Wispr Flow, Monologue …) |
|---|---|---|
| Built for | Bidirectional voice chat with Claude Code | General voice → text in any app |
| Speaks the reply back (TTS) | ✅ the whole point | ❌ |
| Auto-send loop | ✅ silence → Return, hands-free | |
| Audio leaves your Mac | ❌ STT is local (Apple Speech + local Whisper) | ☁️ typically cloud-processed |
| Price | €69 once, forever | ~$15/mo or ~$144/year |
| Telemetry | None | Varies |

If you want to dictate emails everywhere, buy a dictation app. If you want to talk with Claude Code — speak, listen, walk around, repeat — this is the only thing that does it.

Highlights

  • Floating panel that stays above every other window, on every Space, never steals focus from your terminal.
  • 4 live status hearts — TTS speaking, STT listening, autosend on/off, dialog loop on/off (each interactive).
  • 60-bar live waveform at 30 FPS with threshold and ambient floor annotations — see your voice in real time.
  • ElevenLabs voice picker that fetches your account library on demand, marks router-mapped voices with a star, plays a sample inline.
  • Audio device pills (input / output) that show the active microphone and speaker by name and let you switch via dropdown — bypasses System Settings entirely.
  • Smart audio policy: never uses the iPhone Continuity microphone, prefers AirPods and then the MacBook microphone for input, and prefers a HomePod for output.
  • Copyable transcription panel with text selection and an explicit copy button.
  • Resizable, with position and size persisted between launches.
  • Brand identity — every accent is the deep red of the Polpo, every micro-label is in plain Italian for clarity.
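The smart audio policy amounts to a ranked preference list plus a blocklist. The README does not say how the app identifies devices (by name, transport type or CoreAudio UID), so this Python sketch simply matches lowercase name fragments; `pick_device` and the preference tuples are hypothetical, and falling back to the first allowed device is an assumption.

```python
from typing import Iterable, Optional, Sequence

# Hypothetical name fragments; the app may match devices by other attributes.
INPUT_PREFERENCE = ("airpods", "macbook")
INPUT_BLOCKLIST = ("iphone",)        # never use the Continuity microphone
OUTPUT_PREFERENCE = ("homepod",)


def pick_device(names: Iterable[str],
                preference: Sequence[str],
                blocklist: Sequence[str] = ()) -> Optional[str]:
    """Return the best-ranked allowed device name, or None if nothing is allowed."""
    allowed = [n for n in names if not any(b in n.lower() for b in blocklist)]
    for fragment in preference:      # earlier fragments win
        for name in allowed:
            if fragment in name.lower():
                return name
    # No preferred device present: fall back to the first allowed one (assumption).
    return allowed[0] if allowed else None


inputs = ["Mattia iPhone Microphone", "MacBook Pro Microphone", "AirPods Pro"]
print(pick_device(inputs, INPUT_PREFERENCE, INPUT_BLOCKLIST))  # AirPods Pro
```

Filtering the blocklist before ranking guarantees the iPhone microphone is never chosen, even when it is the only input present.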

Pricing

€69 lifetime, one-time payment. No subscription. All future updates included. License key checked once, cached locally. (Anchor price €79 — early adopters and beta testers pay €69.)

Requirements

  • macOS 15 (Sequoia) or later — uses MenuBarExtra-era APIs, symbolEffect, textSelection, Canvas, NSPanel floating modes.
  • Apple Silicon. Built with -target arm64-apple-macos15.
  • Claude Code installed and configured.
  • ElevenLabs API key in ~/claude_voice/.env (ELEVENLABS_API_KEY=…).
  • The companion daemons that handle STT (stt_bar.py), autosend (jarvis_autosend.py), and the Stop hook (stop_tts_response.py) — bundled in the full Polpo Voice installer (the standalone JarvisToggle.app in this repo is the cockpit only).
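The daemons read the ElevenLabs key from ~/claude_voice/.env. How they actually load it is not documented; a minimal Python sketch, assuming plain KEY=VALUE lines (optionally quoted, optionally prefixed with `export`), could look like this. `parse_env` and `load_elevenlabs_key` are hypothetical helpers, not part of the repo.

```python
from pathlib import Path
from typing import Dict, Optional

DEFAULT_ENV = Path.home() / "claude_voice" / ".env"


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines, # comments and malformed lines."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key] = value
    return values


def load_elevenlabs_key(path: Path = DEFAULT_ENV) -> Optional[str]:
    """Return ELEVENLABS_API_KEY from the .env file, or None if missing or empty."""
    if not path.is_file():
        return None
    key = parse_env(path.read_text(encoding="utf-8")).get("ELEVENLABS_API_KEY", "")
    return key or None
```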

Install

Build from source

git clone https://github.com/mattiacalastri/jarvis-toggle.git
cd jarvis-toggle
./build.sh
cp -r JarvisToggle.app /Applications/
open /Applications/JarvisToggle.app

Pre-built binary (coming soon)

Download the latest signed and notarized Polpo Voice.app from Releases, drag it into /Applications, and double-click it to launch.

After first launch macOS will prompt for:

  • Microphone access: required for STT.
  • Accessibility access: required for the autosend keystroke listener.
  • Apple Events: required to simulate the Return key.

Grant all three. Add the app to System Settings → General → Login Items so it starts automatically with the Mac.

Architecture

Polpo Voice is a 4-layer system communicating via plain files in ~/.local/run/jarvis/:

┌─────────────────────────────────────────────────────────────────┐
│              ~/.local/run/jarvis/  (file contract)              │
│  stt_state · stt_disabled · stt_engine · stt_diag · stt_last    │
│  stt_bar.pid · voice_speaking.lock · jarvis_autosend_state      │
│  jarvis_voice_loop_state · stt_levels.bin · voices.json         │
│  voice_selected                                                 │
└────┬─────────────────┬──────────────────┬──────────────┬────────┘
     │                 │                  │              │
 ┌───▼────┐    ┌───────▼────────┐  ┌──────▼─────┐  ┌────▼─────┐
 │ stt_bar│    │ jarvis_autosend│  │stop_tts_   │  │  Polpo   │
 │  .py   │    │   (pynput)     │  │response.py │  │  Voice   │
 │PRODUCER│    │  AUTOSEND      │  │ DIALOG     │  │   .app   │
 └────────┘    └────────────────┘  └────────────┘  └──────────┘

Producers and consumers are completely decoupled — the file contract is the only protocol. You can swap any producer or consumer for your own implementation.
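Because the file contract is the only protocol, a replacement consumer needs nothing but file reads. The Python sketch below assumes the semantics in the file contract reference (the presence of stt_disabled means off, stt_bar.pid holds a decimal PID, stt_state holds a single word), collapses them into one STT status, and writes a toggle atomically. The helper names are hypothetical.

```python
import os
from pathlib import Path

RUN_DIR = Path.home() / ".local" / "run" / "jarvis"


def read_word(run_dir: Path, name: str, default: str = "") -> str:
    """Read a small text file from the contract directory, or return default."""
    try:
        return (run_dir / name).read_text(encoding="utf-8").strip()
    except (FileNotFoundError, UnicodeDecodeError):
        return default


def process_alive(pid_text: str) -> bool:
    """Liveness check: signal 0 tests that the PID exists without affecting it."""
    if not pid_text.isdigit() or int(pid_text) <= 0:
        return False
    try:
        os.kill(int(pid_text), 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stt_status(run_dir: Path = RUN_DIR) -> str:
    """Collapse stt_disabled, stt_bar.pid and stt_state into one status word."""
    if (run_dir / "stt_disabled").exists():      # flag presence = STT off
        return "off"
    if not process_alive(read_word(run_dir, "stt_bar.pid")):
        return "off"                              # stale state from a dead producer
    return read_word(run_dir, "stt_state", default="off")


def write_state(run_dir: Path, name: str, value: str) -> None:
    """Write a state file atomically so readers never see a partial write."""
    run_dir.mkdir(parents=True, exist_ok=True)
    tmp = run_dir / (name + ".tmp")
    tmp.write_text(value + "\n", encoding="utf-8")
    os.replace(tmp, run_dir / name)               # atomic rename within one filesystem
```

For example, `write_state(RUN_DIR, "jarvis_autosend_state", "on")` would flip the LOOP toggle, assuming the autosend daemon reads the literal words on/off (the exact values are not documented).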

File contract reference

| File | Producer | Consumer | Purpose |
|---|---|---|---|
| stt_state | stt_bar.py | Polpo Voice | idle / recording / transcribing / off |
| stt_disabled | jarvis_toggle.sh | all | flag: presence = STT off |
| stt_bar.pid | stt_bar.py | Polpo Voice | process liveness check |
| stt_engine.txt | stt_bar.py | Polpo Voice | apple / whisper |
| stt_diag.txt | stt_bar.py | Polpo Voice | threshold + cal_mean + cal_max |
| stt_last | stt_bar.py | Polpo Voice | last transcription |
| stt_levels.bin | stt_bar.py | Polpo Voice | waveform: 60 × float32, little-endian |
| voice_speaking.lock | voice_briefing.py | Polpo Voice | OUT heart active |
| jarvis_autosend_state | Polpo Voice | jarvis_autosend.py | LOOP on/off |
| jarvis_voice_loop_state | Polpo Voice | stop_tts_response.py | DIALOG on/off |
| voices.json | dump_voices.py | Polpo Voice | live ElevenLabs library |
| voice_selected | Polpo Voice | voice_briefing.py | active voice key |
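stt_levels.bin is the one binary entry in the contract: 60 little-endian float32 values, so exactly 240 bytes. Here is a Python sketch of both sides using the standard struct module. The function names, and the choice to treat a short or missing file as silence, are assumptions; the README does not say which end of the array holds the newest bar or what range the values take.

```python
import struct
from pathlib import Path
from typing import List, Sequence

LEVEL_COUNT = 60
LEVELS_FORMAT = f"<{LEVEL_COUNT}f"            # "<" = little-endian, "f" = float32
LEVELS_SIZE = struct.calcsize(LEVELS_FORMAT)  # 60 * 4 = 240 bytes
SILENCE = [0.0] * LEVEL_COUNT


def encode_levels(levels: Sequence[float]) -> bytes:
    """Producer side: pack exactly 60 levels into the wire format."""
    if len(levels) != LEVEL_COUNT:
        raise ValueError(f"expected {LEVEL_COUNT} levels, got {len(levels)}")
    return struct.pack(LEVELS_FORMAT, *levels)


def decode_levels(blob: bytes) -> List[float]:
    """Consumer side: unpack 240 bytes; any other size is treated as silence."""
    if len(blob) != LEVELS_SIZE:              # e.g. caught mid-write by a non-atomic producer
        return list(SILENCE)
    return list(struct.unpack(LEVELS_FORMAT, blob))


def read_levels(path: Path) -> List[float]:
    """Read the waveform file, returning silence if it does not exist yet."""
    try:
        return decode_levels(path.read_bytes())
    except FileNotFoundError:
        return list(SILENCE)
```

At the 30 FPS refresh rate mentioned under Highlights, a consumer would call `read_levels` roughly every 33 ms.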

Privacy

  • No telemetry, no remote logging, no analytics.
  • Microphone audio is processed locally by Apple's Speech framework. The Whisper fallback also runs locally (via MLX).
  • Only the text of your dictation is sent to Anthropic via your normal Claude Code session.
  • Only the text of Claude's reply is sent to ElevenLabs for TTS rendering.
  • The license key check is one HTTPS call per launch. After that, fully offline.

License

MIT — see LICENSE. Use it, fork it, sell forks, attribute the original.

Credits

Built by Mattia Calastri under the Astra Digital banner. Voice rendering by ElevenLabs. STT by Apple Speech Framework / MLX Whisper.

🐙 The Polpo's voice tentacle.
