ESP32 Digital Drum Kit — Claude Code Pair Programming Guide

TL;DR — What We're Building

A progressive drum kit project. We start with zero extra hardware (just ESP32 + USB cable + browser) and add capability phase by phase until we have a fully wireless, standalone physical instrument.

Current phase: Phase 5 (OLED + Kit Switching), next up
Last completed: Phase 4b (I2S Amp + Speaker Audio)


Phase Overview

| Phase | Name | Branch | Hardware | Status |
| --- | --- | --- | --- | --- |
| 0 | UART + Browser MVP | phase-0-mvp | ESP32 + USB only | Complete |
| 1 | Physical Buttons → UART → Browser | phase-1-buttons | + 7 buttons + breadboard | Complete |
| 2 | WiFi AP + WebSocket → Phone Audio | phase-2-wifi-ap | No new hardware | Complete |
| 3 | FreeRTOS Dual-Core Task Split | phase-3-polyphony | Same as Phase 2 | Complete |
| 4a | SD Card WAV Loading | phase-4a-sd-card | + Adafruit SD card module + microSD | Complete |
| 4b | I2S Amp + Speaker Audio | phase-4b-i2s-audio | + MAX98357A amp + speaker | Complete |
| 5 | OLED + Kit Switching | phase-5-display | + OLED | Not started |
| 6 | Enclosure + Final Build | phase-6-enclosure | Full BOM | Not started |

Phase 0 — UART + Web Browser MVP ✅ COMPLETE

Architecture

[ESP32 firmware]
   Serial.println("KICK")   ← sends command string over UART
        |
   USB cable
        |
[Chrome Web App]
   Web Serial API            ← reads the serial port
        |
   Web Audio API             ← plays drum sound in browser

Command Protocol (all phases use this)

| Command | Drum sound |
| --- | --- |
| KICK | Kick drum |
| SNARE | Snare drum |
| HIHAT_CLOSED | Closed hi-hat |
| HIHAT_OPEN | Open hi-hat |
| TOM_LOW | Low tom |
| TOM_MID | Mid tom |
| CRASH | Crash cymbal |
| RIDE | Ride cymbal |
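
Because every phase shares these strings, it can help to keep them in one table on the firmware side. The following is a minimal sketch, not the project's actual code: the `Drum` enum, `COMMAND_NAMES`, `command_name()` and `parse_command()` are hypothetical names introduced here for illustration.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical enum: one entry per command string in the table above.
enum class Drum { KICK, SNARE, HIHAT_CLOSED, HIHAT_OPEN, TOM_LOW, TOM_MID, CRASH, RIDE };

constexpr std::size_t DRUM_COUNT = 8;

// Wire strings, indexed by the enum's underlying value.
const char* const COMMAND_NAMES[DRUM_COUNT] = {
    "KICK", "SNARE", "HIHAT_CLOSED", "HIHAT_OPEN",
    "TOM_LOW", "TOM_MID", "CRASH", "RIDE"};

// Returns the string that would be sent for a drum.
const char* command_name(Drum drum) {
    return COMMAND_NAMES[static_cast<std::size_t>(drum)];
}

// Parses a received command into `out`; returns false for unknown strings.
bool parse_command(const char* text, Drum& out) {
    for (std::size_t i = 0; i < DRUM_COUNT; ++i) {
        if (std::strcmp(COMMAND_NAMES[i], text) == 0) {
            out = static_cast<Drum>(i);
            return true;
        }
    }
    return false;
}
```

With this shape, `Serial.println(command_name(Drum::KICK))` would emit the same `KICK` line the protocol table describes, and a typo in a command string shows up in one place only.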

Phase 1 — Physical Buttons → UART → Browser ✅ COMPLETE

What was built

  • 7 tactile buttons wired to GPIO 4,5,12,13,14,15,18 via breadboard
  • ISR per button (IRAM_ATTR, FALLING edge)
  • 10ms software debounce via millis() timestamp
  • Serial.println(command) sends to Chrome over USB
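
The timestamp debounce can be sketched independently of the hardware. This is an illustrative version only: `ButtonState` and `accept_edge()` are hypothetical names, and the current time is passed in as a parameter (on the ESP32 it would come from `millis()`) so the logic can be checked on a desktop compiler.

```cpp
#include <cstdint>

constexpr std::uint32_t DEBOUNCE_MS = 10;  // Phase 1 debounce window

// Per-button state; on the ESP32 this would be volatile and updated from the ISR.
struct ButtonState {
    std::uint32_t last_accept_ms = 0;
    bool has_fired = false;
};

// Returns true if a falling edge seen at `now_ms` should count as a hit.
// Unsigned subtraction keeps the comparison correct across timer rollover.
bool accept_edge(ButtonState& button, std::uint32_t now_ms) {
    if (button.has_fired && (now_ms - button.last_accept_ms) < DEBOUNCE_MS) {
        return false;  // contact bounce inside the window, ignore it
    }
    button.last_accept_ms = now_ms;
    button.has_fired = true;
    return true;
}
```

The first edge is always accepted, and any further edge within 10 ms of an accepted one is dropped. Since the check is only a subtraction and a compare, it is cheap enough to run inside an `IRAM_ATTR` ISR.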

GPIO Pin Map (Phase 1+)

| GPIO | Drum |
| --- | --- |
| 4 | KICK |
| 5 | SNARE |
| 12 | HIHAT_CLOSED |
| 13 | HIHAT_OPEN |
| 14 | TOM_LOW |
| 15 | TOM_MID |
| 18 | CRASH |

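Avoid GPIO 6–11 (wired to the internal flash). GPIO 34–39 are input-only and have no internal pull-ups, so they cannot be used for INPUT_PULLUP buttons. SNARE and CRASH move to GPIO 33 and 32 in Phase 4a to free the SPI pins.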


Phase 2 — WiFi AP + WebSocket → Phone Audio ✅ COMPLETE

Goal

Completely wireless. No USB cable. No laptop. Press a button → iPhone plays drum sound.

Architecture

ESP32 (WiFi Access Point mode)
  ├── SSID: "DrumKit-ESP32"  Password: "drumkit123"
  ├── IP:   192.168.4.1
  ├── HTTP server → serves single bundled HTML file
  └── WebSocket server (port 81) → pushes drum commands to phone
           |
    [iPhone connects to "DrumKit-ESP32" WiFi]
           |
    Safari opens http://192.168.4.1
           |
    Web app loads (HTML + JS + CSS + base64 WAV samples — all in one file)
           |
    WebSocket connects back to ESP32 on ws://192.168.4.1:81
           |
    Button press → ESP32 → WebSocket message → Safari → Web Audio API → phone speaker

Why WiFi AP (not WiFi client)

  • No router needed — ESP32 is its own hotspot
  • Works anywhere — no home network dependency
  • Direct connection = lower latency (~3–8ms vs ~10–20ms via router)
  • iPhone connects like any WiFi network

Key Technical Decisions

| Decision | Choice | Reason |
| --- | --- | --- |
| Transport | WebSocket | Real-time push, persistent connection, ~1ms delivery |
| Web app delivery | Single bundled HTML served by ESP32 | No laptop, no external server |
| WAV samples | Base64-encoded inside HTML | Avoids SPIFFS file serving complexity |
| AP credentials | SSID: DrumKit-ESP32, Pass: drumkit123 | Fixed for easy connection |
| WebSocket port | 81 | Avoids conflict with HTTP on port 80 |
| Phone browser | iPhone Safari | Web Audio API + WebSocket both supported |

Libraries Required

| Library | Purpose | How to add |
| --- | --- | --- |
| WiFi.h | AP mode | Built-in ESP32 Arduino core |
| WebServer.h | HTTP server, serves HTML | Built-in ESP32 Arduino core |
| WebSocketsServer | WebSocket push to phone | PlatformIO: links2004/WebSockets |

platformio.ini additions for Phase 2

lib_deps =
  links2004/WebSockets @ ^2.4.1

Phase 2 File Layout

firmware/phase2/
├── platformio.ini
└── src/
    └── main.cpp          ← WiFi AP + HTTP + WebSocket + button ISRs

web_app/phase2/
└── index.html            ← Single self-contained file
                             (HTML + CSS + JS + base64 WAV samples bundled)

Phase 2 Firmware Responsibilities

  1. Boot → start WiFi AP "DrumKit-ESP32"
  2. Start HTTP server on port 80 → serve index.html on GET /
  3. Start WebSocket server on port 81
  4. Init 7 button GPIO pins (INPUT_PULLUP, ISR FALLING edge, 10ms debounce)
  5. On button press → webSocket.broadcastTXT("KICK") (same command strings as before)
  6. Main loop: webSocket.loop() + server.handleClient() + check trigger flags

Phase 2 Web App Responsibilities

  1. On load → connect WebSocket to ws://192.168.4.1:81
  2. On WebSocket message → parse command → play corresponding WAV sample
  3. WAV samples bundled as base64 strings in JS — decoded to AudioBuffer on load
  4. "Tap to Start" screen on load (iOS Safari requires user gesture before audio)
  5. Visual pad grid lights up on hit (same as Phase 0/1 web app)

iOS Safari Constraints

  • Must have a "Tap to Start" screen — iOS blocks audio until user taps
  • Web Serial API does not exist on iOS — not needed (WebSocket replaces it)
  • Web Audio API works fully in Safari iOS 14.5+
  • WebSocket works fully in Safari

Memory Constraints

  • ESP32 has 520KB SRAM
  • WiFi stack uses ~100KB
  • WebSocket server uses ~20KB
  • Leaves ~400KB for application — sufficient
  • Base64 WAV samples live in phone memory, not ESP32 — no SPIFFS needed

Latency Budget (Phase 2)

| Stage | Target |
| --- | --- |
| Button press → ISR | < 1ms |
| ISR → WebSocket broadcast | < 2ms |
| WiFi AP → iPhone | < 8ms |
| Web Audio playback start | < 5ms |
| Total | < 16ms |
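
The total is simply the sum of the stage targets. A small compile-time check, using hypothetical constant names, is one way to keep the table consistent if a stage target changes:

```cpp
// Per-stage latency targets from the table above, in milliseconds.
constexpr int ISR_MS = 1;
constexpr int BROADCAST_MS = 2;
constexpr int WIFI_MS = 8;
constexpr int AUDIO_START_MS = 5;

constexpr int TOTAL_BUDGET_MS = ISR_MS + BROADCAST_MS + WIFI_MS + AUDIO_START_MS;

static_assert(TOTAL_BUDGET_MS == 16, "stage targets must sum to the stated total");

// Returns true when measured stage times (in ms) fit inside the total budget.
constexpr bool within_budget(int isr, int broadcast, int wifi, int audio) {
    return isr + broadcast + wifi + audio < TOTAL_BUDGET_MS;
}
```

Measured values would have to come from instrumentation on the device and the phone; none are recorded in this guide, so the function takes them as parameters.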

Phase 3 — FreeRTOS Dual-Core Task Split ✅ COMPLETE

What was built

  • WiFi/WebSocket handling pinned to Core 0 (WiFiTask, priority 19, 8KB stack)
  • Button ISR flag processing pinned to Core 1 (InputTask, priority 20, 2KB stack)
  • loop() sleeps permanently with vTaskDelay(portMAX_DELAY) — all work in tasks
  • Result: button input never waits for WiFi processing and vice versa

Architecture

Core 0 — WiFiTask:  ws_server.loop() + http_server.handleClient()
Core 1 — InputTask: reads trigger_flags[], calls ws_server.broadcastTXT()
ISRs:               IRAM_ATTR, run on the core that attached them, set volatile flags only

Key config

| Constant | Value |
| --- | --- |
| WIFI_TASK_CORE | 0 |
| INPUT_TASK_CORE | 1 |
| WIFI_TASK_PRIORITY | 19 |
| INPUT_TASK_PRIORITY | 20 (higher, so a hit is never missed) |
| WIFI_TASK_STACK | 8192 bytes |
| INPUT_TASK_STACK | 2048 bytes |

Testing checklist — COMPLETE ✓

  • Serial Monitor shows "WiFiTask → Core 0" and "InputTask → Core 1" on boot
  • All 7 buttons trigger correctly
  • Sound plays on iPhone with no latency regression
  • System stable after extended use

Phase 4a — SD Card WAV Loading ✅ COMPLETE

What was built

  • Adafruit MicroSD breakout mounted over SPI at boot
  • All 8 WAV files opened, header validated (22050Hz 16-bit mono PCM), PCM data loaded into heap buffers
  • SNARE remapped GPIO 5 → 33, CRASH remapped GPIO 18 → 32 to free SPI pins
  • InputTask on Core 1 confirms all 7 buttons via Serial Monitor
  • No audio output — purely validates SD + WAV data pipeline
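
The header validation step can be illustrated with a check against the canonical 44-byte PCM layout. This is a sketch, not the project's loader: `validate_wav_header()` and the reader helpers are hypothetical names, and it assumes the `fmt ` chunk sits at offset 12 and the `data` chunk at offset 36, which holds only for files with the standard header.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t WAV_HEADER_BYTES = 44;
constexpr std::uint32_t EXPECTED_SAMPLE_RATE = 22050;
constexpr std::uint16_t EXPECTED_BITS = 16;
constexpr std::uint16_t EXPECTED_CHANNELS = 1;

// Little-endian field readers (WAV stores integers little-endian).
std::uint16_t read_u16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
std::uint32_t read_u32(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) | (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Checks a canonical 44-byte header. On success, stores the PCM byte count
// in `data_bytes` and returns true.
bool validate_wav_header(const std::uint8_t* header, std::size_t len,
                         std::uint32_t& data_bytes) {
    if (len < WAV_HEADER_BYTES) return false;
    if (std::memcmp(header, "RIFF", 4) != 0) return false;
    if (std::memcmp(header + 8, "WAVE", 4) != 0) return false;
    if (std::memcmp(header + 12, "fmt ", 4) != 0) return false;
    if (std::memcmp(header + 36, "data", 4) != 0) return false;
    if (read_u16(header + 20) != 1) return false;  // format tag 1 = uncompressed PCM
    if (read_u16(header + 22) != EXPECTED_CHANNELS) return false;
    if (read_u32(header + 24) != EXPECTED_SAMPLE_RATE) return false;
    if (read_u16(header + 34) != EXPECTED_BITS) return false;
    data_bytes = read_u32(header + 40);
    return true;
}
```

The returned byte count is what a loader would use to size each heap buffer before reading the PCM data that follows the header.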

Test results — COMPLETE ✓

  • SD card mounts on boot — "SD mounted OK — card size: 30436MB"
  • All 8 WAV files confirmed present and readable
  • WAV header parsed: 22050Hz, 16-bit, mono for all 8 files
  • Files pre-loaded into heap buffers at boot
  • CRASH fires on GPIO 32, SNARE fires on GPIO 33
  • Free heap after load: 282748 bytes — sufficient for Phase 4b

Phase 4b — I2S Amp + Speaker Audio ✅ COMPLETE

What was built

  • I2S audio output through MAX98357A Class D mono amplifier to Adafruit #3351 speaker
  • 4-voice polyphonic mixer (32-bit accumulation → clipped to int16) running on Core 0
  • Two-pass WAV loader: probe all headers → malloc all buffers while heap is clean → read PCM data
    • Solves heap fragmentation that caused malloc failures when loading interleaved with SD I/O
  • I2S_CHANNEL_FMT_ONLY_RIGHT for correct mono output to MAX98357A
  • 0.5x software attenuation (>> 1) to prevent overdriving amp at max hardware gain (+15dB)
  • 50ms debounce to eliminate button double-triggers
  • WAV samples converted on Mac using afconvert + Python script to enforce:
    • 22050Hz, 16-bit, mono, standard 44-byte header (no macOS metadata offset)
  • Original samples stored in firmware/phase4b/audio_samples_original/ for re-conversion
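
The mixer described above can be sketched as follows. This is an illustrative version under stated assumptions, not the project's code: the `Voice` struct and function names are hypothetical, and the 0.5x attenuation is applied to the summed signal before clipping (the guide does not say whether it is applied per voice or to the sum). Division by 2 is used instead of `>> 1` to stay portable for negative values.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t VOICE_COUNT = 4;

// One playing sample: a pointer into a preloaded PCM buffer plus a read cursor.
struct Voice {
    const std::int16_t* pcm = nullptr;
    std::size_t length = 0;    // total samples in the buffer
    std::size_t position = 0;  // next sample to play
    bool active = false;
};

// Clamps a 32-bit value into the int16 range.
std::int16_t clip_to_int16(std::int32_t value) {
    if (value > INT16_MAX) return INT16_MAX;
    if (value < INT16_MIN) return INT16_MIN;
    return static_cast<std::int16_t>(value);
}

// Mixes all active voices into `out`. No allocation happens here,
// so the routine is suitable for the audio hot path.
void mix_block(Voice (&voices)[VOICE_COUNT], std::int16_t* out, std::size_t frames) {
    for (std::size_t i = 0; i < frames; ++i) {
        std::int32_t sum = 0;  // 32-bit accumulator cannot overflow with 4 voices
        for (Voice& voice : voices) {
            if (!voice.active) continue;
            sum += voice.pcm[voice.position++];
            if (voice.position >= voice.length) voice.active = false;
        }
        out[i] = clip_to_int16(sum / 2);  // 0.5x attenuation, then clip
    }
}
```

A trigger would pick a free (or oldest) voice, point it at the preloaded buffer, reset `position` to 0 and set `active`. The output block would then be handed to the I2S driver. With four full-scale voices the halved sum can still exceed the int16 range, which is why the clip stage remains necessary.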

I2S Wiring (MAX98357A)

| MAX98357A Pin | ESP32 Pin | Notes |
| --- | --- | --- |
| VIN | 3V3 | 3.3V power |
| GND | GND | Ground |
| SD (mode) | 3V3 | Float or 3V3 = (L+R)/2 output |
| GAIN | GND | +15dB hardware gain |
| DIN | GPIO 22 | I2S data |
| BCLK | GPIO 26 | I2S bit clock |
| LRC | GPIO 25 | I2S word select |

Button → Sample map

| Button | GPIO | Sample |
| --- | --- | --- |
| KICK | 4 | kick.wav |
| SNARE | 33 | snare.wav |
| HIHAT_CLOSED | 12 | hihat_closed.wav |
| HIHAT_OPEN | 13 | hihat_closed.wav (same file; no open hi-hat sample) |
| TOM_LOW | 14 | tom_low.wav |
| TOM_MID | 15 | tom_mid.wav |
| CRASH | 32 | crash.wav |

Key technical lessons learned

  • ESP.getMaxAllocHeap() can be misleading after SD I/O — always malloc before file reads
  • macOS afconvert writes WAV files whose audio data starts at file offset 4096 (extra metadata sits before the data chunk); rewrite them with a standard 44-byte header using Python
  • Software gain above 1x with full-scale drum samples causes clipping — use attenuation instead
  • MAX98357A GAIN=GND gives +15dB — back off in firmware to avoid speaker distortion

Test results — COMPLETE ✓

  • All 8 WAV files load at boot (8/8)
  • All 7 buttons trigger correct sounds
  • Kick, snare, crash confirmed matching Mac reference playback
  • 4-voice polyphony working — simultaneous hits mix correctly
  • No clipping or distortion at 0.5x software attenuation
  • System stable across extended play session
  • Free heap after load: ~131KB

Code Style Rules (All Phases)

  • snake_case for variables and functions
  • UPPER_SNAKE_CASE for constants and #define
  • Comment GPIO numbers inline: #define BTN_KICK 4 // GPIO4
  • ISRs must be IRAM_ATTR — set flags only, no Serial, no I/O, no malloc
  • No delay() anywhere in the input or audio path
  • No heap allocation in DMA callback (Phase 4+)
  • No dynamic allocation in audio hot path — pre-allocate at boot

What NOT to Do

  • Do not use delay() in ISRs or WiFi/WebSocket callbacks
  • Do not call Serial.print() inside ISRs
  • Do not block webSocket.loop() or server.handleClient() with long operations
  • Do not use GPIO 6–11
  • Do not merge phase branches out of order

Branching Strategy

  • main — stable, tagged releases only
  • Each phase has its own branch, branched from main after previous phase merges
  • PRs go: phase-N → main when the phase is fully working and tested
  • Next branch: phase-5-display (branches from main after 4b merges)

Audio Sample Spec

  • Format: WAV, PCM, uncompressed
  • Sample rate: 22050 Hz
  • Bit depth: 16-bit
  • Channels: Mono
  • Files: kick, snare, hihat_closed, hihat_open, tom_low, tom_mid, crash, ride
  • Phase 2: base64-encoded inside web app HTML
  • Phase 4+: raw WAV on SD card

Testing Checklist by Phase

Phase 0 — COMPLETE ✓

  • Web app connects to ESP32 serial port in Chrome
  • Typing KICK in Serial Monitor plays kick sound in browser
  • All 8 command strings trigger correct sounds
  • On-screen pads work without ESP32 (standalone test mode)

Phase 1 — COMPLETE ✓

  • All 7 buttons register on press
  • No double-trigger at realistic press speed (10ms debounce)
  • Button → browser sound working end-to-end
  • Fast repeated hits all register cleanly

Phase 2 — COMPLETE ✓

  • iPhone connects to "DrumKit-ESP32" WiFi hotspot
  • Safari opens http://192.168.4.1 and loads web app
  • WebSocket connects (status indicator shows connected)
  • Press button → phone plays drum sound with no USB cable
  • All 7 buttons trigger correct sounds on phone
  • Latency feels acceptable (< 20ms perceived)
  • AudioContext resume fix applied for iOS background suspension

Phase 4a — COMPLETE ✓

  • SD card mounts on boot
  • All 8 WAV files load at 22050Hz 16-bit mono
  • All 7 buttons fire including remapped GPIO 32/33
  • Free heap sufficient for Phase 4b audio buffers

Phase 4b — COMPLETE ✓

  • Button → audible sound in < 10ms
  • 4 simultaneous buttons all produce sound (polyphonic mixer)
  • No clipping at 0.5x software attenuation
  • All 7 buttons trigger correct sounds
  • System stable across extended play session

Phase 5 (next)

  • OLED display shows current kit name and hit indicator
  • Multiple kit switching via button or encoder
  • Hardware: Adafruit #4440 OLED (ordered)