ESP32 Digital Drum Kit — Claude Code Pair Programming Guide
TL;DR — What We're Building
A progressive drum kit project. We start with zero extra hardware (just ESP32 + USB cable + browser) and add capability phase by phase until we have a fully wireless, standalone physical instrument.
Current phase: Phase 5 — OLED + Kit Switching (next up)
Last completed: Phase 4b — I2S Amp + Speaker Audio
| Phase | Name | Branch | Hardware | Status |
|---|---|---|---|---|
| 0 | UART + Browser MVP | phase-0-mvp | ESP32 + USB only | Complete |
| 1 | Physical Buttons → UART → Browser | phase-1-buttons | + 7 buttons + breadboard | Complete |
| 2 | WiFi AP + WebSocket → Phone Audio | phase-2-wifi-ap | No new hardware | Complete |
| 3 | FreeRTOS Dual-Core Task Split | phase-3-polyphony | Same as Phase 2 | Complete |
| 4a | SD Card WAV Loading | phase-4a-sd-card | + Adafruit SD card module + microSD | Complete |
| 4b | I2S Amp + Speaker Audio | phase-4b-i2s-audio | + MAX98357A amp + speaker | Complete |
| 5 | OLED + Kit Switching | phase-5-display | + OLED | Not started |
| 6 | Enclosure + Final Build | phase-6-enclosure | Full BOM | Not started |
Phase 0 — UART + Web Browser MVP ✅ COMPLETE
```
[ESP32 firmware]
    Serial.println("KICK")   ← sends command string over UART
        |
    USB cable
        |
[Chrome Web App]
    Web Serial API           ← reads the serial port
        |
    Web Audio API            ← plays drum sound in browser
```
Command Protocol (all phases use this)
| Command | Drum sound |
|---|---|
| KICK | Kick drum |
| SNARE | Snare drum |
| HIHAT_CLOSED | Closed hi-hat |
| HIHAT_OPEN | Open hi-hat |
| TOM_LOW | Low tom |
| TOM_MID | Mid tom |
| CRASH | Crash cymbal |
| RIDE | Ride cymbal |
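A minimal sketch of keeping the command strings in one place, so the sender and any parser cannot drift apart. The `Drum` enum and the helpers `drum_to_command()` / `command_to_drum()` are hypothetical names, not from the project code. It is plain C++ so it can be checked off-target; on the ESP32 the string from `drum_to_command()` would be passed to `Serial.println()` (Phases 0–1) or `broadcastTXT()` (Phase 2+).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical enum; order matches the command table. COUNT is a sentinel.
enum class Drum { KICK, SNARE, HIHAT_CLOSED, HIHAT_OPEN, TOM_LOW, TOM_MID, CRASH, RIDE, COUNT };

// Wire strings, index-aligned with Drum.
constexpr const char* DRUM_COMMANDS[] = {
    "KICK", "SNARE", "HIHAT_CLOSED", "HIHAT_OPEN",
    "TOM_LOW", "TOM_MID", "CRASH", "RIDE",
};
static_assert(sizeof(DRUM_COMMANDS) / sizeof(DRUM_COMMANDS[0]) ==
                  static_cast<std::size_t>(Drum::COUNT),
              "DRUM_COMMANDS out of sync with Drum");

// Drum -> command string sent over UART or WebSocket.
const char* drum_to_command(Drum d) {
    return DRUM_COMMANDS[static_cast<int>(d)];
}

// Command string -> Drum. Returns false for unknown commands so callers can ignore them.
bool command_to_drum(const char* cmd, Drum& out) {
    for (int i = 0; i < static_cast<int>(Drum::COUNT); ++i) {
        if (std::strcmp(cmd, DRUM_COMMANDS[i]) == 0) {
            out = static_cast<Drum>(i);
            return true;
        }
    }
    return false;
}
```

`Serial.println()` appends `\r\n`, so a receiver should trim the line ending before matching the command.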
Phase 1 — Physical Buttons → UART → Browser ✅ COMPLETE
7 tactile buttons wired to GPIO 4, 5, 12, 13, 14, 15, 18 via breadboard
ISR per button (IRAM_ATTR, FALLING edge)
10ms software debounce via millis() timestamp
Serial.println(command) sends to Chrome over USB
| GPIO | Drum |
|---|---|
| 4 | KICK |
| 5 | SNARE |
| 12 | HIHAT_CLOSED |
| 13 | HIHAT_OPEN |
| 14 | TOM_LOW |
| 15 | TOM_MID |
| 18 | CRASH |
Avoid GPIO 6–11 (wired to the internal SPI flash). GPIO 34–39 are input-only and have no internal pull-up resistors, so they cannot be used with INPUT_PULLUP.
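The 10ms software debounce can be sketched as a timestamp comparison. This is an illustration, not the project's ISR: `debounce_accept()` is a hypothetical helper that takes the current time as a parameter so it can run on a host. On the ESP32 it would be marked `IRAM_ATTR` (or inlined into the ISR), fed `millis()`, and the ISR would set the button's trigger flag when it returns true.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t DEBOUNCE_MS = 10;  // Phase 1 value (Phase 4b later uses 50)

// Hypothetical helper: returns true if an edge at now_ms is a new hit and
// records it in last_hit_ms; returns false for bounces inside the window.
// Unsigned subtraction stays correct when the millisecond counter wraps
// (millis() rolls over after about 49.7 days).
bool debounce_accept(uint32_t now_ms, uint32_t& last_hit_ms,
                     uint32_t window_ms = DEBOUNCE_MS) {
    if (now_ms - last_hit_ms < window_ms) {
        return false;  // too soon after the last accepted edge: contact bounce
    }
    last_hit_ms = now_ms;
    return true;
}
```

Comparing `now_ms - last_hit_ms` against the window, rather than `now_ms < last_hit_ms + window_ms`, is what keeps the check correct across the counter rollover.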
Phase 2 — WiFi AP + WebSocket → Phone Audio ✅ COMPLETE
Completely wireless. No USB cable. No laptop. Press a button → iPhone plays drum sound.
```
ESP32 (WiFi Access Point mode)
├── SSID: "DrumKit-ESP32"  Password: "drumkit123"
├── IP: 192.168.4.1
├── HTTP server → serves single bundled HTML file
└── WebSocket server (port 81) → pushes drum commands to phone
        |
[iPhone connects to "DrumKit-ESP32" WiFi]
        |
Safari opens http://192.168.4.1
        |
Web app loads (HTML + JS + CSS + base64 WAV samples, all in one file)
        |
WebSocket connects back to ESP32 on ws://192.168.4.1:81
        |
Button press → ESP32 → WebSocket message → Safari → Web Audio API → phone speaker
```
Why WiFi AP (not WiFi client)
No router needed — ESP32 is its own hotspot
Works anywhere — no home network dependency
Direct connection means lower latency (estimated ~3–8ms vs ~10–20ms through a router)
iPhone connects like any WiFi network
| Decision | Choice | Reason |
|---|---|---|
| Transport | WebSocket | Real-time push, persistent connection, ~1ms delivery |
| Web app delivery | Single bundled HTML served by ESP32 | No laptop, no external server |
| WAV samples | Base64-encoded inside HTML | Avoids SPIFFS file serving complexity |
| AP credentials | SSID: DrumKit-ESP32, Pass: drumkit123 | Fixed for easy connection |
| WebSocket port | 81 | Avoids conflict with HTTP on port 80 |
| Phone browser | iPhone Safari | Web Audio API + WebSocket both supported |
| Library | Purpose | How to add |
|---|---|---|
| WiFi.h | AP mode | Built-in ESP32 Arduino core |
| WebServer.h | HTTP server, serves HTML | Built-in ESP32 Arduino core |
| WebSocketsServer | WebSocket push to phone | PlatformIO: links2004/WebSockets |
platformio.ini additions for Phase 2
```ini
lib_deps =
    links2004/WebSockets @ ^2.4.1
```
```
firmware/phase2/
├── platformio.ini
└── src/
    └── main.cpp        ← WiFi AP + HTTP + WebSocket + button ISRs
web_app/phase2/
└── index.html          ← Single self-contained file
                          (HTML + CSS + JS + base64 WAV samples bundled)
```
Phase 2 Firmware Responsibilities
Boot → start WiFi AP "DrumKit-ESP32"
Start HTTP server on port 80 → serve index.html on GET /
Start WebSocket server on port 81
Init 7 button GPIO pins (INPUT_PULLUP, ISR FALLING edge, 10ms debounce)
On button press → webSocket.broadcastTXT("KICK") (same command strings as before)
Main loop: webSocket.loop() + server.handleClient() + check trigger flags
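The "check trigger flags" step can be sketched as below, assuming one flag per button set by that button's ISR. The names are hypothetical; `broadcast` stands in for `webSocket.broadcastTXT()`, and `std::atomic` replaces the firmware's `volatile bool` so the host version is well-defined. Reading and clearing each flag in a single `exchange()` means a hit that lands between a separate read and clear cannot be silently dropped.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <functional>

constexpr int NUM_BUTTONS = 7;

// Index-aligned with the button table (KICK first, CRASH last).
constexpr const char* BUTTON_COMMANDS[NUM_BUTTONS] = {
    "KICK", "SNARE", "HIHAT_CLOSED", "HIHAT_OPEN", "TOM_LOW", "TOM_MID", "CRASH",
};

// Hypothetical flag array: set by each button's ISR, cleared by the loop.
std::array<std::atomic<bool>, NUM_BUTTONS> trigger_flags{};

// Called from loop() (or, in Phase 3, from InputTask). Each flag is read and
// cleared in one exchange(), then the matching command is broadcast.
// `broadcast` stands in for webSocket.broadcastTXT(). Returns commands sent.
int drain_triggers(const std::function<void(const char*)>& broadcast) {
    int sent = 0;
    for (int i = 0; i < NUM_BUTTONS; ++i) {
        if (trigger_flags[i].exchange(false)) {
            broadcast(BUTTON_COMMANDS[i]);
            ++sent;
        }
    }
    return sent;
}
```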
Phase 2 Web App Responsibilities
On load → connect WebSocket to ws://192.168.4.1:81
On WebSocket message → parse command → play corresponding WAV sample
WAV samples bundled as base64 strings in JS — decoded to AudioBuffer on load
"Tap to Start" screen on load (iOS Safari requires user gesture before audio)
Visual pad grid lights up on hit (same as Phase 0/1 web app)
Must have a "Tap to Start" screen — iOS blocks audio until user taps
Web Serial API does not exist on iOS — not needed (WebSocket replaces it)
Web Audio API works fully in Safari iOS 14.5+
WebSocket works fully in Safari
ESP32 has 520KB SRAM
WiFi stack uses ~100KB
WebSocket server uses ~20KB
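Leaves roughly 400KB on paper; actual free heap is lower because part of the 520KB SRAM holds code and static data, so confirm with ESP.getFreeHeap() at runtime
Decoded WAV samples live in phone memory, not ESP32 RAM; their base64 copies ship inside the HTML in the firmware image, so no SPIFFS is needed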
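Because the samples travel inside the bundled HTML, their size shows up in the firmware image and in page-load time. A quick estimate, assuming canonical 44-byte-header WAVs at 22050 Hz, 16-bit mono; the helper names and clip durations are hypothetical:

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t WAV_HEADER_BYTES = 44;
constexpr std::size_t BYTES_PER_SAMPLE = 2;  // 16-bit mono

// Size of a canonical WAV file holding duration_ms of 16-bit mono audio.
constexpr std::size_t wav_file_bytes(std::size_t sample_rate_hz, std::size_t duration_ms) {
    return WAV_HEADER_BYTES + (sample_rate_hz * duration_ms / 1000) * BYTES_PER_SAMPLE;
}

// Length of padded base64 output: every 3 input bytes become 4 characters.
constexpr std::size_t base64_len(std::size_t n) {
    return 4 * ((n + 2) / 3);
}

// 1 second at 22050 Hz: 44 + 44100 bytes of WAV -> 58860 base64 characters.
static_assert(wav_file_bytes(22050, 1000) == 44144, "WAV size");
static_assert(base64_len(44144) == 58860, "base64 size");
```

Base64 grows the data by about a third, so eight one-second samples add roughly 470KB of HTML.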
| Stage | Target |
|---|---|
| Button press → ISR | < 1ms |
| ISR → WebSocket broadcast | < 2ms |
| WiFi AP → iPhone | < 8ms |
| Web Audio playback start | < 5ms |
| Total | < 16ms |
Phase 3 — FreeRTOS Dual-Core Task Split ✅ COMPLETE
WiFi/WebSocket handling pinned to Core 0 (WiFiTask, priority 19, 8KB stack)
Button ISR flag processing pinned to Core 1 (InputTask, priority 20, 2KB stack)
loop() sleeps permanently with vTaskDelay(portMAX_DELAY) — all work in tasks
Result: button input never waits for WiFi processing and vice versa
Core 0 — WiFiTask: ws_server.loop() + http_server.handleClient()
Core 1 — InputTask: reads trigger_flags[], calls ws_server.broadcastTXT()
ISRs: IRAM_ATTR, set volatile flags only; each ISR runs on the core that called attachInterrupt() (Core 1 when attached from setup())
| Constant | Value |
|---|---|
| WIFI_TASK_CORE | 0 |
| INPUT_TASK_CORE | 1 |
| WIFI_TASK_PRIORITY | 19 |
| INPUT_TASK_PRIORITY | 20 (higher, never miss a hit) |
| WIFI_TASK_STACK | 8192 bytes |
| INPUT_TASK_STACK | 2048 bytes |
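The table's values can be captured as compile-time constants so the compiler checks their intended relationships. This is a sketch with a hypothetical `TaskConfig` struct; the limit of 25 priorities is an assumption based on ESP-IDF's default `configMAX_PRIORITIES`. In ESP-IDF (and therefore Arduino-ESP32) the stack depth passed to `xTaskCreatePinnedToCore()` is in bytes, unlike vanilla FreeRTOS, where it is in words.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical struct holding the Phase 3 table values.
struct TaskConfig {
    const char* name;
    uint32_t stack_bytes;  // ESP-IDF counts task stack in bytes, not words
    uint8_t priority;
    uint8_t core;
};

constexpr TaskConfig WIFI_TASK = {"WiFiTask", 8192, 19, 0};
constexpr TaskConfig INPUT_TASK = {"InputTask", 2048, 20, 1};

// Assumption: ESP-IDF default configMAX_PRIORITIES of 25 (valid priorities 0..24).
constexpr uint8_t MAX_PRIORITIES = 25;

static_assert(INPUT_TASK.priority > WIFI_TASK.priority, "InputTask must outrank WiFiTask");
static_assert(INPUT_TASK.core != WIFI_TASK.core, "tasks must be on different cores");
static_assert(INPUT_TASK.priority < MAX_PRIORITIES && WIFI_TASK.priority < MAX_PRIORITIES,
              "priority exceeds configMAX_PRIORITIES - 1");
```

On the target, each entry would feed a call such as `xTaskCreatePinnedToCore(input_task_fn, INPUT_TASK.name, INPUT_TASK.stack_bytes, nullptr, INPUT_TASK.priority, nullptr, INPUT_TASK.core)`, where `input_task_fn` is a hypothetical task function.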
Testing checklist — COMPLETE ✓
Phase 4a — SD Card WAV Loading ✅ COMPLETE
Adafruit MicroSD breakout mounted over SPI at boot
All 8 WAV files opened, header validated (22050Hz 16-bit mono PCM), PCM data loaded into heap buffers
SNARE remapped GPIO 5 → 33, CRASH remapped GPIO 18 → 32 to free SPI pins
InputTask on Core 1 confirms all 7 buttons via Serial Monitor
No audio output — purely validates SD + WAV data pipeline
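A minimal sketch of the header check, assuming the canonical 44-byte layout that the conversion script enforces. `validate_wav_header()` is a hypothetical helper, not the project's loader; it reads fields explicitly as little-endian and reports the PCM byte count, which is the size a loader needs for `malloc()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr uint32_t EXPECTED_RATE = 22050;
constexpr uint16_t EXPECTED_BITS = 16;
constexpr uint16_t EXPECTED_CHANNELS = 1;
constexpr std::size_t WAV_HEADER_SIZE = 44;

// WAV fields are little-endian regardless of the host CPU.
static uint16_t read_u16(const uint8_t* p) { return static_cast<uint16_t>(p[0] | (p[1] << 8)); }
static uint32_t read_u32(const uint8_t* p) {
    return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

// Hypothetical validator for a canonical 44-byte header. On success, writes the
// PCM byte count to data_bytes. Files with extra chunks before "data" are rejected.
bool validate_wav_header(const uint8_t* h, std::size_t len, uint32_t& data_bytes) {
    if (len < WAV_HEADER_SIZE) return false;
    if (std::memcmp(h, "RIFF", 4) != 0 || std::memcmp(h + 8, "WAVE", 4) != 0) return false;
    if (std::memcmp(h + 12, "fmt ", 4) != 0 || read_u32(h + 16) != 16) return false;
    if (read_u16(h + 20) != 1) return false;                  // 1 = uncompressed PCM
    if (read_u16(h + 22) != EXPECTED_CHANNELS) return false;  // mono
    if (read_u32(h + 24) != EXPECTED_RATE) return false;      // 22050 Hz
    if (read_u16(h + 34) != EXPECTED_BITS) return false;      // 16-bit
    if (std::memcmp(h + 36, "data", 4) != 0) return false;    // data must start at byte 44
    data_bytes = read_u32(h + 40);
    return true;
}
```

A fixed-offset check like this is simple but rejects files that carry metadata chunks before `data` (such as raw afconvert output); walking the RIFF chunk list instead would accept them, at the cost of more parsing code on the device.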
Test results — COMPLETE ✓
Phase 4b — I2S Amp + Speaker Audio ✅ COMPLETE
I2S audio output through MAX98357A Class D mono amplifier to Adafruit #3351 speaker
4-voice polyphonic mixer (32-bit accumulation → clipped to int16) running on Core 0
Two-pass WAV loader: probe all headers → malloc all buffers while heap is clean → read PCM data
Solves heap fragmentation that caused malloc failures when loading interleaved with SD I/O
I2S_CHANNEL_FMT_ONLY_RIGHT for correct mono output to MAX98357A
0.5x software attenuation (>> 1) to prevent overdriving the amp at the hardware gain set by GAIN=GND (+12dB per the MAX98357A datasheet; the +15dB maximum needs a 100kΩ resistor to GND)
50ms debounce to eliminate button double-triggers
WAV samples converted on Mac using afconvert + Python script to enforce:
22050Hz, 16-bit, mono, standard 44-byte header (no macOS metadata offset)
Original samples stored in firmware/phase4b/audio_samples_original/ for re-conversion
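The mixer described above can be sketched as follows. This is an illustration under assumptions, not the firmware's code: the `Voice` struct and `mix_voices()` are hypothetical, voices are one-shot, and the filled buffer would be handed to the I2S driver on Core 0. Summing in `int32_t` means four full-scale voices (4 × 32767) cannot overflow before the `>> 1` attenuation and the clamp to int16.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr int MAX_VOICES = 4;

// Hypothetical voice state; pcm points at a buffer preallocated at boot.
struct Voice {
    const int16_t* pcm = nullptr;
    std::size_t length = 0;  // samples in buffer
    std::size_t pos = 0;     // next sample to play
    bool active = false;
};

// Mixes all active voices into out[0..frames). No allocation, so it is safe
// for the audio hot path.
void mix_voices(Voice (&voices)[MAX_VOICES], int16_t* out, std::size_t frames) {
    for (std::size_t i = 0; i < frames; ++i) {
        int32_t acc = 0;  // 32-bit accumulator: 4 voices cannot overflow
        for (Voice& v : voices) {
            if (!v.active) continue;
            acc += v.pcm[v.pos++];
            if (v.pos >= v.length) v.active = false;  // one-shot: voice frees itself
        }
        acc >>= 1;  // 0.5x attenuation (arithmetic shift on GCC; guaranteed since C++20)
        if (acc > INT16_MAX) acc = INT16_MAX;  // clip to int16 range
        if (acc < INT16_MIN) acc = INT16_MIN;
        out[i] = static_cast<int16_t>(acc);
    }
}
```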
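| MAX98357A Pin | ESP32 Pin | Notes |
|---|---|---|
| VIN | 3V3 | 3.3V power |
| GND | GND | Ground |
| SD (mode) | 3V3 | Float or 3V3 = (L+R)/2 output |
| GAIN | GND | +12dB hardware gain (GAIN tied directly to GND; +15dB needs 100kΩ to GND) |
| DIN | GPIO 22 | I2S data |
| BCLK | GPIO 26 | I2S bit clock |
| LRC | GPIO 25 | I2S word select |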
| Button | GPIO | Sample |
|---|---|---|
| KICK | 4 | kick.wav |
| SNARE | 33 | snare.wav |
| HIHAT_CLOSED | 12 | hihat_closed.wav |
| HIHAT_OPEN | 13 | hihat_closed.wav (reused; no open hi-hat sample) |
| TOM_LOW | 14 | tom_low.wav |
| TOM_MID | 15 | tom_mid.wav |
| CRASH | 32 | crash.wav |
Key technical lessons learned
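ESP.getMaxAllocHeap() can report a free block that later malloc() calls cannot get once SD I/O has fragmented the heap; allocate all sample buffers before any file reads
WAV files written by macOS afconvert place the audio data at file offset 4096 (after metadata chunks), so the Python script rewrites them with a standard 44-byte header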
Software gain above 1x with full-scale drum samples causes clipping — use attenuation instead
MAX98357A GAIN tied directly to GND gives +12dB (a 100kΩ resistor to GND gives the +15dB maximum); back off in firmware to avoid speaker distortion
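The two-pass loading order from Phase 4b can be sketched with the SD access abstracted away. `probe` and `read_pcm` are hypothetical callbacks standing in for SD header reads and PCM reads, and `load_samples_two_pass()` is an illustrative name; the point is the ordering, where every `malloc()` happens before any PCM data is read.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <functional>

constexpr int NUM_SAMPLES = 8;

struct Sample {
    uint32_t data_bytes = 0;  // PCM size from the WAV header (pass 1)
    int16_t* pcm = nullptr;   // heap buffer (pass 2), filled in pass 3
};

// Hypothetical stand-ins for SD access.
using ProbeFn = std::function<bool(int index, uint32_t& data_bytes)>;
using ReadFn = std::function<bool(int index, int16_t* dst, uint32_t bytes)>;

// Returns false on the first failure; buffers already allocated stay in
// samples[] so the caller can free() them.
bool load_samples_two_pass(Sample (&samples)[NUM_SAMPLES],
                           const ProbeFn& probe, const ReadFn& read_pcm) {
    // Pass 1: read headers only; no large allocations yet.
    for (int i = 0; i < NUM_SAMPLES; ++i) {
        if (!probe(i, samples[i].data_bytes) || samples[i].data_bytes == 0) return false;
    }
    // Pass 2: allocate every buffer back-to-back while the heap is still clean.
    for (int i = 0; i < NUM_SAMPLES; ++i) {
        samples[i].pcm = static_cast<int16_t*>(std::malloc(samples[i].data_bytes));
        if (samples[i].pcm == nullptr) return false;
    }
    // Pass 3: only now stream PCM data into the preallocated buffers.
    for (int i = 0; i < NUM_SAMPLES; ++i) {
        if (!read_pcm(i, samples[i].pcm, samples[i].data_bytes)) return false;
    }
    return true;
}
```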
Test results — COMPLETE ✓
Code Style Rules (All Phases)
snake_case for variables and functions
UPPER_SNAKE_CASE for constants and #define
Comment GPIO numbers inline: #define BTN_KICK 4 // GPIO4
ISRs must be IRAM_ATTR — set flags only, no Serial, no I/O, no malloc
No delay() anywhere in the input or audio path
No heap allocation in DMA callback (Phase 4+)
No dynamic allocation in audio hot path — pre-allocate at boot
Do not use delay() in ISRs or WiFi/WebSocket callbacks
Do not call Serial.print() inside ISRs
Do not block webSocket.loop() or server.handleClient() with long operations
Do not use GPIO 6–11
Do not merge phase branches out of order
Branching Strategy
main: stable, tagged releases only
Each phase has its own branch, branched from main after previous phase merges
PRs go: phase-N → main when phase is fully working and tested
Next branch: phase-5-display (branches from main after 4b merges)
Audio Sample Format
Format: WAV, PCM, uncompressed
Sample rate: 22050 Hz
Bit depth: 16-bit
Channels: Mono
Files: kick, snare, hihat_closed, hihat_open, tom_low, tom_mid, crash, ride
Phase 2: base64-encoded inside web app HTML
Phase 4+: raw WAV on SD card
Testing Checklist by Phase