A Claude Code / Claude Agent SDK skill that ingests YouTube videos as structured, vault-anchored notes in Obsidian (or any markdown knowledge graph). The skill is opinionated: it values reasoning over summarization and graph-anchored synthesis over isolated transcripts.
The skill ships with two modes:
- standard - one neuron per video, no review schedule. Good for a research notebook or sporadic ingest.
- brain-training - adds speaker network tracking, convergence/tension detection, active-recall question generation, spaced-repetition scheduling, and a weekly digest. Good for deliberate learning when you want each video to leave a lasting trace.
Pick brain-training mode if you watch a lot of long-form content and the vault keeps filling up with notes you never re-read. See BRAIN_TRAINING.md.
/youtube-ingest <url> # standard
/youtube-ingest <url> --mode brain-training # active learning system
/youtube-ingest --review # surface neurons due for review
Most YouTube-to-markdown skills do this:
URL -> transcript -> LLM summary -> markdown file
This skill does this:
URL
-> existing-note gate (avoid blind re-ingest)
-> metadata (oembed)
-> transcript (yt-dlp + whisper local)
-> reasoning anchored to your existing knowledge graph
(vault scan -> backlinks -> cross-source verification)
-> write note with quality-gated backlinks
-> deep garden walk (second-pass synapse discovery)
-> verify-post-write (syntactic != functional)
The valuable step is the reasoning anchored to your existing graph: where does this video touch concepts you already have notes on? That step is what turns a transcript into a neuron.
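As a rough illustration of what "anchored to your existing graph" could mean in practice, the sketch below indexes note titles in a vault and reports which of them are mentioned in a transcript. It is not the skill's actual implementation: `index_vault`, `anchor_candidates`, and the `min_len` cutoff are hypothetical names, and plain title matching is a simplification of whatever the reasoning step really does.

```python
import re
from pathlib import Path


def index_vault(vault: Path) -> dict[str, Path]:
    """Map lowercased note titles (file stems) to their paths."""
    return {p.stem.lower(): p for p in vault.rglob("*.md")}


def anchor_candidates(transcript: str, index: dict[str, Path], min_len: int = 4) -> list[str]:
    """Return note titles that appear as whole phrases in the transcript.

    Titles shorter than min_len are skipped to limit noisy matches
    (an assumption made for this sketch, not a rule from the skill).
    """
    text = transcript.lower()
    hits = []
    for title in index:
        if len(title) < min_len:
            continue
        if re.search(r"\b" + re.escape(title) + r"\b", text):
            hits.append(title)
    return sorted(hits)
```

The returned titles would be candidates for the reasoning pass to confirm or reject, not backlinks to write blindly.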
- Obsidian vault (or any markdown KG)
- Claude Code OR Claude Agent SDK
- yt-dlp for transcript fetching
- mlx_whisper (Apple Silicon) or faster-whisper (generic) for ASR fallback
- Obsidian Local REST API plugin + mcp-obsidian MCP server (for graph-aware reads/writes)
git clone https://github.com/<you>/youtube-ingest-skill.git
cd youtube-ingest-skill
cp config.example.yml config.yml
# edit config.yml: set VAULT_PATH, MEMORY_DIR, TRANSCRIPT_TOOL, etc.
Link the skill into your Claude Code skills directory:
mkdir -p ~/.claude/skills/youtube-ingest # create the target directory if it is missing
ln -s "$(pwd)/SKILL.md" ~/.claude/skills/youtube-ingest/SKILL.md
(Adjust the path if your skills live elsewhere.)
In Claude Code:
/youtube-ingest https://www.youtube.com/watch?v=VID_ID
Or with flags:
/youtube-ingest <url> --no-vault # stop at reasoning, no write
/youtube-ingest <url> --container persons # force container
/youtube-ingest <url> --lang it # force whisper language
See config.example.yml. The skill reads these paths:
| Variable | Description | Example |
|---|---|---|
| `VAULT_PATH` | Root of your Obsidian vault | `~/Documents/MyVault` |
| `MEMORY_DIR` | Directory for persistent agent memory files | `~/.claude/memory` |
| `TRANSCRIPT_TOOL` | Path to the transcript shell script | `./scripts/transcript.sh` |
| `TRANSCRIPT_OUT` | Where transcripts land | `~/Desktop/transcripts/` |
| `KNOWLEDGE_CONTAINER` | Default folder inside vault for ingested videos | `Knowledge Library` |
| `PERSONS_CONTAINER` | Folder for person-centric notes | `Persons` |
| `WHISPER_BIN` | Path to whisper binary | `~/.venv/bin/mlx_whisper` |
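A minimal sketch of how these values might be normalized once they have been loaded into a dictionary (parsing `config.yml` itself is left out). The function name `resolve_config` is hypothetical, and treating every variable as required is an assumption of this sketch; the only behavior taken from the table is that the two container folders live inside the vault.

```python
from pathlib import Path

# Keys whose values are filesystem paths (may start with "~").
PATH_KEYS = ("VAULT_PATH", "MEMORY_DIR", "TRANSCRIPT_TOOL", "TRANSCRIPT_OUT", "WHISPER_BIN")
# Keys whose values are folder names relative to the vault root.
FOLDER_KEYS = ("KNOWLEDGE_CONTAINER", "PERSONS_CONTAINER")


def resolve_config(raw: dict[str, str]) -> dict[str, Path]:
    """Expand "~" in path values and anchor container folders under the vault."""
    missing = [k for k in PATH_KEYS + FOLDER_KEYS if k not in raw]
    if missing:
        # Assumption: all variables are mandatory in this sketch.
        raise KeyError("missing config keys: " + ", ".join(missing))
    cfg = {k: Path(raw[k]).expanduser() for k in PATH_KEYS}
    for k in FOLDER_KEYS:
        cfg[k] = cfg["VAULT_PATH"] / raw[k]
    return cfg
```

Failing early on a missing key is a design choice for the sketch: a half-configured run that writes to the wrong folder is harder to notice than an immediate error.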
Read PHILOSOPHY.md before extending. Three principles drive the design:
- Reasoning is the neuron, not the transcript. A skill that only transcribes-and-summarizes throws away the graph value.
- Anchored synthesis beats isolated synthesis. The first vault scan happens before the reasoning, not after.
- Syntactic success != functional success. Every write is verified post-hoc by reading back from disk.
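The third principle can be sketched as a read-back check: after a write reports success, open the file from disk and confirm it contains what was intended. The code below is an illustration under assumptions, not the skill's own verifier; `verify_note` is a hypothetical helper, and the frontmatter test only looks for a leading `---` block.

```python
import re
from pathlib import Path

# A leading YAML block delimited by "---" lines.
FRONTMATTER = re.compile(r"\A---\n.*?\n---\n", re.DOTALL)
# Capture the target of [[Target]], [[Target|alias]] or [[Target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def verify_note(path: Path, expected_links: set[str]) -> list[str]:
    """Re-read a note from disk and return a list of problems (empty means OK)."""
    if not path.is_file():
        return [f"file not found on disk: {path}"]
    text = path.read_text(encoding="utf-8")
    problems = []
    if not FRONTMATTER.match(text):
        problems.append("missing or malformed YAML frontmatter")
    found = {m.strip() for m in WIKILINK.findall(text)}
    missing = sorted(expected_links - found)
    if missing:
        problems.append("missing backlinks: " + ", ".join(missing))
    return problems
```

An empty list means the write was functionally successful; anything else is worth surfacing even though the write call itself returned OK.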
Detailed pipeline in SKILL.md. Summary:
| Step | Purpose | Gate |
|---|---|---|
| 0 | Existing-note gate | avoid blind re-ingest |
| 1 | Metadata via oembed | language auto-detect |
| 2-3 | Transcript (yt-dlp -> whisper fallback) | background |
| 4 | Reasoning anchored to vault | pre-emit cross-reference verify |
| 5 | Write neuron note | quality-gated backlinks (3-7, >=2 synapse) |
| 5b | Memory satellite (gated) | only if pattern is cross-applicable |
| 6 | Deep garden walk | second-pass synapse discovery |
| 7 | Verify post-write | read back, check frontmatter + backlinks |
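Step 0 could look roughly like the following: extract the video ID from the URL and search the vault for notes that already mention it. This is a sketch under assumptions; `video_id` and `existing_notes` are hypothetical helpers, only a few common YouTube URL shapes are handled, and the real gate in SKILL.md may also match on speaker or title.

```python
import re
from pathlib import Path
from typing import Optional
from urllib.parse import parse_qs, urlparse


def video_id(url: str) -> Optional[str]:
    """Extract the video ID from common YouTube URL shapes, or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    for prefix in ("www.", "m."):
        if host.startswith(prefix):
            host = host[len(prefix):]
    if host == "youtu.be":
        return parsed.path.lstrip("/").split("/")[0] or None
    if host == "youtube.com":
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        m = re.match(r"^/(?:shorts|embed|live)/([^/]+)", parsed.path)
        if m:
            return m.group(1)
    return None


def existing_notes(vault: Path, url: str) -> list[Path]:
    """Return vault notes whose text already contains this video's ID."""
    vid = video_id(url)
    if vid is None:
        return []
    return sorted(
        p for p in vault.rglob("*.md")
        if vid in p.read_text(encoding="utf-8", errors="ignore")
    )
```

Matching on the ID rather than the full URL catches notes that stored a `youtu.be` short link or a URL with extra query parameters.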
See examples/sample-note.md for a canonical note structure.
The skill embeds anti-patterns discovered during ~200 ingest cycles. The full list lives in SKILL.md. Highlights:
- Whisper drift: don't mix venv whisper with system whisper. Pin one.
- Blind re-ingest: always check if a note for this URL or speaker already exists. Default to additive update.
- Wikilink-to-memory rot: `[[wikilink]]` only resolves inside the vault. Memory files outside the vault must be referenced as paths in prose, not as wikilinks.
- Pre-emit naming check: verify the filename you are about to write does not collide and matches the pattern other agents are using in parallel.
- Garden walk is mandatory: first reasoning pass sees 6-7 synapses, second pass finds 3-9 more. Without the second pass the synthesis is shallow.
- Verify post-write: `write_note` returning OK does not mean the file is on disk (iCloud sync, TCC, path drift). Read it back.
- Inline fact-check beats unverified flag: when a claim is verifiable with a cheap tool call, do it inline. Don't pollute the vault with `[unverified]` flags that become permanent narrative debt.
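The wikilink-rot anti-pattern lends itself to a mechanical check before a note is written: list every wikilink target that has no matching note inside the vault. The sketch below is one possible way to do that. `unresolved_wikilinks` is a hypothetical helper, and matching on the lowercased file stem is an assumption about how links resolve, so treat its output as a hint rather than a verdict.

```python
import re
from pathlib import Path

# Capture the target of [[Target]], [[Target|alias]] or [[Target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def unresolved_wikilinks(note_text: str, vault: Path) -> list[str]:
    """Return wikilink targets that match no markdown file stem in the vault."""
    stems = {p.stem.lower() for p in vault.rglob("*.md")}
    targets = {m.strip() for m in WIKILINK.findall(note_text)}
    # For path-style links such as [[folder/Note]], compare the last component only.
    return sorted(t for t in targets if t.split("/")[-1].lower() not in stems)
```

Any target reported here that points at a memory file outside the vault would be rewritten as a plain path in prose, as the anti-pattern describes.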
The skill is opinionated. PRs welcome for:
- Generic transcript tool wrappers (faster-whisper, distil-whisper, deepgram, etc.)
- Vault adapters beyond Obsidian (Logseq, Foam, plain markdown)
- Language-specific reasoning prompts
- Anti-pattern discoveries (open an issue with reproduction)
Out of scope:
- Closed-source LLM cost optimizations (your fork, your call)
- Auto-tagging / classification heuristics that bypass the reasoning step
MIT. See LICENSE.
The skill grew out of ~200 ingest sessions inside a personal second-brain workflow. The pipeline structure mirrors the canonical "anchored synthesis" pattern from knowledge-graph research; the syntactic-vs-functional verify discipline borrows postmortem patterns from distributed systems engineering and applies them to single-user knowledge work.
