Meet the Librarian: a lightweight, local-first, plain-text activity tracker. Keep a running, structured record of what you actually do, so reporting season is a query instead of a memory test.
- You sit down for your annual performance review and cannot remember half of what you accomplished this year. The big wins from January are a blur.
- You finally start your multi-year post-tenure review and realize you never wrote down the talks, the committees, the student you mentored to a publication.
- Your certification renewal is due and you need to report continuing-education credits — but the courses and conferences are scattered across an inbox, a calendar, and your memory.
- You are a student applying to grad school and have to reconstruct three years of projects, research, and activities from old folders.
- A great job posting appears and you want a resume tailored to it — not the generic one — but assembling the relevant experience from scratch is daunting.
Every one of those is the same problem: the work happened, but it was never
recorded in a form you can search, filter, and report from. librarian fixes
that. You log activities as they happen — a couple of minutes each — into a
plain-text file you own. When a reporting moment arrives, the record is already
there.
librarian is a CRUD + search tool over a YAML database of activity
entries. Each entry has a few fixed core fields (id, date, title,
description, tags, supporting docs) plus optional structured blocks defined
by a pluggable schema you choose. It ships as:
- a command-line tool (`librarian`),
- an MCP server so AI assistants can read and update the database for you,
- and five ready-made schemas for common reporting needs.
- You own your data. It is a YAML file on your disk. No account, no subscription, no cloud, no vendor.
- No lock-in. Plain text means you can read, grep, edit, and back it up with any tool. If you stop using `librarian` tomorrow, your record is still a perfectly readable file.
- Git-friendly. Commit the file to a private repo and you get a full, diff-able history of your career record for free.
- Extensible. The schema is data, not code — adapt it to your situation without touching the program.
- AI-agent-friendly. The bundled MCP server lets an assistant log activities, answer "what did I do in Q2?", and assemble tailored reports — using your real record, not guesses.
- Pluggable schema: structured "blocks" (review classification, credit tracking, ...) are declared in a `schema.yaml`, not hardcoded. Add a block to an existing entry with `set-block <id> <block> <json>`; the whole block is validated atomically before the write.
- Five bundled schemas: performance review, post-tenure review, certification credits, student portfolio, grant / funding.
- Full-text and structured search: `search`, `filter`, `list`, `project`, `similar`, `stats`.
- Aggregation: `rollup <block> [--sum FIELD] [--group-by FIELD]` totals a block across the entries that carry it, e.g. `librarian rollup grant --sum amount --group-by status` for a funding portfolio breakdown (awarded vs. pending).
- Project aggregation: `project <name>` lists every entry tagged with a project name and appends any keyword-only matches as a "more like this" section; `--strict` drops the appendix, `--broad` switches to keyword hits anywhere in the entry text.
- Slice export: `export` writes a filtered subset of entries to CSV or JSON, with date and tag filters, for downstream reporting or import into other tools.
- Format-preserving writes: edits are surgical, line-level splices; your hand-formatting and paragraph breaks survive every write.
- Schema validation: `validate` flags bad enum values, missing required fields, duplicate ids, dangling cross-references, and inventory problems. An entry that legitimately has no artifact can set `docs_optional: true` (`update-field <id> docs_optional true`) to suppress its NO DOCS warning so genuine gaps stand out.
- Discoverability: `env` prints the resolved data-home paths and which `LIBRARIAN_*` variable set each (the "truth sources" for the active setup); `schema` enumerates every enum's allowed values, including dependent maps like category → subcategory, so valid choices are discoverable without reading `schema.yaml`.
- Change ledger: every write is appended to an audit log; poll "what changed since I last looked?" with `changes`. `search`, `filter`, and `list` also accept `--changed-since` / `--changed-until` to intersect a query with the ledger, e.g. `filter --tag grant --changed-since 2026-05-01` returns only the grant entries changed since you last pulled, which is handy whenever you need to export updates into an external system.
- File inventory: track supporting artifacts (PDFs, posters, certificates) in a normalized registry, with sha256 de-duplication.
- Contact rolodex: auto-built index of the people you collaborate with, derived from two-or-more-word `Name (email)` or `Name <email>` mentions in your descriptions and queryable by name or email fragment.
- Safe cross-references: `rename-id` repoints every backticked and plain-text reference to an entry across the corpus, token-bounded so longer ids aren't matched as substrings. `delete <id> --repoint-to <target>` reuses the same rewriter, so deleting an entry can rewrite its inbound references to a successor in the same call instead of leaving dangling links behind.
- Atomic consolidation: `merge <source-id>... --into <target-id>` folds tags, docs, and missing schema blocks from one or more source entries into a target, repoints every inbound reference, and deletes the sources, all in one locked write. `--on-block-conflict=abort|keep-target|keep-source` chooses how to resolve same-block collisions; `--append-sources` opts into a mechanical fold of each source's description under a `## From <id>` header; `--dry-run` (the default without `--confirm`) prints the full plan plus source descriptions so the caller can fold the prose in by hand via `update-description`.
- Cross-process write lock: every writer (`create`, `delete`, `update-*`, `set-block`, `add-tags`, `remove-tags`, `add-docs`, `remove-docs`, `rename-id`, `merge`, file-inventory writes) takes an exclusive advisory lock around its full read-plan-write transaction, so two concurrent librarian processes serialize cleanly instead of last-writer-wins.
- Tag normalization: `tag-audit` flags case and separator variants of the same tag (e.g. `Build-A-Bot` vs `build-a-bot`) so they don't fragment your index over time.
- Concurrency-safe: `fcntl` advisory locking guards every write.
- MCP server: drive the whole tool from an AI assistant.
- Duplicate detection: fuzzy similarity warns you before you create a near-duplicate entry.
- Bundled agent template: a generic AI-agent definition in `agents/`.
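To illustrate what "token-bounded" means in the Safe cross-references item above, here is a minimal Python sketch. It is not librarian's actual rewriter: the `repoint_references` helper is hypothetical, and treating lowercase letters, digits, and hyphens as the id characters that form a token boundary is an assumption made for illustration.

```python
import re


def repoint_references(text: str, old_id: str, new_id: str) -> str:
    """Replace whole-token occurrences of old_id with new_id.

    Hypothetical helper: a match only counts when it is not immediately
    preceded or followed by another id character, so a longer id that
    merely contains old_id is left untouched.
    """
    id_char = r"[a-z0-9-]"
    pattern = re.compile(
        rf"(?<!{id_char}){re.escape(old_id)}(?!{id_char})"
    )
    # A function replacement avoids any backslash interpretation in new_id.
    return pattern.sub(lambda _match: new_id, text)


text = "See `2026-03-launch` and 2026-03-launch-retro for context."
print(repoint_references(text, "2026-03-launch", "2026-03-v2-launch"))
```

In this example only the backticked reference is rewritten; `2026-03-launch-retro` is a different id and stays as it is.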
Requires Python 3.10+.
git clone <repository-url>
cd librarian
python3 -m venv .venv
source .venv/bin/activate
pip install -e .

This puts a `librarian` command on your PATH. (For development, i.e. tests and lint, use `pip install -e ".[dev]"` or run `./scripts/setup-dev.sh`.)
# 1. Choose a schema. The tool runs schema-less by default; opt into one by
# copying it into your data home (created on first use):
mkdir -p ~/.config/librarian
cp schemas/performance-review.yaml ~/.config/librarian/schema.yaml
# 2. Inspect the active schema:
librarian schema
# 3. Log an activity (writes need a --label for the audit ledger):
librarian create --label cli:setup --json '{
"id": "2026-03-launch",
"date": "2026-03-15",
"title": "Led the v2 launch",
"description": "Shipped the v2 release; coordinated three teams.",
"tags": ["delivery", "leadership"],
"docs": ["https://example.com/v2-notes"],
"review": {"kind": "accomplishment", "competency": "leadership",
"scope": "organization", "review_period": "2026-H1",
"notes": "Cross-team delivery under deadline."}
}'
# 4. Search, filter, and summarize:
librarian search launch
librarian filter --block-field review.competency leadership
librarian stats
librarian rollup grant --sum amount --group-by status
librarian validate

Your data lives in `$XDG_CONFIG_HOME/librarian/` (or `~/.config/librarian/`):
`activities.yaml`, `files.yaml`, `schema.yaml`, `changes.log`, and an
`artifacts/` folder. Every location is overridable; see Data location.
An entry always has the core fields id, date, title, description,
tags, docs (and an optional end_date). On top of that, a schema.yaml
declares optional structured blocks — each with named fields, types, and
enum value sets. The validator, stats, filter, and the field-update command
all read the active schema; nothing about any particular block is hardcoded.
Field types: enum, text, string, int, bool, date, date?
(nullable date). Enums can be dependent — the legal values of one field can
depend on the value of a sibling field.
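A dependent enum can be pictured as a lookup from the parent field's value to the set of values the sibling field may take. The sketch below is illustrative only: the `category` and `subcategory` field names echo the dependent map mentioned for the `schema` command, but the values and the `check_dependent_enum` helper are hypothetical and do not reflect librarian's schema format or validator.

```python
# Hypothetical dependent map: parent value -> allowed child values.
ALLOWED_SUBCATEGORIES = {
    "teaching": {"course", "mentoring"},
    "service": {"committee", "review"},
}


def check_dependent_enum(block: dict) -> list[str]:
    """Return a list of problems for one block (an empty list means valid)."""
    problems = []
    category = block.get("category")
    subcategory = block.get("subcategory")
    if category not in ALLOWED_SUBCATEGORIES:
        problems.append(f"unknown category: {category!r}")
    elif subcategory not in ALLOWED_SUBCATEGORIES[category]:
        allowed = sorted(ALLOWED_SUBCATEGORIES[category])
        problems.append(
            f"subcategory {subcategory!r} is not valid for "
            f"category {category!r}; expected one of {allowed}"
        )
    return problems


print(check_dependent_enum({"category": "teaching", "subcategory": "course"}))
print(check_dependent_enum({"category": "teaching", "subcategory": "committee"}))
```

The second call reports a problem because `committee` is only legal when the sibling field is `service`.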
With no schema.yaml, librarian runs in generic mode: full CRUD, search,
and file-inventory still work; blocks are just not validated.
Each maps to a concrete reporting moment. Pick the one that fits, or adapt one.
| Schema | For | The reporting moment it serves |
|---|---|---|
| `performance-review.yaml` | any employee | the annual performance review or promotion packet (the "brag document") |
| `ptr.yaml` | academic faculty | post-tenure review: teaching, scholarly, and service classification |
| `cpe.yaml` | certified professionals | continuing-education credit reporting: CISSP CPEs, PMP PDUs, nursing CEUs, etc. |
| `student-portfolio.yaml` | students | grad-school / scholarship / job applications: coursework, projects, research, awards |
| `grant.yaml` | grant holders | funding-portfolio reporting: award amount, role, status, sponsor; sum with `rollup` |
To track more than one at once, merge the blocks from several schema files into
one schema.yaml under a single blocks: mapping.
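Combining schemas amounts to a dictionary merge. The sketch below assumes each schema file has already been parsed into a Python dict with a top-level `blocks` mapping; only the single `blocks:` mapping comes from the description above, and everything else about the layout is an assumption. The `merge_schema_blocks` helper and the placeholder block bodies are hypothetical.

```python
def merge_schema_blocks(*schemas: dict) -> dict:
    """Merge the `blocks` mappings of several parsed schemas into one.

    Raises ValueError when two schemas define a block with the same name,
    rather than silently letting one definition overwrite the other.
    """
    merged: dict = {}
    for schema in schemas:
        for name, definition in schema.get("blocks", {}).items():
            if name in merged:
                raise ValueError(f"block {name!r} is defined more than once")
            merged[name] = definition
    return {"blocks": merged}


# Placeholder block bodies; real schema files define fields, types, and enums.
review_schema = {"blocks": {"review": {"fields": {}}}}
grant_schema = {"blocks": {"grant": {"fields": {}}}}

combined = merge_schema_blocks(review_schema, grant_schema)
print(sorted(combined["blocks"]))  # ['grant', 'review']
```

Failing loudly on a name collision is a design choice of this sketch: two schemas that reuse a block name probably mean different things by it.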
Here is the payoff of keeping a structured record. Because librarian holds a
tagged, searchable database of your real work, you — or an AI assistant via
the MCP server — can rapidly assemble a resume tailored to a specific job
posting: pull only the entries relevant to that role, and frame each example
toward what the solicitation actually asks for.
A targeted resume drawn from your real record beats a one-size-fits-all resume padded with experience that does not apply to the role. Instead of "here is everything I have ever done", you get "here is precisely the experience this job calls for, with concrete examples and dates" — assembled in minutes, because the underlying record already exists and is queryable.
# Find the experience relevant to a posting's keywords:
librarian search "incident response"
librarian filter --tag leadership --after 2024-01-01
librarian export --format json --tag cloud-security

Hand that to an assistant connected over MCP and ask it to draft a resume section for the specific role. It works from evidence, not from a blank page.
Supporting artifacts (PDFs, posters, slide decks, certificates) are tracked in
a separate normalized registry, files.yaml. Register a file with file-add;
it gets an id, a category, a sha256 digest, and an added date. Entries then
reference a file by putting file:<id> in their docs list — not a raw path —
so moving or renaming the file is a single file-move edit and no entry
reference changes. file-add warns (without blocking) on exact-content
duplicates and fuzzily-similar titles. A file need not belong to any entry: a
standalone, categorized artifact is a valid record on its own.
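Exact-content duplicate detection of the kind described above can be built on a content digest. The following is a minimal sketch using Python's standard `hashlib`; the `sha256_of` and `find_duplicate` helpers and the shape of the registry (a mapping from file id to hex digest) are assumptions, not librarian's internals.

```python
import hashlib
from pathlib import Path
from typing import Optional


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large artifacts are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate(path: Path, registry: dict[str, str]) -> Optional[str]:
    """Return the id of a registered file with identical content, if any.

    `registry` is a hypothetical mapping of file id -> sha256 hex digest.
    """
    digest = sha256_of(path)
    for file_id, known_digest in registry.items():
        if known_digest == digest:
            return file_id
    return None
```

Because the comparison is on content rather than on name or location, a renamed or relocated copy of the same PDF would still be recognized.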
librarian file-add ~/docs/cert.pdf --category Certifications \
--title "CISSP Certificate" --label cli:setup
librarian add-docs 2026-03-launch file:cert --label cli:setup
librarian file-list --orphans # inventory coverage report

A side-effect of writing descriptions is a queryable rolodex of the people you
collaborate with. Every time a description mentions someone with a two-or-more-word name like
First Last (email@domain) or First Last <email@domain>, the librarian
indexes the pair and remembers which entry mentioned them (single-token
mentions are skipped as too noisy to be reliable names). Ask contact <query>
to look someone up by name or email fragment; the result lists the entries where they
appear (up to three by default — see the note below the example for the full
list), so you can pivot from "who is Jane?" to "everything I've worked on
with Jane" without running two searches. No manual rolodex curation — it's
derived purely from the descriptions you already write.
librarian contact garcia
# Found 1 contact(s):
#
# Dr. Maria Garcia
# mgarcia@example.edu
# Sources: 2024-grant-application, 2025-conference-poster, 2025-coauthored-paper

The default text output caps the `Sources:` list at three entries per contact.
Pass `--format json` for the full, untruncated list (useful when a recurring
collaborator appears in many entries). Browse the whole rolodex with
`librarian contact --all`, or filter by any email substring with
`librarian contact --institution example.edu` to surface everyone at a single
institution. Despite its name, the flag matches anywhere in the email, not
only in the domain: `--institution garcia` would also hit local-parts like
`garcia@anywhere.com`.
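As a rough illustration of how such an index could be derived, the sketch below pulls `Name (email)` and `Name <email>` pairs out of description text with a regular expression. The pattern, the `extract_contacts` helper, and the sample descriptions are assumptions for illustration; librarian's real matching rules may differ (for instance, this naive pattern treats any run of capitalized words as a name).

```python
import re
from collections import defaultdict

# Two or more capitalized words, followed by an email in () or <>.
CONTACT_RE = re.compile(
    r"(?P<name>[A-Z][\w.'-]*(?:\s+[A-Z][\w.'-]*)+)\s*"
    r"(?:\((?P<paren>[^\s()<>]+@[^\s()<>]+)\)"
    r"|<(?P<angle>[^\s()<>]+@[^\s()<>]+)>)"
)


def extract_contacts(descriptions: dict[str, str]) -> dict[tuple[str, str], list[str]]:
    """Map (name, email) -> ids of the entries whose description mentions them."""
    index: dict[tuple[str, str], list[str]] = defaultdict(list)
    for entry_id, text in descriptions.items():
        for match in CONTACT_RE.finditer(text):
            email = match.group("paren") or match.group("angle")
            index[(match.group("name"), email)].append(entry_id)
    return dict(index)


sample = {
    "2024-grant-application": "co-wrote the proposal with Dr. Maria Garcia (mgarcia@example.edu).",
    "2025-conference-poster": "poster with Dr. Maria Garcia <mgarcia@example.edu> and Bob (bob@example.org).",
}
for (name, email), sources in extract_contacts(sample).items():
    print(name, email, sources)
```

The single-word mention `Bob (bob@example.org)` is skipped, mirroring the two-or-more-word rule, while both spellings of the Garcia mention land under one key.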
librarian ships an MCP server so an AI
assistant can operate the database directly — log activities you describe in
chat, answer questions about your record, and assemble reports. Run it with:
python3 -m librarian.mcp_server

The server self-bootstraps an in-repo .venv on first run and exposes the
read tools (no label needed) and write tools (a session_label is required for
audit attribution). Point your MCP-capable client at that command to register
it. If you set LIBRARIAN_MEMORY_DIR, the directory's Markdown files are also
exposed as MCP resources (opt-in; there is no default).
The canonical way to launch the server is as a Python module
(python -m librarian.mcp_server); launching it by file path also works (a
__package__ shim in the script handles it), but the module form is preferred.
For Claude Code, add this entry under mcpServers.librarian in
~/.claude.json, substituting your own paths:
{
"type": "stdio",
"command": "/absolute/path/to/librarian/.venv/bin/python",
"args": ["-m", "librarian.mcp_server"],
"env": {
"LIBRARIAN_YAML_PATH": "/absolute/path/to/activities.yaml",
"LIBRARIAN_FILES_PATH": "/absolute/path/to/files.yaml",
"LIBRARIAN_LEDGER_PATH": "/absolute/path/to/changes.log",
"LIBRARIAN_ROOT": "/absolute/path/to/data-root",
"LIBRARIAN_SCHEMA_PATH": "/absolute/path/to/schema.yaml"
}
}

Every LIBRARIAN_* env var is optional; omit it to fall back to the XDG
default (~/.config/librarian/...). The env block is the right place to point
the OSS server at an existing data home — e.g. a YAML you already track in a
private directory — without copying the file. LIBRARIAN_ROOT is the base
against which inventory file paths (the path: field on each files.yaml
record) are resolved.
Restart Claude Code after editing ~/.claude.json so the MCP server is
re-spawned with the new config.
agents/librarian.md is a generic, reusable AI-agent definition template —
drop it into your assistant's agent configuration to get sensible behavior on
top of the raw tools: search-before-create, schema-aware classification,
consistent tagging, audit-labeled writes, and accurate attribution.
For Claude Code, copy it to ~/.claude/agents/librarian.md (user-scope, so it
is available to every Claude Code session on your machine):
cp agents/librarian.md ~/.claude/agents/librarian.md
⚠️ Customize before relying on it. The template ships intentionally generic — it has no project-specific judgment until you add it. After copying it, edit your local copy to fill in:
- Your active schema: which schema you selected (e.g. `ptr`, `cpe`, `performance-review`, `student-portfolio`) and what its blocks mean for your reporting context.
- Your tagging conventions: the projects, people, and topics you tag for, and their canonical forms (lowercase-hyphenated, TitleCase-Hyphenated, etc.).
- Project-specific notes: anything an agent should know about your record-keeping situation (preferred labels, what is in vs. out of scope for you, recurring collaborators).
Generic-as-shipped, the agent is a competent operator of the tool but not a curator of your record. Customization takes around 15 minutes and is the difference between an agent that logs entries blindly and one that behaves like a thoughtful collaborator.
The default data home is $XDG_CONFIG_HOME/librarian/, falling back to
~/.config/librarian/. Every location is overridable by an environment
variable:
| Variable | Overrides |
|---|---|
| `LIBRARIAN_HOME` | the whole data home directory |
| `LIBRARIAN_YAML_PATH` | the `activities.yaml` path |
| `LIBRARIAN_FILES_PATH` | the `files.yaml` inventory path |
| `LIBRARIAN_LEDGER_PATH` | the change-ledger path |
| `LIBRARIAN_SCHEMA_PATH` | the `schema.yaml` path |
| `LIBRARIAN_ROOT` | the root that inventory file paths resolve against |
| `LIBRARIAN_MEMORY_DIR` | the optional MCP memory-resource directory |
Per-resource variables win over LIBRARIAN_HOME, which wins over the XDG
default.
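The precedence rule can be sketched as a small resolver. This is an illustration under assumptions, not librarian's code: the `resolve_path` helper is hypothetical, and it takes the environment as a plain mapping so the behavior is easy to check without touching real environment variables.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_path(
    filename: str,
    resource_var: str,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Resolve one data file: per-resource variable, then LIBRARIAN_HOME, then XDG."""
    env = os.environ if env is None else env
    if env.get(resource_var):
        # A per-resource override names the file itself.
        return Path(env[resource_var])
    if env.get("LIBRARIAN_HOME"):
        # The data-home override names the directory holding every file.
        return Path(env["LIBRARIAN_HOME"]) / filename
    config_home = env.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    return Path(config_home) / "librarian" / filename


env = {"LIBRARIAN_HOME": "/data/lib", "LIBRARIAN_SCHEMA_PATH": "/etc/schema.yaml"}
print(resolve_path("schema.yaml", "LIBRARIAN_SCHEMA_PATH", env))
print(resolve_path("activities.yaml", "LIBRARIAN_YAML_PATH", env))
```

With that sample environment, the schema path comes from its own variable, while the activities file falls back to the `LIBRARIAN_HOME` directory.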
Read: search, get, filter, list, stats, rollup, tags,
tag-audit, validate, export, project, similar, contact, changes,
schema, env
Write: create, update-field, update-description, update-notes,
update-nested-field, set-block, add-tags, remove-tags, add-docs,
remove-docs, delete, rename-id, merge
File inventory: file-add, file-list, file-get, file-move,
file-update, file-rehash, file-search
Run librarian <command> --help for command-specific options.
Entry ids must be slugs — lowercase letters, digits and hyphens, at least
two characters (e.g. 2026-03-launch). create and rename-id reject other
shapes so ids stay unambiguous in cross-references and round-trip safely
through the space-delimited change ledger.
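The slug rule can be expressed as a single regular expression. The sketch below is one reading of it, not the tool's validator: `is_valid_entry_id` is a hypothetical helper, and whether librarian also restricts leading, trailing, or doubled hyphens is not stated here, so this pattern allows them.

```python
import re

# Lowercase letters, digits, and hyphens; at least two characters.
SLUG_RE = re.compile(r"[a-z0-9-]{2,}")


def is_valid_entry_id(candidate: str) -> bool:
    """Return True when the whole string matches the slug shape."""
    return SLUG_RE.fullmatch(candidate) is not None


for candidate in ("2026-03-launch", "Launch", "a", "has space"):
    print(candidate, is_valid_entry_id(candidate))
```

Only the first candidate passes; the others fail on an uppercase letter, on length, and on whitespace respectively. Rejecting whitespace is what keeps an id safe to store as one field in a space-delimited ledger line.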
See CONTRIBUTING.md for the development setup, the local
checks (ruff check, ruff format --check, pytest), and the
Claude-driven review workflows that run automatically on pull requests. Security
issues: see SECURITY.md — please do not open a public issue
for them.
MIT © librarian contributors
