daniel-pittman/librarian

The Librarian

Meet the Librarian. He's a lightweight, local-first, plain-text activity tracker — keep a running, structured record of what you actually do, so reporting season is a query instead of a memory test.


Ever had one of these moments?

  • You sit down for your annual performance review and cannot remember half of what you accomplished this year. The big wins from January are a blur.
  • You finally start your multi-year post-tenure review and realize you never wrote down the talks, the committees, the student you mentored to a publication.
  • Your certification renewal is due and you need to report continuing-education credits — but the courses and conferences are scattered across an inbox, a calendar, and your memory.
  • You are a student applying to grad school and have to reconstruct three years of projects, research, and activities from old folders.
  • A great job posting appears and you want a resume tailored to it — not the generic one — but assembling the relevant experience from scratch is daunting.

Every one of those is the same problem: the work happened, but it was never recorded in a form you can search, filter, and report from. librarian fixes that. You log activities as they happen — a couple of minutes each — into a plain-text file you own. When a reporting moment arrives, the record is already there.

What it is

librarian is a CRUD + search tool over a YAML database of activity entries. Each entry has a few fixed core fields (id, date, title, description, tags, supporting docs) plus optional structured blocks defined by a pluggable schema you choose. It ships as:

  • a command-line tool (librarian),
  • an MCP server so AI assistants can read and update the database for you,
  • and five ready-made schemas for common reporting needs.

Why local-first, plain-text?

  • You own your data. It is a YAML file on your disk. No account, no subscription, no cloud, no vendor.
  • No lock-in. Plain text means you can read, grep, edit, and back it up with any tool. If you stop using librarian tomorrow, your record is still a perfectly readable file.
  • Git-friendly. Commit the file to a private repo and you get a full, diff-able history of your career record for free.
  • Extensible. The schema is data, not code — adapt it to your situation without touching the program.
  • AI-agent-friendly. The bundled MCP server lets an assistant log activities, answer "what did I do in Q2?", and assemble tailored reports — using your real record, not guesses.

Features

  • Pluggable schema — structured "blocks" (review classification, credit tracking, ...) are declared in a schema.yaml, not hardcoded. Add a block to an existing entry with set-block <id> <block> <json>; the whole block is validated atomically before the write.
  • Five bundled schemas — performance review, post-tenure review, certification credits, student portfolio, grant / funding.
  • Full-text and structured search: search, filter, list, project, similar, stats.
  • Aggregation: rollup <block> [--sum FIELD] [--group-by FIELD] totals a block across the entries that carry it, e.g. librarian rollup grant --sum amount --group-by status for a funding portfolio breakdown (awarded vs. pending).
  • Project aggregation: project <name> lists every entry tagged with a project name and appends any keyword-only matches as a "more like this" section; --strict drops the appendix, --broad switches to keyword hits anywhere in the entry text.
  • Slice export: export writes a filtered subset of entries to CSV or JSON, with date and tag filters, for downstream reporting or import into other tools.
  • Format-preserving writes — edits are surgical, line-level splices; your hand-formatting and paragraph breaks survive every write.
  • Schema validation: validate flags bad enum values, missing required fields, duplicate ids, dangling cross-references, and inventory problems. An entry that legitimately has no artifact can set docs_optional: true (update-field <id> docs_optional true) to suppress its NO DOCS warning so genuine gaps stand out.
  • Discoverability: env prints the resolved data-home paths and which LIBRARIAN_* variable set each (the "truth sources" for the active setup); schema enumerates every enum's allowed values, including dependent maps like category → subcategory, so valid choices are discoverable without reading schema.yaml.
  • Change ledger — every write is appended to an audit log; poll "what changed since I last looked?" with changes. search, filter, and list also accept --changed-since / --changed-until to intersect a query with the ledger — e.g. filter --tag grant --changed-since 2026-05-01 returns only the grant entries changed since you last pulled, which is handy whenever you need to export updates into an external system.
  • File inventory — track supporting artifacts (PDFs, posters, certificates) in a normalized registry, with sha256 de-duplication.
  • Contact rolodex — auto-built index of the people you collaborate with, derived from two-or-more-word Name (email) or Name <email> mentions in your descriptions and queryable by name or email fragment.
  • Safe cross-references: rename-id repoints every backticked and plain-text reference to an entry across the corpus, token-bounded so longer ids aren't matched as substrings. delete <id> --repoint-to <target> reuses the same rewriter, so deleting an entry can rewrite its inbound references to a successor in the same call instead of leaving dangling links behind.
  • Atomic consolidation: merge <source-id>... --into <target-id> folds tags, docs, and missing schema blocks from one or more source entries into a target, repoints every inbound reference, and deletes the sources, all in one locked write. --on-block-conflict=abort|keep-target|keep-source chooses how to resolve same-block collisions; --append-sources opts into a mechanical fold of each source's description under a ## From <id> header; --dry-run (the default without --confirm) prints the full plan plus source descriptions so the caller can fold the prose in by hand via update-description.
  • Cross-process write lock — every writer (create, delete, update-*, set-block, add-tags, remove-tags, add-docs, remove-docs, rename-id, merge, file-inventory writes) takes an exclusive advisory lock around its full read-plan-write transaction, so two concurrent librarian processes serialize cleanly instead of last-writer-wins.
  • Tag normalization: tag-audit flags case and separator variants of the same tag (e.g. Build-A-Bot vs build-a-bot) so they don't fragment your index over time.
  • Concurrency-safe: fcntl advisory locking guards every write.
  • MCP server — drive the whole tool from an AI assistant.
  • Duplicate detection — fuzzy similarity warns you before you create a near-duplicate entry.
  • Bundled agent template — a generic AI-agent definition in agents/.
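
The token-bounded matching described for rename-id can be illustrated with a short Python sketch. This is not librarian's actual rewriter: the function name repoint_references and the boundary rule (an id only counts as a reference when it is not flanked by other slug characters) are assumptions made for illustration.

```python
import re


def repoint_references(text: str, old_id: str, new_id: str) -> str:
    """Replace whole-token occurrences of old_id with new_id.

    Hypothetical helper: a match is rejected when it is directly preceded or
    followed by a slug character (lowercase letter, digit, or hyphen), so a
    longer id that merely contains old_id is left untouched.
    """
    pattern = re.compile(
        r"(?<![a-z0-9-])" + re.escape(old_id) + r"(?![a-z0-9-])"
    )
    # A callable replacement avoids any backslash interpretation in new_id.
    return pattern.sub(lambda _match: new_id, text)


description = (
    "Follow-up to `2026-03-launch`; see also 2026-03-launch-retro "
    "and 2026-03-launch."
)
print(repoint_references(description, "2026-03-launch", "2026-03-v2-launch"))
# Follow-up to `2026-03-v2-launch`; see also 2026-03-launch-retro and 2026-03-v2-launch.
```

The backticked and plain-text mentions are both rewritten, while the longer id 2026-03-launch-retro survives because the character after the match is a hyphen.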

Install

Requires Python 3.10+.

git clone <repository-url>
cd librarian
python3 -m venv .venv
source .venv/bin/activate
pip install -e .

This puts a librarian command on your PATH. (For development — tests and lint — use pip install -e ".[dev]" or run ./scripts/setup-dev.sh.)

Quickstart

# 1. Choose a schema. The tool runs schema-less by default; opt into one by
#    copying it into your data home (created on first use):
mkdir -p ~/.config/librarian
cp schemas/performance-review.yaml ~/.config/librarian/schema.yaml

# 2. Inspect the active schema:
librarian schema

# 3. Log an activity (writes need a --label for the audit ledger):
librarian create --label cli:setup --json '{
  "id": "2026-03-launch",
  "date": "2026-03-15",
  "title": "Led the v2 launch",
  "description": "Shipped the v2 release; coordinated three teams.",
  "tags": ["delivery", "leadership"],
  "docs": ["https://example.com/v2-notes"],
  "review": {"kind": "accomplishment", "competency": "leadership",
             "scope": "organization", "review_period": "2026-H1",
             "notes": "Cross-team delivery under deadline."}
}'

# 4. Search, filter, and summarize:
librarian search launch
librarian filter --block-field review.competency leadership
librarian stats
librarian rollup grant --sum amount --group-by status
librarian validate

Your data lives in $XDG_CONFIG_HOME/librarian/ (or ~/.config/librarian/): activities.yaml, files.yaml, schema.yaml, changes.log, and an artifacts/ folder. Every location is overridable — see Data location.

The schema system

An entry always has the core fields id, date, title, description, tags, docs (and an optional end_date). On top of that, a schema.yaml declares optional structured blocks — each with named fields, types, and enum value sets. The validator, stats, filter, and the field-update command all read the active schema; nothing about any particular block is hardcoded.

Field types: enum, text, string, int, bool, date, date? (nullable date). Enums can be dependent — the legal values of one field can depend on the value of a sibling field.
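
A dependent enum can be pictured as a lookup from the parent field's value to the set of values its sibling may take. The sketch below is a minimal illustration of that idea in Python, not the validator librarian ships: the in-memory layout and the specific category and subcategory values are assumptions.

```python
# Hypothetical in-memory form of a dependent enum: the legal subcategory
# values depend on the sibling category value. The values are invented for
# illustration and are not taken from a bundled schema.
DEPENDENT_VALUES = {
    "teaching": {"course", "mentoring"},
    "service": {"committee", "outreach"},
}


def check_block(block: dict) -> list[str]:
    """Return a list of problems for one block; an empty list means valid."""
    problems = []
    category = block.get("category")
    if category not in DEPENDENT_VALUES:
        problems.append(f"category: {category!r} is not an allowed value")
        return problems  # subcategory cannot be checked without a valid parent
    subcategory = block.get("subcategory")
    if subcategory not in DEPENDENT_VALUES[category]:
        problems.append(
            f"subcategory: {subcategory!r} is not allowed when category is {category!r}"
        )
    return problems


print(check_block({"category": "teaching", "subcategory": "course"}))     # []
print(check_block({"category": "teaching", "subcategory": "committee"}))  # one problem
```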

With no schema.yaml, librarian runs in generic mode: full CRUD, search, and file-inventory still work; blocks are just not validated.

The five bundled schemas

Each maps to a concrete reporting moment. Pick the one that fits, or adapt one.

| Schema | For | The reporting moment it serves |
| --- | --- | --- |
| performance-review.yaml | any employee | the annual performance review or promotion packet — the "brag document" |
| ptr.yaml | academic faculty | post-tenure review — teaching, scholarly, and service classification |
| cpe.yaml | certified professionals | continuing-education credit reporting — CISSP CPEs, PMP PDUs, nursing CEUs, etc. |
| student-portfolio.yaml | students | grad-school / scholarship / job applications — coursework, projects, research, awards |
| grant.yaml | grant holders | funding-portfolio reporting — award amount, role, status, sponsor; sum with rollup |

To track more than one at once, merge the blocks from several schema files into one schema.yaml under a single blocks: mapping.
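
If you would rather script the merge than paste blocks by hand, the logic is small. The following Python sketch assumes each schema file has already been parsed into a dictionary (for example with a YAML library of your choice) and that blocks live under a top-level blocks key as described above. The helper name merge_blocks and the inner layout of each block definition are placeholders, not part of librarian.

```python
def merge_blocks(*schemas: dict) -> dict:
    """Combine the blocks mappings of several schema documents into one.

    Raises ValueError if two schemas define a block with the same name, since
    silently overwriting one definition would change validation behaviour.
    """
    merged: dict = {}
    for schema in schemas:
        for name, definition in schema.get("blocks", {}).items():
            if name in merged:
                raise ValueError(f"block {name!r} is defined in more than one schema")
            merged[name] = definition
    return {"blocks": merged}


# Stand-ins for two parsed schema files; the field details are placeholders.
review_schema = {"blocks": {"review": {"fields": {"kind": "enum"}}}}
grant_schema = {"blocks": {"grant": {"fields": {"amount": "int"}}}}

combined = merge_blocks(review_schema, grant_schema)
print(sorted(combined["blocks"]))  # ['grant', 'review']
```

Failing loudly on a name collision is a deliberate choice in this sketch: two schemas that both define a block with the same name need a human decision about which definition wins.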

Use case: a tailored resume as a job aid

Here is the payoff of keeping a structured record. Because librarian holds a tagged, searchable database of your real work, you — or an AI assistant via the MCP server — can rapidly assemble a resume tailored to a specific job posting: pull only the entries relevant to that role, and frame each example toward what the solicitation actually asks for.

A targeted resume drawn from your real record beats a one-size-fits-all resume padded with experience that does not apply to the role. Instead of "here is everything I have ever done", you get "here is precisely the experience this job calls for, with concrete examples and dates" — assembled in minutes, because the underlying record already exists and is queryable.

# Find the experience relevant to a posting's keywords:
librarian search "incident response"
librarian filter --tag leadership --after 2024-01-01
librarian export --format json --tag cloud-security

Hand that to an assistant connected over MCP and ask it to draft a resume section for the specific role — it works from evidence, not from a blank page.

The file inventory

Supporting artifacts (PDFs, posters, slide decks, certificates) are tracked in a separate normalized registry, files.yaml. Register a file with file-add; it gets an id, a category, a sha256 digest, and an added date. Entries then reference a file by putting file:<id> in their docs list — not a raw path — so moving or renaming the file is a single file-move edit and no entry reference changes. file-add warns (without blocking) on exact-content duplicates and fuzzily-similar titles. A file need not belong to any entry: a standalone, categorized artifact is a valid record on its own.

librarian file-add ~/docs/cert.pdf --category Certifications \
  --title "CISSP Certificate" --label cli:setup
librarian add-docs 2026-03-launch file:cert --label cli:setup
librarian file-list --orphans     # inventory coverage report

The contact rolodex

A side-effect of writing descriptions is a queryable rolodex of the people you collaborate with. Every time a description mentions someone with a two-or-more-word name like First Last (email@domain) or First Last <email@domain>, the librarian indexes the pair and remembers which entry mentioned them (single-token mentions are skipped as too noisy to be reliable names). Ask contact <query> to look someone up by name or email fragment; the result lists the entries where they appear (up to three by default — see the note below the example for the full list), so you can pivot from "who is Jane?" to "everything I've worked on with Jane" without running two searches. No manual rolodex curation — it's derived purely from the descriptions you already write.

librarian contact garcia
# Found 1 contact(s):
#
#   Dr. Maria Garcia
#     mgarcia@example.edu
#     Sources: 2024-grant-application, 2025-conference-poster, 2025-coauthored-paper

The default text output caps the Sources: list at three entries per contact. Pass --format json for the full, untruncated list (useful when a recurring collaborator appears in many entries). Browse the whole rolodex with librarian contact --all, or filter by email substring with librarian contact --institution example.edu to surface everyone at a single institution. Despite its name, --institution matches anywhere in the email address, not only the domain: --institution garcia would also hit local parts such as garcia@anywhere.com.
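
The mention formats the rolodex recognizes can be approximated with a regular expression. The Python sketch below is a rough illustration, not the pattern librarian uses: it assumes each word of a name starts with a capital letter, and the names CONTACT_RE and extract_contacts are hypothetical.

```python
import re

CONTACT_RE = re.compile(
    r"\b(?P<name>(?:[A-Z][\w.'-]*\s+)+[A-Z][\w.'-]*)"  # two or more capitalised words
    r"\s*(?:\((?P<paren>[^\s()<>]+@[^\s()<>]+)\)"       # Name (email)
    r"|<(?P<angle>[^\s()<>]+@[^\s()<>]+)>)"             # Name <email>
)


def extract_contacts(description: str) -> list[tuple[str, str]]:
    """Return (name, email) pairs for every multi-word mention found."""
    contacts = []
    for match in CONTACT_RE.finditer(description):
        name = " ".join(match.group("name").split())  # collapse internal whitespace
        email = match.group("paren") or match.group("angle")
        contacts.append((name, email.lower()))
    return contacts


text = (
    "Co-wrote the proposal with Dr. Maria Garcia (mgarcia@example.edu) and "
    "Sam Lee <sam@example.org>; thanks to Priya (priya@example.com)."
)
print(extract_contacts(text))
# [('Dr. Maria Garcia', 'mgarcia@example.edu'), ('Sam Lee', 'sam@example.org')]
```

The single-token mention of Priya is skipped, mirroring the rule that one-word names are too noisy to index.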

The MCP server

librarian ships an MCP server so an AI assistant can operate the database directly — log activities you describe in chat, answer questions about your record, and assemble reports. Run it with:

python3 -m librarian.mcp_server

The server self-bootstraps an in-repo .venv on first run and exposes the read tools (no label needed) and write tools (a session_label is required for audit attribution). Point your MCP-capable client at that command to register it. If you set LIBRARIAN_MEMORY_DIR, the directory's Markdown files are also exposed as MCP resources (opt-in; there is no default).

Registering with Claude Code (~/.claude.json)

The canonical way to launch the server is as a Python module (python -m librarian.mcp_server); launching it by file path also works (a __package__ shim in the script handles it), but the module form is preferred. For Claude Code, add this entry under mcpServers.librarian in ~/.claude.json, substituting your own paths:

{
  "type": "stdio",
  "command": "/absolute/path/to/librarian/.venv/bin/python",
  "args": ["-m", "librarian.mcp_server"],
  "env": {
    "LIBRARIAN_YAML_PATH":   "/absolute/path/to/activities.yaml",
    "LIBRARIAN_FILES_PATH":  "/absolute/path/to/files.yaml",
    "LIBRARIAN_LEDGER_PATH": "/absolute/path/to/changes.log",
    "LIBRARIAN_ROOT":        "/absolute/path/to/data-root",
    "LIBRARIAN_SCHEMA_PATH": "/absolute/path/to/schema.yaml"
  }
}

Every LIBRARIAN_* env var is optional; omit it to fall back to the XDG default (~/.config/librarian/...). The env block is the right place to point the OSS server at an existing data home — e.g. a YAML you already track in a private directory — without copying the file. LIBRARIAN_ROOT is the base against which inventory file paths (the path: field on each files.yaml record) are resolved.

Restart Claude Code after editing ~/.claude.json so the MCP server is re-spawned with the new config.

The bundled agent template

agents/librarian.md is a generic, reusable AI-agent definition template — drop it into your assistant's agent configuration to get sensible behavior on top of the raw tools: search-before-create, schema-aware classification, consistent tagging, audit-labeled writes, and accurate attribution.

For Claude Code, copy it to ~/.claude/agents/librarian.md (user-scope, so it is available to every Claude Code session on your machine):

cp agents/librarian.md ~/.claude/agents/librarian.md

⚠️ Customize before relying on it. The template ships intentionally generic — it has no project-specific judgment until you add it. After copying it, edit your local copy to fill in:

  • Your active schema — which schema you selected (e.g. ptr, cpe, performance-review, student-portfolio) and what its blocks mean for your reporting context.
  • Your tagging conventions — the projects, people, and topics you tag for, and their canonical forms (lowercase-hyphenated, TitleCase-Hyphenated, etc.).
  • Project-specific notes — anything an agent should know about your record-keeping situation (preferred labels, what is in vs. out of scope for you, recurring collaborators).

Generic-as-shipped, the agent is a competent operator of the tool but not a curator of your record. Customization takes around 15 minutes and is the difference between an agent that logs entries blindly and one that behaves like a thoughtful collaborator.

Data location

The default data home is $XDG_CONFIG_HOME/librarian/, falling back to ~/.config/librarian/. Every location is overridable by an environment variable:

| Variable | Overrides |
| --- | --- |
| LIBRARIAN_HOME | the whole data home directory |
| LIBRARIAN_YAML_PATH | the activities.yaml path |
| LIBRARIAN_FILES_PATH | the files.yaml inventory path |
| LIBRARIAN_LEDGER_PATH | the change-ledger path |
| LIBRARIAN_SCHEMA_PATH | the schema.yaml path |
| LIBRARIAN_ROOT | the root that inventory file paths resolve against |
| LIBRARIAN_MEMORY_DIR | the optional MCP memory-resource directory |

Per-resource variables win over LIBRARIAN_HOME, which wins over the XDG default.
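
That precedence order can be written out as a few lines of Python. This is a sketch of the documented rule rather than librarian's resolver: the function name resolve_path is hypothetical, and treating an empty variable as unset is an assumption.

```python
import os
from pathlib import Path


def resolve_path(resource_var: str, filename: str, env=os.environ) -> Path:
    """Resolve one data file using the documented precedence order."""
    # 1. A per-resource variable such as LIBRARIAN_YAML_PATH wins outright.
    if env.get(resource_var):
        return Path(env[resource_var])
    # 2. Otherwise LIBRARIAN_HOME replaces the whole data home directory.
    if env.get("LIBRARIAN_HOME"):
        return Path(env["LIBRARIAN_HOME"]) / filename
    # 3. Otherwise use $XDG_CONFIG_HOME/librarian, then ~/.config/librarian.
    config_home = env.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    return Path(config_home) / "librarian" / filename


print(resolve_path("LIBRARIAN_YAML_PATH", "activities.yaml"))
```

To see what your real setup resolves to, the env command remains the authoritative source.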

Command reference

Read: search, get, filter, list, stats, rollup, tags, tag-audit, validate, export, project, similar, contact, changes, schema, env

Write: create, update-field, update-description, update-notes, update-nested-field, set-block, add-tags, remove-tags, add-docs, remove-docs, delete, rename-id, merge

File inventory: file-add, file-list, file-get, file-move, file-update, file-rehash, file-search

Run librarian <command> --help for command-specific options.

Entry ids must be slugs — lowercase letters, digits and hyphens, at least two characters (e.g. 2026-03-launch). create and rename-id reject other shapes so ids stay unambiguous in cross-references and round-trip safely through the space-delimited change ledger.
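
If you generate ids in a script before calling create, a quick pre-check saves a round trip. The Python sketch below encodes one reading of the slug rule. The text does not say whether a leading or trailing hyphen is accepted, so this version does not forbid it, and the name is_valid_id is hypothetical.

```python
import re

# Assumed reading of the rule: only lowercase letters, digits, and hyphens,
# with a minimum length of two characters.
SLUG_RE = re.compile(r"[a-z0-9-]{2,}")


def is_valid_id(candidate: str) -> bool:
    """Return True when the whole candidate matches the assumed slug shape."""
    return SLUG_RE.fullmatch(candidate) is not None


for candidate in ["2026-03-launch", "Q2 Review", "a", "v2_launch"]:
    print(candidate, is_valid_id(candidate))
# 2026-03-launch True
# Q2 Review False
# a False
# v2_launch False
```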

Contributing

See CONTRIBUTING.md for the development setup, the local checks (ruff check, ruff format --check, pytest), and the Claude-driven review workflows that run automatically on pull requests. Security issues: see SECURITY.md — please do not open a public issue for them.

License

MIT © librarian contributors
