GitHub - Vrin-cloud/engram: Knowledge-graph RAG library for multi-hop QA — iterative retrieve-and-reason (IRCoT), graph retrieval, and cross-document enrichment at ingest. Production retrieval pipeline in Python.

Engram
A KG-augmented RAG library with iterative retrieve-and-reason for multi-hop questions.

Status: pre-alpha. Not on PyPI yet; install from source (see Install). The API will change before v0.1.0. The full evolution is in CHANGELOG.md.

TL;DR

Engram is a Python library for production RAG with three opt-in capabilities most other libraries don't ship:

IRCoT (iterative retrieve-and-reason). Round 1 retrieves; the reader emits a chain-of-thought (CoT) thought; round 2 retrieves again, using that thought to augment the query. +0.09 F1 over single-pass retrieval on MuSiQue with a gpt-4o-mini reader.
A knowledge-graph layer. Entity + fact extraction at ingest; two-stage Personalized PageRank + multi-hop beam search + triple-vector ANN match + RRF fusion at query time. Opt-in via --kg-retrieval. Adds graph capabilities; F1 lift over baseline+IRCoT is neutral on MuSiQue (use it for the capabilities, not for the benchmark).
A strategic router (adaptive per-query orchestration). One token-minimal LLM call per query decides which capabilities to enable — IRCoT, KG traversal, decomposition, MQE, retrieval planner — based on the question's structure. Statistical parity with the benchmark-tuned static config at ~40% lower median latency. Opt-in via --adaptive. See Adaptive strategic router.
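The two-round IRCoT flow from the first item can be sketched as follows. This is a minimal illustration under stated assumptions, not Engram's implementation: `retrieve` and `reason` are hypothetical callables standing in for the hybrid retriever and the LLM reader, and the toy corpus and word-overlap scorer exist only to make the example runnable.

```python
from typing import Callable, List

Retriever = Callable[[str, int], List[str]]
Reasoner = Callable[[str, List[str]], str]


def ircot_two_round(question: str, retrieve: Retriever, reason: Reasoner, k: int = 2) -> List[str]:
    """Round 1 retrieves on the question; round 2 retrieves on question + thought."""
    round1 = retrieve(question, k)
    thought = reason(question, round1)  # the reader's intermediate CoT thought
    round2 = retrieve(f"{question} {thought}", k)
    # Merge both rounds, preserving order and dropping repeats.
    merged: List[str] = []
    for passage in round1 + round2:
        if passage not in merged:
            merged.append(passage)
    return merged


# Toy stand-ins (assumptions for illustration only).
CORPUS = [
    "Inception was directed by Christopher Nolan.",
    "Christopher Nolan was born in London.",
    "Mount Fuji stands near Tokyo.",
]


def toy_retrieve(query: str, k: int) -> List[str]:
    """Rank passages by word overlap with the query; keep only passages that match."""
    words = set(query.lower().replace("?", "").split())
    scored = [(len(words & set(p.lower().rstrip(".").split())), p) for p in CORPUS]
    scored = [(s, p) for s, p in scored if s > 0]
    scored.sort(key=lambda sp: -sp[0])
    return [p for _, p in scored[:k]]


def toy_reason(question: str, passages: List[str]) -> str:
    """Pretend reader: surfaces the bridge entity found in round 1."""
    return "Christopher Nolan" if any("Nolan" in p for p in passages) else ""


QUESTION = "What city is the birthplace of the director of Inception?"
single_pass = toy_retrieve(QUESTION, 2)
two_round = ircot_two_round(QUESTION, toy_retrieve, toy_reason, k=2)
```

In this toy setup the single pass only reaches the first passage, because the question never names the bridge entity; the second round finds the birthplace passage once the thought adds "Christopher Nolan" to the query.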

Production-quality components — BM25 + dense + RRF, Cohere Rerank 3.5, Jaccard dedup — are wired into the default pipeline.
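Reciprocal rank fusion (RRF), which combines the BM25 and dense rankings mentioned above, can be sketched in a few lines. This is the textbook formulation, not Engram's code: the smoothing constant `k=60` is the commonly used default and is an assumption here, as are the chunk ids.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Score each id by the sum of 1 / (k + rank) over every ranking it appears in."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_ranking = ["c3", "c1", "c7"]   # hypothetical chunk ids
dense_ranking = ["c1", "c9", "c3"]
fused = rrf_fuse([bm25_ranking, dense_ranking])
```

Chunks that both retrievers rank highly (`c1`, `c3`) rise above chunks that only one retriever returned, without needing the two score scales to be comparable.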

Measured performance

Reference on MuSiQue dev (benchmarks/fixtures/musique_n200_seed1_ids.json, gpt-4o-mini reader, text-embedding-3-small embedder, Cohere Rerank 3.5 via AWS Bedrock):

Config	F1	EM	Notes
Plain hybrid (no rerank)	~0.40	—	Field reference floor
+ Cohere Rerank ("fast mode")	0.46	0.32	Engram's no-IRCoT default
+ IRCoT (production v1)	0.54	0.40	Headline number
+ KG-hybrid retrieval	0.51-0.53	0.36-0.39	Adds graph capabilities; no F1 lift over IRCoT
+ Graph-aware retrieval planner (opt-in)	within noise	within noise	One LLM call up-front emits a typed plan that biases beam search + rerank — see below
Adaptive strategic router (`--adaptive`)	0.52	0.38	Within run variance of production v1 at ~40% lower median latency (p50 8.9s vs 14.5s); skips KG traversal on 48% of queries and the second reader round on 24%
Field SOTA at this reader (G-reasoner)	0.525	0.385	Trained 8M-param GNN

Engram baseline + IRCoT lands at field SOTA for gpt-4o-mini on n=200. Sample variance is ±0.02-0.03 F1 across reruns. Reproduce with benchmarks/musique.py; methodology in docs/benchmarks.md.
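For readers unfamiliar with the F1 and EM columns, the sketch below shows the usual SQuAD-style answer scoring: normalize both strings, then compare exactly (EM) or by token overlap (F1). Whether benchmarks/musique.py normalizes answers in exactly this way is an assumption; docs/benchmarks.md remains the reference for the methodology.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


em = exact_match("The Eiffel Tower.", "eiffel tower")
f1 = token_f1("Christopher Nolan", "Nolan")
```

A partially correct answer such as "Christopher Nolan" against gold "Nolan" scores 0 on EM but 2/3 on F1, which is why the two columns diverge in the table.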

Where Engram fits in your RAG pipeline

Document → Chunker → [optional: Engram.aenrich] → Embedder → Vector DB
                              ↑
                              builds KG (entity graph, bi-temporal supersession)
                              via cold-path LLM extraction
                                                                        │
                                                                        ▼
                                                       Retriever → [Engram retrieval modes] → Reader
                                                                        ↑
                                                                        hybrid + Cohere rerank
                                                                        + IRCoT + optional KG fusion

Engram is two things at once:

An ingestion enricher (Engram.aenrich, opt-in via build_graph=True) that builds a knowledge graph alongside your chunks.
A retrieval lift layer (hybrid + Cohere rerank + IRCoT + optional KG fusion at query time). Today this lives in benchmarks/; promoted to a stable library API in v0.1.0.

Install

Not on PyPI yet. Package name reserved for v0.1.0.

git clone https://github.com/Vrin-cloud/engram
cd engram
uv venv && uv pip install -e ".[memory,llm,benchmarks,observability]"

Extra	Brings in
`memory`	lmdb, hnswlib, numpy, networkx, scipy — the default `MemoryBackend`
`llm`	litellm, instructor, tenacity — the `LLMProvider` stack
`benchmarks`	datasets, rank-bm25 — to run the MuSiQue benchmark
`observability`	opentelemetry — tracing spans for ingest + query
`all`	every extra

API keys

Var	Used by
`OPENAI_API_KEY`	text-embedding-3-small (embedder), gpt-4o-mini (reader) defaults
AWS credentials (default chain)	Cohere Rerank 3.5 on Bedrock — region defaults to us-east-1
`ANTHROPIC_API_KEY`	only if you pass `--reader-model anthropic/claude-haiku-4-5`

Quickstart

Run the headline benchmark

python -m benchmarks.musique \
  --question-ids-file benchmarks/fixtures/musique_n200_seed1_ids.json \
  --mode baseline \
  --rerank \
  --ircot \
  --output predictions.jsonl

Expected: F1 ~0.54, EM ~0.40, ~5 min wall time, ~$1.50 in API costs. This is the production v1 default (hybrid + Cohere rerank + Jaccard dedup + IRCoT) on the canonical 200-question MuSiQue fixture.

Add KG mode for graph capabilities

python -m benchmarks.musique \
  --question-ids-file benchmarks/fixtures/musique_n200_seed1_ids.json \
  --mode enriched \
  --build-graph --kg-retrieval --rerank --ircot \
  --disable-derivation --disable-bridging

--build-graph runs Engram's cold-path ingest (entity extraction → canonical resolution with alias persistence → fact extraction with literal-value sink → fact-triple embedding into a second hnswlib → bi-temporal supersession). --kg-retrieval fuses hybrid retrieval with triple-vector ANN + hub-weighted two-stage PPR + multi-hop confidence-decayed beam search via RRF.
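To make the Personalized PageRank (PPR) step more concrete, here is a minimal single-stage power-iteration sketch over a toy entity graph. It is not Engram's hub-weighted two-stage variant: the graph, the seed weights, and the damping factor `alpha=0.85` are all assumptions chosen for illustration.

```python
from typing import Dict, List


def personalized_pagerank(
    graph: Dict[str, List[str]],
    seeds: Dict[str, float],
    alpha: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """Power-iteration PPR. `seeds` is the restart distribution: its keys must be
    nodes of `graph` and its weights must sum to 1."""
    rank = {node: seeds.get(node, 0.0) for node in graph}
    for _ in range(iterations):
        # With probability (1 - alpha) the walk restarts at a seed node.
        nxt = {node: (1.0 - alpha) * seeds.get(node, 0.0) for node in graph}
        for node, neighbors in graph.items():
            if not neighbors:
                # Dangling node: hand its mass back to the seeds.
                for seed, weight in seeds.items():
                    nxt[seed] += alpha * rank[node] * weight
                continue
            share = alpha * rank[node] / len(neighbors)
            for neighbor in neighbors:
                nxt[neighbor] += share
        rank = nxt
    return rank


# Toy undirected entity graph, stored as adjacency lists.
GRAPH = {
    "inception": ["christopher nolan"],
    "christopher nolan": ["inception", "london"],
    "london": ["christopher nolan"],
    "paris": ["france"],
    "france": ["paris"],
}
scores = personalized_pagerank(GRAPH, seeds={"inception": 1.0})
```

Seeding on the entity found in the question concentrates score on its neighborhood: the two-hop node `london` receives mass through the bridge entity, while the unrelated `paris` / `france` component stays at zero.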

Add --retrieval-planner to layer the graph-aware planner on top: one extra LLM call per query produces a structured RetrievalPlan (expected answer type, priority predicates, optional hop sequence) that biases beam search, post-fusion fact filtering, and the Cohere Rerank query. Default off; opt in when you want explainable retrieval traces. --trace-retrieval-plan PATH dumps every plan to JSONL for inspection.
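The idea of a plan that biases retrieval without replacing it can be illustrated with a small sketch. Everything here is hypothetical: the class name `RetrievalPlanSketch`, its field names, the edge tuple layout, and the multiplicative `boost` are assumptions, and the real schema lives in the repository.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class RetrievalPlanSketch:
    """Hypothetical stand-in for a typed retrieval plan."""
    expected_answer_type: str
    priority_predicates: List[str] = field(default_factory=list)
    hop_sequence: Optional[List[str]] = None


Edge = Tuple[str, str, str, float]  # (subject, predicate, object, base_weight)


def bias_edges(edges: List[Edge], plan: RetrievalPlanSketch, boost: float = 2.0) -> List[Edge]:
    """Up-weight edges whose predicate the plan prioritizes; keep every other edge."""
    biased = [
        (s, p, o, w * boost if p in plan.priority_predicates else w)
        for s, p, o, w in edges
    ]
    return sorted(biased, key=lambda edge: -edge[3])


plan = RetrievalPlanSketch(
    expected_answer_type="location",
    priority_predicates=["born_in"],
    hop_sequence=["directed_by", "born_in"],
)
edges = [
    ("Inception", "directed_by", "Christopher Nolan", 0.9),
    ("Christopher Nolan", "born_in", "London", 0.6),
    ("Christopher Nolan", "won", "Academy Award", 0.8),
]
ranked = bias_edges(edges, plan)
```

No edge is dropped; the plan only reorders candidates, so a wrong plan degrades ranking rather than hiding evidence.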

Cold-path ingest costs roughly $0.38-0.46 per 1K chunks and takes ~8.5-12 min per 1K chunks, depending on whether synthesis is enabled (see Cost and latency profile). Use it when you need graph queries, contradiction surfacing, or bi-temporal awareness, not for F1 lift on standard QA.
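As a rough picture of what bi-temporal supersession and Noisy-OR confidence fusion mean, consider the sketch below. It is an illustration under assumptions, not Engram's data model: `FactSketch` and its fields are hypothetical (only `valid_from`, `valid_to`, and `recorded_at` are names taken from the docs index), and it assumes a predicate holds one value at a time.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class FactSketch:
    subject: str
    predicate: str
    obj: str
    valid_from: date              # when the fact became true in the world
    recorded_at: date             # when the system learned about it
    valid_to: Optional[date] = None   # None means "still believed current"


def supersede(facts: List[FactSketch], new: FactSketch) -> List[FactSketch]:
    """Close open facts in the same (subject, predicate) slot that the new fact contradicts."""
    updated = []
    for fact in facts:
        same_slot = (fact.subject, fact.predicate) == (new.subject, new.predicate)
        if same_slot and fact.valid_to is None and fact.obj != new.obj:
            fact = replace(fact, valid_to=new.valid_from)  # history is kept, not deleted
        updated.append(fact)
    return updated + [new]


def noisy_or(confidences: List[float]) -> float:
    """Combined confidence of independent observations supporting the same fact."""
    remaining_doubt = 1.0
    for confidence in confidences:
        remaining_doubt *= 1.0 - confidence
    return 1.0 - remaining_doubt


old = FactSketch("Acme", "ceo", "Alice", date(2020, 1, 1), date(2020, 1, 5))
new = FactSketch("Acme", "ceo", "Bob", date(2023, 6, 1), date(2023, 6, 2))
timeline = supersede([old], new)
fused = noisy_or([0.6, 0.5])
```

The superseded fact stays queryable with a closed validity interval, and two moderately confident mentions of the same fact fuse to a higher confidence (0.8 here) than either alone.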

Adaptive strategic router

python -m benchmarks.musique \
  --question-ids-file benchmarks/fixtures/musique_n200_seed1_ids.json \
  --mode enriched \
  --build-graph --rerank --adaptive

--adaptive puts capability selection under a per-query strategic router: one token-minimal LLM call (output is a bare list of capability tags, single-digit tokens) that reads the question's structure and decides what this specific query needs. The router picks from {ircot, kg_retrieval, decomposition, mqe, retrieval_planner}; hybrid retrieval, Cohere Rerank, and entity extraction are always on underneath.

The routing rests on one core distinction the prompt teaches explicitly:

Depth (sequential chains — "the X of the Y", bridge entities that must be discovered before the next hop) → ircot. Decomposition measurably hurts depth chains (parallel sub-questions can't phrase later hops before earlier answers exist).
Breadth (open-ended, multi-aspect — evaluations, memos, bear/bull cases) → decomposition, usually with kg_retrieval + ircot.

One hard safety rule: questions that don't explicitly name their subject ("this round", "the company") must enable kg_retrieval so the graph anchors the query on the corpus's central entities instead of guessing — this eliminates a measured subject-hallucination failure mode.
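The sketch below shows one way a bare tag list from the router could be turned into a capability set, with the safety rule applied as a post-processing guard. It is purely illustrative: the comma-separated output format, the `ALWAYS_ON` labels, and the `subject_named` flag are assumptions, and in Engram the rule is described as part of the router prompt rather than as code like this.

```python
from typing import FrozenSet, Set

# Capability tags the router may pick from (names taken from the text above).
ROUTABLE = frozenset({"ircot", "kg_retrieval", "decomposition", "mqe", "retrieval_planner"})
# Hypothetical labels for the components that are always on underneath.
ALWAYS_ON = frozenset({"hybrid_retrieval", "rerank", "entity_extraction"})


def parse_router_output(raw: str) -> Set[str]:
    """Parse a comma- or whitespace-separated tag list, ignoring unknown tags."""
    tokens = raw.replace(",", " ").lower().split()
    return {token for token in tokens if token in ROUTABLE}


def resolve_plan(raw: str, subject_named: bool) -> FrozenSet[str]:
    """Combine the router's choice with the always-on stack and the safety rule."""
    tags = parse_router_output(raw)
    if not subject_named:
        # Hard safety rule: anchor subject-less questions on the graph.
        tags.add("kg_retrieval")
    return frozenset(tags) | ALWAYS_ON


depth_plan = resolve_plan("ircot", subject_named=True)
vague_plan = resolve_plan("decomposition, ircot", subject_named=False)
```

Filtering against a fixed tag set keeps a malformed router response from enabling anything unexpected.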

Measured on the n=200 MuSiQue fixture: EM 0.38 / F1 0.52 (within run variance of the static production v1 config) at p50 latency 8.9s vs 14.5s. The router skips graph traversal on 48% of queries and the second reader round on 24%.

On an open-ended evaluation fixture (benchmarks/fixtures/pebble_corpus.jsonl), the same router with no per-domain tuning shifts its plan profile to kg 100% / decomposition 50% and outscores both the static config and plain hybrid.

Routing triggers must reference observable question structure. An earlier prompt that asked the router to predict retrieval insufficiency (unobservable from question text) fired IRCoT on only 21% of multi-hop questions and cost 5 EM points.

Compare configurations on your own corpus with benchmarks/custom_kb.py: it runs standard (hybrid + rerank), vrin (static full stack), and vrin_adaptive (router) side by side over a JSONL corpus + question set and reports per-question deltas plus each routing decision.

Use Engram as a Python library

Today's stable surface is the ingest API:

import asyncio

from engram import Engram
from engram.backends.memory import MemoryBackend
from engram.llm.embedders import LiteLLMEmbedder
from engram.llm.litellm_provider import LiteLLMProvider
from engram.llm.rate_control import AdaptiveConcurrency, TokenBucket

# Embedder and the default MemoryBackend (LMDB + hnswlib + in-memory networkx graph)
embedder = LiteLLMEmbedder(model="openai/text-embedding-3-small")
backend = MemoryBackend(embedder=embedder, path="./engram-data")

# LLM provider with rate limiting and adaptive concurrency
llm = LiteLLMProvider(
    bucket=TokenBucket(rate=20.0, burst=25),
    adaptive=AdaptiveConcurrency(initial_limit=4, max_limit=12),
    default_model="openai/gpt-4o-mini",
)

# Orchestrator: build the knowledge graph, skip the optional passes
engram = Engram(
    corpus_backend=backend,
    llm=llm,
    build_graph=True,
    enable_synthesis=False,
    enable_derivation=False,
    enable_bridging=False,
    model="openai/gpt-4o-mini",
)


async def enrich(your_chunks):
    # aenrich is a coroutine, so it has to be awaited inside an event loop
    enriched = await engram.aenrich(your_chunks)
    # Each EnrichedChunk has: id, text, source_id, enrichment_summary, metadata
    # The backend's fact_graph (networkx.MultiDiGraph) holds entity ↔ fact structure
    # Stored facts queryable via backend.find_facts(...) and backend.neighbors_facts(...)
    return enriched


# your_chunks: the output of your own chunker
enriched = asyncio.run(enrich(your_chunks))

The query-time lift layer (hybrid_neighbors, kg_hybrid_neighbors, IRCoT 2-round flow) currently lives in benchmarks/runner.py; it is scheduled to move to engram.retrieve / engram.iterative_query in v0.1.0.
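The `TokenBucket(rate=20.0, burst=25)` argument in the snippet above follows the general token-bucket pattern, sketched below for readers who have not met it. This is a generic illustration, not Engram's class: `TokenBucketSketch`, the `try_acquire` method, and the reading of `rate` as tokens per second and `burst` as bucket capacity are assumptions.

```python
import time
from typing import Callable


class TokenBucketSketch:
    """Generic token bucket: refills at `rate` tokens/second, holds at most `burst`."""

    def __init__(self, rate: float, burst: int, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = float(burst)
        self.tokens = float(burst)   # start full, so an initial burst is allowed
        self._clock = clock
        self._last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Take `cost` tokens if available; return False instead of blocking."""
        now = self._clock()
        self.tokens = min(self.capacity, self.tokens + (now - self._last) * self.rate)
        self._last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Drive the bucket with a fake clock so the behavior is deterministic.
fake_now = [0.0]
bucket = TokenBucketSketch(rate=20.0, burst=25, clock=lambda: fake_now[0])
granted_at_start = sum(bucket.try_acquire() for _ in range(30))
fake_now[0] = 0.5   # half a second later: 0.5 s * 20 tokens/s = 10 new tokens
granted_after_refill = sum(bucket.try_acquire() for _ in range(30))
```

Under these assumptions the bucket admits a burst of 25 calls immediately and then sustains about 20 calls per second, which is the shape of limit most LLM providers enforce.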

Deep documentation

Doc	What's in it
docs/architecture.md	High-level architecture, data model, storage layout, ingest + query lifecycles
docs/benchmarks.md	MuSiQue methodology, fixtures, replay against existing indices, cost / latency profiles
docs/configuration.md	Every CLI flag and constructor parameter, with defaults and effects
docs/kg-internals.md	LMDB sub-db layout, fact graph schema, PPR / beam search internals
docs/llm-provider.md	LiteLLM routing, Instructor strict-mode, prompt caching, rate control
docs/concepts/ircot.md	IRCoT pattern explainer + why it works at our scale
docs/concepts/kg-retrieval.md	Triple match, two-stage PPR (PropRAG), beam search, RRF fusion
docs/concepts/synthesis-and-extraction.md	Hot path vs cold path, entity / fact extraction, pronoun resolution
docs/concepts/bi-temporal.md	Fact supersession, valid_from / valid_to / recorded_at semantics, Noisy-OR fusion

CLI flags reference (quick)

python -m benchmarks.musique --help is the source of truth. Common flags:

Flag	Default	Effect
`--mode {baseline,enriched,both}`	`both`	`baseline` skips cold path; `enriched` runs `Engram.aenrich` first
`--rerank` / `--no-rerank`	on	Cohere Rerank 3.5 via Bedrock
`--ircot`	off	2-round retrieve-then-reason. +0.09 F1.
`--build-graph`	off	Cold-path KG ingest
`--kg-retrieval`	off	Triple match + 2-stage PPR + beam at query time
`--disable-synthesis`	off	Skip per-chunk synthesis (saves ~$0.30/1K + ~4 min/1K)
`--disable-derivation`	off	Skip cold-path derivation pass
`--disable-bridging`	off	Skip cold-path bridging pass
`--adaptive`	off	Strategic router decides capabilities per query. Requires `--build-graph`
`--question-ids-file PATH`	—	Pin to a question ID set for reproducibility
`--data-dir PATH`	`~/.engram-bench/...`	Where LMDB indexes live

Full reference: docs/configuration.md.

Architecture decisions backed by ablation

These reflect measurements, not theory. Full ablation log in docs/benchmarks.md.

Jaccard dedup post-rerank is always on. Cheap quality win.
IRCoT is opt-in but recommended as the default lift layer (+0.09 F1).
Synthesis contributes ~+0.04 F1 only when KG retrieval is also on. Otherwise it's pure latency cost.
Derivation and bridging were deferred in the v0 KG-hybrid plan; benchmark defaults are --disable-derivation --disable-bridging.
MQE, decomposition, sufficiency-judge, CRAG-style filter all regressed when stacked on top of baseline + IRCoT. Not in the production config. The adaptive router resolves the decomposition finding: it's a depth-vs-breadth mismatch — decomposition hurts sequential factoid chains (where these ablations ran) but helps open-ended multi-aspect questions; the router applies it only to the latter.
Graph-aware retrieval planner is opt-in via --retrieval-planner (default OFF). One LLM call per query reads a compressed view of the relevant fact-graph slice and emits a typed RetrievalPlan (expected answer type, priority predicates, optional hop sequence). The plan biases — does not replace — beam search edge weighting, post-fusion fact filtering, and the Cohere Rerank query. Metric impact is within run-to-run variance at n=100; ship it for the capability (explainable retrieval, structured plan traces) rather than the EM/F1 number. Plumbing in src/engram/core/graph_view.py, src/engram/dialogue/retrieval_planner.py, prompt in src/engram/dialogue/prompts/retrieval_plan.py.
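The Jaccard dedup from the first item in this list can be sketched as a greedy pass over the reranked chunks. The 0.70 threshold comes from the codebase map entry for `deduplicate_chunks`; the whitespace tokenization, the function names, and the choice to drop at similarity greater than or equal to the threshold are assumptions made for illustration.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase word sets of two texts."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def dedup_ranked(chunks: List[str], threshold: float = 0.70) -> List[str]:
    """Keep chunks in rank order; drop any that near-duplicates an already kept chunk."""
    kept: List[str] = []
    for chunk in chunks:
        if all(jaccard(chunk, earlier) < threshold for earlier in kept):
            kept.append(chunk)
    return kept


reranked = [
    "Christopher Nolan was born in London in 1970",
    "Nolan was born in London in 1970",
    "Paris is the capital of France",
]
unique = dedup_ranked(reranked)
```

Because the pass runs after reranking, the higher-ranked copy of a near-duplicate pair is the one that survives, and the freed slot goes to new evidence.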

Cost and latency profile

Per 1K chunks at ~400 tokens each, gpt-4o-mini + text-embedding-3-small + Cohere Rerank 3.5:

Stage	Default mode (IRCoT, no KG)	KG mode (`--build-graph --kg-retrieval`)
Ingest cost	~$0.008	~$0.46 (with synthesis) / ~$0.38 (without)
Ingest latency	~15 sec	~12 min (with synthesis) / ~8.5 min (without)
Query cost	~$0.003	~$0.004
Query latency	~3-5 sec	~4-6 sec
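A quick way to project the ingest rows of this table onto your own corpus size is shown below. The per-1K figures are copied from the table; the linear scaling with chunk count and the mode labels are assumptions, so treat the output as a rough estimate rather than a quote.

```python
from typing import Dict, Tuple

# mode: (ingest cost in USD, ingest latency in minutes) per 1K chunks, from the table above.
PER_1K_CHUNKS: Dict[str, Tuple[float, float]] = {
    "default": (0.008, 0.25),              # ~15 sec
    "kg_with_synthesis": (0.46, 12.0),
    "kg_without_synthesis": (0.38, 8.5),
}


def estimate_ingest(mode: str, n_chunks: int) -> Dict[str, float]:
    """Scale the per-1K figures linearly (an assumption) to n_chunks."""
    usd_per_1k, minutes_per_1k = PER_1K_CHUNKS[mode]
    scale = n_chunks / 1000
    return {
        "ingest_usd": round(usd_per_1k * scale, 2),
        "ingest_minutes": round(minutes_per_1k * scale, 1),
    }


kg_estimate = estimate_ingest("kg_with_synthesis", 50_000)
default_estimate = estimate_ingest("default", 50_000)
```

For a 50K-chunk corpus this gives about $23 and 10 hours of ingest in KG mode with synthesis, against about $0.40 and 12.5 minutes in the default mode.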

Comparison to adjacent systems

System	Mechanism	Where they fit
Cohere Rerank, Voyage Rerank	Cross-encoder rerank over the retriever's top-K	Engram uses Cohere Rerank — a building block, not a competitor
GBrain (`github.com/garrytan/gbrain`)	Production hybrid retrieval: BM25 + dense + RRF + cross-encoder rerank, intent classification, mode bundles, structural code-edge walk	The production-grade hybrid retrieval reference. Engram extends it with iterative retrieve-then-reason (IRCoT) and an entity/fact KG
HippoRAG 2	OpenIE triples + synonym edges + PPR + recognition gate	Closest conceptually to Engram's KG mode
PropRAG	n-ary propositions + two-stage PPR + beam	Two-stage PPR is ported (`engram.core.kg_retrieval.two_stage_ppr_facts`). Propositions are deferred
LangChain, LlamaIndex	RAG framework + integrations	Engram is a focused library; could be wrapped by either

Codebase map (for AI agents and contributors)

Path	What's there
`src/engram/core/models.py`	Pydantic v2: `Chunk`, `EnrichedChunk`, `Fact`, `EntityRecord`, `Contradiction`, `CrossReference`
`src/engram/core/protocol.py`	`CorpusBackend`, `LLMProvider`, `Embedder` — async-first protocols
`src/engram/core/scoring.py`	`deduplicate_chunks` (Jaccard 0.70)
`src/engram/core/entities.py`	`normalize_entity_name`, `entities_match_fuzzy`, `case_variants`, `is_literal_value`
`src/engram/core/retrieval.py`	`TraversalConfig`, `merge_fact_strategies` (RRF), `dynamic_chunk_cutoff`
`src/engram/core/kg_retrieval.py`	`triple_match`, `ppr_facts`, `two_stage_ppr_facts`, `beam_search_facts`, `facts_to_chunk_ids`
`src/engram/backends/memory.py`	`MemoryBackend` — LMDB + hnswlib + in-memory networkx graph
`src/engram/dialogue/strategic_router.py`	Adaptive router — `decide_strategy` + dependency resolution
`src/engram/dialogue/prompts/strategic_plan.py`	`StrategicPlan` schema + the depth/breadth routing prompt
`src/engram/dialogue/orchestrator.py`	`Engram` orchestrator — hot path + cold path
`src/engram/dialogue/extraction.py`	Batched entity + fact extraction (Instructor strict-mode)
`src/engram/dialogue/prompts/extraction.py`	Prompts with explicit pronoun/coref resolution
`src/engram/dialogue/temporal.py`	Bi-temporal conflict detection + supersession
`src/engram/dialogue/contradiction.py`	Noisy-OR confidence fusion
`src/engram/llm/litellm_provider.py`	`LLMProvider` over LiteLLM + Instructor
`src/engram/llm/embedders.py`	`LiteLLMEmbedder`, `OllamaEmbedder`
`src/engram/llm/rate_control.py`	`TokenBucket` + `AdaptiveConcurrency`
`benchmarks/musique.py`	CLI for the MuSiQue ablation benchmark (`--adaptive` for the router)
`benchmarks/custom_kb.py`	Bring-your-own-corpus comparison: standard vs static vs adaptive
`benchmarks/runner.py`	Query pipeline incl. IRCoT 2-round + kg_hybrid_neighbors
`benchmarks/retrieval.py`	`hybrid_neighbors`, `kg_hybrid_neighbors`, RRF, BM25, dedup wiring
`benchmarks/fixtures/`	Pinned MuSiQue question-id JSON files (n=100, n=200)

Stable interfaces today

Surface	Module	Status
`Engram` class (`.enrich`, `.aenrich`)	`engram`	Stable
`MemoryBackend`	`engram.backends.memory`	Stable; LMDB schema versions tracked in CHANGELOG.md
`CorpusBackend`, `LLMProvider`, `Embedder` protocols	`engram.core.protocol`	Stable contracts
`Chunk`, `EnrichedChunk`, `Fact`, `EntityRecord`	`engram.core.models`	Stable Pydantic v2
`LiteLLMProvider`, `LiteLLMEmbedder`	`engram.llm.*`	Stable
`kg_hybrid_neighbors`, IRCoT loop	`benchmarks.runner`	Currently in the benchmarks namespace; scheduled for promotion to `engram.retrieve` in v0.1.0

Project status

Track	State
KG-hybrid foundation (Phases 1-6)	shipped to main
IRCoT 2-round retrieval	shipped (opt-in via `--ircot`)
Two-stage PPR (PropRAG) + multi-hop beam search	shipped
Entity resolution + alias persistence + pronoun-aware extraction	shipped
Adaptive strategic router	shipped on `strategic-router` (opt-in via `--adaptive`)
Reactive escalation (round-2 from round-1 evidence signals, not upfront prediction)	next
Promote query-side API to library namespace	scheduled for v0.1.0
Public PyPI release	scheduled for v0.1.0
HopRAG pseudo-question edges, PropRAG propositions, doc-level preamble	deferred
Hosted Vrin API + SDK + MCP server	planning phase

Contributing

See CONTRIBUTING.md. All commits must carry a Signed-off-by trailer per the Developer Certificate of Origin.

Security

Report vulnerabilities to vedant@vrin.cloud. See SECURITY.md.

License

Apache License 2.0. See LICENSE and NOTICE.
