Engram
A KG-augmented RAG library with iterative retrieve-and-reason for multi-hop questions.


Status: pre-alpha. Not on PyPI yet; install from source (see Install). The API will change before v0.1.0. The full evolution is in CHANGELOG.md.

TL;DR

Engram is a Python library for production RAG with three opt-in capabilities most other libraries don't ship:

  1. IRCoT (iterative retrieve-and-reason). Round 1 retrieves; reader emits a CoT thought; round 2 retrieves with the thought as augmented query. +0.09 F1 over single-pass on MuSiQue at gpt-4o-mini reader.
  2. A knowledge-graph layer. Entity + fact extraction at ingest; two-stage Personalized PageRank + multi-hop beam search + triple-vector ANN match + RRF fusion at query time. Opt-in via --kg-retrieval. Adds graph capabilities; F1 lift over baseline+IRCoT is neutral on MuSiQue (use it for the capabilities, not for the benchmark).
  3. A strategic router (adaptive per-query orchestration). One token-minimal LLM call per query decides which capabilities to enable — IRCoT, KG traversal, decomposition, MQE, retrieval planner — based on the question's structure. Statistical parity with the benchmark-tuned static config at ~40% lower median latency. Opt-in via --adaptive. See Adaptive strategic router.
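
The two-round IRCoT flow in item 1 can be sketched as below. This is a minimal illustration, not Engram's implementation: `ircot_two_round` and its three callables (`retrieve`, `think`, `answer`) are hypothetical names, and the way the thought is appended to the query is an assumption.

```python
from typing import Callable, List


def ircot_two_round(
    question: str,
    retrieve: Callable[[str], List[str]],
    think: Callable[[str, List[str]], str],
    answer: Callable[[str, List[str]], str],
) -> str:
    """Hypothetical two-round retrieve-and-reason loop (not Engram's API)."""
    # Round 1: retrieve with the raw question.
    round1 = retrieve(question)
    # The reader emits an intermediate chain-of-thought "thought".
    thought = think(question, round1)
    # Round 2: retrieve again, using the thought to augment the query.
    round2 = retrieve(f"{question} {thought}")
    # Merge evidence in order, dropping exact duplicates.
    evidence = list(dict.fromkeys(round1 + round2))
    return answer(question, evidence)
```

In practice `retrieve` would wrap the hybrid retriever and `think` / `answer` would be reader LLM calls; plain functions are enough to exercise the control flow.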

Production-quality components — BM25 + dense + RRF, Cohere Rerank 3.5, Jaccard dedup — are wired into the default pipeline.
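
For readers new to RRF, the fusion step that combines the BM25 and dense rankings can be sketched as follows. This is the textbook reciprocal rank fusion formula; the function name `rrf_fuse` is hypothetical, and `k=60` is the conventional constant rather than a value confirmed for Engram.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def rrf_fuse(rankings: Iterable[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_ranking = ["a", "b", "c"]   # illustrative IDs only
dense_ranking = ["b", "c", "a"]
fused = rrf_fuse([bm25_ranking, dense_ranking])
```

RRF only needs ranks, not scores, which is why it can combine lexical and dense retrievers whose raw scores are not comparable.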

Measured performance

Reference on MuSiQue dev (benchmarks/fixtures/musique_n200_seed1_ids.json, gpt-4o-mini reader, text-embedding-3-small embedder, Cohere Rerank 3.5 via AWS Bedrock):

| Config | F1 | EM | Notes |
|---|---|---|---|
| Plain hybrid (no rerank) | ~0.40 | | Field reference floor |
| + Cohere Rerank ("fast mode") | 0.46 | 0.32 | Engram's no-IRCoT default |
| + IRCoT (production v1) | 0.54 | 0.40 | Headline number |
| + KG-hybrid retrieval | 0.51-0.53 | 0.36-0.39 | Adds graph capabilities; no F1 lift over IRCoT |
| + Graph-aware retrieval planner (opt-in) | within noise | within noise | One LLM call up-front emits a typed plan that biases beam search + rerank (see below) |
| Adaptive strategic router (--adaptive) | 0.52 | 0.38 | Within run variance of production v1 at ~40% lower median latency (p50 8.9s vs 14.5s); skips KG traversal on 48% of queries and the second reader round on 24% |
| Field SOTA at this reader (G-reasoner) | 0.525 | 0.385 | Trained 8M-param GNN |

Engram baseline + IRCoT lands at field SOTA for gpt-4o-mini on n=200. Sample variance is ±0.02-0.03 F1 across reruns. Reproduce with benchmarks/musique.py; methodology in docs/benchmarks.md.
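
As a reminder of what the F1 and EM columns measure, here is a minimal sketch of the usual SQuAD-style answer metrics. It assumes the standard normalization (lowercase, strip punctuation and articles); the exact scoring used by benchmarks/musique.py may differ, and the function names are hypothetical.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(pred: str, gold: str) -> float:
    return float(normalize(pred) == normalize(gold))


def token_f1(pred: str, gold: str) -> float:
    p, g = normalize(pred).split(), normalize(gold).split()
    if not p or not g:
        # Both empty counts as a match; one empty counts as a miss.
        return float(p == g)
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)
```

Benchmark-level F1 and EM are then averages of these per-question scores, which is why rerun variance of a few hundredths is expected at n=200.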

Where Engram fits in your RAG pipeline

Document → Chunker → [optional: Engram.aenrich] → Embedder → Vector DB
                              ↑
                              builds KG (entity graph, bi-temporal supersession)
                              via cold-path LLM extraction
                                                                        │
                                                                        ▼
                                                       Retriever → [Engram retrieval modes] → Reader
                                                                        ↑
                                                                        hybrid + Cohere rerank
                                                                        + IRCoT + optional KG fusion

Engram is two things at once:

  • An ingestion enricher (Engram.aenrich, opt-in via build_graph=True) that builds a knowledge graph alongside your chunks.
  • A retrieval lift layer (hybrid + Cohere rerank + IRCoT + optional KG fusion at query time). Today this lives in benchmarks/; it is scheduled to be promoted to a stable library API in v0.1.0.

Install

Not on PyPI yet. Package name reserved for v0.1.0.

git clone https://github.com/Vrin-cloud/engram
cd engram
uv venv && uv pip install -e ".[memory,llm,benchmarks,observability]"

| Extra | Brings in |
|---|---|
| memory | lmdb, hnswlib, numpy, networkx, scipy (the default MemoryBackend) |
| llm | litellm, instructor, tenacity (the LLMProvider stack) |
| benchmarks | datasets, rank-bm25 (to run the MuSiQue benchmark) |
| observability | opentelemetry (tracing spans for ingest + query) |
| all | every extra |

API keys

| Var | Used by |
|---|---|
| OPENAI_API_KEY | text-embedding-3-small (embedder), gpt-4o-mini (reader) defaults |
| AWS credentials (default chain) | Cohere Rerank 3.5 on Bedrock (region defaults to us-east-1) |
| ANTHROPIC_API_KEY | only if you pass --reader-model anthropic/claude-haiku-4-5 |

Quickstart

Run the headline benchmark

python -m benchmarks.musique \
  --question-ids-file benchmarks/fixtures/musique_n200_seed1_ids.json \
  --mode baseline \
  --rerank \
  --ircot \
  --output predictions.jsonl

Expected: F1 ~0.54, EM ~0.40, ~5 min wall time, ~$1.50 in API costs. This is the production v1 default (hybrid + Cohere rerank + Jaccard dedup + IRCoT) on the canonical 200-question MuSiQue fixture.

Add KG mode for graph capabilities

python -m benchmarks.musique \
  --question-ids-file benchmarks/fixtures/musique_n200_seed1_ids.json \
  --mode enriched \
  --build-graph --kg-retrieval --rerank --ircot \
  --disable-derivation --disable-bridging

--build-graph runs Engram's cold-path ingest (entity extraction → canonical resolution with alias persistence → fact extraction with literal-value sink → fact-triple embedding into a second hnswlib → bi-temporal supersession). --kg-retrieval fuses hybrid retrieval with triple-vector ANN + hub-weighted two-stage PPR + multi-hop confidence-decayed beam search via RRF.
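
To make the PPR step concrete, here is a minimal sketch of plain single-stage Personalized PageRank by power iteration over an adjacency dict. It is not the hub-weighted two-stage variant described above, and `personalized_pagerank`, the toy graph, and the damping value are all illustrative assumptions.

```python
from typing import Dict, List


def personalized_pagerank(
    graph: Dict[str, List[str]],
    seeds: Dict[str, float],
    alpha: float = 0.85,
    iters: int = 50,
) -> Dict[str, float]:
    """Power-iteration PPR; restart mass always returns to the seed nodes."""
    nodes = set(graph) | {m for nbrs in graph.values() for m in nbrs} | set(seeds)
    total = sum(seeds.values())
    restart = {n: seeds.get(n, 0.0) / total for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: (1 - alpha) * restart[n] for n in nodes}
        for n in nodes:
            nbrs = graph.get(n, [])
            if nbrs:
                share = alpha * rank[n] / len(nbrs)
                for m in nbrs:
                    nxt[m] += share
            else:
                # Dangling node: send its mass back to the seed distribution.
                for m in nodes:
                    nxt[m] += alpha * rank[n] * restart[m]
        rank = nxt
    return rank


toy_graph = {
    "babbage": ["engine"],
    "engine": ["babbage", "lovelace"],
    "lovelace": ["engine"],
    "turing": ["enigma"],
    "enigma": ["turing"],
}
scores = personalized_pagerank(toy_graph, seeds={"lovelace": 1.0})
```

Seeding on the entities matched in the query keeps score mass on the part of the graph reachable from them, so unrelated components (here `turing` / `enigma`) receive nothing.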

Add --retrieval-planner to layer the graph-aware planner on top: one extra LLM call per query produces a structured RetrievalPlan (expected answer type, priority predicates, optional hop sequence) that biases beam search, post-fusion fact filtering, and the Cohere Rerank query. Default off; opt in when you want explainable retrieval traces. --trace-retrieval-plan PATH dumps every plan to JSONL for inspection.
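
The shape of such a plan, and how it might bias scoring, can be sketched as follows. Only the three fields come from the description above; the class name `RetrievalPlanSketch`, the field names, the boost factor, and the JSONL layout are hypothetical and not the library's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class RetrievalPlanSketch:
    """Hypothetical stand-in for the typed plan; field names are assumptions."""
    expected_answer_type: str
    priority_predicates: List[str] = field(default_factory=list)
    hop_sequence: Optional[List[str]] = None


def predicate_boost(plan: RetrievalPlanSketch, predicate: str, boost: float = 1.5) -> float:
    # The plan biases scoring: prioritized predicates get a multiplier,
    # everything else keeps weight 1.0 (nothing is filtered out here).
    return boost if predicate in plan.priority_predicates else 1.0


def plan_to_jsonl_line(question_id: str, plan: RetrievalPlanSketch) -> str:
    # One JSON object per line, suitable for appending to a trace file.
    return json.dumps({"question_id": question_id, **asdict(plan)})


plan = RetrievalPlanSketch("person", priority_predicates=["founded_by"])
line = plan_to_jsonl_line("q1", plan)
```

Keeping the plan as a multiplier rather than a hard filter matches the "biases, does not replace" framing: a wrong plan degrades ranking slightly instead of dropping evidence.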

KG mode costs ~$0.40 per 1K chunks for the cold path and adds ~12 min per 1K chunks of ingest latency. Use it when you need graph queries, contradiction surfacing, or bi-temporal awareness, not for F1 lift on standard QA.
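
To illustrate what bi-temporal awareness buys you, here is a minimal sketch of fact supersession with valid-time and record-time fields. It assumes single-valued predicates (a new object for the same subject and predicate closes the old fact); `FactSketch`, `supersede`, and `facts_as_of` are hypothetical names and not Engram's Fact model or conflict-detection logic.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class FactSketch:
    subject: str
    predicate: str
    obj: str
    valid_from: datetime           # when the fact became true in the world
    recorded_at: datetime          # when the system learned it
    valid_to: Optional[datetime] = None  # None means "still valid"


def supersede(facts: List[FactSketch], new: FactSketch) -> None:
    """Close any open, conflicting fact for the same slot, then record the new one."""
    for old in facts:
        same_slot = (old.subject, old.predicate) == (new.subject, new.predicate)
        if same_slot and old.valid_to is None and old.obj != new.obj \
                and old.valid_from <= new.valid_from:
            old.valid_to = new.valid_from  # the old fact is kept, not deleted
    facts.append(new)


def facts_as_of(facts: List[FactSketch], when: datetime) -> List[FactSketch]:
    return [f for f in facts
            if f.valid_from <= when and (f.valid_to is None or when < f.valid_to)]
```

Because superseded facts are closed rather than removed, a query can ask what was true at a past date as well as what is true now.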

Adaptive strategic router

python -m benchmarks.musique \
  --question-ids-file benchmarks/fixtures/musique_n200_seed1_ids.json \
  --mode enriched \
  --build-graph --rerank --adaptive

--adaptive puts capability selection under a per-query strategic router: one token-minimal LLM call (output is a bare list of capability tags, single-digit tokens) that reads the question's structure and decides what this specific query needs. The router picks from {ircot, kg_retrieval, decomposition, mqe, retrieval_planner}; hybrid retrieval, Cohere Rerank, and entity extraction are always on underneath.
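
Consuming a bare tag list like this is cheap to do defensively. The sketch below assumes the router returns comma- or whitespace-separated tags, possibly wrapped in brackets or quotes; the actual wire format and the helper name `parse_capabilities` are assumptions, and only the five tag names come from the text above.

```python
from typing import FrozenSet

ALLOWED = frozenset(
    {"ircot", "kg_retrieval", "decomposition", "mqe", "retrieval_planner"}
)


def parse_capabilities(raw: str) -> FrozenSet[str]:
    """Turn the router's raw text into a validated set of capability tags."""
    tokens = raw.replace("[", " ").replace("]", " ").replace(",", " ").split()
    cleaned = {t.strip("\"'").lower() for t in tokens}
    # Unknown tags are dropped rather than raising, so a malformed reply
    # falls back to the always-on pipeline (hybrid + rerank).
    return frozenset(cleaned & ALLOWED)
```

An empty result is a valid plan: it simply means the query runs on the always-on components with no optional capability enabled.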

The routing rests on one core distinction the prompt teaches explicitly:

  • Depth (sequential chains — "the X of the Y", bridge entities that must be discovered before the next hop) → ircot. Decomposition measurably hurts depth chains (parallel sub-questions can't phrase later hops before earlier answers exist).
  • Breadth (open-ended, multi-aspect — evaluations, memos, bear/bull cases) → decomposition, usually with kg_retrieval + ircot.

One hard safety rule: questions that don't explicitly name their subject ("this round", "the company") must enable kg_retrieval so the graph anchors the query on the corpus's central entities instead of guessing — this eliminates a measured subject-hallucination failure mode.

Measured on the n=200 MuSiQue fixture: EM 0.38 / F1 0.52 (within run variance of the static production v1 config) at p50 latency 8.9s vs 14.5s — the router skips graph traversal on 48% of queries and the second reader round on 24%. On an open-ended evaluation fixture (benchmarks/fixtures/pebble_corpus.jsonl) the same router with no per-domain tuning shifts its plan profile to kg 100% / decomposition 50% and outscores both the static config and plain hybrid. Routing triggers must reference observable question structure — an earlier prompt that asked the router to predict retrieval insufficiency (unobservable from question text) fired IRCoT on only 21% of multi-hop questions and cost 5 EM points.

Compare configurations on your own corpus with benchmarks/custom_kb.py: it runs standard (hybrid + rerank), vrin (static full stack), and vrin_adaptive (router) side by side over a JSONL corpus + question set and reports per-question deltas plus each routing decision.

Use Engram as a Python library

Today's stable surface is the ingest API:

from engram import Engram
from engram.backends.memory import MemoryBackend
from engram.llm.embedders import LiteLLMEmbedder
from engram.llm.litellm_provider import LiteLLMProvider
from engram.llm.rate_control import AdaptiveConcurrency, TokenBucket

embedder = LiteLLMEmbedder(model="openai/text-embedding-3-small")
backend = MemoryBackend(embedder=embedder, path="./engram-data")

llm = LiteLLMProvider(
    bucket=TokenBucket(rate=20.0, burst=25),
    adaptive=AdaptiveConcurrency(initial_limit=4, max_limit=12),
    default_model="openai/gpt-4o-mini",
)

engram = Engram(
    corpus_backend=backend,
    llm=llm,
    build_graph=True,
    enable_synthesis=False,
    enable_derivation=False,
    enable_bridging=False,
    model="openai/gpt-4o-mini",
)

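import asyncio

async def main():
    # `await` needs a running event loop; your_chunks comes from your own chunker.
    return await engram.aenrich(your_chunks)

enriched = asyncio.run(main())  # in a notebook, `await engram.aenrich(your_chunks)` works directly
# Each EnrichedChunk has: id, text, source_id, enrichment_summary, metadata
# The backend's fact_graph (networkx.MultiDiGraph) holds entity ↔ fact structure
# Stored facts queryable via backend.find_facts(...) and backend.neighbors_facts(...)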

The query-time lift layer (hybrid_neighbors, kg_hybrid_neighbors, IRCoT 2-round flow) currently lives in benchmarks/runner.py; it is scheduled to be promoted to engram.retrieve / engram.iterative_query in v0.1.0.

Deep documentation

| Doc | What's in it |
|---|---|
| docs/architecture.md | High-level architecture, data model, storage layout, ingest + query lifecycles |
| docs/benchmarks.md | MuSiQue methodology, fixtures, replay against existing indices, cost / latency profiles |
| docs/configuration.md | Every CLI flag and constructor parameter, with defaults and effects |
| docs/kg-internals.md | LMDB sub-db layout, fact graph schema, PPR / beam search internals |
| docs/llm-provider.md | LiteLLM routing, Instructor strict-mode, prompt caching, rate control |
| docs/concepts/ircot.md | IRCoT pattern explainer + why it works at our scale |
| docs/concepts/kg-retrieval.md | Triple match, two-stage PPR (PropRAG), beam search, RRF fusion |
| docs/concepts/synthesis-and-extraction.md | Hot path vs cold path, entity / fact extraction, pronoun resolution |
| docs/concepts/bi-temporal.md | Fact supersession, valid_from / valid_to / recorded_at semantics, Noisy-OR fusion |

CLI flags reference (quick)

python -m benchmarks.musique --help is the source of truth. Common flags:

| Flag | Default | Effect |
|---|---|---|
| --mode {baseline,enriched,both} | both | baseline skips cold path; enriched runs Engram.aenrich first |
| --rerank / --no-rerank | on | Cohere Rerank 3.5 via Bedrock |
| --ircot | off | 2-round retrieve-then-reason. +0.09 F1. |
| --build-graph | off | Cold-path KG ingest |
| --kg-retrieval | off | Triple match + 2-stage PPR + beam at query time |
| --disable-synthesis | off | Skip per-chunk synthesis (saves ~$0.30/1K + ~4 min/1K) |
| --disable-derivation | off | Skip cold-path derivation pass |
| --disable-bridging | off | Skip cold-path bridging pass |
| --adaptive | off | Strategic router decides capabilities per query. Requires --build-graph |
| --question-ids-file PATH | | Pin to a question ID set for reproducibility |
| --data-dir PATH | ~/.engram-bench/... | Where LMDB indexes live |

Full reference: docs/configuration.md.

Architecture decisions backed by ablation

These reflect measurements, not theory. Full ablation log in docs/benchmarks.md.

  • Jaccard dedup post-rerank is always on. Cheap quality win.
  • IRCoT is opt-in but recommended as the default lift layer (+0.09 F1).
  • Synthesis contributes ~+0.04 F1 only when KG retrieval is also on. Otherwise it's pure latency cost.
  • Derivation and bridging were deferred in the v0 KG-hybrid plan; benchmark defaults are --disable-derivation --disable-bridging.
  • MQE, decomposition, sufficiency-judge, CRAG-style filter all regressed when stacked on top of baseline + IRCoT. Not in the production config. The adaptive router resolves the decomposition finding: it's a depth-vs-breadth mismatch — decomposition hurts sequential factoid chains (where these ablations ran) but helps open-ended multi-aspect questions; the router applies it only to the latter.
  • Graph-aware retrieval planner is opt-in via --retrieval-planner (default OFF). One LLM call per query reads a compressed view of the relevant fact-graph slice and emits a typed RetrievalPlan (expected answer type, priority predicates, optional hop sequence). The plan biases — does not replace — beam search edge weighting, post-fusion fact filtering, and the Cohere Rerank query. Metric impact is within run-to-run variance at n=100; ship it for the capability (explainable retrieval, structured plan traces) rather than the EM/F1 number. Plumbing in src/engram/core/graph_view.py, src/engram/dialogue/retrieval_planner.py, prompt in src/engram/dialogue/prompts/retrieval_plan.py.
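
The Jaccard dedup step in the first bullet can be sketched as below. The 0.70 threshold matches the value listed for deduplicate_chunks in the codebase map; whitespace tokenization, the greedy keep-first order, and the function names are assumptions, not the library's implementation.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def dedup_chunks(chunks: List[str], threshold: float = 0.70) -> List[str]:
    """Greedy dedup: keep a chunk only if it is not too similar to any kept one."""
    kept: List[str] = []
    for chunk in chunks:  # input is assumed to be in reranked order, best first
        if all(jaccard(chunk, k) < threshold for k in kept):
            kept.append(chunk)
    return kept


reranked = [
    "the cat sat on the mat",
    "the cat sat on the mat today",
    "dogs bark loudly",
]
unique = dedup_chunks(reranked)
```

Running dedup after reranking means the higher-ranked copy of a near-duplicate pair is the one that survives, which frees reader context for distinct evidence.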

Cost and latency profile

Per 1K chunks at ~400 tokens each, gpt-4o-mini + text-embedding-3-small + Cohere Rerank 3.5:

| Stage | Default mode (IRCoT, no KG) | KG mode (--build-graph --kg-retrieval) |
|---|---|---|
| Ingest cost | ~$0.008 | ~$0.46 (with synthesis) / ~$0.38 (without) |
| Ingest latency | ~15 sec | ~12 min (with synthesis) / ~8.5 min (without) |
| Query cost | ~$0.003 | ~$0.004 |
| Query latency | ~3-5 sec | ~4-6 sec |
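
For rough budgeting, the figures above can be turned into a back-of-the-envelope estimate. This sketch assumes the ingest figures scale linearly per 1K chunks and that the query cost is per query; the table does not state the query unit explicitly, so treat that as an assumption. The function and dictionary names are hypothetical.

```python
# Approximate figures copied from the table above (USD).
INGEST_USD_PER_1K_CHUNKS = {
    "default": 0.008,
    "kg_with_synthesis": 0.46,
    "kg_without_synthesis": 0.38,
}
# Assumption: query cost is per query, not per 1K chunks.
QUERY_USD = {
    "default": 0.003,
    "kg_with_synthesis": 0.004,
    "kg_without_synthesis": 0.004,
}


def estimate_usd(n_chunks: int, n_queries: int, mode: str = "default") -> float:
    """Linear estimate: ingest scales with chunks, serving scales with queries."""
    ingest = (n_chunks / 1000) * INGEST_USD_PER_1K_CHUNKS[mode]
    serving = n_queries * QUERY_USD[mode]
    return ingest + serving


kg_budget = estimate_usd(10_000, 1_000, mode="kg_with_synthesis")
```

Under these assumptions, ingest dominates in KG mode for small query volumes, while in default mode almost all spend is at query time.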

Comparison to adjacent systems

| System | Mechanism | Where they fit |
|---|---|---|
| Cohere Rerank, Voyage Rerank | Cross-encoder rerank over the retriever's top-K | Engram uses Cohere Rerank: a building block, not a competitor |
| GBrain (github.com/garrytan/gbrain) | Production hybrid retrieval: BM25 + dense + RRF + cross-encoder rerank, intent classification, mode bundles, structural code-edge walk | The production-grade hybrid retrieval reference. Engram extends it with iterative retrieve-then-reason (IRCoT) and an entity/fact KG |
| HippoRAG 2 | OpenIE triples + synonym edges + PPR + recognition gate | Closest conceptually to Engram's KG mode |
| PropRAG | n-ary propositions + two-stage PPR + beam | Two-stage PPR is ported (engram.core.kg_retrieval.two_stage_ppr_facts). Propositions are deferred |
| LangChain, LlamaIndex | RAG framework + integrations | Engram is a focused library; could be wrapped by either |

Codebase map (for AI agents and contributors)

| Path | What's there |
|---|---|
| src/engram/core/models.py | Pydantic v2: Chunk, EnrichedChunk, Fact, EntityRecord, Contradiction, CrossReference |
| src/engram/core/protocol.py | CorpusBackend, LLMProvider, Embedder (async-first protocols) |
| src/engram/core/scoring.py | deduplicate_chunks (Jaccard 0.70) |
| src/engram/core/entities.py | normalize_entity_name, entities_match_fuzzy, case_variants, is_literal_value |
| src/engram/core/retrieval.py | TraversalConfig, merge_fact_strategies (RRF), dynamic_chunk_cutoff |
| src/engram/core/kg_retrieval.py | triple_match, ppr_facts, two_stage_ppr_facts, beam_search_facts, facts_to_chunk_ids |
| src/engram/backends/memory.py | MemoryBackend: LMDB + hnswlib + in-memory networkx graph |
| src/engram/dialogue/strategic_router.py | Adaptive router: decide_strategy + dependency resolution |
| src/engram/dialogue/prompts/strategic_plan.py | StrategicPlan schema + the depth/breadth routing prompt |
| src/engram/dialogue/orchestrator.py | Engram orchestrator: hot path + cold path |
| src/engram/dialogue/extraction.py | Batched entity + fact extraction (Instructor strict-mode) |
| src/engram/dialogue/prompts/extraction.py | Prompts with explicit pronoun/coref resolution |
| src/engram/dialogue/temporal.py | Bi-temporal conflict detection + supersession |
| src/engram/dialogue/contradiction.py | Noisy-OR confidence fusion |
| src/engram/llm/litellm_provider.py | LLMProvider over LiteLLM + Instructor |
| src/engram/llm/embedders.py | LiteLLMEmbedder, OllamaEmbedder |
| src/engram/llm/rate_control.py | TokenBucket + AdaptiveConcurrency |
| benchmarks/musique.py | CLI for the MuSiQue ablation benchmark (--adaptive for the router) |
| benchmarks/custom_kb.py | Bring-your-own-corpus comparison: standard vs static vs adaptive |
| benchmarks/runner.py | Query pipeline incl. IRCoT 2-round + kg_hybrid_neighbors |
| benchmarks/retrieval.py | hybrid_neighbors, kg_hybrid_neighbors, RRF, BM25, dedup wiring |
| benchmarks/fixtures/ | Pinned MuSiQue question-id JSON files (n=100, n=200) |

Stable interfaces today

| Surface | Module | Status |
|---|---|---|
| Engram class (.enrich, .aenrich) | engram | Stable |
| MemoryBackend | engram.backends.memory | Stable; LMDB schema versions tracked in CHANGELOG.md |
| CorpusBackend, LLMProvider, Embedder protocols | engram.core.protocol | Stable contracts |
| Chunk, EnrichedChunk, Fact, EntityRecord | engram.core.models | Stable Pydantic v2 |
| LiteLLMProvider, LiteLLMEmbedder | engram.llm.* | Stable |
| kg_hybrid_neighbors, IRCoT loop | benchmarks.runner | Promoted to engram.retrieve in v0.1.0; currently in benchmarks namespace |

Project status

| Track | State |
|---|---|
| KG-hybrid foundation (Phases 1-6) | shipped to main |
| IRCoT 2-round retrieval | shipped (opt-in via --ircot) |
| Two-stage PPR (PropRAG) + multi-hop beam search | shipped |
| Entity resolution + alias persistence + pronoun-aware extraction | shipped |
| Adaptive strategic router | shipped on strategic-router (opt-in via --adaptive) |
| Reactive escalation (round-2 from round-1 evidence signals, not upfront prediction) | next |
| Promote query-side API to library namespace | scheduled for v0.1.0 |
| Public PyPI release | scheduled for v0.1.0 |
| HopRAG pseudo-question edges, PropRAG propositions, doc-level preamble | deferred |
| Hosted Vrin API + SDK + MCP server | planning phase |

Contributing

See CONTRIBUTING.md. All commits must carry a Signed-off-by trailer per the Developer Certificate of Origin.

Security

Report vulnerabilities to vedant@vrin.cloud. See SECURITY.md.

License

Apache License 2.0. See LICENSE and NOTICE.

