All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog 1.1.0, and this project adheres to Semantic Versioning 2.0.0.
While the project is pre-1.0, breaking changes may land on minor version bumps
and will be called out under `### Changed` with a migration note.
- Retrieval: IRCoT 2-round (`--ircot`). Iterative retrieve-and-reason: round 1 retrieves + reader emits CoT; round 2 re-retrieves with the thought as augmented query; both rounds' chunks fused before the final reader call. +0.09 F1 / +0.07 EM on MuSiQue at gpt-4o-mini reader, n=200. Opt-in, default off. Code: `_answer_with_reader_raw`, `_extract_thought_span`, IRCoT path in `benchmarks/runner.py`. Promoted to library API in v0.1.0.
- Retrieval: KG-hybrid path (`--kg-retrieval`). RRF-fused triple-vector ANN match + hub-weighted two-stage PPR + multi-hop confidence-decayed beam search, resolved to source chunks via the mutual `chunk_facts` index, RRF-fused with hybrid chunks, reranked, deduped. Lives in `benchmarks/retrieval.py:kg_hybrid_neighbors`.
- Two-stage PPR (PropRAG) in `engram.core.kg_retrieval.two_stage_ppr_facts`. Stage 1 broad spread at α=0.75 from hub-weighted seeds; Stage 2 narrow focus at α=0.45 re-seeded from Stage 1's top entities. +0.032 F1 vs single-stage on synth_off / no-IRCoT. Replaces single-stage `ppr_facts` in production `kg_hybrid_neighbors`.
- Multi-hop beam search in `engram.core.kg_retrieval.beam_search_facts`. Confidence-decayed walk over the in-memory fact graph: per-hop fan-out cap, min edge confidence, hub-aware fan-out for high-degree nodes, frontier expansion gated by `path_confidence_floor`. Ports Vrin's `find_facts_multi_hop` to networkx.
- Triple-vector ANN match via `MemoryBackend.neighbors_facts`: second hnswlib index over `(s, p, o)` triple embeddings, populated automatically by `upsert_facts`.
- KG storage layer in `MemoryBackend`: 6 new LMDB sub-databases (`entity_by_name`, `entity_aliases`, `entity_degree_index`, `chunk_facts`, `fact_vectors`, `fact_vector_label`), second hnswlib index for fact triples, in-memory `networkx.MultiDiGraph` mirror of stored facts (only active facts; rebuilt at backend open; updated incrementally on every upsert/update/supersede). Detail in docs/kg-internals.md.
- Entity resolution at write (`MemoryBackend.resolve_or_create_entity`, `MemoryBackend.resolve_entities_batch`). Normalize → exact LMDB lookup → `entity_aliases` redirect → fuzzy `SequenceMatcher` ≥ 0.95 → create new. Ingest-time threshold is strict (0.95) since false positives merge entities permanently; query-time fallback uses 0.80.
- `EntityRecord` Pydantic model in `engram.core.models`: `canonical_name`, `aliases`, `entity_type`, `mention_count`, `first_seen_at`.
- Alias persistence: `entity_aliases` LMDB sub-db; LLM-emitted aliases on `ExtractedEntity` are persisted automatically.
- Pronoun / coreference resolution in extraction prompts. `_ENTITY_SYSTEM_PROMPT` explicitly forbids pronouns / generic refs as entities. `_FACT_SYSTEM_PROMPT` has explicit pronoun→entity resolution instructions with two worked examples (Example 6: "She announced..." → "Sarah Martinez announced..."; Example 7: "It stands..." and "The landmark..." both → "Eiffel Tower"). Pure prompt approach, no regex coref.
- Literal value sink (`engram.core.entities.is_literal_value`). Drops facts whose object is a number / year / percentage / date / ratio before they reach the KG. Prevents hundreds of facts sharing object "2024" or "15%" from collapsing onto a shared node and blowing up PPR propagation.
- Mutual chunk↔fact indexing. `MemoryBackend.upsert_facts` auto-mirrors `Fact.source_chunk_ids` into the `chunk_facts` LMDB sub-db using composite keys (`chunk_id|fact_id`). Reverse direction is a single cursor sweep. Read API: `MemoryBackend.get_facts_for_chunk`.
- Fact triple embedding (Phase 4.9). `MemoryBackend.upsert_facts` embeds each fact's `(s, p, o)` triple with the same embedder used for chunks and writes to `fact_vectors` LMDB + the second hnswlib + `fact_vector_label` LMDB. Cold-path Pass C restructured to batch the upsert post-gather so one embedding API call covers all new facts per ingest cycle.
- Phase 1 Jaccard dedup post-rerank. `engram.core.scoring.deduplicate_chunks` is now wired into `hybrid_neighbors` and `graph_aware_neighbors` after Cohere rerank, before the context budget cap. Free quality improvement; uses rerank position as the dedup tiebreaker.
- Cohere Rerank 3.5 integration via AWS Bedrock (`cohere.rerank-v3-5:0`). Default on (`--rerank`); credentials from the boto3 chain.
- `enable_synthesis` flag. Opt out of the per-chunk synthesis hot path via `Engram(enable_synthesis=False)` or `--disable-synthesis`. Saves ~$0.30/1K chunks + ~4 min/1K. Per ablation: synthesis contributes +0.04 F1 only when KG retrieval is also enabled; otherwise pure overhead.
- 8 entity / retrieval helper functions in `engram.core.entities` and `engram.core.retrieval`: `normalize_entity_name`, `entities_match_fuzzy`, `case_variants`, `is_literal_value`, `TraversalConfig`, `extract_frontier_entities`, `merge_fact_strategies` (RRF fact fusion), `dynamic_chunk_cutoff` (CAR cluster-gap).
- Confidence floor tightened from 0.6 to 0.7 (`MIN_FACT_CONFIDENCE`): Vrin parity, prevents low-confidence facts from polluting PPR propagation.
- Pinned MuSiQue benchmark fixtures: `benchmarks/fixtures/musique_n100_seed1_ids.json` and `musique_n200_seed1_ids.json` for reproducible eval.
- `scipy>=1.11` added to the `memory` extra (required by `networkx.pagerank` for sparse-matrix power iteration). `networkx>=3.2` already added in the Phase 3 commit.
- Documentation: full docs/ tree with architecture.md, benchmarks.md, configuration.md, kg-internals.md, llm-provider.md, and conceptual deep-dives for IRCoT, KG retrieval, synthesis + extraction, and bi-temporal supersession.
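
The write-time entity-resolution cascade listed above (normalize, exact lookup, alias redirect, fuzzy match at 0.95, create) can be illustrated with a minimal sketch. This is not the `MemoryBackend` implementation: the `EntityIndex` class, the `normalize` helper, and the plain dicts standing in for the `entity_by_name` and `entity_aliases` LMDB sub-databases are all assumptions made for illustration. Only the step order and the 0.95 threshold come from the entry above.

```python
from difflib import SequenceMatcher

INGEST_THRESHOLD = 0.95  # strict, because a false positive merges entities permanently


def normalize(name: str) -> str:
    """Hypothetical normalizer: lowercase and collapse whitespace."""
    return " ".join(name.lower().split())


class EntityIndex:
    """In-memory stand-in for the entity_by_name / entity_aliases stores."""

    def __init__(self) -> None:
        self.by_name: dict[str, str] = {}  # normalized name -> canonical name
        self.aliases: dict[str, str] = {}  # normalized alias -> canonical name

    def resolve_or_create(self, name: str, threshold: float = INGEST_THRESHOLD) -> str:
        key = normalize(name)
        # 1. Exact lookup on the normalized name.
        if key in self.by_name:
            return self.by_name[key]
        # 2. Alias redirect.
        if key in self.aliases:
            return self.aliases[key]
        # 3. Fuzzy match: keep the best-scoring known name at or above the threshold.
        best_key, best_ratio = None, 0.0
        for known in self.by_name:
            ratio = SequenceMatcher(None, key, known).ratio()
            if ratio >= threshold and ratio > best_ratio:
                best_key, best_ratio = known, ratio
        if best_key is not None:
            return self.by_name[best_key]
        # 4. Nothing matched: register a new entity under its original spelling.
        self.by_name[key] = name
        return name
```

Passing a looser `threshold` (for example 0.80) to the same method would mimic the query-time fallback, where a wrong match costs one lookup rather than a permanent merge.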
These were implemented on feature branches, ablated against the production v1 config (baseline + IRCoT), and removed because they regressed F1:
- Sufficiency judge: Self-RAG / SURE-RAG / L-MARS Judge Agent pattern. F1 dropped by 0.04 to 0.13 when stacked on baseline + IRCoT.
- CRAG-style chunk filter: Corrective RAG (arXiv:2401.15884) post-rerank LLM relevance filter. F1 dropped by 0.13 when combined with the sufficiency judge.
- Tested but kept opt-in / off-by-default: `--multi-query`, `--decompose`. Both were measured to regress on top of IRCoT at n=100 and n=200.
- Initial repository scaffold: `pyproject.toml` (hatchling), `ruff` and `pytest` configuration, source tree under `src/engram/` with sub-packages for `core`, `dialogue`, `backends`, `llm`, `caching`, and `observability`.
- Core data models in `engram.core.models`: `Chunk`, `EnrichedChunk`, `Fact`, `Contradiction`, `CrossReference`, and the `SourceType`, `FactType`, and `RelationKind` literal aliases.
- Core protocols in `engram.core.protocol`: `Enricher`, `CorpusBackend`, `LLMProvider`, `VectorStore`, all `runtime_checkable` and async-first.
- GitHub Actions CI: lint and test matrix on Python 3.11 and 3.12 plus one macOS cell; PyPI publish on `v*` tags via the trusted-publisher OIDC flow.
- Apache 2.0 license, contributor docs (`CONTRIBUTING.md`, `CODE_OF_CONDUCT.md`, `SECURITY.md`), issue and PR templates, and `CITATION.cff`.
- `engram.dialogue.temporal`: bi-temporal conflict detection ported from the Vrin `temporal_consistency_manager`. Public surface: `detect_conflict`, `batch_detect_conflicts`, `select_valid_at`, `apply_resolution`, and the `FactConflict` dataclass.
- `engram.dialogue.inference`: `ReasoningChain` and `CrossDocumentPattern` Pydantic models seeding the multi-hop inference output. LLM-driven chain construction lands in a later release.
- `engram.core.scoring`: `jaccard_similarity`, `deduplicate_chunks`, and `score_chunk_relevance` ported from Vrin's chunk-filter pipeline.
- `engram.core.protocol.Embedder`: split out of `VectorStore` so callers can mix local Ollama embeddings with a hosted vector index. Takes a `kind="document"|"query"` parameter for asymmetric models like Cohere v3 and Voyage. `VectorStore` is now pure ANN over precomputed vectors.
- `engram.llm.embedders.LiteLLMEmbedder`: routes through LiteLLM for OpenAI, Cohere, Voyage, Bedrock, Anthropic, and any other provider in the LiteLLM matrix. Maps `kind` to the right provider-specific `input_type` automatically.
- `engram.llm.embedders.OllamaEmbedder`: hits a local Ollama server via the official `ollama` Python client for fully-local mode.
- `engram.backends.memory.MemoryBackend`: LMDB + hnswlib backed `CorpusBackend`. The zero-config default. Persists to disk; composite-key indexes on `s`, `p`, `sp`, `o`, `st` axes; async wraps sync via `asyncio.to_thread`. Reseats hnsw labels from zero on reopen.
- `engram.llm.litellm_provider.LiteLLMProvider`: single `LLMProvider` composing LiteLLM for routing plus Instructor for Pydantic-validated structured extraction plus tenacity for retries. First-class Anthropic prompt-cache support via the `cache_breakpoints` parameter.
- `engram.observability.tracing`: OpenTelemetry + OpenInference span scaffolding. `backend_span` and `llm_span` async context managers with TOOL / LLM OpenInference kinds, `engram.cache_breakpoint_count` attribute, and a no-op tracer fallback when `opentelemetry-api` is not installed.
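
The Jaccard-based chunk deduplication mentioned above can be sketched as follows. The real signatures of `jaccard_similarity` and `deduplicate_chunks` are not shown in this changelog, so the sketch uses hypothetical stand-ins (`jaccard`, `dedupe_ranked`). Whitespace tokenization and the 0.8 threshold are assumptions as well.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the lowercase whitespace-token sets of two strings."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty texts are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def dedupe_ranked(chunks: list[str], threshold: float = 0.8) -> list[str]:
    """Drop near-duplicates, keeping the earliest (best-ranked) chunk of each group.

    `chunks` is assumed to be ordered best-first, so rank position acts as
    the tiebreaker between near-duplicates.
    """
    kept: list[str] = []
    for chunk in chunks:
        if all(jaccard(chunk, earlier) < threshold for earlier in kept):
            kept.append(chunk)
    return kept
```

Because the input is scanned in rank order, the surviving member of a near-duplicate pair is always the one ranked higher.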
- `CorpusBackend.find_facts` is now keyword-only and accepts a `status` filter (defaults to `"active"`). Pass `status=None` to include every lifecycle state.
- `CorpusBackend` gains `update_fact(fact_id, properties)` for in-place patches that should not create supersession history.
- `LLMProvider.complete` and `extract` accept a `cache_breakpoints` sequence of message indices. Providers that don't support prompt caching ignore the field.