Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog 1.1.0, and this project adheres to Semantic Versioning 2.0.0.

While the project is pre-1.0, breaking changes may land on minor version bumps and will be called out under ### Changed with a migration note.

Added (v0 KG-hybrid foundation — shipped 2026-05-19 → 2026-05-22)

  • Retrieval — IRCoT 2-round (--ircot). Iterative retrieve-and-reason: round 1 retrieves + reader emits CoT; round 2 re-retrieves with the thought as augmented query; both rounds' chunks fused before the final reader call. +0.09 F1 / +0.07 EM on MuSiQue at gpt-4o-mini reader, n=200. Opt-in, default off. Code: _answer_with_reader_raw, _extract_thought_span, IRCoT path in benchmarks/runner.py. Promoted to library API in v0.1.0.
  • Retrieval — KG-hybrid path (--kg-retrieval). RRF-fused triple-vector ANN match + hub-weighted two-stage PPR + multi-hop confidence-decayed beam search, resolved to source chunks via the mutual chunk_facts index, RRF-fused with hybrid chunks, reranked, deduped. Lives in benchmarks/retrieval.py:kg_hybrid_neighbors.
  • Two-stage PPR (PropRAG) in engram.core.kg_retrieval.two_stage_ppr_facts. Stage 1 broad spread at α=0.75 from hub-weighted seeds; Stage 2 narrow focus at α=0.45 re-seeded from Stage 1's top entities. +0.032 F1 vs single-stage on synth_off / no-IRCoT. Replaces single-stage ppr_facts in production kg_hybrid_neighbors.
  • Multi-hop beam search in engram.core.kg_retrieval.beam_search_facts. Confidence-decayed walk over the in-memory fact graph: per-hop fan-out cap, min edge confidence, hub-aware fan-out for high-degree nodes, frontier expansion gated by path_confidence_floor. Ports Vrin's find_facts_multi_hop to networkx.
  • Triple-vector ANN match via MemoryBackend.neighbors_facts — second hnswlib index over (s, p, o) triple embeddings, populated automatically by upsert_facts.
  • KG storage layer in MemoryBackend — 6 new LMDB sub-databases (entity_by_name, entity_aliases, entity_degree_index, chunk_facts, fact_vectors, fact_vector_label), second hnswlib index for fact triples, in-memory networkx.MultiDiGraph mirror of stored facts (only active facts; rebuilt at backend open; updated incrementally on every upsert/update/supersede). Detail in docs/kg-internals.md.
  • Entity resolution at write (MemoryBackend.resolve_or_create_entity, MemoryBackend.resolve_entities_batch). Normalize → exact LMDB lookup → entity_aliases redirect → fuzzy SequenceMatcher ≥ 0.95 → create new. Ingest-time threshold is strict (0.95) since false positives merge entities permanently; query-time fallback uses 0.80.
  • EntityRecord Pydantic model in engram.core.models: canonical_name, aliases, entity_type, mention_count, first_seen_at.
  • Alias persistence in the entity_aliases LMDB sub-db; LLM-emitted aliases on ExtractedEntity are persisted automatically.
  • Pronoun / coreference resolution in extraction prompts. _ENTITY_SYSTEM_PROMPT explicitly forbids pronouns / generic refs as entities. _FACT_SYSTEM_PROMPT has explicit pronoun→entity resolution instructions with two worked examples (Example 6: "She announced..." → "Sarah Martinez announced..."; Example 7: "It stands..." and "The landmark..." both → "Eiffel Tower"). Pure prompt approach — no regex coref.
  • Literal value sink (engram.core.entities.is_literal_value). Drops facts whose object is a number / year / percentage / date / ratio before they reach the KG. Prevents hundreds of facts sharing object "2024" or "15%" from collapsing onto a shared node and blowing up PPR propagation.
  • Mutual chunk↔fact indexing. MemoryBackend.upsert_facts auto-mirrors Fact.source_chunk_ids into the chunk_facts LMDB sub-db using composite keys (chunk_id|fact_id). Reverse direction is a single cursor sweep. Read API: MemoryBackend.get_facts_for_chunk.
  • Fact triple embedding (Phase 4.9). MemoryBackend.upsert_facts embeds each fact's (s, p, o) triple with the same embedder used for chunks and writes to the fact_vectors LMDB sub-db, the second hnswlib index, and the fact_vector_label LMDB sub-db. Cold-path Pass C restructured to batch the upsert post-gather so one embedding API call covers all new facts per ingest cycle.
  • Phase 1 Jaccard dedup post-rerank. engram.core.scoring.deduplicate_chunks is now wired into hybrid_neighbors and graph_aware_neighbors after Cohere rerank, before the context budget cap. Free quality improvement; uses rerank position as the dedup tiebreaker.
  • Cohere Rerank 3.5 integration via AWS Bedrock (cohere.rerank-v3-5:0). Default on (--rerank); credentials from the boto3 chain.
  • enable_synthesis flag. Opt out of the per-chunk synthesis hot path via Engram(enable_synthesis=False) or --disable-synthesis. Saves ~$0.30/1K chunks + ~4 min/1K. Per ablation: synthesis contributes +0.04 F1 only when KG retrieval is also enabled; otherwise pure overhead.
  • 8 entity / retrieval helpers in engram.core.entities and engram.core.retrieval: normalize_entity_name, entities_match_fuzzy, case_variants, is_literal_value, TraversalConfig, extract_frontier_entities, merge_fact_strategies (RRF fact fusion), dynamic_chunk_cutoff (CAR cluster-gap).
  • Confidence floor tightened from 0.6 to 0.7 (MIN_FACT_CONFIDENCE) — Vrin parity, prevents low-confidence facts from polluting PPR propagation.
  • Pinned MuSiQue benchmark fixtures (benchmarks/fixtures/musique_n100_seed1_ids.json and musique_n200_seed1_ids.json) for reproducible eval.
  • scipy>=1.11 added to the memory extra (required by networkx.pagerank for sparse-matrix power iteration). networkx>=3.2 was already added in the Phase 3 commit.
  • Documentation — full docs/ tree with architecture.md, benchmarks.md, configuration.md, kg-internals.md, llm-provider.md, and conceptual deep-dives for IRCoT, KG retrieval, synthesis + extraction, and bi-temporal supersession.
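
The write-time entity resolution order listed above (normalize, exact lookup, alias redirect, fuzzy match at 0.95, create new) can be illustrated with a small sketch. This is not the MemoryBackend implementation: ToyEntityStore, the normalize rule, the id format, and the dict-backed tables standing in for the LMDB sub-databases are all assumptions made for illustration. Only the step order and the 0.95 threshold come from the entry above.

```python
from difflib import SequenceMatcher

INGEST_FUZZY_THRESHOLD = 0.95  # strict write-time threshold from the changelog entry


def normalize(name: str) -> str:
    """Hypothetical normalization rule: lowercase and collapse whitespace."""
    return " ".join(name.lower().split())


class ToyEntityStore:
    """Dict-backed stand-in for the entity_by_name / entity_aliases sub-dbs."""

    def __init__(self) -> None:
        self.by_name = {}  # normalized canonical name -> entity id
        self.aliases = {}  # normalized alias -> normalized canonical name

    def resolve_or_create(self, name: str) -> tuple:
        """Return (entity_id, step), where step names the stage that matched."""
        key = normalize(name)

        # 1. Exact lookup on the normalized name.
        if key in self.by_name:
            return self.by_name[key], "exact"

        # 2. Alias redirect to a canonical name.
        canonical = self.aliases.get(key)
        if canonical is not None and canonical in self.by_name:
            return self.by_name[canonical], "alias"

        # 3. Fuzzy match against every known name; keep the best candidate.
        best_name, best_ratio = None, 0.0
        for known in self.by_name:
            ratio = SequenceMatcher(None, key, known).ratio()
            if ratio > best_ratio:
                best_name, best_ratio = known, ratio
        if best_name is not None and best_ratio >= INGEST_FUZZY_THRESHOLD:
            return self.by_name[best_name], "fuzzy"

        # 4. Nothing matched: create a new entity (id format is made up here).
        entity_id = f"ent-{len(self.by_name) + 1}"
        self.by_name[key] = entity_id
        return entity_id, "created"


store = ToyEntityStore()
first = store.resolve_or_create("Eiffel Tower")
same = store.resolve_or_create("  eiffel   TOWER ")
store.aliases[normalize("La Tour Eiffel")] = normalize("Eiffel Tower")
via_alias = store.resolve_or_create("La Tour Eiffel")
ibm = store.resolve_or_create("International Business Machines")
near = store.resolve_or_create("International Business Machine")
other = store.resolve_or_create("Paris")
```

The linear scan in step 3 is only reasonable for a toy store. The strict threshold reflects the trade-off stated in the entry: a false positive at ingest merges two entities permanently, whereas the looser 0.80 query-time fallback only affects a single lookup.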

Removed / not shipped (ablated and proven net-negative)

These were implemented on feature branches, ablated against the production v1 config (baseline + IRCoT), and removed because they regressed F1:

  • Sufficiency judge: Self-RAG / SURE-RAG / L-MARS Judge Agent pattern. F1 regressed by 0.04 to 0.13 when stacked on baseline + IRCoT.
  • CRAG-style chunk filter: Corrective RAG (arXiv:2401.15884) post-rerank LLM relevance filter. F1 regressed by 0.13 when combined with the sufficiency judge.
  • Tested but kept opt-in / off-by-default: --multi-query, --decompose — measured to regress on top of IRCoT at n=100 and n=200.

Original Phase 1 scaffold (pre-2026-05-19)

  • Initial repository scaffold: pyproject.toml (hatchling), ruff and pytest configuration, source tree under src/engram/ with sub-packages for core, dialogue, backends, llm, caching, and observability.
  • Core data models in engram.core.models: Chunk, EnrichedChunk, Fact, Contradiction, CrossReference, and the SourceType, FactType, and RelationKind literal aliases.
  • Core protocols in engram.core.protocol: Enricher, CorpusBackend, LLMProvider, VectorStore, all runtime_checkable and async-first.
  • GitHub Actions CI: lint and test matrix on Python 3.11 and 3.12 plus one macOS cell; PyPI publish on v* tags via the trusted-publisher OIDC flow.
  • Apache 2.0 license, contributor docs (CONTRIBUTING.md, CODE_OF_CONDUCT.md, SECURITY.md), issue and PR templates, and CITATION.cff.
  • engram.dialogue.temporal: bi-temporal conflict detection ported from the Vrin temporal_consistency_manager. Public surface: detect_conflict, batch_detect_conflicts, select_valid_at, apply_resolution, and the FactConflict dataclass.
  • engram.dialogue.inference: ReasoningChain and CrossDocumentPattern Pydantic models seeding the multi-hop inference output. LLM-driven chain construction lands in a later release.
  • engram.core.scoring: jaccard_similarity, deduplicate_chunks, and score_chunk_relevance ported from Vrin's chunk-filter pipeline.
  • engram.core.protocol.Embedder: split out of VectorStore so callers can mix local Ollama embeddings with a hosted vector index. Takes a kind="document"|"query" parameter for asymmetric models like Cohere v3 and Voyage. VectorStore is now pure ANN over precomputed vectors.
  • engram.llm.embedders.LiteLLMEmbedder: routes through LiteLLM for OpenAI, Cohere, Voyage, Bedrock, Anthropic, and any other provider in the LiteLLM matrix. Maps kind to the right provider-specific input_type automatically.
  • engram.llm.embedders.OllamaEmbedder: hits a local Ollama server via the official ollama Python client for fully-local mode.
  • engram.backends.memory.MemoryBackend: LMDB + hnswlib backed CorpusBackend. The zero-config default. Persists to disk; composite-key indexes on s, p, sp, o, st axes; async wraps sync via asyncio.to_thread. Reseats hnsw labels from zero on reopen.
  • engram.llm.litellm_provider.LiteLLMProvider: single LLMProvider composing LiteLLM for routing plus Instructor for Pydantic-validated structured extraction plus tenacity for retries. First-class Anthropic prompt-cache support via the cache_breakpoints parameter.
  • engram.observability.tracing: OpenTelemetry + OpenInference span scaffolding. backend_span and llm_span async context managers with TOOL / LLM OpenInference kinds, engram.cache_breakpoint_count attribute, and a no-op tracer fallback when opentelemetry-api is not installed.
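
As a rough illustration of the engram.core.scoring helpers listed above, the sketch below shows one common way to do Jaccard-based deduplication over a ranked chunk list. The function names, the whitespace tokenizer, and the 0.8 threshold are assumptions; the actual signatures of jaccard_similarity and deduplicate_chunks are not given in this changelog and may differ.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercase whitespace tokens (assumed tokenizer)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty texts are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def dedupe_ranked(chunks: list, threshold: float = 0.8) -> list:
    """Walk chunks in rank order; drop any chunk too similar to one already kept.

    Because earlier positions are visited first, the higher-ranked chunk of a
    near-duplicate pair is the one that survives.
    """
    kept = []
    for chunk in chunks:
        if all(jaccard(chunk, earlier) < threshold for earlier in kept):
            kept.append(chunk)
    return kept


ranked = [
    "the eiffel tower is in paris",
    "The Eiffel Tower is in Paris",  # same token set as the first chunk
    "the louvre is in paris",
]
unique = dedupe_ranked(ranked)
```

Iterating in rank order is what makes rank position act as the tiebreaker: no separate comparison of scores is needed.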

Changed

  • CorpusBackend.find_facts is now keyword-only and accepts a status filter (defaults to "active"). Pass status=None to include every lifecycle state.
  • CorpusBackend gains update_fact(fact_id, properties) for in-place patches that should not create supersession history.
  • LLMProvider.complete and extract accept a cache_breakpoints sequence of message indices. Providers that don't support prompt caching ignore the field.
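
For callers migrating to the keyword-only find_facts, the following sketch shows the calling convention on a toy in-memory backend. ToyBackend, ToyFact, and the subject parameter are hypothetical stand-ins; the real CorpusBackend is async and its full parameter list is not given here. Only the keyword-only rule and the status default of "active" (with status=None meaning every lifecycle state) come from the entries above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyFact:
    subject: str
    status: str  # lifecycle state, e.g. "active" or "superseded"


class ToyBackend:
    """Synchronous toy stand-in used only to show the calling convention."""

    def __init__(self, facts: list) -> None:
        self._facts = facts

    def find_facts(self, *, subject: Optional[str] = None,
                   status: Optional[str] = "active") -> list:
        """Keyword-only lookup; status=None disables the lifecycle filter."""
        return [
            fact for fact in self._facts
            if (subject is None or fact.subject == subject)
            and (status is None or fact.status == status)
        ]


backend = ToyBackend([
    ToyFact("eiffel tower", "active"),
    ToyFact("eiffel tower", "superseded"),
])

active_only = backend.find_facts(subject="eiffel tower")
all_states = backend.find_facts(subject="eiffel tower", status=None)

# Positional arguments are rejected by the bare * in the signature.
try:
    backend.find_facts("eiffel tower")
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

Call sites that passed arguments positionally need to switch to keywords, and any code that relied on seeing superseded facts needs an explicit status=None.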