EverAlgo — AI Coding Assistant Context Guide

Canonical source of truth. This file (AGENTS.md) is the single source of truth for AI coding assistant context. CLAUDE.md (Claude Code) and .cursorrules (Cursor) are symlinks to it. Pattern adopted from Mirascope and scikit-learn.

If you are an AI assistant (Claude Code / Cursor / Copilot / Codex / …) onboarding to this repository, read this file first, then docs/concepts/architecture.md for the full architecture before touching code.


1. Project Identity

EverAlgo is an algorithm library for memory extraction and retrieval — not a service, not a framework.

  • Algorithm-only. All memory extraction / fusion / re-ranking strategies live here. The library is stateless: it does not connect to databases, does not read or write the filesystem, does not own any business state.
  • Two paths. Every operator belongs to one of two read/write paths whose contracts are symmetric (stateless, in-memory I/O):
    • Extract — write path. Input: structured units (e.g. MemCell). Output: structured memories (Episode / Profile / Case / Skill / …).
    • Retrieve — read path. Input: query + caller-injected RetrieveFn / RerankFn callables. Output: a ranked memory list. The everalgo-rank package serves as both the retrieval facade and the underlying ranking toolkit: it exposes four strategies — hybrid (dual-route RRF), agentic (LLM-guided sufficiency + multi/refined query wrapper over any base), cluster (cluster-based recall expansion), and maxsim (MaxSim nearest-neighbour reranking) — alongside the lower-level ranking primitives (rrf / lr / vector_anchored fusion, weight helpers, LLM-based rerank). Caller binds storage / model clients inside its RetrieveFn / RerankFn; algo never touches persistence.
  • Orchestration is upstream. When to call, in what order, with what concurrency, persistence to the markdown filesystem — all owned by EverOS. EverAlgo does not care whether the caller is open-source or cloud commercial; both paths share this code.
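
As an illustration of the Retrieve contract, here is a minimal sketch of dual-route fusion with caller-injected callables. The `RetrieveFn` alias, `ahybrid_retrieve`, and the two route functions are hypothetical shapes written for this example; the actual signatures in everalgo-rank may differ. The fusion step is the standard Reciprocal Rank Fusion formula, score(d) = Σ 1 / (k + rank).

```python
import asyncio
from collections import defaultdict
from collections.abc import Awaitable, Callable, Sequence

# Hypothetical shape: the real RetrieveFn signature in everalgo-rank may differ.
RetrieveFn = Callable[[str], Awaitable[Sequence[str]]]


def rrf(rankings: Sequence[Sequence[str]], k: int = 60) -> list[str]:
    """Fuse ranked id lists with Reciprocal Rank Fusion (pure compute, so sync)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


async def ahybrid_retrieve(query: str, routes: Sequence[RetrieveFn], k: int = 60) -> list[str]:
    """Await every caller-injected route concurrently, then fuse the results in memory."""
    rankings = await asyncio.gather(*(route(query) for route in routes))
    return rrf(rankings, k=k)


# The caller binds storage inside its callables; these in-memory stand-ins play that role.
async def keyword_route(query: str) -> list[str]:
    return ["m1", "m2", "m3"]


async def vector_route(query: str) -> list[str]:
    return ["m3", "m1", "m4"]


ranked = asyncio.run(ahybrid_retrieve("where did we meet?", [keyword_route, vector_route]))
```

The operator never sees a database handle: everything it knows about storage arrives through the injected callables, which is what keeps the library stateless.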

For the rationale and deeper background, read docs/concepts/architecture.md.


2. Repository Layout

everalgo/                              # monorepo, uv virtual workspace
├── pyproject.toml                     # workspace root, [tool.uv] package = false
├── uv.lock                            # generated by `uv sync` — do not hand-edit
├── AGENTS.md  ← you are here          # CLAUDE.md and .cursorrules are symlinks
├── README.md
├── LICENSE                            # Apache-2.0
├── .gitignore  .gitlab-ci.yml  cliff.toml  .pre-commit-config.yaml
├── docs/
│   ├── concepts/                      # high-level architecture notes
│   └── api/                            # API reference (per-distribution)
├── examples/                          # runnable quickstart scripts (01–07, use FakeLLMClient)
├── packages/
│   ├── everalgo-core/                 # types, llm (+ providers), prompts, testing
│   ├── everalgo-boundary/             # detect_boundaries + DetectionResult + workspace stub
│   ├── everalgo-clustering/           # cluster_by_geometry / cluster_by_llm over list[Cluster]
│   ├── everalgo-rank/                 # 4 rankers + fusion / weight / rerank toolkit
│   ├── everalgo-parser/               # multimodal raw-file → ParsedContent (EXPERIMENTAL stub)
│   ├── everalgo-user-memory/          # BoundaryDetector + Episode / Foresight / AtomicFact / Profile
│   ├── everalgo-agent-memory/         # AgentBoundaryDetector + AgentCase / AgentSkill / AgentProfile
│   └── everalgo-knowledge/            # KnowledgeExtractor + aclassify_category (file-based knowledge extraction)
├── benchmarks/                        # internal LoCoMo benchmark suite ([tool.uv] package = false, not published)
└── tests/

Eight distributions share the everalgo.* namespace through PEP 420 native namespace packages: every packages/*/src/everalgo/ directory deliberately omits __init__.py, while subpackages (everalgo/<subpkg>/__init__.py) are regular packages. This is the PyPA-recommended layout for Py3-only + pip-installed projects, and it lets from everalgo.user_memory import EpisodeExtractor work even when everalgo-user-memory and everalgo-boundary are installed from different distributions. Industrial precedents: google-cloud-* (100+ dists sharing google.cloud.*) and sphinxcontrib-* (6 official Sphinx-extension dists sharing sphinxcontrib.*).
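
The namespace mechanics can be reproduced in isolation. The sketch below builds two throwaway source roots that share a hypothetical namespace called `nsdemo` (a stand-in for `everalgo`, chosen so the demo cannot clash with a real install) and shows that both subpackages import even though no `nsdemo/__init__.py` exists.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Two throwaway "distributions" sharing the hypothetical namespace `nsdemo`.
root = Path(tempfile.mkdtemp())
for dist, subpkg in [("dist-a", "alpha"), ("dist-b", "beta")]:
    pkg_dir = root / dist / "src" / "nsdemo" / subpkg
    pkg_dir.mkdir(parents=True)
    # The subpackage is a regular package: it has an __init__.py.
    (pkg_dir / "__init__.py").write_text(f"NAME = {subpkg!r}\n")
    # The namespace directory `nsdemo/` itself deliberately has no __init__.py.
    sys.path.insert(0, str(root / dist / "src"))

importlib.invalidate_caches()
alpha = importlib.import_module("nsdemo.alpha")
beta = importlib.import_module("nsdemo.beta")
namespace = importlib.import_module("nsdemo")

# The namespace package's __path__ spans both source roots.
namespace_roots = sorted(Path(p).parent.parent.name for p in namespace.__path__)
```

If either source root gained an `nsdemo/__init__.py`, that directory would become a regular package and shadow the other one, which is why the guide insists the file be omitted at the namespace level.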

The dev workflow is built on a uv virtual workspace ([tool.uv] package = false at the root, members under packages/*). Same shape: Apache Airflow (100+ workspace members, single root lockfile) and pydantic-ai. Note these two projects are uv-workspace references only — Airflow's airflow.providers.* is pkgutil-style legacy namespace, not PEP 420; pydantic-ai uses three independent namespaces, not one shared. LangChain and LlamaIndex are referenced for the monorepo layout only — neither uses uv workspace itself; they keep per-package venvs and lockfiles.

Dependency topology (see docs/concepts/architecture.md for the full graph and rationale):

                                everalgo-core
                                     ▲
       ┌────────────┬────────────┬──┴───────────┐
       │            │            │              │
   boundary    clustering        rank         parser
       ▲            ▲                            ▲
       │            │                            │
   user-memory  agent-memory                 knowledge

Edges (arrow → dependency; every package also depends on core):
  user-memory  → boundary
  agent-memory → boundary, clustering
  knowledge    → parser

3. Quick Start

# Prerequisites: Python 3.12+ and uv (https://docs.astral.sh/uv/).
git clone git@github.com:EverMind-AI/EverAlgo.git
cd EverAlgo

# Install all workspace packages editable into a shared venv (includes dev tools).
uv sync --all-packages --group dev

# Run tests across the workspace.
uv run pytest

# Lint + format checks.
uv run ruff check .
uv run ruff format --check .

# Type check (both checkers — they catch different things).
uv run mypy .
uv run pyright

Working on a single package? Sync only that package's dependencies:

uv sync --package everalgo-clustering
uv run pytest packages/everalgo-clustering/tests/

Try any operator offline using the bundled examples (no API key required):

uv run python examples/01_boundary_chat.py          # Chat → MemCell
uv run python examples/03_user_memory_episode.py    # MemCell → Episode
uv run python examples/04_agent_memory_case.py      # Agent trajectory → AgentCase
uv run python examples/06_full_user_memory_pipeline.py   # Full pipeline

Pre-commit hook (required)

The repo ships a .pre-commit-config.yaml that runs ruff check --fix + ruff format + a set of standard sanitisers (trailing whitespace, EOF newline, merge-conflict markers, large files, line endings, YAML / TOML syntax) on every commit. This matches the workflow used by sklearn, pydantic, dspy, langchain, pandas, numpy.

Install + verify after every clone (this is per-clone state, NOT stored in the repo — every new clone / new dev machine starts with hooks disabled):

uv sync --all-packages --group dev   # pulls pre-commit into the workspace venv
uv run pre-commit install            # creates .git/hooks/pre-commit
ls -la .git/hooks/pre-commit         # MUST exist and be executable

If the third command shows No such file or directory, the install step silently failed and every git commit will silently bypass lint. Fix before doing any work.

Common pitfall: --all-files ≠ hook installed

uv run pre-commit run --all-files is a manual invocation. It validates that the hook configuration is healthy but says nothing about whether git commit will actually trigger it. The hook only fires automatically when .git/hooks/pre-commit exists and is executable.

This trap is real: running --all-files and seeing "9/9 Passed" can mask a missing hook for an entire sprint, while every git commit quietly bypasses lint and surfaces only when CI rejects a violation that the hook would have caught locally. Always ls .git/hooks/pre-commit after install to verify.
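
The same check can be scripted. The helper below is a hypothetical convenience, not something the repository ships; it only encodes the condition stated above, namely that `.git/hooks/pre-commit` must exist and be executable. The demonstration runs against a temporary directory standing in for a fresh clone.

```python
import os
import tempfile
from pathlib import Path


def hook_is_installed(repo_root: Path) -> bool:
    """Return True only if .git/hooks/pre-commit exists as a file and is executable."""
    hook = repo_root / ".git" / "hooks" / "pre-commit"
    return hook.is_file() and os.access(hook, os.X_OK)


# Demonstration on a throwaway directory standing in for a fresh clone.
fake_repo = Path(tempfile.mkdtemp())
(fake_repo / ".git" / "hooks").mkdir(parents=True)
before_install = hook_is_installed(fake_repo)

# Simulate what `pre-commit install` leaves behind: an executable hook script.
hook_path = fake_repo / ".git" / "hooks" / "pre-commit"
hook_path.write_text("#!/bin/sh\nexit 0\n")
hook_path.chmod(0o755)
after_install = hook_is_installed(fake_repo)
```

In a real clone, calling `hook_is_installed(Path.cwd())` from the repository root gives the same answer as the `ls` check.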

Common usage

Run against the whole tree before opening an MR (catches anything you might have committed before the hook was installed):

uv run pre-commit run --all-files

Update pinned hook versions periodically:

uv run pre-commit autoupdate

What's deliberately NOT in pre-commit

  • mypy / pyright — strict type-checks over the 8-package PEP 420 workspace each take several seconds per run and would make commit feel sluggish; enforced by CI instead (pydantic / sklearn / openai-python / anthropic-sdk-python do the same).
  • pytest — same reason. CI is the gate.

Editor integration (recommended)

Pre-commit fires at commit time. For per-keystroke feedback, install the ruff editor plugin too:

  • VSCode / Cursor: install the Ruff extension. Enable format-on-save so the editor runs ruff check --fix and ruff format automatically.
  • PyCharm / IntelliJ: install the Ruff plugin.
  • Vim / Neovim: configure ruff through your LSP setup (e.g. ruff-lsp or built-in LSP via nvim-lspconfig).

The CI pipeline (.gitlab-ci.yml) re-runs ruff check . + ruff format --check . + mypy . + pyright on every MR as the load-bearing fallback. Both type-checkers run because they catch slightly different things — same dual-checker setup used by openai-python and anthropic-sdk-python. Pre-commit and editor coverage are about latency of feedback; CI is the gate.


4. Common Commands

| Action | Command |
| --- | --- |
| Install workspace (editable) | uv sync --all-packages --group dev |
| Run all tests | uv run pytest |
| Run a specific test | uv run pytest path/to/test.py::test_name -v |
| Lint | uv run ruff check . |
| Format | uv run ruff format . |
| Type-check (mypy) | uv run mypy . |
| Type-check (pyright) | uv run pyright |
| Build a single distribution | cd packages/everalgo-core && uv build |
| Add a runtime dep to one package | uv add --package everalgo-clustering numpy |
| Add a dev tool to the workspace | uv add --group dev pytest-asyncio |

Reference: uv workspace documentation.


5. Code Style

The full rationale lives in docs/concepts/architecture.md. Hard rules:

  • Naming contract — a prefix means async. Methods named aextract / arank / adetect / aparse are native async (do real I/O — LLM, network, …); call them with await. Methods without the a prefix (rank, extract, count_tokens, rrf, …) are sync (pure compute, no I/O); call them directly. Same convention as dspy.acall / litellm.acompletion / instructor.AsyncInstructor. The one exception is LLMClient.chat (a caller-injected client Protocol, not an operator): it is async without the a prefix, mirroring the OpenAI SDK client interface.
  • I/O operators: async-first + sync bridge. Native async via asyncio; sync version is derived through asgiref.async_to_sync for non-event-loop callers (CLI scripts, plain unit tests). Never call the sync bridge from inside a running event loop.
  • Pure-compute operators: sync only. No async wrapper for fusion.rrf, _tokenize.count_tokens, clustering distances, etc. Mirrors numpy / scipy / sklearn / pandas conventions.
  • Prompts as Python string modules. Concrete prompt strings live in <subpkg>/prompts/{en,zh}/<name>.py as module-level constants. Editing a prompt = editing a .py file. No external .md / .yaml / .toml prompt stores. Caller customisation: per-call prompt= argument (fine-grained) or monkey-patching the module constant at startup (coarse-grained).
  • Protocol for typing, not ABC. EverAlgo operators are stateless; implementations do not need to subclass anything.
  • No dependency injection in algorithm code. Module-level functions + global config + monkeypatch in tests. Algorithm authors should be one keystroke away from running their code; do not impose framework ceremony.
  • Sync bridge for I/O operators: write extract = async_to_sync(aextract) one-liner; do not introduce a DualInterface mixin. This keeps type inference predictable, avoids metaclass magic. The async_to_sync helper comes from asgiref.sync.
  • Lint configuration. Workspace-wide ruff is configured in the root pyproject.toml (line-length = 120, target version inferred from requires-python = ">=3.12", rule set derived from the pytorch + pydantic-ai intersection). Google-style docstrings — aligns with Google Python Style Guide. Args: / Returns: / Raises: sections, no type repetition in the body (type annotations in the signature are authoritative).
  • Logging discipline. On the LLM / I/O path use logger = logging.getLogger(__name__) with lazy %-format (logger.debug("count=%d", n) — never f-strings inside log calls) and logger.exception(...) inside except blocks. For user-behaviour problems and deprecations use warnings.warn(..., stacklevel=2); for pure-algorithm errors raise ValueError(...) with a detailed message (numpy style — shapes (3,4) and (5,6) not aligned, etc.). Every public subpackage __init__.py already attaches a NullHandler; everalgo.llm carries a default-on SensitiveHeadersFilter. Forbidden in library code: logging.basicConfig, addHandler (anything but NullHandler), setLevel, explicit propagate = True/False, and any module-level logging.warning(...) / logging.error(...) / logging.getLogger() (no-arg) / logging.root.* — these all target the root logger and are an application's job. Forbidden in DEBUG logs: request / response bodies, prompt text, model outputs (the Filter only redacts headers; bodies leak PII the Filter cannot see). Performance timing is the user's job (cProfile / line_profiler / %timeit); the library does not log durations. ruff rule sets G + LOG + TRY enforce these at lint time.
  • Use the full line-length = 120 budget when hand-wrapping. For Python comments / docstrings and TOML/YAML comments, fill each line to roughly 100–115 characters before wrapping — do not pre-wrap at 70 / 79 / 80 / 88 / 100 out of habit. E501 is in the ignore list, and ruff never flags lines that are too short, so this is a writer's discipline rather than a lint check. A 3-line comment that collapses cleanly into 2 lines at 120 should be 2 lines. Exceptions: bullet lists, code blocks, and any line where a natural break aids comprehension. Markdown files in this repo deliberately use the one-paragraph-per-line (no hard-wrap) style — the same convention Prettier emits by default and GitHub renders cleanly — so .md prose is exempt from the 100–115 rule entirely; rely on editor soft-wrap.
  • English only in code, config, and commit messages. All Python code, comments, identifiers, pyproject.toml comments, CI files, and commit messages must be English. The same rule that EverOS enforces with a pre-commit hook applies here. Content under docs/ must be English as well.
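
Several of the rules above (the `a` prefix, Protocol typing, the one-line sync bridge, lazy %-format logging) fit in one short sketch. The `LLMClient` Protocol here is a deliberately reduced, hypothetical version of the one in everalgo.llm.protocols, and `EchoClient` is invented for the example. The `asyncio.run` fallback is only a stand-in so the snippet runs where asgiref is not installed; the guide itself prescribes `asgiref.sync.async_to_sync`.

```python
import asyncio
import logging
from typing import Protocol

try:
    from asgiref.sync import async_to_sync  # the bridge the guide prescribes
except ImportError:  # stand-in so this sketch runs without asgiref installed
    def async_to_sync(afunc):
        def wrapper(*args, **kwargs):
            return asyncio.run(afunc(*args, **kwargs))
        return wrapper

logger = logging.getLogger(__name__)


class LLMClient(Protocol):
    """Hypothetical minimal Protocol; the real one lives in everalgo.llm.protocols."""

    async def chat(self, prompt: str) -> str: ...


class EchoClient:
    """Satisfies LLMClient structurally, with no subclassing."""

    async def chat(self, prompt: str) -> str:
        return prompt.upper()


async def aextract(client: LLMClient, text: str) -> str:
    """Native async operator: the `a` prefix signals real I/O."""
    logger.debug("chars=%d", len(text))  # lazy %-format, and no prompt text in the log
    return await client.chat(text)


# Sync bridge as a one-liner; never call it from inside a running event loop.
extract = async_to_sync(aextract)

result = extract(EchoClient(), "hello")
```

Because `extract` is a plain module-level assignment, type checkers and readers see exactly one implementation, which is the predictability the guide cites for avoiding a mixin.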

6. Branching & Commits

Branching: trunk-based (see DSPy / scikit-learn / instructor / pydantic — the four reference Python algorithm libraries all do this; no GitFlow).

  • main is the only long-lived branch. It is GitLab-protected (Settings → Repository → Protected branches): direct push is denied for everyone; the only path to land changes on main is via Merge Request.
  • Feature work happens on short-lived branches: feat/<topic>, fix/<bug>, docs/<topic>, refactor/<topic>. Open an MR → squash-merge into main.
  • Release = tag on main using SemVer per distribution: everalgo-clustering/v0.2.0. Each distribution has its own version cadence — the independent-versioning model used by google-cloud-python (many distributions sharing google.cloud.*, each on its own tag) and Apache Airflow providers; see docs/concepts/architecture.md and README.md "Cutting a release".
  • Maintenance branches (0.1.X-fixes) are introduced only when a published version needs back-ports; not by default.

Commit messages: Gitmoji + Conventional Commits. Format: <emoji> <type>(<scope>): <description>.

✨ feat(clustering): add cluster_by_llm decision prompt zh variant
🐛 fix(boundary): correct token count for emoji-only chat segments
♻️ refactor(rank): extract shared fusion helper from case / skill rankers
✅ test(user-memory): cover EpisodeExtractor tail-merge edge case
📝 docs(design): clarify cluster_previews shape

Allowed types: feat, fix, docs, style, refactor, perf, test, build, ci, chore, revert.

MR title is load-bearing. GitLab is configured (Settings → Merge Requests → Squash commit template = %{title}) so the MR title lands verbatim as the squash commit on main. MR titles must match the format above, because the release-notes generator (git cliff, see cliff.toml + README.md "Cutting a release") parses these messages to assemble per-distribution CHANGELOGs.

Scope = distribution name without the everalgo- prefix. Use clustering / rank / core / boundary / parser / user-memory / agent-memory / knowledge. For cross-cutting changes (CI, monorepo tooling, root docs), use ci / release / repo / design / docs as the scope or omit the scope entirely.

Squashing matters for per-distribution filtering. git cliff --include-path 'packages/everalgo-<name>/**' filters commits by changed paths. Squash merges keep one commit = one MR = one scoped Conventional-Commit message, which is the unit git-cliff groups by.

CHANGELOG [Unreleased] entry is part of the MR. Every MR that adds, changes, or removes user-visible behaviour must include a one-line entry in packages/everalgo-<dist>/CHANGELOG.md under ## [Unreleased]. Written by the MR author (the person with the most context), not reconstructed at release time. See docs/releasing.md "Keeping [Unreleased] up to date" for format and scope rules.


7. Adding a New Algorithm Operator

Follow this checklist when introducing a new extractor / ranker / clusterer:

  1. Pick the subpackage. Decide which packages/everalgo-<dist>/src/everalgo/<subpkg>/ the operator lives in based on its product axis (user_memory / agent_memory / knowledge) or tool axis (boundary / clustering / rank / parser). When in doubt, read docs/concepts/architecture.md.
  2. Create the module. <subpkg>/<operator>.py — module-level functions or one stateless class. Operators need not subclass anything; if an operator consumes an injected client, type it against the relevant Protocol where that Protocol is defined (e.g. everalgo.llm.protocols.LLMClient, everalgo.rank.protocols).
  3. Write the prompt(s). If the operator calls an LLM, drop prompt strings as module-level constants in <subpkg>/prompts/en/<operator>.py (and zh/<operator>.py for the Chinese variant when applicable).
  4. Re-export the public surface. If the operator is part of the public API of its facade subpackage, add it to <subpkg>/__init__.py's re-export block and __all__. See docs/concepts/architecture.md for the re-export pattern.
  5. Wire dependencies. If the new code requires a new third-party library, add it via uv add --package everalgo-<dist> <library>, which updates the right pyproject.toml.
  6. Write tests. Use everalgo.testing.FakeLLMClient to avoid real API calls; use everalgo.testing.assert_*_shape for structural memory checks.
  7. Update the CHANGELOG. Add a one-line entry under ## [Unreleased] in packages/everalgo-<dist>/CHANGELOG.md describing the new operator (subsection ### Added). See docs/releasing.md for format.
  8. Run lint + format + type-check + tests locally before raising the MR (uv run ruff check . && uv run ruff format --check . && uv run mypy . && uv run pyright && uv run pytest).

8. Adding a New LLM Provider

Providers live inside everalgo-core under everalgo/llm/providers/ (per ADR 004: providers are nested in llm, not shipped as a separate distribution; the convention follows litellm / instructor / dspy / llama-index).

  1. Create everalgo/llm/providers/<provider>.py.
  2. Implement the LLMClient Protocol from everalgo.llm.protocols — a single async def chat(...) -> ChatResponse method (no sync variant, no streaming).
  3. Wire the provider into everalgo/llm/factory.py::build_client (it currently constructs OpenAICompatClient directly; add provider selection there — there is no separate routing.py).
  4. Map provider-native exceptions onto LLMError (chain via raise LLMError(...) from original).
  5. Add per-provider prompts only if the provider needs special formatting (rare — most providers are OpenAI-compatible).
  6. Add tests under packages/everalgo-core/tests/llm/providers/test_<provider>.py. No mocks at the HTTP layer when a real key is available in CI; otherwise use respx to record fixtures.
  7. Update the CHANGELOG. Add an entry under ## [Unreleased] in packages/everalgo-core/CHANGELOG.md (subsection ### Added). See docs/releasing.md for format.
  8. Update docs/concepts/architecture.md and AGENTS.md if the public surface changes.
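
Steps 2 and 4 are the ones most easily gotten wrong, so here is a small sketch of them. `LLMError` and `ChatResponse` are redefined locally as stand-ins for the library's own types, and `DemoProviderClient`, `VendorTimeout`, and the one-argument `chat` signature are hypothetical; a real provider would follow the Protocol as defined in everalgo.llm.protocols.

```python
import asyncio
from dataclasses import dataclass


class LLMError(Exception):
    """Local stand-in for the library's LLMError."""


@dataclass(frozen=True)
class ChatResponse:
    """Hypothetical response shape; the real ChatResponse may carry more fields."""

    content: str


class VendorTimeout(Exception):
    """Pretend provider-native exception raised by a vendor SDK."""


class DemoProviderClient:
    """Implements the single async chat method the LLMClient Protocol asks for."""

    def __init__(self, *, fail: bool = False) -> None:
        self._fail = fail

    async def _call_vendor(self, prompt: str) -> str:
        # Placeholder for the real SDK / HTTP call.
        if self._fail:
            raise VendorTimeout("upstream timed out")
        return f"echo: {prompt}"

    async def chat(self, prompt: str) -> ChatResponse:
        try:
            text = await self._call_vendor(prompt)
        except VendorTimeout as original:
            # Chain so the provider-native traceback stays reachable via __cause__.
            raise LLMError("demo provider request failed") from original
        return ChatResponse(content=text)


response = asyncio.run(DemoProviderClient().chat("ping"))

try:
    asyncio.run(DemoProviderClient(fail=True).chat("ping"))
except LLMError as error:
    mapped_cause = type(error.__cause__).__name__
```

Callers then catch a single exception type regardless of provider, while the chained `__cause__` keeps the vendor detail available for debugging.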

9. Testing Guidelines

  • asyncio_mode = "auto" is set workspace-wide (see root pyproject.toml); plain async def test_*() works without decorators.
  • Use everalgo.testing.fake_llm for deterministic LLM replays. Do not stub at the HTTP layer in unit tests — that breaks when the provider tweaks its protocol.
  • Cross-package integration smoke tests belong in the workspace-root tests/ directory. Per-distribution unit tests should colocate under packages/everalgo-<name>/tests/ once a distribution grows enough mass (mirroring pydantic-ai's pydantic_ai_slim/tests/).
  • No real network calls in default pytest. Mark provider-network tests with @pytest.mark.integration and gate them behind an env var.
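
One way to implement the env-var gate from the last bullet is a collection hook in a conftest.py. This is a sketch under assumptions: the variable name `EVERALGO_RUN_INTEGRATION` is invented here (the guide does not name one), and the repository may gate integration tests differently.

```python
import os
from collections.abc import Mapping

# Hypothetical variable name; the guide does not say which env var gates integration tests.
INTEGRATION_ENV_VAR = "EVERALGO_RUN_INTEGRATION"


def integration_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Integration tests run only when the gate variable is set to a truthy value."""
    return env.get(INTEGRATION_ENV_VAR, "").strip().lower() in {"1", "true", "yes"}


def pytest_collection_modifyitems(config, items):
    """conftest.py hook: skip tests marked `integration` unless the gate is open."""
    import pytest  # imported lazily so this sketch stays importable without pytest

    if integration_enabled():
        return
    skip = pytest.mark.skip(reason=f"set {INTEGRATION_ENV_VAR}=1 to run integration tests")
    for item in items:
        if "integration" in item.keywords:
            item.add_marker(skip)
```

With this in place a plain `uv run pytest` collects the marked tests but skips them, and setting the variable to 1 opts in. The `integration` marker would also need to be registered in the pytest configuration to avoid unknown-marker warnings.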

10. References

| Subject | Where |
| --- | --- |
| Architecture (definitive) | docs/concepts/architecture.md |
| High-level architecture notes | docs/concepts/ |
| Runnable operator examples | examples/ (use FakeLLMClient, no API key needed) |
| Source of EverOS contract | Confluence (internal) |
| uv workspace concepts | https://docs.astral.sh/uv/concepts/projects/workspaces/ |
| PEP 420 namespace packages | https://peps.python.org/pep-0420/ |
| PEP 8 (style) / 257 (docstrings) / 484 (type hints) | https://peps.python.org/ |
| Conventional Commits | https://www.conventionalcommits.org/ |
| Gitmoji | https://gitmoji.dev/ |

11. Editing This File

This file is the contract between human engineers and AI assistants on this repo. When you change it, please:

  1. Keep it the canonical copy. CLAUDE.md and .cursorrules should remain symlinks.
  2. Cite a source for every concrete decision — a docs/concepts/architecture.md section, an ADR, a public spec / star-project URL. No groundless claims.
  3. Whenever the repository structure or workflow changes, update §2 (layout) and §4 (commands) in the same MR.