Start with the preset that matches your appetite for latency and churn:
- `conservative`: smallest advanced surface, lowest ongoing spend
- `balanced`: recommended default for most installs
- `research-max`: broadest shipped experimental surface
- `local-llm-heavy`: biases expensive helper/extraction paths toward local inference
The original roadmap named budgets at a higher level than the current runtime surface. Use this mapping when tuning a live install:
| Roadmap knob | Live config surface |
|---|---|
| `maxRecallTokens` | `maxMemoryTokens` and `recallBudgetChars` |
| `maxRecallMs` | Use stage-specific limits such as `recallPlannerTimeoutMs`, `conversationRecallTimeoutMs`, and `rerankTimeoutMs` |
| `maxCompressionTokensPerHour` | `maxCompressionTokensPerHour` |
| `maxGraphTraversalSteps` | `maxGraphTraversalSteps` |
| `maxArtifactsPerSession` | No dedicated per-session cap; use `verbatimArtifactsEnabled`, `verbatimArtifactsMaxRecall`, and artifact-category scoping |
| `maxProactiveQuestionsPerExtraction` | `maxProactiveQuestionsPerExtraction` |
| `maxProactiveExtractionMs` | `proactiveExtractionTimeoutMs` |
| `maxProactiveExtractionTokens` | `proactiveExtractionMaxTokens` |
| `indexRefreshBudgetMs` | Use `qmdUpdateMinIntervalMs`, `qmdUpdateTimeoutMs`, and `conversationIndexMinUpdateIntervalMs` |
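
When auditing an older config or roadmap note, it can help to have the mapping above in machine-readable form. The following is a minimal sketch, not part of Engram: `ROADMAP_TO_LIVE`, `live_fields`, and `has_direct_equivalent` are hypothetical names, and the field lists simply restate the table (the `maxRecallMs` row says "such as", so its list may not be exhaustive).

```python
from typing import Dict, List

# Hypothetical lookup table restating the roadmap-to-live mapping above.
# Only named config fields are listed; "artifact-category scoping" is a
# practice rather than a single field, so it is left out.
ROADMAP_TO_LIVE: Dict[str, List[str]] = {
    "maxRecallTokens": ["maxMemoryTokens", "recallBudgetChars"],
    # The source says "such as", so this list may be incomplete.
    "maxRecallMs": [
        "recallPlannerTimeoutMs",
        "conversationRecallTimeoutMs",
        "rerankTimeoutMs",
    ],
    "maxCompressionTokensPerHour": ["maxCompressionTokensPerHour"],
    "maxGraphTraversalSteps": ["maxGraphTraversalSteps"],
    "maxArtifactsPerSession": [
        "verbatimArtifactsEnabled",
        "verbatimArtifactsMaxRecall",
    ],
    "maxProactiveQuestionsPerExtraction": ["maxProactiveQuestionsPerExtraction"],
    "maxProactiveExtractionMs": ["proactiveExtractionTimeoutMs"],
    "maxProactiveExtractionTokens": ["proactiveExtractionMaxTokens"],
    "indexRefreshBudgetMs": [
        "qmdUpdateMinIntervalMs",
        "qmdUpdateTimeoutMs",
        "conversationIndexMinUpdateIntervalMs",
    ],
}


def live_fields(roadmap_knob: str) -> List[str]:
    """Return the live config fields that stand in for a roadmap knob."""
    if roadmap_knob not in ROADMAP_TO_LIVE:
        raise KeyError(f"unknown roadmap knob: {roadmap_knob}")
    return ROADMAP_TO_LIVE[roadmap_knob]


def has_direct_equivalent(roadmap_knob: str) -> bool:
    """True when the roadmap knob shipped under the same name."""
    return live_fields(roadmap_knob) == [roadmap_knob]
```

For example, `live_fields("maxProactiveExtractionMs")` returns the single renamed field, while `has_direct_equivalent("maxRecallMs")` is false because that knob is split across stage-specific timeouts.
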
- Start with `memoryOsPreset: "conservative"` or `memoryOsPreset: "balanced"`.
- Keep `maxCompressionTokensPerHour` and `maxProactiveQuestionsPerExtraction` at their defaults until baseline recall is stable.
- Keep `proactiveExtractionTimeoutMs` and `proactiveExtractionMaxTokens` low until you trust the second-pass memory additions.
- Raise `maxMemoryTokens` only after you know which sections are providing real value.
- Enable graph traversal only after checking that standard recall already finds the right seeds.
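
The starting checklist above could be captured as a small set of overrides. This is a sketch under stated assumptions: `starting_overrides` is a hypothetical helper, and the numeric values for the proactive-extraction limits are illustrative placeholders, not Engram defaults. Fields the checklist says to leave alone are deliberately omitted so the preset's defaults apply.

```python
import json
from typing import Any, Dict

# Presets the checklist recommends starting from.
STARTING_PRESETS = ("conservative", "balanced")


def starting_overrides(
    preset: str = "balanced",
    proactive_timeout_ms: int = 2000,   # placeholder value, not an Engram default
    proactive_max_tokens: int = 256,    # placeholder value, not an Engram default
) -> Dict[str, Any]:
    """Build a minimal override set for a fresh install.

    maxCompressionTokensPerHour, maxProactiveQuestionsPerExtraction, and
    maxMemoryTokens are intentionally absent: the checklist says to keep
    them at their defaults until baseline recall is stable.
    """
    if preset not in STARTING_PRESETS:
        raise ValueError(
            f"start with one of {STARTING_PRESETS}, not {preset!r}"
        )
    return {
        "memoryOsPreset": preset,
        "proactiveExtractionTimeoutMs": proactive_timeout_ms,
        "proactiveExtractionMaxTokens": proactive_max_tokens,
    }


if __name__ == "__main__":
    # Emit the overrides as JSON so they can be pasted into a config file.
    print(json.dumps(starting_overrides("conservative"), indent=2))
```

Rejecting `research-max` and `local-llm-heavy` here is a design choice for first-time tuning only; both remain valid presets once baseline recall is understood.
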
- If recall payloads are too large, lower `maxMemoryTokens` before changing per-section budgets.
- If ranking is too slow, lower `rerankTimeoutMs` and keep rerank fail-open.
- If transcript recall is too expensive, lower `conversationRecallTopK` or `conversationRecallMaxChars`.
- If compression-learning churn is too high, set `maxCompressionTokensPerHour: 0`.
- If proactive extraction is noisy, set `maxProactiveQuestionsPerExtraction: 0`.
- If proactive extraction is slow, lower `proactiveExtractionTimeoutMs` or `proactiveExtractionMaxTokens`, or set either to `0` to hard-disable the second pass.
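
The troubleshooting rules above amount to a symptom-to-field lookup. Below is a rough sketch of that idea; the symptom labels, `REMEDIES`, and `apply_remedy` are all hypothetical, and halving a value is just an illustrative step size. Where the list offers two fields, the sketch picks the first one named.

```python
from typing import Any, Dict, Tuple

# Hypothetical symptom labels -> (config field from the list above, action).
# "lower" scales the current value down; "disable" sets the field to 0.
REMEDIES: Dict[str, Tuple[str, str]] = {
    "recall_payload_too_large": ("maxMemoryTokens", "lower"),
    "ranking_too_slow": ("rerankTimeoutMs", "lower"),
    "transcript_recall_too_expensive": ("conversationRecallTopK", "lower"),
    "compression_churn_too_high": ("maxCompressionTokensPerHour", "disable"),
    "proactive_extraction_noisy": ("maxProactiveQuestionsPerExtraction", "disable"),
    "proactive_extraction_slow": ("proactiveExtractionTimeoutMs", "lower"),
}


def apply_remedy(
    config: Dict[str, Any], symptom: str, factor: float = 0.5
) -> Dict[str, Any]:
    """Return a copy of config with one remedy applied; config is not mutated."""
    field, action = REMEDIES[symptom]
    updated = dict(config)
    if action == "disable":
        updated[field] = 0
    else:
        if field not in config:
            raise KeyError(f"need a current value for {field} to lower it")
        # Clamp at 1 so "lower" never lands on 0 by accident; for the
        # proactive-extraction fields, 0 hard-disables the second pass.
        updated[field] = max(1, int(config[field] * factor))
    return updated
```

Applying one remedy at a time and re-measuring keeps it clear which change actually fixed the symptom.
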
Two roadmap knobs remain intentionally unshipped as first-class fields:
- a single global `maxRecallMs`
- a dedicated `maxArtifactsPerSession`
Engram uses stage-specific timeouts and artifact gating instead. That keeps the live runtime aligned with the actual retrieval paths rather than pretending that one coarse timer can bound every stage.