Remnic is a multi-platform memory system. Keep these boundaries intact on every change:
- `@remnic/core`, `@remnic/server`, and `@remnic/cli` own Remnic's core behavior. Core memory semantics, storage, retrieval, extraction, governance, and standalone operation must live there.
- Core and standalone paths must not depend on OpenClaw, Hermes, or any future host. Host integrations may consume core. Core must not reach back into host SDKs, config shapes, or runtime lifecycles.
- Platform-specific behavior belongs in platform adapters only. OpenClaw-specific code belongs in `packages/plugin-openclaw` plus the current root `src/` compatibility wiring that still hosts OpenClaw runtime entrypoints today. Hermes-specific code belongs in `packages/plugin-hermes`. Keep host logic thin and translation-focused.
- Do not reinvent host-native features. If OpenClaw, Hermes, or another platform already provides a runtime capability, plugin hook, command surface, or extension primitive, use that real upstream contract instead of recreating a parallel Remnic abstraction.
- Verify host behavior against current upstream source and docs before implementing it. Issue text, old local docs, or remembered APIs are not enough for host-facing work.
Use these as the canonical starting points for adapter work:
- OpenClaw repository: https://github.com/openclaw/openclaw
- OpenClaw plugin docs: https://github.com/openclaw/openclaw/tree/main/docs/plugins
- OpenClaw SDK overview: https://github.com/openclaw/openclaw/blob/main/docs/plugins/sdk-overview.md
- OpenClaw SDK entrypoints: https://github.com/openclaw/openclaw/blob/main/docs/plugins/sdk-entrypoints.md
- Hermes Agent repository: https://github.com/NousResearch/hermes-agent
- Hermes Agent docs/site: https://hermes-agent.nousresearch.com
- Start from the host's current upstream contracts, then adapt Remnic core into them.
- Reuse upstream platform primitives when they exist; only add Remnic-owned glue where the host does not already solve the problem.
- Keep standalone and shared-core behavior testable without booting OpenClaw, Hermes, or another host.
- If a change touches both core semantics and a host adapter, land the core contract first and make the adapter consume it second.
Remnic must support OpenClaw releases from at least the previous 60 days.
Recalculate this window from the current date before changing OpenClaw adapter
metadata. For this May 31, 2026 PR, the required floor is April 1, 2026 /
OpenClaw 2026.4.1.
- Do not raise `peerDependencies.openclaw`, `openclaw.compat.pluginApi`, or `openclaw.install.minHostVersion` above the active 60-day floor unless a documented upstream breaking change makes older hosts impossible to support.
- `openclaw.compat.pluginApi` and `openclaw.install.minHostVersion` MUST be a single `>=x.y.z` comparator — never a `||` list (issue #1450). OpenClaw's installer (`clawhub.ts`) splits the range on whitespace and AND-evaluates every token, so a `||` fails the check entirely; it also normalizes away the host prerelease suffix, so a single `>=2026.4.1` floor already admits stable AND prerelease hosts. Do NOT enumerate prerelease versions in these two fields.
- `peerDependencies.openclaw` is the ONLY field that lists reviewed prereleases explicitly (`>=x.y.z || <prerelease> || …`). It is resolved by npm/node-semver, which supports `||` but excludes prereleases from a bare `>=` range — so the explicit entries are required there and there only. These two fields are intentionally decoupled by resolver; do not "align" them.
- Preserve additive compatibility metadata for older hosts when adding newer OpenClaw manifest surfaces. For example, keep `supports` and `providerAuthEnvVars` while also adding newer `setup.providers[].envVars`.
- If the latest OpenClaw prefers a newer manifest field, add it in parallel with older-compatible metadata whenever OpenClaw ignores unknown fields safely.
- Document the recalculated floor and any deliberate exception in `docs/plugins/openclaw.md`, `packages/plugin-openclaw/README.md`, `llms.txt`, and the relevant package metadata tests.
These rules are the default workflow for all agents and contributors.
- Keep PR scope narrow.
  - One subsystem group per PR whenever possible.
  - If work spans multiple groups, split it before review. The default split for memory-heavy work is:
    - schema/surface contract changes
    - storage/serialization/cache changes
    - retrieval/planner/freshness behavior changes
- Sync with `main` before the first serious review cycle.
  - Rebase or merge `main` before requesting AI review.
  - Do not let a PR drift for multiple review rounds and then merge `main` halfway through unless forced by a conflict.
- Batch review fixes by subsystem.
  - Re-scan unresolved comments, fix the whole subsystem, run verification once, then push once.
  - Avoid serial micro-pushes that only expose the next adjacent invariant.
- Run the local hardening gate before claiming review-clean.
  - Always run `npm run preflight:quick`.
  - If you touch `src/` or `packages/remnic-core/src/` `orchestrator.ts`, `storage.ts`, `intent.ts`, `memory-cache.ts`, `entity-retrieval.ts`, or `config.ts`, also run `npm run test:entity-hardening`.
  - If Cursor CLI is available, run `npm run review:cursor` before requesting external AI review.
- Treat external AI review as stale unless it matches the current head.
  - Do not call a PR clean if the latest positive AI verdict targets an older commit.
  - A merge-ready PR needs green checks, zero unresolved review threads, and a fresh positive AI verdict on the current head.

Reference workflow: `docs/ops/pr-review-hardening-playbook.md`
PRs in retrieval, session identity, compaction, cache, or reset/end-of-session code often attract many review rounds for the same structural reason:
- The subsystem is stateful across multiple entrypoints.
  - A local fix in one hook can break `before_reset`, `session_end`, compaction, sparse metadata handling, remembered bindings, provider rebinding, or restart recovery.
- Reviewers probe different slices of the same state machine.
  - One reviewer may catch provider detection drift.
  - Another may catch lifecycle drain gaps.
  - Another may catch stale-cache or replay behavior. These are usually adjacent invariant misses, not unrelated bugs.
- Comment-by-comment patching makes churn worse.
  - If you only fix the literal review comment, the next review round often finds the neighboring invariant you did not model yet.
Required response:
- Stop and model the full contract first.
- Write the scenario matrix before changing code.
- Patch the subsystem coherently once.
- Add tests for the failure class, not just the reported instance.
- Run the hardening gate before asking for another review.
Minimum scenario matrix for session/retrieval/cache work:
- explicit provider identity
- sparse metadata with remembered binding
- sparse metadata without remembered binding
- provider rebinding
- restart/reload recovery
- compaction flush
- `before_reset`
- `session_end`
- dedupe/replay behavior
If you cannot explain the behavior for every row in that matrix, the PR is not ready for external review.
These patterns were extracted from 60+ PRs across 2026-04-05 to 2026-04-12 (including deep analysis of PRs #343-#408 with 980+ review comments). Every item below was caught by a reviewer (Cursor Bugbot, Codex, or CodeQL) and required a follow-up commit to fix. Follow these rules to ship clean on the first push.
Reviewers repeatedly caught silent defaulting on invalid inputs. Never silently accept and reinterpret bad values.
- CLI flags must validate their argument exists — `--format json` where `--format` has no value must throw, not silently default.
- Enum/config values must be validated against an explicit allow-list — when adding a new accepted value (e.g., `"low"` for `activeRecallThinking`), add it to the validation schema AND the config parser.
- Numeric inputs must be type-checked — port values must be finite integers in [1, 65535]; reject `"abc"` and `3.7` rather than truncating.
- Date/timestamp parsing must guard overflow — reject inputs that would overflow `Date` bounds instead of producing `Invalid Date`.
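
The numeric rule above can be sketched as a strict parser at the CLI boundary. This is a minimal illustration, not Remnic's actual implementation: `parsePort` is a hypothetical helper name, and the digits-only pre-check is an assumption about how strict the parser should be.

```typescript
// Hypothetical helper (not an existing Remnic API): strict port parsing.
function parsePort(raw: string): number {
  const trimmed = raw.trim();
  // Digits only: rejects "abc", "3.7", "", "1e3", and "0x1F" up front
  // instead of letting Number() or parseInt() reinterpret them.
  if (!/^\d+$/.test(trimmed)) {
    throw new Error(`invalid port "${raw}": expected an integer`);
  }
  const port = Number(trimmed);
  // Range check: valid TCP ports are 1..65535.
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid port "${raw}": expected 1-65535`);
  }
  return port;
}
```

The function throws rather than falling back to a default, so a typo in a flag surfaces immediately instead of silently selecting a different port.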
The Engram→Remnic rename touched every surface. Every rename PR required follow-up fixes for missed references.
- Search the entire codebase when renaming anything — `grep -ri oldname` across all files including docs, tests, lock files, changesets, hooks, and CI configs.
- Always add a legacy fallback chain — env vars: `REMNIC_FOO` → `ENGRAM_FOO`; config keys: try `remnic` block first, fall back to `engram` block.
- Update lock files when changing workspace dependencies — changing `workspace:*` specifiers or package names without running `pnpm install` breaks the lock file.
- Changeset files must reference current package names — stale package IDs in `.changeset/` will cause release failures.
- Hook scripts must use the current plugin name in error messages and paths.
CodeQL and Bugbot repeatedly flagged these patterns.
- Never interpolate unsanitized values into shell commands — pass host/port via environment variables, never via string interpolation into script strings.
- Restrict file permissions on auth tokens — config files containing tokens should use `0600` permissions.
- Block symlink traversal in directory scans — when scanning `artifacts/` or memory directories, reject symlinks that resolve outside the allowed root. Reject symlinked root directories entirely.
- Validate external inputs at system boundaries — profile values, connector IDs, and config paths must be sanitized before filesystem operations.
Token store failures, daemon unavailability, and filesystem errors must not block the primary operation.
- Wrap token/external-service operations in try-catch — if `generateToken()` fails, the install should still complete with a note to run token generation manually later.
- Write rollback manifests BEFORE migration markers — if rollback metadata write fails, the system must not think migration succeeded.
- Use AbortController for timeout-able async operations — timed-out `before_reset` flushes must abort the in-flight extraction before buffer clearing, so late flushes cannot clear turns buffered after reset proceeds.
- Guard refcount operations against double-decrement — track whether increment happened before decrementing; use a `didCountStart` flag.
Multiple plugin instances can coexist; globals must be scoped.
- Scope singletons per plugin ID — runtime orchestrator mirrors, CLI dedupe guards, and capability caches must be keyed by `serviceId`, not stored as bare globals.
- Scope extraction deduplication by session/buffer key — `shouldQueueExtraction` must fingerprint `bufferKey + normalizedTurnText`, not just turn text, so parallel sessions don't suppress each other's extractions.
- Cache writes and reads must use consistent formats — if the hook path writes `{version, data}` and the section path reads `data` directly, they will diverge.
Reviewers caught multiple tests that passed vacuously.
- Never write assertions on empty arrays — `expect(result).toEqual([])` passes trivially; assert on non-empty expected data or assert the function was called.
- Don't assume filesystem ordering — `readdir` is not guaranteed to be alphabetical; sort explicitly before comparing.
- Clean up ALL global state in test teardown — including unkeyed globals like `__openclawEngramOrchestrator` mirror keys in `resetGlobals()`.
- Test error paths — for every `try/catch` added in production code, add a test that forces the error path and asserts recovery behavior.
- Don't use fragile CWD-relative paths — use `import.meta.dirname` or `path.resolve(__dirname, ...)` instead of assuming CWD.
Every doc PR required follow-up fixes for stale references.
- Code examples must reference current variable names — after a rename, search all code blocks in docs for the old name.
- CLI command examples must use current commands — `remnic connectors install`, not `engram connectors install`.
- Hook templates must use current env var chains — match the real hook scripts' `REMNIC_* → ENGRAM_*` fallback precedence.
- Architecture diagrams must use current labels — "Remnic Orchestrator", not "Engram Orchestrator".
Reviewers flagged unreachable branches and unused exports.
- Remove unreachable branches — if a non-recursive flag makes a branch unreachable, delete it rather than leaving dead code.
- Don't duplicate helpers across packages — if `toolJsonResult` exists in two tool files, extract to a shared utility.
- Remove dead switch cases — after normalizing tool names, remove the old case rather than leaving it to silently never match.
The slot-based config resolution pattern (slot → PLUGIN_ID → LEGACY_PLUGIN_ID)
was independently reimplemented in 5+ locations with divergent guard styles,
causing inconsistent behavior during migration.
- Extract config resolution into a single shared module — `resolveRemnicPluginEntry` must be the one source of truth; all callers (`access-cli`, `operator-toolkit`, `materialize.cjs`, `src/index.ts`) must import from it.
- Validate that resolved plugin IDs belong to Remnic — a foreign plugin's config can be read and applied to Remnic when `slots.memory` points elsewhere. Always check `resolvedId === PLUGIN_ID || resolvedId === LEGACY_PLUGIN_ID`.
- Maintain legacy flat-config fallback — developer-mode configs where the top-level object IS the plugin config must still resolve correctly.
- Keep env var priority consistent — primary `REMNIC_*`/`OPENCLAW_*` must be checked before legacy `ENGRAM_*`/`OPENCLAW_ENGRAM_*` everywhere.
Node.js fs functions do NOT expand ~. Multiple PRs had path-related bugs.
- Expand `~` consistently with `expandTilde` — never use ad-hoc regex like `path.replace(/^~/, homedir())` which incorrectly matches `~user/` prefixes. Use the shared `expandTilde()` for all user-facing path inputs: `memoryDir`, `--config`, `OPENCLAW_CONFIG_PATH`, `--memory-dir`.
- Validate path type before using — `existsSync` returns true for files too; use `statSync().isDirectory()` when a directory is expected. Reject file paths used as `memoryDir`.
- Fail fast on invalid JSON config — when `openclaw.json` exists but cannot be parsed (or parses to `null` / non-object), surface an error instead of silently returning `{}` which then overwrites the file destroying all settings.
- Validate `plugins.entries` shape — check it's a plain object, not `null`, array, number, or string before using `in` operator or property access.
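
To illustrate the tilde rule, here is a minimal sketch of what a helper like `expandTilde` could do. The signature is an assumption made so the example is self-contained: the real shared helper presumably takes only the path and reads the home directory itself (for example via `os.homedir()`), while this sketch takes `home` as a parameter.

```typescript
// Sketch only: the real shared expandTilde() may differ in signature and behavior.
// `home` is passed in explicitly here; it is an assumption for testability.
function expandTilde(input: string, home: string): string {
  if (input === "~") return home;
  // Only a bare "~/" prefix refers to the current user's home directory.
  if (input.startsWith("~/")) return home + input.slice(1);
  // "~user/..." and every other path is returned unchanged, unlike
  // input.replace(/^~/, home), which would corrupt "~user/data".
  return input;
}
```

For example, `expandTilde("~/memory", "/home/alice")` yields `/home/alice/memory`, while `expandTilde("~bob/memory", "/home/alice")` is left as-is.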
Changing a function signature is a high-risk operation that consistently required follow-up fixes.
- Search ALL code including evals, tests, and adapters — when changing `addTurn(role, content)` to `addTurn(sessionId, turn)`, search not just `src/` but `evals/`, `tests/`, and `packages/*/` for old-form call sites.
- Add a deprecation path for public APIs — if the function is exported, add a compatibility wrapper that maps old args to new with a deprecation log, rather than breaking silently.
- Update test helpers to match production behavior — if production code gates on a `migrateLegacy` flag, the test helper must read the same flag instead of unconditionally executing.
Multiple sort comparators never returned 0, causing non-deterministic
ordering that broke diffs and automation.
- Sort comparators must be well-formed — return `-1`, `0`, or `1`. Never return `1` for both orderings of equal items. When `a.updatedAt === b.updatedAt`, return `0` or use a stable secondary key (e.g., `id`).
- Non-deterministic output breaks downstream — top-N slices from unstable sorts produce different results across runs, making briefings, reports, and diffs unreliable.
- Test sort stability explicitly — sort a list with duplicate keys and assert the output is identical across multiple invocations.
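
A well-formed comparator with a secondary key might look like the sketch below. The `Item` shape and the function name `byUpdatedAtDesc` are illustrative assumptions; only the `updatedAt` and `id` fields come from the rules above.

```typescript
// Illustrative shape; real records will carry more fields.
interface Item {
  id: string;
  updatedAt: number;
}

// Newest first, ties broken by id so equal timestamps always order the same way.
function byUpdatedAtDesc(a: Item, b: Item): number {
  if (a.updatedAt !== b.updatedAt) return a.updatedAt < b.updatedAt ? 1 : -1;
  if (a.id < b.id) return -1;
  if (a.id > b.id) return 1;
  return 0; // truly identical keys compare as equal
}
```

Because the comparator defines a total order over `(updatedAt, id)`, the result does not depend on the input order, which is what makes top-N slices reproducible.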
When content is transformed before persistence (e.g., citation injection, timestamp appending), hash operations must consistently use either raw or transformed form — never a mix.
- All hash-index operations must use the same content form — if writes hash `rawContent`, reads and dedup checks must also hash `rawContent`, not `citedContent` (which includes timestamps).
- Beware of double-hashing — if `contentHashIndex.remove()` internally hashes its argument, passing an already-hashed value produces `hash(hash(x))` which never matches stored entries.
- Don't mix `contentHashSource` and direct hashing — if one write path passes `contentHashSource: rawContent` and another omits it (causing the index to hash the persisted form with timestamp), dedup breaks.
PR #400 had 20+ review rounds on connector lifecycle. The dominant pattern was destroying valid state before confirming the replacement is viable.
- Don't rotate/destroy tokens before confirming the new config write succeeds — if `generateToken()` revokes the old token, then `upsertHermesConfig` or `commitTokenEntry` fails, the user is left with a revoked token and no working config. Always confirm the new state before destroying the old.
- Don't clean up old profile config before new profile write succeeds — if `removeHermesConfig(oldProfile)` runs before `upsertHermesConfig(newProfile)` succeeds, a partial failure leaves neither profile configured.
- Persist rollback data BEFORE writing success markers — if `.rollback.json` write fails, a `.migrated-from-engram` marker creates a false success signal.
- Don't write connector JSON with a new token before confirming token store commit — `connector.json` holding a token the daemon doesn't recognize creates an invisible auth mismatch.
Reviewers repeatedly flagged cross-package relative imports that bypass the public export surface.
- Import via package name, not relative path — use `import { X } from "@remnic/core"` not `import { X } from "../../../remnic-core/src/foo.js"`. A directory rename or build-output change in the target package silently breaks the import.
- Shim packages must own their runtime identity — when a shim re-exports `pluginDefinition`, its `register()` must use its own `LEGACY_PLUGIN_ID`, not the inherited `PLUGIN_ID`. Module-level constants are captured at import time, not overridden by object-spread.
- Config loaders must ALL agree on lookup semantics — if `access-cli.ts` uses ternary+`??` fallback and `src/index.ts` uses early-return, they diverge during migration when both entries exist. One shared resolver, one pattern.
Reviewers caught features that unconditionally transformed behavior without any escape hatch or configuration gate.
- Procedural memory (issue #519) — All runtime behavior is behind `procedural.enabled` (default `false`). Docs: `docs/procedural-memory.md`. When changing extraction, recall injection, or mining paths, keep gates aligned with that flag and the nested `procedural.*` knobs in `parseConfig`.
- Add an `enabled` check or escape hatch for every new filter/transform — if a new recall filter unconditionally removes `dream`/`procedural` memories, users can never search for them even when the feature is disabled. Mirror the pattern: lifecycle filters have `enabled` checks; new filters must too.
- Force reinstall must merge from existing config — when `--force` is used without re-supplying `--config profile=...`, hard-resetting to defaults silently loses the user's configured profile/host/port. Read the existing stored config first and merge.
- Guard slot-based lookups against foreign plugin IDs — if `plugins.slots.memory` points to a non-Remnic plugin, the lookup must reject it rather than silently applying a foreign plugin's settings to Remnic. Always validate `resolvedId === PLUGIN_ID || resolvedId === LEGACY_PLUGIN_ID`.
Multiple PRs had bugs from JavaScript's numeric quirks and CLI string→number coercion issues.
- Guard `slice(-maxEntries)` against `maxEntries === 0` — `entries.slice(-Math.max(0, 0))` produces `slice(-0)` which equals `slice(0)` and returns ALL entries. Always check `if (maxEntries <= 0)` before negation.
- CLI values arrive as strings — `--config port=5555` produces `"5555"`, not `5555`. Type guards like `typeof prev?.port === "number"` reject saved values on reinstall. Always coerce at the input boundary with `Number(port)` + validation, then store as the expected type.
- Reject non-integers explicitly — `Number.isFinite(4318.9)` is true but silently truncating to a different port is a misconfiguration. Use `Number.isInteger()` when integers are expected.
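
The `slice(-0)` pitfall is easy to reproduce. The sketch below shows one way to guard it; `lastEntries` is a hypothetical helper name, and treating non-integer limits as "return nothing" is an assumption (a caller could equally choose to throw).

```typescript
// Hypothetical helper: return at most the last `maxEntries` items.
function lastEntries<T>(entries: readonly T[], maxEntries: number): T[] {
  // Guard before negating: -0 === 0, so slice(-0) would return everything.
  if (!Number.isInteger(maxEntries) || maxEntries <= 0) return [];
  return entries.slice(-maxEntries);
}

// The unguarded form returns all three entries when the limit is zero.
const unguarded = [1, 2, 3].slice(-Math.max(0, 0));
```

With the guard in place, a limit of `0` yields an empty array, while `unguarded` above still contains all three entries.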
Reviewers caught a critical bug where explicit flush operations (session flush, before_reset) were suppressed by the same deduplication that guards automatic extraction.
- Explicit flushes must pass `skipDedupeCheck: true` — if a prior extraction attempt failed/timed out but left the buffer intact, the dedupe fingerprint still exists. A subsequent force-flush must not be suppressed by stale dedup state.
- Buffer key must be propagated through all extraction paths — if `ingestReplayBatch` calls `queueBufferedExtraction` without `bufferKey`, the default `"default"` key is used, clearing the wrong buffer on success.
- Don't health-check with uncommitted tokens — if `commitTokenEntry` fails or is skipped, `checkDaemonHealth` sends an unknown token, gets 401, waits 6 seconds on retry, and reports a misleading "not reachable" message.
Reviewers caught host-prefixed files living in core packages, violating the
stated architecture boundary that @remnic/core must not depend on any host.
- Never prefix core files with host names — `openclaw-recall-audit.ts` in `@remnic/core` violates the boundary rule even though the file itself contains no OpenClaw-specific logic. The prefix creates confusion about where host-specific code belongs and signals a wrong dependency direction.
- Generic audit/log modules belong in core without host prefixes — rename to `recall-audit.ts` or similar. If host-specific behavior is needed, the host adapter extends or wraps the core module.
- When in doubt, check the architecture boundary rules — Section 1 of this document states: "Core and standalone paths must not depend on OpenClaw, Hermes, or any future host." File names are part of this contract.
Multiple parsers used content.indexOf(line) to compute source offsets, which
returns the first occurrence rather than the current parsing position.
- Track character position during iteration — when parsing structured text (heartbeat blocks, task lists), maintain a running `offset` variable that advances with each line/section processed, rather than re-searching from the start with `indexOf`.
- `indexOf` on repeated content is wrong — if the same line text appears earlier in the content (e.g., a repeated indentation pattern or comment), `indexOf` returns the position of the first occurrence, making the offset point to the wrong location.
- This applies to all line-based parsers — not just heartbeat parsing. Any parser that needs error-reporting positions or source mapping must track its own position during iteration.
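
The running-offset approach can be sketched as follows. `LineSpan` and `linesWithOffsets` are illustrative names, and the sketch assumes `\n` line separators (with `\r\n` input the `\r` stays part of each line's text, so the offsets remain correct).

```typescript
interface LineSpan {
  text: string;
  offset: number; // index of the line's first character in the full content
}

function linesWithOffsets(content: string): LineSpan[] {
  const spans: LineSpan[] = [];
  let offset = 0;
  for (const text of content.split("\n")) {
    spans.push({ text, offset });
    // Advance past this line plus the "\n" that split() consumed.
    offset += text.length + 1;
  }
  return spans;
}
```

For the content `"- a\n- b\n- a"`, the third line gets offset `8`, whereas `content.indexOf("- a")` would report `0` for it.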
Reviewers caught test mocks that defined functions with fewer parameters than the production interface, making tests pass vacuously.
- Mock signatures must match the production interface exactly — if the production interface declares `getLastRecall(sessionKey: string)`, the test mock must accept and use the `sessionKey` parameter, not define a zero-argument function that ignores it.
- Verify mock parameter usage in assertions — for per-session dispatch (command handlers, keyed lookups), test that different session keys produce different results. A mock that always returns the same value masks that per-session dispatch is broken.
- Interface changes must propagate to test mocks — when a production function signature changes (e.g., adding a `sessionKey` parameter), grep all test files for the old signature and update mocks to match.
When a backend call returns an empty result (e.g., no matching embeddings) versus when it fails (timeout, error, 5xx), the code must NOT conflate both cases into the same return path. Reviewers caught 5+ instances in PR #399 alone.
- Return distinct sentinel values for "empty" vs "failed" — if `search()` returns `[]` for both "index is empty" and "embedding endpoint returned 5xx", callers cannot short-circuit on genuine failures. Use a result object like `{ok: true, results: []}` vs `{ok: false, error: "backend_unavailable"}`.
- Batch operations need failure detection — when processing many items, a single backend failure should be distinguishable from "no candidates found" so the batch can stop paying timeouts on every subsequent item.
- Telemetry and dashboards depend on correct categorization — `reason: "no_candidates"` from a genuinely empty index is a healthy signal. `reason: "backend_unavailable"` from a timeout is an alert. Conflating them masks outages.
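
A discriminated union makes the empty/failed distinction visible to the type checker. The sketch below is synchronous for brevity and uses a hypothetical `Backend` function type that throws on failure; the real search path is presumably asynchronous and richer.

```typescript
type SearchResult =
  | { ok: true; results: string[] }
  | { ok: false; error: "backend_unavailable" };

// Hypothetical backend: returns matches, throws on timeout or 5xx.
type Backend = (query: string) => string[];

function search(backend: Backend, query: string): SearchResult {
  try {
    return { ok: true, results: backend(query) };
  } catch {
    return { ok: false, error: "backend_unavailable" };
  }
}

// Stops at the first backend failure instead of paying a timeout per item.
function searchBatch(
  backend: Backend,
  queries: string[],
): { hits: Map<string, string[]>; abortedOn?: string } {
  const hits = new Map<string, string[]>();
  for (const query of queries) {
    const result = search(backend, query);
    if (!result.ok) return { hits, abortedOn: query };
    hits.set(query, result.results);
  }
  return { hits };
}
```

An empty index produces `{ ok: true, results: [] }` and the batch continues; a failing backend produces `ok: false` and the batch returns early with the query it stopped on.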
When filtering data by time ranges, code must consistently use [start, end)
(half-open) interval semantics. Reviewers caught 6+ instances of inclusive upper
bounds causing double-counting at exact boundaries in PR #396.
- Upper bounds must be exclusive (`<`) not inclusive (`<=`) — a memory timestamped at exactly midnight should appear in only one day's briefing, not both yesterday's and today's. When `to` is documented as exclusive, the filter must use `ts < toMs`.
- Date-only comparisons need careful handling — a "floating" event with `endDate` as a date string (no time component) must not be treated as active on the end date itself when the contract says `[start, end)`. Convert date-only values to the start of the next day for exclusive-end comparisons.
- Test boundary conditions explicitly — include test cases with timestamps at exact boundary values (midnight, start-of-day, end-of-day) to catch inclusive/exclusive confusion.
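
The half-open contract can be sketched with a small filter. The names `inHalfOpenRange` and `memoriesForDay` are illustrative, and the example assumes UTC day boundaries of a fixed 24 hours; local-time days with DST shifts would need a calendar-aware end bound.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// [fromMs, toMs): inclusive start, exclusive end.
function inHalfOpenRange(ts: number, fromMs: number, toMs: number): boolean {
  return ts >= fromMs && ts < toMs;
}

function memoriesForDay<T extends { ts: number }>(memories: T[], dayStartMs: number): T[] {
  return memories.filter((m) => inHalfOpenRange(m.ts, dayStartMs, dayStartMs + DAY_MS));
}
```

A memory stamped at exactly midnight then belongs to the day that starts at that instant and is excluded from the day that ends there.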
CLI flags pass values as strings: --config installExtension=false produces the
string "false", not the boolean false. Code that checks !== false treats
"false" as truthy, silently ignoring the user's explicit opt-out. Reviewers
caught 4+ instances across PRs #394 and #397.
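
One way to handle string-valued flags like these is a single coercion helper applied where config is read. The sketch below is an assumption about what a `coerceBool()` helper could look like: the falsy strings are the ones this section names, while the truthy strings (`"true"`, `"1"`, `"yes"`, `"on"`) and the explicit `fallback` parameter are illustrative choices.

```typescript
// Sketch of a shared coercion helper; truthy strings and fallback are assumptions.
const FALSY_STRINGS = new Set(["false", "0", "no", "off"]);
const TRUTHY_STRINGS = new Set(["true", "1", "yes", "on"]);

function coerceBool(value: unknown, fallback: boolean): boolean {
  if (typeof value === "boolean") return value;
  if (typeof value === "string") {
    const normalized = value.trim().toLowerCase();
    if (FALSY_STRINGS.has(normalized)) return false;
    if (TRUTHY_STRINGS.has(normalized)) return true;
  }
  // Missing or unrecognized values fall back to the caller's default.
  return fallback;
}
```

With this in place, `coerceBool("false", true)` is `false`, whereas the naive gate `"false" !== false` evaluates to `true` and ignores the opt-out.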
- Coerce boolean-like strings at config-read boundaries — `"false"`, `"0"`, `"no"`, `"off"` must be treated as falsy. Use a shared `coerceBool()` helper that normalizes these string representations.
- `!== false` is NOT a boolean gate — when config values come from CLI or persisted JSON, they may be strings. Use explicit coercion or a Zod boolean transform rather than relying on JavaScript truthiness.
- Test with string-typed config values — every config gate test should include cases where the value is the string `"false"` and `"0"`, not just the boolean `false`.
When a storage manager maintains multiple caches (hot memory, cold tier, hash
index), the invalidation function must clear ALL of them. Reviewers caught cases
where invalidateAllMemoriesCache() only cleared the hot cache but left the cold
cache stale, despite comments claiming it cleared both (PR #402).
- Name invalidation functions precisely — if a function only clears one cache layer, name it `invalidateHotCache()`, not `invalidateAllMemoriesCache()`.
- Verify invalidation covers all layers — when adding a new cache layer, grep for all invalidation functions and add the new cache to each one.
- Don't invalidate before reads that need the cache — calling invalidation before a read that populates the cache defeats the caching purpose. Invalidation should happen after writes, not before reads.
When building a hash or serialized string from object properties, Object.entries()
preserves insertion order. Two semantically identical objects constructed differently
produce different hash strings, silently bypassing deduplication (PR #402).
- Sort object keys before serializing for hashing — use `Object.keys(obj).sort().map(k => ...)` or `JSON.stringify(obj, Object.keys(obj).sort())` to ensure deterministic serialization regardless of insertion order. The array-replacer form is only safe for flat objects: the array acts as a key allow-list at every nesting level, so nested keys that are not in the top-level list are dropped.
- This affects all dedup/content-hash operations — if structured attributes like `{city: "NYC", country: "US"}` vs `{country: "US", city: "NYC"}` produce different hash strings, deduplication silently fails.
- Test with different key orderings — when testing dedup, include test cases where the same data is represented with keys in different orders.
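
For nested attributes, a small recursive serializer is one option. `stableStringify` below is a hypothetical helper, not an existing Remnic function, and it assumes JSON-compatible input (no `undefined`, functions, or cycles).

```typescript
// Hypothetical helper: JSON serialization with keys sorted at every level.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    // Array order is meaningful, so elements keep their positions.
    return `[${value.map(stableStringify).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(obj[key])}`);
    return `{${body.join(",")}}`;
  }
  return JSON.stringify(value);
}
```

Feeding the output of this function to the hash, rather than `JSON.stringify(obj)` directly, makes `{city, country}` and `{country, city}` hash identically.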
When a feature flag (e.g., temporalSupersessionEnabled) controls behavior, ALL
recall paths (QMD search, recent-scan fallback, cold fallback) must implement
the gate identically. Reviewers caught divergent gating across code paths in
PR #402 (4 instances).
- Enumerate every code path when adding a feature gate — list all recall/search paths and verify each one checks the same flag in the same way.
- Enable-then-disable must revert cleanly — if a user enables a feature, runs for a while, then disables it, all paths must behave as if the feature never existed. Partial gating leaves stale artifacts that only appear on some paths.
- Test each path independently with the flag on AND off — don't just test the primary path. Each fallback path should have explicit tests for both flag states.
The writeChain = writeChain.then(async () => { ... }) serialization pattern in
session-toggles.ts permanently broke all future writes after the first I/O error.
A rejected promise in the chain prevents all subsequent .then() callbacks from
executing for the process lifetime. PR #408.
- Always add `.catch()` recovery to serialized promise chains — after `writeChain = writeChain.then(...)`, ensure the chain resets to a resolved state so a single failure doesn't poison all subsequent operations.
- Surface the failure to the current caller but unblock future callers — return the rejecting promise to the caller while storing a recovered copy as the chain tail, e.g. `const result = writeChain.then(fn); writeChain = result.catch(() => undefined); return result;`, or use a dedicated `queueWrite()` wrapper that recovers the chain after rejection. Note that `.catch(err => { throw err; })` on the stored chain is not recovery: it rethrows and leaves the chain rejected.
- Test serialization resilience explicitly — force a write failure in a test, then verify the next write on the same instance succeeds.
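
A `queueWrite()` wrapper of the kind mentioned above might look like the following sketch. The `WriteQueue` class is an illustrative assumption, not the code in `session-toggles.ts`; the key idea is that the caller receives the promise that can reject, while the stored tail always settles as fulfilled.

```typescript
// Sketch: serialized writes whose chain survives a failed task.
class WriteQueue {
  private tail: Promise<void> = Promise.resolve();

  queueWrite<T>(task: () => Promise<T>): Promise<T> {
    // Runs after every previously queued task has settled.
    const result = this.tail.then(task);
    // Store a copy that swallows the outcome, so the next task still runs.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    // The current caller still observes the real success or failure.
    return result;
  }
}
```

A failed write rejects only the promise returned to its own caller; the next `queueWrite()` call on the same instance still executes.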
In ingestReplayBatch, the loop used for (const sessionTurns of bySession.values())
but then referenced bufferKey: key where key was undefined. The loop needed
.entries() to destructure both key and value. PR #408 (High Severity).
- Match the iterator method to the data you need — `.keys()` for keys only, `.values()` for values only, `.entries()` for both. Never reference a variable from an outer scope when the loop doesn't bind it.
- TypeScript strict mode catches this — ensure `noImplicitAny` and `strict` are enabled so referencing an undefined variable in the block is a compile error.
- Grep the entire function body for variables used but not declared locally — if a loop body references `key` or `id` that isn't in its destructuring pattern, it's either undefined or from an outer scope, both likely wrong.
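
The iterator fix can be sketched in isolation. `planExtractions` and the `ExtractionJob` shape are hypothetical stand-ins for the real `ingestReplayBatch` internals; only the `bySession` map and the `bufferKey` field come from the description above.

```typescript
interface ExtractionJob {
  bufferKey: string;
  turns: string[];
}

// Hypothetical stand-in for the loop inside ingestReplayBatch.
function planExtractions(bySession: Map<string, string[]>): ExtractionJob[] {
  const jobs: ExtractionJob[] = [];
  // .entries() binds both the session key and its turns, so bufferKey
  // can never silently come from an outer scope.
  for (const [key, sessionTurns] of bySession.entries()) {
    jobs.push({ bufferKey: key, turns: sessionTurns });
  }
  return jobs;
}
```

Each job now carries the key of the session it was built from, rather than falling back to a shared default buffer.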
recallForActiveMemory searched across all namespaces (no namespace constraint)
while getMemoryForActiveMemory read from default storage only. In multi-tenant
deployments, search could return IDs from non-default namespaces that get operations
would fail to resolve. PR #408 (P1 severity).
- Read and write paths must resolve through the same namespace layer — if search goes through namespace-aware resolution, get/delete must too.
- Cross-tenant data exposure is a security risk — un-namespaced search in multi-principal deployments can leak data between tenants. Always constrain search scope via session-derived namespace resolution.
- Test with multiple namespaces — create test fixtures with data in different namespaces and verify each session only sees its own data.
The heartbeat import path wrote procedural memories directly to storage but didn't trigger any reindex step. Because active-memory search is QMD-backed, newly imported entries were not discoverable until unrelated maintenance happened. PR #408 (P2 severity).
- After writing data that needs to be searchable, trigger reindex — direct storage writes bypass the normal extraction→persist→index pipeline, so they must explicitly call the reindex step.
- Verify discoverability in tests — after writing data, perform a search and assert the new data is findable. Tests that only check file existence miss index staleness.
- Document all direct-write paths — any code that bypasses the normal write pipeline should be flagged as needing manual reindex triggers.
In the semantic dedup guard (PR #399), when a fact was rejected by the
importance gate or semantic dedup check, fact.content was still added to
contentHashIndex. The index accumulated phantom entries for content that
doesn't exist in storage, causing false dedup matches on subsequent extractions.
- Only add to index AFTER successful persistence — move `contentHashIndex.add()` calls to after the write succeeds. If a dedup check, importance gate, or other filter rejects content before persistence, the index must remain untouched.
- Phantom index entries cause silent data loss — a phantom entry causes the next extraction with similar content to be dedup-suppressed against a non-existent stored fact, effectively losing the new extraction silently.
- Test index consistency after rejection paths — force a dedup/importance rejection in a test, then verify the index does not contain an entry for the rejected content.
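A minimal sketch of the ordering rule above. `FactWriter`, its in-memory `stored` array, and the `minImportance` threshold are hypothetical stand-ins for illustration, not Remnic's actual storage API; only the `contentHashIndex` name comes from the text.

```typescript
// Hypothetical types for illustration; not Remnic's real storage API.
interface Fact {
  content: string;
  importance: number;
}

class FactWriter {
  readonly contentHashIndex = new Set<string>();
  readonly stored: Fact[] = [];

  constructor(private readonly minImportance: number) {}

  /** Returns true only when the fact was actually persisted. */
  persist(fact: Fact): boolean {
    if (fact.importance < this.minImportance) return false; // importance gate
    if (this.contentHashIndex.has(fact.content)) return false; // dedup check
    this.stored.push(fact); // stand-in for the real storage write
    this.contentHashIndex.add(fact.content); // index only after the write succeeded
    return true;
  }
}

const writer = new FactWriter(0.5);
// Rejected by the importance gate: the index must stay untouched.
const rejected = writer.persist({ content: "port is 3000", importance: 0.1 });
// Same content, now important enough: must not be suppressed by a phantom entry.
const accepted = writer.persist({ content: "port is 3000", importance: 0.9 });
```

Because the rejected call never touches the index, the later extraction with the same content is still persisted.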
In PR #399, semanticDedupCandidates was documented as "set to 0 to disable"
but the JSON schema had minimum: 1 and the code clamped to Math.max(1, ...).
Users following docs to disable the feature got silently overridden to minimum 1.
- When a config value can disable a feature, schema AND code must accept 0: if docs say "set `maxCandidates` to 0 to disable", the JSON schema must set `minimum: 0` (not `1`), and the code must handle the `0` case (typically by short-circuiting before the operation).
- Zero-value semantics are a compatibility contract: `enabled=false` and `0` limits are user-facing guarantees. Coercing `0` to `1` violates the documented contract silently. Test with the documented disable values.
- Validate schema against documented behavior in CI: the `check-config-contract` script should flag when a config property's schema `minimum` contradicts the documented disable value.
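The zero-value rule can be sketched as follows. `resolveCandidateLimit`, `findDedupCandidates`, and the default of 5 are assumptions made for this example; the real config parser may be structured differently.

```typescript
// Hypothetical default; the real value is not stated in this document.
const DEFAULT_CANDIDATES = 5;

// Validates the raw config value without ever clamping 0 upward.
function resolveCandidateLimit(raw: unknown): number {
  if (raw === undefined) return DEFAULT_CANDIDATES;
  if (typeof raw !== "number" || !Number.isInteger(raw) || raw < 0) {
    throw new Error(`semanticDedupCandidates must be an integer >= 0, got ${String(raw)}`);
  }
  return raw; // 0 is preserved as the documented "disable" value
}

// Short-circuits before doing any work when the feature is disabled.
function findDedupCandidates(all: string[], limit: number): string[] {
  if (limit === 0) return [];
  return all.slice(0, limit);
}
```

Invalid values throw instead of being coerced, so a typo in the config surfaces immediately rather than silently re-enabling the feature.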
In PR #401, templateMatcher built a regex from only the prefix (before first
placeholder) and suffix (after last placeholder) of a citation template. When
both were empty (a template consisting of only a placeholder), the resulting
regex matched everything. Additionally, special $ patterns in regex replacement
strings corrupted citation output.
- Escape all literal template parts before embedding in regex: pass prefix/suffix through an `escapeRegex()` helper before building the pattern. `String.raw` only keeps backslashes literal in template strings; it does not escape regex metacharacters. Never assume template parts are regex-safe.
- Test with empty prefix/suffix: a template like `{{tag}}` with no surrounding literal text must not produce a match-everything regex.
- Escape `$` in replacement strings: `String.replace` treats `$'`, ``$` ``, `$&`, `$1`, etc. as special in the replacement string. Use a replacement function or escape `$` → `$$` before passing to replace.
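A short sketch of the three rules above. `buildTemplateRegex` and `replaceCitation` are hypothetical helpers, not the real `templateMatcher`; returning `null` for a template with no literal text is one possible policy, chosen here for illustration.

```typescript
// Escapes every regex metacharacter so the input matches literally.
function escapeRegex(literal: string): string {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical matcher: refuses templates with no literal text, because
// such a template cannot be told apart from arbitrary content.
function buildTemplateRegex(prefix: string, suffix: string): RegExp | null {
  if (prefix === "" && suffix === "") return null;
  return new RegExp(`${escapeRegex(prefix)}(.+?)${escapeRegex(suffix)}`, "g");
}

// A replacement function inserts the value verbatim, so `$&`, `$1`,
// and similar sequences in `value` are not expanded.
function replaceCitation(text: string, pattern: RegExp, value: string): string {
  return text.replace(pattern, () => value);
}
```

Using a function as the replacement avoids having to remember the `$$` escape at every call site.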
PR #347 had a single mutable clientInfo object shared across all MCP connections.
When one connection set its clientInfo, the value bled into all other active
connections. In multi-tenant deployments this is a cross-tenant data leak.
- Each connection/session must own its mutable state: if `resolveAdapter()` writes to a shared `clientInfo` object, two concurrent connections see each other's adapter metadata. Use per-connection instances or deep-copy before storing.
- Shared state is distinct from global singletons: pattern #5 covers singleton scoping by plugin ID. This pattern covers mutable objects shared across connections within the same plugin instance. Both are needed.
- Test with concurrent connections — create two sessions, set different adapter data on each, and verify neither sees the other's data.
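One way to keep `clientInfo` per connection is a registry keyed by connection ID. The `ConnectionRegistry` class and the `ClientInfo` shape below are assumptions for illustration, not the actual MCP server wiring.

```typescript
interface ClientInfo {
  name: string;
  version: string;
}

// Hypothetical registry keyed by connection id.
class ConnectionRegistry {
  private readonly clientInfoByConnection = new Map<string, ClientInfo>();

  setClientInfo(connectionId: string, info: ClientInfo): void {
    // Copy on write so later mutation of the caller's object cannot leak in.
    this.clientInfoByConnection.set(connectionId, { ...info });
  }

  getClientInfo(connectionId: string): ClientInfo | undefined {
    const info = this.clientInfoByConnection.get(connectionId);
    return info ? { ...info } : undefined; // copy on read for the same reason
  }

  close(connectionId: string): void {
    this.clientInfoByConnection.delete(connectionId);
  }
}
```

The shallow copy is enough here because `ClientInfo` is flat; a nested shape would need a deep copy.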
In PRs #344 and #345, a feedback decision enum silently defaulted to
"approved" when the value was undefined or missing. This means a missing
rejection is treated as approval — a security vulnerability. PR #343 had a
similar issue with qmdDebug passing an object instead of a string to a method
that expected a string, silently producing wrong debug output.
- Enum defaults must be the safest option, not the most convenient: when a decision/status enum value is missing or unrecognized, default to `"rejected"`, `"pending"`, `"disabled"`, or `"none"`, never `"approved"`, `"enabled"`, or `"active"`.
- Never silently coerce unexpected types: if `qmdDebug` receives an object where a string is expected, throw or log a warning. Don't silently stringify as `[object Object]`.
- Test with missing/undefined enum values: every enum parser should have test cases for `undefined`, `null`, `""`, and unrecognized string values, and each must assert the default is the least-privileged option.
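A minimal sketch of a fail-safe enum parser. The `Decision` union and `parseDecision` name are illustrative assumptions; the real feedback enum may carry different members.

```typescript
type Decision = "approved" | "rejected" | "pending";
const DECISIONS: readonly string[] = ["approved", "rejected", "pending"];

// Anything missing or unrecognized resolves to the least-privileged value.
function parseDecision(raw: unknown): Decision {
  if (typeof raw === "string" && DECISIONS.includes(raw)) {
    return raw as Decision;
  }
  return "rejected";
}
```

Only an exact, recognized string can ever produce `"approved"`; every other input path ends at `"rejected"`.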
In PR #392, duplicate rollout slugs caused an ENOENT crash: the first rename moved the file, then the second rename tried to move the same (now non-existent) source. When processing batches of file operations, duplicate identifiers in the input cause the second operation to fail.
- Deduplicate batch operation inputs before execution — before processing a list of rename/move/delete operations, check for duplicate source or target identifiers. Either deduplicate (keep the last) or fail fast with a clear error.
- Verify source exists before each move: in a batch loop, `statSync` the source file before attempting to move it. If it was already moved by a duplicate entry, skip or error rather than crashing.
- Test batch operations with duplicate inputs: include test cases where the input list contains duplicate identifiers and verify the behavior is deterministic (not dependent on filesystem ordering).
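The deduplication step can be sketched as a pure function that runs before any filesystem call. `RenameOp` and `dedupeRenameOps` are hypothetical names; the "keep the last, fail on target collisions" policy is one of the two options described above.

```typescript
interface RenameOp {
  from: string;
  to: string;
}

// Keeps the last operation for each source and fails fast when two
// different sources would be renamed onto the same target.
function dedupeRenameOps(ops: RenameOp[]): RenameOp[] {
  const bySource = new Map<string, RenameOp>();
  for (const op of ops) bySource.set(op.from, op);

  const unique = Array.from(bySource.values());
  const targets = new Set<string>();
  for (const op of unique) {
    if (targets.has(op.to)) throw new Error(`duplicate rename target: ${op.to}`);
    targets.add(op.to);
  }
  return unique;
}
```

The result depends only on input order, not on filesystem state, which makes the behavior deterministic and easy to test.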
In PR #349, the Hermes Python CI workflow used patterns that hid test and type failures, making them invisible to reviewers. Broken tests that passed CI were caught only by manual review.
- Never use `|| true` on test/type-check commands in CI: if `pytest`, `mypy`, `tsc`, or equivalent commands fail, the CI step MUST fail. Silencing failures with `|| true` or missing `set -e` means broken code passes CI.
- Each language's quality gate must be a separate CI step: don't bundle `ruff check && mypy && pytest` into a single script with `set -e` at the top and then call it with `|| true`. Make each a distinct step so failures are visible in the CI UI.
- Audit CI workflows for failure suppression: grep all workflow files for `|| true`, `continue-on-error: true`, and missing `set -e` in shell scripts. These should only exist on intentional tolerance (like cleanup steps), never on quality gates.
PR #396 had 10+ instances where invalid CLI flags (--format jsno), MCP
parameters, briefing window tokens, and format values silently fell back to
defaults instead of being rejected. While pattern #1 covers CLI flag validation,
this pattern addresses the broader issue of accept-then-default behavior in
ALL input surfaces (CLI, MCP, config, API).
- Invalid values must be rejected, not silently reinterpreted: when `--format jsno` is provided, throw an error listing valid formats. Don't silently fall back to `config.briefing.defaultFormat`. The user explicitly chose a value; ignoring it hides configuration mistakes.
- MCP/API surfaces must validate exactly like CLI surfaces: when a tool parameter is invalid, return a clear error, not a result computed with default values. MCP callers (agents) cannot tell the difference between a valid response and a silently-defaulted response.
- Missing flag arguments must fail, not default: `--since` with no value must error, not fall back to `config.briefing.defaultWindow`. The user's intent is ambiguous, not "use the default".
- Briefing window tokens must reject unrecognized values: when `since` contains `garbage`, don't silently fall back to `yesterday`. The caller should know their input was invalid.
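A sketch of reject-instead-of-default parsing. `parseFormat` and `BRIEFING_FORMATS` are hypothetical names, and the convention that `undefined` means "flag absent" is an assumption of this example.

```typescript
const BRIEFING_FORMATS = ["markdown", "json"] as const;
type BriefingFormat = (typeof BRIEFING_FORMATS)[number];

// `raw === undefined` means the flag was absent; only then does the default apply.
// Any explicit value, including an empty string, must be valid or is rejected.
function parseFormat(raw: string | undefined, fallback: BriefingFormat): BriefingFormat {
  if (raw === undefined) return fallback;
  const match = BRIEFING_FORMATS.find((f) => f === raw);
  if (match === undefined) {
    throw new Error(`Invalid --format "${raw}". Valid formats: ${BRIEFING_FORMATS.join(", ")}`);
  }
  return match;
}
```

The same function can back both the CLI flag and an MCP tool parameter so the two surfaces cannot drift apart.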
PR #396 had 3 instances where validation accepted values that downstream code
never handled. `BRIEFING_FORMAT_ALLOWED` included `"text"`, but the format
resolution only handled `"markdown"` and `"json"`. The other two were dead
switch cases left behind after name normalization, and legacy tool schemas
that inherited updated descriptions.
- Validation allow-lists must exactly match handled values: if a format validator accepts `"text"`, `"markdown"`, and `"json"`, the downstream code must handle ALL three. Any value accepted by validation but unhandled in code produces undefined behavior (typically silent fallthrough to default).
- Dead switch cases after normalization must be removed: if tool names are normalized from `remnic.*` to `engram.*`, a `case "remnic.briefing":` branch is dead code that can never match. Remove it rather than leaving it to silently never execute.
- Legacy wrappers must override ALL inherited fields, not just names: when creating a legacy tool schema from a primary schema, override both `name` AND `description`. Otherwise the legacy tool advertises the new branding in its description while using the old name, confusing clients.
- Test that every accepted value produces correct behavior: for each value in an allow-list, write a test that passes it through the full pipeline and verifies the output matches the expected behavior for that specific value.
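One way to keep the allow-list and the handler in lockstep is to derive the type from the list and add a `never` guard in the switch. The `FORMATS` list and `render` function below are illustrative assumptions, not the real briefing code.

```typescript
const FORMATS = ["text", "markdown", "json"] as const;
type Format = (typeof FORMATS)[number];

function render(format: Format, title: string): string {
  switch (format) {
    case "text":
      return title;
    case "markdown":
      return `# ${title}`;
    case "json":
      return JSON.stringify({ title });
    default: {
      // Compile-time guard: adding a value to FORMATS without a case
      // makes this assignment fail type-checking.
      const unhandled: never = format;
      throw new Error(`unhandled format: ${String(unhandled)}`);
    }
  }
}

// Every accepted value goes through the real handler.
const rendered = FORMATS.map((f) => render(f, "Briefing"));
```

Iterating over `FORMATS` in a test gives the "every accepted value" coverage for free when the list grows.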
PR #396 had 3 instances where status-based filters only checked some non-active
states (e.g., filtering superseded and archived but not quarantined,
rejected, or pending_review). Incomplete filtering causes stale, rejected,
or quarantined data to appear in user-facing outputs like briefings.
- When filtering by status, enumerate ALL non-active states: if a filter excludes `superseded` and `archived`, it must also exclude `quarantined`, `rejected`, and `pending_review` unless explicitly intended. Use an `isActive` helper that checks a single set, not an ad-hoc exclusion list.
- Define the "active" set explicitly, not the "inactive" set: rather than listing states to exclude, define the states to include, e.g. `if (!ACTIVE_STATUSES.includes(memory.status)) continue;`. This prevents new states from accidentally flowing through.
- Test with every known status value: create a test fixture with memories in each known status and verify the filter produces the correct subset.
- When adding a new status, update ALL filters — grep for every status filter in the codebase and add the new status to the appropriate inclusion or exclusion set.
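A minimal sketch of the inclusion-set approach. The `MemoryStatus` union is assembled from the statuses named in this document and may be incomplete; treating only `"active"` as active is an assumption of this example.

```typescript
type MemoryStatus =
  | "active"
  | "superseded"
  | "archived"
  | "quarantined"
  | "rejected"
  | "pending_review";

// The single source of truth for what counts as active.
const ACTIVE_STATUSES: ReadonlySet<MemoryStatus> = new Set<MemoryStatus>(["active"]);

function isActive(memory: { status: MemoryStatus }): boolean {
  return ACTIVE_STATUSES.has(memory.status);
}
```

A status added to the union later is excluded by default until someone deliberately adds it to `ACTIVE_STATUSES`.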
PR #394 had 2 instances where code deleted an existing file/directory before writing the replacement. If the write fails after the delete succeeds (e.g., permissions, disk full, cross-device rename), the old data is permanently lost with no recovery path.
- Never `rmSync` then `renameSync`; use the reverse order: write the new content to a temp location first, then rename it over the target. On most filesystems, `renameSync` is atomic, so the target always exists in a valid state. If the write to temp fails, the original remains intact.
- Backup before destructive operations: when replacing a config file, copy the old content to a `.bak` file first. If the new write fails, restore from backup. Clean up the backup after confirming success.
- Verify write success before cleanup: if you must delete old data (e.g., removing a temp directory after successful rename), verify the rename succeeded before cleaning up the source. `renameSync` can fail on cross-device moves.
- Test the failure path: mock `renameSync` to throw after `rmSync` succeeds and verify the error is handled and data is recoverable.
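The write-then-rename order can be sketched with Node's `fs` module. `replaceFileAtomically` is a hypothetical helper, and the temp-file naming scheme is an assumption; the temp file is placed next to the target so the rename stays on one filesystem.

```typescript
import { mkdtempSync, readFileSync, renameSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Writes to a sibling temp file, then renames it over the target.
function replaceFileAtomically(target: string, content: string): void {
  const temp = join(dirname(target), `.${process.pid}-${Date.now()}.tmp`);
  try {
    writeFileSync(temp, content);
    renameSync(temp, target); // same directory, so no cross-device rename
  } catch (err) {
    rmSync(temp, { force: true }); // drop the temp file; the target is untouched
    throw err;
  }
}

// Usage against a throwaway directory.
const workDir = mkdtempSync(join(tmpdir(), "remnic-atomic-"));
const configPath = join(workDir, "config.json");
writeFileSync(configPath, "old");
replaceFileAtomically(configPath, "new");
const finalContent = readFileSync(configPath, "utf8");
rmSync(workDir, { recursive: true, force: true });
```

At no point is the old file deleted before the new content exists on disk, so a failed write leaves the original in place.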
Remnic ships as a family of composable packages. The architectural contract
is that users install only what they use: @remnic/core alone, core plus
@remnic/plugin-openclaw, core plus @remnic/export-weclone, or all three.
A PR that forces an optional package into a base install surface breaks
this contract even if every test passes, because the breakage only shows
up at npm-install time for someone who didn't want that optional surface.
- Load optional packages via computed-specifier dynamic imports. Never do `import { X } from "@remnic/bench"` in a base install surface (CLI, core, plugin-openclaw). Use `await import("@remnic/" + "bench")` so the bundler cannot statically resolve the module and pull it into the bundle. Wrap in a loader helper (`loadBenchModule()`) that throws a user-facing install hint on miss. Canonical patterns: `packages/remnic-cli/src/optional-bench.ts`, `packages/remnic-cli/src/optional-weclone-export.ts`, `packages/remnic-core/src/cli.ts:ensureBuiltInBulkImportAdapters`.
- Declare as optional peer deps, not `dependencies`. Optional companions go under `peerDependencies` with `peerDependenciesMeta.<name>.optional = true`. If you put them under `dependencies`, npm install of the base package pulls them in and the à-la-carte model is gone.
- Never add to `noExternal`. `packages/remnic-cli/tsup.config.ts` must `external` any optional package (or simply omit it from `noExternal`). A past regression listed `@remnic/bench` and `@remnic/export-weclone` under `noExternal`, which bundled them into every CLI install even for users who never ran `remnic bench *`.
- Publish every surface users are told to install. Any package that docs, error messages, or install hints mention must actually exist on npm. Keeping a package `"private": true` while recommending it in a CLI install hint is a bug: ship it (update `.github/workflows/release-and-publish.yml` `PUBLISH_ORDER`) or stop recommending it.
- Verify both paths end to end. When you touch tsup configs, optional-loader modules, or the publish workflow, verify that:
  - `npm install @remnic/cli` succeeds without the optional packages.
  - Running an optional command without the package throws the install hint, not a raw `MODULE_NOT_FOUND`.
  - Installing the optional package and rerunning the command works.
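A rough sketch of a computed-specifier loader. `loadOptionalPackage` is a hypothetical name and the hint wording is invented for this example; the canonical helpers in the files listed above may differ. The error codes checked are the ones Node.js uses for unresolved ESM and CommonJS modules.

```typescript
// Hypothetical loader, shown only to illustrate the pattern.
async function loadOptionalPackage(shortName: string): Promise<unknown> {
  // Built at runtime so a bundler cannot resolve the specifier statically.
  const specifier = "@remnic/" + shortName;
  try {
    return await import(specifier);
  } catch (err) {
    const code = (err as { code?: string } | null)?.code;
    if (code === "ERR_MODULE_NOT_FOUND" || code === "MODULE_NOT_FOUND") {
      throw new Error(
        `Optional package ${specifier} is not installed. Run: npm install ${specifier}`,
      );
    }
    throw err; // real load errors must stay visible
  }
}
```

Only the "not installed" case is turned into a hint; an exception thrown while the optional package itself loads is rethrown unchanged.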
PRs #397 and #398 had 3 instances where documentation claimed behavior that
the code didn't implement. Docs said remnic.timeout applied to daemon calls
but the provider never forwarded the timeout parameter. A publish workflow
allowed dispatching from any branch without branch protection.
- Every documented behavior must have a corresponding test — if docs say "timeout is applied to all daemon calls", write a test that verifies the timeout parameter reaches the daemon client constructor. Without a test, documentation drifts from implementation silently.
- CI workflows must validate their trigger constraints: if a publish workflow should only run on `main`, add `if: github.ref == 'refs/heads/main'` to the job, not just to the trigger. Manual `workflow_dispatch` can target any branch, bypassing branch-only triggers.
- When adding a config property, wire it end-to-end: adding `timeout` to the config schema but not passing it through the provider to the client means users set a value that has no effect. The `check-config-contract` script should flag config properties that are defined in the schema but never read in code.
- Test that documented config properties are consumed: for each config property in the schema, write a test that sets it and verifies it affects the documented behavior. Missing tests mean the property may be silently ignored.
Remnic gives AI agents long-term memory that persists across conversations.
If you touch retrieval/planner/cache/config logic, you must run the hardening gate in:
docs/ops/pr-review-hardening-playbook.md
This is mandatory before claiming a PR is review-clean.
Treat these as non-negotiable engineering constraints for this plugin:
- Recall pipeline order is a contract:
  - retrieve candidate headroom
  - apply policy filters (namespace/status/path/type)
  - rerank/boost
  - cap to user-facing budget
  - format and inject

  Never cap before final filtering for the section users consume.
- Artifact isolation: Artifacts must flow only through the dedicated verbatim-artifact path. Generic QMD/embedding memory recall must exclude `artifacts/` paths.
- Planner mode semantics: `no_recall`, `minimal`, `full`, and `graph_mode` are behavioral contracts.
  - each mode must be reachable
  - `no_recall` must gate all fallback paths
  - `minimal` must actually cap retrieval size
- Config is runtime API: `enabled=false` and `0` limits are compatibility guarantees, not hints. Never coerce `0` to non-zero. Keep write-time/read-time behavior symmetric.
- Intent heuristics must be morphology-aware and precedence-tested: Regex-based intent extraction must handle common conjugations/variants and avoid accidental mismatches. Add tests for representative natural language variants, not only base forms.
- Cache invariants:
  - cache versions must be shared per memory directory when multiple instances can read/write
  - cache timestamps must reflect rebuild completion time
  - cache must persist negative lookups where useful (e.g., missing IDs) to avoid rebuild loops
  - concurrent writes during rebuild must not publish stale snapshots
- Fallback parity: Any retrieval-policy rule applied in primary search must be mirrored in fallback search paths.
If you change src/orchestrator.ts, src/storage.ts, or src/intent.ts, include/adjust tests for all impacted invariants:
- planner reachability and gating
- zero-limit semantics
- cap-after-filter behavior
- artifact-path isolation
- cache coherence across instances and concurrent writes
- heuristic variant coverage (intent phrases/conjugations)
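The cap-after-filter and zero-limit invariants can be illustrated with a small pure function. `Candidate`, `selectForInjection`, and the rule that only `"active"` passes the status filter are assumptions for this sketch; the real pipeline in `src/orchestrator.ts` is more involved.

```typescript
interface Candidate {
  id: string;
  namespace: string;
  status: string;
  path: string;
  score: number;
}

// Hypothetical recall tail: filter, then rank, then cap.
function selectForInjection(candidates: Candidate[], namespace: string, budget: number): Candidate[] {
  if (budget === 0) return []; // a 0 limit disables the section; never coerce it upward
  return candidates
    .filter((c) => c.namespace === namespace) // namespace policy
    .filter((c) => c.status === "active") // status policy
    .filter((c) => !c.path.startsWith("artifacts/")) // artifact isolation
    .sort((a, b) => b.score - a.score) // rank after filtering
    .slice(0, budget); // cap last
}
```

Capping first would let filtered-out, high-scoring candidates consume the budget and leave the user-facing section underfilled.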
Think of it like a personal assistant who:
- Remembers everything you've told them
- Learns your preferences and patterns
- Can recall relevant context when you ask about something
- Never forgets, but updates outdated information
Without memory, every conversation starts fresh. Agents forget:
- Your name and preferences
- Previous decisions and context
- Projects you're working on
- People and companies you've mentioned
With Engram:
- Agents recall relevant context automatically
- Profile captures your preferences
- Facts, entities, and relationships are tracked
- Contradictions are detected and resolved
┌─────────────────────────────────────────────────────────────┐
│ OpenClaw Gateway │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Agent Turn │ │
│ │ │ │
│ │ 1. User sends prompt │ │
│ │ ↓ │ │
│ │ 2. ENGRAM: Recall relevant memories (→ inject) │ │
│ │ ↓ │ │
│ │ 3. Agent processes (with memory context) │ │
│ │ ↓ │ │
│ │ 4. ENGRAM: Buffer turn for extraction │ │
│ │ ↓ │ │
│ │ 5. (Periodically) Run extraction → persist │ │
│ │ │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────┐ ┌─────────────────────────────┐ │
│ │ Engram │ │ Storage │ │
│ │ Orchestrator │◄──►│ facts/ entities/ profile │ │
│ └────────┬────────┘ └─────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────┐ ┌─────────────────────────────┐ │
│ │ GPT-5.2 │ │ QMD │ │
│ │ (extraction) │ │ (search: BM25 + vector) │ │
│ └─────────────────┘ └─────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
The plugin:
- Injects memory - On `before_agent_start`, searches for relevant memories and adds them to the system prompt
- Buffers turns - On `agent_end`, captures the user/assistant exchange
- Extracts facts - Uses GPT-5.2 to extract facts, entities, and profile updates
- Stores memories - Persists to markdown files with YAML frontmatter
- Consolidates - Periodically merges, updates, and cleans memories
| Type | What It Is | Storage Location |
|---|---|---|
| Fact | A single piece of information | facts/{date}/ |
| Entity | A person, place, company, or project | entities/ |
| Profile | User preferences and patterns | profile.md |
| Correction | Explicit correction of a fact | corrections/ |
| Question | Curiosity questions for follow-up | questions/ |
Facts are categorized by type:
| Category | Examples |
|---|---|
| `fact` | "OpenClaw runs on port 3000" |
| `decision` | "We decided to use PostgreSQL" |
| `preference` | "User prefers dark mode" |
| `commitment` | "I will review the PR by Friday" |
| `relationship` | "Alice works with Bob on Project X" |
| `principle` | "Always write tests before code" |
| `moment` | "Today we launched v2.0" |
| `skill` | "User knows Python and TypeScript" |
When an agent starts processing a prompt:
User Prompt: "What was that API rate limit issue?"
│
▼
┌───────────────────┐
│ QMD Search │ ← Hybrid search (BM25 + vector + reranking)
│ (prompt text) │
└────────┬──────────┘
│
▼
┌───────────────────┐
│ Boost Results │ ← Recency, access count, importance
└────────┬──────────┘
│
▼
┌───────────────────┐
│ Format Context │ ← Profile + memories + questions
└────────┬──────────┘
│
▼
Injected into system prompt:
"## Memory Context (Engram)
## User Profile
- Prefers concise responses
- Works at Company X
## Relevant Memories
[1] /facts/2026-02-01/fact-123.md (score: 0.85)
API rate limit is 1000 requests per minute..."
After an agent completes a turn:
Agent Turn Complete
│
▼
┌───────────────────┐
│ Buffer Turn │ ← Add to smart buffer
└────────┬──────────┘
│
(Buffer full or forced flush?)
│
▼
┌───────────────────┐
│ GPT-5.2 │ ← Extract facts, entities, profile
│ Extraction │
└────────┬──────────┘
│
▼
┌───────────────────┐
│ Persist to │ ← Write markdown files
│ Storage │
└────────┬──────────┘
│
▼
┌───────────────────┐
│ QMD Update │ ← Re-index for search
└───────────────────┘
Periodically (every N extractions), the plugin:
- Merges duplicates - Combines redundant facts
- Invalidates stale - Marks outdated info as superseded
- Updates entities - Merges fragmented entity files
- Cleans expired - Removes fulfilled commitments, TTL-expired facts
- Summarizes - Compresses old memories into summaries
- Consolidates profile - Keeps profile.md under 600 lines
The codebase is a monorepo. Core logic lives in packages/remnic-core/;
host adapters and CLI live in sibling packages.
packages/
├── remnic-core/ # Core memory engine (primary source)
├── remnic-cli/ # CLI tooling
├── remnic-server/ # Server runtime
├── plugin-openclaw/ # OpenClaw host adapter
├── plugin-claude-code/ # Claude Code host adapter
├── plugin-codex/ # Codex host adapter
├── plugin-hermes/ # Hermes host adapter
├── hermes-provider/ # Hermes provider integration
├── connector-replit/ # Replit connector
├── shim-openclaw-engram/ # Legacy engram shim
└── bench/ # Benchmarks
packages/remnic-core/src/
│
│ ── Core pipeline ──────────────────────────────────
├── index.ts # Plugin entry, hook registration
├── config.ts # Config parsing with defaults
├── types.ts # TypeScript interfaces
├── logger.ts # Logging wrapper
├── orchestrator.ts # Core memory coordination
├── storage.ts # File I/O for memories
├── buffer.ts # Smart turn buffering
├── extraction.ts # GPT-5.2 extraction engine
├── qmd.ts # QMD search client
├── importance.ts # Importance scoring
├── chunking.ts # Large content chunking
├── threading.ts # Conversation threading
├── topics.ts # Topic extraction
├── tools.ts # Agent tools
├── cli.ts # CLI commands
│
│ ── Recall & retrieval ─────────────────────────────
├── retrieval.ts # Recall pipeline implementation
├── intent.ts # Intent heuristics (morphology-aware)
├── signal.ts # Signal-based flush triggers
├── recall-qos.ts # Recall quality-of-service
├── recall-mmr.ts # Maximal marginal relevance
├── recall-query-policy.ts # Query rewrite policy
├── recall-audit.ts # Recall audit trail
├── qmd-recall-cache.ts # QMD recall caching
├── rerank.ts # Re-ranking pipeline
├── harmonic-retrieval.ts # Harmonic retrieval scoring
├── verified-recall.ts # Verified recall checks
│
│ ── Classification & scoring ───────────────────────
├── himem.ts # Episode/Note classification (v8.0)
├── boxes.ts # Memory Box builder + Trace Weaver (v8.0)
├── extraction-judge.ts # LLM-as-judge fact-worthiness gate (#376)
├── semantic-chunking.ts # Topic-boundary chunking (#368)
├── source-attribution.ts # Citation/attribution helpers (#379)
├── relevance.ts # Relevance scoring
├── calibration.ts # Score calibration
│
│ ── Versioning & lifecycle ─────────────────────────
├── page-versioning.ts # Snapshot-based version history (#371)
├── lifecycle.ts # Memory lifecycle management
├── temporal-supersession.ts # Temporal supersession logic
├── temporal-index.ts # Temporal indexing
│
│ ── Session & context ──────────────────────────────
├── session-integrity.ts # Session integrity checks
├── session-toggles.ts # Per-session feature toggles
├── session-observer-bands.ts # Observer band system
├── session-observer-state.ts # Observer state tracking
├── profiling.ts # User profiling
├── identity-continuity.ts # Identity continuity
│
│ ── Causal reasoning ───────────────────────────────
├── causal-chain.ts # Causal chain tracking
├── causal-behavior.ts # Behavioral causal signals
├── causal-retrieval.ts # Causal-aware retrieval
├── causal-consolidation.ts # Causal consolidation
├── causal-trajectory.ts # Trajectory tracking
├── causal-trajectory-graph.ts # Trajectory graph
│
│ ── Graph & dashboard ──────────────────────────────
├── graph.ts # Knowledge graph
├── tmt.ts # Tree-of-memory-traces
├── graph-dashboard-*.ts # Dashboard rendering (diff, key, parser)
├── abstraction-nodes.ts # Abstraction node system
│
│ ── Access & MCP ───────────────────────────────────
├── access-mcp.ts # MCP access provider
├── access-cli.ts # CLI access provider
├── access-http.ts # HTTP access provider
├── access-service.ts # Access service coordinator
├── access-schema.ts # Access schema definitions
├── access-idempotency.ts # Idempotent access operations
│
│ ── Utilities & support ────────────────────────────
├── sanitize.ts # Content sanitization
├── tokens.ts # Token counting
├── json-extract.ts # JSON extraction helpers
├── json-store.ts # JSON-backed storage
├── whitespace.ts # Whitespace handling
├── bootstrap.ts # Bootstrap/init routines
├── model-registry.ts # LLM model registry
├── fallback-llm.ts # LLM fallback routing
├── local-llm.ts # Local LLM integration
│
│ ── Subdirectories ─────────────────────────────────
├── enrichment/ # External enrichment pipeline (#365)
├── binary-lifecycle/ # Binary file management (#367)
├── taxonomy/ # MECE taxonomy resolver (#366)
├── memory-extension/ # Extension publisher contract (#381, #382)
├── memory-extension-host/ # Extension host discovery (#381)
├── compat/ # Provider compatibility checks
├── adapters/ # Host adapter interfaces
├── connectors/ # External service connectors
├── conversation-index/ # Conversation indexing
├── compounding/ # Compounding memory logic
├── curation/ # Memory curation pipeline
├── dedup/ # Deduplication engine
├── lcm/ # Lifecycle management
├── maintenance/ # Maintenance tasks
├── migrate/ # Migration scripts
├── namespaces/ # Multi-tenant namespace logic
├── network/ # Network transport layer
├── onboarding/ # Onboarding flows
├── projection/ # Memory projections
├── replay/ # Replay/debug tooling
├── review/ # Review pipeline
├── routing/ # Routing logic
├── runtime/ # Runtime services
├── search/ # Search subsystem
├── shared-context/ # Shared context management
├── spaces/ # Memory spaces
├── surfaces/ # Surface adapters
├── sync/ # Sync engine
├── transfer/ # Data transfer utilities
├── utils/ # Shared utility functions
└── work/ # Work-product tracking
~/.openclaw/workspace/memory/local/
├── profile.md # User profile
├── facts/ # Daily fact directories
│ ├── 2026-02-01/
│ │ ├── fact-123.md
│ │ └── decision-456.md
│ └── 2026-02-07/
│ └── ...
├── entities/ # Entity files
│ ├── person-joshua-warren.md
│ ├── company-creatuity.md
│ └── project-openclaw.md
├── corrections/ # Explicit corrections
├── questions/ # Curiosity questions
├── summaries/ # Compressed old memories
└── state/
├── buffer.json # Current buffer state
└── meta.json # Extraction counters
Facts and entities use markdown with YAML frontmatter:
---
id: fact-1770469224307-eelr
category: decision
confidence: 0.85
created: 2026-02-07T10:00:00Z
updated: 2026-02-07T10:00:00Z
tags:
- architecture
- database
entityRef: project-openclaw
importance:
score: 0.7
reason: architectural decision
status: active
---
We decided to use PostgreSQL for the main database because it handles JSON well and has excellent extension support.

In `openclaw.json`:
{
"plugins": {
"openclaw-engram": {
"openaiApiKey": "${OPENAI_API_KEY}",
"memoryDir": "~/.openclaw/workspace/memory/local",
"workspaceDir": "~/.openclaw/workspace",
"qmdEnabled": true,
"qmdCollection": "openclaw-engram",
"consolidateEveryN": 10,
"maxMemoryTokens": 2000,
"debug": false
}
}
}

| Option | Type | Default | Description |
|---|---|---|---|
| `openaiApiKey` | string | env var | Optional OpenAI API key for direct-client paths; local/gateway fallback can run without it |
| `memoryDir` | string | see above | Where to store memories |
| `workspaceDir` | string | see above | Workspace root |
| `qmdEnabled` | boolean | `true` | Enable QMD search |
| `qmdCollection` | string | `"openclaw-engram"` | QMD collection name |
| `qmdMaxResults` | number | `10` | Max search results |
| `consolidateEveryN` | number | `10` | Consolidate every N extractions |
| `maxMemoryTokens` | number | `2000` | Max tokens in context injection |
| `identityEnabled` | boolean | `true` | Enable identity reflections |
| `injectQuestions` | boolean | `false` | Inject curiosity questions |
| `commitmentDecayDays` | number | `90` | Days before expired commitments are cleaned |
| `debug` | boolean | `false` | Enable verbose logging |

Initialize the memory system on gateway startup.
api.on("gateway_start", async () => {
await orchestrator.initialize();
// - Ensure directories exist
// - Load entity aliases
// - Probe QMD availability
// - Load buffer state
});

Inject memory context into the agent's system prompt.
api.on("before_agent_start", async (event, ctx) => {
const prompt = event.prompt;
const context = await orchestrator.recall(prompt);
if (context) {
return {
systemPrompt: `## Memory Context (Engram)\n\n${context}`
};
}
});

Buffer the completed turn for later extraction.
api.on("agent_end", async (event, ctx) => {
if (!event.success) return;
const messages = event.messages;
const lastTurn = extractLastTurn(messages);
for (const msg of lastTurn) {
const cleaned = cleanUserMessage(msg.content);
await orchestrator.processTurn(msg.role, cleaned, ctx.sessionKey);
}
});

The `Orchestrator` class is the heart of Engram:

| Method | Purpose |
|---|---|
| `initialize()` | Set up storage, load aliases, probe QMD |
| `recall(prompt)` | Search and format memory context |
| `processTurn(role, content, sessionKey)` | Buffer a turn, maybe trigger extraction |
| `runExtraction(turns)` | Call GPT-5.2, persist results |
| `runConsolidation()` | Merge, update, clean memories |

| Subsystem | Responsibility |
|---|---|
| `SmartBuffer` | Decides when to flush and extract |
| `ExtractionEngine` | GPT-5.2 prompts for extraction/consolidation |
| `StorageManager` | Read/write markdown files |
| `QmdClient` | Search via QMD CLI |
| `ThreadingManager` | Group memories by conversation thread |

openclaw engram flush
openclaw engram search "API rate limit"
cat ~/.openclaw/workspace/memory/local/profile.md
qmd update openclaw-engram
qmd embed openclaw-engram
openclaw engram stats

Symptom: Extraction never runs, no new memories.
Cause: API key not configured or not in gateway's environment.
Fix: Add to launchd plist:
<key>EnvironmentVariables</key>
<dict>
<key>OPENAI_API_KEY</key>
<string>sk-...</string>
</dict>

Symptom: "QMD: not available" in logs, fallback to recent memories only.
Cause: qmd command not in PATH or not installed.
Fix: Install QMD and ensure it's in the gateway's PATH.
Symptom: Slow recall, context truncation.
Cause: profile.md exceeded recommended size.
Fix: The plugin auto-consolidates at 600 lines. You can also manually edit profile.md.
Symptom: New memories not found in search.
Cause: QMD index not updated after extraction.
Fix: Run qmd update <collection> and qmd embed <collection>.
Symptom: Agents don't seem to know previous context.
Cause:
- Prompt too short (< 5 chars)
- No matching memories found
- Context trimmed due to token limit
Fix: Check debug logs, increase maxMemoryTokens.
Symptom: OpenAI API rejects schemas with "optional" fields.
Cause: OpenAI Responses API requires .optional().nullable(), not just .optional().
Fix: Always use .optional().nullable() for optional fields in Zod schemas passed to zodTextFormat.
Symptom: System metadata pollutes memories.
Cause: User messages contain injected context that wasn't cleaned.
Fix: The cleanUserMessage() function removes common patterns. Add new patterns if needed.
Symptom: Multiple entity files for the same person/project (e.g., "Josh", "Joshua", "Joshua Warren").
Cause: LLM used different name variants.
Fix: Add aliases to storage.ts:normalizeEntityName() function. Consolidation merges automatically.
# Build the plugin
cd ~/.openclaw/extensions/openclaw-engram
npm run build
# Full gateway restart (gateway_start hook needs this)
launchctl kickstart -k gui/501/ai.openclaw.gateway
# Or for hot reload (but gateway_start won't fire)
kill -USR1 $(pgrep openclaw-gateway)
# Trigger a conversation to test
# Check logs
grep "\[engram\]" ~/.openclaw/logs/gateway.log
# View extraction results
ls -la ~/.openclaw/workspace/memory/local/facts/$(date +%Y-%m-%d)/

Enable in `openclaw.json`:
{
"plugins": {
"openclaw-engram": {
"debug": true
}
}
}

This logs:
- Recall search results
- Buffer decisions
- Extraction prompts and results
- Consolidation actions
- QMD operations
Memories track how often they're accessed:
- `accessCount` increments on each recall
- `lastAccessed` timestamp updated
- Used for boosting frequently-accessed memories
Each memory gets an importance score (0-1):
- Based on category, tags, and content patterns
- Higher importance = higher search ranking
- Protected from summarization
When a new fact conflicts with an existing one:
- QMD finds similar memories
- GPT-5.2 verifies contradiction
- Old memory marked as superseded
- Link created between old and new
Related memories are linked:
- `supports` - Provides evidence for
- `contradicts` - Conflicts with
- `elaborates` - Adds detail to
- `causes` / `caused_by` - Causal relationship
Old, low-importance memories are summarized:
- Triggered when memory count exceeds threshold
- Creates summary files with key facts
- Archives original memories
- Preserves important and entity-linked memories