feat: Week 3.5 — OG memory migrator (memories + Hebbian, read-only source) #4
Merged
Bridges Week 3 (substrate shipped) and Week 4 (engines) by porting 1,141 OG memories + 8,808 Hebbian edges into companion-emergence SQLite stores. Read-only against OG; refuse-to-clobber on output; cryptographic source manifest as audit trail.
Scope locked: memory substrate only. Embeddings explicitly skipped (they regenerate naturally in Week 5 with the Ollama bridge). Soul, personality, self-model, logs, etc. are future migrator passes.
Week 3 amendment: add metadata: dict[str, Any] to the Memory dataclass to absorb OG-only fields (source_date, supersedes, emotional_tone, etc.) without proliferating the dataclass signature.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
8 tasks: Memory.metadata field + MemoryStore column (Week 3 amendment), four brain/migrator/ modules (og/transform/report/cli), integration test, close-out.
50 new tests targeting 241 total across macOS + Windows + Linux.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Week 3.5 amendment — prepares the Memory schema for the OG migrator,
which needs to absorb OG-only fields (source_date, source_summary,
supersedes, emotional_tone, access_count, emotion_count, intensity,
schema_version, connections) without proliferating first-class
dataclass attributes.
Default is {}. Round-trips cleanly through to_dict/from_dict. Legacy
dicts without the 'metadata' key restore as empty (migrator friendly).
Defensive copy on create_new — caller mutations don't leak.
MemoryStore schema column lands in a follow-up commit.
5 new tests; 196 total.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
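The behaviour this commit describes can be sketched roughly as follows. This is a trimmed-down, hypothetical stand-in for the real Memory dataclass (only `id`, `content`, and `metadata` are modelled, and the method bodies are assumptions), meant to illustrate the empty default, the defensive copy, and the legacy-dict restore.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Memory:
    # Hypothetical, reduced version of the real dataclass.
    id: str
    content: str
    # default_factory gives every instance its own empty dict.
    metadata: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def create_new(cls, id: str, content: str,
                   metadata: Optional[Dict[str, Any]] = None) -> "Memory":
        # Shallow defensive copy: later mutation of the caller's dict
        # does not leak into the stored memory.
        return cls(id=id, content=content, metadata=dict(metadata or {}))

    def to_dict(self) -> Dict[str, Any]:
        return {"id": self.id, "content": self.content,
                "metadata": dict(self.metadata)}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Memory":
        # Legacy dicts without a 'metadata' key restore as empty.
        return cls(id=data["id"], content=data["content"],
                   metadata=dict(data.get("metadata") or {}))


caller_meta = {"source_date": "2024-01-01"}
m = Memory.create_new("m1", "hello", caller_meta)
caller_meta["source_date"] = "changed"  # must not affect m.metadata
legacy = Memory.from_dict({"id": "m2", "content": "old"})
```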
… null metadata
from_dict previously did `dict(data.get("metadata", {}))`. When the key is
present with an explicit null value, data.get returns None (not {}), so
dict(None) raises TypeError and crashes the migrator, which will
legitimately receive OG JSON records containing "metadata": null.
Fix: `dict(data.get("metadata") or {})` absorbs both the absent-key case
and the present-but-null case. Regression test guards the null path.
197 tests green, ruff clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
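A minimal reproduction of the null-metadata hazard, using the two expressions quoted in the commit message. The wrapper function names are hypothetical and exist only to make the comparison runnable.

```python
def load_metadata_old(data: dict) -> dict:
    # The default only applies when the key is absent, not when it is null.
    return dict(data.get("metadata", {}))


def load_metadata_fixed(data: dict) -> dict:
    # `or {}` absorbs both the absent-key case and the explicit-null case.
    return dict(data.get("metadata") or {})


record = {"id": "m1", "metadata": None}  # shape of an OG record with null metadata

try:
    load_metadata_old(record)
    old_raised = False
except TypeError:
    old_raised = True

fixed_null = load_metadata_fixed(record)
fixed_absent = load_metadata_fixed({"id": "m2"})
```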
…olumn
Completes the Week 3.5 amendment started in 4fc7fc2. MemoryStore now round-trips metadata through a metadata_json TEXT column with DEFAULT '{}'. create/update/_row_to_memory all handle the field; _row_to_memory guards empty/None column values for manually-modified DB compatibility.
Existing Memory/MemoryStore tests continue to pass: metadata is an additive, zero-default field.
4 new tests; 201 total.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three hardenings before the migrator starts writing via store.create():
1. _safe_load_metadata helper — defends _row_to_memory against the
"null" string case (json.loads('null')→None would assign
metadata=None to Memory and silently poison every consumer doing
metadata.get(...)). Also catches JSONDecodeError for malformed
column contents and enforces dict-at-top-level.
2. create() catches the TypeError from json.dumps(memory.metadata) and
re-raises it with the memory id in the message. Without this, a
single bad record out of 1,141 ETL'd memories would surface a bare
stdlib error with no way to find which record. With this, the
failure points at the culprit directly.
3. update(metadata={}) semantic pinned via regression test — empty
dict is an explicit overwrite, consistent with update() behaviour
for every other field. Callers that want no change omit the kwarg.
Also adds the nested-metadata round-trip test the reviewer asked for
(dict-of-dict, list, None values, ints, floats all survive).
5 new regression tests; 206 total, ruff clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
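The first two hardenings could look roughly like the sketch below. The function names, the fallback to an empty dict, and the error wording are assumptions for illustration; only the described behaviour (tolerating "null", malformed JSON, and non-dict values, and naming the memory id on serialisation failure) comes from the commit message.

```python
import json
from typing import Any, Dict, Optional


def safe_load_metadata(raw: Optional[str]) -> Dict[str, Any]:
    """Decode a metadata_json column value, assumed to fall back to {}."""
    if not raw:
        return {}  # empty string or NULL column
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        return {}  # malformed column contents
    # json.loads('null') yields None and '[1]' yields a list: reject both.
    return value if isinstance(value, dict) else {}


def dump_metadata(memory_id: str, metadata: Dict[str, Any]) -> str:
    """Serialise metadata, pointing at the offending record on failure."""
    try:
        return json.dumps(metadata)
    except TypeError as exc:
        raise TypeError(
            f"memory {memory_id!r}: metadata is not JSON-serialisable: {exc}"
        ) from exc


try:
    dump_metadata("m42", {"bad": {1, 2}})  # sets are not JSON-serialisable
    error_message = ""
except TypeError as exc:
    error_message = str(exc)
```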
…ck preflight
OGReader provides read-only access to a NellBrain data/ directory. read_memories() parses memories_v2.json; read_hebbian() returns (ids, matrix); iter_nonzero_upper_edges() yields canonical undirected edges (i<j) with positive weight.
Every file consumed is recorded in a FileManifest (relative_path, size_bytes, sha256, mtime_utc): a cryptographic audit trail backing the claim that the migrator is read-only against OG.
check_preflight() refuses to proceed if memories_v2.json.lock has been modified within the last 5 minutes (live bridge detection).
9 tests green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
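A sketch of the manifest entry and the lock-file preflight, under stated assumptions: the FileManifest field names, the lock file name, the 5-minute window, and the LiveLockDetected exception name come from this PR, while `record_manifest`, the `max_age_seconds` parameter, and the function bodies are hypothetical.

```python
import hashlib
import tempfile
import time
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass(frozen=True)
class FileManifest:
    relative_path: str
    size_bytes: int
    sha256: str
    mtime_utc: str


class LiveLockDetected(RuntimeError):
    """Raised when the OG lock file was touched too recently."""


def record_manifest(root: Path, path: Path, data: bytes) -> FileManifest:
    # Hash the exact bytes that were consumed, not a fresh re-read.
    mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
    return FileManifest(
        relative_path=path.relative_to(root).as_posix(),
        size_bytes=len(data),
        sha256=hashlib.sha256(data).hexdigest(),
        mtime_utc=mtime.isoformat(),
    )


def check_preflight(root: Path, max_age_seconds: float = 300.0) -> None:
    lock = root / "memories_v2.json.lock"
    if lock.exists() and time.time() - lock.stat().st_mtime < max_age_seconds:
        raise LiveLockDetected(f"{lock.name} was modified recently")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src = root / "memories_v2.json"
    src.write_bytes(b"[]")
    entry = record_manifest(root, src, src.read_bytes())
    check_preflight(root)  # no lock file yet, so this passes
    (root / "memories_v2.json.lock").write_text("")
    try:
        check_preflight(root)
        lock_detected = False
    except LiveLockDetected:
        lock_detected = True
```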
Nitpick from Task 3 review: the .npy path reads bytes twice (once for np.load, once for the SHA manifest entry) because np.load doesn't expose its buffer. Inline comment explains why the pattern differs from the JSON path and notes that check_preflight + post-run re-stat close the TOCTOU window.
No behavioural change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Permissive transformer: returns (Memory, None) on success or (None, SkippedMemory) on malformed input. Never raises.
Skip reasons: missing_id, missing_content, non_numeric_emotion, unparseable_created_at.
Field mapping:
- id/content/memory_type/domain/tags/importance/active verbatim with sensible defaults.
- last_accessed → last_accessed_at.
- emotion_score → score (prefers OG's stored value over recomputing sum).
- created_at coerced via the Week 3 _coerce_utc helper.
- OG-only fields (source_date, supersedes, emotional_tone, etc.) plus any unknown forward-drift keys absorbed into Memory.metadata verbatim.
Guards bool values in isinstance(v, (int, float)) checks: because bool is a subclass of int in Python, True/False would otherwise pass as emotion intensities.
14 tests green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
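The "never raises" contract and the bool guard can be illustrated with a reduced sketch. It returns a plain dict instead of a Memory, omits created_at handling, and assumes the OG record keeps intensities under an `emotions` key; the SkippedMemory fields and the `is_number` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass
class SkippedMemory:
    og_id: Optional[str]
    reason: str


def is_number(value: Any) -> bool:
    # bool is a subclass of int, so it has to be excluded explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def transform_memory(
    og: Dict[str, Any],
) -> Tuple[Optional[Dict[str, Any]], Optional[SkippedMemory]]:
    mem_id = og.get("id")
    if not isinstance(mem_id, str) or not mem_id:
        return None, SkippedMemory(None, "missing_id")
    content = og.get("content")
    if not isinstance(content, str) or not content:
        return None, SkippedMemory(mem_id, "missing_content")
    emotions = og.get("emotions") or {}
    if not isinstance(emotions, dict) or not all(
        is_number(v) for v in emotions.values()
    ):
        return None, SkippedMemory(mem_id, "non_numeric_emotion")
    # Prefer OG's stored score; fall back to the sum when it is unusable.
    stored = og.get("emotion_score")
    score = float(stored) if is_number(stored) else float(sum(emotions.values()))
    return {"id": mem_id, "content": content, "score": score}, None


ok, skipped = transform_memory(
    {"id": "a", "content": "x", "emotions": {"joy": 2, "fear": 1},
     "emotion_score": True}
)
```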
Two Important defects flagged by the quality reviewer. Each one either
violated the 'never raises' contract or silently corrupted data:
1. float(og.get("importance") or 0.0) raised ValueError on truthy
non-numeric values ("high", "unknown"). Replaced with isinstance
guard matching the emotion_score pattern already in the module.
2. list(og.get("tags") or []) character-exploded string-valued tags
('mytag' → ['m','y','t','a','g']) and pulled dict keys from dict
values. Replaced with isinstance(..., list) check that degrades
non-list tags to [] — same soft-skip discipline as last_accessed.
5 regression tests pinning the behaviour:
- non-numeric importance string → 0.0
- list-valued importance → 0.0
- string-valued tags → []
- dict-valued tags → []
- bool emotion_score → falls back to sum(emotions.values())
234 tests green, ruff clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
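The two guards might be written along these lines. The helper names are hypothetical, and rejecting bool importance is an assumption carried over from the emotion_score pattern the commit refers to.

```python
from typing import Any, List


def coerce_importance(value: Any) -> float:
    # float("high") would raise ValueError, so check the type first.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return float(value)
    return 0.0


def coerce_tags(value: Any) -> List[Any]:
    # list("mytag") would explode into characters and list({...}) would
    # yield dict keys, so anything that is not already a list becomes [].
    return list(value) if isinstance(value, list) else []


exploded = list("mytag")  # what the old expression produced for a string
```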
…fest writer
MigrationReport aggregates a single run's outcome. format_report() produces human-readable text (totals, skip-reason counts, source manifest, next-steps). write_source_manifest() serialises the FileManifest list to JSON with a generation timestamp.
Skip reasons are sorted by descending count for quick eyeballing of the dominant failure mode when migrating real OG data.
5 tests green; 239 total.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ep numbering
format_report previously hardcoded "1. Inspect the output:" + "2. When satisfied, install as a persona:". The --install-as mode passes an empty inspect_cmds list (already installed), so the output would read "2. When satisfied..." with no step 1, which looks broken.
Renumber dynamically: track a `step` counter that increments only when a section is actually emitted. install-only mode now reads "1. When satisfied..." coherently.
Two regression tests:
- renumbers correctly when only install_cmd is set
- empty report produces coherent minimal output (no crash, no dangling "Next steps" section)
241 tests green, ruff clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
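The dynamic renumbering can be sketched as below. Only the two step labels and the `step` counter idea come from the commit message; the function name, its signature, and the placeholder command strings are assumptions.

```python
from typing import List, Optional


def format_next_steps(inspect_cmds: List[str],
                      install_cmd: Optional[str]) -> List[str]:
    lines: List[str] = []
    step = 0  # incremented only when a section is actually emitted
    if inspect_cmds:
        step += 1
        lines.append(f"{step}. Inspect the output:")
        lines.extend(f"   {cmd}" for cmd in inspect_cmds)
    if install_cmd:
        step += 1
        lines.append(f"{step}. When satisfied, install as a persona:")
        lines.append(f"   {install_cmd}")
    if lines:
        # Emit the heading only when there is at least one step under it.
        lines.insert(0, "Next steps")
    return lines


install_only = format_next_steps([], "<install command>")
both = format_next_steps(["<inspect command>"], "<install command>")
empty = format_next_steps([], None)
```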
…afety
MigrateArgs validates input (exactly one of --output / --install-as via __post_init__). run_migrate() orchestrates preflight → read OG → transform memories → populate MemoryStore + HebbianMatrix → post-run source re-stat → write report + manifest → optional atomic install-as-persona with timestamped backup.
Safety invariants enforced:
- LiveLockDetected from preflight aborts the run.
- Non-empty output dir or existing persona → FileExistsError unless --force.
- Install mode: write to <persona>.new (sibling of final dir), atomic os.rename swap. Pre-existing persona renamed to <persona>.backup-<YYYY-MM-DDTHHMMSS> first.
- Source file size re-checked after all reads; mismatch raises RuntimeError.
- Duplicate ids within OG are skipped with reason='duplicate_id'.
Wires into brain/cli.py: migrate moves out of _STUB_COMMANDS, and the real subparser registers with its own args and dispatch. Also removes the stale migrate entry from the stub-command parametrize list in test_cli.py (it no longer prints "not implemented").
9 tests added; 249 total (net: +9 new, -1 stale stub case).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
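A rough sketch of two of these invariants: the exactly-one-of argument check and the install swap with a timestamped backup. The MigrateArgs field names, the `install_persona` helper, and the use of UTC for the timestamp are assumptions; the `.new` / `.backup-<timestamp>` naming and the os.rename swap follow the description above.

```python
import os
import tempfile
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


@dataclass
class MigrateArgs:
    output: Optional[Path] = None
    install_as: Optional[str] = None
    force: bool = False

    def __post_init__(self) -> None:
        # Both set or both unset is invalid: exactly one mode is required.
        if (self.output is None) == (self.install_as is None):
            raise ValueError("exactly one of --output / --install-as is required")


def install_persona(staged: Path, final: Path, force: bool = False) -> Optional[Path]:
    """Swap the staged directory into place; return the backup path, if any."""
    backup = None
    if final.exists():
        if not force:
            raise FileExistsError(f"persona already exists: {final}")
        stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H%M%S")
        backup = final.with_name(f"{final.name}.backup-{stamp}")
        os.rename(final, backup)  # move the old persona aside first
    os.rename(staged, final)      # then rename the sibling .new dir into place
    return backup


with tempfile.TemporaryDirectory() as tmp:
    final = Path(tmp) / "nell"
    staged = Path(tmp) / "nell.new"
    for directory, payload in ((final, "old"), (staged, "new")):
        directory.mkdir()
        (directory / "memories.db").write_text(payload)
    try:
        install_persona(staged, final)
        refused = False
    except FileExistsError:
        refused = True
    backup = install_persona(staged, final, force=True)
    installed = (final / "memories.db").read_text()
    backed_up = (backup / "memories.db").read_text()
    staged_gone = not staged.exists()

try:
    MigrateArgs()
    neither_rejected = False
except ValueError:
    neither_rejected = True
```

Staging in a sibling directory keeps both renames on the same filesystem, which is what lets the final swap be a single rename rather than a copy.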
- run_migrate wraps both store.create() and hebbian.strengthen() loops in try/finally so a mid-loop exception on disk-backed DBs closes the connection cleanly and doesn't leak .db-wal files the next run would have to recover from. Not a correctness issue for in-memory tests but matters for the real 1,141-record ETL against a persistent output dir.
- SkippedMemory docstring documents the external duplicate_id reason code set by run_migrate (only the four transform_memory reasons originate from transform.py itself).
249 tests green, ruff clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
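The try/finally pattern from the first bullet, shown against a bare sqlite3 connection rather than the real MemoryStore. The table layout and function names are hypothetical, and a duplicate primary key is used here only as a convenient way to trigger a mid-loop failure.

```python
import sqlite3
from typing import Iterable, Tuple


def write_memories(conn: sqlite3.Connection,
                   rows: Iterable[Tuple[str, str]]) -> None:
    try:
        for mem_id, content in rows:
            conn.execute(
                "INSERT INTO memories (id, content) VALUES (?, ?)",
                (mem_id, content),
            )
        conn.commit()
    finally:
        # Runs on success and on a mid-loop exception alike.
        conn.close()


def is_closed(conn: sqlite3.Connection) -> bool:
    try:
        conn.execute("SELECT 1")
    except sqlite3.ProgrammingError:  # raised when operating on a closed connection
        return True
    return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memories (id TEXT PRIMARY KEY, content TEXT)")
try:
    write_memories(conn, [("m1", "a"), ("m1", "b")])  # second row fails
    failed = False
except sqlite3.IntegrityError:
    failed = True
closed_after_failure = is_closed(conn)
```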
Builds a tmp_path OG-shaped fixture (5 memories incl. 1 malformed, 5x5 hebbian matrix with 3 non-zero edges), runs the migrator in --output mode, opens the resulting memories.db + hebbian.db via raw sqlite3, asserts counts + record shapes + metadata round-trip + source manifest coverage + OG read-only invariant.
Four assertions:
- memories + edges match expected counts; malformed m4 skipped
- m1.metadata contains source_date, source_summary, supersedes
- source-manifest.json covers all four OG files with valid sha256
- OG dir is byte-identical before and after the run
4 tests green; 253 total.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
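One way to express the byte-identity assertion is to snapshot the source directory as a mapping of relative path to SHA-256 before and after the run. The `snapshot` helper below is a hypothetical sketch, and a plain read stands in for the migrator run.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict


def snapshot(root: Path) -> Dict[str, str]:
    # Map every file under root to the SHA-256 of its bytes.
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


with tempfile.TemporaryDirectory() as tmp:
    og = Path(tmp)
    (og / "memories_v2.json").write_text("[]")
    before = snapshot(og)
    _ = (og / "memories_v2.json").read_bytes()  # stand-in for a read-only run
    unchanged = snapshot(og) == before
    (og / "memories_v2.json").write_text("[1]")  # simulate an accidental write
    changed = snapshot(og) != before
```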
- Manifest test now RE-HASHES each source file and compares to the sha256 in source-manifest.json. The length-only check would have missed a drift between the bytes np.load consumed and the bytes _record_manifest later hashed. Cheap check, much stronger guarantee.
- New test pins the content of migration-report.md: it asserts the file carries the expected totals and at least one skip reason. Prevents a format_report refactor from silently regressing the user-facing artefact to blank/malformed output.
5 integration tests green; 254 total.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
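The re-hash check from the first bullet could be sketched as follows. The manifest JSON layout (a top-level "files" list with `relative_path` and `sha256` keys) is an assumption made for this example, as is the `verify_manifest` helper; the actual file written by write_source_manifest may be shaped differently.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import List


def verify_manifest(og_root: Path, manifest_path: Path) -> List[str]:
    """Return the relative paths whose current bytes no longer match."""
    manifest = json.loads(manifest_path.read_text())
    mismatches = []
    for entry in manifest["files"]:  # assumed layout
        data = (og_root / entry["relative_path"]).read_bytes()
        if hashlib.sha256(data).hexdigest() != entry["sha256"]:
            mismatches.append(entry["relative_path"])
    return mismatches


with tempfile.TemporaryDirectory() as tmp:
    og = Path(tmp) / "og"
    og.mkdir()
    (og / "memories_v2.json").write_bytes(b"[]")
    manifest_path = Path(tmp) / "source-manifest.json"
    manifest_path.write_text(json.dumps({
        "files": [{
            "relative_path": "memories_v2.json",
            "sha256": hashlib.sha256(b"[]").hexdigest(),
        }]
    }))
    clean = verify_manifest(og, manifest_path)
    (og / "memories_v2.json").write_bytes(b"[1]")  # simulate drift
    drifted = verify_manifest(og, manifest_path)
```

A length-only check on the sha256 string would pass in both cases, which is why comparing against a fresh hash gives the stronger guarantee.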
hanamorix
pushed a commit
that referenced
this pull request
Apr 24, 2026
… msg
- HeartbeatEngine.reflex_arcs_path and reflex_log_path now default to
None instead of bare cwd-relative Path("reflex_arcs.json"). When
either is None, _try_fire_reflex short-circuits with an empty
result — no more silent writes to cwd from tests that don't
explicitly configure reflex (items #2 and #4 from spec §15)
- CLI heartbeat handler now distinguishes first-tick + --dry-run
('Would initialize on first real tick — work deferred.') from
first-tick + live ('Heartbeat initialized — work deferred until
next tick.')
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hanamorix
pushed a commit
that referenced
this pull request
Apr 30, 2026
SP-7 wraps SP-6's chat engine in a per-persona FastAPI daemon on localhost (dynamic port). Folds the conversation supervisor as a non-daemon thread for close_stale_sessions ticks, broadcasts brain events over WebSocket for Tauri/CLI subscribers, and ships dirty- shutdown recovery via a shutdown_clean flag in bridge.json. Resolves master-ref §8 open question #1 (transport: HTTP+WS, mirroring OG). Defers #2-#4 to SP-8, scopes #5-#6 to SP-6, marks #7-#8 unrelated. Folds in three audit must-fixes: shutdown_clean recovery, EventBus thread-safety with drop-on-overflow, pinned close_stale_sessions params. Six implementation chunks with smoke-test gates at each boundary; ~26 tests targeted (10 unit + 16 integration). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hanamorix
added a commit
that referenced
this pull request
May 9, 2026
feat: Week 3.5 — OG memory migrator (memories + Hebbian, read-only source)
hanamorix
pushed a commit
that referenced
this pull request
May 13, 2026
…→ engine Bundle A item #4 — closes the v0.0.9 review TODO that ChatPanel's renderer-side POST /initiate/state replied_explicit was the only path to the state transition. The chat engine never saw the link. Now: ChatPanel passes reply_to_audit_id in the streamChat payload when active. Bridge ingests it, transitions the audit + memory server-side (atomic with the chat turn), and surfaces the linked subject to build_system_message so Nell sees "you're replying to your earlier outbound about X" in her context. Foundation for v0.0.10's acknowledged_unclear writer: the server now has the wire to distinguish "user replied to ia_001 specifically" from "user happened to mention something nearby." Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hanamorix
pushed a commit
that referenced
this pull request
Jun 4, 2026
…wire commit-failure backoff to dead-letter (A1 review)
Finding #3: EmbeddingCache.evict(content) removes a candidate's vector when its commit fails, preventing cosine-1.0 self-dedup on the next retry pass. Called in both close_session and extract_session_snapshot commit loops after commit_failures += 1. Test seeds cache with a prior entry to force the get_or_compute path, then asserts the item is committed (not deduped away) on pass 2.
Finding #1/#2: extract_session_snapshot now bumps the backoff sidecar on commit_failures > 0 (mirroring extraction-failure backoff), so repeated commit-failing passes climb naturally to _BACKOFF_FAILURE_THRESHOLD. test_finalize_deadletters_after_max_retry rewritten: drives N real snapshot passes (store.create always raises), sidecar climbs without pre-seeding, then finalize dead-letters the buffer. No tautological pre-seed.
Finding #4: one-line accepted-narrow-window comment at the finalize dead-letter branch. No new logic.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
hanamorix
pushed a commit
that referenced
this pull request
Jun 13, 2026
…ires, roadmap Tier 2 #4 shipped Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
hanamorix
pushed a commit
that referenced
this pull request
Jun 13, 2026
New organ brain/self_model/ — a declared (aggregate_state) vs derived (orthogonal: recency-mean + body + decay) emotional read. A persisted wall-clock reflection cadence surfaces the gap in a hedged ambient block, articulates it with a budgeted Haiku, and lets her revise by her own choice via the reconcile_self_read tool. A sustained-then-resolved gap becomes growth (soul candidate + feed) via two resolution paths. The derived read is the only new computation; it never writes felt state. Every write is self-authored or routed through an existing guarded pipeline (vocab filter, soul candidates, crystalliser). All 9 spec risks pinned to named tests (orthogonality, vocab-flood guard, dead-loop). EXPERIMENTAL; DoD met (producer on live supervisor path + readers + test). Gate: 3275 backend (incl live integration) + 42 frontend + pnpm build + ruff. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Summary
- `brain/migrator/` package (og.py + transform.py + report.py + cli.py) with `nell migrate` subcommand
- `Memory.metadata` field + `metadata_json` SQLite column
What landed per task
4fc7fc2, 56560a6, 7a0c97f, 86ec98e, 53def45, ca07170, 5d75246, 0f08ec1, c901062, d673c6d, 9467654, f2d3a11, 18df612, 823ef3e
Safety summary
- … `"rb"`/`"r"`. Source manifest with SHA-256 records every file read. Integration test snapshots OG dir before + after and asserts byte-identity.
- … `memories_v2.json.lock` mtime < 5 min.
- … `--force`.
- … `<persona>.new/`, timestamp-backs-up any existing `<persona>/`, then `os.rename`s the new dir into place.
Review-driven hardenings worth calling out
- `Memory.from_dict(metadata=null)` no longer crashes (a real OG JSON hazard).
- `_safe_load_metadata` defends `_row_to_memory` against `"null"` string rows, malformed JSON, and non-dict top-level values.
- `MemoryStore.create` re-raises `json.dumps` failures as a TypeError that names the memory id (useful for 1,141-record ETL debugging).
- `transform_memory` guards `float()` against non-numeric importance (which would raise) and `list()` against string-valued tags (which would silently corrupt data).
- `run_migrate` wraps store/hebbian operations in try/finally so a mid-loop exception doesn't leak `.db-wal` files.
- `format_report` renumbers next-steps dynamically (install-only mode no longer shows "step 2 without step 1").
Test plan
- `uv sync --all-extras` succeeds
- `uv run nell migrate --help` prints real subcommand usage
- … `--output` mode and inspects
- … `--install-as nell` once satisfied