Brain health follow-ups: vocabulary reconstruction wiring + soul plan + growth anomaly collector (#20)

hanamorix · Hana · web-flow · commit d5619e11377a · 2026-04-26T14:15:33.000+01:00
* feat(health): wire reconstruct_vocabulary_from_memories into vocabulary heal flow (F1)

When emotion_vocabulary.json corrupts and no .bak is recoverable, the heal flow used to reset to empty `{"version":1,"emotions":[]}`, losing all persona-extension emotions the brain has been operating with.

Now: if the loader has store access (caller passed `store=...`), the reset_to_default path is replaced with reconstruct_vocabulary_from_memories. The brain re-learns its own vocabulary from how it has been using emotions. The reconstructed file holds the framework baseline (21 entries) plus the persona-extension entries detected in memories.db (with a `(reconstructed from memory)` placeholder description and a conservative 1.0-day decay).

The anomaly's action field reflects the actual outcome: when reconstruction fires, action becomes `reconstructed_from_memories` (not `reset_to_default`). The forensic quarantine of the original corrupt file is preserved.

Falls back to bare reset when no store is provided (some callers don't have one; that's fine, they'll get the empty default).

Closes followup #1 from brain-health-module-design.md §9.

* docs(health): concrete soul module health plan (F2)

Spec §9.1 expanded from a one-line deferral into a concrete plan the next engineer can implement directly when the soul module lands. Covers:

- file classification (atomic-rewrite identity, same tier as emotion_vocabulary.json)
- reconstruct_soul_from_memories(store) following F37's self-claims-from-experience pattern
- schema validator shape
- acceptance criteria for the soul-module PR

Inline comments in walker.py (_DEFAULTS) and alarm.py (_IDENTITY_FILES) point at spec §9.1 so the plan is visible during code-reading too.

Closes followup #2 from brain-health-module-design.md §9.
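
A minimal sketch of the schema validator shape that F2 refers to, assuming the spec's placeholder schema `{"version": 1, "crystallizations": []}` is the one the soul module settles on. The names `_soul_schema_validator` and `is_valid_soul` are hypothetical, and raising `ValueError` on a mismatch is an assumption; the only thing taken from this change is the `(data: object) -> None` validator signature and the minimal type check quoted in spec §9.1.

```python
def _soul_schema_validator(data: object) -> None:
    """Hypothetical validator: the minimal type check from spec §9.1.

    Raising on mismatch is an assumption about how the heal flow
    notices corrupt-but-parseable files.
    """
    ok = isinstance(data, dict) and isinstance(data.get("crystallizations"), list)
    if not ok:
        raise ValueError("soul.json: expected a dict with a 'crystallizations' list")


def is_valid_soul(data: object) -> bool:
    """Demonstration-only wrapper that turns the validator into a predicate."""
    try:
        _soul_schema_validator(data)
    except ValueError:
        return False
    return True
```

The check is deliberately shallow: it only needs to distinguish "parseable but wrong shape" from "usable", so that the former is routed into the heal path rather than loaded as-is.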
* feat(health): thread anomaly collector through run_growth_tick (F3)

When run_growth_tick reads a corrupt emotion_vocabulary.json via _read_current_vocabulary_names, the anomaly produced by the heal flow is now appended to an optional caller-provided collector instead of being silently dropped after a local warning.

Wiring:

- _read_current_vocabulary_names returns (set[str], BrainAnomaly | None)
- run_growth_tick accepts anomalies_collector: list[BrainAnomaly] | None
- HeartbeatEngine._try_run_growth forwards tick_anomalies as the collector when calling run_growth_tick

After this lands, vocabulary corruption discovered inside the weekly growth tick surfaces in the heartbeat audit log + HeartbeatResult.anomalies + compact CLI 🩹/banner exactly like config/state corruption discovered at the top of the tick. No more silent loss.

Calling run_growth_tick standalone (e.g., from tests, or in the future from a scheduled job runner) without a collector still works; the parameter is opt-in.

Closes followup #3 from brain-health-module-design.md §9.

---------

Co-authored-by: Hana <hana@nanoclaw.local>
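
The opt-in collector pattern from F3 can be sketched in isolation. `Anomaly`, `read_names`, and `growth_tick` below are simplified stand-ins invented for illustration; they are not the repo's `BrainAnomaly`, `_read_current_vocabulary_names`, or `run_growth_tick`, and passing `raw=None` merely simulates a corrupt file that was healed by reset.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Anomaly:
    """Stand-in for BrainAnomaly with only the fields this sketch needs."""

    file: str
    action: str


def read_names(raw: dict | None) -> tuple[set[str], Anomaly | None]:
    """Return (names, anomaly); raw=None simulates a corrupt file healed by reset."""
    if raw is None:
        return set(), Anomaly(file="emotion_vocabulary.json", action="reset_to_default")
    names = {
        e["name"]
        for e in raw.get("emotions", [])
        if isinstance(e, dict) and "name" in e
    }
    return names, None


def growth_tick(
    raw: dict | None,
    anomalies_collector: list[Anomaly] | None = None,
) -> set[str]:
    """Append any read anomaly to the caller's list, if one was provided."""
    names, anomaly = read_names(raw)
    if anomaly is not None and anomalies_collector is not None:
        anomalies_collector.append(anomaly)
    return names


# Heartbeat-style caller: passes its per-tick list and sees the anomaly.
tick_anomalies: list[Anomaly] = []
growth_tick(None, anomalies_collector=tick_anomalies)

# Standalone caller: omits the collector; the call still succeeds.
growth_tick(None)
```

Returning the anomaly alongside the data, and letting the outermost caller own the list, keeps the reader function free of any dependency on the heartbeat engine.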
diff --git a/brain/emotion/persona_loader.py b/brain/emotion/persona_loader.py
@@ -58,6 +58,37 @@ def load_persona_vocabulary_with_anomaly(
         path, _default_vocab_factory, schema_validator=_vocab_schema_validator
     )
 
+    # Reconstruct from memories when reset_to_default fires on vocabulary.
+    # The default factory writes empty `{"version":1,"emotions":[]}` — that's
+    # a truthful empty default but it loses the persona-extension entries
+    # the brain has been operating with. If we have memory access, the brain
+    # can re-learn its own vocabulary from how it has been using emotions.
+    if anomaly is not None and anomaly.action == "reset_to_default" and store is not None:
+        from brain.health.attempt_heal import save_with_backup
+        from brain.health.reconstruct import reconstruct_vocabulary_from_memories
+
+        recon_data = reconstruct_vocabulary_from_memories(store)
+        save_with_backup(path, recon_data)
+        data = recon_data
+        # Replace the anomaly with one whose action reflects the reconstruction.
+        # Same kind (json_parse_error / schema_mismatch — that's why we needed
+        # to reconstruct) and same forensic quarantine path; the heal path
+        # advanced beyond reset.
+        from brain.health.anomaly import BrainAnomaly
+
+        anomaly = BrainAnomaly(
+            timestamp=anomaly.timestamp,
+            file=anomaly.file,
+            kind=anomaly.kind,
+            action="reconstructed_from_memories",
+            quarantine_path=anomaly.quarantine_path,
+            likely_cause=anomaly.likely_cause,
+            detail=(
+                f"{anomaly.detail}; reconstructed "
+                f"{len(recon_data['emotions'])} entries from memories"
+            ),
+        )
+
     if anomaly is not None:
         logger.warning(
             "emotion_vocabulary anomaly detected: %s action=%s file=%s",
diff --git a/brain/engines/heartbeat.py b/brain/engines/heartbeat.py
@@ -502,7 +502,13 @@ def run_tick(self, *, trigger: str = "manual", dry_run: bool = False) -> Heartbe
         # Growth tick — autonomous self-development (Phase 2a). Runs after
         # all per-tick engines so it can observe the freshest state, before
         # the audit log writes so the audit can summarize the growth outcome.
-        growth_emotions_added, growth_ran = self._try_run_growth(state, now, config, dry_run)
+        # Passing `tick_anomalies` as the collector lets growth-tick-internal
+        # anomalies (e.g., vocab corruption discovered while reading current
+        # vocabulary names) surface in the audit log alongside the engine's
+        # own load anomalies.
+        growth_emotions_added, growth_ran = self._try_run_growth(
+            state, now, config, dry_run, anomalies_collector=tick_anomalies
+        )
 
         # Optional HEARTBEAT: memory
         heartbeat_memory_id: str | None = None
@@ -737,11 +743,17 @@ def _try_run_growth(
         now: datetime,
         config: HeartbeatConfig,
         dry_run: bool,
+        anomalies_collector: list[BrainAnomaly] | None = None,
     ) -> tuple[int, bool]:
         """Run a growth tick if due. Returns (emotions_added, ran).
 
         Fault-isolated: any exception logs a warning and returns (0, False).
         Heartbeat tick continues normally — same pattern as reflex/research.
+
+        `anomalies_collector` is forwarded to `run_growth_tick` so any
+        anomaly produced inside growth (e.g., vocabulary file corruption
+        detected by `_read_current_vocabulary_names`) surfaces in the
+        heartbeat tick's audit log alongside engine-level anomalies.
         """
         if not config.growth_enabled:
             return (0, False)
@@ -758,7 +770,13 @@ def _try_run_growth(
         try:
             from brain.growth.scheduler import run_growth_tick
 
-            result = run_growth_tick(persona_dir, self.store, now, dry_run=dry_run)
+            result = run_growth_tick(
+                persona_dir,
+                self.store,
+                now,
+                dry_run=dry_run,
+                anomalies_collector=anomalies_collector,
+            )
         except Exception as exc:
             logger.warning("growth tick raised; isolating: %.200s", exc)
             return (0, False)
diff --git a/brain/growth/scheduler.py b/brain/growth/scheduler.py
@@ -16,12 +16,16 @@
 from dataclasses import dataclass
 from datetime import datetime
 from pathlib import Path
+from typing import TYPE_CHECKING
 
 from brain.growth.crystallizers.vocabulary import crystallize_vocabulary
 from brain.growth.log import GrowthLogEvent, append_growth_event
 from brain.growth.proposal import EmotionProposal
 from brain.memory.store import MemoryStore
 
+if TYPE_CHECKING:
+    from brain.health.anomaly import BrainAnomaly
+
 logger = logging.getLogger(__name__)
 
 # Same character allowlist as brain.paths.get_persona_dir — names that
@@ -45,6 +49,7 @@ def run_growth_tick(
     now: datetime,
     *,
     dry_run: bool = False,
+    anomalies_collector: list[BrainAnomaly] | None = None,
 ) -> GrowthTickResult:
     """Run all crystallizers, apply their proposals atomically.
 
@@ -58,11 +63,19 @@ def run_growth_tick(
 
     `dry_run=True` calls the crystallizer but skips both writes; the
     returned `emotions_added` reflects "would-have-added" semantics.
+
+    `anomalies_collector` (optional): when the heartbeat tick passes its
+    per-tick anomaly list, any anomaly produced by reading the vocabulary
+    file (corruption, schema mismatch) gets appended so it surfaces in the
+    audit log + compact CLI alongside heartbeat-engine anomalies. Pass None
+    when calling `run_growth_tick` standalone (e.g., from tests).
     """
     vocab_path = persona_dir / "emotion_vocabulary.json"
     log_path = persona_dir / "emotion_growth.log.jsonl"
 
-    current_names = _read_current_vocabulary_names(vocab_path)
+    current_names, vocab_anomaly = _read_current_vocabulary_names(vocab_path)
+    if vocab_anomaly is not None and anomalies_collector is not None:
+        anomalies_collector.append(vocab_anomaly)
 
     proposals = crystallize_vocabulary(store, current_vocabulary_names=current_names)
 
@@ -108,18 +121,21 @@ def run_growth_tick(
     )
 
 
-def _read_current_vocabulary_names(vocab_path: Path) -> set[str]:
-    """Return the set of emotion names currently in the persona's vocabulary file.
+def _read_current_vocabulary_names(
+    vocab_path: Path,
+) -> tuple[set[str], BrainAnomaly | None]:
+    """Return (set of emotion names, optional anomaly) for the vocabulary file.
 
     Distinguishes three load outcomes:
-      - Missing file → return empty set silently (fresh persona; expected).
+      - Missing file → (empty set, None) silently (fresh persona; expected).
       - Corrupt JSON or wrong schema → quarantine + heal from .bak or reset to
-        default; return names from the recovered data. Logs a WARNING so the
-        anomaly is visible.
-      - Well-formed → return the set of names.
+        default; returns (names_from_recovered_data, BrainAnomaly). The caller
+        (run_growth_tick) feeds the anomaly into its `anomalies_collector` so
+        it surfaces in the heartbeat audit log. Logs WARNING locally too.
+      - Well-formed → (set of names, None).
     """
     if not vocab_path.exists():
-        return set()
+        return set(), None
 
     from brain.health.attempt_heal import attempt_heal
 
@@ -142,7 +158,8 @@ def _schema_validator(data: object) -> None:
             anomaly.action,
         )
 
-    return {e["name"] for e in data.get("emotions", []) if isinstance(e, dict) and "name" in e}
+    names = {e["name"] for e in data.get("emotions", []) if isinstance(e, dict) and "name" in e}
+    return names, anomaly
 
 
 def _is_valid_name(name: str) -> bool:
diff --git a/brain/health/alarm.py b/brain/health/alarm.py
@@ -14,7 +14,8 @@
         "emotion_vocabulary.json",
         "interests.json",
         "reflex_arcs.json",
-        # future: "soul.json"
+        # When the soul module lands, add "soul.json" here so its
+        # reset_to_default raises an alarm. See spec §9.1.
     }
 )
 
diff --git a/brain/health/walker.py b/brain/health/walker.py
@@ -10,6 +10,10 @@
 from brain.health.attempt_heal import attempt_heal
 
 # Atomic-rewrite files this walker checks. Each entry: filename -> default dict.
+#
+# When the soul module lands as a Phase 2a-extension, add `soul.json` here
+# with default `{"version": 1, "crystallizations": []}` (or whatever the
+# soul module's schema settles on). See spec §9.1 for the full plan.
 _DEFAULTS: dict[str, dict] = {
     "user_preferences.json": {"dream_every_hours": 24.0},
     "persona_config.json": {"provider": "claude-cli", "searcher": "ddgs"},
diff --git a/docs/superpowers/specs/2026-04-25-brain-health-module-design.md b/docs/superpowers/specs/2026-04-25-brain-health-module-design.md
@@ -347,7 +347,27 @@ The full health module ships when:
 
 ## 9. Open / Deferred
 
-- **Soul reconstruction.** When the soul module lands as a Phase 2a-extension, its `soul.json` healing strategy needs design. Soul crystallizations might be partially reconstructable from memories (F37 was self-claims-from-experience), but defer that until soul exists.
+### 9.1 Soul module health (concrete plan for when soul lands)
+
+When the Phase 2a-extension brings the soul module online, `soul.json` (or whatever its filename ends up being) joins the persona's identity-critical files. The heal strategy is already partially specified by the architecture; this section makes it concrete so the engineer building the soul module doesn't have to rediscover the plan.
+
+**File classification:** `soul.json` is an **atomic-rewrite identity file** — same tier as `emotion_vocabulary.json`, `interests.json`, `reflex_arcs.json`. Use `attempt_heal` + `save_with_backup`. Add it to:
+
+- `brain/health/walker.py:_DEFAULTS` with empty default `{"version": 1, "crystallizations": []}` (or whatever the schema settles on).
+- `brain/health/alarm.py:_IDENTITY_FILES` so `reset_to_default` on `soul.json` raises an alarm.
+
+**Reconstruction strategy:** F37 in OG NellBrain was *self-claims-from-experience* — the brain's soul names were derived from autobiographical patterns in memories. The same heuristic applies here: when all backups corrupt and reset would otherwise fire, scan `memories.db` for soul-claim patterns the brain has expressed and rebuild a partial `soul.json`. Implement as `brain/health/reconstruct.py:reconstruct_soul_from_memories(store) -> dict` mirroring `reconstruct_vocabulary_from_memories`. Wire it into the soul loader's heal flow the same way vocabulary does in `load_persona_vocabulary_with_anomaly` (Followup F1, 2026-04-26).
+
+**Schema validator:** mirror the vocabulary validator pattern — minimal type check (`isinstance(data, dict) and isinstance(data.get("crystallizations"), list)`) — so corrupt-but-parseable files trigger heal.
+
+**Acceptance:** when soul module lands, the soul module's PR must include:
+1. `soul.json` in `walker.py:_DEFAULTS` and `alarm.py:_IDENTITY_FILES`
+2. `reconstruct_soul_from_memories(store)` implementation + tests
+3. Soul loader's `*_with_anomaly` variant routes through `attempt_heal` and triggers reconstruction on `reset_to_default` when a store is provided
+4. Sandbox smoke: corrupt soul.json + run heartbeat tick → soul heals or reconstructs without user intervention
+
+### 9.2 Other deferred items
+
 - **Automatic .bak repair when a backup is detected corrupt mid-rotation.** v1 skips the corrupt backup and walks to the next; doesn't try to repair the backup itself. If real-world telemetry shows backups frequently corrupt mid-chain, revisit.
 - **GUI surface for "the brain self-healed."** Not a framework concern; future Tauri/NellFace work consumes the audit log directly.
 
diff --git a/tests/unit/brain/emotion/test_persona_loader.py b/tests/unit/brain/emotion/test_persona_loader.py
@@ -182,6 +182,85 @@ def test_load_persona_vocabulary_corrupt_no_bak_resets_to_default(tmp_path: Path
     assert path.exists()
 
 
+def test_load_corrupt_no_bak_with_store_reconstructs_from_memories(tmp_path: Path):
+    """When emotion_vocabulary.json corrupts and no .bak exists, the loader
+    reconstructs from memories rather than resetting to empty.
+
+    Followup F1 from the brain-health module: the brain re-learns its own
+    vocabulary from how it has been operating instead of forgetting it.
+    """
+    store = MemoryStore(":memory:")
+    try:
+        # Seed memories that reference both baseline + extension emotions.
+        store.create(
+            Memory.create_new(
+                content="x",
+                memory_type="conversation",
+                domain="us",
+                emotions={"love": 9.0, "body_grief": 5.0},
+            )
+        )
+        store.create(
+            Memory.create_new(
+                content="y",
+                memory_type="conversation",
+                domain="us",
+                emotions={"creative_hunger": 7.0},
+            )
+        )
+
+        # Pre-cleanup so the test is repeatable
+        _cleanup_emotion("body_grief")
+        _cleanup_emotion("creative_hunger")
+
+        # Corrupt the vocabulary file (no .bak alongside)
+        vocab_path = tmp_path / "emotion_vocabulary.json"
+        vocab_path.write_text("{not json", encoding="utf-8")
+
+        count, anomaly = load_persona_vocabulary_with_anomaly(vocab_path, store=store)
+
+        # Anomaly action reflects reconstruction, not bare reset
+        assert anomaly is not None
+        assert anomaly.action == "reconstructed_from_memories"
+        assert "reconstructed" in anomaly.detail
+
+        # File on disk now has the reconstructed content
+        on_disk = json.loads(vocab_path.read_text(encoding="utf-8"))
+        names = {e["name"] for e in on_disk["emotions"]}
+        assert "love" in names  # baseline
+        assert "body_grief" in names  # reconstructed extension
+        assert "creative_hunger" in names  # reconstructed extension
+
+        # Reconstructed extensions registered + count > 0
+        assert count > 0
+        assert vocabulary.get("body_grief") is not None
+        assert vocabulary.get("creative_hunger") is not None
+        # Persona-extension entries carry the placeholder description
+        body_grief = vocabulary.get("body_grief")
+        assert body_grief is not None
+        assert "reconstructed from memory" in body_grief.description
+    finally:
+        store.close()
+        _cleanup_emotion("body_grief")
+        _cleanup_emotion("creative_hunger")
+
+
+def test_load_corrupt_no_bak_no_store_falls_back_to_empty(tmp_path: Path):
+    """When the loader has no store (caller didn't pass one), reconstruction
+    can't happen; the empty default is used and anomaly action stays
+    'reset_to_default'."""
+    vocab_path = tmp_path / "emotion_vocabulary.json"
+    vocab_path.write_text("{not json", encoding="utf-8")
+
+    count, anomaly = load_persona_vocabulary_with_anomaly(vocab_path, store=None)
+
+    assert anomaly is not None
+    assert anomaly.action == "reset_to_default"
+    on_disk = json.loads(vocab_path.read_text(encoding="utf-8"))
+    assert on_disk == {"version": 1, "emotions": []}
+    assert count == 0
+
+
 def test_load_with_store_warns_on_missing_emotion(tmp_path: Path, caplog):
     """Store has memory referencing 'body_grief' but vocab file missing →
     one warning per missing emotion pointing at nell migrate.
diff --git a/tests/unit/brain/engines/test_heartbeat.py b/tests/unit/brain/engines/test_heartbeat.py
@@ -1744,3 +1744,74 @@ def test_heartbeat_alarm_increments_pending_alarms_count(tmp_path: Path) -> None
     finally:
         store.close()
         hebbian.close()
+
+
+# ---- F3: growth-tick-internal anomalies surface in heartbeat audit log ----
+
+
+def test_heartbeat_growth_anomaly_surfaces_in_audit_log(tmp_path: Path) -> None:
+    """When growth tick reads a corrupt vocabulary file, the anomaly appears
+    in the heartbeat audit log + HeartbeatResult.anomalies — not just logged
+    as a warning inside growth.
+
+    Followup F3 from the brain-health module.
+    """
+    persona_dir = tmp_path / "persona"
+    persona_dir.mkdir()
+
+    # Seed an empty interests file so growth has a persona dir to work with.
+    (persona_dir / "interests.json").write_text(
+        json.dumps({"version": 1, "interests": []}), encoding="utf-8"
+    )
+
+    store = MemoryStore(":memory:")
+    hebbian = HebbianMatrix(":memory:")
+    try:
+        engine = HeartbeatEngine(
+            store=store,
+            hebbian=hebbian,
+            provider=FakeProvider(),
+            state_path=persona_dir / "heartbeat_state.json",
+            config_path=persona_dir / "heartbeat_config.json",
+            dream_log_path=persona_dir / "dreams.log.jsonl",
+            heartbeat_log_path=persona_dir / "heartbeats.log.jsonl",
+            interests_path=persona_dir / "interests.json",
+            research_log_path=persona_dir / "research_log.json",
+            default_interests_path=DEFAULT_INTERESTS_PATH,
+            persona_name="test",
+            persona_system_prompt="You are test.",
+        )
+
+        # First tick initializes state — work deferred (first-tick semantics).
+        engine.run_tick(trigger="open")
+
+        # Force last_growth_at older than growth_every_hours so growth fires.
+        from brain.engines.heartbeat import HeartbeatState
+
+        s = HeartbeatState.load(persona_dir / "heartbeat_state.json")
+        assert s is not None
+        s.last_growth_at = datetime.now(UTC) - timedelta(hours=200)
+        s.last_tick_at = datetime.now(UTC) - timedelta(hours=200)
+        s.save(persona_dir / "heartbeat_state.json")
+
+        # Corrupt vocab file (no .bak → reset_to_default fires inside growth).
+        (persona_dir / "emotion_vocabulary.json").write_text("{not json", encoding="utf-8")
+
+        # Second tick runs growth. _read_current_vocabulary_names detects the
+        # corruption, heals it, and the anomaly is appended to tick_anomalies
+        # via the F3 wiring.
+        result = engine.run_tick(trigger="manual")
+
+        # The anomaly from inside growth surfaces in the result + audit log.
+        vocab_anomalies = [a for a in result.anomalies if a.file == "emotion_vocabulary.json"]
+        assert len(vocab_anomalies) >= 1
+        assert vocab_anomalies[0].action == "reset_to_default"
+
+        # Audit log also has it.
+        log_lines = (persona_dir / "heartbeats.log.jsonl").read_text().strip().splitlines()
+        last_entry = json.loads(log_lines[-1])
+        files_in_audit = {a["file"] for a in last_entry["anomalies"]}
+        assert "emotion_vocabulary.json" in files_in_audit
+    finally:
+        store.close()
+        hebbian.close()
diff --git a/tests/unit/brain/growth/test_scheduler.py b/tests/unit/brain/growth/test_scheduler.py
