Commit d5619e1

hanamorix and Hana authored
Brain health follow-ups: vocabulary reconstruction wiring + soul plan + growth anomaly collector (#20)
* feat(health): wire reconstruct_vocabulary_from_memories into vocabulary heal flow (F1)

When emotion_vocabulary.json corrupts and no .bak is recoverable, the heal flow used to reset to empty `{"version":1,"emotions":[]}`, losing all persona-extension emotions the brain has been operating with.

Now: if the loader has store access (caller passed `store=...`), the reset_to_default path is replaced with reconstruct_from_memories. The brain re-learns its own vocabulary from how it has been using emotions. Framework baseline (21 entries) + persona-extension entries detected in memories.db (with `(reconstructed from memory)` placeholder description and conservative 1.0-day decay).

The anomaly's action field reflects the actual outcome: when reconstruction fires, action becomes `reconstructed_from_memories` (not `reset_to_default`). The forensic quarantine of the original corrupt file is preserved.

Falls back to bare reset when no store is provided (some callers don't have one; that's fine, they'll get the empty default).

Closes followup #1 from brain-health-module-design.md §9.

* docs(health): concrete soul module health plan (F2)

Spec §9.1 expanded from a one-line deferral into a concrete plan the next engineer can implement directly when soul module lands. Covers:

- file classification (atomic-rewrite identity, same tier as emotion_vocabulary.json)
- reconstruct_soul_from_memories(store) following F37's self-claims-from-experience pattern
- schema validator shape
- acceptance criteria for the soul-module PR

Inline comments in walker.py (_DEFAULTS) and alarm.py (_IDENTITY_FILES) point at spec §9.1 so the plan is visible during code-reading too.

Closes followup #2 from brain-health-module-design.md §9.

* feat(health): thread anomaly collector through run_growth_tick (F3)

When run_growth_tick reads a corrupt emotion_vocabulary.json via _read_current_vocabulary_names, the anomaly produced by the heal flow is now appended to an optional caller-provided collector instead of being silently dropped after a local warning.

Wiring:

- _read_current_vocabulary_names returns (set[str], BrainAnomaly | None)
- run_growth_tick accepts anomalies_collector: list[BrainAnomaly] | None
- HeartbeatEngine._try_run_growth forwards tick_anomalies as the collector when calling run_growth_tick

After this lands, vocabulary corruption discovered inside the weekly growth tick surfaces in the heartbeat audit log + HeartbeatResult.anomalies + compact CLI 🩹/banner exactly like config/state corruption discovered at the top of the tick. No more silent loss.

Calling run_growth_tick standalone (e.g., from tests, or in the future from a scheduled job runner) without a collector still works; the parameter is opt-in.

Closes followup #3 from brain-health-module-design.md §9.

---------

Co-authored-by: Hana <hana@nanoclaw.local>
1 parent 95f62fe commit d5619e1

9 files changed

Lines changed: 319 additions & 15 deletions

brain/emotion/persona_loader.py

Lines changed: 31 additions & 0 deletions
@@ -58,6 +58,37 @@ def load_persona_vocabulary_with_anomaly(
         path, _default_vocab_factory, schema_validator=_vocab_schema_validator
     )

+    # Reconstruct from memories when reset_to_default fires on vocabulary.
+    # The default factory writes empty `{"version":1,"emotions":[]}` — that's
+    # a truthful empty default but it loses the persona-extension entries
+    # the brain has been operating with. If we have memory access, the brain
+    # can re-learn its own vocabulary from how it has been using emotions.
+    if anomaly is not None and anomaly.action == "reset_to_default" and store is not None:
+        from brain.health.attempt_heal import save_with_backup
+        from brain.health.reconstruct import reconstruct_vocabulary_from_memories
+
+        recon_data = reconstruct_vocabulary_from_memories(store)
+        save_with_backup(path, recon_data)
+        data = recon_data
+        # Replace the anomaly with one whose action reflects the reconstruction.
+        # Same kind (json_parse_error / schema_mismatch — that's why we needed
+        # to reconstruct) and same forensic quarantine path; the heal path
+        # advanced beyond reset.
+        from brain.health.anomaly import BrainAnomaly
+
+        anomaly = BrainAnomaly(
+            timestamp=anomaly.timestamp,
+            file=anomaly.file,
+            kind=anomaly.kind,
+            action="reconstructed_from_memories",
+            quarantine_path=anomaly.quarantine_path,
+            likely_cause=anomaly.likely_cause,
+            detail=(
+                f"{anomaly.detail}; reconstructed "
+                f"{len(recon_data['emotions'])} entries from memories"
+            ),
+        )
+
     if anomaly is not None:
         logger.warning(
             "emotion_vocabulary anomaly detected: %s action=%s file=%s",
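The control flow of the heal-path change in `brain/emotion/persona_loader.py` can be illustrated with a minimal, self-contained sketch. It is not the project's code: `Anomaly` is a hypothetical stand-in carrying only four of the `BrainAnomaly` fields, `upgrade_reset_to_reconstruction` and the `reconstruct` callable are invented names, and `dataclasses.replace` is used for brevity where the commit rebuilds the anomaly field by field. Persisting the result (the `save_with_backup` step) is deliberately left out.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Anomaly:
    """Hypothetical stand-in for BrainAnomaly (assumed shape, reduced fields)."""

    file: str
    kind: str
    action: str
    detail: str


def upgrade_reset_to_reconstruction(
    data: dict,
    anomaly: Optional[Anomaly],
    reconstruct: Optional[Callable[[], dict]],
) -> tuple[dict, Optional[Anomaly]]:
    """Swap an empty reset for reconstructed data when a reconstructor exists.

    Anything other than a `reset_to_default` anomaly, or a missing
    reconstructor (the "no store" case), passes through unchanged.
    """
    if anomaly is None or anomaly.action != "reset_to_default" or reconstruct is None:
        return data, anomaly

    recon = reconstruct()
    # Same file and kind; only the action and detail record the new outcome.
    upgraded = replace(
        anomaly,
        action="reconstructed_from_memories",
        detail=f"{anomaly.detail}; reconstructed {len(recon['emotions'])} entries from memories",
    )
    return recon, upgraded
```

Passing `reconstruct=None` models a caller without a store: the empty default and the `reset_to_default` action are returned untouched, which matches the fallback described in the commit message.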

brain/engines/heartbeat.py

Lines changed: 20 additions & 2 deletions
@@ -502,7 +502,13 @@ def run_tick(self, *, trigger: str = "manual", dry_run: bool = False) -> Heartbe
         # Growth tick — autonomous self-development (Phase 2a). Runs after
         # all per-tick engines so it can observe the freshest state, before
         # the audit log writes so the audit can summarize the growth outcome.
-        growth_emotions_added, growth_ran = self._try_run_growth(state, now, config, dry_run)
+        # Passing `tick_anomalies` as the collector lets growth-tick-internal
+        # anomalies (e.g., vocab corruption discovered while reading current
+        # vocabulary names) surface in the audit log alongside the engine's
+        # own load anomalies.
+        growth_emotions_added, growth_ran = self._try_run_growth(
+            state, now, config, dry_run, anomalies_collector=tick_anomalies
+        )

         # Optional HEARTBEAT: memory
         heartbeat_memory_id: str | None = None
@@ -737,11 +743,17 @@ def _try_run_growth(
         now: datetime,
         config: HeartbeatConfig,
         dry_run: bool,
+        anomalies_collector: list[BrainAnomaly] | None = None,
     ) -> tuple[int, bool]:
         """Run a growth tick if due. Returns (emotions_added, ran).

         Fault-isolated: any exception logs a warning and returns (0, False).
         Heartbeat tick continues normally — same pattern as reflex/research.
+
+        `anomalies_collector` is forwarded to `run_growth_tick` so any
+        anomaly produced inside growth (e.g., vocabulary file corruption
+        detected by `_read_current_vocabulary_names`) surfaces in the
+        heartbeat tick's audit log alongside engine-level anomalies.
         """
         if not config.growth_enabled:
             return (0, False)
@@ -758,7 +770,13 @@ def _try_run_growth(
         try:
             from brain.growth.scheduler import run_growth_tick

-            result = run_growth_tick(persona_dir, self.store, now, dry_run=dry_run)
+            result = run_growth_tick(
+                persona_dir,
+                self.store,
+                now,
+                dry_run=dry_run,
+                anomalies_collector=anomalies_collector,
+            )
         except Exception as exc:
             logger.warning("growth tick raised; isolating: %.200s", exc)
             return (0, False)

brain/growth/scheduler.py

Lines changed: 26 additions & 9 deletions
@@ -16,12 +16,16 @@
 from dataclasses import dataclass
 from datetime import datetime
 from pathlib import Path
+from typing import TYPE_CHECKING

 from brain.growth.crystallizers.vocabulary import crystallize_vocabulary
 from brain.growth.log import GrowthLogEvent, append_growth_event
 from brain.growth.proposal import EmotionProposal
 from brain.memory.store import MemoryStore

+if TYPE_CHECKING:
+    from brain.health.anomaly import BrainAnomaly
+
 logger = logging.getLogger(__name__)

 # Same character allowlist as brain.paths.get_persona_dir — names that
@@ -45,6 +49,7 @@ def run_growth_tick(
     now: datetime,
     *,
     dry_run: bool = False,
+    anomalies_collector: list[BrainAnomaly] | None = None,
 ) -> GrowthTickResult:
     """Run all crystallizers, apply their proposals atomically.

@@ -58,11 +63,19 @@ def run_growth_tick(

     `dry_run=True` calls the crystallizer but skips both writes; the
     returned `emotions_added` reflects "would-have-added" semantics.
+
+    `anomalies_collector` (optional): when the heartbeat tick passes its
+    per-tick anomaly list, any anomaly produced by reading the vocabulary
+    file (corruption, schema mismatch) gets appended so it surfaces in the
+    audit log + compact CLI alongside heartbeat-engine anomalies. Pass None
+    when calling `run_growth_tick` standalone (e.g., from tests).
     """
     vocab_path = persona_dir / "emotion_vocabulary.json"
     log_path = persona_dir / "emotion_growth.log.jsonl"

-    current_names = _read_current_vocabulary_names(vocab_path)
+    current_names, vocab_anomaly = _read_current_vocabulary_names(vocab_path)
+    if vocab_anomaly is not None and anomalies_collector is not None:
+        anomalies_collector.append(vocab_anomaly)

     proposals = crystallize_vocabulary(store, current_vocabulary_names=current_names)

@@ -108,18 +121,21 @@ def run_growth_tick(
     )


-def _read_current_vocabulary_names(vocab_path: Path) -> set[str]:
-    """Return the set of emotion names currently in the persona's vocabulary file.
+def _read_current_vocabulary_names(
+    vocab_path: Path,
+) -> tuple[set[str], BrainAnomaly | None]:
+    """Return (set of emotion names, optional anomaly) for the vocabulary file.

     Distinguishes three load outcomes:
-    - Missing file → return empty set silently (fresh persona; expected).
+    - Missing file → (empty set, None) silently (fresh persona; expected).
     - Corrupt JSON or wrong schema → quarantine + heal from .bak or reset to
-      default; return names from the recovered data. Logs a WARNING so the
-      anomaly is visible.
-    - Well-formed → return the set of names.
+      default; returns (names_from_recovered_data, BrainAnomaly). The caller
+      (run_growth_tick) feeds the anomaly into its `anomalies_collector` so
+      it surfaces in the heartbeat audit log. Logs WARNING locally too.
+    - Well-formed → (set of names, None).
     """
     if not vocab_path.exists():
-        return set()
+        return set(), None

     from brain.health.attempt_heal import attempt_heal

@@ -142,7 +158,8 @@ def _schema_validator(data: object) -> None:
             anomaly.action,
         )

-    return {e["name"] for e in data.get("emotions", []) if isinstance(e, dict) and "name" in e}
+    names = {e["name"] for e in data.get("emotions", []) if isinstance(e, dict) and "name" in e}
+    return names, anomaly


 def _is_valid_name(name: str) -> bool:
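The opt-in collector wiring in `brain/growth/scheduler.py` follows a small, reusable pattern: the reader returns its anomaly next to its data, and the caller appends it to a list only if one was supplied. The sketch below reproduces that pattern in isolation. `read_names` and `run_tick` are hypothetical names, the anomaly is reduced to a plain string, and no file I/O or healing is performed, so this shows the shape of the wiring only.

```python
from __future__ import annotations


def read_names(raw: object) -> tuple[set[str], str | None]:
    """Return (names, anomaly); the anomaly is a plain string in this sketch."""
    if not isinstance(raw, dict) or not isinstance(raw.get("emotions"), list):
        # Stand-in for the heal path: report the problem, recover to empty.
        return set(), "schema_mismatch"
    names = {e["name"] for e in raw["emotions"] if isinstance(e, dict) and "name" in e}
    return names, None


def run_tick(raw: object, *, anomalies_collector: list[str] | None = None) -> set[str]:
    """Read names; surface any anomaly only when the caller supplied a collector."""
    names, anomaly = read_names(raw)
    if anomaly is not None and anomalies_collector is not None:
        anomalies_collector.append(anomaly)
    return names
```

Because the collector defaults to `None`, a standalone caller such as a test can call `run_tick(raw)` unchanged, while a caller that owns a per-tick list passes it in and sees the anomaly appended.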

brain/health/alarm.py

Lines changed: 2 additions & 1 deletion
@@ -14,7 +14,8 @@
         "emotion_vocabulary.json",
         "interests.json",
         "reflex_arcs.json",
-        # future: "soul.json"
+        # When the soul module lands, add "soul.json" here so its
+        # reset_to_default raises an alarm. See spec §9.1.
     }
 )

brain/health/walker.py

Lines changed: 4 additions & 0 deletions
@@ -10,6 +10,10 @@
 from brain.health.attempt_heal import attempt_heal

 # Atomic-rewrite files this walker checks. Each entry: filename -> default dict.
+#
+# When the soul module lands as a Phase 2a-extension, add `soul.json` here
+# with default `{"version": 1, "crystallizations": []}` (or whatever the
+# soul module's schema settles on). See spec §9.1 for the full plan.
 _DEFAULTS: dict[str, dict] = {
     "user_preferences.json": {"dream_every_hours": 24.0},
     "persona_config.json": {"provider": "claude-cli", "searcher": "ddgs"},

docs/superpowers/specs/2026-04-25-brain-health-module-design.md

Lines changed: 21 additions & 1 deletion
@@ -347,7 +347,27 @@ The full health module ships when:

 ## 9. Open / Deferred

-- **Soul reconstruction.** When the soul module lands as a Phase 2a-extension, its `soul.json` healing strategy needs design. Soul crystallizations might be partially reconstructable from memories (F37 was self-claims-from-experience), but defer that until soul exists.
+### 9.1 Soul module health (concrete plan for when soul lands)
+
+When the Phase 2a-extension brings the soul module online, `soul.json` (or whatever its filename ends up being) joins the persona's identity-critical files. The heal strategy is already partially specified by the architecture; this section makes it concrete so the engineer building the soul module doesn't have to rediscover the plan.
+
+**File classification:** `soul.json` is an **atomic-rewrite identity file** — same tier as `emotion_vocabulary.json`, `interests.json`, `reflex_arcs.json`. Use `attempt_heal` + `save_with_backup`. Add it to:
+
+- `brain/health/walker.py:_DEFAULTS` with empty default `{"version": 1, "crystallizations": []}` (or whatever the schema settles on).
+- `brain/health/alarm.py:_IDENTITY_FILES` so `reset_to_default` on `soul.json` raises an alarm.
+
+**Reconstruction strategy:** F37 in OG NellBrain was *self-claims-from-experience* — the brain's soul names were derived from autobiographical patterns in memories. The same heuristic applies here: when all backups corrupt and reset would otherwise fire, scan `memories.db` for soul-claim patterns the brain has expressed and rebuild a partial `soul.json`. Implement as `brain/health/reconstruct.py:reconstruct_soul_from_memories(store) -> dict` mirroring `reconstruct_vocabulary_from_memories`. Wire it into the soul loader's heal flow the same way vocabulary does in `load_persona_vocabulary_with_anomaly` (Followup F1, 2026-04-26).
+
+**Schema validator:** mirror the vocabulary validator pattern — minimal type check (`isinstance(data, dict) and isinstance(data.get("crystallizations"), list)`) — so corrupt-but-parseable files trigger heal.
+
+**Acceptance:** when soul module lands, the soul module's PR must include:
+1. `soul.json` in `walker.py:_DEFAULTS` and `alarm.py:_IDENTITY_FILES`
+2. `reconstruct_soul_from_memories(store)` implementation + tests
+3. Soul loader's `*_with_anomaly` variant routes through `attempt_heal` and triggers reconstruction on `reset_to_default` when a store is provided
+4. Sandbox smoke: corrupt soul.json + run heartbeat tick → soul heals or reconstructs without user intervention
+
+### 9.2 Other deferred items
+
 - **Automatic .bak repair when a backup is detected corrupt mid-rotation.** v1 skips the corrupt backup and walks to the next; doesn't try to repair the backup itself. If real-world telemetry shows backups frequently corrupt mid-chain, revisit.
 - **GUI surface for "the brain self-healed."** Not a framework concern; future Tauri/NellFace work consumes the audit log directly.
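The schema validator described in §9.1 of the spec could look roughly like the sketch below. The type check is taken verbatim from the spec text; everything else is an assumption. In particular, `soul_schema_validator` is a hypothetical name, and raising `ValueError` on mismatch is a guess based on the vocabulary validator's `-> None` signature, which suggests (but does not confirm) that validators signal failure by raising.

```python
def soul_schema_validator(data: object) -> None:
    """Raise ValueError unless data is a dict holding a 'crystallizations' list.

    Assumed contract: returning None means "valid"; raising means the caller
    should treat the file as corrupt-but-parseable and start the heal flow.
    """
    if not (isinstance(data, dict) and isinstance(data.get("crystallizations"), list)):
        raise ValueError(
            "soul.json schema mismatch: expected a dict with a 'crystallizations' list"
        )
```

Keeping the check this minimal means a file that parses as JSON but has the wrong top-level shape (for example a bare list, or a dict missing `crystallizations`) is rejected, while the contents of individual crystallization entries are not inspected.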

tests/unit/brain/emotion/test_persona_loader.py

Lines changed: 79 additions & 0 deletions
@@ -182,6 +182,85 @@ def test_load_persona_vocabulary_corrupt_no_bak_resets_to_default(tmp_path: Path
     assert path.exists()


+def test_load_corrupt_no_bak_with_store_reconstructs_from_memories(tmp_path: Path):
+    """When emotion_vocabulary.json corrupts and no .bak exists, the loader
+    reconstructs from memories rather than resetting to empty.
+
+    Followup F1 from the brain-health module: the brain re-learns its own
+    vocabulary from how it has been operating instead of forgetting it.
+    """
+    store = MemoryStore(":memory:")
+    try:
+        # Seed memories that reference both baseline + extension emotions.
+        store.create(
+            Memory.create_new(
+                content="x",
+                memory_type="conversation",
+                domain="us",
+                emotions={"love": 9.0, "body_grief": 5.0},
+            )
+        )
+        store.create(
+            Memory.create_new(
+                content="y",
+                memory_type="conversation",
+                domain="us",
+                emotions={"creative_hunger": 7.0},
+            )
+        )
+
+        # Pre-cleanup so the test is repeatable
+        _cleanup_emotion("body_grief")
+        _cleanup_emotion("creative_hunger")
+
+        # Corrupt the vocabulary file (no .bak alongside)
+        vocab_path = tmp_path / "emotion_vocabulary.json"
+        vocab_path.write_text("{not json", encoding="utf-8")
+
+        count, anomaly = load_persona_vocabulary_with_anomaly(vocab_path, store=store)
+
+        # Anomaly action reflects reconstruction, not bare reset
+        assert anomaly is not None
+        assert anomaly.action == "reconstructed_from_memories"
+        assert "reconstructed" in anomaly.detail
+
+        # File on disk now has the reconstructed content
+        on_disk = json.loads(vocab_path.read_text(encoding="utf-8"))
+        names = {e["name"] for e in on_disk["emotions"]}
+        assert "love" in names  # baseline
+        assert "body_grief" in names  # reconstructed extension
+        assert "creative_hunger" in names  # reconstructed extension
+
+        # Reconstructed extensions registered + count > 0
+        assert count > 0
+        assert vocabulary.get("body_grief") is not None
+        assert vocabulary.get("creative_hunger") is not None
+        # Persona-extension entries carry the placeholder description
+        body_grief = vocabulary.get("body_grief")
+        assert body_grief is not None
+        assert "reconstructed from memory" in body_grief.description
+    finally:
+        store.close()
+        _cleanup_emotion("body_grief")
+        _cleanup_emotion("creative_hunger")
+
+
+def test_load_corrupt_no_bak_no_store_falls_back_to_empty(tmp_path: Path):
+    """When the loader has no store (caller didn't pass one), reconstruction
+    can't happen; the empty default is used and anomaly action stays
+    'reset_to_default'."""
+    vocab_path = tmp_path / "emotion_vocabulary.json"
+    vocab_path.write_text("{not json", encoding="utf-8")
+
+    count, anomaly = load_persona_vocabulary_with_anomaly(vocab_path, store=None)
+
+    assert anomaly is not None
+    assert anomaly.action == "reset_to_default"
+    on_disk = json.loads(vocab_path.read_text(encoding="utf-8"))
+    assert on_disk == {"version": 1, "emotions": []}
+    assert count == 0
+
+
 def test_load_with_store_warns_on_missing_emotion(tmp_path: Path, caplog):
     """Store has memory referencing 'body_grief' but vocab file missing →
     one warning per missing emotion pointing at nell migrate.

tests/unit/brain/engines/test_heartbeat.py

Lines changed: 71 additions & 0 deletions
@@ -1744,3 +1744,74 @@ def test_heartbeat_alarm_increments_pending_alarms_count(tmp_path: Path) -> None
     finally:
         store.close()
         hebbian.close()
+
+
+# ---- F3: growth-tick-internal anomalies surface in heartbeat audit log ----
+
+
+def test_heartbeat_growth_anomaly_surfaces_in_audit_log(tmp_path: Path) -> None:
+    """When growth tick reads a corrupt vocabulary file, the anomaly appears
+    in the heartbeat audit log + HeartbeatResult.anomalies — not just logged
+    as a warning inside growth.
+
+    Followup F3 from the brain-health module.
+    """
+    persona_dir = tmp_path / "persona"
+    persona_dir.mkdir()
+
+    # Seed an empty interests file so growth has a persona dir to work with.
+    (persona_dir / "interests.json").write_text(
+        json.dumps({"version": 1, "interests": []}), encoding="utf-8"
+    )
+
+    store = MemoryStore(":memory:")
+    hebbian = HebbianMatrix(":memory:")
+    try:
+        engine = HeartbeatEngine(
+            store=store,
+            hebbian=hebbian,
+            provider=FakeProvider(),
+            state_path=persona_dir / "heartbeat_state.json",
+            config_path=persona_dir / "heartbeat_config.json",
+            dream_log_path=persona_dir / "dreams.log.jsonl",
+            heartbeat_log_path=persona_dir / "heartbeats.log.jsonl",
+            interests_path=persona_dir / "interests.json",
+            research_log_path=persona_dir / "research_log.json",
+            default_interests_path=DEFAULT_INTERESTS_PATH,
+            persona_name="test",
+            persona_system_prompt="You are test.",
+        )
+
+        # First tick initializes state — work deferred (first-tick semantics).
+        engine.run_tick(trigger="open")
+
+        # Force last_growth_at older than growth_every_hours so growth fires.
+        from brain.engines.heartbeat import HeartbeatState
+
+        s = HeartbeatState.load(persona_dir / "heartbeat_state.json")
+        assert s is not None
+        s.last_growth_at = datetime.now(UTC) - timedelta(hours=200)
+        s.last_tick_at = datetime.now(UTC) - timedelta(hours=200)
+        s.save(persona_dir / "heartbeat_state.json")
+
+        # Corrupt vocab file (no .bak → reset_to_default fires inside growth).
+        (persona_dir / "emotion_vocabulary.json").write_text("{not json", encoding="utf-8")
+
+        # Second tick runs growth. _read_current_vocabulary_names detects the
+        # corruption, heals it, and the anomaly is appended to tick_anomalies
+        # via the F3 wiring.
+        result = engine.run_tick(trigger="manual")
+
+        # The anomaly from inside growth surfaces in the result + audit log.
+        vocab_anomalies = [a for a in result.anomalies if a.file == "emotion_vocabulary.json"]
+        assert len(vocab_anomalies) >= 1
+        assert vocab_anomalies[0].action == "reset_to_default"
+
+        # Audit log also has it.
+        log_lines = (persona_dir / "heartbeats.log.jsonl").read_text().strip().splitlines()
+        last_entry = json.loads(log_lines[-1])
+        files_in_audit = {a["file"] for a in last_entry["anomalies"]}
+        assert "emotion_vocabulary.json" in files_in_audit
+    finally:
+        store.close()
+        hebbian.close()
