Commit af38db8

tcconnally and claude authored
fix(#417): add journal.workspace_hash; scope purge redaction per-workspace (#426)
Follow-up to #416. journal had no workspace column, so purge's (category, key) redaction match was workspace-blind: purging workspace A's entity redacted workspace B's LIVE same-key journal rows (payloads → {}, identity scrubbed) — real audit-payload loss for B.

- Schema: add journal.workspace_hash (SCHEMA_VERSION 10 → 11) via a gated ALTER migration + a (category, key, workspace_hash) index. ensure_column also backfills journal category/key so the index is robust on ancient journals that predate those columns.
- Write path: Database::journal stamps workspace_hash — explicit on the event if set, else derived from the referenced entity (live row, or entity_history when the live id has since changed). System events (dream/synthesis, no entity_id) stay ''. JournalEvent + JournalArgs gain the field.
- purge: JRN_MATCH's (category, key) branch is scoped to the purged entity's workspace. Rows with '' (legacy pre-v11 or genuine default-workspace) are still matched conservatively, so erasure never UNDER-redacts (no GDPR regression); residual over-redaction is narrowed to default-workspace rows sharing an exact (category, key) with a purged NAMED-workspace entity. The entity_id branch is exact and already workspace-safe. Redaction also scrubs workspace_hash on the redacted row.
- Tests: new v10→v11 migration test; new test proving A-purge leaves B's live same-key journal row intact + audit chain still verifies. Full bin suite: 335 passed.
- docs/retention.md: document workspace-scoped redaction + the derivative artifacts purge does NOT auto-erase (dream/consolidate outputs, community summaries, vault_export files).

Co-authored-by: tcconnally <hermes@perseus.observer>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
1 parent 82e293e commit af38db8
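
The write-path rule in the commit message (explicit value, then live entity, then `entity_history`, else `''`) can be sketched without a database. This is only an illustration under stated assumptions: `Store` and `resolve_workspace_hash` are hypothetical names, and the two maps stand in for the `entities` and `entity_history` tables that the real `Database::journal` queries.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the `entities` and `entity_history`
/// tables: entity id -> workspace_hash (None models a NULL column).
struct Store {
    entities: HashMap<String, Option<String>>,
    entity_history: HashMap<String, Option<String>>,
}

/// Resolve the workspace to stamp on a journal row.
/// Order: explicit value on the event, then the live entity row, then a
/// superseded version in history, else '' (workspace-agnostic).
fn resolve_workspace_hash(store: &Store, explicit: &str, entity_id: &str) -> String {
    if !explicit.is_empty() {
        return explicit.to_string();
    }
    if entity_id.is_empty() {
        // System events (no entity_id) stay workspace-agnostic.
        return String::new();
    }
    store
        .entities
        .get(entity_id)
        // History is consulted only when no live row exists for the id.
        .or_else(|| store.entity_history.get(entity_id))
        .cloned()
        .flatten() // a NULL workspace on the found row collapses to None
        .unwrap_or_default()
}
```

As in the diff, a live row whose workspace is NULL resolves to `''` without consulting history, because the fallback fires only when the first lookup finds no row.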

7 files changed

Lines changed: 306 additions & 17 deletions


CHANGELOG.md

Lines changed: 13 additions & 0 deletions
@@ -121,6 +121,19 @@ All notable changes to Perseus Vault (formerly Mimir/Mneme) are documented here.
   behavior is unchanged.

 ### Fixed
+- Journal redaction is now workspace-scoped (#417, follow-up to #416): the
+  `journal` table gained a `workspace_hash` column (SCHEMA_VERSION 10 → 11),
+  stamped at write time in `Database::journal` from the referenced entity's
+  workspace. `purge`'s `(category, key)` redaction match is scoped to the
+  purged entity's workspace, so purging workspace A no longer redacts workspace
+  B's live same-key journal rows. Rows with an empty `workspace_hash` (legacy
+  pre-v11 rows, or genuine default-workspace rows) are still matched
+  conservatively so erasure never *under*-redacts (no GDPR regression); the
+  residual over-redaction is narrowed to default-workspace rows sharing an
+  exact `(category, key)` with a purged *named*-workspace entity. `docs/
+  retention.md` now also names the derivative artifacts `purge` does not
+  auto-erase (dream/consolidate outputs, community summaries, vault_export
+  files).
 - Default DB-path resolution surfaces the split-brain instead of hiding it
   (#421): the legacy single-user location `~/mimir.db` is now **added to the
   fallback chain** (adopted when it is the *only* existing DB, instead of
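
The matching rule described in this changelog entry can be restated as a plain predicate. The sketch below mirrors the `JRN_MATCH` SQL from `src/db.rs` in ordinary Rust; `JournalRow`, `Purged`, and `redaction_matches` are hypothetical names used only for illustration, and an empty string models `COALESCE(workspace_hash, '')`.

```rust
/// Hypothetical, simplified journal row: only the columns the match reads.
struct JournalRow<'a> {
    event_type: &'a str,
    entity_id: &'a str,
    category: &'a str,
    key: &'a str,
    workspace_hash: &'a str, // "" models NULL or an empty column
}

/// Identity of the entity being purged (bound as ?1..?4 in the SQL).
struct Purged<'a> {
    id: &'a str,
    category: &'a str,
    key: &'a str,
    workspace_hash: &'a str,
}

/// True when the row would be redacted for this purged entity.
fn redaction_matches(row: &JournalRow, p: &Purged) -> bool {
    // Already-redacted rows are never matched again.
    if row.event_type == "redacted" {
        return false;
    }
    // Branch 1: exact reference by entity id (workspace-safe by itself).
    let by_id = row.entity_id == p.id;
    // Branch 2: same (category, key), non-empty key, and either the same
    // workspace or an unknown/default workspace ('').
    let by_key = row.category == p.category
        && row.key == p.key
        && !row.key.is_empty()
        && (row.workspace_hash == p.workspace_hash || row.workspace_hash.is_empty());
    by_id || by_key
}
```

With a purged entity in `wsA`, a same-key row stamped `wsB` is left alone, while a same-key row with an empty `workspace_hash` is still matched; that second case is the residual over-redaction the entry describes.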

docs/retention.md

Lines changed: 24 additions & 0 deletions
@@ -113,6 +113,30 @@ Deletion is explicit and two-step:
    because the audit hash chain covers row identity, so `verify_audit_chain`
    stays valid). `forget` then `purge` is the GDPR-style erasure path.

+**Journal redaction is workspace-scoped (#417).** Journal rows carry the
+`workspace_hash` of the entity they reference (stamped at write time), so
+purging one workspace's entity no longer redacts another workspace's live
+same-key journal rows. Rows with an empty `workspace_hash` — legacy rows
+written before the schema v11 migration, or genuine default-workspace rows —
+are still matched conservatively (erasure never *under*-redacts), so the only
+residual over-redaction is a default-workspace row that shares an exact
+`(category, key)` with a purged *named*-workspace entity.
+
+**Derivative artifacts are NOT auto-erased by `purge`.** Purge scopes to the
+archived source entities, their `entity_history`, and journal rows. Content
+*derived* from a purged entity is out of scope and, if it may echo the erased
+body, must be handled separately:
+- **Dream/consolidate outputs** — `mimir_dream` and `mimir_consolidate` write
+  new entities (`derived: true`) whose bodies summarize their sources. These
+  are ordinary entities: to erase them, `forget` + `purge` them too (the
+  `derivation`/source metadata on each derived entity identifies candidates).
+- **Community summaries** — LLM summaries over community clusters can quote
+  member bodies; regenerate or clear them after a purge if the purged entity
+  was a member.
+- **`mimir_vault_export` files** — exported Markdown/JSON on disk is a point-in-
+  time copy outside the database and is never touched by `purge`; delete the
+  export artifacts out-of-band.
+
 ## Version history retention (#398)

 Every content overwrite of a `(category, key)` snapshots the prior version
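
For the dream/consolidate case above, finding derived entities to `forget` and `purge` could look roughly like the sketch below. It rests on loud assumptions: the real shape of the `derivation`/source metadata is not shown in this commit, so `Entity`, its `source_ids` field, and `erasure_candidates` are hypothetical.

```rust
/// Hypothetical, simplified entity. The actual `derivation` metadata format
/// is not part of this diff; `source_ids` is an assumed flattening of it.
struct Entity {
    id: String,
    derived: bool,
    source_ids: Vec<String>,
}

/// Return the ids of derived entities that list any purged id among their
/// sources. These are only candidates for a follow-up `forget` + `purge`;
/// whether a derived body actually echoes erased content needs review.
fn erasure_candidates<'a>(entities: &'a [Entity], purged_ids: &[&str]) -> Vec<&'a str> {
    entities
        .iter()
        .filter(|e| e.derived)
        .filter(|e| e.source_ids.iter().any(|s| purged_ids.contains(&s.as_str())))
        .map(|e| e.id.as_str())
        .collect()
}
```

Community summaries and exported files are outside the database model sketched here and still need the out-of-band handling the documentation describes.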

src/db.rs

Lines changed: 169 additions & 15 deletions
@@ -3902,11 +3902,38 @@ impl Database {
             crate::db::sha256_genesis(&event.id, event.created_at_unix_ms)
         };

+        // #417: stamp the workspace of the referenced entity so purge can scope
+        // journal redaction per-workspace. Prefer an explicit value on the
+        // event; otherwise derive it from the referenced entity (the live row,
+        // or a superseded version in entity_history when the live id has since
+        // changed). System events with no entity_id stay '' (workspace-agnostic).
+        let workspace_hash = if !event.workspace_hash.is_empty() {
+            event.workspace_hash.clone()
+        } else if !event.entity_id.is_empty() {
+            conn.query_row(
+                "SELECT workspace_hash FROM entities WHERE id = ?1",
+                params![event.entity_id],
+                |r| r.get::<_, Option<String>>(0),
+            )
+            .or_else(|_| {
+                conn.query_row(
+                    "SELECT workspace_hash FROM entity_history WHERE id = ?1 LIMIT 1",
+                    params![event.entity_id],
+                    |r| r.get::<_, Option<String>>(0),
+                )
+            })
+            .ok()
+            .flatten()
+            .unwrap_or_default()
+        } else {
+            String::new()
+        };
+
         conn.execute(
             "INSERT INTO journal
              (id, event_type, evaluated_json, acted_json, forward_json,
-              category, key, entity_id, agent_id, audit_hash, created_at_unix_ms)
-             VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9, ?10, ?11)",
+              category, key, entity_id, agent_id, audit_hash, workspace_hash, created_at_unix_ms)
+             VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9, ?10, ?11, ?12)",
             params![
                 event.id,
                 event.event_type,
@@ -3918,6 +3945,7 @@ impl Database {
                 event.entity_id,
                 event.agent_id,
                 computed_hash,
+                workspace_hash,
                 event.created_at_unix_ms,
             ],
         )?;
@@ -4268,6 +4296,8 @@ impl Database {
                 key: row.get(6)?,
                 entity_id: row.get(7)?,
                 agent_id: row.get::<_, Option<String>>(8).unwrap_or(None).unwrap_or_default(),
+                // Not selected by this listing query; purge-scoping metadata only.
+                workspace_hash: String::new(),
                 created_at_unix_ms: row.get(9)?,
             })
         })?;
@@ -5074,13 +5104,11 @@ impl Database {

     /// Get recent journal events.
     ///
-    /// NOTE: the `journal` table has no `workspace_hash` column, so
-    /// this cannot be scoped to a workspace the way `list_entities`/
-    /// `get_entity_graph`/`context`/`recall_when` now are. In a federated
-    /// vault, journal events from every workspace are visible here. Fixing
-    /// this properly needs a schema migration (new column + SCHEMA_VERSION
-    /// bump + JournalEvent struct + every journal() call site) — tracked as
-    /// a follow-up rather than folded into this pass.
+    /// NOTE: as of #417 the `journal` table has a `workspace_hash` column
+    /// (stamped at write time) that `purge` uses to scope redaction. This
+    /// listing query does not yet expose or filter by it — journal events from
+    /// every workspace are still visible here. Adding a workspace filter to the
+    /// read path is a separate, additive change.
     pub fn get_recent_journal(
         &self,
         limit: i64,
@@ -5102,6 +5130,8 @@ impl Database {
                 key: row.get::<_, String>(6).unwrap_or_default(),
                 entity_id: row.get::<_, String>(7).unwrap_or_default(),
                 agent_id: row.get::<_, Option<String>>(8).unwrap_or(None).unwrap_or_default(),
+                // Not selected by this listing query; purge-scoping metadata only.
+                workspace_hash: String::new(),
                 created_at_unix_ms: row.get(9)?,
             })
         })?;
@@ -6068,6 +6098,7 @@ impl Database {
             key: "dream-run".to_string(),
             entity_id: String::new(),
             agent_id: String::new(),
+            workspace_hash: String::new(), // workspace-agnostic system event
             created_at_unix_ms: now,
         };
         self.journal(&event)?;
@@ -6395,8 +6426,21 @@ Return a JSON object with an "insights" array. Each insight has:
         // so the preview counts exactly what the real run affects.
         const HIST_MATCH: &str =
             "id = ?1 OR (category = ?2 AND key = ?3 AND COALESCE(workspace_hash,'') = ?4)";
+        // #417: journal rows now carry the workspace of the entity they
+        // reference (stamped in `journal()`), so the (category, key) branch is
+        // scoped to the purged entity's workspace — purging workspace A no
+        // longer redacts workspace B's live same-key rows. Rows with an empty
+        // workspace_hash (`''`) are still matched: those are legacy rows
+        // (pre-SCHEMA_VERSION-11, workspace unknown) or genuine default-workspace
+        // rows, and could belong to the purged entity — matching them keeps
+        // erasure GDPR-complete (never under-redacts). The residual over-redaction
+        // is thus narrowed to default-workspace rows sharing an exact (category,
+        // key) with a purged *named*-workspace entity — strictly tighter than the
+        // pre-#417 cross-workspace behavior. The entity_id branch is exact and
+        // already workspace-safe.
         const JRN_MATCH: &str =
-            "event_type != 'redacted' AND (entity_id = ?1 OR (category = ?2 AND key = ?3 AND key != ''))";
+            "event_type != 'redacted' AND (entity_id = ?1 OR (category = ?2 AND key = ?3 AND key != '' \
+             AND (COALESCE(workspace_hash,'') = ?4 OR COALESCE(workspace_hash,'') = '')))";

         if dry_run {
             // Dedupe by row id: two doomed entities sharing (category, key)
@@ -6414,7 +6458,7 @@ Return a JSON object with an "insights" array. Each insight has:
                 }
                 let mut stmt =
                     conn.prepare(&format!("SELECT id FROM journal WHERE {JRN_MATCH}"))?;
-                for row in stmt.query_map(params![id, cat, key], |r| r.get::<_, String>(0))? {
+                for row in stmt.query_map(params![id, cat, key, ws], |r| r.get::<_, String>(0))? {
                     jrn.insert(row?);
                 }
             }
@@ -6458,9 +6502,9 @@ Return a JSON object with an "insights" array. Each insight has:
                     &format!(
                         "UPDATE journal SET event_type = 'redacted', evaluated_json = '{{}}', \
                          acted_json = '{{}}', forward_json = '{{}}', category = '', key = '', \
-                         entity_id = '' WHERE {JRN_MATCH}"
+                         entity_id = '', workspace_hash = '' WHERE {JRN_MATCH}"
                     ),
-                    params![id, cat, key],
+                    params![id, cat, key, ws],
                 )? as i64;
             }
             tx.commit()?;
@@ -8011,10 +8055,12 @@ last_accessed: {}
             key: key.clone(),
             entity_id: id.clone(),
             agent_id: String::new(),
+            // Derived from the referenced entity by journal(); left empty here.
+            workspace_hash: String::new(),
             created_at_unix_ms: now,
         };
         self.journal(&event)?;
-
+
         Ok(crate::models::CorrectResult {
             entity_id: id,
             journal_id,
@@ -8159,10 +8205,11 @@ If no clear lessons found, return: {{"lessons": []}}"#,
             key: format!("session-{}", params.session_id),
             entity_id: String::new(),
             agent_id: String::new(),
+            workspace_hash: String::new(), // workspace-agnostic system event
             created_at_unix_ms: now,
         };
         self.journal(&event)?;
-
+
         Ok(crate::models::SynthesizeResult {
             lessons,
             entities_created,
@@ -13298,6 +13345,7 @@ mod tests {
             key: "use-pg".to_string(),
             entity_id: "e1".to_string(),
             agent_id: "agent-1".to_string(),
+            workspace_hash: String::new(),
             created_at_unix_ms: now_ms(),
         };
         db.journal(&event).unwrap();
@@ -16172,6 +16220,7 @@ mod tests {
             key: "t1".to_string(),
             entity_id: String::new(),
             agent_id: "security-bot".to_string(),
+            workspace_hash: String::new(),
             created_at_unix_ms: now_ms(),
         };
         db.journal(&event).unwrap();
@@ -17342,6 +17391,7 @@ mod tests {
             key: "pii".to_string(),
             entity_id: live_id.clone(),
             agent_id: "test".to_string(),
+            workspace_hash: String::new(),
             created_at_unix_ms: now_ms(),
         })
         .unwrap();
@@ -17357,6 +17407,7 @@ mod tests {
             key: "unrelated".to_string(),
             entity_id: String::new(),
             agent_id: "test".to_string(),
+            workspace_hash: String::new(),
             created_at_unix_ms: now_ms() + 1,
         })
         .unwrap();
@@ -17456,6 +17507,7 @@ mod tests {
             key: "shared".to_string(),
             entity_id: String::new(),
             agent_id: "test".to_string(),
+            workspace_hash: String::new(),
             created_at_unix_ms: now_ms(),
         })
         .unwrap();
@@ -17473,6 +17525,108 @@ mod tests {
         let _ = fs::remove_file(&path);
     }

+    /// #417: purging one workspace's entity must NOT redact another workspace's
+    /// LIVE same-key journal rows. Before the workspace_hash column the
+    /// (category, key) redaction match was workspace-blind and scrubbed
+    /// workspace B's audit payloads when workspace A was purged.
+    #[test]
+    fn purge_does_not_redact_other_workspace_live_journal_rows() {
+        let (db, path) = temp_db();
+
+        // Two LIVE entities: same (category, key), different workspaces. Identity
+        // is (category, key, workspace_hash) (#339), so neither supersedes the other.
+        let mut e_a = make_entity("e-a", "facts", "k", r#"{"n":"a"}"#);
+        e_a.workspace_hash = "wsA".to_string();
+        db.remember(&e_a).unwrap();
+        let mut e_b = make_entity("e-b", "facts", "k", r#"{"n":"b"}"#);
+        e_b.workspace_hash = "wsB".to_string();
+        db.remember(&e_b).unwrap();
+
+        // One journal row per entity, referenced by id. journal() derives and
+        // stamps each row's workspace from the referenced entity.
+        db.journal(&crate::models::JournalEvent {
+            id: "jrn-a".to_string(),
+            event_type: "decision".to_string(),
+            evaluated_json: r#"{"secret":"A-only"}"#.to_string(),
+            acted_json: "{}".to_string(),
+            forward_json: "{}".to_string(),
+            category: "facts".to_string(),
+            key: "k".to_string(),
+            entity_id: "e-a".to_string(),
+            agent_id: "test".to_string(),
+            workspace_hash: String::new(),
+            created_at_unix_ms: now_ms(),
+        })
+        .unwrap();
+        db.journal(&crate::models::JournalEvent {
+            id: "jrn-b".to_string(),
+            event_type: "decision".to_string(),
+            evaluated_json: r#"{"keep":"B-live"}"#.to_string(),
+            acted_json: "{}".to_string(),
+            forward_json: "{}".to_string(),
+            category: "facts".to_string(),
+            key: "k".to_string(),
+            entity_id: "e-b".to_string(),
+            agent_id: "test".to_string(),
+            workspace_hash: String::new(),
+            created_at_unix_ms: now_ms() + 1,
+        })
+        .unwrap();
+
+        // journal() must have stamped each row with its referenced entity's workspace.
+        {
+            let conn = db.conn().unwrap();
+            let ws_a: String = conn
+                .query_row("SELECT workspace_hash FROM journal WHERE id='jrn-a'", [], |r| r.get(0))
+                .unwrap();
+            let ws_b: String = conn
+                .query_row("SELECT workspace_hash FROM journal WHERE id='jrn-b'", [], |r| r.get(0))
+                .unwrap();
+            assert_eq!(ws_a, "wsA", "journal() must stamp the referenced entity's workspace");
+            assert_eq!(ws_b, "wsB");
+        }
+
+        // Archive ONLY workspace A's entity, then purge. (forget by (category,
+        // key) is workspace-blind, so archive e-a directly to isolate wsA.)
+        {
+            let conn = db.conn().unwrap();
+            conn.execute("UPDATE entities SET archived = 1 WHERE id = 'e-a'", [])
+                .unwrap();
+        }
+        let report = db.purge(false).unwrap();
+        assert_eq!(report.entities_deleted, 1);
+        assert_eq!(
+            report.journal_rows_redacted, 1,
+            "only workspace A's journal row should be redacted"
+        );
+
+        let conn = db.conn().unwrap();
+        // wsA's row is redacted...
+        let a_type: String = conn
+            .query_row("SELECT event_type FROM journal WHERE id='jrn-a'", [], |r| r.get(0))
+            .unwrap();
+        assert_eq!(a_type, "redacted");
+        // ...but wsB's LIVE row is untouched — payload and identity intact.
+        let (b_type, b_eval, b_cat): (String, String, String) = conn
+            .query_row(
+                "SELECT event_type, evaluated_json, category FROM journal WHERE id='jrn-b'",
+                [],
+                |r| Ok((r.get(0)?, r.get(1)?, r.get(2)?)),
+            )
+            .unwrap();
+        assert_eq!(
+            b_type, "decision",
+            "another workspace's live journal row must not be redacted (#417)"
+        );
+        assert!(b_eval.contains("B-live"), "another workspace's payload must survive");
+        assert_eq!(b_cat, "facts");
+
+        drop(conn);
+        verify_audit_chain(&db).expect("audit chain must survive workspace-scoped redaction");
+
+        let _ = fs::remove_file(&path);
+    }
+
     fn retention_policy(
         age: Option<i64>,
         cap: Option<i64>,
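
To make the effect of the redaction `UPDATE` in this file concrete, here is a sketch of what happens to one matched row. `JournalRow` and `redact` are hypothetical illustration names. The column list follows the `INSERT` and `UPDATE` statements in the diff, and columns the `UPDATE` does not name are assumed to stay unchanged.

```rust
/// Hypothetical in-memory image of a `journal` row (columns per the INSERT).
#[derive(Debug, Clone, PartialEq)]
struct JournalRow {
    id: String,
    event_type: String,
    evaluated_json: String,
    acted_json: String,
    forward_json: String,
    category: String,
    key: String,
    entity_id: String,
    agent_id: String,
    audit_hash: String,
    workspace_hash: String,
    created_at_unix_ms: i64,
}

/// Apply the same field changes as the redaction UPDATE to one row.
fn redact(row: &mut JournalRow) {
    row.event_type = "redacted".to_string();
    // Payloads are replaced with an empty JSON object.
    row.evaluated_json = "{}".to_string();
    row.acted_json = "{}".to_string();
    row.forward_json = "{}".to_string();
    // Identity columns are blanked, including the new workspace_hash.
    row.category.clear();
    row.key.clear();
    row.entity_id.clear();
    row.workspace_hash.clear();
    // id, agent_id, audit_hash and created_at_unix_ms are not named by the
    // UPDATE, so they are left as they were.
}
```

Leaving `id` and `created_at_unix_ms` in place fits the documentation's statement that the audit hash chain covers row identity, which is why the new test can still call `verify_audit_chain` after the purge.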

src/grpc.rs

Lines changed: 3 additions & 0 deletions
@@ -204,6 +204,9 @@ pub mod grpc {
                 key: r.key,
                 entity_id: r.entity_id,
                 agent_id: r.agent_id,
+                // #417: journal() derives the workspace from the referenced
+                // entity; the gRPC JournalRequest carries no workspace field.
+                workspace_hash: String::new(),
                 created_at_unix_ms: crate::db::now_ms(),
             };
             db.journal(&event)?;

src/models.rs

Lines changed: 6 additions & 1 deletion
@@ -161,7 +161,12 @@ pub struct JournalEvent {
     #[serde(default)]
     pub entity_id: String,
     pub agent_id: String,
-    /// Visibility: 'private', 'workspace', or 'public' (v1.2.0)
+    /// Workspace of the entity this event refers to, stamped at write time so
+    /// `purge` can scope journal redaction per-workspace (#417). Empty for
+    /// workspace-agnostic system events (dream/synthesis) and for legacy rows
+    /// written before the SCHEMA_VERSION 11 migration added the column.
+    #[serde(default)]
+    pub workspace_hash: String,
     pub created_at_unix_ms: i64,
 }
