Skip to content

Commit 604e691

Browse files
fix(storage): guard collectActiveMemoryPaths against symlink escape (info leak) (#1565)
* fix(storage): guard collectActiveMemoryPaths against symlink escape (info leak) collectActiveMemoryPaths() (#1497) walked the RECALL_FALLBACK_DIRS memory category dirs with a raw readdir and no symlink/containment guard. It feeds the QMD-unavailable filesystem recall fallback, so a category dir symlinked outside memoryDir (e.g. decisions/ -> an external dir) would be followed, pulling out-of-store files into recall results. This is the same class of issue PR #1563 (issue #1546) hardened across the other memory-store walkers; collectActiveMemoryPaths predates that PR and was the one remaining straggler. Adopt the now-standard containment pattern: resolve the memory root once via realpath; per directory lstat + skip symlinked/non-dir entries and assert the realpath stays inside the root before reading; per entry skip symlinks and assert the file's realpath is inside the root before including it. The shared assertPathInsideRoot/pathIsInside check is extracted to utils/path-containment.ts rather than forking a new one. - Add symlinked-category-dir and symlinked-nested-entry containment tests (win32-guarded like the existing symlink tests). - Bump the storage.ts structural-ratchet baseline (#1520) for the added guards. * fix(storage): isolate per-entry failures in collectActiveMemoryPaths walk Cursor Bugbot (medium): a containment/realpath failure on a single .md entry was caught by the directory-wide try/catch, so the deferred subdir recursion loop never ran — valid nested in-store memories under that dir could be dropped from QMD fallback recall because one sibling entry was poisoned. Split the directory-level guard (lstat/realpath/readdir → skip subtree on failure) from per-entry handling, and wrap the per-.md containment assert in its own try/catch so a single bad entry only skips itself. Subdir recursion now runs unconditionally. Mirrors the per-file try/catch in document-scanner.ts scanDir and consolidation-provenance-check.ts walkMarkdownFiles. Add a regression test asserting a nested in-store memory is still returned when its category dir also holds a symlinked-out sibling (win32-guarded). Bump the storage.ts structural-ratchet baseline for the added lines.
1 parent fc5a972 commit 604e691

3 files changed

Lines changed: 202 additions & 16 deletions

File tree

packages/remnic-core/src/storage-fallback-category-dirs.test.ts

Lines changed: 150 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -27,7 +27,7 @@
2727
*/
2828

2929
import assert from "node:assert/strict";
30-
import { mkdir, mkdtemp, rm, writeFile } from "node:fs/promises";
30+
import { mkdir, mkdtemp, rm, symlink, writeFile } from "node:fs/promises";
3131
import os from "node:os";
3232
import path from "node:path";
3333
import test from "node:test";
@@ -356,6 +356,155 @@ test("question-queue items written by writeQuestion() do NOT leak into fallback
356356
* directly with a config object that explicitly disables QMD, proving the disk
357357
* scan a QMD-disabled deployment relies on returns every category.
358358
*/
359+
/**
360+
* Symlink containment (same walker-hardening class as PR #1563 / issue #1546).
361+
*
362+
* `collectActiveMemoryPaths()` walks the RECALL_FALLBACK_DIRS category dirs for
363+
* the QMD-unavailable filesystem recall fallback. A category dir symlinked
364+
* outside `memoryDir` (e.g. `decisions/` -> an external directory) must NOT be
365+
* followed — otherwise out-of-store files leak into recall results. The walker
366+
* now `lstat`s each dir (skipping symlinks/non-dirs), realpaths it, and asserts
367+
* it stays inside the memory root; per entry it skips symlinks and asserts the
368+
* file resolves inside the root before including it.
369+
*/
370+
test("category dir symlinked outside memoryDir does NOT leak files into fallback recall", async (t) => {
371+
if (process.platform === "win32") {
372+
t.skip("directory symlink setup is platform-specific");
373+
return;
374+
}
375+
const baseDir = await mkdtemp(path.join(os.tmpdir(), "engram-1497-symlink-dir-"));
376+
const outsideDir = await mkdtemp(path.join(os.tmpdir(), "engram-1497-symlink-out-"));
377+
const storage = new StorageManager(baseDir);
378+
StorageManager.clearAllStaticCaches();
379+
storage.invalidateAllMemoriesCacheForDir();
380+
try {
381+
// A genuine in-store memory that MUST still be recalled.
382+
await seedMemory(baseDir, "facts", "real-fact", "fact", "A genuine in-store fact.");
383+
384+
// An out-of-store memory reachable only via a symlinked category dir.
385+
await writeFile(
386+
path.join(outsideDir, "leaked.md"),
387+
memoryFile("leaked-secret", "decision", "Out-of-store file must never be recalled."),
388+
"utf-8",
389+
);
390+
// `decisions/` is a RECALL_FALLBACK_DIRS category dir; point it outside the store.
391+
await symlink(outsideDir, path.join(baseDir, "decisions"), "dir");
392+
393+
storage.invalidateAllMemoriesCacheForDir();
394+
const memories = await storage.readAllMemories();
395+
const foundIds = new Set(memories.map((m) => m.frontmatter.id));
396+
397+
assert.ok(foundIds.has("real-fact"), "the genuine in-store fact must still be recalled");
398+
assert.ok(
399+
!foundIds.has("leaked-secret"),
400+
"a symlinked-out category dir must not leak files into recall",
401+
);
402+
assert.equal(memories.length, 1, "only the in-store memory should be returned");
403+
} finally {
404+
StorageManager.clearAllStaticCaches();
405+
await rm(baseDir, { recursive: true, force: true });
406+
await rm(outsideDir, { recursive: true, force: true });
407+
}
408+
});
409+
410+
test("symlinked entry nested inside a category dir does NOT leak into fallback recall", async (t) => {
411+
if (process.platform === "win32") {
412+
t.skip("directory symlink setup is platform-specific");
413+
return;
414+
}
415+
const baseDir = await mkdtemp(path.join(os.tmpdir(), "engram-1497-symlink-entry-"));
416+
const outsideDir = await mkdtemp(path.join(os.tmpdir(), "engram-1497-symlink-entry-out-"));
417+
const storage = new StorageManager(baseDir);
418+
StorageManager.clearAllStaticCaches();
419+
storage.invalidateAllMemoriesCacheForDir();
420+
try {
421+
// A genuine in-store memory in the same category dir that holds the symlink.
422+
await seedMemory(baseDir, "facts", "real-fact", "fact", "A genuine in-store fact.");
423+
424+
await writeFile(
425+
path.join(outsideDir, "leaked.md"),
426+
memoryFile("nested-leak", "fact", "Out-of-store nested file must never be recalled."),
427+
"utf-8",
428+
);
429+
// A symlinked *entry* inside facts/ that escapes the store.
430+
await symlink(outsideDir, path.join(baseDir, "facts", "escape"), "dir");
431+
432+
storage.invalidateAllMemoriesCacheForDir();
433+
const memories = await storage.readAllMemories();
434+
const foundIds = new Set(memories.map((m) => m.frontmatter.id));
435+
436+
assert.ok(foundIds.has("real-fact"), "the genuine in-store fact must still be recalled");
437+
assert.ok(
438+
!foundIds.has("nested-leak"),
439+
"a symlinked entry escaping the store must not leak files into recall",
440+
);
441+
assert.equal(memories.length, 1, "only the in-store memory should be returned");
442+
} finally {
443+
StorageManager.clearAllStaticCaches();
444+
await rm(baseDir, { recursive: true, force: true });
445+
await rm(outsideDir, { recursive: true, force: true });
446+
}
447+
});
448+
449+
/**
450+
* Regression for Cursor Bugbot "Poisoned md skips sibling subdirs".
451+
*
452+
* Per-entry containment/realpath failures must be isolated so they cannot drop
453+
* sibling `.md` files or, crucially, the nested-subdir recursion. This asserts
454+
* that a category dir which also contains a symlinked-out entry still recurses
455+
* into its real nested subdirectory and returns every in-store memory, while
456+
* the out-of-store target stays excluded.
457+
*/
458+
test("a guarded/skipped sibling entry does NOT drop nested in-store memories", async (t) => {
459+
if (process.platform === "win32") {
460+
t.skip("directory symlink setup is platform-specific");
461+
return;
462+
}
463+
const baseDir = await mkdtemp(path.join(os.tmpdir(), "engram-1497-sibling-"));
464+
const outsideDir = await mkdtemp(path.join(os.tmpdir(), "engram-1497-sibling-out-"));
465+
const storage = new StorageManager(baseDir);
466+
StorageManager.clearAllStaticCaches();
467+
storage.invalidateAllMemoriesCacheForDir();
468+
try {
469+
const factsDir = path.join(baseDir, "facts");
470+
await mkdir(factsDir, { recursive: true });
471+
// A top-level in-store memory and a nested in-store memory in the SAME
472+
// category dir that also holds the symlinked-out sibling.
473+
await writeFile(
474+
path.join(factsDir, "top.md"),
475+
memoryFile("top-fact", "fact", "Top-level in-store fact."),
476+
"utf-8",
477+
);
478+
await seedMemory(baseDir, "facts", "deep-fact", "fact", "Deeply nested in-store fact.", {
479+
nested: true,
480+
});
481+
482+
await writeFile(
483+
path.join(outsideDir, "leaked.md"),
484+
memoryFile("leaked-secret", "fact", "Out-of-store file must never be recalled."),
485+
"utf-8",
486+
);
487+
// A symlinked-out sibling entry alongside the real files/subdir.
488+
await symlink(outsideDir, path.join(factsDir, "escape"), "dir");
489+
490+
storage.invalidateAllMemoriesCacheForDir();
491+
const memories = await storage.readAllMemories();
492+
const foundIds = new Set(memories.map((m) => m.frontmatter.id));
493+
494+
assert.ok(foundIds.has("top-fact"), "the top-level in-store fact must be recalled");
495+
assert.ok(
496+
foundIds.has("deep-fact"),
497+
"the nested in-store fact must still be recalled past a skipped sibling",
498+
);
499+
assert.ok(!foundIds.has("leaked-secret"), "the symlinked-out file must not leak");
500+
assert.equal(memories.length, 2, "exactly the two in-store memories should be returned");
501+
} finally {
502+
StorageManager.clearAllStaticCaches();
503+
await rm(baseDir, { recursive: true, force: true });
504+
await rm(outsideDir, { recursive: true, force: true });
505+
}
506+
});
507+
359508
test("QMD-disabled deployment: disk-scan collector returns all categories", async () => {
360509
const baseDir = await mkdtemp(path.join(os.tmpdir(), "engram-1497-qmd-off-"));
361510
// entitySchemas is the only other constructor arg; QMD is never constructed by

packages/remnic-core/src/storage.ts

Lines changed: 51 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -1,10 +1,11 @@
1-
import { access, readdir, readFile, stat, writeFile, mkdir, unlink, rename, appendFile, open } from "node:fs/promises";
2-
import { appendFileSync, createReadStream, mkdirSync, readFileSync, statSync } from "node:fs";
1+
import { access, lstat, readdir, readFile, realpath, stat, writeFile, mkdir, unlink, rename, appendFile, open } from "node:fs/promises";
2+
import { appendFileSync, createReadStream, mkdirSync, readFileSync, statSync, type Dirent } from "node:fs";
33
import { createHash } from "node:crypto";
44
import path from "node:path";
55
import { log } from "./logger.js";
66
import { isErrnoCode } from "./utils/errno.js";
77
import { RECALL_FALLBACK_DIRS, getCategoryDir, categoryDirName } from "./utils/category-dir.js";
8+
import { assertPathInsideRoot } from "./utils/path-containment.js";
89
import { getCachedEntities, invalidateAllForDir, setCachedEntities } from "./memory-cache.js";
910
import { rotateMarkdownFileToArchive } from "./hygiene.js";
1011
import { sanitizeMemoryContent } from "./sanitize.js";
@@ -4082,23 +4083,59 @@ export class StorageManager {
40824083
private async collectActiveMemoryPaths(): Promise<string[]> {
40834084
const filePaths: string[] = [];
40844085

4086+
// Resolve the memory root once for containment checks below. A category dir
4087+
// symlinked outside memoryDir (e.g. decisions/ -> an external dir) must NOT
4088+
// pull out-of-store files into the QMD-unavailable recall fallback (info
4089+
// leak). Same walker-hardening pattern as document-scanner.ts / cli.ts /
4090+
// consolidation-provenance-check.ts; reuses the shared containment helper.
4091+
let memoryRootReal: string;
4092+
try {
4093+
memoryRootReal = await realpath(this.baseDir);
4094+
} catch {
4095+
return filePaths;
4096+
}
4097+
40854098
const collectPaths = async (dir: string) => {
4099+
// Directory-level guard, isolated from per-entry handling: skip symlinked
4100+
// or non-directory category dirs and assert the resolved dir stays inside
4101+
// the memory root before reading. A failure here means the whole subtree
4102+
// does not exist or escaped the store — fail closed by skipping it.
4103+
let entries: Dirent[];
40864104
try {
4087-
const entries = await readdir(dir, { withFileTypes: true });
4088-
const subdirs: string[] = [];
4089-
for (const entry of entries) {
4090-
const fullPath = path.join(dir, entry.name);
4091-
if (entry.isDirectory()) {
4092-
subdirs.push(fullPath);
4093-
} else if (entry.name.endsWith(".md")) {
4105+
const dirStat = await lstat(dir);
4106+
if (dirStat.isSymbolicLink() || !dirStat.isDirectory()) return;
4107+
assertPathInsideRoot(memoryRootReal, await realpath(dir), dir);
4108+
entries = await readdir(dir, { withFileTypes: true });
4109+
} catch {
4110+
return;
4111+
}
4112+
4113+
const subdirs: string[] = [];
4114+
for (const entry of entries) {
4115+
// Never follow symlinked entries out of the store.
4116+
if (entry.isSymbolicLink()) continue;
4117+
const fullPath = path.join(dir, entry.name);
4118+
if (entry.isDirectory()) {
4119+
subdirs.push(fullPath);
4120+
} else if (entry.name.endsWith(".md")) {
4121+
// Isolate per-entry failures in their own try/catch: a containment or
4122+
// realpath failure on ONE .md entry must not drop sibling files or,
4123+
// crucially, the deferred subdir recursion below (Cursor Bugbot:
4124+
// "Poisoned md skips sibling subdirs"). Mirrors the per-file try/catch
4125+
// in search/document-scanner.ts scanDir and
4126+
// consolidation-provenance-check.ts walkMarkdownFiles.
4127+
try {
4128+
assertPathInsideRoot(memoryRootReal, await realpath(fullPath), fullPath);
40944129
filePaths.push(fullPath);
4130+
} catch {
4131+
// Skip just this entry (symlink/containment/realpath failure).
40954132
}
40964133
}
4097-
for (const subdir of subdirs) {
4098-
await collectPaths(subdir);
4099-
}
4100-
} catch {
4101-
// Directory does not exist yet.
4134+
}
4135+
// Recurse into real subdirectories regardless of any single poisoned entry
4136+
// above, so valid nested in-store memories are never dropped.
4137+
for (const subdir of subdirs) {
4138+
await collectPaths(subdir);
41024139
}
41034140
};
41044141

scripts/ratchet-baseline.json

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@
77
"packages/remnic-core/src/orchestrator.ts": 20050,
88
"packages/remnic-core/src/cli.ts": 9851,
99
"packages/remnic-core/src/access-service.ts": 8277,
10-
"packages/remnic-core/src/storage.ts": 7283,
10+
"packages/remnic-core/src/storage.ts": 7320,
1111
"packages/remnic-core/src/config.ts": 4548
1212
},
1313
"oversizedFileCount": 10,

0 commit comments

Comments
 (0)