Commit 5d9fec1

docs(audit): production audit ledger + TDL refresh
Add docs/PRODUCTION_AUDIT_2026-06.md: the full findings ledger (21 confirmed, 0 refuted) with method, severity, evidence, fixes (with commit refs), and the deferred items + their rationale and recommendations. Register it in the docs curation test. Refresh TDL: the June feature batch moves to Shipped, and the audit's deferred items become the Next Feature Explorations.
1 parent 9529c00 commit 5d9fec1

3 files changed

Lines changed: 257 additions & 20 deletions

docs/PRODUCTION_AUDIT_2026-06.md

Lines changed: 236 additions & 0 deletions
@@ -0,0 +1,236 @@
# Production Audit — June 2026

Whole-stack adversarial audit of the NULLA local-first runtime, run after the
June 2026 feature batch landed (embedder unification, local WorkProof reward,
entity-graph recall, two-NULLA handshake, Web0 builder intent set, knowledge
marketplace buy side).

## Method

Four dimension finders ran in parallel at high reasoning effort — **security**,
**correctness**, **claims/docs**, **test-coverage** — each scoped to the freshly
changed code and the money paths. Every raw finding then went to an independent
**skeptic verifier** that opened the cited code and tried to refute it; only
findings backed by concrete code and a concrete trigger were kept. Severities
were re-graded by the verifier (for example, the mesh findings were lowered from
P1 to P2 because that path is not wired into any live entry point today).

- Raw findings: **21**
- Confirmed after adversarial verification: **21** (0 refuted)
- P0: 0 · P1: 2 · P2: 5 · P3: 14

Fixes land on branch `fix/production-audit-2026-06`: code + tests in `a4e6f15`,
the documentation language sweep in `22a00f0`.

## Test posture

Full suite: **2431 passing** (excludes one machine-local llama.cpp acceptance
test that depends on a local draft GGUF; CI skips it too). The audit added **14**
regression tests pinning each behavioural fix.

---

## Confirmed findings

| # | Dim | Sev | Title | Status |
|---|-----|-----|-------|--------|
| 1 | security/coverage | P1 | Marketplace concurrent buys double-debit the buyer | **Fixed** |
| 2 | claims | P1 | Banned-word framing in public LAUNCH_TECH_BRIEF | **Fixed** |
| 3 | security | P2 | Mesh `accept_bid` escrow fails open (wrong import, swallowed) | **Fixed** |
| 4 | security | P2 | Mesh `TaskBid` signature never verified | **Fixed** |
| 5 | security | P2 | `web0.publish` opt-in read from model arguments | **Fixed** |
| 6 | security | P2 | Gate endpoint uses replayable v1 challenge; CORS open | **Deferred** |
| 7 | claims | P2 | Banned-word in STATUS / README / INSTALL / proof docs | **Fixed** |
| 8 | security | P3 | `task_completion` self-credit gated by a self-computable hash | **Deferred** |
| 9 | security | P3 | Knowledge purchase burned the buyer but never paid the seller | **Fixed** |
| 10 | correctness | P3 | BM25 keyword leg scores 0 on a single match | **Fixed** |
| 11 | correctness | P3 | `record_purchase` rating average divided by purchase count | **Fixed** |
| 12 | correctness | P3 | Mixed embedding dimensions in one table (fragile) | **Deferred** |
| 13 | claims | P3 | Banned-word in PLATFORM_REFACTOR_PLAN | **Fixed** |
| 14 | claims | P3 | Banned-word in INSTALL_PROVIDER_EXECUTION_PLAN | **Fixed** |
| 15 | claims | P3 | Banned-word in TDL / soak / preflight / stabilization | **Fixed** |
| 16 | claims | P3 | Code-derived metric identifiers embedding the banned word | **Deferred** |
| 17 | coverage | P3 | Free-listing (price 0) purchase path untested | **Fixed** |
| 18 | coverage | P3 | Concurrent-purchase debit-once untested | **Fixed** |
| 19 | coverage | P3 | `record_purchase` rating average untested | **Fixed** |
| 20 | coverage | P3 | `usdc_to_atomic` tie + `_allocate_pool_atomic` shares>pool untested | **Fixed** |
| 21 | coverage | P3 | Embedding cosine zero-vector / shorter-vector untested | **Fixed** |

---

## Fixed

### P1 — Marketplace concurrent double-debit (and seller never paid)

`purchase_knowledge` guarded idempotency with a `has_entitlement` read, then
burned the price under a fresh per-call UUID receipt. Two concurrent buys of the
same `(buyer, shard)` both passed the entitlement check and both burned. Because
the receipts were distinct, the ledger replay guard did not collapse them, so the
buyer was charged twice while only one entitlement was granted. Separately, the
price was *burned* rather than transferred, so the seller was never paid.

**Fix** (`core/knowledge_marketplace.py`): the price now moves buyer → seller in
one atomic `transfer_credits` call, keyed on a **deterministic** `(buyer, shard)`
receipt. A second concurrent buy hits the ledger replay guard and collapses to a
single debit; the post-transfer path re-checks both the entitlement and the new
`credit_ledger.receipt_exists` helper, and returns an idempotent
`already_purchased` without a second charge. A free (price 0) listing unlocks
with no ledger movement. Regression:
`test_concurrent_purchase_same_shard_debits_once`,
`test_purchase_actually_pays_the_seller`, `test_free_listing_unlocks_without_charge`.

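A minimal sketch of the deterministic-receipt idea, assuming a toy in-memory ledger. `ToyLedger`, `receipt_id_for`, and `purchase_sketch` are hypothetical names for illustration only; they are not the project's `transfer_credits` or `credit_ledger` API.

```python
import hashlib
import json
import threading


def receipt_id_for(buyer: str, shard: str) -> str:
    """Derive the same receipt id for every attempt at one (buyer, shard) purchase."""
    # JSON-encoding the pair avoids ambiguity between e.g. ("a:b", "c") and ("a", "b:c").
    payload = json.dumps(["purchase", buyer, shard]).encode()
    return hashlib.sha256(payload).hexdigest()


class ToyLedger:
    """Hypothetical in-memory ledger with a replay guard keyed on receipt id."""

    def __init__(self, balances: dict[str, int]) -> None:
        self.balances = dict(balances)
        self.receipts: set[str] = set()
        self._lock = threading.Lock()

    def transfer(self, src: str, dst: str, amount: int, receipt_id: str) -> bool:
        """Move credits atomically; return False if this receipt was already applied."""
        with self._lock:
            if receipt_id in self.receipts:
                return False  # replay: collapses onto the original debit
            if self.balances.get(src, 0) < amount:
                raise ValueError("insufficient balance")
            self.balances[src] -= amount
            self.balances[dst] = self.balances.get(dst, 0) + amount
            self.receipts.add(receipt_id)
            return True


def purchase_sketch(ledger: ToyLedger, buyer: str, seller: str, shard: str, price: int) -> str:
    if price == 0:
        return "purchased"  # free listing: unlock with no ledger movement
    applied = ledger.transfer(buyer, seller, price, receipt_id_for(buyer, shard))
    return "purchased" if applied else "already_purchased"
```

Because every attempt at the same `(buyer, shard)` maps to one receipt id, the check and the debit happen under a single lock, and a racing second call can only observe the receipt as already applied.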
### P2 — Mesh `accept_bid` escrow failed open

`accept_bid` imported `CreditLedger` from `core.credit_ledger` — a module that
defines no such class — so the import raised on every call, the bare `except`
logged a warning, and assignment proceeded with no funds hold. The real control
(`CreditLedger.spend` raises on insufficient balance) never ran.

**Fix** (`core/mesh/task_router.py`): import the real
`core.mesh.credit_ledger.CreditLedger` and **fail closed** — on any escrow
failure return `{"assigned": False, "reason": "escrow_failed"}` and skip the
challenge and the winner notification. A genuinely free (0-credit) bid skips the
hold. Regression: `test_accept_bid_fails_closed_when_escrow_cannot_be_funded`,
plus the existing challenge test now funds the poster first.

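The fail-closed shape can be sketched as follows. This is an illustration under stated assumptions: `HoldLedger`, its `hold` method, and `accept_bid_sketch` are hypothetical and do not mirror the real `CreditLedger` interface beyond "raises on insufficient balance".

```python
import logging

logger = logging.getLogger(__name__)


class InsufficientBalance(Exception):
    """Raised when the poster cannot fund the escrow hold."""


class HoldLedger:
    """Hypothetical ledger that places a hold or raises."""

    def __init__(self, balances: dict[str, int]) -> None:
        self.balances = dict(balances)
        self.held: dict[str, int] = {}

    def hold(self, payer: str, amount: int, task_id: str) -> None:
        if self.balances.get(payer, 0) < amount:
            raise InsufficientBalance(payer)
        self.balances[payer] -= amount
        self.held[task_id] = amount


def accept_bid_sketch(ledger, poster: str, task_id: str, credits_requested: int) -> dict:
    if credits_requested > 0:
        try:
            ledger.hold(poster, credits_requested, task_id)
        except Exception:
            # Fail closed: any escrow error (including a broken ledger) blocks assignment.
            logger.warning("escrow failed for task %s", task_id, exc_info=True)
            return {"assigned": False, "reason": "escrow_failed"}
    # A 0-credit bid reaches this point without a hold.
    return {"assigned": True, "task_id": task_id}
```

The key design choice is that the `except` branch returns a refusal instead of logging and continuing, so a wrong import or a ledger outage can no longer turn into an unfunded assignment.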
### P2 — Mesh `TaskBid` signature never verified

`_solicit_bid` validated only field presence and the task id; the `signature`
field and `canonical_payload()` existed but were never checked, so a peer (or an
on-path rewrite of the unauthenticated HTTP response) could forge `bidder_node_id`
and `credits_requested`.

**Fix**: `_solicit_bid` now verifies the ed25519 signature over the canonical
payload against the claimed `bidder_node_id` and drops the bid on failure, before
selection or escrow. Regression: `test_solicit_bid_rejects_an_unsigned_or_forged_bid`.

### P2 — `web0.publish` opt-in read from model arguments

The publish handler read `allow_network_publish` from the model-supplied
`arguments`, so a model that could emit the intent could also set its own opt-in;
only the wallet came from the trusted `source_context`.

**Fix** (`core/runtime_execution_tools.py`, `core/runtime_tool_contracts.py`):
both gates — the opt-in and the wallet — now come from `source_context`. The
opt-in was removed from the model-facing input schema. A request alone is never
sufficient to publish. Regression: `test_web0_publish_ignores_model_supplied_optin`.

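A small sketch of the trust boundary, assuming both gates live as plain keys on `source_context`. The function name `may_publish` and the `wallet` key are hypothetical; only `allow_network_publish`, `arguments`, and `source_context` come from the text above.

```python
def may_publish(arguments: dict, source_context: dict) -> bool:
    """Decide whether a publish intent may reach the network.

    Both gates are read from the trusted source_context. The model-supplied
    `arguments` dict is accepted but deliberately never consulted for gating.
    """
    opted_in = source_context.get("allow_network_publish") is True
    has_wallet = bool(source_context.get("wallet"))  # hypothetical key name
    return opted_in and has_wallet
```

For example, `may_publish({"allow_network_publish": True}, {"wallet": "w1"})` returns `False`, because the opt-in appears only in the model-supplied arguments.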
### P3 — Knowledge purchase now pays the seller

Folded into the P1 fix: `transfer_credits` credits the seller atomically instead
of burning the price.

### P3 — BM25 keyword leg scored 0 on a single match

When exactly one node matched the FTS query, the score span collapsed and the
sole match received 0.0 instead of the full keyword boost. **Fix**
(`core/nulla_memory.py`): a single match (or all-equal ranks) now gets 1.0 — any
MATCH row is a genuine keyword hit. Regression:
`test_bm25_single_match_scores_full_not_zero`.

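The collapsed-span case can be illustrated with a min-max normalisation sketch. It assumes raw ranks where a lower value means a better match (as with SQLite FTS5's `bm25()`); `normalize_keyword_scores` is a hypothetical helper, not the code in `core/nulla_memory.py`.

```python
def normalize_keyword_scores(ranks: dict[str, float]) -> dict[str, float]:
    """Min-max normalise raw ranks into [0, 1]; lower raw rank is assumed better."""
    if not ranks:
        return {}
    best, worst = min(ranks.values()), max(ranks.values())
    span = worst - best
    if span == 0:
        # One match, or all ranks equal: every MATCH row is a real keyword hit,
        # so give the full boost instead of dividing by a zero span.
        return {node: 1.0 for node in ranks}
    return {node: (worst - rank) / span for node, rank in ranks.items()}
```

With a single row such as `{"n1": -2.5}`, the span is zero and the guard returns `{"n1": 1.0}`; without the guard, the same input would either divide by zero or fall back to 0.0.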
### P3 — Rating average divided by purchase count

`record_purchase` computed the running rating mean using `purchase_count`, so
unrated purchases diluted each rating. **Fix**: a dedicated `rating_count` column
(with a defensive `ALTER TABLE` for existing databases) now drives the mean.
Regression: `test_record_purchase_rating_average_is_over_ratings_not_purchases`.

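A sketch of the corrected running mean, assuming an in-memory record in place of the database row. `Listing` and `record_purchase_sketch` are hypothetical; only the `purchase_count` and `rating_count` names come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Listing:
    purchase_count: int = 0
    rating_count: int = 0
    rating_avg: float = 0.0


def record_purchase_sketch(listing: Listing, rating: Optional[float] = None) -> None:
    """Count every purchase, but update the mean only when a rating is given."""
    listing.purchase_count += 1
    if rating is None:
        return
    listing.rating_count += 1
    # Incremental mean over ratings only: new_avg = old_avg + (x - old_avg) / n
    listing.rating_avg += (rating - listing.rating_avg) / listing.rating_count
```

Four purchases rated 5, unrated, unrated, 3 leave `rating_avg` at 4.0. Dividing by `purchase_count` instead would pull the mean down with each unrated purchase.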
### Claims — Banned-word language sweep (P1 / P2 / P3)

The banned candor qualifier (the adjective and its adverb form) appeared in
prose across the active git-tracked docs, including the public-facing launch
brief. **Fix** (`22a00f0`): 56 prose occurrences across 20 files were rewritten
to neutral, accurate wording (accurate / precise / sound / clean / open) that
preserves meaning. Verified: zero prose occurrences remain in the active set.

### Coverage — added regression tests

Free-listing purchase, concurrent debit-once, rating average, `usdc_to_atomic`
banker's-rounding tie, `_allocate_pool_atomic` shares > pool, and embedding
cosine zero-vector / shorter-vector edge cases are now pinned by tests
(`tests/test_audit_fix_regressions.py` and the marketplace/mesh suites).

152+
---
153+
154+
## Deferred (with rationale and recommendation)
155+
156+
These are genuine findings whose correct fix needs a coordinated or higher-risk
157+
change that does not belong in an audit-remediation pass. Each is recorded here
158+
so it is tracked, not lost.
159+
160+
### P2 — Gate endpoint replayable v1 challenge + open CORS
161+
162+
The shipping `/gate/unlock` route uses the static v1 challenge; the hardened
163+
`GateChallengeStore` (single-use, TTL-bound v2 nonces) is implemented but not
164+
wired in, and the gate CORS allows any origin. A captured unlock tuple can be
165+
replayed to unlock the same gated block.
166+
167+
**Why deferred:** wiring the store to *require* v2 server-side would break the
168+
live browser-extension and portal clients (a separate repo) that still send v1,
169+
and the open CORS supports gated blocks embedded on arbitrary Arweave-served
170+
domains. This needs a coordinated client + server upgrade, not a one-sided
171+
change.
172+
173+
**Recommendation:** ship a `/gate/challenge` issue endpoint plus the client
174+
change that requests, signs, and consumes a v2 nonce in the same release; scope
175+
the gate CORS to the portal origin in that change.
176+
177+
### P3 — `task_completion` self-credit proof is self-computable
178+
179+
The local self-award verifies a SHA-256 receipt that the issuer can recompute;
180+
the award is bounded to the local peer, rate-limited to 30 per 60s, and
181+
idempotent per receipt.
182+
183+
**Why deferred:** the abuse ceiling is bounded self-minting to the local node's
184+
own balance with no cross-peer redemption. The correct fix — ed25519-signing the
185+
work receipt and verifying it — ripples through `Web0WorkReceipt`, `ProofReceipt`,
186+
`nullpass`, and the CLI receipt surface.
187+
188+
**Recommendation:** sign the work receipt (mirroring `task_capsule` /
189+
`two_nulla_handshake`) in a focused change; lower the per-window cap if these
190+
credits ever become broadly transferable.
191+
192+
### P3 — Mixed embedding dimensions in one table
193+
194+
The 384-dim chat embedder and the 64-dim fact embedder can in principle write
195+
vectors of different lengths into one `memory_nodes` table.
196+
197+
**Why deferred:** no live path co-mingles them — they run on different `agent_id`
198+
partitions — and the shared cosine helper already projects mismatched lengths and
199+
logs once instead of crashing or silently zeroing. The fix (pin the dimension per
200+
agent and reject mismatches, or route both writers through the canonical `embed()`)
201+
changes stored vectors and is best done as a dedicated migration.
202+
203+
**Recommendation:** record the embedding dimension per agent/database and reject
204+
out-of-dimension writes; migrate existing rows in the same pass.
205+
206+
### P3 — Code-derived metric identifiers that embed the banned word
207+
208+
The acceptance and proof docs reference snake_case latency/scenario metric
209+
identifiers whose suffix embeds the banned word (the `offline_*`, `failure_*`,
210+
`freshness_*`, `ultra_fresh_*`, `empty_lookup_*` gate names), plus the historical
211+
alpha-hardening branch name that also embeds it.
212+
213+
**Why deferred:** these are machine identifiers tied to code, JSON artifact
214+
filenames, and the acceptance harness; renaming them in the docs alone would
215+
desync the docs from the code.
216+
217+
**Recommendation:** rename the identifiers in code, tests, and docs together
218+
(suffix them `_accuracy_gate` instead of the banned suffix), then update the
219+
proof docs to match.
220+
221+
### Left untouched by design
222+
223+
- `docs/archive/**` — historical handovers and audit records; rewriting them would
224+
alter the record of what was written at the time.
225+
- A vendored third-party model card under `data/trainable_models/` — not project
226+
prose.
227+
228+
---
229+
230+
## Areas confirmed clean
231+
232+
The verifier explicitly cleared the production swarm/HIVE dispatch money path
233+
(`escrow_credits_for_task` — balance-gated, atomic, fail-closed), the
234+
commit/reveal WorkProof reward gate in `submit_result`, the AES-GCM content
235+
encryption (per-block key, nonce, AAD binding), and the RPC allowlist (no banned
236+
`api.mainnet-beta` URL in shipping code).
