|
| 1 | +# Production Audit — June 2026 |
| 2 | + |
| 3 | +Whole-stack adversarial audit of the NULLA local-first runtime, run after the |
| 4 | +June 2026 feature batch landed (embedder unification, local WorkProof reward, |
| 5 | +entity-graph recall, two-NULLA handshake, Web0 builder intent set, knowledge |
| 6 | +marketplace buy side). |
| 7 | + |
| 8 | +## Method |
| 9 | + |
| 10 | +Four dimension finders ran in parallel at high reasoning effort — **security**, |
| 11 | +**correctness**, **claims/docs**, **test-coverage** — each scoped to the freshly |
| 12 | +changed code and the money paths. Every raw finding then went to an independent |
| 13 | +**skeptic verifier** that opened the cited code and tried to refute it; only |
| 14 | +findings backed by concrete code and a concrete trigger were kept. Severities |
| 15 | +were re-graded by the verifier (for example, the mesh findings were lowered from |
| 16 | +P1 to P2 because that path is not wired into any live entry point today). |
| 17 | + |
| 18 | +- Raw findings: **21** |
| 19 | +- Confirmed after adversarial verification: **21** (0 refuted) |
| 20 | +- P0: 0 · P1: 2 · P2: 5 · P3: 14 |
| 21 | + |
| 22 | +Fixes land on branch `fix/production-audit-2026-06`: code + tests in `a4e6f15`, |
| 23 | +the documentation language sweep in `22a00f0`. |
| 24 | + |
| 25 | +## Test posture |
| 26 | + |
| 27 | +Full suite: **2431 passing** (excludes one machine-local llama.cpp acceptance |
| 28 | +test that depends on a local draft GGUF; CI skips it too). The audit added **14** |
| 29 | +regression tests pinning each behavioural fix. |
| 30 | + |
| 31 | +--- |
| 32 | + |
| 33 | +## Confirmed findings |
| 34 | + |
| 35 | +| # | Dim | Sev | Title | Status | |
| 36 | +|---|-----|-----|-------|--------| |
| 37 | +| 1 | security/coverage | P1 | Marketplace concurrent buys double-debit the buyer | **Fixed** | |
| 38 | +| 2 | claims | P1 | Banned-word framing in public LAUNCH_TECH_BRIEF | **Fixed** | |
| 39 | +| 3 | security | P2 | Mesh `accept_bid` escrow fails open (wrong import, exception swallowed) | **Fixed** | |
| 40 | +| 4 | security | P2 | Mesh `TaskBid` signature never verified | **Fixed** | |
| 41 | +| 5 | security | P2 | `web0.publish` opt-in read from model arguments | **Fixed** | |
| 42 | +| 6 | security | P2 | Gate endpoint uses replayable v1 challenge; CORS open | **Deferred** | |
| 43 | +| 7 | claims | P2 | Banned-word in STATUS / README / INSTALL / proof docs | **Fixed** | |
| 44 | +| 8 | security | P3 | `task_completion` self-credit gated by a self-computable hash | **Deferred** | |
| 45 | +| 9 | security | P3 | Knowledge purchase burned the buyer's credits but never paid the seller | **Fixed** | |
| 46 | +| 10 | correctness | P3 | BM25 keyword leg scores 0 on a single match | **Fixed** | |
| 47 | +| 11 | correctness | P3 | `record_purchase` rating average divided by purchase count | **Fixed** | |
| 48 | +| 12 | correctness | P3 | Mixed embedding dimensions in one table (fragile) | **Deferred** | |
| 49 | +| 13 | claims | P3 | Banned-word in PLATFORM_REFACTOR_PLAN | **Fixed** | |
| 50 | +| 14 | claims | P3 | Banned-word in INSTALL_PROVIDER_EXECUTION_PLAN | **Fixed** | |
| 51 | +| 15 | claims | P3 | Banned-word in TDL / soak / preflight / stabilization | **Fixed** | |
| 52 | +| 16 | claims | P3 | Code-derived metric identifiers embedding the banned word | **Deferred** | |
| 53 | +| 17 | coverage | P3 | Free-listing (price 0) purchase path untested | **Fixed** | |
| 54 | +| 18 | coverage | P3 | Concurrent-purchase debit-once untested | **Fixed** | |
| 55 | +| 19 | coverage | P3 | `record_purchase` rating average untested | **Fixed** | |
| 56 | +| 20 | coverage | P3 | `usdc_to_atomic` tie + `_allocate_pool_atomic` shares>pool untested | **Fixed** | |
| 57 | +| 21 | coverage | P3 | Embedding cosine zero-vector / shorter-vector untested | **Fixed** | |
| 58 | + |
| 59 | +--- |
| 60 | + |
| 61 | +## Fixed |
| 62 | + |
| 63 | +### P1 — Marketplace concurrent double-debit (and seller never paid) |
| 64 | + |
| 65 | +`purchase_knowledge` guarded idempotency with a `has_entitlement` read, then |
| 66 | +burned the price under a fresh per-call UUID receipt. Two concurrent buys of the |
| 67 | +same `(buyer, shard)` both passed the entitlement check and both burned — with |
| 68 | +distinct receipts the ledger replay guard did not collapse them, so the buyer |
| 69 | +was charged twice while only one entitlement was granted. Separately, the price |
| 70 | +was *burned* rather than transferred, so the seller was never paid. |
| 71 | + |
| 72 | +**Fix** (`core/knowledge_marketplace.py`): the price now moves buyer → seller in |
| 73 | +one atomic `transfer_credits` call, keyed on a **deterministic** `(buyer, shard)` |
| 74 | +receipt. A second concurrent buy hits the ledger replay guard and collapses to a |
| 75 | +single debit; the post-transfer path re-checks the entitlement and consults the |
| 76 | +new `credit_ledger.receipt_exists` helper, returning an idempotent |
| 77 | +`already_purchased` without a second charge. A free (price 0) listing unlocks with no ledger |
| 78 | +movement. Regression: `test_concurrent_purchase_same_shard_debits_once`, |
| 79 | +`test_purchase_actually_pays_the_seller`, `test_free_listing_unlocks_without_charge`. |
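
A minimal sketch of the deterministic-receipt idea, not the code in `core/knowledge_marketplace.py`. `ToyLedger`, `purchase_receipt_id`, and `purchase` are hypothetical stand-ins, and the hashing scheme is an assumption; only the name `transfer_credits` and the `already_purchased` result come from the description above.

```python
import hashlib
import json
import threading


class ToyLedger:
    """In-memory stand-in for the credit ledger (hypothetical, not the real API)."""

    def __init__(self, balances):
        self.balances = dict(balances)
        self.receipts = set()
        self._lock = threading.Lock()

    def transfer_credits(self, sender, receiver, amount, receipt_id):
        """Move credits at most once per receipt_id; a replay returns False."""
        with self._lock:
            if receipt_id in self.receipts:
                return False  # replay guard: same receipt, no second debit
            if self.balances.get(sender, 0) < amount:
                raise ValueError("insufficient balance")
            self.balances[sender] -= amount
            self.balances[receiver] = self.balances.get(receiver, 0) + amount
            self.receipts.add(receipt_id)
            return True


def purchase_receipt_id(buyer, shard):
    """Same (buyer, shard) always yields the same id (assumed hashing scheme)."""
    payload = json.dumps(["purchase", buyer, shard]).encode()
    return hashlib.sha256(payload).hexdigest()


def purchase(ledger, buyer, seller, shard, price):
    if price == 0:
        return "purchased"  # free listing: unlock with no ledger movement
    receipt_id = purchase_receipt_id(buyer, shard)
    moved = ledger.transfer_credits(buyer, seller, price, receipt_id)
    return "purchased" if moved else "already_purchased"
```

Because the receipt is derived from `(buyer, shard)` rather than a fresh UUID, the debit and the replay check happen in one critical section, so a racing second call cannot slip between a read and a write.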
| 80 | + |
| 81 | +### P2 — Mesh `accept_bid` escrow failed open |
| 82 | + |
| 83 | +`accept_bid` imported `CreditLedger` from `core.credit_ledger` — a module that |
| 84 | +defines no such class — so the import raised on every call, the bare `except` |
| 85 | +logged a warning, and assignment proceeded with no funds hold. The real control |
| 86 | +(`CreditLedger.spend` raises on insufficient balance) never ran. |
| 87 | + |
| 88 | +**Fix** (`core/mesh/task_router.py`): import the real |
| 89 | +`core.mesh.credit_ledger.CreditLedger` and **fail closed** — on any escrow |
| 90 | +failure return `{"assigned": False, "reason": "escrow_failed"}` and skip the |
| 91 | +challenge and the winner notification. A genuinely free (0-credit) bid skips the |
| 92 | +hold. Regression: `test_accept_bid_fails_closed_when_escrow_cannot_be_funded`, |
| 93 | +plus the existing challenge test now funds the poster first. |
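
The fail-closed shape can be sketched as follows. This is an illustration under assumptions, not the code in `core/mesh/task_router.py`: `ToyLedger`, `InsufficientCredits`, and the `accept_bid` signature are hypothetical, while the `spend` behaviour and the `escrow_failed` return value follow the description above.

```python
import logging

logger = logging.getLogger(__name__)


class InsufficientCredits(Exception):
    """Raised by the toy ledger when a hold cannot be funded (hypothetical)."""


class ToyLedger:
    """Stand-in ledger whose spend() raises on insufficient balance."""

    def __init__(self, balances):
        self.balances = dict(balances)

    def spend(self, node_id, amount):
        if self.balances.get(node_id, 0) < amount:
            raise InsufficientCredits(node_id)
        self.balances[node_id] -= amount


def accept_bid(ledger, poster_id, bid):
    credits = bid["credits_requested"]
    if credits > 0:  # a genuinely free bid skips the hold
        try:
            ledger.spend(poster_id, credits)
        except Exception:
            # Fail closed: any escrow error blocks the assignment.
            logger.warning("escrow hold failed for %s", poster_id, exc_info=True)
            return {"assigned": False, "reason": "escrow_failed"}
    # Only reached when funds are held or the bid is free.
    return {"assigned": True, "winner": bid["bidder_node_id"]}
```

The early return is the point of the sketch: the challenge and winner notification would sit after the `try` block, so they cannot run when the hold fails.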
| 94 | + |
| 95 | +### P2 — Mesh `TaskBid` signature never verified |
| 96 | + |
| 97 | +`_solicit_bid` validated only field presence and the task id; the `signature` |
| 98 | +field and `canonical_payload()` existed but were never checked, so a peer (or an |
| 99 | +on-path rewrite of the unauthenticated HTTP response) could forge `bidder_node_id` |
| 100 | +and `credits_requested`. |
| 101 | + |
| 102 | +**Fix**: `_solicit_bid` now verifies the ed25519 signature over the canonical |
| 103 | +payload against the claimed `bidder_node_id` and drops the bid on failure, before |
| 104 | +selection or escrow. Regression: `test_solicit_bid_rejects_an_unsigned_or_forged_bid`. |
| 105 | + |
| 106 | +### P2 — `web0.publish` opt-in read from model arguments |
| 107 | + |
| 108 | +The publish handler read `allow_network_publish` from the model-supplied |
| 109 | +`arguments`, so a model that could emit the intent could also set its own opt-in; |
| 110 | +only the wallet came from the trusted `source_context`. |
| 111 | + |
| 112 | +**Fix** (`core/runtime_execution_tools.py`, `core/runtime_tool_contracts.py`): |
| 113 | +both gates — the opt-in and the wallet — now come from `source_context`. The |
| 114 | +opt-in was removed from the model-facing input schema. A request alone is never |
| 115 | +sufficient to publish. Regression: `test_web0_publish_ignores_model_supplied_optin`. |
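
To illustrate the trust boundary, here is a hedged sketch of a handler that reads both gates from `source_context` only. The function name, the `wallet` and `title` keys, and the reason strings are assumptions; `arguments`, `source_context`, and `allow_network_publish` are the names used above.

```python
def handle_web0_publish(arguments, source_context):
    """Publish only when the trusted context carries both gates."""
    # Both gates are read from the trusted context; anything the model
    # places in `arguments` under the same keys is ignored.
    allowed = source_context.get("allow_network_publish") is True
    wallet = source_context.get("wallet")
    if not allowed:
        return {"published": False, "reason": "publish_not_enabled"}
    if not wallet:
        return {"published": False, "reason": "no_wallet"}
    # `arguments` may still carry content fields such as a title.
    return {"published": True, "wallet": wallet, "title": arguments.get("title")}
```

A model that emits `{"allow_network_publish": True}` in its arguments changes nothing, because that dictionary is never consulted for either gate.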
| 116 | + |
| 117 | +### P3 — Knowledge purchase now pays the seller |
| 118 | + |
| 119 | +Folded into the P1 fix: `transfer_credits` credits the seller atomically instead |
| 120 | +of burning the price. |
| 121 | + |
| 122 | +### P3 — BM25 keyword leg scored 0 on a single match |
| 123 | + |
| 124 | +When exactly one node matched the FTS query, the score span collapsed and the |
| 125 | +sole match received 0.0 instead of the full keyword boost. **Fix** |
| 126 | +(`core/nulla_memory.py`): a single match (or all-equal ranks) now gets 1.0 — any |
| 127 | +MATCH row is a genuine keyword hit. Regression: |
| 128 | +`test_bm25_single_match_scores_full_not_zero`. |
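
The collapse is easiest to see in a min-max normalisation. The sketch below assumes that is how the keyword leg scales ranks, and that lower raw ranks are better (as with SQLite FTS5 `bm25()`); `normalize_keyword_scores` is a hypothetical name, not the function in `core/nulla_memory.py`.

```python
def normalize_keyword_scores(ranks):
    """Map raw ranks (lower = better) to keyword scores in [0, 1]."""
    if not ranks:
        return {}
    best = min(ranks.values())
    worst = max(ranks.values())
    span = worst - best
    if span == 0:
        # One match, or all ranks equal: every row is a real keyword hit,
        # so give the full boost rather than dividing by a zero span.
        return {node_id: 1.0 for node_id in ranks}
    return {node_id: (worst - rank) / span for node_id, rank in ranks.items()}
```

With a single row, `best == worst`, so without the `span == 0` branch the only sensible fallback for the division is 0.0, which is the behaviour the fix removes.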
| 129 | + |
| 130 | +### P3 — Rating average divided by purchase count |
| 131 | + |
| 132 | +`record_purchase` computed the running rating mean using `purchase_count`, so |
| 133 | +unrated purchases diluted each rating. **Fix**: a dedicated `rating_count` column |
| 134 | +(with a defensive `ALTER TABLE` for existing databases) now drives the mean. |
| 135 | +Regression: `test_record_purchase_rating_average_is_over_ratings_not_purchases`. |
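
The difference between the two denominators can be shown with a small running-mean sketch. The row layout and the `rating_avg` field name are assumptions for illustration; only `purchase_count` and `rating_count` are named above.

```python
def record_purchase(row, rating=None):
    """Return an updated copy of a listing row after one purchase."""
    row = dict(row)
    row["purchase_count"] += 1
    if rating is not None:
        n = row["rating_count"]
        # The running mean uses the number of ratings, not purchases,
        # so an unrated purchase leaves the average untouched.
        row["rating_avg"] = (row["rating_avg"] * n + rating) / (n + 1)
        row["rating_count"] = n + 1
    return row
```

For example, a 5-star purchase, an unrated purchase, and a 3-star purchase give an average of 4.0 over two ratings, even though three purchases were recorded.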
| 136 | + |
| 137 | +### Claims — Banned-word language sweep (P1 / P2 / P3) |
| 138 | + |
| 139 | +The banned candor qualifier (the adjective and its adverb form) appeared in |
| 140 | +prose across the active git-tracked docs, including the public-facing launch |
| 141 | +brief. **Fix** (`22a00f0`): 56 prose occurrences across 20 files were rewritten |
| 142 | +to neutral, accurate wording (accurate / precise / sound / clean / open) that |
| 143 | +preserves meaning. Verified: zero prose occurrences remain in the active set. |
| 144 | + |
| 145 | +### Coverage — added regression tests |
| 146 | + |
| 147 | +Free-listing purchase, concurrent debit-once, rating average, `usdc_to_atomic` |
| 148 | +banker's-rounding tie, `_allocate_pool_atomic` shares > pool, and embedding |
| 149 | +cosine zero-vector / shorter-vector edge cases are now pinned by tests |
| 150 | +(`tests/test_audit_fix_regressions.py` and the marketplace/mesh suites). |
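
For readers unfamiliar with the tie case, this is a sketch of banker's rounding at the atomic-unit boundary. It assumes 6 decimal places for USDC and a string or `Decimal` input; the body is illustrative and may differ from the project's `usdc_to_atomic`.

```python
from decimal import Decimal, ROUND_HALF_EVEN

USDC_DECIMALS = 6  # assumption: one atomic unit is 10**-6 USDC


def usdc_to_atomic(amount):
    """Convert a USDC amount (str or Decimal) to integer atomic units."""
    scaled = Decimal(str(amount)).scaleb(USDC_DECIMALS)
    # ROUND_HALF_EVEN sends an exact .5 tie to the nearest even integer.
    return int(scaled.quantize(Decimal(1), rounding=ROUND_HALF_EVEN))
```

Under this rule `"0.0000015"` and `"0.0000025"` both convert to 2 atomic units, which is the kind of tie a regression test needs to pin.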
| 151 | + |
| 152 | +--- |
| 153 | + |
| 154 | +## Deferred (with rationale and recommendation) |
| 155 | + |
| 156 | +These are genuine findings whose correct fix needs a coordinated or higher-risk |
| 157 | +change that does not belong in an audit-remediation pass. Each is recorded here |
| 158 | +so it is tracked, not lost. |
| 159 | + |
| 160 | +### P2 — Gate endpoint replayable v1 challenge + open CORS |
| 161 | + |
| 162 | +The shipping `/gate/unlock` route uses the static v1 challenge; the hardened |
| 163 | +`GateChallengeStore` (single-use, TTL-bound v2 nonces) is implemented but not |
| 164 | +wired in, and the gate CORS allows any origin. A captured unlock tuple can be |
| 165 | +replayed to unlock the same gated block. |
| 166 | + |
| 167 | +**Why deferred:** wiring the store to *require* v2 server-side would break the |
| 168 | +live browser-extension and portal clients (a separate repo) that still send v1, |
| 169 | +and the open CORS supports gated blocks embedded on arbitrary Arweave-served |
| 170 | +domains. This needs a coordinated client + server upgrade, not a one-sided |
| 171 | +change. |
| 172 | + |
| 173 | +**Recommendation:** ship a `/gate/challenge` issue endpoint plus the client |
| 174 | +change that requests, signs, and consumes a v2 nonce in the same release; scope |
| 175 | +the gate CORS to the portal origin in that change. |
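
As a rough illustration of what single-use, TTL-bound nonces involve, the sketch below uses a hypothetical `ChallengeStore`; it is not the project's `GateChallengeStore`, whose interface is not shown here, and it leaves out signing and persistence.

```python
import secrets
import time


class ChallengeStore:
    """Issues nonces that can be consumed once, and only before they expire."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._pending = {}  # nonce -> expiry timestamp

    def issue(self):
        nonce = secrets.token_urlsafe(32)
        self._pending[nonce] = self._clock() + self._ttl
        return nonce

    def consume(self, nonce):
        # pop() removes the nonce whether or not it is still valid,
        # so a replay of the same value always fails.
        expiry = self._pending.pop(nonce, None)
        return expiry is not None and self._clock() <= expiry
```

A captured unlock tuple bound to such a nonce stops working after its first use or after the TTL, whichever comes first. A production store would also need eviction of unconsumed nonces and locking if shared across threads.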
| 176 | + |
| 177 | +### P3 — `task_completion` self-credit proof is self-computable |
| 178 | + |
| 179 | +The local self-award verifies a SHA-256 receipt that the issuer can recompute; |
| 180 | +the award is bounded to the local peer, rate-limited to 30 awards per 60 s, and |
| 181 | +idempotent per receipt. |
| 182 | + |
| 183 | +**Why deferred:** the worst case is bounded self-minting into the local node's |
| 184 | +own balance with no cross-peer redemption. The correct fix — ed25519-signing the |
| 185 | +work receipt and verifying it — ripples through `Web0WorkReceipt`, `ProofReceipt`, |
| 186 | +`nullpass`, and the CLI receipt surface. |
| 187 | + |
| 188 | +**Recommendation:** sign the work receipt (mirroring `task_capsule` / |
| 189 | +`two_nulla_handshake`) in a focused change; lower the per-window cap if these |
| 190 | +credits ever become broadly transferable. |
| 191 | + |
| 192 | +### P3 — Mixed embedding dimensions in one table |
| 193 | + |
| 194 | +The 384-dim chat embedder and the 64-dim fact embedder can in principle write |
| 195 | +vectors of different lengths into one `memory_nodes` table. |
| 196 | + |
| 197 | +**Why deferred:** no live path co-mingles them — they run on different `agent_id` |
| 198 | +partitions — and the shared cosine helper already projects mismatched lengths and |
| 199 | +logs once instead of crashing or silently zeroing. The fix (pin the dimension per |
| 200 | +agent and reject mismatches, or route both writers through the canonical `embed()`) |
| 201 | +changes stored vectors and is best done as a dedicated migration. |
| 202 | + |
| 203 | +**Recommendation:** record the embedding dimension per agent/database and reject |
| 204 | +out-of-dimension writes; migrate existing rows in the same pass. |
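
One possible shape for the recommended guard, sketched under assumptions: `DimensionGuard` is a hypothetical helper that pins the dimension on an agent's first write, and it keeps state in memory where a real change would persist the value alongside the database.

```python
class DimensionGuard:
    """Pins an embedding dimension per agent and rejects mismatched writes."""

    def __init__(self):
        self._dims = {}  # agent_id -> pinned dimension

    def check(self, agent_id, vector):
        dim = len(vector)
        # The first write for an agent pins its dimension.
        pinned = self._dims.setdefault(agent_id, dim)
        if dim != pinned:
            raise ValueError(
                f"agent {agent_id!r} is pinned to {pinned} dims, got {dim}"
            )
        return pinned
```

Calling `check` before each insert into `memory_nodes` would turn a silent mixed-dimension write into an explicit error at the writer.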
| 205 | + |
| 206 | +### P3 — Code-derived metric identifiers that embed the banned word |
| 207 | + |
| 208 | +The acceptance and proof docs reference snake_case latency/scenario metric |
| 209 | +identifiers whose suffix embeds the banned word (the `offline_*`, `failure_*`, |
| 210 | +`freshness_*`, `ultra_fresh_*`, `empty_lookup_*` gate names), plus the historical |
| 211 | +alpha-hardening branch name that also embeds it. |
| 212 | + |
| 213 | +**Why deferred:** these are machine identifiers tied to code, JSON artifact |
| 214 | +filenames, and the acceptance harness; renaming them in the docs alone would |
| 215 | +desync the docs from the code. |
| 216 | + |
| 217 | +**Recommendation:** rename the identifiers in code, tests, and docs together |
| 218 | +(suffix them `_accuracy_gate` instead of the banned suffix), then update the |
| 219 | +proof docs to match. |
| 220 | + |
| 221 | +### Left untouched by design |
| 222 | + |
| 223 | +- `docs/archive/**` — historical handovers and audit records; rewriting them would |
| 224 | + alter the record of what was written at the time. |
| 225 | +- A vendored third-party model card under `data/trainable_models/` — not project |
| 226 | + prose. |
| 227 | + |
| 228 | +--- |
| 229 | + |
| 230 | +## Areas confirmed clean |
| 231 | + |
| 232 | +The verifier explicitly cleared the production swarm/HIVE dispatch money path |
| 233 | +(`escrow_credits_for_task` — balance-gated, atomic, fail-closed), the |
| 234 | +commit/reveal WorkProof reward gate in `submit_result`, the AES-GCM content |
| 235 | +encryption (per-block key, nonce, AAD binding), and the RPC allowlist (no banned |
| 236 | +`api.mainnet-beta` URL in shipping code). |