Companion to test-result-matrix.md. The matrix records
what passes; this audit records where the model could lie, and what we do
about it. The governing rule for the 10k-developer board-farm mission:
No silent failure on a known-good IP block or routine. A block that cannot faithfully execute must fault honestly (a guest-visible error) — never return wrong-but-clean results.
When the model meets work it cannot compute faithfully, it has two acceptable outcomes: compute it correctly, or report an error the guest can see. The unacceptable third outcome — silently emitting wrong or zero data — is the bug class this audit exists to eliminate.
`-global driver=arm.ethos-u,property=honest-fault,value=on` makes an
uncomputable op (unknown command-stream opcode, unmodelled elementwise mode)
set the run's failure flag, so completion reports STATUS.CMD_PARSE_ERROR
instead of silently zero-filling the OFM. The default (off) preserves every
passing model byte-for-byte. Two qtests cover both modes: unsupported-lenient
(default, completes clean) and unsupported-honest-fault (error bit set, IRQ
still raised, never wedges). Commit 98bc4dee869.
Key invariant: the honest fault is a non-gating status — the IRQ is still raised and the inference still completes; it just carries the error. The engine never hangs on an unsupported op.
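
The invariant above can be sketched in a few lines of C. This is a minimal illustration, not the model's actual code: the struct, the opcode values, and the bit position of `STATUS_CMD_PARSE_ERROR` are all assumptions made for the example. It shows only the shape of the behaviour: an unknown op sets a per-run failure flag when the option is on, and completion always raises the interrupt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All names and values below are hypothetical, chosen for illustration. */
#define STATUS_CMD_PARSE_ERROR (1u << 0)

enum { OP_NOP = 0, OP_ADD = 1 };  /* the only ops this sketch "models" */

typedef struct {
    bool honest_fault;   /* stands in for the honest-fault property; off by default */
    bool run_failed;     /* per-run failure flag */
    uint32_t status;     /* guest-visible status, latched at completion */
    unsigned irq_count;  /* completion interrupts raised so far */
} NpuSketch;

/* Execute one op. An op the model cannot compute never stalls the run. */
static void npu_exec_op(NpuSketch *s, uint32_t opcode)
{
    switch (opcode) {
    case OP_NOP:
    case OP_ADD:
        break;                      /* faithful computation elided */
    default:
        if (s->honest_fault) {
            s->run_failed = true;   /* remember the fault, keep going */
        }
        break;                      /* lenient mode: legacy behaviour */
    }
}

/* Run a command stream to completion and return the reported status. */
static uint32_t npu_run(NpuSketch *s, const uint32_t *ops, size_t n)
{
    assert(s != NULL && (ops != NULL || n == 0));
    s->run_failed = false;
    for (size_t i = 0; i < n; i++) {
        npu_exec_op(s, ops[i]);
    }
    s->status = s->run_failed ? STATUS_CMD_PARSE_ERROR : 0;
    s->irq_count++;                 /* raised unconditionally: non-gating */
    return s->status;
}
```

With the flag off, a stream containing an unknown opcode completes with a clean status; with it on, the same stream completes with the error bit set. In both cases exactly one completion interrupt is raised.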
How an NXP remote accelerator can be faulted honestly depends on what gates completion in its Linux driver:
| Style | Completion gates on | Honest fault lives in |
|---|---|---|
| Neutron (i.MX95) | a polled retcode (MBOX0 == DONE) — a bad retcode hangs | a side-band field (MBOX1 error_code) that does not gate |
| Ethos-U (i.MX93) | response arrival (INFERENCE_RSP rpmsg; ethosu_inference_rsp() maps rsp->status then unconditionally wakes) | the response's own status field (in-band but non-gating) |
Invariant either way: the honest fault is a non-gating status channel, never
the completion signal. (Recorded jointly with i.MX95's docs/validation/.)
These are known fidelity boundaries. Each is documented and visible, which is the point — a caveat in the open is the opposite of a silent fail.
| Area | Boundary | Disposition |
|---|---|---|
| M33 concurrent boot | Booting the M33 while the desktop guest runs can wedge the guest | Traced to two NXP BSP defects, documented in tests/torture/; not a model bug |
| PXP G2D | scale + CSC not modelled | Blocked by the libg2d / pxp_dma_v3 vendor stack (driver rejects the op), not a model gap; copy/fill/blit/blend/rotate are byte-exact |
| NPU mlw decoder | hw/npu/mlw is Apache-2.0 | Upstream-licensing blocker for the NPU only; scopes NPU to bring-up for upstream, does not block the core machine |
| NPU nasnet | ±1-rounding accumulation, occasional argmax flip on the deepest degenerate-int8 model | Isolation-blocked and capped; every real trained workload tested is bit-exact, every micro-op passes ≤±1 in isolation |
| ELE soc_id (cross-fleet note) | i.MX91's ELE reports the 93's soc_id (chop-down artifact) | i.MX91's item; noted here only because 93 is the derivation parent |
- Done: NPU uncomputable-op path converted from silent no-op to opt-in honest fault.
- Convention: new device models should prefer an honest guest-visible error (bus error / status bit / `-d unimp` trace) over a silent no-op when they meet an unsupported-but-known-good routine.
- Open: sweep the other modelled blocks for silent no-op defaults on known-good paths (the same class as the NPU `default: v=0`), prioritised by Tier-A blocks (where a data path is claimed and a silent wrong result would be worst).
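
The convention in the list above might look like the following in a register read handler. This is a sketch under stated assumptions: `TxResult`, the register offsets, the ID value, and `blk_read` are hypothetical stand-ins, and the `fprintf` to stderr merely takes the place of an unimplemented-access trace. The default branch returns an error and leaves the output untouched instead of quietly producing zero.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a bus transaction result. */
typedef enum { TX_OK = 0, TX_ERROR = 1 } TxResult;

/* Hypothetical register map for an illustrative block. */
enum { REG_ID = 0x00, REG_CTRL = 0x04 };

typedef struct {
    uint32_t ctrl;
} BlkSketch;

/* Read one register; unmodelled offsets fault instead of returning 0. */
static TxResult blk_read(const BlkSketch *s, uint32_t offset, uint32_t *out)
{
    assert(s != NULL && out != NULL);
    switch (offset) {
    case REG_ID:
        *out = 0x93u;               /* illustrative ID value */
        return TX_OK;
    case REG_CTRL:
        *out = s->ctrl;
        return TX_OK;
    default:
        /* Honest path: leave a trace and report the error to the guest. */
        fprintf(stderr, "unimp: read of offset 0x%x\n", (unsigned)offset);
        return TX_ERROR;            /* *out is deliberately left untouched */
    }
}
```

The silent alternative, `default: *out = 0; return TX_OK;`, is the pattern the open sweep item is looking for.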