
Commit 43648d3

Merge pull request #17 from BayyinahEnterprise/v0.9.4-cleanup-score-validity-docs-friction-gate
v0.9.4: Score-validity ADVISORY + cleanup + NeuroGolf docs + ruff friction + CLI integration gate
2 parents 771952c + c8ec5d6 commit 43648d3

25 files changed

Lines changed: 1906 additions & 93 deletions

.github/workflows/ci.yml

Lines changed: 32 additions & 0 deletions
@@ -179,3 +179,35 @@ jobs:
 
       - name: Run tests
         run: python -m pytest -q
+
+  test-onnx-runtime:
+    # The [onnx-runtime] + [onnx-profile] install path (v0.9.4
+    # Part 5b(c)). Runs the full pytest -q suite so the
+    # CHANGELOG-math gate runs at full strength in CI (round-34
+    # HIGH-2b closure: pre-v0.9.4 the gate's onnxruntime-skip
+    # workaround meant the gate never ran with full extras in
+    # CI; arithmetic drift in the CHANGELOG ### Tests line was
+    # caught only on developer machines).
+    #
+    # A single Python version (3.12) is sufficient for substrate
+    # coverage; the matrix's other jobs cover py 3.10-3.13
+    # cross-cutting concerns. Both [onnx-runtime] (v0.9.3
+    # numpy_divergence) and [onnx-profile] (v0.9.4 score_validity)
+    # are installed so all five ONNX checkers run end-to-end.
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install "git+https://github.com/BayyinahEnterprise/furqan-programming-language.git@v0.11.1"
+          pip install -e ".[dev,onnx,onnx-runtime,onnx-profile]"
+
+      - name: Run tests
+        run: python -m pytest -q
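The CI comment above mentions arithmetic drift in the CHANGELOG `### Tests` line. As a rough illustration of what a CHANGELOG-math check has to parse, here is a minimal sketch assuming the `Test count: A (...) -> B (...). Net delta: +D.` wording used in the CHANGELOG entry; `TESTS_LINE` and `check_tests_line` are hypothetical names, not the pattern or helper in `tests/test_changelog_math_gate.py`. It uses `\s+` between `Net` and `delta:`, the change v0.9.4 describes, so a wrapped line does not make the search silently return `None`.

```python
import re

# Hypothetical pattern modeled on the v0.9.4 CHANGELOG wording; the real
# _TESTS_LINE regex in tests/test_changelog_math_gate.py may differ.
TESTS_LINE = re.compile(
    r"Test count:\s+(\d+)\s+\([^)]*\)\s+->\s+(\d+)\s+\([^)]*\)\.\s+"
    r"Net\s+delta:\s+([+-]\d+)\."
)


def check_tests_line(text: str) -> tuple[int, int, int]:
    """Return (before, after, delta) or raise if the line is absent or inconsistent."""
    match = TESTS_LINE.search(text)
    if match is None:
        # Raising here, instead of returning quietly, keeps a wrapped or
        # reworded line from turning the gate into a silent no-op.
        raise ValueError("CHANGELOG ### Tests line not found")
    before, after, delta = (int(group) for group in match.groups())
    if after - before != delta:
        raise ValueError(f"{before} -> {after} is {after - before:+d}, not {delta:+d}")
    return before, after, delta


# The v0.9.4 entry itself wraps between "Net" and "delta:".
entry = "Test count: 413 (v0.9.3.1 ship state) -> 441 (v0.9.4). Net\ndelta: +28."
print(check_tests_line(entry))          # (413, 441, 28)
print(re.search(r"Net delta:", entry))  # None: a single-space literal misses the wrap
```

Raising on a non-match is a deliberate choice in this sketch: a gate that quietly passes when its pattern fails to match is exactly the failure mode that HIGH-2a records.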

CHANGELOG.md

Lines changed: 221 additions & 0 deletions
@@ -19,6 +19,227 @@ introduced this convention.
 
 ---
 
+## [0.9.4] - 2026-05-05
+
+Five-part-plus-carry-forward release. Closes Gap 2 (MEDIUM)
+via the score-validity advisory; closes the round-30 META v0.9.0-pinned
+scaffold (in its fifth release cycle); closes round-34
+MEDIUM-1 (NeuroGolf canonical adapter docs), LOW-1 (ruff
+version-pin friction), and HIGH-1's bug class via Part 5's
+structural CLI-integration gate; closes the three v0.9.3.1
+carry-forwards via Part 5b (ONNX printer ``minimal_fix``
+consistency, ``_TESTS_LINE`` regex fragility, CI matrix
+``[onnx-runtime]`` gap).
+
+### Fixed
+
+- **Round-34 HIGH-1 bug class (HIGH).** Part 5 ships a new
+AST-based gate ``test_gate_cli_diagnostic_printer_includes_all_diagnostic_classes``
+that asserts the CLI's ``_check_onnx_file`` printer
+isinstance tuple covers every ``*Diagnostic``
+class defined in ``runner.py`` plus those exported via
+``furqan_lint.onnx_adapter.__all__``. The v0.9.3.1 hotfix
+fixed the bug instance for ``NumpyDivergenceDiagnostic``;
+v0.9.4's gate prevents recurrence as ``ScoreValidityDiagnostic``
+joins the family in this same release.
+- **Round-34 v0.9.3.1 carry-forward (Part 5b(a), MEDIUM):**
+``AllPathsEmitDiagnostic`` and ``OpsetComplianceDiagnostic``
+gain required ``minimal_fix`` fields; the v0.9.0 substrate
+shipped without them, leaving the ONNX printer inconsistent
+with the Python/Rust/Go marad printers from v0.9.0 through
+v0.9.3.1. The CLI's ONNX printer now prints
+``minimal_fix`` for every ONNX diagnostic family, matching
+the Python/Rust/Go pattern; Part 5's gate is extended to
+AST-scan for the presence of the ``getattr(d, 'minimal_fix', None)``
+block so a future refactor cannot reintroduce the
+inconsistency.
+- **Round-34 HIGH-2a (Part 5b(b), MEDIUM):** the
+``_TESTS_LINE`` regex in
+``tests/test_changelog_math_gate.py`` required the literal
+``"Net delta:"`` on one line. v0.9.3's CHANGELOG body
+wrapped as ``"Net\\ndelta: +23."``, so the regex returned
+``None`` and the gate silently became a no-op. v0.9.4 changes
+``"Net delta:"`` to ``"Net\\s+delta:"`` so any whitespace
+works; a regression test pins the wrap case.
+- **Round-34 HIGH-2b (Part 5b(c), MEDIUM):** the existing CI
+matrix had no ``[onnx-runtime]`` job, so the
+CHANGELOG-math gate's onnxruntime-skip workaround (added
+in v0.9.3.1's CI hotfix) meant the gate never ran at full
+strength in CI. v0.9.4 adds a ``test-onnx-runtime`` job
+installing ``[dev,onnx,onnx-runtime,onnx-profile]`` on
+Python 3.12; the gate runs without the skip in this job.
+- **Round-34 LOW-1 (Part 4, LOW):** ``test_ruff_check`` and
+``test_ruff_format_check`` skip with an actionable
+``"ruff version mismatch: pyproject pins X, installed is Y;
+run pip install ruff==X to match"`` message when the
+contributor's installed ruff differs from the pyproject
+``[dev]`` pin.
+- **CLI ADVISORY/MARAD split (Part 2 / Decision 3):** the
+ONNX printer partitions findings by their ``severity`` field and
+prints each with a ``[MARAD]`` or ``[ADVISORY]`` prefix. ADVISORY
+findings alone exit 0; only MARAD findings cause exit 1.
+- ``check_onnx_module`` now runs five checkers (was four);
+``score_validity`` is the fifth, appended to the diagnostic
+tag list per Part 2 / Decision 1.
+
+### Added
+
+- **Score-validity ADVISORY checker (Part 2 / Gap 2 closure /
+MARAD-3 closure from cont45).** New
+``src/furqan_lint/onnx_adapter/score_validity.py`` wraps
+``onnx_tool.model_profile()`` in
+``contextlib.redirect_stdout`` and an exception trap; on
+any unhandled exception, it yields one
+``ScoreValidityDiagnostic`` with ``severity="ADVISORY"``.
+The op-type extraction heuristic walks the exception traceback
+for the deepest frame whose ``self.op_type`` is a string and
+falls back to ``"<unknown>"``. Closes Gap 2 (MEDIUM) per
+Decision 9 of v0.9.2 and the round-32 NeuroGolf leverage
+analysis. Empirically verified: the checker fires on the cont45 substrate (TopK
+without axis), and a clean Relu model profiles cleanly.
+- ``ScoreValidityDiagnostic`` dataclass (frozen, 6 required
+fields: ``op_type``, ``exception_class``,
+``exception_message``, ``severity``, ``diagnosis``,
+``minimal_fix``). The ``severity`` field is load-bearing
+for the CLI ADVISORY/MARAD split.
+- ``OnnxProfileExtrasNotInstalled`` typed exception (subclass
+of ``ImportError``); mirrors
+``OnnxExtrasNotInstalled`` and
+``OnnxRuntimeExtrasNotInstalled``. Three-extra architecture
+symmetry: ``[onnx]`` for graph-structure checks,
+``[onnx-runtime]`` for inference-based divergence, and
+``[onnx-profile]`` for the profile-validity advisory.
+- ``[onnx-profile]`` pip extra (Decision 5) listing
+``onnx>=1.14,<1.19`` and ``onnx-tool>=1.0,<2``.
+- ``score_validity_optin_extra`` four-place documented limit
+with full coverage (CHANGELOG / fixture / pinning test /
+README): the score-validity checker is opt-in via the
+``[onnx-profile]`` extra and silent-passes when the extra
+is missing.
+- **NeuroGolf canonical adapter README section (Part 3 /
+round-34 MEDIUM-1 closure).** New section "ONNX
+numpy_reference convention for NeuroGolf-shape models"
+documents two canonical patterns (Pattern A: pre-one-hot
+input; Pattern B: raw grid + local encoding) with code
+blocks and pointers to verified worked-example fixtures
+at ``tests/fixtures/onnx/numpy_reference_examples/``. The
+v0.9.3 four-place limit named the convention dependency
+but did not show the canonical adapter; v0.9.4 closes the
+adoption gap.
+- **AllPathsEmitDiagnostic.minimal_fix and
+OpsetComplianceDiagnostic.minimal_fix** required fields
+(Part 5b(a)). Pre-v0.9.4 these dataclasses had no
+``minimal_fix`` field, so the ONNX printer from v0.9.0 through v0.9.3.1
+had no field to print even if it had tried. v0.9.4 closes the
+inconsistency at both layers (dataclass and printer).
+- **CI ``test-onnx-runtime`` job (Part 5b(c)).** New job in
+``.github/workflows/ci.yml`` installs
+``[dev,onnx,onnx-runtime,onnx-profile]`` on Python 3.12
+and runs the full pytest suite. The CHANGELOG-math gate
+runs at full strength in this job; gates remain active in
+the other jobs.
+- **CLI integration AST gate (Part 5).**
+``test_gate_cli_diagnostic_printer_includes_all_diagnostic_classes``
+uses ``ast.parse`` to walk ``runner.py`` (for
+``class *Diagnostic`` definitions), ``__init__.py``
+(for ``__all__`` exports), and ``cli.py`` (for the
+isinstance tuple inside ``_check_onnx_file``); it asserts
+the printer's tuple is a superset of both name sets. A self-verifying
+negative test runs the gate on a synthetic CLI source missing one
+family.
+- ``ONNX_ADAPTER_PUBLIC_SURFACE_v0_9_4`` baseline pins the
+new surface (15 + 3 = 18 names).
+- ``test_tests_line_regex_accepts_multiline_net_delta_wrap``
+regression test pinning the regex against the v0.9.3 wrap
+pattern that silently no-opped the gate.
+- Worked-example fixtures
+``onehot_input_example_build.py`` and
+``raw_grid_input_example_build.py`` documenting the
+Pattern A and Pattern B numpy_reference adapters.
+
+### Changed
+
+- ``check_onnx_module(module, model_proto, model_path)`` now
+runs five checkers (was four).
+- ``cli._check_onnx_file``: imports ``ScoreValidityDiagnostic``
+from the source module (per the v0.9.3.1 / Part 5
+AST-gated source-module pattern); the isinstance tuple extends
+from four families to five; the ADVISORY/MARAD partition
+determines the headline ("MARAD <path>", "ADVISORY <path>",
+or a mixed-case summary line); each diagnostic line prints
+``minimal_fix``, matching the Python/Rust/Go marad pattern.
+- README install section adds the ``[onnx-profile]`` row
+and updates the combined-extras example to
+``[rust,go,onnx-runtime,onnx-profile]``. The "ONNX
+adapter (current as of v0.9.3)" header bumps to v0.9.4
+with the ``score_validity_optin_extra`` documented limit
+added at the top of the inventory.
+- ``tests/fixtures/onnx/documented_limits/README.md``:
+inventory bumps to v0.9.4 with the
+``score_validity_optin_extra`` entry first.
+- ``_TESTS_LINE`` regex: ``Net delta:`` changed to ``Net\\s+delta:``.
+
+### Retired
+
+- ``tests/test_onnx_gates.py::test_gate_changelog_math_v0_9_0``
+(Part 1 / round-30 META closure). The v0.9.0-pinned
+scaffold has been skip-only across the v0.9.1, v0.9.2, v0.9.3,
+v0.9.3.1, and v0.9.4 release cycles; the canonical
+``test_changelog_math_matches_pytest_collect`` covers the
+same arithmetic check version-agnostically. Retiring the
+scaffold removes 1 test from the collected total. Round-30
+META was originally an INFORMATIONAL observation; the
+five-release escalation reflects the cumulative cost of dead
+code in the test suite.
+
+### Tests
+
+Test count: 413 (v0.9.3.1 ship state) -> 441 (v0.9.4). Net
+delta: +28.
+
+Breakdown:
+
+- Part 1 retirement: -1 (test_gate_changelog_math_v0_9_0)
+- Part 2 score-validity (commit 3): +7 (4 firing/clean + 2
+op-type extraction + 1 stdout-capture)
+- Part 2 CLI ADVISORY + Part 5b(a) minimal_fix (commit 4):
++9 (4 ADVISORY/MARAD split + 5 minimal_fix regression
+tests, one per ONNX diagnostic family)
+- Part 2 silent-pass + four-place + runner integration
+(commit 5): +3 (silent-pass pin + runner-integration
+alongside four v0.9.3 checkers + OnnxProfileExtrasNotInstalled
+exception type check)
+- Part 3 NeuroGolf docs (commit 6): +3 (Pattern A zero-
+divergence + Pattern B zero-divergence + README section
+presence)
+- Part 4 ruff friction (commit 7): +2 (helper-skips-on-mismatch
++ helper-does-not-skip-when-versions-match)
+- Part 5 + Part 5b(a) AST gate (commit 8): +3 (printer
+isinstance gate + minimal_fix block gate + negative-test
+parser self-verification)
+- Part 5b(b) regex regression (commit 9): +1
+- Part 5b(c) CI matrix job (commit 10): +0 (CI infrastructure)
+- V0_9_4 surface snapshot (commit 11): +1
+
+Sum: -1 + 7 + 9 + 3 + 3 + 2 + 3 + 1 + 0 + 1 = +28 ✓
+
+### Round-35 closure ledger
+
+| Finding | Source | Severity | Closure |
+| --- | --- | --- | --- |
+| Gap 2 | v0.9.1 NeuroGolf evaluation | MEDIUM | Closed via Decisions 2-9 (Part 2). Score-validity ADVISORY checker via ``onnx_tool.model_profile``; three-extra architecture. |
+| Round-30 META | round-30 audit, surfaced across v0.9.0-v0.9.3.1 | INFORMATIONAL -> MEDIUM | Closed via Part 1 (single-test deletion of v0.9.0-pinned scaffold). Five-release escalation. |
+| MARAD-3 (cont45 TopK profiler-blocker) | round-32 leverage analysis | MEDIUM | Closed via Part 2 (score-validity advisory fires on cont45 fixture). |
+| Round-34 MEDIUM-1: NeuroGolf adapter convention not worked through | round-34 audit | MEDIUM | Closed via Part 3 (README section + worked-example fixtures for Pattern A and Pattern B). |
+| Round-34 LOW-1: ruff version pin friction | round-34 audit | LOW | Closed via Part 4 (skip-on-mismatch with actionable message). |
+| Round-34 HIGH-1 bug class | round-34 audit | HIGH | Closed via Part 5 (AST-based CLI integration gate). v0.9.3.1 fixed the instance; v0.9.4 prevents recurrence. |
+| v0.9.3.1 carry-forward: ONNX printer minimal_fix gap | v0.9.3.1 hotfix audit | MEDIUM | Closed via Part 5b(a). 5 regression tests + gate extension. |
+| Round-34 HIGH-2a: _TESTS_LINE regex literal-space fragility | v0.9.3.1 PR build | MEDIUM | Closed via Part 5b(b). Regex changed to ``Net\\s+delta:``. |
+| Round-34 HIGH-2b: CI matrix has no [onnx-runtime] job | v0.9.3.1 PR build | MEDIUM | Closed via Part 5b(c). New ``test-onnx-runtime`` CI job. |
+| v0.9.0 Decision 1 standing rule operating as three-extra architecture | round-33 architectural finding | INFORMATIONAL | Affirmed. ``[onnx]``, ``[onnx-runtime]``, ``[onnx-profile]`` for three concerns. |
+
+10 closures: 1 HIGH (bug class), 7 MEDIUM (Gap 2, stale scaffold, MARAD-3, NeuroGolf docs, printer minimal_fix consistency, regex fragility, CI matrix gap), 1 LOW (ruff friction), and 1 INFORMATIONAL (architectural affirmation).
+
 ## [0.9.3.1] - 2026-05-04
 
 CLI hotfix. Closes round-34 HIGH-1: the CLI's
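The score-validity entry above describes three mechanics: capture the profiler's stdout with `contextlib.redirect_stdout`, trap any exception it raises, and recover an op type by walking the traceback for the deepest frame whose `self.op_type` is a string, falling back to `"<unknown>"`. Here is a minimal sketch of that pattern. It assumes a generic zero-argument `profile` callable in place of `onnx_tool.model_profile()`, so no onnx_tool call signature is implied. `ProfileAdvisory`, `deepest_op_type`, and `profile_advisories` are hypothetical names, and the `diagnosis` and `minimal_fix` strings are placeholders.

```python
import contextlib
import io
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass(frozen=True)
class ProfileAdvisory:
    # Hypothetical stand-in with the six fields the CHANGELOG lists for
    # ScoreValidityDiagnostic; it is not furqan-lint's class.
    op_type: str
    exception_class: str
    exception_message: str
    severity: str
    diagnosis: str
    minimal_fix: str


def deepest_op_type(exc: BaseException) -> str:
    """Return op_type from the deepest traceback frame whose ``self.op_type`` is a str."""
    found = "<unknown>"
    tb = exc.__traceback__
    while tb is not None:
        frame_locals = tb.tb_frame.f_locals
        owner = frame_locals["self"] if "self" in frame_locals else None
        op_type = getattr(owner, "op_type", None)
        if isinstance(op_type, str):
            found = op_type  # later traceback entries are deeper, so they overwrite
        tb = tb.tb_next
    return found


def profile_advisories(profile: Callable[[], object]) -> Iterator[ProfileAdvisory]:
    """Run a profiler callable with stdout captured; turn any crash into one ADVISORY."""
    try:
        with contextlib.redirect_stdout(io.StringIO()):
            profile()
    except Exception as exc:  # a profiler crash is advisory, not a model defect
        yield ProfileAdvisory(
            op_type=deepest_op_type(exc),
            exception_class=type(exc).__name__,
            exception_message=str(exc),
            severity="ADVISORY",
            diagnosis="placeholder: the profiler crashed on this model",
            minimal_fix="placeholder: adjust the named op so the profiler accepts it",
        )
```

For example, a fake node class with `op_type = "TopK"` whose method raises `KeyError("axis")` yields a single advisory with `op_type == "TopK"`. A callable that returns normally yields nothing. Inspecting frame locals is inherently best-effort, which is why the `"<unknown>"` fallback matters.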

README.md

Lines changed: 67 additions & 2 deletions
@@ -37,7 +37,8 @@ pip install "furqan-lint[rust]" # tree-sitter Rust adapter
 pip install "furqan-lint[go]" # Go adapter (requires Go 1.22+ toolchain at install time)
 pip install "furqan-lint[onnx]" # ONNX graph-only checks (D24-onnx + opset-compliance + D11-onnx shape/type)
 pip install "furqan-lint[onnx-runtime]" # ONNX + numpy-vs-ONNX divergence (v0.9.3+; brings in onnxruntime + numpy)
-pip install "furqan-lint[rust,go,onnx-runtime]" # all adapters with full ONNX inference checks
+pip install "furqan-lint[onnx-profile]" # ONNX + score-validity ADVISORY (v0.9.4+; brings in onnx_tool)
+pip install "furqan-lint[rust,go,onnx-runtime,onnx-profile]" # all adapters with full ONNX runtime + profile checks
 ```
 
 ### Install from a specific commit or tag
@@ -172,6 +173,60 @@ and `graph.initializer` (parameter tensors) are explicitly out
 of scope; including them would create false positives on every
 model retraining.
 
+
+### ONNX numpy_reference convention for NeuroGolf-shape models
+
+The v0.9.3 numpy-vs-ONNX divergence checker discovers a
+``numpy_reference`` callable from a sibling ``_build.py`` per
+the NeuroGolf-specific four-place documented limit
+``numpy_divergence_neurogolf_convention``. The convention
+assumes ARC-AGI grids encoded as ``(1, 10, H, W)`` one-hot
+tensors (10 channels, one per cell color).
+
+Two canonical patterns satisfy the convention. Both produce
+zero divergence findings on a well-formed model; pick the one
+that matches your build pipeline. Worked examples live at
+``tests/fixtures/onnx/numpy_reference_examples/``.
+
+**Pattern A: pre-one-hot input.** The build pipeline encodes
+the raw ARC-AGI grid into ``(1, 10, H, W)`` before invoking
+both the ONNX model and the ``numpy_reference``. The reference
+function accepts the already-encoded tensor:
+
+```python
+def numpy_reference(grid):
+    import numpy as np
+    # grid is already (1, 10, H, W) one-hot.
+    return np.array(grid, dtype=np.float32)
+```
+
+The companion ``.json`` task file stores probe grids in the
+encoded ``(1, 10, H, W)`` shape under ``train[i]["input"]``.
+
+**Pattern B: raw grid + local encoding.** The build pipeline
+keeps the ARC-AGI grid as a raw rank-2 integer grid; the
+``numpy_reference`` encodes it locally to match the ONNX
+model's expected ``(1, 10, H, W)`` input shape:
+
+```python
+def numpy_reference(grid):
+    import numpy as np
+    arr = np.array(grid, dtype=np.int64)
+    h, w = arr.shape
+    one_hot = np.zeros((1, 10, h, w), dtype=np.float32)
+    for c in range(10):
+        one_hot[0, c, :, :] = (arr == c).astype(np.float32)
+    return one_hot
+```
+
+The companion ``.json`` task file stores probe grids in the
+raw rank-2 form (the standard ARC-AGI format).
+
+Both patterns are validated by tests under
+``tests/test_onnx_neurogolf_adapter_examples.py``; future
+``onnx`` / ``onnxruntime`` / ``numpy`` version changes that
+break the convention surface as test failures rather than
+stale documentation.
 ## Usage
 
 ```bash
@@ -550,14 +605,24 @@ translator-level limits, in `tests/test_go_translator.py`).
 `tests/fixtures/go/documented_limits/r3_compile_rejected.go`
 (added in v0.8.1).
 
-### ONNX adapter (current as of v0.9.3)
+### ONNX adapter (current as of v0.9.4)
 
 Each ONNX limit has a fixture in
 `tests/fixtures/onnx/documented_limits/` and a pinning test in
 `tests/test_onnx_correctness.py`,
 `tests/test_onnx_public_surface_additive.py`, or
 `tests/test_onnx_shape_coverage.py`.
 
+- **score_validity is opt-in via [onnx-profile] (v0.9.4).**
+The v0.9.4 score-validity ADVISORY checker wraps
+`onnx_tool.model_profile()` to surface profiler-coverage
+gaps (e.g., the cont45 TopK-without-axis crash). It runs only
+when the `[onnx-profile]` extra is installed (which brings in
+`onnx_tool`); otherwise it silent-passes. ADVISORY findings
+exit 0 (the model is structurally valid; the failure is in
+the deployment-side profiler). Pinned as
+`tests/fixtures/onnx/documented_limits/score_validity_optin_extra.py`.
+
 - **numpy_divergence requires NeuroGolf-convention sidecars
 (v0.9.3).** The numpy-vs-ONNX divergence checker is opt-in by
 reference presence: it only runs when (a) the
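The README limit above states that ADVISORY findings exit 0. The CHANGELOG says the CLI partitions findings by `severity` and picks a "MARAD <path>", "ADVISORY <path>", or mixed-case headline. Here is a minimal sketch of that exit-code rule, assuming a hypothetical `Finding` record and `report` function. The mixed-case summary wording and the two-space line indent are invented for illustration, and this is not the code of `cli._check_onnx_file`.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Finding:
    # Hypothetical minimal finding; only ``severity`` drives the split.
    severity: str  # "MARAD" or "ADVISORY"
    message: str


def report(path: str, findings: Iterable[Finding]) -> tuple[list[str], int]:
    """Partition findings by severity and return (output lines, exit code)."""
    items = list(findings)
    marads = [f for f in items if f.severity == "MARAD"]
    advisories = [f for f in items if f.severity == "ADVISORY"]
    lines: list[str] = []
    if marads and advisories:
        # Hypothetical mixed-case summary wording.
        lines.append(f"MARAD {path} ({len(marads)} MARAD, {len(advisories)} ADVISORY)")
    elif marads:
        lines.append(f"MARAD {path}")
    elif advisories:
        lines.append(f"ADVISORY {path}")
    # MARAD lines print first, then ADVISORY lines, each with its severity prefix.
    for finding in marads + advisories:
        lines.append(f"  [{finding.severity}] {finding.message}")
    # ADVISORY-only runs exit 0; any MARAD finding makes the exit code 1.
    return lines, 1 if marads else 0
```

Keying the exit code on MARAD findings alone lets a deployment-side profiler gap show up in the output without failing a CI run on a structurally valid model.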
