This repository is a Rust workspace for a staged research and engineering effort around a Halo2-based wrapper that may eventually verify Groth16 BN254 proofs inside an outer Halo2 proof system.
The project is intentionally incremental. The current codebase now includes a circuit-backed BN254 primitive layer covering Week 1 foundations, the narrow Week 2 slices, the Week 3 extension-field slice, the Week 4 pairing-core slice through real optimal-ate Miller traversal, final exponentiation, and a narrow multi-pairing product check, the Week 5 Groth16 BN254 verifier slice on top of that pairing core, and the direct Halo2/Midnight outer setup/prove/verify lane built on the canonical OuterWrapperCircuit. It is still far from a broad or production-ready wrapper verifier.
Current phase: Stage 1 / Week 5+ direct outer setup/prove/verify lane.
Implemented in scope today:
- Workspace structure, crate boundaries, docs, CLI, CI, and benchmark conventions
- BN254 foreign-field support in `wrapper-circuits` backed by `midnight-circuits` / `midnight-proofs`
- Circuit-backed `fp add`, `fp mul`, and related minimal field wiring
- Circuit-backed BN254 `fp2` support represented as `a + bu` with `u^2 = -1`
- `AssignedFp2` over two `AssignedFp` coordinates with `new`, assignment, `zero`, `one`, `add`, `sub`, `neg`, `mul`, `square`, and equality helpers
- Circuit-backed BN254 `fp6` support represented as `c0 + c1 * v + c2 * v^2`
- `AssignedFp6` over three `AssignedFp2` coordinates with `new`, assignment, `zero`, `one`, `add`, `sub`, `neg`, `mul`, `square`, and equality helpers
- Circuit-backed BN254 `fp12` support represented as `c0 + c1 * w`
- `AssignedFp12` over two `AssignedFp6` coordinates with `new`, assignment, `zero`, `one`, `add`, `sub`, `neg`, `mul`, `square`, and equality helpers
- Shared internal field/circuit traits in `wrapper-circuits/src/bn254/traits.rs`
- Shared host-side constant/reference arithmetic in `wrapper-circuits/src/bn254/host/`
- `AssignedFieldExt` now captures the common `zero` / `one` / `add` / `sub` / `neg` / equality surface across `AssignedFp`, `AssignedFp2`, `AssignedFp6`, and `AssignedFp12`
- `AssignedCircuitValue` plus shared unary/binary synthesize helpers now back the small `Fp2*Circuit`, `Fp6*Circuit`, and `Fp12*Circuit` wrappers
- Host-side reference formulas and arkworks/Midnight conversion helpers are centralized in `wrapper-circuits/src/bn254/tests/support.rs`
- Minimal BN254 G1 support backed by Midnight foreign ECC chips
- Circuit-backed G1 addition
- Coordinate-to-point construction with on-curve enforcement
- Minimal BN254 G2 affine support backed by `AssignedFp2`
- `AssignedG2Affine` with assignment, `neg`, `assert_equal`, and explicit twist `assert_on_curve`
- Narrow BN254 G2 projective support in Jacobian coordinates over `AssignedFp2`
- `AssignedG2Projective` with reserved identity encoding plus `from_affine`, `neg`, `double`, and incomplete `add`
- Miller-path BN254 G2 step support in homogeneous projective coordinates over `AssignedFp2`
- `AssignedG2MillerPoint` with non-identity `from_affine`, `double_with_line`, and `mixed_add_with_line`
- Miller-ready sparse line coefficients via `AssignedG2LineCoeffs = (ell_0, ell_w, ell_vw)`
- `AssignedMillerAccumulator` is now the public consumption boundary for line coefficients, with `mul_by_line(...)`
- sparse line evaluation into `Fp12` is now an internal accumulator detail rather than a public `AssignedG2LineCoeffs` API
- real BN254 optimal-ate Miller traversal shape backed by a fixed deterministic prepared schedule
- narrow BN254 final exponentiation over Miller-loop output, aligned with arkworks on supported non-exceptional single-pair inputs
- narrow multi-pairing product check that multiplies Miller outputs first, applies exactly one shared final exponentiation, and compares the result against the target-group identity
- narrow end-to-end pairing-core correctness against arkworks on supported non-exceptional 1-term, 2-term, and 3-term products
- narrow Groth16 BN254 verifier types in `wrapper-circuits/src/groth16.rs`
- verifier-only BN254 G1 IC accumulation using fixed public-input scalars over the existing Midnight G1 chip
- real snarkjs Groth16 BN254 JSON parsing in `wrapper-backends/src/snarkjs.rs`
- generic snarkjs Groth16 BN254 artifact-set loading in `wrapper-backends/src/groth16.rs`
- outer proof backend contracts in `wrapper-backends/src/outer.rs`
- a real Halo2/Midnight outer wrapper circuit in `wrapper-circuits/src/outer/`
- domain-level wrapper-job planning in `wrapper-core/src/job.rs`
- serializable wrapper execution packages in `wrapper-core/src/package.rs`
- expected honest direct outer artifact modeling in `wrapper-core/src/output.rs`
- execution result modeling in `wrapper-core/src/execution.rs`
- verifier-equation reduction to one multi-pairing product check using `e(A, B) * e(-vk_x, gamma) * e(-C, delta) = e(alpha, beta)`, with `e(alpha, beta)` precomputed as a fixed GT constant
- a real Circom/snarkjs fixture under `crates/wrapper-tests/fixtures/groth16/circom_multiplier2/`
- a real Semaphore Groth16 BN254 fixture under `crates/wrapper-tests/fixtures/groth16/semaphore/`
- end-to-end valid / invalid Groth16 verifier tests on top of the existing pairing core
- bundle -> job -> package plus direct setup/prove/verify validation on top of the Semaphore fixture
- Real layout and row visibility through the Halo2/Midnight cost model
- Deterministic arkworks-backed tests for `Fp`, `Fp2`, `Fp6`, `Fp12`, G1, and the current narrow G2 affine / Jacobian / Miller-step behavior
- Criterion sanity benchmarks for the currently implemented primitive circuits
- a canonical primitive registry in `wrapper-circuits/src/planning.rs` now drives measured primitive metadata for CLI reporting and benchmark-info output
- a single authoritative BN254 primitive path in `wrapper-circuits/src/bn254/`
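The verifier-equation bullet in the list above can be read as a rearrangement of the standard Groth16 check. The following is a sketch of that rearrangement, not a statement about how the circuit is laid out. The symbols $a_i$ (public inputs), $\mathrm{IC}_i$ (verification-key input points), $f_{P,Q}$ (Miller-loop output for the pair $(P, Q)$), and $\mathrm{FE}$ (final exponentiation) are notation assumed here and do not appear in this file.

```latex
\begin{aligned}
\mathrm{vk}_x &= \mathrm{IC}_0 + \sum_{i=1}^{\ell} a_i \,\mathrm{IC}_i \\[4pt]
e(A, B) &= e(\alpha, \beta)\cdot e(\mathrm{vk}_x, \gamma)\cdot e(C, \delta)
  && \text{(standard Groth16 check)} \\[4pt]
e(A, B)\cdot e(-\mathrm{vk}_x, \gamma)\cdot e(-C, \delta) &= e(\alpha, \beta)
  && \text{(using } e(-P, Q) = e(P, Q)^{-1}\text{)} \\[4pt]
\mathrm{FE}\!\left(f_{A,B}\cdot f_{-\mathrm{vk}_x,\gamma}\cdot f_{-C,\delta}\right) &= e(\alpha, \beta)
  && \text{(}\mathrm{FE}\text{ is multiplicative)}
\end{aligned}
```

The last line is why one shared final exponentiation is enough: $\mathrm{FE}$ is a fixed power map on the Miller-loop outputs, so it distributes over their product and the three Miller outputs can be multiplied first.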
Out of scope right now:
- G2 subgroup checks
- scalar multiplication on G2
- broad public full-pairing or multi-pairing APIs beyond the narrow pairing-check boundary
- broad Groth16 verifier frameworks beyond the first narrow BN254 slice
- MSM as a public supported layer
- wrapper verifier circuit composition
- production optimization of layout/cost beyond the narrow implemented sanity circuits
Do not treat the current code as a full verifier foundation. It is a deliberately narrow primitive-plus-first-verifier slice.
Week 5 verifier-memory notes:
- the committed real fixture lives under `crates/wrapper-tests/fixtures/groth16/circom_multiplier2/`
- it comes from `circom` + `snarkjs` and keeps the raw `proof.json`, `public.json`, and `verification_key.json` artifacts in the snarkjs `bn128` format
- snarkjs G1 points in that fixture use projective `[x, y, z]`; the parser accepts affine `z = 1` plus the snarkjs G1 identity encoding `[0, 1, 0]`
- the current Groth16 pairing reduction is `e(A, B) * e(-vk_x, gamma) * e(-C, delta) = e(alpha, beta)`, with `e(alpha, beta)` precomputed off-circuit
- the current IC accumulation path is verifier-only and uses fixed public-input scalars over the existing Midnight G1 chip; it is not a broad public MSM API
- the current generic artifact-set path is `snarkjs artifacts -> Groth16Bn254ArtifactBundle -> WrapperJob -> WrapperExecutionPackage -> WrapperExecutionResult`
- the current delivery lane is `Groth16Bn254 -> Halo2Outer` over the canonical Halo2/Midnight outer circuit
- the current expected outer artifact model is honest to the direct backend: `halo2-plonkish/bn254`, with `serde` JSON carrying hex-encoded proof, VK, and verifier-param payloads
- the current execution model includes both the legacy stub result and the real direct CLI execution result payload
- the current CLI surfaces for that lane are `inspect-groth16-bundle`, `plan-wrapper-job`, `export-wrapper-job`, `export-wrapper-package`, `execute-wrapper-stub`, and `execute-wrapper-direct`
- the current placeholder outer backend is `PlannedHalo2OuterBackend`, which materializes the honest direct-output contract without generating a real proof
- the selected concrete outer backend lane is `MidnightDirectOuterBackendBn254Host`, and it treats the Halo2/Midnight outer circuit as canonical
- the current outer backend lane can now adapt artifact bundles into the canonical outer circuit input, build a real outer circuit, plan setup/proof outputs, and validate produced proof/VK shapes
- the repository now also exposes a direct canonical outer-circuit backend surface in `wrapper-backends/src/outer.rs` through `CanonicalOuterCircuitProofBackend`, `plan_direct_outer_circuit_setup(...)`, and `plan_direct_outer_circuit_proof(...)`
- the repository now also contains a canonical R1CS line under `crates/wrapper-circuits/src/r1cs/`, including deterministic lowering, canonical identity hashing, a zkInterface-style export bridge, and a first Arkworks adapter
- that canonical R1CS line should currently be treated as an alternate backend / later phase, not the critical path for the first real `.circom` -> outer-wrapper flow
- the remaining blockers are mainly performance/CI ergonomics and broader production hardening, not missing direct setup/prove/verify plumbing
- the prover-strategy design pass for that blocker lives in `docs/outer-prover-strategy-plan.md`
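The G1 encoding rule in the notes above (affine `z = 1` accepted, `[0, 1, 0]` treated as the identity) can be illustrated with a minimal sketch. Everything named here (`G1Encoding`, `G1ParseError`, `classify_g1`) is hypothetical and is not the API of `wrapper-backends/src/snarkjs.rs`. The sketch also assumes coordinates arrive as decimal strings and performs no field-range or on-curve checks.

```rust
/// Hypothetical classification of a snarkjs-style G1 point `[x, y, z]`.
/// Illustrative only; not the repository's parser types.
#[derive(Debug, PartialEq, Eq)]
enum G1Encoding {
    /// `z = 1`: the point is already affine `(x, y)`.
    Affine { x: String, y: String },
    /// `[0, 1, 0]`: the snarkjs encoding of the G1 identity.
    Identity,
}

#[derive(Debug, PartialEq, Eq)]
enum G1ParseError {
    /// The array did not have exactly three coordinates.
    WrongLength(usize),
    /// The coordinate at this index is not a non-empty decimal string.
    NotDecimal(usize),
    /// `z` is neither the affine marker nor part of the identity encoding.
    UnsupportedZ(String),
}

/// Assumption: coordinates are plain decimal strings.
fn is_decimal(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit())
}

/// Accept only the two encodings named in the notes above; reject the rest.
fn classify_g1(coords: &[&str]) -> Result<G1Encoding, G1ParseError> {
    if coords.len() != 3 {
        return Err(G1ParseError::WrongLength(coords.len()));
    }
    for (i, c) in coords.iter().enumerate() {
        if !is_decimal(c) {
            return Err(G1ParseError::NotDecimal(i));
        }
    }
    match (coords[0], coords[1], coords[2]) {
        ("0", "1", "0") => Ok(G1Encoding::Identity),
        (x, y, "1") => Ok(G1Encoding::Affine {
            x: x.to_string(),
            y: y.to_string(),
        }),
        (_, _, z) => Err(G1ParseError::UnsupportedZ(z.to_string())),
    }
}
```

Rejecting every other `z` value, rather than normalizing general projective points, keeps the accepted input surface as narrow as the notes describe.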
Choose the shortest route that matches the task instead of reading the whole repo every time.
If you need the current truth fast:
1. `README.md`
2. `AGENTS.md` up through `Fast Context Load`
3. `docs/architecture.md`
If you need Groth16 verifier context:
1. `crates/wrapper-circuits/src/groth16.rs`
2. `crates/wrapper-backends/src/snarkjs.rs`
3. `crates/wrapper-tests/fixtures/groth16/circom_multiplier2/README.md`
4. `crates/wrapper-circuits/src/groth16/profiling.rs`
If you need wrapper planning / execution-package context:
1. `crates/wrapper-backends/src/groth16.rs`
2. `crates/wrapper-core/src/job.rs`
3. `crates/wrapper-core/src/package.rs`
4. `crates/wrapper-core/src/output.rs`
5. `crates/wrapper-core/src/execution.rs`
6. `crates/wrapper-cli/src/main.rs`
If you need the current Semaphore migration fixture context:
1. `crates/wrapper-tests/fixtures/groth16/semaphore/README.md`
2. `crates/wrapper-tests/src/lib.rs`
3. `crates/wrapper-backends/src/groth16.rs`
4. `crates/wrapper-cli/src/main.rs`
If you need the current direct-lane Semaphore runbook:
1. `docs/semaphore-direct-execution-playbook.md`
2. `crates/wrapper-tests/fixtures/groth16/semaphore/README.md`
3. `crates/wrapper-tests/src/lib.rs`
4. `crates/wrapper-cli/src/main.rs`
If you need the ZK Email integration study context:
1. `docs/plans/0004-zk-email-integration-plan.md`
2. `crates/wrapper-tests/fixtures/groth16/semaphore/README.md`
3. `crates/wrapper-tests/src/lib.rs`
4. `crates/wrapper-backends/src/groth16.rs`
5. `docs/plans/0003-plutus-aiken-integration-plan.md`
If you need the remaining path to a real .circom end-to-end wrapper flow:
1. `docs/outer-prover-strategy-plan.md`
2. `docs/real-circom-wrapper-integration-plan.md`
3. `docs/r1cs-backend-status.md`
4. `docs/decisions/0003-direct-outer-setup-cost-reduction.md`
5. `crates/wrapper-backends/src/outer.rs`
6. `crates/wrapper-circuits/src/outer/mod.rs`
7. `crates/wrapper-core/src/output.rs`
8. `crates/wrapper-core/src/execution.rs`
If you need outer wrapper circuit context:
1. `docs/outer-wrapper-circuit-layered-walkthrough.md`
2. `docs/plans/0008-outer-vk-public-binding-plan.md`
3. `crates/wrapper-circuits/src/outer/mod.rs`
4. `crates/wrapper-circuits/src/outer/input.rs`
5. `crates/wrapper-circuits/src/outer/statement.rs`
6. `crates/wrapper-backends/src/outer.rs`
7. `docs/outer-prover-strategy-plan.md`
If you need pairing-core / final-exponentiation context:
1. `crates/wrapper-circuits/src/bn254/g2/miller.rs`
2. `crates/wrapper-circuits/src/bn254/host/pairing_host.rs`
3. `crates/wrapper-circuits/src/bn254/tests/pairing.rs`
4. `docs/midnight-optimizations.md`
5. `docs/profiling.md`
If you need Midnight-local optimization context:
1. `docs/midnight-optimizations.md`
2. `docs/decisions/0002-bn254-local-optimization-policy.md`
3. `docs/plans/0005-halo2-row-optimization-plan.md`
4. `docs/plans/0007-pairing-kernel-opportunity-audit-plan.md`
5. `crates/wrapper-circuits/src/bn254/types.rs`
6. `crates/wrapper-circuits/src/bn254/fp2.rs`
7. `crates/wrapper-circuits/src/bn254/fp6.rs`
8. `crates/wrapper-circuits/src/bn254/g2/miller.rs`
9. `docs/profiling.md`
10. `docs/plans/0002-cyclotomic-unitary-kernel-design.md`
If you need BN254 primitive structure / ownership context:
1. `crates/wrapper-circuits/src/bn254/mod.rs`
2. `crates/wrapper-circuits/src/bn254/traits.rs`
3. `crates/wrapper-circuits/src/bn254/host/mod.rs`
4. `docs/architecture.md`
If you need CLI / measurement context:
1. `crates/wrapper-circuits/src/planning.rs`
2. `crates/wrapper-circuits/src/groth16/profiling.rs`
3. `crates/wrapper-cli/src/main.rs`
4. `docs/profiling.md`
5. `docs/benchmarking.md`
6. `docs/midnight-optimizations.md`
If you need deferred speed follow-up context after the current h_poly memory
blocker is solved:
1. `docs/h-poly-followup-speed-plan.md`
2. `docs/finalize-successful-run-metrics.md`
3. `docs/decisions/0003-direct-outer-setup-cost-reduction.md`
4. `docs/decisions/0004-local-midnight-proofs-patch.md`
If you need the current successful direct prove-finalize baseline with real
timings and peak memory:
1. `docs/finalize-successful-run-metrics.md`
2. `docs/h-poly-followup-speed-plan.md`
3. `docs/plans/0006-finalize-checkpoint-profiling-plan.md`
4. `patches/midnight-proofs/src/plonk/evaluation.rs`
If you need ultra-fine finalize profiling context for the current memory blocker:
1. `docs/plans/0006-finalize-checkpoint-profiling-plan.md`
2. `docs/decisions/0003-direct-outer-setup-cost-reduction.md`
3. `docs/decisions/0004-local-midnight-proofs-patch.md`
4. `patches/midnight-proofs/src/plonk/prover.rs`
5. `patches/midnight-proofs/src/plonk/mod.rs`
6. `patches/midnight-proofs/src/plonk/evaluation.rs`
If you need stage boundaries / "is this in scope?" context:
1. `AGENTS.md` `Current Phase` and `Scope Boundaries`
2. `docs/roadmap.md`
3. `docs/architecture.md`
When you need to build context quickly, read in this order:
1. `crates/wrapper-circuits/src/groth16.rs`
2. `crates/wrapper-backends/src/snarkjs.rs`
3. `crates/wrapper-tests/fixtures/groth16/circom_multiplier2/README.md`
4. `crates/wrapper-circuits/src/bn254/mod.rs`
5. `crates/wrapper-circuits/src/bn254/traits.rs`
6. `crates/wrapper-circuits/src/bn254/host/mod.rs`
7. `crates/wrapper-circuits/src/bn254/fp2.rs`, `fp6.rs`, `fp12.rs`
8. `crates/wrapper-circuits/src/bn254/g2/mod.rs`
9. `crates/wrapper-circuits/src/bn254/g2/affine.rs`
10. `crates/wrapper-circuits/src/bn254/g2/jacobian.rs`
11. `crates/wrapper-circuits/src/bn254/g2/miller.rs`
12. `crates/wrapper-circuits/src/bn254/host/pairing_host.rs`
13. `crates/wrapper-circuits/src/bn254/tests/support.rs`
14. `crates/wrapper-circuits/src/bn254/tests/pairing.rs`
15. `crates/wrapper-circuits/src/groth16/profiling.rs`
16. `crates/wrapper-circuits/src/outer/mod.rs`
17. `crates/wrapper-backends/src/groth16.rs`
18. `crates/wrapper-backends/src/outer.rs`
19. `crates/wrapper-core/src/job.rs`, `package.rs`, `output.rs`, `execution.rs`
20. `crates/wrapper-circuits/src/planning.rs`, `crates/wrapper-cli/src/main.rs`
21. `docs/outer-prover-strategy-plan.md`
22. `docs/profiling.md`
23. `docs/midnight-optimizations.md`
This is the highest-signal order for understanding the current primitive surface, reusable helpers, and measured costs.
Use each top-level doc for one job:
- `README.md`: fastest repo snapshot, workspace map, contributor commands, and entry points
- `AGENTS.md`: binding scope, architectural boundaries, staged constraints, and code-touching rules
- `docs/architecture.md`: crate ownership, data flow, and primitive-layer boundaries
- `docs/roadmap.md`: what stage the repo is in and what remains explicitly out of scope
- `docs/profiling.md`: how to measure layout cost and compare optimization baselines
- `docs/benchmarking.md`: benchmark naming, bench-info wiring, and benchmark/reporting sync rules
- `docs/plans/0005-halo2-row-optimization-plan.md`: ordered implementation plan for row-count optimization work on the current BN254 pairing-core lane, with a ready-to-start Phase 1
- `docs/midnight-optimizations.md`: prioritized Midnight primitives and local optimization candidates that already proved useful or look promising for the BN254 tower / pairing path
- `docs/decisions/0002-bn254-local-optimization-policy.md`: durable retained/rejected optimization decisions for the BN254 pairing-core lane
- `docs/decisions/0003-direct-outer-setup-cost-reduction.md`: accepted direction for reducing direct outer setup cost via a lean setup artifact and later params caching
- `docs/decisions/0004-local-midnight-proofs-patch.md`: accepted rationale for carrying a local `midnight-proofs` patch to support richer direct setup/prove artifacts
- `docs/plans/0006-finalize-checkpoint-profiling-plan.md`: implementation plan for ultra-fine `prove-finalize` checkpoint logging, iteration heartbeats, memory snapshots, elapsed-time profiling, and real-time log inspection
- `docs/plans/0007-pairing-kernel-opportunity-audit-plan.md`: prioritized audit and experiment plan for new pairing kernels that must reduce real base-arithmetic cost rather than merely repackage existing formulas
- `docs/plans/0008-outer-vk-public-binding-plan.md`: implementation plan for binding the outer proof publicly to a specific inner Groth16 verification key via a public commitment rather than a witness-only VK
- `docs/h-poly-followup-speed-plan.md`: deferred speed-oriented follow-ups for the retained chunked `h_poly` path once the current memory blocker is solved
- `docs/finalize-successful-run-metrics.md`: successful `prove-finalize` baseline report with real wall-clock, peak-memory, and hotspot rankings for the current chunked direct lane
- `docs/plans/0002-cyclotomic-unitary-kernel-design.md`: proposed compressed-torus-region design for repeated `cyclotomic * unitary_inverse(cyclotomic)` sites in the hard part
- `docs/plans/0004-zk-email-integration-plan.md`: phased plan for the first larger Circom-origin integration track using ZK Email as the reference case
- `docs/real-circom-wrapper-integration-plan.md`: phased implementation plan for finishing the real `.circom` -> outer-wrapper end-to-end path
- `docs/r1cs-backend-status.md`: current status of the canonical R1CS line and why it is currently an alternate backend / later phase
- `docs/outer-prover-strategy-plan.md`: current proving-strategy decision and direct backend surface for the canonical Halo2/Midnight outer circuit
- `docs/outer-wrapper-circuit-layered-walkthrough.md`: layer-by-layer explanation of the canonical outer circuit, from normalized input and statement semantics through host-lane wiring, Groth16 verification, and pairing-core enforcement
- `docs/semaphore-direct-execution-playbook.md`: operational direct-lane commands, host recommendation, chunk-size starting point, semantic public-input naming, and artifact layout for the committed Semaphore fixture
When adding a new major doc, update this list and at least one context route so future agents know when to read it.
- `crates/wrapper-core`: domain models, traits, config, errors, metadata, capability/status reporting, wrapper-job planning, execution packages, expected output-artifact shapes, and execution results
- `crates/wrapper-circuits`: Halo2-facing code, Midnight-backed BN254 primitive layer, planning, layout reporting
- `crates/wrapper-backends`: backend adapter placeholders, artifact parsing entry points, generic Groth16 artifact-set loading, and bundle-to-wrapper planning adapters
- `crates/wrapper-cli`: developer commands and diagnostics
- `crates/wrapper-tests`: shared fixtures, benchmark entry points, and integration helpers
- `docs/architecture.md`: intended layering and current primitive boundaries
- `docs/roadmap.md`: staged implementation plan
- `docs/benchmarking.md`: benchmark structure and conventions
- `docs/profiling.md`: reproducible layout-profiling workflow for the current Groth16 slice
- `docs/midnight-optimizations.md`: local Midnight-backed optimization guidance for repeated tower operations and fixed-constant arithmetic
- `docs/outer-prover-strategy-plan.md`: strategy document for the remaining prover/backend decision on the outer Halo2/Midnight circuit
- `docs/outer-wrapper-circuit-layered-walkthrough.md`: detailed circuit walkthrough of the canonical outer wrapper lane
- `docs/decisions/0001-initial-workspace-structure.md`: ADR for the workspace split
- `docs/decisions/0002-bn254-local-optimization-policy.md`: ADR for retained versus rejected BN254 local pairing-core optimizations
wrapper-core
- Must remain mostly domain-oriented.
- Prefer no Halo2 dependency unless a boundary cannot be expressed otherwise.
- Own shared enums, traits, config structs, metadata, capabilities, and stable public concepts.
- Own wrapper-job planning, execution-package, expected-output, and execution-result concepts when they can stay proving-system-agnostic.
- Must not absorb chip-specific or region-specific logic.
wrapper-circuits
- Own Halo2-facing code, Midnight integration, circuit planning, and primitive gadget boundaries.
- Own the canonical outer wrapper circuit under `src/outer/`; backend crates may adapt inputs into that circuit but must not define a second competing circuit source of truth.
- Currently owns the BN254 `AssignedFp`, `AssignedFp2`, `AssignedFp6`, `AssignedFp12`, `AssignedG1`, `AssignedG2Affine`, narrow `AssignedG2Projective`, Miller-path `AssignedG2MillerPoint`, `AssignedG2LineCoeffs`, and `AssignedMillerAccumulator` circuit-backed layer.
- Keeps the active BN254 primitive implementation under `src/bn254/`, split by concern instead of one monolithic file.
- The current `g2/` subtree is split by model: `g2/affine.rs`, `g2/jacobian.rs`, `g2/miller.rs`, with `g2/mod.rs` holding shared aliases, constants, helpers, and re-exports.
- The current host-side support is split by concern under `bn254/host/`: `host/mod.rs` for the shared tower surface, `host/g2_host.rs` for G2/Jacobian/Miller host constants, `host/pairing_host.rs` for final-exponentiation host formulas.
- Reuse `bn254/host/` before duplicating tuple-based host/reference arithmetic across `fp2.rs`, `fp6.rs`, `fp12.rs`, or `g2/mod.rs`.
- Reuse `bn254/traits.rs` before adding more tiny wrapper-specific circuit boilerplate in `fp2.rs`, `fp6.rs`, or `fp12.rs`.
- The current BN254 test tree is split by concern under `bn254/tests/`: `tests/mod.rs` as the root, `tests/support.rs` for shared arkworks/Midnight helpers and test fixtures, `tests/field_and_tower.rs` for field/Fp2/Fp6/Fp12 coverage, `tests/curve.rs` for G1/G2/projective/line-extraction coverage, `tests/accumulator.rs` for accumulator/sparse-line/mixed-add-consumption coverage, `tests/pairing.rs` for the pairing-core lane.
- Reuse `bn254/tests/support.rs` before adding new arkworks/Midnight conversion helpers or duplicating host-side reference formulas in test modules.
- Keep expensive pairing-core assertions in `tests/pairing.rs` and cheaper primitive/G2 coverage in `tests/field_and_tower.rs`, `tests/curve.rs`, and `tests/accumulator.rs` so the slow lane remains explicit.
- Prefer short public methods over formula-heavy bodies: keep APIs as orchestration layers and move algebraic steps into internal helpers with explicit names.
- Should depend on `wrapper-core`.
- Must not absorb artifact parsing or backend-specific concerns.
- Keep dead compatibility shims and obsolete host-side leftovers out of the crate.
wrapper-backends
- Own parsing, loaders, serialization adapters, and future ecosystem bridges.
- Should depend on `wrapper-core`.
- Must not define circuit semantics independently of `wrapper-circuits`.
- It now owns the generic `ArtifactSetLoader` contract plus the current `snarkjs` Groth16 BN254 bundle loader.
- It may adapt parsed bundles into domain-level wrapper jobs/packages, but should not absorb application-specific public-input naming.
wrapper-cli
- Own user-facing commands, output formatting, and developer diagnostics.
- Must report missing functionality honestly.
- Should expose measured primitive status without overstating what is implemented.
- The current narrow optimization-baseline surface is `profile-layout`, which emits TSV layout metrics for Groth16, pairing-term scaling, public-input scaling, and existing pairing-core blocks.
- Treat `profile-layout` as layout/constraint profiling, not runtime benchmarking.
- The current planning/execution surfaces for wrapper experiments are `inspect-groth16-bundle`, `plan-wrapper-job`, `export-wrapper-job`, `export-wrapper-package`, `execute-wrapper-stub`, `execute-wrapper-direct`, `execute-wrapper-direct-setup`, `execute-wrapper-direct-prove`, `execute-wrapper-direct-prove-trace`, `execute-wrapper-direct-prove-finalize`, and `execute-wrapper-direct-verify`.
- For the committed Semaphore fixture, the current operational runbook lives in `docs/semaphore-direct-execution-playbook.md`; it recommends starting from `midnight-bls12381-host`, persisting artifacts under `artifacts/direct-profile-semaphore/`, and carrying the semantic public-input names through the direct CLI via repeated `--public-input-name` flags.
- Direct execution commands now enforce a `24 GiB` process memory limit.
- The direct setup artifact now persists verification materials plus a proving-key sidecar.
- Artifact hygiene rule for future agents:
  - when changing setup-producing code or patch state, delete setup artifacts produced before that change before trusting later prove/finalize runs
  - when changing `prove-trace`-producing code or persisted trace format, delete previously materialized trace artifacts and trace logs before rerunning
  - when changing `prove-finalize`-producing code or finalized proof-bundle format, delete previously materialized finalized proof artifacts and finalize logs before rerunning
- The repo now carries a local `[patch.crates-io]` override for `midnight-proofs` under `patches/midnight-proofs`.
- That local patch adds `BaseProvingKey`, `keygen_pk_base(...)`, `create_proof_from_base(...)`, `create_proof_trace_from_base(...)`, and `finalise_proof_from_base_trace(...)` so richer direct setup/prove artifacts can be expressed without waiting on upstream.
- The direct prove path now avoids rerunning `keygen_pk(...)` in the wrapper backend.
- Current known limitation: the next suspected memory hotspot is eager coset materialization inside `compute_h_poly(...)` (`advice_cosets` / `instance_cosets`) in the patched prover.
- The split `execute-wrapper-direct-prove-trace` / `execute-wrapper-direct-prove-finalize` flow exists so the pre-`compute_h_poly(...)` phase can be cached as an artifact between experiments.
- Current active debugging focus: reduce memory pressure around `compute_h_poly(...)` and identify the last successful finalize subphase from the emitted finalize checkpoints.
- `execute-wrapper-direct-prove-finalize` now exposes `--h-poly-row-chunk-size`; it is optional and should normally be omitted unless you are tuning memory after an OOM.
- That flag accepts a base-2 exponent rather than a raw row count; for example, `16` means `65536` rows.
- Current Semaphore guidance: start from `--h-poly-row-chunk-size 13` only as an operational baseline inherited from `circom_multiplier2`; do not treat it as a Semaphore-specific optimum until a real Semaphore run confirms it.
- One successful measured lean setup run on `circom_multiplier2` produced `circuit_k = 21`, `public_input_count = 1`, and `setup_elapsed_ms = 1554572` (about `25m 54s`).
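
The two numeric conventions in the notes above (the base-2 exponent accepted by `--h-poly-row-chunk-size`, and the millisecond setup timing) can be sanity-checked with a minimal Rust sketch. Both function names are hypothetical and are not taken from the CLI source.

```rust
/// Convert the base-2 exponent passed to `--h-poly-row-chunk-size`
/// into a raw row count. Returns `None` when the exponent is 64 or
/// larger, because the shift would not fit in a `u64`.
/// Hypothetical helper; the real CLI may validate differently.
fn chunk_rows_from_exponent(exponent: u32) -> Option<u64> {
    1u64.checked_shl(exponent)
}

/// Format a millisecond duration as whole minutes and seconds,
/// truncating the fractional second.
/// Hypothetical helper for reading `setup_elapsed_ms` values.
fn format_minutes_seconds(elapsed_ms: u64) -> String {
    let total_secs = elapsed_ms / 1000;
    format!("{}m {}s", total_secs / 60, total_secs % 60)
}
```

With these helpers, an exponent of `16` maps to `65536` rows and the Semaphore starting point `13` maps to `8192` rows, while `1554572` ms formats as `25m 54s`, matching the figures quoted above.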
wrapper-tests
- Own fixtures, shared test helpers, and benchmark entry points.
- Should host cross-crate integration coverage and Criterion runners.
- Should not become a dumping ground for reusable circuit logic that belongs in `wrapper-circuits`.
- Preserve the separation between domain, circuits, and backends unless there is a documented reason to change it.
- Update `docs/architecture.md` and the relevant ADR when changing public architecture or ownership boundaries.
- Prefer narrow interfaces in `wrapper-core` over leaking circuit implementation details across crates.
- Keep the current primitive layer small and explicit.
- Do not introduce speculative abstractions for later pairing/verifier work unless they are needed immediately.
- Add dependencies conservatively.
- Prefer workspace-managed dependency versions.
- Heavy cryptographic dependencies must earn their place through current-stage use, tests, and CI viability.
- Document why a new dependency belongs now, not later.
- Feature-gate stage-specific dependencies when appropriate.
- For current circuit work, prefer existing Midnight and arkworks infrastructure over inventing parallel stacks.
- Start from documented interfaces and roadmap items.
- Add tests and docs alongside any cryptographic implementation.
- Keep arithmetic, ECC, pairing, and verifier logic explicit and reviewable.
- Do not hide critical behavior behind abstractions that obscure invariants or cost.
- Record major cryptographic architecture choices in `docs/decisions/`.
- Treat retained or rejected local pairing-core optimization directions as ADR-worthy once they materially change the measured baseline or future search space.
- Prefer extending the existing Midnight-backed BN254 layer over creating a second incompatible primitive stack.
The current primitive layer is built around:
- `midnight-circuits`
- `midnight-proofs`
- `midnight-curves`
- arkworks as the reference implementation in tests
When touching the current BN254 primitive code:
- keep `fp` work limited to the currently supported primitive surface unless the roadmap explicitly expands it
- keep `fp2` work aligned with the current representation `Fq2(c0, c1)` and `u^2 = -1`
- keep `fp6` work aligned with the current representation `Fq6(c0, c1, c2)` and `v^3 = 9 + u`
- keep `fp12` work aligned with the current representation `Fq12(c0, c1)` and `w^2 = v`
- keep extension-field wrapper circuits aligned with the shared `AssignedCircuitValue` synthesize helpers unless there is a clear reason not to
- keep G1 work limited to the currently supported primitive surface unless the roadmap explicitly expands it
- keep G2 work limited to the currently supported affine plus narrow Jacobian projective surface unless the roadmap explicitly expands it
- keep Miller-path G2 work aligned with the homogeneous prepared-step formulas used by arkworks BN prepared-G2 generation
- keep final exponentiation work aligned with the standard BN easy-part / hard-part decomposition used by arkworks unless a measured circuit-oriented rewrite clearly improves the current slice
- keep pairing-check work verifier-shaped: accumulate Miller outputs first, apply exactly one final exponentiation to the total product, and avoid per-term final exponentiation
- for final-exponentiation and local tower optimization work, read `docs/midnight-optimizations.md` first so you inherit the current proved-useful Midnight primitives and ruled-out local paths
- when looking for local tower wins, read `docs/midnight-optimizations.md` before inventing new gadgets; it records which `midnight-circuits` primitives (`mul_by_constant`, `linear_combination`, `add_constant`, etc.) already paid off in this repo, which ones were explicitly tried and ruled out, and which wins are only local instead of pairing-core-wide
- when a public method contains a full algebraic step, prefer extracting the formula into a well-named internal helper such as `double_step_jacobian`, `double_step_hom_projective`, or `mixed_add_step_hom_projective`
- preserve real layout measurement support
- keep benchmarks honest and tied to actually implemented circuits
- keep CLI reporting aligned with the measured state of the codebase
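
The `fp2` convention in the list above (`(c0, c1)` coordinate order with `u^2 = -1`) and the `9 + u` constant used for `fp6` can be illustrated host-side with a toy field. This is only a sketch: the prime `103` is an assumption chosen so the arithmetic fits in `u64` (it satisfies `p ≡ 3 (mod 4)`, so `-1` is a non-residue, as for BN254), and the `Fp2` type and its methods are hypothetical, not the repository's `AssignedFp2` or host helpers.

```rust
/// Toy prime with p ≡ 3 (mod 4), so `u^2 = -1` defines a quadratic
/// extension. Assumption for illustration; NOT the BN254 base prime.
const P: u64 = 103;

/// Element `c0 + c1 * u` with coordinate order `(c0, c1)`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Fp2 {
    c0: u64,
    c1: u64,
}

impl Fp2 {
    /// Build an element, reducing both coordinates modulo `P`.
    fn new(c0: u64, c1: u64) -> Self {
        Fp2 { c0: c0 % P, c1: c1 % P }
    }

    /// Coordinate-wise addition.
    fn add(self, rhs: Self) -> Self {
        Fp2::new(self.c0 + rhs.c0, self.c1 + rhs.c1)
    }

    /// Coordinate-wise negation.
    fn neg(self) -> Self {
        Fp2::new(P - self.c0, P - self.c1)
    }

    /// Schoolbook product using `u^2 = -1`:
    /// (a0 + a1 u)(b0 + b1 u) = (a0 b0 - a1 b1) + (a0 b1 + a1 b0) u
    fn mul(self, rhs: Self) -> Self {
        let a0b0 = self.c0 * rhs.c0 % P;
        let a1b1 = self.c1 * rhs.c1 % P;
        let cross = (self.c0 * rhs.c1 + self.c1 * rhs.c0) % P;
        // Add P before subtracting so the u64 never underflows.
        Fp2::new(a0b0 + P - a1b1, cross)
    }

    /// Squaring expressed through `mul` to keep the sketch short.
    fn square(self) -> Self {
        self.mul(self)
    }

    /// Multiply by the constant `9 + u` without a general product:
    /// (a0 + a1 u)(9 + u) = (9 a0 - a1) + (a0 + 9 a1) u
    /// For this toy prime, `9 + u` is not claimed to be a cubic
    /// non-residue; only the multiplication formula is illustrated.
    fn mul_by_nonresidue(self) -> Self {
        Fp2::new(9 * self.c0 + P - self.c1, self.c0 + 9 * self.c1)
    }
}
```

The specialised `mul_by_nonresidue` mirrors the kind of fixed-constant shortcut that tower code typically uses in place of a full `mul`; whether such a shortcut pays off in-circuit is a measurement question, not something this sketch settles.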
Concrete BN254 conventions already in use:
AssignedFp2follows the standard BN254 extension representationa + buFq2coordinate order is(c0, c1)to match arkworksu^2 = -1AssignedFp6followsc0 + c1 * v + c2 * v^2Fq6coordinate order is(c0, c1, c2)to match arkworks- the cubic nonresidue is
9 + u, soFp6 = Fp2[v] / (v^3 - (9 + u)) AssignedFp12followsc0 + c1 * wFq12coordinate order is(c0, c1)to match arkworks- the quadratic nonresidue is
v = Fp6(0, 1, 0), soFp12 = Fp6[w] / (w^2 - v) - Miller-path G2 line coefficients use the sparse BN254 D-twist layout
(ell_0, ell_w, ell_vw) - evaluating those coefficients at a G1 affine point
(x_P, y_P)yieldsell_0 * y_P + ell_w * x_P * w + ell_vw * v * w - that sparse embedding maps directly into Fp12 slots
(c0, c3, c4)for the latermul_by_034-style Miller accumulator path - the public boundary for that consumption is
AssignedMillerAccumulator::mul_by_line(...), not a direct public helper onAssignedFp12 - Miller-path
double_with_lineandmixed_add_with_linefollow the homogeneous-projective BN prepared-G2 formulas used by arkworks / Midnight, not the Jacobian formulas used byAssignedG2Projective - final exponentiation follows the standard BN254 easy-part / hard-part split used by arkworks over the Miller-loop output
- the narrow pairing-check path computes each real Miller loop, multiplies the Miller outputs in `Fp12`, applies exactly one final exponentiation, and checks equality with the `Fp12` multiplicative identity
- the current Groth16 verifier route fully precomputes the fixed verifier-key term `e(alpha, beta)` into a GT constant, precomputes Miller-step line coefficients off-circuit for the remaining constant verifier-key G2 terms (`gamma_g2`, `delta_g2`), and feeds only those prepared lines into the interleaved multi-Miller loop; only the proof term stays on the variable G2 path
- the current final-exponentiation code now exposes `final_exponentiation_easy_part(...)` and `final_exponentiation_hard_part(...)` as audit-friendly internal helpers without changing semantics
- the current hard-part hotspot is still the repeated `exp_by_neg_x(...)` lane; read `docs/profiling.md` plus `docs/midnight-optimizations.md` before changing it so you inherit the current measured state and the local Midnight primitives that already paid off
- the current best class of local wins came from replacing generic constant multiplies in repeated tower helpers with Midnight-backed `mul_by_constant(...)`; check `docs/midnight-optimizations.md` before changing repeated `Fp2` / `Fp6` / `Fp12` transforms
- the current repo evidence says foreign-field `linear_combination(...)` is not an automatic optimization win: an April 27, 2026 pass that rewrote `AssignedFp2::mul_by_constant(...)`, `AssignedFp6::mul_by_nonresidue_fp2(...)`, and the Fp12 `3t +/- 2z` helpers regressed `fp12 cyclotomic square` (`1622 -> 1886`), `final exponentiation` (`587420 -> 678119`), and `pairing check` (`1682524 -> 1805233`), so treat that exact rewrite family as ruled out unless you have a materially different constraint shape
- if you revisit `linear_combination(...)` in the BN254 tower, compare against the retained `mul_by_constant(...)` baseline and do not keep the rewrite unless `wrapper-cli doctor` or `profile-layout --family blocks` shows a clear row win
- the current repo evidence says `add_constant(...)` does have one retained win: folding the fixed BN254 twist coefficient directly into `AssignedG2Affine::assert_on_curve(...)` improved `g2 on_curve` (`400 -> 378`), `g2 neg` (`930 -> 886`), `g2 proj from_affine` (`970 -> 948`), `g2 proj double` (`2594 -> 2550`), `g2 proj add` (`4582 -> 4516`), `g2 double_with_line` (`2698 -> 2654`), and `g2 mixed_add_with_line` (`3374 -> 3330`)
- treat that `add_constant(...)` result as a local G2 / Miller-prep win, not as proof that pairing-core blocks will move; `miller loop`, `final exponentiation`, and `pairing check` rows stayed unchanged in `profile-layout --family blocks`
- the current repo evidence says the obvious `select` / `is_equal*` / `is_zero` cleanup on the final GT identity check is row-neutral: an April 27, 2026 pass that wrapped the manual coordinate checks into composite `Fp2` / `Fp6` / `Fp12` boolean equality helpers left both `wrapper-cli doctor` and `profile-layout --family blocks` unchanged, so do not keep that rewrite for performance alone
- the current retained `exp_by_neg_x(...)` chain is now a signed-window schedule from `35` with steps `<<6`, `-35`, `<<9`, `+101`, `<<8`, `-83`, `<<9`, `+37`, `<<9`, `+105`, `<<11`, `+79`, `<<5`, `+17`; it wins because one extra cyclotomic square in precomputation is cheaper than the saved main-chain multiplication
- the next retained subgroup-aware win after that was compressed cyclotomic squaring inside the repeated square blocks of `exp_by_neg_x(...)`; it improved `final_exponentiation_hard_part` from `561254` to `492083` and `final_exponentiation` from `574562` to `505391`
- the first torus-style prototype for `cyclotomic * unitary_inverse(cyclotomic)` was also a non-win when applied only to `y7` inside the hard part: it regressed `final_exponentiation_hard_part` from `561254` to `571604` and `final_exponentiation` from `574562` to `584912`, so do not retry isolated call-site torus substitutions
- the broad `CyclotomicFp12MulChip` rollout over `y3`, `y9`, `y10`, and `y11` was also a non-win: it regressed `final_exponentiation_hard_part` from `561254` to `561344` and `final_exponentiation` from `574562` to `574652`, so do not retry chip-level repackaging of the ambient Fp12 multiplication formula without a genuinely different subgroup arithmetic kernel
- the fixed BN254 `exp_by_neg_x(...)` recipe now lives in `crates/wrapper-circuits/src/bn254/final_exp_chain.rs` and is consumed by both host/reference code and the circuit path; keep that module canonical
- minimal G2 affine on-curve checks use the arkworks BN254 twist equation `y^2 = x^3 + b`
- the twist coefficient is `b = 3 / (u + 9)` with the exact arkworks value `Fq2(19485874751759354771024239261021720505790618469301721065564631296452457478373, 266929791119991161246907387137283842545076965332900288569378510910307636690)`
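
The signed-window recipe for `exp_by_neg_x(...)` can be sanity-checked on the host with plain integer arithmetic. The sketch below assumes each `<<n` entry is a left shift of the running exponent and each signed entry is added to it afterwards. `START`, `STEPS`, `evaluate_schedule`, and `main_chain_squarings` are illustrative names, not the contents of `final_exp_chain.rs`. Under that reading the chain reconstructs 4965661367192848881, the BN254 curve parameter `x`.

```rust
/// Starting window value from the documented schedule.
const START: u64 = 35;

/// (shift, signed digit) pairs, read as "shift left, then add the digit".
const STEPS: [(u32, i64); 7] = [
    (6, -35),
    (9, 101),
    (8, -83),
    (9, 37),
    (9, 105),
    (11, 79),
    (5, 17),
];

/// BN254 curve parameter x (the magnitude of the exponent).
const BN254_X: u64 = 4_965_661_367_192_848_881;

/// Replays the schedule on the exponent itself rather than on an Fp12 element.
fn evaluate_schedule(start: u64, steps: &[(u32, i64)]) -> u64 {
    steps.iter().fold(start, |acc, &(shift, digit)| {
        let shifted = acc << shift;
        if digit >= 0 {
            shifted + digit as u64
        } else {
            shifted - digit.unsigned_abs()
        }
    })
}

/// In this exponent model, each shifted bit corresponds to one squaring.
fn main_chain_squarings(steps: &[(u32, i64)]) -> u32 {
    steps.iter().map(|&(shift, _)| shift).sum()
}
```

`evaluate_schedule(START, &STEPS)` equals `BN254_X`, so any edit to the recipe that breaks this equality changes the exponent, not just the cost.
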
Current measured primitive costs from wrapper-cli doctor:
- `fp add`: 40 rows / 58 queries, `k=9`
- `fp mul`: 38 rows / 58 queries, `k=9`
- `fp2 add`: 80 rows / 58 queries, `k=9`
- `fp2 mul`: 152 rows / 58 queries, `k=9`
- `fp2 square`: 114 rows / 58 queries, `k=9`
- `fp6 add`: 240 rows / 58 queries, `k=9`
- `fp6 mul`: 1252 rows / 58 queries, `k=11`
- `fp6 square`: 736 rows / 58 queries, `k=10`
- `fp12 add`: 480 rows / 58 queries, `k=9`
- `fp12 mul`: 4076 rows / 58 queries, `k=12`
- `fp12 square`: 2594 rows / 58 queries, `k=12`
- `fp12 cyclotomic square`: 1622 rows / 58 queries, `k=11`
- `g1 add`: 319 rows / 105 queries, `k=9`
- `g2 on_curve`: 378 rows / 58 queries, `k=9`
- `g2 neg`: 886 rows / 58 queries, `k=10`
- `g2 proj from_affine`: 948 rows / 58 queries, `k=10`
- `g2 proj double`: 2550 rows / 58 queries, `k=12`
- `g2 proj add`: 4516 rows / 58 queries, `k=13`
- `g2 double_with_line`: 2654 rows / 58 queries, `k=12`
- `g2 mixed_add_with_line`: 3330 rows / 58 queries, `k=12`
- `miller accumulator square`: 2714 rows / 58 queries, `k=12`
- `miller accumulator mul_by_line`: 4248 rows / 58 queries, `k=13`
- `miller accumulator mul_by_line sparse`: 2592 rows / 58 queries, `k=12`
- `miller loop narrow`: 457060 rows / 58 queries, `k=19`
- `final exponentiation`: 505391 rows / 58 queries, `k=19`
- `pairing check`: 1600495 rows / 94 queries, `k=21`
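
A Halo2 table has `2^k` rows, so the `k` column has to cover the measured row count. The sketch below reproduces the listed `k` values from the row counts alone. It ignores blinding rows, and the floor of `9` is an assumption inferred from the table (every small primitive reports `k=9`), not a constant taken from `wrapper-cli`. `min_k` and `OBSERVED_K_FLOOR` are hypothetical names.

```rust
/// Assumption: lowest k that appears in the measured table.
const OBSERVED_K_FLOOR: u32 = 9;

/// Smallest k with 2^k >= rows, clamped to the observed floor.
fn min_k(rows: u64) -> u32 {
    let fit = rows.max(1).next_power_of_two().trailing_zeros();
    fit.max(OBSERVED_K_FLOOR)
}
```

For example, `min_k(4076)` is `12` because 4076 still fits under 4096, while `min_k(4248)` is `13`; this is why `miller accumulator mul_by_line` sits one `k` above its sparse variant.
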
Interpretation guidance:
- `g2 neg` is not a measure of a raw sign flip alone; the current benchmark circuit includes assignment, on-curve checks, negation, and equality against the expected output
- `fp12 mul` and `fp12 square` are measurements of the actual sanity circuits over the implemented tower, not optimized pairing-ready kernels
- `fp12 cyclotomic square` is a subgroup-only specialization for the final-exponentiation hard part; it must not be treated as a general Fp12 square
- `g2 double_with_line` and `g2 mixed_add_with_line` are measurements of the actual Miller-step sanity circuits, not a full Miller loop
- `miller accumulator mul_by_line` is the generic baseline path, while `miller accumulator mul_by_line sparse` is the optimized public accumulator path for the current BN254 D-twist `(ell_0, ell_w, ell_vw)` layout
- `miller loop narrow` now measures the real fixed single-pair BN254 optimal-ate Miller traversal, not the earlier synthetic schedule
- `final exponentiation` measures the narrow single-pair BN254 final-exponentiation sanity circuit over a Miller-loop output, not a verifier-facing full pairing API
- `profile-layout --family blocks` now also exposes `final exponentiation easy part` and `final exponentiation hard part`; the current measured split is `12288` rows / `k=14` for the easy part and `492083` rows / `k=19` for the hard part, so future optimization work should focus overwhelmingly on the hard part
- `docs/midnight-optimizations.md` is the canonical short list of Midnight primitives and local optimization targets; keep it updated when a new `midnight-circuits` primitive proves useful or a local candidate is ruled out
- `pairing check` should always be described as the narrow verifier-shaped product-check slice with one shared final exponentiation, not as a full pairing engine or Groth16 verifier
- as of the current repo state, local accumulator-square rewrites that only swap formulas inside the existing Fp12 tower did not beat the generic `miller accumulator square` cost; future square optimization likely needs a more structural/cross-step design rather than a small algebraic rewrite, so do not keep partial `square_optimized` experiments in the tree unless they measurably win in `wrapper-cli doctor`
- as of the current repo state, the obvious foreign-field `linear_combination(...)` replacements for short `Fp2` affine transforms are also ruled out by measurement; do not re-land that family of rewrites without fresh `doctor` / `profile-layout --family blocks` evidence
- as of the current repo state, the retained `add_constant(...)` win is specifically the fixed-twist-coefficient path inside G2 on-curve arithmetic; if a new `add_constant(...)` idea does not involve a truly fixed offset already present in the formula, be skeptical and measure it before keeping the rewrite
- as of the current repo state, the tested `select` / `is_equal*` / `is_zero` cleanup for GT-identity checking is performance-neutral; only keep a rewrite in that family if it buys clarity or unlocks later branching logic, not because it is expected to lower rows by itself
- as of the current repo state, the most promising structural local lever after the easy wins was indeed the `exp_by_neg_x(...)` chain itself; signed windows are now the retained direction, so future attempts should compare against the current signed schedule rather than the older all-positive one
- as of the current repo state, torus/compressed representations for cyclotomic-unitary products remain only a design path, not a retained optimization; the `y7`-only prototype already lost, so future work must amortize compression across a longer region or it is unlikely to win
- as of the current repo state, an explicit `CyclotomicFp12MulChip` that simply packages the current quadratic-over-`Fp6` product as a subgroup-aware gadget is also ruled out by measurement; future Halo2-side gadget work must bring genuinely different arithmetic, not just a different wrapper over the same tower operations
- as of the current repo state, compressed cyclotomic squaring is the retained subgroup-aware arithmetic optimization; future work should compare against that implementation before revisiting any torus or explicit-mul-chip design
- cost numbers should always be described as measurements of the actual sanity circuits, not abstract algebraic lower bounds
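
The keep-or-revert rule running through this guidance reduces to comparing measured rows before and after a rewrite. A minimal sketch of that comparison, using row counts quoted in this document; `Verdict` and `judge` are hypothetical helpers, not part of `wrapper-cli`.

```rust
use std::cmp::Ordering;

/// Outcome of comparing measured rows before and after a rewrite.
#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    /// Rows dropped by this amount; candidate to keep.
    Win(u64),
    /// Rows unchanged; keep only for clarity, never for performance.
    Neutral,
    /// Rows grew by this amount; revert.
    Regression(u64),
}

fn judge(before_rows: u64, after_rows: u64) -> Verdict {
    match after_rows.cmp(&before_rows) {
        Ordering::Less => Verdict::Win(before_rows - after_rows),
        Ordering::Equal => Verdict::Neutral,
        Ordering::Greater => Verdict::Regression(after_rows - before_rows),
    }
}
```

With the hard-part numbers above, `judge(561254, 492083)` is `Win(69171)` for compressed cyclotomic squaring, while the `y7` torus prototype gives `judge(561254, 571604) == Regression(10350)`.
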
- Prefer explicit, readable Rust over cleverness.
- Use crate-level docs and module docs when they clarify purpose.
- Keep comments purposeful and sparse.
- Keep circuit-backed adapters thin where possible.
- Avoid duplicate primitive stacks or parallel APIs for the same concept.
- Prefer removing obsolete compatibility files once the Midnight-backed path replaces them.
- Do not leave misleading stubs that imply verifier completeness.
- Delete files that have become genuinely unused instead of keeping stale alternative paths around.
- Use `thiserror` for library-facing error types.
- Use `anyhow` at CLI or orchestration boundaries where context aggregation is helpful.
- Errors should state what failed, at what boundary, and whether the feature is intentionally unimplemented.
- If a circuit path is deliberately unsupported at this stage, say so explicitly instead of faking behavior.
- Do not keep custom error layers that are no longer used after an integration shift.
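
As a sketch of the error-message shape described above: the enum below is written with `std` only so the snippet stays self-contained, whereas in this repo the `Display` impl would normally come from `thiserror`'s `#[error("...")]` attributes. `PrimitiveError` and `mixed_add_guard` are hypothetical names; the `P = Q` case is used only because it is documented here as explicitly unsupported.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical library-facing error: names the boundary, what failed,
/// and whether the gap is intentional.
#[derive(Debug)]
enum PrimitiveError {
    Unsupported { boundary: &'static str, feature: &'static str },
    Failed { boundary: &'static str, what: &'static str, reason: String },
}

impl fmt::Display for PrimitiveError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Unsupported { boundary, feature } => write!(
                f,
                "{}: {} is intentionally unimplemented at this stage",
                boundary, feature
            ),
            Self::Failed { boundary, what, reason } => {
                write!(f, "{}: {} failed: {}", boundary, what, reason)
            }
        }
    }
}

impl Error for PrimitiveError {}

/// Rejects the exceptional case explicitly instead of faking behavior.
fn mixed_add_guard(p_equals_q: bool) -> Result<(), PrimitiveError> {
    if p_equals_q {
        Err(PrimitiveError::Unsupported {
            boundary: "Miller-path G2",
            feature: "mixed_add_with_line with P = Q",
        })
    } else {
        Ok(())
    }
}
```

At a CLI boundary the same error could then be wrapped with `anyhow` context rather than being re-described.
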
- Every new public behavior should have at least one test at the owning layer.
- Keep unit tests near the crate that owns the behavior.
- Use arkworks as the reference implementation for BN254 field, Fp2, G1, and minimal G2 affine sanity checks when appropriate.
- Keep randomized tests deterministic via fixed seeds unless there is a strong reason not to.
- Use `wrapper-tests` for shared fixtures, integration coverage, and benchmark entry points.
- Do not add tests that imply pairing or verifier support before those stages exist.
- Keep the default local test lane practical. Expensive pairing-core `MockProver` tests in `tests/pairing.rs` should be marked `#[ignore = "slow pairing-core"]` unless they are truly cheap smoke coverage.
- The intended split is:
  - always-run: field arithmetic, narrow G1/G2 primitives, Miller-step / accumulator tests, and cheap host-side pairing-core structure checks
  - slow pairing-core: real Miller-loop, final-exponentiation, and pairing-check `MockProver` end-to-end tests
- To run the slow pairing-core lane explicitly, use `cargo test -p wrapper-circuits -- --ignored`.
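
A minimal sketch of the two lanes with fixed-seed inputs. The generator is a small SplitMix64 written inline so the snippet needs no external crates; the repo's real tests may use a seeded RNG crate instead. `SplitMix64`, `sample_inputs`, and both test names are hypothetical, and the ignored test is only a stand-in for a `MockProver` run: it verifies no proof.

```rust
/// Tiny deterministic generator so the example is self-contained.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Same seed, same inputs: failures are reproducible across runs.
fn sample_inputs(seed: u64, count: usize) -> Vec<u64> {
    let mut rng = SplitMix64(seed);
    (0..count).map(|_| rng.next_u64()).collect()
}

/// Always-run lane: cheap, deterministic.
#[test]
fn fixed_seed_inputs_are_stable() {
    assert_eq!(sample_inputs(7, 4), sample_inputs(7, 4));
}

/// Slow lane: skipped by default, selected with `-- --ignored`.
#[test]
#[ignore = "slow pairing-core"]
fn slow_lane_stand_in() {
    // Stand-in workload only; a real test here would drive MockProver.
    let inputs = sample_inputs(7, 1 << 16);
    assert_eq!(inputs.len(), 1 << 16);
}
```
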
Current test expectations for the primitive layer:
- `Fp2` tests should include algebra identities, deterministic randomized add/mul/square checks, and edge-oriented real/imaginary cases
- `Fp6` tests should include algebra identities, deterministic randomized add/mul/square checks, and structured single-coordinate cases
- `Fp12` tests should include algebra identities, deterministic randomized add/mul/square checks, and structured `c0`-only / `c1`-only cases
- minimal G2 tests should include valid affine points, negative on-curve cases, negation validity, and equality behavior
- narrow G2 projective tests should stay explicit about the supported domain: `from_affine`, `neg`, `double`, incomplete `add`, and reserved identity encoding
- Miller-path G2 tests should cover `double_with_line`, `mixed_add_with_line`, sparse `Fp12` embedding, and explicitly unsupported exceptional cases such as `P = Q`
- for the current narrow Miller slice, keep a few stable fixed fixtures alongside deterministic randomized checks: generator-based `double_with_line`, generator-based `double + add`, baseline-vs-sparse `mul_by_line` cross-checks, and at least one longer deterministic prepared schedule
- explicitly keep unsupported Miller mixed-add cases documented by tests for both `P = Q` and `P = -Q`; do not silently widen support claims just because randomized tests pass
- if a test needs a host-side reference formula, put the logic in `tests/support.rs` and keep the domain files focused on cases/assertions
- if a test-local helper becomes shared across multiple test groups, move it into `tests/support.rs` in the same refactor rather than leaving partial duplicates behind
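
To illustrate the shape of the `Fp2` expectations (algebra identities plus deterministic randomized checks), here is a host-side sketch over a toy prime. It is not BN254 arithmetic: `P` is the Mersenne prime 2^31 - 1, chosen only because `P % 4 == 3` keeps `u^2 = -1` irreducible, as for BN254. The `Fp2` type and `identities_hold` are hypothetical stand-ins for the arkworks-backed reference checks.

```rust
/// Toy modulus (2^31 - 1), NOT the BN254 base field.
const P: u64 = 2_147_483_647;

/// Element c0 + c1 * u with u^2 = -1, coordinates in (c0, c1) order.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Fp2 {
    c0: u64,
    c1: u64,
}

impl Fp2 {
    fn new(c0: u64, c1: u64) -> Self {
        Self { c0: c0 % P, c1: c1 % P }
    }

    fn add(self, o: Self) -> Self {
        Self::new(self.c0 + o.c0, self.c1 + o.c1)
    }

    /// (a + bu)(c + du) = (ac - bd) + (ad + bc)u
    fn mul(self, o: Self) -> Self {
        let ac = self.c0 * o.c0 % P;
        let bd = self.c1 * o.c1 % P;
        let ad = self.c0 * o.c1 % P;
        let bc = self.c1 * o.c0 % P;
        Self::new(ac + P - bd, ad + bc)
    }

    /// (a + bu)^2 = (a + b)(a - b) + 2ab * u
    fn square(self) -> Self {
        let sum = (self.c0 + self.c1) % P;
        let diff = (self.c0 + P - self.c1) % P;
        Self::new(sum * diff % P, 2 * self.c0 % P * self.c1 % P)
    }
}

/// Fixed-seed randomized identity checks (simple 64-bit LCG as the source).
fn identities_hold(seed: u64, rounds: usize) -> bool {
    let mut state = seed;
    let mut next = move || {
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        state >> 33
    };
    (0..rounds).all(|_| {
        let a = Fp2::new(next(), next());
        let b = Fp2::new(next(), next());
        let c = Fp2::new(next(), next());
        a.square() == a.mul(a)
            && a.mul(b) == b.mul(a)
            && a.mul(b.add(c)) == a.mul(b).add(a.mul(c))
    })
}
```

An edge-oriented imaginary case falls out directly: `Fp2::new(0, 1).square()` equals `Fp2::new(P - 1, 0)`, which is `u^2 = -1` in this toy field.
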
- Use Criterion.
- Keep benchmark names in the `bench_<module>_<operation>` form.
- Benchmarks must reflect real implemented circuits, not aspirational future behavior.
- Do not make performance claims beyond what the current benchmark actually measures.
- When changing benchmark structure, update `docs/benchmarking.md` and `wrapper-cli bench-info`.
- For current Groth16 optimization baselines, prefer `wrapper-cli profile-layout` over ad hoc timing or new benchmark scaffolding.
- `profile-layout` output is TSV and intended to be redirected to a file for before/after diffs.
- The `groth16`, `pairing-terms`, and `all` profiling families are intentionally heavier than `blocks` and `public-inputs`; let them finish before inspecting the output file, or the TSV may appear empty/incomplete.
- The `blocks` profiling family now includes `bn254_final_exponentiation_easy_part`, `bn254_final_exponentiation_hard_part`, and total `bn254_final_exponentiation`; use those rows before changing the final-exponentiation chain.
- The current `pairing-terms` profiling family models one variable proof-like G2 term and the remaining terms as prepared constant verifier-key-style G2 terms, so it is intended as a Groth16-relevant scaling proxy rather than an all-variable pairing benchmark.
Current benchmark entry points include:
- `bench_fp_add`
- `bench_fp_mul`
- `bench_fp2_add`
- `bench_fp2_mul`
- `bench_fp2_square`
- `bench_fp6_add`
- `bench_fp6_mul`
- `bench_fp6_square`
- `bench_fp12_add`
- `bench_fp12_mul`
- `bench_fp12_square`
- `bench_g1_add`
- `bench_g2_on_curve`
- `bench_g2_neg`
- `bench_g2_proj_from_affine`
- `bench_g2_proj_double`
- `bench_g2_proj_add`
- `bench_g2_double_with_line`
- `bench_g2_mixed_add_with_line`
- `bench_miller_accumulator_square`
- `bench_miller_accumulator_mul_by_line`
- `bench_miller_accumulator_mul_by_line_sparse`
- `bench_miller_loop_narrow`
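
The `bench_<module>_<operation>` form is easy to check mechanically. A small sketch, assuming names are lowercase ASCII with digits and underscores and that the module is the first segment after `bench_`; `bench_name` and `follows_convention` are hypothetical helpers, not functions in `wrapper-tests`.

```rust
/// Builds a name in the documented form.
fn bench_name(module: &str, operation: &str) -> String {
    format!("bench_{}_{}", module, operation)
}

/// Checks prefix, character set, and that both parts are non-empty.
fn follows_convention(name: &str) -> bool {
    let rest = match name.strip_prefix("bench_") {
        Some(rest) => rest,
        None => return false,
    };
    let charset_ok = rest
        .chars()
        .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_');
    let mut parts = rest.splitn(2, '_');
    let module = parts.next().unwrap_or("");
    let operation = parts.next().unwrap_or("");
    charset_ok && !module.is_empty() && !operation.is_empty()
}
```

Every entry point listed above passes this check, for example `bench_miller_accumulator_mul_by_line_sparse`, while a name such as `fp2_mul_bench` does not.
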
Benchmark/metrics integration rules that have already bitten this repo:
- `wrapper-cli bench-info` is derived from the canonical primitive registry in `crates/wrapper-circuits/src/planning.rs`; if a new primitive is missing from `bench-info`, fix the registry/layer wiring before touching docs text
- when adding a new measured primitive, keep `crates/wrapper-tests/benches/...`, `crates/wrapper-tests/benches/primitives.rs`, `crates/wrapper-circuits/src/planning.rs`, `wrapper-cli bench-info`, and `docs/benchmarking.md` in sync in the same turn
- use explicit honest names for Miller work such as `*_narrow`, `*_sparse`, or `*_baseline` when the slice is not a full pairing pipeline
- when changing Groth16 optimization-baseline reporting, keep `crates/wrapper-circuits/src/groth16/profiling.rs`, `crates/wrapper-cli/src/main.rs`, `docs/profiling.md`, and the relevant README/AGENTS references in sync in the same turn
- keep profiling identifiers stable: `family`, `id`, and `label` should remain diff-friendly across runs unless there is a deliberate reporting-schema change
- when changing final-exponentiation decomposition or local-Midnight optimization guidance, keep `crates/wrapper-circuits/src/bn254/g2/miller.rs`, `crates/wrapper-circuits/src/bn254/host/pairing_host.rs`, `crates/wrapper-circuits/src/bn254/metrics.rs`, `docs/profiling.md`, and `docs/midnight-optimizations.md` in sync in the same turn
- Update the README when implemented scope or contributor workflow changes.
- Update `docs/architecture.md` when circuit boundaries or ownership change.
- Update `docs/roadmap.md` when stage boundaries or sequencing change.
- Add or amend ADRs for architectural decisions that affect crate ownership or public interfaces.
- Be explicit about what is circuit-backed, what is reference-tested, and what is still missing.
- When cleanup removes obsolete files or paths, reflect the new simpler state in contributor docs.
When refactoring wrapper-circuits/src/bn254/:
- keep the public API stable through `bn254/mod.rs` re-exports when possible
- prefer splitting by concept, for example `types.rs`, `field.rs`, `fp2.rs`, `g2/mod.rs`, `g2/affine.rs`, `g2/jacobian.rs`, `g2/miller.rs`, `host/mod.rs`, `host/pairing_host.rs`, `metrics.rs`, `tests/mod.rs`, `tests/pairing.rs`
- if docs mention the primitive path, keep them pointed at `src/bn254/`, not the old deleted `src/bn254.rs`
- if primitive metadata, measured labels, or bench-info output changes, update the canonical registry in `wrapper-circuits/src/planning.rs` first and derive downstream surfaces from it
- after any structural refactor, update `AGENTS.md` in the same turn so it reflects the new module boundaries, reuse points, and context-loading order
- Identify the stage and boundary the change belongs to.
- Check whether the change fits an existing crate responsibility.
- Update docs first when the architecture or scope changes.
- Implement the smallest honest increment.
- Add tests that prove the increment, not future claims.
- Verify `cargo check`, `cargo test`, and relevant benches or CLI paths when applicable.
- Do not collapse crates for convenience.
- Do not place Halo2-specific concerns in `wrapper-core` without strong justification.
- Do not move application-specific public-input naming such as Semaphore field labels into generic backend parsing.
- Do not implement broad verifier-facing full pairings, multi-pairings beyond the narrow product-check slice, or Groth16 verifier logic unless the task explicitly asks for that stage.
- Do not jump from minimal G2 affine support to G2 arithmetic or subgroup logic unless the task explicitly asks for it.
- Do not write placeholder code that pretends proofs are verified.
- Do not add a second BN254 primitive implementation path that competes with the Midnight-backed one without a documented reason.
- Do not overclaim performance or soundness from the current sanity circuits.
For tasks in the current repository state, do not assume that because fp add, fp mul, fp2, minimal G1, and minimal G2 affine support exist, the project is ready for:
- generalized `Fp12` helper optimizations for pairing workloads
- pairing gadgets
- Groth16 verification
- wrapped verifier composition
- public-input verifier logic
- G2 arithmetic beyond the currently implemented narrow Jacobian `from_affine` / `neg` / `double` / incomplete `add` slice
- extending Miller-path G2 steps into a full Miller loop without a dedicated design pass
- full MSM infrastructure
Those remain future-stage work unless the task explicitly advances the roadmap.
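- Extend `wrapper-core` only with the minimal new domain concept.
- Expand `wrapper-circuits` in the narrowest possible way around the existing Midnight-backed foundation.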
wrapper-circuitsin the narrowest possible way around the existing Midnight-backed foundation. - Preserve or improve real layout visibility when adding new circuit-backed primitives.
- Add arkworks-backed or equivalent reference tests.
- Add or update benchmarks only for code that truly exists.
- Document the design decision before scaling implementation breadth.