
Methodology evolution log

Purpose. Append-only dated decision log. §0 is the compact current-state snapshot; §1 holds the chronological entries; §2 lists open methodology questions; §3 is the artefact map. For agent-facing single-page orientation, read STATUS.md at the repo root, which is the routing layer; this file is the why.

Update rule. When methodology changes, append a short dated entry to §1 (decision-shaped: trigger, decision, evidence pointer, impact, open work). If the change moves any of {deployed methodology, served τ range, active workstream, headline metrics, deployment artefact path, wire format}, also update §0 here and STATUS.md in the same commit. Long derivations belong in linked reports under reports/, not in this file.

Note on historical filename references. Pre-2026-05-04 entries below cite working docs at the project root (M6_REFACTOR.md, PHASE_7_RESULTS.md, PHASE_8.md, VALIDATION_BACKLOG.md). Those files were moved into reports/active/ and lowercased on 2026-05-04 — current paths are reports/active/m6_refactor.md, reports/active/phase_7_results.md, reports/active/phase_8.md, reports/active/validation_backlog.md. Historical entries are left unchanged to preserve the dated record.


0. State of the world

Last compacted 2026-05-02. Operational state reflects the 2026-05-01 scryer reconciliation.

Product and repo role

Soothsayer is the analysis, serving, and on-chain-publish layer for a calibration-transparent fair-value oracle for tokenized RWAs on Solana. Upstream data fetching belongs to the sibling scryer repo. Soothsayer consumes parquet from SCRYER_DATASET_ROOT, serves calibrated bands, writes derived artefacts, and builds protocol-facing policy demos.

Product progression:

  • v0 — calibrated band primitive / unified-feed router. Open-hours Layer 0 multi-upstream router plus closed-hours Soothsayer band. Devnet router deployed 2026-04-29 at AZE8HixpkLpqmuuZbCku5NbjWqoQLWhPRTHp8aMY9xNU.
  • v1 — calibrated event stream. Consumer-configured threshold events with calibration receipts; gated on Paper 3.
  • v2 — parameterized decision SDK. Client-side Rust/TS library for cost-weighted recommendations; 2027 track.

Current methodology constants (v2 / M5)

  • Architecture: Mondrian split-conformal by regime + factor-adjusted point + δ-shifted c(τ).
  • Default deployment target: τ = 0.85.
  • Headline Paper 1 validation target: τ = 0.95.
  • Served range: τ ∈ [0.68, 0.99]; M5 closes the v1 finite-sample tail ceiling at τ=0.99 at the cost of a 22% wider band.
  • Forecaster: single mondrian lookup (wire forecaster_code = 2); no per-regime forecaster choice.
  • REGIME_QUANTILE_TABLE (12 trained scalars, pre-2023 calibration set):
    • normal: {0.68: 0.006070, 0.85: 0.011236, 0.95: 0.021530, 0.99: 0.049663}
    • long_weekend: {0.68: 0.006648, 0.85: 0.014248, 0.95: 0.031032, 0.99: 0.071228}
    • high_vol: {0.68: 0.011628, 0.85: 0.021460, 0.95: 0.042911, 0.99: 0.099418}
  • C_BUMP_SCHEDULE (4 OOS-fit scalars on the 2023+ slice): {0.68: 1.498, 0.85: 1.455, 0.95: 1.300, 0.99: 1.076}.
  • DELTA_SHIFT_SCHEDULE (4 walk-forward-fit shifts): {0.68: 0.05, 0.85: 0.02, 0.95: 0.00, 0.99: 0.00}.
  • Code truth: src/soothsayer/oracle.py, crates/soothsayer-oracle/src/{config,oracle}.rs, data/processed/mondrian_artefact_v2.parquet (per-Friday rows), and data/processed/mondrian_artefact_v2.json (audit-trail sidecar). v1 bounds parquet (v1b_bounds.parquet) and v1 oracle code path are deprecated; v1 diagnostic scripts archived under scripts/v1_archive/.
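The constants above combine at serve time roughly as follows. This is a minimal sketch, not the code in src/soothsayer/oracle.py. It assumes the served half-width is c(τ′)·q_regime(τ′)·fri_close with τ′ = τ + δ(τ), and it assumes linear interpolation between the four anchors for off-anchor τ. Both rules and the function names are hypothetical; only the numeric constants come from the list above.

```python
import numpy as np

ANCHORS = (0.68, 0.85, 0.95, 0.99)

# Values copied from the M5 constants listed above.
REGIME_QUANTILE_TABLE = {
    "normal":       {0.68: 0.006070, 0.85: 0.011236, 0.95: 0.021530, 0.99: 0.049663},
    "long_weekend": {0.68: 0.006648, 0.85: 0.014248, 0.95: 0.031032, 0.99: 0.071228},
    "high_vol":     {0.68: 0.011628, 0.85: 0.021460, 0.95: 0.042911, 0.99: 0.099418},
}
C_BUMP_SCHEDULE = {0.68: 1.498, 0.85: 1.455, 0.95: 1.300, 0.99: 1.076}
DELTA_SHIFT_SCHEDULE = {0.68: 0.05, 0.85: 0.02, 0.95: 0.00, 0.99: 0.00}


def _interp(schedule, tau):
    """Linear interpolation across the anchors (assumed rule for off-anchor tau)."""
    return float(np.interp(tau, ANCHORS, [schedule[a] for a in ANCHORS]))


def serve_band(point, fri_close, regime, tau):
    """Hypothetical M5-style band: point +/- c(tau') * q_regime(tau') * fri_close."""
    if not ANCHORS[0] <= tau <= ANCHORS[-1]:
        raise ValueError(f"tau={tau} outside served range [0.68, 0.99]")
    # Delta-shift the requested target, capped at the top anchor.
    tau_eff = min(tau + _interp(DELTA_SHIFT_SCHEDULE, tau), ANCHORS[-1])
    q = _interp(REGIME_QUANTILE_TABLE[regime], tau_eff)  # Mondrian lookup by regime
    c = _interp(C_BUMP_SCHEDULE, tau_eff)               # OOS-fit bump factor
    half_width = c * q * fri_close
    return point - half_width, point + half_width
```

For example, serve_band(100.0, 100.0, "normal", 0.95) gives a band of about ±2.80 around the point (1.300 × 0.021530 × 100). The long_weekend and high_vol regimes give proportionally wider bands at the same τ.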

Validated empirical claims

Held-out 2023+ slice: 1,720 rows, 172 weekends, 10 tickers.

| τ | Realized | Kupiec | Christoffersen | Status |
|---|---|---|---|---|
| 0.68 | 0.678 | 0.893 | 0.647 | pass |
| 0.85 | 0.855 | 0.541 | 0.185 | pass |
| 0.95 | 0.950 | 1.000 | 0.485 | pass |
| 0.99 | 0.977 | rejects | 0.956 | disclose ceiling |
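For reference, the Kupiec and Christoffersen columns correspond to the standard unconditional-coverage (proportion-of-failures) and first-order Markov independence likelihood-ratio tests on the violation indicator. Below is a minimal single-series sketch using SciPy. The function names are illustrative, and the way the repo pools the panel across tickers and weekends is not shown here.

```python
import numpy as np
from scipy.special import xlogy
from scipy.stats import chi2


def kupiec_pof(hits, tau):
    """Kupiec proportion-of-failures LR test; hits[t] = 1 if realised fell outside the band."""
    hits = np.asarray(hits, dtype=int)
    n, x = hits.size, int(hits.sum())
    alpha = 1.0 - tau        # nominal violation rate
    alpha_hat = x / n        # realised violation rate
    # xlogy(0, y) == 0, which handles the x = 0 and x = n edge cases.
    ll_null = xlogy(n - x, 1.0 - alpha) + xlogy(x, alpha)
    ll_alt = xlogy(n - x, 1.0 - alpha_hat) + xlogy(x, alpha_hat)
    lr = max(-2.0 * (ll_null - ll_alt), 0.0)
    return lr, float(chi2.sf(lr, df=1))


def christoffersen_independence(hits):
    """Christoffersen LR test that violations do not cluster (first-order Markov)."""
    h = np.asarray(hits, dtype=int)
    prev, curr = h[:-1], h[1:]
    n00 = int(np.sum((prev == 0) & (curr == 0)))
    n01 = int(np.sum((prev == 0) & (curr == 1)))
    n10 = int(np.sum((prev == 1) & (curr == 0)))
    n11 = int(np.sum((prev == 1) & (curr == 1)))
    pi01 = n01 / (n00 + n01) if n00 + n01 else 0.0   # P(hit | no hit yesterday)
    pi11 = n11 / (n10 + n11) if n10 + n11 else 0.0   # P(hit | hit yesterday)
    pi = (n01 + n11) / (n00 + n01 + n10 + n11)
    ll_null = xlogy(n00 + n10, 1.0 - pi) + xlogy(n01 + n11, pi)
    ll_alt = (xlogy(n00, 1.0 - pi01) + xlogy(n01, pi01)
              + xlogy(n10, 1.0 - pi11) + xlogy(n11, pi11))
    lr = max(-2.0 * (ll_null - ll_alt), 0.0)
    return lr, float(chi2.sf(lr, df=1))
```

A realised violation rate exactly at nominal (e.g. 5 hits in 100 at τ = 0.95) gives a Kupiec p of about 1, which is the pattern in the τ = 0.95 row.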

Important diagnostics:

  • Walk-forward: deployed τ=0.95 buffer sits at cross-split mean; τ=0.85 buffer is conservative.
  • Inter-anchor sweep: 47/50 targets pass Kupiec; the failures are over-coverage at τ=0.50/0.51 and the τ=0.99 tail ceiling.
  • Leave-one-out: pooled calibration transfers to unseen tickers; 26/30 cells pass Kupiec.
  • Reviewer-tier diagnostics: per-anchor calibration is supported; full-distribution PIT uniformity is not claimed.
  • Window sweep: 156-weekend calibration window is the only tested window passing all three main anchors simultaneously.

Detailed evidence lives in reports/v1b_calibration.md, reports/v1b_ablation.md, reports/v1b_diagnostics_extended.md, reports/v1b_window_sensitivity.md, and reports/v1b_leave_one_out.md.

Data spine

Hard rule: no upstream fetching in Soothsayer. Use soothsayer.config.SCRYER_DATASET_ROOT and soothsayer.sources.scryer loaders.

Current important scryer datasets:

  • Core oracle/paper data: yahoo/equities_daily, yahoo/earnings, yahoo/corp_actions, nasdaq/halts, backed/corp_actions, backed/nav_strikes, cme/intraday_1m.
  • Oracle tapes: pyth/oracle_tape, chainlink_data_streams/report_tape, redstone/oracle_tape, kamino_scope/oracle_tape, soothsayer_v5/tape.
  • Market / protocol data: geckoterminal/trades, kraken/funding, solana_dex/xstock_swaps, kamino/liquidations, marginfi/reserves.

If docs and disk disagree, verify the live root and then update the doc; do not infer path names from older shorthand.

Paper 1 status

Paper 1 validates the calibration-transparent oracle primitive, not welfare-optimal protocol policy. Draft sections exist under reports/paper1_coverage_inversion/. Remaining operational work:

  • Finish method/data/serving sections and coherence pass.
  • Keep comparator wording clean: flat ±300bps is a stylized continuity baseline, not the literal Kamino incumbent.
  • Update Chainlink v10/v11 / 24-5 cadence framing from the latest decoder and scryer evidence.
  • Add caveats for Wayback halt sparsity and any live-xStock claim that is not backed by tape.
  • Consider replacing daily-factor weekend DiD with cme/intraday_1m before submission.

Paper 3 status

Paper 3 is now a three-claim liquidation-policy paper:

  • Geometric. Real per-reserve adverse-move buffers split Kamino-xStocks into narrow-buffer SPYx/QQQx (~2.7%) and wide-buffer remaining reserves (14-25%).
  • Structural. Soothsayer's band avoids Kamino's block-state failure mode from PriceHeuristic, TWAP divergence, or staleness gates.
  • Empirical. Kamino-xStocks is the xStock event panel: kamino/liquidations/v1 has 102 events over 2025-08 to 2026-04, including a 2025-11 cluster. The earlier finding of zero events in a 30-day window was a sampling artifact.

Deployment framing is two substrates, two questions:

  • MarginFi is the cleanest deployment-substrate argument for general lending because its convention (assets valued at P-conf, liabilities at P+conf) maps directly to (lower, upper).
  • Kamino-xStocks is the xStock-specific empirical home. MarginFi has zero direct xStock Banks among 422 scanned Banks; xStock exposure there is indirect through Kamino-routed oracle setups and becomes cross-protocol propagation evidence when MarginFi liquidations land.
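Here is a minimal sketch of that mapping. It assumes a simplified weighted-collateral health check in which the published lower edge plays the role of P-conf and the upper edge plays the role of P+conf. The Band type, the health_margin function, and the weights are hypothetical; this is not MarginFi's actual risk engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    """A published band; lower/upper stand in for P - conf / P + conf."""
    lower: float
    point: float
    upper: float


def health_margin(asset_units, asset_band, asset_weight,
                  liab_units, liab_band, liab_weight):
    """Weighted assets minus weighted liabilities under the band convention."""
    # Assets are valued at the lower edge (conservative for the lender) ...
    weighted_assets = asset_units * asset_band.lower * asset_weight
    # ... and liabilities at the upper edge, so both sides absorb the band width.
    weighted_liabs = liab_units * liab_band.upper * liab_weight
    return weighted_assets - weighted_liabs
```

A wider band lowers the margin on both sides at once. This is the sense in which band width acts as the collateral buffer, without any protocol-side heuristic.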

Current Paper 3 next work:

  • Draft §1/§2/§3/§4/§6 around the three-claim structure.
  • Analyze the 2025-11 Kamino liquidation cluster.
  • Fit dynamic-bonus curve / D_repaid distribution from kamino/liquidations/v1.
  • Extend protocol comparison to reserve-buffer truth, path-aware truth, and class-disaggregated results.

Router / relay status

Router v0 is deployed on devnet and tested against a real Pyth Pull SOL/USD feed. Layer 0 includes Pyth aggregate, Switchboard On-Demand, Chainlink Streams Relay PDA read, and RedStone placeholder. Open-hours Layer 1 calibration is gated on ~3 months of upstream forward tape.

Relay fleet lock:

  • Chainlink Streams needs a relay program + daemon.
  • Pyth equities need only a poster daemon using Pyth's existing receiver; no new program.
  • Production commitments: verifier-CPI or upstream signature verification, open-source daemons, v1 multi-writer migration, public cadence/uptime reporting, no-position policy with an auditable enforcement mechanism.

Calendar lock:

  • Router embeds NYSE full-close and early-close table for 2024-2027 plus DST-safe UTC windows.
  • Next refresh trigger: next ICE PR, ad-hoc SEC closure, or 2027 window approaching.
  • CME calendar remains deferred until a CME-tracked asset is added to router config.
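Below is a minimal sketch of a DST-safe UTC session window. It assumes the post-2007 US DST rule (second Sunday of March to first Sunday of November), the standard 09:30 to 16:00 ET session, and 13:00 ET early closes. The holiday sets are illustrative excerpts, not the router's embedded 2024-2027 table; confirm entries against the ICE/NYSE holiday release. The function names are hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone

REGULAR_OPEN = time(9, 30)
REGULAR_CLOSE = time(16, 0)
EARLY_CLOSE = time(13, 0)

# Illustrative excerpts only, not the full embedded table.
FULL_CLOSES = {date(2026, 1, 1), date(2026, 12, 25)}
EARLY_CLOSES = {date(2026, 11, 27), date(2026, 12, 24)}


def _nth_sunday(year, month, n):
    first = date(year, month, 1)
    # weekday(): Monday=0 ... Sunday=6
    return first + timedelta(days=(6 - first.weekday()) % 7 + 7 * (n - 1))


def _eastern_offset(day):
    """EDT (UTC-4) from the 2nd Sunday of March to the 1st Sunday of November, else EST.

    Trading days are never Sundays, so the 02:00 local switch-over never falls
    inside a session and a date-level comparison is sufficient.
    """
    edt = _nth_sunday(day.year, 3, 2) <= day < _nth_sunday(day.year, 11, 1)
    return timezone(timedelta(hours=-4 if edt else -5))


def nyse_session_utc(day):
    """(open, close) in UTC for a NYSE trading day, or None if the market is closed."""
    if day.weekday() >= 5 or day in FULL_CLOSES:
        return None
    tz = _eastern_offset(day)
    close = EARLY_CLOSE if day in EARLY_CLOSES else REGULAR_CLOSE
    open_utc = datetime.combine(day, REGULAR_OPEN, tzinfo=tz).astimezone(timezone.utc)
    close_utc = datetime.combine(day, close, tzinfo=tz).astimezone(timezone.utc)
    return open_utc, close_utc
```

The Friday before the 2026 spring switch opens at 14:30 UTC, and the following Monday opens at 13:30 UTC. A fixed "open at 14:30 UTC" constant would silently misplace the window for about eight months of the year.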

Paper 4 / product-stack status

Paper 4 is the AMM arc: calibration-conditioned liquidity / auditable LVR-recovery lower bound for RWA AMM pools. It is post-grant, but its panel needs forward tapes now.

Scryer item 51 is already locked for Phase A capture:

  • jito_bundle_tape.v1
  • validator_client.v1
  • clmm_pool_state.v1
  • dlmm_pool_state.v1
  • dex_xstock_swaps.v1 promotion/backfill/forward-poll

Soothsayer-side future work after rows exist: pool-state reconstructor, path-aware truth labeller, bundle-attribution labels, counterfactual replay engine. See reports/paper4_oracle_conditioned_amm/scryer_pipeline_plan.md.


1. Recent decision log

2026-05-04 — M6 σ̂ promoted from K=26 trailing window to EWMA HL=8 (Phase 5)

Trigger. Phase 2 §11 discussion-list item 2 flagged that LWC has split-date Christoffersen rejections at τ=0.95 at the 2021 and 2022 anchors (p = 0.0065 / 0.0016). The per-symbol scale series was slowly varying, so after a calm streak σ̂ was under-estimated going into vol shocks, and violations clustered at regime boundaries. M6_REFACTOR.md Phase 5 specced an EWMA σ̂ prototype to test whether shortening the effective memory closes the rejection.

Variants tested. Three pure EWMA half-lives (HL ∈ {6, 8, 12} weekends, λ = 0.5^(1/HL)) plus one convex blend (0.5 · σ̂_K26 + 0.5 · σ̂_EWMA_HL8). All five variants (the K=26 baseline plus the four candidates) use the baseline's warm-up rule (≥ 8 past obs), so evaluable rows are identical at 5,916 / 5,996 (80 dropped at panel start). Driver: scripts/run_sigma_ewma_variants.py. Full evidence: reports/m6_sigma_ewma.md.
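The following is a minimal per-symbol sketch of the three σ̂ shapes. It assumes σ̂ is the RMS of strictly past weekend residuals. The exact residual definition and the bodies of add_sigma_hat_sym_ewma / add_sigma_hat_sym_blend are not reproduced here, and these function names are hypothetical. In a panel, each function would be applied per symbol (e.g. via a pandas groupby).

```python
import numpy as np


def ewma_sigma_pre(resid, half_life=8.0, min_obs=8):
    """EWMA scale from strictly past residuals (a 'pre-Friday' estimate).

    lam = 0.5 ** (1 / half_life): an observation half_life weekends old
    carries half the weight of the most recent one.
    """
    lam = 0.5 ** (1.0 / half_life)
    out = np.full(len(resid), np.nan)
    num = den = 0.0
    for t, r in enumerate(resid):
        if t >= min_obs:              # t past observations are available
            out[t] = np.sqrt(num / den)
        num = lam * num + r * r       # fold in r_t only after emitting sigma_t
        den = lam * den + 1.0
    return out


def trailing_sigma_pre(resid, k=26, min_obs=8):
    """K-weekend trailing RMS over strictly past residuals (the K=26 baseline shape)."""
    resid = np.asarray(resid, dtype=float)
    out = np.full(len(resid), np.nan)
    for t in range(min_obs, len(resid)):
        window = resid[max(0, t - k):t]
        out[t] = np.sqrt(np.mean(window ** 2))
    return out


def blend_sigma_pre(resid, w=0.5):
    """Convex blend variant: w * sigma_K26 + (1 - w) * sigma_EWMA_HL8."""
    return w * trailing_sigma_pre(resid) + (1.0 - w) * ewma_sigma_pre(resid)
```

Because the EWMA down-weights old calm weekends geometrically, it lifts σ̂ faster than the 26-weekend window after a volatility shock. Shortening the effective memory in this way is the mechanism the Phase 5 prototype was testing.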

Findings (split-date Christoffersen over 16 cells × 5 variants, plus per-symbol Berkowitz / Kupiec and a bootstrap CI on width):

  • EWMA HL=8 is the only variant with zero split-date Christoffersen rejections at α=0.05 across the 16-cell (4 splits × 4 τ) grid. The two cells the brief was targeting (2021 / 2022 × τ=0.95) jump from p ∈ {0.0065, 0.0016} → {0.1153, 0.1861}. HL=6 has 3 rejections at τ=0.68 (memory too short); HL=12 has 1 rejection at 2021 × τ=0.85; the convex blend has 0 rejections but is dominated by HL=8 on margins.
  • Per-symbol Kupiec at τ=0.95 stays 10/10 under HL=8 (baseline 10/10) — no symbol crosses the α=0.05 threshold. The pre-existing per-symbol Berkowitz outliers (TLT, TSLA, GOOGL) split: GOOGL clears at α=0.01 under HL=8 (LR 14.45 → 10.24); TLT and TSLA still reject (cross-sectional common-mode, σ̂-rule-orthogonal).
  • Pooled half-width at τ=0.95: 385.3 → 370.6 bps (-3.83%). Block-bootstrap 95-CI upper on Δhw% across all τ = +0.25% (max at τ=0.68; ≤ 0% at τ=0.85/0.95/0.99). Promotion gate: ≤ +5%. Coverage-neutral (Δrealised CI straddles zero everywhere).
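Below is a minimal sketch of a width-delta CI that resamples whole weekends, so that rows sharing a weekend move together. This is an i.i.d. cluster bootstrap over weekends, a simplified stand-in: the block scheme used by run_sigma_ewma_variants.py is not specified here, and the function name is hypothetical.

```python
import numpy as np


def weekend_bootstrap_delta_hw(weekend_ids, hw_base, hw_new, n_boot=2000, seed=0):
    """Point estimate and 95% percentile CI of the % change in mean half-width."""
    hw_base = np.asarray(hw_base, dtype=float)
    hw_new = np.asarray(hw_new, dtype=float)
    _, inv = np.unique(np.asarray(weekend_ids), return_inverse=True)
    k = int(inv.max()) + 1
    # Per-weekend sums: a resampled weekend contributes all of its rows.
    sum_base = np.bincount(inv, weights=hw_base, minlength=k)
    sum_new = np.bincount(inv, weights=hw_new, minlength=k)
    rng = np.random.default_rng(seed)
    draws = rng.integers(0, k, size=(n_boot, k))   # weekends drawn with replacement
    ratio = sum_new[draws].sum(axis=1) / sum_base[draws].sum(axis=1)
    delta_pct = 100.0 * (ratio - 1.0)
    point = 100.0 * (sum_new.sum() / sum_base.sum() - 1.0)
    lo, hi = np.percentile(delta_pct, [2.5, 97.5])
    return float(point), float(lo), float(hi)
```

The promotion gate in the decision below compares the upper end of this kind of interval against +5%.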

Decision. Promote EWMA HL=8 as the canonical M6 σ̂ rule. The promotion criterion (split-date Christoffersen + per-symbol Kupiec ≥ 8/10 + bootstrap-95-CI hw upper ≤ +5%) passes uniquely at HL=8.
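The promotion criterion is a conjunction of three checks. Below is a minimal encoding with a hypothetical VariantSummary record. The field names are illustrative. Reading the Christoffersen clause as "zero split-date rejections at α = 0.05" and the Kupiec clause as applying at τ = 0.95 are assumptions drawn from the findings list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantSummary:
    name: str
    split_christoffersen_rejections: int  # rejections at alpha = 0.05 over the 4 splits x 4 tau grid
    per_symbol_kupiec_pass: int           # symbols passing Kupiec (tau = 0.95 assumed), out of 10
    hw_delta_ci_upper_pct: float          # bootstrap 95% CI upper bound on delta half-width, in %


def passes_promotion_gate(v, min_kupiec_pass=8, max_hw_upper_pct=5.0):
    """All three clauses must hold for a sigma-hat variant to be promoted."""
    return (
        v.split_christoffersen_rejections == 0
        and v.per_symbol_kupiec_pass >= min_kupiec_pass
        and v.hw_delta_ci_upper_pct <= max_hw_upper_pct
    )
```

With the HL=8 figures reported above (0 rejections, 10/10 Kupiec, +0.25% CI upper), passes_promotion_gate(VariantSummary("ewma_hl8", 0, 10, 0.25)) evaluates to True.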

Operational changes (2026-05-04).

  1. src/soothsayer/backtest/calibration.py — added add_sigma_hat_sym_ewma, add_sigma_hat_sym_blend; compute_score_lwc gained a scale_col parameter. SIGMA_HAT_K=26 still exposed for archival.
  2. scripts/build_lwc_artefact.py gained a --variant {baseline_k26, ewma_hl8} flag (default ewma_hl8). Sidecar gains _lwc_variant, sigma_hat.method, sigma_hat.half_life_weekends, sigma_hat.raw_column.
  3. data/processed/lwc_artefact_v1.{parquet,json} rebuilt under HL=8. The Oracle's read-path column name (sigma_hat_sym_pre_fri) is preserved; only what populates it changed.
  4. New canonical freeze data/processed/lwc_artefact_v1_frozen_20260504.{json,parquet} (sha 7b86d17a76912aa0…). K=26 freeze archived to lwc_artefact_v1_archive_baseline_k26_20260504.{json,parquet} (renamed outside the _frozen_* glob so forward-tape auto-discovery picks the new freeze unambiguously).
  5. Smoke test (scripts/smoke_oracle.py --forecaster lwc) and 11 non-bot pytest tests pass. M5 forecaster path byte-for-byte unchanged.
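The rename in step 4 matters because discovery keys on a filename glob. Below is a minimal sketch of such a rule. It assumes the forward-tape tooling picks the lexicographically latest <stem>_frozen_<YYYYMMDD>.json. The function names and the "latest date wins" rule are assumptions, not the actual discovery code. Under this rule the archived K=26 files no longer match the pattern, so the HL=8 freeze is the only candidate.

```python
from fnmatch import fnmatchcase
from pathlib import Path


def pick_latest_frozen(names, stem="lwc_artefact_v1"):
    """Return the latest '<stem>_frozen_*.json' name; YYYYMMDD suffixes sort lexicographically."""
    pattern = f"{stem}_frozen_*.json"
    matches = sorted(n for n in names if fnmatchcase(n, pattern))
    if not matches:
        raise FileNotFoundError(f"no file matching {pattern}")
    return matches[-1]


def latest_frozen_artefact(root, stem="lwc_artefact_v1"):
    """Directory wrapper around pick_latest_frozen."""
    root = Path(root)
    names = [p.name for p in root.iterdir() if p.is_file()]
    return root / pick_latest_frozen(names, stem)
```

Had the K=26 freeze kept a _frozen_20260504 name alongside the HL=8 one, a rule like this would have had two same-date candidates to choose between.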

Evidence. reports/m6_sigma_ewma.md (full pack); reports/tables/sigma_ewma_summary.csv, sigma_ewma_split_sensitivity.csv, sigma_ewma_per_symbol.csv, sigma_ewma_bootstrap.csv, and 5 sigma_ewma_<variant>_delta_sweep.csv. Driver: scripts/run_sigma_ewma_variants.py.

Impact. Paper-revision-ready evidence pack for the σ̂ swap is in place. Phase 6 (sample-size simulation) and Phase 7 (Rust port) will consume the EWMA HL=8 implementation. Wire format unchanged — forecaster_code = 3 reservation still holds; the canonical M6 read path is variant-agnostic at the parquet column level.

Open work. TLT / TSLA Berkowitz remain rejected at α=0.01 — cross-sectional common-mode, M6a territory (W8 r̄_w predictor still gating). 2021 × τ=0.99 Kupiec rejection (p≈0.018) persists across all variants — small-sample artefact at the tail-edge anchor. Phase 6 will test whether HL=8 generalises as N shrinks toward HOOD's regime (~200 weekends) and toward newly-listed-symbol admission thresholds.

2026-05-03 — Paper 1 robustness pass + v3 bake-off; M5 frozen as Paper 1, v3 candidates routed as roadmap evidence

Trigger. Paper-1 review pass enumerated eight reviewer-anticipated robustness gaps. After running them, three v3 methodology candidates were specced and bake-off-tested against M5 to inform §10 future-work routing. The bake-off ran after Paper 1's validation loop closed; this entry records the explicit decision to keep M5 as the Paper 1 primitive and route C1/C2/C4 as roadmap evidence rather than retroactive paper revision.

Robustness findings (reports/v1b_paper1_robustness.md, reports/tables/v1b_robustness_*.csv):

  • Vol-tertile sub-split refutes the "coarse classifier" diagnosis. 5-cell sub-split of normal regime leaves Berkowitz LR unchanged (173.0 → 175.0) while widening τ=0.95 band by 9%. Lag-1 decomposition (v1b_density_rejection_lag1_decomposition.csv) confirms the AR(1) signal lives in cross-sectional within-weekend ordering (ρ_cross = 0.354, p<10⁻¹⁰⁰), not temporal-within-symbol (ρ_time = -0.032, p=0.18).
  • Per-symbol calibration is bimodal. SPY/QQQ/GLD/TLT/AAPL reject Berkowitz from variance compression (bands too wide); TSLA/HOOD/MSTR reject from variance expansion (bands too narrow); NVDA/GOOGL pass. HOOD per-symbol Kupiec FAILS at τ ∈ {0.68, 0.85, 0.95} (13.9% violation rate at τ=0.95); passes at τ=0.99. Disclosed in Paper 1 §6.4.1 / §9.4.
  • Split-date sensitivity passes cleanly. OOS-anchors {2021, 2022, 2023, 2024}: realised τ=0.95 ∈ {0.9507, 0.9502, 0.9503, 0.9504}, all Kupiec p > 0.86. "Lucky on 2023" foreclosed.
  • LOSO shows moderate fragility. Heavy-tail held-out tickers undercovered (MSTR 0.786, HOOD 0.856, TSLA 0.879); well-behaved tickers over-covered (SPY/TLT 1.000). Consistent with bimodal per-symbol pattern.
  • Per-class deviation. Equities 0.9386 at τ=0.95 (Kupiec p=0.06); GLD/TLT over-cover (FAIL too-wide). Same root cause as bimodality.
  • GARCH(1,1) baseline fails Kupiec at τ ∈ {0.68, 0.95, 0.99}. Textbook econometric default loses head-to-head; 9% sharper at τ=0.95 but undercovers (0.9254 vs M5's 0.9503).
  • Path-fitted conformity (CME subset, n=3,861) mechanically validates. Bands widen 5–10%; endpoint Kupiec preserved. Binding answer to §6.6 perp/on-chain gap still requires consumer-experienced sample.

v3 bake-off findings (reports/v3_bakeoff.md, reports/tables/v3_bakeoff_*.csv):

| Variant | τ=0.95 realised | HW (bps) | Δ vs M5 | Per-symbol Kupiec pass | ρ_cross |
|---|---|---|---|---|---|
| M5 | 0.9503 | 354.9 | | 2/10 | 0.252 |
| C1 LWC + regime | 0.9503 | 385.3 | +8.6% | 10/10 | 0.249 |
| C2 M6b2 class | 0.9503 | 302.6 | −14.7% | 8/10 | 0.259 |
| C4 stacked | 0.9555 | 379.6 | +7.0% | 9/10 | 0.280 |

C1 wins per-symbol calibration; C2 wins sharpness. C4 over-corrects (double-counts per-symbol scale). None address ρ_cross — that remains M6a's province, gated on the Friday-observable r̄_w predictor (W8 rejected).

Decision.

  1. Freeze M5 as the Paper 1 calibration-transparent endpoint-band primitive. Paper claim: "M5 is per-anchor calibrated, deployment-calibrated and walk-forward-stable, with disclosed per-symbol heterogeneity and full-distribution limits." C1/C2/C4 are not promoted into the main methodology — they ran outside the validation loop and would invite a "how many variants did you try?" review question.
  2. Diagnostics that strengthen Paper 1 are folded in: per-symbol bimodality (§6.4.1), split-date sensitivity (§6.3), LOSO fragility (§6.3 / §9.3), GARCH baseline (§6.4.2), per-class deviation (§6.4.3 short paragraph), path-fitted V3.3 mechanical validation (§10.1).
  3. v3 candidates routed:
    • C1 LWC → Paper 1 §10.4 lead future candidate for per-symbol calibration; not adopted in this paper. Re-evaluate after V3.2 rolling-rebuild + paper-grade walk-forward.
    • C2 M6b2 → already deployed for Lending profile. Per-class collateral-buffer narrative belongs in Paper 3 (lending policy), not Paper 1 main body.
    • C4 stacked → rejected (Pareto-dominated; double-counts per-symbol scale). VALIDATION_BACKLOG.md W9 records the rejection.
    • M6a common-mode partial-out → already on hold (W8 rejected). Brief mention in §10.4.
  4. §10.2 revised: "sub-regime granularity" demoted to a one-sentence "empirically refuted, recorded so the candidate is not re-suggested" note; full-distribution conformal (CQR) and conditional EVT remain as engineering-gated upgrades; the previous "common-mode partial-out — the lead candidate" framing was over-committed and has been pulled back.
  5. New helper module src/soothsayer/backtest/calibration.py::{compute_score, train_quantile_table, fit_c_bump_schedule, serve_bands, fit_split_conformal} — shared M5 fit/serve primitive used by scripts/run_v1b_*.py robustness runners and scripts/run_v3_bakeoff.py.

Evidence. reports/v1b_paper1_robustness.md (robustness brief), reports/v3_bakeoff.md (bake-off brief), 8 + 3 CSVs under reports/tables/v1b_robustness_*.csv and reports/tables/v3_bakeoff_*.csv. Runners: scripts/run_v1b_{per_symbol_diagnostics,vol_tertile,garch_baseline,split_sensitivity,loso,per_class,path_fitted_conformal}.py, scripts/run_v3_bakeoff.py. Backlog entry: VALIDATION_BACKLOG.md W9.

Impact. Paper 1 has a clean validation story (M5 + disclosed heterogeneity) and a credible v3 roadmap (§10.4) with C1 as lead per-symbol candidate. M5 stays deployed unchanged. M6b2 stays deployed for Lending profile, with Paper 3 as the natural empirical home. The dual-profile architecture is preserved.

Open work. Nothing methodologically new beyond what was already on the roadmap (M6a r̄_w predictor Sunday-Globex variant W8b, V3.1 F_tok variant W8c, V3.2 rolling rebuild, V3.3 perp/on-chain sample maturation, full-distribution CQR upgrade).

2026-05-03 — Dual-profile methodology family architecturally locked (post-M5)

Trigger. Two v3 leads quantified on 2026-05-02 in VALIDATION_BACKLOG.md W2-followup:

  • M6a — common-mode residual partial-out (upper bound). β̂=0.811, R²(train)=0.278, R²(OOS)=0.255 against the leave-one-out weekend-mean residual. Cross-sectional within-weekend ρ on the signed residual drops 0.41 → 0.07. At τ=0.95 OOS half-width 309 bps vs M5's 355 bps (-13%). Caveat: r̄_w^(−i) is Monday-derived; deployable M6a needs a Friday-observable forward predictor (R²(forward) ≥ 0.4 gate).
  • M6b2 — per-class Mondrian (deployable). Conformal cell partitioned by symbol_class (6 cells: equity_index, equity_meta, equity_highbeta, equity_recent, gold, bond) instead of by regime. At τ=0.95 OOS half-width 304 bps vs M5's 355 bps (-14%) at matched coverage. M6b1 (per-symbol, 10 cells) gives -16% at τ=0.95 but pays +6% at τ=0.99 (HOOD's thin tail); M6b3 (class × regime, 18 cells) is wider than M6b2 (sample dilution). M6b2 is the chosen variant.
  • M6c — stacked. M6a + M6b2 stack with efficiency ≈ 0.87 at τ ∈ {0.85, 0.95}. Combined OOS half-width at τ=0.95 is 271 bps — 24% narrower than M5, 39% narrower than v1.
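The M6a upper-bound regression partials each residual against the leave-one-out mean of its own weekend. Below is a minimal sketch of that construction. It assumes the residuals are already factor-adjusted and that β is fit by OLS through the origin; the intercept choice and the function names are assumptions, not taken from the report. Because r̄_w^(−i) uses Monday outcomes, this reproduces the non-deployable upper bound, not a forward predictor.

```python
import numpy as np


def loo_weekend_mean(resid, weekend_ids):
    """r_bar_w^(-i): mean residual of the other rows in row i's weekend (NaN if alone)."""
    resid = np.asarray(resid, dtype=float)
    _, inv = np.unique(np.asarray(weekend_ids), return_inverse=True)
    sums = np.bincount(inv, weights=resid)
    counts = np.bincount(inv)
    others = counts[inv] - 1
    out = np.full(resid.shape, np.nan)
    ok = others > 0
    out[ok] = (sums[inv][ok] - resid[ok]) / others[ok]
    return out


def partial_out_common_mode(resid, weekend_ids):
    """Fit beta on r_i ~ beta * r_bar^(-i) (no intercept); return beta and adjusted residuals."""
    resid = np.asarray(resid, dtype=float)
    rbar = loo_weekend_mean(resid, weekend_ids)
    ok = ~np.isnan(rbar)
    beta = float(np.dot(resid[ok], rbar[ok]) / np.dot(rbar[ok], rbar[ok]))
    # Single-row weekends have no common-mode estimate and are left unchanged.
    adjusted = np.where(ok, resid - beta * np.nan_to_num(rbar), resid)
    return beta, adjusted
```

Excluding row i from its own weekend mean keeps β̂ from being inflated by a row regressing on itself.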

The W2-followup also mapped 22 items in the planned product stack (docs/product-stack.md) to the two leads: 8 cleanly map to AMM-track (M6a-based), 8 to Lending-track (M6b2-based), 4 want both (event stream, decision SDK, settlement licensing), 8 are profile-agnostic infrastructure. No item is left without a clean assignment.

Decision. Architecturally lock a dual-profile methodology family for the post-M5 rollout. One methodology family (factor-adjusted point + Mondrian split-conformal + δ-shifted c(τ) bump) shared across both profiles; profiles differ only in (a) score residualisation and (b) conformal cell partition. Two parallel publishers, two parallel parquet venues, same PriceUpdate Borsh wire format with a new profile_code byte (1 = Lending, 2 = AMM). Working doc with the staged rollout: M6_REFACTOR.md (root, to be deleted on completion).
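The cell partition is one of the two axes along which the profiles differ. A Mondrian split-conformal table is one conformal quantile per cell, so the same routine can serve both profiles with a different grouping key (regime for M5, symbol_class for the Lending-track M6b2). Below is a minimal sketch assuming the standard ⌈(n+1)τ⌉ finite-sample rule. Whether train_quantile_table applies exactly this correction is not stated here, and the function names are hypothetical.

```python
import math

import numpy as np


def conformal_quantile(scores, tau):
    """Finite-sample split-conformal quantile: the ceil((n + 1) * tau)-th smallest score."""
    s = np.sort(np.asarray(scores, dtype=float))
    n = s.size
    k = math.ceil((n + 1) * tau)
    return math.inf if k > n else float(s[k - 1])


def mondrian_table(scores, cells, taus):
    """One conformal quantile per (cell, tau); only the cell key changes between profiles."""
    scores = np.asarray(scores, dtype=float)
    cells = np.asarray(cells)
    return {
        cell: {tau: conformal_quantile(scores[cells == cell], tau) for tau in taus}
        for cell in np.unique(cells).tolist()
    }
```

The infinite quantile returned when ⌈(n+1)τ⌉ > n is the small-cell tail problem in miniature. This is why finer partitions (per-symbol, class × regime) can pay at high τ even when they sharpen the middle of the range.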

Sequence:

  1. Phase A — Lending-track (M6b2) shipping. Ships next. Direct Kamino/Paper-3 win. Wire-format-compatible. Layered W4 (asymmetric quantile pair) sub-axis on top, gated on the base Lending-track artefact build.
  2. Phase B — AMM-track (M6a-deployable) shipping. Gated on VALIDATION_BACKLOG.md W8 (r̄_w forward predictor prototype) achieving R²(forward) ≥ 0.4. Below the gate, Phase B defers; above the gate, AMM-track ships and unlocks Layer 1 (Band-AMM) + Layer 4 AMM-licensee pricing-tier differentiation.

Evidence.

  • reports/v1b_m6a_common_mode_partial_out.md, reports/tables/v1b_m6a_common_mode_oos.csv, reports/tables/v1b_m6a_common_mode_fit.csv.
  • reports/v1b_m6b_per_symbol_class_mondrian.md, reports/tables/v1b_m6b_per_symbol_class_oos.csv, reports/tables/v1b_m6b_per_cell_quantiles.csv.
  • reports/v1b_m6c_combined.md, reports/tables/v1b_m6c_combined_oos.csv.
  • reports/v1b_density_rejection_localization.md — W2 prerequisite finding that surfaced both leads.

Impact. No change to deployed §0 state today (M5 single-profile remains live). §0 will update on Phase A completion (Lending-track shipped) and again on Phase B completion (AMM-track shipped). Paper 1 §10 future work re-sequences from "v3 items" to a structured M6 family + dual-profile rollout. Paper 3 protocol_semantics worked example will be regenerated under M6b2 widths during Phase A. Paper 4 inherits the M6a / AMM-track once Phase B ships.

Open work.

  • M6_REFACTOR.md — Phase A (Lending-track) checklist; Phase B (AMM-track) checklist gated on W8.
  • VALIDATION_BACKLOG.md — W4 (asymmetric layer) re-staged as Lending-track sub-axis; W8 (r̄_w forward predictor) opened.
  • docs/product-stack.md — refreshed with the dual-profile architecture and per-layer track assignment.

2026-05-03 — W8 result: Friday-close r̄_w predictor rejected; AMM-track shipping deferred

Trigger. Same-day execution of VALIDATION_BACKLOG.md W8 prototype. The architectural decision earlier today gated AMM-track shipping (Phase B of M6_REFACTOR.md) on a Friday-observable predictor of r̄_w achieving R²(forward) ≥ 0.40. W8 tested that gate.

Decision. Reject at the Friday-close-only feature set; AMM-track shipping deferred indefinitely until either (a) a Sunday-Globex republish architecture is engineered (W8b, ~3–4 weeks gated on a scryer Sunday-evening futures fetcher) or (b) V3.1 F_tok data accumulates ≥150 weekends of post-launch xStock cross-section (W8c, ETA Q3–Q4 2026). No change to today's earlier dual-profile architectural lock — Lending-track shipping (Phase A) proceeds as planned.

Evidence. reports/v1b_r_bar_forward_predictor.md, data/processed/r_bar_predictor_v1.json, scripts/run_r_bar_forward_predictor.py. Six model variants tested on 458 train + 173 OOS weekends (split at 2023-01-01):

| Model | Features | R²(train) | R²(OOS) |
|---|---|---|---|
| M0_ar1 | r_bar_lag1 | 0.003 | 0.005 |
| M1_vol_ols | macro vol level + Δ (6 features) | 0.079 | −0.060 |
| M2_full_ridge10 | full Friday-observable set (13 features), α=10 | 0.111 | −0.050 |

All non-AR models have negative OOS R²: they overfit TRAIN and predict worse than the train mean on OOS. Cross-sectional within-weekend ρ on the OOS PITs after the r̄_w_hat partial-out is essentially unchanged: raw 0.4147 → after 0.4134.
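The negative R²(OOS) entries follow from the benchmark used. OOS R² is measured against a forecast that always predicts the TRAIN mean, so any value below zero means the model does worse than that constant. Below is a minimal sketch of the definition; the function name is hypothetical.

```python
import numpy as np


def r2_oos(y_test, y_pred, train_mean):
    """1 - SSE(model) / SSE(constant train-mean forecast) on the held-out slice."""
    y_test = np.asarray(y_test, dtype=float)
    sse_model = np.sum((y_test - np.asarray(y_pred, dtype=float)) ** 2)
    sse_bench = np.sum((y_test - train_mean) ** 2)
    return float(1.0 - sse_model / sse_bench)
```

Unlike in-sample R², this quantity is unbounded below. Values of −0.05 to −0.06 therefore indicate a mild but consistent loss to the constant forecast, not a numerical artefact.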

Three diagnostic findings.

  1. r̄_w has no autoregressive structure week-over-week. R²(M0_ar1) ≈ 0.005. Common-mode residuals don't persist past a single weekend.
  2. Friday-close macro vol features overfit. Non-trivial R²(train) ≈ 0.08–0.11; negative R²(OOS) across all regularisations. The vol features predict TRAIN noise.
  3. factor_ret already absorbs nearly all of what Friday state can predict. What remains in r̄_w is approximately unpredictable from currently-observable Friday state. This is a feature, not a bug — it confirms factor_ret is doing its job — but it forces the architecture pivot below.

Architectural pivot. Earlier today's dual-profile lock framed M6c (271 bps at τ=0.95, 39% narrower than v1) as the "ceiling" achievable if a forward predictor of r̄_w landed. W8 now bounds how much of that ceiling is deployable: under current data, the deployable AMM-track gain is ≈0%, not 13%, because the predictor can't extract signal beyond what factor_ret already extracts. The headline reframe:

  • Today (Lending-track shipping next). M6b2 delivers ~50% of the M6c ceiling over M5 with no data dependency. -14% half-width at τ=0.95 vs M5; -31% vs v1. Direct Kamino/Paper-3 win.
  • Future (AMM-track). Either a Sunday-Globex republish architecture (engineering) or V3.1 F_tok signal accumulation (data-gated, ≥6 months) reopens W8. Until one fires, AMM-track stays parked.

Updated M6_REFACTOR.md Phase B status. Deferred indefinitely with two re-opened paths (W8b Sunday-Globex, W8c V3.1 F_tok) tracked in the backlog. Phase B's deliverable checklist remains documented as a reference for whichever predictor path clears the gate first.

Paper / docs cascade.

  • reports/paper1_coverage_inversion/10_future_work.md — V3.4 entry on M6 family should now read: "M6b2 (per-class Mondrian) is the deployable v3 winner; M6a (common-mode partial-out) is the documented upper bound, deployable only when forward signal accumulates beyond Friday-close state." Defer the §10 update until Phase A ships.
  • reports/paper4_oracle_conditioned_amm/ — narrative needs an honest "AMM-track band has Friday-close cadence today; Sunday-republish cadence is the deployment ceiling" line rather than overclaiming the M6c gain.

Open work.

  • VALIDATION_BACKLOG.md W8b (Sunday-Globex republish predictor) — opens when scryer adds a Sunday-evening futures snapshot fetcher.
  • VALIDATION_BACKLOG.md W8c (V3.1 F_tok-based predictor) — opens when V5 tape reaches ≥150 weekends.
  • Both are gated on data/engineering external to today's analysis. No active work on Phase B until one fires.

2026-05-03 — W4 result: symmetric M6b2 stays on wire; auxiliary one-sided table adopted for lending consumers

Trigger. Same-day execution of VALIDATION_BACKLOG.md W4. The dual-profile lock earlier today staged W4 as a Lending-track sub-axis to test whether the residual distribution per symbol_class is asymmetric enough to justify replacing the symmetric b_sym band with an asymmetric (q_low, q_high) pair.

Decision. Two-question split:

  • Q1 — Replace symmetric with asymmetric on the wire? No (Disclose-not-deploy). Pooled width-delta at τ=0.95 = +2% (asymmetric is wider at matched two-sided coverage). Materially-asymmetric cells: 2 of 21 (10%). Per-class TRAIN skewness is real and substantial (equity_meta skew = −1.80, gold = −2.32, equity_index = −0.90, equity_highbeta = −0.97), but the equal-tail asymmetric rule reallocates between tails rather than shrinking total band.
  • Q2 — Publish auxiliary per-class one-sided quantile table for lending consumers? Yes (Adopt as auxiliary). Headline at τ_one = 0.95: lending-consumer buffers shrink 14–39% vs the symmetric b_sym(0.95) they read today. The symmetric two-sided band over-covers the consumer's one-sided contract (e.g., equity_highbeta: symmetric 451 bps for what's actually a 97.5% one-sided guarantee, vs the auxiliary q_low_one(0.95) = 275 bps for the consumer's specified 95% one-sided guarantee — same statistical contract, 39% less collateral).
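The over-coverage argument in Q2 can be reproduced with any roughly symmetric residual distribution. Below is a minimal sketch assuming residuals are expressed as (realised − point) / fri_close and that the lending consumer only needs realised ≥ lower. The function names are hypothetical, and the synthetic normal residuals in the check stand in for the per-class data.

```python
import numpy as np


def symmetric_halfwidth(resid, tau):
    """b_sym(tau): two-sided half-width, the tau-quantile of |residual|."""
    return float(np.quantile(np.abs(np.asarray(resid, dtype=float)), tau))


def one_sided_lower_buffer(resid, tau_one):
    """q_low_one(tau_one): buffer b with P(residual >= -b) approximately tau_one."""
    return float(np.quantile(-np.asarray(resid, dtype=float), tau_one))


def one_sided_coverage(resid, buffer):
    """Fraction of rows where realised stays at or above point - buffer."""
    return float(np.mean(np.asarray(resid, dtype=float) >= -buffer))
```

On standard-normal residuals the symmetric 0.95 half-width lands near 1.96 and the one-sided 0.95 buffer near 1.64. The symmetric band's one-sided coverage is about 0.975, which is the "97.5% guarantee for a 95% contract" gap described above. On the skewed per-class residuals the gap can be larger or smaller than in this symmetric illustration.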

Evidence. reports/v1b_w4_asymmetric_coverage_lending.md, reports/tables/v1b_w4_asymmetric_per_class_tau.csv, reports/tables/v1b_w4_asymmetric_one_sided.csv, reports/tables/v1b_w4_skewness_train.csv, scripts/run_asymmetric_coverage.py.

OOS realised one-sided coverage validates: at τ_one=0.95 the auxiliary q_low_one delivers 0.90–0.97 across classes (within sample-size CIs of 0.95), while the symmetric b_sym(0.95) over-covers at 0.93–0.99.

Impact.

  • Wire format unchanged. Published lower / upper continue to be point ± b_sym(class, τ)·fri_close. Existing consumers see no breaking change.
  • M6_REFACTOR.md Phase A7 redirected. Original scope was "two-sided asymmetric pair on the wire"; new scope is "auxiliary one-sided per-(symbol_class, τ_one, side) table in the artefact JSON sidecar + consumer SDK accessor." Effort drops from ~3 days to ~half-day.
  • Paper 3 §Structural narrative upgrade. The auxiliary table replaces Kamino's ad-hoc per-reserve buffer setup with calibrated per-(symbol_class, τ_one, side) receipts. MarginFi assets-vs-liabilities maps cleanly to q_low_one / q_high_one at the consumer's specified one-sided τ. This is the strongest empirical Paper 3 lever from the W2 → W4 chain.

Open work.

  • Phase A1 (other agent in flight) extends to emit the auxiliary table — scope is documented in the W4 deliverables list. Wire format work in A4 is unchanged.
  • Paper 3 worked-example regeneration is queued for Phase A6 once the artefact builder lands.
  • Q1's negative result is documented in reports/v1b_w4_asymmetric_coverage_lending.md §Decision; no further W4-style asymmetric work on the AMM-track or other profiles unless a future workstream re-opens it.

2026-05-XX — M5 / v2 deployment shipped; v1 hybrid Oracle retired

Trigger. Completion of M5_REFACTOR.md working doc following the 2026-05-02 M5 validation entry below. After re-evaluating the Colosseum constraint (the user is no longer firmly committed to a 2026-05-10 hackathon submission), the staged migration was collapsed: M5 deploys directly with no v1 transition window.

Decision. Deployed the v2 / M5 architecture (Mondrian split-conformal by regime + factor-adjusted point + δ-shifted c(τ)) end-to-end. v1 hybrid forecaster Oracle (F1_emp_regime + per-target additive buffer + per-regime forecaster choice) is retired; v1 calibration-surface API (compute_calibration_surface, pooled_surface, invert) deleted from src/soothsayer/backtest/calibration.py; v1 Oracle constructor signature replaced with single-arg artefact load; v1 diagnostic and validation scripts moved to scripts/v1_archive/ (24 scripts) under a deprecation banner.

Code changes.

  • Python serving: src/soothsayer/oracle.py rewritten (~210 lines) around REGIME_QUANTILE_TABLE + C_BUMP_SCHEDULE + DELTA_SHIFT_SCHEDULE module constants; Oracle.fair_value is now a 5-line lookup. band_evaluator.py unchanged (constructor signature compatible).
  • Python build: new scripts/build_mondrian_artefact.py train-fits the 12 quantiles + 4 c(τ) scalars from the panel + writes data/processed/mondrian_artefact_v2.parquet (per-Friday rows) + mondrian_artefact_v2.json (audit-trail sidecar).
  • Rust serving: crates/soothsayer-oracle/src/{config,oracle,types}.rs rewritten; surface.rs deleted. Single Oracle::load(artefact_path) entry point; output byte-identical to Python on 90/90 parity cases (scripts/verify_rust_oracle.py).
  • Wire format: byte-identical across the v1 → M5 migration. PriceUpdate Borsh layout unchanged; forecaster_code = 2 slot relabelled FORECASTER_RESERVED_2 → FORECASTER_MONDRIAN in programs/soothsayer-oracle-program/src/state.rs and crates/soothsayer-consumer/src/lib.rs. Existing v1 PriceUpdate accounts (codes 0/1) decode cleanly under M5 consumers.
  • Publisher CLI: --surface and --pooled args dropped; default artefact path is data/processed/mondrian_artefact_v2.parquet. PrepPublish + payload encoder updated to emit forecaster_code = 2 for forecaster_used = "mondrian".

Paper / docs cascade.

  • Paper 1 (reports/paper1_coverage_inversion/): §0 abstract rewritten; §1 introduction headline numbers updated (354 bps at τ=0.95, 0.990 realised at τ=0.99); §4 methodology rewritten around Mondrian; §5 split-section description updated; §6 results regenerated at M5 numbers (per-regime, pooled, walk-forward, density tests); §7 ablation re-framed (§7.1–§7.5 retained as v1-historical, §7.5 taxonomy updated to mark each component as inherited / cosmetic / removed under M5; §7.6 stress test retained; §7.7 Mondrian ablation retained as the architecture-justification ablation); §8 serving-layer prose updated (90/90 parity, wire-format invariance disclosure); §9 limitations updated (§9.1 tail ceiling closed, §9.4 OOS-tuning provenance restated for c(τ)+δ(τ), §9.5 Berkowitz / DQ disclosure restated, §9.6 hybrid policy retired, §9.7 90/90 parity); §10 future work re-sequenced as v3 items; §11 conclusion restated under M5; §2.3 + references.md updated with Mondrian split-conformal citations.
  • Paper 3 (reports/paper3_liquidation_policy/protocol_semantics.md): worked example numerics regenerated for SPY 2026-04-24 at M5 widths; per-reserve flip-threshold table updated to per-regime widths.
  • Paper 4 (reports/paper4_oracle_conditioned_amm/): devnet_artefacts.json updated with M5-derived SPY/QQQ bands and forecaster_code = 2; colosseum_implementation_brief.md narrative updated for M5 widths.
  • Top-level docs: README.md evidence snapshot regenerated; CLAUDE.md Current State replaces v1 constants with M5; docs/product-spec.md hybrid-forecaster section replaced with M5 description; docs/v1.5-deployment-spec.md marked superseded; reports/bear_case.md gates 2.A and 3.E updated to "PARTIAL — v2 / M5 closes". Landing page (landing/{index,dashboard}.html) headline numbers and methodology blocks updated.

Empirical headline (unchanged from 2026-05-02 validation). OOS 2023+ slice (1,730 rows × 173 weekends): at τ=0.95, M5 realises 0.950 with Kupiec p=0.956 and Christoffersen p=0.912 at a mean half-width of 354.5 bps, 20% narrower than the v1 Oracle's 443.5 bps at indistinguishable Kupiec calibration (block-bootstrap CIs exclude zero on width and straddle zero on coverage). At τ=0.99, M5 realises 0.990 with Kupiec p=0.942, closing the v1 finite-sample tail ceiling at 0.972 at the cost of a 22% wider band. The 6-split walk-forward passes Kupiec at every anchor (p=0.43, 0.37, 0.36, 0.32 at τ=0.68, 0.85, 0.95, 0.99). Berkowitz (LR=173.1) and DQ at τ=0.95 (stat=32.1, p=5.7e-6) both reject, which is the same per-anchor-only calibration profile as v1.
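For readers checking the Kupiec numbers, the unconditional-coverage (proportion-of-failures) test compares the observed miss rate with 1 − τ through a likelihood ratio that is χ²(1) under the null. A minimal stdlib sketch, not the project's own implementation; the function names are illustrative:

```python
import math


def _bernoulli_loglik(misses: int, n: int, p: float) -> float:
    # Log-likelihood of `misses` band exits in n trials; 0 * log(0) is taken as 0.
    ll = 0.0
    if misses:
        ll += misses * math.log(p)
    if n - misses:
        ll += (n - misses) * math.log(1.0 - p)
    return ll


def kupiec_pof(n: int, hits: int, tau: float) -> tuple[float, float]:
    """Return (LR statistic, p-value) for realised coverage hits / n vs target tau."""
    misses = n - hits
    p0 = 1.0 - tau          # expected miss rate under correct calibration
    p_hat = misses / n      # observed miss rate (the unrestricted MLE)
    lr = 2.0 * (_bernoulli_loglik(misses, n, p_hat) - _bernoulli_loglik(misses, n, p0))
    lr = max(lr, 0.0)       # guard against tiny negative rounding
    p_value = math.erfc(math.sqrt(lr / 2.0))  # chi2(1) survival: P(|Z| > sqrt(lr))
    return lr, p_value
```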

Working doc deleted. M5_REFACTOR.md removed from repo root; this methodology log entry is the deployment receipt.

Open work — v3. Full-distribution conformal upgrade to close the Berkowitz / DQ rejections (§10.1 V3.5); rolling artefact rebuild on a live deployment window (§10.1 V3.2); MEV-aware consumer-experienced coverage (§10.1 V3.3); intra-weekend forward-signal updating (§10.1 V3.4); F_tok forecaster gated on V5 tape accumulation (§10.1 V3.1).

2026-05-02 — M5 deployable Mondrian validated as v2 methodology candidate

Trigger. Reviewer-defensibility critique of the F1_emp_regime + per-target buffer schedule: §7.6 (constant-buffer baseline, 2026-05-02) showed the deployed Oracle is 11–12% wider than a coverage-matched constant buffer at every τ ≤ 0.95, with the entire pooled width premium concentrated in the high_vol regime. The natural follow-up question — "would Mondrian split-conformal by regime_pub give the same coverage at narrower width without the factor switchboard / log-log VIX / earnings / long-weekend forecaster machinery?" — was tested head-to-head against the deployed Oracle on the identical OOS 2023+ slice.

Decision. Validate but defer. M5 is the v2 methodology target; deployment migration is scheduled after 2026-05-10 so that the Colosseum hackathon submission ships under the current v1 Oracle without disturbing the hackathon timeline. This entry records the empirical case and the reason for deferral, not a deployment switch. The task-level checklist lives in M5_REFACTOR.md (repo root), to be deleted on completion.

Evidence. Five comparison variants were tested, all on the same OOS 2023+ panel (1,730 rows × 173 weekends) on which the §7.4 serving-layer matrix evaluates the deployed Oracle. The deployable variant is M5: a train-fit per-regime conformal quantile, a factor-adjusted point, and a per-target c(τ) bump tuned on OOS, for 12 trained scalars + 4 OOS scalars (matching the parameter budget of the Oracle's BUFFER_BY_TARGET). Under the 6-split expanding-window walk-forward with δ-shift schedule {0.68: 0.05, 0.85: 0.02, 0.95: 0.00, 0.99: 0.00}, M5 compares with the deployed Oracle as follows:

| τ | M5 test realised | M5 test half-width (bps) | Oracle test half-width (bps) | M5 width vs Oracle |
|---|---|---|---|---|
| 0.68 | 0.672 (Kupiec p=0.43) | 124 | 186 | −33% |
| 0.85 | 0.832 (Kupiec p=0.37) | 215 | 287 | −25% |
| 0.95 | 0.943 (Kupiec p=0.36) | 357 | 526 | −32% |
| 0.99 | 0.991 (Kupiec p=0.32) | 746 | 609 | +22% (M5 hits target where the Oracle hits its structural ceiling) |
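The expanding-window walk-forward behind the table above can be sketched as follows. Each split trains on every weekend before a cutoff and tests on the next block of weekends. The equal-block sizing, the `min_train` parameter, and the function name are assumptions for illustration; the actual split anchors are not reproduced here.

```python
def expanding_window_splits(weekends, n_splits: int, min_train: int):
    """Yield (train, test) lists of weekend keys for an expanding-window walk-forward."""
    weekends = sorted(weekends)
    remaining = len(weekends) - min_train
    if n_splits < 1 or remaining < n_splits:
        raise ValueError("not enough weekends for the requested number of splits")
    block = remaining // n_splits
    for i in range(n_splits):
        cut = min_train + i * block
        # The last test block absorbs any remainder so every weekend is tested once.
        end = cut + block if i < n_splits - 1 else len(weekends)
        yield weekends[:cut], weekends[cut:end]
```

Because training never sees a weekend at or after its test block, each split's coverage is a genuine out-of-sample check; pooling the test blocks gives one realised-coverage figure per τ.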

Berkowitz (LR=173, ρ̂=0.31) and DQ at τ=0.95 (DQ=32) both reject for M5 — same per-anchor-only calibration profile as the deployed Oracle (per §6 abstract). M5 doesn't fix the density-test rejection; it doesn't make it worse either.
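The Berkowitz rejection concerns the whole PIT series, not a single anchor. PIT values are mapped to normal scores, and an AR(1) with free mean, slope, and variance is tested against i.i.d. N(0, 1) with a χ²(3) likelihood ratio, so serial correlation (the reported ρ̂=0.31) alone can drive a rejection. The stdlib sketch below uses the conditional (first-observation-dropped) likelihood, which approximates Berkowitz's exact AR(1) likelihood; the names are illustrative:

```python
import math
from statistics import NormalDist


def berkowitz_lr(pit, eps: float = 1e-10):
    """Return (LR, rho_hat, p-value) for the Berkowitz AR(1) test on PIT values."""
    nd = NormalDist()
    # Clamp PITs away from 0 and 1 so the inverse normal CDF is defined.
    z = [nd.inv_cdf(min(max(u, eps), 1.0 - eps)) for u in pit]
    x, y = z[:-1], z[1:]
    m = len(y)
    mx, my = sum(x) / m, sum(y) / m
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    rho = sxy / sxx if sxx > 0 else 0.0
    a = my - rho * mx
    resid = [yi - a - rho * xi for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in resid) / m
    if s2 <= 0:
        raise ValueError("degenerate PIT series")
    # Unrestricted conditional Gaussian log-likelihood at the OLS/MLE estimates.
    ll_u = -0.5 * m * (math.log(2 * math.pi * s2) + 1.0)
    # Restricted model: mean 0, rho 0, variance 1.
    ll_r = -0.5 * sum(math.log(2 * math.pi) + yi * yi for yi in y)
    lr = max(2.0 * (ll_u - ll_r), 0.0)
    h = lr / 2.0
    # chi2(3) survival function in closed form.
    p = math.erfc(math.sqrt(h)) + 2.0 * math.sqrt(h / math.pi) * math.exp(-h)
    return lr, rho, p
```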

Tables: reports/tables/v1b_constant_buffer_*.csv (the §7.6 baseline that prompted this), reports/tables/v1b_mondrian_calibration.csv, reports/tables/v1b_mondrian_oos.csv, reports/tables/v1b_mondrian_by_regime.csv, reports/tables/v1b_mondrian_bootstrap.csv, reports/tables/v1b_mondrian_walkforward*.csv, reports/tables/v1b_oracle_walkforward*.csv, reports/tables/v1b_mondrian_density_tests.csv, reports/tables/v1b_mondrian_delta_sweep.csv. Scripts: scripts/run_constant_buffer_baseline.py, scripts/run_mondrian_regime_baseline.py, scripts/run_mondrian_walkforward_pit.py, scripts/run_mondrian_delta_sweep.py.

Impact. No methodology-constants change in §0 of this file. Current Oracle (v1: F1_emp_regime + hybrid + buffer schedule) remains deployed and remains the basis for Paper 1's headline numbers (τ=0.95: half-width 443 bps, realised 0.950, p_uc=1.000) and the Colosseum 2026-05-10 submission. M5 is staked as a v2 candidate: per-regime Mondrian conformal quantile + factor-adjusted point + δ-shifted c(τ) bump, ~20% narrower at matched OOS calibration, simpler implementation (~50 lines of serving code vs ~300), strict improvement at τ=0.99 (passes Kupiec where v1 hits the bounds-grid ceiling at 0.972). The diagnosis: the regime classifier regime_pub is the load-bearing piece of v1; the F1_emp_regime forecaster machinery on top of it (log-log VIX / per-symbol vol index / earnings flag / long-weekend flag) is over-engineering relative to a per-regime conformal quantile lookup.
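The "per-regime conformal quantile lookup" that replaces the v1 forecaster machinery can be sketched in a few lines. Every number below is a hypothetical placeholder, not the trained artefact. The regime labels, the multiplicative form of the c(τ) bump, and the symmetric relative band around the point are assumptions. The sketch also assumes δ(τ) enters only at build time, when c(τ) is tuned toward τ + δ(τ), so serving needs just the quantile and the bump.

```python
from dataclasses import dataclass

# Hypothetical relative half-widths per (regime, tau); NOT the trained values.
REGIME_QUANTILE_TABLE = {
    "normal":       {0.68: 0.010, 0.85: 0.017, 0.95: 0.028, 0.99: 0.055},
    "long_weekend": {0.68: 0.012, 0.85: 0.020, 0.95: 0.033, 0.99: 0.065},
    "high_vol":     {0.68: 0.020, 0.85: 0.035, 0.95: 0.060, 0.99: 0.110},
}
# Hypothetical c(tau) bumps, assumed already tuned against tau + delta(tau).
C_BUMP_SCHEDULE = {0.68: 1.00, 0.85: 1.02, 0.95: 1.05, 0.99: 1.10}


@dataclass(frozen=True)
class Band:
    lower: float
    point: float
    upper: float


def fair_value(point: float, regime: str, tau: float) -> Band:
    """Serve a band from the factor-adjusted point; unknown regime/tau raise KeyError."""
    half_width = C_BUMP_SCHEDULE[tau] * REGIME_QUANTILE_TABLE[regime][tau]
    return Band(point * (1.0 - half_width), point, point * (1.0 + half_width))
```

Serving only the four claimed τ anchors, and raising on anything else, keeps the lookup auditable: every served width traces to one table cell and one bump.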

Open work. Tracked in M5_REFACTOR.md. Phases: (1) no-regrets disclosure now (this entry + Paper 1 §7.7 + abstract footnote — the latter two deferred); (2) Colosseum delivery 2026-05-10 under v1; (3) post-Colosseum Python+Rust Oracle rewrite + parity test refresh; (4) Paper 1 v2 + Paper 3 numerical updates + devnet artefact regen. The wire format (PriceUpdate Borsh layout in crates/soothsayer-consumer) is preserved across the migration — only published values change. Paper 4 (Colosseum AMM) is methodology-agnostic in its consumer interface.

2026-05-02 — Paper 1 §7 forward-curve-implied baseline rung (F0_VIX)

Trigger. Reviewer-defensibility critique of the Paper 1 §7 ablation: the ladder includes A0 (20-day realised Gaussian) and A1+ (factor switchboard + empirical quantile) but no standalone forward-curve-implied Gaussian rung. The "VIX × z_τ × √(2/252) × P" baseline is the natural alternative to F1's per-symbol vol machinery, and the one reviewers are most likely to request.
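The F0_VIX baseline formula can be written out directly. This sketch assumes VIX is quoted in annualised percentage points (hence the division by 100) and that z_τ is the two-sided central-band quantile Φ⁻¹((1 + τ)/2); the √(2/252) horizon factor is taken as stated. The function name and signature are illustrative.

```python
import math
from statistics import NormalDist


def f0_vix_band(price: float, vix_level: float, tau: float,
                horizon_days: float = 2.0, trading_days: float = 252.0):
    """Gaussian band: price +/- VIX * z_tau * sqrt(horizon / 252) * price."""
    sigma_annual = vix_level / 100.0              # VIX 20 -> 20% annualised vol
    z_tau = NormalDist().inv_cdf(0.5 + tau / 2.0)  # e.g. about 1.96 for tau = 0.95
    half_width = sigma_annual * z_tau * math.sqrt(horizon_days / trading_days) * price
    return price - half_width, price + half_width
```

With VIX at 20 and τ=0.95, this gives a half-width of about 349 bps of price. Because the same index-level σ is applied to every equity, the band cannot widen for high-beta names, which is the mechanism the evidence below points to.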

Decision. Land F0_VIX as a §7.1 rung between A0 and A1, plumb F0_VIX bounds into v1b_bounds.parquet, and serve B1 / B2 challenger cells in §7.4 (zero-buffer + deployed-buffer) on the OOS 2023+ slice. Equity-only — GLD/TLT use GVZ/MOVE in F1 and require per-class unit conversions for an analogous standalone baseline; that's a v2 candidate. Add a corresponding v2 architectural workstream (V2.4) for intra-weekend forward-signal updating from the Sunday 18:00 ET ES Globex reopen, distinct from the F_tok V2.1 workstream. Add scryer wishlist item 52 for per-symbol implied vol from OPRA / Cboe.

Evidence. reports/paper1_coverage_inversion/07_ablation.md (rung, serving cells, taxonomy); reports/tables/v1b_ablation.csv, reports/tables/v1b_serving_ablation.csv, reports/tables/v1b_serving_ablation_bootstrap.csv. Headline: F0_VIX raw is 49.3% sharper than A0 on equity-matched rows (n=4,719) but undercovers by 7.86pp; through the deployed serving stack (B2: 0.020 buffer) it realises 0.876 against τ=0.95 (Kupiec rejects p≈0); the bootstrap delta against C4 is +6.7pp coverage [+4.3, +9.1] at +88% width. Mechanism: index-level VIX systematically misprices single-stock weekend tails, particularly for high-beta names (NVDA / TSLA / MSTR).
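The bracketed interval on the coverage delta is a bootstrap CI. One common way to keep within-weekend dependence intact, sketched here under assumptions, is to resample whole weekends with replacement and take percentile bounds of the mean per-row paired delta (for example, a challenger's coverage hit minus the reference's). The weekend-as-block choice and all names are illustrative, not the project's bootstrap code.

```python
import random


def weekend_bootstrap_ci(deltas_by_weekend, n_boot: int = 2000,
                         alpha: float = 0.05, seed: int = 0):
    """Percentile CI for the mean paired delta, resampling whole weekends.

    deltas_by_weekend: mapping weekend key -> list of per-row paired deltas.
    Returns (point estimate, lower bound, upper bound).
    """
    rng = random.Random(seed)
    weekends = list(deltas_by_weekend)

    def mean_delta(keys):
        values = [d for k in keys for d in deltas_by_weekend[k]]
        return sum(values) / len(values)

    point = mean_delta(weekends)
    stats = sorted(
        mean_delta(rng.choices(weekends, k=len(weekends))) for _ in range(n_boot)
    )
    lo = stats[round((alpha / 2) * (n_boot - 1))]
    hi = stats[round((1 - alpha / 2) * (n_boot - 1))]
    return point, lo, hi
```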

Impact. §7 is now closed against the canonical reviewer-asked baseline. The F0_VIX rung is disclosed-not-deployed; F1's per-symbol vol-indexing + log-log regression + empirical-quantile inversion is the load-bearing path from the natural baseline to a calibrated served band on freely-available data. No methodology-constants change.

Open work. Stage1 bootstrap CIs for the new (A0 → A0_VIX) and (A0_VIX → A1) ladder pairs are pending the next run_stage1_stats.py completion; matched-pair point estimates landed via the per-row ablation parquet. Per-symbol IV ingest (scryer item 52) gates a future F0_singleIV rung in v2.

2026-05-02 — Operational compaction of methodology context

Trigger. Agent startup context was dominated by historical methodology prose. The cost was operational: less context left for current work.

Decision. This file now keeps current state, recent locks, open gates, and evidence pointers. Older derivations are represented as concise entries and linked artefacts. Future entries should stay short unless the methodology itself changes and no better linked artefact exists.

Impact. Agents should read this file for current state and follow links only when working on that area. Do not expand this file back into a research transcript.

2026-05-01 — Paper 3 semantics, scryer reconciliation, and two-substrate framing

Decision. Paper 3 is Geometric / Structural / Empirical. Kamino-xStocks supplies the xStock empirical panel; MarginFi remains the cleaner general-lending deployment-substrate argument. The earlier "MarginFi-first" phrasing is superseded.

Evidence. docs/protocol_semantics_kamino_xstocks.md, reports/paper3_liquidation_policy/protocol_semantics.md, reports/paper3_liquidation_policy/plan.md, docs/sources/lending/marginfi.md, and scryer datasets kamino/liquidations/v1, marginfi/reserves/v1.

Open work. Analyze the Kamino 2025-11 cluster; land/consume marginfi/liquidations/v1 as propagation evidence; update Paper 1 caveats for delegated oracle routing and Wayback halt sparsity.

2026-05-01 — Paper 4 Phase-A capture spec

Decision. Start clock-dependent scryer capture now for AMM/product-stack evidence: Jito bundle tape, validator-client labels, CLMM/DLMM pool state, and xStock swap backfill/forward poll.

Evidence. reports/paper4_oracle_conditioned_amm/plan.md, reports/paper4_oracle_conditioned_amm/scryer_pipeline_plan.md, docs/product-stack.md, and scryer wishlist item 51.

Open work. Scryer fetchers 51a-51e; operator triggers for 49a-49d; later Soothsayer consumers once parquet rows exist.

2026-04-29 — Scryer live-root and migration lock

Decision. Soothsayer reads the canonical live root from SCRYER_DATASET_ROOT (/Users/adamnoonan/Library/Application Support/scryer/dataset on this machine). The sibling ../scryer/dataset checkout is offline replay only. Path names must match disk, not older nicknames (yahoo/equities_daily, not yahoo/bars).
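A minimal sketch of the read rule: resolve every dataset from SCRYER_DATASET_ROOT and fail loudly instead of silently falling back to the offline-replay checkout. The helper name is hypothetical; docs/scryer_consumer_guide.md remains the sanctioned read pattern.

```python
import os
from pathlib import Path


def scryer_dataset_path(*parts: str) -> Path:
    """Resolve a scryer dataset directory under the canonical live root."""
    root = os.environ.get("SCRYER_DATASET_ROOT")
    if not root:
        # Deliberately no fallback to ../scryer/dataset (offline replay only).
        raise RuntimeError("SCRYER_DATASET_ROOT is not set")
    path = Path(root).joinpath(*parts)
    if not path.exists():
        # Names must match disk, e.g. yahoo/equities_daily, not yahoo/bars.
        raise FileNotFoundError(f"no scryer dataset at {path}")
    return path
```

For example, `scryer_dataset_path("yahoo", "equities_daily")` returns the directory to read parquet from, and a stale nickname such as `("yahoo", "bars")` fails immediately rather than producing an empty read.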

Impact. Deleted Soothsayer fetchers stay deleted. If a script imports deleted source modules, migrate it to scryer parquet rather than restoring fetch code.

2026-04-29 — Router, relay fleet, product roadmap, and NYSE calendar locks

Decisions.

  • v0/v1/v2 product progression locked: band primitive -> event stream -> decision SDK.
  • Unified-feed router v0 locked around Layer 0 open-hours aggregation and closed-hours Soothsayer band receipts.
  • Chainlink Streams uses a Soothsayer relay program; Pyth equities use a poster daemon into Pyth receiver; relay trust commitments locked.
  • Router calendar uses an embedded 2024-2027 NYSE holiday/early-close table plus algorithmic DST.
  • Mango v4 is methodology inspiration only, not a direct integration contribution.
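The "algorithmic DST" half of the calendar can be sketched from the US rule in force since 2007: DST starts on the second Sunday of March and ends on the first Sunday of November, with both switches at 02:00 local time. The sketch maps a session date to the UTC time of the 09:30 ET open. Holidays and early closes from the embedded 2024-2027 table are out of scope here, and the function names are illustrative rather than the router program's.

```python
from datetime import date, timedelta


def _nth_sunday(year: int, month: int, n: int) -> date:
    first = date(year, month, 1)
    # weekday(): Monday == 0 ... Sunday == 6.
    first_sunday = first + timedelta(days=(6 - first.weekday()) % 7)
    return first_sunday + timedelta(weeks=n - 1)


def us_dst_bounds(year: int) -> tuple[date, date]:
    """(DST start, DST end) dates under the post-2007 US rule."""
    return _nth_sunday(year, 3, 2), _nth_sunday(year, 11, 1)


def nyse_open_utc(d: date) -> tuple[int, int]:
    """UTC (hour, minute) of the 09:30 ET open on session date d."""
    start, end = us_dst_bounds(d.year)
    # Switches happen at 02:00 local, so a 09:30 open on the start date is
    # already EDT and on the end date is already EST.
    in_dst = start <= d < end
    return (13, 30) if in_dst else (14, 30)
```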

Evidence. Router program code, docs/ROADMAP.md, reports/methodology_history.md git history before 2026-05-02 compaction, and scryer wishlist relay items.

Open work. Chainlink relay program/daemon, Pyth equity poster daemon, no-position attestation mechanism, open-hours upstream forward tape before Layer 1 calibration.

2026-04-28 — Depth-first methodology and reviewer diagnostics

Decision. Do not publish Paper 1 until the obvious reviewer-grade diagnostics and comparator caveats are addressed. The methodology is per-anchor calibrated, not a full-distribution model.

Evidence. reports/v1b_diagnostics_extended.md, reports/v1b_window_sensitivity.md, reports/v1b_pooled_tail_trial.md, reports/v1b_evt_pot_trial.md.

Impact. τ=0.99 remains a disclosed ceiling; full PIT uniformity is a future diagnostic, not a current claim.

2026-04-27 — Data-fetching cutover

Decision. Scryer owns upstream fetching, retry, dedup, schemas, and raw parquet. Soothsayer owns analysis, serving, derived artefacts, and on-chain publish.

Deleted from Soothsayer. crates/soothsayer-ingest/, old source modules, cache.py, Chainlink scraper, and one-off fetch scripts.

Impact. New sources go into scryer first with a scryer methodology row and wishlist item.

2026-04-26 to 2026-04-24 — v1b serving methodology lock

Decision. Ship the simple auditable hybrid:

  • F1 empirical-regime forecaster for normal/long-weekend.
  • F0 stale-hold fallback for high-vol.
  • Empirical calibration surface plus per-target buffers.
  • Python produces artefacts; Rust serves them with byte-for-byte parity.

Evidence. reports/v1b_decision.md, reports/v1b_calibration.md, reports/v1b_ablation.md, reports/phase1_week1.md, reports/phase1_week2.md.

Rejected or deferred. Scalar buffer, unbuffered surface inversion, conformal alternatives for v1, EVT/GPD tail fixes, full HAR-RV refit as production dependency, and complex pre-v1 state-space/VECM/Hawkes stacks.


2. Open methodology questions

| ID | Question | Gate / trigger | Current disposition |
|---|---|---|---|
| O1 | Does on-chain xStock TWAP improve weekend calibration? | V5 tape reaches ≥150 weekend obs per (symbol, regime) | V2.1 |
| O2 | Does conformal prediction beat per-target buffers on finer grids? | Multi-split walk-forward with finer claimed grid | V2 / reviewer request |
| O3 | Is there consumer-experienced coverage loss from MEV/order flow? | V5 tape + Jito bundle data ≥3 months | V2.3 / Paper 4 input |
| O4 | Is calibration uniform over full PIT, not just anchors? | One-shot PIT diagnostic | V2.4; not a v1 claim |
| O5 | How should Layer 0 set min_quorum? | Design partner + ≥3 months upstream forward tape | deferred |
| O6 | How does Layer 0 handle missing upstream feeds per asset? | First non-paper-1 asset in router config | deferred |
| O7 | Calendar fallback for non-NYSE/CME assets? | First JP/EU/FX asset | deferred |
| O8 | v1 event-stream wire format? | Paper 3 publication + v1 design lock | deferred |
| O9 | v2 multi-asset receipt semantics? | Paper 3 §10 + SDK design | deferred |
| O10 | Relay signing / multi-writer model? | First production relay deploy | v0 hot key; v1 multi-writer target |
| O11 | Relay verifier-CPI policy? | Mainnet relay deploy + CU measurement | always verify in production |
| O12 | No-position enforcement mechanism? | Before mainnet relay deploy | attestation-account default |
| O13 | Paper 3 path-aware truth and cost priors | Kamino cluster analysis + DEX/perp truth tapes | active |
| O14 | Paper 4 bound scope: full-τ vs anchor-only | PIT diagnostic before Phase B | active |

3. Artefact map

  • README.md — product overview and current public evidence snapshot.
  • docs/ROADMAP.md — phase sequencing and active gates.
  • docs/scryer_consumer_guide.md — sanctioned data read pattern.
  • docs/methodology_scope.md — RWA class filter.
  • docs/v2.md — future methodology upgrades.
  • reports/paper1_coverage_inversion/ — Paper 1 draft.
  • reports/paper3_liquidation_policy/ — Paper 3 plan and protocol semantics.
  • reports/paper4_oracle_conditioned_amm/ — Paper 4 plan and scryer pipeline ask.
  • reports/v1b_*.md — frozen evidence snapshots for the v1b methodology.
  • src/soothsayer/oracle.py and crates/soothsayer-oracle/ — current serving implementation.