Experimental Scope and Selection Funnel

Generated by scripts/generate_experimental_scope_report.py on 2026-05-09.

This report reads saved configs, reports, and publication claim tables only. It does not call a model, rewrite prompts, or unlock any held-out split. Counts are VAE cost/selection proxies and should not be read as iid statistical sample sizes.

Artifacted 3D Search Surface

| Quantity | Count |
|---|---|
| 3D stability config files | 131 |
| 3D Qwen-3B scaffold-family config files | 109 |
| 3D output run-root directories | 134 |
| 3D report JSON files | 108 |
| 3D report Markdown files | 128 |
| Report files with candidate/family rows | 39 |
| Top-level family rows | 109 |
| Top-level candidate rows | 48 |
| Top-level manual-seed rows | 51 |
| Top-level reference-candidate rows | 25 |
| Top-level metric rows | 60 |
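Because the report only reads saved artifacts, counts like these can be reproduced with a read-only directory walk. The sketch below is illustrative and is not the logic of scripts/generate_experimental_scope_report.py. The `count_artifacts` helper, the directory layout, and the glob patterns in `HYPOTHETICAL_SPECS` are all assumptions. The demo builds a throwaway directory tree instead of reading the real repository.

```python
from collections import Counter
from pathlib import Path
import tempfile

# Hypothetical layout (not the project's real paths):
# label -> (glob pattern relative to root, "file" or "dir").
HYPOTHETICAL_SPECS = {
    "3d_config_files": ("configs/3d_*.yaml", "file"),
    "3d_run_roots": ("runs/3d_*", "dir"),
    "3d_report_json": ("reports/3d_*.json", "file"),
    "3d_report_md": ("reports/3d_*.md", "file"),
}


def count_artifacts(root: Path, specs: dict) -> Counter:
    """Count matching paths under root; only inspects path types, never opens files."""
    counts = Counter()
    for label, (pattern, kind) in specs.items():
        keep = Path.is_dir if kind == "dir" else Path.is_file
        counts[label] = sum(1 for path in root.glob(pattern) if keep(path))
    return counts


# Demo on a throwaway tree rather than the real repository.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for directory in ("configs", "reports", "runs/3d_run_a", "runs/ethics_run_a"):
        (root / directory).mkdir(parents=True)
    for name in ("configs/3d_a.yaml", "configs/3d_b.yaml",
                 "reports/3d_x.json", "reports/3d_x.md", "reports/ethics_y.json"):
        (root / name).touch()
    counts = count_artifacts(root, HYPOTHETICAL_SPECS)

print(dict(counts))
```

Splitting file and directory matches matters here, because the report counts run-root directories separately from config and report files.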

ETHICS Scope and Filtering

| Stage | Scope | Selection / interpretation |
|---|---|---|
| Full-seed directional runs | 1 | Directional background |
| Checkpoint examples | 64 selector-dev + 64 final-test | Held-out delta = 2 examples |
| 10-seed scaffold tournament | 10 | 6 frozen wins, 2 ties, 2 continued wins (60.0% win; 80.0% non-loss) |
| 10-seed route-cost proxies | 188 candidate prompts; 142 teacher calls; 446 student-eval calls | 20 final-test access events; 41727 seconds (about 11.6 h) summed wall-clock |
| Post-selection fixed-artifact audit | 10 seeds × 256 examples | 8 wins, 1 tie, 1 loss (80.0% win; 90.0% non-loss) |
| Capacity audit | 3 student sizes | 1 positive unchanged-transfer size (33.3%) |
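The win and non-loss percentages in the tournament and audit rows follow directly from the win/tie/loss tallies, with ties counted as non-losses. Here is a minimal sketch that assumes that convention. The `PairedOutcome` class is a hypothetical helper, not part of the project's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedOutcome:
    """Per-seed win/tie/loss tally from one side's perspective (hypothetical helper)."""

    wins: int
    ties: int
    losses: int

    @property
    def total(self) -> int:
        return self.wins + self.ties + self.losses

    def _pct(self, numerator: int) -> float:
        if self.total == 0:
            raise ValueError("no seeds recorded")
        return 100.0 * numerator / self.total

    def win_rate(self) -> float:
        return self._pct(self.wins)

    def non_loss_rate(self) -> float:
        # Assumed convention: ties count as non-losses.
        return self._pct(self.wins + self.ties)

    def summary(self) -> str:
        return f"{self.win_rate():.1f}% win; {self.non_loss_rate():.1f}% non-loss"


# Scaffold tournament from the frozen side: the 2 continued wins are frozen losses.
tournament = PairedOutcome(wins=6, ties=2, losses=2)
# Post-selection fixed-artifact audit.
audit = PairedOutcome(wins=8, ties=1, losses=1)

print(tournament.summary())  # 60.0% win; 80.0% non-loss
print(audit.summary())       # 80.0% win; 90.0% non-loss
```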

3D Publication-Registry Funnel

| Stage | Count | Rate / status | Interpretation |
|---|---|---|---|
| Official 3D status rows | 10 | Surfaced unlocked rows plus seed 2903 blocked-before-final | Publication registry boundary |
| Unlocked held-out rows | 9 | 90.0% | Reached held-out metric table |
| Blocked-before-final rows | 1 | final_test locked | Protocol discipline, not a failed metric row |
| Claim-bearing rows | 8 | 88.9% | Excludes post-selection audit |
| Metric no-regression rows | 5 | 55.6% | No baseline-winning metric in seven-metric scorecard |
| Paper-clean held-out wins | 2 | 22.2% of unlocked; 25.0% of claim-bearing | Seeds 2801 and 4523 |
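Each funnel rate divides a stage count by the count of an earlier stage, and the table mixes denominators. Unlocked rows are measured against the ten official rows. The later stages are measured against the nine unlocked rows, and clean wins are also reported against the eight claim-bearing rows. The sketch below makes each denominator explicit. The stage keys and the `funnel_rates` helper are illustrative names. The denominator choices are inferred from the printed percentages rather than taken from the generating script.

```python
# Counts copied from the 3D publication-registry funnel table.
COUNTS = {
    "official": 10,
    "unlocked": 9,
    "blocked_before_final": 1,
    "claim_bearing": 8,
    "metric_no_regression": 5,
    "paper_clean_wins": 2,
}

# (stage, denominator stage) pairs, inferred from the reported percentages.
RATE_SPECS = [
    ("unlocked", "official"),
    ("claim_bearing", "unlocked"),
    ("metric_no_regression", "unlocked"),
    ("paper_clean_wins", "unlocked"),
    ("paper_clean_wins", "claim_bearing"),
]


def funnel_rates(counts: dict, specs: list) -> dict:
    """Return {(stage, base): percent rounded to 0.1} for each funnel ratio."""
    # Every official row is either unlocked or blocked before the final test.
    if counts["unlocked"] + counts["blocked_before_final"] != counts["official"]:
        raise ValueError("unlocked + blocked rows must equal official rows")
    rates = {}
    for stage, base in specs:
        if counts[stage] > counts[base]:
            raise ValueError(f"{stage} exceeds its denominator {base}")
        rates[(stage, base)] = round(100.0 * counts[stage] / counts[base], 1)
    return rates


rates = funnel_rates(COUNTS, RATE_SPECS)
for (stage, base), pct in rates.items():
    print(f"{stage} / {base}: {pct:.1f}%")
```

Keeping the denominator next to each ratio makes it harder to read 22.2% and 25.0% as conflicting figures, since they describe the same two wins against different bases.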

Paper-Ready Interpretation

The ETHICS track supplies the broadest prompt-search ledger. Its 10-seed package records candidate prompts, teacher calls, student-evaluation calls, final-test access events, and wall-clock cost alongside frozen-vs-continued residual accuracy. The 3D track supplies the perturbation-stability ledger. It has ten official paper-facing status rows. One of these rows is blocked before the final test by protocol, and the other nine are unlocked. Eight rows remain claim-bearing after the post-selection audit is excluded, and two are clean held-out wins. These counts strengthen rather than dilute the main claim: the selected prompt artifacts are moral-attention scaffolds that survived a logged search route, not isolated wording anecdotes.

Related Sources

  • paper/tables/publication_claim_tables.json
  • reports/statistical_reporting_3d_2026-05-09.json
  • reports/3d_ethics_scaffold_family_confirmatory_seed2903_2026-05-06.json
  • ETHICS-only imported tables in paper/refined_prompt_shape_epiplexity_paper.tex