# Claim-to-Artifact Matrix

This matrix maps the repository’s paper-facing claims to concrete artifacts and keeps the line clear between supported evidence, directional evidence, and development-only diagnostics.

Paper-facing prose uses the name *incumbent baseline* for the locked pre-scaffold commonsense comparator. Historical reports and raw artifact paths retain its legacy id, `current_round_7`.

| Claim | Artifact | Current support | Claim boundary | Release use |
| --- | --- | --- | --- | --- |
| Full-seed iterative rewriting is directionally better than the manual fixed prompt | `../outputs/runs/seed_17/final_evaluation/statistics.json`, `neurips_assets_summary.json` | Directionally supported | The pairwise test is not significant | Repeat across more seeds with a matched local backend |
| A frozen teacher-refined prompt can beat continued adaptation | `../outputs/final_gemini_experiment_qwen_0p5b_seed17_checkpoint320/final_experiment_summary.json`, `figures/checkpoint320_final_test_accuracy.png` | Narrow claim-bearing support | The held-out final test has only 64 examples | Re-run the same controlled frozen-vs-adaptive design on more seeds |
| Selector-dev can mis-rank the held-out winner | `figures/checkpoint320_selector_vs_final.png`, `../outputs/final_gemini_experiment_qwen_0p5b_seed17_checkpoint320/runs/seed_17/frozen_track/track_summary.json` | Artifact-backed | Based on one completed checkpoint seed | Measure selector regret across more seeds and selector slices |
| ETHICS scaffold-freezing works beyond the checkpoint mechanism microscope | `../paper_aies_expanded/supplement.pdf`, `../paper/tables/publication_claim_tables.md`, `figures/ethics_10seed_final_deltas.png` | Static-classification evidence: the 10-seed tournament reports 6 frozen wins, 2 ties, 2 continued wins, and a mean frozen-minus-continued advantage of +0.0438 | ETHICS is supporting route evidence, not the headline perturbation proof | Treat as scaffold-freezing evidence, not as a universal ETHICS victory |
| Fixed ETHICS prompt artifacts retain useful audit signal and expose capacity sensitivity | `figures/ethics_postselection_audit_fixed_artifacts.png`, `figures/ethics_capacity_audit_fixed_artifacts.png`, `../paper/refined_prompt_shape_epiplexity_paper.pdf` | The post-selection and capacity audits support artifact robustness and route-specificity | Audit rows are not independent discovery; unchanged prompt transfer is sensitive to student size | Use as VAE cost/residual and transfer-boundary evidence |
| Prompt-shape search is a better conceptual frame than local prompt polishing alone | `../paper/refined_prompt_shape_epiplexity_paper.tex`, `../docs/3d_ethics_prompt_shape_framework.md`, `figures/prompt_shape_landscape.png`, `matched_budget_revision_qwen_0p5b_smoke.md` | Conceptual plus exploratory operational support | Broad matched-budget dominance is outside the release claim | Keep the paper claim conceptual and artifact-backed rather than broad |
| The corrected 3D Ethics prompt-rewriting protocol is implemented with locked-split discipline and can produce clean held-out teacher-family wins against the incumbent baseline | `../docs/3d_ethics_stability_protocol.md`, `3d_ethics_scaffold_family_prospective_protocol_seed2801_2026-05-06.md`, `3d_ethics_scaffold_family_prospective_seed2801_2026-05-06.md`, `3d_ethics_qwen3b_scaffold_family_tournament_v2_3s_wvs_guarded_seed4523_2026-05-07.md` | Two narrow claim-bearing held-out wins: seed 2801 and seed 4523 | Wins are relative to the locked incumbent baseline; broad all-seed confirmation is outside the release claim | Use the wins as real anchors while keeping mixed rows visible as the repeatability boundary |
| The surfaced 3D held-out/prospective rows show a positive but underpowered seed-level pattern | `statistical_reporting_3d_2026-05-09.md`, `statistical_reporting_3d_2026-05-09.json`, `../paper/tables/publication_claim_tables.md` | Descriptive statistical support | Rows are sequential probes, not iid samples; the clean-win interval is wide and p-values are diagnostic only | Use the report to calibrate language; do not claim population-level significance |
| A teacher-generated scaffold family can clear strict 3D development gates on fresh splits | `3d_ethics_scaffold_family_replay_seed211_dev_2026-05-04.md`, `3d_ethics_qwen3b_scaffold_family_tournament_v2_1_dev_2026-05-04.md`, `3d_ethics_qwen3b_scaffold_family_tournament_v2_1_replay_seed613_dev_2026-05-04.md` | Development-only support | No single family stayed gate-passing across all later fresh dev seeds | Continue dev-only family consolidation |
| The first prospective 3D scaffold-family smoke test showed a real selector gap and real family weakness | `3d_ethics_scaffold_family_prospective_seed307_2026-05-04.md`, `3d_ethics_scaffold_family_seed307_postprospective_autopsy_2026-05-04.md` | Strong diagnostic support | Negative result, not a launch | Use only as development evidence; do not reuse the split confirmatorily |
| The seed-2801 support-state win plus the immediate three-seed confirmation boundary sharply localized the first repeatability problem | `3d_ethics_scaffold_family_confirmatory_matrix_seeds2903_3001_3109_2026-05-06.md`, `3d_ethics_scaffold_family_confirmatory_seed3001_2026-05-06.md`, `3d_ethics_scaffold_family_confirmatory_seed3109_2026-05-06.md`, `../docs/current_status.md` | Mixed: the seed-2801 claim-bearing win plus strong negative repeatability evidence | Broad multi-seed support is not safe; the blockers were tiny salience-margin misses, fragility regression, and one selector-gap recurrence | Use this boundary as the rationale for the later WVS-guarded named-criterion lane, not as a broad confirmation |
| Post-matrix support-basis/criterion-lock search can create strong tuned-dev candidates, but the current best did not survive fresh-dev replay | `3d_ethics_qwen3b_scaffold_family_tournament_v2_3l_criterion_lock_seed3709_dev_2026-05-07.md`, `3d_ethics_qwen3b_scaffold_family_tournament_v2_3l_replay_seed3907_dev_2026-05-07.md`, `3d_ethics_qwen3b_scaffold_family_tournament_v2_3m_minimal_criterion_patch_seed3907_dev_2026-05-07.md`, `3d_ethics_v2_3l_v2_3m_pareto_diagnostic_2026-05-07.md`, `../docs/research_logs/3d_ethics_v2_3j_to_v2_3m_dev_cycle_2026-05-07.md` | Development-only diagnostic support | Seed 3709 v2.3l passed all dev gates, but seed 3907 v2.3l/v2.3m failed salience and fragility; `final_test` was not accessed | Use the seed-4127 Pareto-frontier cycle as the follow-up evidence; do not promote the seed-3709 tuned pass |
| Seed-4127 Pareto-frontier and changed-case probes show that the support-change mechanism can recover WVS sensitivity and low fragility but still cannot clear the strict salience gate | `3d_ethics_qwen3b_scaffold_family_tournament_v2_3n_pareto_seed4127_dev_2026-05-07.md`, `3d_ethics_qwen3b_scaffold_family_tournament_v2_3o_terminal_patch_seed4127_dev_2026-05-07.md`, `3d_ethics_qwen3b_scaffold_family_tournament_v2_3p_changed_case_seed4127_dev_2026-05-07.md`, `3d_ethics_seed4127_changed_case_salience_gate_audit_2026-05-07.md`, `3d_ethics_seed4127_wvs_salience_expert_adjudication_2026-05-07.md`, `../docs/research_logs/3d_ethics_v2_3n_to_v2_3p_pareto_seed4127_dev_2026-05-07.md` | Development-only no-launch evidence plus a derived/single-expert audit | The best changed-case candidate failed only the strict salience-improvement gate; all three runs kept `final_test` locked; expert adjudication says the gate should not be waived because at least one WVS preservation omission is real | Run the pre-specified seed-4303 dev-only WVS-preservation patch protocol before any held-out spend |
| The seed-4303/4409 bridge showed that WVS preservation and WVS fact-change sensitivity must be gated explicitly before a held-out launch | `3d_ethics_wvs_preservation_patch_protocol_seed4303_dev_2026-05-07.md`, `3d_ethics_qwen3b_scaffold_family_tournament_v2_3q_wvs_preservation_seed4303_dev_2026-05-07.md`, `3d_ethics_qwen3b_scaffold_family_tournament_v2_3r_salience_micro_lift_seed4409_2026-05-07.md` | Development/no-launch evidence plus a negative held-out diagnostic | Seed 4303 did not launch; seed 4409 launched a WVS-blind selector winner and failed final salience/fragility | Justifies the prospective WVS-specific sensitivity gate introduced for seed 4523; does not itself support a positive held-out claim |
| A WVS-guarded named-criterion scaffold can produce a second clean held-out win against the incumbent baseline | `3d_ethics_qwen3b_scaffold_family_tournament_v2_3s_wvs_guarded_seed4523_2026-05-07.md`, `3d_ethics_qwen3b_scaffold_family_tournament_v2_3s_wvs_guarded_seed4523_2026-05-07.json`, `../outputs/3d_ethics_stability_qwen_3b_scaffold_family_tournament_v2_3s_wvs_guarded_seed4523/stability_prompt_rewrite_runs/seed_4523/data/access_log.json` | New claim-bearing held-out support vs the incumbent baseline | Seed 4523 is one clean prospective win; combine it with seed 2801 as evidence of a real scaffold family, not as broad all-seed confirmation | Use in the paper as a second held-out win; keep the seed 4627/4703 replication boundaries visible |
| The v2.3s WVS-guarded basin improves salience/fragility repeatedly while exposing held-out WVS-sensitivity instability across fresh seeds | `3d_ethics_v2_3s_to_v2_3u_wvs_guarded_replication_matrix_2026-05-07.md`, `3d_ethics_qwen3b_scaffold_family_tournament_v2_3t_wvs_guarded_replication_seed4627_2026-05-07.md`, `3d_ethics_qwen3b_scaffold_family_tournament_v2_3u_wvs_guarded_replication_seed4703_2026-05-07.md` | Mixed prospective replication evidence | Seed 4627 ties on salience instead of strictly winning; seed 4703 wins on salience/fragility but WVS sensitivity remains 0.0; both accessed `final_test` exactly once | Use as residual-frontier evidence for WVS sensitivity-control analysis |
| A semantic WVS audit suggests that some official WVS sensitivity misses are threshold/measurement boundaries rather than total support-removal blindness | `3d_ethics_wvs_changed_fact_semantic_audit_2026-05-07.md`, `3d_ethics_wvs_changed_fact_semantic_audit_2026-05-07.json`, `3d_ethics_wvs_changed_fact_semantic_audit_2026-05-07/audit_manifest.json` | Post-hoc measurement-audit support | This is not the official metric and cannot replace held-out WVS sensitivity | Use as a measurement caveat and define auxiliary gates prospectively |
| v2.6a shows that the WVS boundary consists of weak directional support updates below the official sensitivity threshold, not pure support-change blindness | `3d_ethics_v2_6_support_basis_contrast_audit_2026-05-08.md`, `3d_ethics_v2_6_support_basis_contrast_audit_2026-05-08.json`, `../configs/3d_ethics_stability_qwen_3b_scaffold_family_tournament_v2_6a_support_basis_contrast_seed7307_dev.yaml`, `../outputs/3d_ethics_stability_qwen_3b_scaffold_family_tournament_v2_6a_support_basis_contrast_seed7307_dev/stability_prompt_rewrite_runs/seed_7307/data/access_log.json` | Development-only no-launch evidence plus a row-level measurement audit | No candidate passed the hard gates and `final_test` remained locked; this cannot support a positive held-out claim | Use with v2.6b as metric-aware dev evidence |
| v2.6b shows that weak WVS support recognition is real but cannot replace official WVS sensitivity as the paper gate | `3d_ethics_v2_6b_semantic_wvs_weak_update_audit_2026-05-08.md`, `3d_ethics_v2_6b_semantic_wvs_weak_update_audit_2026-05-08.json`, `3d_ethics_v2_6b_semantic_wvs_weak_update_audit_2026-05-08/audit_manifest.json`, `../docs/research_logs/3d_ethics_v2_6b_semantic_wvs_weak_update_audit_2026-05-08.md` | Supplemental measurement-audit support over saved selector-dev rows | This audit runs after prompt selection, does not access `final_test`, and weak one-point updates are also common in the incumbent baseline; it is diagnostic, not headline held-out evidence | Run a fresh dev-only metric-aware replay that reports official WVS sensitivity and supplemental weak-update sensitivity side by side |
| v2.6c–v2.6f show that the metric-aware frozen pool can still pass fresh dev but does not transfer prospectively well enough for another held-out launch | `3d_ethics_v2_6c_to_v2_6f_metric_aware_replay_2026-05-08.md`, `3d_ethics_v2_6c_to_v2_6f_metric_aware_replay_2026-05-08.json`, `../docs/research_logs/3d_ethics_v2_6c_metric_aware_replay_seed7403_dev_protocol_2026-05-08.md`, `../docs/research_logs/3d_ethics_v2_6d_metric_aware_prospective_seed7507_protocol_2026-05-08.md`, `../docs/research_logs/3d_ethics_v2_6f_metric_aware_prospective_seed7703_protocol_2026-05-08.md` | Mixed development/prospective no-launch evidence | Seed 7403 is a strong dev pass, but seeds 7507 and 7703 both blocked before `final_test` and the seed-7603 repair failed; no new held-out win exists | Stop immediate prospective attempts from this frozen pool; move to selector calibration or support-basis tagging |
| v2.7a–v2.7g show that native WVS label handling plus a minimal output lock can pass one dev split while failing paper-ready transfer | `3d_ethics_v2_7a_to_v2_7g_native_label_output_lock_cycle_2026-05-08.md`, `3d_ethics_v2_7a_to_v2_7g_native_label_output_lock_cycle_2026-05-08.json`, `../docs/research_logs/3d_ethics_v2_7c_minimal_output_lock_seed7803_dev_protocol_2026-05-08.md`, `../docs/research_logs/3d_ethics_v2_7f_minimal_output_lock_prospective_seed8009_protocol_2026-05-08.md` | Development-only pass plus prospective no-launch evidence | Seed 7803 v2.7c passed all dev hard gates, but the seed-7901 fresh replay failed fragility/WVS sensitivity and the seed-8009 prospective run blocked before `final_test`; no new held-out win exists | Use as evidence for split-safe support-basis tags and selector calibration |
| v2.8a–v2.8f show that split-safe support-basis artifacts are a promising VAE route but are still not stable enough for a new held-out claim | `3d_ethics_v2_8a_to_v2_8f_support_basis_artifact_cycle_2026-05-08.md`, `3d_ethics_v2_8a_to_v2_8f_support_basis_artifact_cycle_2026-05-08.json`, `../configs/3d_ethics_stability_qwen_3b_scaffold_family_tournament_v2_8c_support_basis_tags_prospective_seed8303.yaml`, `../outputs/3d_ethics_stability_qwen_3b_scaffold_family_tournament_v2_8c_support_basis_tags_prospective_seed8303/stability_prompt_rewrite_runs/seed_8303/data/access_log.json` | Strong development-only mechanism evidence plus prospective no-launch evidence | Seeds 8101 and 8209 passed dev gates with WVS sensitivity 1.0, but seed 8303 blocked before `final_test`; repair runs v2.8d–v2.8f did not recover a clean gate package. No v2.8 run accessed `final_test`. | Build WVS polarity / least-supportive-label instrumentation and require multi-dev transfer before any new prospective launch |
| v2.9a–v2.9c show that WVS polarity artifacts can execute endpoint movement, but official WVS sensitivity can be infeasible for already-skeptical canonical rows | `3d_ethics_v2_9a_to_v2_9b_wvs_polarity_artifact_audit_2026-05-08.md`, `3d_ethics_v2_9a_to_v2_9b_wvs_polarity_artifact_audit_2026-05-08.json`, `3d_ethics_v2_9c_wvs_endpoint_credit_audit_2026-05-08.md`, `3d_ethics_v2_9c_wvs_endpoint_credit_audit_2026-05-08.json`, `../outputs/3d_ethics_stability_qwen_3b_scaffold_family_tournament_v2_9a_wvs_polarity_artifact_seed8407_dev/stability_prompt_rewrite_runs/seed_8407/data/access_log.json`, `../outputs/3d_ethics_stability_qwen_3b_scaffold_family_tournament_v2_9b_wvs_support_endpoints_seed8407_dev/stability_prompt_rewrite_runs/seed_8407/data/access_log.json` | Development-only near-win plus a derived measurement audit | v2.9a improved salience, fragility, alignment, and WVS salience but failed official WVS sensitivity. The v2.9c audit found that 16/16 WVS changed-support rows reached the printed least-supportive endpoint, while only 5/16 could receive official credit; no v2.9 run accessed `final_test`. | Run a feasibility-stratified dev replay that reports official WVS sensitivity and endpoint movement separately before any prospective launch |
| v2.10a–v2.10d show that feasibility-stratified WVS core-values rows can recover official WVS sensitivity, but the minimal endpoint repair still needs fresh-dev transfer before any held-out launch | `3d_ethics_v2_10a_to_v2_10d_feasibility_stratified_cycle_2026-05-08.md`, `3d_ethics_v2_10a_to_v2_10d_feasibility_stratified_cycle_2026-05-08.json`, `../configs/3d_ethics_stability_qwen_3b_scaffold_family_tournament_v2_10d_minimal_baseline_endpoint_seed8511_dev.yaml`, `../outputs/3d_ethics_stability_qwen_3b_scaffold_family_tournament_v2_10d_minimal_baseline_endpoint_seed8511_dev/stability_prompt_rewrite_runs/seed_8511/data/access_log.json` | Development-only mechanism support and the strongest current near-pass | v2.10d beats the incumbent baseline on salience (0.9796 vs 0.9558), WVS sensitivity (1.0 vs 0.5), WVS salience (0.9388 vs 0.8673), and fragility (0.0238 vs 0.0754) and ties on valid format (1.0), but aggregate sensitivity is exactly 4/6 = 0.6667, below the configured decimal floor of 0.67. No v2.10 run accessed `final_test`. | Run fresh dev transfer of the minimal endpoint family with an explicitly pre-specified exact-count sensitivity gate (≥ 4/6), if that is the intended threshold |
| v2.10e–v2.10f show exact-count endpoint transfer on fresh dev and a mixed-negative prospective held-out boundary | `3d_ethics_v2_10e_to_v2_10f_exact_count_transfer_2026-05-08.md`, `3d_ethics_v2_10e_to_v2_10f_exact_count_transfer_2026-05-08.json`, `../configs/3d_ethics_stability_qwen_3b_scaffold_family_tournament_v2_10e_minimal_endpoint_exact_count_seed8541_dev.yaml`, `../configs/3d_ethics_stability_qwen_3b_scaffold_family_tournament_v2_10f_minimal_endpoint_exact_count_prospective_seed8563.yaml`, `../outputs/3d_ethics_stability_qwen_3b_scaffold_family_tournament_v2_10f_minimal_endpoint_exact_count_prospective_seed8563/stability_prompt_rewrite_runs/seed_8563/data/access_log.json` | Fresh dev support plus mixed-negative prospective held-out evidence | v2.10e passed all selector-dev gates. v2.10f unlocked `final_test` exactly once and improved held-out salience, fragility, and WVS salience, but aggregate sensitivity collapsed to 0/3 vs the baseline's 1/3; therefore no new held-out win exists. | Use as non-WVS changed-support sensitivity boundary evidence |
| v2.10g–v2.10h show that local non-WVS wording repairs hit a sensitivity/fragility tradeoff rather than producing a launchable candidate | `3d_ethics_v2_10g_to_v2_10h_non_wvs_repair_2026-05-08.md`, `3d_ethics_v2_10g_to_v2_10h_non_wvs_repair_2026-05-08.json`, `../configs/3d_ethics_stability_qwen_3b_scaffold_family_tournament_v2_10g_non_wvs_update_repair_seed8597_dev.yaml`, `../configs/3d_ethics_stability_qwen_3b_scaffold_family_tournament_v2_10h_changed_only_no_drift_seed8597_dev.yaml`, `../outputs/3d_ethics_stability_qwen_3b_scaffold_family_tournament_v2_10h_changed_only_no_drift_seed8597_dev/stability_prompt_rewrite_runs/seed_8597/data/access_log.json` | Development-only no-launch evidence | v2.10g recovered sensitivity only at the cost of high fragility; v2.10h preserved low fragility only with sensitivity stuck at 3/6. Both runs kept `final_test` locked and produced no new held-out win. | Move to a structural row-local support-state execution artifact before any new prospective held-out launch |
| v2.10i–v2.10l show that row-local operation tags are a strong VAE-style artifact route with a fresh-dev transfer boundary | `3d_ethics_v2_10i_to_v2_10l_operation_artifact_2026-05-08.md`, `3d_ethics_v2_10i_to_v2_10l_operation_artifact_2026-05-08.json`, `../configs/3d_ethics_stability_qwen_3b_scaffold_family_tournament_v2_10i_operation_tag_seed8601_dev.yaml`, `../configs/3d_ethics_stability_qwen_3b_scaffold_family_tournament_v2_10l_operation_artifact_salience_lift_fresh_seed8629_dev.yaml`, `../outputs/3d_ethics_stability_qwen_3b_scaffold_family_tournament_v2_10l_operation_artifact_salience_lift_fresh_seed8629_dev/stability_prompt_rewrite_runs/seed_8629/data/access_log.json`, `3d_ethics_operation_route_ablation_config_audit_2026-05-09.md` | Development-only mechanism evidence plus a fresh-dev transfer boundary; the route-ablation config pair is ready | v2.10k passed all same-seed dev gates, but v2.10l failed salience and fragility against the stronger operation-artifact baseline despite keeping sensitivity at 5/6, valid format at 1.0, WVS sensitivity at 1.0, and high alignment. All v2.10i–v2.10l access logs show zero `final_test` events. Seed 8707 hosts the matched route-ablation config pair. | Use the paired dev-only route ablation to compare operation tags off vs on, with prompts and splits held fixed |
| The current implementation now guards the operation-artifact route against inconsistent configs and misleading selector summaries | `3d_ethics_implementation_design_audit_2026-05-09.md`, `3d_ethics_operation_route_ablation_config_audit_2026-05-09.md`, `../docs/research_logs/3d_ethics_operation_artifact_route_ablation_protocol_2026-05-09.md`, `../src/ethics_prompt_rewrite/config.py`, `../src/ethics_prompt_rewrite/stability_experiment.py` | Implementation and protocol hardening | These audits do not add a new model result and cannot support a held-out win by themselves | Use the paired dev-only route ablation as the release control lane |
| v2.4a fresh-dev semantic gating strengthened the named-criterion basin before a prospective check | `3d_ethics_qwen3b_scaffold_family_tournament_v2_4a_semantic_gate_seed4801_dev_2026-05-07.md`, `3d_ethics_qwen3b_scaffold_family_tournament_v2_4a_semantic_gate_seed4801_dev_2026-05-07.json`, `../outputs/3d_ethics_stability_qwen_3b_scaffold_family_tournament_v2_4a_semantic_gate_seed4801_dev/stability_prompt_rewrite_runs/seed_4801/data/access_log.json` | Development-only support | `final_test` remained locked, so this cannot support a held-out claim by itself | Use as the rationale for the seed-4909 prospective check |
| v2.4b seed 4909 is negative prospective evidence: the selected named-criterion scaffold failed the held-out clean-win criteria because of a selector-gap fragility collapse | `3d_ethics_qwen3b_scaffold_family_tournament_v2_4b_semantic_gate_seed4909_2026-05-07.md`, `3d_ethics_v2_4b_seed4909_fragility_autopsy_2026-05-07.md`, `../outputs/3d_ethics_stability_qwen_3b_scaffold_family_tournament_v2_4b_semantic_gate_seed4909/stability_prompt_rewrite_runs/seed_4909/data/access_log.json` | Claim-bearing negative held-out evidence | It weakens broad all-seed claims; it does not erase the seed-2801/4523 wins | Treat as selector-gap/fragility frontier evidence and avoid further held-out launches without a selector repair |
| v2.4c/v2.4d show that prompt wording alone has not solved the joint salience/fragility/WVS gate problem | `3d_ethics_qwen3b_scaffold_family_tournament_v2_4c_fragility_hardening_seed5003_dev_2026-05-07.md`, `3d_ethics_qwen3b_scaffold_family_tournament_v2_4d_minimal_support_patch_seed5101_dev_2026-05-07.md`, `../docs/research_logs/3d_ethics_v2_4a_to_v2_4d_semantic_gate_and_fragility_cycle_2026-05-07.md` | Development-only no-launch evidence | No held-out access; no positive claim. Seed 5101 shows a strong named-criterion near-pass but still missed the pre-specified salience gate | Run a dev-only selector/gate audit before any new prospective seed |
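
The v2.10d row turns on a small counting point: with six changed-support rows, aggregate sensitivity can only take the values k/6, so a decimal floor of 0.67 silently demands 5/6 rather than 4/6. The sketch below assumes the gate is a plain "at least the threshold" comparison; `passes_decimal_floor` and `passes_exact_count` are hypothetical helpers written for illustration, not functions from `../src/ethics_prompt_rewrite/config.py`. It shows why a pre-specified exact-count gate is more safely expressed with rational arithmetic than with a rounded decimal.

```python
from fractions import Fraction


def passes_decimal_floor(hits: int, total: int, floor: float) -> bool:
    """Hypothetical decimal gate: compare the float ratio to a float floor."""
    return hits / total >= floor


def passes_exact_count(hits: int, total: int, min_hits: int, min_total: int) -> bool:
    """Hypothetical exact-count gate: compare rationals, e.g. 'at least 4/6'."""
    return Fraction(hits, total) >= Fraction(min_hits, min_total)


if __name__ == "__main__":
    # v2.10d selector-dev aggregate sensitivity, as reported in the matrix: 4 of 6 rows.
    hits, total = 4, 6
    print(f"{hits}/{total} = {hits / total:.4f}")
    print("decimal 0.67 floor:", passes_decimal_floor(hits, total, 0.67))
    print("exact >= 4/6 gate:", passes_exact_count(hits, total, 4, 6))
    # Smallest number of hits out of six that clears the decimal floor.
    print("min hits for 0.67 floor:", min(k for k in range(total + 1) if k / total >= 0.67))
```

Running it prints `4/6 = 0.6667`, then `False` for the decimal gate, `True` for the exact-count gate, and `5` as the smallest hit count that clears 0.67. Fixing the count (here, 4 of 6) in the protocol before the run avoids this ambiguity entirely.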

## How to use this matrix

- If you are writing the paper, treat the ETHICS checkpoint/scaffold-freezing rows plus the 3D 2801/4523 rows as the strongest release-surface evidence.
- If you are touching the 3D program, read rows 16 onward together; the 3D line now has two access-log-verified held-out wins (2801 and 4523) and a clearly labeled boundary against broad all-seed confirmation.
- If a result depends on a split whose `final_test` was later used to redesign the method, treat that result as diagnostic only.
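
The last rule can be checked mechanically against each run's `data/access_log.json`. The sketch below is illustrative only: it assumes a hypothetical log schema (a JSON list of events, each with a `split` field) that may not match the real files, and it treats any `final_test` access beyond the first as disqualifying. That second rule is an assumption layered on the matrix's "accessed `final_test` exactly once" convention, not a documented repository rule.

```python
import json
from typing import Iterable, Mapping


def count_final_test_accesses(events: Iterable[Mapping[str, str]]) -> int:
    """Count log events that touched the final_test split.

    Assumes a hypothetical schema: each event is a dict with a "split" key.
    """
    return sum(1 for event in events if event.get("split") == "final_test")


def evidence_status(final_test_accesses: int, reused_for_redesign: bool) -> str:
    """Map a run's final_test usage to the matrix's evidence tiers (illustrative)."""
    if final_test_accesses == 0:
        # final_test stayed locked: the run can only be development evidence.
        return "development-only"
    if reused_for_redesign or final_test_accesses > 1:
        # The held-out split informed later design, or was opened more than once.
        return "diagnostic-only"
    # Exactly one access and no later redesign on this split.
    return "held-out (claim-eligible)"


if __name__ == "__main__":
    # Hypothetical access_log.json contents; the real file's schema may differ.
    raw = '[{"split": "selector_dev"}, {"split": "final_test"}]'
    n = count_final_test_accesses(json.loads(raw))
    print(n, evidence_status(n, reused_for_redesign=False))
```

Claim-eligible is not the same as a win: the seed-4909 and seed-8563 rows each opened `final_test` once and still produced negative or mixed-negative held-out evidence.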