This directory holds the raw and structured experiment artifacts that support the repository’s claims. Treat it as a read-mostly evidence store.

| Path pattern | Purpose |
|---|---|
| `runs/` | Original smoke and base-experiment outputs |
| `final_gemini_experiment_*` | Frozen-vs-adaptive final-experiment outputs |
| `matched_budget_pilot_*` | Frozen prompt-family pilot outputs |
| `matched_budget_revision_*` | Cross-only prompt-family revision outputs |
| `3d_ethics_stability_*` | 3D Ethics prompt-rewriting runs, diagnostics, preflights, and audits |
| `cache/teacher/` | Cached teacher responses keyed by request hash |


Corrected 3D runs write under `stability_prompt_rewrite_runs/seed_<seed>/`.
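To walk every seed of a corrected run, a small helper can enumerate the `seed_<seed>/` roots in numeric order. This is a minimal sketch, not a repository tool: it assumes `stability_prompt_rewrite_runs/` sits directly inside a `3d_ethics_stability_*` experiment directory, and the function name `list_seed_roots` is hypothetical.

```python
from pathlib import Path


def list_seed_roots(experiment_dir: Path) -> list[Path]:
    """Return corrected run roots (seed_<seed>/) sorted by numeric seed.

    ASSUMPTION: layout is <experiment_dir>/stability_prompt_rewrite_runs/seed_<seed>/.
    Entries whose suffix is not an integer, and non-directories, are skipped.
    """
    base = experiment_dir / "stability_prompt_rewrite_runs"
    roots = []
    for path in base.glob("seed_*"):  # empty iterator if base does not exist
        suffix = path.name[len("seed_"):]
        if path.is_dir() and suffix.isdigit():
            roots.append((int(suffix), path))
    # Sort numerically so seed_9 comes before seed_211.
    return [path for _, path in sorted(roots)]
```

Sorting on the integer seed rather than the directory name keeps `seed_9` ahead of `seed_211`, which a plain lexical sort would not.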
Most useful files inside a corrected run root:

| File | Purpose |
|---|---|
| `stability_summary.json` | Run-level metrics, gates, and interpretation limits |
| `prompt_rewriting/rounds.json` or `prompt_rewriting/prompt_rewrite_progress.json` | Prompt evolution across rounds |
| `prompt_rewriting/final_prompt_selection.json` | Selected prompt and selector rationale |
| `data/access_log.json` | Locked-split access log |
| `diagnostics/selector_validity_diagnostic.*` | Whether selector-dev ranking tracked held-out behavior |
| `audit/audit_manifest.json` | Blinded audit packet summary |

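The files in the table above can be gathered into one dictionary for inspection. The sketch below is an illustrative helper, not part of the repository (`load_run_artifacts` and the result keys are hypothetical names). It prefers `rounds.json` over `prompt_rewrite_progress.json` when both exist, maps missing files to `None`, and only lists the selector diagnostic because its extension varies.

```python
import json
from pathlib import Path
from typing import Any


def _read_json(path: Path) -> Any:
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def load_run_artifacts(run_root: Path) -> dict[str, Any]:
    """Load the most useful JSON files from one corrected run root.

    Missing files map to None so partial runs can still be inspected.
    """
    pr = run_root / "prompt_rewriting"
    # Ordered candidates per key: the first file that exists wins.
    candidates = {
        "summary": [run_root / "stability_summary.json"],
        "prompt_evolution": [pr / "rounds.json", pr / "prompt_rewrite_progress.json"],
        "final_selection": [pr / "final_prompt_selection.json"],
        "access_log": [run_root / "data" / "access_log.json"],
        "audit_manifest": [run_root / "audit" / "audit_manifest.json"],
    }
    out: dict[str, Any] = {}
    for key, paths in candidates.items():
        # The generator is lazy, so only the first existing file is parsed.
        out[key] = next((_read_json(p) for p in paths if p.is_file()), None)
    # selector_validity_diagnostic.* has no fixed extension: list paths, don't parse.
    out["selector_diagnostics"] = sorted(
        run_root.glob("diagnostics/selector_validity_diagnostic.*")
    )
    return out
```

Reading `selector_diagnostics` and `final_selection` before `summary` matches the reading order recommended at the end of this README.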

Key 3D Ethics run roots:

- `3d_ethics_stability_qwen_0p5b_expanded_smoke/`: exposed the 0.5B format and sensitivity bottleneck
- `3d_ethics_stability_qwen_1p5b_expanded_smoke/`: showed that 1.5B fixes the invalid-format sensitivity collapse
- `3d_ethics_stability_qwen_1p5b_selector_calibrated_expanded_smoke/`: first selector-calibrated prospective 1.5B diagnostic
- `3d_ethics_stability_qwen_3b_scaffold_family_replay_seed211_dev/`: first fresh dev replay where a teacher-generated family cleared all dev gates
- `3d_ethics_stability_qwen_3b_scaffold_family_prospective_seed307/`: clean held-out no-launch under a frozen protocol
- `3d_ethics_stability_qwen_3b_scaffold_family_prospective_seed2801/`: first clean held-out win by `context_preserving_support_state_scaffold`
- `3d_ethics_stability_qwen_3b_scaffold_family_tournament_v2_3s_wvs_guarded_seed4523/`: second clean held-out win by `named_criterion_no_import_update_scaffold`
- `3d_ethics_stability_qwen_3b_scaffold_family_tournament_v2_4b_semantic_gate_seed4909/`: negative prospective run showing that the selector/fragility frontier is still real
- `3d_ethics_stability_qwen_3b_scaffold_family_tournament_v2_5o_targeted_named_support_seed6803_dev/`: strongest current dev-only, gate-clean named-support hybrid
- `3d_ethics_stability_qwen_3b_scaffold_family_tournament_v2_5s_localized_wvs_direction_seed7207_dev/`: latest localized WVS-direction boundary; `final_test` locked
- `3d_ethics_stability_qwen_3b_scaffold_family_tournament_v2_10f_minimal_endpoint_exact_count_prospective_seed8563/`: exact-count transfer held-out boundary with salience/fragility gains but sensitivity collapse
- `3d_ethics_stability_qwen_3b_scaffold_family_tournament_v2_10l_operation_artifact_salience_lift_fresh_seed8629_dev/`: latest fresh-dev operation-artifact route boundary; no held-out launch

This directory is intentionally large because it stores claim-relevant raw
predictions, access logs, split manifests, and config snapshots. For broad
public distribution, keep `reports/artifact_index.md` and this README in Git
even if raw run roots are mirrored to Git LFS, a release asset, or another
artifact store.
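If run roots are mirrored to Git LFS and a checkout has not run `git lfs pull`, JSON artifacts may be small pointer stubs instead of real content, and parsing them fails or misleads. A minimal sketch that flags such stubs before analysis follows; it relies only on the fact that LFS pointer files begin with the spec line shown, and the helper names and the default `*.json` pattern are illustrative assumptions.

```python
from pathlib import Path

# First line of a Git LFS (spec v1) pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """True if `path` starts like an un-fetched Git LFS pointer."""
    with path.open("rb") as fh:
        head = fh.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def find_lfs_pointers(root: Path, pattern: str = "*.json") -> list[Path]:
    """List files under `root` matching `pattern` that are still pointer stubs."""
    return sorted(p for p in root.rglob(pattern) if p.is_file() and is_lfs_pointer(p))
```

A non-empty result means the raw artifacts have not been fetched, not that the run itself is incomplete.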

- Start with a top-level summary Markdown or JSON file if one exists.
- Inspect `config_snapshot.json` or the run report to confirm what was frozen.
- Use prompt-selection artifacts before interpreting top-line metrics.
- Check `data/access_log.json` before treating any result as claim-relevant.
- Use `../docs/current_status.md` and `../reports/artifact_index.md` as the release-facing navigation layer.
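The access-log check above can be partly scripted. The schema of `data/access_log.json` is not documented here, so this sketch ASSUMES a JSON list of records with a `split` field; the field name, the helper `locked_split_accesses`, and the default locked set (which uses the `final_test` split named in the run list) are hypothetical. Confirm them against a real log before relying on the result.

```python
import json
from pathlib import Path

# HYPOTHETICAL: locked split names; "final_test" is the split this README calls locked.
LOCKED_SPLITS = frozenset({"final_test"})


def locked_split_accesses(
    run_root: Path, locked: frozenset[str] = LOCKED_SPLITS
) -> list[dict]:
    """Return access-log records that touched a locked split.

    ASSUMED schema: data/access_log.json is a JSON list of dicts with a "split" key.
    Records that are not dicts are ignored.
    """
    log_path = run_root / "data" / "access_log.json"
    records = json.loads(log_path.read_text(encoding="utf-8"))
    return [r for r in records if isinstance(r, dict) and r.get("split") in locked]
```

A non-empty result is not automatically a violation: a prospective run may be allowed one held-out access, so compare any hits against the protocol frozen in `config_snapshot.json`.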