WVS changed-fact semantic audit

Claim Boundary

This is a blinded automated semantic audit over saved final-test rows. It does not run a new model experiment, does not unlock final_test, and does not replace the official WVS sensitivity metric. It creates a human-ready blinded packet and records two transparent automated heuristic judge passes as post-hoc mechanism evidence.

Main Result

| Arm | Judge A actionable update | Judge B actionable update | Official WVS pass |
| --- | --- | --- | --- |
| selected scaffold | 3/3 | 3/3 | 2/3 |
| current_round_7 | 1/3 | 1/3 | 0/3 |

Interpretation: the selected scaffold shows an actionable semantic WVS update in 3/3 audited changed-fact pairs under both automated blinded heuristics; the current_round_7 baseline shows 1/3. The official WVS sensitivity metric remains stricter on the same three rows: 2/3 for the selected scaffold versus 0/3 for current_round_7.

Blinding And Access Checks

{
  "arm_name_hidden_from_visible_packet": true,
  "automated_judgments_hidden_from_visible_packet": true,
  "official_metrics_hidden_from_visible_packet": true
}
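The three flags above can be read as "no hidden field appears in any visible packet row". A minimal sketch of such a check follows. The packet schema is not shown in this report, so the key names in `HIDDEN_KEYS` and the example rows are hypothetical placeholders, not the real field names.

```python
# Hypothetical hidden-field names; the real packet schema is an assumption here.
HIDDEN_KEYS = {"arm", "automated_judgment", "official_metric"}


def find_leaks(rows, hidden_keys=HIDDEN_KEYS):
    """Return (row_index, key) pairs where a hidden key appears in a visible row."""
    leaks = []
    for index, row in enumerate(rows):
        # Intersect the row's keys with the hidden set; sort for stable output.
        for key in sorted(set(hidden_keys) & set(row)):
            leaks.append((index, key))
    return leaks


# Toy rows for illustration only, not taken from the audited packet.
visible_rows = [{"pair_id": "p1", "original_response": "...", "changed_response": "..."}]
leaky_rows = visible_rows + [{"pair_id": "p2", "arm": "selected scaffold"}]

print(find_leaks(visible_rows))  # no leaks expected for a properly blinded packet
print(find_leaks(leaky_rows))    # reports the row that exposes the arm name
```

A check of this shape only tests for key presence; it would not catch an arm name embedded in free text.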

Access log verification:

  • seed 4523: 1 final-test event; access log outputs/3d_ethics_stability_qwen_3b_scaffold_family_tournament_v2_3s_wvs_guarded_seed4523/stability_prompt_rewrite_runs/seed_4523/data/access_log.json
  • seed 4627: 1 final-test event; access log outputs/3d_ethics_stability_qwen_3b_scaffold_family_tournament_v2_3t_wvs_guarded_replication_seed4627/stability_prompt_rewrite_runs/seed_4627/data/access_log.json
  • seed 4703: 1 final-test event; access log outputs/3d_ethics_stability_qwen_3b_scaffold_family_tournament_v2_3u_wvs_guarded_replication_seed4703/stability_prompt_rewrite_runs/seed_4703/data/access_log.json
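The per-seed counts above could be reproduced with a small script along these lines. This is a sketch under stated assumptions: the structure of `access_log.json` is not shown in this report, so the code assumes a JSON list of event objects with a `split` field whose value is `final_test` for final-test reads. Both names are hypothetical.

```python
import json


def count_final_test_events(log_text, split_key="split", final_value="final_test"):
    """Count events in a JSON access log that touch the final-test split.

    Assumes (hypothetically) that the log is a JSON list of event dicts.
    """
    events = json.loads(log_text)
    return sum(1 for event in events if event.get(split_key) == final_value)


def check_single_access(log_text):
    """Return True when the log records exactly one final-test event."""
    return count_final_test_events(log_text) == 1


# Toy log for illustration only; not the contents of any real access log.
example_log = json.dumps([
    {"split": "dev", "event": "read"},
    {"split": "final_test", "event": "read"},
])
print(count_final_test_events(example_log))
print(check_single_access(example_log))
```

In practice the log text would be read from each seed's `access_log.json` path listed above.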

Automated Judge Agreement

{
  "actionable_support_removal_update": {
    "agree": 6,
    "agreement": 1.0,
    "total": 6
  },
  "avoided_unsupported_new_frame": {
    "agree": 6,
    "agreement": 1.0,
    "total": 6
  },
  "changed_judgment_or_score": {
    "agree": 6,
    "agreement": 1.0,
    "total": 6
  },
  "reasoning_was_inspectable": {
    "agree": 6,
    "agreement": 1.0,
    "total": 6
  },
  "recognized_support_removal": {
    "agree": 6,
    "agreement": 1.0,
    "total": 6
  }
}
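Each entry above is a raw agreement rate: the number of audited rows on which both judges gave the same answer, divided by the total (6 here, presumably three changed-fact pairs for each of the two arms). The sketch below shows one way such a summary could be computed. It assumes each judge file yields one dict per row with a boolean per criterion and that rows are aligned by position; the judge file schema is not shown in this report, and the example rows are toy data.

```python
CRITERIA = [
    "actionable_support_removal_update",
    "avoided_unsupported_new_frame",
    "changed_judgment_or_score",
    "reasoning_was_inspectable",
    "recognized_support_removal",
]


def judge_agreement(judge_a, judge_b, criteria=CRITERIA):
    """Compute per-criterion raw agreement between two position-aligned judge passes."""
    if len(judge_a) != len(judge_b):
        raise ValueError("judge passes must cover the same rows")
    total = len(judge_a)
    report = {}
    for criterion in criteria:
        agree = sum(1 for a, b in zip(judge_a, judge_b) if a[criterion] == b[criterion])
        report[criterion] = {
            "agree": agree,
            "agreement": agree / total if total else 0.0,
            "total": total,
        }
    return report


# Toy data: six rows where both judges answer identically on every criterion.
rows_a = [{c: (i % 2 == 0) for c in CRITERIA} for i in range(6)]
rows_b = [dict(row) for row in rows_a]
full = judge_agreement(rows_a, rows_b)

# Flip one answer in judge B to show a disagreement on a single criterion.
rows_b[0]["recognized_support_removal"] = not rows_b[0]["recognized_support_removal"]
partial = judge_agreement(rows_a, rows_b)
print(full["recognized_support_removal"], partial["recognized_support_removal"])
```

Raw agreement does not correct for chance, so a rate of 1.0 between two heuristics on six rows indicates that they coincide, not that either is independently validated.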

Artifact Paths

  • manifest: reports/3d_ethics_wvs_changed_fact_semantic_audit_2026-05-07/audit_manifest.json
  • instructions: reports/3d_ethics_wvs_changed_fact_semantic_audit_2026-05-07/judge_instructions.md
  • blinded_packet: reports/3d_ethics_wvs_changed_fact_semantic_audit_2026-05-07/blinded_wvs_changed_fact_semantic_packet.jsonl
  • answer_key: reports/3d_ethics_wvs_changed_fact_semantic_audit_2026-05-07/audit_answer_key.jsonl
  • judge_a_template: reports/3d_ethics_wvs_changed_fact_semantic_audit_2026-05-07/judge_a_template.jsonl
  • judge_b_template: reports/3d_ethics_wvs_changed_fact_semantic_audit_2026-05-07/judge_b_template.jsonl
  • automated_judge_a: reports/3d_ethics_wvs_changed_fact_semantic_audit_2026-05-07/automated_judge_a.jsonl
  • automated_judge_b: reports/3d_ethics_wvs_changed_fact_semantic_audit_2026-05-07/automated_judge_b.jsonl

Paper Use

Use this result as a post-selection measurement-audit finding: selected support-state/named-criterion scaffolds appear to make WVS support removal more usable than current_round_7 even when the official score/stance metric under-credits seed 4703. Do not present this as broad multi-seed held-out confirmation or as a replacement for the official WVS sensitivity column.

Next Scientific Step

The next model-facing step can be a fresh dev-only v2.4 semantic gate that requires official WVS sensitivity or blinded semantic support-removal recognition on WVS changed-fact rows before any future held-out launch.
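As a rough illustration of the gate described above, a row could count as passing when it shows either official WVS sensitivity or blinded semantic support-removal recognition. The field names `official_wvs_pass` and `recognized_support_removal`, and the `min_pass_fraction` threshold, are assumptions made for this sketch; the report does not specify how many rows must pass.

```python
def semantic_gate(rows, min_pass_fraction=1.0):
    """Return True when enough dev rows pass the either/or criterion.

    A row passes if it has official WVS sensitivity OR blinded semantic
    support-removal recognition. The threshold is a hypothetical parameter.
    """
    if not rows:
        return False  # an empty dev set cannot clear the gate
    passed = sum(
        1 for row in rows
        if row["official_wvs_pass"] or row["recognized_support_removal"]
    )
    return passed / len(rows) >= min_pass_fraction


# Toy dev rows for illustration only.
dev_rows = [
    {"official_wvs_pass": True, "recognized_support_removal": False},
    {"official_wvs_pass": False, "recognized_support_removal": True},
]
failing_rows = dev_rows + [{"official_wvs_pass": False, "recognized_support_removal": False}]

print(semantic_gate(dev_rows))
print(semantic_gate(failing_rows))
print(semantic_gate(failing_rows, min_pass_fraction=0.5))
```

Keeping the threshold explicit makes the gate's strictness a recorded decision rather than an implicit one.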