Merged
2 changes: 1 addition & 1 deletion README.md
@@ -32,7 +32,7 @@
 This is the main TaijiOS repository.
 
 - Reviewer start page: [docs/START_HERE_FOR_REVIEWERS.md](docs/START_HERE_FOR_REVIEWERS.md) — 5-minute public review path, exact verdicts, and no-overclaim boundaries.
-- Agent Reliability False-Pass Gate: `python scripts/check_false_pass_gate.py --self-test examples/false_pass_gate/fixtures` — local proof that unsupported "done" claims and missing `cannot_claim` boundaries are blocked before success language is accepted.
+- Agent Reliability False-Pass Gate: `python scripts/check_false_pass_gate.py --self-test examples/false_pass_gate/fixtures` — local schema-level check that blocks success language when passing evidence pointers or explicit `cannot_claim` boundaries are missing.
 - SpaceXAI proof packet: [docs/SPACE_X_AI_PROOF_PACKET.md](docs/SPACE_X_AI_PROOF_PACKET.md) — human-review evidence for an evidence-first AI agent runtime; not SpaceX endorsement, not production readiness, and not real hardware control.
 - Authoritative boundary documents: [Product Spine](docs/architecture/PRODUCT_SPINE_AUTHORITY.md), [Provider Gate](docs/provider/PROVIDER_BOUNDARY_GATE.md), [Direct LLM Caller](docs/provider/DIRECT_LLM_CALLER_BOUNDARY.md), [Multi-Model Gate](docs/provider/MULTI_MODEL_ARCHITECTURE_GATE.md), [Runtime Matrix](docs/runtime/RUNTIME_MATURITY_MATRIX.md), [HSDL](docs/design/HSDL_CANONICAL_SPEC_v0.1.md), [小九通天录](xiaojiu_tongtianlu/BOUNDARY.md), [Life Systems](life_systems/BIOSECURITY_BOUNDARY.md) — docs-only review gates; not repo-level PASS, runtime readiness, or provider readiness.
 - Machine-readable proof index: [docs/proof_index.json](docs/proof_index.json)
10 changes: 6 additions & 4 deletions docs/START_HERE_FOR_REVIEWERS.md
@@ -45,8 +45,9 @@ Then inspect:
 
 ## Agent Reliability: False-Pass Gate
 
-The False-Pass Gate checks whether an AI-agent success claim has passing evidence
-and explicit `cannot_claim` boundaries. It is intentionally local and synthetic:
+The False-Pass Gate is a schema-level check for AI-agent success claims. It
+requires passing evidence pointers and explicit `cannot_claim` boundaries before
+success language is accepted. It is intentionally local and synthetic:
 
 ```bash
 python scripts/check_false_pass_gate.py --self-test examples/false_pass_gate/fixtures
@@ -59,8 +60,9 @@ self_test=PASS cases=3
 ```
 
 This gate can support `LOCAL_VALIDATED` after the self-test and pytest pass. It
-does not prove remote CI, public adoption, production readiness, provider/API
-readiness, or recruiting validation.
+does not execute evidence commands, prove that the success claim is true, prove
+remote CI, public adoption, production readiness, provider/API readiness, or
+recruiting validation.
 
 ## Verdict Semantics
 
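Reviewer note: the schema-level check described in the START_HERE_FOR_REVIEWERS.md change can be sketched roughly as below. This is a minimal illustration, not the contents of `scripts/check_false_pass_gate.py`. Only `cannot_claim` is named in the source; the function `check_claim`, the fields `evidence`, `pointer`, and `status`, and the reason strings are assumptions made for the example.

```python
from typing import Any


def check_claim(record: dict[str, Any]) -> list[str]:
    """Return the reasons a success claim is blocked (empty list = accepted).

    Hypothetical schema: 'evidence' is a list of {'pointer', 'status'} objects
    and 'cannot_claim' is a non-empty list of boundary strings.
    """
    reasons: list[str] = []

    evidence = record.get("evidence")
    if not isinstance(evidence, list) or not evidence:
        reasons.append("missing_evidence")
    else:
        for item in evidence:
            if not isinstance(item, dict) or not item.get("pointer"):
                reasons.append("evidence_without_pointer")
            elif item.get("status") != "pass":
                reasons.append("evidence_not_passing")

    boundaries = record.get("cannot_claim")
    if not isinstance(boundaries, list) or not boundaries:
        reasons.append("missing_cannot_claim")

    return reasons
```

The sketch only inspects the shape of the record. It never runs an evidence command or opens the file a pointer refers to, which matches the limitation the updated wording spells out: the gate cannot show that the underlying claim is true.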
4 changes: 2 additions & 2 deletions docs/proof_index.json
@@ -39,11 +39,11 @@
 "limitation": "The 100-run and 1,000-run batches are plans until executed and recorded."
 },
 {
-"claim": "The Agent Reliability False-Pass Gate blocks unsupported AI-agent done/readiness claims and missing cannot_claim boundaries.",
+"claim": "The Agent Reliability False-Pass Gate blocks AI-agent success language when passing evidence pointers or explicit cannot_claim boundaries are missing.",
 "evidence_file": "scripts/check_false_pass_gate.py and examples/false_pass_gate/fixtures/",
 "command": "python scripts/check_false_pass_gate.py --self-test examples/false_pass_gate/fixtures",
 "verdict": "LOCAL_VALIDATED when self-test and pytest pass",
-"limitation": "Synthetic local fixtures only; this does not prove remote CI, public proof update, production readiness, provider/API readiness, recruiting validation, or canonical truth."
+"limitation": "Synthetic local fixtures only; this does not execute evidence commands, prove that the success claim is true, prove remote CI, public proof update, production readiness, provider/API readiness, recruiting validation, or canonical truth."
 }
 ],
 "blocked_claims": [
14 changes: 14 additions & 0 deletions scripts/check_false_pass_gate.py
@@ -93,7 +93,21 @@ def run_case(path: Path) -> int:
 
 
 def run_self_test(fixtures_dir: Path) -> int:
+    if not fixtures_dir.is_dir():
+        print(
+            f"self_test=FAIL cases=0 "
+            f"reason=fixtures_dir_not_found path={fixtures_dir}"
+        )
+        return 1
+
     paths = sorted(fixtures_dir.glob("*.json"))
+    if not paths:
+        print(
+            f"self_test=FAIL cases=0 "
+            f"reason=no_fixtures path={fixtures_dir}"
+        )
+        return 1
+
     failures: list[str] = []
 
     for path in paths:
23 changes: 23 additions & 0 deletions tests/test_false_pass_gate.py
@@ -27,6 +27,29 @@ def test_self_test_fixture_suite_passes():
     assert "cases=3" in result.stdout
 
 
+def test_self_test_rejects_empty_fixture_directory(tmp_path):
+    empty_fixtures = tmp_path / "fixtures"
+    empty_fixtures.mkdir()
+
+    result = run_gate("--self-test", str(empty_fixtures))
+
+    assert result.returncode == 1
+    assert "self_test=FAIL" in result.stdout
+    assert "cases=0" in result.stdout
+    assert "reason=no_fixtures" in result.stdout
+
+
+def test_self_test_rejects_missing_fixture_directory(tmp_path):
+    missing_fixtures = tmp_path / "missing"
+
+    result = run_gate("--self-test", str(missing_fixtures))
+
+    assert result.returncode == 1
+    assert "self_test=FAIL" in result.stdout
+    assert "cases=0" in result.stdout
+    assert "reason=fixtures_dir_not_found" in result.stdout
+
+
 def test_unsupported_done_claim_is_blocked():
     result = run_gate("--case", str(FIXTURES / "fail_unsupported_done.json"))
 
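Reviewer note: the new tests call a `run_gate` helper whose definition falls outside the visible hunk. A plausible shape for it, given that the tests read `result.returncode` and `result.stdout`, is a thin wrapper around `subprocess.run`. The sketch below is an assumption about that helper, not the code in `tests/test_false_pass_gate.py`; the `SCRIPT` constant and the choice of `sys.executable` are illustrative.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path

# Script path as it appears in the diff; resolving it relative to the
# current working directory is an assumption of this sketch.
SCRIPT = Path("scripts") / "check_false_pass_gate.py"


def run_gate(*args: str) -> subprocess.CompletedProcess[str]:
    """Run the gate script with the given CLI arguments and capture its output."""
    return subprocess.run(
        [sys.executable, str(SCRIPT), *args],
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output to str so substring asserts work
        check=False,          # a non-zero exit is an expected outcome in these tests
    )
```

Keeping `check=False` matters here: the two new tests assert `returncode == 1`, so the helper has to return the completed process for a failing run rather than raise `CalledProcessError`.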