docs: add agent reliability proof portfolio

yangfei222666-9 · web-flow · commit e526eb8a8874 · 2026-06-24T23:24:56.000+08:00
Squash merge PR #41. Adds a recruiter-readable Agent Reliability proof page, links it from the README, and records it in docs/proof_index.json with explicit no-overclaim boundaries.
diff --git a/README.md b/README.md
@@ -33,6 +33,7 @@ This is the main TaijiOS repository.
 
 - Reviewer start page: [docs/START_HERE_FOR_REVIEWERS.md](docs/START_HERE_FOR_REVIEWERS.md) — 5-minute public review path, exact verdicts, and no-overclaim boundaries.
 - Agent Reliability False-Pass Gate: `python scripts/check_false_pass_gate.py --self-test examples/false_pass_gate/fixtures` — local schema-level check that blocks success language when passing evidence pointers or explicit `cannot_claim` boundaries are missing.
+- Agent Reliability proof: [docs/portfolio/agent-reliability-proof.md](docs/portfolio/agent-reliability-proof.md) - recruiter-readable evidence map for PR #38, PR #39, and PR #40; not provider readiness, runtime readiness, or hiring validation.
 - SpaceXAI proof packet: [docs/SPACE_X_AI_PROOF_PACKET.md](docs/SPACE_X_AI_PROOF_PACKET.md) — human-review evidence for an evidence-first AI agent runtime; not SpaceX endorsement, not production readiness, and not real hardware control.
 - Authority boundary docs: [Product Spine](docs/architecture/PRODUCT_SPINE_AUTHORITY.md), [Provider Gate](docs/provider/PROVIDER_BOUNDARY_GATE.md), [Direct LLM Caller](docs/provider/DIRECT_LLM_CALLER_BOUNDARY.md), [Multi-Model Gate](docs/provider/MULTI_MODEL_ARCHITECTURE_GATE.md), [Runtime Matrix](docs/runtime/RUNTIME_MATURITY_MATRIX.md), [HSDL](docs/design/HSDL_CANONICAL_SPEC_v0.1.md), [小九通天录 (Xiaojiu Tongtianlu)](xiaojiu_tongtianlu/BOUNDARY.md), [Life Systems](life_systems/BIOSECURITY_BOUNDARY.md) — docs-only review gates; not repo-level PASS, runtime readiness, or provider readiness.
 - Machine-readable proof index: [docs/proof_index.json](docs/proof_index.json)
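The README entry for the false-pass gate describes a schema-level rule: block success language when passing evidence pointers or explicit `cannot_claim` boundaries are missing. A minimal Python sketch of that rule is below. It is not the code in `scripts/check_false_pass_gate.py`; the `Claim` and `EvidencePointer` records, their field names, and the `SUCCESS_WORDS` vocabulary are hypothetical assumptions for illustration only.

```python
import re
from dataclasses import dataclass, field

# Hypothetical success vocabulary; the real gate's wording rules are not shown in this commit.
SUCCESS_WORDS = {"done", "pass", "passed", "complete", "completed", "fixed", "ready", "success"}


@dataclass
class EvidencePointer:
    """Hypothetical pointer to evidence such as a CI run URL or a local command result."""
    ref: str
    passed: bool


@dataclass
class Claim:
    """Hypothetical agent closeout record; field names are illustrative, not the real schema."""
    summary: str
    evidence: list[EvidencePointer] = field(default_factory=list)
    cannot_claim: list[str] = field(default_factory=list)


def uses_success_language(text: str) -> bool:
    # Whole-word match, so "bypass" does not count as "pass".
    words = set(re.findall(r"[a-z]+", text.lower()))
    return not words.isdisjoint(SUCCESS_WORDS)


def check_claim(claim: Claim) -> list[str]:
    """Return reasons to block the claim; an empty list means the gate lets it through."""
    if not uses_success_language(claim.summary):
        return []  # no success language, so there is nothing to gate
    reasons = []
    if not any(pointer.passed for pointer in claim.evidence):
        reasons.append("success language without a passing evidence pointer")
    if not claim.cannot_claim:
        reasons.append("success language without explicit cannot_claim boundaries")
    return reasons


if __name__ == "__main__":
    print(check_claim(Claim("Task done.")))
    print(check_claim(Claim("Task done.", [EvidencePointer("ci-run", True)], ["production readiness"])))
```

Returning a list of reasons rather than a bare boolean keeps each block explainable, so a reviewer can see which boundary was missing instead of only seeing a rejection.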
diff --git a/docs/portfolio/agent-reliability-proof.md b/docs/portfolio/agent-reliability-proof.md
@@ -0,0 +1,124 @@
+# Agent Reliability Proof
+
+## Ten-second read
+
+I build evidence gates for AI agent workflows. The goal is to prevent false-pass behavior: an agent says a task is done, but the evidence does not support that claim.
+
+This page links the current public proof to merged GitHub PRs, local validation commands, and explicit `cannot_claim` boundaries.
+
+## What problem this targets
+
+AI agents often close tasks with unsupported success language. Common failure modes:
+
+- A local check passes, but remote CI has not run.
+- A self-test reports pass with zero cases.
+- A provider or model output is treated as truth instead of candidate evidence.
+- A tool claims readiness without showing evidence gates, unsafe-write boundaries, or closeout proof.
+
+The reliability target here is narrow: make the evidence visible enough that a reviewer can inspect what was built and what is still not proven.
+
+## What is implemented
+
+### 1. False-Pass Gate
+
+The false-pass gate is a local schema-level check for agent success claims. It blocks success language when required passing evidence pointers or explicit `cannot_claim` boundaries are missing.
+
+Run:
+
+```bash
+python scripts/check_false_pass_gate.py --self-test examples/false_pass_gate/fixtures
+python -m pytest tests/test_false_pass_gate.py -q
+```
+
+### 2. Zero-fixture false-pass fix
+
+The gate was hardened so an empty or missing fixture directory cannot produce a fake pass such as `self_test=PASS cases=0`.
+
+Evidence:
+
+- PR #38: [Fix zero-fixture false-pass gate](https://github.com/yangfei222666-9/taiji/pull/38)
+- Merge commit: [`de1907fb`](https://github.com/yangfei222666-9/taiji/commit/de1907fb17ce5492895595767c3484a2e719a7e0)
+
+### 3. GLM-5.2 candidate review bridge
+
+The bridge creates a local candidate-review envelope from sanitized stdin. It does not read repository files, does not write files, does not read API keys, and does not call a provider in dry-run mode.
+
+Run:
+
+```bash
+printf 'sanitized summary only; no secrets.\n' \
+  | python tools/glm52_candidate_review.py --task local_review \
+  | python -m json.tool
+```
+
+Evidence:
+
+- PR #39: [Add GLM candidate review bridge](https://github.com/yangfei222666-9/taiji/pull/39)
+- Merge commit: [`342cc55d`](https://github.com/yangfei222666-9/taiji/commit/342cc55d6e14b09667783372b2c85ec5b1cfc068)
+
+### 4. GLM-5.2 provider lock
+
+The local bridge scripts are locked to Zhipu GLM-5.2 only. Dynamic model selection, dynamic endpoint selection, and fallbacks to API-key environment variables other than `ZHIPUAI_API_KEY` were removed.
+
+Dry-run validation:
+
+```bash
+env -u ZHIPUAI_API_KEY -u GLM_API_KEY -u BIGMODEL_API_KEY -u ZAI_API_KEY \
+  python tools/glm52_smoke.py
+```
+
+Expected dry-run properties:
+
+```text
+provider_called=false
+api_key_read=false
+locked_provider=zhipuai
+locked_sdk=zhipuai.ZhipuAI
+locked_model=glm-5.2
+api_key_env=ZHIPUAI_API_KEY
+dynamic_model_allowed=false
+dynamic_endpoint_allowed=false
+fallback_provider_allowed=false
+```
+
+Evidence:
+
+- PR #40: [Lock GLM bridge to Zhipu GLM-5.2](https://github.com/yangfei222666-9/taiji/pull/40)
+- Merge commit: [`54c2b636`](https://github.com/yangfei222666-9/taiji/commit/54c2b6366e8417b5807bd13338e362aced896969)
+- Main CI run: [28108395046](https://github.com/yangfei222666-9/taiji/actions/runs/28108395046)
+
+## Evidence map
+
+| Evidence | Status | What it supports | What it does not prove |
+| --- | --- | --- | --- |
+| PR #38 merged | Remote main evidence | Empty or missing fixture directories are rejected instead of passing with zero cases | Production readiness |
+| PR #39 merged | Remote main evidence | Candidate review envelope exists and is local-only by default | Provider readiness |
+| PR #40 merged | Remote main evidence | GLM bridge scripts are locked to `glm-5.2` and `ZHIPUAI_API_KEY` | Long-task readiness |
+| Main CI at `54c2b636` | Remote CI evidence | CI passed after PR #40 merged | Runtime deployment readiness |
+| Local dry-run commands | Local validation evidence | No provider call or key read in dry-run mode | API execution correctness |
+
+## Recruiter-readable summary
+
+- Built a false-pass gate that rejects unsupported AI-agent success claims when evidence pointers or `cannot_claim` boundaries are missing.
+- Fixed a real false-pass bug where missing or empty fixtures could have allowed a zero-case self-test to appear successful.
+- Added a candidate-review bridge that prepares sanitized review envelopes while preserving local-only and candidate-only boundaries.
+- Locked the GLM bridge to Zhipu GLM-5.2 only, removing dynamic model, endpoint, and fallback-provider paths.
+- Kept provider output separate from canonical truth: GLM output can assist planning and review, but local verification, GitHub CI, and human approval remain separate gates.
+
+## `cannot_claim`
+
+This proof does not claim:
+
+- production readiness
+- provider readiness
+- runtime readiness
+- long-task readiness
+- autonomous self-improvement
+- hiring validation
+- customer validation
+- Zhipu endorsement
+- that provider output is canonical truth
+
+## Next evidence gate
+
+The next useful gate is to validate, commit, and publish this evidence page, then link it from a resume or other recruiter-facing materials. Do not add another architecture layer before this proof is visible and reviewable.
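The zero-fixture fix in the proof page guards against a vacuous pass. One common way a self-test ends up reporting `self_test=PASS cases=0` is that Python's `all()` returns `True` for an empty iterable. A minimal sketch of a guard is below, assuming (hypothetically) that fixtures are `*.json` files and that a caller-supplied `check` callable decides each case; the real layout of `examples/false_pass_gate/fixtures` and the real runner are not shown in this commit.

```python
import tempfile
from pathlib import Path
from typing import Callable


def run_self_test(fixture_dir: Path, check: Callable[[Path], bool]) -> str:
    """Run check() over fixture files; zero discovered cases is reported as FAIL, never PASS."""
    if not fixture_dir.is_dir():
        return "self_test=FAIL cases=0 reason=missing_fixture_dir"
    cases = sorted(fixture_dir.glob("*.json"))  # assumed fixture naming, for illustration only
    if not cases:
        # A naive all(check(c) for c in cases) is True for an empty list: the fake-pass shape.
        return "self_test=FAIL cases=0 reason=no_fixtures"
    failed = [case.name for case in cases if not check(case)]
    status = "PASS" if not failed else "FAIL"
    return f"self_test={status} cases={len(cases)} failed={len(failed)}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        print(run_self_test(root, lambda case: True))  # empty directory
        print(run_self_test(root / "missing", lambda case: True))  # missing directory
        (root / "case_1.json").write_text("{}")
        print(run_self_test(root, lambda case: True))  # one passing case
```

Treating "missing directory" and "empty directory" as distinct FAIL reasons keeps the status line useful for debugging while still never emitting PASS with zero cases.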
diff --git a/docs/proof_index.json b/docs/proof_index.json
@@ -44,6 +44,13 @@
       "command": "python scripts/check_false_pass_gate.py --self-test examples/false_pass_gate/fixtures",
       "verdict": "LOCAL_VALIDATED when self-test and pytest pass",
       "limitation": "Synthetic local fixtures only; this does not execute evidence commands, prove that the success claim is true, prove remote CI, public proof update, production readiness, provider/API readiness, recruiting validation, or canonical truth."
+    },
+    {
+      "claim": "The Agent Reliability proof page links the false-pass gate, zero-fixture fix, GLM candidate-review bridge, and GLM-5.2 provider lock to merged PR and CI evidence.",
+      "evidence_file": "docs/portfolio/agent-reliability-proof.md",
+      "command": "test -f docs/portfolio/agent-reliability-proof.md && python -m json.tool docs/proof_index.json",
+      "verdict": "PORTFOLIO_EVIDENCE_LINKED",
+      "limitation": "Recruiter-readable documentation only; this does not prove provider readiness, runtime readiness, long-task readiness, production deployment, hiring validation, or that GLM output is canonical truth."
     }
   ],
   "blocked_claims": [
@@ -53,6 +60,10 @@
     "provider_api_readiness",
     "release_evidence_pass",
     "trading_or_order_authority",
-    "false_pass_gate_without_evidence"
+    "false_pass_gate_without_evidence",
+    "runtime_readiness",
+    "long_task_readiness",
+    "hiring_validation",
+    "glm_output_as_canonical_truth"
   ]
 }
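The `proof_index.json` hunks show each claim entry carrying `claim`, `evidence_file`, `command`, `verdict`, and `limitation`, alongside a flat `blocked_claims` list. The indexed command only checks that the proof page exists and that the JSON parses. A slightly stricter structural check could look like the sketch below; it is a hypothetical helper, not part of the repository, and because the top-level key holding the claim entries is not visible in this diff, the sketch takes the entry list directly.

```python
import json
from pathlib import Path

# Field names taken from the proof_index.json entries shown in this diff.
REQUIRED_FIELDS = ("claim", "evidence_file", "command", "verdict", "limitation")


def check_entries(entries: list[dict], repo_root: Path) -> list[str]:
    """Report entries with missing or empty fields, or evidence files that do not exist."""
    problems = []
    for i, entry in enumerate(entries):
        for key in REQUIRED_FIELDS:
            value = entry.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"entry {i}: missing or empty {key!r}")
        evidence = entry.get("evidence_file")
        if isinstance(evidence, str) and evidence.strip() and not (repo_root / evidence).is_file():
            problems.append(f"entry {i}: evidence_file not found: {evidence}")
    return problems


def check_blocked_claims(blocked: list[str]) -> list[str]:
    """Report duplicate names in the blocked_claims list."""
    seen: set[str] = set()
    problems = []
    for name in blocked:
        if name in seen:
            problems.append(f"duplicate blocked claim: {name}")
        seen.add(name)
    return problems


if __name__ == "__main__":
    entry = json.loads(
        '{"claim": "c", "evidence_file": "docs/portfolio/agent-reliability-proof.md",'
        ' "command": "test -f docs/portfolio/agent-reliability-proof.md",'
        ' "verdict": "PORTFOLIO_EVIDENCE_LINKED", "limitation": "docs only"}'
    )
    print(check_entries([entry], Path(".")))
    print(check_blocked_claims(["runtime_readiness", "hiring_validation", "runtime_readiness"]))
```

Like the index entry itself, a clean result here would only show that the index is well-formed and points at existing files; it would not prove any of the blocked claims.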