Commit e526eb8

docs: add agent reliability proof portfolio
Squash merge PR #41. Adds a recruiter-readable Agent Reliability proof page, links it from README, and records it in docs/proof_index.json with explicit no-overclaim boundaries.
1 parent 54c2b63 commit e526eb8

3 files changed

Lines changed: 137 additions & 1 deletion

README.md

Lines changed: 1 addition & 0 deletions
@@ -33,6 +33,7 @@ This is the main TaijiOS repository.
- Reviewer start page: [docs/START_HERE_FOR_REVIEWERS.md](docs/START_HERE_FOR_REVIEWERS.md) — 5-minute public review path, exact verdicts, and no-overclaim boundaries.
- Agent Reliability False-Pass Gate: `python scripts/check_false_pass_gate.py --self-test examples/false_pass_gate/fixtures` — local schema-level check that blocks success language when passing evidence pointers or explicit `cannot_claim` boundaries are missing.
+- Agent Reliability proof: [docs/portfolio/agent-reliability-proof.md](docs/portfolio/agent-reliability-proof.md) - recruiter-readable evidence map for PR #38, PR #39, and PR #40; not provider readiness, runtime readiness, or hiring validation.
- SpaceXAI proof packet: [docs/SPACE_X_AI_PROOF_PACKET.md](docs/SPACE_X_AI_PROOF_PACKET.md) — human-review evidence for an evidence-first AI agent runtime; not SpaceX endorsement, not production readiness, and not real hardware control.
- Authoritative boundary docs: [Product Spine](docs/architecture/PRODUCT_SPINE_AUTHORITY.md), [Provider Gate](docs/provider/PROVIDER_BOUNDARY_GATE.md), [Direct LLM Caller](docs/provider/DIRECT_LLM_CALLER_BOUNDARY.md), [Multi-Model Gate](docs/provider/MULTI_MODEL_ARCHITECTURE_GATE.md), [Runtime Matrix](docs/runtime/RUNTIME_MATURITY_MATRIX.md), [HSDL](docs/design/HSDL_CANONICAL_SPEC_v0.1.md), [小九通天录](xiaojiu_tongtianlu/BOUNDARY.md), [Life Systems](life_systems/BIOSECURITY_BOUNDARY.md) - docs-only review gates; not repo-level PASS, runtime readiness, or provider readiness.
- Machine-readable proof index: [docs/proof_index.json](docs/proof_index.json)
docs/portfolio/agent-reliability-proof.md

Lines changed: 124 additions & 0 deletions
@@ -0,0 +1,124 @@
# Agent Reliability Proof

## Ten-second read

I build evidence gates for AI agent workflows. The goal is to prevent false-pass behavior: an agent says a task is done, but the evidence does not support that claim.

This page links the current public proof to merged GitHub PRs, local validation commands, and explicit `cannot_claim` boundaries.

## What problem this targets

AI agents often close tasks with unsupported success language. Common failure modes:

- A local check passes, but remote CI has not run.
- A self-test reports pass with zero cases.
- A provider or model output is treated as truth instead of candidate evidence.
- A tool claims readiness without showing evidence gates, unsafe-write boundaries, or closeout proof.

The reliability target here is narrow: make the evidence visible enough that a reviewer can inspect what was built and what is still not proven.

## What is implemented

### 1. False-Pass Gate

The false-pass gate is a local schema-level check for agent success claims. It blocks success language when required passing evidence pointers or explicit `cannot_claim` boundaries are missing.

Run:

```bash
python scripts/check_false_pass_gate.py --self-test examples/false_pass_gate/fixtures
python -m pytest tests/test_false_pass_gate.py -q
```

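To make the blocking rule concrete, here is a minimal sketch of a schema-level check of this kind. It is not the code in `scripts/check_false_pass_gate.py`; the record layout (`evidence` items with `pointer` and `status`, plus a `cannot_claim` list) and the function name `blocked_reasons` are assumptions made for illustration.

```python
from typing import Any


def blocked_reasons(claim: dict[str, Any]) -> list[str]:
    """Return why a success claim must be blocked; an empty list means it may stand."""
    reasons: list[str] = []

    # Hypothetical schema: each evidence item is {"pointer": str, "status": str}.
    evidence = claim.get("evidence")
    if not isinstance(evidence, list):
        evidence = []
    passing = [
        item for item in evidence
        if isinstance(item, dict) and item.get("status") == "pass" and item.get("pointer")
    ]
    if not passing:
        reasons.append("no passing evidence pointer")

    # The claim must also state what it does NOT prove.
    boundaries = claim.get("cannot_claim")
    if not isinstance(boundaries, list) or not boundaries:
        reasons.append("missing cannot_claim boundaries")

    return reasons


supported = {
    "summary": "self-test passed",
    "evidence": [{"pointer": "tests/test_false_pass_gate.py", "status": "pass"}],
    "cannot_claim": ["production readiness"],
}
unsupported = {"summary": "task done", "evidence": []}

print(blocked_reasons(supported))    # []
print(blocked_reasons(unsupported))  # both reasons are reported
```

Returning a list of reasons rather than a bare boolean keeps the verdict inspectable, which matches the page's goal of making evidence visible to a reviewer.
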
### 2. Zero-fixture false-pass fix

The gate was hardened so an empty or missing fixture directory cannot produce a fake pass such as `self_test=PASS cases=0`.

Evidence:

- PR #38: [Fix zero-fixture false-pass gate](https://github.com/yangfei222666-9/taiji/pull/38)
- Merge commit: [`de1907fb`](https://github.com/yangfei222666-9/taiji/commit/de1907fb17ce5492895595767c3484a2e719a7e0)

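The zero-case failure mode can be illustrated with a short sketch. This is an assumed shape, not the fix merged in PR #38: the function name `self_test`, the `*.json` fixture pattern, and the "fixture must parse as JSON" check are all hypothetical stand-ins for whatever the real gate validates.

```python
import json
import tempfile
from pathlib import Path


def self_test(fixture_dir: Path) -> tuple[str, int]:
    """Return (verdict, case_count); zero discovered cases is never a pass."""
    if not fixture_dir.is_dir():
        return ("FAIL", 0)  # missing directory

    fixtures = sorted(fixture_dir.glob("*.json"))
    if not fixtures:
        return ("FAIL", 0)  # empty directory: nothing was actually tested

    for path in fixtures:
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            return ("FAIL", len(fixtures))
    return ("PASS", len(fixtures))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    missing = self_test(root / "does_not_exist")
    empty = self_test(root)
    (root / "case_ok.json").write_text('{"expected": "blocked"}', encoding="utf-8")
    populated = self_test(root)

print(missing, empty, populated)
```

The point of the explicit early returns is that "no cases ran" and "all cases passed" are distinguished before any per-case logic runs, so a loop over an empty list can never fall through to a pass.
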
### 3. GLM-5.2 candidate review bridge

The bridge creates a local candidate-review envelope from sanitized stdin. It does not read repository files, does not write files, does not read API keys, and does not call a provider in dry-run mode.

Run:

```bash
printf 'sanitized summary only; no secrets.\n' \
| python tools/glm52_candidate_review.py --task local_review \
| python -m json.tool
```

Evidence:

- PR #39: [Add GLM candidate review bridge](https://github.com/yangfei222666-9/taiji/pull/39)
- Merge commit: [`342cc55d`](https://github.com/yangfei222666-9/taiji/commit/342cc55d6e14b09667783372b2c85ec5b1cfc068)

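As a rough picture of what a candidate-review envelope might contain, the sketch below wraps a sanitized string in a JSON-serializable record. The field names (`status`, `dry_run`, `input_sha256`, and so on) and the function `build_envelope` are assumptions; the actual output of `tools/glm52_candidate_review.py` may differ.

```python
import hashlib
import json


def build_envelope(sanitized_text: str, task: str) -> dict:
    """Wrap already-sanitized text in a candidate-only review envelope."""
    if not sanitized_text.strip():
        raise ValueError("empty input: nothing to review")
    return {
        "task": task,
        "status": "candidate",  # never "approved": a separate gate decides that
        "dry_run": True,
        "provider_called": False,  # this sketch performs no network or provider call
        "api_key_read": False,     # and never touches environment variables
        "input_sha256": hashlib.sha256(sanitized_text.encode("utf-8")).hexdigest(),
        "input_chars": len(sanitized_text),
    }


envelope = build_envelope("sanitized summary only; no secrets.\n", "local_review")
print(json.dumps(envelope, indent=2))
```

Recording a hash and length of the input, rather than the input itself, is one way to keep the envelope traceable without copying the reviewed text into further artifacts.
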
### 4. GLM-5.2 provider lock

The local bridge scripts are locked to Zhipu GLM-5.2 only. Dynamic model selection, dynamic endpoint selection, and non-Zhipu API-key environment fallbacks were removed.

Dry-run validation:

```bash
env -u ZHIPUAI_API_KEY -u GLM_API_KEY -u BIGMODEL_API_KEY -u ZAI_API_KEY \
python tools/glm52_smoke.py
```

Expected dry-run properties:

```text
provider_called=false
api_key_read=false
locked_provider=zhipuai
locked_sdk=zhipuai.ZhipuAI
locked_model=glm-5.2
api_key_env=ZHIPUAI_API_KEY
dynamic_model_allowed=false
dynamic_endpoint_allowed=false
fallback_provider_allowed=false
```

Evidence:

- PR #40: [Lock GLM bridge to Zhipu GLM-5.2](https://github.com/yangfei222666-9/taiji/pull/40)
- Merge commit: [`54c2b636`](https://github.com/yangfei222666-9/taiji/commit/54c2b6366e8417b5807bd13338e362aced896969)
- Main CI run: [28108395046](https://github.com/yangfei222666-9/taiji/actions/runs/28108395046)

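The expected dry-run properties lend themselves to a mechanical comparison. The sketch below assumes the smoke script prints one `key=value` pair per line, which this page does not confirm; `parse_properties` and `lock_violations` are hypothetical helper names, while the expected values are copied from the list above.

```python
EXPECTED = {
    "provider_called": "false",
    "api_key_read": "false",
    "locked_provider": "zhipuai",
    "locked_sdk": "zhipuai.ZhipuAI",
    "locked_model": "glm-5.2",
    "api_key_env": "ZHIPUAI_API_KEY",
    "dynamic_model_allowed": "false",
    "dynamic_endpoint_allowed": "false",
    "fallback_provider_allowed": "false",
}


def parse_properties(output: str) -> dict[str, str]:
    """Parse key=value lines; lines without '=' are ignored."""
    props: dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:
            props[key] = value
    return props


def lock_violations(output: str) -> list[str]:
    """List every expected property that is missing or has a different value."""
    props = parse_properties(output)
    return [
        f"{key}: expected {want!r}, got {props.get(key)!r}"
        for key, want in EXPECTED.items()
        if props.get(key) != want
    ]


sample = "\n".join(f"{key}={value}" for key, value in EXPECTED.items())
drifted = sample.replace("locked_model=glm-5.2", "locked_model=other-model")

print(lock_violations(sample))   # []
print(lock_violations(drifted))  # one violation, for locked_model
```

A missing property is treated the same as a wrong one, so truncated or empty output cannot be read as a pass.
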
## Evidence map

| Evidence | Status | What it supports | What it does not prove |
| --- | --- | --- | --- |
| PR #38 merged | Remote main evidence | Empty or missing fixture directories are rejected instead of passing with zero cases | Production readiness |
| PR #39 merged | Remote main evidence | Candidate review envelope exists and is local-only by default | Provider readiness |
| PR #40 merged | Remote main evidence | GLM bridge scripts are locked to `glm-5.2` and `ZHIPUAI_API_KEY` | Long-task readiness |
| Main CI at `54c2b636` | Remote CI evidence | CI passed after PR #40 merged | Runtime deployment readiness |
| Local dry-run commands | Local validation evidence | No provider call or key read in dry-run mode | API execution correctness |

## Recruiter-readable summary

- Built a false-pass gate that rejects unsupported AI-agent success claims when evidence pointers or `cannot_claim` boundaries are missing.
- Fixed a real false-pass bug where missing or empty fixtures could have allowed a zero-case self-test to appear successful.
- Added a candidate-review bridge that prepares sanitized review envelopes while preserving local-only and candidate-only boundaries.
- Locked the GLM bridge to Zhipu GLM-5.2 only, removing dynamic model, endpoint, and fallback-provider paths.
- Kept provider output separate from canonical truth: GLM output can assist planning and review, but local verification, GitHub CI, and human approval remain separate gates.

## `cannot_claim`

This proof does not claim:

- production readiness
- provider readiness
- runtime readiness
- long-task readiness
- autonomous self-improvement
- hiring validation
- customer validation
- Zhipu endorsement
- that provider output is canonical truth

## Next evidence gate

The next useful gate is to validate, commit, and publish this evidence page, then link it from resume or recruiter-facing materials. Do not add another architecture layer before this proof is visible and reviewable.

docs/proof_index.json

Lines changed: 12 additions & 1 deletion
@@ -44,6 +44,13 @@
       "command": "python scripts/check_false_pass_gate.py --self-test examples/false_pass_gate/fixtures",
       "verdict": "LOCAL_VALIDATED when self-test and pytest pass",
       "limitation": "Synthetic local fixtures only; this does not execute evidence commands, prove that the success claim is true, prove remote CI, public proof update, production readiness, provider/API readiness, recruiting validation, or canonical truth."
+    },
+    {
+      "claim": "The Agent Reliability proof page links the false-pass gate, zero-fixture fix, GLM candidate-review bridge, and GLM-5.2 provider lock to merged PR and CI evidence.",
+      "evidence_file": "docs/portfolio/agent-reliability-proof.md",
+      "command": "test -f docs/portfolio/agent-reliability-proof.md && python -m json.tool docs/proof_index.json",
+      "verdict": "PORTFOLIO_EVIDENCE_LINKED",
+      "limitation": "Recruiter-readable documentation only; this does not prove provider readiness, runtime readiness, long-task readiness, production deployment, hiring validation, or that GLM output is canonical truth."
     }
   ],
   "blocked_claims": [
@@ -53,6 +60,10 @@
     "provider_api_readiness",
     "release_evidence_pass",
     "trading_or_order_authority",
-    "false_pass_gate_without_evidence"
+    "false_pass_gate_without_evidence",
+    "runtime_readiness",
+    "long_task_readiness",
+    "hiring_validation",
+    "glm_output_as_canonical_truth"
   ]
 }
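The `command` recorded for the new entry only confirms that the page exists and that the index parses as JSON. A stricter check could also confirm that the newly blocked claims are present. The sketch below is a hypothetical addition, not part of this commit; it assumes only that `blocked_claims` is a top-level list of strings, as the diff suggests, and the name `missing_blocked_claims` is invented here.

```python
import json

# The four claims this commit adds to "blocked_claims".
REQUIRED_BLOCKED = {
    "runtime_readiness",
    "long_task_readiness",
    "hiring_validation",
    "glm_output_as_canonical_truth",
}


def missing_blocked_claims(index: dict) -> set[str]:
    """Return required blocked claims absent from the index (empty set = OK)."""
    blocked = index.get("blocked_claims")
    if not isinstance(blocked, list):
        return set(REQUIRED_BLOCKED)  # a missing or malformed list blocks nothing
    return REQUIRED_BLOCKED - set(blocked)


# Inline stand-ins for docs/proof_index.json before and after this commit.
before = json.loads('{"blocked_claims": ["false_pass_gate_without_evidence"]}')
after = json.loads(
    '{"blocked_claims": ["false_pass_gate_without_evidence", "runtime_readiness",'
    ' "long_task_readiness", "hiring_validation", "glm_output_as_canonical_truth"]}'
)

print(sorted(missing_blocked_claims(before)))
print(sorted(missing_blocked_claims(after)))
```

Against the real file, the same function could be fed `json.load(open("docs/proof_index.json", encoding="utf-8"))`, assuming the file is run from the repository root.
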
