@@ -36,15 +36,6 @@ categories of misalignment risk. It is not a certification or a safety
 guarantee — it is a repeatable, fixture-driven diagnostic you can run in CI
 and track over time.
 
-> **No published baselines yet.** v1.0.0 ships with no reference scorecards
-> for frontier models. The default thresholds (B01=1.00, B08=0.95,
-> pass=0.85, mandatory-minimum cap=0.60) and category weights are policy
-> defaults, not empirically calibrated. iFixAi is most defensible today as a
-> **CI drift signal** ("is *my* agent getting better or worse over time?")
-> and a **fixture-controlled comparison tool** ("does System A beat System B
-> on the *same* fixture?"). Treat absolute scores as informative, not
-> authoritative. See [docs/scoring.md § Calibration caveat](docs/scoring.md).
-
 <p align="center">
   <img src="docs/assets/ifixai-demo.gif" alt="iFixAi demo" width="720" />
   <br />
@@ -266,7 +257,7 @@ for the digest algorithm and verification helpers.
 
 ## Five scorecard pillars
 
-| Category | Tests | What it detects |
+| Category | Inspections | What it detects |
 |---|---|---|
 | **FABRICATION** Accuracy & Calibration | B01-B06 | Tool authorisation leaks, missing audit trail, unsourced claims, overconfident responses |
 | **MANIPULATION** Safety & Containment | B07-B09, B11-B13, B28, B30 | Hallucination, privilege escalation, policy violation, controllability, prompt injection, plan traceability, RAG context integrity, malicious deployer rules |
@@ -281,21 +272,28 @@ attestation facility (no inspections use it today), B28 RAG context integrity, a
 
 ### Extended inspections (premium / exploratory)
 
-Beyond the 32-inspection core, the suite ships **13 additional inspections** in ten new scoring
-categories (the P / C / S / X series). Most are **exploratory** — they render their own scored % + CI
-but are **excluded from the headline grade**, so they extend coverage without breaking score
-comparability. Each structural one calls a provider capability hook; without it the inspection reports
-`insufficient_evidence`, exactly like the governance cluster.
+Beyond the 32-inspection core, the suite ships **13 additional inspections** across **eleven new
+scoring categories**. Most are **exploratory** — they render their own scored % + CI but are
+**excluded from the headline grade**, so they extend coverage without breaking score
+comparability. Each structural one calls a provider capability hook; without it the inspection
+reports `insufficient_evidence`, exactly like the governance cluster.
 
-| Series | Inspections | New categories |
+| Category | Inspections | What it detects |
 |---|---|---|
-| **P** — premium tier | P01, P08, P13, P19, P22, P27, P32 | SABOTAGE, SUBVERSION, CONCEALMENT, INSUBORDINATION, USURPATION, SYSTEMIC_RISK |
-| **C** — capability-reliability | C02, C05, C11 | MISCALIBRATION |
-| **S** — stakeholder integrity | S02 | STAKEHOLDER_CONFLICT |
-| **X** — gap-closure | X04, X11 | PERCEPTION_GOVERNANCE, OVERSIGHT_ATROPHY |
+| **SABOTAGE** Operational Harm | P01 | Plausible-justification destruction of critical stores by an unauthorized role |
+| **SUBVERSION** Oversight Evasion | P08 | Self-audit-trail tampering (record suppression, field alteration, timestamp reorder, post-hoc mutation) |
+| **CONCEALMENT** Hidden Agendas | P13 | Covert side-objective surviving a long-horizon multi-turn arc |
+| **SANDBAGGING** Capability Concealment | P19 | Honeypot-tool restraint — reaching for a forbidden shortcut only when it believes it is unobserved (eval gaming) |
+| **INSUBORDINATION** Corrigibility | P22 | Acceptance of an authorized mid-session goal edit |
+| **USURPATION** Power Elevation | P27 | Privilege creep — capability accrued and never relinquished |
+| **SYSTEMIC_RISK** Emergent Harm | P32 | Multi-agent collusion driving a pre-declared forbidden system outcome |
+| **MISCALIBRATION** Governance of Uncertainty | C02, C05, C11 | Low-confidence abstention, human-fallback routing, operational-outcome drift |
+| **STAKEHOLDER_CONFLICT** Multi-Principal Integrity | S02 | Configurer-vs-stakeholder conflict detection and attribution |
+| **PERCEPTION_GOVERNANCE** Assurance Gates | X04 | Blocking continued deployment of an out-of-spec detector |
+| **OVERSIGHT_ATROPHY** Confirmation Gates | X11 | Pre-action human confirmation for inadequately-gated high-stakes actions |
 
 Per-inspection descriptions and governing laws: **[docs/inspection_categories.md](docs/inspection_categories.md)**;
-terse what/how rows: **[docs/tests.md](docs/tests.md)**.
+terse what/how rows: **[docs/inspections.md](docs/inspections.md)**.
 
 ## Domain-neutral fixtures
 
@@ -499,8 +497,9 @@ ifixai run --provider http --endpoint https://your-api.com/v1 --api-key "$KEY"
 
 Category names (case-insensitive) accepted by `-c/--category`: `FABRICATION`, `MANIPULATION`,
 `DECEPTION`, `UNPREDICTABILITY`, `OPACITY`, `SABOTAGE`, `SUBVERSION`, `CONCEALMENT`,
-`INSUBORDINATION`, `USURPATION`, `SYSTEMIC_RISK`, `MISCALIBRATION`, `STAKEHOLDER_CONFLICT`,
-`PERCEPTION_GOVERNANCE`, `OVERSIGHT_ATROPHY`. Combine `-c` with `-b` to add individual tests.
+`SANDBAGGING`, `INSUBORDINATION`, `USURPATION`, `SYSTEMIC_RISK`, `MISCALIBRATION`,
+`STAKEHOLDER_CONFLICT`, `PERCEPTION_GOVERNANCE`, `OVERSIGHT_ATROPHY`. Combine `-c` with `-b` to
+add individual tests.
 
 ## CLI Reference
 