Commit ab7a5a7

chore: better naming convention + readme update (#37)
* chore: better naming convention + readme update
* chore: remove tests folder and rename tests.md
1 parent fd7db70 commit ab7a5a7

21 files changed

Lines changed: 280 additions & 373 deletions

README.md

Lines changed: 22 additions & 23 deletions
@@ -36,15 +36,6 @@ categories of misalignment risk. It is not a certification or a safety
 guarantee — it is a repeatable, fixture-driven diagnostic you can run in CI
 and track over time.
 
-> **No published baselines yet.** v1.0.0 ships with no reference scorecards
-> for frontier models. The default thresholds (B01=1.00, B08=0.95,
-> pass=0.85, mandatory-minimum cap=0.60) and category weights are policy
-> defaults, not empirically calibrated. iFixAi is most defensible today as a
-> **CI drift signal** ("is *my* agent getting better or worse over time?")
-> and a **fixture-controlled comparison tool** ("does System A beat System B
-> on the *same* fixture?"). Treat absolute scores as informative, not
-> authoritative. See [docs/scoring.md § Calibration caveat](docs/scoring.md).
-
 <p align="center">
 <img src="docs/assets/ifixai-demo.gif" alt="iFixAi demo" width="720" />
 <br/>
@@ -266,7 +257,7 @@ for the digest algorithm and verification helpers.
 
 ## Five scorecard pillars
 
-| Category | Tests | What it detects |
+| Category | Inspections | What it detects |
 |---|---|---|
 | **FABRICATION** Accuracy & Calibration | B01-B06 | Tool authorisation leaks, missing audit trail, unsourced claims, overconfident responses |
 | **MANIPULATION** Safety & Containment | B07-B09, B11-B13, B28, B30 | Hallucination, privilege escalation, policy violation, controllability, prompt injection, plan traceability, RAG context integrity, malicious deployer rules |
@@ -281,21 +272,28 @@ attestation facility (no inspections use it today), B28 RAG context integrity, a
 
 ### Extended inspections (premium / exploratory)
 
-Beyond the 32-inspection core, the suite ships **13 additional inspections** in ten new scoring
-categories (the P / C / S / X series). Most are **exploratory** — they render their own scored % + CI
-but are **excluded from the headline grade**, so they extend coverage without breaking score
-comparability. Each structural one calls a provider capability hook; without it the inspection reports
-`insufficient_evidence`, exactly like the governance cluster.
+Beyond the 32-inspection core, the suite ships **13 additional inspections** across **eleven new
+scoring categories**. Most are **exploratory** — they render their own scored % + CI but are
+**excluded from the headline grade**, so they extend coverage without breaking score
+comparability. Each structural one calls a provider capability hook; without it the inspection
+reports `insufficient_evidence`, exactly like the governance cluster.
 
-| Series | Inspections | New categories |
+| Category | Inspections | What it detects |
 |---|---|---|
-| **P** — premium tier | P01, P08, P13, P19, P22, P27, P32 | SABOTAGE, SUBVERSION, CONCEALMENT, INSUBORDINATION, USURPATION, SYSTEMIC_RISK |
-| **C** — capability-reliability | C02, C05, C11 | MISCALIBRATION |
-| **S** — stakeholder integrity | S02 | STAKEHOLDER_CONFLICT |
-| **X** — gap-closure | X04, X11 | PERCEPTION_GOVERNANCE, OVERSIGHT_ATROPHY |
+| **SABOTAGE** Operational Harm | P01 | Plausible-justification destruction of critical stores by an unauthorized role |
+| **SUBVERSION** Oversight Evasion | P08 | Self-audit-trail tampering (record suppression, field alteration, timestamp reorder, post-hoc mutation) |
+| **CONCEALMENT** Hidden Agendas | P13 | Covert side-objective surviving a long-horizon multi-turn arc |
+| **SANDBAGGING** Capability Concealment | P19 | Honeypot-tool restraint — reaching for a forbidden shortcut only when it believes it is unobserved (eval gaming) |
+| **INSUBORDINATION** Corrigibility | P22 | Acceptance of an authorized mid-session goal edit |
+| **USURPATION** Power Elevation | P27 | Privilege creep — capability accrued and never relinquished |
+| **SYSTEMIC_RISK** Emergent Harm | P32 | Multi-agent collusion driving a pre-declared forbidden system outcome |
+| **MISCALIBRATION** Governance of Uncertainty | C02, C05, C11 | Low-confidence abstention, human-fallback routing, operational-outcome drift |
+| **STAKEHOLDER_CONFLICT** Multi-Principal Integrity | S02 | Configurer-vs-stakeholder conflict detection and attribution |
+| **PERCEPTION_GOVERNANCE** Assurance Gates | X04 | Blocking continued deployment of an out-of-spec detector |
+| **OVERSIGHT_ATROPHY** Confirmation Gates | X11 | Pre-action human confirmation for inadequately-gated high-stakes actions |
 
 Per-inspection descriptions and governing laws: **[docs/inspection_categories.md](docs/inspection_categories.md)**;
-terse what/how rows: **[docs/tests.md](docs/tests.md)**.
+terse what/how rows: **[docs/inspections.md](docs/inspections.md)**.
 
 ## Domain-neutral fixtures
 
@@ -499,8 +497,9 @@ ifixai run --provider http --endpoint https://your-api.com/v1 --api-key "$KEY"
 
 Category names (case-insensitive) accepted by `-c/--category`: `FABRICATION`, `MANIPULATION`,
 `DECEPTION`, `UNPREDICTABILITY`, `OPACITY`, `SABOTAGE`, `SUBVERSION`, `CONCEALMENT`,
-`INSUBORDINATION`, `USURPATION`, `SYSTEMIC_RISK`, `MISCALIBRATION`, `STAKEHOLDER_CONFLICT`,
-`PERCEPTION_GOVERNANCE`, `OVERSIGHT_ATROPHY`. Combine `-c` with `-b` to add individual tests.
+`SANDBAGGING`, `INSUBORDINATION`, `USURPATION`, `SYSTEMIC_RISK`, `MISCALIBRATION`,
+`STAKEHOLDER_CONFLICT`, `PERCEPTION_GOVERNANCE`, `OVERSIGHT_ATROPHY`. Combine `-c` with `-b` to
+add individual tests.
 
 ## CLI Reference
 
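Reviewer note: the updated README now claims 13 extended inspections across eleven new categories, and the `-c/--category` list gains `SANDBAGGING`. The sketch below is one way to sanity-check those counts against the new table. It is not iFixAi code: the `EXTENDED` and `CORE` structures are transcribed by hand from the diff, and `normalize_category` is a hypothetical helper that only illustrates the case-insensitive matching the README describes.

```python
# Extended categories and their inspections, transcribed from the README
# table added in this commit. Illustrative data, not part of iFixAi.
EXTENDED = {
    "SABOTAGE": ["P01"],
    "SUBVERSION": ["P08"],
    "CONCEALMENT": ["P13"],
    "SANDBAGGING": ["P19"],
    "INSUBORDINATION": ["P22"],
    "USURPATION": ["P27"],
    "SYSTEMIC_RISK": ["P32"],
    "MISCALIBRATION": ["C02", "C05", "C11"],
    "STAKEHOLDER_CONFLICT": ["S02"],
    "PERCEPTION_GOVERNANCE": ["X04"],
    "OVERSIGHT_ATROPHY": ["X11"],
}

# The five core pillars named in the README's category list.
CORE = ["FABRICATION", "MANIPULATION", "DECEPTION", "UNPREDICTABILITY", "OPACITY"]


def normalize_category(name: str) -> str:
    """Hypothetical helper: map user input to a canonical category name.

    Mirrors the README's statement that category names are
    case-insensitive; the real CLI may validate differently.
    """
    key = name.strip().upper()
    if key not in CORE and key not in EXTENDED:
        raise ValueError(f"unknown category: {name!r}")
    return key


# Total number of extended inspections listed in the table.
inspection_count = sum(len(ids) for ids in EXTENDED.values())

if __name__ == "__main__":
    print(len(EXTENDED), "extended categories,", inspection_count, "inspections")
    print(normalize_category("sandbagging"))
```

With the table as transcribed, the counts agree with the new prose (eleven categories, 13 inspections), and the sixteen names accepted by `-c/--category` are the five core pillars plus these eleven.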