Skip to content

Latest commit

 

History

History
79 lines (57 loc) · 5.95 KB

File metadata and controls

79 lines (57 loc) · 5.95 KB

Audit Modes

Plamen v2.1.4. All modes run through the PTY-supervised V2 driver (plamen_driver.py) on Windows, macOS, and Linux, with either the Claude Code or OpenAI Codex CLI backend.

Comparison

Dimension Light Core Thorough
Target plan Pro Max Max
Orchestrator model Session model (Sonnet default) Opus Opus
Agent models All Sonnet/Haiku Opus + Sonnet Opus + Sonnet (Opus tier resolves to Opus-4.8)
Recon 2 sonnet (no RAG, no fork) 4 agents (RAG fire-and-forget) 4 agents (full RAG)
Breadth 3-4 sonnet 5-9 opus 5-9 opus
Re-scan (3b/3c) Skip Skip Full (sonnet, 2 iters + per-contract)
Depth loop 4 merged sonnet, iter 1 8+ agents, iter 1 Iter 1-3 (Devil's Advocate)
Niche agents Skip Flag-triggered (up to 8) Flag-triggered (up to 8)
Semantic invariants Skip Pass 1 only Pass 1 + Pass 2 (recursive trace)
Confidence scoring None (verdicts only) 2-axis (Evidence + Quality) 4-axis (Evidence, Consensus, Quality, RAG)
RAG Sweep Skip 1 sonnet 1 sonnet
Invariant / Medusa fuzz Skip Skip Yes (EVM, zero budget cost)
Design stress testing Skip Skip 1 reserved slot, UNCONDITIONAL
Chain analysis 1 sonnet (merged) 2 agents 2 agents + iteration 2
Verification (PoC) Chains + ALL Medium+ (sonnet) Chains + ALL Medium+ ALL severities (with fuzz)
Skeptic-Judge Skip Skip HIGH/CRIT
Cross-batch consistency Skip 1 sonnet 1 sonnet
Report 2 agents (sonnet + haiku) 5 agents (opus + sonnet + haiku) 5 agents
Agent count ~18-22 ~30-50 ~40-100

The "agent" counts above are role counts. Agents in the breadth, rescan, and depth rows execute as a driver-owned PTY worker pool (one Claude PTY per agent artifact) rather than as Task() subagents under a Claude coordinator. The driver supervises each worker through a pseudo-terminal and infers turn completion from artifacts written to disk (PLAMEN_STATUS markers) rather than from a stdout/JSON envelope, eliminating the 0-byte-stdio / silent-hang class. See pipeline-phases-presentation.md for execution-shape details.

Cross-platform & ecosystem auto-detect: All modes run on Windows, macOS, and Linux. The driver auto-detects (and auto-corrects, without a halt-to-rerun) the language ecosystem at startup, shows it on the startup banner, and resolves it via manifest-priority rules. POSIX runs use Popen-owned PTY execution with nested-session env isolation.

Haltless resilience: A finished audit is never discarded at the finish line. The report_index, verify, inventory, and resume paths repair-then-degrade and surface any unfinished obligations as flagged Appendix-B items in AUDIT_REPORT.md instead of halting. Stale or corrupt checkpoints recover instead of stranding the run.

Deterministic plumbing: Fragile LLM-prose-parsing phases have been replaced with deterministic Python — mechanical SC report_index recovery, mechanical verify backfill / queue manifests, and the data-loss-free report_dedup builder — so structure-assembly steps no longer depend on LLM output shape.

When to Use Each

  • Light: Pro plan, codebases under 3000 lines, quick first pass. Reports all severities but skips semantic invariants, fuzzing, and design stress testing.
  • Core: Standard audit. Reports all severities, PoC-verifies Medium+, flag-triggered niche agents. Best balance of coverage and cost.
  • Thorough: Maximum coverage. Iterative depth with Devil's Advocate, fuzz campaigns (invariant + Medusa), design stress testing, skeptic-judge for HIGH/CRIT, a Thorough-only Exploration-Completeness skeptic (Phase 4b.6), 4-axis confidence scoring, and a durable post-report report_dedup pass. Use for high-value or pre-deployment audits.

Proven-Only Mode

Available in all modes via --proven-only. Caps findings with only [CODE-TRACE] evidence (no executed PoC or fuzzer counterexample) at Low severity. Useful for benchmark comparisons where only mechanically proven findings should drive severity.


L1 Mode

L1 infrastructure audits use the same Light/Core/Thorough tiers with these differences:

Dimension L1 Difference
Languages Go, Rust (instead of Solidity, Move, etc.)
Depth agents + depth-consensus-invariant, depth-network-surface; no depth-token-flow
Phase 0.5 Bake phase: deterministic SCIP batch indexing — _bake_go_scip (Go) and rust-analyzer SCIP (Rust)
Phase 4c Removed (no chain analysis for L1 point vulnerabilities)
Severity matrix L1-specific, aligned with Immunefi v2.3
Skills 22+ L1 injectable skills (consensus, p2p, mempool, RPC, validator, etc.)
Evidence tags + [DIFF-PASS], [CONFORMANCE-PASS], [NON-DET-PASS], [FUZZ-PASS], [LSP-TRACE]
plamen l1 core /path/to/node-client    # terminal wrapper (both backends)

Inside Claude Code, use /plamen-l1-wizard for interactive L1 audit configuration.

See l1-mode/design.md for the full L1 architecture.


Codex Backend (BETA)

All audit modes work with both Claude Code and OpenAI Codex CLI (codex exec) backends. The Codex backend is a cost-saving beta: it adds a research-backed model/tier/compact configuration, per-job depth fan-out (one codex exec per depth job, which fixes the never-cut-stub halt), and natural-language usage-cap detection that auto-waits instead of halting. Codex depth runs the full Devil's-Advocate iteration 2 and seeds the complete mandatory first-pass artifact set so recon/depth degrade losslessly, and a context-exceeded turn no longer perma-fails.

The V2 driver (plamen_driver.py) auto-detects the active backend via plamen_home() and handles tool translation and path rewriting (~/.claude/ vs ~/.codex/plamen/). Install the Codex backend with plamen install --codex. See the main README for full Codex setup.