
AbhinavGupta707/Vigil


Vigil

Workspace-scale adverse-event horizon scanner for clinical AI. Scans an entire workspace in one shot, runs a three-method statistical ensemble (Bayesian online change-point detection + Poisson z-score syndromic surveillance + CUSUM control chart) across every drug × event × stratum combination, applies Benjamini–Hochberg FDR control across the stratum grid, and emits signed FHIR DetectedIssue resources for each implicated patient — plus a workspace-level Composition with the statistical methodology travelling inline as a Provenance attachment.

The thesis: signal-significance decisions are made by statistics, not by an LLM. Three independent methods (BOCPD + Poisson-z + CUSUM) must all agree before any signal fires. The LLM is invoked only for clinician-facing narrative — CI greps the repo and fails the build if any LLM call escapes the vigil/vigil_core/explain/ allow-list.

Python 3.11+ · License: MIT


Live demo

| Surface | URL | What it serves |
|---|---|---|
| MCP server | https://vigil-mcp-ag707.fly.dev/mcp | Scan, detect, explain, and write tools over MCP |
| A2A workspace agent | https://vigil-a2a.fly.dev/ | Agent-to-Agent endpoint, workspace-scope (no per-patient selection required) |

Both are deployed on Fly.io. The local quick-start in DEMO.md runs a full workspace scan against 65 committed FHIR patients in about 3 seconds.


What's different about Vigil

Every per-patient agent in this ecosystem requires a selected patient before it can do anything. Vigil does not. Vigil ships as a workspace-scope A2A agent and a workspace-scope MCP server: experimental.fhir_context_required.value=false, experimental.workspace_scope.value=true. The operator invokes Vigil with a single workspace-level prompt — "scan the workspace for emerging adverse-event clusters" — and receives a complete signal report: which clusters fired, which methods agreed, which patients are implicated, what the Benjamini–Hochberg q-value is, and an ed25519 signature on the whole packet.
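To make the two capability flags concrete, here is a minimal sketch of how a caller might read them from an agent card. Only the flag paths `experimental.fhir_context_required.value` and `experimental.workspace_scope.value` come from this README; the card layout, the helper names, and the default-when-absent behaviour are assumptions for illustration, not Vigil's actual agent card or API.

```python
# Hypothetical sketch: only the two flag paths come from the README; the rest
# of the card layout (top-level "experimental" key, "name") is assumed.
AGENT_CARD = {
    "name": "Vigil",
    "experimental": {
        "fhir_context_required": {"value": False},
        "workspace_scope": {"value": True},
    },
}


def _flag(card: dict, name: str, default: bool) -> bool:
    """Read experimental.<name>.value, falling back to a default when absent."""
    return bool(card.get("experimental", {}).get(name, {}).get("value", default))


def needs_selected_patient(card: dict) -> bool:
    # Agents that do not declare the flag are treated as per-patient by default.
    return _flag(card, "fhir_context_required", default=True)


def is_workspace_scope(card: dict) -> bool:
    return _flag(card, "workspace_scope", default=False)


# An orchestrator could route a workspace-level prompt only to agents like this one.
if is_workspace_scope(AGENT_CARD) and not needs_selected_patient(AGENT_CARD):
    print("route workspace prompt without a selected patient")
```

Defaulting an undeclared `fhir_context_required` to `True` is a conservative choice: an agent that says nothing is assumed to need a patient.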


Frozen evidence — Eval v1.0

Cluster detection : 30/30 (100%) on C30 corpus
                     10/10 true clusters flagged
                      0/20 false positives
                    expected FDR 0.000  (gate ≤ 0.10)
Live demo path    : 65 FHIR patients · 4/4 hero clusters fired
                    through compose_workspace_signal_report
Calibration       :  4/4 engineered hero clusters clear
                    z-score + BOCPD + CUSUM with margin
B20/T15 corpora   : 20 baseline + 15 temporal cases committed
                    and schema/aggregate-validated offline
─────────────────────────────────────────
65 committed eval cases + live FHIR workspace evidence:
  python -m vigil.evals.runner --frozen v1
  python -m vigil.scripts.calibrate_demo_thresholds
  pytest vigil/tests/test_demo_workspace_live_path.py -q

C30 is the numeric runner (frozen 30/30). B20 and T15 are committed corpora with validator tests rather than part of the C30 runner command. No live openFDA calls at demo or judging time — frozen runs use only the committed seed parquet at data/faers_seed.parquet.
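The C30 headline numbers follow from a standard confusion-matrix reading: 10 true clusters flagged (10 true positives, 0 misses) and 0 of 20 negatives flagged (0 false positives, 20 true negatives). The sketch below reproduces that arithmetic. It computes the observed false discovery proportion FP / (TP + FP); how the runner itself derives "expected FDR" is not stated here, so treat this as an illustration rather than the runner's code.

```python
def eval_summary(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, observed false discovery proportion, and recall from raw counts."""
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    # Proportion of flagged clusters that are false; defined as 0 when nothing is flagged.
    fdr = fp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"n": n, "accuracy": accuracy, "fdr": fdr, "recall": recall}


# C30: 10/10 true clusters flagged, 0/20 false positives.
c30 = eval_summary(tp=10, fp=0, tn=20, fn=0)
assert c30["fdr"] <= 0.10  # the FDR gate quoted above
print(c30)
```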


Hero scenarios

Five scenarios round-trip end-to-end against the 65-patient demo workspace at fhir-bundles/:

| # | Scenario | Signal |
|---|---|---|
| 1 | Warfarin + Amiodarone bleeding | 15 patients on combo, 15 major bleeding events. Calibration: z=15.88, BOCPD=1.000, CUSUM fired; BH q ≤ 0.05 live. |
| 2 | NSAID + ACE-I + Diuretic AKI | 12 patients on triple-whammy, 12 AKI events. Connects to Lumen's Margaret scenario. z=16.26, BOCPD=1.000, CUSUM fired. |
| 3 | Sulfonylurea ≥75y hypoglycaemia | 10 patients ≥75 on sulfonylurea, 10 severe hypoglycaemia events. Age-stratified detection. z=17.71, BOCPD=1.000, CUSUM fired. |
| 4 | Methotrexate hepatotoxicity | 8 patients on long-term MTX, 8 ALT/LFT elevations. Observation-based extraction (no Condition code needed). z=12.02, BOCPD=1.000, CUSUM fired. |
| 5 | No-signal control | 20-patient control cohort. Vigil scans and returns zero flagged clusters: the all-three-agree gate is not a rubber stamp. |

Methodology

Three independent methods must all agree before any signal fires:

  1. Bayesian online change-point detection — Adams & MacKay 2007 (arXiv:0710.3742). Posterior over run-length; threshold default 0.95. Hazard rate recorded in signal metadata.
  2. Poisson z-score syndromic surveillance — CDC EARS (Hutwagner 2003) / FDA Sentinel (Platt 2012). Observed-vs-expected rate-ratio z on stratified workspace cohorts; threshold default 3.0.
  3. CUSUM control chart — Page 1954. Running upward cumulative-sum chart with a reference value, slack k, and decision threshold h; threshold default 4.0.
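Methods 2 and 3 can be sketched in a few lines. This is a minimal illustration, not Vigil's implementation: the z-score uses the textbook Poisson form (O − E) / √E (the README describes a rate-ratio z, whose exact form is not given here), and the CUSUM uses standardized counts with an assumed slack k = 0.5. Only the defaults 3.0 and h = 4.0 come from the list above; all function names are hypothetical.

```python
import math
from typing import List, Sequence

Z_THRESHOLD = 3.0   # README default for method 2
CUSUM_H = 4.0       # README default decision threshold h for method 3
CUSUM_K = 0.5       # assumed slack k; a common textbook choice, not stated in the README


def poisson_z(observed: int, expected: float) -> float:
    """Observed-vs-expected z under a Poisson null: (O - E) / sqrt(E)."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return (observed - expected) / math.sqrt(expected)


def cusum_upper(counts: Sequence[float], mean: float, sd: float,
                k: float = CUSUM_K) -> List[float]:
    """One-sided upward CUSUM on standardized counts: S_t = max(0, S_{t-1} + z_t - k)."""
    s, path = 0.0, []
    for x in counts:
        s = max(0.0, s + (x - mean) / sd - k)
        path.append(s)
    return path


def cusum_fires(counts, mean, sd, h=CUSUM_H, k=CUSUM_K) -> bool:
    """True once the running sum exceeds the decision threshold h."""
    return any(s > h for s in cusum_upper(counts, mean, sd, k))


weekly = [2, 1, 2, 3, 2, 8, 9, 10]  # made-up weekly counts for one stratum
# Recent 3 weeks against an assumed baseline rate of 2 events per week.
print(poisson_z(observed=sum(weekly[-3:]), expected=3 * 2.0))
print(cusum_fires(weekly, mean=2.0, sd=math.sqrt(2.0)))
```

Here the baseline mean plays the role of the CUSUM reference value, and √mean is used as the Poisson standard deviation.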

A signal fires only when all three clear their thresholds. The Benjamini & Hochberg 1995 q-value is computed once across the workspace stratum grid and reported on every fired signal (target FDR ≤ 0.05; over-target signals are surfaced but flagged q_above_target).
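The firing rule and the grid-wide BH step can be sketched together as follows. Two details are assumptions, not taken from the README: each cell's p-value is derived from its z-score as an upper-tail normal probability, and threshold comparisons are inclusive. The BH step-up computation itself is the standard one: q for the i-th smallest of m p-values is the minimum of m·p/rank over that rank and all larger ranks. Names such as `StratumStats` and `fired_signals` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class StratumStats:
    key: str            # hypothetical cell label, e.g. "drug|event|stratum"
    bocpd_prob: float   # change-point posterior from method 1
    z: float            # Poisson z-score from method 2
    cusum: float        # peak CUSUM statistic from method 3


def one_sided_p(z: float) -> float:
    """Upper-tail normal p-value for a z-score (assumed p-value source)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def bh_qvalues(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg q-values: running minimum of m*p/rank from the largest p down."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [1.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def fired_signals(grid, bocpd_t=0.95, z_t=3.0, cusum_h=4.0, fdr_target=0.05):
    # BH is computed once over the whole grid, including cells that do not fire.
    q = bh_qvalues([one_sided_p(s.z) for s in grid])
    fired = []
    for s, qi in zip(grid, q):
        # All three methods must clear their thresholds (inclusive comparison assumed).
        if s.bocpd_prob >= bocpd_t and s.z >= z_t and s.cusum >= cusum_h:
            # Over-target signals are still reported, just flagged.
            fired.append({"key": s.key, "q": qi, "q_above_target": qi > fdr_target})
    return fired
```

Note that the q-value never gates firing in this sketch. It only sets `q_above_target`, matching the "surfaced but flagged" behaviour described above.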

LLMs operate only inside vigil/vigil_core/explain/ for clinician-facing narrative. The CI invariant test_no_hardcoded_llm_model_strings rejects any LLM client invocation outside that allow-list.
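The allow-list check amounts to a repository grep with one exempt directory. The sketch below is a hypothetical stand-in for `test_no_hardcoded_llm_model_strings`, not its actual code. It uses the model-string prefixes listed under "Determinism guarantees" and scans only `*.py` files, which is an assumption. A plain substring match like this is deliberately blunt and can hit unrelated text (for example, "haiku" in a comment).

```python
import re
from pathlib import Path
from typing import List, Tuple

# Prefixes from the README's CI description; the regex itself is an assumption.
MODEL_PATTERN = re.compile(r"claude-|gpt-|o1-|haiku|gemini-")
ALLOWED_DIR = Path("vigil/vigil_core/explain")


def find_violations(repo_root: Path) -> List[Tuple[str, int]]:
    """Return (relative path, line number) for model strings outside the allow-list."""
    allowed = (repo_root / ALLOWED_DIR).resolve()
    hits = []
    for path in repo_root.rglob("*.py"):
        if path.resolve().is_relative_to(allowed):
            continue  # the sole permitted LLM call site
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if MODEL_PATTERN.search(line):
                hits.append((path.relative_to(repo_root).as_posix(), lineno))
    return sorted(hits)


if __name__ == "__main__":
    violations = find_violations(Path("."))
    for rel, lineno in violations:
        print(f"LLM model string outside allow-list: {rel}:{lineno}")
    raise SystemExit(1 if violations else 0)
```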

Full methodology with hazard-rate sensitivity analysis: docs/METHODOLOGY.md.
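Method 1 can be sketched with the Adams & MacKay recursion over run lengths, using a Poisson likelihood with a conjugate Gamma prior and a constant hazard rate. Several choices here are assumptions rather than details from this README: the Gamma(1, 1) prior, the default hazard of 1/50, and the summary statistic (posterior mass on a change-point within the last `window` observations), which is compared against the 0.95 default. The function names are hypothetical.

```python
import math


def _logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def _nb_logpmf(x, alpha, beta):
    """Log posterior-predictive of a Poisson count under a Gamma(alpha, rate=beta) prior."""
    return (math.lgamma(alpha + x) - math.lgamma(alpha) - math.lgamma(x + 1)
            + alpha * math.log(beta / (beta + 1.0)) - x * math.log(beta + 1.0))


def bocpd_poisson(counts, hazard=1 / 50, alpha0=1.0, beta0=1.0):
    """Adams & MacKay run-length posterior after each count (constant hazard)."""
    log_h, log_1mh = math.log(hazard), math.log1p(-hazard)
    log_r = [0.0]                      # P(r_0 = 0) = 1
    alphas, betas = [alpha0], [beta0]  # Gamma parameters, one pair per run length
    history = []
    for x in counts:
        log_pred = [_nb_logpmf(x, a, b) for a, b in zip(alphas, betas)]
        # Growth: the current run continues; change: a new run starts (r_t = 0).
        growth = [lr + lp + log_1mh for lr, lp in zip(log_r, log_pred)]
        change = _logsumexp([lr + lp + log_h for lr, lp in zip(log_r, log_pred)])
        joint = [change] + growth      # list index = new run length r_t
        norm = _logsumexp(joint)
        log_r = [v - norm for v in joint]
        # Conjugate update: each surviving run absorbs x; the new run restarts at the prior.
        alphas = [alpha0] + [a + x for a in alphas]
        betas = [beta0] + [b + 1.0 for b in betas]
        history.append([math.exp(v) for v in log_r])
    return history


def recent_change_prob(counts, window, **kwargs):
    """Posterior mass on a change-point within the last `window` observations."""
    return sum(bocpd_poisson(counts, **kwargs)[-1][:window])


baseline = [2, 1, 3, 2, 2, 1, 2, 3, 2, 2] * 3  # made-up stable weekly counts
surge = baseline + [11, 12, 13, 12, 12]
print(recent_change_prob(surge, window=8) >= 0.95)
```

The hazard rate is the main tuning knob: a larger hazard makes the detector quicker to believe in a new run. That is why the README records it in signal metadata and points to a sensitivity analysis.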


Architecture

flowchart TB
    A[Workspace prompt:<br/>"scan for emerging AE clusters"] --> O[Vigil A2A Orchestrator<br/>workspace-scope agent]
    O -->|self-fan| M[Vigil MCP<br/>FastMCP · workers=1]

    M --> S[SCAN<br/>build cohort · stratify ·<br/>extract recent events]
    S --> D[DETECT ensemble]

    subgraph D[DETECT ensemble — all 3 must agree]
      D1[BOCPD<br/>Adams &amp; MacKay 2007]
      D2[Poisson z-score<br/>CDC EARS · FDA Sentinel]
      D3[CUSUM<br/>Page 1954]
    end

    D --> F[BH FDR control<br/>across stratum grid]
    F --> E[EXPLAIN<br/>guideline cite · confounders<br/>· LLM narrative]
    E --> W[COMPOSE + WRITE]

    W --> R[Signed FHIR Bundle<br/>DetectedIssue × N patients +<br/>workspace Composition +<br/>Provenance + AuditEvent<br/>ed25519 · SHA-256 chain]

Module map:

  • vigil/vigil_mcp/tools/scan/ — workspace cohort builder, stratification, recent-event extraction
  • vigil/vigil_mcp/tools/detect/ — BOCPD, z-score, CUSUM, ensemble combiner, baseline store
  • vigil/vigil_mcp/tools/explain/ — guideline citation, confounder check, LLM narrative
  • vigil/vigil_mcp/tools/write/ — DetectedIssue, workspace Composition, audit Provenance
  • vigil/vigil_mcp/tools/compose_workspace_signal_report.py — end-to-end composer
  • vigil/vigil_orchestrator/ — A2A endpoint with workspace-scope agent card
  • vigil/vigil_core/explain/ — explanation logic (sole LLM call site)
  • vigil/evals/ — C30 numeric runner + B20/T15 validators
  • shared/ — vendored FHIR client, ed25519 audit chain, LLM client, types

Full walkthrough: ARCHITECTURE.md.
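The scan stage produces the drug × event × stratum grid that the detectors and the BH step operate on. The sketch below shows one way to count events into that grid while keeping empty cells, so every tested cell stays in the multiple-testing family. The flattened `Record` type and the single age cut at 75 are hypothetical simplifications. Vigil's real cohort builder works from FHIR resources and may stratify differently.

```python
from collections import Counter
from itertools import product
from typing import NamedTuple, Optional


class Record(NamedTuple):
    """Hypothetical flattened per-patient record produced by a scan step."""
    patient_id: str
    drug: str
    event: Optional[str]  # None when no recent adverse event was extracted
    age: int


def age_stratum(age: int) -> str:
    # Single age cut at 75, echoing the sulfonylurea scenario; real strata are richer.
    return ">=75" if age >= 75 else "<75"


def build_grid(records, drugs, events, strata=("<75", ">=75")):
    """Count events per (drug, event, stratum) cell, keeping empty cells."""
    counts = Counter((r.drug, r.event, age_stratum(r.age)) for r in records if r.event)
    # Enumerate the full Cartesian product so every tested cell enters the BH family.
    return {cell: counts.get(cell, 0) for cell in product(drugs, events, strata)}


records = [
    Record("p1", "sulfonylurea", "hypoglycaemia", 81),
    Record("p2", "sulfonylurea", None, 60),
]
grid = build_grid(records, ["sulfonylurea"], ["hypoglycaemia"])
print(grid)
```

The grid size is the product of the three dimensions, which grows quickly. That growth is why the README computes q-values once across the grid rather than per cluster.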


Determinism guarantees

Enforced by CI, not aspirational:

  • Statistics decide signals, not LLMs. Three methods, all must agree, plus BH FDR control. The LLM only writes clinician-facing narrative on signals already fired.
  • LLM call sites are an allow-list. vigil/vigil_core/explain/ only. CI greps the rest of the repo for claude-, gpt-, o1-, haiku, gemini- and fails the build on a hit outside the allow-list.
  • uvicorn --workers 1 everywhere. FastMCP's session cache breaks with multiple workers, so CI rejects any other value.
  • Forward-hash audit chain. Every DetectedIssue and the workspace Composition are ed25519-signed and SHA-256 chained in canonical JSON. Chain.verify() raises on tamper.
  • Frozen eval is hermetic. python -m vigil.evals.runner --frozen v1 uses only committed parquet data; no network, no openFDA hits, no API keys required.
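The forward-hash part of the audit chain can be illustrated with the standard library alone. In this sketch each entry's hash covers the previous hash plus the canonical JSON of the resource, so editing any earlier resource breaks verification. Three things are assumptions: the `json.dumps` settings (sorted keys, compact separators) approximate canonical JSON, the all-zero genesis hash is a convention chosen here, and the ed25519 signature on each entry is omitted to avoid a third-party dependency. `append` and `verify` are hypothetical names that only mirror the behaviour of `Chain.verify()`, not its API.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting hash for an empty chain


def canonical(obj) -> bytes:
    """Approximate canonical JSON: sorted keys, no insignificant whitespace, UTF-8."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def _link(prev_hash: str, resource) -> str:
    return hashlib.sha256(prev_hash.encode("ascii") + canonical(resource)).hexdigest()


def append(chain: list, resource: dict) -> list:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"resource": resource, "prev": prev, "hash": _link(prev, resource)})
    return chain


def verify(chain: list) -> None:
    """Raise ValueError at the first entry whose linkage or hash does not match."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        if entry["prev"] != prev or entry["hash"] != _link(prev, entry["resource"]):
            raise ValueError(f"audit chain broken at entry {i}")
        prev = entry["hash"]


chain = []
append(chain, {"resourceType": "DetectedIssue", "id": "di-1"})
append(chain, {"resourceType": "Composition", "id": "report-1"})
verify(chain)  # passes silently on an untampered chain
```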

Quick start

git clone https://github.com/AbhinavGupta707/Vigil
cd Vigil
python3 -m venv .venv && source .venv/bin/activate
pip install -e '.[dev]'
pytest                                                         # full suite
python -m vigil.evals.runner --frozen v1                       # eval headline
uvicorn vigil.vigil_mcp.main:app --workers 1                   # MCP at :8000
uvicorn vigil.vigil_orchestrator.main:app --port 8080 --workers 1  # A2A at :8080

Hands-on hero scenario walkthrough: DEMO.md.


What makes this engineering-substantial

  1. A statistical ensemble that won't fire alone. Three independent methods, three different theoretical bases (Bayesian, frequentist, control-chart). Any one might be wrong; all three agreeing is a strong signal.
  2. FDR-controlled across the stratum grid. The Benjamini–Hochberg q-value is computed once over the full grid, not per cluster, so testing many drug × event × stratum cells at once doesn't silently inflate the false-positive rate.
  3. Workspace-scope agent capability. Most marketplace agents need a patient context. Vigil exposes experimental.workspace_scope=true and experimental.fhir_context_required=false, demonstrating the first surveillance-style agent against a real FHIR workspace.
  4. FHIR-native output. Not a JSON report, not a CSV — signed FHIR R4 DetectedIssue resources, one per implicated patient, plus a workspace-level Composition. Slots directly into downstream EHR pipelines.
  5. Verifiable audit chain. ed25519 per entry, SHA-256 forward chain. External auditors can verify the entire chain with only the public key.
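For a sense of what "one DetectedIssue per implicated patient" looks like, here is a minimal, unsigned sketch built as plain dictionaries. The element names (`status`, `severity`, `patient`, `identifiedDateTime`, `detail`) are FHIR R4 DetectedIssue elements. The chosen values, the `severity: "high"` default, the detail wording, and the helper name are assumptions and do not reflect Vigil's actual output.

```python
from datetime import datetime, timezone


def detected_issues(patient_ids, signal_key, q_value, q_above_target):
    """One minimal FHIR R4 DetectedIssue per implicated patient (unsigned sketch)."""
    identified = datetime.now(timezone.utc).isoformat()
    flag = " (q above FDR target)" if q_above_target else ""
    return [
        {
            "resourceType": "DetectedIssue",
            "status": "final",                          # required element in R4
            "severity": "high",                         # assumed; R4 allows high | moderate | low
            "patient": {"reference": f"Patient/{pid}"},  # R4 uses `patient` (R5 renames it)
            "identifiedDateTime": identified,
            "detail": f"Workspace adverse-event cluster {signal_key}; BH q={q_value:.3g}{flag}",
        }
        for pid in patient_ids
    ]


issues = detected_issues(["p1", "p2"], "methotrexate|alt_elevation", 0.012, False)
print(len(issues), issues[0]["patient"]["reference"])
```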

Companion product

Vigil ships alongside Lumen — a formal-verification meta-layer for prospective clinical recommendations. Same author, shared infrastructure (shared/fhir/, shared/audit/, shared/llm/, shared/types/ are vendored in both repos). Different scope: Lumen proves safety for a single recommendation per patient; Vigil detects harm signals retrospectively across a workspace. Together they form a forward-proof + backward-surveillance stack.


License, clinical disclaimer & FAERS attribution

MIT. See LICENSE. This is a research and engineering demonstration — not a medical device. It has not been evaluated by any regulatory authority and must not be used to make clinical decisions for real patients.

This work uses openFDA FAERS (FDA Adverse Event Reporting System) data as a prior baseline for surveillance comparison only — not as standalone clinical evidence. Source: U.S. FDA — openFDA Drug Adverse Event endpoint (https://api.fda.gov/drug/event.json). License: public domain. Use governed by openFDA terms. Disclaimer: "Do not rely on openFDA to make decisions regarding medical care. Always speak to your health provider about the risks and benefits of FDA-regulated products." Snapshot provenance: vigil/data/faers_seed_metadata.json.

