Vigil

Workspace-scale adverse-event horizon scanner for clinical AI. Vigil scans an entire workspace in one shot, runs a three-method statistical ensemble (Bayesian online change-point detection + Poisson z-score syndromic surveillance + CUSUM control chart) across every drug × event × stratum combination, and applies Benjamini–Hochberg FDR control across the stratum grid. It emits one signed FHIR DetectedIssue resource per implicated patient, plus a workspace-level Composition that carries the statistical methodology inline as a Provenance attachment.

The thesis: signal-significance decisions are made by statistics, not by an LLM. Three independent methods (BOCPD + Poisson-z + CUSUM) must all agree before any signal fires. The LLM is invoked only for clinician-facing narrative — CI greps the repo and fails the build if any LLM call escapes the vigil/vigil_core/explain/ allow-list.

Live demo

Surface	URL	What it serves
MCP server	`https://vigil-mcp-ag707.fly.dev/mcp`	Scan, detect, explain, and write tools over MCP
A2A workspace agent	`https://vigil-a2a.fly.dev/`	Agent-to-Agent endpoint, workspace-scope (no per-patient selection required)

Both are deployed on Fly.io. The local quick start in DEMO.md runs a full workspace scan against 65 committed FHIR patients in ~3 seconds.

What's different about Vigil

Every per-patient agent in this ecosystem requires a selected patient before it can do anything. Vigil does not. Vigil ships as a workspace-scope A2A agent and a workspace-scope MCP server: experimental.fhir_context_required.value=false, experimental.workspace_scope.value=true. The operator invokes Vigil with a single workspace-level prompt — "scan the workspace for emerging adverse-event clusters" — and receives a complete signal report: which clusters fired, which methods agreed, which patients are implicated, what the Benjamini–Hochberg q-value is, and an ed25519 signature on the whole packet.
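A minimal sketch of how a caller might read those two flags, assuming the agent card is available as a nested dictionary. Only the two `experimental` keys come from this README; the `name` key, the dictionary layout, and the helper `is_workspace_scope` are hypothetical.

```python
# Hypothetical agent-card fragment. Only the two experimental flags are
# taken from the README; everything else is illustrative.
agent_card_fragment = {
    "name": "vigil",
    "experimental": {
        "fhir_context_required": {"value": False},
        "workspace_scope": {"value": True},
    },
}


def is_workspace_scope(card: dict) -> bool:
    """Return True when the card advertises workspace scope and no patient context."""
    experimental = card.get("experimental", {})
    # Missing flags fall back to the conservative per-patient behaviour.
    needs_patient = experimental.get("fhir_context_required", {}).get("value", True)
    workspace = experimental.get("workspace_scope", {}).get("value", False)
    return bool(workspace) and not needs_patient
```

A card that omits the flags is treated as per-patient here, which is a design assumption rather than documented behaviour.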

Frozen evidence — Eval v1.0

Cluster detection : 30/30 (100%) on C30 corpus
                     10/10 true clusters flagged
                      0/20 false positives
                    expected FDR 0.000  (gate ≤ 0.10)
Live demo path    : 65 FHIR patients · 4/4 hero clusters fired
                    through compose_workspace_signal_report
Calibration       :  4/4 engineered hero clusters clear
                    z-score + BOCPD + CUSUM with margin
B20/T15 corpora   : 20 baseline + 15 temporal cases committed
                    and schema/aggregate-validated offline
─────────────────────────────────────────
65 committed eval cases + live FHIR workspace evidence:
  python -m vigil.evals.runner --frozen v1
  python -m vigil.scripts.calibrate_demo_thresholds
  pytest vigil/tests/test_demo_workspace_live_path.py -q

C30 is the numeric runner (frozen 30/30). B20 and T15 are committed corpora with validator tests rather than part of the C30 runner command. No live openFDA calls at demo or judging time — frozen runs use only the committed seed parquet at data/faers_seed.parquet.

Hero scenarios

Five scenarios round-trip end-to-end against the 65-patient demo workspace at fhir-bundles/:

#	Scenario	Signal
1	Warfarin + Amiodarone bleeding	15 patients on combo, 15 major bleeding events. Calibration: z=15.88, BOCPD=1.000, CUSUM fired; BH q ≤ 0.05 live.
2	NSAID + ACE-I + Diuretic AKI	12 patients on triple-whammy, 12 AKI events. Connects to Lumen's Margaret scenario. z=16.26, BOCPD=1.000, CUSUM fired.
3	Sulfonylurea ≥75y hypoglycaemia	10 patients ≥75 on sulfonylurea, 10 severe hypoglycaemia events. Age-stratified detection. z=17.71, BOCPD=1.000, CUSUM fired.
4	Methotrexate hepatotoxicity	8 patients on long-term MTX, 8 ALT/LFT elevations. Observation-based extraction (no `Condition` code needed). z=12.02, BOCPD=1.000, CUSUM fired.
5	No-signal control	20-patient control cohort. Vigil scans, returns zero flagged clusters. The all-three-agree gate is not a rubber stamp.

Methodology

Three independent methods must all agree before any signal fires:

Bayesian online change-point detection — Adams & MacKay 2007 (arXiv:0710.3742). Posterior over run-length; threshold default 0.95. Hazard rate recorded in signal metadata.
Poisson z-score syndromic surveillance — CDC EARS (Hutwagner 2003) / FDA Sentinel (Platt 2012). Observed-vs-expected rate-ratio z on stratified workspace cohorts; threshold default 3.0.
CUSUM control chart — Page 1954. Running cumulative-sum upward chart with reference, slack k, decision threshold h; threshold default 4.0.
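As a rough illustration of the two simpler detectors listed above, the sketch below computes a Poisson z-score and an upward CUSUM. It is not the Vigil implementation: the normal-approximation form of the z-score, the standardisation of counts, and the slack default `k=0.5` are assumptions; only the thresholds 3.0 and 4.0 come from this README. BOCPD is left out because a faithful run-length posterior does not fit in a short example.

```python
import math
from typing import Sequence, Tuple


def poisson_z(observed: int, expected: float) -> float:
    """Normal-approximation z for an observed count against a Poisson expectation."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return (observed - expected) / math.sqrt(expected)


def cusum_upper(
    counts: Sequence[float],
    mean: float,
    sd: float,
    k: float = 0.5,   # slack in standard-deviation units (assumed default)
    h: float = 4.0,   # decision threshold, matching the README default
) -> Tuple[bool, float]:
    """One-sided upward CUSUM on standardised counts.

    Returns (fired, peak) where peak is the largest cumulative sum seen.
    """
    if sd <= 0:
        raise ValueError("sd must be positive")
    s = 0.0
    peak = 0.0
    for x in counts:
        # Accumulate only excess above the reference plus slack; never go below 0.
        s = max(0.0, s + (x - mean) / sd - k)
        peak = max(peak, s)
    return peak > h, peak
```

For example, `cusum_upper([1, 1, 4, 5], mean=1.0, sd=1.0)` accumulates 2.5 and then 6.0, so it crosses `h=4.0` on the last observation, while a flat series at the baseline never leaves zero.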

A signal fires only when all three clear their thresholds. The Benjamini & Hochberg 1995 q-value is computed once across the workspace stratum grid and reported on every fired signal (target FDR ≤ 0.05; over-target signals are surfaced but flagged q_above_target).
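The all-three-agree gate and the Benjamini–Hochberg step can be sketched as follows. This is an illustration under assumptions, not the code in vigil/vigil_mcp/tools/detect/: the function names are hypothetical, the comparisons are assumed to be inclusive, and the sketch takes per-cell p-values as given because this README does not say how they are derived.

```python
from typing import List, Sequence


def ensemble_fires(
    z: float,
    bocpd_posterior: float,
    cusum_fired: bool,
    z_threshold: float = 3.0,       # README default
    bocpd_threshold: float = 0.95,  # README default
) -> bool:
    """A signal fires only if every method clears its own threshold."""
    return z >= z_threshold and bocpd_posterior >= bocpd_threshold and cusum_fired


def bh_qvalues(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q is the minimum of
    # p * m / rank over all ranks at or above its own.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        q[idx] = running_min
    return q


def q_above_target(q: float, target_fdr: float = 0.05) -> bool:
    """Flag a fired signal whose q-value exceeds the target FDR."""
    return q > target_fdr
```

Computing the q-values once over every cell in the grid, rather than per signal, is what makes the correction account for the number of strata tested.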

LLMs operate only inside vigil/vigil_core/explain/ for clinician-facing narrative. The CI invariant test_no_hardcoded_llm_model_strings fails the build when an LLM model string appears outside that allow-list.
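An allow-list check of this kind could look roughly like the sketch below. It is not the actual CI test: the function name `find_violations`, the restriction to `*.py` files, and the output format are assumptions. The allow-listed directory and the model-string patterns are the ones this README names.

```python
import re
from pathlib import Path
from typing import List

# Model-string prefixes the README says CI greps for.
MODEL_PATTERN = re.compile(r"claude-|gpt-|o1-|haiku|gemini-")
# The only directory allowed to contain LLM call sites.
ALLOWED = Path("vigil/vigil_core/explain")


def find_violations(repo_root: Path) -> List[str]:
    """Return 'path:line' for every model string found outside the allow-list."""
    hits: List[str] = []
    for path in sorted(repo_root.rglob("*.py")):
        rel = path.relative_to(repo_root)
        if ALLOWED in rel.parents:
            continue  # anything under the allow-listed directory is exempt
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if MODEL_PATTERN.search(line):
                hits.append(f"{rel.as_posix()}:{lineno}")
    return hits
```

A test wrapping this would assert that `find_violations(repo_root)` is empty. Note that a scanner written this way contains the patterns itself, so a real check would need to exempt its own file.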

Full methodology with hazard-rate sensitivity analysis: docs/METHODOLOGY.md.

Architecture

flowchart TB
    A[Workspace prompt:<br/>"scan for emerging AE clusters"] --> O[Vigil A2A Orchestrator<br/>workspace-scope agent]
    O -->|self-fan| M[Vigil MCP<br/>FastMCP · workers=1]

    M --> S[SCAN<br/>build cohort · stratify ·<br/>extract recent events]
    S --> D[DETECT ensemble]

    subgraph D[DETECT ensemble — all 3 must agree]
      D1[BOCPD<br/>Adams &amp; MacKay 2007]
      D2[Poisson z-score<br/>CDC EARS · FDA Sentinel]
      D3[CUSUM<br/>Page 1954]
    end

    D --> F[BH FDR control<br/>across stratum grid]
    F --> E[EXPLAIN<br/>guideline cite · confounders<br/>· LLM narrative]
    E --> W[COMPOSE + WRITE]

    W --> R[Signed FHIR Bundle<br/>DetectedIssue × N patients +<br/>workspace Composition +<br/>Provenance + AuditEvent<br/>ed25519 · SHA-256 chain]

Module map:

vigil/vigil_mcp/tools/scan/ — workspace cohort builder, stratification, recent-event extraction
vigil/vigil_mcp/tools/detect/ — BOCPD, z-score, CUSUM, ensemble combiner, baseline store
vigil/vigil_mcp/tools/explain/ — guideline citation, confounder check, LLM narrative
vigil/vigil_mcp/tools/write/ — DetectedIssue, workspace Composition, audit Provenance
vigil/vigil_mcp/tools/compose_workspace_signal_report.py — end-to-end composer
vigil/vigil_orchestrator/ — A2A endpoint with workspace-scope agent card
vigil/vigil_core/explain/ — explanation logic (sole LLM call site)
vigil/evals/ — C30 numeric runner + B20/T15 validators
shared/ — vendored FHIR client, ed25519 audit chain, LLM client, types

Full walkthrough: ARCHITECTURE.md.

Determinism guarantees

Enforced by CI, not aspirational:

Statistics decide signals, not LLMs. Three methods, all must agree, plus BH FDR control. The LLM only writes clinician-facing narrative on signals already fired.
LLM call sites are an allow-list. vigil/vigil_core/explain/ only. CI greps the rest of the repo for claude-, gpt-, o1-, haiku, gemini- and fails the build on a hit outside the allow-list.
uvicorn --workers 1 everywhere. FastMCP's session cache breaks when more than one worker process is running. CI rejects any other value.
Forward-hash audit chain. Every DetectedIssue and the workspace Composition are ed25519-signed and SHA-256 chained in canonical JSON. Chain.verify() raises on tamper.
Frozen eval is hermetic. python -m vigil.evals.runner --frozen v1 uses only committed parquet data; no network, no openFDA hits, no API keys required.
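The forward-hash chain from the list above can be illustrated with the standard library alone. This is a sketch under assumptions, not the code in shared/: "canonical JSON" is approximated here by sorted keys and compact separators, the entry layout and function names are hypothetical, and the ed25519 signature is omitted because it needs a third-party library.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder previous-hash for the first entry (assumed)


def canonical_json(obj: Any) -> bytes:
    """Deterministic serialisation: sorted keys, no insignificant whitespace."""
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def append_entry(chain: List[Dict[str, Any]], payload: Dict[str, Any]) -> Dict[str, Any]:
    """Append a payload, binding it to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    body = {"prev_hash": prev_hash, "payload": payload}
    entry = {**body, "hash": hashlib.sha256(canonical_json(body)).hexdigest()}
    chain.append(entry)
    return entry


def verify(chain: List[Dict[str, Any]]) -> None:
    """Recompute every hash and link; raise ValueError on the first mismatch."""
    prev_hash = GENESIS
    for i, entry in enumerate(chain):
        body = {"prev_hash": entry["prev_hash"], "payload": entry["payload"]}
        expected = hashlib.sha256(canonical_json(body)).hexdigest()
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            raise ValueError(f"audit chain broken at entry {i}")
        prev_hash = entry["hash"]
```

Because each hash covers the previous one, editing any earlier payload invalidates that entry and every link after it.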

Quick start

git clone https://github.com/AbhinavGupta707/Vigil
cd Vigil
python3 -m venv .venv && source .venv/bin/activate
pip install -e '.[dev]'
pytest                                                         # full suite
python -m vigil.evals.runner --frozen v1                       # eval headline
uvicorn vigil.vigil_mcp.main:app --workers 1                   # MCP at :8000
uvicorn vigil.vigil_orchestrator.main:app --port 8080 --workers 1  # A2A at :8080

Hands-on hero scenario walkthrough: DEMO.md.

What makes this engineering-substantial

A statistical ensemble that won't fire alone. Three independent methods, three different theoretical bases (Bayesian, frequentist, control-chart). Any one might be wrong; all three agreeing is a strong signal.
FDR-controlled across the stratum grid. The Benjamini–Hochberg q-value is computed once over the full grid, not per cluster, so testing many drug × event × stratum cells does not silently inflate the false-positive rate.
Workspace-scope agent capability. Most marketplace agents need a patient context. Vigil exposes experimental.workspace_scope=true and experimental.fhir_context_required=false, demonstrating the first surveillance-style agent against a real FHIR workspace.
FHIR-native output. Not a JSON report, not a CSV — signed FHIR R4 DetectedIssue resources, one per implicated patient, plus a workspace-level Composition. Slots directly into downstream EHR pipelines.
Verifiable audit chain. ed25519 per entry, SHA-256 forward chain. External auditors can verify the entire chain with only the public key.
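To make the FHIR-native output concrete, here is a minimal sketch of one DetectedIssue per implicated patient as a plain dictionary. The element names (`status`, `code`, `patient`, `identifiedDateTime`, `detail`) are standard FHIR R4; the chosen status, the free-text code, the wording of `detail`, and the helper name are assumptions, and signing is not shown.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_detected_issue(
    patient_id: str,
    drug: str,
    event: str,
    q_value: float,
    identified: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build an unsigned FHIR R4 DetectedIssue for one implicated patient."""
    when = identified or datetime.now(timezone.utc)
    return {
        "resourceType": "DetectedIssue",
        "status": "preliminary",  # assumed; the README does not state the status used
        "code": {"text": "Adverse-event cluster signal"},  # illustrative free text
        "patient": {"reference": f"Patient/{patient_id}"},
        "identifiedDateTime": when.isoformat(),
        "detail": f"{drug}: {event} cluster, BH q={q_value:.3f}",
    }


def build_issues(patient_ids, drug: str, event: str, q_value: float, identified=None):
    """One resource per implicated patient, as the README describes."""
    return [build_detected_issue(p, drug, event, q_value, identified) for p in patient_ids]
```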

Companion product

Vigil ships alongside Lumen — a formal-verification meta-layer for prospective clinical recommendations. Same author, shared infrastructure (shared/fhir/, shared/audit/, shared/llm/, shared/types/ are vendored in both repos). Different scope: Lumen proves safety for a single recommendation per patient; Vigil detects harm signals retrospectively across a workspace. Together they form a forward-proof + backward-surveillance stack.

License, clinical disclaimer & FAERS attribution

MIT. See LICENSE. This is a research and engineering demonstration — not a medical device. It has not been evaluated by any regulatory authority and must not be used to make clinical decisions for real patients.

This work uses openFDA FAERS (FDA Adverse Event Reporting System) data as a prior baseline for surveillance comparison only — not as standalone clinical evidence. Source: U.S. FDA — openFDA Drug Adverse Event endpoint (https://api.fda.gov/drug/event.json). License: public domain. Use governed by openFDA terms. Disclaimer: "Do not rely on openFDA to make decisions regarding medical care. Always speak to your health provider about the risks and benefits of FDA-regulated products." Snapshot provenance: vigil/data/faers_seed_metadata.json.
