CRL Intelligence Graph

Why did FDA say no — and is it your risk?

An open, free knowledge graph of FDA Complete Response Letters (CRLs) — the rejection letters FDA sends when it won't approve a drug — fused with the enforcement record (recalls, Import Alert 66-40 detentions, debarments) at the sponsor and facility level.

The paid tools (FDAzilla/Redica and friends) give you a searchable list of letters. They can't answer the question a Head of Regulatory Intelligence actually asks: "a company in my supply chain got a CRL — does that company also show up in FDA's enforcement data, and is anyone on my vendor list exposed?" This tool answers that, for free, and lets you point it at your own data to make it about you.

Built entirely on public FDA records. MIT licensed. The portal runs with zero pip install (Python standard library only); re-running the build pipeline uses NetworkX.

Disclaimer. This is an educational tool built on public FDA records. It is not regulatory, legal, or investment advice. Every fuzzy match is confidence-scored and is not a definitive identification. Absence of a signal is not a clean bill of health. Discuss any judgment call with qualified regulatory counsel.

What's in the box

A 2,360-node graph built from 333 structured CRLs joined to the public enforcement record:

Node type	Count
CRLs	333
Companies (sponsors)	260
Drugs	320
Deficiency types	50
Therapeutic areas	22
Manufacturing firms	681
Enforcement records	263
Watch-list entries	431
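
As a quick sanity check, the per-type counts in the table can be summed and compared with the 2,360-node total quoted above. This is only arithmetic on the published figures; the dictionary name is illustrative and not taken from the repo.

```python
# Node counts copied from the table above.
node_counts = {
    "CRLs": 333,
    "Companies (sponsors)": 260,
    "Drugs": 320,
    "Deficiency types": 50,
    "Therapeutic areas": 22,
    "Manufacturing firms": 681,
    "Enforcement records": 263,
    "Watch-list entries": 431,
}

total_nodes = sum(node_counts.values())
print(total_nodes)  # 2360
```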

Screened against 443 Import Alert 66-40 firms, 272 debarment-list entries, and 3,000 recall records.

Four surfaces, all included:

Graph explorer (portal/explorer.html) — an interactive Cytoscape.js view of the cross-enforcement subgraph: click a sponsor, expand to its CRLs, drugs, deficiency types, and enforcement footprint. Filter by node type.
648-page wiki (portal/wiki/) — a cross-linked page per company, drug, deficiency type, and therapeutic area, so every node is a readable, linkable record.
Vendor screener — paste your CDMO / vendor list and get any name that appears on an FDA watch list, with a confidence score and the source record. Plus a transparent, fully traceable exposure index (descriptive, not predictive).
A grounded companion — a chat guide that answers questions about the graph by citing the real nodes and numbers (never invents a figure) and walks you to the screener. Bring your own Anthropic key; the rest of the portal works without it.

The findings it surfaces (all from public data)

15 sponsors carry both a CRL and an enforcement footprint (a recall, an Import Alert detention, or a debarment) — the cross-enforcement intersection the searchable databases don't compute.
Repeat-rejected sponsors: Teva, Mylan, and Celltrion each appear across 5 CRLs.
Where drugs die: CMC (chemistry/manufacturing/controls) deficiencies dominate at 188 citations, ahead of Clinical (150) and Facility (46). The full per-therapeutic-area failure fingerprint is in docs/FINDING_cascade.md.
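
The three findings above reduce to a set intersection and two frequency counts. The sketch below shows that logic on toy records; the sponsor names, the function names, and the `min_crls` parameter are hypothetical, and the repo's own analytics (scripts/analytics.py) may compute these differently.

```python
from collections import Counter

# Hypothetical toy records; the real inputs are the repo's data/ and enforcement/ JSON.
crl_sponsors = ["Sponsor A", "Sponsor B", "Sponsor A", "Sponsor C", "Sponsor A"]
enforcement_sponsors = {"Sponsor A", "Sponsor D"}
deficiencies = ["CMC", "Clinical", "CMC", "Facility", "CMC"]


def cross_enforcement(crl_names, enforcement_names):
    """Sponsors that appear in both the CRL set and the enforcement set."""
    return sorted(set(crl_names) & set(enforcement_names))


def repeat_rejected(crl_names, min_crls=2):
    """Sponsors named on at least `min_crls` CRLs, with their CRL counts."""
    counts = Counter(crl_names)
    return {name: n for name, n in counts.items() if n >= min_crls}


both = cross_enforcement(crl_sponsors, enforcement_sponsors)
repeats = repeat_rejected(crl_sponsors)
deficiency_ranking = Counter(deficiencies).most_common()

print(both)
print(repeats)
print(deficiency_ranking)
```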

Quickstart (under a minute, zero dependencies)

git clone https://github.com/gauravpandey36/crl-intelligence-graph
cd crl-intelligence-graph
python3 portal/server.py 8791
# open http://localhost:8791

That serves the dashboard, the graph explorer, the wiki, and the vendor screener — all from the Python standard library. To enable the chat companion, bring your own key:

export ANTHROPIC_API_KEY=sk-ant-...   # only the companion needs this
python3 portal/server.py 8791

Bring your own data — make it about you

This is the part the incumbents can't give you: fuse your internal lists with the public record.

Vendor / CDMO screen (works today). POST /api/screen {"vendors": ["...","..."]} or paste into the portal. Returns each name's watch-list hits with confidence + source + date, and a traceable exposure index. CLI: python3 screener/screen.py.
Your CSV. screener/screen.py reads a vendor CSV directly (screen_csv).
Veeva Vault RIM (documented upgrade). The screener takes a list of names; wiring it to a Veeva VQL query (SELECT name__v FROM ...) over OAuth2 is a thin adapter — your admin enables API access, you map the name field, and the same screen runs over your live RIM data. This is a v2 connector, intentionally not bundled (it needs per-customer credentials).
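
A minimal, standard-library sketch of calling the screening endpoint from your own script, assuming the portal is running locally on port 8791 as in the Quickstart. The endpoint path and the `{"vendors": [...]}` body come from the text above; the helper names, the `name` CSV column, and the JSON `Content-Type` header are assumptions, and the response is returned as parsed JSON without assuming its fields.

```python
import csv
import io
import json
import urllib.request


def load_vendor_names(csv_file, column="name"):
    """Read vendor names from a CSV file object (the column name is an assumption)."""
    names = []
    for row in csv.DictReader(csv_file):
        name = (row.get(column) or "").strip()
        if name:
            names.append(name)
    return names


def build_screen_request(vendors, base_url="http://localhost:8791"):
    """Build the POST /api/screen request described above."""
    body = json.dumps({"vendors": vendors}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/screen",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def screen(vendors, base_url="http://localhost:8791"):
    """Send the request and return the parsed JSON response (schema not assumed)."""
    request = build_screen_request(vendors, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, with the portal running:
#   with open("vendors.csv", newline="") as f:
#       print(json.dumps(screen(load_vendor_names(f)), indent=2))
```

Keeping request construction separate from sending makes it easy to inspect the payload before anything leaves your machine.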

How it's built (the pipeline, all re-runnable)

scripts/normalize_crls.py     # CRLs -> canonical sponsor/drug/deficiency entities
scripts/ingest_enforcement.py # openFDA recalls + Import Alert 66-40 + debarment (keyless)
scripts/build_graph.py        # NetworkX graph: typed nodes + edges, entity resolution
scripts/analytics.py          # centrality, cross-enforcement, repeat-rejected, fingerprints
scripts/build_portal.py       # the 648-page wiki + Cytoscape explorer + dashboard

The pre-built data/, enforcement/, and graph/ JSON are included, so you can explore immediately. Re-run the pipeline to refresh from updated FDA data (the CRL source is the companion repo gauravpandey36/fda-crl-intelligence).
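
To refresh everything in one go, the five stages can be run in the order listed above. This is a sketch that assumes each script runs without arguments from the repository root and that NetworkX is installed for the graph build; `PIPELINE`, `pipeline_commands`, and `run_pipeline` are illustrative names, not part of the repo.

```python
import subprocess
import sys

# Stage order copied from the pipeline listing above.
PIPELINE = [
    "scripts/normalize_crls.py",
    "scripts/ingest_enforcement.py",
    "scripts/build_graph.py",
    "scripts/analytics.py",
    "scripts/build_portal.py",
]


def pipeline_commands(python=sys.executable):
    """Return one command per stage, in execution order."""
    return [[python, script] for script in PIPELINE]


def run_pipeline():
    """Run every stage in order, stopping at the first failure."""
    for command in pipeline_commands():
        print("running", " ".join(command))
        subprocess.run(command, check=True)


# Usage, from the repository root:
#   run_pipeline()
```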

No graph database required — the graph is plain NetworkX → JSON → Cytoscape.js. No vendor lock-in, no subscription, no Neo4j.
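
To illustrate the JSON step, here is a standard-library sketch that turns typed nodes and edges into the `{"nodes": [...], "edges": [...]}` elements layout Cytoscape.js accepts. The node ids, the `type`/`label`/`relation` attribute names, and the `to_cytoscape` helper are hypothetical stand-ins; the repo builds its graph with NetworkX and its actual schema may differ.

```python
import json

# Hypothetical typed nodes and edges, standing in for the NetworkX graph.
nodes = [
    {"id": "company:sponsor-a", "type": "company", "label": "Sponsor A"},
    {"id": "crl:0001", "type": "crl", "label": "CRL 0001"},
    {"id": "deficiency:cmc", "type": "deficiency", "label": "CMC"},
]
edges = [
    ("company:sponsor-a", "crl:0001", "received"),
    ("crl:0001", "deficiency:cmc", "cites"),
]


def to_cytoscape(nodes, edges):
    """Wrap each node and edge in the {"data": {...}} envelope Cytoscape.js expects."""
    return {
        "nodes": [{"data": dict(node)} for node in nodes],
        "edges": [
            {"data": {"id": f"{src}->{dst}", "source": src, "target": dst, "relation": rel}}
            for src, dst, rel in edges
        ],
    }


elements = to_cytoscape(nodes, edges)
payload = json.dumps(elements, indent=2)
print(payload)
```

Because the output is plain JSON, it can be served as a static file and loaded by the explorer page without a database in between.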

Honest limitations

Read docs/KNOWN_LIMITATIONS.md before you trust a result. The short version: matching is company-level (CRLs redact the facility); recalls are recorded under the US distributor, not the foreign manufacturer (an FEI-based join is the v2 fix); the exposure index is descriptive, not predictive; and fuzzy matches are confidence-scored, never definitive. The tool is built to miss a match rather than assert a wrong one.
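
The "miss rather than assert" stance can be pictured as a fuzzy matcher with a deliberately high threshold. The sketch below uses `difflib.SequenceMatcher` from the standard library as a stand-in; the repo's actual matching algorithm, suffix list, and threshold are not documented here, so every name and number in this block is an assumption.

```python
from difflib import SequenceMatcher

# Hypothetical list of corporate suffixes to ignore when comparing names.
SUFFIXES = {"inc", "ltd", "llc", "corp", "co", "limited", "pvt", "gmbh"}


def normalize(name):
    """Lowercase, strip punctuation, and drop common corporate suffixes."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in name)
    return " ".join(tok for tok in cleaned.split() if tok not in SUFFIXES)


def screen_name(vendor, watch_list, threshold=0.9):
    """Return (entry, confidence) pairs at or above a deliberately high threshold."""
    target = normalize(vendor)
    hits = []
    for entry in watch_list:
        score = SequenceMatcher(None, target, normalize(entry)).ratio()
        if score >= threshold:
            hits.append((entry, round(score, 2)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical watch-list entries, not real FDA records.
watch_list = ["Example Pharma Ltd.", "Sample Biologics Inc"]

print(screen_name("Example Pharma Limited", watch_list))
print(screen_name("Unrelated Chemicals", watch_list))
```

A high threshold trades recall for precision: near-identical names are reported with a score, while loosely similar names are dropped, so a returned hit is a lead to verify and an empty result is not a clearance.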

Fork it and prove it in an afternoon

Clone, run python3 portal/server.py 8791, open the explorer.
Paste a few of your real CDMOs into the screener — see what's already public.
Re-run scripts/ against the latest FDA pull to confirm the numbers.
Plug your own list (CSV today, Veeva VQL with the documented adapter).

If it's useful, bring it in-house and point it at your data. That's the whole idea.

License

MIT — see LICENSE. Public FDA data. Educational, not regulatory advice.
