This is the shortest path from clone to credible proof.
If you want evidence before UI:
make validate-strict
make research-intel-refresh
make benchmark-demo
What you get:
- strict validation across API, web, and demo evaluation
- seeded research-intel artifacts in artifacts/research-intel/
- a local benchmark snapshot in artifacts/benchmarks/demo-benchmark.json
- a readable benchmark summary in artifacts/benchmarks/demo-benchmark.md
- a casebook-style proof artifact with dataset coverage, top-k queue previews, and reviewer cues for each benchmark case in the current 10-report demo corpus
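To confirm that the three make targets actually produced the artifacts listed above, a small check like the following can help. It is only a sketch: the paths are copied from the list above, while the helper name `missing_artifacts` and the choice to run it from the repository root are assumptions, not part of the repo.

```python
from pathlib import Path

# Paths taken from the "What you get" list, relative to the repository root.
EXPECTED_ARTIFACTS = [
    "artifacts/research-intel",
    "artifacts/benchmarks/demo-benchmark.json",
    "artifacts/benchmarks/demo-benchmark.md",
]


def missing_artifacts(repo_root: str = ".") -> list[str]:
    """Return the expected artifact paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in EXPECTED_ARTIFACTS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_artifacts()
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all expected artifacts are present")
```

The check only tests for existence; it does not inspect the contents of the benchmark snapshot.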
If you want the reviewer workflow:
docker compose up --build
Then open:
- web: http://localhost:3000
- research intelligence: http://localhost:3000/research-intel
- worklist: http://localhost:3000/cases
- benchmark proof: http://localhost:3000/proof for the checked-in demo comparison plus the multi-cohort retrospective-style sample
- imports: http://localhost:3000/imports
- API docs: http://localhost:8000/docs
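Once the containers are up, it can be handy to probe the URLs above before opening a browser. The sketch below uses only the Python standard library; the URLs come from the list above, and the function names `http_status` and `probe` are hypothetical helpers rather than anything shipped with the repo.

```python
import urllib.error
import urllib.request

# URLs copied from the list above.
LOCAL_URLS = [
    "http://localhost:3000",
    "http://localhost:3000/research-intel",
    "http://localhost:3000/cases",
    "http://localhost:3000/proof",
    "http://localhost:3000/imports",
    "http://localhost:8000/docs",
]


def http_status(url: str, timeout: float = 5.0):
    """Return the HTTP status code for url, or None if no response arrives."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # nothing listening, name resolution failure, or timeout


def probe(urls, fetch=http_status):
    """Map each URL to its status code (or None) using the given fetch function."""
    return {url: fetch(url) for url in urls}


if __name__ == "__main__":
    for url, status in probe(LOCAL_URLS).items():
        label = status if status is not None else "unreachable"
        print(f"{label}\t{url}")
```

The `fetch` parameter exists so the probe can be exercised without a running stack, for example by passing a dictionary's `get` method.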
To import the sample dataset:
curl -F "file=@data/examples/reports.jsonl" http://localhost:8000/api/v1/imports/reports
The repo now carries a checked-in benchmark snapshot for outside collaborators:
- JSON snapshot: docs/examples/demo-benchmark-current.json
- Markdown summary: docs/examples/demo-benchmark-current.md
Those published artifacts now include:
- 10 labeled demo reports across 5 benchmark buckets
- dataset coverage by benchmark bucket
- current top-k queue previews for rules and hybrid scoring
- reviewer-facing casebook notes plus expected rationale cues per case
Maintain that published proof with:
make refresh-demo-proof
Use this when the demo benchmark changes and you want the landing page plus docs to reflect the new state.
To seed the local pancreatic oncology watchtower and generate the first cited digest:
make research-intel-refresh
For the reproducible discovery-ingest path:
make research-intel-discovery-fixture
To inspect the watchtower schedule or run only due sources:
make research-intel-schedule
make research-intel-ingest-due
make research-intel-watchtower
Then explore:
- /research-intel for the primary discovery dashboard, ingest health, topic heat, and recent digest activity
- /research-intel/schedule for due-source planning, live-ready coverage, and watchtower cadence
- /research-intel/documents for the normalized document explorer
- /research-intel/graph for the active pancreatic oncology knowledge graph
- /research-intel/digests for the cited council summaries
- /research-intel/opportunities for benchmark, rule, trial-catalog, and tooling proposals
Current note:
- The current implementation is still intentionally seeded and manual-triggered. It establishes the data model, audit trail, case-brief linkage, and contributor workflow before live discovery coverage is expanded further.
If you need release or pilot evidence rather than just a local walkthrough:
- release operator path: RELEASE_RUNBOOK.md
- release checklist: RELEASE_READINESS.md
Hosted pilot evidence currently flows through the Pilot Smoke workflow with manual smoke_scope=fhir-success-only and smoke_scope=hl7-success-only dispatches. The current recorded baseline is hosted FHIR run #23563902873 plus hosted HL7 trial #23564057337.
- comparable external bundle:
make benchmark-external \
LABELS=docs/examples/benchmark-label-template.jsonl \
PREDICTIONS=docs/examples/benchmark-prediction-template.jsonl \
MANIFEST=docs/examples/benchmark-manifest-template.json
That external helper now emits a casebook-shaped JSON and Markdown bundle with dataset coverage, top-k queue previews, and reviewer-facing notes drawn from the optional label fields.
If you want the checked-in external proof packs rather than the tiny template pack, run:
make benchmark-external-sample
make refresh-external-sample-proof
make benchmark-external-wording-sample
make refresh-external-wording-sample-proof
Those commands write and refresh:
- artifacts/benchmarks/retrospective-benchmark-sample.json
- artifacts/benchmarks/retrospective-benchmark-sample.md
- artifacts/benchmarks/wording-variance-benchmark-sample.json
- artifacts/benchmarks/wording-variance-benchmark-sample.md
- docs/examples/published-external-benchmarks.json
- docs/examples/retrospective-benchmark-sample-current.json
- docs/examples/retrospective-benchmark-sample-current.md
- docs/examples/wording-variance-benchmark-sample-current.json
- docs/examples/wording-variance-benchmark-sample-current.md
The /proof page now reads the published demo proof plus every external pack listed in docs/examples/published-external-benchmarks.json, compares those packs side by side, and still links into each full casebook section. Registry entries should keep unique id values, point at checked-in relative JSON snapshot paths, and stay green under make validate-strict.
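The registry rules above (unique id values, relative JSON snapshot paths) can be pre-checked before running the full validation. The following is a minimal sketch under stated assumptions: the registry schema is not described here, so the code assumes a list of objects with an `id` field and a snapshot path stored under a placeholder key named `snapshot`. It is not a substitute for make validate-strict.

```python
from pathlib import PurePosixPath


def registry_problems(entries, path_key="snapshot"):
    """Return human-readable problems found in a list of registry entries.

    Assumption: each entry is a dict with an "id" and a snapshot path stored
    under path_key. The real field name is not documented here; "snapshot"
    is a placeholder.
    """
    problems = []
    seen_ids = set()
    for index, entry in enumerate(entries):
        entry_id = entry.get("id")
        if not entry_id:
            problems.append(f"entry {index}: missing id")
        elif entry_id in seen_ids:
            problems.append(f"entry {index}: duplicate id {entry_id!r}")
        else:
            seen_ids.add(entry_id)

        raw_path = entry.get(path_key)
        if not raw_path:
            problems.append(f"entry {index}: missing {path_key}")
            continue
        path = PurePosixPath(raw_path)
        # Reject absolute paths and paths that climb out of the repository.
        if path.is_absolute() or ".." in path.parts:
            problems.append(f"entry {index}: path must be repo-relative: {raw_path}")
        if path.suffix != ".json":
            problems.append(f"entry {index}: path must point at a .json file: {raw_path}")
    return problems
```

An empty return value means no problems were found under these assumptions. The sketch does not verify that the snapshot files are actually checked in.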
- product framing: README.md
- benchmark philosophy: EVALUATION.md
- deployment and smoke matrix: DEPLOYMENT.md
- open-source posture: OPEN_SOURCE_STRATEGY.md
- research-use workflow software only
- human review stays in the loop
- explainability is a feature, not an afterthought
- benchmark claims should stay reproducible and honest