Updated: 2026-04-04
This repository is no longer in early MVP scaffolding. The core research prototype is implemented and validated, Phase 6 hosted smoke evidence capture is now complete, and the codebase now also carries the research-intel branch-wrap work that adds watchtower scheduling, audited watchtower automation, multi-run council comparison, longer-horizon council calibration, contributor packets plus collaborator bundles, richer stress-test experiments, and explicit governance for paid-source or donation-funded expansion. The best next work should now move from the expanded proof set and automated research-intel watchtower toward broader de-identified or externally supplied benchmark inputs and deeper live corpus quality built on top of the existing pilot packaging, proof surfaces, public benchmark pack, external evaluation bundle writer, and cited pancreatic oncology watchtower.
- Deterministic pancreatic triage is implemented with section-aware sentence evidence, rationale codes, persistence, exports, and evaluation.
- Reviewer workflow is implemented end to end with worklist filters, case detail, review actions, feedback capture, hybrid prioritization, and trial matching.
- A second primary pillar, Research Intelligence, is now implemented with a sibling `/research-intel` workspace, a `/api/v1/research-intel/*` namespace, seeded pancreatic oncology source and topic catalogs, a lightweight domain graph, persisted run audits, normalized research documents, cited evidence spans, topic watchlists, council digests, opportunity proposals, and case-level research briefs.
- Landing, about, quickstart, README, and handoff surfaces now frame Research Intelligence as the primary entry point, with triage, imports, benchmark proof, and case briefs positioned as explainable downstream applications of the discovery engine.
- The discovery layer now also includes watchtower scheduling and automation: broader curated literature and trial watches, due-only ingest support, schedule snapshots, consecutive-failure tracking, a `make research-intel-ingest-due` path, a `make research-intel-watchtower` path, an audited `/api/v1/research-intel/runs/watchtower` trigger, and a dedicated `/research-intel/schedule` workspace.
- Research-intel digests now compare themselves against recent runs, preserving confidence trend, disagreement deltas, recurring open questions, recurring disagreement points, resolved items, and a longer-horizon calibration snapshot instead of treating each digest as a sealed one-off summary.
- Research-intel opportunities now carry contributor packets for issue, benchmark, dataset, rule, trial, case-brief, and tooling follow-through, with packet artifacts written under `artifacts/research-intel/packets/` and collaborator bundle artifacts under `artifacts/research-intel/collaborator-bundles/`.
- Safe experiment support is stronger too: benchmark and rule opportunities now expose both readiness and stress-test modes with dimension-level scoring, while still staying proposal-only and human-gated.
- Governance for paid-source access and donation-funded operations is now documented in `docs/RESEARCH_INTELLIGENCE_GOVERNANCE.md`, and autonomous payment or procurement remains explicitly out of scope.
- The research-intel watchtower is still fixture-first and operator-triggered for reproducibility, but it now also carries selectively live-ready curated connectors with pancreas-aware filtering and source-health tracking instead of remaining purely seeded.
- The watchtower can now run as one audited automation tick: schedule state is captured before and after each tick, digest generation is policy-gated to avoid synthetic churn, and a cache-backed GitHub Actions workflow under `.github/workflows/research-watchtower.yml` can preserve state across hosted runs.
- Phase 1 of the scientific-discovery pivot is now implemented: discovery ingest supports fixture-backed connector runs, selectively live-ready Europe PMC, ClinicalTrials.gov, NCI, FDA, and GitHub-backed open-source discovery connectors, source-health tracking, and per-document provenance plus novelty metadata.
- Phase 2 is now underway as well: the pancreatic oncology graph is richer, typed, family-aware, and exposed through graph-backed entity resolution plus a dedicated `/research-intel/graph` surface.
- Phase 3 is now underway as well: the research council persists independent opinions, peer critique, explicit confidence, open questions, evidence gaps, and next experiments.
- Phase 4 is now underway too: opportunities are now typed discovery-to-action specs with evidence bundles, measurable outcomes, downstream artifact hints, and artifact-backed promotion flows.
- Phase 5 is now underway too: benchmark and rule opportunities support safe experiment runs with keep-or-discard ratchet outcomes, experiment artifacts, and persisted last-result summaries on each opportunity.
- Phase 6 pilot work is materially in place: observability, readiness probes, trusted-proxy auth, site scoping, pilot Docker overlays, checked-in env bundles, de-identified research views, FHIR ingestion, HL7 ORU ingestion, structured import metadata persistence, persisted import-run audit records, env-driven field-preference overrides for upstream variability, a dedicated web import workspace with recent-run audit visibility, and live report/FHIR/HL7 success, successful shared-visibility, report-path failed shared-visibility for `validation_error`, structured failed shared-visibility for FHIR `unsupported_payload` plus HL7 `parse_error`, report-path and structured-adapter site-scope rejection, report-path parse/validation, adapter-specific malformed-import, and generic plus structured cross-actor audit denial smoke coverage in both the header-auth and trusted-proxy pilot paths.
- The web app now has clearer outside-collaborator surfaces: the home and about pages frame the product as an explainable, benchmarkable workflow stack, and `/proof` now publishes the checked-in demo benchmark snapshot plus a registry of published external benchmark packs as a public proof surface with side-by-side cross-pack comparison, anchored navigation, dataset coverage, queue previews, and reviewer-facing casebooks.
- The repo now includes an adoption-facing quickstart, checked-in benchmark snapshots in `docs/examples/demo-benchmark-current.json` and `docs/examples/demo-benchmark-current.md`, and a refresh path through `make benchmark-demo` plus `make refresh-demo-proof`; that published proof now carries benchmark buckets, reviewer cues, expected rationale codes, per-case rules-versus-hybrid outcomes, and an expanded 10-report casebook spanning 5 recurring benchmark buckets.
- The repo also now includes a public benchmark pack: `docs/LABELING_GUIDE.md`, `docs/BENCHMARK_SUBMISSIONS.md`, checked-in label and submission templates, a Pydantic benchmark submission schema, and a validator entrypoint through `make validate-benchmark-submission SUBMISSION=...`.
- The repo now also includes a comparable external evaluation bundle writer through `scripts/run_external_eval.py` plus `make benchmark-external`, producing JSON, Markdown, and validator-compatible submission-draft artifacts from label and prediction JSONL inputs; that external path now also preserves cohort labels, benchmark buckets, reviewer focus, expected rationale cues, optional deidentified report excerpts, dataset coverage, queue previews, and a reviewer-facing casebook in its generated outputs.
- The external benchmark helper now also accepts an optional manifest JSON so collaborator-supplied datasets can carry dataset framing, labeling-policy notes, cohort descriptions, benchmark-bucket descriptions, and default operating-point metadata into the generated JSON, Markdown, submission draft, and checked-in proof sample without ad hoc hand editing.
- Published external proof packs are now registry-driven through `docs/examples/published-external-benchmarks.json`, and `make validate-strict` now verifies that registry entries keep valid checked-in JSON artifact paths.
- The repo now also carries two checked-in deidentified external proof packs: the retrospective-style multi-cohort sample under `docs/examples/retrospective-benchmark-sample-*` and a wording-variance challenge pack under `docs/examples/wording-variance-benchmark-sample-*`, each with reproducible JSON, Markdown, and submission outputs refreshed by dedicated Make targets.
- The release-facing docs now also include a concrete operator runbook in `docs/RELEASE_RUNBOOK.md` plus a reusable `docs/RELEASE_NOTES_TEMPLATE.md` so the remaining hosted smoke evidence path is explicit rather than tribal.
- FHIR `DiagnosticReport` imports now preserve patient, encounter, and accession metadata from inline `Reference.identifier` values when upstream payloads omit fully resolved `Patient`, `Encounter`, or `ServiceRequest` resources.
- FHIR `DiagnosticReport` imports now also decode supported text-like `presentedForm` attachments, including base64 XHTML narratives with explicit charsets, bundled or contained `Binary`-backed attachment URLs, merge multiple supported attachments in order, suppress shorter overlapping fragments when a richer attachment already contains them, expand referenced `Observation.component` plus grouped `Observation.hasMember` findings into reviewer-visible report text, preserve `interpretation` plus `referenceRange` measurement context, fall back to `conclusionCode` text when free-text `conclusion` is absent, avoid duplicate or cyclic grouped-member expansion, and keep unsectioned attachment findings merged with `conclusion` text when that preserves a more reviewer-usable report shape.
- HL7 ORU imports now decode base64 `ED` report text, normalize repeated `OBX-5` values, respect custom `MSH-2` component plus repetition separators, normalize common HL7 escape sequences, and clean composite metadata fields with subcomponent-aware extraction before the existing report-text assembly flows into triage and audit persistence.
- Top-level repo docs now reflect the implemented platform instead of the earlier scaffold framing, and the repository includes checked-in contributor, security, and code-of-conduct docs appropriate for a near-1.0 open-source handoff.
- The repo still includes checked-in GitHub Actions workflows for strict validation plus hosted base, attachment-backed FHIR success-path, report-path site-rejection, and structured adapter site-rejection pilot smoke coverage, and the hosted pilot workflow now supports narrower manual FHIR-only plus HL7-only dispatches with per-job raw-log and structured-summary artifacts. Those hosted slices are now verified with green March 25, 2026 runs for FHIR and HL7, while HL7 remains manual-only in the default weekly matrix by explicit decision.
- The highest-value remaining work is no longer bootstrapping or late-Phase-6 smoke capture. The repo-side Phase 6D release polish is already in place, the research-first repositioning is visible across product and docs surfaces, the watchtower and council history loop are in place, the external starter pack matches the reviewer-facing proof shape, `/proof` compares the checked-in packs directly, recurring watchtower automation is in place, and digests now carry longer-horizon calibration plus collaborator bundle output, so the next work should move toward broader collaborator-supplied or deidentified benchmark inputs and deeper live discovery quality instead of reopening parser or smoke-baseline work.
- The next benchmark-proof slice should focus on either a larger collaborator-supplied benchmark drop through the registry flow or deeper cross-pack analysis features such as bucket-level comparisons or reviewer-focused filtering, rather than just adding another static pack section.
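
The HL7 ORU handling listed above (custom `MSH-2` separators, repeated `OBX-5` values, and escape-sequence normalization) can be illustrated with a minimal sketch. This is not the repository's parser: the names `Delimiters`, `parse_delimiters`, `unescape`, and `split_repetitions` are hypothetical, and only the standard HL7 v2 escapes `\F\`, `\S\`, `\T\`, `\R\`, `\E\`, and `\.br\` are assumed to be handled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Delimiters:
    """HL7 v2 delimiters; defaults are the conventional |^~\\& set."""
    field: str = "|"
    component: str = "^"
    repetition: str = "~"
    escape: str = "\\"
    subcomponent: str = "&"


def parse_delimiters(msh_segment: str) -> Delimiters:
    """Read MSH-1 (field separator) and MSH-2 (encoding characters)."""
    if not msh_segment.startswith("MSH") or len(msh_segment) < 8:
        raise ValueError("expected an MSH segment with encoding characters")
    field = msh_segment[3]
    component, repetition, escape, subcomponent = msh_segment[4:8]
    return Delimiters(field, component, repetition, escape, subcomponent)


def unescape(value: str, d: Delimiters) -> str:
    """Replace common HL7 escape sequences with the characters they stand for."""
    mapping = {
        "F": d.field,
        "S": d.component,
        "T": d.subcomponent,
        "R": d.repetition,
        "E": d.escape,
        ".br": "\n",
    }
    out: List[str] = []
    i = 0
    while i < len(value):
        if value[i] == d.escape:
            end = value.find(d.escape, i + 1)
            code = value[i + 1:end] if end != -1 else None
            if code in mapping:
                out.append(mapping[code])
                i = end + 1
                continue
        # Unknown or unterminated sequences are passed through unchanged.
        out.append(value[i])
        i += 1
    return "".join(out)


def split_repetitions(value: str, d: Delimiters) -> List[str]:
    """Split a repeated field (for example OBX-5) first, then unescape each part."""
    return [unescape(part, d) for part in value.split(d.repetition)]
```

Splitting on the repetition separator before unescaping matters: an escaped `\R\` inside a value should become a literal separator character in the text rather than a split point.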
Confirmed on 2026-04-04:
- `make validate-strict` passes
- Summary: `12 pass, 0 warn, 0 fail`
- Research-intel validation now covers seeded catalogs plus a temp-database ingest and digest run, reporting `9` sources, `7` topics, `25` graph nodes, `7` seeded documents, and a passing artifact-backed digest pipeline
- Discovery ingest now also validates `9` discovery fixtures and `8` live-ready sources through the same catalog check, runs the watchtower pipeline in fixture mode for reproducible validation, and verifies both pre-run and post-run schedule state
- Opportunity generation now also emits structured JSON and Markdown action specs under `artifacts/research-intel/opportunities/`, plus contributor packet artifacts under `artifacts/research-intel/packets/`
- Research-intel validation now also runs one safe experiment and records a ratchet outcome in the strict pipeline summary
- `pytest` now passes with `155 passed`
- `cd apps/api && .venv/bin/python -m pytest tests/test_research_intel.py tests/test_research_intel_connectors.py -q` passed with `11 passed`
- `npm run lint` and `npm run build` in `apps/web` both pass with the new `/research-intel` routes included in the build
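
The keep-or-discard ratchet outcome recorded by the safe experiment run can be pictured with a small sketch. Everything here is an assumption made for illustration: the `RatchetResult` type, the `ratchet` function, the unweighted mean over dimension scores, and the `min_gain` threshold are hypothetical and are not taken from the repository.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RatchetResult:
    outcome: str      # "keep" or "discard"
    baseline: float   # aggregate baseline score
    candidate: float  # aggregate candidate score


def mean_score(dimensions: Dict[str, float]) -> float:
    """Aggregate dimension-level scores with an unweighted mean (an assumption)."""
    if not dimensions:
        raise ValueError("at least one scored dimension is required")
    return sum(dimensions.values()) / len(dimensions)


def ratchet(
    baseline_dims: Dict[str, float],
    candidate_dims: Dict[str, float],
    min_gain: float = 0.0,
) -> RatchetResult:
    """Keep the candidate only if it beats the baseline by more than min_gain."""
    baseline = mean_score(baseline_dims)
    candidate = mean_score(candidate_dims)
    outcome = "keep" if candidate > baseline + min_gain else "discard"
    return RatchetResult(outcome, baseline, candidate)
```

A tie is discarded in this sketch, so the kept baseline only ever moves forward; whether the real experiments treat ties that way is not stated here.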
Previously confirmed during the broader post-Phase-6 slice:
- `make validate-strict` passes
- Summary: `10 pass, 0 warn, 0 fail`
- API tests: `144 passed`
- `apps/api/.venv/bin/python -m pytest apps/api/tests/test_imports.py -q` passed with `34 passed`, including split, overlapping, bundled or contained `Binary`-backed FHIR `presentedForm`, grouped and cycle-safe `Observation.hasMember`, `conclusionCode`, measurement `interpretation` plus `referenceRange`, and `Observation.component` coverage
- `apps/api/.venv/bin/python -m pytest apps/api/tests/test_summarize_pilot_smoke.py apps/api/tests/test_smoke_proxy_auth.py -q` passed with `23 passed`, covering the hosted smoke summary parser plus the attachment-backed FHIR smoke fixture and its expected `presentedForm` payload shape
- `apps/api/.venv/bin/python -m pytest apps/api/tests/test_smoke_proxy_auth.py -q` passed with `21 passed`, covering the attachment-backed FHIR smoke fixture and its expected `presentedForm` payload shape
- Unsandboxed full local reruns of `make pilot-proxy-demo-fhir-smoke` and `make pilot-header-demo-fhir-smoke` both passed on 2026-03-25 after the compose fix, recording persisted import runs `50` and `51` plus web, `/imports`, visible-case, and reviewer round-trip verification
- Web checks: `npm run lint` and `npm run build` now both pass through `make validate-strict`
- Demo evaluation compare and sweep both run through the validation script
- `make refresh-demo-proof` succeeded and refreshed the checked-in benchmark snapshot that powers `/proof`
- `apps/api/.venv/bin/python -m pytest apps/api/tests/test_demo_benchmark_snapshot.py apps/api/tests/test_exports_metrics.py -q` passed with `11 passed`, covering the casebook-backed proof snapshot and richer evaluation-case payload fields
- `apps/api/.venv/bin/python -m pytest apps/api/tests/test_validate_repo.py apps/api/tests/test_external_evaluation.py -q` passed with `10 passed`, covering the new published-registry validator plus the retrospective and wording-variance external sample packs
- `make validate-benchmark-submission SUBMISSION=docs/examples/benchmark-submission-template.json` passed
- `cd apps/api && .venv/bin/python -m pytest tests/test_benchmark_submission.py -q` passed
- `make benchmark-external LABELS=docs/examples/benchmark-label-template.jsonl PREDICTIONS=docs/examples/benchmark-prediction-template.jsonl OUT_DIR=/tmp/pancreatic-signal-external-eval BASENAME=template-external TOP_K=2` passed and wrote JSON, Markdown, and submission-draft artifacts
- `apps/api/.venv/bin/python -m pytest apps/api/tests/test_external_evaluation.py -q` passed with `6 passed`, covering manifest-driven dataset framing plus CLI override precedence for collaborator benchmark bundles
- `make benchmark-external-sample` passed and wrote a retrospective-style external bundle from the checked-in sample pack
- `make refresh-external-sample-proof` passed and refreshed the checked-in retrospective-style external benchmark proof under `docs/examples/`
- `make benchmark-external-wording-sample` passed and wrote the wording-variance challenge bundle under `artifacts/benchmarks/`
- `make refresh-external-wording-sample-proof` passed and refreshed the checked-in wording-variance benchmark proof under `docs/examples/`
- `make validate-benchmark-submission SUBMISSION=docs/examples/wording-variance-benchmark-sample-current-submission.json` passed
- `make validate-benchmark-submission SUBMISSION=/tmp/pancreatic-signal-external-eval/template-external-submission.json` passed against the generated draft
- `apps/api/.venv/bin/python -m pytest apps/api/tests/test_imports.py apps/api/tests/test_import_runs.py -q` passed with `46 passed`, including inline `Reference.identifier` extraction coverage, attachment-backed `presentedForm` decoding and audit coverage, plus custom `MSH-2` delimiter, escape-sequence, subcomponent, and audit-visibility coverage
- Targeted import / de-identification / export coverage also passes for the new import metadata surface
- Import-run audit coverage now passes for success, update counts, validation failures, unsupported payloads, site-scope rejection, and audit-route access control
- Config-override coverage now passes for one FHIR field-preference override and one HL7 field-preference override while preserving defaults
- The new `/imports` workspace is included in the validated web build and uses persisted run IDs to deep-link failed submissions into their audit detail
- The published demo proof currently includes a checked-in JSON plus Markdown benchmark summary for outside collaborators, and the web app reads that snapshot directly into dataset coverage cards, queue previews, and reviewer-facing casebook entries; the current checked-in corpus is 10 labeled reports across 5 recurring benchmark buckets, with hybrid-only recovery of the follow-up-only cases `C-005` and `C-008`
- The public external benchmark starter pack now uses the same reviewer-facing fields and publishes a casebook-shaped JSON plus Markdown bundle, so collaborators can benchmark less-synthetic datasets without losing the coverage, queue, and reviewer-cue structure established by the demo proof
- The repo now also includes a 12-case deidentified retrospective-style multi-cohort sample pack plus a 9-case wording-variance challenge pack, both with report excerpts, reviewer cues, cohort coverage, and visible misses or confounders that demonstrate the same casebook structure on less-synthetic external datasets; `/proof` now renders both packs alongside the demo comparison and compares them directly instead of leaving them docs-only
- Both pilot demo overlays resolve successfully through `docker compose config`
- The smoke helper now supports both trusted-identity proxy mode and field-level header-auth mode, plus live verification against `/api/v1/imports/reports`, `/api/v1/imports/fhir/diagnostic-reports`, and `/api/v1/imports/hl7/oru`
- The smoke helper now also supports live `site_scope_rejection` verification through `/api/v1/imports/reports`, including persisted failed run IDs and zero run-specific visible-case assertions
- The smoke helper now also supports live structured adapter `site_scope_rejection` verification through `/api/v1/imports/fhir/diagnostic-reports` and `/api/v1/imports/hl7/oru`, including persisted failed run IDs and zero run-specific visible-case assertions
- The smoke helper now also supports live structured adapter audit-visibility verification by reusing persisted FHIR and HL7 `site_scope_rejection` runs, proving the owner can inspect both while a second scoped actor receives `404` on both details and does not see either in the recent-run list
- The smoke helper now also supports live `validation_error` and `parse_error` verification through `/api/v1/imports/reports`, including persisted failed run IDs and zero run-specific visible-case assertions
- The smoke helper now also supports live structured adapter failure verification through `/api/v1/imports/fhir/diagnostic-reports` and `/api/v1/imports/hl7/oru`, including persisted `unsupported_payload` and `parse_error` run IDs plus zero run-specific visible-case assertions
- The smoke helper now also supports live audit-visibility verification by reusing a persisted `site_scope_rejection` run, proving the owner can inspect it while a second scoped actor receives `404` on detail and does not see it in the recent-run list
- The smoke helper now also supports live shared-visibility verification by reusing a successful `/api/v1/imports/reports` run, proving a second scoped actor can inspect the same completed run detail and recent-run list entry
- The smoke helper now also supports live structured shared-visibility verification by reusing successful FHIR and HL7 imports, proving a second scoped actor can inspect both completed structured runs and their recent-run list entries
- The smoke helper now also supports live failed-run shared-visibility verification by reusing a persisted `/api/v1/imports/reports` `validation_error` run, proving a second same-site actor can inspect the failed detail and recent-run entry even when `imported_sites` is empty
- The smoke helper now also supports live structured failed-run shared-visibility verification by reusing persisted FHIR `unsupported_payload` and HL7 `parse_error` runs, proving a second same-site actor can inspect both failed details and recent-run entries even when `imported_sites` is empty
- The proxy demo overlay now defaults to an import-capable navigator identity so the built `/imports` workspace and the live proxy smoke path exercise the same capability class
- The latest strict validation pass was rerun after widening the hosted pilot smoke workflow to include the attachment-backed FHIR success path and remains green
- GitHub Actions now runs `make validate-strict` on pull requests, on `main`, and through manual workflow dispatch using a checked-in workflow under `.github/workflows/validate.yml`
- The smoke helper's FHIR demo path now uses an attachment-backed `DiagnosticReport.presentedForm` XHTML narrative encoded as base64 UTF-16, so the existing `pilot-*-fhir-smoke` targets exercise the same supported text-like attachment decode path covered by API tests
- A checked-in workflow under `.github/workflows/pilot-smoke.yml` now reuses `make pilot-proxy-demo-smoke`, `make pilot-proxy-demo-fhir-smoke`, `make pilot-proxy-demo-site-rejection-smoke`, `make pilot-proxy-demo-adapter-site-rejection-smoke`, `make pilot-header-demo-smoke`, `make pilot-header-demo-fhir-smoke`, `make pilot-header-demo-site-rejection-smoke`, and `make pilot-header-demo-adapter-site-rejection-smoke` on manual dispatch plus a weekly Monday schedule; it remains intentionally narrower than the full manual visibility and failure-path overlay matrix
- That hosted pilot workflow now also supports `workflow_dispatch` with `smoke_scope=fhir-success-only` plus a manual-dispatch `smoke_scope=hl7-success-only`, and every hosted matrix job uploads both its `pilot-smoke.log` output and generated `pilot-smoke-summary.json` plus `pilot-smoke-summary.md` artifacts, including duration, absolute smoke timestamps, and exit code, to preserve exact run context once hosted execution starts
- A bundled evidence builder now exists as `make pilot-smoke-evidence SUMMARY_DIR=/path/to/downloaded/pilot-smoke-artifacts HL7_DECISION=pending|keep-manual|promote-default`, producing `pilot-smoke-evidence.json` plus `pilot-smoke-evidence.md` from downloaded hosted summary artifacts so release notes and handoff updates do not depend on manual copy-paste
- GitHub-hosted `Pilot Smoke (fhir-success-only)` run `#23563902873` passed on 2026-03-25 across proxy and header auth, with summary artifacts `pilot-smoke-summary-proxy-demo-fhir-smoke` and `pilot-smoke-summary-header-demo-fhir-smoke`
- GitHub-hosted `Pilot Smoke (hl7-success-only)` run `#23564057337` passed on 2026-03-25 across proxy and header auth, with summary artifacts `pilot-smoke-summary-proxy-demo-hl7-smoke` and `pilot-smoke-summary-header-demo-hl7-smoke`
- Bundled hosted evidence now records `phase_complete: true` plus `hl7_hosting_mode: manual_only`, with the explicit rationale that HL7 proved operable in GitHub Actions but still stays out of the default weekly matrix to limit recurring runtime and maintenance cost
- Release-facing documentation now includes `CHANGELOG.md` and `docs/RELEASE_READINESS.md`
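
The attachment-backed `presentedForm` decode path exercised by the FHIR smoke fixture (a base64 XHTML narrative with an explicit UTF-16 charset) can be sketched with the Python standard library alone. This is an illustrative sketch, not the importer itself: `decode_presented_form` and `_TextExtractor` are hypothetical names, the UTF-8 fallback is an assumption, and `Binary`-backed URLs, attachment merging, and overlap suppression are deliberately left out.

```python
import base64
from email.message import Message
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collect non-empty text nodes from an (X)HTML narrative."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        text = data.strip()
        if text:
            self.parts.append(text)


def decode_presented_form(attachment: dict) -> str:
    """Decode one FHIR Attachment dict with `contentType` and base64 `data`."""
    # Reuse the stdlib header parser to split the media type from its charset.
    header = Message()
    header["Content-Type"] = attachment.get("contentType", "text/plain")
    charset = header.get_content_charset() or "utf-8"  # fallback is an assumption

    raw = base64.b64decode(attachment["data"])
    text = raw.decode(charset)

    if header.get_content_type() in {"text/html", "application/xhtml+xml"}:
        extractor = _TextExtractor()
        extractor.feed(text)
        extractor.close()
        return "\n".join(extractor.parts)
    return text
```

A usage example with a payload shaped like the one the smoke fixture is described as sending:

```python
xhtml = (
    '<div xmlns="http://www.w3.org/1999/xhtml">'
    "<p>Pancreatic duct dilation.</p><p>Recommend MRI.</p></div>"
)
attachment = {
    "contentType": "application/xhtml+xml; charset=utf-16",
    "data": base64.b64encode(xhtml.encode("utf-16")).decode("ascii"),
}
report_text = decode_presented_form(attachment)
```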
Last known good live deployment check:
- The header-demo Docker stack was booted with the pilot overlay
- `make pilot-header-demo-smoke` passed on 2026-03-21
- The smoke path verified API readiness, web readiness, `/imports`, `/api/v1/auth/me`, report import via `/api/v1/imports/reports`, persisted import-run audit detail for run `41`, visible cases, and a persisted reviewer-action round-trip
- `make pilot-header-demo-fhir-smoke` passed on 2026-03-20
- The smoke path verified API readiness, web readiness, `/imports`, `/api/v1/auth/me`, FHIR import via `/api/v1/imports/fhir/diagnostic-reports`, persisted import-run audit detail, visible cases, and a persisted reviewer-action round-trip
- `make pilot-header-demo-hl7-smoke` passed on 2026-03-20
- The smoke path verified API readiness, web readiness, `/imports`, `/api/v1/auth/me`, HL7 import via `/api/v1/imports/hl7/oru`, persisted import-run audit detail, visible cases, and a persisted reviewer-action round-trip
- The proxy-demo Docker stack was booted with the pilot overlay
make pilot-proxy-demo-smokepassed on 2026-03-21- The smoke path verified API readiness, web readiness,
/imports,/api/v1/auth/me, report import via/api/v1/imports/reports, persisted import-run audit detail for run40, visible cases, and a persisted reviewer-action round-trip make pilot-proxy-demo-fhir-smokepassed on 2026-03-20- The smoke path verified API readiness, web readiness,
/imports,/api/v1/auth/me, FHIR import via/api/v1/imports/fhir/diagnostic-reports, persisted import-run audit detail, visible cases, and a persisted reviewer-action round-trip make pilot-proxy-demo-hl7-smokepassed on 2026-03-20- The smoke path verified API readiness, web readiness,
/imports,/api/v1/auth/me, HL7 import via/api/v1/imports/hl7/oru, persisted import-run audit detail, visible cases, and a persisted reviewer-action round-trip make pilot-proxy-demo-site-rejection-smokepassed on 2026-03-22- The smoke path verified API readiness, web readiness,
/imports,/api/v1/auth/me, a persistedsite_scope_rejectionrun via/api/v1/imports/reports, stable audit detail for run42, and zero run-specific visible cases make pilot-proxy-demo-adapter-site-rejection-smokepassed on 2026-03-22 local time- The smoke path verified API readiness, web readiness,
/imports,/api/v1/auth/me, persisted FHIR and HL7site_scope_rejectionruns via the structured import endpoints, stable audit detail for runs44and45, and zero run-specific visible cases make pilot-proxy-demo-adapter-audit-visibility-smokepassed on 2026-03-21- The smoke path verified API readiness, web readiness,
/imports,/api/v1/auth/me, persisted FHIR and HL7site_scope_rejectionruns via the structured import endpoints, owner visibility on both audit endpoints, alternate-actor404detail denial, alternate-actor omission from the recent-run list, and zero run-specific visible cases make pilot-proxy-demo-parse-validation-smokepassed on 2026-03-20- The smoke path verified API readiness, web readiness,
/imports,/api/v1/auth/me, persistedvalidation_errorandparse_errorruns via/api/v1/imports/reports, stable audit detail, and zero run-specific visible cases make pilot-proxy-demo-adapter-failure-smokepassed on 2026-03-20- The smoke path verified API readiness, web readiness,
/imports,/api/v1/auth/me, persisted FHIRunsupported_payloadand HL7parse_errorruns via the structured import endpoints, stable audit detail, and zero run-specific visible cases make pilot-proxy-demo-shared-visibility-smokepassed on 2026-03-21- The smoke path verified API readiness, web readiness,
/imports,/api/v1/auth/me, a successful/api/v1/imports/reportsrun, owner visibility on both audit endpoints, alternate-actor visibility on the same detail and recent-run list entry, visible run-specific cases, and a persisted reviewer-action round-trip make pilot-proxy-demo-adapter-shared-visibility-smokepassed on 2026-03-21- The smoke path verified API readiness, web readiness,
/imports,/api/v1/auth/me, successful FHIR plus HL7 structured imports, owner visibility on both audit endpoints, alternate-actor visibility on both run details and recent-run list entries, visible run-specific cases, and a persisted reviewer-action round-trip make pilot-proxy-demo-audit-visibility-smokepassed on 2026-03-20- The smoke path verified API readiness, web readiness,
/imports,/api/v1/auth/me, a persistedsite_scope_rejectionrun via/api/v1/imports/reports, owner visibility on both audit endpoints, alternate-actor404detail denial, alternate-actor omission from the recent-run list, and zero run-specific visible cases make pilot-proxy-demo-failed-shared-visibility-smokepassed on 2026-03-20- The smoke path verified API readiness, web readiness,
/imports,/api/v1/auth/me, a persistedvalidation_errorrun via/api/v1/imports/reports, owner visibility on both audit endpoints, alternate-actor allow behavior on the same failed detail and recent-run list entry, and zero run-specific visible cases make pilot-proxy-demo-adapter-failed-shared-visibility-smokepassed on 2026-03-20- The smoke path verified API readiness, web readiness,
/imports,/api/v1/auth/me, persisted FHIRunsupported_payloadplus HL7parse_errorruns via the structured import endpoints, owner visibility on both audit endpoints, alternate-actor allow behavior on both failed details and recent-run list entries, and zero run-specific visible cases make pilot-header-demo-site-rejection-smokepassed on 2026-03-22- The smoke path verified API readiness, web readiness,
/imports,/api/v1/auth/me, a persistedsite_scope_rejectionrun via/api/v1/imports/reports, stable audit detail for run43, and zero run-specific visible cases make pilot-header-demo-adapter-site-rejection-smokepassed on 2026-03-22 local time- The smoke path verified API readiness, web readiness,
/imports,/api/v1/auth/me, persisted FHIR and HL7site_scope_rejectionruns via the structured import endpoints, stable audit detail for runs46and47, and zero run-specific visible cases make pilot-header-demo-adapter-audit-visibility-smokepassed on 2026-03-21- The smoke path verified API readiness, web readiness,
/imports,/api/v1/auth/me, persisted FHIR and HL7site_scope_rejectionruns via the structured import endpoints, owner visibility on both audit endpoints, alternate-actor404detail denial, alternate-actor omission from the recent-run list, and zero run-specific visible cases make pilot-header-demo-parse-validation-smokepassed on 2026-03-20- The smoke path verified API readiness, web readiness,
`/imports`, `/api/v1/auth/me`, persisted `validation_error` and `parse_error` runs via `/api/v1/imports/reports`, stable audit detail, and zero run-specific visible cases
- `make pilot-header-demo-adapter-failure-smoke` passed on 2026-03-20
- The smoke path verified API readiness, web readiness, `/imports`, `/api/v1/auth/me`, persisted FHIR `unsupported_payload` and HL7 `parse_error` runs via the structured import endpoints, stable audit detail, and zero run-specific visible cases
- `make pilot-header-demo-shared-visibility-smoke` passed on 2026-03-21
- The smoke path verified API readiness, web readiness, `/imports`, `/api/v1/auth/me`, a successful `/api/v1/imports/reports` run, owner visibility on both audit endpoints, alternate-actor visibility on the same detail and recent-run list entry, visible run-specific cases, and a persisted reviewer-action round-trip
- `make pilot-header-demo-adapter-shared-visibility-smoke` passed on 2026-03-21
- The smoke path verified API readiness, web readiness, `/imports`, `/api/v1/auth/me`, successful FHIR plus HL7 structured imports, owner visibility on both audit endpoints, alternate-actor visibility on both run details and recent-run list entries, visible run-specific cases, and a persisted reviewer-action round-trip
- `make pilot-header-demo-audit-visibility-smoke` passed on 2026-03-20
- The smoke path verified API readiness, web readiness, `/imports`, `/api/v1/auth/me`, a persisted `site_scope_rejection` run via `/api/v1/imports/reports`, owner visibility on both audit endpoints, alternate-actor `404` detail denial, alternate-actor omission from the recent-run list, and zero run-specific visible cases
- `make pilot-header-demo-failed-shared-visibility-smoke` passed on 2026-03-20
- The smoke path verified API readiness, web readiness, `/imports`, `/api/v1/auth/me`, a persisted `validation_error` run via `/api/v1/imports/reports`, owner visibility on both audit endpoints, alternate-actor allow behavior on the same failed detail and recent-run list entry, and zero run-specific visible cases
- `make pilot-header-demo-adapter-failed-shared-visibility-smoke` passed on 2026-03-20
- The smoke path verified API readiness, web readiness, `/imports`, `/api/v1/auth/me`, persisted FHIR `unsupported_payload` plus HL7 `parse_error` runs via the structured import endpoints, owner visibility on both audit endpoints, alternate-actor allow behavior on both failed details and recent-run list entries, and zero run-specific visible cases
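
The visibility outcomes recorded by these smoke runs can be summarized as a small lookup. This is a sketch of the observed expectations only, not the API's actual authorization rule; the names `ALTERNATE_ACTOR_CAN_SEE` and `expected_visibility` are hypothetical, and it assumes the alternate actor is otherwise authenticated in the same demo overlay.

```python
from typing import Optional

# Outcomes recorded by the smoke runs above; None stands for a successful run.
# This summarizes observed behavior and is not the API's real rule.
ALTERNATE_ACTOR_CAN_SEE = {
    None: True,                     # successful report or structured import
    "validation_error": True,
    "parse_error": True,
    "unsupported_payload": True,
    "site_scope_rejection": False,  # 404 on detail, omitted from recent-run list
}


def expected_visibility(failure_bucket: Optional[str], is_owner: bool) -> bool:
    """Return whether the smoke runs expect this actor to see the run."""
    if is_owner:
        # Every smoke mode verified owner visibility on both audit endpoints.
        return True
    return ALTERNATE_ACTOR_CAN_SEE[failure_bucket]
```

For example, `expected_visibility("site_scope_rejection", is_owner=False)` is `False`, while the same bucket with `is_owner=True` is `True`.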
Additional note from this slice:
- `make pilot-header-demo-up` required host-level Docker access and completed successfully
- The sandboxed `make pilot-header-demo-fhir-smoke` and `make pilot-header-demo-hl7-smoke` would have had the same localhost restriction as the existing smoke targets
- Unsandboxed runs of both `make pilot-header-demo-fhir-smoke` and `make pilot-header-demo-hl7-smoke` passed end to end
- `make pilot-header-demo-down` completed successfully after the live verification run
- `make validate-strict` passed after adding the GitHub Actions validation workflow and the web lint check to the strict gate
- The checked-in workflow at `.github/workflows/validate.yml` now runs `make validate-strict` on pull requests, on `main`, and through manual workflow dispatch
- `docker compose -f docker-compose.yml -f docker-compose.pilot.yml -f docker-compose.pilot.proxy-demo.yml config` passed
- `docker compose -f docker-compose.yml -f docker-compose.pilot.yml -f docker-compose.pilot.header-demo.yml config` passed
- `make pilot-proxy-demo-up` required host-level Docker access and completed successfully
- Unsandboxed runs of `make pilot-proxy-demo-smoke`, `make pilot-proxy-demo-fhir-smoke`, and `make pilot-proxy-demo-hl7-smoke` all passed end to end on 2026-03-20
- `make pilot-proxy-demo-down` completed successfully after the live verification run
- A new failure-path smoke mode now verifies persisted `site_scope_rejection` audit detail plus zero run-specific visible cases without requiring a reviewer round-trip
- Unsandboxed runs of `make pilot-proxy-demo-site-rejection-smoke` and `make pilot-header-demo-site-rejection-smoke` both passed end to end on 2026-03-20
- A new malformed-report smoke mode now verifies persisted `validation_error` and `parse_error` audit detail plus zero run-specific visible cases without requiring a reviewer round-trip
- Unsandboxed runs of `make pilot-proxy-demo-parse-validation-smoke` and `make pilot-header-demo-parse-validation-smoke` both passed end to end on 2026-03-20
- A new adapter-failure smoke mode now verifies persisted FHIR `unsupported_payload` and HL7 `parse_error` audit detail plus zero run-specific visible cases without requiring a reviewer round-trip
- Unsandboxed runs of `make pilot-proxy-demo-adapter-failure-smoke` and `make pilot-header-demo-adapter-failure-smoke` both passed end to end on 2026-03-20
- A new audit-visibility smoke mode now verifies owner access plus alternate-actor denial for a persisted `site_scope_rejection` run across both import-run audit endpoints
- Unsandboxed runs of `make pilot-proxy-demo-audit-visibility-smoke` and `make pilot-header-demo-audit-visibility-smoke` both passed end to end on 2026-03-20
- A new shared-visibility smoke mode now verifies owner access plus alternate-actor allow behavior for a successful persisted `/api/v1/imports/reports` run across both import-run audit endpoints
- Unsandboxed runs of `make pilot-proxy-demo-shared-visibility-smoke` and `make pilot-header-demo-shared-visibility-smoke` both passed end to end on 2026-03-21
- A new structured shared-visibility smoke mode now verifies owner access plus alternate-actor allow behavior for successful FHIR and HL7 imports across both import-run audit endpoints
- Unsandboxed runs of `make pilot-proxy-demo-adapter-shared-visibility-smoke` and `make pilot-header-demo-adapter-shared-visibility-smoke` both passed end to end on 2026-03-21
- A new failed-run shared-visibility smoke mode now verifies owner access plus alternate-actor allow behavior for a persisted `/api/v1/imports/reports` `validation_error` run across both import-run audit endpoints
- Unsandboxed runs of `make pilot-proxy-demo-failed-shared-visibility-smoke` and `make pilot-header-demo-failed-shared-visibility-smoke` both passed end to end on 2026-03-20 local time, producing persisted run timestamps `2026-03-21T04:23:26Z` and `2026-03-21T04:24:10Z`
- A new structured failed-run shared-visibility smoke mode now verifies owner access plus alternate-actor allow behavior for persisted FHIR `unsupported_payload` and HL7 `parse_error` runs across both import-run audit endpoints
- Unsandboxed runs of `make pilot-proxy-demo-adapter-failed-shared-visibility-smoke` and `make pilot-header-demo-adapter-failed-shared-visibility-smoke` both passed end to end on 2026-03-20 local time, producing persisted run timestamps `2026-03-21T04:34:36Z`, `2026-03-21T04:34:36Z`, `2026-03-21T04:35:13Z`, and `2026-03-21T04:35:13Z`
- A new structured site-rejection smoke mode now verifies persisted FHIR and HL7 `site_scope_rejection` audit detail plus zero run-specific visible cases without requiring a reviewer round-trip
- Unsandboxed runs of `make pilot-proxy-demo-adapter-site-rejection-smoke` and `make pilot-header-demo-adapter-site-rejection-smoke` both passed end to end on 2026-03-20 local time, producing persisted run IDs `31`, `32`, `33`, and `34` with UTC timestamps `2026-03-21T04:49:33Z`, `2026-03-21T04:49:33Z`, `2026-03-21T04:50:48Z`, and `2026-03-21T04:50:48Z`
- A new structured audit-visibility smoke mode now verifies owner access plus alternate-actor denial for persisted FHIR and HL7 `site_scope_rejection` runs across both import-run audit endpoints
- Unsandboxed runs of `make pilot-proxy-demo-adapter-audit-visibility-smoke` and `make pilot-header-demo-adapter-audit-visibility-smoke` both passed end to end on 2026-03-20 local time, producing persisted run IDs `36`, `37`, `38`, and `39` with UTC timestamps `2026-03-21T05:19:26Z`, `2026-03-21T05:19:26Z`, `2026-03-21T05:21:57Z`, and `2026-03-21T05:21:57Z`
- A new hosted pilot smoke workflow now reuses the existing base proxy and header Make targets on manual dispatch and a weekly schedule
- Unsandboxed reruns of `make pilot-proxy-demo-smoke` and `make pilot-header-demo-smoke` both passed end to end on 2026-03-20 local time, producing persisted run IDs `40` and `41` with UTC timestamps `2026-03-21T05:46:27Z` and `2026-03-21T05:47:33Z`
- The checked-in hosted workflow under `.github/workflows/pilot-smoke.yml` now also reuses the report-path site-rejection targets, installs API dependencies, and runs those same overlay targets in GitHub Actions with recorded green hosted FHIR and HL7 success-path evidence on 2026-03-25
- Hosted FHIR confirmation: `#23563902873` succeeded on 2026-03-25 with dispatch scope `fhir-success-only`, total job duration `53` seconds, max job duration `27` seconds, and summary artifacts `pilot-smoke-summary-header-demo-fhir-smoke` plus `pilot-smoke-summary-proxy-demo-fhir-smoke`
- Hosted HL7 trial: `#23564057337` succeeded on 2026-03-25 with dispatch scope `hl7-success-only`, total job duration `52` seconds, max job duration `26` seconds, and summary artifacts `pilot-smoke-summary-header-demo-hl7-smoke` plus `pilot-smoke-summary-proxy-demo-hl7-smoke`
- Hosted evidence bundle output is now produced by `make pilot-smoke-evidence SUMMARY_DIR=/path/to/downloaded/pilot-smoke-artifacts HL7_DECISION=keep-manual`, and the current copyable outcome is: hosted FHIR confirmed, hosted HL7 trial recorded, HL7 hosting mode `manual_only`, phase complete `True`
- The compose-side root cause behind the earlier hosted failures was the bind-mounted web service losing image-installed `node_modules`; the fix is the anonymous `/app/node_modules` volume now checked into `docker-compose.yml`
- Unsandboxed reruns of `make pilot-proxy-demo-site-rejection-smoke` and `make pilot-header-demo-site-rejection-smoke` both passed end to end on 2026-03-22 local time, producing persisted run IDs `42` and `43` with UTC timestamps `2026-03-23T05:05:47Z` and `2026-03-23T05:06:23Z`
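
Several entries above pair a local-time run date with a UTC timestamp that falls on the next calendar day. The sketch below shows how the two reconcile. The UTC-7 offset is an assumption made for illustration only (the notes do not name the local time zone), and `local_date` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical local offset (UTC-7); the document does not name the time zone.
ASSUMED_LOCAL = timezone(timedelta(hours=-7))


def local_date(utc_stamp: str) -> str:
    """Convert a 'YYYY-MM-DDTHH:MM:SSZ' stamp to a local calendar date."""
    parsed = datetime.strptime(utc_stamp, "%Y-%m-%dT%H:%M:%SZ")
    # strptime returns a naive datetime, so attach UTC before converting.
    aware = parsed.replace(tzinfo=timezone.utc)
    return aware.astimezone(ASSUMED_LOCAL).date().isoformat()


print(local_date("2026-03-21T04:23:26Z"))  # 2026-03-20
print(local_date("2026-03-23T05:05:47Z"))  # 2026-03-22
```

Under that assumed offset, the persisted UTC stamps line up with the 2026-03-20 and 2026-03-22 local-time dates recorded above.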
- API and web boot cleanly
- Demo data import works
- Deterministic triage persists `Case`, `Report`, `Finding`, and `ReviewAction`
- Reviewer queue and case detail flow are live
- Evaluation, exports, benchmark scripts, and reproducible validation are in place
- Explainable PDAC trial matching is implemented
- Case detail shows structured abstractions and eligibility traces
- Hybrid explainable scoring is implemented
- Worklist supports hybrid review priority, active-learning priority, disagreement queueing, threshold evaluation, and reviewer feedback capture
- Structured request logging and readiness probes are implemented
- Trusted proxy auth is implemented with provider presets (`generic`, `authentik`, `keycloak`, `oauth2-proxy`)
- Role and site-scope enforcement are implemented in the API and reflected in the UI
- Pilot Docker overlays and smoke scripts are in place
- Research-safe de-identification views and redacted exports are implemented
- FHIR `DiagnosticReport` and HL7 v2 ORU import adapters are implemented
- Imported reports now preserve explicit provenance fields: patient identifier, encounter identifier, accession number, ordering provider, source system, source format, and import source identifier
- The metadata is persisted on reports, exposed in case detail and exports, and pseudonymized in research-safe views
- The generic CSV / JSON import path also accepts the same metadata surface
- Import endpoints now persist import-run summaries with actor, timestamps, processed / created / updated / failed counts, and stable failure buckets
- Analyst-facing audit routes are implemented at `/api/v1/imports/runs` and `/api/v1/imports/runs/{run_id}`
- Audit visibility respects auth plus site scope, while still allowing actors to review their own failed runs
- Selected FHIR and HL7 metadata precedence rules are now configurable through deployment settings rather than code edits
- The Next.js app now exposes `/imports` for CSV / JSON / JSONL uploads, FHIR `DiagnosticReport` submission, HL7 ORU submission, recent run summaries, and per-run failure details
- Failed import responses now include `X-Import-Run-ID` when an audit run was recorded, allowing the web workspace to load stable failure buckets immediately
- Checked-in env bundles now exist for local mock development, the proxy demo overlay, and a new header-auth demo overlay
- A new `docker-compose.pilot.header-demo.yml` overlay now demonstrates site-scoped field-level header auth with a fixed navigator identity
- The smoke helper and Make targets now support both proxy and header auth demos, and both pilot smoke paths verify that `/imports` renders
- The header-demo smoke path now imports demo data through `/api/v1/imports/reports` and verifies the persisted run through `/api/v1/imports/runs/{run_id}`
- The header-demo pilot path now also has live FHIR and HL7 smoke targets that verify structured adapter imports plus persisted audit detail end to end
- The header-demo pilot path now also has a live site-scope rejection smoke target that verifies a persisted failure bucket and zero run-specific visible cases
- The header-demo pilot path now also has a live malformed-report smoke target that verifies persisted `validation_error` plus `parse_error` buckets and zero run-specific visible cases
- The header-demo pilot path now also has a live adapter-failure smoke target that verifies persisted FHIR `unsupported_payload` plus HL7 `parse_error` buckets and zero run-specific visible cases
- The header-demo pilot path now also has a live shared-visibility smoke target that verifies owner visibility plus alternate-actor allow behavior for a successful `/api/v1/imports/reports` run
- The header-demo pilot path now also has a live failed-run shared-visibility smoke target that verifies owner visibility plus alternate-actor allow behavior for a persisted `/api/v1/imports/reports` `validation_error` run
- The header-demo pilot path now also has a live structured failed-run shared-visibility smoke target that verifies owner visibility plus alternate-actor allow behavior for persisted FHIR `unsupported_payload` and HL7 `parse_error` runs
- The header-demo pilot path now also has a live structured shared-visibility smoke target that verifies owner visibility plus alternate-actor allow behavior for successful FHIR and HL7 imports
- The header-demo pilot path now also has a live audit-visibility smoke target that verifies owner visibility plus alternate-actor denial for a persisted `site_scope_rejection` run
- The proxy-demo overlay now uses a fixed site-scoped navigator identity too, and its smoke path imports demo data through `/api/v1/imports/reports` with persisted run verification
- The proxy-demo pilot path now also has live FHIR and HL7 smoke targets that verify structured adapter imports plus persisted audit detail end to end
- The proxy-demo pilot path now also has a live site-scope rejection smoke target that verifies a persisted failure bucket and zero run-specific visible cases
- The proxy-demo pilot path now also has a live malformed-report smoke target that verifies persisted `validation_error` plus `parse_error` buckets and zero run-specific visible cases
- The proxy-demo pilot path now also has a live adapter-failure smoke target that verifies persisted FHIR `unsupported_payload` plus HL7 `parse_error` buckets and zero run-specific visible cases
- The proxy-demo pilot path now also has a live shared-visibility smoke target that verifies owner visibility plus alternate-actor allow behavior for a successful `/api/v1/imports/reports` run
- The proxy-demo pilot path now also has a live failed-run shared-visibility smoke target that verifies owner visibility plus alternate-actor allow behavior for a persisted `/api/v1/imports/reports` `validation_error` run
- The proxy-demo pilot path now also has a live structured failed-run shared-visibility smoke target that verifies owner visibility plus alternate-actor allow behavior for persisted FHIR `unsupported_payload` and HL7 `parse_error` runs
- The proxy-demo pilot path now also has a live structured shared-visibility smoke target that verifies owner visibility plus alternate-actor allow behavior for successful FHIR and HL7 imports
- The proxy-demo pilot path now also has a live audit-visibility smoke target that verifies owner visibility plus alternate-actor denial for a persisted `site_scope_rejection` run
- GitHub Actions now also has a hosted pilot smoke workflow for the base proxy and header success-path overlays plus report-path and structured adapter site rejection in both auth modes
- Shared persisted records now exist for research sources, runs, run items, documents, evidence, topics, digests, and opportunities
- Seed catalogs now live under `data/research/` for sources, topics, a lightweight pancreatic oncology graph, and seeded documents
- Manual ingest, due-only ingest, schedule, and digest scripts now exist through `make research-intel-ingest`, `make research-intel-ingest-due`, `make research-intel-schedule`, `make research-intel-digest`, and `make research-intel-refresh`
- The Next.js app now exposes `/research-intel`, `/research-intel/schedule`, `/research-intel/documents`, `/research-intel/digests`, and `/research-intel/opportunities`
- Case detail now links to a generated research brief that maps rationale or trial context to current research-intel topics and cited documents
- Promotion stays human-gated, triage scores remain untouched, and payment or crypto execution is still intentionally out of scope
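
The list above notes that failed import responses carry `X-Import-Run-ID` when an audit run was recorded, and that audit detail lives at `/api/v1/imports/runs/{run_id}`. A client could connect the two roughly as follows. This is a minimal sketch; `audit_detail_path` is a hypothetical helper, not a function from the repo.

```python
from typing import Mapping, Optional

RUN_ID_HEADER = "X-Import-Run-ID"


def audit_detail_path(headers: Mapping[str, str]) -> Optional[str]:
    """Return the audit detail path for a failed import, if a run was recorded."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    run_id = lowered.get(RUN_ID_HEADER.lower())
    if not run_id:
        # No audit run was recorded for this response.
        return None
    return f"/api/v1/imports/runs/{run_id.strip()}"
```

Returning `None` when the header is absent mirrors the "when an audit run was recorded" condition, so a caller can fall back to showing only the immediate error.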
- Preserve deterministic explainability. Do not replace the rule engine with opaque behavior.
- Keep all new ingestion paths mapped into the existing `ReportInput` plus `triage_report()` flow unless there is a strong reason to introduce a new persistence boundary.
- Keep auth, site scoping, and de-identification behavior intact for any new API surfaces.
- Do not regress reviewer usability in favor of backend purity. The worklist and case detail pages are central product surfaces.
- Do not claim clinical validation. This remains a research-first pilot stack.
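
The ingestion guardrail above can be illustrated with a thin adapter that only normalizes a payload and then hands off to the shared triage flow. Everything in this sketch is a stand-in: `ReportInputSketch`, its fields, the payload keys, and the injected `triage` callable are assumptions for illustration and do not reflect the real `ReportInput` or `triage_report()` signatures.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ReportInputSketch:
    """Stand-in for the real ReportInput; these fields are assumptions."""
    report_text: str
    source_format: str
    accession_number: str


def adapt_and_triage(
    payload: Dict[str, str],
    source_format: str,
    triage: Callable[[ReportInputSketch], dict],
) -> dict:
    """Normalize a new source's payload, then reuse the shared triage flow."""
    report = ReportInputSketch(
        report_text=payload["text"],  # hypothetical payload key
        source_format=source_format,
        accession_number=payload.get("accession", ""),  # hypothetical key
    )
    # The adapter adds no scoring or persistence of its own.
    return triage(report)
```

Passing the triage function in as a parameter keeps the adapter free of its own scoring logic, which is the point of the guardrail: new sources change only the mapping step.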
- `apps/api/app/services/triage_engine.py`
- `apps/api/app/services/hybrid_analysis.py`
- `apps/api/app/services/evaluation.py`
- `apps/api/app/store/memory_store.py`
- `apps/api/app/models/entities.py`
- `scripts/run_demo_eval.py`
- `scripts/validate_repo.py`

- `apps/web/app/page.tsx`
- `apps/web/app/about/page.tsx`
- `apps/web/app/proof/page.tsx`
- `apps/web/app/marketing.module.css`
- `apps/web/lib/demo-proof.ts`
- `apps/api/app/schemas/benchmark_submission.py`
- `apps/api/app/schemas/evaluation.py`
- `apps/api/tests/test_external_evaluation.py`
- `scripts/run_external_eval.py`
- `scripts/write_demo_benchmark.py`
- `scripts/validate_benchmark_submission.py`
- `docs/QUICKSTART.md`
- `docs/EVALUATION.md`
- `docs/BENCHMARK_SUBMISSIONS.md`
- `docs/LABELING_GUIDE.md`
- `docs/examples/demo-benchmark-current.json`
- `docs/examples/demo-benchmark-current.md`
- `docs/examples/benchmark-submission-template.json`
- `docs/examples/benchmark-label-template.jsonl`
- `docs/examples/benchmark-prediction-template.jsonl`

- `apps/api/app/api/routes/cases.py`
- `apps/api/app/api/routes/exports.py`
- `apps/api/app/schemas/case.py`
- `apps/api/app/schemas/feedback.py`
- `apps/web/app/cases/page.tsx`
- `apps/web/app/cases/[caseId]/page.tsx`
- `apps/web/app/imports/page.tsx`
- `apps/web/app/imports/actions.ts`
- `apps/web/lib/api.ts`

- `README.md`
- `CHANGELOG.md`
- `CONTRIBUTING.md`
- `SECURITY.md`
- `CODE_OF_CONDUCT.md`
- `docs/RELEASE_READINESS.md`
- `apps/api/app/main.py`
- `apps/api/app/api/routes/health.py`
- `apps/api/app/auth/dependencies.py`
- `apps/api/app/core/config.py`
- `apps/api/app/observability/logging.py`
- `docker-compose.yml`
- `docker-compose.pilot.yml`
- `docker-compose.pilot.header-demo.yml`
- `docker-compose.pilot.proxy-demo.yml`
- `deploy/examples/README.md`
- `scripts/smoke_proxy_auth.py`
- `Makefile`

- `apps/api/app/services/deidentification.py`
- `apps/api/app/schemas/research.py`
- `apps/api/app/api/routes/cases.py`
- `apps/web/app/cases/[caseId]/research/page.tsx`

- `apps/api/app/api/routes/imports.py`
- `apps/api/app/services/imports.py`
- `apps/api/app/services/fhir_imports.py`
- `apps/api/app/services/hl7_imports.py`
- `apps/api/app/db/session.py`
- `apps/api/app/models/entities.py`
- `apps/api/app/schemas/imports.py`
- `apps/api/app/schemas/triage.py`
- `apps/api/tests/test_imports.py`
- `apps/api/tests/test_import_runs.py`
Broaden from the checked-in retrospective-style multi-cohort sample to a larger de-identified or collaborator-supplied benchmark drop using the same reviewer-facing proof shape.
- Phase 6 exit criteria are now met: the hosted/manual smoke boundary is verified, the hosted evidence is recorded, and the release-facing docs reflect the real product state.
- The proof surface now carries benchmark buckets, queue previews, reviewer cues, expected rationale codes, and a 10-report casebook across 5 recurring categories, while the public external benchmark helper now emits the same shape for collaborator datasets and the repo now includes both a 12-case retrospective-style multi-cohort sample pack and a 9-case wording-variance challenge pack.
- More parser or smoke-baseline work would now deliver less value than bringing a broader real-world benchmark drop into that established structure.
Add one post-Phase-6 proof slice that:
- introduces a broader de-identified or collaborator-supplied benchmark input set using the same published casebook fields
- keeps the existing reviewer-facing proof shape while improving data realism, provenance, or collaborator handoff quality
- refreshes `/proof`, release notes inputs, and benchmark-facing docs from that stronger evidence base
- leaves the now-recorded hosted smoke baseline intact rather than reopening it unnecessarily
- Start from the existing benchmark proof surfaces under `docs/examples/`, `/proof`, and the external evaluation tooling rather than adding new infrastructure.
- Use the current expanded casebook fields, optional report excerpts, optional cohort labels, and checked-in retrospective sample as the compatibility target for any broader de-identified retrospective sample or externally supplied benchmark set.
- Refresh the published proof artifacts and make sure the updated benchmark narrative still matches the current release-facing docs and the now-explicit hosted/manual smoke boundary.
- The next slice improves public proof quality rather than just re-stating existing smoke evidence.
- Any refreshed benchmark or reviewer-facing artifact is reproducible from checked-in commands.
- `make validate-strict` stays green.
- The hosted Phase 6 evidence recorded above remains accurate and unchanged unless a newer deliberate rerun supersedes it.
make validate-strict
sed -n '1,260p' docs/PHASES.md
sed -n '1,260p' docs/API_SPEC.md
sed -n '1,260p' docs/DEPLOYMENT.md
sed -n '1,260p' docs/OPEN_SOURCE_STRATEGY.md
sed -n '1,260p' README.md
sed -n '1,260p' docs/RELEASE_READINESS.md
sed -n '1,260p' apps/api/app/services/hl7_imports.py
sed -n '1,260p' apps/api/tests/test_imports.py
sed -n '1,260p' apps/api/tests/test_import_runs.py
sed -n '1,260p' scripts/smoke_proxy_auth.py
sed -n '1,220p' .github/workflows/pilot-smoke.yml
make benchmark-external LABELS=docs/examples/benchmark-label-template.jsonl PREDICTIONS=docs/examples/benchmark-prediction-template.jsonl
make benchmark-external-sample
make refresh-external-sample-proof
make validate-benchmark-submission SUBMISSION=docs/examples/benchmark-submission-template.json

This is a clean post-Phase-6 checkpoint. The repo is runnable, validated, and already beyond MVP scaffolding. Import metadata preservation, import-run audit trails, config overrides, the web import workspace, concrete proxy plus header-auth pilot packaging, automatic PR and main validation, checked-in hosted base plus attachment-backed FHIR success-path plus report-path and structured adapter site-rejection smoke automation, a sharper public landing experience, a checked-in benchmark proof surface, a machine-validated public benchmark submission pack, a reproducible external evaluation bundle writer, FHIR inline Reference.identifier fallback coverage, FHIR presentedForm attachment-backed narrative decoding, and HL7 ED, repeated-OBX-5, custom-MSH-2 delimiter, escape-sequence, plus subcomponent-aware metadata support are complete. The benchmark proof surface now also publishes dataset coverage, top-k queue previews, benchmark buckets, expected rationale cues, reviewer-facing casebook notes from an expanded 10-report demo snapshot that covers 5 recurring benchmark buckets, a side-by-side comparison of the published external packs, and anchored links into each checked-in external casebook; the repo also includes both a checked-in 12-case retrospective-style multi-cohort sample pack and a 9-case wording-variance challenge pack, and make validate-strict verifies the published external benchmark registry that feeds /proof. The next agent should focus on bringing in a broader de-identified or collaborator-supplied benchmark drop, or deeper bucket-level cross-pack analysis, rather than reopening hosted smoke capture or parser work that is already covered in tests.