All notable project-level changes should be documented in this file.
This repository is still pre-release, but the goal is to keep the path to a research-first 1.0 understandable for maintainers, collaborators, and downstream evaluators.
- Pilot Smoke CI: the web Docker image build failed with an npm `ERESOLVE` conflict because `@playwright/test 1.48.2` was added to `package.json` without regenerating `package-lock.json`, so every scheduled hosted smoke run since 2026-04-25 failed at the overlay boot step. `@playwright/test` is now `1.60.0`, the lockfile is regenerated in sync, and the web Dockerfile uses `npm ci` so lockfile drift fails loudly at build time instead of silently re-resolving.
- GitHub Actions in all workflows bumped to Node 24-ready majors (`checkout@v6`, `setup-python@v6`, `setup-node@v6`, `upload-artifact@v7`, `cache@v5`) ahead of the 2026-06-16 forced Node 24 migration.
- Open-source readiness scaffolding: `CITATION.cff`, issue templates (bug, feature, use-case interest), PR template, dependabot config, product screenshots in the README, audience one-pagers under `docs/outreach/`, and a go-public checklist in `docs/GO_PUBLIC_CHECKLIST.md`. `SECURITY.md` and `CODE_OF_CONDUCT.md` now point at GitHub private vulnerability reporting instead of an unnamed private channel.
- Pancreatic Signal v2 audit fixes:
  - Sentence-bounded negation in `apps/api/app/services/triage_engine.py` (replaces ad-hoc 60-char regex windows so negation cannot leak across sentences)
  - Persisted structured `FindingRecord` rows from the triage engine in addition to the flattened evidence summary
  - Externalized additive scoring into a `scoring` block in `data/ontologies/pancreatic_signal_rules.json` backed by a new `OntologyConfig` schema (`extra="forbid"`)
  - Store implementation moved to `apps/api/app/store/case_store.py` with a thin re-export shim retained at `memory_store.py`
  - `docs/API_SPEC.md` updated for CSV/JSON/JSONL imports and `/settings/rules` labelled future
  - Playwright e2e smoke under `apps/web/tests/e2e/` for `/proof`, `/cases`, case detail review, and `/imports`, wired into `make web-e2e` and CI
- Pancreatic Signal v2 autoresearch lab subsystem inspired by karpathy/autoresearch:
  - `autoresearch/program.md`: research org instructions, edit surface, guardrails, and proposal contract
  - Frozen `autoresearch/baseline/pancreatic_signal_rules.json` used for diffing and rollback
  - `scripts/run_autoresearch_experiment.py`: single-experiment driver covering schema validation, demo eval (rules + hybrid), external sample snapshot lookup, determinism check, recall/threshold/regex-budget guardrails, and structured `eval.json` and `decision.json` artifacts
  - `scripts/run_autoresearch_loop.py`: orchestrator with a pluggable agent CLI, append-only run log under `autoresearch/runs/`, and human-gated `promote` and `rollback` actions
  - `make autoresearch-once`, `make autoresearch-loop`, `make autoresearch-promote`, and `make autoresearch-rollback` targets
  - Read-only API at `/api/v1/autoresearch/runs`, `/api/v1/autoresearch/runs/{id}`, and `/api/v1/autoresearch/leaderboard`, plus an admin-gated `POST /api/v1/autoresearch/promote/{id}` (see `apps/api/app/api/routes/autoresearch.py`)
  - `/autoresearch` web surface with run history, leaderboard, diff/eval viewer, and capability-gated promote action
  - `docs/AUTORESEARCH.md` design notes and operating guide
- Repo-level contributor, security, and community docs in CONTRIBUTING.md, SECURITY.md, and CODE_OF_CONDUCT.md
- A release-facing runbook in docs/RELEASE_RUNBOOK.md plus a release notes scaffold in docs/RELEASE_NOTES_TEMPLATE.md so maintainers can move from validation to hosted smoke evidence capture without private context
- A clearer public project narrative in README.md and docs/OPEN_SOURCE_STRATEGY.md
- A release-readiness checklist in docs/RELEASE_READINESS.md
- An adoption-facing quickstart in docs/QUICKSTART.md plus a checked-in published benchmark snapshot in `docs/examples/demo-benchmark-current.md`
- A public benchmark pack with docs/LABELING_GUIDE.md, docs/BENCHMARK_SUBMISSIONS.md, validated submission templates, and a benchmark submission validator
- A comparable external evaluation bundle writer in `scripts/run_external_eval.py`, a `make benchmark-external` entrypoint, and a checked-in prediction template for outside collaborators
- GitHub Actions pull request and `main` validation via `.github/workflows/validate.yml` using `make validate-strict`
- A hosted pilot smoke workflow in `.github/workflows/pilot-smoke.yml` that reuses the base and FHIR proxy and header overlay smoke targets plus the report-path and structured adapter site-rejection variants on manual dispatch and a weekly schedule
- A narrower manual-dispatch path in `.github/workflows/pilot-smoke.yml` for hosted attachment-backed FHIR confirmation, plus uploaded per-job smoke log artifacts for later run auditing
- A structured hosted smoke summary generator in `scripts/summarize_pilot_smoke.py` plus workflow-uploaded JSON and Markdown summary artifacts for later handoff capture
- A bundled hosted smoke evidence builder in `scripts/build_pilot_smoke_evidence.py` plus `make pilot-smoke-evidence` so downloaded hosted summary artifacts can be consolidated into release-ready JSON and Markdown evidence
- A manual-only `hl7-success-only` dispatch path in `.github/workflows/pilot-smoke.yml` so hosted HL7 success-path trials can be recorded without widening the default weekly matrix
- Live failed-run shared-visibility smoke coverage for a persisted non-site `validation_error` import run in the proxy and header pilot overlays
- Live structured adapter failed-run shared-visibility smoke coverage for persisted non-site FHIR `unsupported_payload` and HL7 `parse_error` runs in the proxy and header pilot overlays
- Live structured adapter site-scope rejection smoke coverage for persisted FHIR and HL7 `site_scope_rejection` runs in the proxy and header pilot overlays
- Live structured adapter audit-visibility smoke coverage for persisted FHIR and HL7 `site_scope_rejection` runs in the proxy and header pilot overlays
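
The sentence-bounded negation fix listed above can be illustrated with a minimal sketch. This is not the code in `apps/api/app/services/triage_engine.py`; the cue list, the sentence splitter, and the function names `split_sentences` and `is_negated` are assumptions made for illustration only. It shows why bounding the search to one sentence stops a cue such as "No" from leaking into a later sentence, which a fixed character window can allow.

```python
import re

# Hypothetical cue list; the real engine's negation cues are not shown in this changelog.
NEGATION_CUES = ("no evidence of", "negative for", "without", "no")

# Naive splitter: break after ., !, ? or ; followed by whitespace.
# Abbreviations such as "e.g." would need extra handling in real reports.
_SENTENCE_BREAK = re.compile(r"(?<=[.!?;])\s+")
_CUE_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(cue) for cue in NEGATION_CUES) + r")\b",
    re.IGNORECASE,
)


def split_sentences(text):
    """Split report text into sentences using the naive boundary rule above."""
    return [s for s in _SENTENCE_BREAK.split(text.strip()) if s]


def is_negated(text, term):
    """Return True if the first mention of `term` follows a cue in the same sentence."""
    term_re = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    for sentence in split_sentences(text):
        match = term_re.search(sentence)
        if match is None:
            continue
        # Only text before the term, and only inside this sentence, is searched.
        return _CUE_RE.search(sentence[: match.start()]) is not None
    return False


report = "No pancreatic mass. There is ductal dilation."
print(is_negated(report, "mass"))
print(is_negated(report, "ductal dilation"))
```

In this example the cue "No" sits fewer than 60 characters before "ductal dilation", so a fixed-width window could mark the dilation as negated, while the sentence-bounded check does not.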
- The Next.js home and about surfaces now present Pancreatic Signal as a benchmarkable product entrypoint instead of a bare scaffold shell, and the web app now exposes a dedicated `/proof` page backed by the checked-in benchmark snapshots
- Benchmark-oriented contribution paths now have a documented label schema, stable error-bucket rubric, and a machine-validated submission format for outside collaborators
- Outside collaborators can now turn label and prediction JSONL files into JSON, Markdown, and submission-draft artifacts without hand-assembling benchmark metrics
- FHIR `DiagnosticReport` imports now preserve patient, encounter, and accession metadata from inline `Reference.identifier` values when upstream bundles omit fully resolved resources
- FHIR `DiagnosticReport` imports now decode supported text-like `presentedForm` attachments, including attachment-backed XHTML narratives, and keep unsectioned attachment findings merged with `conclusion` instead of dropping the conclusion text
- The proxy and header FHIR pilot smoke fixtures now submit attachment-backed `DiagnosticReport.presentedForm` XHTML narratives, so deployable smoke coverage exercises the supported text-like attachment decode path instead of only an `Observation`-backed structured result
- HL7 ORU imports now decode base64 `ED` report text, normalize repeated `OBX-5` values, respect custom `MSH-2` component and repetition separators, normalize common HL7 escape sequences, and clean composite metadata fields with subcomponent-aware extraction before triage
- Top-level documentation now describes the implemented research platform instead of the earlier scaffold-era state
- Deployment and API docs now call out cross-actor visibility for non-site failed import runs with empty `imported_sites`
- `make validate-strict` now includes web lint alongside Python checks, API tests, evaluation checks, and the web build so local and hosted validation stay aligned
- The validation and deployment docs now distinguish between hosted base plus report-path and structured adapter site-rejection automation and the broader manual overlay smoke matrix
- Hosted pilot smoke jobs now generate machine-readable and Markdown summary artifacts from `pilot-smoke.log`, so the first green attachment-backed FHIR run can be recorded without manual log scraping
- Hosted pilot smoke summaries now also preserve absolute smoke start and finish timestamps, and downloaded artifact summaries can be consolidated into a single hosted evidence bundle for release and handoff capture
- Hosted pilot smoke summaries now include smoke duration and exit code, and the recorded 2026-03-25 hosted HL7 trial keeps HL7 success-path coverage manual-only in the default matrix by explicit decision
- The bind-mounted web service in `docker-compose.yml` now preserves `/app/node_modules`, so local and GitHub-hosted pilot stacks retain the Next.js CLI installed during image build
- The published demo benchmark snapshot now includes dataset coverage, top-k queue previews, benchmark buckets, expected rationale cues, and reviewer-facing casebook notes instead of only aggregate metric tables
- The checked-in demo benchmark corpus now covers 10 labeled reports across 5 recurring benchmark buckets, so `/proof` and the published artifacts show wording variance, pancreatitis confounders, secondary-sign clusters, and follow-up-only cyst surveillance cases instead of only a minimal five-case set
- The public external benchmark starter pack now carries the same reviewer-facing fields as the demo casebook, and `scripts/run_external_eval.py` now emits dataset coverage, top-k queue previews, and a per-case reviewer casebook for collaborator-supplied labels plus predictions
- The repo now includes a checked-in deidentified retrospective-style external benchmark sample with reproducible JSON, Markdown, and submission artifacts, optional `report_excerpt` plus `cohort` support, and an expanded 12-report multi-cohort casebook for less-synthetic external review
- The public `/proof` surface now renders that checked-in retrospective-style sample alongside the demo comparison, including cohort coverage and cohort-aware queue labels, so outside collaborators can inspect both the synthetic casebook and the less-synthetic external sample in one reproducible web view
- FHIR `DiagnosticReport` imports now merge multiple supported `presentedForm` attachments in order and suppress shorter overlapping fragments when a richer narrative attachment already contains them
- FHIR `DiagnosticReport` imports now also decode supported `presentedForm.url` attachments when they resolve to bundled or contained FHIR `Binary` resources
- FHIR `DiagnosticReport` imports now expand referenced `Observation.component` findings into report text so structured component-level pancreatic findings are preserved for triage
- FHIR `DiagnosticReport` imports now also expand grouped referenced `Observation.hasMember` findings into report text and fall back to `conclusionCode` text when a report omits free-text `conclusion`
- FHIR `DiagnosticReport` imports now also preserve referenced `Observation.interpretation` and `referenceRange` context for reviewer-visible pancreatic measurements, while avoiding duplicate or cyclic grouped-member expansion
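
The custom `MSH-2` separator handling mentioned in the HL7 ORU entry above can be sketched as follows. This is an illustrative assumption, not the project's importer: `Delimiters`, `read_delimiters`, and `split_obx5` are hypothetical names, and escape-sequence handling and base64 `ED` decoding are left out. It relies only on the HL7 v2 convention that the character after `MSH` is the field separator and that `MSH-2` lists the component, repetition, escape, and subcomponent characters in that order.

```python
from typing import List, NamedTuple


class Delimiters(NamedTuple):
    """Delimiter set declared by a message's MSH segment (hypothetical helper type)."""
    field: str
    component: str
    repetition: str
    escape: str
    subcomponent: str


def read_delimiters(msh_segment: str) -> Delimiters:
    """Read MSH-1 and the first four MSH-2 characters instead of assuming |^~\\&."""
    if not msh_segment.startswith("MSH") or len(msh_segment) < 8:
        raise ValueError("not a usable MSH segment")
    field = msh_segment[3]          # MSH-1: field separator
    component, repetition, escape, subcomponent = msh_segment[4:8]  # MSH-2
    return Delimiters(field, component, repetition, escape, subcomponent)


def split_obx5(obx_segment: str, delims: Delimiters) -> List[List[str]]:
    """Return OBX-5 as a list of repetitions, each a list of components."""
    fields = obx_segment.split(delims.field)
    value = fields[5] if len(fields) > 5 else ""
    return [rep.split(delims.component) for rep in value.split(delims.repetition)]


# A message that declares '#' for components and '*' for repetitions.
delims = read_delimiters("MSH|#*\\&|LAB|SITE")
print(split_obx5("OBX|1|TX|IMP||Pancreatic cyst#stable*No mass", delims))
```

Reading the separators once per message and passing them to every field parser keeps nonstandard feeds from being split on the wrong characters.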
- `make validate-strict` passed on 2026-03-25 with `9 pass, 0 warn, 0 fail`
- API validation reported `139 passed` on 2026-03-25
- `apps/api/.venv/bin/python -m pytest apps/api/tests/test_imports.py -q` passed on 2026-03-25 with `34 passed`, including split, overlapping, bundled or contained `Binary`-backed FHIR `presentedForm`, grouped and cycle-safe `Observation.hasMember`, `conclusionCode`, measurement `interpretation` plus `referenceRange`, and `Observation.component` coverage
- `apps/api/.venv/bin/python -m pytest apps/api/tests/test_summarize_pilot_smoke.py apps/api/tests/test_smoke_proxy_auth.py -q` passed on 2026-03-25 with `23 passed`, covering the hosted smoke summary parser plus the attachment-backed FHIR smoke fixture shape
- `apps/api/.venv/bin/python -m pytest apps/api/tests/test_summarize_pilot_smoke.py apps/api/tests/test_build_pilot_smoke_evidence.py -q` passed on 2026-03-25, covering timestamped hosted smoke summaries plus bundled evidence generation from downloaded hosted artifacts
- `apps/api/.venv/bin/python -m pytest apps/api/tests/test_smoke_proxy_auth.py -q` passed on 2026-03-25 with `21 passed`, covering the attachment-backed FHIR smoke fixture shape
- `apps/api/.venv/bin/python -m pytest apps/api/tests/test_demo_benchmark_snapshot.py apps/api/tests/test_exports_metrics.py -q` passed on 2026-03-25 with `11 passed`, covering the casebook-backed demo benchmark snapshot plus richer evaluation-case payload fields
- `apps/api/.venv/bin/python -m pytest apps/api/tests/test_external_evaluation.py -q` passed on 2026-03-25 with `3 passed`, covering the richer external benchmark template plus casebook-shaped external bundle outputs
- `apps/api/.venv/bin/python -m pytest apps/api/tests/test_external_evaluation.py apps/api/tests/test_benchmark_submission.py -q` passed on 2026-03-25 with `9 passed`, covering the multi-cohort retrospective-style sample pack, optional report excerpts plus cohorts, and the richer external benchmark template plus casebook-shaped bundle outputs
- `make benchmark-external-sample` and `make refresh-external-sample-proof` both wrote reproducible multi-cohort retrospective-style benchmark artifacts on 2026-03-25
- Unsandboxed local reruns of `make pilot-proxy-demo-fhir-smoke` and `make pilot-header-demo-fhir-smoke` passed end to end on 2026-03-25, recording persisted import runs `50` and `51` plus visible-case and reviewer round-trip verification
- GitHub-hosted `Pilot Smoke (fhir-success-only)` run `#23563902873` passed on 2026-03-25 with summary artifacts `pilot-smoke-summary-header-demo-fhir-smoke` and `pilot-smoke-summary-proxy-demo-fhir-smoke`, both showing one visible case and a reviewer round-trip
- GitHub-hosted `Pilot Smoke (hl7-success-only)` run `#23564057337` passed on 2026-03-25 with summary artifacts `pilot-smoke-summary-header-demo-hl7-smoke` and `pilot-smoke-summary-proxy-demo-hl7-smoke`, both showing one visible case and a reviewer round-trip
- `make pilot-smoke-evidence SUMMARY_DIR=... HL7_DECISION=keep-manual` produced a complete hosted evidence bundle on 2026-03-25 and recorded the explicit decision to keep HL7 manual-only in the default hosted matrix to limit recurring runtime and maintenance cost
- `make refresh-demo-proof` refreshed the checked-in casebook-backed benchmark proof snapshot in `docs/examples/` on 2026-03-25 local time, expanding the published corpus to 10 labeled reports across 5 benchmark buckets with hybrid-only rescues for follow-up-only cases `C-005` and `C-008`
- `make benchmark-external LABELS=docs/examples/benchmark-label-template.jsonl PREDICTIONS=docs/examples/benchmark-prediction-template.jsonl OUT_DIR=/tmp/pancreatic-signal-external-eval BASENAME=template-external TOP_K=2` wrote JSON, Markdown, and submission-draft artifacts on 2026-03-24
- `make validate-benchmark-submission SUBMISSION=/tmp/pancreatic-signal-external-eval/template-external-submission.json` passed on 2026-03-24
- `apps/api/.venv/bin/python -m pytest apps/api/tests/test_imports.py apps/api/tests/test_import_runs.py -q` passed on 2026-03-24 with `46 passed`, including inline `Reference.identifier` FHIR coverage, attachment-backed `presentedForm` decoding and audit coverage, plus custom `MSH-2` HL7 delimiter, escape-sequence, subcomponent, and audit-visibility coverage
- Local reruns of `make pilot-proxy-demo-smoke` and `make pilot-header-demo-smoke` passed on 2026-03-20 local time with persisted run IDs `40` and `41`, matching the checked-in hosted workflow targets
- Local reruns of `make pilot-proxy-demo-site-rejection-smoke` and `make pilot-header-demo-site-rejection-smoke` passed on 2026-03-22 local time with persisted run IDs `42` and `43`, matching the newly hosted failure-path targets
- Local reruns of `make pilot-proxy-demo-adapter-site-rejection-smoke` and `make pilot-header-demo-adapter-site-rejection-smoke` passed on 2026-03-22 local time with persisted run IDs `44`, `45`, `46`, and `47`, matching the newly hosted structured failure-path targets
- Live structured adapter site-scope rejection and audit-visibility smoke runs passed for both trusted-proxy and header-auth overlays and are recorded in docs/CODEX_HANDOFF.md
This preview marker captures the repo state before a formal 1.0 release process begins.
- Deterministic pancreatic triage with evidence spans, rationale codes, persistence, exports, and reproducible evaluation
- Reviewer workflow with worklist filters, case detail, review actions, hybrid prioritization, feedback capture, and research-safe views
- FHIR `DiagnosticReport`, HL7 ORU, and generic report import paths with structured provenance metadata
- Import-run audit summaries and detail routes with stable failure buckets and visibility controls
- Env-driven field-preference overrides for selected FHIR and HL7 metadata extraction
- Dedicated `/imports` web workspace for file uploads, structured submissions, and audit inspection
- Pilot-ready Docker overlays for trusted-proxy and header-auth demos
- Live smoke coverage for report, FHIR, and HL7 import success paths
- Live smoke coverage for site-scope rejection, parse or validation failure, adapter failure, audit denial, shared visibility, and structured shared visibility
- The proxy pilot demo now defaults to an import-capable navigator identity so `/imports` and pilot smoke paths exercise the same capability class
- Top-level handoff documentation now reflects late Phase 6 hardening rather than early MVP bootstrap work
- `make validate-strict` passed on 2026-03-20 with `8 pass, 0 warn, 0 fail`
- API validation reported `90 passed`
- Live pilot smoke runs passed for both trusted-proxy and header-auth overlays across the documented success and failure paths recorded in docs/CODEX_HANDOFF.md