Changelog

All notable project-level changes should be documented in this file.

This repository is still pre-release, but the goal is to keep the path to a research-first 1.0 understandable for maintainers, collaborators, and downstream evaluators.

Unreleased

0.10.0 - 2026-06-11

Fixed

Pilot Smoke CI: the web Docker image build failed with an npm ERESOLVE conflict because @playwright/test 1.48.2 was added to package.json without regenerating package-lock.json, so every scheduled hosted smoke run since 2026-04-25 failed at the overlay boot step. @playwright/test is now 1.60.0, the lockfile is regenerated in sync, and the web Dockerfile uses npm ci so lockfile drift fails loudly at build time instead of silently re-resolving.
All workflows now use Node 24-ready GitHub Actions majors (checkout@v6, setup-python@v6, setup-node@v6, upload-artifact@v7, cache@v5) ahead of the 2026-06-16 forced Node 24 migration.

Added

Open-source readiness scaffolding: CITATION.cff, issue templates (bug, feature, use-case interest), PR template, dependabot config, product screenshots in the README, audience one-pagers under docs/outreach/, and a go-public checklist in docs/GO_PUBLIC_CHECKLIST.md. SECURITY.md and CODE_OF_CONDUCT.md now point at GitHub private vulnerability reporting instead of an unnamed private channel.
Pancreatic Signal v2 audit fixes:
- Sentence-bounded negation in apps/api/app/services/triage_engine.py (replaces ad-hoc 60-char regex windows so negation cannot leak across sentences)
- Persisted structured FindingRecord rows from the triage engine in addition to the flattened evidence summary
- Externalized additive scoring into a scoring block in data/ontologies/pancreatic_signal_rules.json backed by a new OntologyConfig schema (extra="forbid")
- Store implementation moved to apps/api/app/store/case_store.py with a thin re-export shim retained at memory_store.py
- docs/API_SPEC.md updated to cover CSV/JSON/JSONL imports, with /settings/rules now labelled as future work
- Playwright e2e smoke under apps/web/tests/e2e/ for /proof, /cases, case detail review, and /imports, wired into make web-e2e and CI
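The sentence-bounded negation fix listed above can be illustrated with a minimal sketch. This is not the code in apps/api/app/services/triage_engine.py: the cue list, the naive sentence splitter, and the `negated_mentions` name are assumptions made for illustration only.

```python
import re

# Hypothetical negation cues; the project's real rules are not shown in this changelog.
NEGATION_CUES = re.compile(r"\b(no|without|negative for|no evidence of)\b", re.IGNORECASE)

# Naive sentence splitter: break after ., !, or ? when followed by whitespace.
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def negated_mentions(text: str, finding: str) -> list[bool]:
    """For each sentence mentioning `finding`, report whether a negation cue
    appears before it within that same sentence."""
    results = []
    needle = finding.lower()
    for sentence in SENTENCE_SPLIT.split(text):
        pos = sentence.lower().find(needle)
        if pos == -1:
            continue
        # Only the part of this sentence before the finding is searched,
        # so a cue in an earlier sentence cannot negate this mention.
        results.append(bool(NEGATION_CUES.search(sentence[:pos])))
    return results


report = "No pancreatic mass. Pancreatic duct dilatation is present."
print(negated_mentions(report, "pancreatic mass"))
print(negated_mentions(report, "duct dilatation"))
```

A fixed character window placed before "duct dilatation" in this report would reach back to the "No" in the previous sentence; bounding the search by sentence avoids that leak.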
Pancreatic Signal v2 autoresearch lab subsystem inspired by karpathy/autoresearch:
- autoresearch/program.md — research org instructions, edit surface, guardrails, and proposal contract
- Frozen autoresearch/baseline/pancreatic_signal_rules.json used for diffing and rollback
- scripts/run_autoresearch_experiment.py — single-experiment driver: schema validation, demo eval (rules + hybrid), external sample snapshot lookup, determinism check, recall/threshold/regex-budget guardrails, structured eval.json and decision.json artifacts
- scripts/run_autoresearch_loop.py — orchestrator with a pluggable agent CLI, append-only run log under autoresearch/runs/, and human-gated promote and rollback actions
- make autoresearch-once, make autoresearch-loop, make autoresearch-promote, and make autoresearch-rollback targets
- Read-only API at /api/v1/autoresearch/runs, /api/v1/autoresearch/runs/{id}, and /api/v1/autoresearch/leaderboard, plus an admin-gated POST /api/v1/autoresearch/promote/{id} (see apps/api/app/api/routes/autoresearch.py)
- /autoresearch web surface with run history, leaderboard, diff/eval viewer, and capability-gated promote action
- docs/AUTORESEARCH.md design notes and operating guide
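One way a determinism check like the one named for scripts/run_autoresearch_experiment.py could work is sketched below. The helper names `fingerprint` and `is_deterministic`, and the idea of comparing hashes of canonical JSON, are assumptions for illustration and may not match the actual driver.

```python
import hashlib
import json
from typing import Callable


def fingerprint(result: dict) -> str:
    """Hash a result dict through a canonical JSON encoding (sorted keys, fixed separators)."""
    canonical = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_deterministic(evaluate: Callable[[], dict], runs: int = 2) -> bool:
    """Call `evaluate` several times and require every run to produce the same fingerprint."""
    return len({fingerprint(evaluate()) for _ in range(runs)}) == 1


# Hypothetical evaluation function standing in for a real rules evaluation.
def stable_eval() -> dict:
    return {"recall": 1.0, "flagged": ["C-001", "C-005"]}


print(is_deterministic(stable_eval))
```

Sorting keys before hashing means dictionary ordering alone cannot cause a false mismatch between runs.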
Repo-level contributor, security, and community docs in CONTRIBUTING.md, SECURITY.md, and CODE_OF_CONDUCT.md
A release-facing runbook in docs/RELEASE_RUNBOOK.md plus a release notes scaffold in docs/RELEASE_NOTES_TEMPLATE.md so maintainers can move from validation to hosted smoke evidence capture without private context
A clearer public project narrative in README.md and docs/OPEN_SOURCE_STRATEGY.md
A release-readiness checklist in docs/RELEASE_READINESS.md
An adoption-facing quickstart in docs/QUICKSTART.md plus a checked-in published benchmark snapshot in docs/examples/demo-benchmark-current.md
A public benchmark pack with docs/LABELING_GUIDE.md, docs/BENCHMARK_SUBMISSIONS.md, validated submission templates, and a benchmark submission validator
A comparable external evaluation bundle writer in scripts/run_external_eval.py, a make benchmark-external entrypoint, and a checked-in prediction template for outside collaborators
GitHub Actions pull request and main validation via .github/workflows/validate.yml using make validate-strict
A hosted pilot smoke workflow in .github/workflows/pilot-smoke.yml that reuses the base and FHIR smoke targets for the proxy and header overlays, plus the report-path and structured adapter site-rejection variants, on manual dispatch and a weekly schedule
A narrower manual-dispatch path in .github/workflows/pilot-smoke.yml for hosted attachment-backed FHIR confirmation, plus uploaded per-job smoke log artifacts for later run auditing
A structured hosted smoke summary generator in scripts/summarize_pilot_smoke.py plus workflow-uploaded JSON and Markdown summary artifacts for later handoff capture
A bundled hosted smoke evidence builder in scripts/build_pilot_smoke_evidence.py plus make pilot-smoke-evidence so downloaded hosted summary artifacts can be consolidated into release-ready JSON and Markdown evidence
A manual-only hl7-success-only dispatch path in .github/workflows/pilot-smoke.yml so hosted HL7 success-path trials can be recorded without widening the default weekly matrix
Live failed-run shared-visibility smoke coverage for a persisted non-site validation_error import run in the proxy and header pilot overlays
Live structured adapter failed-run shared-visibility smoke coverage for persisted non-site FHIR unsupported_payload and HL7 parse_error runs in the proxy and header pilot overlays
Live structured adapter site-scope rejection smoke coverage for persisted FHIR and HL7 site_scope_rejection runs in the proxy and header pilot overlays
Live structured adapter audit-visibility smoke coverage for persisted FHIR and HL7 site_scope_rejection runs in the proxy and header pilot overlays

Changed

The Next.js home and about surfaces now present Pancreatic Signal as a benchmarkable product entrypoint instead of a bare scaffold shell, and the web app now exposes a dedicated /proof page backed by the checked-in benchmark snapshots
Benchmark-oriented contribution paths now have a documented label schema, stable error-bucket rubric, and a machine-validated submission format for outside collaborators
Outside collaborators can now turn label and prediction JSONL files into JSON, Markdown, and submission-draft artifacts without hand-assembling benchmark metrics
FHIR DiagnosticReport imports now preserve patient, encounter, and accession metadata from inline Reference.identifier values when upstream bundles omit fully resolved resources
FHIR DiagnosticReport imports now decode supported text-like presentedForm attachments, including attachment-backed XHTML narratives, and keep unsectioned attachment findings merged with conclusion instead of dropping the conclusion text
The proxy and header FHIR pilot smoke fixtures now submit attachment-backed DiagnosticReport.presentedForm XHTML narratives, so deployable smoke coverage exercises the supported text-like attachment decode path instead of only an Observation-backed structured result
HL7 ORU imports now decode base64 ED report text, normalize repeated OBX-5 values, respect custom MSH-2 component and repetition separators, normalize common HL7 escape sequences, and clean composite metadata fields with subcomponent-aware extraction before triage
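As a rough illustration of the HL7 delimiter handling described above, the sketch below reads the separators from MSH-1 and MSH-2 and unescapes the standard delimiter escape sequences (F, S, T, R, E). The `Delimiters`, `read_delimiters`, `unescape`, and `split_obx5` names are hypothetical and not taken from the project's adapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delimiters:
    field: str
    component: str
    repetition: str
    escape: str
    subcomponent: str


def read_delimiters(msh_segment: str) -> Delimiters:
    """Read MSH-1 (field separator) and MSH-2 (four encoding characters)."""
    if not msh_segment.startswith("MSH") or len(msh_segment) < 8:
        raise ValueError("not a usable MSH segment")
    field = msh_segment[3]
    component, repetition, escape, subcomponent = msh_segment[4:8]
    return Delimiters(field, component, repetition, escape, subcomponent)


def unescape(value: str, d: Delimiters) -> str:
    """Replace delimiter escape sequences such as <esc>T<esc> with the literal character."""
    mapping = {"F": d.field, "S": d.component, "T": d.subcomponent,
               "R": d.repetition, "E": d.escape}
    out = []
    i = 0
    while i < len(value):
        is_sequence = (
            value[i] == d.escape
            and i + 2 < len(value)
            and value[i + 1] in mapping
            and value[i + 2] == d.escape
        )
        if is_sequence:
            out.append(mapping[value[i + 1]])
            i += 3
        else:
            out.append(value[i])
            i += 1
    return "".join(out)


def split_obx5(value: str, d: Delimiters) -> list[str]:
    """Split a repeated OBX-5 value on the repetition separator, then unescape each part."""
    return [unescape(part, d) for part in value.split(d.repetition)]


# A message that declares '#' as component and '*' as repetition separator.
delims = read_delimiters("MSH|#*\\&|LAB")
print(split_obx5("Cyst 5 mm*Duct \\T\\ tail", delims))
```

Splitting happens before unescaping so that an escaped repetition character inside a value is not mistaken for a real separator.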
Top-level documentation now describes the implemented research platform instead of the earlier scaffold-era state
Deployment and API docs now call out cross-actor visibility for non-site failed import runs with empty imported_sites
make validate-strict now includes web lint alongside Python checks, API tests, evaluation checks, and the web build so local and hosted validation stay aligned
The validation and deployment docs now distinguish between the hosted automation (base plus report-path and structured adapter site-rejection targets) and the broader manual overlay smoke matrix
Hosted pilot smoke jobs now generate machine-readable and Markdown summary artifacts from pilot-smoke.log, so the first green attachment-backed FHIR run can be recorded without manual log scraping
Hosted pilot smoke summaries now also preserve absolute smoke start and finish timestamps, and downloaded artifact summaries can be consolidated into a single hosted evidence bundle for release and handoff capture
Hosted pilot smoke summaries now include smoke duration and exit code, and the recorded 2026-03-25 hosted HL7 trial keeps HL7 success-path coverage manual-only in the default matrix by explicit decision
The bind-mounted web service in docker-compose.yml now preserves /app/node_modules, so local and GitHub-hosted pilot stacks retain the Next.js CLI installed during image build
The published demo benchmark snapshot now includes dataset coverage, top-k queue previews, benchmark buckets, expected rationale cues, and reviewer-facing casebook notes instead of only aggregate metric tables
The checked-in demo benchmark corpus now covers 10 labeled reports across 5 recurring benchmark buckets, so /proof and the published artifacts show wording variance, pancreatitis confounders, secondary-sign clusters, and follow-up-only cyst surveillance cases instead of only a minimal five-case set
The public external benchmark starter pack now carries the same reviewer-facing fields as the demo casebook, and scripts/run_external_eval.py now emits dataset coverage, top-k queue previews, and a per-case reviewer casebook for collaborator-supplied labels plus predictions
The repo now includes a checked-in deidentified retrospective-style external benchmark sample with reproducible JSON, Markdown, and submission artifacts, optional report_excerpt plus cohort support, and an expanded 12-report multi-cohort casebook for less-synthetic external review
The public /proof surface now renders that checked-in retrospective-style sample alongside the demo comparison, including cohort coverage and cohort-aware queue labels, so outside collaborators can inspect both the synthetic casebook and the less-synthetic external sample in one reproducible web view
FHIR DiagnosticReport imports now merge multiple supported presentedForm attachments in order and suppress shorter overlapping fragments when a richer narrative attachment already contains them
FHIR DiagnosticReport imports now also decode supported presentedForm.url attachments when they resolve to bundled or contained FHIR Binary resources
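The ordered merge with overlap suppression for presentedForm attachments can be sketched as follows. This is an assumption-laden illustration, not the project's importer: the `TEXT_TYPES` set and the `decode_attachment` and `merge_attachments` names are hypothetical, only inline base64 `data` is handled, and XHTML markup is not stripped.

```python
import base64
from typing import Optional

# Assumed set of text-like content types; the project's actual list may differ.
TEXT_TYPES = {"text/plain", "text/html", "application/xhtml+xml"}


def decode_attachment(attachment: dict) -> Optional[str]:
    """Decode an inline base64 `data` payload when the content type is text-like."""
    content_type = attachment.get("contentType", "").split(";")[0].strip().lower()
    data = attachment.get("data")
    if content_type not in TEXT_TYPES or not data:
        return None
    return base64.b64decode(data).decode("utf-8", errors="replace").strip()


def merge_attachments(attachments: list[dict]) -> str:
    """Join decoded attachments in order, dropping any text already contained
    in a longer attachment (or duplicated by an earlier identical one)."""
    texts = [t for t in (decode_attachment(a) for a in attachments) if t]
    kept = []
    for i, text in enumerate(texts):
        contained = any(
            j != i and text in other and (len(other) > len(text) or j < i)
            for j, other in enumerate(texts)
        )
        if not contained:
            kept.append(text)
    return "\n\n".join(kept)


def make_attachment(text: str, content_type: str = "text/plain") -> dict:
    """Build a FHIR-style Attachment dict for the example."""
    return {"contentType": content_type,
            "data": base64.b64encode(text.encode("utf-8")).decode("ascii")}


forms = [
    make_attachment("Pancreatic duct dilated."),
    make_attachment("Findings: Pancreatic duct dilated. No mass."),
    make_attachment("ignored", "application/pdf"),
]
print(merge_attachments(forms))
```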
FHIR DiagnosticReport imports now expand referenced Observation.component findings into report text so structured component-level pancreatic findings are preserved for triage
FHIR DiagnosticReport imports now also expand grouped referenced Observation.hasMember findings into report text and fall back to conclusionCode text when a report omits free-text conclusion
FHIR DiagnosticReport imports now also preserve referenced Observation.interpretation and referenceRange context for reviewer-visible pancreatic measurements, while avoiding duplicate or cyclic grouped-member expansion
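The duplicate-safe and cycle-safe expansion of grouped Observation.hasMember references can be sketched with a visited set. The `expand_observation` function and the in-memory `observations` index are hypothetical, and only `code.text` plus `valueString` are rendered here; the project's importer may resolve and format members differently.

```python
from typing import Optional


def expand_observation(obs_id: str, observations: dict[str, dict],
                       seen: Optional[set] = None) -> list[str]:
    """Depth-first expansion of an Observation and its hasMember references into text lines."""
    if seen is None:
        seen = set()
    if obs_id in seen or obs_id not in observations:
        # Already emitted (duplicate or cycle) or not resolvable: contribute nothing.
        return []
    seen.add(obs_id)
    obs = observations[obs_id]
    lines = []
    label = obs.get("code", {}).get("text")
    value = obs.get("valueString")
    if label and value:
        lines.append(f"{label}: {value}")
    for member in obs.get("hasMember", []):
        # References look like "Observation/<id>"; keep only the id part.
        member_id = member.get("reference", "").split("/")[-1]
        lines.extend(expand_observation(member_id, observations, seen))
    return lines


# Hypothetical index of Observations keyed by id; "duct" points back at "panel".
observations = {
    "panel": {"code": {"text": "Pancreas panel"},
              "hasMember": [{"reference": "Observation/duct"},
                            {"reference": "Observation/cyst"},
                            {"reference": "Observation/duct"}]},
    "duct": {"code": {"text": "Main duct"}, "valueString": "dilated to 5 mm",
             "hasMember": [{"reference": "Observation/panel"}]},
    "cyst": {"code": {"text": "Cyst"}, "valueString": "12 mm, tail"},
}
print(expand_observation("panel", observations))
```

Sharing one `seen` set across the whole traversal handles both cases at once: a member listed twice is emitted once, and a member that references its own group does not recurse forever.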

Validated

make validate-strict passed on 2026-03-25 with 9 pass, 0 warn, 0 fail
API validation reported 139 passed on 2026-03-25
apps/api/.venv/bin/python -m pytest apps/api/tests/test_imports.py -q passed on 2026-03-25 with 34 passed, including split, overlapping, bundled or contained Binary-backed FHIR presentedForm, grouped and cycle-safe Observation.hasMember, conclusionCode, measurement interpretation plus referenceRange, and Observation.component coverage
apps/api/.venv/bin/python -m pytest apps/api/tests/test_summarize_pilot_smoke.py apps/api/tests/test_smoke_proxy_auth.py -q passed on 2026-03-25 with 23 passed, covering the hosted smoke summary parser plus the attachment-backed FHIR smoke fixture shape
apps/api/.venv/bin/python -m pytest apps/api/tests/test_summarize_pilot_smoke.py apps/api/tests/test_build_pilot_smoke_evidence.py -q passed on 2026-03-25, covering timestamped hosted smoke summaries plus bundled evidence generation from downloaded hosted artifacts
apps/api/.venv/bin/python -m pytest apps/api/tests/test_smoke_proxy_auth.py -q passed on 2026-03-25 with 21 passed, covering the attachment-backed FHIR smoke fixture shape
apps/api/.venv/bin/python -m pytest apps/api/tests/test_demo_benchmark_snapshot.py apps/api/tests/test_exports_metrics.py -q passed on 2026-03-25 with 11 passed, covering the casebook-backed demo benchmark snapshot plus richer evaluation-case payload fields
apps/api/.venv/bin/python -m pytest apps/api/tests/test_external_evaluation.py -q passed on 2026-03-25 with 3 passed, covering the richer external benchmark template plus casebook-shaped external bundle outputs
apps/api/.venv/bin/python -m pytest apps/api/tests/test_external_evaluation.py apps/api/tests/test_benchmark_submission.py -q passed on 2026-03-25 with 9 passed, covering the multi-cohort retrospective-style sample pack, optional report excerpts plus cohorts, and the richer external benchmark template plus casebook-shaped bundle outputs
make benchmark-external-sample and make refresh-external-sample-proof both wrote reproducible multi-cohort retrospective-style benchmark artifacts on 2026-03-25
Unsandboxed local reruns of make pilot-proxy-demo-fhir-smoke and make pilot-header-demo-fhir-smoke passed end to end on 2026-03-25, recording persisted import runs 50 and 51 plus visible-case and reviewer round-trip verification
GitHub-hosted Pilot Smoke (fhir-success-only) run #23563902873 passed on 2026-03-25 with summary artifacts pilot-smoke-summary-header-demo-fhir-smoke and pilot-smoke-summary-proxy-demo-fhir-smoke, both showing one visible case and a reviewer round-trip
GitHub-hosted Pilot Smoke (hl7-success-only) run #23564057337 passed on 2026-03-25 with summary artifacts pilot-smoke-summary-header-demo-hl7-smoke and pilot-smoke-summary-proxy-demo-hl7-smoke, both showing one visible case and a reviewer round-trip
make pilot-smoke-evidence SUMMARY_DIR=... HL7_DECISION=keep-manual produced a complete hosted evidence bundle on 2026-03-25 and recorded the explicit decision to keep HL7 manual-only in the default hosted matrix to limit recurring runtime and maintenance cost
make refresh-demo-proof refreshed the checked-in casebook-backed benchmark proof snapshot in docs/examples/ on 2026-03-25 local time, expanding the published corpus to 10 labeled reports across 5 benchmark buckets with hybrid-only rescues for follow-up-only cases C-005 and C-008
make benchmark-external LABELS=docs/examples/benchmark-label-template.jsonl PREDICTIONS=docs/examples/benchmark-prediction-template.jsonl OUT_DIR=/tmp/pancreatic-signal-external-eval BASENAME=template-external TOP_K=2 wrote JSON, Markdown, and submission-draft artifacts on 2026-03-24
make validate-benchmark-submission SUBMISSION=/tmp/pancreatic-signal-external-eval/template-external-submission.json passed on 2026-03-24
apps/api/.venv/bin/python -m pytest apps/api/tests/test_imports.py apps/api/tests/test_import_runs.py -q passed on 2026-03-24 with 46 passed, including inline Reference.identifier FHIR coverage, attachment-backed presentedForm decoding and audit coverage, plus custom MSH-2 HL7 delimiter, escape-sequence, subcomponent, and audit-visibility coverage
Local reruns of make pilot-proxy-demo-smoke and make pilot-header-demo-smoke passed on 2026-03-20 local time with persisted run IDs 40 and 41, matching the checked-in hosted workflow targets
Local reruns of make pilot-proxy-demo-site-rejection-smoke and make pilot-header-demo-site-rejection-smoke passed on 2026-03-22 local time with persisted run IDs 42 and 43, matching the newly hosted failure-path targets
Local reruns of make pilot-proxy-demo-adapter-site-rejection-smoke and make pilot-header-demo-adapter-site-rejection-smoke passed on 2026-03-22 local time with persisted run IDs 44, 45, 46, and 47, matching the newly hosted structured failure-path targets
Live structured adapter site-scope rejection and audit-visibility smoke runs passed for both trusted-proxy and header-auth overlays and are recorded in docs/CODEX_HANDOFF.md

0.9.0-preview - 2026-03-20

This preview marker captures the repo state before a formal 1.0 release process begins.

Added

Deterministic pancreatic triage with evidence spans, rationale codes, persistence, exports, and reproducible evaluation
Reviewer workflow with worklist filters, case detail, review actions, hybrid prioritization, feedback capture, and research-safe views
FHIR DiagnosticReport, HL7 ORU, and generic report import paths with structured provenance metadata
Import-run audit summaries and detail routes with stable failure buckets and visibility controls
Env-driven field-preference overrides for selected FHIR and HL7 metadata extraction
Dedicated /imports web workspace for file uploads, structured submissions, and audit inspection
Pilot-ready Docker overlays for trusted-proxy and header-auth demos
Live smoke coverage for report, FHIR, and HL7 import success paths
Live smoke coverage for site-scope rejection, parse or validation failure, adapter failure, audit denial, shared visibility, and structured shared visibility

Changed

The proxy pilot demo now defaults to an import-capable navigator identity so /imports and pilot smoke paths exercise the same capability class
Top-level handoff documentation now reflects late Phase 6 hardening rather than early MVP bootstrap work

Validated

make validate-strict passed on 2026-03-20 with 8 pass, 0 warn, 0 fail
API validation reported 90 passed
Live pilot smoke runs passed for both trusted-proxy and header-auth overlays across the documented success and failure paths recorded in docs/CODEX_HANDOFF.md
