Changelog

All notable project-level changes should be documented in this file.

This repository is still pre-release, but the goal is to keep the path to a research-first 1.0 understandable for maintainers, collaborators, and downstream evaluators.

Unreleased

0.10.0 - 2026-06-11

Fixed

  • Pilot Smoke CI: the web Docker image build failed with an npm ERESOLVE conflict because @playwright/test 1.48.2 was added to package.json without regenerating package-lock.json, so every scheduled hosted smoke run since 2026-04-25 failed at the overlay boot step. @playwright/test is now 1.60.0, the lockfile is regenerated in sync, and the web Dockerfile uses npm ci so lockfile drift fails loudly at build time instead of silently re-resolving.
  • GitHub Actions in all workflows bumped to Node 24-ready majors (checkout@v6, setup-python@v6, setup-node@v6, upload-artifact@v7, cache@v5) ahead of the 2026-06-16 forced Node 24 migration.

Added

  • Open-source readiness scaffolding: CITATION.cff, issue templates (bug, feature, use-case interest), PR template, dependabot config, product screenshots in the README, audience one-pagers under docs/outreach/, and a go-public checklist in docs/GO_PUBLIC_CHECKLIST.md. SECURITY.md and CODE_OF_CONDUCT.md now point at GitHub private vulnerability reporting instead of an unnamed private channel.
  • Pancreatic Signal v2 audit fixes:
    • Sentence-bounded negation in apps/api/app/services/triage_engine.py (replaces ad-hoc 60-char regex windows so negation cannot leak across sentences)
    • Persisted structured FindingRecord rows from the triage engine in addition to the flattened evidence summary
    • Externalized additive scoring into a scoring block in data/ontologies/pancreatic_signal_rules.json backed by a new OntologyConfig schema (extra="forbid")
    • Store implementation moved to apps/api/app/store/case_store.py with a thin re-export shim retained at memory_store.py
    • docs/API_SPEC.md updated for CSV/JSON/JSONL imports and /settings/rules labelled future
    • Playwright e2e smoke under apps/web/tests/e2e/ for /proof, /cases, case detail review, and /imports, wired into make web-e2e and CI
  • Pancreatic Signal v2 autoresearch lab subsystem inspired by karpathy/autoresearch
  • Repo-level contributor, security, and community docs in CONTRIBUTING.md, SECURITY.md, and CODE_OF_CONDUCT.md
  • A release-facing runbook in docs/RELEASE_RUNBOOK.md plus a release notes scaffold in docs/RELEASE_NOTES_TEMPLATE.md so maintainers can move from validation to hosted smoke evidence capture without private context
  • A clearer public project narrative in README.md and docs/OPEN_SOURCE_STRATEGY.md
  • A release-readiness checklist in docs/RELEASE_READINESS.md
  • An adoption-facing quickstart in docs/QUICKSTART.md plus a checked-in published benchmark snapshot in docs/examples/demo-benchmark-current.md
  • A public benchmark pack with docs/LABELING_GUIDE.md, docs/BENCHMARK_SUBMISSIONS.md, validated submission templates, and a benchmark submission validator
  • A comparable external evaluation bundle writer in scripts/run_external_eval.py, a make benchmark-external entrypoint, and a checked-in prediction template for outside collaborators
  • GitHub Actions pull request and main validation via .github/workflows/validate.yml using make validate-strict
  • A hosted pilot smoke workflow in .github/workflows/pilot-smoke.yml that reuses the base and FHIR smoke targets for the proxy and header overlays, plus the report-path and structured adapter site-rejection variants, on manual dispatch and a weekly schedule
  • A narrower manual-dispatch path in .github/workflows/pilot-smoke.yml for hosted attachment-backed FHIR confirmation, plus uploaded per-job smoke log artifacts for later run auditing
  • A structured hosted smoke summary generator in scripts/summarize_pilot_smoke.py plus workflow-uploaded JSON and Markdown summary artifacts for later handoff capture
  • A bundled hosted smoke evidence builder in scripts/build_pilot_smoke_evidence.py plus make pilot-smoke-evidence so downloaded hosted summary artifacts can be consolidated into release-ready JSON and Markdown evidence
  • A manual-only hl7-success-only dispatch path in .github/workflows/pilot-smoke.yml so hosted HL7 success-path trials can be recorded without widening the default weekly matrix
  • Live failed-run shared-visibility smoke coverage for a persisted non-site validation_error import run in the proxy and header pilot overlays
  • Live structured adapter failed-run shared-visibility smoke coverage for persisted non-site FHIR unsupported_payload and HL7 parse_error runs in the proxy and header pilot overlays
  • Live structured adapter site-scope rejection smoke coverage for persisted FHIR and HL7 site_scope_rejection runs in the proxy and header pilot overlays
  • Live structured adapter audit-visibility smoke coverage for persisted FHIR and HL7 site_scope_rejection runs in the proxy and header pilot overlays
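
The sentence-bounded negation item above can be illustrated with a minimal Python sketch. This is not the code in apps/api/app/services/triage_engine.py: the cue list, the naive sentence splitter, and the names split_sentences and find_mentions are all assumptions made for illustration. It only shows why bounding the search to one sentence stops a cue such as "no" from negating a finding in the next sentence, which a fixed character window can do.

```python
import re

# Hypothetical cue list for illustration; the project's real rules are not shown here.
NEGATION_CUES = ("no", "without", "negative for")

# Naive sentence splitter: break on whitespace that follows ., ! or ?
# (abbreviations such as "e.g." would need extra handling).
_SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    return [s for s in _SENTENCE_BREAK.split(text.strip()) if s]


def find_mentions(text: str, term: str) -> list[tuple[str, bool]]:
    """Return (sentence, negated) for each sentence that mentions term."""
    term_re = re.compile(re.escape(term), re.IGNORECASE)
    mentions = []
    for sentence in split_sentences(text):
        match = term_re.search(sentence)
        if match is None:
            continue
        # Only text earlier in the same sentence can negate the term.
        prefix = sentence[: match.start()]
        negated = any(
            re.search(rf"\b{re.escape(cue)}\b", prefix, re.IGNORECASE)
            for cue in NEGATION_CUES
        )
        mentions.append((sentence, negated))
    return mentions
```

For the text "No pancreatic mass. There is ductal dilation.", find_mentions reports "pancreatic mass" as negated and "ductal dilation" as not negated, because the cue sits in the previous sentence.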

Changed

  • The Next.js home and about surfaces now present Pancreatic Signal as a benchmarkable product entrypoint instead of a bare scaffold shell, and the web app now exposes a dedicated /proof page backed by the checked-in benchmark snapshots
  • Benchmark-oriented contribution paths now have a documented label schema, stable error-bucket rubric, and a machine-validated submission format for outside collaborators
  • Outside collaborators can now turn label and prediction JSONL files into JSON, Markdown, and submission-draft artifacts without hand-assembling benchmark metrics
  • FHIR DiagnosticReport imports now preserve patient, encounter, and accession metadata from inline Reference.identifier values when upstream bundles omit fully resolved resources
  • FHIR DiagnosticReport imports now decode supported text-like presentedForm attachments, including attachment-backed XHTML narratives, and keep unsectioned attachment findings merged with conclusion instead of dropping the conclusion text
  • The proxy and header FHIR pilot smoke fixtures now submit attachment-backed DiagnosticReport.presentedForm XHTML narratives, so deployable smoke coverage exercises the supported text-like attachment decode path instead of only an Observation-backed structured result
  • HL7 ORU imports now decode base64 ED report text, normalize repeated OBX-5 values, respect custom MSH-2 component and repetition separators, normalize common HL7 escape sequences, and clean composite metadata fields with subcomponent-aware extraction before triage
  • Top-level documentation now describes the implemented research platform instead of the earlier scaffold-era state
  • Deployment and API docs now call out cross-actor visibility for non-site failed import runs with empty imported_sites
  • make validate-strict now includes web lint alongside Python checks, API tests, evaluation checks, and the web build so local and hosted validation stay aligned
  • The validation and deployment docs now distinguish between the hosted automation (base, report-path, and structured adapter site-rejection targets) and the broader manual overlay smoke matrix
  • Hosted pilot smoke jobs now generate machine-readable and Markdown summary artifacts from pilot-smoke.log, so the first green attachment-backed FHIR run can be recorded without manual log scraping
  • Hosted pilot smoke summaries now also preserve absolute smoke start and finish timestamps, and downloaded artifact summaries can be consolidated into a single hosted evidence bundle for release and handoff capture
  • Hosted pilot smoke summaries now include smoke duration and exit code, and following the recorded 2026-03-25 hosted HL7 trial, HL7 success-path coverage stays manual-only and out of the default matrix by explicit decision
  • The bind-mounted web service in docker-compose.yml now preserves /app/node_modules, so local and GitHub-hosted pilot stacks retain the Next.js CLI installed during image build
  • The published demo benchmark snapshot now includes dataset coverage, top-k queue previews, benchmark buckets, expected rationale cues, and reviewer-facing casebook notes instead of only aggregate metric tables
  • The checked-in demo benchmark corpus now covers 10 labeled reports across 5 recurring benchmark buckets, so /proof and the published artifacts show wording variance, pancreatitis confounders, secondary-sign clusters, and follow-up-only cyst surveillance cases instead of only a minimal five-case set
  • The public external benchmark starter pack now carries the same reviewer-facing fields as the demo casebook, and scripts/run_external_eval.py now emits dataset coverage, top-k queue previews, and a per-case reviewer casebook for collaborator-supplied labels plus predictions
  • The repo now includes a checked-in deidentified retrospective-style external benchmark sample with reproducible JSON, Markdown, and submission artifacts, optional report_excerpt plus cohort support, and an expanded 12-report multi-cohort casebook for less-synthetic external review
  • The public /proof surface now renders that checked-in retrospective-style sample alongside the demo comparison, including cohort coverage and cohort-aware queue labels, so outside collaborators can inspect both the synthetic casebook and the less-synthetic external sample in one reproducible web view
  • FHIR DiagnosticReport imports now merge multiple supported presentedForm attachments in order and suppress shorter overlapping fragments when a richer narrative attachment already contains them
  • FHIR DiagnosticReport imports now also decode supported presentedForm.url attachments when they resolve to bundled or contained FHIR Binary resources
  • FHIR DiagnosticReport imports now expand referenced Observation.component findings into report text so structured component-level pancreatic findings are preserved for triage
  • FHIR DiagnosticReport imports now also expand grouped referenced Observation.hasMember findings into report text and fall back to conclusionCode text when a report omits free-text conclusion
  • FHIR DiagnosticReport imports now also preserve referenced Observation.interpretation and referenceRange context for reviewer-visible pancreatic measurements, while avoiding duplicate or cyclic grouped-member expansion
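
The HL7 ORU item above mentions respecting custom MSH-2 separators. A minimal Python sketch of that idea follows. It is not the project's importer: Hl7Delimiters, read_delimiters, and split_obx5 are hypothetical names, and escape-sequence and subcomponent handling are left out. It relies only on the HL7 v2 convention that the character after "MSH" is the field separator (MSH-1) and the next four characters (MSH-2) are the component, repetition, escape, and subcomponent separators.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hl7Delimiters:
    field: str
    component: str
    repetition: str
    escape: str
    subcomponent: str


def read_delimiters(message: str) -> Hl7Delimiters:
    """Read MSH-1 and MSH-2 instead of assuming the default |^~\\& set."""
    if not message.startswith("MSH") or len(message) < 8:
        raise ValueError("message does not start with a complete MSH header")
    field = message[3]
    component, repetition, escape, subcomponent = message[4:8]
    return Hl7Delimiters(field, component, repetition, escape, subcomponent)


def split_obx5(segment: str, d: Hl7Delimiters) -> list[list[str]]:
    """Split OBX-5 into repetitions, each a list of components."""
    fields = segment.split(d.field)
    # fields[0] is the segment name, so OBX-5 is fields[5].
    if fields[0] != "OBX" or len(fields) <= 5:
        return []
    return [rep.split(d.component) for rep in fields[5].split(d.repetition)]
```

With a header such as MSH#$*\@#LAB, the sketch reads "#" as the field separator and "*" as the repetition separator, so an OBX-5 value of "line one*line two" yields two repetitions instead of one unsplit string.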

Validated

  • make validate-strict passed on 2026-03-25 with 9 pass, 0 warn, 0 fail
  • API validation reported 139 passed on 2026-03-25
  • apps/api/.venv/bin/python -m pytest apps/api/tests/test_imports.py -q passed on 2026-03-25 with 34 passed, including split, overlapping, bundled or contained Binary-backed FHIR presentedForm, grouped and cycle-safe Observation.hasMember, conclusionCode, measurement interpretation plus referenceRange, and Observation.component coverage
  • apps/api/.venv/bin/python -m pytest apps/api/tests/test_summarize_pilot_smoke.py apps/api/tests/test_smoke_proxy_auth.py -q passed on 2026-03-25 with 23 passed, covering the hosted smoke summary parser plus the attachment-backed FHIR smoke fixture shape
  • apps/api/.venv/bin/python -m pytest apps/api/tests/test_summarize_pilot_smoke.py apps/api/tests/test_build_pilot_smoke_evidence.py -q passed on 2026-03-25, covering timestamped hosted smoke summaries plus bundled evidence generation from downloaded hosted artifacts
  • apps/api/.venv/bin/python -m pytest apps/api/tests/test_smoke_proxy_auth.py -q passed on 2026-03-25 with 21 passed, covering the attachment-backed FHIR smoke fixture shape
  • apps/api/.venv/bin/python -m pytest apps/api/tests/test_demo_benchmark_snapshot.py apps/api/tests/test_exports_metrics.py -q passed on 2026-03-25 with 11 passed, covering the casebook-backed demo benchmark snapshot plus richer evaluation-case payload fields
  • apps/api/.venv/bin/python -m pytest apps/api/tests/test_external_evaluation.py -q passed on 2026-03-25 with 3 passed, covering the richer external benchmark template plus casebook-shaped external bundle outputs
  • apps/api/.venv/bin/python -m pytest apps/api/tests/test_external_evaluation.py apps/api/tests/test_benchmark_submission.py -q passed on 2026-03-25 with 9 passed, covering the multi-cohort retrospective-style sample pack, optional report excerpts plus cohorts, and the richer external benchmark template plus casebook-shaped bundle outputs
  • make benchmark-external-sample and make refresh-external-sample-proof both wrote reproducible multi-cohort retrospective-style benchmark artifacts on 2026-03-25
  • Unsandboxed local reruns of make pilot-proxy-demo-fhir-smoke and make pilot-header-demo-fhir-smoke passed end to end on 2026-03-25, recording persisted import runs 50 and 51 plus visible-case and reviewer round-trip verification
  • GitHub-hosted Pilot Smoke (fhir-success-only) run #23563902873 passed on 2026-03-25 with summary artifacts pilot-smoke-summary-header-demo-fhir-smoke and pilot-smoke-summary-proxy-demo-fhir-smoke, both showing one visible case and a reviewer round-trip
  • GitHub-hosted Pilot Smoke (hl7-success-only) run #23564057337 passed on 2026-03-25 with summary artifacts pilot-smoke-summary-header-demo-hl7-smoke and pilot-smoke-summary-proxy-demo-hl7-smoke, both showing one visible case and a reviewer round-trip
  • make pilot-smoke-evidence SUMMARY_DIR=... HL7_DECISION=keep-manual produced a complete hosted evidence bundle on 2026-03-25 and recorded the explicit decision to keep HL7 manual-only in the default hosted matrix to limit recurring runtime and maintenance cost
  • make refresh-demo-proof refreshed the checked-in casebook-backed benchmark proof snapshot in docs/examples/ on 2026-03-25 local time, expanding the published corpus to 10 labeled reports across 5 benchmark buckets with hybrid-only rescues for follow-up-only cases C-005 and C-008
  • make benchmark-external LABELS=docs/examples/benchmark-label-template.jsonl PREDICTIONS=docs/examples/benchmark-prediction-template.jsonl OUT_DIR=/tmp/pancreatic-signal-external-eval BASENAME=template-external TOP_K=2 wrote JSON, Markdown, and submission-draft artifacts on 2026-03-24
  • make validate-benchmark-submission SUBMISSION=/tmp/pancreatic-signal-external-eval/template-external-submission.json passed on 2026-03-24
  • apps/api/.venv/bin/python -m pytest apps/api/tests/test_imports.py apps/api/tests/test_import_runs.py -q passed on 2026-03-24 with 46 passed, including inline Reference.identifier FHIR coverage, attachment-backed presentedForm decoding and audit coverage, plus custom MSH-2 HL7 delimiter, escape-sequence, subcomponent, and audit-visibility coverage
  • Local reruns of make pilot-proxy-demo-smoke and make pilot-header-demo-smoke passed on 2026-03-20 local time with persisted run IDs 40 and 41, matching the checked-in hosted workflow targets
  • Local reruns of make pilot-proxy-demo-site-rejection-smoke and make pilot-header-demo-site-rejection-smoke passed on 2026-03-22 local time with persisted run IDs 42 and 43, matching the newly hosted failure-path targets
  • Local reruns of make pilot-proxy-demo-adapter-site-rejection-smoke and make pilot-header-demo-adapter-site-rejection-smoke passed on 2026-03-22 local time with persisted run IDs 44, 45, 46, and 47, matching the newly hosted structured failure-path targets
  • Live structured adapter site-scope rejection and audit-visibility smoke runs passed for both trusted-proxy and header-auth overlays and are recorded in docs/CODEX_HANDOFF.md

0.9.0-preview - 2026-03-20

This preview marker captures the repo state before a formal 1.0 release process begins.

Added

  • Deterministic pancreatic triage with evidence spans, rationale codes, persistence, exports, and reproducible evaluation
  • Reviewer workflow with worklist filters, case detail, review actions, hybrid prioritization, feedback capture, and research-safe views
  • FHIR DiagnosticReport, HL7 ORU, and generic report import paths with structured provenance metadata
  • Import-run audit summaries and detail routes with stable failure buckets and visibility controls
  • Env-driven field-preference overrides for selected FHIR and HL7 metadata extraction
  • Dedicated /imports web workspace for file uploads, structured submissions, and audit inspection
  • Pilot-ready Docker overlays for trusted-proxy and header-auth demos
  • Live smoke coverage for report, FHIR, and HL7 import success paths
  • Live smoke coverage for site-scope rejection, parse or validation failure, adapter failure, audit denial, shared visibility, and structured shared visibility

Changed

  • The proxy pilot demo now defaults to an import-capable navigator identity so /imports and pilot smoke paths exercise the same capability class
  • Top-level handoff documentation now reflects late Phase 6 hardening rather than early MVP bootstrap work

Validated

  • make validate-strict passed on 2026-03-20 with 8 pass, 0 warn, 0 fail
  • API validation reported 90 passed
  • Live pilot smoke runs passed for both trusted-proxy and header-auth overlays across the documented success and failure paths recorded in docs/CODEX_HANDOFF.md