Commit 6df3787

Merge pull request #2 from antonkulaga/nightscout
Nightscout data from server import + csv converter support
2 parents 36506e4 + 9c99cf1 commit 6df3787

45 files changed

Lines changed: 58756 additions & 245 deletions

.github/workflows/test.yml

Lines changed: 4 additions & 0 deletions
@@ -9,6 +9,7 @@ on:
 jobs:
   test:
     runs-on: ubuntu-latest
+    environment: test
     strategy:
       matrix:
         python-version: ["3.10", "3.11", "3.12", "3.13", "3.14"]
@@ -29,5 +30,8 @@ jobs:
           uv sync --extra dev

       - name: Run tests
+        env:
+          NIGHTSCOUT_URL: ${{ vars.NIGHTSCOUT_URL }}
+          NIGHTSCOUT_TOKEN: ${{ secrets.NIGHTSCOUT_TOKEN }}
         run: |
           uv run pytest tests/ -v

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ __pycache__/
 # C extensions
 *.so

-data/*
+# data/ rules live in data/.gitignore (ignore-all with explicit allowlist).

 # Distribution / packaging
 .Python

AGENTS.md

Lines changed: 57 additions & 6 deletions
@@ -1,18 +1,19 @@
 ## Project overview

-`cgm-format` is a Python library for converting vendor-specific Continuous Glucose Monitoring (CGM) data (Dexcom, Libre) into a standardized unified format for ML training and inference.
+`cgm-format` is a Python library for converting vendor-specific Continuous Glucose Monitoring (CGM) data (Dexcom, Libre, Medtronic, Nightscout) into a standardized unified format for ML training and inference.

 The package lives under `src/cgm_format/` (PEP 517 src layout). The two main public classes are:
 - `FormatParser` (`format_parser.py`) — Stages 1–3: decode raw bytes, detect vendor format, parse to unified Polars DataFrame.
 - `FormatProcessor` (`format_processor.py`) — Stages 4–6: sequence detection, gap interpolation, timestamp synchronization, inference preparation, data-only export.

 Supporting modules:
 - `formats/unified.py` — `UnifiedEventType`, `Quality` flags, `CGM_SCHEMA` (the canonical `CGMSchemaDefinition`).
-- `formats/dexcom.py`, `formats/libre.py` — vendor-specific column enums, detection patterns, schemas.
+- `formats/dexcom.py`, `formats/libre.py`, `formats/medtronic.py`, `formats/nightscout.py` — vendor-specific column enums, detection patterns, schemas.
 - `formats/supported.py` — `FORMAT_DETECTION_PATTERNS`, `SCHEMA_MAP`, `KNOWN_ISSUES_TO_SUPPRESS`.
 - `interface/cgm_interface.py` — abstract base classes `CGMParser` / `CGMProcessor`, all exception types, `ProcessingWarning`, constants.
 - `interface/schema.py` — `CGMSchemaDefinition`, `ColumnSchema`, `EnumLiteral`, Frictionless export helpers.
 - `cgm_cli.py` — Typer CLI entry-point (`cgm-cli`).
+- `nightscout_downloader.py` — `download_nightscout()` helper using `httpx` (JSON-only download, supports `token` and `api_secret` auth).

 GitHub repo: GlucoseDAO/cgm_format

@@ -21,7 +22,7 @@ GitHub repo: GlucoseDAO/cgm_format
 `uv` is used as the package manager. **Always run commands via `uv run`** — never use bare `pytest`, `python`, or `cgm-cli` directly. The project uses a src layout with hatchling; the package and its dependencies (including `polars`) are only available inside the uv-managed virtual environment. Running bare `pytest` picks up the system Python, which does not have the project installed, and fails with `ModuleNotFoundError`.

 ```bash
-uv sync --extra dev # FIRST: install/sync all dependencies (dev includes pytest, typer, rich, pandas, pyarrow, frictionless)
+uv sync --extra dev # FIRST: install/sync all dependencies (dev includes cli + pytest)
 uv run pytest # run the full test suite
 uv run pytest tests/test_format_parser.py # run a specific test file
 uv run cgm-cli --help # explore the CLI
@@ -32,7 +33,7 @@ uv run cgm-cli pipeline <file> # run full 6-stage pipeline

 If tests fail with `ModuleNotFoundError: No module named 'polars'` or `No module named 'cgm_format'`, run `uv sync --extra dev` first.

-Tests are **integration tests that use real data** in `data/` — do not mock unless absolutely required.
+Tests are **integration tests that use real data** in `data/input/` — do not mock unless absolutely required.

 ## Code style guidelines

@@ -78,6 +79,30 @@ The canonical output is a Polars DataFrame conforming to `CGM_SCHEMA` (`formats/
 5. Export new public symbols from `src/cgm_format/__init__.py` and add to `__all__`.
 6. Add real-data integration tests in `tests/`.

+## Gap thresholds and grid-aligned gap measurement
+
+### SMALL_GAP_MAX_MINUTES = 15 (3 intervals)
+
+The gap threshold that separates "small" (fillable) from "large" (sequence-splitting) gaps is `SMALL_GAP_MAX_MINUTES = EXPECTED_INTERVAL_MINUTES * 3 = 15` minutes. This value is aligned with the sister library [`glucose_data_processing`](https://github.com/GlucoseDAO/glucose_data_processing) which uses the same `small_gap_max_minutes=15` default.
+
+**Why a grid multiple matters:** `interpolate_gaps` uses grid-aligned gap measurement when `snap_to_grid=True` (the default). Raw timestamps are projected onto the 5-minute grid before measuring gaps, so effective gap sizes are always multiples of 5 (0, 5, 10, 15, 20, ...). A threshold that is itself a grid multiple (15) produces clean, deterministic fill/skip decisions. The previous threshold of 19 was not a grid multiple, which caused borderline instability: a raw gap of 18.7 min would round to 20 min on the grid (exceeding 19), while the same gap measured on raw timestamps would be below 19. This made `interpolate_gaps` and `synchronize_timestamps` disagree on whether to fill such gaps.
+
+### Grid-aligned gap measurement for commutativity
+
+`_interpolate_sequence` computes effective gap sizes by projecting both endpoints of each gap onto the 5-minute grid via `calculate_grid_point()`, then measuring the distance between grid positions. This ensures that `interpolate_gaps → synchronize_timestamps` and `synchronize_timestamps → interpolate_gaps` see identical gap sizes and produce identical results (commutativity). The approach is:
+
+1. For each pair of consecutive glucose readings, compute the nearest grid point for both timestamps.
+2. Measure the gap as the difference between grid positions (always a multiple of `expected_interval_minutes`).
+3. Apply the `> expected_interval_minutes` and `<= SMALL_GAP_MAX_MINUTES` thresholds to the grid-aligned gap.
+
+This is only active when `snap_to_grid=True`. When `snap_to_grid=False`, raw timestamp differences are used (no grid alignment), so commutativity with `synchronize_timestamps` is not guaranteed.
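
Reviewer sketch (not part of the commit): the grid-aligned measurement described above can be illustrated roughly as follows. `nearest_grid_point` and `grid_gap_minutes` are hypothetical stand-ins, not the library's `calculate_grid_point()`; the grid origin and the rounding rule are assumptions made for illustration only.

```python
from datetime import datetime, timedelta

EXPECTED_INTERVAL_MINUTES = 5
SMALL_GAP_MAX_MINUTES = EXPECTED_INTERVAL_MINUTES * 3  # 15, a grid multiple


def nearest_grid_point(ts: datetime, grid_start: datetime,
                       interval_minutes: int = EXPECTED_INTERVAL_MINUTES) -> datetime:
    """Hypothetical helper: snap ts to the nearest point of a fixed-interval grid.

    Uses Python's round(), so exact half-way cases go to the even step;
    the real library may break ties differently.
    """
    interval = timedelta(minutes=interval_minutes)
    steps = round((ts - grid_start) / interval)
    return grid_start + steps * interval


def grid_gap_minutes(a: datetime, b: datetime, grid_start: datetime) -> int:
    """Gap between two readings measured between their grid positions."""
    delta = nearest_grid_point(b, grid_start) - nearest_grid_point(a, grid_start)
    return int(delta.total_seconds() // 60)


# The borderline case from the text: a raw gap of 18.7 minutes.
grid_start = datetime(2024, 1, 1, 0, 0)
first = grid_start
second = grid_start + timedelta(minutes=18, seconds=42)

raw_gap = (second - first).total_seconds() / 60       # about 18.7
grid_gap = grid_gap_minutes(first, second, grid_start)  # 20

# Old threshold (19): raw and grid measurements disagree.
old_disagrees = (raw_gap <= 19) != (grid_gap <= 19)
# New threshold (15): both measurements agree the gap is too large to fill.
new_agrees = (raw_gap <= SMALL_GAP_MAX_MINUTES) == (grid_gap <= SMALL_GAP_MAX_MINUTES)
```

With the old threshold of 19 the two measurements land on opposite sides of the cutoff; with 15 they agree, which is the stability argument made in the added section.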
+
+### Comparison operators
+
+Both `cgm_format` and `glucose_data_processing` use the same operator convention:
+- **Sequence splits:** `> threshold` (strictly greater → gap AT the threshold stays in the same sequence)
+- **Interpolation fill:** `<= threshold` (less-or-equal → gap AT the threshold IS filled)
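
Reviewer sketch (not part of the commit): a minimal illustration of the operator convention, assuming gaps are already expressed in grid-aligned minutes. The function names are hypothetical and do not come from either library.

```python
EXPECTED_INTERVAL_MINUTES = 5
SMALL_GAP_MAX_MINUTES = 15


def splits_sequence(gap_minutes: float, threshold: float = SMALL_GAP_MAX_MINUTES) -> bool:
    """Strictly greater: a gap exactly at the threshold stays in the same sequence."""
    return gap_minutes > threshold


def is_filled(gap_minutes: float,
              expected: float = EXPECTED_INTERVAL_MINUTES,
              threshold: float = SMALL_GAP_MAX_MINUTES) -> bool:
    """Fill gaps larger than one interval and up to (and including) the threshold."""
    return expected < gap_minutes <= threshold


# Outcome for each grid-aligned gap size: (filled, splits)
decisions = {gap: (is_filled(gap), splits_sequence(gap)) for gap in (5, 10, 15, 20)}
```

Under this convention no gap size is both filled and split, and a 15-minute gap falls on the "filled" side.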
+
 ## Known pitfalls

 ### Encoding artifacts in vendor CSVs
@@ -104,16 +129,42 @@ df.filter((pl.col("quality") & Quality.IMPUTATION.value) != 0)

 The CLI `report` and `validate` commands use `frictionless` if available, but degrade gracefully without it. Import it inside functions (not at module level) or guard with `HAS_FRICTIONLESS`. The core `FormatParser` / `FormatProcessor` do not depend on it.
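
Reviewer sketch (not part of the commit): one common way to implement a `HAS_FRICTIONLESS` guard looks like this. `validate_with_frictionless` is a hypothetical function name, and how the real CLI reports a skipped validation is an assumption.

```python
try:
    import frictionless  # optional dependency
    HAS_FRICTIONLESS = True
except ImportError:
    frictionless = None
    HAS_FRICTIONLESS = False


def validate_with_frictionless(path: str) -> str:
    """Hypothetical helper: validate a file if frictionless is present, else skip."""
    if not HAS_FRICTIONLESS:
        # Degrade gracefully instead of failing the whole command.
        return f"skipped validation of {path}: frictionless is not installed"
    report = frictionless.validate(path)
    return "valid" if report.valid else "invalid"
```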

+### Nightscout dual-path architecture
+
+Nightscout data is supported through two parsing paths:
+
+1. **JSON API path** (primary): `FormatParser.parse_nightscout(entries_json, treatments_json)` or
+   `FormatParser.from_nightscout_exports(entries_path, treatments_path)` or
+   `FormatParser.from_nightscout_url(base_url, ...)`. Downloads entries and treatments as JSON,
+   combines glucose readings with insulin/carbs/temp basals. Supports `token` and `api_secret` auth.
+
+2. **nightscout-exporter CSV path**: Combined CSV file with `# CGM ENTRIES` and `# TREATMENTS`
+   section headers. Auto-detected by `detect_format()` and parsed via `parse_file()` /
+   `parse_from_string()`. The `_process_nightscout` dispatcher handles both JSON and CSV.
+
+The built-in Nightscout API CSV endpoints are **not supported** — entries.csv is headerless with
+only 5 columns, and treatments.csv doesn't actually serve CSV (returns JSON regardless). The
+`nightscout_entries.csv` file in `data/input/` is kept as a negative control.
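
Reviewer sketch (not part of the commit): to make the second path concrete, here is a rough illustration of splitting a combined file on the `# CGM ENTRIES` and `# TREATMENTS` section headers. It is not the library's parser. `split_sections` is a hypothetical helper, the sample column names are invented, and it assumes each marker sits alone on its own line followed by a CSV header row.

```python
import csv
import io

SECTION_MARKERS = ("# CGM ENTRIES", "# TREATMENTS")


def split_sections(text: str) -> dict[str, list[dict[str, str]]]:
    """Group lines under the most recent section marker, then parse each group as CSV."""
    raw: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in SECTION_MARKERS:
            current = stripped
            raw[current] = []
        elif current is not None and stripped:
            raw[current].append(line)
    return {
        name: list(csv.DictReader(io.StringIO("\n".join(rows))))
        for name, rows in raw.items()
    }


# Invented sample data; real exporter columns may differ.
sample = """# CGM ENTRIES
date,sgv
2024-01-01T00:00:00Z,110
2024-01-01T00:05:00Z,112

# TREATMENTS
created_at,eventType,insulin
2024-01-01T00:02:00Z,Correction Bolus,1.5
"""

sections = split_sections(sample)
```

A headerless five-column `entries.csv` from the built-in API would contain neither marker, so a check like this yields no sections for it.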
+
+### `httpx` is an optional dependency
+
+The `nightscout_downloader` module requires `httpx` for HTTP requests. It is included in the `cli` and `dev` optional dependency groups. Import it inside functions and raise a clear `ImportError` with install instructions if missing.
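
Reviewer sketch (not part of the commit): the "import inside functions" rule can be followed with a small lazy-import helper like the one below. `require_optional` and `fetch_json` are hypothetical names, and the install hint text (package name and extras syntax) is an assumption, not the project's actual message.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency on demand, with an actionable error if missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install it with: pip install 'cgm-format[{extra}]'"  # assumed install hint
        ) from exc


def fetch_json(url: str):
    """Hypothetical downloader step: httpx is imported only when a download is requested."""
    httpx = require_optional("httpx", "cli")
    response = httpx.get(url, timeout=30.0)
    response.raise_for_status()
    return response.json()
```

Because the import happens inside `fetch_json`, importing the module itself never fails when `httpx` is absent; only the download call does.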
+
 ## Learned workspace facts

 - Source layout: `src/cgm_format/` (hatchling build, `tool.hatch.build.targets.wheel.packages`).
-- Test data lives in `data/` (excluded from sdist). Tests use real files — no mocking.
+- Test data lives in `data/input/` (excluded from sdist). Tests use real files — no mocking.
 - `scripts/` contains one-off utilities (`regenerate_all_schemas.py`, scrub scripts); they are not part of the installed package.
 - `examples/` shows library usage patterns; keep them runnable as documentation.
 - The `cgm-cli` entry point is defined in `[project.scripts]` in `pyproject.toml`; the implementation is `cgm_format.cgm_cli:main`.
-- Optional dependency groups: `extra` (pandas, pyarrow, frictionless), `cli` (typer, rich + extra), `dev` (pytest + cli).
+- Optional dependency groups: `cli` (typer, rich, httpx, pandas, pyarrow, frictionless), `dev` (pytest + cli + python-dotenv).
 - `uv lock --upgrade` only updates `uv.lock`; `pyproject.toml` minimum version bounds must be bumped manually if you want to raise them.
+- `tests/conftest.py` loads `.env` via `python-dotenv` and provides a session-scoped `nightscout_data_dir` fixture that downloads Nightscout JSON data from `NIGHTSCOUT_URL` (with optional `NIGHTSCOUT_TOKEN` / `NIGHTSCOUT_API_SECRET`) into `data/input/`. Files are cached; pass `--nightscout-redownload` to force refresh.
+- `data/.gitignore` uses an ignore-all + allowlist pattern (`*` then `!input/`, `!input/**`). To commit a new top-level subdirectory under `data/`, add explicit `!<dir>/` and `!<dir>/**` entries.
+- `detect_format()` recognizes nightscout-exporter CSV (with `# CGM ENTRIES` section headers). Nightscout JSON files do **not** go through `detect_format` — use `parse_nightscout()` or `from_nightscout_exports()` instead.
+- `download_nightscout()` always fetches JSON (entries, treatments, profile). Supports `token` (query param) and `api_secret` (SHA1-hashed header) authentication.
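
Reviewer sketch (not part of the commit): the two authentication modes in the last bullet could be assembled as below. `build_auth` is a hypothetical helper rather than the signature of `download_nightscout()`, and the `api-secret` header name is an assumption based on common Nightscout usage.

```python
import hashlib
from typing import Optional


def build_auth(token: Optional[str] = None,
               api_secret: Optional[str] = None) -> tuple[dict[str, str], dict[str, str]]:
    """Return (query_params, headers) for a Nightscout request."""
    params: dict[str, str] = {}
    headers: dict[str, str] = {}
    if token:
        # Token auth travels as a query parameter.
        params["token"] = token
    if api_secret:
        # API secret is sent as its SHA1 hex digest, never in plain text.
        headers["api-secret"] = hashlib.sha1(api_secret.encode("utf-8")).hexdigest()
    return params, headers
```

The returned dictionaries can be passed to an HTTP client as `params=` and `headers=`.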

 ## Learned User Preferences

 - When upgrading dependencies (`uv lock --upgrade`), also raise the lower-bound version constraints in `pyproject.toml` to match the newly resolved versions.
+- Tests should be resilient to changing data — use `pytest.skip()` for optional data features (e.g. specific treatment types) rather than hard assertions that assume specific data content.
