fix(opentargets): adapt to upstream API drift + skip deprecated expressions field by lauraluebbert · Pull Request #256 · scverse/gget

lauraluebbert · 2026-06-26T17:21:01Z

Fixes the 11 opentargets tests failing across all Python versions on main. Three distinct upstream changes — addressed independently:

1. GraphQL schema change (real code bug) — 2 tests

RuntimeError: OpenTargets GraphQL returned HTTP 400.
Body: Field 'synonyms' of type '[DrugLabelAndSource!]!' must have a sub selection.

The Drug type's `synonyms` and `tradeNames` are now `[DrugLabelAndSource!]!` instead of scalar lists. Querying without a sub-selection 400s. Updated `QUERY_STRING_DRUGS` to query `{ label }` — the existing `_collapse_singletons()` post-processor flattens each `{label: "X"}` back to `"X"` automatically, so the public DataFrame shape is preserved (`drug.synonyms` remains `list[str]`).
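As a rough illustration of why the public shape survives the schema change, the sketch below pairs the new sub-selection with a simplified flattening step. `collapse_singletons` is a hypothetical stand-in written for this example, not gget's actual `_collapse_singletons()`, and the response values are made up.

```python
from typing import Any

# Illustrative fragment only: both fields now need a sub-selection.
DRUG_FIELDS_FRAGMENT = """
    synonyms { label }
    tradeNames { label }
"""


def collapse_singletons(value: Any) -> Any:
    """Recursively replace single-key dicts with their only value.

    Hypothetical stand-in for gget's _collapse_singletons(); the real
    helper may behave differently.
    """
    if isinstance(value, list):
        return [collapse_singletons(item) for item in value]
    if isinstance(value, dict):
        if len(value) == 1:
            (only_value,) = value.values()
            return collapse_singletons(only_value)
        return {key: collapse_singletons(item) for key, item in value.items()}
    return value


# Shape of a drug object after the schema change (made-up values).
drug = {
    "name": "EXAMPLE",
    "synonyms": [{"label": "X"}, {"label": "Y"}],
    "tradeNames": [{"label": "Z"}],
}

flattened = collapse_singletons(drug)
print(flattened["synonyms"])    # ['X', 'Y']
print(flattened["tradeNames"])  # ['Z']
```

Because each `{label: ...}` object has exactly one key, requesting only `label` is what lets the flattening step recover plain strings; selecting `source` as well would leave two-key dicts in place under this sketch.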

Confirmed via GraphQL introspection: `DrugLabelAndSource.{label, source}` are both `String!`.
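A check like this can be reproduced with a standard GraphQL `__type` introspection query. The sketch below only builds the request payload and parses a response; the `sample_response` is hand-written in the shape the introspection system returns and matches what is reported above, it is not captured output, and `field_types` is a hypothetical helper.

```python
import json

# Standard GraphQL introspection query for one named type.
INTROSPECTION_QUERY = """
query {
  __type(name: "DrugLabelAndSource") {
    fields {
      name
      type { kind name ofType { kind name } }
    }
  }
}
"""


def field_types(introspection_response: dict) -> dict:
    """Map each field name to a type string such as 'String!'."""
    result = {}
    for field in introspection_response["data"]["__type"]["fields"]:
        type_info = field["type"]
        if type_info["kind"] == "NON_NULL":
            # NON_NULL wraps the real type in `ofType`.
            result[field["name"]] = f'{type_info["ofType"]["name"]}!'
        else:
            result[field["name"]] = type_info["name"]
    return result


# JSON body that would be POSTed to the GraphQL endpoint.
payload = json.dumps({"query": INTROSPECTION_QUERY})

# Hand-written sample in the standard introspection response shape.
non_null_string = {
    "kind": "NON_NULL",
    "name": None,
    "ofType": {"kind": "SCALAR", "name": "String"},
}
sample_response = {
    "data": {
        "__type": {
            "fields": [
                {"name": "label", "type": non_null_string},
                {"name": "source", "type": non_null_string},
            ]
        }
    }
}

print(field_types(sample_response))
```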

2. Data drift — 8 fixtures refreshed

Per project convention (refresh fixtures, don't normalize):

| Test | What changed upstream |
|---|---|
| `test_opentargets` | Disease ontology: `EFO_0000274` → `MONDO_0004980` |
| `test_opentargets_diseases` | Same disease ID change |
| `test_opentargets_drugs` | Re-captured with the synonyms fix applied |
| `test_opentargets_interactions` | Protein IDs (e.g. `ENSP00000361004` → `ENSP00000360730`) |
| `test_opentargets_pharmacogenetics` | Variant allele frequencies (`T_C,C` → `T_C,T`) |
| `test_opentargets_depmap` | Hash refreshed (1239 rows) |
| `test_opentargets_depmap_filter` | `UBERON_0002367` (prostate) has no depmap data for IL13 anymore; fixture now expects `[]` |
| `test_opentargets_interactions_no_limit` | Hash refreshed (25 rows) |

Refreshed via a one-shot script that calls each function with the test args and rewrites `expected_result` to match.
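The script itself is not part of this description. A minimal sketch of the idea, assuming each fixture entry stores its call arguments under an `args` key next to `expected_result` (the `args` key, the `refresh_fixtures` helper, and the fake function are assumptions made for this example):

```python
import json
from typing import Any, Callable


def refresh_fixtures(fixtures: dict, call: Callable[..., Any], names: list) -> dict:
    """Return a copy of `fixtures` with expected_result re-captured for `names`."""
    refreshed = json.loads(json.dumps(fixtures))  # deep copy of JSON-like data
    for name in names:
        entry = refreshed[name]
        # Call the function under test with the stored args and keep its output.
        entry["expected_result"] = call(**entry["args"])
    return refreshed


def fake_opentargets(gene_id: str, resource: str = "diseases") -> list:
    """Stand-in for the live function so the sketch runs offline."""
    return [[gene_id, resource, "MONDO_0004980"]]


fixtures = {
    "test_opentargets": {
        "args": {"gene_id": "GENE_ID", "resource": "diseases"},
        "expected_result": [["GENE_ID", "diseases", "EFO_0000274"]],
    }
}

refreshed = refresh_fixtures(fixtures, fake_opentargets, ["test_opentargets"])
print(json.dumps(refreshed, indent=2))
```

Returning a copy rather than mutating in place makes it easy to diff old and new fixtures before writing the file back.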

3. `expressions` field deprecated — 2 tests skipped

Open Targets removed data from `Target.expressions` (returns `[]` for all queries). The replacement is `Target.baselineExpression` with a completely different schema:

Old: `expressions[].tissue.{id, label, anatomicalSystems, organs}` + `expressions[].rna.{zscore, value, unit, level}`
New: `baselineExpression.rows[].{targetFromSourceId, max, q1, q3, median, min, tissueBiosample, celltypeBiosample, datasourceId, …}`

Migrating `gget_opentargets` to the new field is a user-facing API change (different DataFrame columns) that warrants its own PR. For now:

- `test_opentargets_expression` and `test_opentargets_expression_no_limit` are marked `type: "skip"` with a `reason` field
- `tests/from_json.py` gains a new `skip` test type so JSON-defined tests can be skipped with a reason message (no need for a separate Python test file)
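One way a JSON-driven loader could support such a `skip` type is sketched below with plain `unittest`. This is not the actual `tests/from_json.py` code: `make_test`, the `actual` key, and the class name are hypothetical, and only the `type`, `reason`, and `expected_result` keys come from the description above.

```python
import unittest


def make_test(name: str, spec: dict):
    """Build a unittest method from one JSON test definition."""
    if spec["type"] == "skip":
        reason = spec.get("reason", "skipped in fixture")

        def skipped(self):
            # Reported as skipped (with the reason), not as a failure.
            self.skipTest(reason)

        return skipped

    if spec["type"] == "assert_equal":

        def check(self):
            # "actual" stands in for calling the function under test.
            self.assertEqual(spec["actual"], spec["expected_result"])

        return check

    raise ValueError(f"unknown test type for {name}: {spec['type']}")


definitions = {
    "test_opentargets_expression": {
        "type": "skip",
        "reason": "Target.expressions returns [] upstream",
    },
    "test_trivial": {"type": "assert_equal", "actual": 1, "expected_result": 1},
}

methods = {name: make_test(name, spec) for name, spec in definitions.items()}
GeneratedTests = type("GeneratedTests", (unittest.TestCase,), methods)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(GeneratedTests)
result = unittest.TestResult()
suite.run(result)
print(result.testsRun, len(result.skipped), result.wasSuccessful())
```

Keeping the skip in the fixture file means the test definition and its arguments stay in place, so re-enabling it later is a one-line change of `type`.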

…ssions field

11 opentargets tests were failing across all Python versions due to three distinct upstream changes:

1. GraphQL schema change (real bug)

The Drug type's `synonyms` and `tradeNames` fields are now `[DrugLabelAndSource!]!` (was scalar list[str]). Querying without a sub-selection returns HTTP 400. Updated QUERY_STRING_DRUGS to query `synonyms { label }` and `tradeNames { label }`. The existing _collapse_singletons() post-processor flattens each {label: "X"} back to "X" automatically, so the public DataFrame shape is preserved.

2. Data drift (8 tests, fixtures refreshed)

Open Targets re-indexed disease ontology IDs (EFO → MONDO), gene protein IDs, allele frequencies, depmap entries, and interaction data. Per the project convention (refresh fixtures, don't normalize) the expected_result blocks for test_opentargets, *_diseases, *_drugs, *_interactions, *_pharmacogenetics, *_depmap, *_depmap_filter, and *_interactions_no_limit were re-captured from the current upstream output via a one-shot helper script.

3. expressions field deprecated (2 tests, skipped)

The Target.expressions field now returns [] for all queries; Open Targets replaced it with Target.baselineExpression which has a completely different schema (tissueBiosample, q1/q3/median/min/max instead of tissue/rna sub-objects). Migrating gget_opentargets to the new field is a user-facing API change and out of scope for this PR; the two affected tests are now marked as skipped with a reason field.

Also adds a `skip` test type to tests/from_json.py so JSON-defined tests can be marked as known-skipped (with a reason message) without needing a separate Python test file or fixture deletion.

Verified locally: 15 passed, 2 skipped, 0 failed.

codecov-commenter · 2026-06-26T17:34:57Z

Codecov Report

❌ Patch coverage is 0% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 7.98%. Comparing base (682d7c6) to head (daeee55).
⚠️ Report is 5 commits behind head on dev.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| gget/gget_opentargets.py | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files

@@           Coverage Diff            @@
##             dev    #256      +/-   ##
========================================
- Coverage   7.98%   7.98%   -0.01%     
========================================
  Files         29      29              
  Lines       9244    9245       +1     
========================================
  Hits         738     738              
- Misses      8506    8507       +1


- Drugs resource: HTTP 400 fix (synonyms/tradeNames sub-selection). Surfaces the upstream cause and reassures users that the DataFrame column shape is unchanged.
- Expression resource: known limitation pointing at the baselineExpression migration as the next step. Flags that the tests for this path are skipped in the meantime.

Strip back the opentargets-related changes so this PR is focused on the archs4 + ELM CI-stability fixes only. The opentargets work (synonyms HTTP 400 fix, fixture refresh, expression skip) is being handled in a separate PR (scverse#256), per maintainer preference for one-module-per-PR review.

Reverted to origin/dev:
- gget/gget_opentargets.py
- tests/test_opentargets.py
- tests/fixtures/test_opentargets.json

Trimmed updates.md:
- Removed the opentargets bullet (lives in scverse#256)
- Added an archs4 bullet explaining the color-column + deterministic-sort fix (user-visible behavior change, was missing here)

Remaining scope:
- gget_archs4.py: graceful handling of missing color column, deterministic median-then-id sort
- tests/test_archs4.py: TestArchs4MissingColor regression test
- tests/fixtures/test_archs4.json: refreshed for the deterministic sort
- tests/test_elm.py: retry ELM setup on transient download failure

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Without this, gget opentargets resource="expression" silently returns an empty DataFrame, which looks identical to "your gene has no expression data" and gives the user no signal that the upstream field is actually retired. Now emits a logger.warning naming the deprecated field, the planned baselineExpression replacement, and the tracking issue (#247) so users can subscribe / contribute. Updated the 0.30.8 entry in updates.md to mention the warning.
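A minimal sketch of such a warning is shown below. The logger name, the `DEPRECATED_RESOURCES` table, and `warn_if_deprecated` are hypothetical and are not taken from gget's source; only the field names and the issue number come from the message above.

```python
import logging

logger = logging.getLogger("gget_example")  # hypothetical logger name

# Resources whose upstream field is retired, with a user-facing explanation.
DEPRECATED_RESOURCES = {
    "expression": (
        "Open Targets retired 'Target.expressions' (it now returns [] for "
        "every query); the planned replacement is 'Target.baselineExpression', "
        "tracked in issue #247."
    )
}


def warn_if_deprecated(resource: str) -> bool:
    """Log a warning for retired resources; return True if one was emitted."""
    message = DEPRECATED_RESOURCES.get(resource)
    if message is None:
        return False
    logger.warning("resource=%r: %s", resource, message)
    return True


class ListHandler(logging.Handler):
    """Collects formatted messages so the example can inspect them."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
logger.addHandler(handler)

warn_if_deprecated("expression")
warn_if_deprecated("diseases")  # not deprecated, logs nothing
print(handler.messages)
```

The point of the warning is to separate "the upstream field is retired" from "this gene has no data", which an empty DataFrame alone cannot do.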

* fix(opentargets): give drug synonyms a GraphQL sub-selection (HTTP 400 fix)

  OpenTargets changed the Drug 'synonyms' and 'tradeNames' fields from [String!]! to the object type [DrugLabelAndSource!]!, which now requires a sub-selection. The bare-scalar selection caused every drug query to fail with HTTP 400. Request '{ label }' for both fields and flatten the response objects back to a list of label strings so downstream output stays backward-compatible (a list of strings).

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(archs4): tolerate missing 'color' column in tissue expression (#dev-drift)

  ARCHS4's tissue-expression CSV intermittently omits the 'color' column, which made `gget archs4 --which tissue` crash with `KeyError: "['color'] not found in axis"`. The 'color' column is only used for plotting upstream and is dropped (never used) by gget, so a missing column should not be fatal. Use `drop(columns=["color"], errors="ignore")` so the request degrades gracefully when the column is absent. Adds network-free regression tests covering both the present-color and missing-color responses.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(opentargets): use baselineExpression for the expression resource

  OpenTargets retired the `target.expressions` field (it now returns an empty list for every gene), so `gget opentargets -r expression` returned nothing. Baseline expression data moved to the paginated `target.baselineExpression` field with a new per-biosample data model.

  - Repoint the expression query to `baselineExpression(page:{index:0,size:250}) { rows {...} }` and update rows_path to ["baselineExpression","rows"].
  - Output columns change accordingly (per-biosample summary stats: median/min/q1/q3/max/unit + tissueBiosample/celltypeBiosample ids + datasource/datatype), because the upstream data model changed and the old shape no longer exists.
  - Remove the two now-invalid live exact-match fixtures and replace them with network-free mocked tests; update docs (example, resource table, updates.md).

  Verified live: http_json with the new query returns 1409 rows in ~0.6s and the parsing pipeline yields the documented columns.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(opentargets): loosen live-data assertions to structural/invariant (data drifts across releases)

  OpenTargets is a live database re-released regularly; several opentargets tests pinned exact current values (disease ids/scores, result hashes, interaction partner ids, genotypes) that legitimately change every release, so they failed on unrelated PRs even though gget returns correct current data.

  Replace the exact-value/hash assertions for test_opentargets, _diseases, _depmap, _depmap_filter, _interactions, _interactions_no_limit and _pharmacogenetics with structural/invariant assertions (expected columns present; numeric dtypes; value-format patterns: ontology-curie disease/tissue ids, ENSG interaction partners, ACH DepMap ids, score in [0,1], nucleotide genotypes; and the depmap filter invariant). The fixture entries are marked `code_defined`; the structural methods live in tests/test_opentargets.py. These stay meaningful (they break on wrong columns, malformed ids, non-numeric scores, broken filtering, or empty-where-guaranteed) without pinning drifting data. Verified live against current OpenTargets data.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: assert live-data contracts for CI repair

* test: retry ELM live setup downloads

* test: keep OpenTargets expression semantics out of CI repair

* test(opentargets): add semantic anchors + score tolerance to live-data tests (#249)

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

* test(opentargets): rewrite live-data tests as explicit IL13 assertions (#249)

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

* test(opentargets): read gene from fixture + guard to IL13; drop duplicate test_opentargets (#249)

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

* test(archs4): rewrite live tissue tests as concrete fixture-driven checks (#249)

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(archs4): deterministic tissue sort via id tiebreaker; restore exact-snapshot tests (#249)

  Sort tissue rows by [median desc, id asc] so output is reproducible when medians tie (ARCHS4 returns tied rows in varying order). Revert the live tissue tests to exact assert_equal snapshots (re-sorted to the deterministic order); keep the network-free color regression tests.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Drop opentargets fixes from this PR (now covered by #256)

  Strip back the opentargets-related changes so this PR is focused on the archs4 + ELM CI-stability fixes only. The opentargets work (synonyms HTTP 400 fix, fixture refresh, expression skip) is being handled in a separate PR (#256), per maintainer preference for one-module-per-PR review.

  Reverted to origin/dev:
  - gget/gget_opentargets.py
  - tests/test_opentargets.py
  - tests/fixtures/test_opentargets.json

  Trimmed updates.md:
  - Removed the opentargets bullet (lives in #256)
  - Added an archs4 bullet explaining the color-column + deterministic-sort fix (user-visible behavior change, was missing here)

  Remaining scope:
  - gget_archs4.py: graceful handling of missing color column, deterministic median-then-id sort
  - tests/test_archs4.py: TestArchs4MissingColor regression test
  - tests/fixtures/test_archs4.json: refreshed for the deterministic sort
  - tests/test_elm.py: retry ELM setup on transient download failure

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(archs4): drop the redundant with-color companion test

  test_tissue_with_color_still_dropped tested the "happy path" that both the old and the new code already handle the same way (column present → column dropped from output). It can't catch any plausible regression of the actual fix (which is the errors="ignore" kwarg, exercised by the sibling test_tissue_missing_color_does_not_crash). Removing it tightens the test suite without weakening the regression guard around the actual bug. _CSV_WITH_COLOR class attribute removed along with it (no other references).

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Laura Luebbert <laura.lbt60@gmail.com>
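The two archs4 fixes named in the commit log above (tolerating a missing `color` column and sorting by median descending with `id` as tiebreaker) can be illustrated without pandas. The sketch below uses plain dicts and made-up rows; gget itself does the drop with `drop(columns=["color"], errors="ignore")` on a DataFrame, and `tidy_tissue_rows` is a hypothetical name.

```python
def tidy_tissue_rows(rows: list) -> list:
    """Drop the optional 'color' key and sort by median desc, then id asc."""
    # Filtering the key out never raises, whether or not 'color' is present.
    cleaned = [{k: v for k, v in row.items() if k != "color"} for row in rows]
    # The id tiebreaker makes the order reproducible when medians tie.
    return sorted(cleaned, key=lambda row: (-row["median"], row["id"]))


# Made-up rows: two tied medians, and one row without a 'color' key.
rows_a = [
    {"id": "tissue.b", "median": 5.0, "color": "#fff"},
    {"id": "tissue.a", "median": 5.0},
    {"id": "tissue.c", "median": 9.5, "color": "#000"},
]
rows_b = list(reversed(rows_a))  # same data, different upstream order

print([row["id"] for row in tidy_tissue_rows(rows_a)])
print(tidy_tissue_rows(rows_a) == tidy_tissue_rows(rows_b))
```

Both input orders produce the same output, which is what allows exact-snapshot tests to stay stable.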

lauraluebbert mentioned this pull request Jun 26, 2026

fix(ci): repair archs4 + ELM live-data test failures #252

Merged

3 tasks

Merge branch 'dev' into fix/opentargets-api-drift

daeee55

lauraluebbert merged commit 82a47bc into dev Jun 26, 2026
1 of 2 checks passed

lauraluebbert deleted the fix/opentargets-api-drift branch June 26, 2026 18:59

lauraluebbert mentioned this pull request Jun 26, 2026

OpenTargets tests pin exact live values that drift across data releases (diseases/depmap/interactions/pharmacogenetics) #249

Closed

lauraluebbert merged 4 commits into dev from fix/opentargets-api-drift
