
fix(opentargets): adapt to upstream API drift + skip deprecated expressions field#256

Merged
lauraluebbert merged 4 commits into dev from fix/opentargets-api-drift
Jun 26, 2026
@lauraluebbert lauraluebbert commented Jun 26, 2026

Collaborator

Fixes the 11 opentargets tests failing across all Python versions on main. Three distinct upstream changes — addressed independently:

1. GraphQL schema change (real code bug) — 2 tests

RuntimeError: OpenTargets GraphQL returned HTTP 400.
Body: Field 'synonyms' of type '[DrugLabelAndSource!]!' must have a sub selection.

The Drug type's `synonyms` and `tradeNames` are now `[DrugLabelAndSource!]!` instead of scalar lists, so querying them without a sub-selection returns HTTP 400. `QUERY_STRING_DRUGS` now requests `{ label }` for both fields. The existing `_collapse_singletons()` post-processor flattens each `{label: "X"}` back to `"X"` automatically, so the public DataFrame shape is preserved (`drug.synonyms` remains `list[str]`).

Confirmed via GraphQL introspection: `DrugLabelAndSource.{label, source}` are both `String!`.
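
A minimal sketch of the flattening described above, assuming the post-processor collapses any single-key dict to its only value. The function name `collapse_singletons_sketch` and the sample payload are hypothetical; gget's actual `_collapse_singletons()` may handle more cases or behave differently.

```python
from typing import Any


def collapse_singletons_sketch(value: Any) -> Any:
    """Recursively replace single-key dicts with their only value.

    Hypothetical stand-in for gget's `_collapse_singletons()`.
    """
    if isinstance(value, list):
        return [collapse_singletons_sketch(item) for item in value]
    if isinstance(value, dict):
        collapsed = {k: collapse_singletons_sketch(v) for k, v in value.items()}
        if len(collapsed) == 1:
            # {"label": "X"} becomes "X"
            return next(iter(collapsed.values()))
        return collapsed
    return value


# Illustrative response fragment for a query selecting `synonyms { label }`
# and `tradeNames { label }`. The values are made up.
drug = {
    "id": "CHEMBL_EXAMPLE",
    "name": "EXAMPLE DRUG",
    "synonyms": [{"label": "Synonym A"}, {"label": "Synonym B"}],
    "tradeNames": [{"label": "Trade Name A"}],
}

flattened = collapse_singletons_sketch(drug)
print(flattened["synonyms"])
```

Under this assumption the multi-key `drug` dict is kept as a dict, while each `{label: ...}` object inside the lists is reduced to a plain string, which is why the column type seen by callers would not change.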

2. Data drift — 8 fixtures refreshed

Per project convention (refresh fixtures, don't normalize):

| Test | What changed upstream |
| --- | --- |
| `test_opentargets` | Disease ontology: `EFO_0000274` → `MONDO_0004980` |
| `test_opentargets_diseases` | Same disease ID change |
| `test_opentargets_drugs` | Re-captured with the synonyms fix applied |
| `test_opentargets_interactions` | Protein IDs (e.g. `ENSP00000361004` → `ENSP00000360730`) |
| `test_opentargets_pharmacogenetics` | Variant allele frequencies (`T_C,C` → `T_C,T`) |
| `test_opentargets_depmap` | Hash refreshed (1239 rows) |
| `test_opentargets_depmap_filter` | UBERON_0002367 (prostate) has no depmap data for IL13 anymore; fixture now expects `[]` |
| `test_opentargets_interactions_no_limit` | Hash refreshed (25 rows) |

Refreshed via a one-shot script that calls each function with the test args and rewrites `expected_result` to match.
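
The refresh step could look roughly like the sketch below. It is not the script used in this PR: the fixture layout (`type`, `args`, `expected_result` keys), the helper `refresh_fixtures`, and the offline stand-in `fake_opentargets` are all assumptions made so the example runs without network access.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, List


def refresh_fixtures(path: Path, func: Callable[..., Any], names: List[str]) -> None:
    """Re-run `func` with each test's args and overwrite `expected_result`."""
    fixtures = json.loads(path.read_text())
    for name in names:
        entry = fixtures[name]
        entry["expected_result"] = func(**entry["args"])
    path.write_text(json.dumps(fixtures, indent=2) + "\n")


def fake_opentargets(ensembl_id: str, resource: str = "diseases") -> list:
    # Stand-in for the real query function so the sketch runs offline.
    return [[ensembl_id, resource, "refreshed"]]


with tempfile.TemporaryDirectory() as tmp:
    fixture_path = Path(tmp) / "test_opentargets.json"
    # Assumed fixture layout with a stale expected result.
    fixture_path.write_text(json.dumps({
        "test_opentargets_diseases": {
            "type": "assert_equal",
            "args": {"ensembl_id": "ENSG_EXAMPLE", "resource": "diseases"},
            "expected_result": [["stale"]],
        }
    }))
    refresh_fixtures(fixture_path, fake_opentargets, ["test_opentargets_diseases"])
    refreshed = json.loads(fixture_path.read_text())
```

Only `expected_result` is rewritten; the test arguments and type are left untouched, which keeps a refresh diff limited to the data that drifted upstream.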

3. `expressions` field deprecated — 2 tests skipped

Open Targets removed data from `Target.expressions` (returns `[]` for all queries). The replacement is `Target.baselineExpression` with a completely different schema:

  • Old: `expressions[].tissue.{id, label, anatomicalSystems, organs}` + `expressions[].rna.{zscore, value, unit, level}`
  • New: `baselineExpression.rows[].{targetFromSourceId, max, q1, q3, median, min, tissueBiosample, celltypeBiosample, datasourceId, …}`

Migrating `gget_opentargets` to the new field is a user-facing API change (different DataFrame columns) that warrants its own PR. For now:

  • `test_opentargets_expression` and `test_opentargets_expression_no_limit` are marked `type: "skip"` with a `reason` field
  • `tests/from_json.py` gains a new `skip` test type so JSON-defined tests can be skipped with a reason message (no need for a separate Python test file)
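
One way a JSON-driven `skip` type could work is sketched below using the standard `unittest` module. The factory `make_test`, the fixture schema, and the `double` function are hypothetical; the actual `tests/from_json.py` implementation may be structured differently.

```python
import unittest
from typing import Callable


def make_test(entry: dict, func: Callable):
    """Build a unittest method from one JSON test definition (assumed schema)."""
    if entry["type"] == "skip":
        reason = entry.get("reason", "skipped in fixture")

        def test(self):
            # Reported as skipped, with the reason from the JSON file.
            self.skipTest(reason)

        return test

    if entry["type"] == "assert_equal":

        def test(self):
            self.assertEqual(func(**entry["args"]), entry["expected_result"])

        return test

    raise ValueError(f"unknown test type: {entry['type']}")


def double(x: int) -> int:
    # Placeholder for the function under test.
    return 2 * x


fixtures = {
    "test_opentargets_expression": {
        "type": "skip",
        "reason": "Target.expressions is deprecated upstream",
    },
    "test_example_equal": {
        "type": "assert_equal",
        "args": {"x": 2},
        "expected_result": 4,
    },
}


class TestFromJson(unittest.TestCase):
    pass


for name, entry in fixtures.items():
    setattr(TestFromJson, name, make_test(entry, double))

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestFromJson)
result = unittest.TestResult()
suite.run(result)
```

Keeping the skipped entry in the fixture file, rather than deleting it, means the test still shows up in the run summary together with its reason.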

…ssions field

11 opentargets tests were failing across all Python versions due to
three distinct upstream changes:

1. GraphQL schema change (real bug)
   The Drug type's `synonyms` and `tradeNames` fields are now
   `[DrugLabelAndSource!]!` (was scalar list[str]). Querying without a
   sub-selection returns HTTP 400. Updated QUERY_STRING_DRUGS to query
   `synonyms { label }` and `tradeNames { label }`. The existing
   _collapse_singletons() post-processor flattens each {label: "X"}
   back to "X" automatically, so the public DataFrame shape is
   preserved.

2. Data drift (8 tests, fixtures refreshed)
   Open Targets re-indexed disease ontology IDs (EFO → MONDO), gene
   protein IDs, allele frequencies, depmap entries, and interaction
   data. Per the project convention (refresh fixtures, don't normalize)
   the expected_result blocks for test_opentargets, *_diseases, *_drugs,
   *_interactions, *_pharmacogenetics, *_depmap, *_depmap_filter, and
   *_interactions_no_limit were re-captured from the current upstream
   output via a one-shot helper script.

3. expressions field deprecated (2 tests, skipped)
   The Target.expressions field now returns [] for all queries; Open
   Targets replaced it with Target.baselineExpression which has a
   completely different schema (tissueBiosample, q1/q3/median/min/max
   instead of tissue/rna sub-objects). Migrating gget_opentargets to
   the new field is a user-facing API change and out of scope for this
   PR; the two affected tests are now marked as skipped with a
   reason field.

Also adds a `skip` test type to tests/from_json.py so JSON-defined
tests can be marked as known-skipped (with a reason message) without
needing a separate Python test file or fixture deletion.

Verified locally: 15 passed, 2 skipped, 0 failed.
@codecov-commenter

codecov-commenter commented Jun 26, 2026


Codecov Report

❌ Patch coverage is 0% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 7.98%. Comparing base (682d7c6) to head (daeee55).
⚠️ Report is 5 commits behind head on dev.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| gget/gget_opentargets.py | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff            @@
##             dev    #256      +/-   ##
========================================
- Coverage   7.98%   7.98%   -0.01%     
========================================
  Files         29      29              
  Lines       9244    9245       +1     
========================================
  Hits         738     738              
- Misses      8506    8507       +1     


- Drugs resource: HTTP 400 fix (synonyms/tradeNames sub-selection).
  Explains the upstream cause and reassures users that the
  DataFrame column shape is unchanged.
- Expression resource: known limitation pointing at the
  baselineExpression migration as the next step. Flags that the
  tests for this path are skipped in the meantime.
lauraluebbert added a commit to Elarwei001/gget that referenced this pull request Jun 26, 2026
Strip back the opentargets-related changes so this PR is focused on
the archs4 + ELM CI-stability fixes only. The opentargets work
(synonyms HTTP 400 fix, fixture refresh, expression skip) is being
handled in a separate PR (scverse#256), per maintainer preference for
one-module-per-PR review.

Reverted to origin/dev:
- gget/gget_opentargets.py
- tests/test_opentargets.py
- tests/fixtures/test_opentargets.json

Trimmed updates.md:
- Removed the opentargets bullet (lives in scverse#256)
- Added an archs4 bullet explaining the color-column + deterministic-
  sort fix (user-visible behavior change, was missing here)

Remaining scope:
- gget_archs4.py: graceful handling of missing color column,
  deterministic median-then-id sort
- tests/test_archs4.py: TestArchs4MissingColor regression test
- tests/fixtures/test_archs4.json: refreshed for the deterministic sort
- tests/test_elm.py: retry ELM setup on transient download failure

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Without this, gget opentargets resource="expression" silently returns
an empty DataFrame, which looks identical to "your gene has no
expression data" and gives the user no signal that the upstream field
is actually retired. Now emits a logger.warning naming the deprecated
field, the planned baselineExpression replacement, and the tracking
issue (#247) so users can subscribe / contribute.

Updated the 0.30.8 entry in updates.md to mention the warning.
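
A minimal sketch of the warning behaviour described in this commit message, using only the standard `logging` module. The logger name, the function `query_resource_sketch`, and the message wording are assumptions; only the deprecated field, the `baselineExpression` replacement, and issue #247 come from the text above.

```python
import logging

logger = logging.getLogger("opentargets_sketch")

DEPRECATION_MESSAGE = (
    "Open Targets has retired `Target.expressions` (it returns no rows). "
    "An empty result does not mean the gene lacks expression data. "
    "Migration to `Target.baselineExpression` is tracked in issue #247."
)


def query_resource_sketch(resource: str, rows: list) -> list:
    """Return rows unchanged, warning first if the resource is deprecated."""
    if resource == "expression":
        logger.warning(DEPRECATION_MESSAGE)
    return rows


class ListHandler(logging.Handler):
    """Collect emitted messages so the warning can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
logger.addHandler(handler)
result = query_resource_sketch("expression", [])
```

The return value is still empty, but the caller now gets a log record that distinguishes a retired upstream field from a gene with no data.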
lauraluebbert added a commit that referenced this pull request Jun 26, 2026
* fix(opentargets): give drug synonyms a GraphQL sub-selection (HTTP 400 fix)

OpenTargets changed the Drug 'synonyms' and 'tradeNames' fields from
[String!]! to the object type [DrugLabelAndSource!]!, which now requires
a sub-selection. The bare-scalar selection caused every drug query to
fail with HTTP 400.

Request '{ label }' for both fields and flatten the response objects
back to a list of label strings so downstream output stays
backward-compatible (a list of strings).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(archs4): tolerate missing 'color' column in tissue expression (#dev-drift)

ARCHS4's tissue-expression CSV intermittently omits the 'color' column,
which made `gget archs4 --which tissue` crash with
`KeyError: "['color'] not found in axis"`. The 'color' column is only used
for plotting upstream and is dropped (never used) by gget, so a missing
column should not be fatal.

Use `drop(columns=["color"], errors="ignore")` so the request degrades
gracefully when the column is absent. Adds network-free regression tests
covering both the present-color and missing-color responses.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(opentargets): use baselineExpression for the expression resource

OpenTargets retired the `target.expressions` field (it now returns an empty
list for every gene), so `gget opentargets -r expression` returned nothing.
Baseline expression data moved to the paginated `target.baselineExpression`
field with a new per-biosample data model.

- Repoint the expression query to `baselineExpression(page:{index:0,size:250})
  { rows {...} }` and update rows_path to ["baselineExpression","rows"].
- Output columns change accordingly (per-biosample summary stats: median/min/
  q1/q3/max/unit + tissueBiosample/celltypeBiosample ids + datasource/datatype),
  because the upstream data model changed and the old shape no longer exists.
- Remove the two now-invalid live exact-match fixtures and replace them with
  network-free mocked tests; update docs (example, resource table, updates.md).

Verified live: http_json with the new query returns 1409 rows in ~0.6s and the
parsing pipeline yields the documented columns.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(opentargets): loosen live-data assertions to structural/invariant (data drifts across releases)

OpenTargets is a live database re-released regularly; several opentargets tests
pinned exact current values (disease ids/scores, result hashes, interaction
partner ids, genotypes) that legitimately change every release, so they failed
on unrelated PRs even though gget returns correct current data.

Replace the exact-value/hash assertions for test_opentargets, _diseases,
_depmap, _depmap_filter, _interactions, _interactions_no_limit and
_pharmacogenetics with structural/invariant assertions (expected columns
present, numeric dtypes, value-format patterns — ontology-curie disease/tissue
ids, ENSG interaction partners, ACH DepMap ids, score in [0,1], nucleotide
genotypes — and the depmap filter invariant). The fixture entries are marked
`code_defined`; the structural methods live in tests/test_opentargets.py.

These stay meaningful (they break on wrong columns, malformed ids, non-numeric
scores, broken filtering, or empty-where-guaranteed) without pinning drifting
data. Verified live against current OpenTargets data.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: assert live-data contracts for CI repair

* test: retry ELM live setup downloads

* test: keep OpenTargets expression semantics out of CI repair

* test(opentargets): add semantic anchors + score tolerance to live-data tests (#249)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test(opentargets): rewrite live-data tests as explicit IL13 assertions (#249)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test(opentargets): read gene from fixture + guard to IL13; drop duplicate test_opentargets (#249)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test(archs4): rewrite live tissue tests as concrete fixture-driven checks (#249)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(archs4): deterministic tissue sort via id tiebreaker; restore exact-snapshot tests (#249)

Sort tissue rows by [median desc, id asc] so output is reproducible when medians tie
(ARCHS4 returns tied rows in varying order). Revert the live tissue tests to exact
assert_equal snapshots (re-sorted to the deterministic order); keep the network-free
color regression tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Drop opentargets fixes from this PR (now covered by #256)

Strip back the opentargets-related changes so this PR is focused on
the archs4 + ELM CI-stability fixes only. The opentargets work
(synonyms HTTP 400 fix, fixture refresh, expression skip) is being
handled in a separate PR (#256), per maintainer preference for
one-module-per-PR review.

Reverted to origin/dev:
- gget/gget_opentargets.py
- tests/test_opentargets.py
- tests/fixtures/test_opentargets.json

Trimmed updates.md:
- Removed the opentargets bullet (lives in #256)
- Added an archs4 bullet explaining the color-column + deterministic-
  sort fix (user-visible behavior change, was missing here)

Remaining scope:
- gget_archs4.py: graceful handling of missing color column,
  deterministic median-then-id sort
- tests/test_archs4.py: TestArchs4MissingColor regression test
- tests/fixtures/test_archs4.json: refreshed for the deterministic sort
- tests/test_elm.py: retry ELM setup on transient download failure

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(archs4): drop the redundant with-color companion test

test_tissue_with_color_still_dropped tested the "happy path" that both
the old and the new code already handle the same way (column present →
column dropped from output). It can't catch any plausible regression
of the actual fix (which is the errors="ignore" kwarg, exercised by
the sibling test_tissue_missing_color_does_not_crash).

Removing it tightens the test suite without weakening the regression
guard around the actual bug. _CSV_WITH_COLOR class attribute removed
along with it (no other references).

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Laura Luebbert <laura.lbt60@gmail.com>
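
The two archs4 fixes named in the squashed commit message above (tolerating a missing `color` column and sorting by median descending with an id tiebreaker) can be illustrated with pandas. This is a sketch, not the code in `gget_archs4.py`: the helper name `tidy_tissue_table`, the column names `id` and `median`, and the sample rows are assumptions.

```python
import pandas as pd


def tidy_tissue_table(df: pd.DataFrame) -> pd.DataFrame:
    """Drop the optional plotting column and sort reproducibly."""
    # errors="ignore" makes the drop a no-op when 'color' is absent,
    # instead of raising KeyError.
    df = df.drop(columns=["color"], errors="ignore")
    # Ties on 'median' are broken by 'id' so the row order is stable.
    df = df.sort_values(by=["median", "id"], ascending=[False, True])
    return df.reset_index(drop=True)


# Response without the 'color' column, with two tied medians.
without_color = pd.DataFrame(
    {"id": ["tissue_b", "tissue_a", "tissue_c"], "median": [5.0, 5.0, 9.0]}
)
# Same data with the 'color' column present.
with_color = without_color.assign(color=["#111111", "#222222", "#333333"])

tidy_missing = tidy_tissue_table(without_color)
tidy_present = tidy_tissue_table(with_color)
```

Both inputs yield the same table, so downstream snapshot tests would not depend on whether the upstream CSV happened to include the column.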
@lauraluebbert lauraluebbert merged commit 82a47bc into dev Jun 26, 2026
1 of 2 checks passed
@lauraluebbert lauraluebbert deleted the fix/opentargets-api-drift branch June 26, 2026 18:59