scikit-opls

Orthogonal Projections to Latent Structures (OPLS / OPLS-DA) with a scikit-learn interface.

OPLS (Trygg & Wold, 2002) splits the variation in X into a predictive part correlated with the response and orthogonal parts that are not. scikit-opls removes the orthogonal variation with an OSC-style orthogonal filter and then fits sklearn.cross_decomposition.PLSRegression on the cleaned X as the predictive engine. With n_orthogonal=0 the model reduces exactly to PLSRegression.

Install

uv sync

Usage

OPLS regression

import numpy as np
from scikit_opls import OPLS

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
y = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=100)

model = OPLS(n_components=1, n_orthogonal=2, scale="standard").fit(X, y)

model.predict(X)              # predictions
model.transform(X)            # predictive scores
model.transform_orthogonal(X) # orthogonal scores
model.filter_transform(X)     # preprocessed, orthogonal-filtered X fed to the engine
model.r2x_, model.r2y_        # fit summaries
model.vip_                    # variable importance (predictive), lazy property

The whole fitted pipeline (scaling → orthogonal filter → predictive PLS) is linear, so it collapses to coefficients on the raw input space:

y_hat = (X @ model.coef_raw_.T + model.intercept_raw_).ravel()  # == model.predict(X)

Let cross-validated Q2 choose the number of orthogonal components with scikit-learn's GridSearchCV — no bespoke estimator needed (scoring=None gives out-of-fold R2, which equals Q2 for OPLS):

from sklearn.model_selection import GridSearchCV
from scikit_opls import OPLS

search = GridSearchCV(
    OPLS(n_components=1), {"n_orthogonal": list(range(10))}, cv=7
).fit(X, y)
search.best_params_["n_orthogonal"]       # chosen count
search.best_estimator_                    # final OPLS refit on all data
search.cv_results_["mean_test_score"]     # out-of-fold R2/Q2 path

For OPLS-DA, wrap OPLSDA() the same way; an int cv becomes stratified automatically and scoring="roc_auc" is usually preferable.

To bias toward fewer orthogonal components — prefer the smallest count whose mean score is within a tolerance of the best — pass a refit callable:

import numpy as np

def parsimonious_refit(cv_results, tol=0.01):
    scores = np.asarray(cv_results["mean_test_score"], dtype=float)
    counts = np.asarray(cv_results["param_n_orthogonal"], dtype=int)
    within = np.flatnonzero(scores >= np.nanmax(scores) - tol)
    return int(within[np.argmin(counts[within])])

GridSearchCV(
    OPLS(n_components=1), {"n_orthogonal": list(range(10))},
    cv=7, refit=parsimonious_refit,
).fit(X, y)

OPLS-DA (binary classification)

from scikit_opls import OPLSDA

y = np.where(X[:, 0] > 0, "case", "ctrl")
clf = OPLSDA(n_components=1, n_orthogonal=2).fit(X, y)

clf.predict(X)            # class labels
clf.decision_function(X)  # raw signed OPLS regression output
clf.opls_.transform(X)    # predictive scores of the underlying OPLS model

# Probabilities: wrap in a cross-fitted calibrator when each class has enough
# samples for the chosen calibration CV split.
from sklearn.calibration import CalibratedClassifierCV
CalibratedClassifierCV(clf, cv=5).fit(X, y).predict_proba(X)

Diagnostics

Plotting needs the optional plot extra (pip install scikit-opls[plot]); it follows scikit-learn's Display convention.

from scikit_opls.plotting import OPLSScoresDisplay, SPlotDisplay
from scikit_opls.validation import permutation_test

# Draw score plot (t_pred vs t_ortho). Supports component selection for multi-component PLS
OPLSScoresDisplay.from_estimator(
    model, X, y, predictive_component=0, orthogonal_component=0
)

# Draw S-plot (covariance vs correlation) for a specific predictive component
SPlotDisplay.from_estimator(model, X, component=0)

# Permutation significance testing
permutation_test(OPLS(n_orthogonal=2), X, y)

Note

Pipeline support in plotting: Diagnostic plotting displays support OPLS, OPLSDA, pipelines ending in one, and fitted search meta-estimators exposing best_estimator_ around either shape. When passing a pipeline, pass raw X as expected by the pipeline. When passing the final OPLS step directly, pass the already transformed matrix. For pipeline S-plots, points are in the transformed feature space received by the final OPLS step.

Example datasets

Two small scripts under examples/ show usage with CSV data hosted as GitHub release assets. The examples read these URLs directly with pandas.read_csv, so the datasets do not need to be stored in the local checkout:

https://github.com/HauserGroup/scikit-opls/releases/download/data/colorectal_cancer_nmr.csv
https://github.com/HauserGroup/scikit-opls/releases/download/data/palmerpenguins.csv

uv run python examples/colorectal_cancer_nmr_oplsda.py
uv run python examples/palmerpenguins_opls_regression.py

Parameters

Parameter	Meaning
`n_components`	Predictive components (classic OPLS uses 1).
`n_orthogonal`	Orthogonal components to remove (`int`; tune via `GridSearchCV`).
`scale`	`"none"`, `"center"`, `"pareto"`, `"standard"`.

Wrap OPLS in GridSearchCV over n_orthogonal for cross-validated selection (see the snippet above); cv, scoring and n_jobs come from GridSearchCV.

Development

uv sync --dev              # install the project and dev tools
uv run pre-commit install  # enable the git hooks (run once)

uv run pytest --cov        # tests + coverage (incl. sklearn check_estimator)
uv run ruff check          # lint
uv run ruff format --check # format check
uv run pyright src         # type-check
uv run pre-commit run --all-files  # run every hook

See CONTRIBUTING.md for the full contributor workflow.

References

This project is inspired by the R ropls package and uses the orthogonal-scores PLS algorithm of pls::oscorespls.fit as its engine.

Trygg, J. & Wold, S. (2002). Orthogonal projections to latent structures (O-PLS). Journal of Chemometrics, 16(3), 119–128.
Galindo-Prieto, B., Eriksson, L. & Trygg, J. (2014). Variable influence on projection (VIP) for OPLS models. Journal of Chemometrics, 28(8), 623–632.

Name		Name	Last commit message	Last commit date
Latest commit History 119 Commits
.devcontainer		.devcontainer
.github		.github
docs		docs
examples		examples
src/scikit_opls		src/scikit_opls
tests		tests
.editorconfig		.editorconfig
.gitattributes		.gitattributes
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
.python-version		.python-version
CHANGES.md		CHANGES.md
CITATION.cff		CITATION.cff
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
RELEASING.md		RELEASING.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock
zensical.toml		zensical.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

scikit-opls

Install

Usage

OPLS regression

OPLS-DA (binary classification)

Diagnostics

Example datasets

Parameters

Development

References

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

scikit-opls

Install

Usage

OPLS regression

OPLS-DA (binary classification)

Diagnostics

Example datasets

Parameters

Development

References

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages