Code release for the ICML 2026 Spotlight Position Paper
Position: Temporal Measurement Interval Determines Computational and Model Complexity in Single-Cell Perturbation Analysis
OpenReview · ICML Poster · Repository
Single-cell perturbation analysis aims to predict how cellular states change after interventions such as drug treatments or genetic edits. A central difficulty is that pre- and post-perturbation measurements are typically observed as unpaired populations, so accurate prediction requires inferring a latent coupling and learning a transition map.
In our position paper, we argue that the measurement time gap is the key experimental knob controlling both the computational tractability of coupling and the effective model complexity.
This repository contains the implementation for the paper's synthetic experiments, biological benchmarks, baseline comparisons, and plotting scripts.
After a perturbation, cellular states evolve over time.
Let
We assume there exists an unknown map
The goal is to learn a map
By definition,
To formulate this compactly, we introduce the matrix
Cell perturbation analysis can thus be viewed as supervised learning with unknown correspondences between predictor and response variables.
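To make the role of the unknown correspondences concrete, the following minimal sketch (not code from this repository) fits a linear map by least squares twice: once with the true pairing and once with a deliberately wrong pairing. The near-identity linear transition, the cyclic-shift pairing, and all names (`W_true`, `fit_linear_map`, and so on) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5

# Hypothetical ground truth: a near-identity linear transition (an assumption).
W_true = np.eye(d) + 0.1 * rng.standard_normal((d, d))
X0 = rng.standard_normal((n, d))   # pre-perturbation cells, one per row
Y = X0 @ W_true.T                  # responses, row i is paired with X0[i]

# Break the pairing with a cyclic shift, so no response stays with its cell.
Y_unpaired = np.roll(Y, shift=1, axis=0)


def fit_linear_map(X, Y):
    """Least-squares W such that Y[i] ~= W @ X[i] for every row i."""
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return B.T


err_paired = np.linalg.norm(fit_linear_map(X0, Y) - W_true)
err_unpaired = np.linalg.norm(fit_linear_map(X0, Y_unpaired) - W_true)
print(f"paired error:   {err_paired:.2e}")
print(f"unpaired error: {err_unpaired:.2e}")
```

With the correct pairing the fit is exact up to floating-point error, while the mispaired fit carries essentially no information about the transition. This is why the coupling has to be inferred before, or jointly with, the map.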
Our results rely on biologically motivated insights and assumptions. In particular, we assume (i) temporal smoothness of the perturbation dynamics, and (ii) a restricted isometry property (RIP) for the pre-perturbation cell states.
In our experiments, we explicitly demonstrate regimes where these assumptions hold and settings where they break down.
Drug-induced perturbations act over time: a compound must engage its targets, trigger signaling cascades, and alter downstream transcriptional or translational programs before inducing measurable cellular responses.
We assume that the transition map
for all
The second assumption governing our theory and experiments concerns the geometric structure of the pre-perturbation data. Specifically, we rely on a restricted isometry property, originally introduced and extensively studied in compressed sensing.
Let
holds for all
Our analysis assumes that the data matrix
We identify a critical time gap
Define the measurement time gap
Then the problem of recovering the permutation
| Regime | Condition | Computational consequence |
|---|---|---|
| Trackable regime | The permutation |
|
| Untrackable regime | Recovering |
The theorem shows that the trackable time gap is governed jointly by the geometry of the problem: smoother transition dynamics (smaller Lipschitz constants) and stronger RIP conditions on the initial data both extend the trackable time gap.
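The two-regime behavior can be illustrated with a small simulation. The sketch below is not the repository's synthetic experiment: it assumes a linear transition `I + gap * G` with a random Gaussian `G`, treats `gap` as a stand-in for the measurement time gap, and matches cells by linear assignment on squared Euclidean cost. The function name `recovery_rate` and all parameter values are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def recovery_rate(gap, n=40, d=10, seed=0):
    """Fraction of cells matched to their true counterpart by linear assignment.

    `gap` scales the deviation of the transition from the identity; treating it
    as a proxy for the measurement time gap is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    W = np.eye(d) + gap * rng.standard_normal((d, d))
    X0 = rng.standard_normal((n, d))
    shuffle = rng.permutation(n)
    Xt = (X0 @ W.T)[shuffle]                # Xt[j] is the image of X0[shuffle[j]]
    # Squared Euclidean cost between every control cell and every treated cell.
    cost = ((X0[:, None, :] - Xt[None, :, :]) ** 2).sum(axis=-1)
    _, match = linear_sum_assignment(cost)  # X0[i] is matched to Xt[match[i]]
    return float(np.mean(shuffle[match] == np.arange(n)))


for gap in (0.001, 0.01, 0.1, 1.0, 5.0):
    print(f"gap={gap:<6} recovery={recovery_rate(gap):.2f}")
```

For small `gap` the treated cells stay close to their origins and the assignment recovers the pairing; once the displacement exceeds the typical spacing between cells, the recovered fraction drops sharply.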
The fundamental difficulty in solving the linear-transition problem is that the data are unpaired.
We study the problem of estimating a linear transition map of the form
where
Concretely, there exists an unknown permutation
Let
Estimating the transition therefore amounts to recovering
The objective is not jointly convex in
Input: unpaired control samples X^(0), treated samples X^(t), iterations K
Initialize W^(t,0) = I_d
for k = 1, ..., K:
1. Optimizing Π:
solve the linear assignment / optimal transport problem
2. Optimizing W^(t):
solve the least-squares linear map update
Output: W^(t,K) and predictor x_new^(t) = W^(t,K) x_new^(0)
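The alternating scheme above can be sketched in a few lines of NumPy/SciPy. This is a minimal illustration under stated assumptions, not the repository's implementation: it assumes equal numbers of control and treated cells, a squared Euclidean assignment cost, and a hard permutation solved with `scipy.optimize.linear_sum_assignment`. The function name `laot` and the synthetic test data are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def laot(X0, Xt, num_iters=10):
    """Alternate between a linear assignment step and a least-squares update.

    X0, Xt: arrays of shape (n, d) with unpaired control / treated cells.
    Returns the estimated map W (d x d) and `perm`, where Xt[perm[i]] is the
    treated cell currently paired with X0[i].
    """
    n, d = X0.shape
    W = np.eye(d)                      # W^(t,0) = I_d
    perm = np.arange(n)
    for _ in range(num_iters):
        # Step 1: match each mapped control cell to exactly one treated cell.
        pred = X0 @ W.T
        cost = ((pred[:, None, :] - Xt[None, :, :]) ** 2).sum(axis=-1)
        _, perm = linear_sum_assignment(cost)
        # Step 2: refit W on the current pairs by least squares.
        B, *_ = np.linalg.lstsq(X0, Xt[perm], rcond=None)
        W = B.T
    return W, perm


# Synthetic check on noise-free near-identity data (all values are assumptions).
rng = np.random.default_rng(0)
n, d = 40, 10
W_true = np.eye(d) + 0.005 * rng.standard_normal((d, d))
X0 = rng.standard_normal((n, d))
shuffle = rng.permutation(n)
Xt = (X0 @ W_true.T)[shuffle]          # Xt[j] is the image of X0[shuffle[j]]

W_hat, perm = laot(X0, Xt)
x_new = rng.standard_normal(d)
print("max |W_hat - W_true|:", np.abs(W_hat - W_true).max())
print("prediction for a new cell:", W_hat @ x_new)
```

Starting from the identity is what ties the method to the trackable regime: the first assignment step only succeeds when treated cells remain closer to their own origin than to other cells.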
We compare LAOT with representative nonlinear baselines discussed in the paper.
| Method | Role in this repository | Implementation / reference code |
|---|---|---|
| LAOT | Minimal linear alternating optimal transport solver proposed in the paper. | Implemented in this repository from Algorithm 1 in the paper. |
| CellOT | Nonlinear neural optimal transport baseline. | Adapted from the official CellOT repository: https://github.com/bunnech/cellot |
| Compact CellOT | Reduced-capacity CellOT variant used to isolate the role of model capacity and computational complexity. | Implemented in this repository by modifying the CellOT architecture. |
| scGen | VAE-based generative baseline. | Official scGen repository: https://github.com/theislab/scgen |
CellOT parameterizes the transport map with an input-convex neural network and jointly learns the OT coupling and the mapping. Its default configuration has four hidden layers with 64 units each, yielding a parameter count that is orders of magnitude larger than that of LAOT.
scGen is a VAE-based generative model that learns a latent representation and predicts perturbation responses via latent-space arithmetic rather than explicit correspondence recovery, again relying on a comparatively large number of learnable parameters.
We also include Compact CellOT, a reduced-capacity variant of CellOT with three hidden layers and 32 units per layer, to isolate the role of model capacity and computational complexity.
Figure shows a sharp two-regime behavior: for fine time gap
This degradation is not limited to the linear solver: state-of-the-art nonlinear models exhibit the same collapse, indicating that model expressivity alone does not remove the intrinsic trackability barrier.
CellOT exhibits highly non-monotone training dynamics, whereas capacity reduction stabilizes training but can introduce long plateaus. LAOT converges rapidly with near-coincident train/test curves.
The repository is organized around the main experimental blocks of the paper. Each folder contains notebooks for one benchmark or one analysis regime.
| Paper component | Repository folder | Notebooks / role |
|---|---|---|
| Synthetic phase transition and untrackable-regime tests | Synthetic_data_experiments/ | Synthetic_data_permutation.ipynb, LAOT_Synthetic_data.ipynb, Compact_CellOT_Synthetic_data.ipynb, scGen_Synthetic_data.ipynb |
| AP-1 within-context protein perturbation | AP-1_within_context_protein_perturbation/ | LAOT, CellOT, Compact CellOT, and scGen notebooks for within-cell-line DMSO |
| AP-1 replicate generalization | AP-1_within_context_protein_perturbation_replicate/ | LAOT, CellOT, Compact CellOT, and scGen notebooks for cross-replicate robustness |
| AP-1 cross-context / OOD generalization | AP-1_cross_context_protein_perturbation_OOD/ | LAOT, CellOT, Compact CellOT, and scGen notebooks for held-out-cell-line transfer |
| 4i within-context protein-imaging perturbation | 4i_within_context_protein_perturbation/ | LAOT, CellOT, Compact CellOT, and scGen notebooks for 8h drug-response prediction |
| SciPlex3 within-context scRNA-seq perturbation | SciPlex3_within_context_scRNA-seq_perturbation/ | LAOT, CellOT, Compact CellOT, and scGen notebooks for 24h transcriptomic drug-response prediction |
| 2i biological time-course sweep | 2i_time_course/ | LAOT_2i_time_course.ipynb for 12h--168h horizon analysis |
| Paper plots | Plots/ | PDF plots used for training dynamics, 2i horizon sweep, and permutation recovery |
| README figures | assets/ | PNG assets used in this README |
A minimal environment can be created with Conda:
conda create -n trackability python=3.10 -y
conda activate trackability
pip install numpy scipy pandas scikit-learn matplotlib seaborn tqdm jupyter ipykernel
pip install torch scanpy anndata pot

Some notebooks import CellOT-specific modules. Install CellOT following the official CellOT instructions, or add your local CellOT clone to PYTHONPATH:
# Example only; adjust to your local setup.
export PYTHONPATH=/path/to/cellot:$PYTHONPATH

For the 2i/WOT time-course experiments, install the WOT dependency if needed:
pip install wot

This repository does not redistribute the biological datasets. Please download each dataset from the original study or benchmark website, follow the corresponding license/terms of use, and cite the original data source in any derivative work. The synthetic experiments are generated directly by the notebooks and do not require an external dataset.
Before running a notebook, update the dataset path variables in the first configuration cells. The notebooks were written to reproduce the paper experiments, so they may contain local paths that should be changed to match your machine.
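One way to keep these paths in a single place is a small configuration cell like the sketch below. It is only a suggestion: the environment variable `TRACKABILITY_DATA_ROOT`, the dictionary `DATASET_DIRS`, and the helper `missing_datasets` are hypothetical names that do not appear in the notebooks, and the sub-folder names follow the local layout suggested in the dataset table of this README.

```python
import os
from pathlib import Path

# Hypothetical environment variable; falls back to ./data if it is not set.
DATA_ROOT = Path(os.environ.get("TRACKABILITY_DATA_ROOT", "data"))

# Sub-folder names follow the local layout suggested in this README.
DATASET_DIRS = {
    "ap1": DATA_ROOT / "ap1",
    "4i": DATA_ROOT / "4i",
    "sciplex3": DATA_ROOT / "sciplex3",
    "reprogramming_2i": DATA_ROOT / "reprogramming_2i",
}


def missing_datasets(dirs=DATASET_DIRS):
    """Return the dataset keys whose folder does not exist yet."""
    return [name for name, path in dirs.items() if not path.is_dir()]


print("Missing datasets:", missing_datasets())
```

Running the cell before a biological notebook gives an early, readable failure instead of a file-not-found error deep inside data loading.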
| Dataset used in the paper | Original source to use | Notes for this repository |
|---|---|---|
| AP-1 protein perturbations | Comandante-Lou, Baumann, and Fallahi-Sichani, Cell Reports 2022: study page / DOI, PMC version, and the authors' AP1-networkPlasticityMelanoma repository. In the paper, this benchmark is obtained from the original study page and Supplementary Data S4. | Used for the DMSO-based AP-1 perturbation-prediction notebooks. Place locally under data/ap1/ after downloading. |
| 4i multiplexed protein-imaging perturbations | Gut et al., Science 2018: paper / DOI. For the benchmark format used by CellOT, use Bunne et al., Nature Methods 2023: CellOT repository and ETH Research Collection processed datasets. | Used for drug-response prediction after 8h exposure. Place locally under data/4i/. |
| SciPlex3 scRNA-seq perturbations | Srivatsan et al., Science 2020: paper / DOI, NCBI GEO GSE139944, and the authors' sci-plex code repository. The CellOT-preprocessed version can also be obtained from the CellOT processed datasets above. | Used for 24h transcriptomic drug-response prediction. Place locally under data/sciplex3/. |
| 2i reprogramming time-course | Schiebinger et al., Cell 2019: paper / DOI and the Waddington-OT tutorial/data page, which links the tutorial input data and transport maps. | Used for the 12h--168h time-gap sweep. Place locally under data/reprogramming_2i/. |
Open the repository with Jupyter and run the notebook corresponding to the desired experiment:
jupyter lab

You can also execute a notebook from the command line. For example:
jupyter nbconvert --to notebook --execute \
Synthetic_data_experiments/Synthetic_data_permutation.ipynb \
--output Synthetic_data_permutation.executed.ipynb

| Benchmark | Modality | Setting | Main purpose |
|---|---|---|---|
| Synthetic near-identity data | simulated vectors | varying | phase transition in latent permutation recovery |
| AP-1 | targeted protein panel | DMSO | within-cell-line and replicate perturbation prediction |
| 4i | multiplexed protein imaging | drug exposure, 8h | within-context drug perturbation prediction |
| SciPlex3 | scRNA-seq | 24h drug response | transcriptome-scale perturbation prediction |
| 2i time course | scRNA-seq trajectory | 12h to 168h horizons | biological time-gap sweep |
| Cross-cell-line AP-1 | targeted protein panel | held-out cell line | out-of-context generalization stress test |
Lower MMD$^2$ is better.
| Dataset | Condition | CellOT | scGen | Compact CellOT | LAOT |
|---|---|---|---|---|---|
| AP-1 | COLO858 | 0.0995 | 0.0172 | 0.0019 | 0.0006 |
| AP-1 | WM902B | 0.0443 | 0.1423 | 0.0015 | 0.0007 |
| AP-1 | SKMEL19 | 0.1122 | 0.0323 | 0.0016 | 0.0011 |
| 4i | Imatinib | 0.0700 | 0.0330 | 0.0079 | 0.0063 |
| 4i | Trametinib | 0.0463 | 0.0098 | 0.0076 | 0.0080 |
| 4i | Dexamethasone | 0.0685 | 0.0160 | 0.0075 | 0.0071 |
| SciPlex3 drug | CellOT | scGen | Compact CellOT | LAOT |
|---|---|---|---|---|
| Trametinib | 0.0078 | 0.0059 | 0.0048 | 0.0040 |
| Givinostat | 0.0117 | 0.0083 | 0.0079 | 0.0033 |
| Abexinostat | 0.0129 | 0.0091 | 0.0074 | 0.0038 |
We evaluate set-level prediction quality using the squared Maximum Mean Discrepancy (MMD$^2$), a kernel two-sample distance that is zero if and only if the underlying distributions match (for characteristic kernels).
Given samples $\mathcal{X}=\{x_i\}_{i=1}^{n}\subset\mathbb{R}^d$ and $\mathcal{Y}=\{y_j\}_{j=1}^{m}\subset\mathbb{R}^d$ and a positive definite kernel
In all experiments, we use the Gaussian RBF kernel
Unless stated otherwise, we choose
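For readers who want to compute the metric outside the notebooks, here is a minimal NumPy sketch of MMD$^2$ with a Gaussian RBF kernel. It assumes the biased (V-statistic) estimator and the parameterization $k(x,y)=\exp(-\lVert x-y\rVert^2/(2\sigma^2))$; the notebooks may use a different estimator or bandwidth convention, and the function names `rbf_kernel` and `mmd2` are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian RBF kernel matrix between the rows of A and the rows of B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def mmd2(X, Y, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples X and Y."""
    k_xx = rbf_kernel(X, X, sigma).mean()
    k_yy = rbf_kernel(Y, Y, sigma).mean()
    k_xy = rbf_kernel(X, Y, sigma).mean()
    return k_xx + k_yy - 2.0 * k_xy


# Illustrative data only: identical samples versus a shifted population.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
Y = rng.standard_normal((100, 3)) + 2.0
print("MMD^2(X, X):", mmd2(X, X))
print("MMD^2(X, Y):", mmd2(X, Y))
```

The biased estimator is non-negative by construction, which makes it convenient for tables where lower is better; an unbiased variant would drop the diagonal terms of the within-sample kernel matrices.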
- Update all local data paths before running the biological notebooks.
- Use the same train/test split protocol as the corresponding notebook.
- For MMD$^2$, select the RBF bandwidth from the training split only to avoid test leakage.
- LAOT is effectively deterministic after the split is fixed, because it uses a linear assignment step and a least-squares map update.
- Neural baselines can have nontrivial variance across random seeds. Report mean $\pm$ standard deviation over repeated runs.
- The untrackable regime should be interpreted as a computational/statistical barrier, not merely as a failure of one solver.
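Two of the notes above translate directly into small utilities. The sketch below assumes the median heuristic as the bandwidth rule, which is a common choice but not necessarily the one used in the notebooks, and aggregates per-seed scores with the sample standard deviation. The function names and the example scores are hypothetical placeholders, not results from the paper.

```python
import numpy as np


def median_bandwidth(X_train):
    """Median pairwise distance, computed on the training split only."""
    diffs = X_train[:, None, :] - X_train[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    upper = np.triu_indices(len(X_train), k=1)   # each unordered pair once
    return float(np.median(dists[upper]))


def summarize(scores):
    """Mean and sample standard deviation of per-seed scores."""
    scores = np.asarray(scores, dtype=float)
    return scores.mean(), scores.std(ddof=1)


# Placeholder inputs for illustration; replace with your own split and runs.
X_train = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
sigma = median_bandwidth(X_train)

per_seed_scores = [1.0, 2.0, 3.0]
mean, std = summarize(per_seed_scores)
print(f"sigma = {sigma:.3f}")
print(f"score = {mean:.3f} +/- {std:.3f}")
```

Computing the bandwidth from `X_train` alone keeps the test split out of every tuning decision, which is the leakage concern raised above.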
Generative AI tools were used to assist with parts of the implementation, including evaluation utilities, plotting scripts, documentation, and reproducibility instructions. All scientific claims, experimental results, final manuscript text, and released code were reviewed, edited, tested, and validated by the authors.
single-cell · perturbation-prediction · optimal-transport · linear-assignment · cellot · scgen · mmd · icml-2026 · trackability · computational-complexity
If this repository is useful for your work, please cite:
@inproceedings{jafari2026temporal,
title = {Position: Temporal Measurement Interval Determines Computational and Model Complexity in Single-Cell Perturbation Analysis},
author = {Jafari, Alireza and Shakeri, Heman and Daneshmand, Hadi},
booktitle = {Proceedings of the 43rd International Conference on Machine Learning},
series = {Proceedings of Machine Learning Research},
volume = {306},
year = {2026},
url = {https://openreview.net/forum?id=lECKpTE1lW}
}

This repository is released under the MIT License. See LICENSE for details.



