Version 0.1.0 — a complete, self-contained PyTorch implementation of FAEclust. This is the PyTorch port of the original FAEclust project; the framework and public API are the same.
FAEclust is the first autoencoder architecture designed specifically for clustering multi-dimensional functional data. In FAEclust, we employ univariate functions as weights instead of integral kernels:
- Unlike traditional regression models, where integral kernels are used for functional predictors to capture full dependence, our encoder is designed to learn latent representations. Consequently, the univariate functions serve as a coordinate system in the Hilbert space.
- Using univariate functions not only reduces computational cost but also improves interpretability. In particular, the shape of the learned functional weights reveals which regions of the input functional data contribute most to the construction of the embedded representations.
Both the encoder and decoder are universal approximators. We introduce a shape-informed clustering objective that is robust to phase variation in functional data, and we develop a path-following homotopy algorithm with complexity O(n log(n)) to obtain the optimal clustering of the latent representations. If you used this package in your research, please cite our paper:
@inproceedings{SVS2025NeurIPS,
author = {Samuel Singh and Shirley Coyle and Mimi Zhang},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
editor = { },
pages = { },
publisher = {Curran Associates, Inc.},
title = {Shape-Informed Clustering of Multi-Dimensional Functional Data via Deep Functional Autoencoders},
volume = { },
year = {2025}
}- Network Update: In the backward phase, we update the network parameters by minimizing a unified objective function (Loss) that incorporates both the network training objective (Penalized Reconstruction Loss) and the clustering regularization (Clustering Loss).
- Cluster Update: During the forward phase, we update the learned latent representations, which necessitates a concurrent update of the clustering results.
The modular pipeline for FAEclust has the following structure:
- Similarity: Compute pairwise (elastic) distances with
TimeSeriesDistance(), identify the optimal number of nearest neighbors (min the paper) viaNearestNeighborsOpt(), and finally compute pairwise similarity measures. - Smoothing: Convert raw curves into basis functions and expansion coefficients via
smoothing_features(). - FAE network: Configure and train the functional network via
FunctionalAutoencoder(). - Convex Clustering: Cluster analysis of the latent representations, where the clustering objective function is a convex function (
ConvexClustering()).
A one-call convenience,
run_experiment(), runs this whole pipeline on a built-in synthetic dataset, and an optionaltuningutility performs Optuna hyperparameter search. See Convenience Utilities below.
Smoothing is a utility class that implements the smoothing step. It supports B-spline, Fourier series, and Wavelet-based smoothing. This class is used internally by smoothing_features().
Smoothing(
dis_p, # number of grid points for evaluating functions
fit, # basis function type: 'bspline', 'fourier', or a Wavelet name
n, # for Fourier: number of harmonics (2n+1 basis functions)
smoothing_str, # initial smoothing parameter for B-splines if _terms_ is not given
terms, # number of basis terms/knots to use (if None, auto-optimize)
wavelet_level, # Wavelet decomposition level (for Wavelet fits) if _terms_ is not given
data = None # input data of shape (n_samples, n_timesteps)
) -
dis_p(int): Number of grid points for evaluating functions. Default:300 -
fit(str): The type of basis expansion to use for smoothing. Options are the same as insmoothing_features:'bspline','fourier', or a Wavelet name (e.g.,'db4'). Default:'bspline' -
n(int): Applicable iffit='fourier'. It specifies the number of Fourier harmonics to include. The total number of Fourier basis functions will be$2n + 1$ (including the constant term,$n$ cosine terms, and$n$ sine terms). Ifn=None, the smoothing is adaptive and Generalized Cross-Validation (GCV) is used to find the optimal number of harmonics. Default:3 -
smoothing_str(float): Parameter for B-spline fitting. Whenterms(number of knots/basis functions) is not specified for B-splines, it is used as the initial scale of the penalty$\lambda$ in the GCV search controlling the trade-off between smoothness and fidelity: higher values yield smoother curves (more regularization). Default:0.3 -
terms(int or None): Applicable iffit='bspline'. The number of basis functions or knots to use. Ifterms=None, the smoothing is adaptive and the class will attempt GCV to find the optimum fit and the corresponding terms. Default:None -
wavelet_level(int): The level of decomposition for Wavelet smoothing. Higher levels capture coarser structures. Ifterms=None, andfitis a Wavelet, the code attempts to find an optimal level via GCV. Default:4 -
data(np.ndarray, shape (n_samples, n_timesteps)): The raw sample paths to smooth (one feature dimension at a time).
-
coeffs(np.ndarray): Array of basis expansion coefficients. -
fn_s(list of callables): List of the smoothed functions evaluated/defined on the time grid. -
smoothing_basis(list of callables): The list of basis functions used for smoothing the raw sample paths.
Tip: in practice you call
smoothing_features(data, m=..., dis_p=..., fit=..., standardize=True), which appliesSmoothingacross every feature dimension and returns(coeffs, curves, basis_smoothing)ready for the autoencoder. The optionalstandardize=Trueapplies per-component functional standardization (manuscript Appendix A).
TimeSeriesDistance computes the pairwise distance matrix for the raw sample paths.
TimeSeriesDistance(
X, # raw sample paths of shape (n_samples, n_features, n_timesteps)
metric, # distance metric ('elastic', 'elastic-fast', 'fastdtw', 'ultrafast')
n_jobs, # number of parallel jobs for computation
band_radius, # Sakoe-Chiba band radius (None = auto)
elastic_coarse_factor # coarse downsampling ratio for 'elastic-fast'
) -
X(np.ndarray, shape=(n_samples, n_features, n_timesteps)): An array containing the raw multi-dimensional functional data. The class internally standardizes each feature dimension before distance computation to ensure comparability. -
metric(str): Metric to use for distance computation. Options include:-
'elastic'(default): the true Fisher–Rao elastic distance (SRVF + dynamic programming), JIT-compiled and optionally banded. Phase-invariant. -
'elastic-fast': multi-resolution elastic — coarse DP + banded refinement; quasi-O(N) with near-DP accuracy. -
'fastdtw': band-constrained dynamic time warping, O(N). -
'ultrafast': multi-resolution recursive FastDTW, ~O(N).
-
-
n_jobs(int): Number of parallel jobs for computation. Default:-1 -
band_radius(int or None): Sakoe–Chiba band radius. WhenNone, it auto-scales tomax(12, n_timesteps // 4). Pass-1to disable banding for the'elastic'DP. Default:None -
elastic_coarse_factor(int): Downsampling ratio for the coarse stage of'elastic-fast'. Default:2
-
compute_distances(self): Compute the pairwise distance matrix.- Returns
dist_matrix(np.ndarray, shape (n_samples, n_samples)) : The pairwise distance matrix.
- Returns
-
plot_extremes(self): Visualise the pair of most similar and the most distinct samples.- Returns
None.
- Returns
NearestNeighborsOpt is a utility class for determining the optimal number of nearest neighbors (m) used in the pairwise similarity measure of the clustering objective function. Given a pairwise distance matrix, it examines how the structure of the k-nearest neighbor graph evolves as k varies, using two complementary criteria: graph connectivity and the distance "knee".
NearestNeighborsOpt(
dist_matrix # pairwise distance matrix of shape=(n_samples, n_samples)
) dist_matrix(np.ndarray, shape (n_samples, n_samples)) : Distance matrix returned byTimeSeriesDistance().
-
estimate_optimal_m(method='connectivity', max_m=None): Select the optimal number of nearest neighbors.-
method(str) : Method to use for neighbourhood optimization. Options include:-
'connectivity'(default) : Find the smallest$m$ at which the k-NN graph is fully connected (a single component). This is the robust default used throughout the examples. -
'avg_distance': Find the$m$ at which the average m-th neighbor distance exhibits the largest relative jump (searched in the informative early window).
-
-
max_m(int) : Maximum number of neighbors to consider. Defaultmax_m=n_samples-1 -
Returns
-
m(int) : The optimum number of nearest neighbors.
-
-
-
get_nearest_neighbors(opt_m=m): Construct the adjacency list (neighbor index list) for each data point given the neighborhood valuem.-
opt_m(int): Estimated usingestimate_optimal_m(). -
Returns
-
neighbors_dict(dict) : A dictionary mapping each data point to itsmnearest neighbors.
-
-
-
compute_similarity(neighbors_dict, method='neighbors'): Compute the pairwise similarities from the distance matrix and the k-NN graph, following the manuscript (Section 4.1):$s(y_i, y_j) = \mathbb{1}[, y_j \in N_m(y_i) \text{ or } y_i \in N_m(y_j) ,]\cdot \exp(-d(y_i, y_j))$ .-
neighbors_dict(dict) : Mappingi -> indices of i's m nearest neighbours. -
Returns
-
sim_matrix(np.ndarray, shape (n_samples, n_samples)): The pairwise similarity matrix (n×n).
-
-
FAEclust is a deep learning framework for clustering multivariate functional data. It integrates three key components: (1) functional data smoothing via basis function expansion (e.g. B-splines, Fourier series, Wavelet family, ...) to provide a smooth representation of each sample path, (2) a functional autoencoder consisting of an encoder for learning complex relationships among the features and a decoder for flexibly reconstructing intricate functional patterns, and (3) a shape-informed convex clustering algorithm that automatically determines the optimal number of clusters. The FunctionalAutoencoder class is designed to handle these steps end-to-end, while providing various hyperparameters to tailor the model to different datasets.
FunctionalAutoencoder(
p, # number of component random functions (dimensions)
layers, # list specifying encoder/decoder layer widths
l, # number of basis functions for encoder functional weights
m, # number of basis functions for smoothing the sample paths
basis_smoothing, # list of basis functions used for smoothing (e.g. Fourier basis)
basis_input, # list of basis functions for encoder functional weights (e.g. B-spline basis)
lambda_e, # penalty parameter for the orthogonality regularization on encoder functional weights
lambda_d, # penalty parameter for the roughness regularization on encoder functional weights and biases
lambda_c, # penalty parameter for the clustering loss in the integrated objective function
t, # time grid (array of length T) over which the smoothed functions are evaluated
sim_matrix, # pairwise similarity matrix (n×n) in the clustering objective function
tau = 1.0, # dropout keep-probability for the MLP layers (1.0 = off)
use_bn = True, # batch-normalisation in the MLP layers
manifold = None, # decoder readout: None | 'euclidean' | 'sphere' | 'poincare'
seed = 0 # RNG seed for reproducible initialisation & shuffling
)-
p(int) – Number of component random functions (dimensions). For example,p=2for two-dimensional functional data. -
layers(list of int) – Architecture specification for the autoencoder’s layers. This list should include the width of each encoder layer, the latent dimension, and the width of each decoder layer. The format is[q1, q2, ..., s, ..., Q2, Q1, Z1, Z2], where:-
q1is the number of nodes in the encoder's first hidden layer. -
q2, ..., sare the sizes of the successive hidden layers in the encoder, withsbeing the final latent dimension (the size of the bottleneck vector). -
..., Q2are the sizes of the dense layers in the decoder (mirroring the encoder’s dense layers). -
Q1, Z1, Z2are the sizes of the last three layers of the decoder that output functions.
-
-
l(int) – Functional weights and biases are represented as linear combinations of basis functions.lis the number of basis functions for the functional weights in the encoder. -
m(int) – Number of basis functions used for converting the raw sample paths into smooth functions. -
basis_smoothing(list of callables) – A list ofmbasis functions (evaluated on the time gridt) for smoothing the raw sample paths. The provided utilitysmoothing_features()can generate this list along with the expansion coefficients. -
basis_input(list of callables) – A list oflbasis functions (evaluated on the time gridt) for representing the functional weights in the encoder. The utilitybspline_basis(l)returns such a list. -
lambda_e(float) – Penalty parameter for the orthogonality regularization on encoder functional weights, encouraging within-component functional weights to be orthogonal. -
lambda_d(float) – Penalty parameter for the roughness regularization on encoder functional weights and biases, controlling their smoothness. -
lambda_c(float) – Penalty parameter for the clustering loss in the integrated objective function. A higherlambda_cplaces more emphasis on forming well-separated clusters in the latent space (at the potential cost of reconstruction accuracy). -
t(array-like of shape (T,)) – Time grid (array of length T) over which the input functions, functional weights, functional biases and output functions are evaluated/defined. -
sim_matrix(numpy.ndarray of shape (n_samples, n_samples)) – The pairwise similarity matrix among the$N$ sample paths, a term in the clustering objective function. The functionNearestNeighborsOpt()will construct it from the optimalm-nearest-neighbor graph. -
tau(float in (0, 1]) – Dropout keep-probability for the MLP layers.tau=1.0disables dropout. Default:1.0 -
use_bn(bool) – Whether to apply batch normalisation in the MLP blocks. Default:True -
manifold(str or None) – Optional differentiable decoder readout$\rho$ mapping outputs onto a manifold:None/'euclidean'(no readout),'sphere', or'poincare'(Poincaré disk). Default:None -
seed(int) – RNG seed controlling weight initialisation and the deterministic minibatch shuffle, so a run is fully reproducible fromseedalone. Default:0
-
model_summary(self): Print a summary of the model and its trainable parameter count.-
Returns
None
-
Returns
-
train_model(self, X_train, epochs, learning_rate, batch_size, neighbors_dict, sim_matrix, beta=0.9, pretrain_epochs=0, criterion='silhouette', verbose=False, device=None, cluster_every=5): Train with mini-batch gradient descent with momentum. The firstpretrain_epochsminimise the penalized reconstruction loss only; the remaining epochs add the clustering term.-
X_train(np.ndarray) – The smoothing coefficients returned bysmoothing_features(). -
epochs(int) – Number of training epochs (full passes over the dataset). Default:100 -
learning_rate(float) – Learning rate of the training algorithm. Default:1e-3 -
batch_size(int) – The mini-batch size. Default:32 -
neighbors_dict(dict) – A dictionary mapping each data point to itsmnearest neighbors, fromget_nearest_neighbors(). -
sim_matrix(numpy.ndarray of shape (n_samples, n_samples)) – The pairwise similarity matrix among the$N$ sample paths. -
beta(float) – Momentum coefficient. Default:0.9 -
pretrain_epochs(int) – Number of reconstruction-only warm-up epochs before the clustering term is switched on. Default:0 -
criterion(str) – Internal validation index used to pick the number of clusters:'silhouette'or'davies_bouldin'. Default:'silhouette' -
cluster_every(int) – Re-run the convex-clustering homotopy and refresh cached cluster centroids every this many epochs. Default:5 -
device(str or None) – Compute device;Noneauto-selects CUDA → MPS → CPU. Default:None
-
-
predict(self, coeffs, batch_size=None, criterion='silhouette'): Return the embedded data and the cluster labels of the functional data.-
Returns
-
Z(np.ndarray, shape=(n_samples, s)) : Array of the embedded data. -
labels(np.ndarray, shape=(n_samples,)) : Functional data cluster labels.
-
-
Returns
ConvexClustering is the path-following homotopy algorithm that produces a hierarchy of clusters and determines the optimal number of clusters via an internal validation metric. The autoencoder calls it internally during training and prediction, but it can also be used directly on any embedding.
ConvexClustering(
X, # embedded data of shape (n_samples, s)
neighbors_dict,
sim_matrix, # the pairwise similarity matrix
n_jobs = -1,
verbose = False, # whether to print the merging process and the validation scores
criterion = 'silhouette' # internal validation index for selecting the number of clusters
)-
X(np.ndarray, shape=(n_samples, s)) – The embedded data in the latent space. -
neighbors_dict(dict) – Mappingi -> i's m nearest neighbours. -
sim_matrix(np.ndarray, shape (n_samples, n_samples)) – The pairwise similarity weights$s(y_i, y_j)$ . -
n_jobs(int) – Number of parallel jobs. Default:-1 -
verbose(bool, optional) – IfTrue, print the hierarchy of clusters and the corresponding validation scores. Default:False -
criterion(str) – Internal validation index:'silhouette'or'davies_bouldin'. Default:'silhouette'
fit(self): Perform clustering on the embedded data.- Returns
cluster_labels(np.ndarray, shape=(n_samples,)) : The (optimal) cluster labels.
- Returns
run_experiment(dataset='pendulum', ...)– One call that runs the entire pipeline (generate → distance → m-NN → smooth → train → cluster → score) on a built-in synthetic manifold dataset, returning a dict withami,ari,labels,Z, the trainedmodel, and timing. The available datasets are listed inMANIFOLD_DATASETS('hypersphere','hyperbolic','swiss_roll','lorenz','pendulum'). Passfast=Truefor a quick smoke test.from FAEclust import run_experiment res = run_experiment('pendulum', epochs=120, metric='fastdtw') print(res['ami'], res['ari'])
tuning.optimize_hyperparameters(...)– Optional Optuna Bayesian hyperparameter search (requirespip install optuna).
# Option A: conda
conda env create -f environment.yml
conda activate FAE
# Option B: pip (any Python >= 3.10 environment)
pip install -r requirements.txtRun the examples and tests from inside this folder (the FAEclust package is importable here):
# non-interactive scripts (fixed dataset)
python simulation.py # synthetic pendulum dataset
python test.py # real UCR "Plane" dataset (requires: pip install aeon)
# interactive, step-by-step notebooks (pick the dataset at the prompt)
jupyter notebook simulation.ipynb # choose one of the 5 synthetic manifold datasets
jupyter notebook test.ipynb # choose a UCR/UEA dataset (requires: pip install aeon)
pytest -q # unit + light end-to-end tests (requires: pip install pytest cvxpy)Below is a minimal end-to-end run of FAEclust on a synthetic manifold dataset (the pendulum dataset from the paper); it mirrors simulation.py. We compute the similarity matrix using the elastic distance (metric='elastic'), smooth the data with B-spline basis functions, configure a functional autoencoder with an encoder-decoder architecture (16 → 8 → 2 (latent) → 8 → 16 → 16 → 16), pre-train for 50 epochs and then train with the clustering term for the remaining epochs (150 total). The output labels from predict are the cluster assignments found by the path-following homotopy algorithm; the Adjusted Rand Index (ARI) and Adjusted Mutual Information (AMI) measure agreement with the true labels.
import numpy as np
from FAEclust import (smoothing_features, bspline_basis, rescale,
DatasetGenerator, TimeSeriesDistance,
NearestNeighborsOpt, FunctionalAutoencoder)
# 1. data
X, y = DatasetGenerator(200, 2, 100, 4).generate_pendulum()
X, y = rescale(X, y, "pendulum")
# 2. similarity
D = TimeSeriesDistance(X, metric="elastic").compute_distances()
opt = NearestNeighborsOpt(D)
m = opt.estimate_optimal_m(method="connectivity", max_m=X.shape[0] - 1)
neighbors_dict = opt.get_nearest_neighbors(opt_m=m)
sim_matrix = opt.compute_similarity(neighbors_dict)
# 3. smoothing
coeffs, _, basis_smoothing = smoothing_features(X, m=50, dis_p=300,
fit="bspline", standardize=True)
# 4. train
fae = FunctionalAutoencoder(2, [16, 8, 2, 8, 16, 16, 16], l=50, m=50,
basis_smoothing=basis_smoothing, basis_input=bspline_basis(50),
lambda_e=0.5, lambda_d=0.05, lambda_c=0.5,
t=np.linspace(0, 1, 300), sim_matrix=sim_matrix, manifold=None)
fae.train_model(coeffs, epochs=150, learning_rate=1e-3, batch_size=32,
neighbors_dict=neighbors_dict, sim_matrix=sim_matrix,
pretrain_epochs=50)
# 5. predict & evaluate
Z, labels = fae.predict(coeffs, batch_size=32)🔗 view the full notebook — simulation.ipynb is an interactive, step-by-step version: you pick one of the five synthetic manifold datasets in MANIFOLD_DATASETS at the prompt (default 'hypersphere'), and each stage is its own cell with a visualization — the pairwise-distance heatmap and the most-similar/most-dissimilar pair (plot_extremes), the m-NN similarity graph and the average m-th-distance curve, the B-spline smoothing fit, the pre-train→fine-tune training-loss curve, the latent-space t-SNE, and the convex-clustering homotopy result with AMI/ARI. Where the manifold is a sphere or Poincaré disk, the notebook also enables the differentiable manifold readout (manifold='sphere'/'poincare').
test.ipynb is the real-data counterpart (🔗 view the notebook): it loads a dataset from the UCR / UEA Time-Series Classification Archive via aeon (default 'Chinatown'; you can type any archive name at the prompt) and runs the same step-by-step pipeline with per-stage visualizations — pairwise distance (default metric='elastic'; also 'elastic-fast', 'fastdtw', 'ultrafast'), the connectivity-based m-NN similarity graph, B-spline smoothing with functional standardization, the pre-train→fine-tune schedule, and a t-SNE view of the predicted latent clusters. The non-interactive script test.py runs the same workflow on the "Plane" dataset using FastDTW.
The above examples and parameters serve as a guide, but users are encouraged to experiment with the basis size (l, m), network depth (layers), and loss weights (lambda_e, lambda_d, lambda_c) to best fit their specific datasets.

