Code, data-processing scripts, model configurations, and reproducibility materials for the paper:
Understanding Key Features of Time Series Foundation Models from Epidemic Forecasting
This repository provides a reusable benchmark for evaluating modern time series forecasting models on regional influenza forecasting tasks. The study focuses on U.S. Health and Human Services (HHS) region-level influenza-like illness (ILI) and influenza-associated hospitalization time series, with CDC-aligned 1–4-week-ahead forecasting horizons.
The repository includes implementations and experiment pipelines for classical statistical baselines, deep learning forecasters, numerical time series foundation models, LLM-style time series models, and the proposed mixture-of-experts framework, MultiFoundationCore.
Accurate short-term epidemic forecasting is important for vaccination planning, hospital staffing, and public-health preparedness. This project evaluates whether modern time series foundation models can improve regional influenza forecasting under realistic public-health constraints.
The main forecasting tasks are:
- Influenza-like illness (ILI) forecasting
  - Weekly ILI time series.
  - 10 U.S. HHS regions.
  - Approximately 20 years of regional surveillance history.
  - Forecasting horizons: 1, 2, 3, and 4 weeks ahead.
- Influenza-associated hospitalization forecasting
  - Weekly region-level hospitalization time series.
  - Approximately three influenza seasons.
  - Used both as a forecasting target and as an auxiliary related-domain signal.
The paper compares temporal within-region forecasting and spatial across-region generalization. It also studies the role of pretraining, auxiliary signals, retraining frequency, and model family differences.
This repository supports experiments addressing the following questions:
- Do time series foundation models transfer effectively to operational influenza forecasting?
- How do numerical time series models compare with LLM-style time series models under CDC-aligned 1–4-week horizons?
- Does pretraining on related domains, such as influenza hospitalizations, improve multi-horizon ILI forecasting?
- How much does spatial distribution shift affect forecasting performance across unseen HHS regions?
- Can a mixture-of-experts design improve robustness by combining multiple pretrained forecasters?
- How does retraining frequency affect temporal adaptation and long-horizon accuracy?
The repository contains scripts and configurations for the following model families.
- ARIMA
- LSTM
- TinyLSTM
- TCN
- TFT
- TiDE
- Vanilla Transformer
- TimesNet
- PatchTST
- iTransformer
- Chronos-T5 variants
- Chronos-Bolt variants
- TimeLLM-GPT2
- MultiFoundationCore
MultiFoundationCore is a mixture-of-experts framework that combines predictions or representations from multiple pretrained and task-specific forecasters. It is designed to support auxiliary signals such as influenza-associated hospitalization data and to improve robustness across multi-horizon forecasting tasks.
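To make the idea of combining expert predictions concrete, here is a minimal sketch of one simple gating scheme: a softmax over negative validation error. This is an illustrative assumption, not the MultiFoundationCore architecture described in the paper. The function names, the `temperature` parameter, and all numbers are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_experts(expert_forecasts, validation_mse, temperature=1.0):
    """Blend per-expert multi-horizon forecasts, favoring low validation error.

    expert_forecasts: dict name -> list of forecasts, one per horizon
    validation_mse:   dict name -> validation MSE of that expert (hypothetical input)
    """
    names = sorted(expert_forecasts)
    # Lower error gives a higher score; temperature controls how sharp the gate is.
    weights = softmax([-validation_mse[n] / temperature for n in names])
    horizons = len(expert_forecasts[names[0]])
    blended = [
        sum(w * expert_forecasts[n][h] for w, n in zip(weights, names))
        for h in range(horizons)
    ]
    return dict(zip(names, weights)), blended


# Toy values for illustration only; these are not results from the paper.
forecasts = {
    "PatchTST": [2.1, 2.3, 2.6, 2.8],
    "Chronos-Bolt": [2.0, 2.4, 2.9, 3.3],
    "ARIMA": [2.2, 2.2, 2.2, 2.2],
}
val_mse = {"PatchTST": 0.10, "Chronos-Bolt": 0.15, "ARIMA": 0.40}
weights, blended = combine_experts(forecasts, val_mse, temperature=0.1)
print(weights)
print(blended)
```

A learned gate that conditions on the input window or on auxiliary signals would replace the fixed validation-error scores used here.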
The experiments use the following datasets.
| Dataset | Spatial unit | Frequency | Role |
|---|---|---|---|
| Influenza-like illness (ILI) | 10 U.S. HHS regions | Weekly | Main forecasting target |
| Influenza-associated hospitalizations | 10 U.S. HHS regions | Weekly | Forecasting target and auxiliary signal |

Additional datasets serve as pretraining sources or auxiliary inputs:

| Dataset | Role |
|---|---|
| Influenza hospitalizations | Related-domain pretraining and auxiliary input for ILI prediction |
| ILI | Related-domain auxiliary input for hospitalization prediction |
| Epidemic weekly data | Epidemic-domain pretraining source |
| Traffic / TrafficL | Non-epidemiological structured time series pretraining source |
| M4 | Large generic time series pretraining corpus |
In the temporal setting, each HHS region is split chronologically. Models are trained on earlier observations and evaluated on future observations from the same region. This tests forward-in-time generalization.
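A chronological per-region split can be sketched as follows. The 80/20 `train_fraction`, the function name, and the toy rows are assumptions for illustration; the actual cut points used in the experiments are not stated here.

```python
from datetime import date, timedelta


def chronological_split(rows, train_fraction=0.8):
    """Split (unique_id, ds, y) rows per region so training precedes testing."""
    by_region = {}
    for unique_id, ds, y in rows:
        by_region.setdefault(unique_id, []).append((ds, y))

    train, test = [], []
    for unique_id, series in by_region.items():
        series.sort()  # order each region's observations by date
        cut = int(len(series) * train_fraction)
        train += [(unique_id, ds, y) for ds, y in series[:cut]]
        test += [(unique_id, ds, y) for ds, y in series[cut:]]
    return train, test


# Toy weekly data for two regions (hypothetical values).
start = date(2003, 10, 5)
rows = [
    (f"Region {r}", start + timedelta(weeks=w), float(w))
    for r in (1, 2)
    for w in range(10)
]
train, test = chronological_split(rows, train_fraction=0.8)
print(len(train), len(test))
```

Because the cut is applied within each region, no test observation is older than any training observation from the same region.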
In the spatial setting, models are trained on a subset of HHS regions and evaluated on held-out regions. This tests geographic transfer and robustness to regional distribution shift.
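The held-out-region split can be sketched in the same row format. Which regions are held out is a hypothetical choice here, and the function name is illustrative.

```python
def spatial_split(rows, held_out_regions):
    """Train on all regions except the held-out ones; test only on the held-out ones."""
    held_out = set(held_out_regions)
    train = [row for row in rows if row[0] not in held_out]
    test = [row for row in rows if row[0] in held_out]
    return train, test


# Toy rows: 10 regions with 3 weekly observations each (hypothetical values).
rows = [
    (f"Region {r}", f"2003-10-{5 + 7 * w:02d}", float(r))
    for r in range(1, 11)
    for w in range(3)
]
train, test = spatial_split(rows, ["Region 9", "Region 10"])
print(len(train), len(test))
```

Unlike the temporal split, the test regions contribute no observations at all to training, which is what exposes the model to regional distribution shift.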
All main forecasting experiments use CDC-aligned short-term horizons:
- 1 week ahead
- 2 weeks ahead
- 3 weeks ahead
- 4 weeks ahead
The repository supports both:
- Direct multi-output forecasting: the model predicts the full 1–4-week trajectory in one forward pass.
- Iterative forecasting: the model predicts one step ahead and recursively rolls forward.
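The difference between the two strategies can be sketched with toy stand-in models. The model functions below are hypothetical placeholders for trained forecasters, not code from this repository.

```python
def iterative_forecast(one_step_model, history, horizon=4):
    """Roll a one-step-ahead model forward, feeding each prediction back as input."""
    window = list(history)
    predictions = []
    for _ in range(horizon):
        next_value = one_step_model(window)
        predictions.append(next_value)
        window = window[1:] + [next_value]  # slide the window forward by one week
    return predictions


def direct_forecast(multi_output_model, history):
    """Call a multi-output model once; it returns every horizon together."""
    return list(multi_output_model(list(history)))


# Toy stand-ins for trained models (illustration only).
def last_value_plus_change(window):
    # One-step model: last value plus the most recent weekly change.
    return window[-1] + (window[-1] - window[-2])


def linear_trend_4step(window):
    # Multi-output model: extrapolate the latest weekly change over 4 weeks.
    slope = window[-1] - window[-2]
    return [window[-1] + slope * h for h in range(1, 5)]


history = [1.0, 1.5, 2.0, 2.5]
iterative = iterative_forecast(last_value_plus_change, history)
direct = direct_forecast(linear_trend_4step, history)
print(iterative)  # [3.0, 3.5, 4.0, 4.5]
print(direct)     # [3.0, 3.5, 4.0, 4.5]
```

The two toy models agree here by construction. With learned models, iterative forecasting can accumulate error because each step consumes earlier predictions, while direct forecasting avoids that feedback loop but must learn all horizons jointly.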
The main evaluation metrics are:
- Mean Squared Error (MSE)
- Normalized Nash–Sutcliffe Efficiency (NNSE)
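NNSE is used because it rescales the Nash–Sutcliffe Efficiency (NSE) into a bounded score. Under the common definition NNSE = 1 / (2 − NSE), the score lies in (0, 1]: a value of 1 indicates perfect agreement, 0.5 corresponds to a model that does no better than predicting the observed mean (NSE = 0), and values below 0.5 are worse than that baseline.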
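A minimal sketch of both main metrics, assuming the common definition NNSE = 1 / (2 − NSE). The function names are illustrative and may differ from the ones used in this repository's scripts.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def nse(y_true, y_pred):
    """Nash-Sutcliffe Efficiency: 1 minus squared error relative to the mean predictor."""
    mean_true = sum(y_true) / len(y_true)
    residual = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    variance = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - residual / variance


def nnse(y_true, y_pred):
    """Normalized NSE, mapping NSE in (-inf, 1] to (0, 1]."""
    return 1.0 / (2.0 - nse(y_true, y_pred))


y_true = [1.0, 2.0, 3.0, 4.0]
mean_baseline = [2.5, 2.5, 2.5, 2.5]

print(nnse(y_true, y_true))         # 1.0 for a perfect forecast
print(nnse(y_true, mean_baseline))  # 0.5 for the mean predictor
print(mse(y_true, mean_baseline))   # 1.25
```

Note that `nse` divides by the variance of the observations, so it is undefined for a constant target series.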
Additional metrics may be computed in some scripts:
- Mean Absolute Error (MAE)
- Symmetric Mean Absolute Percentage Error (SMAPE)
- Root Mean Squared Error (RMSE)
- Coefficient of determination (R²)
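The additional metrics can be sketched as below. SMAPE has several variants in the literature; this version uses the mean of 2|y − ŷ| / (|y| + |ŷ|) expressed as a percentage, which is an assumption and may not match the variant used in the scripts.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def smape(y_true, y_pred):
    """Symmetric MAPE in percent; pairs where both values are zero contribute 0."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        denom = abs(t) + abs(p)
        if denom > 0:
            total += 2.0 * abs(t - p) / denom
    return 100.0 * total / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred))
print(smape(y_true, y_pred), r_squared(y_true, y_pred))
```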
Clone the repository:

    git clone https://github.com/<your-username>/Epidemic-Times-Series-Foundation-Models-Benchmark.git
    cd Epidemic-Times-Series-Foundation-Models-Benchmark

Create the environment:

    conda env create -f environment.yml
    conda activate epidemic-tsfm

Alternatively, install from requirements.txt:

    python -m venv .venv
    source .venv/bin/activate
    python -m pip install --upgrade pip
    pip install -r requirements.txt

Download or place the required raw files under:

    data/raw/
Expected processed format:

    unique_id, ds, y
    Region 1, 2003-10-05, ...
    Region 1, 2003-10-12, ...
    Region 2, 2003-10-05, ...

Where:

- `unique_id` is the region identifier.
- `ds` is the weekly date.
- `y` is the target value.
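A processed file in this long format can be loaded and sanity-checked with the standard library alone. The sketch below reads from an in-memory sample. The `y` values are made up for illustration, and the helper names are hypothetical rather than part of this repository.

```python
import csv
import io
from datetime import date, timedelta

# In-memory stand-in for a processed file; the y values are hypothetical.
SAMPLE = """unique_id, ds, y
Region 1, 2003-10-05, 1.2
Region 1, 2003-10-12, 1.4
Region 2, 2003-10-05, 0.9
"""


def load_long_format(handle):
    """Group (ds, y) observations by unique_id, sorted by date."""
    reader = csv.DictReader(handle, skipinitialspace=True)
    series = {}
    for row in reader:
        point = (date.fromisoformat(row["ds"]), float(row["y"]))
        series.setdefault(row["unique_id"], []).append(point)
    for points in series.values():
        points.sort()
    return series


def has_weekly_spacing(points):
    """Check that consecutive observations are exactly one week apart."""
    return all(b[0] - a[0] == timedelta(weeks=1) for a, b in zip(points, points[1:]))


series = load_long_format(io.StringIO(SAMPLE))
for region, points in series.items():
    print(region, len(points), has_weekly_spacing(points))
```

To read a real file, pass an open file handle instead of `io.StringIO(SAMPLE)`.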
The paper reports several major findings:
- MultiFoundationCore achieves the strongest overall performance in temporal ILI forecasting by fusing multiple expert forecasters.
- PatchTST and iTransformer are strong numerical time series models for CDC-aligned short-horizon influenza prediction.
- Pretraining improves multi-horizon robustness, especially at longer horizons and especially when the pretraining source is mechanistically related to influenza dynamics.
- Hospitalization data is useful in both directions: it improves ILI forecasting when used for pretraining or as an auxiliary signal, and ILI improves hospitalization forecasting as an auxiliary covariate.
- LLM-style time series models underperform strong numerical models in this operational 1–4-week forecasting setting.
- Spatial generalization is harder than temporal forecasting, because held-out regions introduce distribution shift across geography.
- Moderate retraining frequency improves temporal adaptation, especially at 3–4-week horizons.
If you use this repository, please cite the paper:
@article{jafari2026epidemic_tsfm,
title = {Understanding Key Features of Time Series Foundation Models from Epidemic Forecasting},
author = {Jafari, Alireza and Fox, Judy and Fox, Geoffrey C. and Marathe, Madhav and Chou, Jingyuan and Adiga, Aniruddha},
journal = {Under review},
year = {2026}
}

This project is released under the MIT License unless otherwise specified.
Data sources may have their own licenses or usage terms. Please check the original data providers before redistributing raw datasets.
For questions, please contact:
Alireza Jafari
University of Virginia
Email: jrp5td@virginia.edu
This repository accompanies research on time series foundation models for public-health forecasting. The experiments use regional influenza surveillance and hospitalization time series and compare a broad set of statistical, neural, foundation, and mixture-of-experts forecasting methods.