
ajitagupta/db-vs-sbb-punctuality


DB vs SBB — A Punctuality Comparison

Swiss trains are famously punctual; German trains famously aren't. This project quantifies the gap using each country's own open data, and finds it's even larger than the reputation suggests. At the 20 busiest stations of each network, Swiss trains are punctual ~96% of the time vs. ~78% for German trains (6-minute threshold), and every single major Swiss station outperforms even the best major German one.

Scope: the analysis covers 30 days of SBB Ist-Daten (September 2024), compared against the German piebro/deutsche-bahn-data window. Cancellations count as "not punctual"; the 6-minute threshold matches DB's long-distance definition and the 3-minute threshold matches SBB's internal target.

[Figure: Punctuality comparison]

Headline findings

At Germany's and Switzerland's 20 busiest train stations over the analysis window:

  • Swiss trains are punctual roughly 96% of the time at the standard 6-minute threshold; German trains, 78% — a gap of ~18 percentage points.
  • The gap widens from ~18 to ~28 percentage points at the stricter 3-minute threshold (SBB 91%, DB 63%). DB loses 15 points between the two thresholds versus SBB's 5, so roughly 15% of DB arrivals (vs. ~5% of SBB arrivals) are 3 to 6 minutes late: a thick band of small but noticeable delays.
  • Every one of SBB's 20 busiest stations outperforms the best of DB's 20 busiest. DB's most punctual large station (87.6%) still falls short of SBB's least punctual one (89.7%).
  • Mean arrival delay is ~3.7× higher in Germany (3.81 min vs 1.04 min).
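The headline figures can be cross-checked with a few lines of arithmetic. This sketch simply hard-codes the rounded numbers from the list above (it does not touch the datasets) and derives the operator gap at each threshold, the share of arrivals in the 3 to 6 minute band, and the mean-delay ratio.

```python
# Rounded headline figures from the list above (percent of arrivals
# punctual at the 20 busiest stations of each network).
punctual = {
    "SBB": {"lt6": 96.0, "lt3": 91.0},
    "DB": {"lt6": 78.0, "lt3": 63.0},
}
mean_delay_min = {"SBB": 1.04, "DB": 3.81}

# Operator gap at each threshold, in percentage points.
gap_6 = punctual["SBB"]["lt6"] - punctual["DB"]["lt6"]
gap_3 = punctual["SBB"]["lt3"] - punctual["DB"]["lt3"]

# Arrivals that clear the 6-minute bar but miss the 3-minute bar were
# delayed by at least 3 and less than 6 minutes. Cancellations fail both
# bars, so they drop out of this difference.
band_3_to_6 = {op: v["lt6"] - v["lt3"] for op, v in punctual.items()}

# How many times larger the German mean arrival delay is.
delay_ratio = mean_delay_min["DB"] / mean_delay_min["SBB"]

print(f"gap at <6 min: {gap_6:.0f} pp")
print(f"gap at <3 min: {gap_3:.0f} pp ({gap_3 / gap_6:.2f}x the <6 min gap)")
print(f"3-6 min band: SBB {band_3_to_6['SBB']:.0f}%, DB {band_3_to_6['DB']:.0f}%")
print(f"mean delay ratio: {delay_ratio:.2f}x")
```

This prints an 18 pp gap at 6 minutes, a 28 pp gap at 3 minutes (1.56x larger), a 3 to 6 minute band of 5% for SBB vs. 15% for DB, and a mean-delay ratio of 3.66x. Because the inputs are rounded, treat these as approximations.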

What this project is

A focused, two-question comparison between Deutsche Bahn (DB) and Schweizerische Bundesbahnen (SBB), using:

  • German data: piebro/deutsche-bahn-data, a community-maintained Hugging Face dataset (CC BY 4.0, underlying data © Deutsche Bahn) covering the ~100 busiest German stations from 2024-07 onwards.
  • Swiss data: SBB Ist-Daten on opentransportdata.swiss — daily CSV files with actual vs. planned arrival/departure times across the SBB network.

The comparison is restricted to two questions:

  1. What does the arrival-delay distribution look like in each country?
  2. How does punctuality vary across the 20 busiest stations in each network?
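Both questions reduce to small aggregations over arrival records. The sketch below is plain Python on a hypothetical `Arrival` record (station name plus a delay in minutes, with `None` standing in for a cancellation); it is not the notebook's code, and the bin edges in `delay_distribution` are illustrative choices.

```python
from bisect import bisect_right
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class Arrival:
    station: str
    # Arrival delay in minutes, already clipped to >= 0.
    # None marks a cancelled arrival (an encoding assumed for this sketch).
    delay_min: Optional[float]


def delay_distribution(
    arrivals: List[Arrival], edges: Sequence[float] = (0, 3, 6, 15, 30)
) -> Dict[str, float]:
    """Question 1: share of non-cancelled arrivals in each delay bin."""
    labels = [f"[{lo},{hi})" for lo, hi in zip(edges, edges[1:])]
    labels.append(f">={edges[-1]}")
    delays = [max(0.0, a.delay_min) for a in arrivals if a.delay_min is not None]
    if not delays:
        raise ValueError("no non-cancelled arrivals")
    # bisect_right(...) - 1 is the index of the bin whose lower edge is <= d.
    counts = Counter(labels[bisect_right(edges, d) - 1] for d in delays)
    return {label: counts[label] / len(delays) for label in labels}


def punctuality_by_station(
    arrivals: List[Arrival], threshold_min: float = 6.0
) -> Dict[str, float]:
    """Question 2: per-station share of arrivals delayed < threshold_min.
    Cancellations stay in the denominator, i.e. count as not punctual."""
    total: Dict[str, int] = defaultdict(int)
    on_time: Dict[str, int] = defaultdict(int)
    for a in arrivals:
        total[a.station] += 1
        if a.delay_min is not None and a.delay_min < threshold_min:
            on_time[a.station] += 1
    return {s: on_time[s] / n for s, n in total.items()}


sample = [
    Arrival("Zürich HB", 0.5),
    Arrival("Zürich HB", 4.0),
    Arrival("Zürich HB", None),  # cancelled
    Arrival("Bern", 1.0),
    Arrival("Bern", 7.0),
]
print(delay_distribution(sample))
print(punctuality_by_station(sample, threshold_min=6.0))
print(punctuality_by_station(sample, threshold_min=3.0))
```

Running the same station function at both thresholds is what makes the two definitional regimes directly comparable.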

Nothing else. No machine learning, no time-of-day analysis, no seasonal effects. The scope is deliberately small.

Methodology, in one paragraph

Punctuality is computed at two thresholds: <3 minutes (SBB Group's internal target) and <6 minutes (DB's long-distance definition). Both are reported for both countries so the comparison is honest about which definitional regime is being used. Cancellations are counted as "not punctual" in both datasets, which differs from each operator's official reporting. The German side uses the dataset's coverage from 2024-07 onwards; the Swiss side uses 30 days of Ist-Daten from September 2024, so the two windows overlap but differ in length. The German data is pre-flattened with a computed delay_in_min column; for the Swiss data the delay is computed as (actual_arrival - scheduled_arrival) in minutes, clipped to non-negative. Stations are selected as the "top 20 by stop-event volume in the analysis window" within each country independently, so each network's busiest hubs are compared with the other's.
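The Swiss delay computation and the per-country station selection can be sketched as follows. The timestamp layouts (`SCHEDULED_FMT`, `ACTUAL_FMT`) are assumptions about the Swiss CSVs, and `swiss_delay_min` and `top_stations` are hypothetical helper names, not functions from this repository.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List

# Assumed timestamp layouts for the Swiss CSVs (scheduled times without
# seconds, actual times with seconds); adjust if the files differ.
SCHEDULED_FMT = "%d.%m.%Y %H:%M"
ACTUAL_FMT = "%d.%m.%Y %H:%M:%S"


def swiss_delay_min(scheduled: str, actual: str) -> float:
    """(actual_arrival - scheduled_arrival) in minutes, clipped to >= 0."""
    sched = datetime.strptime(scheduled, SCHEDULED_FMT)
    act = datetime.strptime(actual, ACTUAL_FMT)
    return max(0.0, (act - sched).total_seconds() / 60)


def top_stations(stop_event_stations: Iterable[str], n: int = 20) -> List[str]:
    """Top-n stations by stop-event volume. Call once per country, so each
    network's own busiest hubs are selected independently."""
    return [station for station, _ in Counter(stop_event_stations).most_common(n)]


print(swiss_delay_min("01.09.2024 08:00", "01.09.2024 08:04:30"))  # 4.5
print(swiss_delay_min("01.09.2024 08:00", "01.09.2024 07:59:00"))  # early -> 0.0
print(top_stations(["Zürich HB", "Bern", "Zürich HB", "Basel SBB", "Bern", "Zürich HB"], n=2))
```

Clipping means an early arrival counts as exactly on time rather than pulling the mean delay below zero.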

Repository structure

db-vs-sbb-punctuality/
├── README.md
├── requirements.txt
├── LICENSE
├── .gitignore
├── notebooks/
│   └── analysis.ipynb            ← the full comparison
├── scripts/
│   ├── download_de_data.py       ← German data from Hugging Face
│   └── download_ch_data.py       ← Swiss data from opentransportdata.swiss
├── figures/                      ← PNG charts (git-tracked)
└── data/
    ├── de/                       ← German Parquet files (git-ignored)
    └── ch/                       ← Swiss CSV files (git-ignored)

Running it yourself

# Setup
git clone https://github.com/ajitagupta/db-vs-sbb-punctuality.git
cd db-vs-sbb-punctuality
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# Download both datasets
python scripts/download_de_data.py
python scripts/download_ch_data.py    # this one takes longer — see script for date range

# Run the notebook
jupyter notebook notebooks/analysis.ipynb

The Swiss download is the slower step (one CSV per day, ~30 MB each). The script downloads a configurable range — see python scripts/download_ch_data.py --help.
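To estimate what a date range costs before downloading, a planning sketch can enumerate one file per day and multiply by the ~30 MB per-file figure above. The `<YYYY-MM-DD>_istdaten.csv` name pattern and the `daily_files` helper are assumptions for illustration; they do not reflect the real options of download_ch_data.py, which its --help output documents.

```python
from datetime import date, timedelta
from typing import Iterator

MB_PER_DAY = 30  # rough per-file size quoted above


def daily_files(start: date, end: date) -> Iterator[str]:
    """Yield one file name per day from start to end, inclusive.
    The '<YYYY-MM-DD>_istdaten.csv' pattern is an assumed example."""
    day = start
    while day <= end:
        yield f"{day.isoformat()}_istdaten.csv"
        day += timedelta(days=1)


files = list(daily_files(date(2024, 9, 1), date(2024, 9, 30)))
print(len(files), files[0], files[-1])
print(f"approx. download size: {len(files) * MB_PER_DAY} MB")
```

For the September 2024 window this gives 30 files and roughly 900 MB.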

Caveats and honest limitations

  • The Swiss dataset is much larger than the German one in raw row count (entire SBB network vs. ~100 stations). For a fair top-20 comparison this isn't a problem, but absolute distribution comparisons are sensitive to the coverage difference.
  • DB and SBB define "punctual" differently in their official reporting (6 min vs 3 min thresholds; differing cancellation treatment). This project shows both thresholds and treats cancellations uniformly. Numbers therefore differ from each operator's headline figures.
  • The Swiss Ist-Daten represents the SBB network rather than the full Swiss rail system: BLS, SOB and foreign operators on Swiss tracks are only partially included, and private regional operators are not.
  • No causal claims. This is descriptive. Reasons why one country outperforms the other — track sharing with freight, infrastructure investment, network topology, average journey length — are not analyzed here.
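The cancellation rule alone is enough to move the numbers away from official figures. A minimal sketch, assuming a hypothetical `punctuality_rate` helper and `None` as the marker for a cancelled arrival: the same four scheduled arrivals score 50% or ~67% depending on whether the cancellation stays in the denominator.

```python
from typing import Optional, Sequence


def punctuality_rate(
    delays: Sequence[Optional[float]],
    threshold_min: float = 6.0,
    cancellations_count: bool = True,
) -> float:
    """Share of arrivals delayed less than threshold_min minutes.

    delays holds one entry per scheduled arrival; None marks a cancellation.
    cancellations_count=True keeps cancellations in the denominator as
    "not punctual" (this project's rule); False drops them entirely.
    """
    considered = list(delays) if cancellations_count else [d for d in delays if d is not None]
    if not considered:
        raise ValueError("no arrivals to evaluate")
    on_time = sum(1 for d in considered if d is not None and d < threshold_min)
    return on_time / len(considered)


sample = [1.0, 2.0, 7.0, None]  # three arrivals, one cancellation
print(punctuality_rate(sample))                             # 0.5
print(punctuality_rate(sample, cancellations_count=False))  # 0.666...
```

Applying one rule uniformly to both countries keeps the comparison internally consistent, at the cost of not matching either operator's headline numbers.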

Acknowledgements

  • German data: piebro/deutsche-bahn-data, CC BY 4.0, underlying data © Deutsche Bahn.
  • Swiss data: SBB Ist-Daten, opentransportdata.swiss, © SBB AG.
  • Prior public analyses of DB data by David Kriesel (CCC 2019) and the Bahn-Vorhersage project informed the methodology choices.

License

MIT — see LICENSE. Underlying data is governed by each operator's open-data license terms.


Built as a data-analysis portfolio project by Ajita Gupta — full-stack engineer based in Zurich.
