This repository contains the full codebase, processed data, and analytical outputs for the study "Decomposing Sex Differences in Mortality Across Age and Cause in England and Wales, 1915–2015".
The project investigates the long-run structure of male and female mortality using interpretable latent variable modelling. A variational autoencoder (VAE) is applied to age-specific, cause-of-death mortality profiles for England and Wales spanning a century (1915–2015). The aim is to uncover the low-dimensional demographic mechanisms that drive sex differences in mortality.
Rather than modelling causes independently, the approach captures the joint structure of mortality by age, cause, sex, and historical period, allowing male–female divergence to be decomposed into interpretable latent dimensions aligned with known demographic transitions.
All analyses are fully reproducible and rely exclusively on publicly available data.
Mortality data are derived from the UK Office for National Statistics (ONS):
Office for National Statistics. Causes of death over 100 years. https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/articles/causesofdeathover100years/2017-09-18
Raw cause-of-death counts were harmonised across decades, age groups, and cause categories to ensure comparability across historical changes in registration and classification systems.
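In practice, harmonisation of this kind means mapping each historical cause label onto a consistent category and summing the counts that land in the same cell. The sketch below is illustrative only. It assumes a hypothetical `CAUSE_MAP` and a hypothetical record layout `(decade, age_group, sex, cause, count)`. The repository's actual category scheme and age-group mapping are not reproduced here.

```python
from collections import defaultdict

# Hypothetical mapping from historical cause labels to harmonised categories.
# These labels are illustrative, not the repository's actual scheme.
CAUSE_MAP = {
    "phthisis": "infectious",
    "tuberculosis": "infectious",
    "influenza": "infectious",
    "malignant neoplasm": "cancer",
    "cancer": "cancer",
    "motor vehicle accident": "external",
    "suicide": "external",
}


def harmonise(records, cause_map=CAUSE_MAP):
    """Aggregate raw (decade, age_group, sex, cause, count) records into
    harmonised categories, summing counts that map to the same cell.

    Returns (totals, unmapped): totals is keyed by
    (decade, age_group, sex, category); unmapped holds raw labels
    that the mapping does not cover.
    """
    totals = defaultdict(int)
    unmapped = set()
    for decade, age_group, sex, cause, count in records:
        category = cause_map.get(cause.strip().lower())
        if category is None:
            unmapped.add(cause)  # surface gaps instead of silently dropping them
            continue
        totals[(decade, age_group, sex, category)] += count
    return dict(totals), unmapped


# Example with synthetic records for one age group, sex, and decade.
records = [
    (1915, "25-34", "M", "Phthisis", 120),
    (1915, "25-34", "M", "Influenza", 30),
    (1915, "25-34", "M", "Cancer", 10),
    (1915, "25-34", "M", "Other", 5),
]
totals, unmapped = harmonise(records)
```

Returning the unmapped labels keeps classification changes between decades visible, so they are not lost from the totals unnoticed.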
A variational autoencoder (VAE) is trained separately for males and females:
- Each observation corresponds to an age group × decade mortality profile
- Inputs consist of cause-specific mortality counts
- Sex is not included as a model input
- Latent space dimensionality: 3
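The setup above can be illustrated with a minimal NumPy sketch of a VAE forward pass and loss. It is a swapped-in illustration, not the repository's PyTorch model. The layer sizes, the `log1p` plus standardisation preprocessing, and the squared-error reconstruction term are all assumptions. Gradient-based training is omitted. Each row of the input matrix is one age group × decade profile for a single sex, and the same code would be run once for males and once for females.

```python
import numpy as np

# Hypothetical sizes; only LATENT = 3 comes from this README.
N_CAUSES = 8   # number of harmonised cause categories (assumed)
HIDDEN = 16    # hidden-layer width (assumed)
LATENT = 3     # latent space dimensionality


def init_params(n_in, hidden, latent, rng, scale=0.1):
    """Small random weights for a one-hidden-layer encoder and decoder."""
    def w(*shape):
        return rng.normal(0.0, scale, shape)
    return {
        "W_enc": w(n_in, hidden), "b_enc": np.zeros(hidden),
        "W_mu": w(hidden, latent), "b_mu": np.zeros(latent),
        "W_lv": w(hidden, latent), "b_lv": np.zeros(latent),
        "W_dec": w(latent, hidden), "b_dec": np.zeros(hidden),
        "W_out": w(hidden, n_in), "b_out": np.zeros(n_in),
    }


def encode(x, p):
    """Map profiles to the mean and log-variance of q(z | x)."""
    h = np.tanh(x @ p["W_enc"] + p["b_enc"])
    return h @ p["W_mu"] + p["b_mu"], h @ p["W_lv"] + p["b_lv"]


def decode(z, p):
    """Map latent points back to (standardised) cause-of-death profiles."""
    h = np.tanh(z @ p["W_dec"] + p["b_dec"])
    return h @ p["W_out"] + p["b_out"]


def vae_loss(x, p, rng):
    """Mean negative ELBO with a squared-error reconstruction term."""
    mu, logvar = encode(x, p)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps  # reparameterisation trick
    x_hat = decode(z, p)
    recon = np.sum((x - x_hat) ** 2, axis=1)
    kl = -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=1)
    return np.mean(recon + kl), mu


# Toy input: 10 age groups x 11 decades = 110 profiles for one sex (synthetic counts).
rng = np.random.default_rng(0)
counts = rng.poisson(50, size=(110, N_CAUSES)).astype(float)
x = np.log1p(counts)
x = (x - x.mean(axis=0)) / x.std(axis=0)  # standardise each cause column
params = init_params(N_CAUSES, HIDDEN, LATENT, rng)
loss, mu = vae_loss(x, params, rng)
print(mu.shape)  # (110, 3)
```

The latent means `mu` are the per-cell coordinates that the interpretation and divergence steps work with.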
Learning sex-specific latent structures independently ensures that observed differences arise from empirical patterns rather than imposed modelling assumptions.
Latent dimensions are interpreted by correlating latent coordinates with observed cause-specific mortality counts across ages and periods.
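A minimal sketch of this correlation step, assuming Pearson correlation (the README does not state which coefficient is used). It takes a hypothetical `z` matrix of latent means and a `counts` matrix with one row per age group × decade cell. It returns an axis-by-cause correlation matrix and lists the causes most strongly associated with each axis. The cause names in the example are hypothetical.

```python
import numpy as np


def latent_cause_correlations(z, counts):
    """Pearson correlation between each latent coordinate and each cause.

    z:      (n_cells, n_latent) latent means, one row per age group x decade cell
    counts: (n_cells, n_causes) cause-specific mortality counts for the same cells
    Returns an (n_latent, n_causes) correlation matrix.
    """
    z_c = z - z.mean(axis=0)
    c_c = counts - counts.mean(axis=0)
    cov = z_c.T @ c_c
    scale = np.outer(np.linalg.norm(z_c, axis=0), np.linalg.norm(c_c, axis=0))
    return cov / scale


def top_causes(corr, cause_names, k=3):
    """For each latent axis, the k causes with the largest absolute correlation."""
    order = np.argsort(-np.abs(corr), axis=1)[:, :k]
    return [[cause_names[j] for j in row] for row in order]


# Toy example: 4 cells, 2 latent axes, 3 causes (synthetic numbers).
z = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0]])
counts = np.column_stack([10 * z[:, 0] + 5, -z[:, 0] + 9, [3.0, 1.0, 4.0, 1.0]])
corr = latent_cause_correlations(z, counts)
labels = top_causes(corr, ["infectious", "cancer", "external"], k=2)
```

Ranking by absolute correlation treats strong negative associations as just as informative as positive ones, because the sign of a VAE axis is arbitrary.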
Three stable and interpretable dimensions emerge:
- Z1 (epidemiological transition axis): captures the transition from infectious and acute causes of death to chronic disease mortality.
- Z2 (cancer / modernization axis): reflects long-run structural shifts in cancer mortality and accident exposure associated with social and economic change.
- Z3 (external and behavioural risk axis): captures accidents, suicide, violence, and war-related mortality, exhibiting strong age specificity and pronounced sex differences.
Male–female divergence is quantified as the mean absolute distance between the male and female coordinates in the aligned latent space for each age–decade cell.
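One reading of "mean absolute distance" is the average of the per-axis absolute differences. This is an interpretation of the description above, not a formula taken from the code. Under that reading, the divergence for age group $a$ and decade $t$ is

$$
D_{a,t} = \frac{1}{K} \sum_{k=1}^{K} d_{k,a,t}, \qquad d_{k,a,t} = \left| z^{\mathrm{M}}_{k,a,t} - z^{\mathrm{F}}_{k,a,t} \right|, \qquad K = 3,
$$

where $z^{\mathrm{M}}_{k,a,t}$ and $z^{\mathrm{F}}_{k,a,t}$ are the aligned male and female coordinates on axis $Z_k$. The per-axis terms $d_{k,a,t}$ are the quantities used when divergence is decomposed by latent axis.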
To identify the mechanisms driving divergence:
- Absolute differences are decomposed by latent axis
- The dominant latent dimension is identified for each age group and decade
- Divergence is therefore interpreted mechanistically, in terms of the latent axes that drive it, rather than descriptively
This allows separation of divergence driven by baseline epidemiological conditions from divergence driven by behavioural and structural exposures.
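The decomposition steps above can be sketched as follows. The sketch assumes aligned male and female latent coordinates stored as hypothetical arrays of shape `(n_ages, n_decades, 3)`, and it treats each axis's share of the summed absolute difference as its contribution. The array layout and the share definition are assumptions, not the repository's code.

```python
import numpy as np

AXIS_NAMES = ("Z1", "Z2", "Z3")


def decompose_divergence(z_male, z_female):
    """Per-axis absolute male-female differences and the dominant axis.

    z_male, z_female: (n_ages, n_decades, K) aligned latent coordinates.
    Returns:
      per_axis  (n_ages, n_decades, K) absolute difference on each axis
      total     (n_ages, n_decades)    mean absolute difference across axes
      share     (n_ages, n_decades, K) each axis's share of the summed difference
      dominant  (n_ages, n_decades)    index of the axis with the largest difference
    """
    per_axis = np.abs(z_male - z_female)
    total = per_axis.mean(axis=-1)
    summed = per_axis.sum(axis=-1, keepdims=True)
    # Cells with no divergence at all get a share of zero on every axis.
    share = np.divide(per_axis, summed, out=np.zeros_like(per_axis), where=summed > 0)
    dominant = per_axis.argmax(axis=-1)
    return per_axis, total, share, dominant


# Toy example: a single age group x decade cell (synthetic coordinates).
z_m = np.array([[[0.0, 0.0, 2.0]]])
z_f = np.array([[[0.5, 0.0, 0.0]]])
per_axis, total, share, dominant = decompose_divergence(z_m, z_f)
print(AXIS_NAMES[dominant[0, 0]])  # Z3
```

Here the cell's divergence is driven mostly by Z3, the external and behavioural risk axis. This is the kind of attribution that separates behavioural from baseline epidemiological divergence.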
The repository includes multiple validation procedures designed to support interpretability and methodological rigor:
- Dimensionality justification
  - VAEs trained with 2–5 latent dimensions
  - Reconstruction and total loss evaluated
  - Three dimensions shown to capture the dominant structure without instability
- Temporal stability
  - Separate models trained on 1915–1965 and 1965–2015
  - Consistent latent axes recovered across both periods
- Quantitative decoding of latent axes
  - Synthetic mortality profiles generated at fixed latent positions
  - Profiles decoded into cause-of-death space to clarify the substantive meaning of each axis
- Temporal trajectories
  - Mean latent values tracked across decades
  - Trajectories reveal the epidemiological transition and modernization trends
- Age-specific trajectories
  - Latent paths traced for selected age groups
  - Paths show age-specific and sex-specific divergence mechanisms
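Two of these procedures reduce to simple array operations: averaging latent coordinates within each decade, and building latent points to decode. A minimal sketch follows. It assumes a hypothetical `z` matrix of latent means with a matching array of decade labels. It also assumes the "fixed latent positions" vary one axis while the other axes stay at 0, the mean of the standard normal prior. The repository may choose positions differently. The resulting grid would then be passed through the trained decoder.

```python
import numpy as np


def mean_latent_by_group(z, groups):
    """Average latent coordinates within each group (e.g. decade).

    z:      (n_cells, K) latent means
    groups: (n_cells,) group label per cell, e.g. the decade of each cell
    Returns (sorted unique labels, (n_groups, K) array of group means).
    """
    labels = np.unique(groups)
    means = np.stack([z[groups == g].mean(axis=0) for g in labels])
    return labels, means


def latent_traversal_grid(axis, values, k=3):
    """Latent points that vary one axis while holding the others at 0,
    the mean of the standard normal prior. Decoding these points gives
    synthetic cause-of-death profiles at fixed latent positions."""
    grid = np.zeros((len(values), k))
    grid[:, axis] = values
    return grid


# Toy example (synthetic coordinates for four cells in two decades).
z = np.array([[0.0, 1.0, 2.0], [2.0, 1.0, 0.0], [1.0, 1.0, 1.0], [3.0, 3.0, 3.0]])
decades = np.array([1915, 1915, 1925, 1925])
labels, means = mean_latent_by_group(z, decades)
grid = latent_traversal_grid(axis=0, values=np.linspace(-2.0, 2.0, 5))
```

For an age-specific trajectory, the same function can be applied to a subset, e.g. `mean_latent_by_group(z[ages == "25-34"], decades[ages == "25-34"])`, where `ages` is a hypothetical array of age-group labels.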
The repository provides:
- CSV tables enabling replication and secondary analysis
- Publication-quality figures illustrating latent structure and divergence
- Fully documented notebooks covering the entire analytical pipeline
All analyses can be reproduced by:
- Cloning the repository
- Installing dependencies: pip install -r environment/requirements.txt
- Running the notebooks sequentially
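The sequential notebook run can also be scripted. This minimal sketch assumes, hypothetically, that notebook filenames begin with a numeric prefix giving their execution order. It shells out to `jupyter nbconvert --to notebook --execute --inplace`, which requires Jupyter to be installed. `dry_run=True` lists the execution order without running anything.

```python
import re
import subprocess
from pathlib import Path


def notebook_sort_key(path):
    """Sort by a leading numeric prefix (e.g. '01_...'); unprefixed files go last."""
    match = re.match(r"(\d+)", path.name)
    if match:
        return (0, int(match.group(1)), path.name)
    return (1, 0, path.name)


def run_notebooks(directory, dry_run=False):
    """Execute each notebook in place, in prefix order, stopping on the first error."""
    executed = []
    for nb in sorted(Path(directory).glob("*.ipynb"), key=notebook_sort_key):
        cmd = ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", str(nb)]
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError if a notebook fails
        executed.append(nb.name)
    return executed
```

Usage: `run_notebooks("notebooks", dry_run=True)`, where `"notebooks"` stands in for whichever directory actually holds the notebooks. Sorting numerically rather than alphabetically keeps `10_...` after `02_...`.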
The analysis was conducted using:
- Python ≥ 3.10
- PyTorch
- NumPy, pandas, scikit-learn
- matplotlib and seaborn
Random seeds are fixed throughout to ensure reproducibility.
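A typical way to fix seeds across the listed libraries is sketched below. The helper name `set_seed` and the default seed value are hypothetical, not taken from the notebooks. The PyTorch calls run only if PyTorch is installed.

```python
import random

import numpy as np


def set_seed(seed=42):
    """Fix the random seeds used by Python, NumPy and, if installed, PyTorch."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
    except ImportError:
        return
    torch.manual_seed(seed)  # seeds the RNG on all devices, including CUDA
    torch.backends.cudnn.deterministic = True  # prefer deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = False     # disable non-deterministic autotuning
```

Calling `set_seed` once at the top of each notebook makes NumPy, Python and PyTorch draws repeat exactly on the same hardware and library versions.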
This repository is intended for:
- Demographic and population health research
- Methodological work on mortality structure
- Interpretable machine learning in social science
- Peer review, replication, and visa assessment
The code prioritises clarity and interpretability over production optimisation.
If you use this work, please cite:
Decomposing Sex Differences in Mortality Across Age and Cause in England and Wales, 1915–2015. Preprint forthcoming.
This project is released under the MIT License. ONS data remain subject to their original usage terms.
For questions, replication issues, or collaboration inquiries, please open an issue or contact the author via GitHub.