- Team Number: 2622622
- Competition Members: ysk, Aurora, Travor
- Competition Date: February 2026
This project presents our complete solution to the 2026 Mathematical Contest in Modeling (MCM) Problem C. We analyzed 34 seasons of Dancing with the Stars (DWTS) data to investigate the combination mechanisms of judge scores and fan votes, and designed an optimized voting system.
- Bayesian Fan Vote Estimation Model - Using MCMC sampling to infer fan vote shares from elimination outcomes
- Controversial Case Quantitative Analysis - Identified and simulated survival probabilities for 15 controversial contestants under different rules
- Multi-Model Feature Analysis - LMM + XGBoost/SHAP + Cox survival analysis triangulation
- Dynamic Weight Optimization System - Sigmoid function + Bottom-2 hybrid mechanism
In DWTS, celebrities partner with professional dancers and perform weekly. Judge scores (1-10) combined with fan votes determine eliminations. Three scoring methods have been used historically:
| Era | Seasons | Method | Characteristics |
|---|---|---|---|
| Era 1 | S1-S2 | Rank-based | Judge and fan rankings summed |
| Era 2 | S3-S27 | Percentage-based | Judge score % + fan vote % |
| Era 3 | S28+ | Bottom-2 | Lowest two by combined score, judges vote to eliminate |
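As a rough illustration of how the percentage-based and Bottom-2 rules in the table differ, the sketch below combines judge and fan shares and then reads off either the single lowest total or the two lowest. The function names and all numbers are made up for this example and are not taken from the project code or the dataset.

```python
def percentage_totals(judge_scores, fan_votes):
    """Era 2 style: judge score share plus fan vote share per contestant."""
    judge_sum = sum(judge_scores)
    fan_sum = sum(fan_votes)
    return [j / judge_sum + f / fan_sum for j, f in zip(judge_scores, fan_votes)]


def bottom_two(totals):
    """Era 3 style: indices of the two lowest combined totals, lowest first."""
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    return order[:2]


# Hypothetical week with four contestants (illustrative values only).
judge_scores = [27, 24, 21, 18]
fan_votes = [1500, 4000, 3000, 1500]

totals = percentage_totals(judge_scores, fan_votes)
eliminated_era2 = min(range(len(totals)), key=lambda i: totals[i])
at_risk_era3 = bottom_two(totals)  # judges would then choose between these two
```

In this made-up week the contestant with the best judge score (index 0) still lands in the bottom two because of a low fan share, which is the kind of situation where the judges' final vote matters.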
- Bobby Bones (S27): Consistently lowest judge scores, yet won the championship
- Jerry Rice (S2): 5 weeks of lowest judge scores, finished runner-up
- Bristol Palin (S11): Received the lowest judge scores 12 times, finished 3rd
- Billy Ray Cyrus (S4): 6 weeks of lowest judge scores, finished 5th
2026mcmC/
├── 2026_MCM_Problem_C_Data.csv # Raw data (34 seasons)
│
├── DataProcessed/ # Data preprocessing module
│ ├── data_preprocessing.py # Wide → long format conversion
│ ├── DWTS_Processed_Long.csv # Processed long format data
│ └── DWTS_Features.csv # Feature engineering data
│
├── Q1/ # Question 1: Bayesian fan vote estimation
│ ├── bayesian_fan_vote_model.py # Main model controller (S3-S27)
│ ├── bayesian_s1s2_rank_model.py # S1-S2 rank-based specialized model
│ ├── bayesian_bottom2_model.py # S28+ Bottom-2 specialized model
│ ├── fan_vote_estimates.csv # S3-S27 fan share estimates
│ ├── fan_vote_s1s2_enhanced.csv # S1-S2 fan share estimates
│ ├── fan_vote_bottom2.csv # S28+ fan share estimates
│ └── elimination_validation.csv # Consistency validation results
│
├── Q2/ # Question 2: Scoring rule comparison & controversy analysis
│ ├── find_controversial_cases.py # Controversial contestant identification
│ ├── analyze_judge_vs_fan_mechanism.py # Judge vs fan mechanism comparison
│ ├── compare_scoring_methods.py # Rank vs percentage comparison
│ ├── critical_fan_share_analysis.py # Critical fan share analysis
│ ├── critical_fan_share_bottom2.py # Critical analysis with Bottom-2
│ ├── visualize_controversial_heatmap_v3.py # Counterfactual heatmap
│ └── judge_vs_fan_comparison.csv # Mechanism comparison results
│
├── Q3/ # Question 3: Celebrity features & partner effects
│ ├── q3_lmm_xgboost_analysis.py # LMM + XGBoost + SHAP
│ ├── q3_cox_survival_analysis.py # Cox proportional hazards model
│ ├── q3_effect_difference_analysis.py # Bootstrap effect difference test
│ ├── q3_sensitivity_check.py # Temporal robustness check
│ ├── pro_partner_effects.csv # Partner random effects
│ └── pro_partner_survival.csv # Partner survival statistics
│
├── Q4/ # Question 4: Optimal voting system design
│ ├── q4_dynamic_weight_model.py # Sigmoid dynamic weight model
│ ├── q4_pareto_optimization.py # Multi-objective Pareto optimization
│ ├── q4_historical_validation.py # Historical case validation
│ ├── q4_sensitivity_analysis.py # Parameter sensitivity analysis
│ └── sigmoid_grid_search.csv # Grid search results
│
├── SensitiveAnalyse/ # Sensitivity analysis summary
│
├── Paper/ # Paper LaTeX files
│ ├── math_model_part2.tex # Mathematical model derivations
│ ├── dwts_algorithm_pseudocode.tex # 27 algorithm pseudocodes
│ ├── memo.tex # 1-page producer memo
│ └── ai_use_report.tex # AI usage report
│
└── README.md # Chinese version
For a given season and week with $N$ contestants, let:
- $J_i$: Contestant $i$'s total judge score (known)
- $\theta_i$: Contestant $i$'s fan vote share (to estimate)
- $E$: Eliminated contestant
Fan shares satisfy the simplex constraint: $\sum_{i=1}^{N} \theta_i = 1$, $\theta_i \geq 0$
- Era 1-2 (uninformative prior): $\boldsymbol{\theta} \sim \text{Dirichlet}(\mathbf{1}_N)$
- Era 3 (informative prior): $\alpha_i = 0.5 + 5.0 \cdot \frac{r^J(i) - 1}{N - 1}$
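A minimal sketch of the Era 3 prior formula, assuming that $\alpha_i$ are Dirichlet concentration parameters and that $r^J(i)$ is contestant $i$'s judge rank among $N$ contestants. The README does not state which direction the rank runs, so the input here is simply a list of ranks from 1 to $N$; the function name is hypothetical.

```python
def informative_alpha(judge_ranks, base=0.5, scale=5.0):
    """alpha_i = base + scale * (r_i - 1) / (N - 1) for each judge rank r_i."""
    n = len(judge_ranks)
    if n < 2:
        raise ValueError("need at least two contestants")
    return [base + scale * (r - 1) / (n - 1) for r in judge_ranks]


alphas = informative_alpha([1, 2, 3, 4])

# For a Dirichlet distribution the prior mean of component i is alpha_i / sum(alpha).
prior_mean = [a / sum(alphas) for a in alphas]
```

With these constants the concentration runs from 0.5 at rank 1 to 5.5 at rank $N$, so the prior mean fan share rises with the rank value.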
Era 1-2: Eliminated contestant has lowest combined score
Era 3 (Bottom-2): Eliminated contestant is in the bottom two
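To make the Era 1-2 likelihood concrete, here is a small rejection-sampling sketch: draw fan shares from the flat Dirichlet prior and keep only the draws under which the observed contestant has the lowest combined score. This is a simple stand-in for illustration, not the PyMC model in Q1; the judge shares, function names, and draw count are assumptions.

```python
import random


def sample_flat_dirichlet(n, rng):
    """Draw from Dirichlet(1, ..., 1) by normalizing Gamma(1, 1) variates."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def posterior_fan_shares(judge_share, eliminated, n_draws=20000, seed=0):
    """Keep prior draws consistent with the observed elimination."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = sample_flat_dirichlet(len(judge_share), rng)
        totals = [j + t for j, t in zip(judge_share, theta)]
        lowest = min(range(len(totals)), key=lambda i: totals[i])
        if lowest == eliminated:
            accepted.append(theta)
    return accepted


# Hypothetical week: contestant 0 has the best judge share but was eliminated.
judge_share = [0.40, 0.35, 0.25]
accepted = posterior_fan_shares(judge_share, eliminated=0)
posterior_mean = [sum(t[i] for t in accepted) / len(accepted) for i in range(3)]
```

Because contestant 0 could only be eliminated with a small fan share, the accepted draws pull that contestant's posterior mean below the prior mean of 1/3.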
| Parameter | Standard (S3-S27) | Enhanced (S1-S2) | Bottom-2 (S28+) |
|---|---|---|---|
| Draws | 2000 | 5000 | 3000 |
| Tune | 1000 | 2000 | 1500 |
| Chains | 2 | 4 | 4 |
| Sampler | NUTS | Slice | Slice |
- Consistency Rate: Proportion of eliminations where the model prediction matches the actual outcome (>95%)
- Convergence Diagnostics: $\hat{R} < 1.05$, ESS > 400
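The consistency rate can be computed as a plain match fraction. The sketch below assumes one predicted and one actual eliminated contestant per week; the function name and the sample labels are hypothetical.

```python
def consistency_rate(predicted, actual):
    """Fraction of weeks where the predicted elimination equals the actual one."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Illustrative labels only: three of four weeks agree.
rate = consistency_rate(["A", "B", "C", "D"], ["A", "B", "C", "E"])
```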
Original Rule: Fan votes ranked → 1st place gets N points, 2nd gets N-1... → Add to judge rank points → Lowest total eliminated
Modeling Challenge: The rank() operation is discrete and non-differentiable, while gradient-based MCMC samplers such as NUTS need a differentiable log-density
Simplification Strategy: Use the continuous fan share $f_i$ directly in place of discrete rank points
Why This Works (Monotonicity Guarantee):
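Rank-based scoring is an order-preserving (monotone) transformation of the fan share: $f_a > f_b \Rightarrow \text{rank\_points}(f_a) > \text{rank\_points}(f_b)$
Therefore, when determining "who has the lowest total score":
- Original rule: $\text{Total}_i = \text{rank\_points}(f_i) + j_i$
- Simplified rule: $\text{Total}_i = f_i + j_i$ (normalized)
Both rules order contestants identically on the fan component, so the simplified rule serves as a continuous stand-in for the original. Because rank points discard the size of the gaps between shares, the two totals can still disagree on who is lowest in close cases, so the resulting fan share posterior is an approximation rather than an exact equivalent.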
Example:
f = [0.30, 0.31, 0.39] → ranks [3, 2, 1] → points [1, 2, 3]
f = [0.30, 0.32, 0.38] → ranks [3, 2, 1] → points [1, 2, 3] (small change, same ranks)
f = [0.30, 0.35, 0.35] → ranks [3, 1.5, 1.5] → points jump! (sampler fails)
This is a common continuous approximation in Bayesian inference—preserving ordering while enabling MCMC sampling.
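The three example rows can be reproduced with a small rank-points helper. This is a sketch assuming that tied shares receive the average of the ranks they span (as in the `[3, 1.5, 1.5]` row) and that the best share earns $N$ points; the function name is hypothetical.

```python
def rank_points(shares):
    """Rank rule points: the best share gets N points, ties share the average rank."""
    n = len(shares)
    points = []
    for s in shares:
        higher = sum(1 for other in shares if other > s)
        tied = sum(1 for other in shares if other == s)
        # Tied entries occupy ranks higher+1 .. higher+tied; use their average.
        avg_rank = higher + (tied + 1) / 2
        points.append(n + 1 - avg_rank)
    return points


smooth_a = rank_points([0.30, 0.31, 0.39])
smooth_b = rank_points([0.30, 0.32, 0.38])
tie_case = rank_points([0.30, 0.35, 0.35])
```

Small changes in the shares leave the points untouched until two shares cross or tie, at which point the points jump. That flat-then-jump behavior is what the continuous share avoids.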
Where
Monte Carlo simulation:
- Sample fan shares from posterior: $\tilde{f}_i \sim \mathcal{N}(\hat{f}_i, \sigma_i^2)$
- Compute combined score: $T_i = 0.5 \cdot J_i + 0.5 \cdot \tilde{f}_i$
- Count target contestant survival frequency
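The three simulation steps can be sketched as follows. The code assumes $J_i$ is already expressed as a judge share on the same scale as the fan share, treats "survival" as not having the lowest combined score, and does not renormalize the sampled shares. The number of simulations, the inputs, and the function name are all assumptions for illustration.

```python
import random


def survival_probability(judge_share, fan_mean, fan_sd, target, n_sims=10000, seed=0):
    """Estimate how often `target` avoids the lowest combined score."""
    rng = random.Random(seed)
    survived = 0
    for _ in range(n_sims):
        # Step 1: sample fan shares around their posterior means.
        fan = [rng.gauss(m, s) for m, s in zip(fan_mean, fan_sd)]
        # Step 2: equal-weight combined score.
        totals = [0.5 * j + 0.5 * f for j, f in zip(judge_share, fan)]
        # Step 3: the target survives unless it has the lowest total.
        lowest = min(range(len(totals)), key=lambda i: totals[i])
        if lowest != target:
            survived += 1
    return survived / n_sims


# Hypothetical three-contestant week.
judge_share = [0.40, 0.35, 0.25]
fan_mean = [0.20, 0.30, 0.50]
fan_sd = [0.05, 0.05, 0.05]

p_weak = survival_probability(judge_share, fan_mean, fan_sd, target=0)
p_strong = survival_probability(judge_share, fan_mean, fan_sd, target=2)
```

Swapping the 0.5 weights for another rule's weights gives the counterfactual survival probability under that rule.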
| Comparison Dimension | Rank-based | Percentage-based | Difference |
|---|---|---|---|
| Eliminated avg fan share | 8.2% | 11.5% | +3.3 pp |
| Fan influence | Lower | Higher | Percentage favors fans |
| Judge-fan mechanism correlation | - | - | r = -0.320 |
- Fixed effects: Age, industry, week, season progress
- Random effects: Partner $u_g \sim \mathcal{N}(0, \sigma_u^2)$
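Intraclass Correlation Coefficient (ICC): $\text{ICC} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2}$, where $\sigma_\varepsilon^2$ is the residual variance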
| Target Variable | ICC | Interpretation |
|---|---|---|
| Judge Score | 18.5% | Partner explains 18.5% of judge score variance |
| Fan Share | 7.0% | Partner has smaller effect on fan votes |
| Rank | Judge Score Drivers | Fan Vote Drivers |
|---|---|---|
| 1 | Week | Industry |
| 2 | Season Progress | Week |
| 3 | Age | Age |
| 4 | Industry | Season Progress |
| Factor | Hazard Ratio (HR) | 95% CI | Interpretation |
|---|---|---|---|
| Age (+10 years) | 1.15 | [1.08, 1.23] | Older age increases elimination risk |
| Athlete vs Actor | 0.82 | [0.71, 0.95] | Athletes have lower elimination risk |
| Model vs Actor | 1.34 | [1.12, 1.60] | Models have higher elimination risk |
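As a reading aid for the hazard ratios, the sketch below rescales the age effect to a different age gap. It assumes the usual Cox setup in which the log-hazard is linear in age, so ratios compound multiplicatively; the helper name is hypothetical.

```python
import math


def hazard_ratio(hr_per_unit, units):
    """Scale a hazard ratio to a different number of units of the covariate."""
    return math.exp(math.log(hr_per_unit) * units)


# Age: HR 1.15 per +10 years, so two 10-year steps compound to 1.15 ** 2.
hr_20_years = hazard_ratio(1.15, 2)

# Athlete vs Actor: HR 0.82 reads as an 18% lower elimination hazard.
athlete_reduction = 1 - 0.82
```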
Where
Optimal Parameters (Grid Search):
- $w_{\min} = 0.20$ (early judge weight)
- $w_{\max} = 0.60$ (late judge weight)
- $k = 1.90$ (transition rate)
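One plausible shape for the dynamic weight is a logistic curve between $w_{\min}$ and $w_{\max}$. The sketch below uses the three grid-search parameters above; the midpoint and the use of the raw week number as the time axis are assumptions, since this README does not state how time is scaled or where the transition is centered.

```python
import math


def judge_weight(week, w_min=0.20, w_max=0.60, k=1.90, midpoint=5.5):
    """Logistic judge weight rising from w_min (early) to w_max (late).

    `midpoint` is a hypothetical parameter, not a value from the project.
    """
    return w_min + (w_max - w_min) / (1.0 + math.exp(-k * (week - midpoint)))


weights = {week: judge_weight(week) for week in range(1, 11)}
```

The fan weight would then be `1 - judge_weight(week)`, so fans dominate early weeks and judges gain influence toward the final.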
Three objective functions:
- Fairness: Lower judge rank of eliminated contestant is better
- Engagement: Marginal impact of fan votes on outcomes
- No-Robbery: Avoid eliminating high-scoring contestants
Knee Point Analysis: Kneedle algorithm determines optimal weight
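Before a knee point can be chosen, the candidate settings have to be reduced to the non-dominated set. The sketch below shows that filtering step only (it does not implement the Kneedle algorithm) and assumes all three objectives have been oriented so that higher is better; the candidate tuples are made-up values.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (fairness, engagement, no-robbery) scores for four weight settings.
candidates = [
    (0.9, 0.2, 0.8),
    (0.7, 0.6, 0.7),
    (0.6, 0.5, 0.6),  # worse than the second setting on every objective
    (0.4, 0.9, 0.5),
]
front = pareto_front(candidates)
```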
Early (Week 1-4): w = 20-30% → Protect "dark horses"
Mid (Week 5-6): w = 35-45% → Gradual transition
Late (Week 7+): w = 50-60% + Bottom-2 → Ensure deserving champion
Based on the above analysis, we prepared a recommendation memo for DWTS producers:
# Core dependencies
pip install numpy pandas scipy matplotlib seaborn
# Bayesian modeling
pip install pymc arviz pytensor
# Statistical modeling
pip install statsmodels
# Machine learning
pip install xgboost shap scikit-learn
# Survival analysis
pip install lifelines
cd d:\2026mcmC
pip install -r requirements.txt
# 1. Data preprocessing
python DataProcessed/data_preprocessing.py
# 2. Q1: Bayesian fan vote estimation (time-consuming)
python Q1/bayesian_s1s2_rank_model.py # S1-S2
python Q1/bayesian_fan_vote_model.py # S3-S27
python Q1/bayesian_bottom2_model.py # S28+
# 3. Q2: Controversy analysis
python Q2/find_controversial_cases.py
python Q2/analyze_judge_vs_fan_mechanism.py
# 4. Q3: Feature analysis
python Q3/q3_lmm_xgboost_analysis.py
python Q3/q3_cox_survival_analysis.py
# 5. Q4: System optimization
python Q4/q4_pareto_optimization.py
python Q4/q4_dynamic_weight_model.py

| Rule | Pros | Cons | Recommended Scenario |
|---|---|---|---|
| Rank-based | Simple, intuitive | Amplifies small differences | Early seasons |
| Percentage-based | High fan influence | May create controversy | Mid seasons |
| Bottom-2 | Best balance | Adds complexity | Late seasons |
| Metric | Current System | Recommended System | Improvement |
|---|---|---|---|
| Robbery Rate | 42% | 8% | -34 pp |
| Fan Engagement | Baseline | +22% | Significant increase |
| Fairness | Baseline | 96% | Major improvement |
- Keep percentage-based scoring - Maintains voting suspense and fan engagement
- Adopt dynamic judge weights - Low early (30%) protects dark horses, high late (60%) ensures deserving champions
- Enable Bottom-2 from Week 7 - Creates drama while preventing extreme outcomes
- Consider vote point system - Multiple votes for same contestant require more points
This project contains 27 core algorithms, detailed in Paper/dwts_algorithm_pseudocode.tex:
| Number | Algorithm Name | Question |
|---|---|---|
| 1 | Data Preprocessing | Foundation |
| 2-5 | Bayesian MCMC Models | Q1 |
| 6-7 | Consistency/Uncertainty Analysis | Q1 |
| 8-12 | Scoring Rule Comparison & Heatmap | Q2 |
| 13-18 | LMM/XGBoost/Cox Analysis | Q3 |
| 19-24 | Pareto Optimization & Dynamic Weights | Q4 |
| 25-27 | Sensitivity Analysis | Validation |
| File | Description | Records |
|---|---|---|
| 2026_MCM_Problem_C_Data.csv | Raw wide format data | 424 contestants |

| File | Description |
|---|---|
| DWTS_Processed_Long.csv | Long format (each row = contestant × week) |
| DWTS_Features.csv | Feature-engineered data |
| fan_vote_*.csv | Fan share estimates by era |

| File | Description |
|---|---|
| *_comparison.csv | Rule comparison results |
| pro_partner_effects.csv | Partner effects |
| sigmoid_grid_search.csv | Parameter optimization results |
| *.png | Visualization images |
- Issue: Paper sections scattered across multiple LaTeX files, difficult to sync versions
- Improvement:
- Use Endnote for reference management
- Use Tencent Docs for initial draft collaboration
- Self-hosted Overleaf Community Edition (VM deployment) for real-time LaTeX collaboration
- Use GitHub private repository for version control, supporting branch management and code review
- Issue: Early code used relative paths, paths broke after file reorganization
- Improvement:
- All Python files now use an absolute path configuration
- Recommend creating config.py at project start to centralize path management
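A centralized path module along these lines is one way to do it. This is a sketch of a hypothetical config.py placed at the project root, not the project's actual file; the constant names are invented, while the file and directory names come from the project tree above.

```python
from pathlib import Path

# Resolve the project root from this file's location; fall back to the
# working directory when __file__ is unavailable (e.g. inside a notebook).
if "__file__" in globals():
    PROJECT_ROOT = Path(__file__).resolve().parent
else:
    PROJECT_ROOT = Path.cwd().resolve()

RAW_DATA = PROJECT_ROOT / "2026_MCM_Problem_C_Data.csv"
PROCESSED_DIR = PROJECT_ROOT / "DataProcessed"
LONG_DATA = PROCESSED_DIR / "DWTS_Processed_Long.csv"
FEATURES = PROCESSED_DIR / "DWTS_Features.csv"
```

Scripts in Q1 through Q4 could then import these constants instead of hard-coding a drive letter, so the paths stay absolute at run time but survive moving the repository.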
- Issue: Pure Python scripts have long debug cycles; data exploration and visualization require re-running entire programs
- Improvement:
- Modular development: Split data loading, preprocessing, modeling, and visualization into separate Notebook cells for step-by-step debugging
- Interactive exploration: Use Jupyter's immediate output to quickly inspect intermediate variables and data distributions
- Visualization tuning: Adjust chart parameters (colors, fonts, layout) in real-time without re-running entire scripts
- Markdown documentation: Add explanatory text and formula derivations alongside code, creating self-documenting analysis reports
- Version control: Use the nbstripout tool to clean outputs, avoiding messy git diffs
- Environment isolation: Use virtual environments + ipykernel registered kernels to ensure dependency consistency
Recommended Workflow:
1. New feature development → Jupyter Notebook for rapid prototyping
2. After feature stabilizes → Extract core logic to .py modules
3. Final integration → Main script calls modules, Notebook preserves exploration records
- Issue: MCMC convergence diagnostics not sufficiently checked early, some results needed re-running
- Improvement:
- Establish a standardized convergence checking workflow ($\hat{R} < 1.05$, ESS > 400)
- Use ArviZ to auto-generate diagnostic reports
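To show what the $\hat{R}$ threshold is checking, here is a dependency-free sketch of the classic Gelman-Rubin statistic for one scalar parameter. It is not ArviZ's implementation (which uses a rank-normalized split version), and the synthetic chains are illustrative only.

```python
import math
import random
import statistics


def gelman_rubin(chains):
    """Classic R-hat: compares between-chain and within-chain variance."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean(statistics.variance(c) for c in chains)
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


rng = random.Random(0)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(4)]
# Shift one chain to mimic a sampler stuck in a different region.
stuck = mixed[:3] + [[x + 3.0 for x in mixed[3]]]

rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
converged = rhat_mixed < 1.05
```

A check like `converged` can gate downstream steps so that results from poorly mixed runs are never written to the output CSVs.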
- Issue: Data cleaning took too long early on, optimization module rushed at the end
- Improvement:
- Familiarize with data format in advance, prepare reusable preprocessing templates
- Reserve adequate time for sensitivity analysis for each question
- Issue: Careless reading led to doing Q3's feature analysis (celebrity characteristics impact) together with Q1
- Consequences:
- Circular reasoning risk: Q1 infers fan shares from elimination results, Q3 analyzes feature impact using fan shares—mixing them risks "inferring results from results"
- Paper structure chaos: Blurred boundaries between methodology and results, hard to articulate clearly
- Improvement:
- Carefully read all sub-questions before starting, clarify data flow and causal chains
- Define clear inputs and outputs for each question, avoid cross-question dependencies
- Progress sequentially—output of previous question becomes input of next
| Direction | Description | Priority |
|---|---|---|
| Deep Learning | Use LSTM/Transformer to predict fan voting trends | Medium |
| Real-time System | Build voting result real-time monitoring dashboard | Low |
| A/B Testing Framework | Design controlled experiments for different scoring rules | High |
| Social Media Analysis | Integrate Twitter/Instagram sentiment data | Medium |
- Unified absolute paths
- Added detailed bilingual comments
- Created .gitignore file
- Add unit tests (pytest)
- Use requirements.txt to lock dependency versions
- Add type annotations (typing)
- Replace print statements with logging
- Abstract repeated code into shared modules
This project used AI assistance tools, detailed in Paper/ai_use_report.tex:
Effective Use Cases:
- Code debugging and error troubleshooting
- LaTeX formula formatting
- Document translation and polishing
Use with Caution:
- Core algorithm design (requires manual verification of correctness)
- Statistical conclusion interpretation (AI may be overconfident)
- Innovation point extraction (requires domain knowledge)
"I came, I divided, I conquered!"
Four days and nights. The moon outside waxed and waned; the coffee by the screen went cold and hot again.
We dove into oceans of data searching for truth, wandered through labyrinths of equations seeking exits. There were 3 AM breakdowns, and there were cheers when convergence finally hit. Blaming each other when the code crashed, high-fiving when the results came through—perhaps this is the romance of mathematical modeling.
"We are all in the gutter, but some of us are looking at the stars."
Maybe the results won't be what we hoped for. But so what? We gazed upon the same stars, chased the same dream. The journey itself is the meaning.
"The journey is the reward."
Let's set a bit and flow!
This project is for 2026 MCM academic competition purposes only.
- Team Number: 2622622
- Competition: 2026 Mathematical Contest in Modeling (MCM)
- Problem: Problem C - Data With The Stars
This README was last updated in February 2026

