mranuraggarg/supernovae_classification

Supernovae Type Ia Classification — Phase 2 Tier 2

This branch, phase2-tier2-compact-feature-ablation, is the canonical Phase 2 Tier 2 branch for the project. It builds on the frozen Phase 2 Tier 1 compact 16-feature XGBoost baseline and explains why that compact model works, using feature ablation, feature-block ablation, staged subset growth, reduced-core experiments, and supporting reproducibility artifacts.

The focus of this branch is not a new preprocessing pipeline. It reuses the Phase 2 Tier 1 processed SPCC/SNPhotCC feature table, compact feature set, fixed split protocol, and XGBoost evaluation setup so every Tier 2 result is directly comparable to the compact baseline.

Current Status

Phase 2 Tier 2 is complete as a reproducible analysis branch. The branch now includes:

  • the Phase 2 Tier 1 compact 16-feature baseline reference,
  • Tier 2 ablation and subset-growth experiment scripts,
  • generated CSV, JSON, Markdown, PNG, and PDF artifacts under results/phase2_tier2/ and plots/phase2_tier2/,
  • corrected compact-baseline PR and ROC curves for the 16-feature model,
  • dataset, feature-dictionary, training-protocol, uncertainty, learning-curve, and artifact-index support files,
  • and Zenodo archives for both the initial v1 package and the current v2 support package.

The current archived version of this branch is phase2-tier2-v2.0, which includes the normalized Phase 2 Tier 2 support package and updated branch documentation.

Baseline Reference

The frozen compact baseline comes from Phase 2 Tier 1 and uses 16 physically interpretable features. It is the reference point for all Tier 2 deltas.

| Metric | Compact 16-feature baseline |
| --- | --- |
| F1 | 0.844230 |
| ROC-AUC | 0.976588 |
| PR-AUC | 0.927761 |
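For readers unfamiliar with the three metrics, the following is a minimal pure-Python sketch of how each can be computed from labels and scores. It is not the repository's evaluation code. The function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration, and PR-AUC is approximated here by average precision, which may differ from the estimator used in the project.

```python
def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN) for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Mean of the precision values measured at the rank of each positive."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Toy data, invented for illustration only (not project results).
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
y_pred = [int(s >= 0.5) for s in scores]  # assumed threshold of 0.5

print(round(f1_score(y_true, y_pred), 4))
print(round(roc_auc(y_true, scores), 4))
print(round(average_precision(y_true, scores), 4))
```

F1 depends on the chosen threshold, while ROC-AUC and PR-AUC are computed from the ranking of scores alone, which is why the three numbers are reported together.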

The corrected compact-baseline curve artifacts are:

results/phase2_tier1/phase2_tier1_compact_baseline_plots/phase2_tier1_compact_baseline_precision_recall_curve.png
results/phase2_tier1/phase2_tier1_compact_baseline_plots/phase2_tier1_compact_baseline_roc_curve.png

These curves belong to the Phase 2 Tier 1 compact baseline model. They come from neither the older original-paper model nor a separate Tier 2 model.

Compact Feature Set

The compact model uses four physical feature families.

| Group | Features |
| --- | --- |
| Brightness | r_mean_flux, g_mean_flux, z_peak_flux, i_peak_flux, r_peak_flux |
| Color | peak_color_g_minus_r, peak_color_r_minus_i, peak_color_i_minus_z |
| Variability | i_std_flux, z_std_flux, r_std_flux, i_amplitude |
| Temporal | r_time_of_peak, i_time_of_peak, z_time_of_peak, time_span |
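The same grouping can be written down as a plain Python mapping, which makes it easy to confirm that the four families add up to 16 distinct features. This is only a sketch: the names COMPACT_FEATURE_GROUPS, COMPACT_FEATURES, and FEATURE_TO_GROUP are hypothetical and are not taken from phase2_tier2_common.py.

```python
# Feature families exactly as listed in this README.
# The variable names below are hypothetical, chosen for this sketch.
COMPACT_FEATURE_GROUPS = {
    "brightness": ["r_mean_flux", "g_mean_flux", "z_peak_flux",
                   "i_peak_flux", "r_peak_flux"],
    "color": ["peak_color_g_minus_r", "peak_color_r_minus_i",
              "peak_color_i_minus_z"],
    "variability": ["i_std_flux", "z_std_flux", "r_std_flux", "i_amplitude"],
    "temporal": ["r_time_of_peak", "i_time_of_peak", "z_time_of_peak",
                 "time_span"],
}

# Flat feature list in group order.
COMPACT_FEATURES = [
    name for group in COMPACT_FEATURE_GROUPS.values() for name in group
]

# Reverse lookup: which family does a given feature belong to?
FEATURE_TO_GROUP = {
    name: group
    for group, names in COMPACT_FEATURE_GROUPS.items()
    for name in names
}

print({group: len(names) for group, names in COMPACT_FEATURE_GROUPS.items()})
print(len(COMPACT_FEATURES), "features,", len(set(COMPACT_FEATURES)), "unique")
print(FEATURE_TO_GROUP["time_span"])
```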

Tier 2 Results

The main Tier 2 conclusion is that temporal information is the most decisive family for the compact classifier, while brightness, color, and variability provide complementary structure. The compact feature set can be reduced to a smaller interpretable core with only modest degradation, but the full compact set remains the strongest and most stable reference model.

Key generated results:

| Analysis | Main output | Interpretation |
| --- | --- | --- |
| Single-feature ablation | results/phase2_tier2/feature_ablation_metrics.csv | time_span is the strongest individual feature by F1 loss when removed. |
| Block ablation | results/phase2_tier2/block_ablation_metrics.csv | Removing temporal features causes the largest block-level degradation. |
| Subset growth | results/phase2_tier2/subset_growth_metrics.csv | Performance grows as brightness, color, variability, and temporal information are added. |
| Minimal core | results/phase2_tier2/minimal_core_metrics.csv | A 10-feature core retains most of the compact baseline performance. |
| Master report | results/phase2_tier2/phase2_tier2_report.md | Human-readable summary of the Tier 2 ablation results. |
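Single-feature and block ablation share one pattern: drop a set of features, re-evaluate, and record the change against the baseline. The sketch below shows that pattern in generic form. The function leave_out_deltas, the stand-in scorer toy_evaluate, and the weights in TOY_WEIGHTS are all hypothetical and synthetic. In the actual workflow the evaluation step would presumably retrain XGBoost on the fixed split and return a metric such as F1, but that step is not reproduced here.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple


def leave_out_deltas(
    features: Sequence[str],
    removals: Dict[str, Iterable[str]],
    evaluate: Callable[[List[str]], float],
) -> Tuple[float, Dict[str, float]]:
    """Return the baseline score and, per removal, (ablated score - baseline)."""
    baseline = evaluate(list(features))
    deltas = {}
    for label, dropped in removals.items():
        dropped_set = set(dropped)
        kept = [name for name in features if name not in dropped_set]
        deltas[label] = evaluate(kept) - baseline
    return baseline, deltas


# Synthetic stand-in scorer: NOT a model and NOT project results.
TOY_WEIGHTS = {
    "r_peak_flux": 1.0,
    "peak_color_g_minus_r": 1.0,
    "i_std_flux": 1.0,
    "time_span": 3.0,
}


def toy_evaluate(kept: List[str]) -> float:
    return sum(TOY_WEIGHTS[name] for name in kept)


features = list(TOY_WEIGHTS)
# Single-feature ablation: each removal drops exactly one feature.
single = {name: [name] for name in features}
baseline, deltas = leave_out_deltas(features, single, toy_evaluate)
worst = min(deltas, key=deltas.get)  # most negative delta = most important
print(baseline, deltas, worst)
```

Block ablation uses the same function with a mapping from block name to the list of features in that block as the removals argument.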

Highlights from results/phase2_tier2/phase2_tier2_report.md:

  • Removing time_span gives the largest single-feature F1 drop: delta F1 = -0.033952.
  • Removing the temporal block gives the largest block-level drop: delta F1 = -0.046596, delta PR-AUC = -0.054893.
  • The best reduced core is the top-10 subset: F1 = 0.838509, PR-AUC = 0.916900.
  • The full compact model remains the preferred reference: F1 = 0.844230, PR-AUC = 0.927761.
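The highlighted numbers can be cross-checked with simple arithmetic. The sketch below assumes each delta is defined as the ablated score minus the baseline score, which matches the negative signs reported for drops. The variable names are hypothetical, and every value is copied from the highlights above.

```python
BASELINE = {"f1": 0.844230, "pr_auc": 0.927761}
TOP10_CORE = {"f1": 0.838509, "pr_auc": 0.916900}

# Deltas reported for the two ablations (ablated minus baseline, assumed).
TIME_SPAN_DELTA_F1 = -0.033952
TEMPORAL_BLOCK_DELTA = {"f1": -0.046596, "pr_auc": -0.054893}

# Cost of shrinking from 16 features to the top-10 core.
top10_delta = {k: round(TOP10_CORE[k] - BASELINE[k], 6) for k in BASELINE}
print(top10_delta)  # {'f1': -0.005721, 'pr_auc': -0.010861}

# Scores implied by the reported deltas.
f1_without_time_span = round(BASELINE["f1"] + TIME_SPAN_DELTA_F1, 6)
without_temporal = {
    k: round(BASELINE[k] + TEMPORAL_BLOCK_DELTA[k], 6) for k in BASELINE
}
print(f1_without_time_span)  # 0.810278
print(without_temporal)      # {'f1': 0.797634, 'pr_auc': 0.872868}

# Fraction of baseline F1 retained by the top-10 core.
retained_f1 = TOP10_CORE["f1"] / BASELINE["f1"]
print(f"top-10 core retains {retained_f1:.2%} of baseline F1")
```

Under that sign convention, the top-10 core costs roughly an eighth of the F1 lost when the whole temporal block is removed.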

Figures

The primary Tier 2 figures are:

plots/phase2_tier2/feature_ablation_delta_f1.png
plots/phase2_tier2/block_ablation_delta_f1.png
plots/phase2_tier2/subset_growth_f1.png
plots/phase2_tier2/subset_growth_paper_ready.png
plots/phase2_tier2/minimal_core_tradeoff.png

The paper-ready subset-growth figure is:

plots/phase2_tier2/subset_growth_paper_ready.png

It uses the four cumulative stages:

  1. brightness only,
  2. brightness + color,
  3. brightness + color + variability,
  4. full compact model.

It plots both F1 and PR-AUC and is intended for the ablation/growth-of-information discussion.
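The four cumulative stages can be built mechanically from the feature families. The following is a sketch under the assumption that the families are added in the order listed above. GROUPS, STAGE_ORDER, and cumulative_stages are hypothetical names for illustration, not the contents of phase2_tier2_subset_growth.py.

```python
# Feature families as listed in this README (names of variables are hypothetical).
GROUPS = {
    "brightness": ["r_mean_flux", "g_mean_flux", "z_peak_flux",
                   "i_peak_flux", "r_peak_flux"],
    "color": ["peak_color_g_minus_r", "peak_color_r_minus_i",
              "peak_color_i_minus_z"],
    "variability": ["i_std_flux", "z_std_flux", "r_std_flux", "i_amplitude"],
    "temporal": ["r_time_of_peak", "i_time_of_peak", "z_time_of_peak",
                 "time_span"],
}
STAGE_ORDER = ["brightness", "color", "variability", "temporal"]


def cumulative_stages(groups, order):
    """Return [(label, feature_list), ...] where each stage adds one family."""
    stages, selected = [], []
    for name in order:
        selected = selected + groups[name]  # new list, earlier stages untouched
        label = " + ".join(order[: len(stages) + 1])
        stages.append((label, selected))
    return stages


stages = cumulative_stages(GROUPS, STAGE_ORDER)
for label, feats in stages:
    print(f"{len(feats):2d} features: {label}")
```

The final stage contains all 16 features and therefore corresponds to the full compact model.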

Supporting Artifacts

Additional Phase 2 Tier 2 support material is stored under results/phase2_tier2/.

Important files include:

results/phase2_tier2/artifact_index.md
results/phase2_tier2/dataset_summary.md
results/phase2_tier2/training_protocol.md
results/phase2_tier2/compact_feature_dictionary.md
results/phase2_tier2/selected_compact_model.md
results/phase2_tier2/uncertainty_summary.md
results/phase2_tier2/learning_curve.md
results/phase2_tier2/validation_logloss_curve.md
results/phase2_tier2/tier2_experiment_audit.md

The artifact index is the best entry point for the full support package:

results/phase2_tier2/artifact_index.md
results/phase2_tier2/artifact_index.csv

Running The Workflow

Use the project environment defined by environment.yml or astro-ml.yml.

Example:

conda env create -f environment.yml
conda activate astro-ml

The Tier 2 scripts expect the processed Phase 2 Tier 1 compact dataset to exist:

data/processed/phase2_tier1_compact_baseline.csv
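Before launching the experiments it can help to confirm that the dataset has the expected columns. The sketch below reads only the header row with the standard library. It assumes the CSV has a header whose column names match the feature names in this README. The exact file layout is not documented here, so the "label" column in the demo is purely hypothetical, as is the function name missing_columns.

```python
import csv
import tempfile
from pathlib import Path


def missing_columns(csv_path, required):
    """Return the required column names that are absent from the CSV header."""
    with Path(csv_path).open(newline="") as handle:
        header = next(csv.reader(handle), [])  # empty file -> empty header
    present = set(header)
    return [name for name in required if name not in present]


# Demo on a throwaway file with made-up contents (not the project dataset).
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.csv"
    demo.write_text("time_span,r_peak_flux,label\n1.0,2.0,1\n")
    print(missing_columns(demo, ["time_span", "r_peak_flux", "i_std_flux"]))
    # -> ['i_std_flux']
```

Against the real file, the call would take data/processed/phase2_tier1_compact_baseline.csv and the 16 compact feature names; an empty result means every expected column is present.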

The core Tier 2 experiment order is:

python3 phase2_tier2_feature_ablation.py
python3 phase2_tier2_block_ablation.py
python3 phase2_tier2_subset_growth.py
python3 phase2_tier2_minimal_core.py
python3 phase2_tier2_summary.py
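The same sequence can be driven from a small Python wrapper that stops at the first failing step. This is a convenience sketch, not part of the repository. It assumes the scripts take no command-line arguments and are run from the repository root, as the commands above suggest. The names run_core_experiments, CORE_SCRIPTS, and DATASET are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

DATASET = Path("data/processed/phase2_tier1_compact_baseline.csv")

# Script order as documented in this README.
CORE_SCRIPTS = [
    "phase2_tier2_feature_ablation.py",
    "phase2_tier2_block_ablation.py",
    "phase2_tier2_subset_growth.py",
    "phase2_tier2_minimal_core.py",
    "phase2_tier2_summary.py",
]


def run_core_experiments(dry_run=False):
    """Run the core Tier 2 scripts in order; with dry_run=True only print them."""
    if not dry_run and not DATASET.exists():
        raise FileNotFoundError(f"Missing compact dataset: {DATASET}")
    commands = [[sys.executable, script] for script in CORE_SCRIPTS]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so later steps never run on top of a failed one.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    run_core_experiments(dry_run=True)
```

Using sys.executable keeps every step inside the currently active interpreter, for example the one from the activated conda environment.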

The support-artifact scripts can be run after the compact dataset and Tier 2 outputs are available:

python3 phase2_tier2_metadata.py
python3 phase2_tier2_training_protocol.py
python3 phase2_tier2_feature_dictionary.py
python3 phase2_tier2_selected_model.py
python3 phase2_tier2_uncertainty.py
python3 phase2_tier2_learning_curves.py
python3 phase2_tier2_example_light_curve.py
python3 phase2_tier2_audit.py
python3 phase2_tier2_artifact_index.py

To regenerate the corrected compact-baseline PR and ROC curves:

python3 phase2_tier1_compact_curves.py

Repository Structure

The current Phase 2 Tier 2 branch centers on these files and folders:

supernovae_classification/
├── README.md
├── data/
│   └── processed/
│       ├── phase2_tier1_compact_baseline.csv
│       └── spcc_features_tier1.csv
├── feature_pipeline/
├── plots/
│   └── phase2_tier2/
├── results/
│   ├── phase2_tier1/
│   └── phase2_tier2/
├── phase2_tier1_compact_curves.py
├── phase2_tier2_common.py
├── phase2_tier2_feature_ablation.py
├── phase2_tier2_block_ablation.py
├── phase2_tier2_subset_growth.py
├── phase2_tier2_minimal_core.py
├── phase2_tier2_summary.py
├── phase2_tier2_metadata.py
├── phase2_tier2_training_protocol.py
├── phase2_tier2_feature_dictionary.py
├── phase2_tier2_selected_model.py
├── phase2_tier2_uncertainty.py
├── phase2_tier2_learning_curves.py
├── phase2_tier2_example_light_curve.py
├── phase2_tier2_audit.py
└── phase2_tier2_artifact_index.py

Older Phase 1 and Phase 2 Tier 1 utilities remain in the repository for provenance, but this README documents the current Phase 2 Tier 2 branch state.

Release And DOI

The current Phase 2 Tier 2 Zenodo release is phase2-tier2-v2.0. Citation:

Anurag Garg. (2026). mranuraggarg/supernovae_classification:
Phase 2 Tier 2 v2: Compact Feature Ablation Support Package
(phase2-tier2-v2.0). Zenodo. https://doi.org/10.5281/zenodo.19666153

The previous Phase 2 Tier 2 Zenodo release is:

The DOI for all Zenodo versions of this project is:

Data Source And Inspiration

This repository originally drew inspiration from Adam Moss's supernova classification work and data organization.

Phase 2 extends that direction by reconstructing a native SPCC/SNPhotCC preprocessing and feature-engineering workflow, then using compact physically interpretable features for controlled model analysis.

License

This project is licensed under the MIT License.

About

Classifying type Ia supernovae through their light curve using ML.
