Skip to content

Latest commit

 

History

History
20 lines (16 loc) · 1.07 KB

File metadata and controls

20 lines (16 loc) · 1.07 KB

Changelog

All notable changes to this project are documented in this file.

The format is based on Keep a Changelog and this project follows Semantic Versioning.

[0.1.0] - 2026-04-11

Added

  • Initial release of PyTorch Training Inspector.
  • Core training wrapper with step-scoped instrumentation.
  • Monitors for loss, gradients, learning rate, GPU memory, throughput, and activations.
  • Detectors for NaN loss, gradient explosion, training stalls, LR mismatch, and OOM risk.
  • Plotly dashboard generation and standalone report output.
  • Alert routing primitives (stdout, Slack, email).
  • Checkpoint analysis and profiler integration utilities.
  • Example scripts for baseline usage, advanced monitoring, multi-GPU, and checkpoint resume.
  • Unit and integration tests for key monitoring and detection flows.
  • Documentation set under docs/ (user guide, API reference, anomaly patterns, performance tuning).
  • Repository governance and release readiness artifacts (LICENSE, CONTRIBUTING.md, CODE_OF_CONDUCT.md, SECURITY.md, CITATION.cff, pyproject.toml, .gitattributes).