# Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

## [next rel] - TBD

### Added

- AdaScale: added gradient accumulation support (#202); see the sketch after this section
- AdaScale: added support for `torch.optim.lr_scheduler` (#229)

### Fixed

- AdaScale: fixed the smoothing factor value when gradient accumulation is used (#235)
- Pipe: fixed the documentation on the balancing functions (#243)
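
A minimal usage sketch of the AdaScale entries above, assuming `num_gradients_to_accumulate` is the accumulation knob added in #202 and that a `torch.optim.lr_scheduler` scheduler can be attached to the AdaScale wrapper per #229; if the wrapper is not accepted by the scheduler, attach it to the inner optimizer instead. Names and signatures here are illustrative, not a verified API reference.

```python
# Hedged sketch: AdaScale with gradient accumulation and a torch LR scheduler.
# `num_gradients_to_accumulate`, `world_size`, and the scheduler-on-the-wrapper
# pattern are assumptions based on the changelog entries above.
import torch
from torch.optim.lr_scheduler import LambdaLR
from fairscale.optim import AdaScale

model = torch.nn.Linear(16, 2)
optim = AdaScale(
    torch.optim.SGD(model.parameters(), lr=0.1),
    world_size=1,                    # normally inferred from torch.distributed
    num_gradients_to_accumulate=4,   # assumed name of the accumulation knob (#202)
)
scheduler = LambdaLR(optim, lr_lambda=lambda epoch: 0.95 ** epoch)  # #229

for step, batch in enumerate(torch.randn(32, 4, 16)):
    loss = model(batch).sum()
    loss.backward()                  # AdaScale's hooks gather gradient statistics
    if (step + 1) % 4 == 0:          # step once per accumulation window
        optim.step()
        optim.zero_grad()
scheduler.step()                     # advance the schedule, e.g. once per epoch
```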

## [0.1.1] - 2020-12-01

### Fixed

- Make sure the pip package includes header files (#221)

## [0.1.0] - 2020-12-01

### Added

- ShardedDataParallel with autoreduce (#157); see the sketch after this list
- CPU support for Pipe (#188); see the Pipe sketch at the end of this section
- ShardedOptim: distributed grad scaler for torch AMP (#182)
- OSS-aware gradient clipping, bridging of sharded states (#167)
- OSS: added a `rank_local_state_dict` staticmethod (#174)
- Support for PyTorch 1.7.0 (#171)
- Added an implementation of AdaScale (#139)
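
A minimal sketch of how the sharded pieces above compose, assuming a `torch.distributed` process group is already initialized, one GPU per rank, and a `ShardedDataParallel(module, sharded_optimizer)` signature; treat the exact constructor arguments as assumptions rather than the released API.

```python
# Hedged sketch: OSS shards optimizer state across ranks, ShardedDataParallel
# reduces each gradient to the rank that owns its shard (#157), and
# ShardedGradScaler keeps torch AMP working with the sharded state (#182).
# Assumes torch.distributed.init_process_group(...) has already been called.
import torch
from fairscale.nn.data_parallel import ShardedDataParallel
from fairscale.optim.oss import OSS
from fairscale.optim.grad_scaler import ShardedGradScaler

model = torch.nn.Linear(16, 2).cuda()
optimizer = OSS(model.parameters(), optim=torch.optim.SGD, lr=0.1)
model = ShardedDataParallel(model, optimizer)   # assumed (module, sharded_optimizer) signature
scaler = ShardedGradScaler()

for batch in torch.randn(8, 4, 16):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(batch.cuda()).sum()
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```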

### Fixed

- pip package install (#196, #200)
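
Since this release, Pipe can also run without GPUs (#188). A minimal sketch, assuming Pipe falls back to CPU when no devices are given; pass an explicit `devices=` list otherwise.

```python
# Hedged sketch: Pipe on CPU (#188). `balance` splits the Sequential into
# pipeline partitions (here 2 layers, then 1); `chunks` controls how many
# micro-batches each input batch is split into.
import torch
from fairscale.nn import Pipe

model = torch.nn.Sequential(
    torch.nn.Linear(16, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 2),
)
pipe = Pipe(model, balance=[2, 1], chunks=4)  # CPU fallback assumed when no GPUs are present
out = pipe(torch.randn(8, 16))
```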

## [0.0.3] - 2020-10-14

### Added

- Multi-process pipe

### Fixed

- Multiple OSS fixes
- MegaTron+OSS DDP fix

## [0.0.2] - 2020-08-28

### Added

- Added a DDP variant that works with OSS using `reduce()` instead of `all_reduce()` (#19)
- Support for PyTorch v1.6
- Added mixed-precision Adam (#40)
- Adam optimizer state scaling (#44)

### Fixed

- Properly restore a sharded optim state (#39)
- OSS: restore state to the proper device (#46)
- optim/oss: support optimizers with additional step kwargs (#53)
- optim/oss: fix state cast (#56)
- Fix eval for oss_ddp (#55)
- optim/oss: work correctly with LRScheduler (#58)

## [0.0.1] - 2020-07-31

- Initial release.