@ShadenSmith (Contributor)

No description provided.

@ShadenSmith deleted the fix-permalink branch March 18, 2020 07:41
samyam pushed a commit that referenced this pull request Mar 8, 2021
samyam added a commit that referenced this pull request Mar 8, 2021
* Squash stage3 v1 (#146)

Co-authored-by: Samyam <[email protected]>
Co-authored-by: Jeff Rasley <[email protected]>
Co-authored-by: Samyam Rajbhandari <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Shaden Smith <[email protected]>
Co-authored-by: Shaden Smith <[email protected]>
Co-authored-by: eltonzheng <[email protected]>

* Fix correctness bug (#147)

* formatting fix (#150)

* stage3 bugfix (API) update and simplified FP16 Z3 tests (#151)

* fp16 Z3 API update and bugfix

* revert debug change

* ZeRO-3 detach and race condition bugfixes (#149)

* trying out ZeRO-3 race condition fix

* CUDA sync instead of stream

* reduction stream sync

* remove commented code

* Fix optimizer state_dict KeyError (#148)

Co-authored-by: Jeff Rasley <[email protected]>

* fix for smaller SGS sizes, ensures each grad is backed by unique tensors (#152)

* Simplifying the logic for getting averaged gradients (#153)

* skip for now

* Z3 Docs redux (#154)

* removing some TODOs and commented code (#155)

* New Z3 defaults (#156)

Co-authored-by: Jeff Rasley <[email protected]>

* formatting

* megatron external params

Co-authored-by: Jeff Rasley <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Shaden Smith <[email protected]>
Co-authored-by: Shaden Smith <[email protected]>
Co-authored-by: eltonzheng <[email protected]>

Labels: None yet
Projects: None yet

1 participant