Conversation

Owner

trajepl commented Jun 23, 2022

No description provided.

mrwyattii and others added 20 commits June 6, 2022 16:19
Add '-S' argument to pdsh command to return the largest error code from the ssh sessions
Co-authored-by: Quentin Anthony <[email protected]>
Co-authored-by: Ammar Ahmad Awan <[email protected]>
Co-authored-by: Jeff Rasley <[email protected]>
…gpt-j) (#1992)

Co-authored-by: Reza Yazdani <[email protected]>
Co-authored-by: Michael Wyatt <[email protected]>
Co-authored-by: Reza Yazdani <[email protected]>
Co-authored-by: Michael Wyatt <[email protected]>
* fix to catch assert error for inference test imports

* fix wrong syntax

* changed to sequential inf tests

* fix for lm_eval import

* added environment check fixture

* added expected torch and cuda version

* check various version depth for cuda/torch

Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Jeff Rasley <[email protected]>
* Retain prefetched params until last use

* Unit tests fixes
* Split parameter offload from z3

* Format fixes

* Bug fixes

* Cleanup

* Remove dead code
Co-authored-by: Olatunji Ruwase <[email protected]>
@trajepl trajepl merged commit 25e04b3 into trajepl:master Jun 23, 2022