Add checkpoint unit test #35

Open. Wants to merge 9 commits into base: main


@sfc-gh-mwyatt (Collaborator) commented Jan 24, 2025

  • Add unit tests for the HF and DS checkpoint engines
  • Run unit tests in the same process as pytest. This keeps error messages from being hidden behind a launched subprocess, and it is faster because there is no per-test startup overhead
  • Fix a bug where DS checkpoints were not loaded when resuming training
  • Move process destruction to the CLI function
  • Add pytest hooks and helper functions to run CPU and GPU tests in a single command. This is not fully working yet, so the CPU and GPU tests still need to be run separately for now
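
For readers unfamiliar with pytest hooks, here is a minimal sketch of how a `conftest.py` could let CPU and GPU tests share one `pytest` invocation by skipping GPU tests when no GPU is present. It is not the code from this PR: the `gpu` marker name, the `gpu_available` and `skip_reason` helpers, and the `nvidia-smi` check are all assumptions made for illustration.

```python
# conftest.py (hypothetical sketch, not the hooks added in this PR)
import shutil


def gpu_available() -> bool:
    """Assumption: a GPU is considered available if `nvidia-smi` is on PATH."""
    return shutil.which("nvidia-smi") is not None


def skip_reason(marker_names, has_gpu):
    """Return a skip reason for a test, or None if the test should run.

    `marker_names` is the set of marker names attached to the test.
    "gpu" is an assumed, project-defined marker name.
    """
    if "gpu" in marker_names and not has_gpu:
        return "requires a GPU"
    return None


def pytest_configure(config):
    # Register the assumed marker so `--strict-markers` does not reject it.
    config.addinivalue_line("markers", "gpu: test requires a GPU")


def pytest_collection_modifyitems(config, items):
    # Imported lazily so the helpers above can be used without pytest installed.
    import pytest

    has_gpu = gpu_available()
    for item in items:
        names = {marker.name for marker in item.iter_markers()}
        reason = skip_reason(names, has_gpu)
        if reason is not None:
            item.add_marker(pytest.mark.skip(reason=reason))
```

With a setup like this, a test decorated with `@pytest.mark.gpu` is reported as skipped on a CPU-only machine instead of failing, while unmarked tests run everywhere.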

2 participants