Use xm.save to save model on TPU #3044

Closed
lezwon wants to merge 44 commits

@lezwon lezwon commented Aug 19, 2020

What does this PR do?

Fixes #2700

Before submitting

  • Was this discussed/approved via a GitHub issue? (not needed for typo fixes and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@pep8speaks

pep8speaks commented Aug 22, 2020

Hello @lezwon! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-09-19 09:57:09 UTC

@Borda added the labels feature (Is an improvement or enhancement) and accelerator: tpu (Tensor Processing Unit) on Aug 24, 2020
@edenlightning
Contributor

Hey @lezwon, any ETA on this fix?

@lezwon
Contributor Author

lezwon commented Sep 16, 2020

@edenafek it has a lot of breaking changes. There was also an issue that made it hang at the 4th epoch.
I had it on pause because of the ongoing refactors. Will try to get this done over the weekend. :)

@lezwon force-pushed the bugfix/2700_xm_save branch from 8ac6eeb to c883572 on September 19, 2020 09:57
@Borda
Member

Borda commented Sep 25, 2020

@lezwon how is this going? Can we get it done today?
Please mind rebasing on master... maybe it would be easier for you to squash it first and then rebase...

@edenlightning edenlightning modified the milestones: 0.9.x, 1.0, 1.1 Oct 4, 2020
@edenlightning
Contributor

@lezwon can we revive this fix?

@lezwon
Contributor Author

lezwon commented Oct 20, 2020

@edenlightning I'm using a new branch to fix this issue. Will close this one as it is stale.

@lezwon lezwon closed this Oct 20, 2020
@Borda
Member

Borda commented Oct 20, 2020

@lezwon mind linking the new PR here?

@lezwon lezwon mentioned this pull request Oct 22, 2020
8 tasks
Development

Successfully merging this pull request may close these issues.

Checkpointing is broken on TPUs
4 participants