Extend CheckpointFunction to track all tensor input/output #1148

Open
wants to merge 3 commits into main
Conversation


@000Justin000 commented Nov 7, 2023

What does this PR do?

The current activation checkpointing implementation requires each input/output argument to be a Tensor in order to be properly tracked by autograd. However, pyspeech nn layers often take aux_input as a dict and return state as a list.

This diff enables serialization of a Python container: given an input that may be any Python "container" (tuple, list, dict), perform a depth-first search (DFS) to extract the PyTorch tensors from the container and serialize them into a tuple of tensors. Each tensor's original location is replaced with an index into the serialized tuple, so the original input can be easily reconstructed.
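The DFS serialization step can be sketched roughly as follows. This is an illustrative, simplified version, not the PR's actual code: `flatten`/`unflatten` are hypothetical names, and the `is_leaf` predicate stands in for a `torch.is_tensor` check so the sketch stays framework-agnostic.

```python
def flatten(obj, leaves, is_leaf):
    """DFS over a container; each tensor-like leaf is appended to
    `leaves` and replaced in place by an index placeholder."""
    if is_leaf(obj):
        leaves.append(obj)
        return ("__leaf__", len(leaves) - 1)
    if isinstance(obj, (list, tuple)):
        return type(obj)(flatten(x, leaves, is_leaf) for x in obj)
    if isinstance(obj, dict):
        return {k: flatten(v, leaves, is_leaf) for k, v in obj.items()}
    return obj  # non-tensor values pass through untouched

def unflatten(spec, leaves):
    """Inverse of flatten: rebuild the original container by
    substituting each index placeholder with its tensor."""
    if isinstance(spec, tuple) and len(spec) == 2 and spec[0] == "__leaf__":
        return leaves[spec[1]]
    if isinstance(spec, (list, tuple)):
        return type(spec)(unflatten(x, leaves) for x in spec)
    if isinstance(spec, dict):
        return {k: unflatten(v, leaves) for k, v in spec.items()}
    return spec
```

Because the placeholder records an index into the flat tuple, `unflatten(flatten(x, leaves, p), leaves)` reconstructs the original container exactly, including non-tensor values.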

Before checkpointed_forward, the input is serialized and the resulting tuple of tensors is used as the input to forward (and is therefore tracked). During checkpointed_forward, the original input is reconstructed by deserialization and passed to the original forward; the output of the original forward is serialized in the same manner and returned (so that the output is also tracked). After checkpointed_forward, the serialized output is deserialized back into the desired format.
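The serialize → checkpoint → deserialize flow might look like the sketch below. All names are illustrative, not the PR's actual API: `checkpoint_fn` stands in for `torch.utils.checkpoint.checkpoint`, `is_leaf` for a tensor check, and the inline `flatten`/`unflatten` helpers mirror the DFS serialization described earlier.

```python
def flatten(obj, leaves, is_leaf):
    # DFS: move tensor-like leaves into `leaves`, leave index placeholders
    if is_leaf(obj):
        leaves.append(obj)
        return ("__leaf__", len(leaves) - 1)
    if isinstance(obj, (list, tuple)):
        return type(obj)(flatten(x, leaves, is_leaf) for x in obj)
    if isinstance(obj, dict):
        return {k: flatten(v, leaves, is_leaf) for k, v in obj.items()}
    return obj

def unflatten(spec, leaves):
    # Inverse: substitute placeholders back with their tensors
    if isinstance(spec, tuple) and len(spec) == 2 and spec[0] == "__leaf__":
        return leaves[spec[1]]
    if isinstance(spec, (list, tuple)):
        return type(spec)(unflatten(x, leaves) for x in spec)
    if isinstance(spec, dict):
        return {k: unflatten(v, leaves) for k, v in spec.items()}
    return spec

def checkpointed_call(forward, inputs, is_leaf, checkpoint_fn):
    """Run `forward` on an arbitrary container through a checkpoint
    boundary that only ever sees a flat tuple of tensors."""
    in_leaves = []
    in_spec = flatten(inputs, in_leaves, is_leaf)  # before: serialize input
    out_box = {}

    def inner(*tensors):
        # during: deserialize, run the original forward, re-serialize output
        out = forward(unflatten(in_spec, list(tensors)))
        out_leaves = []
        out_box["spec"] = flatten(out, out_leaves, is_leaf)
        return tuple(out_leaves)  # flat tensors, so the output is tracked

    out_tensors = checkpoint_fn(inner, *in_leaves)
    # after: deserialize the output back into the desired format
    return unflatten(out_box["spec"], list(out_tensors))
```

The key point is that the checkpoint boundary (`checkpoint_fn`) only ever sees positional tensors, so autograd can track them, while the original forward still receives and returns its native dict/list containers.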

Before submitting

  • Did you have fun?
    • Make sure you had fun coding 🙃
  • Did you read the contributor guideline?
  • Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
    • N/A
  • Did you make sure to update the docs?
    • N/A
  • Did you write any new necessary tests?
    • N/A
  • Did you update the changelog? (if needed)
    • N/A

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

@facebook-github-bot added the CLA Signed label on Nov 7, 2023
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.