[WIP] functional autograd + compiled autograd #139098
base: gh/zou3519/1081/base
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139098
Note: Links to docs will display an error until the docs builds have been completed.
❌ 36 New Failures as of commit 368259d with merge base 5c88a9f.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Stack from ghstack (oldest at bottom):
This commit refactors autograd so that nodes can be called in a
functional way. Furthermore, it refactors compiled autograd to use
the new functional autograd, without any behavior changes.
This is on the way to getting compiled autograd to stop tracing into
autograd nodes when it constructs an FX graph out of the autograd graph.
We also implement some very basic support for that, which can be toggled
via `old_inline_behavior=False` in compiled_autograd.py.

Functional autograd works like the following:
- All torch::autograd::Node must define a `retrieve_saved(SwapSavedVariables) -> ivalue_list` API. This function takes compiled autograd's SwapSavedVariables and packs the state that is relevant to the current Node into an ivalue_list.
- All torch::autograd::Node must define a `get_functional() -> std::function`. This returns a new stateless function that accepts the gradients and saved values as an ivalue_list and returns new gradients (see the sketch just after this list).
- We developed a mechanism to bind arbitrary C++ functions that take ivalue_list to Python. This is really similar to how we bind custom ops to Python and was done in consideration of the Windows symbol limit (otherwise, we'd be binding one symbol per Node into Python). A hypothetical sketch of this idea appears further below, after the gist links.
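
To make the first two bullets concrete, here is a small illustrative sketch rather than code from this PR. It assumes `ivalue_list` is a `std::vector<c10::IValue>`, uses a two-argument `(grads, saved)` signature for the stateless function, and elides the `SwapSavedVariables` plumbing; the class name `MulBackwardSketch` is invented.

```cpp
// Illustrative sketch only; names and signatures here are assumptions, not the
// actual PR implementation. Assumes ivalue_list is std::vector<c10::IValue>.
#include <torch/torch.h>

#include <functional>
#include <iostream>
#include <vector>

using ivalue_list = std::vector<c10::IValue>;

// Simplified stand-in for an autograd node whose backward multiplies the
// incoming gradient by a tensor saved during the forward pass.
struct MulBackwardSketch {
  at::Tensor saved_other;  // state captured at forward time

  // Pack everything the backward computation needs into an ivalue_list.
  // (The real API threads compiled autograd's SwapSavedVariables through here.)
  ivalue_list retrieve_saved() const {
    return {c10::IValue(saved_other)};
  }

  // Return a stateless function: it reads gradients and saved values only from
  // its arguments, never from the node object itself.
  std::function<ivalue_list(const ivalue_list&, const ivalue_list&)>
  get_functional() const {
    return [](const ivalue_list& grads, const ivalue_list& saved) -> ivalue_list {
      at::Tensor grad_output = grads.at(0).toTensor();
      at::Tensor other = saved.at(0).toTensor();
      // d(x * other)/dx = other, so grad_input = grad_output * other.
      return {c10::IValue(grad_output * other)};
    };
  }
};

int main() {
  MulBackwardSketch node{torch::tensor({2.0, 3.0})};

  // Instead of tracing into the node, compiled autograd could record a call to
  // this stateless function with (grads, saved) as explicit graph inputs.
  auto fn = node.get_functional();
  ivalue_list saved = node.retrieve_saved();
  ivalue_list grads = {c10::IValue(torch::ones({2}))};
  ivalue_list new_grads = fn(grads, saved);

  std::cout << new_grads.at(0).toTensor() << "\n";  // expect [2., 3.]
}
```

The key property is that the returned function closes over nothing on the node itself, so compiled autograd can record a call to it with the packed saved values as explicit inputs rather than tracing through the node's `apply()`.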
Here's an example of the new autograd generated code: https://gist.github.com/zou3519/09bb98bb0f11445bc3da063201adb818
Here's an example of the FX graph compiled autograd produces (with `old_inline_behavior=False`): https://gist.github.com/zou3519/43e8106176d15d623e1377850f585c97
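
For the binding mechanism mentioned in the third bullet above, the PR only describes the motivation (one bound symbol instead of one per Node). The sketch below is a hypothetical illustration of that general shape, with invented names and a single-ivalue_list calling convention: all stateless functions live in a registry, and only one dispatch entry point would ever need to be exposed to Python.

```cpp
// Hypothetical illustration of the "one symbol, many functions" idea; none of
// these names come from the PR. Each Node's stateless functional is stored in
// a registry keyed by name, and Python would only need a single bound entry
// point that dispatches by name.
#include <ATen/core/ivalue.h>

#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

using ivalue_list = std::vector<c10::IValue>;
using functional_fn = std::function<ivalue_list(const ivalue_list&)>;

// Registry of stateless backward functions, keyed by a per-Node name.
std::unordered_map<std::string, functional_fn>& registry() {
  static std::unordered_map<std::string, functional_fn> r;
  return r;
}

void register_functional(const std::string& name, functional_fn fn) {
  registry().emplace(name, std::move(fn));
}

// The single entry point that would be bound to Python (e.g. via pybind11),
// instead of one binding per Node subclass.
ivalue_list call_functional(const std::string& name, const ivalue_list& args) {
  auto it = registry().find(name);
  if (it == registry().end()) {
    throw std::runtime_error("unknown functional: " + name);
  }
  return it->second(args);
}
```

Registering a per-Node name once and binding only `call_functional` keeps the number of exported symbols constant, which is the Windows symbol-limit concern the bullet refers to.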
cc @EikanWang @jgong5 @wenzhe-nrv @sanchitintel @voznesenskym @penguinwu @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov @rec @xmfan