[WIP] functional autograd + compiled autograd #139098

Draft
zou3519 wants to merge 1 commit into base branch gh/zou3519/1081/base

Conversation

zou3519 (Contributor) commented Oct 28, 2024

Stack from ghstack (oldest at bottom):

This commit refactors autograd so that nodes can be called in a
functional way. Furthermore, it refactors compiled autograd to use
the new functional autograd, without any behavior changes.

This is on the way to getting compiled autograd to stop tracing into
autograd nodes when it constructs an FX graph out of the autograd graph.
We also implement some very basic support for that, which can be toggled
via `old_inline_behavior=False` in compiled_autograd.py.

Functional autograd works as follows (an illustrative sketch of these APIs appears after this list):

- All `torch::autograd::Node` subclasses must define a
  `retrieve_saved(SwapSavedVariables) -> ivalue_list` API. This function
  takes compiled autograd's `SwapSavedVariables` and packs the state that
  is relevant to the current Node into an `ivalue_list`.
- All `torch::autograd::Node` subclasses must define a
  `get_functional() -> std::function` API.
  This returns a stateless function that accepts the
  gradients and saved values as an `ivalue_list` and returns new
  gradients.
- We developed a mechanism to bind arbitrary C++ functions that take an
  `ivalue_list` to Python.
  This is very similar to how we bind custom ops to Python and was
  done with the Windows symbol limit in mind (otherwise, we would be
  binding one symbol per Node into Python).
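
To make the first two requirements concrete, here is a minimal, illustrative C++ sketch of what a Node could look like under this scheme. Everything in it is an assumption made for illustration: the class name `MulBackwardSketch`, its saved member, and the lambda body are invented, and the actual signatures in the PR (which involve compiled autograd's `SwapSavedVariables`) may differ.

```cpp
// Illustrative sketch only: names, members, and the lambda body are assumptions
// used to show the shape of the two per-Node APIs described above; the real PR
// signatures (which take SwapSavedVariables) may differ.
#include <functional>
#include <vector>
#include <ATen/ATen.h>
#include <ATen/core/ivalue.h>

using ivalue_list = std::vector<c10::IValue>;

struct MulBackwardSketch /* : torch::autograd::Node */ {
  at::Tensor saved_other;  // state captured at forward time (assumed member)

  // Pack the state this Node needs into an ivalue_list. In the PR this takes
  // compiled autograd's SwapSavedVariables; that parameter is elided here.
  ivalue_list retrieve_saved() {
    return {saved_other};
  }

  // Return a stateless function: it receives the incoming gradients followed
  // by the saved values as one ivalue_list and returns the new gradients.
  std::function<ivalue_list(const ivalue_list&)> get_functional() {
    return [](const ivalue_list& inputs) -> ivalue_list {
      at::Tensor grad = inputs[0].toTensor();   // incoming gradient
      at::Tensor other = inputs[1].toTensor();  // saved tensor
      return {grad * other};  // grad of a * b w.r.t. a, scaled by incoming grad
    };
  }
};
```

Presumably the stateless shape is what lets compiled autograd record a call to such a function in the FX graph it builds, rather than tracing into the Node itself, which is the direction the `old_inline_behavior=False` path above points toward.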

Here's an example of the new autograd generated code:
- https://gist.github.com/zou3519/09bb98bb0f11445bc3da063201adb818

Here's an example of the FX graph compiled autograd produces (with
`old_inline_behavior=False`):
- https://gist.github.com/zou3519/43e8106176d15d623e1377850f585c97

cc @EikanWang @jgong5 @wenzhe-nrv @sanchitintel @voznesenskym @penguinwu @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov @rec @xmfan

[ghstack-poisoned]

pytorch-bot bot commented Oct 28, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139098

Note: Links to docs will display an error until the docs builds have been completed.

❌ 36 New Failures

As of commit 368259d with merge base 5c88a9f:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

zou3519 added a commit that referenced this pull request Oct 28, 2024
facebook-github-bot added the oncall: jit label Oct 28, 2024
fightingand pushed a commit to fightingand/pytorch that referenced this pull request Dec 20, 2024

github-actions bot commented Jan 7, 2025

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

github-actions bot added the Stale label Jan 7, 2025
Labels
ciflow/inductor, module: compiled autograd, compiled_autograd, module: dynamo, module: inductor, oncall: jit, release notes: jit, Stale