Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PyTorch 2.4 tests in CI #654

Merged
merged 9 commits into from
Aug 15, 2024
Merged

Add PyTorch 2.4 tests in CI #654

merged 9 commits into from
Aug 15, 2024

Conversation

msaroufim
Copy link
Member

@msaroufim msaroufim commented Aug 10, 2024

EDIT: will need to fix this first before merging this #679 (comment)

Before this PR we were testing nightlies which recently moved from PyTorch 2.4 and 2.5 meaning we don't have coverage for 2.4 now

I would have also liked to remove support for PyTorch 2.2 with the policy being "we support latest stable pytorch, the version before that and nightlies"

A few tests needed to be changed

  • Autoquant with our nightlies now fail on pytorch 2.4 stable - cc @jerryzh168 @HDCharles
  • fp8 tests were moved to nightly only per @vkuzo's request
  • Mixed_mm test @msaroufim
  • fsdp2 low bit optim tests on 2.4 which means they are completely skipped now @gau-nernst

But will not make this change suddenly, cc some partners in case this is not OK @ebsmothers @joecummings @kimishpatel @Jack-Khuu @awgu @malfet

Copy link

pytorch-bot bot commented Aug 10, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/654

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 7d0a9e7 with merge base 1acd710 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 10, 2024
@msaroufim msaroufim changed the title Support PyTorch 2.4 and drop PyTorch 2.2 Add PyTorch 2.4 tests in CI Aug 10, 2024
@msaroufim
Copy link
Member Author

msaroufim commented Aug 11, 2024

@jerryzh168 there's 12 failing tests for AffineQuantized Tensor with torch 2.4 https://github.com/pytorch/ao/actions/runs/10335226306/job/28609531090?pr=654

This is concerning cause it means our latest nightly release does not work with latest pytorch stable release. I'm worried we have a similar with our official 0.4 release not working with PyTorch 2.4.

If so We might be forced to do a patch release. Can you please sync with @jcaip and figure this out

@vkuzo
Copy link
Contributor

vkuzo commented Aug 12, 2024

is there a way to move the float8 tests to PT 2.5 in this PR? It would be nice to just support the latest version for now to move fast. It would be just switching

if not TORCH_VERSION_AFTER_2_4:
and comparable lines in test/float8/* to target 2.5 instead.

@msaroufim
Copy link
Member Author

msaroufim commented Aug 12, 2024

is there a way to move the float8 tests to PT 2.5 in this PR? It would be nice to just support the latest version for now to move fast. It would be just switching

Yeah will land this first since our version guard logic is incorrect #485

@jerryzh168
Copy link
Contributor

@jerryzh168 there's 12 failing tests for AffineQuantized Tensor with torch 2.4 pytorch/ao/actions/runs/10335226306/job/28609531090?pr=654

This is concerning cause it means our latest nightly release does not work with latest pytorch stable release. I'm worried we have a similar with our official 0.4 release not working with PyTorch 2.4.

If so We might be forced to do a patch release. Can you please sync with @jcaip and figure this out

I can repro, opened a PR for export team to take a look here: pytorch/pytorch#133262

@jcaip
Copy link
Contributor

jcaip commented Aug 12, 2024

cc @msaroufim @jerryzh168

Testing locally with torch==2.4.0 and torchao==0.4.0 I don't see any failures for test_integration.py (The test that's failing in CI). Is there an easy way to test in CI to make sure?

Screenshot 2024-08-12 at 4 48 36 PM Screenshot 2024-08-12 at 4 55 33 PM

I'm ~70% sure it's this line that's causing the bug, since it's the only use of get_plain after branch cut.

@msaroufim
Copy link
Member Author

msaroufim commented Aug 12, 2024

wait huh this is not the same result Jerry is getting, lemme first double check the version guard code, I should have time Wednesday to finally finish this

@jerryzh168
Copy link
Contributor

jerryzh168 commented Aug 13, 2024

Testing locally with torch==2.4.0 and torchao==0.4.0 I don't see any failures for test_integration.py (The test that's failing in CI). Is there an easy way to test in CI to make sure?

oh I'm testing with torchao nightly, let me test with 0.4.0 as well

Update: yeah looks like 0.4.0 works, this error only happens in torchao nightly

I think it's related to #609 because previously autoquant does not have multiple levels of tensor subclass

@msaroufim msaroufim requested review from vkuzo and jerryzh168 August 14, 2024 23:55
@jerryzh168
Copy link
Contributor

Testing locally with torch==2.4.0 and torchao==0.4.0 I don't see any failures for test_integration.py (The test that's failing in CI). Is there an easy way to test in CI to make sure?

oh I'm testing with torchao nightly, let me test with 0.4.0 as well

Update: yeah looks like 0.4.0 works, this error only happens in torchao nightly

I think it's related to #609 because previously autoquant does not have multiple levels of tensor subclass

we are waiting for pytorch/pytorch#133262 to be resolved, in the meanwhile I feel it's fine to only run these failed autoquant tests on 2.5+ if it works there, but I remember it failed as well (torch nightly + torchao nightly)

@msaroufim msaroufim merged commit ffa88a4 into main Aug 15, 2024
16 checks passed
yanbing-j pushed a commit to yanbing-j/ao that referenced this pull request Dec 9, 2024
* Pin to the latest stable ExecuTorch commit.

* Pin to the latest stable ExecuTorch commit.

* Rename et-pin to et-pin.txt
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants