Adding torch.compile compatibility #9
base: main
Conversation
Hi @mitkotak, sorry about the delayed response - I didn't have notifications for this repo turned on(?). For your specific PR, I'll give it a try as soon as I get a chance, so thank you very much for pointing out the recent optimizations! RE: upstreaming back to
No worries! Appreciate the follow-up, and thanks again for the cool work. Yup, agreed on the numerical part. I've been seeing similar funky behavior, but hopefully we can contain it with some rigorous numerical asserts. Also, do you guys have a TF32 equivalent for XPUs? I've seen the numerics get funky with TF32, so I was wondering what it might look like on XPUs. If you're interested, I'm happy to put all of this into our experimental folder, and then we can test out the numerics with downstream users who might be interested in trying this out.
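To make the "rigorous numerical asserts" idea concrete, here is a minimal sketch of the kind of tolerance check we'd run against the reference path. The helper name and tolerances are illustrative (in the real tests we'd use something like `torch.testing.assert_close` with tolerances tuned per dtype/TF32 mode), not the repo's actual API:

```python
import math

def assert_close(actual, expected, rtol=1e-5, atol=1e-8):
    """Element-wise closeness check in the spirit of torch.testing.assert_close.

    Hypothetical helper: rtol/atol here are illustrative defaults, not the
    tolerances the kernels would actually be validated with."""
    for i, (a, e) in enumerate(zip(actual, expected)):
        tol = atol + rtol * abs(e)
        if not math.isclose(a, e, rel_tol=0.0, abs_tol=tol):
            raise AssertionError(f"index {i}: {a} vs {e} exceeds tolerance {tol:.3e}")

# Reference (e.g. the stock e3nn path) vs. optimized (e.g. TF32/compiled) outputs.
reference = [0.1234567, -2.5, 3.0]
optimized = [0.1234568, -2.5000001, 3.0]
assert_close(optimized, reference)  # passes: differences are within tolerance
```

The point of a per-element tolerance (rather than a single max-error number) is that TF32-style precision loss scales with magnitude, so the relative term does most of the work.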
XPUs do have TF32 support, but AFAIK it's not enabled by default, for the same reason it's no longer the default for PyTorch in general since it's to do with the

RE: the experimental folder: we would like to see these kernels get used somehow, and I think that could be a good way to go. Just so you're aware, we also have the monkey-patching option here as an under-the-hood replacement, so in principle users could also install this repo and just import the patch without having to make further code changes. That way, if these kernels get updated, you won't have to manually port things over back to

About a month or two ago I'd also worked on a branch in my fork that I haven't had time to merge in yet, but we also wrote a paper about the irreducible representation latent space. If you're interested in reading it or chatting about it, I can send a copy to your email.
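The monkey-patching option mentioned above follows a standard pattern: importing a patch module swaps the reference implementation for the optimized one at the module level, so downstream call sites pick it up unchanged. A minimal sketch of the pattern (module and function names are hypothetical stand-ins, not the repo's actual entry points):

```python
# Sketch of the import-time monkey-patching pattern. The real patch module
# would target the actual e3nn entry points; names here are placeholders.
import types

# Stand-in for the upstream library module whose op we want to replace.
upstream = types.SimpleNamespace()
upstream.tensor_product = lambda x, y: ("reference", x * y)

def patched_tensor_product(x, y):
    # Drop-in replacement that would be backed by the custom kernels.
    return ("custom_kernel", x * y)

def apply_patch(module):
    """Swap the reference implementation for the optimized one in place.

    Code that calls module.tensor_product(...) afterwards picks up the
    replacement with no further changes on the caller's side."""
    module.tensor_product = patched_tensor_product

apply_patch(upstream)
print(upstream.tensor_product(2, 3))  # → ('custom_kernel', 6)
```

The upside is zero code changes for users; the usual caveat is that anything holding a direct reference to the old function before the patch runs will keep calling the unpatched version.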
Yup, training is fine since we even use mixed precision there. Yup, not stable enough for MD.
Ahh gotcha, cool, yup that makes sense then. Yeah, we can just have a flag that folks can turn on if they need the kernels. We'll just have to make sure they don't cause any graph breaks.
Hey, e3nn developer here. Thanks for the great work! Interested in upstreaming the kernels into the main repo if you guys are interested. We recently landed full PT2 compatibility. I re-ran the benchmarks and the results were pretty dramatic. This was on an RTX A5500. I was wondering if it's possible to rerun the benchmarks on an XPU.
Interested in figuring out when to pivot to custom kernels and when to let Inductor take charge.
Also please feel free to edit my PR to suit your needs. Just wanted to point out the flag changes.
Thanks again!
Update: I did not check numerical accuracy, so maybe that might be leading to some of the gains? The flipping of the ordering is also interesting, since I would have expected the approach highlighted in the paper to be better at higher Ls.
Speedup = e3nn time / custom triton kernel time
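For reference, a minimal sketch of how a speedup number like the one defined above could be measured. The two workload functions are placeholders (the real benchmark would call the stock e3nn op and the custom Triton kernel, and would synchronize the GPU/XPU around the timers):

```python
import time

def bench(fn, *args, repeats=5, warmup=2):
    """Best-of-N wall-clock timing; a stand-in for a real benchmark harness,
    which would also synchronize the accelerator between runs."""
    for _ in range(warmup):
        fn(*args)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - t0)
    return best

def e3nn_path(n):           # placeholder for the stock e3nn op
    return sum(i * i for i in range(n))

def triton_kernel_path(n):  # placeholder for the custom Triton kernel
    return sum(i * i for i in range(n))

t_ref = bench(e3nn_path, 10_000)
t_custom = bench(triton_kernel_path, 10_000)
speedup = t_ref / t_custom  # matches the definition: e3nn time / custom kernel time
print(f"speedup: {speedup:.2f}x")
```

Taking the best of several repeats (after warmup) reduces noise from JIT compilation and caching, which matters when comparing compiled against eager paths.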