Using FP8 for inference without CPU offloading can introduce noise. #10302
Comments
Hi @todochenxi. Could you share an example of the noisy outputs?

Same result @todochenxi
Describe the bug
If I use `pipe.enable_model_cpu_offload(device=device)`, the model performs inference correctly after warming up. However, if I comment out this line, the inference results are noisy.

Reproduction
Logs
No response
System Info
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.5.1+cu121 with CUDA 1201 (you have 2.4.1+cu121)
Python 3.10.15 (you have 3.10.13)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details
NVIDIA GeForce RTX 3090, 24576 MiB
Who can help?
@yiyixuxu @DN6