Performance and Multi-GPU Support for FP8 Inference #160

Open
Jinxiaolong1129 opened this issue Jan 2, 2025 · 1 comment

@Jinxiaolong1129

Description

First, thank you for open-sourcing HunyuanVideo and for the awesome work! The FP8 quantized weights significantly reduce GPU memory usage, which is impressive. I’ve been experimenting with FP8 inference on a single H100 GPU and have two questions:

  1. Single-GPU FP8 Inference Speed:
    When I run inference with the FP8 weights on a single H100 GPU, it is slower than running the standard FP16 model. Could you explain why? Does FP8 require specific optimizations, or does it have limitations that affect performance?

  2. Multi-GPU FP8 Inference:
    Does the current implementation support inference with FP8 weights on multiple H100 GPUs? If not, are there plans to enable multi-GPU support for FP8 models in the near future? Any guidance on how to set this up would be greatly appreciated.

Context

Here is the command I used for FP8 inference:

cd HunyuanVideo

DIT_CKPT_PATH={PATH_TO_FP8_WEIGHTS}/{WEIGHT_NAME}_fp8.pt

python3 sample_video.py \
    --dit-weight ${DIT_CKPT_PATH} \
    --video-size 1280 720 \
    --video-length 129 \
    --infer-steps 50 \
    --prompt "A cat walks on the grass, realistic style." \
    --seed 42 \
    --embedded-cfg-scale 6.0 \
    --flow-shift 7.0 \
    --flow-reverse \
    --use-cpu-offload \
    --use-fp8 \
    --save-path ./results

Any insights or updates on these questions would be immensely helpful for optimizing our workflow.

Thank you!

@ckczzj
Collaborator

ckczzj commented Jan 16, 2025

Thanks for your attention. Our parallel inference on multiple GPUs via xDiT also supports our FP8 quantized weights.
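For readers who want to try this, here is a rough sketch of how the FP8 command from the issue might be combined with a multi-GPU launch. It is not confirmed in this thread. It assumes the launch goes through `torchrun` and that the `--ulysses-degree` and `--ring-degree` options from the repository's parallel inference instructions apply unchanged to FP8. The `run` helper, `NUM_GPUS`, `DRY_RUN`, and the default checkpoint path are hypothetical names introduced only for illustration. `--use-cpu-offload` is left out to keep the sketch minimal, since the thread does not say whether it can be combined with parallel inference.

```shell
#!/bin/sh
# Hypothetical sketch: FP8 weights + xDiT parallel launch (not confirmed by the maintainers).
# NUM_GPUS, DRY_RUN and the default checkpoint path are illustrative placeholders.
NUM_GPUS="${NUM_GPUS:-8}"
DIT_CKPT_PATH="${DIT_CKPT_PATH:-path/to/fp8_weights.pt}"

# With DRY_RUN=1 (the default) the command is only printed; set DRY_RUN=0 to execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s ' "$@"
        printf '\n'
    else
        "$@"
    fi
}

# Assumption: ulysses-degree * ring-degree should equal the number of GPUs.
run torchrun --nproc_per_node="$NUM_GPUS" sample_video.py \
    --dit-weight "$DIT_CKPT_PATH" \
    --video-size 1280 720 \
    --video-length 129 \
    --infer-steps 50 \
    --prompt "A cat walks on the grass, realistic style." \
    --seed 42 \
    --embedded-cfg-scale 6.0 \
    --flow-shift 7.0 \
    --flow-reverse \
    --use-fp8 \
    --ulysses-degree "$NUM_GPUS" \
    --ring-degree 1 \
    --save-path ./results
```

Run as-is, the script only prints the assembled command so it can be checked first. Set `DRY_RUN=0` from inside the `HunyuanVideo` directory to launch it.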
