
HunyuanVideoPipeline produces NaN values #10314

Open
smedegaard opened this issue Dec 20, 2024 · 10 comments
Labels
bug Something isn't working

Comments

@smedegaard

Describe the bug

Running diffusers.utils.export_to_video() on the output of HunyuanVideoPipeline results in:

/app/diffusers/src/diffusers/image_processor.py:147: RuntimeWarning: invalid value encountered in cast
  images = (images * 255).round().astype("uint8")

After adding some checks to numpy_to_pil() in image_processor.py, I have confirmed that the output contains NaN values:

  File "/app/pipeline.py", line 37, in <module>
    output = pipe(
             ^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/diffusers/src/diffusers/pipelines/hunyuan_video/pipeline_hunyuan_video.py", line 677, in __call__
    video = self.video_processor.postprocess_video(video, output_type=output_type)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/diffusers/src/diffusers/video_processor.py", line 103, in postprocess_video
    batch_output = self.postprocess(batch_vid, output_type)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/diffusers/src/diffusers/image_processor.py", line 823, in postprocess
    return self.numpy_to_pil(image)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/diffusers/src/diffusers/image_processor.py", line 158, in numpy_to_pil
    raise ValueError("Image array contains NaN values")
ValueError: Image array contains NaN values
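
For reference, the local check behaves roughly like this (a sketch only; numpy_to_pil_checked is a hypothetical name and not the actual diffusers code):

# Hypothetical NaN guard of the kind added locally to numpy_to_pil (sketch, not the actual patch)
import numpy as np
from PIL import Image

def numpy_to_pil_checked(images: np.ndarray) -> list:
    """Convert a batch of HWC float images in [0, 1] to PIL images, failing loudly on NaN."""
    if np.isnan(images).any():
        raise ValueError("Image array contains NaN values")
    images = (images * 255).round().astype("uint8")
    return [Image.fromarray(image) for image in images]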

Reproduction

import os
import time

import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video
from huggingface_hub import snapshot_download
from torch.profiler import ProfilerActivity, profile, record_function

os.environ["TOKENIZERS_PARALLELISM"] = "false"


MODEL_ID = "tencent/HunyuanVideo"
PROMPT = "a whale shark floating through outer space"
profile_dir = os.environ.get("PROFILE_OUT_PATH", "./")
profile_file_name = os.environ.get("PROFILE_OUT_FILE_NAME", "hunyuan_profile.json")
profile_path = os.path.join(profile_dir, profile_file_name)

transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    MODEL_ID, subfolder="transformer", torch_dtype=torch.float16, revision="refs/pr/18"
)
pipe = HunyuanVideoPipeline.from_pretrained(
    MODEL_ID, transformer=transformer, torch_dtype=torch.float16, revision="refs/pr/18"
)
pipe.vae.enable_tiling()
pipe.to("cuda")

print(f"\nStarting profiling of {MODEL_ID}\n")

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA], record_shapes=True
) as prof:
    with record_function("model_inference"):
        output = pipe(
            prompt=PROMPT,
            height=320,
            width=512,
            num_frames=61,
            num_inference_steps=30,
        )

# Export and print profiling results
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
prof.export_chrome_trace(profile_path)
print(f"{profile_file_name} ready")

# export video
video = output.frames[0]

print(" ====== raw video matrix =====")
print(video)
print()

print(" ====== Exporting video =====")
export_to_video(video, "hunyuan_example.mp4", fps=15)
print()
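
A quick way to confirm the NaNs appear before the uint8 cast is to request numpy output and inspect it directly; this check is illustrative and not part of the original script, though output_type="np" is a standard pipeline option:

# Hypothetical diagnostic: request numpy frames and measure how many values are NaN
import numpy as np

out = pipe(
    prompt=PROMPT,
    height=320,
    width=512,
    num_frames=61,
    num_inference_steps=30,
    output_type="np",  # skip the PIL conversion that casts to uint8
)
frames = np.asarray(out.frames[0])
print("NaN fraction:", np.isnan(frames).mean())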

Logs

No response

System Info

GPU: AMD MI300X

ARG BASE_IMAGE=python:3.11-slim
FROM ${BASE_IMAGE}

ENV PYTHONUNBUFFERED=true
ENV CUDA_VISIBLE_DEVICES=0

WORKDIR /app

# Install tools
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    git \
    libgl1-mesa-glx \
    libglib2.0-0 \
    libsm6 \
    libxext6 \
    libxrender-dev \
    libfontconfig1 \
    ffmpeg \
    build-essential && \
    rm -rf /var/lib/apt/lists/*

# install ROCm pytorch and python dependencies
RUN python -m pip install --no-cache-dir \
    torch torchvision --index-url https://download.pytorch.org/whl/rocm6.2 && \
    python -m pip install --no-cache-dir \
    accelerate transformers sentencepiece protobuf opencv-python imageio imageio-ffmpeg

# install diffusers from source to include newest pipeline classes
COPY diffusers diffusers
RUN cd diffusers && \
    python -m pip install -e .

# Copy the profiling script
ARG PIPELINE_FILE
COPY ${PIPELINE_FILE} pipeline.py

# run the script
CMD ["python", "pipeline.py"]

Who can help?

@DN6 @a-r-r-o-w

@smedegaard smedegaard added the bug Something isn't working label Dec 20, 2024
@a-r-r-o-w
Member

Transformer needs to be in bfloat16. Could you try with that?
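
For reference, the suggested change amounts to loading the transformer in bfloat16; a minimal sketch of the adjusted loading code, assuming the same checkpoint and revision as in the reproduction above:

# Sketch of the suggested change: load the transformer in bfloat16 instead of float16
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel

MODEL_ID = "tencent/HunyuanVideo"
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    MODEL_ID, subfolder="transformer", torch_dtype=torch.bfloat16, revision="refs/pr/18"
)
pipe = HunyuanVideoPipeline.from_pretrained(
    MODEL_ID, transformer=transformer, torch_dtype=torch.float16, revision="refs/pr/18"
)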

@smedegaard
Author

> Transformer needs to be in bfloat16. Could you try with that?

Same result @a-r-r-o-w

@hlky
Collaborator

hlky commented Dec 20, 2024

On CUDA we've seen the same issue when not using the latest PyTorch. From torch torchvision --index-url https://download.pytorch.org/whl/rocm6.2 it looks like you should have either 2.5.0 or 2.5.1; if it's 2.5.0 can you try 2.5.1, and if it's 2.5.1 can you try nightly?
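
A quick way to see which build is actually installed (a generic diagnostic, not something quoted from the thread):

# Print the installed PyTorch version and the ROCm/HIP build it was compiled against
import torch

print(torch.__version__)  # e.g. 2.5.0+rocm6.2 or 2.5.1+rocm6.2
print(torch.version.hip)  # ROCm/HIP version for ROCm wheels, None on CUDA-only builds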

@smedegaard
Author

Thanks for the suggestion @hlky , I'll try some more combinations.

| ROCm | PyTorch | Result |
| --- | --- | --- |
| 6.4 | 2.6.0a0+gitb7a45db | ⛔ NaN values in output |

@tanshuai0219

Same here, I also get NaN values.

@a-r-r-o-w
Member

@tanshuai0219 Is this on a CUDA GPU or MPS/ROCm? I'm unable to replicate when using the transformer in bfloat16 on torch >= 2.5. I can try some previous versions of PyTorch to try and make it work for CUDA devices, but for other devices, I'm afraid we will need help from the community in making it work.

@tanshuai0219

> @tanshuai0219 Is this on a CUDA GPU or MPS/ROCm? I'm unable to replicate when using the transformer in bfloat16 on torch >= 2.5. I can try some previous versions of PyTorch to try and make it work for CUDA devices, but for other devices, I'm afraid we will need help from the community in making it work.

Yes, it's on a CUDA GPU, CUDA version 12.4. I pulled the latest version of diffusers and installed it with pip install -e .

Then I run:

import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video

model_id = "hunyuanvideo-community/HunyuanVideo"
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=torch.float16)
pipe.vae.enable_tiling()
pipe.to("cuda")

output = pipe(
    prompt="A cat walks on the grass, realistic",
    height=320,
    width=512,
    num_frames=61,
    num_inference_steps=30,
).frames[0]

import numpy as np

print(np.array(output[0]))

export_to_video(output, "output.mp4", fps=15)


np.array(output[0]) is all zeros, and the saved output.mp4 is all black:

output.mp4

@a-r-r-o-w
Member

Can you share the output of diffusers-cli env? I verified once more that it works for me. I'll take a look at other torch versions soon. Here's my output:

- 🤗 Diffusers version: 0.33.0.dev0
- Platform: Linux-5.4.0-166-generic-x86_64-with-glibc2.31
- Running on Google Colab?: No
- Python version: 3.10.14
- PyTorch version (GPU?): 2.5.1+cu124 (True)
- Flax version (CPU?/GPU?/TPU?): 0.8.5 (cpu)
- Jax version: 0.4.31
- JaxLib version: 0.4.31
- Huggingface_hub version: 0.26.2
- Transformers version: 4.48.0.dev0
- Accelerate version: 1.1.0.dev0
- PEFT version: 0.13.3.dev0
- Bitsandbytes version: 0.43.3
- Safetensors version: 0.4.5
- xFormers version: not installed
- Accelerator: NVIDIA A100-SXM4-80GB, 81920 MiB
NVIDIA A100-SXM4-80GB, 81920 MiB
NVIDIA A100-SXM4-80GB, 81920 MiB
NVIDIA DGX Display, 4096 MiB
NVIDIA A100-SXM4-80GB, 81920 MiB
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>
output.mp4

@tanshuai0219

> Can you share the output of diffusers-cli env? I verified once more that it works for me. I'll take a look at other torch versions soon.

Here is mine:

  • 🤗 Diffusers version: 0.33.0.dev0
  • Platform: Linux
  • Running on Google Colab?: No
  • Python version: 3.10.14
  • PyTorch version (GPU?): 2.4.0+cu124 (True)
  • Huggingface_hub version: 0.24.2
  • Transformers version: 4.46.3
  • Accelerate version: 0.33.0
  • PEFT version: 0.12.0
  • Bitsandbytes version: 0.43.2
  • Safetensors version: 0.4.3
  • xFormers version: 0.0.27
  • Accelerator: NVIDIA A100-80GB, 81920 MiB
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

@tanshuai0219

If I upgrade transformers from 4.46.3 to 4.48.0.dev0, I get an error like:
RuntimeError: Failed to import diffusers.pipelines.hunyuan_video.pipeline_hunyuan_video because of the following error (look up to see its traceback): Failed to import diffusers.loaders.lora_pipeline because of the following error (look up to see its traceback): cannot import name 'shard_checkpoint' from 'transformers.modeling_utils'
