
FluxPipeline silently rounds the generated image shape #9904

Open
albertochimentiinbibo opened this issue Nov 11, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@albertochimentiinbibo

albertochimentiinbibo commented Nov 11, 2024

Describe the bug

When asking the FluxPipeline class to generate an image with shape (1920, 1080), the output image shape is silently rounded to (1920, 1072), which appears to be the nearest multiple of 16 rather than 8.
Since the FluxPipeline class accepts input sizes divisible by 8, I would expect them to remain consistent throughout the generation process.

A quick look at the code suggests that in the FluxPipeline._unpack_latents method, the height and width are floor-divided (//) by the vae_scale_factor, which is 16.

I would love to understand why the scale factor is set as follows:
https://github.com/huggingface/diffusers/blob/89e4d6219805975bd7d253a267e1951badc9f1c0/src/diffusers/pipelines/flux/pipeline_flux.py#L197C9-L199C10
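The rounding I observed can be reproduced with simple arithmetic. This is only a sketch of the behavior, not the actual diffusers code; `rounded_dim` is a hypothetical helper:

```python
def rounded_dim(dim: int, scale_factor: int = 16) -> int:
    # Hypothetical helper: floor-divide by the effective scale factor,
    # then scale back up, as _unpack_latents appears to do.
    return (dim // scale_factor) * scale_factor

print(rounded_dim(1080))  # 1072 -- rounded down to a multiple of 16
print(rounded_dim(1920))  # 1920 -- already a multiple of 16, unchanged
```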

Reproduction

Here is minimal code to reproduce the bug; feel free to change the number of inference steps, as it should not affect the outcome of the test.

from diffusers.pipelines import FluxPipeline
import torch

bf_repo = "black-forest-labs/FLUX.1-dev"

prompt = "Astronaut drinking coffee on the moon."
shape = (1920, 1080)

pipe = FluxPipeline.from_pretrained(bf_repo, torch_dtype=torch.bfloat16)

pipe.enable_sequential_cpu_offload()

image = pipe(
    prompt,
    height=shape[1],
    width=shape[0],
    num_inference_steps=28,
    generator=torch.Generator('cpu').manual_seed(123)
    ).images[0]

print(f"Prompted shape: {shape}")
print(f"Generated shape: {image.size}")
image.show()

Logs

Prompted shape: (1920, 1080)
Generated shape: (1920, 1072)

System Info

  • 🤗 Diffusers version: 0.31.0
  • Platform: Windows-10-10.0.22631-SP0
  • Running on Google Colab?: No
  • Python version: 3.10.6
  • PyTorch version (GPU?): 2.4.1+cu124 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.26.2
  • Transformers version: 4.44.2
  • Accelerate version: 0.34.2
  • PEFT version: not installed
  • Bitsandbytes version: not installed
  • Safetensors version: 0.4.5
  • xFormers version: not installed
  • Accelerator: NVIDIA GeForce RTX 4070 Laptop GPU, 8188 MiB
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: no

Who can help?

@sayakpaul @DN6

@albertochimentiinbibo albertochimentiinbibo added the bug Something isn't working label Nov 11, 2024
@albertochimentiinbibo
Author

For the record, on the main branch (commit dac623b) the same script provided above errors with the following traceback:

Traceback (most recent call last):
  File "C:\dev\github-issues\diffusers\test.py", line 13, in <module>
    image = pipe(
  File "...\torch\2.4.1+cu124\python\3.10.6\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "...\diffusers\pipelines\flux\pipeline_flux.py", line 684, in __call__
    latents, latent_image_ids = self.prepare_latents(
  File "...\diffusers\pipelines\flux\pipeline_flux.py", line 520, in prepare_latents
    latents = self._pack_latents(latents, batch_size, num_channels_latents, height, width)
  File "...\diffusers\pipelines\flux\pipeline_flux.py", line 444, in _pack_latents
    latents = latents.view(batch_size, num_channels_latents, height // 2, 2, width // 2, 2)
RuntimeError: shape '[1, 16, 67, 2, 120, 2]' is invalid for input of size 518400

This happens for all sizes that are not multiples of 16.
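For reference, the size mismatch in the traceback can be reproduced with plain arithmetic. This is a sketch under the assumption that the VAE compresses spatially by 8 and that _pack_latents views the latent as 2x2 patches:

```python
# Why the view in _pack_latents fails for a 1920x1080 request:
# the VAE compresses by 8, so the latent is 135 x 240, and packing
# into 2x2 patches requires both latent dims to be even (i.e. pixel
# dims divisible by 16).
batch, channels = 1, 16
lat_h, lat_w = 1080 // 8, 1920 // 8            # 135, 240 -- lat_h is odd

numel = batch * channels * lat_h * lat_w        # actual element count
packed = batch * channels * (lat_h // 2) * 2 * (lat_w // 2) * 2  # what view() needs

print(numel, packed, numel == packed)  # 518400 514560 False
```

The 518400 here matches the "input of size 518400" in the RuntimeError, while the requested view shape [1, 16, 67, 2, 120, 2] only accounts for 514560 elements.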

Checking the recent PRs, I found this one, which seems to change what I was pointing out in the comment above.

@sayakpaul
Member

Cc: @yiyixuxu @DN6

@DN6
Collaborator

DN6 commented Nov 13, 2024

Hi @albertochimentiinbibo, thanks for catching this. You're right: the FluxPipeline is meant to work with image sizes that are multiples of 16.

The vae_scale_factor reflects how much the VAE spatially compresses an image into a latent. In Flux's case it is 8 (a 1024x1024 image is turned into a 128x128 latent). PR #9711 made the usage of the scale factor clearer, but I think we missed that in the previous version the division by 16 was intended to account for the latent height and width needing to be divisible by 2, since the packing step breaks the latent up into 2x2 patches.

I'll open a PR with a fix. We'll also raise a warning that the image will be resized to a compatible height and width.
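A minimal sketch of what such a fix might look like; `adjust_dims` is a hypothetical helper, not the actual PR:

```python
import warnings

def adjust_dims(height: int, width: int, vae_scale_factor: int = 8) -> tuple[int, int]:
    # Requested pixel dims must be multiples of 2 * vae_scale_factor
    # (= 16 for Flux) because the packing step uses 2x2 latent patches.
    multiple = 2 * vae_scale_factor
    new_h = height // multiple * multiple
    new_w = width // multiple * multiple
    if (new_h, new_w) != (height, width):
        warnings.warn(
            f"Height/width {height}x{width} not divisible by {multiple}; "
            f"resizing to {new_h}x{new_w}."
        )
    return new_h, new_w

print(adjust_dims(1080, 1920))  # (1072, 1920), with a warning
```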

@albertochimentiinbibo
Author

Thank you for the feedback @DN6, I'll be waiting for the fix!
