Merged
29 changes: 28 additions & 1 deletion docs/source/package_reference/stable_diffusion_pipeline.mdx
@@ -40,11 +40,38 @@ To get the most out of it, it should be associated with a scheduler that is opti
- all


# GaudiStableDiffusionXLPipeline

The `GaudiStableDiffusionXLPipeline` class enables text-to-image generation on HPUs using SDXL models.
It inherits from the `GaudiDiffusionPipeline` class that is the parent to any kind of diffuser pipeline.

To get the most out of it, it should be associated with a scheduler that is optimized for HPUs like `GaudiDDIMScheduler`.
Recommended schedulers are `GaudiEulerDiscreteScheduler` for SDXL base and `GaudiEulerAncestralDiscreteScheduler` for SDXL turbo.
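
As a rough illustration of the recommendation above, the sketch below picks a scheduler class name from a checkpoint identifier. The helper name `recommended_scheduler` and the substring test on `turbo` are assumptions made for this example; they are not part of the library.

```python
def recommended_scheduler(model_name_or_path: str) -> str:
    """Return the name of the scheduler class recommended for an SDXL checkpoint.

    Hypothetical helper: matching on the substring "turbo" is an assumption
    of this sketch, not library behaviour.
    """
    if "turbo" in model_name_or_path.lower():
        # SDXL turbo checkpoints: ancestral Euler sampler
        return "GaudiEulerAncestralDiscreteScheduler"
    # SDXL base checkpoints: plain Euler sampler
    return "GaudiEulerDiscreteScheduler"


print(recommended_scheduler("stabilityai/stable-diffusion-xl-base-1.0"))
print(recommended_scheduler("stabilityai/sdxl-turbo"))
```

The selected class can then be loaded with `from_pretrained(model_name_or_path, subfolder="scheduler")` and passed as the `scheduler` argument of `GaudiStableDiffusionXLPipeline.from_pretrained`, which is what the example script `text_to_image_generation.py` does.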


## GaudiStableDiffusionXLPipeline

[[autodoc]] diffusers.pipelines.stable_diffusion_xl.pipeline_stable_diffusion_xl.GaudiStableDiffusionXLPipeline
- __call__


## GaudiEulerDiscreteScheduler

[[autodoc]] diffusers.schedulers.scheduling_euler_discrete.GaudiEulerDiscreteScheduler
- all


## GaudiEulerAncestralDiscreteScheduler

[[autodoc]] diffusers.schedulers.scheduling_euler_ancestral_discrete.GaudiEulerAncestralDiscreteScheduler
- all


# GaudiStableDiffusionUpscalePipeline

The `GaudiStableDiffusionUpscalePipeline` is used to enhance the resolution of input images by a factor of 4 on HPUs.
It inherits from the `GaudiDiffusionPipeline` class that is the parent to any kind of diffuser pipeline.


[[autodoc]] diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_upscale.GaudiStableDiffusionUpscalePipeline
- __call__
- __call__
84 changes: 84 additions & 0 deletions examples/stable-diffusion/README.md
@@ -115,6 +115,90 @@ python text_to_image_generation.py \
> - use [the latest checkpoint](https://huggingface.co/Intel/ldm3d-4c) for generating improved results
> - use [the pano checkpoint](https://huggingface.co/Intel/ldm3d-pano) to generate panoramic view

### Stable Diffusion XL (SDXL)

Stable Diffusion XL was proposed in [SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis](https://arxiv.org/pdf/2307.01952.pdf) by the Stability AI team.

Here is how to generate SDXL images with a single prompt:
```bash
python text_to_image_generation.py \
--model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
--prompts "Sailing ship painting by Van Gogh" \
--num_images_per_prompt 20 \
--batch_size 4 \
--image_save_dir /tmp/stable_diffusion_xl_images \
--scheduler euler_discrete \
--use_habana \
--use_hpu_graphs \
--gaudi_config Habana/stable-diffusion \
--bf16
```

> HPU graphs are recommended when generating images by batches to get the fastest possible generations.
> The first batch of images entails a performance penalty. All subsequent batches will be generated much faster.
> You can enable this mode with `--use_hpu_graphs`.
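
To make the batching concrete, here is a small sketch of how many batches a run involves. It assumes the pipeline generates `number of prompts × num_images_per_prompt` images and splits them into groups of `batch_size`, rounding up; the function name `num_batches` is hypothetical.

```python
import math


def num_batches(num_prompts: int, num_images_per_prompt: int, batch_size: int) -> int:
    """Number of batches needed to generate all requested images.

    Assumption of this sketch: the total image count is split into
    chunks of `batch_size`, with a final partial chunk if needed.
    """
    total_images = num_prompts * num_images_per_prompt
    return math.ceil(total_images / batch_size)


# One prompt, 20 images per prompt, batch size 4 (the command above)
print(num_batches(1, 20, 4))  # 5
```

Under that assumption, only the first of these batches pays the HPU graph penalty; the remaining ones reuse the captured graph.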

Here is how to generate SDXL images with several prompts:
```bash
python text_to_image_generation.py \
--model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
--prompts "Sailing ship painting by Van Gogh" "A shiny flying horse taking off" \
--num_images_per_prompt 20 \
--batch_size 8 \
--image_save_dir /tmp/stable_diffusion_xl_images \
--scheduler euler_discrete \
--use_habana \
--use_hpu_graphs \
--gaudi_config Habana/stable-diffusion \
--bf16
```

SDXL combines a second text encoder (OpenCLIP ViT-bigG/14) with the original text encoder to significantly
increase the number of parameters. Here is how to generate images with several prompts for both `prompt`
and `prompt_2` (2nd text encoder), as well as their negative prompts:
```bash
python text_to_image_generation.py \
--model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
--prompts "Sailing ship painting by Van Gogh" "A shiny flying horse taking off" \
--prompts_2 "Red tone" "Blue tone" \
--negative_prompts "Low quality" "Sketch" \
--negative_prompts_2 "Clouds" "Clouds" \
--num_images_per_prompt 20 \
--batch_size 8 \
--image_save_dir /tmp/stable_diffusion_xl_images \
--scheduler euler_discrete \
--use_habana \
--use_hpu_graphs \
--gaudi_config Habana/stable-diffusion \
--bf16
```

> HPU graphs are recommended when generating images by batches to get the fastest possible generations.
> The first batch of images entails a performance penalty. All subsequent batches will be generated much faster.
> You can enable this mode with `--use_hpu_graphs`.
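
The sketch below shows how these four command-line options can map onto the pipeline keyword arguments `prompt`, `prompt_2`, `negative_prompt` and `negative_prompt_2`. The helper name `build_prompt_kwargs` is hypothetical, and `nargs="*"` for `--prompts` and `--negative_prompts` is assumed to mirror the definition of `--prompts_2`.

```python
import argparse


def build_prompt_kwargs(argv):
    """Parse the prompt-related options and return pipeline keyword arguments.

    Hypothetical helper written for illustration only.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--prompts", type=str, nargs="*", default="An image of a squirrel in Picasso style")
    parser.add_argument("--prompts_2", type=str, nargs="*", default=None)
    parser.add_argument("--negative_prompts", type=str, nargs="*", default=None)
    parser.add_argument("--negative_prompts_2", type=str, nargs="*", default=None)
    args = parser.parse_args(argv)

    # The plural CLI names map onto the singular keyword names of the pipeline call
    return {
        "prompt": args.prompts,
        "prompt_2": args.prompts_2,
        "negative_prompt": args.negative_prompts,
        "negative_prompt_2": args.negative_prompts_2,
    }


kwargs = build_prompt_kwargs(
    [
        "--prompts", "Sailing ship painting by Van Gogh", "A shiny flying horse taking off",
        "--prompts_2", "Red tone", "Blue tone",
        "--negative_prompts", "Low quality", "Sketch",
        "--negative_prompts_2", "Clouds", "Clouds",
    ]
)
print(kwargs)
```

Options that are not given stay `None`, so the same dictionary can be passed to the pipeline whether or not the second-encoder prompts are used.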

### SDXL-Turbo
SDXL-Turbo is a distilled version of SDXL 1.0, trained for real-time synthesis.

Here is how to generate images with multiple prompts:
```bash
python text_to_image_generation.py \
--model_name_or_path stabilityai/sdxl-turbo \
--prompts "Sailing ship painting by Van Gogh" "A shiny flying horse taking off" \
--num_images_per_prompt 20 \
--batch_size 8 \
--image_save_dir /tmp/stable_diffusion_xl_turbo_images \
--scheduler euler_ancestral_discrete \
--use_habana \
--use_hpu_graphs \
--gaudi_config Habana/stable-diffusion \
--bf16
```

> HPU graphs are recommended when generating images by batches to get the fastest possible generations.
> The first batch of images entails a performance penalty. All subsequent batches will be generated much faster.
> You can enable this mode with `--use_hpu_graphs`.


## Textual Inversion

128 changes: 103 additions & 25 deletions examples/stable-diffusion/text_to_image_generation.py
@@ -20,7 +20,11 @@

import torch

from optimum.habana.diffusers import GaudiDDIMScheduler
from optimum.habana.diffusers import (
    GaudiDDIMScheduler,
    GaudiEulerAncestralDiscreteScheduler,
    GaudiEulerDiscreteScheduler,
)
from optimum.habana.utils import set_seed


@@ -49,6 +53,14 @@ def main():
help="Path to pre-trained model",
)

parser.add_argument(
    "--scheduler",
    default="ddim",
    choices=["euler_discrete", "euler_ancestral_discrete", "ddim"],
    type=str,
    help="Name of scheduler",
)

# Pipeline arguments
parser.add_argument(
"--prompts",
@@ -57,12 +69,29 @@ def main():
default="An image of a squirrel in Picasso style",
help="The prompt or prompts to guide the image generation.",
)
parser.add_argument(
    "--prompts_2",
    type=str,
    nargs="*",
    default=None,
    help="The second prompt or prompts to guide the image generation (applicable to SDXL).",
)
parser.add_argument(
"--num_images_per_prompt", type=int, default=1, help="The number of images to generate per prompt."
)
parser.add_argument("--batch_size", type=int, default=1, help="The number of images in a batch.")
parser.add_argument("--height", type=int, default=512, help="The height in pixels of the generated images.")
parser.add_argument("--width", type=int, default=512, help="The width in pixels of the generated images.")
parser.add_argument(
    "--height",
    type=int,
    default=0,
    help="The height in pixels of the generated images (0=default from model config).",
)
parser.add_argument(
    "--width",
    type=int,
    default=0,
    help="The width in pixels of the generated images (0=default from model config).",
)
parser.add_argument(
"--num_inference_steps",
type=int,
@@ -89,6 +118,13 @@ def main():
default=None,
help="The prompt or prompts not to guide the image generation.",
)
parser.add_argument(
    "--negative_prompts_2",
    type=str,
    nargs="*",
    default=None,
    help="The second prompt or prompts not to guide the image generation (applicable to SDXL).",
)
parser.add_argument(
"--eta",
type=float,
@@ -139,13 +175,28 @@ def main():

args = parser.parse_args()

if args.ldm3d:
from optimum.habana.diffusers import GaudiStableDiffusionLDM3DPipeline as GaudiStableDiffusionPipeline
# Set image resolution
res = {}
if args.width > 0 and args.height > 0:
    res["width"] = args.width
    res["height"] = args.height

# Import selected pipeline
sdxl_models = ["stable-diffusion-xl-base-1.0", "sdxl-turbo"]

if args.model_name_or_path == "runwayml/stable-diffusion-v1-5":
args.model_name_or_path = "Intel/ldm3d-4c"
if any(model in args.model_name_or_path for model in sdxl_models):
from optimum.habana.diffusers import GaudiStableDiffusionXLPipeline

sdxl = True
else:
from optimum.habana.diffusers import GaudiStableDiffusionPipeline
if args.ldm3d:
from optimum.habana.diffusers import GaudiStableDiffusionLDM3DPipeline as GaudiStableDiffusionPipeline

if args.model_name_or_path == "runwayml/stable-diffusion-v1-5":
args.model_name_or_path = "Intel/ldm3d-4c"
else:
from optimum.habana.diffusers import GaudiStableDiffusionPipeline
sdxl = False

# Setup logging
logging.basicConfig(
@@ -156,36 +207,63 @@ def main():
logger.setLevel(logging.INFO)

# Initialize the scheduler and the generation pipeline
scheduler = GaudiDDIMScheduler.from_pretrained(args.model_name_or_path, subfolder="scheduler")
if args.scheduler == "euler_discrete":
    scheduler = GaudiEulerDiscreteScheduler.from_pretrained(args.model_name_or_path, subfolder="scheduler")
elif args.scheduler == "euler_ancestral_discrete":
    scheduler = GaudiEulerAncestralDiscreteScheduler.from_pretrained(
        args.model_name_or_path, subfolder="scheduler"
    )
else:
    scheduler = GaudiDDIMScheduler.from_pretrained(args.model_name_or_path, subfolder="scheduler")

kwargs = {
"scheduler": scheduler,
"use_habana": args.use_habana,
"use_hpu_graphs": args.use_hpu_graphs,
"gaudi_config": args.gaudi_config_name,
}

if args.bf16:
kwargs["torch_dtype"] = torch.bfloat16
pipeline = GaudiStableDiffusionPipeline.from_pretrained(
args.model_name_or_path,
**kwargs,
)

# Set seed before running the model
set_seed(args.seed)

# Generate images
outputs = pipeline(
prompt=args.prompts,
num_images_per_prompt=args.num_images_per_prompt,
batch_size=args.batch_size,
height=args.height,
width=args.width,
num_inference_steps=args.num_inference_steps,
guidance_scale=args.guidance_scale,
negative_prompt=args.negative_prompts,
eta=args.eta,
output_type=args.output_type,
)
if sdxl:
    pipeline = GaudiStableDiffusionXLPipeline.from_pretrained(
        args.model_name_or_path,
        **kwargs,
    )
    outputs = pipeline(
        prompt=args.prompts,
        prompt_2=args.prompts_2,
        num_images_per_prompt=args.num_images_per_prompt,
        batch_size=args.batch_size,
        num_inference_steps=args.num_inference_steps,
        guidance_scale=args.guidance_scale,
        negative_prompt=args.negative_prompts,
        negative_prompt_2=args.negative_prompts_2,
        eta=args.eta,
        output_type=args.output_type,
        **res,
    )
else:
    pipeline = GaudiStableDiffusionPipeline.from_pretrained(
        args.model_name_or_path,
        **kwargs,
    )
    outputs = pipeline(
        prompt=args.prompts,
        num_images_per_prompt=args.num_images_per_prompt,
        batch_size=args.batch_size,
        num_inference_steps=args.num_inference_steps,
        guidance_scale=args.guidance_scale,
        negative_prompt=args.negative_prompts,
        eta=args.eta,
        output_type=args.output_type,
        **res,
    )

# Save the pipeline in the specified directory if not None
if args.pipeline_save_dir is not None:
3 changes: 2 additions & 1 deletion optimum/habana/diffusers/__init__.py
@@ -2,4 +2,5 @@
from .pipelines.stable_diffusion.pipeline_stable_diffusion import GaudiStableDiffusionPipeline
from .pipelines.stable_diffusion.pipeline_stable_diffusion_ldm3d import GaudiStableDiffusionLDM3DPipeline
from .pipelines.stable_diffusion.pipeline_stable_diffusion_upscale import GaudiStableDiffusionUpscalePipeline
from .schedulers import GaudiDDIMScheduler
from .pipelines.stable_diffusion_xl.pipeline_stable_diffusion_xl import GaudiStableDiffusionXLPipeline
from .schedulers import GaudiDDIMScheduler, GaudiEulerAncestralDiscreteScheduler, GaudiEulerDiscreteScheduler
4 changes: 3 additions & 1 deletion optimum/habana/diffusers/pipelines/pipeline_utils.py
@@ -51,6 +51,8 @@
},
"optimum.habana.diffusers.schedulers": {
"GaudiDDIMScheduler": ["save_pretrained", "from_pretrained"],
"GaudiEulerDiscreteScheduler": ["save_pretrained", "from_pretrained"],
"GaudiEulerAncestralDiscreteScheduler": ["save_pretrained", "from_pretrained"],
},
}

@@ -112,7 +114,7 @@ def __init__(
if bf16_full_eval:
logger.warning(
"`use_torch_autocast` is True in the given Gaudi configuration but "
"`torch_dtype=torch.blfloat16` was given. Disabling mixed precision and continuing in bf16 only."
"`torch_dtype=torch.bfloat16` was given. Disabling mixed precision and continuing in bf16 only."
)
self.gaudi_config.use_torch_autocast = False
else:
Original file line number Diff line number Diff line change
@@ -368,6 +368,7 @@ def __call__(
# 4. Prepare timesteps
self.scheduler.set_timesteps(num_inference_steps, device="cpu")
timesteps = self.scheduler.timesteps.to(device)
self.scheduler.reset_timestep_dependent_params()

# 5. Prepare latent variables
num_channels_latents = self.unet.config.in_channels
@@ -459,7 +460,7 @@ def __call__(

# compute the previous noisy sample x_t -> x_t-1
latents_batch = self.scheduler.step(
noise_pred, latents_batch, **extra_step_kwargs, return_dict=False
noise_pred, timestep, latents_batch, **extra_step_kwargs, return_dict=False
)[0]

if not self.use_hpu_graphs:
@@ -489,8 +490,6 @@ def __call__(
image = latents_batch
outputs["images"].append(image)

self.scheduler.reset_timestep_dependent_params()

if not self.use_hpu_graphs:
self.htcore.mark_step()

Original file line number Diff line number Diff line change
@@ -285,6 +285,7 @@ def __call__(
# 4. Prepare timesteps
self.scheduler.set_timesteps(num_inference_steps, device="cpu")
timesteps = self.scheduler.timesteps.to(device)
self.scheduler.reset_timestep_dependent_params()

# 5. Prepare latent variables
num_channels_latents = self.unet.config.in_channels
@@ -362,7 +363,7 @@ def __call__(

# compute the previous noisy sample x_t -> x_t-1
latents_batch = self.scheduler.step(
noise_pred, latents_batch, **extra_step_kwargs, return_dict=False
noise_pred, timestep, latents_batch, **extra_step_kwargs, return_dict=False
)[0]

if not self.use_hpu_graphs:
@@ -380,8 +381,6 @@ def __call__(
image = latents_batch
outputs["images"].append(image)

self.scheduler.reset_timestep_dependent_params()

if not self.use_hpu_graphs:
self.htcore.mark_step()
