huggingface · regisss · Jan 9, 2024 · Dec 18, 2023 · Dec 22, 2023 · Dec 23, 2023
@@ -40,11 +40,38 @@ To get the most out of it, it should be associated with a scheduler that is opti
     - all
 
 
+# GaudiStableDiffusionXLPipeline
+
+The `GaudiStableDiffusionXLPipeline` class performs text-to-image generation on HPUs using SDXL models.
+It inherits from the `GaudiDiffusionPipeline` class that is the parent to any kind of diffuser pipeline.
+
+To get the most out of it, pair it with a scheduler that is optimized for HPUs, such as `GaudiDDIMScheduler`.
+The recommended schedulers are `GaudiEulerDiscreteScheduler` for SDXL base and `GaudiEulerAncestralDiscreteScheduler` for SDXL-Turbo.
+
+
+## GaudiStableDiffusionXLPipeline
+
+[[autodoc]] diffusers.pipelines.stable_diffusion_xl.pipeline_stable_diffusion_xl.GaudiStableDiffusionXLPipeline
+    - __call__
+
+
+## GaudiEulerDiscreteScheduler
+
+[[autodoc]] diffusers.schedulers.scheduling_euler_discrete.GaudiEulerDiscreteScheduler
+    - all
+
+
+## GaudiEulerAncestralDiscreteScheduler
+
+[[autodoc]] diffusers.schedulers.scheduling_euler_ancestral_discrete.GaudiEulerAncestralDiscreteScheduler
+    - all
+
+
 # GaudiStableDiffusionUpscalePipeline
 
 The `GaudiStableDiffusionUpscalePipeline` is used to enhance the resolution of input images by a factor of 4 on HPUs.
 It inherits from the `GaudiDiffusionPipeline` class that is the parent to any kind of diffuser pipeline.
 
 
 [[autodoc]] diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_upscale.GaudiStableDiffusionUpscalePipeline
-    - __call__
+    - __call__
@@ -115,6 +115,90 @@ python text_to_image_generation.py \
 > - use [the latest checkpoint](https://huggingface.co/Intel/ldm3d-4c) for generating improved results
 > - use [the pano checkpoint](https://huggingface.co/Intel/ldm3d-pano) to generate panoramic view
 
+### Stable Diffusion XL (SDXL)
+
+Stable Diffusion XL was proposed in [SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis](https://arxiv.org/pdf/2307.01952.pdf) by the Stability AI team.
+
+Here is how to generate SDXL images with a single prompt:
+```bash
+python text_to_image_generation.py \
+    --model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
+    --prompts "Sailing ship painting by Van Gogh" \
+    --num_images_per_prompt 20 \
+    --batch_size 4 \
+    --image_save_dir /tmp/stable_diffusion_xl_images \
+    --scheduler euler_discrete \
+    --use_habana \
+    --use_hpu_graphs \
+    --gaudi_config Habana/stable-diffusion \
+    --bf16
+```
+
+> HPU graphs are recommended when generating images in batches to get the fastest possible generation.
+> The first batch of images entails a performance penalty. All subsequent batches will be generated much faster.
+> You can enable this mode with `--use_hpu_graphs`.
+
+Here is how to generate SDXL images with several prompts:
+```bash
+python text_to_image_generation.py \
+    --model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
+    --prompts "Sailing ship painting by Van Gogh" "A shiny flying horse taking off" \
+    --num_images_per_prompt 20 \
+    --batch_size 8 \
+    --image_save_dir /tmp/stable_diffusion_xl_images \
+    --scheduler euler_discrete \
+    --use_habana \
+    --use_hpu_graphs \
+    --gaudi_config Habana/stable-diffusion \
+    --bf16
+```
+
+SDXL combines a second text encoder (OpenCLIP ViT-bigG/14) with the original text encoder to significantly
+increase the number of parameters. Here is how to generate images with several prompts for both `prompt`
+and `prompt_2` (2nd text encoder), as well as their negative prompts:
+```bash
+python text_to_image_generation.py \
+    --model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
+    --prompts "Sailing ship painting by Van Gogh" "A shiny flying horse taking off" \
+    --prompts_2 "Red tone" "Blue tone" \
+    --negative_prompts "Low quality" "Sketch" \
+    --negative_prompts_2 "Clouds" "Clouds" \
+    --num_images_per_prompt 20 \
+    --batch_size 8 \
+    --image_save_dir /tmp/stable_diffusion_xl_images \
+    --scheduler euler_discrete \
+    --use_habana \
+    --use_hpu_graphs \
+    --gaudi_config Habana/stable-diffusion \
+    --bf16
+```
+
+> HPU graphs are recommended when generating images in batches to get the fastest possible generation.
+> The first batch of images entails a performance penalty. All subsequent batches will be generated much faster.
+> You can enable this mode with `--use_hpu_graphs`.
+
+### SDXL-Turbo
+
+SDXL-Turbo is a distilled version of SDXL 1.0, trained for real-time synthesis.
+
+Here is how to generate images with multiple prompts:
+```bash
+python text_to_image_generation.py \
+    --model_name_or_path stabilityai/sdxl-turbo \
+    --prompts "Sailing ship painting by Van Gogh" "A shiny flying horse taking off" \
+    --num_images_per_prompt 20 \
+    --batch_size 8 \
+    --image_save_dir /tmp/stable_diffusion_xl_turbo_images \
+    --scheduler euler_ancestral_discrete \
+    --use_habana \
+    --use_hpu_graphs \
+    --gaudi_config Habana/stable-diffusion \
+    --bf16
+```
+
+> HPU graphs are recommended when generating images in batches to get the fastest possible generation.
+> The first batch of images entails a performance penalty. All subsequent batches will be generated much faster.
+> You can enable this mode with `--use_hpu_graphs`.
+
 
 ## Textual Inversion
 

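To put the README commands in this hunk in perspective, here is a minimal sketch of how many batches each one would run. It assumes (this is not shown in the diff) that the pipeline produces `num_images_per_prompt` images for every prompt and splits the total into chunks of at most `batch_size`. The helper name `count_batches` is hypothetical.

```python
from math import ceil


def count_batches(num_prompts: int, num_images_per_prompt: int, batch_size: int) -> int:
    """Return how many batches are needed to generate every requested image.

    Assumption: each prompt yields `num_images_per_prompt` images and the
    total is split into chunks of at most `batch_size`.
    """
    if min(num_prompts, num_images_per_prompt, batch_size) < 1:
        raise ValueError("all arguments must be positive")
    total_images = num_prompts * num_images_per_prompt
    return ceil(total_images / batch_size)


# Single-prompt command: 1 prompt x 20 images, batch size 4.
single = count_batches(1, 20, 4)
# Two-prompt commands: 2 prompts x 20 images, batch size 8.
double = count_batches(2, 20, 8)
print(single, double)
```

Under that assumption, only the first of these batches would pay the HPU-graph warm-up penalty the README notes describe, which is why the option pays off when several batches are generated.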
@@ -20,7 +20,11 @@
 
 import torch
 
-from optimum.habana.diffusers import GaudiDDIMScheduler
+from optimum.habana.diffusers import (
+    GaudiDDIMScheduler,
+    GaudiEulerAncestralDiscreteScheduler,
+    GaudiEulerDiscreteScheduler,
+)
 from optimum.habana.utils import set_seed
 
 
@@ -49,6 +53,14 @@ def main():
         help="Path to pre-trained model",
     )
 
+    parser.add_argument(
+        "--scheduler",
+        default="ddim",
+        choices=["euler_discrete", "euler_ancestral_discrete", "ddim"],
+        type=str,
+        help="Name of scheduler",
+    )
+
     # Pipeline arguments
     parser.add_argument(
         "--prompts",
@@ -57,12 +69,29 @@ def main():
         default="An image of a squirrel in Picasso style",
         help="The prompt or prompts to guide the image generation.",
     )
+    parser.add_argument(
+        "--prompts_2",
+        type=str,
+        nargs="*",
+        default=None,
+        help="The second prompt or prompts to guide the image generation (applicable to SDXL).",
+    )
     parser.add_argument(
         "--num_images_per_prompt", type=int, default=1, help="The number of images to generate per prompt."
     )
     parser.add_argument("--batch_size", type=int, default=1, help="The number of images in a batch.")
-    parser.add_argument("--height", type=int, default=512, help="The height in pixels of the generated images.")
-    parser.add_argument("--width", type=int, default=512, help="The width in pixels of the generated images.")
+    parser.add_argument(
+        "--height",
+        type=int,
+        default=0,
+        help="The height in pixels of the generated images (0=default from model config).",
+    )
+    parser.add_argument(
+        "--width",
+        type=int,
+        default=0,
+        help="The width in pixels of the generated images (0=default from model config).",
+    )
     parser.add_argument(
         "--num_inference_steps",
         type=int,
@@ -89,6 +118,13 @@ def main():
         default=None,
         help="The prompt or prompts not to guide the image generation.",
     )
+    parser.add_argument(
+        "--negative_prompts_2",
+        type=str,
+        nargs="*",
+        default=None,
+        help="The second prompt or prompts not to guide the image generation (applicable to SDXL).",
+    )
     parser.add_argument(
         "--eta",
         type=float,
@@ -139,13 +175,28 @@ def main():
 
     args = parser.parse_args()
 
-    if args.ldm3d:
-        from optimum.habana.diffusers import GaudiStableDiffusionLDM3DPipeline as GaudiStableDiffusionPipeline
+    # Set image resolution
+    res = {}
+    if args.width > 0 and args.height > 0:
+        res["width"] = args.width
+        res["height"] = args.height
+
+    # Import selected pipeline
+    sdxl_models = ["stable-diffusion-xl-base-1.0", "sdxl-turbo"]
 
-        if args.model_name_or_path == "runwayml/stable-diffusion-v1-5":
-            args.model_name_or_path = "Intel/ldm3d-4c"
+    if any(model in args.model_name_or_path for model in sdxl_models):
+        from optimum.habana.diffusers import GaudiStableDiffusionXLPipeline
+
+        sdxl = True
     else:
-        from optimum.habana.diffusers import GaudiStableDiffusionPipeline
+        if args.ldm3d:
+            from optimum.habana.diffusers import GaudiStableDiffusionLDM3DPipeline as GaudiStableDiffusionPipeline
+
+            if args.model_name_or_path == "runwayml/stable-diffusion-v1-5":
+                args.model_name_or_path = "Intel/ldm3d-4c"
+        else:
+            from optimum.habana.diffusers import GaudiStableDiffusionPipeline
+        sdxl = False
 
     # Setup logging
     logging.basicConfig(
@@ -156,36 +207,63 @@ def main():
     logger.setLevel(logging.INFO)
 
     # Initialize the scheduler and the generation pipeline
-    scheduler = GaudiDDIMScheduler.from_pretrained(args.model_name_or_path, subfolder="scheduler")
+    if args.scheduler == "euler_discrete":
+        scheduler = GaudiEulerDiscreteScheduler.from_pretrained(args.model_name_or_path, subfolder="scheduler")
+    elif args.scheduler == "euler_ancestral_discrete":
+        scheduler = GaudiEulerAncestralDiscreteScheduler.from_pretrained(
+            args.model_name_or_path, subfolder="scheduler"
+        )
+    else:
+        scheduler = GaudiDDIMScheduler.from_pretrained(args.model_name_or_path, subfolder="scheduler")
+
     kwargs = {
         "scheduler": scheduler,
         "use_habana": args.use_habana,
         "use_hpu_graphs": args.use_hpu_graphs,
         "gaudi_config": args.gaudi_config_name,
     }
+
     if args.bf16:
         kwargs["torch_dtype"] = torch.bfloat16
-    pipeline = GaudiStableDiffusionPipeline.from_pretrained(
-        args.model_name_or_path,
-        **kwargs,
-    )
 
     # Set seed before running the model
     set_seed(args.seed)
 
     # Generate images
-    outputs = pipeline(
-        prompt=args.prompts,
-        num_images_per_prompt=args.num_images_per_prompt,
-        batch_size=args.batch_size,
-        height=args.height,
-        width=args.width,
-        num_inference_steps=args.num_inference_steps,
-        guidance_scale=args.guidance_scale,
-        negative_prompt=args.negative_prompts,
-        eta=args.eta,
-        output_type=args.output_type,
-    )
+    if sdxl:
+        pipeline = GaudiStableDiffusionXLPipeline.from_pretrained(
+            args.model_name_or_path,
+            **kwargs,
+        )
+        outputs = pipeline(
+            prompt=args.prompts,
+            prompt_2=args.prompts_2,
+            num_images_per_prompt=args.num_images_per_prompt,
+            batch_size=args.batch_size,
+            num_inference_steps=args.num_inference_steps,
+            guidance_scale=args.guidance_scale,
+            negative_prompt=args.negative_prompts,
+            negative_prompt_2=args.negative_prompts_2,
+            eta=args.eta,
+            output_type=args.output_type,
+            **res,
+        )
+    else:
+        pipeline = GaudiStableDiffusionPipeline.from_pretrained(
+            args.model_name_or_path,
+            **kwargs,
+        )
+        outputs = pipeline(
+            prompt=args.prompts,
+            num_images_per_prompt=args.num_images_per_prompt,
+            batch_size=args.batch_size,
+            num_inference_steps=args.num_inference_steps,
+            guidance_scale=args.guidance_scale,
+            negative_prompt=args.negative_prompts,
+            eta=args.eta,
+            output_type=args.output_type,
+            **res,
+        )
 
     # Save the pipeline in the specified directory if not None
     if args.pipeline_save_dir is not None:

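Two pieces of logic in the script changes above are easy to miss: the resolution is forwarded only when both `--width` and `--height` are positive, and the SDXL pipeline is selected by substring match on the model name. The sketch below restates both as standalone functions so the edge cases can be checked. The function names `resolution_kwargs` and `is_sdxl_model` are hypothetical and do not exist in the script.

```python
from typing import Dict, Sequence

# Same substrings as the `sdxl_models` list in the script.
SDXL_MODELS = ("stable-diffusion-xl-base-1.0", "sdxl-turbo")


def resolution_kwargs(width: int, height: int) -> Dict[str, int]:
    """Return width/height kwargs only when both values are positive.

    An empty dict lets the pipeline fall back to its own default resolution.
    """
    if width > 0 and height > 0:
        return {"width": width, "height": height}
    return {}


def is_sdxl_model(model_name_or_path: str, known: Sequence[str] = SDXL_MODELS) -> bool:
    """Return True when the model name contains one of the known SDXL substrings."""
    return any(name in model_name_or_path for name in known)


print(resolution_kwargs(1024, 1024))  # both set: forwarded to the pipeline
print(resolution_kwargs(1024, 0))     # only one set: both are dropped
print(is_sdxl_model("stabilityai/sdxl-turbo"))
print(is_sdxl_model("/data/my-finetuned-sdxl"))
```

Two consequences follow from the logic as written: passing only one of `--width` or `--height` leaves both at the model default, and an SDXL checkpoint stored under a path that contains neither substring is routed to the non-XL pipeline.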
@@ -2,4 +2,5 @@
 from .pipelines.stable_diffusion.pipeline_stable_diffusion import GaudiStableDiffusionPipeline
 from .pipelines.stable_diffusion.pipeline_stable_diffusion_ldm3d import GaudiStableDiffusionLDM3DPipeline
 from .pipelines.stable_diffusion.pipeline_stable_diffusion_upscale import GaudiStableDiffusionUpscalePipeline
-from .schedulers import GaudiDDIMScheduler
+from .pipelines.stable_diffusion_xl.pipeline_stable_diffusion_xl import GaudiStableDiffusionXLPipeline
+from .schedulers import GaudiDDIMScheduler, GaudiEulerAncestralDiscreteScheduler, GaudiEulerDiscreteScheduler
@@ -51,6 +51,8 @@
     },
     "optimum.habana.diffusers.schedulers": {
         "GaudiDDIMScheduler": ["save_pretrained", "from_pretrained"],
+        "GaudiEulerDiscreteScheduler": ["save_pretrained", "from_pretrained"],
+        "GaudiEulerAncestralDiscreteScheduler": ["save_pretrained", "from_pretrained"],
     },
 }
 
@@ -112,7 +114,7 @@ def __init__(
                 if bf16_full_eval:
                     logger.warning(
                         "`use_torch_autocast` is True in the given Gaudi configuration but "
-                        "`torch_dtype=torch.blfloat16` was given. Disabling mixed precision and continuing in bf16 only."
+                        "`torch_dtype=torch.bfloat16` was given. Disabling mixed precision and continuing in bf16 only."
                     )
                     self.gaudi_config.use_torch_autocast = False
                 else:

@@ -368,6 +368,7 @@ def __call__(
             # 4. Prepare timesteps
             self.scheduler.set_timesteps(num_inference_steps, device="cpu")
             timesteps = self.scheduler.timesteps.to(device)
+            self.scheduler.reset_timestep_dependent_params()
 
             # 5. Prepare latent variables
             num_channels_latents = self.unet.config.in_channels
@@ -459,7 +460,7 @@ def __call__(
 
                     # compute the previous noisy sample x_t -> x_t-1
                     latents_batch = self.scheduler.step(
-                        noise_pred, latents_batch, **extra_step_kwargs, return_dict=False
+                        noise_pred, timestep, latents_batch, **extra_step_kwargs, return_dict=False
                     )[0]
 
                     if not self.use_hpu_graphs:
@@ -489,8 +490,6 @@ def __call__(
                     image = latents_batch
                 outputs["images"].append(image)
 
-                self.scheduler.reset_timestep_dependent_params()
-
                 if not self.use_hpu_graphs:
                     self.htcore.mark_step()
 

@@ -285,6 +285,7 @@ def __call__(
             # 4. Prepare timesteps
             self.scheduler.set_timesteps(num_inference_steps, device="cpu")
             timesteps = self.scheduler.timesteps.to(device)
+            self.scheduler.reset_timestep_dependent_params()
 
             # 5. Prepare latent variables
             num_channels_latents = self.unet.config.in_channels
@@ -362,7 +363,7 @@ def __call__(
 
                     # compute the previous noisy sample x_t -> x_t-1
                     latents_batch = self.scheduler.step(
-                        noise_pred, latents_batch, **extra_step_kwargs, return_dict=False
+                        noise_pred, timestep, latents_batch, **extra_step_kwargs, return_dict=False
                     )[0]
 
                     if not self.use_hpu_graphs:
@@ -380,8 +381,6 @@ def __call__(
                     image = latents_batch
                 outputs["images"].append(image)
 
-                self.scheduler.reset_timestep_dependent_params()
-
                 if not self.use_hpu_graphs:
                     self.htcore.mark_step()
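
The two pipeline hunks above change the call order: `reset_timestep_dependent_params()` now runs once, right after `set_timesteps`, and `step` receives the timestep explicitly as `step(noise_pred, timestep, latents_batch, ...)`. The toy scheduler below sketches that order. It is a hypothetical stand-in, not the Gaudi implementation: the diff does not show the scheduler internals, so the wrap-around index that lets a single reset cover every batch is an assumption made for illustration.

```python
class ToyScheduler:
    """Hypothetical stand-in for a scheduler that caches per-step state."""

    def __init__(self) -> None:
        self.timesteps = []
        self._index = 0

    def set_timesteps(self, num_inference_steps: int) -> None:
        # Descending timesteps, e.g. [3, 2, 1, 0] for four steps.
        self.timesteps = list(range(num_inference_steps - 1, -1, -1))

    def reset_timestep_dependent_params(self) -> None:
        # Rewind the cached state to the first timestep.
        self._index = 0

    def step(self, model_output: float, timestep: int, sample: float) -> float:
        expected = self.timesteps[self._index]
        if timestep != expected:
            raise RuntimeError(f"out of sync: got {timestep}, expected {expected}")
        # Assumption: the state wraps around after a full pass, so the next
        # batch starts again at the first timestep without another reset.
        self._index = (self._index + 1) % len(self.timesteps)
        return sample - 0.1 * model_output


scheduler = ToyScheduler()
scheduler.set_timesteps(4)
scheduler.reset_timestep_dependent_params()  # once, before the batch loop

results = []
for latents in (1.0, 2.0):  # two "batches" of scalar latents
    for timestep in scheduler.timesteps:
        noise_pred = 0.5 * latents  # placeholder for the UNet prediction
        latents = scheduler.step(noise_pred, timestep, latents)
    results.append(latents)
print(results)
```

Passing the timestep explicitly matches the `step(model_output, timestep, sample)` argument order used by the upstream diffusers schedulers, which is presumably what lets the Euler schedulers share the same pipeline code path.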