Stable Diffusion XL is a cutting-edge AI model designed for generating high-resolution images from textual descriptions. This model, developed by Stability AI, leverages the power of deep learning to transform text prompts into vivid and detailed images, offering new horizons in the field of digital art and design. Refer to the Stable Diffusion XL for more details.
To get started, ensure you have:
- Python version 3.10.6
- A suitable development environment
Python is the backbone of this project, and Pip is the package manager used to install necessary libraries. First, ensure you have Python installed. Then, install or upgrade Pip using:
python3 -m pip install --user --upgrade pip
This command ensures you have the latest version of Pip installed.
A virtual environment in Python is a self-contained directory that contains a Python installation for a particular project. This approach avoids conflicts between project dependencies. Create and activate a virtual environment with:
python3 -m venv .venv
source .venv/bin/activate
To access pre-trained models, including Stable Diffusion XL, sign up for a Hugging Face account. This platform hosts various machine learning models and offers community and professional support.
Install Diffusers and Gradio using Pip. Diffusers is a Hugging Face library offering pre-trained diffusion models for various tasks. Gradio is an open-source library for creating user-friendly web interfaces for machine learning models.
pip install diffusers gradio torch transformers accelerate
Write the following code in a file named main.py
. This script sets up a basic Gradio interface to interact with the Stable Diffusion XL model. In this we are using Stable Diffusion XL Base model stabilityai/stable-diffusion-xl-base-1.0
which is provided by Stability AI. Source code is available at https://github.com/Stability-AI/generative-models . And to using the model we are using StableDiffusionXLPipeline from diffusers for text-to-image generation.
import gradio as gr
from diffusers import StableDiffusionXLPipeline
import torch
# Load the model with specified parameters
base = StableDiffusionXLPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
)
base.to("cuda")
def query(payload):
# Generate image based on the text prompt
image = base(prompt=payload).images[0]
return image
# Setup Gradio interface
demo = gr.Interface(fn=query, inputs="text", outputs="image")
if __name__ == "__main__":
demo.launch(show_api=False)
To run this script, use:
python3 main.py
This will provide you with a URL. When you open this URL, it will take you to the Gradio interface where you can input your prompts.
For generating multiple images, modify the script as follows.
import gradio as gr
from diffusers import StableDiffusionXLPipeline
import torch
# Load the model
base = StableDiffusionXLPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
)
base.to("cuda")
def query(payload):
# Generate multiple images based on the text prompt
image = base(prompt=payload, num_images_per_prompt=2).images
return image
# Setup Gradio interface for multiple images
demo = gr.Interface(fn=query, inputs="text", outputs="gallery")
if __name__ == "__main__":
demo.launch(show_api=False)
For refined images, modify your script to include the base stabilityai/stable-diffusion-xl-base-1.0
and refiner stabilityai/stable-diffusion-xl-refiner-1.0
models. As we now using 2 models at a time so we need to use DiffusionPipeline. which support the bundling of multiple independently-trained models, schedulers, and processors into a single end-to-end class.
import gradio as gr
from diffusers import DiffusionPipeline
import torch
# Load base and refiner models with specified parameters
base = DiffusionPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
base.to("cuda")
refiner = DiffusionPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-refiner-1.0",
text_encoder_2=base.text_encoder_2,
vae=base.vae,
torch_dtype=torch.float16,
use_safetensors=True,
variant="fp16",
)
refiner.to("cuda")
# Define processing steps
n_steps = 40
high_noise_frac = 0.8
def query(payload):
# Generate and refine image based on the text prompt
image = base(prompt=payload, num_inference_steps=n_steps, denoising_end=high_noise_frac, output_type="latent").images
image = refiner(prompt=payload, num_inference_steps=n_steps, denoising_start=high_noise_frac, image=image).images[0]
return image
# Setup Gradio interface for refined images
demo = gr.Interface(fn=query, inputs="text", outputs="image")
if __name__ == "__main__":
demo.launch(show_api=False)
By following these steps, you can set up and use Stable Diffusion XL on your local machine. Whether you're a hobbyist or a professional, this tool offers an exciting way to explore AI-powered image generation.
Refer to the Hugging Face documentation for more advanced usage.