Improve readme of depth_guided_stable_diffusion example #5593

Merged
merged 17 commits into from
Apr 4, 2024
Changes from 7 commits
2 changes: 1 addition & 1 deletion docs/content/getting-started/troubleshooting.md
@@ -83,7 +83,7 @@ since we don't support all of these settings equally well.

When using Wgpu's Vulkan backend (the default on Windows & Linux) on a computer that has both integrated and dedicated GPUs, a lot of issues can arise from Vulkan either picking the "wrong" GPU at runtime, or even simply from the fact that this choice conflicts with other driver picking technologies (e.g. NVIDIA Optimus).

In both cases, forcing Vulkan to pick either the integrated or discrete GPU (try both!) using the [`VK_ICD_FILENAMES`](https://vulkan.lunarg.com/doc/view/1.3.204.1/mac/LoaderDriverInterface.html#user-content-driver-discovery) environment variable might help with crashes, artifacts and bad performance. E.g.:
In both cases, forcing Vulkan to pick either the integrated or discrete GPU (try both!) using the [`VK_ICD_FILENAMES`](https://vulkan.lunarg.com/doc/view/1.3.280.0/mac/LoaderDriverInterface.html#user-content-driver-discovery) environment variable might help with crashes, artifacts and bad performance. E.g.:
- Force the Intel integrated GPU:
- Linux: `export VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/intel.json`.
- Force the discrete Nvidia GPU:
3 changes: 3 additions & 0 deletions docs/cspell.json
@@ -83,6 +83,8 @@
"DCMAKE",
"deallocate",
"deallocation",
"denoising",
"Denoising",
"debuginfo",
"dedup",
"depgraph",
@@ -365,6 +367,7 @@
"UI's",
"uncollapsed",
"unmultiplied",
"uncond",
"Unorm",
"unsetting",
"upcasting",
99 changes: 91 additions & 8 deletions examples/python/depth_guided_stable_diffusion/README.md
@@ -1,25 +1,108 @@
<!--[metadata]
title = "Depth Guided Stable Diffusion"
tags = ["2D", "depth", "huggingface", "stable-diffusion", "tensor", "text"]
tags = ["depth-guided", "stable-diffusion", "huggingface", "3D", "tensor", "text"]
description = "Leverage Depth Guided Stable Diffusion to generate images with enhanced depth perception. This method integrates depth maps to guide the Stable Diffusion model, creating more visually compelling and contextually accurate images."
thumbnail = "https://static.rerun.io/depth_guided_stable_diffusion/a85516aba09f72649517891d767e15383ce7f4ea/480w.png"
thumbnail_dimensions = [480, 253]
channel = "nightly"
-->

<picture>
<img src="https://static.rerun.io/depth-guided-stable-diffusion/bea9bfaf33ebed4296f576d931c8c8e6fdd08a21/full.png" alt="Depth-guided stable diffusion screenshot">
<source media="(max-width: 480px)" srcset="https://static.rerun.io/depth-guided-stable-diffusion/bea9bfaf33ebed4296f576d931c8c8e6fdd08a21/480w.png">
<source media="(max-width: 768px)" srcset="https://static.rerun.io/depth-guided-stable-diffusion/bea9bfaf33ebed4296f576d931c8c8e6fdd08a21/768w.png">
<source media="(max-width: 1024px)" srcset="https://static.rerun.io/depth-guided-stable-diffusion/bea9bfaf33ebed4296f576d931c8c8e6fdd08a21/1024w.png">
<source media="(max-width: 1200px)" srcset="https://static.rerun.io/depth-guided-stable-diffusion/bea9bfaf33ebed4296f576d931c8c8e6fdd08a21/1200w.png">
<source media="(max-width: 480px)" srcset="https://static.rerun.io/depth_guided_stable_diffusion/a85516aba09f72649517891d767e15383ce7f4ea/480w.png">
<source media="(max-width: 768px)" srcset="https://static.rerun.io/depth_guided_stable_diffusion/a85516aba09f72649517891d767e15383ce7f4ea/768w.png">
<source media="(max-width: 1024px)" srcset="https://static.rerun.io/depth_guided_stable_diffusion/a85516aba09f72649517891d767e15383ce7f4ea/1024w.png">
<source media="(max-width: 1200px)" srcset="https://static.rerun.io/depth_guided_stable_diffusion/a85516aba09f72649517891d767e15383ce7f4ea/1200w.png">
<img src="https://static.rerun.io/depth_guided_stable_diffusion/a85516aba09f72649517891d767e15383ce7f4ea/full.png" alt="Depth-guided stable diffusion example">
</picture>

A more elaborate example running Depth Guided Stable Diffusion 2.0.
Leverage [Depth Guided Stable Diffusion](https://github.com/Stability-AI/stablediffusion?tab=readme-ov-file#depth-conditional-stable-diffusion) to generate images with enhanced depth perception. This method integrates depth maps to guide the Stable Diffusion model, creating more visually compelling and contextually accurate images.

For more info see [here](https://github.com/Stability-AI/stablediffusion).
## Used Rerun Types
[`Image`](https://www.rerun.io/docs/reference/types/archetypes/image), [`Tensor`](https://www.rerun.io/docs/reference/types/archetypes/tensor), [`DepthImage`](https://www.rerun.io/docs/reference/types/archetypes/depth_image), [`TextDocument`](https://www.rerun.io/docs/reference/types/archetypes/text_document)

## Background
Depth Guided Stable Diffusion enriches the image generation process by incorporating depth information, providing a unique way to control the spatial composition of generated images. This approach allows for more nuanced and layered creations, making it especially useful for scenes requiring a sense of three-dimensionality.

# Logging and Visualizing with Rerun
The visualizations in this example were created with the Rerun SDK, demonstrating the integration of depth information in the Stable Diffusion image generation process. Here is the code for generating the visualization in Rerun.

## Prompt
Visualizing the prompt and the negative prompt.
```python
rr.log("prompt/text", rr.TextDocument(prompt))
rr.log("prompt/text_negative", rr.TextDocument(negative_prompt))
```

## Text
Visualizing the text input ids, the text attention mask, and the unconditional input ids.
```python
rr.log("prompt/text_input/ids", rr.Tensor(text_input_ids))
rr.log("prompt/text_input/attention_mask", rr.Tensor(text_inputs.attention_mask))
rr.log("prompt/uncond_input/ids", rr.Tensor(uncond_input.input_ids))
```
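
To make the relationship between the ids and the attention mask concrete, here is a minimal sketch using a toy whitespace tokenizer. The vocabulary, `PAD_ID`, and `toy_tokenize` are hypothetical stand-ins and not the tokenizer used by this example (a real tokenizer also adds special tokens). The only behavior carried over is padding to a fixed maximum length, with the mask marking real tokens as 1 and padding as 0. The empty string is used here as one possible unconditional input.

```python
# Toy illustration only: a hypothetical whitespace tokenizer, not the real one.
PAD_ID = 0
VOCAB = {"a": 1, "tiger": 2, "in": 3, "the": 4, "snow": 5}  # hypothetical vocabulary


def toy_tokenize(text: str, max_length: int) -> tuple[list[int], list[int]]:
    """Return (input_ids, attention_mask), both padded or truncated to max_length."""
    ids = [VOCAB[word] for word in text.lower().split()][:max_length]
    mask = [1] * len(ids)
    padding = max_length - len(ids)
    return ids + [PAD_ID] * padding, mask + [0] * padding


input_ids, attention_mask = toy_tokenize("A tiger in the snow", max_length=8)
# With this toy tokenizer, an empty string yields padding only.
uncond_ids, uncond_mask = toy_tokenize("", max_length=8)

print(input_ids)
print(attention_mask)
print(uncond_ids)
```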

## Text embeddings
Visualizing the text embeddings. The text embeddings are generated from the specific prompts used, while the unconditional text embeddings represent a neutral baseline without prompt conditioning.
```python
rr.log("prompt/text_embeddings", rr.Tensor(text_embeddings))
rr.log("prompt/uncond_embeddings", rr.Tensor(uncond_embeddings))
```
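
The two sets of embeddings are typically combined through classifier-free guidance: the model predicts noise once with the prompt embeddings and once with the unconditional ones, and the final prediction is extrapolated from the unconditional result towards the prompt-conditioned one. The sketch below shows that arithmetic on plain Python lists. `apply_guidance` and the sample values are hypothetical, and the example's actual pipeline operates on tensors.

```python
def apply_guidance(
    noise_uncond: list[float], noise_text: list[float], guidance_scale: float
) -> list[float]:
    """Classifier-free guidance: uncond + scale * (text - uncond), element-wise."""
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]


noise_uncond = [0.0, 1.0, -1.0]  # hypothetical prediction without the prompt
noise_text = [1.0, 1.0, 0.0]  # hypothetical prediction with the prompt

# A scale of 1 returns the prompt-conditioned prediction unchanged.
print(apply_guidance(noise_uncond, noise_text, guidance_scale=1.0))
# Larger scales push further in the direction the prompt suggests.
print(apply_guidance(noise_uncond, noise_text, guidance_scale=2.0))
```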

## Depth map
Visualizing the preprocessed pixel values passed to the depth estimator, along with the estimated, interpolated, and normalized depth images.
```python
rr.log("depth/input_preprocessed", rr.Tensor(pixel_values))
rr.log("depth/estimated", rr.DepthImage(depth_map))
rr.log("depth/interpolated", rr.DepthImage(depth_map))
rr.log("depth/normalized", rr.DepthImage(depth_map))
```
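
As an illustration of the normalization step, the sketch below rescales a depth map to the range [-1, 1] using its minimum and maximum. This min-max scheme is an assumption about how the normalized depth is produced, and `normalize_depth` is a hypothetical helper that works on nested lists rather than tensors.

```python
def normalize_depth(depth_map: list[list[float]]) -> list[list[float]]:
    """Min-max rescale a 2D depth map to the range [-1, 1]."""
    flat = [value for row in depth_map for value in row]
    depth_min, depth_max = min(flat), max(flat)
    span = depth_max - depth_min
    if span == 0:
        # A constant depth map has no range to stretch; map everything to 0.
        return [[0.0 for _ in row] for row in depth_map]
    return [[2.0 * (value - depth_min) / span - 1.0 for value in row] for row in depth_map]


depth = [[1.0, 2.0], [3.0, 5.0]]  # hypothetical depth values
print(normalize_depth(depth))
```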

## Latents
Log the latents, the representation of the images in the format used by the diffusion model.
```python
rr.log("diffusion/latents", rr.Tensor(latents, dim_names=["b", "c", "h", "w"]))
```

## Denoising loop
For each step in the denoising loop, we set the `step` and `timestep` time sequences and log the latent model input, noise prediction, latents, and image. This makes it possible to inspect every denoising step in the Rerun viewer.
```python
rr.set_time_sequence("step", i)
rr.set_time_sequence("timestep", t)
rr.log("diffusion/latent_model_input", rr.Tensor(latent_model_input))
rr.log("diffusion/noise_pred", rr.Tensor(noise_pred, dim_names=["b", "c", "h", "w"]))
rr.log("diffusion/latents", rr.Tensor(latents, dim_names=["b", "c", "h", "w"]))
rr.log("image/diffused", rr.Image(image))
```
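
The sketch below illustrates how the two timelines relate, without requiring the Rerun SDK or a diffusion model. `TimelineRecorder` is a hypothetical stand-in for the `rr.set_time_sequence` / `rr.log` pair, and the timestep values are made up. The point is only that every entry logged inside the loop is stamped with both the current `step` and the current `timestep`.

```python
class TimelineRecorder:
    """Hypothetical stand-in for rr.set_time_sequence / rr.log, for illustration only."""

    def __init__(self) -> None:
        self.current_times: dict[str, int] = {}
        self.entries: list[tuple[dict[str, int], str]] = []

    def set_time_sequence(self, timeline: str, value: int) -> None:
        self.current_times[timeline] = value

    def log(self, entity_path: str) -> None:
        # Each logged entry is stamped with a snapshot of all current timelines.
        self.entries.append((dict(self.current_times), entity_path))


recorder = TimelineRecorder()
timesteps = [801, 601, 401, 201, 1]  # made-up scheduler timesteps, counting down

for i, t in enumerate(timesteps):
    recorder.set_time_sequence("step", i)
    recorder.set_time_sequence("timestep", t)
    recorder.log("diffusion/latents")

for times, path in recorder.entries:
    print(times, path)
```

In this sketch `step` counts up while `timestep` counts down, which is the typical direction for diffusion schedulers; either timeline can then be used to scrub through the same logged data.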

## Diffused image
Finally, we log the diffused image generated by the model.

```python
rr.log("image/diffused", rr.Image(image_8))
```
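
The name `image_8` suggests an 8-bit version of the decoded image. A minimal sketch of such a conversion, assuming the decoded pixel values are floats in [0, 1], is shown below. `to_uint8` is a hypothetical helper that works on a flat list of values rather than an image array.

```python
def to_uint8(pixels: list[float]) -> list[int]:
    """Clamp floats to [0, 1] and scale them to integers in [0, 255]."""
    return [int(round(min(max(p, 0.0), 1.0) * 255)) for p in pixels]


# Values outside [0, 1] are clamped before scaling.
print(to_uint8([0.0, 0.2, 1.0, 1.2, -0.1]))
```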

# Run the Code

To run this example, make sure you have the Rerun repository checked out and the latest SDK installed:
```bash
# Setup
pip install --upgrade rerun-sdk  # Install the latest Rerun SDK
git clone git@github.com:rerun-io/rerun.git  # Clone the repository
cd rerun
git checkout latest # Check out the commit matching the latest SDK release
```

Install the necessary libraries specified in the requirements file:
```bash
pip install -r examples/python/depth_guided_stable_diffusion/requirements.txt
```

To run this example, use:
```bash
python examples/python/depth_guided_stable_diffusion/main.py
```

You can specify your own image, depth map, and prompt using:
```bash
python examples/python/depth_guided_stable_diffusion/main.py [--img-path IMG_PATH] [--depth-map-path DEPTH_MAP_PATH] [--prompt PROMPT]
```
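
A command-line interface with these options could be declared roughly as follows. Only the three flag names are taken from the usage line above; the help texts, the `None` defaults, and the sample values are assumptions, and the example's `main.py` may define further options.

```python
import argparse

parser = argparse.ArgumentParser(description="Depth guided stable diffusion (sketch)")
parser.add_argument("--img-path", type=str, default=None, help="Path of the input image")
parser.add_argument("--depth-map-path", type=str, default=None, help="Optional precomputed depth map")
parser.add_argument("--prompt", type=str, default=None, help="Text prompt guiding the generation")

# Parse an explicit argument list so the sketch runs without real command-line input.
args = parser.parse_args(["--img-path", "room.jpg", "--prompt", "a cozy cabin"])
print(args.img_path, args.depth_map_path, args.prompt)
```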
@@ -208,8 +208,8 @@ def _encode_prompt(self, prompt, device, num_images_per_prompt, do_classifier_fr
if `guidance_scale` is less than `1`).
"""
batch_size = len(prompt) if isinstance(prompt, list) else 1
rr.log("prompt/text", rr.TextLog(prompt))
rr.log("prompt/text_negative", rr.TextLog(negative_prompt))
rr.log("prompt/text", rr.TextDocument(prompt))
rr.log("prompt/text_negative", rr.TextDocument(negative_prompt))
text_inputs = self.tokenizer(
prompt,
padding="max_length",
4 changes: 2 additions & 2 deletions scripts/lint.py
@@ -627,8 +627,8 @@ def lint_example_description(filepath: str, fm: Frontmatter) -> list[str]:
return []

desc = fm.get("description", "")
if len(desc) > 130:
return [f"Frontmatter: description is too long ({len(desc)} > 130)"]
if len(desc) > 512:
return [f"Frontmatter: description is too long ({len(desc)} > 512)"]
else:
return []
