Improve readme of depth_guided_stable_diffusion example (#5593)
### What
* Replaces #5502

### Checklist
* [x] I have read and agree to [Contributor Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and the [Code of Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested the web demo (if applicable):
  * Using newly built examples: [app.rerun.io](https://app.rerun.io/pr/5593/index.html)
  * Using examples from latest `main` build: [app.rerun.io](https://app.rerun.io/pr/5593/index.html?manifest_url=https://app.rerun.io/version/main/examples_manifest.json)
  * Using full set of examples from `nightly` build: [app.rerun.io](https://app.rerun.io/pr/5593/index.html?manifest_url=https://app.rerun.io/version/nightly/examples_manifest.json)
* [x] The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG
* [x] If applicable, add a new check to the [release checklist](https://github.com/rerun-io/rerun/blob/main/tests/python/release_checklist)!

- [PR Build Summary](https://build.rerun.io/pr/5593)
- [Docs preview](https://rerun.io/preview/49584ada6236c9802c2490a106ef20ebedaa6cf3/docs) <!--DOCS-PREVIEW-->
- [Examples preview](https://rerun.io/preview/49584ada6236c9802c2490a106ef20ebedaa6cf3/examples) <!--EXAMPLES-PREVIEW-->
- [Recent benchmark results](https://build.rerun.io/graphs/crates.html)
- [Wasm size tracking](https://build.rerun.io/graphs/sizes.html)

---------

Co-authored-by: Andreas Reich <[email protected]>
1 parent a1640aa, commit eb1892e
Showing 3 changed files with 93 additions and 12 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,25 +1,107 @@ | ||
<!--[metadata] | ||
title = "Depth Guided Stable Diffusion" | ||
tags = ["2D", "depth", "huggingface", "stable-diffusion", "tensor", "text"] | ||
thumbnail = "https://static.rerun.io/stable-diffusiuon/55fd41b25d4f7c55e7493640905a21e41021969f/480w.png" | ||
thumbnail_dimensions = [480, 480] | ||
channel = "nightly" | ||
tags = ["depth-guided", "stable-diffusion", "huggingface", "3D", "tensor", "text"] | ||
description = "Leverage Depth Guided Stable Diffusion to generate images with enhanced depth perception. This method integrates depth maps to guide the Stable Diffusion model, creating more visually compelling and contextually accurate images." | ||
thumbnail = "https://static.rerun.io/depth-guided-stable-diffusion/bea9bfaf33ebed4296f576d931c8c8e6fdd08a21/480w.png" | ||
thumbnail_dimensions = [480, 266] | ||
--> | ||
|
||
<picture> | ||
<img src="https://static.rerun.io/depth-guided-stable-diffusion/bea9bfaf33ebed4296f576d931c8c8e6fdd08a21/full.png" alt="Depth-guided stable diffusion screenshot"> | ||
<source media="(max-width: 480px)" srcset="https://static.rerun.io/depth-guided-stable-diffusion/bea9bfaf33ebed4296f576d931c8c8e6fdd08a21/480w.png"> | ||
<source media="(max-width: 768px)" srcset="https://static.rerun.io/depth-guided-stable-diffusion/bea9bfaf33ebed4296f576d931c8c8e6fdd08a21/768w.png"> | ||
<source media="(max-width: 1024px)" srcset="https://static.rerun.io/depth-guided-stable-diffusion/bea9bfaf33ebed4296f576d931c8c8e6fdd08a21/1024w.png"> | ||
<source media="(max-width: 1200px)" srcset="https://static.rerun.io/depth-guided-stable-diffusion/bea9bfaf33ebed4296f576d931c8c8e6fdd08a21/1200w.png"> | ||
<img src="https://static.rerun.io/depth-guided-stable-diffusion/bea9bfaf33ebed4296f576d931c8c8e6fdd08a21/full.png" alt="Depth-guided stable diffusion example"> | ||
</picture> | ||
|
||
A more elaborate example running Depth Guided Stable Diffusion 2.0. | ||
Leverage [Depth Guided Stable Diffusion](https://github.com/Stability-AI/stablediffusion?tab=readme-ov-file#depth-conditional-stable-diffusion) to generate images with enhanced depth perception. This method integrates depth maps to guide the Stable Diffusion model, creating more visually compelling and contextually accurate images. | ||
|
||
For more info see [here](https://github.com/Stability-AI/stablediffusion). | ||
## Used Rerun Types | ||
[`Image`](https://www.rerun.io/docs/reference/types/archetypes/image), [`Tensor`](https://www.rerun.io/docs/reference/types/archetypes/tensor), [`DepthImage`](https://www.rerun.io/docs/reference/types/archetypes/depth_image), [`TextDocument`](https://www.rerun.io/docs/reference/types/archetypes/text_document), [`TextLog`](https://www.rerun.io/docs/reference/types/archetypes/text_log), [`BarChart`](https://www.rerun.io/docs/reference/types/archetypes/bar_chart) | ||
|
||
## Background | ||
Depth Guided Stable Diffusion enriches the image generation process by incorporating depth information, providing a unique way to control the spatial composition of generated images. This approach allows for more nuanced and layered creations, making it especially useful for scenes requiring a sense of three-dimensionality. | ||
|
||
# Logging and Visualizing with Rerun | ||
The visualizations in this example were created with the Rerun SDK, demonstrating the integration of depth information in the Stable Diffusion image generation process. Here is the code for generating the visualization in Rerun. | ||
|
||
## Prompt | ||
Visualizing the prompt and negative prompt | ||
```python | ||
rr.log("prompt/text", rr.TextLog(prompt)) | ||
rr.log("prompt/text_negative", rr.TextLog(negative_prompt)) | ||
``` | ||
|
||
## Text | ||
Visualizing the text input ids, the text attention mask and the unconditional input ids | ||
```python | ||
rr.log("prompt/text_input/ids", rr.BarChart(text_input_ids)) | ||
rr.log("prompt/text_input/attention_mask", rr.BarChart(text_inputs.attention_mask)) | ||
rr.log("prompt/uncond_input/ids", rr.Tensor(uncond_input.input_ids)) | ||
``` | ||
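
For context, the input ids and attention mask logged above are the padded output of a tokenizer. The following is a minimal sketch of that padding step in plain Python. The helper name `pad_token_ids`, the pad id `0`, the token ids, and the maximum length of 8 are illustrative assumptions, not values taken from this example's code.

```python
def pad_token_ids(token_ids, max_length, pad_id=0):
    """Pad (or truncate) a list of token ids and build the matching attention mask.

    Hypothetical helper for illustration; real tokenizers do this internally.
    """
    kept = list(token_ids[:max_length])
    n_pad = max_length - len(kept)
    input_ids = kept + [pad_id] * n_pad
    # 1 marks a real token, 0 marks padding.
    attention_mask = [1] * len(kept) + [0] * n_pad
    return input_ids, attention_mask


ids, mask = pad_token_ids([101, 7, 42, 102], max_length=8)
```

Both lists have the same fixed length, which is why they can be shown side by side as bar charts in the viewer.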
|
||
## Text embeddings | ||
Visualizing the text embeddings. The text embeddings are generated from the specific prompts used, while the unconditional text embeddings represent a neutral baseline without any prompt conditioning. | ||
```python | ||
rr.log("prompt/text_embeddings", rr.Tensor(text_embeddings)) | ||
rr.log("prompt/uncond_embeddings", rr.Tensor(uncond_embeddings)) | ||
``` | ||
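
Conditional and unconditional embeddings are commonly used together in classifier-free guidance, where the prediction for the prompt is pushed away from the prediction for the unconditional input. The sketch below assumes this example follows that common scheme. The function name `apply_guidance`, the `guidance_scale` values, and the list-based inputs are illustrative; a real pipeline operates on tensors.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale=7.5):
    """Combine unconditional and text-conditioned noise predictions.

    Illustrative sketch of classifier-free guidance on plain Python lists.
    """
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]


# With a scale of 2.0, the difference between the two predictions is doubled.
guided = apply_guidance([0.0, 1.0], [1.0, 1.0], guidance_scale=2.0)
```

A scale of 1.0 reproduces the text-conditioned prediction unchanged; larger values follow the prompt more strongly.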
|
||
## Depth map | ||
Visualizing the pixel values of the depth estimation, estimated depth image, interpolated depth image and normalized depth image | ||
```python | ||
rr.log("depth/input_preprocessed", rr.Tensor(pixel_values)) | ||
rr.log("depth/estimated", rr.DepthImage(depth_map)) | ||
rr.log("depth/interpolated", rr.DepthImage(depth_map)) | ||
rr.log("depth/normalized", rr.DepthImage(depth_map)) | ||
``` | ||
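
One common way to normalize a depth map is to rescale it linearly to the range [-1, 1] using its minimum and maximum. The following plain-Python sketch assumes that kind of min-max normalization; the helper name `normalize_depth` and the sample values are illustrative, and the example itself works on tensors rather than nested lists.

```python
def normalize_depth(depth_map):
    """Rescale a 2D depth map (a list of rows) linearly to the range [-1, 1]."""
    flat = [v for row in depth_map for v in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == d_min:
        # A constant depth map has no range to rescale; map it to 0.
        return [[0.0 for _ in row] for row in depth_map]
    scale = d_max - d_min
    return [[2.0 * (v - d_min) / scale - 1.0 for v in row] for row in depth_map]


normalized = normalize_depth([[0.0, 5.0], [10.0, 2.5]])
```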
|
||
## Latents | ||
Log the latents, the representation of the images in the format used by the diffusion model. | ||
```python | ||
rr.log("diffusion/latents", rr.Tensor(latents, dim_names=["b", "c", "h", "w"])) | ||
``` | ||
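
The `dim_names` above label the latent tensor as batch, channels, height and width. As a rough sketch of what those dimensions look like, the helper below computes a latent shape from an image size. The name `latent_shape` is hypothetical, and the defaults of 4 latent channels and a downscaling factor of 8 are typical for Stable Diffusion but are assumptions here, not values read from this example.

```python
def latent_shape(batch_size, image_height, image_width,
                 latent_channels=4, vae_scale_factor=8):
    """Return the (b, c, h, w) shape of the latents for a given image size."""
    return (
        batch_size,
        latent_channels,
        image_height // vae_scale_factor,
        image_width // vae_scale_factor,
    )


shape = latent_shape(1, 512, 512)
# Pair each dimension with the name used when logging the tensor.
named = dict(zip(["b", "c", "h", "w"], shape))
```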
|
||
## Denoising loop | ||
For each step in the denoising loop, we set a time sequence with step and timestep and log the latent model input, noise predictions, latents and image. This makes it possible to inspect every denoising step in the Rerun viewer. | ||
```python | ||
rr.set_time_sequence("step", i) | ||
rr.set_time_sequence("timestep", t) | ||
rr.log("diffusion/latent_model_input", rr.Tensor(latent_model_input)) | ||
rr.log("diffusion/noise_pred", rr.Tensor(noise_pred, dim_names=["b", "c", "h", "w"])) | ||
rr.log("diffusion/latents", rr.Tensor(latents, dim_names=["b", "c", "h", "w"])) | ||
rr.log("image/diffused", rr.Image(image)) | ||
``` | ||
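
The two timelines differ in direction: `step` counts up with each loop iteration, while `timestep` is the scheduler's noise level and usually counts down. The sketch below illustrates that relationship with a simplified, evenly spaced schedule. `make_timesteps` and its default values are hypothetical; real schedulers may space and offset their timesteps differently.

```python
def make_timesteps(num_train_timesteps=1000, num_inference_steps=5):
    """Return evenly spaced timesteps in descending order (a simplified schedule)."""
    stride = num_train_timesteps // num_inference_steps
    return [i * stride for i in reversed(range(num_inference_steps))]


for i, t in enumerate(make_timesteps()):
    # In the example, these two values are what would be passed to
    # rr.set_time_sequence("step", i) and rr.set_time_sequence("timestep", t).
    print(f"step={i} timestep={t}")
```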
|
||
## Diffused image | ||
Finally, we log the diffused image generated by the model. | ||
|
||
```python | ||
rr.log("image/diffused", rr.Image(image_8)) | ||
``` | ||
|
||
# Run the Code | ||
|
||
To run this example, make sure you have the Rerun repository checked out and the latest SDK installed: | ||
```bash | ||
# Setup | ||
pip install --upgrade rerun-sdk # install the latest Rerun SDK | ||
git clone git@github.com:rerun-io/rerun.git # Clone the repository | ||
cd rerun | ||
git checkout latest # Check out the commit matching the latest SDK release | ||
``` | ||
|
||
Install the necessary libraries specified in the requirements file: | ||
```bash | ||
pip install -r examples/python/depth_guided_stable_diffusion/requirements.txt | ||
``` | ||
|
||
To run this example, use: | ||
```bash | ||
python examples/python/depth_guided_stable_diffusion/main.py | ||
``` | ||
|
||
You can specify your own image, depth map and prompt using: | ||
```bash | ||
python examples/python/depth_guided_stable_diffusion/main.py [--img-path IMG_PATH] [--depth-map-path DEPTH_MAP_PATH] [--prompt PROMPT] | ||
``` |
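
As a rough illustration of how the optional flags above could be parsed, here is a minimal `argparse` sketch. Only the flag names come from the usage line; the function name `build_parser`, the types, the defaults, the help texts and the sample arguments are assumptions, and the actual `main.py` may define them differently.

```python
import argparse


def build_parser():
    """Build a parser for the three optional flags (illustrative sketch)."""
    parser = argparse.ArgumentParser(
        description="Depth guided stable diffusion example (illustrative sketch)."
    )
    parser.add_argument("--img-path", type=str, default=None,
                        help="Path to the input image.")
    parser.add_argument("--depth-map-path", type=str, default=None,
                        help="Path to a precomputed depth map.")
    parser.add_argument("--prompt", type=str, default=None,
                        help="Text prompt guiding the generation.")
    return parser


# Parse a sample command line instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--img-path", "input.jpg", "--prompt", "a cozy room"])
```

Flags that are omitted keep their default, so `args.depth_map_path` is `None` in this sample.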