Gradual latent hires fix (#5)
* Add custom separator

* Fix typo

* Fix typo again

* Fix min-snr-gamma for v-prediction and ZSNR.

This fixes min-snr for v-prediction + zero-terminal-SNR by dividing min-snr directly by SNR + 1.
The old implementation did it in two steps, (min-snr/snr) * (snr/(snr+1)), which caused a division by zero when combined with --zero_terminal_snr, since SNR is zero at the terminal timestep.

* use **kwargs and change svd() calling convention to make svd() reusable

 * add required attributes to model_org, model_tuned, save_to
 * set "*_alpha" using str(float(foo))

* add min_diff, clamp_quantile args

based on bmaltais/kohya_ss#1332 bmaltais/kohya_ss@a9ec90c

* add caption_separator option

* add Deep Shrink

* add gradual latent

* Update README.md

* format by black, add ja comment

* make separate U-Net for inference

* make slicing vae compatible with latest diffusers

* fix: gradual latent could not be disabled

* apply unsharp mask

* add unsharp mask

* fix strength error

---------

Co-authored-by: Kohaku-Blueleaf <[email protected]>
Co-authored-by: feffy380 <[email protected]>
Co-authored-by: Won-Kyu Park <[email protected]>
Co-authored-by: Kohya S <[email protected]>
Co-authored-by: Kohya S <[email protected]>
6 people authored Dec 3, 2023
1 parent 1c179d5 commit 23b6dfc
Showing 20 changed files with 1,333 additions and 80 deletions.
32 changes: 32 additions & 0 deletions README.md
@@ -1,3 +1,35 @@
# About Gradual Latent

Gradual Latent is a Hires fix that gradually increases the size of the latent. The following options have been added to `sdxl_gen_img.py`:

- `--gradual_latent_timesteps`: Specifies the timestep at which the latent starts growing. The default is None, which disables Gradual Latent. Try starting around 750.
- `--gradual_latent_ratio`: Specifies the initial size of the latent. The default is 0.5, which starts sampling at half the full latent size.
- `--gradual_latent_ratio_step`: Specifies the increment by which the latent size grows. The default is 0.125, so the ratio increases through 0.625, 0.75, 0.875, 1.0.
- `--gradual_latent_ratio_every_n_steps`: Specifies how often the latent size is increased. The default is 3, meaning the size grows every 3 steps.

Each option can also be specified as a prompt option: `--glt`, `--glr`, `--gls`, `--gle`.

__Please specify `euler_a` as the sampler.__ The sampler's source code has been modified, so other samplers will not work.

`gen_img_diffusers.py` has the same options, but in my testing it produced only distorted images no matter what I tried.
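
To make the schedule concrete, here is a rough sketch of the behavior these options describe. The helper names and the bicubic resize are assumptions for illustration, not the repository's implementation; timesteps count down during sampling, so resizing begins once the current timestep falls below `--gradual_latent_timesteps`.

```python
import torch.nn.functional as F

def gradual_latent_ratio(steps_since_start, initial_ratio=0.5,
                         ratio_step=0.125, every_n_steps=3):
    # Grow the ratio by ratio_step every every_n_steps steps, capped at 1.0.
    increments = steps_since_start // every_n_steps
    return min(1.0, initial_ratio + ratio_step * increments)

def resize_latent(latent, full_h, full_w, ratio):
    # latent: (batch, channels, height, width); resize toward the
    # full-resolution latent size by the current ratio.
    size = (int(full_h * ratio), int(full_w * ratio))
    return F.interpolate(latent, size=size, mode="bicubic")
```

With the defaults, sampling runs at half the latent size until the timestep drops below 750, after which the latent is upscaled every 3 steps through the ratios 0.625, 0.75, 0.875, and 1.0.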

---

__SDXL is now supported. The sdxl branch has been merged into the main branch. If you update the repository, please follow the upgrade instructions. Also, the version of accelerate has been updated, so please run `accelerate config` again.__ The documentation for SDXL training is [here](./README.md#sdxl-training).

This repository contains training, generation and utility scripts for Stable Diffusion.
2 changes: 1 addition & 1 deletion fine_tune.py
@@ -370,7 +370,7 @@ def fn_recursive_set_mem_eff(module: torch.nn.Module):
loss = loss.mean([1, 2, 3])

if args.min_snr_gamma:
- loss = apply_snr_weight(loss, timesteps, noise_scheduler, args.min_snr_gamma)
+ loss = apply_snr_weight(loss, timesteps, noise_scheduler, args.min_snr_gamma, args.v_parameterization)
if args.scale_v_pred_loss_like_noise_pred:
loss = scale_v_prediction_loss_like_noise_prediction(loss, timesteps, noise_scheduler)
if args.debiased_estimation_loss:
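
For context, a minimal sketch of the weighting this call now applies. This is illustrative, not the repository's exact code: the signature is simplified, and `all_snr` is assumed to be a tensor of per-timestep SNR values precomputed from the scheduler.

```python
import torch

def apply_snr_weight(loss, timesteps, all_snr, gamma, v_parameterization=False):
    snr = all_snr[timesteps]  # per-sample SNR, shape (batch,)
    min_snr = torch.minimum(snr, torch.full_like(snr, gamma))
    if v_parameterization:
        # Divide by (snr + 1) in a single step. The old two-step form,
        # (min_snr / snr) * (snr / (snr + 1)), divides by zero when
        # --zero_terminal_snr forces snr == 0 at the terminal timestep.
        weight = min_snr / (snr + 1)
    else:
        weight = min_snr / snr
    return loss * weight
```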
24 changes: 16 additions & 8 deletions finetune/tag_images_by_wd14_tagger.py
@@ -160,7 +160,9 @@ def main(args):

tag_freq = {}

- undesired_tags = set(args.undesired_tags.split(","))
+ caption_separator = args.caption_separator
+ stripped_caption_separator = caption_separator.strip()
+ undesired_tags = set(args.undesired_tags.split(stripped_caption_separator))

def run_batch(path_imgs):
imgs = np.array([im for _, im in path_imgs])
@@ -194,7 +196,7 @@ def run_batch(path_imgs):

if tag_name not in undesired_tags:
tag_freq[tag_name] = tag_freq.get(tag_name, 0) + 1
- general_tag_text += ", " + tag_name
+ general_tag_text += caption_separator + tag_name
combined_tags.append(tag_name)
elif i >= len(general_tags) and p >= args.character_threshold:
tag_name = character_tags[i - len(general_tags)]
@@ -203,18 +205,18 @@

if tag_name not in undesired_tags:
tag_freq[tag_name] = tag_freq.get(tag_name, 0) + 1
- character_tag_text += ", " + tag_name
+ character_tag_text += caption_separator + tag_name
combined_tags.append(tag_name)

# remove the leading separator
if len(general_tag_text) > 0:
- general_tag_text = general_tag_text[2:]
+ general_tag_text = general_tag_text[len(caption_separator) :]
if len(character_tag_text) > 0:
- character_tag_text = character_tag_text[2:]
+ character_tag_text = character_tag_text[len(caption_separator) :]

caption_file = os.path.splitext(image_path)[0] + args.caption_extension

tag_text = ", ".join(combined_tags)
tag_text = caption_separator.join(combined_tags)

if args.append_tags:
# Check if file exists
Expand All @@ -224,13 +226,13 @@ def run_batch(path_imgs):
existing_content = f.read().strip("\n") # Remove newlines

# Split the content into tags and store them in a list
- existing_tags = [tag.strip() for tag in existing_content.split(",") if tag.strip()]
+ existing_tags = [tag.strip() for tag in existing_content.split(stripped_caption_separator) if tag.strip()]

# Check and remove repeating tags in tag_text
new_tags = [tag for tag in combined_tags if tag not in existing_tags]

# Create new tag_text
tag_text = ", ".join(existing_tags + new_tags)
tag_text = caption_separator.join(existing_tags + new_tags)

with open(caption_file, "wt", encoding="utf-8") as f:
f.write(tag_text + "\n")
@@ -350,6 +352,12 @@ def setup_parser() -> argparse.ArgumentParser:
parser.add_argument("--frequency_tags", action="store_true", help="Show frequency of tags for images / 画像ごとのタグの出現頻度を表示する")
parser.add_argument("--onnx", action="store_true", help="use onnx model for inference / onnxモデルを推論に使用する")
parser.add_argument("--append_tags", action="store_true", help="Append captions instead of overwriting / 上書きではなくキャプションを追記する")
+ parser.add_argument(
+     "--caption_separator",
+     type=str,
+     default=", ",
+     help="Separator for captions, include space if needed / キャプションの区切り文字、必要ならスペースを含めてください",
+ )

return parser

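
As a standalone illustration of the separator handling in the diff above (not the tagger's code itself): tags are joined with the separator exactly as given, while splitting uses the stripped separator so existing captions match whether or not they include the surrounding space.

```python
def join_tags(tags, caption_separator=", "):
    return caption_separator.join(tags)

def split_tags(caption, caption_separator=", "):
    # Split on the stripped separator so "1girl, solo" and "1girl,solo"
    # both parse, then strip whitespace from each tag.
    sep = caption_separator.strip()
    return [t.strip() for t in caption.split(sep) if t.strip()]

print(join_tags(["1girl", "solo"]))  # 1girl, solo
print(split_tags("1girl,solo"))      # ['1girl', 'solo']
```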
