forked from kohya-ss/sd-scripts
Merge pull request kohya-ss#1339 from kohya-ss/alpha-masked-loss
Alpha masked loss
Showing 15 changed files with 313 additions and 40 deletions.
@@ -0,0 +1,57 @@

## About Masked Loss

Masked loss is a feature that trains only part of an image by computing the loss only over the region specified by the input image's mask. For example, when training a character, you can mask just the character so that the background is ignored during training.

There are two ways to specify the mask for masked loss:

- Using a mask image
- Using the transparency (alpha channel) of the image

The samples use the "AI image model training data" from [ずんずんPJイラスト/3Dデータ (ZunZunPJ Illustration/3D Data)](https://zunko.jp/con_illust.html).

### Using a mask image

In this method, you prepare a mask image for each training image. Give each mask image the same file name as its training image, and save the masks in a directory separate from the training images.

- Training image
![image](https://github.com/kohya-ss/sd-scripts/assets/52813779/607c5116-5f62-47de-8b66-9c4a597f0441)
- Mask image
![image](https://github.com/kohya-ss/sd-scripts/assets/52813779/53e9b0f8-a4bf-49ed-882d-4026f84e8450)

```toml
[[datasets.subsets]]
image_dir = "/path/to/a_zundamon"
caption_extension = ".txt"
conditioning_data_dir = "/path/to/a_zundamon_mask"
num_repeats = 8
```

The mask image must be the same size as the training image, with the region to train drawn in white and the region to ignore drawn in black. Grayscale is also supported: a value of 127 gives a loss weight of about 0.5. Strictly speaking, only the R channel of the mask image is used.

Use a DreamBooth-style dataset and save the mask images in the directory specified by `conditioning_data_dir`. This is the same layout as a ControlNet dataset, so see [ControlNet-LLLite](train_lllite_README-ja.md#データセットの準備) for details.

### Using transparency (alpha channel)

The transparency (alpha channel) of the training image is used as the mask. Pixels with alpha 0 are ignored, and pixels with alpha 255 are trained. For semi-transparent pixels, the loss weight follows the alpha value (127 gives a weight of roughly 0.5).

![image](https://github.com/kohya-ss/sd-scripts/assets/52813779/0baa129b-446a-4aac-b98c-7208efb0e75e)

※Each image is a transparent PNG.

Specify `--alpha_mask` in the training script options, or set `alpha_mask` in a subset of the dataset configuration file. For example:

```toml
[[datasets.subsets]]
image_dir = "/path/to/image/dir"
caption_extension = ".txt"
num_repeats = 8
alpha_mask = true
```

## Notes on training

- Currently, only DreamBooth-style datasets are supported.
- The mask is downscaled to the size of the latents (1/8 of the image width and height) before it is applied. As a result, fine details (such as ahoge or earrings) may not be learned well; slightly dilating the mask may help.
- When using masked loss, it may not be necessary to mention the untrained regions in the caption. (To be verified.)
- With `alpha_mask`, the latents cache is automatically regenerated when the mask is switched on or off.
@@ -0,0 +1,56 @@

## Masked Loss

Masked loss is a feature that trains only part of an image by computing the loss only over the region specified by the input image's mask. For example, to train a character, you can mask just the character so that the background is ignored during training.

There are two ways to specify the mask for masked loss:

- Using a mask image
- Using the transparency (alpha channel) of the image

The samples use the "AI image model training data" from [ZunZunPJ Illustration/3D Data](https://zunko.jp/con_illust.html).
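As a rough illustration of what masking the loss means, here is a minimal pure-Python sketch of a mask-weighted mean squared error. It is not the sd-scripts implementation: the function name `masked_mse`, the use of small 2-D lists instead of latent tensors, and averaging over every element are assumptions made for the example.

```python
# Minimal sketch of a mask-weighted MSE (hypothetical helper, not sd-scripts code).
# Assumption: pred, target, and mask are same-sized 2-D lists; mask values are weights in [0, 1].


def masked_mse(pred, target, mask):
    """Average of squared errors, each multiplied by the mask weight at that position."""
    assert len(pred) == len(target) == len(mask)
    total = 0.0
    count = 0
    for pred_row, target_row, mask_row in zip(pred, target, mask):
        assert len(pred_row) == len(target_row) == len(mask_row)
        for p, t, w in zip(pred_row, target_row, mask_row):
            total += w * (p - t) ** 2  # weight 0 removes this position's error entirely
            count += 1
    # Assumption: divide by the element count, so masked-out positions stay in the denominator.
    return total / count


pred = [[1.0, 1.0], [1.0, 1.0]]
target = [[0.0, 0.0], [0.0, 0.0]]
mask = [[1.0, 0.0], [1.0, 0.0]]  # left column: character (trained), right column: background (ignored)
print(masked_mse(pred, target, mask))  # 0.5

pred_bg_changed = [[1.0, 9.0], [1.0, 9.0]]  # only the background prediction changes
print(masked_mse(pred_bg_changed, target, mask))  # still 0.5
```

Because the right column has weight 0, changing the prediction there does not move the loss at all. Averaging over all elements means a smaller mask also lowers the overall loss value; dividing by the sum of the mask weights instead would keep the scale independent of mask area. This sketch does not establish which normalization the scripts actually use.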
### Using a mask image

In this method, you prepare a mask image for each training image. Give each mask image the same file name as its training image, and save the masks in a directory separate from the training images.

- Training image
![image](https://github.com/kohya-ss/sd-scripts/assets/52813779/607c5116-5f62-47de-8b66-9c4a597f0441)
- Mask image
![image](https://github.com/kohya-ss/sd-scripts/assets/52813779/53e9b0f8-a4bf-49ed-882d-4026f84e8450)

```toml
[[datasets.subsets]]
image_dir = "/path/to/a_zundamon"
caption_extension = ".txt"
conditioning_data_dir = "/path/to/a_zundamon_mask"
num_repeats = 8
```

The mask image must be the same size as the training image, with the region to train drawn in white and the region to ignore drawn in black. Grayscale is also supported: a value of 127 gives a loss weight of about 0.5. Strictly speaking, only the R channel of the mask image is used.

Use a DreamBooth-style dataset and save the mask images in the directory specified by `conditioning_data_dir`. This is the same layout as a ControlNet dataset, so see [ControlNet-LLLite](train_lllite_README.md#Preparing-the-dataset) for details.
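The mask-image method involves two mechanical steps: finding the mask that belongs to each training image, and turning a mask pixel into a loss weight. The sketch below shows both under stated assumptions. It is not the sd-scripts loader: the helper names are hypothetical, matching by file stem (name without extension) is an assumption since the text only says "the same file name", and the set of image extensions is illustrative.

```python
# Sketch only: hypothetical helpers, not sd-scripts' dataset loader.
from pathlib import Path

# Assumption: illustrative list of image extensions.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def pair_images_with_masks(image_dir, mask_dir):
    """Return (image_path, mask_path) pairs, matching by file stem; raise if a mask is missing."""
    masks = {
        p.stem: p
        for p in Path(mask_dir).iterdir()
        if p.suffix.lower() in IMAGE_EXTENSIONS
    }
    pairs = []
    for image_path in sorted(Path(image_dir).iterdir()):
        if image_path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue  # skip captions (.txt) and other non-image files
        mask_path = masks.get(image_path.stem)
        if mask_path is None:
            raise FileNotFoundError(f"no mask for {image_path.name} in {mask_dir}")
        pairs.append((image_path, mask_path))
    return pairs


def weight_from_mask_pixel(rgb):
    """Only the R channel is read: 255 -> 1.0 (train), 0 -> 0.0 (ignore), 127 -> about 0.498."""
    red = rgb[0]
    return red / 255.0
```

With the paths from the TOML example, `pair_images_with_masks("/path/to/a_zundamon", "/path/to/a_zundamon_mask")` would return one pair per training image. Raising on a missing mask is a deliberate choice in this sketch, so that no image is silently trained without its mask.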
### Using transparency (alpha channel) of the image

The transparency (alpha channel) of the training image is used as the mask. Pixels with alpha 0 are ignored, and pixels with alpha 255 are trained. For semi-transparent pixels, the loss weight follows the alpha value (127 gives a weight of about 0.5).

![image](https://github.com/kohya-ss/sd-scripts/assets/52813779/0baa129b-446a-4aac-b98c-7208efb0e75e)

※Each image is a transparent PNG.

Specify `--alpha_mask` in the training script options, or set `alpha_mask` in a subset of the dataset configuration file. For example:

```toml
[[datasets.subsets]]
image_dir = "/path/to/image/dir"
caption_extension = ".txt"
num_repeats = 8
alpha_mask = true
```
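For the alpha-channel method, each RGBA pixel effectively splits into the color that is trained on and a loss weight taken from alpha. The pure-Python sketch below shows that split on a small grid of 8-bit pixels. `split_rgba` is a hypothetical helper, and keeping the RGB values of transparent pixels unchanged is an assumption, not something the text specifies.

```python
# Sketch only: hypothetical helper, not an sd-scripts function.


def split_rgba(rows):
    """Split rows of 8-bit (r, g, b, a) tuples into RGB rows and per-pixel loss weights."""
    rgb_rows = []
    weight_rows = []
    for row in rows:
        # Assumption: color values are kept as-is, even where alpha is 0.
        rgb_rows.append([(r, g, b) for r, g, b, _ in row])
        # Alpha 255 -> weight 1.0 (trained), alpha 0 -> 0.0 (ignored), 127 -> about 0.498.
        weight_rows.append([a / 255.0 for _, _, _, a in row])
    return rgb_rows, weight_rows


rgb, weights = split_rgba([[(10, 20, 30, 255), (0, 0, 0, 0), (5, 5, 5, 127)]])
print(rgb)  # [[(10, 20, 30), (0, 0, 0), (5, 5, 5)]]
```

Both mask sources produce the same kind of per-pixel weight in [0, 1]; they differ only in where the weight is read from (the R channel of a separate file versus the alpha channel of the training image itself).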
## Notes on training

- Currently, only DreamBooth-style datasets are supported.
- The mask is downscaled to the size of the latents (1/8 of the image width and height) before it is applied. As a result, fine details (such as ahoge or earrings) may not be learned well; slightly dilating the mask may help.
- When using masked loss, it may not be necessary to mention the untrained regions in the caption. (To be verified.)
- With `alpha_mask`, the latents cache is automatically regenerated when the mask is switched on or off.
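The note about fine details can be checked with a small calculation. This sketch shrinks a mask by 8 in each dimension by averaging 8×8 blocks, then repeats the shrink after a 1-pixel dilation. Block averaging and the square max-filter dilation are assumptions chosen for illustration; `downscale_area` and `dilate` are hypothetical helpers, not part of sd-scripts.

```python
# Sketch only: hypothetical helpers illustrating mask downscaling and dilation.


def dilate(mask, radius):
    """Square max filter: each pixel takes the largest value within `radius` pixels."""
    h, w = len(mask), len(mask[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = max(
                mask[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            )
    return out


def downscale_area(mask, factor=8):
    """Average each factor x factor block; height and width must be divisible by factor."""
    h, w = len(mask), len(mask[0])
    assert h % factor == 0 and w % factor == 0
    return [
        [
            sum(mask[y * factor + dy][x * factor + dx] for dy in range(factor) for dx in range(factor))
            / (factor * factor)
            for x in range(w // factor)
        ]
        for y in range(h // factor)
    ]


# A 16x16 mask containing only a 1-pixel-wide vertical line (e.g. a thin strand of hair).
thin = [[1.0 if x == 3 else 0.0 for x in range(16)] for _ in range(16)]
print(downscale_area(thin))             # [[0.125, 0.0], [0.125, 0.0]]
print(downscale_area(dilate(thin, 1)))  # [[0.375, 0.0], [0.375, 0.0]]
```

At latent resolution the 1-pixel line keeps only 1/8 of its weight, while dilating it to 3 pixels raises that to 3/8. The trade-off is that dilation also pulls a thin band of background into the trained region.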