merge #1132

Merged · 33 commits · Nov 10, 2024
Commits
21f1499
Fix multi-caption parquets crashing in multiple locations (Closes #1092)
Oct 30, 2024
48cfc09
sd3: add skip layer guidance
Nov 7, 2024
f0aa07e
Add WSL support
Nov 7, 2024
d664101
Updated the docker installation guide
Nov 7, 2024
810c58e
fixing a typo
Nov 7, 2024
716e669
Update INSTALL.md
bghira Nov 7, 2024
48dd672
Update documentation/DOCKER.md
bghira Nov 7, 2024
c3e67e7
Update documentation/DOCKER.md
bghira Nov 7, 2024
dc21d1b
Update documentation/DOCKER.md
bghira Nov 7, 2024
3eafaea
Update documentation/DOCKER.md
bghira Nov 7, 2024
fff93bd
Merge pull request #1126 from Putzzmunta/main
bghira Nov 7, 2024
33ad3ca
Merge pull request #1109 from AmericanPresidentJimmyCarter/fix-multic…
bghira Nov 7, 2024
2167674
add custom sd3 transformer modeling code
Nov 7, 2024
c902580
add custom sd3 transformer modeling code (fix)
Nov 7, 2024
e1b0c4f
use str type and load layers
Nov 7, 2024
0695db4
add more configuration values for SLG
Nov 7, 2024
0c7dcd5
sd3: add skip layer guidance to the quickstart
Nov 7, 2024
e4a3a34
sd3: add skip layer guidance to the quickstart (typo)
Nov 7, 2024
0dab649
sd3: fix typo reference to validation args
Nov 7, 2024
1614839
update args for sd3 pipeline
Nov 7, 2024
8b86951
sd3: do not cast inputs for quanto compat
Nov 7, 2024
40907f5
sd3: add shift value of 1 suggestion to quickstart
Nov 8, 2024
e33d588
sd3: update SLG guidance doc
Nov 8, 2024
bf6c23a
Merge pull request #1125 from bghira/feature/sd3-skip-layer
bghira Nov 8, 2024
25cad1c
sd3: fix cpu / gpu location mismatch and dtype mismatch for quanto
Nov 8, 2024
c4fef7e
flux and sd3 could use uniform sampling instead of beta or sigmoid
Nov 8, 2024
e4e5097
sd3: model card detail expansion
Nov 8, 2024
73c3441
remove boilerplate template text
Nov 8, 2024
0b34f24
Merge pull request #1130 from bghira/feature/sd3-model-card-details
bghira Nov 8, 2024
7146c5e
Merge pull request #1129 from bghira/feature/flow-matching-uniform-sa…
bghira Nov 9, 2024
115ee0b
Revert early return in setup_pipeline back to a break. This fixes ran…
mhirki Nov 9, 2024
648c4ad
Apply suggested changes proposed by bghira.
mhirki Nov 10, 2024
3293fa0
Merge pull request #1131 from mhirki/fix-random-validation-errors-for…
bghira Nov 10, 2024
6 changes: 6 additions & 0 deletions Dockerfile
@@ -11,6 +11,12 @@ RUN apt-get update -y
# on user input during build
ENV DEBIAN_FRONTEND noninteractive

# Install graphics library dependencies
RUN apt-get install -y libgl1-mesa-glx
RUN apt-get install -y ffmpeg \
    libsm6 \
    libxext6

# Install misc unix libraries
RUN apt-get install -y --no-install-recommends openssh-server \
openssh-client \
2 changes: 2 additions & 0 deletions INSTALL.md
@@ -4,6 +4,8 @@ For users that wish to make use of Docker or another container orchestration pla

### Installation

For users operating on Windows 10 or newer, an installation guide based on Docker and WSL is available in [this document](/documentation/DOCKER.md).

Clone the SimpleTuner repository and set up the python venv:

```bash
43 changes: 43 additions & 0 deletions documentation/DOCKER.md
@@ -11,6 +11,11 @@ This Docker configuration provides a comprehensive environment for running the S

## Getting Started

### Windows OS support via WSL (Experimental)

The following guide was tested in a WSL2 distro with Docker Engine installed.

### 1. Building the Container

Clone the repository and navigate to the directory containing the Dockerfile. Build the Docker image using:
@@ -68,6 +73,44 @@ If you want to add custom startup scripts or modify configurations, extend the e

If any capabilities cannot be achieved through this setup, please open a new issue.

### Docker Compose

For users who prefer `docker-compose.yaml`, this template is provided for you to extend and customise for your needs.

Once the stack is deployed, you can connect to the container and work inside it as described in the steps above.

```bash
docker compose up -d

docker exec -it simpletuner /bin/bash
```

```yaml
services:
  simpletuner:
    container_name: simpletuner
    build:
      context: [Path to the repository]/SimpleTuner
      dockerfile: Dockerfile
    ports:
      - "[port to connect to the container]:22"
    volumes:
      - "[path to your datasets]:/datasets"
      - "[path to your configs]:/workspace/SimpleTuner/config"
    environment:
      HUGGING_FACE_HUB_TOKEN: [your hugging face token]
      WANDB_TOKEN: [your WandB token]
    command: ["tail", "-f", "/dev/null"]
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

> ⚠️ Handle your WandB and Hugging Face tokens with care! It is advised not to commit them to version control, even in a private repository, to ensure they are not leaked. For production use-cases, a key-management service is recommended, though that is out of scope for this guide.

---

## Troubleshooting
35 changes: 32 additions & 3 deletions documentation/quickstart/SD3.md
@@ -264,6 +264,33 @@ For more information, see the [dataloader](/documentation/DATALOADER.md) and [tu

## Notes & troubleshooting tips

### Skip-layer guidance (SD3.5 Medium)

StabilityAI recommends enabling skip-layer guidance (SLG) for SD 3.5 Medium inference. This does not impact training results, only the quality of validation samples.

The following values are recommended for `config.json`:

```json
{
"--validation_guidance_skip_layers": [7, 8, 9],
"--validation_guidance_skip_layers_start": 0.01,
"--validation_guidance_skip_layers_stop": 0.2,
"--validation_guidance_skip_scale": 2.8,
"--validation_guidance": 4.0
}
```

- `..skip_scale` determines how much to scale the positive prompt prediction during skip-layer guidance. The default value of 2.8 is safe for the base model's skipped layers of `7, 8, 9`, but it will need to be increased when more layers are skipped, roughly doubling for each additional layer.
- `..skip_layers` specifies which layers to skip during the negative prompt prediction.
- `..skip_layers_start` determines the fraction of the inference steps at which skip-layer guidance begins to be applied.
- `..skip_layers_stop` sets the fraction of the inference steps after which SLG is no longer applied.

SLG can be applied for fewer steps to weaken its effect or to reduce its impact on inference speed.

Extensive training of a LoRA or LyCORIS model appears to require adjusting these values, though it is not yet clear exactly how they should change.

**A lower CFG value must be used during inference.**
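As a rough sketch of how these values interact (an illustrative reconstruction, not SimpleTuner's or diffusers' actual implementation), skip-layer guidance adds a third model prediction — one computed with the configured transformer layers skipped — on top of ordinary classifier-free guidance, and only within the configured window of inference steps:

```python
def slg_combine(uncond, cond, skip_pred, guidance_scale, skip_scale,
                step, total_steps, start=0.01, stop=0.2):
    """Combine predictions with CFG plus skip-layer guidance (illustrative).

    `skip_pred` stands for the conditional prediction recomputed with the
    configured layers skipped. SLG is only applied while the current step
    falls inside the [start, stop] fraction of the inference run.
    """
    pred = uncond + guidance_scale * (cond - uncond)  # ordinary CFG
    frac = step / total_steps
    if start <= frac <= stop:
        pred += skip_scale * (cond - skip_pred)
    return pred

# Step 1 of 28 (~0.036 of the run) falls inside [0.01, 0.2], so SLG applies.
early = slg_combine(0.5, 1.0, 0.8, guidance_scale=4.0, skip_scale=2.8,
                    step=1, total_steps=28)
# Step 20 of 28 (~0.71) is past the stop fraction, so plain CFG is used.
late = slg_combine(0.5, 1.0, 0.8, guidance_scale=4.0, skip_scale=2.8,
                   step=20, total_steps=28)
```

This also makes clear why the scale must grow with the number of skipped layers: skipping more layers shrinks the gap between `cond` and `skip_pred`, so the correction term needs a larger multiplier to have the same effect.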

### Model instability

The SD 3.5 Large 8B model has potential instabilities during training:
@@ -288,12 +315,14 @@ Some changes were made to SimpleTuner's SD3.5 support:
#### Stable configuration values

These options have been known to keep SD3.5 intact for as long as possible:
- optimizer=adamw_bf16
- flux_schedule_shift=1
- learning_rate=1e-4
- batch_size=4 * 3 GPUs
- max_grad_norm=0.1
- base_model_precision=int8-quanto
- No loss masking or dataset regularisation, as their contribution to this instability is unknown
- `validation_guidance_skip_layers=[7,8,9]`

### Lowest VRAM config

52 changes: 52 additions & 0 deletions helpers/configuration/cmd_args.py
@@ -6,6 +6,7 @@
from typing import Dict, List, Optional, Tuple
import random
import time
import json
import logging
import sys
import torch
@@ -148,6 +149,15 @@ def get_argument_parser():
            " which has improved results in short experiments. Thanks to @mhirki for the contribution."
        ),
    )
    parser.add_argument(
        "--flux_use_uniform_schedule",
        action="store_true",
        help=(
            "Whether or not to use a uniform schedule with Flux instead of sigmoid."
            " Using uniform sampling may help preserve more capabilities from the base model."
            " Some tasks may not benefit from this."
        ),
    )
    parser.add_argument(
        "--flux_use_beta_schedule",
        action="store_true",
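The difference between the two timestep schedules this flag toggles can be sketched in isolation (a simplified illustration of sigmoid-of-normal versus uniform sampling, not SimpleTuner's exact code):

```python
import math
import random

def sample_timestep(rng, uniform=False):
    """Draw one flow-matching timestep in (0, 1).

    Sigmoid-of-normal sampling concentrates timesteps around 0.5, while
    uniform sampling weights the whole range equally.
    """
    if uniform:
        return rng.random()
    return 1.0 / (1.0 + math.exp(-rng.gauss(0.0, 1.0)))

rng = random.Random(42)
sigmoid_draws = [sample_timestep(rng) for _ in range(10_000)]
uniform_draws = [sample_timestep(rng, uniform=True) for _ in range(10_000)]

# Roughly 73% of sigmoid draws land in (0.25, 0.75), versus ~50% of uniform
# draws, which is why uniform sampling gives the extreme timesteps more weight.
frac_sigmoid_mid = sum(0.25 < t < 0.75 for t in sigmoid_draws) / 10_000
frac_uniform_mid = sum(0.25 < t < 0.75 for t in uniform_draws) / 10_000
```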
@@ -1350,6 +1360,37 @@ def get_argument_parser():
            " the default mode, provides the most benefit."
        ),
    )
    parser.add_argument(
        "--validation_guidance_skip_layers",
        type=str,
        default=None,
        help=(
            "StabilityAI recommends a value of [7, 8, 9] for Stable Diffusion 3.5 Medium."
        ),
    )
    parser.add_argument(
        "--validation_guidance_skip_layers_start",
        type=float,
        default=0.01,
        help=("StabilityAI recommends a value of 0.01 for SLG start."),
    )
    parser.add_argument(
        "--validation_guidance_skip_layers_stop",
        type=float,
        default=0.2,
        help=("StabilityAI recommends a value of 0.2 for SLG stop."),
    )
    parser.add_argument(
        "--validation_guidance_skip_scale",
        type=float,
        default=2.8,
        help=(
            "StabilityAI recommends a value of 2.8 for SLG guidance skip scaling."
            " When adding more layers, you must increase the scale, e.g. adding one more layer requires doubling"
            " the value given."
        ),
    )

    parser.add_argument(
        "--allow_tf32",
        action="store_true",
@@ -2391,4 +2432,15 @@ def parse_cmdline_args(input_args=None):
            f"Invalid gradient_accumulation_steps parameter: {args.gradient_accumulation_steps}, should be >= 1"
        )

    if args.validation_guidance_skip_layers is not None:
        try:
            args.validation_guidance_skip_layers = json.loads(
                args.validation_guidance_skip_layers
            )
        except Exception as e:
            logger.error(f"Could not load skip layers: {e}")
            raise

    return args
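Because the flag arrives as a string from the command line, `json.loads` decodes it into a Python list, which means the value must be valid JSON (brackets included). A quick sketch of the behaviour:

```python
import json

# The CLI flag is passed as a string; json.loads turns it into a list of ints.
skip_layers = json.loads("[7, 8, 9]")
print(skip_layers)  # → [7, 8, 9]

# A value without brackets is not valid JSON, so parsing raises an error,
# which the parser above logs before re-raising.
try:
    json.loads("7, 8, 9")
except json.JSONDecodeError as exc:
    print(f"rejected: {exc}")
```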
99 changes: 81 additions & 18 deletions helpers/data_backend/factory.py
@@ -24,6 +24,8 @@
from tqdm import tqdm
import queue
from math import sqrt
import pandas as pd
import numpy as np

logger = logging.getLogger("DataBackendFactory")
if should_log():
@@ -48,6 +50,68 @@ def info_log(message):
    logger.info(message)


def check_column_values(column_data, column_name, parquet_path, fallback_caption_column=False):
    # Determine if the column contains arrays or scalar values
    non_null_values = column_data.dropna()
    if non_null_values.empty:
        # All values are null
        raise ValueError(
            f"Parquet file {parquet_path} contains only null values in the '{column_name}' column."
        )

    first_non_null = non_null_values.iloc[0]
    if isinstance(first_non_null, (list, tuple, np.ndarray, pd.Series)):
        # Column contains arrays
        # Check for null arrays
        if column_data.isnull().any() and not fallback_caption_column:
            raise ValueError(
                f"Parquet file {parquet_path} contains null arrays in the '{column_name}' column."
            )

        # Check for empty arrays (checked on non-null values so len() is safe)
        empty_arrays = non_null_values.apply(lambda x: len(x) == 0)
        if empty_arrays.any() and not fallback_caption_column:
            raise ValueError(
                f"Parquet file {parquet_path} contains empty arrays in the '{column_name}' column."
            )

        # Check for null elements within arrays
        null_elements_in_arrays = non_null_values.apply(
            lambda arr: any(pd.isnull(s) for s in arr)
        )
        if null_elements_in_arrays.any() and not fallback_caption_column:
            raise ValueError(
                f"Parquet file {parquet_path} contains null values within arrays in the '{column_name}' column."
            )

        # Check for empty strings within arrays
        empty_strings_in_arrays = non_null_values.apply(
            lambda arr: any(s == "" for s in arr)
        )
        if empty_strings_in_arrays.all() and not fallback_caption_column:
            raise ValueError(
                f"Parquet file {parquet_path} contains only empty strings within arrays in the '{column_name}' column."
            )

    elif isinstance(first_non_null, str):
        # Column contains scalar strings
        # Check for null values
        if column_data.isnull().any() and not fallback_caption_column:
            raise ValueError(
                f"Parquet file {parquet_path} contains null values in the '{column_name}' column."
            )

        # Check for empty strings
        if (column_data == "").any() and not fallback_caption_column:
            raise ValueError(
                f"Parquet file {parquet_path} contains empty strings in the '{column_name}' column."
            )
    else:
        raise TypeError(
            f"Unsupported data type in column '{column_name}'. Expected strings or arrays of strings."
        )
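The key dispatch in the validator above — inspecting the first non-null entry to decide whether the parquet column holds arrays of captions or scalar strings — can be demonstrated standalone (a minimal sketch assuming pandas and numpy are available; `classify_caption_column` is an illustrative name, not part of the PR):

```python
import numpy as np
import pandas as pd

def classify_caption_column(column_data):
    """Decide whether a parquet column holds arrays of captions
    (multi-caption datasets) or scalar string captions."""
    non_null = column_data.dropna()
    if non_null.empty:
        raise ValueError("column contains only null values")
    first = non_null.iloc[0]
    if isinstance(first, (list, tuple, np.ndarray, pd.Series)):
        return "array"
    if isinstance(first, str):
        return "scalar"
    raise TypeError("expected strings or arrays of strings")

# A leading null does not confuse the check, because it inspects the
# first *non-null* value.
multi = pd.Series([None, ["a photo", "a picture"]])
single = pd.Series(["a photo", "a drawing"])
print(classify_caption_column(multi))   # → array
print(classify_caption_column(single))  # → scalar
```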


def init_backend_config(backend: dict, args: dict, accelerator) -> dict:
    output = {"id": backend["id"], "config": {}}
    if backend.get("dataset_type", None) == "text_embeds":
@@ -292,24 +356,23 @@ def configure_parquet_database(backend: dict, args, data_backend: BaseDataBacken
        raise ValueError(
            f"Parquet file {parquet_path} does not contain a column named '{filename_column}'."
        )

    # Apply the function to the caption_column.
    check_column_values(
        df[caption_column],
        caption_column,
        parquet_path,
        fallback_caption_column=fallback_caption_column,
    )

    # Apply the function to the filename_column.
    check_column_values(
        df[filename_column],
        filename_column,
        parquet_path,
        fallback_caption_column=False,  # Always check filename_column
    )

    # Store the database in StateTracker
    StateTracker.set_parquet_database(
        backend["id"],
12 changes: 7 additions & 5 deletions helpers/metadata/backends/parquet.py
@@ -150,19 +150,21 @@ def _extract_captions_to_fast_list(self):
            if len(caption_column) > 0:
                caption = [row[c] for c in caption_column]
            else:
                caption = row.get(caption_column)
            if isinstance(caption, (numpy.ndarray, pd.Series)):
                caption = [str(item) for item in caption if item is not None]

            if caption is None and fallback_caption_column:
                caption = row.get(fallback_caption_column, None)
            if caption is None or caption == "" or caption == []:
                raise ValueError(
                    f"Could not locate caption for image (unknown) in sampler_backend {self.id} with filename column {filename_column}, caption column {caption_column}, and a parquet database with {len(self.parquet_database)} entries."
                )
            if type(caption) == bytes:
                caption = caption.decode("utf-8")
            elif type(caption) == list:
                caption = [c.strip() for c in caption if c.strip()]
            elif type(caption) == str:
                caption = caption.strip()
            captions[filename] = caption
        return captions
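The caption clean-up branches in this hunk can be illustrated in isolation (a standalone sketch mirroring the bytes/list/str handling; `normalize_caption` is a hypothetical helper name, not the actual class method):

```python
def normalize_caption(caption):
    """Mirror the caption clean-up branches: decode bytes, strip list
    entries and drop empties, or strip a plain string."""
    if isinstance(caption, bytes):
        return caption.decode("utf-8")
    if isinstance(caption, list):
        return [c.strip() for c in caption if c.strip()]
    if isinstance(caption, str):
        return caption.strip()
    return caption

print(normalize_caption(b"a photo"))               # → a photo
print(normalize_caption(["  a photo ", "", "b"]))  # → ['a photo', 'b']
print(normalize_caption("  a drawing  "))          # → a drawing
```

The `elif` chain in the PR matters here: a multi-caption list must not fall through to the string branch, which is exactly what the replaced `if caption:` check risked.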