[24.0.6] Train toml config seed type error #2370

Closed
hinablue opened this issue Apr 23, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@hinablue
Contributor

Training fails with an error on start. I checked tmpfilelora.toml and found that the seed is written as a float, not an int.

Changing this line fixes it:

"seed": int(seed) if seed != 0 else None,  # Force the seed to int
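
A minimal sketch of what this one-line change does, assuming the GUI passes the seed along as a float (for example from a numeric input field). The helper name `seed_or_none` is hypothetical and not part of the project:

```python
def seed_or_none(seed):
    """Return the seed as an int, or None when the seed is 0 (treated as unset)."""
    # Hypothetical helper; mirrors `int(seed) if seed != 0 else None`.
    return int(seed) if seed != 0 else None


print(seed_or_none(1234.0))  # 1234
print(seed_or_none(0))       # None
```

With the cast in place the value is an int before it is written out, so the TOML file should carry an integer rather than a float.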

Full error traceback:

services-kohya-1  |   torch.utils._pytree._register_pytree_node(
services-kohya-1  | 2024-04-23 03:57:31 INFO     Loading settings from            train_util.py:3744
services-kohya-1  |                              ./outputs/tmpfilelora.toml...
services-kohya-1  |                     INFO     ./outputs/tmpfilelora            train_util.py:3763
services-kohya-1  | Traceback (most recent call last):
services-kohya-1  |   File "_mt19937.pyx", line 180, in numpy.random._mt19937.MT19937._legacy_seeding
services-kohya-1  | TypeError: 'float' object cannot be interpreted as an integer
services-kohya-1  |
services-kohya-1  | During handling of the above exception, another exception occurred:
services-kohya-1  |
services-kohya-1  | Traceback (most recent call last):
services-kohya-1  |   File "/app/sd-scripts/sdxl_train_network.py", line 185, in <module>
services-kohya-1  |     trainer.train(args)
services-kohya-1  |   File "/app/sd-scripts/train_network.py", line 151, in train
services-kohya-1  |     set_seed(args.seed)
services-kohya-1  |   File "/app/venv/lib/python3.10/site-packages/accelerate/utils/random.py", line 44, in set_seed
services-kohya-1  |     np.random.seed(seed)
services-kohya-1  |   File "numpy/random/mtrand.pyx", line 4806, in numpy.random.mtrand.seed
services-kohya-1  |   File "numpy/random/mtrand.pyx", line 250, in numpy.random.mtrand.RandomState.seed
services-kohya-1  |   File "_mt19937.pyx", line 168, in numpy.random._mt19937.MT19937._legacy_seeding
services-kohya-1  |   File "_mt19937.pyx", line 188, in numpy.random._mt19937.MT19937._legacy_seeding
services-kohya-1  | TypeError: Cannot cast scalar from dtype('float64') to dtype('int64') according to the rule 'safe'
services-kohya-1  | Traceback (most recent call last):
services-kohya-1  |   File "/app/venv/bin/accelerate", line 8, in <module>
services-kohya-1  |     sys.exit(main())
services-kohya-1  |   File "/app/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
services-kohya-1  |     args.func(args)
services-kohya-1  |   File "/app/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1017, in launch_command
services-kohya-1  |     simple_launcher(args)
services-kohya-1  |   File "/app/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 637, in simple_launcher
services-kohya-1  |     raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
services-kohya-1  | subprocess.CalledProcessError: Command '['/app/venv/bin/python3', '/app/sd-scripts/sdxl_train_network.py', '--config_file', './outputs/tmpfilelora.toml', '--log_prefix=xl-lora']' returned non-zero exit status 1.
services-kohya-1  | 03:57:33-429097 INFO     Training has ended.
@hinablue
Contributor Author

These also need to be int:

"caption_dropout_every_n_epochs": int(caption_dropout_every_n_epochs),

"lr_scheduler_num_cycles": (
    int(lr_scheduler_num_cycles) if lr_scheduler_num_cycles != "" else int(epoch)
),

"max_train_epochs": int(max_train_epochs) if max_train_epochs != 0 else None,

"max_train_steps": int(max_train_steps) if max_train_steps != 0 else None,

"persistent_data_loader_workers": int(persistent_data_loader_workers),


config_toml_data["max_data_loader_n_workers"] = int(max_data_loader_n_workers)
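
The same cast could also be applied in one pass instead of field by field. This is only a rough sketch, assuming the config is a plain dict before it is serialized to TOML; `INT_KEYS` and `coerce_int_fields` are hypothetical names, the key list covers only the fields mentioned in this issue, and the `!= 0 else None` handling shown above is left out:

```python
# Hypothetical list of config keys that must be integers (from this issue only).
INT_KEYS = (
    "seed",
    "caption_dropout_every_n_epochs",
    "lr_scheduler_num_cycles",
    "max_train_epochs",
    "max_train_steps",
    "persistent_data_loader_workers",
    "max_data_loader_n_workers",
)


def coerce_int_fields(config, keys=INT_KEYS):
    """Return a copy of config with whole-number floats under `keys` cast to int."""
    result = dict(config)
    for key in keys:
        value = result.get(key)
        # Only touch floats that hold a whole number; None, str, bool stay as-is.
        if isinstance(value, float) and value.is_integer():
            result[key] = int(value)
    return result


config = {"seed": 1234.0, "max_train_steps": 12000.0, "learning_rate": 1e-4}
print(coerce_int_fields(config))
```

Keys outside the list, such as `learning_rate` in the example, are left untouched.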

Error traceback:

services-kohya-1  | Using decoupled weight decay
services-kohya-1  | enable fp8 training.
services-kohya-1  | running training / 学習開始
services-kohya-1  |   num train images * repeats / 学習画像の数×繰り返し回数: 1650
services-kohya-1  |   num reg images / 正則化画像の数: 1400
services-kohya-1  |   num batches per epoch / 1epochのバッチ数: 839
services-kohya-1  |   num epochs / epoch数: 15
services-kohya-1  |   batch size per device / バッチサイズ: 4
services-kohya-1  |   gradient accumulation steps / 勾配を合計するステップ数 = 1
services-kohya-1  |   total optimization steps / 学習ステップ数: 12000.0
services-kohya-1  | Traceback (most recent call last):
services-kohya-1  |   File "/app/sd-scripts/sdxl_train_network.py", line 185, in <module>
services-kohya-1  |     trainer.train(args)
services-kohya-1  |   File "/app/sd-scripts/train_network.py", line 739, in train
services-kohya-1  |     progress_bar = tqdm(range(args.max_train_steps), smoothing=0, disable=not accelerator.is_local_main_process, desc="steps")
services-kohya-1  | TypeError: 'float' object cannot be interpreted as an integer
services-kohya-1  | Traceback (most recent call last):
services-kohya-1  |   File "/app/venv/bin/accelerate", line 8, in <module>
services-kohya-1  |     sys.exit(main())
services-kohya-1  |   File "/app/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
services-kohya-1  |     args.func(args)
services-kohya-1  |   File "/app/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1017, in launch_command
services-kohya-1  |     simple_launcher(args)
services-kohya-1  |   File "/app/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 637, in simple_launcher
services-kohya-1  |     raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
services-kohya-1  | subprocess.CalledProcessError: Command '['/app/venv/bin/python3', '/app/sd-scripts/sdxl_train_network.py', '--config_file', './outputs/tmpfilelora.toml', '--log_prefix=xl-locon']' returned non-zero exit status 1.
services-kohya-1  | 04:14:24-424949 INFO     Training has ended.
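
The second failure can be reproduced without the training scripts, because `range()` rejects floats even when they hold a whole number. A small sketch; the value 12000.0 is taken from the "total optimization steps" line in the log above:

```python
max_train_steps = 12000.0  # float, as it would be read back from the generated TOML

try:
    range(max_train_steps)
except TypeError as exc:
    message = str(exc)
    print(message)  # 'float' object cannot be interpreted as an integer

# Casting to int first avoids the error.
steps = range(int(max_train_steps))
print(len(steps))  # 12000
```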

@bmaltais bmaltais added the enhancement New feature or request label Apr 23, 2024
@bmaltais
Owner

Fixed in latest dev commit.

bmaltais added a commit that referenced this issue Apr 25, 2024
…ed-type-error

Fix [24.0.6] Train toml config seed type error #2370
@bmaltais bmaltais mentioned this issue Apr 25, 2024