Add back accelerate compatibility by a-r-r-o-w · Pull Request #339 · huggingface/finetrainers

a-r-r-o-w · 2025-03-19T08:40:04Z

A few PRs ago (don't recall which exactly), I had to remove accelerate compatibility for a feature. This PR adds accelerate compatibility again and does a major refactor.

neph1 · 2025-03-20T17:51:49Z

I've only tested this with the UI so far, so it might be something missing on my side, or a config issue. But I'm still using "pretrained_model_name_or_path", which I think might be relevant to this:

  File "/finetrainers/train.py", line 70, in main
    trainer.run()
  File "/finetrainers/finetrainers/trainer/sft_trainer/trainer.py", line 97, in run
    raise e
  File "/finetrainers/finetrainers/trainer/sft_trainer/trainer.py", line 92, in run
    self._train()
  File "/finetrainers/finetrainers/trainer/sft_trainer/trainer.py", line 420, in _train
    precomputed_condition_iterator, precomputed_latent_iterator = self._prepare_data(
  File "/finetrainers/finetrainers/trainer/sft_trainer/trainer.py", line 938, in _prepare_data
    condition_components = self.model_specification.load_condition_models()
  File "/finetrainers/finetrainers/models/hunyuan_video/base_specification.py", line 136, in load_condition_models
    self.pretrained_model_name_or_path, subfolder="tokenizer_2" ** common_kwargs
TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'dict'

a-r-r-o-w · 2025-03-20T23:18:30Z

Oh oops, looks like I missed a comma somewhere. Taking a look

Edit: fixed in #341
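
For anyone hitting the same traceback, here is a minimal sketch of how a missing comma before `**common_kwargs` produces this exact TypeError. The `load` function and the contents of `common_kwargs` are hypothetical stand-ins for illustration, not the actual finetrainers code or the fix in #341.

```python
def load(path, subfolder=None, **kwargs):
    # Hypothetical stand-in for a from_pretrained-style loader.
    return {"path": path, "subfolder": subfolder, **kwargs}


# Hypothetical shared keyword arguments.
common_kwargs = {"revision": "main"}


def call_without_comma():
    # Without the comma, Python parses this as
    # subfolder=("tokenizer_2" ** common_kwargs), i.e. str ** dict.
    return load("some/model", subfolder="tokenizer_2" ** common_kwargs)


def call_with_comma():
    # With the comma, ** unpacks the dict into keyword arguments.
    return load("some/model", subfolder="tokenizer_2", **common_kwargs)


try:
    call_without_comma()
except TypeError as exc:
    error_message = str(exc)

result = call_with_comma()
print(error_message)
print(result)
```

The file still parses because `**` is also the power operator, so the failure only shows up at runtime when that line is reached.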

a-r-r-o-w · 2025-03-21T03:59:07Z

Accelerate compatibility should be back now. Only did some mini-runs for now and starting a 1000-step run to verify correctness: https://wandb.ai/aryanvs/finetrainers-debug

PTD vs Accelerate loss curves (grey vs maroon):

The token counts and the order of the data did not match between the two runs. This most likely suggests a difference in dataset determinism between PTD and Accelerate, which needs further investigation in the future.
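
One possible way to check the data-order mismatch is to fingerprint the sequence of sample ids each backend consumes and compare the digests. This is only a sketch under assumptions: `order_fingerprint` and `shuffled_ids` are hypothetical helpers, and a seeded stdlib shuffle stands in for the real dataset pipeline.

```python
import hashlib
import random


def order_fingerprint(sample_ids):
    """Hash the exact sequence of sample ids seen by a run."""
    digest = hashlib.sha256()
    for sample_id in sample_ids:
        digest.update(str(sample_id).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def shuffled_ids(num_samples, seed):
    # Stand-in for a dataset pipeline: a shuffle driven only by the seed.
    ids = list(range(num_samples))
    random.Random(seed).shuffle(ids)
    return ids


run_a = shuffled_ids(100, seed=42)
run_b = shuffled_ids(100, seed=42)
run_c = shuffled_ids(100, seed=43)

# Same seed gives the same order; a different seed changes the fingerprint.
same_order = order_fingerprint(run_a) == order_fingerprint(run_b)
different_order = order_fingerprint(run_a) != order_fingerprint(run_c)
print(same_order, different_order)
```

Logging such a fingerprint every N steps in both backends would show the first step at which the data order diverges.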

update

04d7006

a-r-r-o-w and others added 4 commits March 21, 2025 00:21

update

327862d

Merge branch 'main' into feature/accelerate-compatibility

839fabf

add more tests

3c051b0

update

265a238

remove unused function

c19b5b7

a-r-r-o-w merged commit 7a2afa5 into main Mar 21, 2025

a-r-r-o-w deleted the feature/accelerate-compatibility branch March 21, 2025 09:03

a-r-r-o-w mentioned this pull request Mar 21, 2025

Prepare for v0.1.0 release #322

Merged

This was referenced Apr 12, 2025

'AcceleratedScheduler' object has no attribute 'get_lr_scheduler_state' #293

Closed

Fix #352: FSDP2 argument typo #370

Merged

a-r-r-o-w merged 6 commits into main from feature/accelerate-compatibility
