Add back accelerate compatibility #339

Merged
a-r-r-o-w merged 6 commits into main from feature/accelerate-compatibility on Mar 21, 2025

@a-r-r-o-w
Contributor

A few PRs ago (don't recall which exactly), I had to remove accelerate compatibility for a feature. This PR adds accelerate compatibility again and does a major refactor.

@neph1
Contributor

neph1 commented Mar 20, 2025

I've only tested this with the UI so far, so it might be something missing on my side, or a config issue. But I'm still using "pretrained_model_name_or_path", which I think might be relevant to this:

  File "/finetrainers/train.py", line 70, in main
    trainer.run()
  File "/finetrainers/finetrainers/trainer/sft_trainer/trainer.py", line 97, in run
    raise e
  File "/finetrainers/finetrainers/trainer/sft_trainer/trainer.py", line 92, in run
    self._train()
  File "/finetrainers/finetrainers/trainer/sft_trainer/trainer.py", line 420, in _train
    precomputed_condition_iterator, precomputed_latent_iterator = self._prepare_data(
  File "/finetrainers/finetrainers/trainer/sft_trainer/trainer.py", line 938, in _prepare_data
    condition_components = self.model_specification.load_condition_models()
  File "/finetrainers/finetrainers/models/hunyuan_video/base_specification.py", line 136, in load_condition_models
    self.pretrained_model_name_or_path, subfolder="tokenizer_2" ** common_kwargs
TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'dict'

@a-r-r-o-w
Contributor Author

a-r-r-o-w commented Mar 20, 2025

Oh oops, looks like I missed a comma somewhere. Taking a look

Edit: fixed in #341
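
For readers following along, here is a minimal sketch of why a missing comma produces the `TypeError` in the traceback above. Without the comma, Python parses `"tokenizer_2" ** common_kwargs` as the power operator applied to a string and a dict, instead of keyword-argument unpacking. The `load_tokenizer` function, the model path, and the contents of `common_kwargs` are hypothetical stand-ins for illustration, not the actual finetrainers code or the change made in #341.

```python
def load_tokenizer(path, subfolder=None, **kwargs):
    # Hypothetical stand-in for a from_pretrained-style loader.
    return {"path": path, "subfolder": subfolder, **kwargs}


# Assumed contents; the real common_kwargs are not shown in the thread.
common_kwargs = {"revision": "main"}

# Buggy call: with no comma after "tokenizer_2", the `**` is the binary
# power operator, so Python evaluates `"tokenizer_2" ** common_kwargs`.
try:
    load_tokenizer("some/model", subfolder="tokenizer_2" ** common_kwargs)
except TypeError as exc:
    error_message = str(exc)

# Fixed call: the comma makes `**common_kwargs` keyword unpacking.
result = load_tokenizer("some/model", subfolder="tokenizer_2", **common_kwargs)
```

Because the expression is syntactically valid, the mistake only surfaces at runtime when that line executes, which is consistent with it appearing only once the condition models were loaded.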

@a-r-r-o-w
Contributor Author

a-r-r-o-w commented Mar 21, 2025

Accelerate compatibility should be back now. I've only done some mini-runs so far and am starting a 1000-step run to verify correctness: https://wandb.ai/aryanvs/finetrainers-debug

PTD vs Accelerate loss curves (grey vs maroon):

[image: loss curves]

The token count and the order of the data did not match between the two runs. This most likely suggests a difference in dataset determinism between the two backends, which needs further investigation in the future.
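
One way such a data-order mismatch could be investigated is sketched below, under the assumption that each run can log the sample identifiers it consumes in order. The helper names (`order_fingerprint`, `first_divergence`, `shuffled_order`) are hypothetical and not part of finetrainers, Accelerate, or PyTorch; the seeded shuffle only simulates two runs.

```python
import hashlib
import random


def order_fingerprint(sample_ids):
    # Hash the sequence of sample ids so two runs can be compared cheaply.
    digest = hashlib.sha256()
    for sample_id in sample_ids:
        digest.update(str(sample_id).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def first_divergence(ids_a, ids_b):
    # Return the first index where the two orders differ, or None if equal.
    for step, (a, b) in enumerate(zip(ids_a, ids_b)):
        if a != b:
            return step
    if len(ids_a) != len(ids_b):
        return min(len(ids_a), len(ids_b))
    return None


def shuffled_order(num_samples, seed):
    # Simulated data order for one run: a seeded shuffle of sample indices.
    rng = random.Random(seed)
    order = list(range(num_samples))
    rng.shuffle(order)
    return order


run_a = shuffled_order(16, seed=42)
run_b = shuffled_order(16, seed=42)

print(order_fingerprint(run_a) == order_fingerprint(run_b))
print(first_divergence(run_a, run_b))
```

If the fingerprints differ, `first_divergence` points at the step where the two backends start seeing different samples, which narrows down whether the difference comes from shuffling, sharding, or batching.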

@a-r-r-o-w a-r-r-o-w merged commit 7a2afa5 into main Mar 21, 2025
@a-r-r-o-w a-r-r-o-w deleted the feature/accelerate-compatibility branch March 21, 2025 09:03