Cannot load the cache when mapping the dataset #7261

Open
zhangn77 opened this issue Oct 29, 2024 · 0 comments

Describe the bug

I'm training the Flux ControlNet. The train_dataset.map() call takes a long time to finish. However, after I killed one training process and started a new run with the same dataset, the mapped result was not reused, even though I had set a cache directory for the dataset.

with accelerator.main_process_first():
    from datasets.fingerprint import Hasher

    # fingerprint used by the cache for the other processes to load the result
    # details: https://github.com/huggingface/diffusers/pull/4038#discussion_r1266078401
    new_fingerprint = Hasher.hash(args)
    train_dataset = train_dataset.map(
        compute_embeddings_fn,
        batched=True,
        batch_size=10,
        new_fingerprint=new_fingerprint,
    )
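
One thing that may be worth checking (an assumption, not a confirmed diagnosis): the fingerprint above is derived from the whole `args` object, so any argument that differs between the two runs would give a different fingerprint and therefore a different cache entry. The sketch below uses only the standard library to show the idea of hashing just the arguments that affect the embeddings. The helper `stable_fingerprint` and the argument names `pretrained_model`, `resolution`, and `output_dir` are hypothetical and are not taken from the training script.

```python
import argparse
import hashlib
import json


def stable_fingerprint(args: argparse.Namespace, keys: list) -> str:
    """Hash only the selected argument values, in a fixed key order.

    Hypothetical helper: it is not part of `datasets` or `diffusers`.
    """
    payload = {k: getattr(args, k) for k in sorted(keys)}
    blob = json.dumps(payload, sort_keys=True, default=str).encode("utf-8")
    # Truncated to 16 hex characters only to keep the value short.
    return hashlib.sha256(blob).hexdigest()[:16]


# Two hypothetical runs that differ only in where they write their output.
run_a = argparse.Namespace(pretrained_model="some/model", resolution=512, output_dir="out/run1")
run_b = argparse.Namespace(pretrained_model="some/model", resolution=512, output_dir="out/run2")

# Only these (assumed) arguments influence the computed embeddings.
relevant = ["pretrained_model", "resolution"]

fp_a = stable_fingerprint(run_a, relevant)
fp_b = stable_fingerprint(run_b, relevant)

# The fingerprints match because output_dir is excluded from the hash.
print(fp_a == fp_b)
```

A string produced this way could be passed as `new_fingerprint` in the `map()` call above. `Dataset.map()` also accepts a `cache_file_name` argument, which may help pin the cache file to a known location, though whether either change resolves this issue has not been verified here.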

Steps to reproduce the bug

Train the Flux ControlNet, kill the training process, then start training again with the same dataset.

Expected behavior

The dataset should not be mapped again on restart; the cached map result should be reused.

Environment info

latest diffusers
