Support for Training with Custom Prompts Instead of Image IDs #52

Open
rutuja1409 opened this issue Sep 30, 2024 · 2 comments
Comments

@rutuja1409

Hi @gasvn,

I would like to train a model using my custom dataset. However, I noticed that the current training process only supports using image IDs. Is there a way to provide a custom prompt for each image instead of using just the image ID?

If this feature is not currently available, is there a plan to include it in any upcoming releases?

Thank you!

@gasvn
Collaborator

gasvn commented Sep 30, 2024

I suggest following Stable Diffusion's approach: use the CLIP output embeddings as the condition and inject them into the model through cross-attention.
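A minimal sketch of what that injection could look like (a hypothetical illustration, not code from this repository): the image tokens act as queries and attend over CLIP text-token embeddings, as in Stable Diffusion's U-Net cross-attention layers. The class name, token width (1152), and text-embedding shape (77 tokens × 768 dims, matching CLIP ViT-L/14) are assumptions.

```python
import torch
import torch.nn as nn

class CrossAttentionCondition(nn.Module):
    """Condition image tokens on text embeddings via cross-attention.

    Hypothetical sketch: image tokens are the queries, CLIP text-token
    embeddings are the keys/values, mirroring Stable Diffusion's
    U-Net cross-attention conditioning.
    """
    def __init__(self, dim: int, text_dim: int, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(
            embed_dim=dim, num_heads=num_heads,
            kdim=text_dim, vdim=text_dim, batch_first=True)

    def forward(self, x: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # x: (B, N, dim) image tokens; text_emb: (B, T, text_dim) text tokens
        out, _ = self.attn(self.norm(x), text_emb, text_emb)
        return x + out  # residual connection around the attention

# Stand-in for CLIP text-encoder output (batch, 77 tokens, 768 dims);
# in practice this would come from a real CLIP text encoder.
text_emb = torch.randn(2, 77, 768)
x = torch.randn(2, 256, 1152)  # 256 image tokens of width 1152 (assumed)
block = CrossAttentionCondition(dim=1152, text_dim=768)
y = block(x, text_emb)
print(y.shape)  # torch.Size([2, 256, 1152])
```

The residual form keeps the layer close to an identity mapping at initialization, so it can be inserted into an existing backbone without destabilizing training.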

@rutuja1409
Author

> I suggest following Stable Diffusion's approach: use the CLIP output embeddings as the condition and inject them into the model through cross-attention.

Thank you for your reply. I understand the first part about using CLIP embeddings, but could you please clarify how you suggest changing the condition in the masked diffusion transformer code? Specifically, what modifications should I make to integrate the CLIP embeddings with the cross-attention mechanism in the training process?
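One possible starting point for that integration (a hypothetical sketch, not the maintainers' answer): replace the ID-based label embedder, typically an `nn.Embedding` lookup over class IDs, with a small projection of pooled CLIP text features, so the rest of the conditioning path receives a vector of the same width as before. The dimensions and the `PromptConditioner` name are assumptions.

```python
import torch
import torch.nn as nn

class PromptConditioner(nn.Module):
    """Swap an ID-based label embedder for a projection of pooled CLIP
    text features. Hypothetical sketch; the actual conditioning code
    paths and names in the MDT repository will differ.
    """
    def __init__(self, text_dim: int = 768, cond_dim: int = 1152):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(text_dim, cond_dim),
            nn.SiLU(),
            nn.Linear(cond_dim, cond_dim))

    def forward(self, pooled_text: torch.Tensor) -> torch.Tensor:
        # pooled_text: (B, text_dim) pooled CLIP text features
        return self.proj(pooled_text)

# ID-based path it would replace (for comparison):
#   cond = nn.Embedding(num_classes, 1152)(label_ids)
# Prompt-based path: pooled CLIP text features instead of an ID lookup.
pooled = torch.randn(4, 768)   # stand-in for pooled CLIP text output
cond = PromptConditioner()(pooled)
print(cond.shape)  # torch.Size([4, 1152])
```

Because the output width matches the original label embedding, downstream consumers of the condition vector (e.g. adaptive LayerNorm modulation) would not need to change; per-token text conditioning via cross-attention could then be layered on top.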
