Support for Training with Custom Prompts Instead of Image IDs #52

Open
rutuja1409 opened this issue Sep 30, 2024 · 2 comments
Comments

@rutuja1409

Hi @gasvn,

I would like to train a model using my custom dataset. However, I noticed that the current training process only supports using image IDs. Is there a way to provide a custom prompt for each image instead of using just the image ID?

If this feature is not currently available, is there a plan to include it in any upcoming releases?

Thank you!

@gasvn
Collaborator

gasvn commented Sep 30, 2024

I suggest following Stable Diffusion's approach: use the CLIP output embeddings as the condition and inject them into the model through cross-attention.
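A minimal sketch of what that injection could look like (a hypothetical illustration, not code from this repository): the image tokens act as queries and attend over CLIP text-token embeddings, as in Stable Diffusion's U-Net cross-attention layers. The class name, token width (1152), and text-embedding shape (77 tokens × 768 dims, matching CLIP ViT-L/14) are assumptions.

```python
import torch
import torch.nn as nn

class CrossAttentionCondition(nn.Module):
    """Condition image tokens on text embeddings via cross-attention.

    Hypothetical sketch: image tokens are the queries, CLIP text-token
    embeddings are the keys/values, mirroring Stable Diffusion's
    U-Net cross-attention conditioning.
    """
    def __init__(self, dim: int, text_dim: int, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(
            embed_dim=dim, num_heads=num_heads,
            kdim=text_dim, vdim=text_dim, batch_first=True)

    def forward(self, x: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # x: (B, N, dim) image tokens; text_emb: (B, T, text_dim) text tokens
        out, _ = self.attn(self.norm(x), text_emb, text_emb)
        return x + out  # residual connection around the attention

# Stand-in for CLIP text-encoder output (batch, 77 tokens, 768 dims);
# in practice this would come from a real CLIP text encoder.
text_emb = torch.randn(2, 77, 768)
x = torch.randn(2, 256, 1152)  # 256 image tokens of width 1152 (assumed)
block = CrossAttentionCondition(dim=1152, text_dim=768)
y = block(x, text_emb)
print(y.shape)  # torch.Size([2, 256, 1152])
```

The residual form keeps the layer close to an identity mapping at initialization, so it can be inserted into an existing backbone without destabilizing training.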

@rutuja1409
Author

> I suggest following Stable Diffusion's approach: use the CLIP output embeddings as the condition and inject them into the model through cross-attention.

Thank you for your reply. I understand the first part about using CLIP embeddings, but could you please clarify how you suggest changing the condition in the masked diffusion transformer code? Specifically, what modifications should I make to integrate the CLIP embeddings with the cross-attention mechanism in the training process?
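One possible starting point for that integration (a hypothetical sketch, not the maintainers' answer): replace the ID-based label embedder, typically an `nn.Embedding` lookup over class IDs, with a small projection of pooled CLIP text features, so the rest of the conditioning path receives a vector of the same width as before. The dimensions and the `PromptConditioner` name are assumptions.

```python
import torch
import torch.nn as nn

class PromptConditioner(nn.Module):
    """Swap an ID-based label embedder for a projection of pooled CLIP
    text features. Hypothetical sketch; the actual conditioning code
    paths and names in the MDT repository will differ.
    """
    def __init__(self, text_dim: int = 768, cond_dim: int = 1152):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(text_dim, cond_dim),
            nn.SiLU(),
            nn.Linear(cond_dim, cond_dim))

    def forward(self, pooled_text: torch.Tensor) -> torch.Tensor:
        # pooled_text: (B, text_dim) pooled CLIP text features
        return self.proj(pooled_text)

# ID-based path it would replace (for comparison):
#   cond = nn.Embedding(num_classes, 1152)(label_ids)
# Prompt-based path: pooled CLIP text features instead of an ID lookup.
pooled = torch.randn(4, 768)   # stand-in for pooled CLIP text output
cond = PromptConditioner()(pooled)
print(cond.shape)  # torch.Size([4, 1152])
```

Because the output width matches the original label embedding, downstream consumers of the condition vector (e.g. adaptive LayerNorm modulation) would not need to change; per-token text conditioning via cross-attention could then be layered on top.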
