Hello,

I'm working with the CLIP model and would like to train the image and text encoders using different loss functions (i.e., `image_total_loss` for the image encoder and `text_total_loss` for the text encoder, instead of the combined `total_loss`).
To achieve this, I plan to use two separate optimizers, one for each encoder, so each optimizer updates its respective encoder based on its specific loss function.
The challenge I'm facing is:
I can't find a way to distinguish between the parameters of the image encoder and the text encoder within the CLIP model. When I inspect the model's parameters, they all seem to be part of a single collection, and there's no clear separation.
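(For reference, this is roughly what the parameter layout looks like when CLIP is loaded through the Hugging Face transformers `CLIPModel`; the `vision_model` / `text_model` attribute names below are specific to that implementation, and other CLIP codebases expose different submodules such as `model.visual`.)

```python
from transformers import CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")

# Parameter names are prefixed by the submodule they belong to, so the two
# towers can already be told apart by name.
for name, _ in model.named_parameters():
    print(name)  # e.g. "vision_model.encoder.layers.0..." vs. "text_model.embeddings..."

# Collect each tower (plus its projection head) explicitly.
image_params = list(model.vision_model.parameters()) + list(model.visual_projection.parameters())
text_params = list(model.text_model.parameters()) + list(model.text_projection.parameters())
```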
My questions are:

1. Is there a way to separately access the parameters of the image encoder and the text encoder in the CLIP model?
2. How can I set up two optimizers so that each one tracks only the parameters of its encoder?
3. Is there another way to train the CLIP encoders with different loss functions?
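For what it's worth, here is one possible way to wire this up. It is only a sketch: it continues from the `image_params` / `text_params` groups above, assumes the losses are computed from the pooled embeddings returned by `CLIPModel`, and uses `image_loss_fn` / `text_loss_fn` as hypothetical placeholders for whatever `image_total_loss` and `text_total_loss` you have in mind. The key idea is to detach the other tower's features inside each loss, so gradients from `image_total_loss` never reach the text encoder and vice versa; with that in place, each optimizer only ever sees gradients produced by "its" loss.

```python
import torch

# One optimizer per tower, each owning only that tower's parameters.
opt_img = torch.optim.AdamW(image_params, lr=1e-5)
opt_txt = torch.optim.AdamW(text_params, lr=1e-5)

def training_step(batch):
    # `batch` is assumed to hold the usual CLIPModel inputs
    # (pixel_values, input_ids, attention_mask).
    out = model(**batch)
    img_feat, txt_feat = out.image_embeds, out.text_embeds

    # The image loss treats the text features as constants (detached), so its
    # gradient only flows into the image tower; the text loss is symmetric.
    image_total_loss = image_loss_fn(img_feat, txt_feat.detach())  # hypothetical loss
    text_total_loss = text_loss_fn(img_feat.detach(), txt_feat)    # hypothetical loss

    opt_img.zero_grad()
    opt_txt.zero_grad()
    # A single backward pass is enough: the detach() calls keep the two
    # gradient paths separated, so each optimizer steps on "its" loss only.
    (image_total_loss + text_total_loss).backward()
    opt_img.step()
    opt_txt.step()
```

Without the `detach()` calls, a contrastive-style loss on the image side would also push gradients into the text encoder, and you would have to clear those gradients manually between two separate `backward()` calls; detaching keeps the bookkeeping trivial.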
Thank you!