Describe the bug

When training a `TextPairRegressor` model with `embed_separately=False` (the default), via e.g. `ModelTrainer.fine_tune`, the GPU memory slowly creeps up with each batch, eventually causing an OOM even when the model and a single batch fit easily in GPU memory.

The function `store_embeddings` is supposed to clear any embeddings of each `DataPoint`. For this model, the data point type is `TextPair`. It does seem to handle clearing `text_pair.first` and `.second` when `embed_separately=True`, because it runs `embed` for each sentence (see `TextPairRegressor._get_embedding_for_data_point`), and that embedding is attached to each sentence, so it can be referenced via the sentence.

However, the default setting is `False`; in that case, to embed the pair, the model concatenates the text of both sentences (adding a separator), creates a new sentence, embeds that sentence, and then returns that embedding. Since the embedding is never attached to the `DataPoint` object, `clear_embeddings` doesn't find it when iterating over the data points. The function `identify_dynamic_embeddings` also always comes up empty.
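To make the failure mode concrete, here is a simplified sketch of the concatenation path described above. It is paraphrased from the description, not copied from the flair source; the helper name and separator are hypothetical.

```python
from flair.data import Sentence

def embed_pair_concatenated(embeddings, pair, sep=" [SEP] "):
    # A brand-new Sentence is built just to run the embedding ...
    concatenated = Sentence(
        pair.first.to_tokenized_string() + sep + pair.second.to_tokenized_string()
    )
    embeddings.embed(concatenated)
    # ... and only the embedding tensor is returned. The throwaway Sentence
    # (with the tensor attached to it) is unreachable from `pair`, so
    # pair.clear_embeddings() inside store_embeddings frees nothing.
    return concatenated.get_embedding()
```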
To Reproduce
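This section was left empty in the issue; the following hypothetical script illustrates a setup that should trigger the leak. The corpus loader (`GLUE_STSB`), backbone model, label type, and hyperparameters are illustrative assumptions, not taken from the report.

```python
from flair.datasets import GLUE_STSB
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextPairRegressor
from flair.trainers import ModelTrainer

# Any corpus of TextPairs with a regression label should do.
corpus = GLUE_STSB()

model = TextPairRegressor(
    embeddings=TransformerDocumentEmbeddings("distilbert-base-uncased", fine_tune=True),
    label_type="similarity",   # assumed label name for this corpus
    embed_separately=False,    # the default, i.e. the leaking code path
)

trainer = ModelTrainer(model, corpus)
# GPU memory grows batch over batch until CUDA eventually runs out.
trainer.fine_tune("resources/taggers/text-pair-leak", mini_batch_size=16)
```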
Expected behavior

The memory should remain relatively flat with each epoch of training if memory is cleared correctly. In other training, such as for a `TextClassifier`, it stays roughly the same after each mini-batch.

Logs and Stack traces
Screenshots
No response
Additional Context

I printed out the GPU usage in an altered `train_custom`. I saw that when training a `TextClassifier`, the memory usage goes back down to the value at the beginning of a batch after `store_embeddings` is called. In `TextPairRegressor`, the memory does not go down at all after `store_embeddings` is called.
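The altered `train_custom` itself is not shown in the issue; something along these lines captures the kind of instrumentation described (function name and call placement are assumptions):

```python
import torch

def log_gpu_mem(tag: str) -> None:
    # memory_allocated() counts tensors PyTorch currently holds; if the
    # "after" reading climbs batch over batch, embeddings are leaking.
    print(f"{tag}: {torch.cuda.memory_allocated() / 1024**2:.1f} MiB")

# Inside the mini-batch loop of train_custom, around the cleanup call:
#   log_gpu_mem("before store_embeddings")
#   store_embeddings(batch, embeddings_storage_mode, dynamic_embeddings)
#   log_gpu_mem("after store_embeddings")
```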
Environment

Versions:

Flair: 0.13.1
Pytorch: 2.3.1+cu121
Transformers: 4.31.0
GPU: True