Many current model implementations look 'suspicious' with respect to memory usage.
Compare the following two pseudo-code snippets:
Code 1:
let mut x = some_big_tensor;
x = x.some_op(x)?; // The original x will be dropped here.
Code 2:
let x = some_big_tensor;
let x = x.some_op(x)?; // The original x is shadowed, but not dropped.
If my understanding is correct (although I haven't measured the actual memory usage), Code 2 keeps both the original x and the new x in memory, because the shadowed x is not dropped until the end of its scope. It would be even worse if x were a trainable variable in a training scenario. In essence, variable shadowing can lead to a significant increase in peak memory usage.
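To make the drop-timing difference concrete, here is a minimal, dependency-free Rust sketch (BigTensor and some_op are made-up stand-ins, not candle types) that prints when each value is actually freed:

// BigTensor stands in for a large allocation; its Drop impl reports when it is freed.
struct BigTensor {
    name: &'static str,
}

impl Drop for BigTensor {
    fn drop(&mut self) {
        println!("dropping `{}`", self.name);
    }
}

// Stand-in for some_op: reads the input and produces a new tensor.
fn some_op(_input: &BigTensor, name: &'static str) -> BigTensor {
    BigTensor { name }
}

fn main() {
    println!("-- Code 1: mutable rebinding --");
    {
        let mut x = BigTensor { name: "original x (code 1)" };
        x = some_op(&x, "new x (code 1)");
        // "dropping `original x (code 1)`" has already been printed: the old value
        // is dropped as soon as the assignment completes.
        println!("after reassignment, only the new x is alive ({})", x.name);
    }

    println!("-- Code 2: shadowing --");
    {
        let x = BigTensor { name: "original x (code 2)" };
        let x = some_op(&x, "new x (code 2)");
        // Nothing has been dropped yet: the shadowed binding stays alive until the
        // end of this block, so both tensors coexist here.
        println!("after shadowing, both tensors are still alive ({})", x.name);
    }
}

Running this, the "dropping `original x`" line appears before the final println in Code 1 but only after it in Code 2, which is exactly the extra peak-memory window the shadowing pattern creates.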
Models like LLaMA, Mistral, and LLaVA (which I have implemented) often rely on variable shadowing. If this issue is indeed valid, I believe numerous code segments should be revised, and a warning about this should be issued.
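For the revision itself, one option is simply to rebind with let mut instead of shadowing inside forward passes. Below is a rough sketch of what that could look like; MyBlock, layer1, and layer2 are hypothetical names, and it assumes a candle-style Module API where forward takes &Tensor and returns Result<Tensor>:

use candle_core::{Result, Tensor};
use candle_nn::{Linear, Module};

// Hypothetical two-layer block, only to illustrate the rebinding pattern.
struct MyBlock {
    layer1: Linear,
    layer2: Linear,
}

impl MyBlock {
    fn forward(&self, xs: &Tensor) -> Result<Tensor> {
        // Shadowing version (common in current implementations):
        //   let xs = self.layer1.forward(xs)?;
        //   let xs = self.layer2.forward(&xs)?; // the first activation is still alive here
        //
        // Rebinding version: each assignment drops the previous activation as soon
        // as the next one has been computed, which should lower peak memory.
        let mut xs = self.layer1.forward(xs)?;
        xs = self.layer2.forward(&xs)?;
        Ok(xs)
    }
}

Where shadowing is preferred for readability, binding the intermediate result to a different name and calling drop() on the previous tensor explicitly would have a similar effect.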
Discussed in #2272. Originally posted by chenwanqq on June 19, 2024.