Is it advisable to avoid variable shadowing when using Candle? #2273

Open
chenwanqq opened this issue on Jun 18, 2024 · Discussed in #2272 · 0 comments

Discussed in #2272

Originally posted by chenwanqq June 19, 2024
Considering the following facts:

  • Rust doesn't have a garbage collector.
  • Variable shadowing does not release memory (i.e. it does not drop the shadowed value) within the same scope; a small demo of this follows the Candle snippets below.
  • If track_op() is true, applying an operator to a tensor creates a new result tensor whose BackpropOp stores a clone of the original tensor(s), keeping their storage alive for as long as the result exists.

pub fn elu(&self, alpha: f64) -> Result<Self> {
    if self.elem_count() == 0 {
        return Ok(self.clone());
    }
    let storage = self.storage().elu(self.layout(), alpha)?;
    // The backprop op records `self` so that gradients can flow back through it.
    let op = BackpropOp::new1(self, |t| Op::Elu(t, alpha));
    Ok(from_storage(storage, self.shape(), op, false))
}

pub(crate) fn new1(arg: &Tensor, f: impl Fn(Tensor) -> Op) -> Self {
    // When gradient tracking is on, the op stores a clone of the argument,
    // which keeps the argument's storage alive as long as the result exists.
    let op = if arg.track_op() {
        Some(f(arg.clone()))
    } else {
        None
    };
    Self(op)
}
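
To make the second fact concrete, here is a minimal, self-contained sketch (not Candle code; Buf is a stand-in for a large allocation) showing that a shadowed value is only dropped when the enclosing scope ends:

struct Buf(&'static str);

impl Drop for Buf {
    fn drop(&mut self) {
        println!("dropping {}", self.0);
    }
}

fn main() {
    let x = Buf("original");
    println!("made {}", x.0);
    let x = Buf("shadowing"); // the original Buf is NOT dropped here
    println!("made {}", x.0);
    println!("end of body");
    // Output ends with:
    //   end of body
    //   dropping shadowing
    //   dropping original
    // Shadowed bindings are only dropped when the scope ends, in reverse
    // declaration order.
}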

Given these facts, many current model implementations look 'suspicious' in terms of memory usage.

Compare the following two pseudo codes:

Code 1:

let mut x = some_big_tensor;
x = x.some_op()?; // The old value of x is dropped here, when the assignment overwrites it.

Code 2:

let x = some_big_tensor;
let x = x.some_op()?; // The original x is shadowed, not dropped; it lives until the end of the scope.

If my understanding is correct (I haven't measured the actual memory usage), Code 2 keeps both the original x and the new x in memory, because the original x is not dropped until the end of the scope. It would be even worse if x were a trainable variable (a Var) in a training scenario, since track_op() would then also clone it into the backprop graph. In essence, variable shadowing can lead to a significant increase in memory usage.
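
As a minimal, self-contained sketch (not Candle code; BigTensor and some_op are stand-ins for the pseudocode above), here are two patterns that release the original allocation before the end of the scope:

struct BigTensor(Vec<f32>);

impl BigTensor {
    fn some_op(&self) -> BigTensor {
        BigTensor(self.0.iter().map(|v| v + 1.0).collect())
    }
}

impl Drop for BigTensor {
    fn drop(&mut self) {
        println!("releasing {} floats", self.0.len());
    }
}

fn main() {
    // Pattern 1: mutable rebinding -- the old value is dropped on reassignment.
    let mut x = BigTensor(vec![0.0; 1 << 20]);
    x = x.some_op(); // prints "releasing 1048576 floats" right here

    // Pattern 2: bind the result to a new name, then drop the original explicitly.
    let y = x.some_op();
    drop(x); // the previous buffer is released here, not at the end of main

    println!("result has {} floats", y.0.len());
}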

Models like LLaMA, Mistral, and LLaVA (which I have implemented) rely heavily on variable shadowing. If this issue is indeed valid, I believe numerous code segments should be revised and a warning about this should be added to the documentation.
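
For reference, the pattern in question usually looks like the following sketch. It assumes the Module trait from candle_nn; the Block struct and its field names are illustrative, not copied from any particular model file:

use candle_core::{Result, Tensor};
use candle_nn::{LayerNorm, Linear, Module};

// Illustrative block; the fields are assumptions, not taken from a real model.
struct Block {
    norm: LayerNorm,
    proj: Linear,
}

impl Block {
    fn forward(&self, xs: &Tensor) -> Result<Tensor> {
        let xs = self.norm.forward(xs)?;  // first shadow rebinds the borrowed input
        let xs = self.proj.forward(&xs)?; // the normed tensor is shadowed, not dropped
        Ok(xs) // every shadowed intermediate stays alive until the function returns
    }
}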
