Is it advisable to avoid variable shadowing when using Candle? #2273

Open
chenwanqq opened this issue on Jun 18, 2024 · Discussed in #2272 · 0 comments

Discussed in #2272

Originally posted by chenwanqq June 19, 2024
Considering the following facts:

  • Rust doesn't have a garbage collector.
  • Variable shadowing does not release memory (i.e. it does not drop the shadowed value) within the same scope; a small demo of this follows the Candle snippets below.
  • If track_op() is true, applying an operator to a tensor creates a new result tensor whose BackpropOp stores a clone of the original tensor(s), keeping their storage alive for as long as the result exists.

pub fn elu(&self, alpha: f64) -> Result<Self> {
    if self.elem_count() == 0 {
        return Ok(self.clone());
    }
    let storage = self.storage().elu(self.layout(), alpha)?;
    // The backprop op records `self` so that gradients can flow back through it.
    let op = BackpropOp::new1(self, |t| Op::Elu(t, alpha));
    Ok(from_storage(storage, self.shape(), op, false))
}

pub(crate) fn new1(arg: &Tensor, f: impl Fn(Tensor) -> Op) -> Self {
    // When gradient tracking is on, the op stores a clone of the argument,
    // which keeps the argument's storage alive as long as the result exists.
    let op = if arg.track_op() {
        Some(f(arg.clone()))
    } else {
        None
    };
    Self(op)
}
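
To make the second fact concrete, here is a minimal, self-contained sketch (not Candle code; Buf is a stand-in for a large allocation) showing that a shadowed value is only dropped when the enclosing scope ends:

struct Buf(&'static str);

impl Drop for Buf {
    fn drop(&mut self) {
        println!("dropping {}", self.0);
    }
}

fn main() {
    let x = Buf("original");
    println!("made {}", x.0);
    let x = Buf("shadowing"); // the original Buf is NOT dropped here
    println!("made {}", x.0);
    println!("end of body");
    // Output ends with:
    //   end of body
    //   dropping shadowing
    //   dropping original
    // Shadowed bindings are only dropped when the scope ends, in reverse
    // declaration order.
}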

Given these facts, many current model implementations look 'suspicious' in terms of memory usage.

Compare the following two pseudo codes:

Code 1:

let mut x = some_big_tensor;
x = x.some_op()?; // The old value of x is dropped here, when the assignment overwrites it.

Code 2:

let x = some_big_tensor;
let x = x.some_op()?; // The original x is shadowed, not dropped; it lives until the end of the scope.

If my understanding is correct (I haven't measured the actual memory usage), Code 2 keeps both the original x and the new x in memory, because the original x is not dropped until the end of the scope. It would be even worse if x were a trainable variable (a Var) in a training scenario, since track_op() would then also clone it into the backprop graph. In essence, variable shadowing can lead to a significant increase in memory usage.
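
As a minimal, self-contained sketch (not Candle code; BigTensor and some_op are stand-ins for the pseudocode above), here are two patterns that release the original allocation before the end of the scope:

struct BigTensor(Vec<f32>);

impl BigTensor {
    fn some_op(&self) -> BigTensor {
        BigTensor(self.0.iter().map(|v| v + 1.0).collect())
    }
}

impl Drop for BigTensor {
    fn drop(&mut self) {
        println!("releasing {} floats", self.0.len());
    }
}

fn main() {
    // Pattern 1: mutable rebinding -- the old value is dropped on reassignment.
    let mut x = BigTensor(vec![0.0; 1 << 20]);
    x = x.some_op(); // prints "releasing 1048576 floats" right here

    // Pattern 2: bind the result to a new name, then drop the original explicitly.
    let y = x.some_op();
    drop(x); // the previous buffer is released here, not at the end of main

    println!("result has {} floats", y.0.len());
}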

Models like LLaMA, Mistral, and LLaVA (which I have implemented) rely heavily on variable shadowing. If this issue is indeed valid, I believe numerous code segments should be revised and a warning about this should be added to the documentation.
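
For reference, the pattern in question usually looks like the following sketch. It assumes the Module trait from candle_nn; the Block struct and its field names are illustrative, not copied from any particular model file:

use candle_core::{Result, Tensor};
use candle_nn::{LayerNorm, Linear, Module};

// Illustrative block; the fields are assumptions, not taken from a real model.
struct Block {
    norm: LayerNorm,
    proj: Linear,
}

impl Block {
    fn forward(&self, xs: &Tensor) -> Result<Tensor> {
        let xs = self.norm.forward(xs)?;  // first shadow rebinds the borrowed input
        let xs = self.proj.forward(&xs)?; // the normed tensor is shadowed, not dropped
        Ok(xs) // every shadowed intermediate stays alive until the function returns
    }
}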
