When I replace the output layer for Llama 3.1 70B,
nn.Linear(8192, 128_256, bias=False)
with
FrozenNF4Linear(8192, 128_256, bias=False)
in torchtune, I surprisingly end up using a lot more memory. Leaving the output layer in bf16, the training run peaks at ~43 GB of active memory, while quantizing the output layer peaks at ~52 GB. I wonder if this is due to the large size of the output layer.
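
As a quick way to check whether the output projection alone explains the gap, here is a standalone microbenchmark sketch (not the torchtune training run). It assumes a CUDA device with enough free memory and that to_nf4 and linear_nf4 are importable from torchao.dtypes.nf4tensor (the functional API that FrozenNF4Linear wraps); the shapes mirror the 8192 x 128_256 output layer and an 8192-token batch.

```python
# Standalone microbenchmark sketch (assumptions: CUDA device with enough free memory;
# to_nf4/linear_nf4 importable from torchao.dtypes.nf4tensor in your torchao version).
import torch
from torchao.dtypes.nf4tensor import linear_nf4, to_nf4

def peak_mem_gb(step) -> float:
    """Run one forward/backward and report CUDA peak allocated memory in GB."""
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    step()
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / 1e9

embed_dim, vocab_size, tokens = 8192, 128_256, 8192  # output layer shape, 8k-token batch
x = torch.randn(tokens, embed_dim, device="cuda", dtype=torch.bfloat16, requires_grad=True)

w_bf16 = torch.randn(vocab_size, embed_dim, device="cuda", dtype=torch.bfloat16)  # frozen bf16 weight
w_nf4 = to_nf4(w_bf16)  # NF4-quantized copy of the same weight

def bf16_step():
    x.grad = None
    torch.nn.functional.linear(x, w_bf16).sum().backward()

def nf4_step():
    x.grad = None
    linear_nf4(x, w_nf4).sum().backward()

print(f"bf16 output proj peak: {peak_mem_gb(bf16_step):.1f} GB")
print(f"nf4  output proj peak: {peak_mem_gb(nf4_step):.1f} GB")
```

If the NF4 path also peaks noticeably higher here, the extra memory is coming from the quantized output projection itself rather than from the rest of the recipe.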
Steps to reproduce:
1. Replace nn.Linear with FrozenNF4Linear for the output layer in the model here (FrozenNF4Linear is just a linear_nf4 wrapper); a sketch of this swap follows the steps.
2. tune run lora_finetune_single_device --config ./70B_qlora_long_context.yaml tokenizer.max_seq_len=8192
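
For reference, a minimal sketch of the swap in step 1, assuming FrozenNF4Linear is exposed from torchtune.modules.low_precision (the exact import path may differ across torchtune versions); build_output_proj is a hypothetical helper used only to illustrate the two configurations, not torchtune's actual model builder.

```python
import torch.nn as nn

# Assumption: FrozenNF4Linear lives under torchtune.modules.low_precision in the
# installed torchtune version; adjust the import if your version differs.
from torchtune.modules.low_precision import FrozenNF4Linear

def build_output_proj(quantize_output: bool) -> nn.Module:
    """Hypothetical helper mirroring the swap in step 1 (not torchtune builder code)."""
    embed_dim, vocab_size = 8192, 128_256  # Llama 3.1 70B output projection shape
    if quantize_output:
        # NF4-quantized frozen output layer: the configuration that peaked at ~52 GB
        return FrozenNF4Linear(embed_dim, vocab_size, bias=False)
    # bf16 output layer: the configuration that peaked at ~43 GB
    return nn.Linear(embed_dim, vocab_size, bias=False)
```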