@fabianlim fabianlim commented Jun 5, 2024

This PR fixes an issue in `lora_adapters_switch_ddp_from_fsdp`: when ignoring the LoRA modules for FSDP, it previously also ignored the base layers

  • this caused a large memory increase, since the base layers can be very large

This fix properly addresses it by ignoring only the LoRA modules
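The distinction can be sketched as follows. This is a minimal illustration of the selection logic, not the actual implementation: the helper name and the mocked module tree are hypothetical, while `lora_A`/`lora_B` follow PEFT's module naming convention.

```python
def select_lora_modules(named_modules):
    """Collect only the LoRA adapter sub-modules (lora_A / lora_B),
    leaving the large base layers to be sharded by FSDP."""
    return [
        module
        for name, module in named_modules
        if name.rsplit(".", 1)[-1] in ("lora_A", "lora_B")
    ]

# Mocked module tree: the base layer must NOT be selected,
# so FSDP still shards the base weights across devices.
tree = [
    ("model.layers.0.self_attn.q_proj.base_layer", "BASE"),
    ("model.layers.0.self_attn.q_proj.lora_A", "A"),
    ("model.layers.0.self_attn.q_proj.lora_B", "B"),
]
print(select_lora_modules(tree))  # ['A', 'B'] -- base layer left for sharding
```

In practice a list built this way would be handed to FSDP as the ignored modules, so only the small adapter weights are replicated per device while the base layers remain sharded.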

Tests

General Benchmarks

Before Fix: No Sharding of Attention Base Layer

| Model Name | Framework Config | No. GPU | Per Dev. Batch Size | Throughput per device (toks/sec) | Torch Memory Allocated (GiB) |
| --- | --- | --- | --- | --- | --- |
| TheBloke/Llama-2-70B-GPTQ | accelerated-peft-gptq | 1 | 4 | 455 | 36.3 |
| TheBloke/Llama-2-70B-GPTQ | accelerated-peft-gptq | 2 | 4 | 445 | 18.1 |
| TheBloke/Llama-2-70B-GPTQ | accelerated-peft-gptq-foak | 1 | 4 | 497 | 36.3 |
| TheBloke/Llama-2-70B-GPTQ | accelerated-peft-gptq-foak | 2 | 4 | 476 | 15.1 |

After Fix: Attention Base Layer Sharded

| Model Name | Framework Config | No. GPU | Per Dev. Batch Size | Throughput (toks/sec) | Torch Memory Allocated (GiB) |
| --- | --- | --- | --- | --- | --- |
| TheBloke/Llama-2-70B-GPTQ | accelerated-peft-gptq-foak | 1 | 4 | 501 | 36.3 |
| TheBloke/Llama-2-70B-GPTQ | accelerated-peft-gptq-foak | 2 | 4 | 497 | 18.1 |

Before Fix

Nothing runs for 2 GPUs due to #3

After Fix

QLoRA-FOAK is compatible with FSDP, and there is roughly a 10% speed increase from applying FOAK

| Model Name | Framework Config | No. GPU | Per Dev. Batch Size | Throughput (toks/sec) | Torch Memory Allocated (GiB) | Average Loss |
| --- | --- | --- | --- | --- | --- | --- |
| NousResearch/Llama-2-70b-hf | accelerated-peft-bnb | 2 | 4 | 441 | 19.2 | 0.922 |
| NousResearch/Llama-2-70b-hf | accelerated-peft-bnb-foak | 2 | 4 | 485 | 19.2 | 0.922 |
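As a quick sanity check on the ~10% claim, using the throughput numbers from the table above:

```python
base, foak = 441, 485  # toks/sec from the table above (bnb vs bnb-foak)
speedup = foak / base - 1
print(f"{speedup:.1%}")  # roughly 10%
```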

Llama3

  • Distributed QLoRA works with FOAK
  • There is roughly a 10% throughput increase when FOAK is applied
  • The average losses are verified to be the same even when FOAK is applied
  • Single-device experiments, and multi-device experiments with batch size 4, ran out of memory
| Model Name | Framework Config | No. GPU | Per Dev. Batch Size | Throughput (toks/sec) | Torch Memory Allocated (GiB) | Average Loss |
| --- | --- | --- | --- | --- | --- | --- |
| Meta-Llama3 | accelerated-peft-bnb | 2 | 2 | 398 | 20.9 | 0.922 |
| Meta-Llama3 | accelerated-peft-bnb-foak | 2 | 2 | 434 | 20.9 | 0.922 |
| TechxGenus/Meta-Llama-3-70B-Instruct-GPTQ | accelerated-peft-autogptq | 2 | 2 | 407 | 21.1 | 1.06 |
| TechxGenus/Meta-Llama-3-70B-Instruct-GPTQ | accelerated-peft-autogptq-foak | 2 | 2 | 448 | 21.1 | 1.06 |

@fabianlim fabianlim requested a review from achew010 June 5, 2024 04:03