Address Incorrect Ignoring of Base Layer Modules for FSDP with Kernels #31
This PR addresses the issue that when we use `lora_adapters_switch_ddp_from_fsdp` to have FSDP ignore the LoRA modules, we previously also ignored the base layers. This fix properly addresses it by ignoring only the LoRA modules.
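A minimal sketch of the idea, assuming a PEFT-style model whose `LoraLayer` modules expose `lora_A`/`lora_B` adapter dicts (this is illustrative, not the exact plugin code; the helper name is hypothetical):

```python
import torch.nn as nn
from peft.tuners.lora import LoraLayer


def collect_lora_modules_to_ignore(model: nn.Module) -> list[nn.Module]:
    """Collect only the LoRA adapter submodules for FSDP's `ignored_modules`."""
    ignored: list[nn.Module] = []
    for module in model.modules():
        if isinstance(module, LoraLayer):
            # Only the adapter weights are excluded from sharding; the frozen
            # base_layer is deliberately left out so FSDP still shards it.
            ignored.extend(module.lora_A.values())
            ignored.extend(module.lora_B.values())
    return ignored
```

The returned list would then be passed as `ignored_modules` when wrapping the model with FSDP, rather than ignoring each whole `LoraLayer` (which also contains the `base_layer`).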
For `auto_gptq` we found that this works well. For `bnb`, however, this causes the `quant_state` on the parameter to be destroyed; to address this, we now get the `quant_state` from the `base_layer`, thus also addressing Failure in FSDP Benchmark Experiment using QLoRA with Custom Fused Modules #3.
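A minimal sketch of the `bnb` path, assuming a PEFT `LoraLayer` wrapping a bitsandbytes `Linear4bit` (the helper name is hypothetical and this is not the exact code in this PR):

```python
import bitsandbytes as bnb


def get_quant_state(lora_layer):
    """Read quant_state from the wrapped base layer's 4-bit weight.

    Reading it off the base_layer avoids relying on a parameter reference
    whose attached quant_state may have been destroyed during FSDP preparation.
    """
    base = lora_layer.get_base_layer()  # PEFT helper returning the wrapped module
    weight = base.weight
    if isinstance(weight, bnb.nn.Params4bit):
        return weight.quant_state
    raise TypeError("expected a bitsandbytes 4-bit base layer")
```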
Tests

General Benchmarks
Before Fix: No Sharding of Attention Base Layer

(table: Config | per device (toks/sec) | Memory Allocated (GiB))

After Fix: Attention Base Layer Sharded

(table: Config | per device (toks/sec) | Memory Allocated (GiB))
Before Fix

Nothing runs for 2 GPUs due to #3.

After Fix

QLoRA-FOAK is compatible with FSDP, and applying FOAK gives roughly a 10% increase in speed.
(table: Config | (toks/sec) | Memory Allocated (GiB) | Loss)
Llama3
(table: Config | (toks/sec) | Memory Allocated (GiB) | Loss)