Address Incorrect Ignoring of Base Layer Modules for FSDP with Kernels #31
This PR addresses the issue that when we use `lora_adapters_switch_ddp_from_fsdp` to have FSDP ignore the LoRA modules, we previously also ignored the base layers. This fix properly addresses it by ignoring only the LoRA modules.
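A minimal sketch of the idea, assuming a PEFT-style model whose `LoraLayer` modules expose `lora_A`/`lora_B` adapter dicts (this is illustrative, not the exact plugin code; the helper name is hypothetical):

```python
import torch.nn as nn
from peft.tuners.lora import LoraLayer


def collect_lora_modules_to_ignore(model: nn.Module) -> list[nn.Module]:
    """Collect only the LoRA adapter submodules for FSDP's `ignored_modules`."""
    ignored: list[nn.Module] = []
    for module in model.modules():
        if isinstance(module, LoraLayer):
            # Only the adapter weights are excluded from sharding; the frozen
            # base_layer is deliberately left out so FSDP still shards it.
            ignored.extend(module.lora_A.values())
            ignored.extend(module.lora_B.values())
    return ignored
```

The returned list would then be passed as `ignored_modules` when wrapping the model with FSDP, rather than ignoring each whole `LoraLayer` (which also contains the `base_layer`).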
For `auto_gptq` we found that this works well. For `bnb`, however, this causes the `quant_state` on the parameter to be destroyed; to address this, we now get the `quant_state` from the `base_layer`, thus also addressing Failure in FSDP Benchmark Experiment using QLoRA with Custom Fused Modules #3.
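A minimal sketch of the `bnb` path, assuming a PEFT `LoraLayer` wrapping a bitsandbytes `Linear4bit` (the helper name is hypothetical and this is not the exact code in this PR):

```python
import bitsandbytes as bnb


def get_quant_state(lora_layer):
    """Read quant_state from the wrapped base layer's 4-bit weight.

    Reading it off the base_layer avoids relying on a parameter reference
    whose attached quant_state may have been destroyed during FSDP preparation.
    """
    base = lora_layer.get_base_layer()  # PEFT helper returning the wrapped module
    weight = base.weight
    if isinstance(weight, bnb.nn.Params4bit):
        return weight.quant_state
    raise TypeError("expected a bitsandbytes 4-bit base layer")
```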
Tests

General Benchmarks
Before Fix: No Sharding of Attention Base Layer

(table: Config | per device (toks/sec) | Memory Allocated (GiB))

After Fix: Attention Base Layer Sharded

(table: Config | per device (toks/sec) | Memory Allocated (GiB))
Before Fix

Nothing runs for 2 GPUs due to #3.

After Fix

QLoRA-FOAK is compatible with FSDP, and applying FOAK gives roughly a 10% increase in speed.
(table: Config | (toks/sec) | Memory Allocated (GiB) | Loss)
Llama3
(table: Config | (toks/sec) | Memory Allocated (GiB) | Loss)