[LoRA][III] Add LoRA support for MoE layers and enable TP #14105
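For readers skimming the commit log below, a minimal sketch of the per-expert LoRA idea this PR adds to MoE layers. This is a hypothetical illustration with made-up shapes and names, not the sglang implementation: each expert's base projection is augmented by a low-rank update `B @ A` of rank `r`, applied only to the tokens routed to that expert.

```python
import numpy as np

# Hypothetical sketch (not the sglang kernels): per-expert LoRA on top of a
# base MoE expert projection. Expert e has base weight W[e] plus a low-rank
# update A[e] (shrink) and B[e] (expand) with rank r much smaller than d.
num_experts, d_in, d_out, r = 4, 8, 8, 2
rng = np.random.default_rng(0)
W = rng.standard_normal((num_experts, d_in, d_out))
A = rng.standard_normal((num_experts, d_in, r)) * 0.01   # LoRA A (shrink)
B = rng.standard_normal((num_experts, r, d_out)) * 0.01  # LoRA B (expand)

def expert_forward(x, e):
    """Base projection plus the LoRA delta for expert e."""
    return x @ W[e] + (x @ A[e]) @ B[e]

x = rng.standard_normal((3, d_in))   # 3 tokens routed to expert 1
y = expert_forward(x, 1)
assert y.shape == (3, d_out)
```

In the actual PR the delta is fused into the grouped-GEMM MoE path and scaled by the top-k routing weights; the dense per-expert matmul above is only the reference math.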
Merged
Changes from all commits
190 commits
5d121f8
Implement basic test
Jonahcb 317c4f6
Shard the model across two gpus
Jonahcb da4285f
Add multi-test support
Jonahcb 4008c42
Add comprehensive test configs
Jonahcb 30a3986
Add comprehensive test configs
Jonahcb 121fcae
Rename moe test file
Jonahcb d57c7a0
Add spec decoding cases
Jonahcb 9837cea
Merge branch 'main' into moe/comprehensive-moe-integration-tests
Jonahcb 54c0d01
Simplify code
Jonahcb 9dd39ce
Fix config issues
Jonahcb d092df0
Fix config issues
Jonahcb ac734f6
Add default mxfp4 moe test model
Jonahcb 34baf32
Add configs for auto backend choosing logic
Jonahcb f01cb32
Rename file and remove unnecessary configs
Jonahcb 46b34ed
Simplify configs
Jonahcb 53e9b18
Add helpful comments
Jonahcb 3e36425
Correct comment
Jonahcb c5d6d8b
Merge branch 'main' into moe/comprehensive-moe-integration-tests
Jonahcb 0bac315
Merge branch 'main' into moe/comprehensive-moe-integration-tests
Jonahcb 8081291
Adjust default model for each test case
Jonahcb e0193ec
Add default moe NVFP4 model name for test
Jonahcb aa9382a
Wire default NVFP4 moe model into moe integration tests
Jonahcb 3a92ceb
Wire default NVFP4 moe model into moe integration tests configs
Jonahcb f8e09fa
Remove unnecessary args
Jonahcb 1a2a1c5
Merge branch 'main' into moe/comprehensive-moe-integration-tests
Jonahcb f347ce3
Merge branch 'main' into moe/comprehensive-moe-integration-tests
Jonahcb 4504c2b
Merge branch 'main' into moe/comprehensive-moe-integration-tests
Jonahcb aad84ef
Add to not_in_ci list
Jonahcb ccd9eca
fix lint issues
Jonahcb d0c4813
Clean up code
Jonahcb 4089930
merge main
Jonahcb 6bdfa6f
Merge remote-tracking branch 'upstream/main' into add-moe-lora-support
Jonahcb b218f23
fix
Jonahcb ba64942
add test
Jonahcb c339cee
fix
Jonahcb c8a408d
Merge remote-tracking branch 'upstream/main' into add-moe-lora-support
Jonahcb b4fafa4
Merge branch 'main' into add-moe-lora-support
Jonahcb fcd0768
Merge remote-tracking branch 'origin/main' into add-moe-lora-support
Jonahcb 5160a55
Merge branch 'add-moe-lora-support' of github.com:Jonahcb/sglang into…
Jonahcb 7fa7ddd
simplify test
Jonahcb 367612a
fix lora id issue
Jonahcb af3c758
remove unnecessary code
Jonahcb d3b27ee
rename vars for clarity
Jonahcb 13d1bfe
Merge branch 'main' into add-moe-lora-support
Jonahcb 53dc64d
move from lora_moe.py to layers.py
Jonahcb ad9c32e
clean up test files
Jonahcb d28a7bb
Merge branch 'main' into add-moe-lora-support
Jonahcb 8242c44
Merge remote-tracking branch 'upstream/main' into add-moe-lora-support
jhinpan 022e6f8
Merge remote-tracking branch 'upstream/main' into add-moe-lora-support
Jonahcb 2e199b9
Merge remote-tracking branch 'upstream/main' into add-moe-lora-support
Jonahcb 883b5ef
fix
Jonahcb b0cd554
modify shape initialization to use moe_intermediate_size_ from config…
Jonahcb d1e3155
fix dim mismatch in buffer_view and weights due to stacking
Jonahcb 1eb9df1
fix stacking issue for gate_up_proj for LoRA B
Jonahcb 7cde90f
Merge remote-tracking branch 'upstream/main' into add-moe-lora-support
Jonahcb b50cfe4
add down proj calculation
Jonahcb 28daf59
add debugging statements
Jonahcb af73406
use intermediate tensors
Jonahcb f5a22ef
return LoRA addition as well
Jonahcb 18476d7
fix atomic add issue
Jonahcb 0726a93
Make sure all tensor types match
Jonahcb b211fb0
make sure types match
Jonahcb cef9460
Add topk weights multiplications
Jonahcb 4e688e3
fix max_rank issues
Jonahcb c616f58
clean up debugging code
Jonahcb ac1ffea
use torch.zeros
Jonahcb 01f6c92
merge
Jonahcb 39c9316
fix
Jonahcb dfe69e9
fix
Jonahcb 8c41ff9
add comments for clarity
Jonahcb ac574c7
Merge branch 'main' into add-moe-lora-support
Jonahcb 5b3f5aa
add activation function
Jonahcb bbae67d
remove unused parameters
Jonahcb 3abf25e
fix mismatch types
Jonahcb f8e99e5
remove unnecessary if
Jonahcb e1d43aa
Merge branch 'main' into add-moe-lora-support
Jonahcb 315c64d
refactor so that LoRA computations are added inside base MoE path
Jonahcb 099aa82
refactor to utilize vLLM kernel
Jonahcb ac4a008
convert strings to int where necessary
Jonahcb 305acc9
fix
Jonahcb bf831a9
fix
Jonahcb 7c5880a
fix
Jonahcb fec49f1
fix
Jonahcb 307abef
fix
Jonahcb c9062b0
fix
Jonahcb 6e10967
add unit tests
Jonahcb 3e0047a
fix tests
Jonahcb e56b451
add unit test for lora + base path
Jonahcb eb24157
add end to end test
Jonahcb c8bbc25
fix layer_id issue
Jonahcb 64c3d96
Add moe lora align sum kernel
Jonahcb 0e8c05d
add call to moe lora align kernel
Jonahcb a03797e
fix
Jonahcb bab26e6
refactor to use MoE runners infra
Jonahcb b76d05a
update runner test case to work with refactoring
Jonahcb 06d22be
fix runner test case
Jonahcb 8dd1ef3
fix
Jonahcb 796db38
fix
Jonahcb d0a9f9b
fix
Jonahcb 952c8d3
fix small issues in lora moe runners
Jonahcb 49ac712
fix small issue in layers.py
Jonahcb 468a10f
remove custom kernel build path
Jonahcb e8259a3
remove unused code
Jonahcb 999dd7c
fix
Jonahcb 5fcdfd5
fix
Jonahcb f7cba25
major fixes
Jonahcb 688e0c2
fix
Jonahcb 66885cd
fix test
Jonahcb f5cf615
remove csgmv support
Jonahcb dccc359
fixes
Jonahcb 39ebafd
finalize fixes
Jonahcb e8b40e0
Merge branch 'main' into add-moe-lora-support
Jonahcb cf63435
fix
Jonahcb 29ceca1
fix
Jonahcb 26709e8
lint
Jonahcb 27336a5
fix merge conflict
Jonahcb e062d4f
fix comments
Jonahcb 1b8e359
remove unused code
Jonahcb bae8100
better check in mempool
Jonahcb 297abcc
code quality
Jonahcb 5b2c585
improve code quality
Jonahcb 05a9ca8
remove unused code
Jonahcb a329d01
remove unused code
Jonahcb d477548
add GDC support
Jonahcb 137c9cb
remove unused code
Jonahcb 2940aa2
remove unused code
Jonahcb 9ac01d3
move token sorting kernel to jit kernel folder
Jonahcb 0a576b3
move token sorting kernels to jit kernel
Jonahcb e89993a
Merge branch 'main' into add-moe-lora-support
Jonahcb 0ab7c9f
Merge branch 'main' into add-moe-lora-support
Jonahcb d2e2b35
Fix small error
Jonahcb e246c10
small fix
Jonahcb 8053b5f
fix hf test
Jonahcb 5959187
fix
Jonahcb 224982a
fix
Jonahcb 9f6aeec
remove unnecessary injection of max_lora_rank
Jonahcb 0b440e3
Revert "remove unnecessary injection of max_lora_rank"
Jonahcb 68ea9c9
fix max lora ranks calc
Jonahcb bf5448a
lint and fix dropping lora modules issue
Jonahcb de6ff7b
add prompts back
Jonahcb 18c6ae1
Merge branch 'main' into add-moe-lora-support
Jonahcb 7496364
Merge branch 'main' into add-moe-lora-support
Jonahcb 0bd02ce
modify some ci tests
yushengsu-thu 1461e8c
fix some tests
yushengsu-thu d382084
Merge remote-tracking branch 'upstream/main' into add-moe-lora-support
yushengsu-thu 0922c4a
pre-commit
yushengsu-thu 3c77e28
Merge branch 'main' into add-moe-lora-support
Fridge003 bfe9e1c
Merge branch 'main' into add-moe-lora-support
Jonahcb 3328486
Merge branch 'main' into add-moe-lora-support
Jonahcb 447dd0b
Merge branch 'main' into add-moe-lora-support
Jonahcb 13cbf1c
rename moe lora align block size kernel test file
Jonahcb 6872b3a
add vllm baseline comparison test
Jonahcb 6071178
add docstring
Jonahcb 1c669bf
Merge branch 'main' into add-moe-lora-support
Jonahcb d5f0e73
move unit test to jit-kernel directory
Jonahcb 4f35f5c
Merge branch 'main' into add-moe-lora-support
Jonahcb 0a2ad6e
Merge branch 'main' into add-moe-lora-support
yushengsu-thu cb48c65
fix max_lora_rank value in packed gate_up_proj case
Jonahcb 12bcbb1
fix the expand error in the last commit
yushengsu-thu 34ed28a
update
yushengsu-thu c48f6da
update vllm baseline test hardcode logprobs after bug fix
Jonahcb 1b0ce76
increase lora_moe_runner test fail threshold to 0.52 from 0.02
Jonahcb 0b02bba
lower tolerance threshold
Jonahcb 89e5274
fix mul_routed_weight being applied twice
Jonahcb 74b3471
increase test coverage to test mul_routed_weight=True
Jonahcb 27629fd
revert hardcoding mul_routed_weight
Jonahcb 071d9a0
fixed kernel unit test
Jonahcb 37f3a7f
Merge upstream/main into add-moe-lora-support
yushengsu-thu 0a9a154
Fix MoE LoRA down-projection shrink kernel reading wrong input rows
yushengsu-thu a73a33a
fix
yushengsu-thu 03ed3bb
Add MoE LoRA tensor parallel support and TP=2 CI tests
yushengsu-thu ac8a4dc
pre-commit
yushengsu-thu 881f7ab
Merge branch 'main' into add-moe-lora-support
yushengsu-thu 7a9721d
Merge remote-tracking branch 'upstream/main' into add-moe-lora-support
yushengsu-thu 55ea86e
tune ci mem
yushengsu-thu 86a8f6d
Merge branch 'main' into add-moe-lora-support
yushengsu-thu a633054
Merge branch 'main' into add-moe-lora-support
yushengsu-thu 78e15e2
fix mem in sgl to pass ci
yushengsu-thu 7fb7333
enlarge mem_fraction_static value
yushengsu-thu 14afea6
move ci to large
yushengsu-thu 342b5e8
change thread - still normal range
yushengsu-thu 067a007
upd tests
Fridge003 5d2631b
avoid regression of csgmv
Fridge003 33d70b0
upd test name
Fridge003 cba071e
upd
Fridge003 825bd5b
upd test
Fridge003 1725517
Merge branch 'main' into add-moe-lora-support
Fridge003 0a07f30
upd test
Fridge003 ab6aa5d
upd
Fridge003 c6c59b3
restore test_lora_hf_sgl_logprob_diff to main branch
Fridge003
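Several commits above mention a "moe lora align block size" / token-sorting kernel. A minimal sketch of that alignment idea, with a hypothetical helper name and padding convention (not the sglang JIT kernel): token indices are grouped by their routed expert and each group is padded to a block size, so a grouped GEMM can launch one tile per (expert, block) without ragged boundaries.

```python
import numpy as np

# Hypothetical sketch of expert-wise token alignment (not the sglang kernel).
def align_tokens_by_expert(topk_ids, num_experts, block_size):
    """Group token indices by expert and pad each group to block_size.

    Padding slots are marked with -1 so downstream GEMM tiles can skip them.
    """
    sorted_ids = []
    for e in range(num_experts):
        ids = np.flatnonzero(topk_ids == e)       # tokens routed to expert e
        pad = (-len(ids)) % block_size            # pad group up to block_size
        sorted_ids.extend(ids.tolist() + [-1] * pad)
    return np.array(sorted_ids)

ids = align_tokens_by_expert(np.array([1, 0, 1, 0, 0]), num_experts=2, block_size=4)
# expert 0 gets tokens [1, 3, 4] padded to 4; expert 1 gets [0, 2] padded to 4
assert ids.tolist() == [1, 3, 4, -1, 0, 2, -1, -1]
```

The real kernel does this on-device in one pass (plus per-expert counts for the grouped GEMM launch); the Python loop is only the reference behavior.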