feat: integrate deepgemm into EPMoE #6821
Merged
53 commits
92d647c
feat(ep_moe): integrate deepgemm into origin ep moe
TianQiLin666666 e057acb
fix(ep_moe): group_gemm_mask bug
TianQiLin666666 19ec50e
fix bugs
TianQiLin666666 3ce1a91
fix bugs
TianQiLin666666 3d51a71
fix(em_moe): offset bugs
TianQiLin666666 c80fc3c
fix(deepgemm): bugfix
TianQiLin666666 af94a8b
fix: remove redundant code
TianQiLin666666 2022070
fix: clang-format
TianQiLin666666 55ea483
fix: remove print
TianQiLin666666 988a522
fix(ep_moe): replace EPMOE_USE_DEEPGEMM with _ENABLE_JIT_DEEPGEMM
TianQiLin666666 1f81f01
merge main
xutizhou b4ae984
Refactor moe_ep_deepgemm_preprocess to remove CUDA-specific handling …
xutizhou c6d51d2
Fix condition for expert fusion by updating the check for 'enable_dee…
xutizhou 0397c25
Fix typo in function name from 'moe_ep_deepgemm_preproess' to 'moe_ep…
xutizhou aeb437d
Update moe_ep_deepgemm_preprocess to adjust m_max calculation for mas…
xutizhou d2c19bf
Refactor compute_masked_m_triton_kernel to remove num_experts paramet…
xutizhou 0d3e793
Refactor moe_ep_deepgemm_preprocess to improve assertions and streaml…
xutizhou ee1a2b1
Refactor kernel functions in kernels.py to improve variable naming an…
xutizhou d530ee7
Enhance EPMoE layer by capturing hidden states' shape, dtype, and dev…
xutizhou cb6ea36
Refactor EPMoE layer to improve memory management by explicitly delet…
xutizhou 2769865
Update EPMoE layer to use hidden_states_device for tensor creation, e…
xutizhou a7a6235
Refactor deepgemm_post_reorder_triton_kernel to improve variable hand…
xutizhou a1493e5
Optimize EPMoE layer by explicitly deleting intermediate tensor varia…
xutizhou b1edec5
fix(ep_moe_deepgemm): use dispose_tensor to really free tensor mem
TianQiLin666666 773f151
Merge pull request #1 from TianQiLin666666/feat/ep_moe_deepgemm_zxt
xutizhou 4975723
Merge branch 'main' into feat/ep_moe_deepgemm
zhyncs 6584c21
Enhance EPMoE layer to conditionally use deep GEMM based on FP8 setti…
xutizhou c54a141
Optimize memory management in EPMoE layer by removing unnecessary ten…
xutizhou 86f06b6
Merge branch 'main' into feat/ep_moe_deepgemm
zhyncs 15c9514
Merge branch 'main' into feat/ep_moe_deepgemm
zhyncs afd55c2
Merge branch 'main' into feat/ep_moe_deepgemm
zhyncs b596fa2
fix: remove 'del gateup_input_scale' to avoid H20*8 OOM, and move ann…
TianQiLin666666 fe97d1f
Merge branch 'main' into feat/ep_moe_deepgemm
ch-wan 431f47a
Merge pull request #2 from TianQiLin666666/feat/ep_moe_deepgemm_zxt
xutizhou 1422da6
Merge branch 'main' into feat/ep_moe_deepgemm
ch-wan bbbd98a
Merge branch 'main' into feat/ep_moe_deepgemm
zhyncs 35cc918
Merge branch 'main' into feat/ep_moe_deepgemm
ch-wan 1a26425
dispose hidden_states
ch-wan a902338
Merge branch 'main' into feat/ep_moe_deepgemm
TianQiLin666666 d6b3afd
fix: call deep_gemm_wrapper APIs in epmoe forward_deepgemm
TianQiLin666666 b51f324
assert act=="silu" in epmoe forward_deepgemm
TianQiLin666666 ad9abb2
fix(epmoe): remove _ENABLE_JIT_DEEPGEMM
TianQiLin666666 1869b18
fix(fill_gateup_input_triton_kernel): pre-define a tl.arange() outsid…
TianQiLin666666 77123c6
replace deepgemm_post_reorder_triton_kernel with post_reorder_triton_…
TianQiLin666666 ddc0c33
fix args of post_reorder_triton_kernel in all tests and benchmarks
TianQiLin666666 8e71771
Merge branch 'main' into feat/ep_moe_deepgemm
ch-wan a6d61f6
Merge branch 'main' into feat/ep_moe_deepgemm
zhyncs e30b3ab
Merge branch 'main' into feat/ep_moe_deepgemm
zhyncs aafbe4e
add num_fused_shared_experts
TianQiLin666666 0ea5bd9
fix(moe_deepgemm): convert per-tensor weight quant to per-block quant…
TianQiLin666666 87d68e9
Merge branch 'main' into feat/ep_moe_deepgemm
ch-wan 26040f4
Merge branch 'main' into feat/ep_moe_deepgemm
xutizhou bbb127d
Merge branch 'main' into feat/ep_moe_deepgemm
ch-wan