Use 2x less memory for MoE (including the DisposableTensor small tool) #5085

fzyzcjy · 2025-04-05T14:53:51Z

UPDATE: the latest PR is #6147

Brainstorm: maybe we can call `the_tensor.set_(torch.empty((0,), device=...))`, e.g. when users do not need to access the tensor's metadata after this operation.
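The brainstorm above, like the DisposableTensor tool named in the title, is presumably about letting the owner of a large buffer give it up explicitly rather than waiting for the last reference to go out of scope. A minimal, torch-free sketch of that pattern is below. `Disposable`, `Payload`, and `demo` are hypothetical names for illustration only, not the PR's actual DisposableTensor code, and the immediate release shown here relies on CPython reference counting.

```python
import weakref


class Disposable:
    """Hypothetical holder that owns one object and can release it early."""

    def __init__(self, value):
        self._value = value
        self._disposed = False

    @property
    def value(self):
        # Fail loudly instead of handing out a released buffer.
        if self._disposed:
            raise RuntimeError("value was already disposed")
        return self._value

    def dispose(self):
        # Drop the strong reference held by this holder.
        if self._disposed:
            raise RuntimeError("dispose() called twice")
        self._value = None
        self._disposed = True


class Payload:
    """Stand-in for a large tensor (plain instances support weak references)."""


def demo():
    payload = Payload()
    probe = weakref.ref(payload)  # observes liveness without keeping it alive
    holder = Disposable(payload)
    del payload                   # the holder is now the sole owner
    alive_before = probe() is not None
    holder.dispose()              # last strong reference goes away here
    alive_after = probe() is not None
    return alive_before, alive_after
```

The holder only helps if no other strong reference to the buffer survives, so callers would need to pass the holder around instead of the buffer itself.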

2025.04.21 Update

I observe something similar for `forward_deepgemm_masked` and will update this PR to cover it as well. Users who need it before this PR is extracted from my dev branch can go directly to #5524.

Original text

~~Do NOT look at the code now - it is based on two-batch-overlap branch (b/c that branch merges the tools branches). I will re-do the things on master branch later.~~
Updated

Before: peak is 1.8 + 1.8 + 1.8 GB = 5.4 GB

After: peak is 1.8 + 0.9 GB = 2.7 GB
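The peak figures above are sums of buffers that are alive at the same moment. As a rough illustration only, the sketch below replays two hypothetical allocation schedules and reports the peak of the running total. The buffer names and the ordering of allocations and frees are assumptions, not taken from the PR's memory profile. Sizes are in decimal megabytes (1.8 GB written as 1800 MB) so that the arithmetic stays in exact integers.

```python
def peak_usage(events):
    """Return the peak of a running total over (name, delta_mb) events."""
    live = 0
    peak = 0
    for _name, delta_mb in events:
        live += delta_mb          # positive = allocate, negative = free
        peak = max(peak, live)
    return peak


# Hypothetical schedule: three 1800 MB buffers alive at the same time.
BEFORE = [
    ("buf_a", +1800), ("buf_b", +1800), ("buf_c", +1800),
    ("buf_a", -1800), ("buf_b", -1800), ("buf_c", -1800),
]

# Hypothetical schedule: one 1800 MB buffer plus one 900 MB buffer at peak.
AFTER = [
    ("buf_a", +1800), ("buf_b", +900),
    ("buf_a", -1800), ("buf_b", -900),
]

if __name__ == "__main__":
    print("before:", peak_usage(BEFORE), "MB")
    print("after:", peak_usage(AFTER), "MB")
```

The same helper can be fed a real allocation trace to check whether freeing a buffer earlier actually lowers the peak, since only buffers that overlap in time contribute to it.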

04.12: Updated the code and re-checked the memory profile, which still looks reasonable.

Motivation

Modifications

Checklist

Format your code according to the Code Formatting with Pre-Commit.
Add unit tests as outlined in the Running Unit Tests.
Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
Please feel free to join our Slack channel at https://slack.sglang.ai to discuss your PR.


fzyzcjy · 2025-05-09T03:23:32Z

The title seems inaccurate: the 2x memory saving comes from multiple changes.

fzyzcjy added 26 commits April 12, 2025 07:58

rebase to master

9ab1f2a

cp back

e378285

cp

a8ebfe9

fmt

b26010d

more

65c08f4

more

c24c212

more

8dccb00

more

2a4c630

more

86f1c99

fmt

a986630

more

70d0e75

more

ade9523

more

b9cd9de

more

49f5761

more

e5b9b97

more

73c5a43

more

fad9b18

more

b914ff3

more

f0fdbe1

more

3b1c1aa

more

915e864

more

f066678

more

7b14a6f

fmt

23e6a32

more

b0348b2

more

5c3d1b5

fzyzcjy force-pushed the feat/opt_moe_mem branch from 5ce3a41 to 5c3d1b5 April 12, 2025 00:18

fzyzcjy marked this pull request as ready for review April 12, 2025 00:18

fzyzcjy requested review from Ying1123 and merrymercy as code owners April 12, 2025 00:18

fzyzcjy requested review from ByronHsu, HaiShaw, hnyls2002, ispobock and zhyncs as code owners April 12, 2025 00:18

fzyzcjy added 6 commits April 12, 2025 08:19

more

fb97f95

rm

d184576

more

54b0f96

more

3efc364

temp

740cd8a

more

0e2c71c

fzyzcjy mentioned this pull request Apr 12, 2025

EPLB #5295

Closed

Revert "temp"

1d771ce

This reverts commit 740cd8a.

ch-wan reviewed Apr 17, 2025

python/sglang/srt/layers/moe/ep_moe/layer.py (review thread resolved)

ch-wan changed the title ~~Use 2x less memory for MoE~~ [Feature] Introducing DisposableTensor to Save 2x Memory for MoE May 9, 2025

fzyzcjy changed the title ~~[Feature] Introducing DisposableTensor to Save 2x Memory for MoE~~ Use 2x less memory for MoE May 9, 2025

fzyzcjy closed this May 9, 2025

fzyzcjy mentioned this pull request May 9, 2025

Reduce MoE memory usage #6147

Merged

6 tasks

fzyzcjy changed the title ~~Use 2x less memory for MoE~~ Use 2x less memory for MoE (including the DisposableTensor small tool) May 9, 2025


2 participants
