
Conversation

@fzyzcjy (Collaborator) commented on Apr 5, 2025

UPDATE: the latest PR is #6147

Brainstorm: maybe we can use `the_tensor.set_(torch.empty((0,), device=...))` (e.g., if users do not need to access the tensor's metadata after this operation).
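A minimal sketch of that idea (the `dispose_tensor` name is illustrative, not necessarily the helper this PR ends up adding):

```python
import torch

def dispose_tensor(x: torch.Tensor) -> None:
    # Swap the tensor's storage for an empty buffer of the same dtype/device,
    # so the original (large) storage can be freed even while other Python
    # references to this tensor object still exist. Note that shape/stride
    # metadata is lost: the tensor now has shape (0,).
    x.set_(torch.empty((0,), dtype=x.dtype, device=x.device))

# Hypothetical usage: release a large activation once downstream has consumed it.
a = torch.randn(1024, 1024)
b = a @ a            # downstream result
dispose_tensor(a)    # a's storage is released; `a` is now an empty tensor
```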

2025.04.21 Update

I observe something similar for forward_deepgemm_masked, and will update this PR to cover that as well. For users who need it before this PR is extracted from my dev branch, please visit #5524 directly.

(image: memory profile for forward_deepgemm_masked)

Original text

Do NOT review the code yet - it is based on the two-batch-overlap branch (because that branch merges the tools branches). I will redo the work on the master branch later.
Updated

Before: peak is 1.8 + 1.8 + 1.8 GB = 5.4 GB

(image: memory profile before the change)

After: peak is 1.8 + 0.9 GB = 2.7 GB

(image: memory profile after the change)
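To make the peak arithmetic concrete: the point is that fewer large buffers are alive at the same time. A toy measurement along those lines (the buffer sizes, the `dispose_tensor` helper, and the printed numbers are illustrative assumptions, not the actual MoE buffers or this PR's exact savings):

```python
import torch

def dispose_tensor(x: torch.Tensor) -> None:
    # Same trick as above: swap in an empty storage so the old one can be freed.
    x.set_(torch.empty((0,), dtype=x.dtype, device=x.device))

def peak_gib(dispose: bool) -> float:
    torch.cuda.reset_peak_memory_stats()
    n = 450 * 1024 * 1024              # ~1.8 GiB of float32 per buffer
    a = torch.empty(n, device="cuda")  # buffer 1
    b = a * 2                          # buffer 2 (buffer 1 still alive)
    if dispose:
        dispose_tensor(a)              # release buffer 1 before buffer 3 exists
    c = b * 2                          # buffer 3
    return torch.cuda.max_memory_allocated() / 2**30

if torch.cuda.is_available():
    print("peak without dispose:", peak_gib(False), "GiB")  # ~5.3 GiB (3 buffers alive at once)
    print("peak with dispose:   ", peak_gib(True), "GiB")   # ~3.5 GiB (at most 2 alive at once)
```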

04.12: Updated the code and re-checked the memory profile, which still looks reasonable.

(image: memory profile, 2025-04-12)

Motivation

Modifications

Checklist

@fzyzcjy marked this pull request as ready for review on April 12, 2025 at 00:18
@fzyzcjy mentioned this pull request on Apr 12, 2025
This reverts commit 740cd8a.
@ch-wan changed the title from "Use 2x less memory for MoE" to "[Feature] Introducing DisposableTensor to Save 2x Memory for MoE" on May 9, 2025
@fzyzcjy changed the title from "[Feature] Introducing DisposableTensor to Save 2x Memory for MoE" to "Use 2x less memory for MoE" on May 9, 2025
@fzyzcjy (Collaborator, Author) commented on May 9, 2025

The title seems inaccurate - the 2x memory saving is due to multiple changes.

@fzyzcjy closed this on May 9, 2025
@fzyzcjy mentioned this pull request on May 9, 2025
@fzyzcjy changed the title from "Use 2x less memory for MoE" to "Use 2x less memory for MoE (including the DisposableTensor small tool)" on May 9, 2025