
Conversation

@hyukn (Collaborator) commented Apr 23, 2025

  • Replace the pre-defined bucket sizes with a generating function based on tune_max_num_tokens in fused MoE op tuning (see the sketch below).
  • Add logic to free the workspace memory in the min_latency_mode fused MoE path.
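
For context on the first item, here is a minimal, hypothetical sketch of what a bucket-generating function driven by tune_max_num_tokens could look like. The function name `generate_bucket_sizes` and the power-of-two growth rule are assumptions for illustration only, not the exact code in this PR.

```python
def generate_bucket_sizes(tune_max_num_tokens: int) -> list[int]:
    """Hypothetical sketch: derive tuning buckets from tune_max_num_tokens
    instead of a fixed pre-defined list, so no bucket exceeds the configured
    maximum and the memory reserved during tuning stays bounded."""
    buckets = []
    num_tokens = 1
    while num_tokens < tune_max_num_tokens:
        buckets.append(num_tokens)
        num_tokens *= 2  # power-of-two growth is an assumption, not necessarily the PR's rule
    buckets.append(tune_max_num_tokens)
    return buckets


# e.g. tune_max_num_tokens=8192 -> [1, 2, 4, ..., 4096, 8192]
print(generate_bucket_sizes(8192))
```

The second item would plausibly amount to dropping the reference to the min_latency_mode workspace tensor once profiling finishes (for example, `del workspace`), so the caching allocator can reclaim the memory; the actual call site in the fused MoE path may differ.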

Signed-off-by: Yukun He <[email protected]>
@hyukn hyukn requested review from HuiGao-NV and litaotju April 23, 2025 03:01
@hyukn hyukn self-assigned this Apr 23, 2025
@hyukn (Collaborator, Author) commented Apr 23, 2025

/bot run

@hyukn hyukn changed the title from "Reduce memory usage in fused moe op associated with AutoTuning." to "fix: Reduce memory usage in fused moe op associated with AutoTuning." on Apr 23, 2025
@tensorrt-cicd (Collaborator)

PR_Github #3119 [ run ] triggered by Bot

@hyukn (Collaborator, Author) commented Apr 23, 2025

/bot kill

@hyukn (Collaborator, Author) commented Apr 23, 2025

Retargeted to release/0.19 via PR #3793, so this PR is being closed.

@hyukn hyukn closed this Apr 23, 2025
@tensorrt-cicd (Collaborator)

PR_Github #3119 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #2172 completed with status: 'SUCCESS'

@hyukn (Collaborator, Author) commented Apr 23, 2025

/bot kill

@hyukn hyukn deleted the fix/reduce_autotune_mem_usage branch May 20, 2025 10:46