[Bugfix] Fix illegal memory access #12758
Conversation
@@ -128,7 +128,7 @@ def flashinfer_allreduce_residual_rmsnorm(
     residual: torch.Tensor,
     weight: torch.Tensor,
     eps: float = 1e-6,
-    max_token_num: int = 2048,
+    max_token_num: int = 16384,
Hang issue workaround (WAR): increase max_token_num to allocate a larger workspace.
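As a rough, hypothetical sizing model (not the real flashinfer workspace layout), the workspace scales linearly with `max_token_num`, so raising it from 2048 to 16384 multiplies the allocation by 8:

```python
def workspace_bytes(max_token_num: int, hidden_size: int, bytes_per_elem: int = 2) -> int:
    # Hypothetical sizing model for illustration only: one fp16/bf16 buffer
    # of shape [max_token_num, hidden_size]. The real trtllm_allreduce_fusion
    # workspace layout is more involved, but scales the same way with tokens.
    return max_token_num * hidden_size * bytes_per_elem
```

Under this model, `workspace_bytes(16384, h)` is exactly 8x `workspace_bytes(2048, h)` for any hidden size `h`.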
add an assert: assert input_tensor.shape[0] <= max_token_num ?
This should be covered by the code below:
if input_tensor.shape[0] > max_token_num:
    logger.debug(
        "Input token(%d) is greater than max_token_num(%d), "
        "falling back to standard implementation",
        input_tensor.shape[0],
        max_token_num,
    )
    return None, None
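The guard quoted above can be sketched as a standalone helper. This is a hedged illustration, not the actual `flashinfer_allreduce_residual_rmsnorm` implementation; `try_fused_allreduce_rmsnorm` and the placeholder return are hypothetical:

```python
import logging
from typing import Any, Optional, Tuple

logger = logging.getLogger(__name__)

def try_fused_allreduce_rmsnorm(
    input_tensor: Any,  # tensor-like with a .shape attribute, e.g. a torch.Tensor
    max_token_num: int = 16384,
) -> Tuple[Optional[Any], Optional[Any]]:
    # Same guard as in the snippet above: the fused kernel's workspace only
    # covers max_token_num tokens, so larger inputs decline the fused path
    # and the caller falls back to the standard implementation.
    if input_tensor.shape[0] > max_token_num:
        logger.debug(
            "Input token(%d) is greater than max_token_num(%d), "
            "falling back to standard implementation",
            input_tensor.shape[0],
            max_token_num,
        )
        return None, None
    # Placeholder for the fused kernel call; the real code would invoke
    # trtllm_allreduce_fusion and return its outputs here.
    return input_tensor, input_tensor
```

A caller would check for `(None, None)` and run the unfused allreduce + residual RMSNorm path instead, which makes a separate assert on `input_tensor.shape[0]` redundant.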
@elvischenv The GPTOSS CI test failed, please have a look.
This is DeepseekV2. This failure also seems related to #12524. cc @merrymercy
@elvischenv Other PRs can pass the dpsk test.
work around hanging issue of trtllm_allreduce_fusion
fix correctly
The GPTOSS CI test on B200 is passing here.
Motivation
Fixes the illegal memory access issue reported in #12695 and flashinfer-ai/flashinfer#2034, which was caused by #12524.
Modifications
Accuracy Tests
Benchmarking and Profiling
Checklist