[Mamba2] Fix slow path
#34901
|
Thanks @vasqu for the investigation and fix - and @HanGuo97 for reporting! |
|
Also can you please push this empty commit to trigger the slow tests workflow? |
|
@molbap I think there might be a missing cache initialization for the conv in the cuda forward, i.e. see transformers/src/transformers/models/mamba2/modeling_mamba2.py Lines 368 to 373 in 0044dab
Added a test now. It takes 3-4s on my machine due to the warming of the triton kernel - should I make it a slow test? |
|
@vasqu 3-4 s is ok, no need to flag it as slow I think!
    # Discretize x into dB
    # [bsz, intermediate_size] -> [bsz, num_heads, head_dim]
    hidden_states = hidden_states.reshape(batch_size, -1, self.head_dim)
    dBx = dB * hidden_states[..., None]
    # State calculation
    cache_params.ssm_states[self.layer_idx].copy_(
>       cache_params.ssm_states[self.layer_idx] * dA + dBx
    )
E   RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
It's related to multi-GPU runs, it's separate from this issue, I think we're ok to go here! cc @ArthurZucker |
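For readers following the traceback, here is a minimal sketch of the recurrent state update it shows (state <- state * dA + dB * x), written in NumPy so it runs without a GPU. The function name `ssm_state_step` and the shapes chosen for `dA`, `dB`, and the state (`[bsz, num_heads, head_dim, state_size]`) are assumptions for illustration, not taken from the transformers source. NumPy has no notion of devices, so this only illustrates the broadcasting. In PyTorch, every operand in the same expression would additionally need to live on the same device, which is what the error above reports.

```python
import numpy as np


def ssm_state_step(ssm_state, dA, dB, hidden_states, head_dim):
    """One recurrent update: state <- state * dA + dB * x (NumPy analog, hypothetical helper)."""
    batch_size = hidden_states.shape[0]
    # [bsz, intermediate_size] -> [bsz, num_heads, head_dim]
    x = hidden_states.reshape(batch_size, -1, head_dim)
    # Add a trailing axis so x broadcasts over the state dimension
    dBx = dB * x[..., None]
    # Write the result into the existing buffer, similar in spirit to copy_ on the cache
    ssm_state[...] = ssm_state * dA + dBx
    return ssm_state


# Assumed toy sizes, chosen only for the example
bsz, num_heads, head_dim, state_size = 2, 3, 4, 5
rng = np.random.default_rng(0)
state = np.zeros((bsz, num_heads, head_dim, state_size))
dA = rng.uniform(0.5, 1.0, size=state.shape)
dB = rng.normal(size=state.shape)
hidden = rng.normal(size=(bsz, num_heads * head_dim))

new_state = ssm_state_step(state, dA, dB, hidden, head_dim)
```

Because the initial state is zero here, the first step reduces to `dB * x`, and later steps decay the previous state by `dA` before adding the new input term.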
molbap
left a comment
LGTM as discussed above :) to be reviewed by core maintainer
|
I found an issue for the failure, so yea seems like a separate issue ^^ #33567 |
|
gentle ping @ArthurZucker |
|
Closing in favor of #35154 |
What does this PR do?
Verified it locally, see the test over here
Fixes #34817
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@ArthurZucker @molbap