[BugFix] Fix DBO hang #25625
Signed-off-by: Lucas Wilkinson <[email protected]>
Code Review
This pull request addresses a hang in the Dual-ubatch Overlap (DBO) feature when processing mixed prefill-decode batches. The change modifies the condition for replaying a cached CUDA graph to ensure it only happens when the runtime mode is CUDAGraphMode.FULL. This prevents the incorrect replay of a full CUDA graph on an incompatible mixed batch. The fix is targeted and appears correct. I have not identified any further issues of high or critical severity in this change.
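The guard described above can be sketched roughly as follows. This is a minimal illustration, not the actual vLLM code: `should_replay_cached_graph`, the `cached_graphs` dictionary, and the `NONE` and `PIECEWISE` enum members are hypothetical stand-ins. Only the name `CUDAGraphMode.FULL` comes from the review text.

```python
from enum import Enum, auto


class CUDAGraphMode(Enum):
    # Stand-in for the runtime mode enum named in the review. Only FULL is
    # taken from the source; the other members are illustrative assumptions.
    NONE = auto()
    PIECEWISE = auto()
    FULL = auto()


def should_replay_cached_graph(num_tokens, runtime_mode, cached_graphs):
    """Hypothetical helper: replay only if a graph is cached AND the mode is FULL."""
    has_cached_graph = num_tokens in cached_graphs
    # Checking the cache alone would also match a mixed prefill-decode batch
    # that happens to have the same token count, so the mode is checked too.
    return has_cached_graph and runtime_mode is CUDAGraphMode.FULL


# Hypothetical cache keyed by token count.
cached_graphs = {64: "graph-for-64-tokens"}

# Batch dispatched in FULL mode with a cached graph: replay is allowed.
print(should_replay_cached_graph(64, CUDAGraphMode.FULL, cached_graphs))
# Same token count but not in FULL mode (e.g. a mixed batch): no replay.
print(should_replay_cached_graph(64, CUDAGraphMode.PIECEWISE, cached_graphs))
```

The point of the sketch is that a cache hit is not sufficient on its own; the runtime mode has to agree before a full graph is replayed.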
Is this related to #25607?
Signed-off-by: Lucas Wilkinson <[email protected]> Signed-off-by: yewentao256 <[email protected]>
Signed-off-by: Lucas Wilkinson <[email protected]> Signed-off-by: xuebwang-amd <[email protected]>
Fix DBO replaying full CUDA graphs for mixed prefill-decode batches (it should not have been doing this).