fix: DeepSeek activation uninitialized data #2128
Conversation
Walkthrough

Modified the DeepSeek activation kernel in TensorRT-LLM's fused MOE implementation by extracting a named constant for the thread configuration and adding per-token state initialization loops to ensure clean array states before accumulation across tokens.
Summary of Changes

Hello @nekorobov, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request resolves a bug in the DeepSeek activation kernel where certain local arrays were used without prior initialization, which could result in erroneous calculations. The fix explicitly initializes these arrays to zero. Additionally, a minor refactoring defines the kernel's thread block size with a named constant for better code readability and consistency.

Highlights
Code Review
This pull request fixes a critical bug where register arrays in the activationDeepSeekKernel were used without being initialized, leading to incorrect computations. The fix correctly initializes these arrays to zero at the beginning of each hiddenIdx loop iteration. The changes also improve code maintainability by replacing a magic number for the number of threads with a named constexpr constant. The fix is correct and necessary. I've added one suggestion to further optimize the initialization by removing redundant assignments.
for (int tokenInCtaIdx = 0; tokenInCtaIdx < NumTokensPerCta; tokenInCtaIdx++) {
    scale1Arr[tokenInCtaIdx] = 0.0f;
    scale2Arr[tokenInCtaIdx] = 0.0f;
    dataX1Arr[tokenInCtaIdx] = 0.0f;
    dataX2Arr[tokenInCtaIdx] = 0.0f;
    outArr[tokenInCtaIdx] = 0.0f;
    absOutArr[tokenInCtaIdx] = 0.0f;
}
The arrays outArr and absOutArr are unconditionally written to in the subsequent loop (lines 278-285) before being read. Therefore, initializing them to zero here is redundant and can be removed for a minor performance improvement.
for (int tokenInCtaIdx = 0; tokenInCtaIdx < NumTokensPerCta; tokenInCtaIdx++) {
    scale1Arr[tokenInCtaIdx] = 0.0f;
    scale2Arr[tokenInCtaIdx] = 0.0f;
    dataX1Arr[tokenInCtaIdx] = 0.0f;
    dataX2Arr[tokenInCtaIdx] = 0.0f;
}
@nekorobov do you think gemini's suggestion is reasonable?
/bot run
[FAILED] Pipeline #38984466: 14/18 passed
📌 Description
🔍 Related Issues
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks

- Installed pre-commit by running pip install pre-commit (or used your preferred method).
- Installed the hooks with pre-commit install.
- Ran pre-commit run --all-files and fixed any reported issues.

🧪 Tests

- Tests added or updated as needed (unittest, etc.).

Reviewer Notes
Summary by CodeRabbit
Bug Fixes
Performance