[vLLM IR] 3/N fused_add_rms_norm and maybe_inplace #34068
ProExpertProg wants to merge 19 commits into vllm-project:main
Code Review
This pull request introduces a significant and well-executed architectural improvement by adding a new vLLM Intermediate Representation (IR) layer for custom operations. The initial focus is on rms_norm and fused_add_rms_norm, including a maybe_inplace variant to handle in-place operations gracefully. The changes are comprehensive, including a robust registration and dispatching mechanism, lowering passes to translate IR ops into concrete kernel implementations, and a new configuration system for kernel priorities. Existing fusion passes and model layers have been cleanly refactored to adopt this new IR. The addition of extensive and thorough tests for the new IR system is commendable. Overall, this is an excellent refactoring that builds a solid foundation for managing and extending kernel implementations in vLLM, greatly improving maintainability and extensibility.
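The registration and priority-based dispatch described in the review can be pictured with a small sketch. Everything here is hypothetical and written for illustration only: the `IrOp` class, `register_impl`, `dispatch`, and the provider names are assumptions, not the API this PR adds. The idea shown is that each op has a native fallback plus optional named implementations, and a configured priority list decides which one runs for a given set of arguments.

```python
from typing import Callable, Dict, List, Tuple


class IrOp:
    """Hypothetical op whose implementation is chosen from a priority list."""

    def __init__(self, name: str, native: Callable):
        self.name = name
        # provider name -> (implementation, predicate saying whether it supports the args)
        self.impls: Dict[str, Tuple[Callable, Callable[..., bool]]] = {
            "native": (native, lambda *args: True)
        }

    def register_impl(self, provider: str, supports: Callable[..., bool] = lambda *args: True):
        """Decorator that registers an implementation under a provider name."""
        def decorator(fn: Callable) -> Callable:
            self.impls[provider] = (fn, supports)
            return fn
        return decorator

    def dispatch(self, priority: List[str], *args) -> str:
        """Return the first provider in `priority` that exists and supports the args."""
        for provider in priority:
            entry = self.impls.get(provider)
            if entry is not None and entry[1](*args):
                return provider
        return "native"  # the native implementation is always available

    def __call__(self, priority: List[str], *args):
        return self.impls[self.dispatch(priority, *args)][0](*args)


# Toy op: the "fast" provider only supports even-length inputs (an arbitrary rule).
double = IrOp("double", native=lambda xs: [2 * x for x in xs])


@double.register_impl("fast", supports=lambda xs: len(xs) % 2 == 0)
def double_fast(xs):
    return [x + x for x in xs]
```

Calling `double(["fast", "native"], [1, 2, 3])` falls through to the native implementation because the toy `fast` provider rejects odd-length inputs.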
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
… default application and validation, including more robust schema checks
5046f7a to 02d1591
I measured the dispatching overhead: it's about 24% in eager mode (and 0 in compiled mode). By skipping the extra torch custom op layer, we drop down to 15%. Note that this covers only rms-norm; if every op were wrapped in a vLLM IR op, the overhead would be higher. On the other hand, this is Qwen, which has a ton of norms.
This PR:
Main:
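To make "dispatching overhead" concrete, here is a toy microbenchmark. It is not the benchmark used for the numbers above: the `kernel`, `dispatched`, and `relative_overhead` names are hypothetical, and the "kernel" is a trivial Python function, so the ratio it reports says nothing about vLLM itself. It only illustrates the method of timing the same work with and without one extra lookup-and-call layer in eager Python.

```python
import timeit


def kernel(x):
    # Stand-in for the real work (e.g. a norm kernel launch).
    return x * 2.0


# Hypothetical implementation table consulted on every call.
IMPLS = {"native": kernel}


def dispatched(x, provider="native"):
    # The extra layer: one dict lookup plus one additional Python call.
    return IMPLS[provider](x)


def relative_overhead(number=200_000, repeat=5):
    """Return (wrapped - direct) / direct, using the best of `repeat` runs."""
    direct = min(timeit.repeat(lambda: kernel(1.5), number=number, repeat=repeat))
    wrapped = min(timeit.repeat(lambda: dispatched(1.5), number=number, repeat=repeat))
    return (wrapped - direct) / direct


if __name__ == "__main__":
    print(f"relative dispatch overhead: {relative_overhead():.1%}")
```

A compiled graph can resolve the choice of implementation once, ahead of time, which is consistent with the reported overhead of 0 in compiled mode.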
That's another
New PR: #36823


This is part two of the RMSNorm to vLLM IR conversion, following #33825. It adds a maybe_inplace overload and handles it properly.
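For readers unfamiliar with the op, here is a plain-Python sketch of the math behind fused_add_rms_norm, next to one possible reading of a maybe_inplace variant. The function names and list-based signatures are illustrative assumptions rather than this PR's code, and the in-place contract shown (the op may overwrite its inputs, so callers should only use the returned values afterwards) is inferred from the name and not confirmed by the description.

```python
import math


def fused_add_rms_norm(x, residual, weight, eps=1e-6):
    """Functional form: add the residual, then RMS-normalize. Inputs are untouched."""
    new_residual = [a + b for a, b in zip(x, residual)]
    mean_sq = sum(v * v for v in new_residual) / len(new_residual)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    out = [v * scale * w for v, w in zip(new_residual, weight)]
    return out, new_residual


def fused_add_rms_norm_maybe_inplace(x, residual, weight, eps=1e-6):
    """Assumed contract: may reuse the storage of `x` and `residual`.

    This sketch always overwrites both; an implementation would also be free
    to allocate fresh outputs, which is why callers must not rely on the
    inputs after the call.
    """
    out, new_residual = fused_add_rms_norm(x, residual, weight, eps)
    residual[:] = new_residual  # overwrite the residual buffer
    x[:] = out                  # overwrite the input buffer
    return x, residual
```

Because the functional form never mutates its inputs, a caller that still needs `x` afterwards does not have to clone it first.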
TODO: remove clones, properly pickle custom_pre_grad_pass
Dispatching overhead seems to be around 20% for now, which is not great.
UPDATE: got the dispatching overhead down to negligible!