[vLLM IR] 3/N fused_add_rms_norm and maybe_inplace#36823
[vLLM IR] 3/N fused_add_rms_norm and maybe_inplace#36823ProExpertProg wants to merge 8 commits intoluka/vllm-ir/rms-norm-batch-invariantfrom
Conversation
|
This pull request has merge conflicts that must be resolved before it can be |
4b47060 to
837d6f3
Compare
There was a problem hiding this comment.
Code Review
This pull request introduces a maybe_inplace mechanism to the vLLM IR, allowing for performance optimizations through in-place operations while maintaining functional semantics for default op calls. It also adds a new fused_add_rms_norm op that leverages this new capability. The changes are extensive, touching the IR definition, compiler passes, and kernel implementations. My review has identified a critical safety issue where a potential misuse of an in-place operation only triggers a warning instead of an error, and a high-severity issue related to an incomplete compiler pass (CloneCleanupPass) that is being added.
| logger.warning( | ||
| "Node %s (input to %s) has another use", arg, node | ||
| ) | ||
| # TODO raise error, this is undefined behavior, which should not be allowed. | ||
| # Users can just use the default overload if they want to keep activation inputs untouched. |
There was a problem hiding this comment.
The check for other users of an activation input that is about to be modified in-place currently only logs a warning. This can lead to silent correctness issues if the input tensor is used elsewhere after being modified. As the TODO comment suggests, this should raise an error to prevent such undefined behavior. Allowing compilation to proceed with a warning could introduce hard-to-debug errors downstream.
| logger.warning( | |
| "Node %s (input to %s) has another use", arg, node | |
| ) | |
| # TODO raise error, this is undefined behavior, which should not be allowed. | |
| # Users can just use the default overload if they want to keep activation inputs untouched. | |
| raise ValueError( | |
| f"Node {arg} (input to {node}) has another use in {user}. " | |
| f"Using maybe_inplace on an input with multiple users is not allowed. " | |
| f"Use the default overload if you want to keep activation inputs untouched." | |
| ) |
| continue # TODO | ||
| node.replace_all_uses_with(node.args[0]) | ||
| graph.erase_node(node) | ||
| count += 1 |
There was a problem hiding this comment.
The CloneCleanupPass is added to the pass manager but its implementation is currently a no-op due to the continue # TODO statement. The logic to remove clone nodes is commented out. Merging incomplete or placeholder code can lead to confusion and makes it unclear if the feature is intended to be active. The pass should either be fully implemented or removed from the pass manager until it's ready.
| continue # TODO | |
| node.replace_all_uses_with(node.args[0]) | |
| graph.erase_node(node) | |
| count += 1 | |
| # A clone is safe to remove if its input has no other users. | |
| # This is a conservative check. A more sophisticated analysis | |
| # could trace back to the `maybe_inplace` call and its metadata. | |
| if len(node.args[0].users) == 1: | |
| node.replace_all_uses_with(node.args[0]) | |
| graph.erase_node(node) | |
| count += 1 |
837d6f3 to
d5e968e
Compare
8041106 to
d8fe95a
Compare
d5e968e to
1939b89
Compare
d8fe95a to
810b9f3
Compare
|
This pull request has merge conflicts that must be resolved before it can be |
4868432 to
fefd5b0
Compare
fefd5b0 to
81c6989
Compare
1939b89 to
2725b81
Compare
|
Current status of perf: Mainold |
2725b81 to
65f848e
Compare
49ddf9b to
06de5e1
Compare
|
This pull request has merge conflicts that must be resolved before it can be |
c6f01a4 to
dafa54a
Compare
ebca8b9 to
8b5e95e
Compare
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
8b5e95e to
6b2a07f
Compare
Purpose
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.