[vLLM IR] Cache the fx_replacement to avoid re-tracing the same impl #39034
gcanlin wants to merge 2 commits into vllm-project:main from
Conversation
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
@ProExpertProg Could you help review this? I'm still new to vLLM IR, so some parts of the code may not be fully thought through. I appreciate your understanding.
Code Review
This pull request introduces a caching mechanism for traced replacement graphs in the VllmIRLoweringPass to avoid redundant tracing of the same implementations. It adds a _replacement_cache and helper methods to generate hashable metadata from operation arguments. A critical issue was identified in the _make_arg_meta implementation: it lacks support for unhashable types such as lists and symbolic types (e.g., torch.SymInt), which will raise a TypeError when they are used as cache keys.
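The unhashable-argument issue flagged above could be addressed by normalizing arguments into hashable metadata before building the cache key. Below is a minimal sketch of that idea; the helper name and the string fallback are illustrative assumptions, not the PR's actual code, and a real implementation would special-case FakeTensor shapes and torch.SymInt explicitly:

```python
from typing import Any


def make_arg_meta(val: Any) -> Any:
    # Hypothetical sketch: recursively convert an argument into a
    # hashable form suitable for use in a cache key.
    if isinstance(val, (list, tuple)):
        # Lists (and nested containers) become tuples.
        return tuple(make_arg_meta(v) for v in val)
    if isinstance(val, dict):
        # Dicts become sorted (key, meta) tuples for a stable order.
        return tuple(sorted((k, make_arg_meta(v)) for k, v in val.items()))
    try:
        hash(val)
        return val
    except TypeError:
        # Fallback assumption: use a stable string form for anything
        # still unhashable (e.g., a symbolic value like torch.SymInt).
        return str(val)
```

The point of the sketch is only that every path out of the helper returns something hashable, so the cache lookup itself can never raise TypeError.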
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
@@ -87,17 +134,33 @@ def lower_matched_op(self, match: Match, *args, **kwargs):
# replace_by_example wants node args, not the fake tensors
Fix this comment: move it below and mention that replace_with_graph requires node args.
)(self.lower_matched_op)
@staticmethod
def _make_arg_meta(val: Any) -> Any:
Return a cached traced replacement graph, or trace and cache a new one.
"""
cache_key = (
    impl_fn,
Is it safe to use the impl_fn? Or should we use the IR op's name and provider instead?
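For illustration, the caching idea behind this diff can be sketched as a small memoization wrapper around the tracing step. `ReplacementCache`, `get_or_trace`, and the `trace` callback are hypothetical names for this sketch, not the PR's actual API:

```python
from typing import Any, Callable


class ReplacementCache:
    """Hypothetical sketch: memoize traced replacement graphs."""

    def __init__(self) -> None:
        # Maps (impl_fn, arg_meta) -> traced graph.
        self._cache: dict = {}

    def get_or_trace(
        self,
        impl_fn: Callable,
        arg_meta: tuple,
        trace: Callable[[], Any],
    ) -> Any:
        # Keying on impl_fn itself works while the function object is
        # stable; the review above suggests the IR op's name and
        # provider as a possibly more robust alternative key.
        key = (impl_fn, arg_meta)
        if key not in self._cache:
            self._cache[key] = trace()  # trace only on a cache miss
        return self._cache[key]
```

Identical matches then reuse one traced graph instead of re-tracing per subgraph, which is what produces the cache-hit speedup reported below in the PR description.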
Purpose
- Cache the fwd_only trace result in VllmIRLoweringPass to avoid re-tracing the same impl_fn + arg shapes across subgraphs
- Replace matches via match.replace_with_graph with the cached GraphModule instead of match.replace_by_example, which re-traces every time

Test Plan
Test Result
Cache miss: 30 ms
Cache hit: 1.5 ms