Optimize configuration access with LRU cache in custom ops#22204
Optimize configuration access with LRU cache in custom ops#22204vllm-bot merged 3 commits intovllm-project:mainfrom
Conversation
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
There was a problem hiding this comment.
Code Review
This pull request introduces a cached function get_cached_compilation_config using lru_cache to optimize repeated access to the compilation configuration. While the intent to improve performance is good, the current implementation has a critical flaw. The caching mechanism doesn't account for changes in the global configuration context, which can lead to stale cache data and incorrect behavior, especially in testing environments. I've provided a detailed comment on how to address this to ensure correctness.
There was a problem hiding this comment.
Using lru_cache on this function is problematic because it depends on a mutable global variable (_current_vllm_config in vllm.config). The set_current_vllm_config context manager can change this global, but the cache won't be invalidated, leading to stale data. This will break tests that iterate over different configurations, like tests/model_executor/test_enabled_custom_ops.py.
A robust solution is to clear the cache when the context changes. This can be done by moving get_cached_compilation_config to vllm/config.py and calling get_cached_compilation_config.cache_clear() in the finally block of the set_current_vllm_config context manager.
This change would require modifying vllm/config.py and importing this function from there.
Example change in vllm/config.py:
@contextmanager
def set_current_vllm_config(vllm_config: VllmConfig, ...):
...
try:
...
yield
finally:
get_cached_compilation_config.cache_clear()
...fa5997e to
d680d2b
Compare
Add cached compilation config function to reduce repeated calls to get_current_vllm_config().compilation_config in CustomOp class. This improves performance by avoiding redundant configuration lookups during model execution. Changes: - Add get_cached_compilation_config() with @lru_cache decorator - Replace direct config calls in dispatch_forward(), enabled(), and default_on() methods - Import functools.lru_cache for caching functionality Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
Shortened docstring from 87 to 73 characters to comply with line length limit. Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
Move get_cached_compilation_config from custom_op.py to config.py and add cache_clear() call in set_current_vllm_config finally block to ensure cache is invalidated when configuration context changes. This prevents stale cached data in tests that iterate over different configurations, particularly in test_enabled_custom_ops.py. Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
87a919a to
ca92f9c
Compare
| get_cached_compilation_config.cache_clear() | ||
|
|
||
|
|
||
| @lru_cache(maxsize=1) |
There was a problem hiding this comment.
Nit: you could use cache instead if you aren't going to use the lru part
…ect#22204) Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
…ect#22204) Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
…ect#22204) Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com> Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
…ect#22204) Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com> Signed-off-by: Noam Gat <noamgat@gmail.com>
…ect#22204) Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com> Signed-off-by: Paul Pak <paulpak58@gmail.com>
…ect#22204) Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com> Signed-off-by: Diego-Castan <diego.castan@ibm.com>
…ect#22204) Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
…ect#22204) Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com> Signed-off-by: Xiao Yu <xiao.yu@amd.com>
…ect#22204) Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
|
Hi @skyloevil @mgoin, #22204 leads to a lru cache issue on compilation configs. #31376 should fix the issue. |
Changes
@lru_cache(maxsize=1) to cache compilation config access
manager to ensure cache freshness
Benefits
Files Modified