[Kernel][Helion] Optimize Helion config parsing latency#40850
vllm-bot merged 1 commit into vllm-project:main
Code Review
This pull request introduces CaseKey, a structured, immutable, and hashable dictionary for identifying kernel configurations, replacing the previous string-based key system. The storage format for platform configurations has been updated to a JSON array of entries, and the ConfigManager, ConfigSet, and kernel registration logic have been refactored to support this new structure. Feedback includes optimizing the config_exists method to avoid unnecessary object instantiation, caching the hash and string representations in CaseKey for better performance, and blocking the in-place OR operator to ensure full immutability.
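The review's suggestions (cache the hash and string representations, block the in-place OR operator) can be illustrated with a minimal sketch. `FrozenKey` is a hypothetical stand-in written for this example; it is not the PR's actual `CaseKey` or `ConfigKeyDict` implementation, and the choice of sorted-key JSON as the canonical form is an assumption.

```python
import json


class FrozenKey(dict):
    """Hypothetical immutable, hashable dict (illustration only)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Sorted-key JSON is stable across insertion orders; compute it once
        # so that hashing and str() never re-serialize the key.
        self._str = json.dumps(self, sort_keys=True, separators=(",", ":"))
        self._hash = hash(self._str)

    def __hash__(self):
        return self._hash

    def __str__(self):
        return self._str

    def _blocked(self, *args, **kwargs):
        raise TypeError("FrozenKey is immutable")

    # Block every mutating dict method, including the in-place `|=` operator,
    # which dict would otherwise apply as an in-place update.
    __setitem__ = __delitem__ = _blocked
    clear = pop = popitem = setdefault = update = _blocked
    __ior__ = _blocked


key = FrozenKey(intermediate=2048, numtokens=256)
cache = {key: "some config"}  # usable as a dict key because it is hashable
```

The sketch assumes key values are JSON-serializable and are not mutated after construction. Values that compare equal but serialize differently (such as `1` and `1.0`) would break the hash/equality contract, so a real implementation may need stricter normalization.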
Nice! Will this mean we can use Helion without cudagraphs? 👀
Unfortunately not yet, but I will look deeper into Helion's compilation and dispatch overhead and optimize as much as possible.
@zou3519 @BoyuanFeng @ProExpertProg @xiaohongchen1991 Could you take a look when you get a chance? Thanks. Note to @xiaohongchen1991: this will change the serialization format of configs, so you may need to update your in-flight PRs. Thankfully Claude can do a decent job of updating at scale.
Replace regex-based config key parsing with structured dict config
keys. The framework deserializes keys once on load and passes
parsed dicts to kernel pick_config functions, eliminating the
per-call regex overhead that dominated CUDA graph capture time.
Config file format changes from a flat dict to config entries so
that config keys can contain any JSON-serializable values (ints,
lists, tuples, etc.) without escaping issues:
[{"key": {"intermediate": 2048, "numtokens": 256}, "config": {...}},
{"key": {}, "config": {...}}]
- Add ConfigKeyDict: a hashable dict subclass with stable JSON
__str__ for use as config keys, dict keys, and cache keys
- Change config_picker to receive list[ConfigKeyDict] and return
ConfigKeyDict | None
- input_generator returns dict[ConfigKeyDict, tuple] -- kernel
authors construct keys as ConfigKeyDict(intermediate=2048, ...)
- Default config uses empty ConfigKeyDict() -- no None, no
union types, no special cases
- Framework caches parsed key lists per config identity
- Migrate existing JSON config files to config entries format
- Update all tests and autotune script
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
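The config entries format described in the commit message can be read roughly as follows. This is a sketch under stated assumptions: the `block_size` values, the `load_entries` and `pick_config` helpers, and the "smallest stored numtokens that fits" matching rule are all hypothetical and are not taken from vLLM or Helion. Only the `[{"key": ..., "config": ...}]` layout and the empty default key come from the commit message.

```python
import json

# Example file contents in the entries format; config bodies are made up.
RAW = """[
  {"key": {"intermediate": 2048, "numtokens": 256}, "config": {"block_size": 64}},
  {"key": {"intermediate": 2048, "numtokens": 1024}, "config": {"block_size": 128}},
  {"key": {}, "config": {"block_size": 32}}
]"""


def canonical(key):
    """Stable string form of a key dict, used for table lookups."""
    return json.dumps(key, sort_keys=True, separators=(",", ":"))


def load_entries(text):
    """Deserialize once: return the parsed key dicts and a key -> config table."""
    entries = json.loads(text)
    keys = [entry["key"] for entry in entries]
    table = {canonical(entry["key"]): entry["config"] for entry in entries}
    return keys, table


def pick_config(keys, intermediate, numtokens):
    """Hypothetical picker working on parsed dicts, with no string parsing.

    Picks the smallest stored numtokens that covers the request, and falls
    back to the empty default key when nothing matches.
    """
    fits = [
        k for k in keys
        if k.get("intermediate") == intermediate
        and k.get("numtokens", 0) >= numtokens
    ]
    if fits:
        return min(fits, key=lambda k: k["numtokens"])
    return {} if {} in keys else None


keys, table = load_entries(RAW)
chosen = pick_config(keys, intermediate=2048, numtokens=300)
config = table[canonical(chosen)]
```

Because the default entry is just the empty key, the fallback path needs no `None` sentinel in the stored data, which matches the "no special cases" point in the commit message.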
…#40850) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com> Co-authored-by: Claude Sonnet 4 <noreply@anthropic.com> Signed-off-by: Libin Tang <libin.tang@intel.com>
This PR optimizes Helion compilation time.
Previously, config picking was not cached, so every CUDA graph capture size (one per num_token) had to scan all stored configs for a match, and each match required expensive regex operations because config keys were free-form strings.
Two improvements are made:
Benchmark: 80,000 calls to pick_config with 300 config keys and 8 different input shapes. Measured on NVIDIA H100.