[Kernel][Helion] Optimize Helion config parsing latency #40850

Merged
vllm-bot merged 1 commit into vllm-project:main from gmagogsfm:helion_startup_opt
May 8, 2026

@gmagogsfm
Contributor

This PR reduces Helion compilation time by cutting the latency of config selection (pick_config).

Previously, config picking was not cached. Every CUDA graph capture size (one per num_tokens value) therefore had to scan all stored configs for a match. Each match also required expensive regex operations, because config keys were free-form strings.

Two improvements are made:

  • Replace regex-based config key parsing with structured dict config keys, eliminating the per-call regex overhead.
  • Deserialize keys once on load and pass the parsed dicts to kernel pick_config functions with caching enabled, so config parsing is not repeated for each new shape.
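A minimal sketch of the difference between the two key schemes follows. It rests on two assumptions that are not taken from the vLLM source. The old string key format is a hypothetical `"intermediate_<n>_numtokens_<m>"`. The matching rule, "smallest numtokens that is at least the requested size", is also hypothetical. The only point illustrated is that the old path re-parses every key with a regex on every call, while the new path compares already-parsed dict fields.

```python
import re

# Old style (hypothetical key format): string keys, regex-parsed on every call.
_KEY_RE = re.compile(r"intermediate_(\d+)_numtokens_(\d+)")
STRING_KEYS = [f"intermediate_2048_numtokens_{n}" for n in (16, 64, 256)]


def pick_with_regex(intermediate, num_tokens):
    best = None
    for key in STRING_KEYS:
        m = _KEY_RE.fullmatch(key)  # re-parsed for every key, on every call
        inter, tokens = int(m.group(1)), int(m.group(2))
        if inter == intermediate and tokens >= num_tokens:
            if best is None or tokens < best[1]:
                best = (key, tokens)
    return best[0] if best else None


# New style: keys are deserialized once into dicts; picking is plain comparisons.
DICT_KEYS = [{"intermediate": 2048, "numtokens": n} for n in (16, 64, 256)]


def pick_with_dicts(intermediate, num_tokens):
    fits = [
        k for k in DICT_KEYS
        if k["intermediate"] == intermediate and k["numtokens"] >= num_tokens
    ]
    return min(fits, key=lambda k: k["numtokens"]) if fits else None
```

Both functions pick the same entry. Only the second one avoids string parsing inside the hot loop.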

Benchmark: 80,000 calls to pick_config with 300 config keys and 8 different input shapes, measured on an NVIDIA H100.

| Approach | µs/call | Speedup |
|---|---|---|
| Old (regex, no cache) | 1289 | 1.0× |
| New (CaseKey, no cache) | 46 | 28× |
| New (CaseKey, cached) | 1.8 | 719× |


@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces CaseKey, a structured, immutable, and hashable dictionary for identifying kernel configurations, replacing the previous string-based key system. The storage format for platform configurations has been updated to a JSON array of entries, and the ConfigManager, ConfigSet, and kernel registration logic have been refactored to support this new structure. Feedback includes optimizing the config_exists method to avoid unnecessary object instantiation, caching the hash and string representations in CaseKey for better performance, and blocking the in-place OR operator to ensure full immutability.
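The review's three suggestions for CaseKey are immutability, a cached hash and string, and blocking `|=`. They can be sketched roughly as below. This is a hypothetical illustration of the idea, not the merged class. The attribute names `_hash` and `_str` and the JSON-based canonical form are assumptions.

```python
import json
from typing import Any, NoReturn


class CaseKey(dict):
    """Hypothetical sketch of an immutable, hashable dict used as a config key."""

    __slots__ = ("_hash", "_str")

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        # dict.__init__ inserts items directly, bypassing the blocked __setitem__.
        super().__init__(*args, **kwargs)
        # Serialize once with sorted keys so equal keys get identical strings,
        # then derive the hash from that string; neither is recomputed later.
        self._str = json.dumps(self, sort_keys=True)
        self._hash = hash(self._str)

    def __hash__(self) -> int:
        return self._hash

    def __str__(self) -> str:
        return self._str

    def _immutable(self, *args: Any, **kwargs: Any) -> NoReturn:
        raise TypeError("CaseKey is immutable")

    # Block every in-place mutator, including the |= operator.
    __setitem__ = __delitem__ = __ior__ = _immutable
    clear = pop = popitem = setdefault = update = _immutable
```

Caching `_hash` and `_str` means repeated dict and cache lookups never re-serialize the key. Blocking `__ior__` closes the one in-place path that `dict` would otherwise leave open. A plain `k | other` still works and returns an ordinary `dict`.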

Comment thread vllm/kernels/helion/config_manager.py Outdated
Comment thread vllm/kernels/helion/case_key.py Outdated
Comment thread vllm/kernels/helion/case_key.py
@gmagogsfm gmagogsfm changed the title from [Kernel][Helion] Use dict config keys with config entries file format to [Kernel][Helion] Optimize Helion config parsing latency Apr 25, 2026
@ProExpertProg
Collaborator

Nice! Will this mean we can use helion without cudagraphs? 👀

@gmagogsfm gmagogsfm force-pushed the helion_startup_opt branch from f9648bd to 41904fb on April 25, 2026 02:10
@gmagogsfm
Contributor Author

> Nice! Will this mean we can use helion without cudagraphs? 👀

Unfortunately, not yet. I will dig deeper into Helion's compilation and dispatch overhead and optimize it as much as possible.

@gmagogsfm gmagogsfm force-pushed the helion_startup_opt branch from 41904fb to 78227e4 on April 25, 2026 02:15
@gmagogsfm
Contributor Author

@zou3519 @BoyuanFeng @ProExpertProg @xiaohongchen1991 Could you take a look when you get a chance? Thanks

Note to @xiaohongchen1991: this changes the serialization format of configs, so you may need to update your in-flight PRs. Thankfully, Claude does a decent job of updating them at scale.

@zou3519 zou3519 added the ready ONLY add when PR is ready to merge/full CI is needed label May 6, 2026
Replace regex-based config key parsing with structured dict config
keys.  The framework deserializes keys once on load and passes
parsed dicts to kernel pick_config functions, eliminating the
per-call regex overhead that dominated CUDA graph capture time.

Config file format changes from a flat dict to config entries so
that config keys can contain any JSON-serializable values (ints,
lists, tuples, etc.) without escaping issues:

    [{"key": {"intermediate": 2048, "numtokens": 256}, "config": {...}},
     {"key": {}, "config": {...}}]
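A file in the config-entries format above might be loaded once into a lookup table roughly as follows. This sketch makes several assumptions. It uses a `frozenset` of `(name, value)` pairs as a stand-in for the PR's hashable key type. It assumes key values are scalars or (nested) JSON arrays. The `block_size` config fields are hypothetical placeholders for the elided `{...}` bodies.

```python
import json


def _freeze(value):
    # JSON arrays load as lists, which are unhashable; turn them into tuples.
    if isinstance(value, list):
        return tuple(_freeze(v) for v in value)
    return value


def load_config_entries(text):
    """Parse the config-entries format once into a dict keyed by hashable keys."""
    table = {}
    for entry in json.loads(text):
        key = frozenset((name, _freeze(v)) for name, v in entry["key"].items())
        table[key] = entry["config"]
    return table


# Hypothetical file contents; the empty key marks the default config.
EXAMPLE = """[
  {"key": {"intermediate": 2048, "numtokens": 256}, "config": {"block_size": 64}},
  {"key": {"shape": [1, 2]}, "config": {"block_size": 128}},
  {"key": {}, "config": {"block_size": 32}}
]"""
configs = load_config_entries(EXAMPLE)
```

Because keys are JSON objects rather than strings, list-valued keys such as `"shape": [1, 2]` need no escaping or custom parsing.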

- Add ConfigKeyDict: a hashable dict subclass with stable JSON
  __str__ for use as config keys, dict keys, and cache keys
- Change config_picker to receive list[ConfigKeyDict] and return
  ConfigKeyDict | None
- input_generator returns dict[ConfigKeyDict, tuple] -- kernel
  authors construct keys as ConfigKeyDict(intermediate=2048, ...)
- Default config uses empty ConfigKeyDict() -- no None, no
  union types, no special cases
- Framework caches parsed key lists per config identity
- Migrate existing JSON config files to config entries format
- Update all tests and autotune script
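The picker contract described above can be sketched as follows. The picker receives the list of parsed keys and returns one of them or `None`. The empty key identifies the default, and the framework caches the result per shape. Everything here is hypothetical: the tuple-of-pairs key type, the `example_picker` and `select_config` names, the placeholder config values, and the rule that the framework falls back to the empty key when the picker returns `None`.

```python
from functools import lru_cache

DEFAULT_KEY = ()  # hypothetical stand-in for an empty key: the default config

# Hypothetical loaded configs, keyed by sorted (name, value) pairs.
CONFIGS = {
    (("intermediate", 2048), ("numtokens", 64)): "config_64",
    (("intermediate", 2048), ("numtokens", 256)): "config_256",
    DEFAULT_KEY: "default_config",
}

# Built once for this config set instead of being re-parsed on every call.
_KEYS = list(CONFIGS)


def example_picker(num_tokens, keys):
    """Kernel-author side: inspect parsed keys, return one of them or None."""
    fits = [k for k in keys if k and dict(k)["numtokens"] >= num_tokens]
    return min(fits, key=lambda k: dict(k)["numtokens"]) if fits else None


@lru_cache(maxsize=None)
def select_config(num_tokens):
    """Framework side: run the picker once per shape, then reuse the cached result."""
    key = example_picker(num_tokens, _KEYS)
    return CONFIGS[DEFAULT_KEY if key is None else key]
```

With the per-shape cache, each CUDA graph capture size runs the picker once. Later calls for the same size are dictionary hits.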

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
@gmagogsfm gmagogsfm force-pushed the helion_startup_opt branch from 78227e4 to 6f6a788 on May 6, 2026 21:57
@vllm-bot vllm-bot merged commit 0b99971 into vllm-project:main May 8, 2026
46 of 49 checks passed
libinta pushed a commit to libinta/vllm that referenced this pull request May 8, 2026
…#40850)

Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude Sonnet 4 <noreply@anthropic.com>
Signed-off-by: Libin Tang <libin.tang@intel.com>

Labels

ready ONLY add when PR is ready to merge/full CI is needed

4 participants