[Core] Use standalone autograd_cache_key for compilation dedup optimization #37929
frgossen wants to merge 1 commit into vllm-project:main
Conversation
Code Review
This pull request refactors the compilation deduplication logic to use the new autograd_cache_key API in torch >= 2.11, which is a great improvement. The code is well-structured, separating the new path and the legacy monkey-patching path. However, I've found a critical issue where the new caching logic assumes an Inductor backend, which could cause failures when other backends like 'eager' are used. My review includes a suggestion to make the caching logic conditional on the backend type.
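The reviewer's concern can be addressed by gating key computation on the backend. The sketch below is illustrative only: the function and parameter names (`get_dedup_key`, `graph_repr`) are hypothetical stand-ins, not vLLM's actual API, and a real implementation would call into torch's AOTAutograd cache rather than hash a string.

```python
import hashlib
from typing import Optional

def get_dedup_key(backend: str, graph_repr: str) -> Optional[str]:
    """Return a cache key for subgraph dedup, or None when the backend
    cannot provide one (hypothetical helper, not vLLM's real code)."""
    if backend != "inductor":
        # Backends such as "eager" have no Inductor/AOTAutograd cache
        # key; skip dedup instead of assuming Inductor and failing.
        return None
    # Stand-in for the real autograd_cache_key computation.
    return hashlib.sha256(graph_repr.encode()).hexdigest()
```

With this guard, non-Inductor backends simply opt out of deduplication instead of erroring.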
Force-pushed from b5d03e8 to 919bfc7
Force-pushed from 4ae8674 to 8e74249
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 8e74249 to 4755a59
I tested this locally with
Force-pushed from 4755a59 to 1fb06db
Force-pushed from 60b7522 to 8a949ef
Force-pushed from ed08987 to c161ce1
Commit message:

[Core] Use standalone autograd_cache_key for compilation dedup optimization

## Purpose

Use the new torch.compile standalone_compile.autograd_cache_key API (torch >= 2.12) to compute cache keys up front, avoiding the legacy monkey-patching of autograd_cache.autograd_cache_key during compilation. This enables deduplication without compiling duplicate subgraphs.

A new VLLM_DEBUG_COMPILE_CACHE_KEY env var cross-checks the standalone API against the legacy path. This uses a dedicated env var rather than a log-level guard because the check changes the compilation codepath (forces legacy compile + extra key computation), not just verbosity. This follows the established VLLM_DEBUG_* convention (VLLM_DEBUG_WORKSPACE, VLLM_DEBUG_MFU_METRICS, etc.).

## Test Plan

- Run meta-llama/Meta-Llama-3-70B-Instruct with TP=4. We expect no functional changes.

## Test Result

Cold-compile benchmark (Llama 3 70B, TP=4, 16 runs each):

- Before (1e688fa): mean 34.2s ± 0.8s, median 34.3s
- After (eecf384): mean 34.4s ± 0.6s, median 34.7s

No significant difference (within noise)

Signed-off-by: Frederik Gossen <frgossen@meta.com>
Force-pushed from c161ce1 to 24f24fd
Purpose
Use the new torch.compile standalone_compile.autograd_cache_key API
(torch >= 2.12) to compute cache keys up front, avoiding the legacy
monkey-patching of autograd_cache.autograd_cache_key during compilation.
This enables deduplication without compiling duplicate subgraphs.
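The deduplication pattern described here can be sketched in isolation: compute each subgraph's key first, then compile only subgraphs whose key has not been seen. This is a minimal illustration with hypothetical names (`compile_with_dedup`, `key_fn`, `compile_fn`), not the PR's actual implementation.

```python
def compile_with_dedup(subgraphs, key_fn, compile_fn):
    """Compile each subgraph at most once per cache key, reusing the
    compiled artifact for duplicates (hypothetical helper names)."""
    compiled = {}
    results = []
    for graph in subgraphs:
        key = key_fn(graph)       # key computed up front, before compiling
        if key not in compiled:   # only the first occurrence is compiled
            compiled[key] = compile_fn(graph)
        results.append(compiled[key])
    return results
```

Because the key is available before compilation starts, duplicate subgraphs never reach the compiler, which is the saving the legacy monkey-patching approach could not provide up front.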
A new VLLM_DEBUG_COMPILE_CACHE_KEY env var cross-checks the standalone
API against the legacy path. This uses a dedicated env var rather than
a log-level guard because the check changes the compilation codepath
(forces legacy compile + extra key computation), not just verbosity.
This follows the established VLLM_DEBUG_* convention (VLLM_DEBUG_WORKSPACE,
VLLM_DEBUG_MFU_METRICS, etc.).
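The cross-check described above could look roughly like the following. This is a hedged sketch: the function name and the `new_key_fn`/`legacy_key_fn` parameters are illustrative stand-ins for the standalone and legacy key paths, though the `VLLM_DEBUG_COMPILE_CACHE_KEY` env var name comes from the PR itself.

```python
import os

def cache_key_with_debug_check(graph, new_key_fn, legacy_key_fn):
    """Compute the key via the new standalone API; when the debug env
    var is set, also run the legacy path and verify both keys agree.
    (Illustrative names, not vLLM's actual implementation.)"""
    key = new_key_fn(graph)
    if os.environ.get("VLLM_DEBUG_COMPILE_CACHE_KEY"):
        # Deliberately forces the legacy computation too, which is why
        # this lives behind its own env var rather than a log level.
        legacy_key = legacy_key_fn(graph)
        if key != legacy_key:
            raise AssertionError(
                f"cache key mismatch: {key!r} != {legacy_key!r}")
    return key
```

When the env var is unset, only the new path runs, so production compilation pays no extra cost for the debug facility.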
Test Plan
Run meta-llama/Meta-Llama-3-70B-Instruct with TP=4. We expect no functional changes.
Test Result
Cold-compile benchmark (Llama 3 70B, TP=4, 16 runs each):
Before (1e688fa): mean 34.2s ± 0.8s, median 34.3s
After (eecf384): mean 34.4s ± 0.6s, median 34.7s
No significant difference (within noise)