
[torch.compile] refactor config hashing to compile_factors and unify factor collection#29117

Open
vnadathur wants to merge 88 commits into vllm-project:main from vnadathur:hash

Conversation

@vnadathur
Contributor

@vnadathur vnadathur commented Nov 20, 2025

Motivation

Following PR #26468, there were a couple of follow-ups to address.

In this PR, the following are added/changed:

  • Uniform use of normalize_value & hash_factors across all configs that define a compute_hash function (see the sketch after this list)
  • Followed up on docstrings that were not up to date in the opt-out configs
  • Renamed compute_hash to compile_factors everywhere; this is the reason so many files changed
  • compute_hash wasn’t the right name, because the real hash computation is actually done by utils.hash_factors
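
As a rough illustration of this pattern, here is a minimal sketch only; the real normalize_value and hash_factors helpers live in vLLM's utils modules and their signatures may differ, and ExampleConfig is hypothetical:

import hashlib
import json
from dataclasses import dataclass, fields

def normalize_value(value):
    # Coerce a config value into something JSON-serializable (sketch).
    if isinstance(value, (str, int, float, bool)) or value is None:
        return value
    if isinstance(value, (list, tuple, set)):
        return [normalize_value(v) for v in value]
    if isinstance(value, dict):
        return {str(k): normalize_value(v) for k, v in sorted(value.items())}
    return repr(value)  # fallback for arbitrary objects

def hash_factors(factors) -> str:
    # The actual hash computation happens here, not in the config classes.
    return hashlib.sha256(json.dumps(factors, sort_keys=True).encode()).hexdigest()

@dataclass
class ExampleConfig:
    block_size: int = 16
    cache_dtype: str = "auto"

    def compile_factors(self) -> dict:
        # Configs return their factors; hashing is delegated to hash_factors.
        return {f.name: normalize_value(getattr(self, f.name)) for f in fields(self)}

print(hash_factors(ExampleConfig().compile_factors()))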

Additions:

  • Before calling normalize_value in get_compile_factors, it now checks whether .compile_factors exists on the sub-object (based on Luka's suggestion); see the sketch below
  • This lets us avoid special-casing PassConfig, as was previously done in compilation.py
  • This is in response to the comment here
  • Also created a shared helper instead of inlining the logic in both _compute_code_hash and compilation_config_hash_factors
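
Roughly, the sub-object check works like the following hypothetical sketch (names follow the PR, but the actual get_compile_factors in vLLM may differ; normalize_value is as in the sketch above):

def get_compile_factors(config) -> dict:
    factors = {}
    for name, value in vars(config).items():
        sub_factors = getattr(value, "compile_factors", None)
        if callable(sub_factors):
            # Sub-configs (e.g. PassConfig) supply their own factors, so they
            # no longer need to be special-cased in compilation.py.
            factors[name] = sub_factors()
        else:
            factors[name] = normalize_value(value)
    return factors

The collected factors are persisted alongside the compile cache, as shown in the cache_key_factors.json dump below.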
cache_key_factors.json

{
  "env": {
    "VLLM_USE_PRECOMPILED": "1",
    "VLLM_LOGGING_LEVEL": "DEBUG"
  },
  "config": {
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "dtype": "bfloat16",
    "max_model_len": 4096
  },
  "config_hash": "931b379b52f8c28316d4c25e74046d51d7147ddfc09abf8c1346e053467a70e6",
  "compiler": {},
  "compiler_hash": "44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
  "code": {
    "files": [
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_0/xs/cxs442zdr3simkg7obtwq7bgw6kifbypb5iksbjzqjt5px3j5sfu.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_1/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_10/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_11/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_12/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_13/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_14/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_15/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_16/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_17/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_18/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_19/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_2/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_20/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_21/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_22/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_23/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_24/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_25/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_26/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_27/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_28/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_29/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_3/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_30/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_31/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_32/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_33/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_34/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_35/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_36/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_37/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_38/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_39/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_4/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_40/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_41/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_42/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_43/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_44/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_45/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_46/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_47/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_48/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_49/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_5/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_50/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_51/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_52/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_53/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_54/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_55/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_56/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_57/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_58/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_59/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_6/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_60/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_61/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_62/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_63/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_64/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_65/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_66/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_67/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_68/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_69/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_7/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_70/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_71/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_72/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_73/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_74/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_75/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_76/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_77/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_78/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_79/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_8/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_80/6q/c6q3fggehlhrvala3bnm2bfzwslcjpwwqidiag4jkmnwdxkmhxmt.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/artifact_shape_None_subgraph_9/kg/ckgjrp5zvupvlt7gcy5jv3i3fq7j43n7svcjizgs4wzjdaivbjy6.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/computation_graph.py"
      },
      {
        "path": "/home/ubuntu/vllm_cache_llama33/torch_compile_cache/64625fd5e0/rank_0_0/backbone/vllm_compile_cache.py"
      }
    ]
  },
  "code_hash": "f37b1c49ecca58acde543bd84c7de6dfbb439fa66f112b511f239a6bf5a1c39b"
}
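
For reference, each *_hash field above is consistent with a SHA-256 over that section's serialized factors; for example, the empty compiler dict hashing to 44136fa3... matches SHA-256 of the JSON string "{}". A minimal sketch of that assumption (the real hash_factors may normalize and serialize differently):

import hashlib
import json

def sha256_of_factors(factors: dict) -> str:
    return hashlib.sha256(json.dumps(factors, sort_keys=True).encode()).hexdigest()

print(sha256_of_factors({}))
# 44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a  (compiler_hash above)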

cc @ProExpertProg @hmellor @zou3519


Note

Standardizes torch.compile cache keys and factor collection across the codebase.

  • Introduces CompileFactors and get_compile_factors; replaces compute_hash with compile_factors in configs and compiler adaptors, and aggregates in VllmConfig.compile_factors
  • Adds compute_env_and_config_hashes and get_code_factors; backends now build cache dirs from hashed {env, config, compiler, code} and persist cache_key_factors.json
  • Aligns AOT and JIT cache key computation; updates logging and DP parallel validation to use factor hashes
  • Updates pass manager UUID composition and minor typing/import cleanups (e.g., Path/Sequence); adjusts tests to new APIs and enum import path

Written by Cursor Bugbot for commit 8f3d1af. This will update automatically on new commits. Configure here.

vnadathur and others added 5 commits November 19, 2025 19:24
Updated all config classes to support an optional 'return_factors' argument in their compute_hash methods, allowing retrieval of hash factors instead of just the hash string.

Signed-off-by: vnadathur <glvikramn@gmail.com>
Co-Authored-By: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com>
Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com>
Co-Authored-By: vnadathur <236933696+vnadathur@users.noreply.github.com>
Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com>
Co-Authored-By: vnadathur <236933696+vnadathur@users.noreply.github.com>
Signed-off-by: vnadathur <glvikramn@gmail.com>
Co-Authored-By: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com>
Allows us to stop handling PassConfig specially.

Signed-off-by: vnadathur <glvikramn@gmail.com>
Co-Authored-By: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com>
Signed-off-by: vnadathur <glvikramn@gmail.com>
@mergify

mergify bot commented Nov 20, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @vnadathur.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 20, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request is a large-scale refactoring that renames compute_hash to compile_factors across many configuration classes, aiming for a more uniform and consistent hashing mechanism. The changes introduce a standardized way to handle nested configurations and allow for either returning the hash factors or the final hash string. Overall, the refactoring improves code consistency. However, I've identified a critical issue in vllm/envs.py that will cause an infinite recursion and needs to be addressed.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@mergify mergify bot removed the needs-rebase label Nov 20, 2025
Signed-off-by: vnadathur <glvikramn@gmail.com>
Signed-off-by: vnadathur <glvikramn@gmail.com>
@vnadathur vnadathur changed the title from "Hash" to "[torch.compile] refactoring compute_hash & fixing small bugs" on Nov 20, 2025
Signed-off-by: vnadathur <glvikramn@gmail.com>
vnadathur and others added 3 commits January 26, 2026 11:19
Signed-off-by: vnadathur <glvikramn@gmail.com>
Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com>
Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com>
@WorldExplored
Contributor

New Generated Cache

{
    "code": {
        "files": [
            {
                "hash": "6720a328f9f442397d7f1acdfc19c6830abaab8fa17d2313aa7674208c36f844",
                "path": "/home/ubuntu/.cache/vllm/torch_compile_cache/31144e7915/rank_0_0/backbone/artifact_compile_range_1_8192_subgraph_0"
            },
            {
                "hash": "e66c92c160ab82ee390749d4f4a0816b86c86a5dc4b87b218ec6a06b561ec0fe",
                "path": "/home/ubuntu/.cache/vllm/torch_compile_cache/31144e7915/rank_0_0/backbone/artifact_compile_range_1_8192_subgraph_1"
            },
            {
                "hash": "2e18636f4f57f1b3e307ed9a642417597f2ec3ef5f130142e516a39afecb1007",
                "path": "/home/ubuntu/.cache/vllm/torch_compile_cache/31144e7915/rank_0_0/backbone/artifact_compile_range_1_8192_subgraph_10"
            },
            {
                "hash": "2e18636f4f57f1b3e307ed9a642417597f2ec3ef5f130142e516a39afecb1007",
                "path": "/home/ubuntu/.cache/vllm/torch_compile_cache/31144e7915/rank_0_0/backbone/artifact_compile_range_1_8192_subgraph_11"
            },
            {
                "hash": "59e5e4c4c2ff6a23dd1d9b7012b976517b0cb63b83dec3c30581d48d8329cda8",
                "path": "/home/ubuntu/.cache/vllm/torch_compile_cache/31144e7915/rank_0_0/backbone/artifact_compile_range_1_8192_subgraph_12"
            },
            {
                "hash": "2e18636f4f57f1b3e307ed9a642417597f2ec3ef5f130142e516a39afecb1007",
                "path": "/home/ubuntu/.cache/vllm/torch_compile_cache/31144e7915/rank_0_0/backbone/artifact_compile_range_1_8192_subgraph_2"
            },
            {
                "hash": "2e18636f4f57f1b3e307ed9a642417597f2ec3ef5f130142e516a39afecb1007",
                "path": "/home/ubuntu/.cache/vllm/torch_compile_cache/31144e7915/rank_0_0/backbone/artifact_compile_range_1_8192_subgraph_3"
            },
            {
                "hash": "2e18636f4f57f1b3e307ed9a642417597f2ec3ef5f130142e516a39afecb1007",
                "path": "/home/ubuntu/.cache/vllm/torch_compile_cache/31144e7915/rank_0_0/backbone/artifact_compile_range_1_8192_subgraph_4"
            },
            {
                "hash": "2e18636f4f57f1b3e307ed9a642417597f2ec3ef5f130142e516a39afecb1007",
                "path": "/home/ubuntu/.cache/vllm/torch_compile_cache/31144e7915/rank_0_0/backbone/artifact_compile_range_1_8192_subgraph_5"
            },
            {
                "hash": "2e18636f4f57f1b3e307ed9a642417597f2ec3ef5f130142e516a39afecb1007",
                "path": "/home/ubuntu/.cache/vllm/torch_compile_cache/31144e7915/rank_0_0/backbone/artifact_compile_range_1_8192_subgraph_6"
            },
            {
                "hash": "2e18636f4f57f1b3e307ed9a642417597f2ec3ef5f130142e516a39afecb1007",
                "path": "/home/ubuntu/.cache/vllm/torch_compile_cache/31144e7915/rank_0_0/backbone/artifact_compile_range_1_8192_subgraph_7"
            },
            {
                "hash": "2e18636f4f57f1b3e307ed9a642417597f2ec3ef5f130142e516a39afecb1007",
                "path": "/home/ubuntu/.cache/vllm/torch_compile_cache/31144e7915/rank_0_0/backbone/artifact_compile_range_1_8192_subgraph_8"
            },
            {
                "hash": "2e18636f4f57f1b3e307ed9a642417597f2ec3ef5f130142e516a39afecb1007",
                "path": "/home/ubuntu/.cache/vllm/torch_compile_cache/31144e7915/rank_0_0/backbone/artifact_compile_range_1_8192_subgraph_9"
            },
            {
                "hash": "081d27301dd54622f751dc08d5e39c89669a7019393e2a40adc5912a91f947b7",
                "path": "/home/ubuntu/.cache/vllm/torch_compile_cache/31144e7915/rank_0_0/backbone/computation_graph.py"
            },
            {
                "hash": "f2a215663fc5f9e5d1df403233217137ea663545e81a8cf9f655d804082e3782",
                "path": "/home/ubuntu/.cache/vllm/torch_compile_cache/31144e7915/rank_0_0/backbone/vllm_compile_cache.py"
            },
            {
                "hash": "61ee71caec330e282a35d59b6b830fbb8acc114b5dda5bdf8b2c2b5f4ba2dbb4",
                "path": "/tmp/vllm-hash-src/vllm/attention/layer.py"
            },
            {
                "hash": "636a31a0a8c282cebe2477fbaf423c5581084992dc99f45d7644e52bcfaffbd4",
                "path": "/tmp/vllm-hash-src/vllm/distributed/communication_op.py"
            },
            {
                "hash": "ce95b7c93d864826032199ce6144ac0f4fcc6c080fffc6e8bd28d42c69084721",
                "path": "/tmp/vllm-hash-src/vllm/distributed/parallel_state.py"
            },
            {
                "hash": "6fbe57eed85816551197a90ef8ffcf9566fb283ab70f925c24be85b27d077dc1",
                "path": "/tmp/vllm-hash-src/vllm/model_executor/custom_op.py"
            },
            {
                "hash": "80d04afcb4ceca71d660adbe52123cccb415e6c286c84fd161fb9e54e41f562f",
                "path": "/tmp/vllm-hash-src/vllm/model_executor/layers/linear.py"
            },
            {
                "hash": "9e883dce6bb4cf19d2f7a848357b1c95c1cb77b57810627d96411d8a71513240",
                "path": "/tmp/vllm-hash-src/vllm/model_executor/layers/utils.py"
            },
            {
                "hash": "a5600b7aa02421ff860200dff486cbefeee723139a073d4f34eaae60958fa824",
                "path": "/tmp/vllm-hash-src/vllm/model_executor/layers/vocab_parallel_embedding.py"
            },
            {
                "hash": "956b97f53593f68cdbf3c42e5a74377678b350fc14a5c499544dac025047e050",
                "path": "/tmp/vllm-hash-src/vllm/model_executor/models/opt.py"
            },
            {
                "hash": "c1190135988009696dd9d4b8b66473a681b14d1919ea82890c1159e12865bac7",
                "path": "/tmp/vllm-hash-src/vllm/platforms/interface.py"
            },
            {
                "hash": "311d56619d73e0b29077cd78011606bdbf2944556f28cde60da3bb46366bfd56",
                "path": "/tmp/vllm-hash/lib/python3.10/site-packages/torch/_dynamo/polyfills/__init__.py"
            },
            {
                "hash": "18f87f1f92ac47831fc62ab2a53c082ca42f5cee56524adf4081bb4824e9595a",
                "path": "/tmp/vllm-hash/lib/python3.10/site-packages/torch/_dynamo/polyfills/builtins.py"
            },
            {
                "hash": "a42d9303672ffb02ae3edc03d2956fa46184062bb00e302b78a5197238ca5ed4",
                "path": "/tmp/vllm-hash/lib/python3.10/site-packages/torch/_dynamo/polyfills/itertools.py"
            },
            {
                "hash": "35573aa23738a40c320aac595d917bef5dbe2ddeb9e5107206c3132ba1650a69",
                "path": "/tmp/vllm-hash/lib/python3.10/site-packages/torch/nn/modules/activation.py"
            },
            {
                "hash": "ba7371ec383efc276583abfa6d291bd18daaad28530c28f48712617dbd959c7c",
                "path": "/tmp/vllm-hash/lib/python3.10/site-packages/torch/nn/modules/container.py"
            },
            {
                "hash": "8f45992a756cdf1f164a38452677a028c88d984afcef5b3af7c9d6ad7edaea31",
                "path": "/tmp/vllm-hash/lib/python3.10/site-packages/torch/nn/modules/normalization.py"
            },
            {
                "hash": "244fc231217e0d0f332ea13d35b915b15496cf318b577b924fd6fa2347753e1c",
                "path": "/tmp/vllm-hash/lib/python3.10/site-packages/torch/nn/modules/sparse.py"
            }
        ]
    },
    "code_hash": "7904f2e7ec09fcfd5df8dbcce0e64da993d734a1e0426198756eb352a109d12c",
    "compiler": {
        "inductor_standalone": [
            [
                [
                    "device",
                    [
                        [
                            "name",
                            "NVIDIA A100-SXM4-40GB"
                        ]
                    ]
                ],
                [
                    "hash",
                    "1a23896099465ff3088435b0b12fb5b1b808cca419af7aa8667698c9d7d3802c"
                ],
                [
                    "version",
                    [
                        [
                            "cuda",
                            "12.8"
                        ],
                        [
                            "triton",
                            "3.5.10355c3e3452fa4500122244b8fff8864f22bc05c417a3d6f5dada9f2e92f8040-c4d799c32eb6a3b92ad36ad7ea5645efc5b2fc6cd5c73dadc3a678d57dd2ccf4-c37149275a03d063fc1c339cd76a19da1c17e3d555410372e6cad29eedef6202-23d635e690d670bf61798e1259674b78c0ed5ba222ab6a455f329f27a758fc2d-e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855-e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855-03bffcc16c9d6be5ea8fde645caf3a6e3c0ac892eec49051a79d4cc99b66b9c7-e68505098eef3e7c0b050cb91da2587ab6c10d455ca0b941ff06fa8068e16305-318dbf7101b6ea9ebccfc57046fd8d963fe1d837c487005b37edf471a3207a9d-25cb0bee9547488335de2d495af738298ba6d4c20f1d37941dc17751c57a211e-e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855-3e3d33fbab70c7c05fc70e2e2fdd763dca0b847f62150592b99f8886a419ee64-83ef58f2371da7ad8e01822c2afc82f9a2d6516c2249ee0bbd8873bb20616be0-31a81ba046f7d349e5f86d59555dfd20a9bc126123b68e9b2526d97d94ca514e-5c1281b67c0d949da34ccf1c3b68804a2caa2665953984d837d753e91af611fe-5d15c5bebef8d7aa51b21fd187e5faa95eba4a213254355bc69e0648013599f7-30106ed84518c6ca7aca08e2c0ee188755f512cc0cb2d7da8914cc48c1ad6dcc-adb54e71d0ffc3bdac437fbc97929769fbbe4ebd03e6361c6357f2a24f7c5954-27b2a5d1e8db008bacefe6019f63922bbd65926de90bb1b527ee597477d2f365-a610dc5c215589aab7a784e1c07acef3e16d53ef00f08de793899964956f4e2a-18572e33e474a820799036f2b2f8c3e54d8a526386356716cccf2bf32a832376-2cdca74c4297804dcf499a7e9d4315ab87edfe2d72f536a8fdc02f28a3e7dacd-b53abe93473eb37d88bc378692065c9a8b1bf54b6417cb1911a13d10918c6d20-f60c2bb2d8eebe1c191f4b8b819844414dd1bce243645635a094f9f92665a58e-08abee21ce6230a873ed0831f70f9570b7ce39969dbf9b2f28ae1a1992ee1cc7-8e4b8599f819f32bcabae6fd118dbbccfbec0ba9e1909224d39c5fe32fbb491f-3db4bee9427c7eb0e2105aff484bdacc819357d298e8f6e89c372ae9c3625bdf-59cf295f3aab4fa62b96a627aa9fec1302950133750de59e542c7b4c9e5b80b6-5305890c3b133def44e2f3d3405e0fb1fd6ce78d0a28b2127670a195bbe11c66"
                        ]
                    ]
                ]
            ],
            "61919ad07c4a396a916cc94a3f688f2e565e8ced3d29586c564082b15ed3f6d9"
        ]
    },
    "compiler_hash": "0559fef67a149439da50b855d85ac726824812e035aff75788c7e98e243c52f5",
    "config": {
        "additional": {},
        "attention": {
            "backend": [
                "vllm.v1.attention.backends.registry.AttentionBackendEnum",
                "vllm.v1.attention.backends.triton_attn.TritonAttentionBackend"
            ],
            "disable_flashinfer_prefill": false,
            "disable_flashinfer_q_quantization": false,
            "flash_attn_max_num_splits_for_cuda_graph": 32,
            "flash_attn_version": null,
            "use_cudnn_prefill": false,
            "use_prefill_decode_attention": false,
            "use_trtllm_attention": null,
            "use_trtllm_ragged_deepseek_prefill": true
        },
        "cache": {
            "block_size": 16,
            "cache_dtype": "auto",
            "calculate_kv_scales": false,
            "cpu_offload_gb": 0.0,
            "kv_cache_memory_bytes": null,
            "kv_offloading_backend": "native",
            "kv_offloading_size": null,
            "mamba_block_size": null,
            "mamba_cache_dtype": "auto",
            "mamba_cache_mode": "none",
            "mamba_ssm_cache_dtype": "auto",
            "sliding_window": null
        },
        "compilation": {
            "backend": "inductor",
            "compile_cache_save_format": "binary",
            "compile_mm_encoder": false,
            "compile_ranges_split_points": [
                8192
            ],
            "compile_sizes": [],
            "cudagraph_capture_sizes": [
                1,
                2,
                4,
                8,
                16,
                24,
                32,
                40,
                48,
                56,
                64,
                72,
                80,
                88,
                96,
                104,
                112,
                120,
                128,
                136,
                144,
                152,
                160,
                168,
                176,
                184,
                192,
                200,
                208,
                216,
                224,
                232,
                240,
                248,
                256,
                272,
                288,
                304,
                320,
                336,
                352,
                368,
                384,
                400,
                416,
                432,
                448,
                464,
                480,
                496,
                512
            ],
            "cudagraph_copy_inputs": false,
            "cudagraph_mode": [
                "vllm.config.compilation.CUDAGraphMode",
                [
                    2,
                    1
                ]
            ],
            "cudagraph_num_of_warmups": 1,
            "cudagraph_specialize_lora": true,
            "custom_ops": [
                "none"
            ],
            "disabled_custom_ops": [
                [
                    "column_parallel_linear",
                    24
                ],
                [
                    "logits_processor",
                    1
                ],
                [
                    "row_parallel_linear",
                    24
                ],
                [
                    "vocab_parallel_embedding",
                    1
                ]
            ],
            "dynamic_shapes_config": {
                "assume_32_bit_indexing": true,
                "evaluate_guards": false,
                "type": "backed"
            },
            "enabled_custom_ops": [],
            "inductor_compile_config": [
                [
                    "benchmark_combo_kernel",
                    true
                ],
                [
                    "combo_kernels",
                    true
                ],
                [
                    "enable_auto_functionalized_v2",
                    false
                ]
            ],
            "inductor_passes": [],
            "level": null,
            "max_cudagraph_capture_size": 512,
            "mode": 3,
            "pass_config": {
                "eliminate_noops": true,
                "enable_qk_norm_rope_fusion": false,
                "enable_sp": false,
                "fi_allreduce_fusion_max_size_mb": null,
                "fuse_act_quant": false,
                "fuse_allreduce_rms": false,
                "fuse_attn_quant": false,
                "fuse_gemm_comms": false,
                "fuse_norm_quant": false
            },
            "splitting_ops": [
                "vllm::unified_attention",
                "vllm::unified_attention_with_output",
                "vllm::unified_mla_attention",
                "vllm::unified_mla_attention_with_output",
                "vllm::mamba_mixer2",
                "vllm::mamba_mixer",
                "vllm::short_conv",
                "vllm::linear_attention",
                "vllm::plamo2_mamba_mixer",
                "vllm::gdn_attention_core",
                "vllm::kda_attention",
                "vllm::sparse_attn_indexer",
                "vllm::rocm_aiter_sparse_attn_indexer"
            ],
            "use_inductor_graph_partition": false
        },
        "device": {},
        "ec_transfer": {},
        "kv_transfer": {},
        "load": {},
        "lora": {},
        "model": {
            "allow_deprecated_quantization": false,
            "code_revision": null,
            "disable_sliding_window": false,
            "dtype": "torch.bfloat16",
            "enable_prompt_embeds": false,
            "enable_return_routed_experts": false,
            "enable_sleep_mode": false,
            "generation_config": "auto",
            "hf_config": "{\n  \"_remove_final_layer_norm\": false,\n  \"activation_dropout\": 0.0,\n  \"activation_function\": \"relu\",\n  \"architectures\": [\n    \"OPTForCausalLM\"\n  ],\n  \"attention_dropout\": 0.0,\n  \"bos_token_id\": 2,\n  \"do_layer_norm_before\": true,\n  \"dropout\": 0.1,\n  \"dtype\": \"float16\",\n  \"enable_bias\": true,\n  \"eos_token_id\": 2,\n  \"ffn_dim\": 3072,\n  \"hidden_size\": 768,\n  \"init_std\": 0.02,\n  \"layer_norm_elementwise_affine\": true,\n  \"layerdrop\": 0.0,\n  \"max_position_embeddings\": 2048,\n  \"model_type\": \"opt\",\n  \"num_attention_heads\": 12,\n  \"num_hidden_layers\": 12,\n  \"pad_token_id\": 1,\n  \"prefix\": \"\",\n  \"transformers_version\": \"4.57.6\",\n  \"use_cache\": true,\n  \"vocab_size\": 50272,\n  \"word_embed_proj_dim\": 768\n}\n",
            "hf_text_config": "{\n  \"_remove_final_layer_norm\": false,\n  \"activation_dropout\": 0.0,\n  \"activation_function\": \"relu\",\n  \"architectures\": [\n    \"OPTForCausalLM\"\n  ],\n  \"attention_dropout\": 0.0,\n  \"bos_token_id\": 2,\n  \"do_layer_norm_before\": true,\n  \"dropout\": 0.1,\n  \"dtype\": \"float16\",\n  \"enable_bias\": true,\n  \"eos_token_id\": 2,\n  \"ffn_dim\": 3072,\n  \"hidden_size\": 768,\n  \"init_std\": 0.02,\n  \"layer_norm_elementwise_affine\": true,\n  \"layerdrop\": 0.0,\n  \"max_position_embeddings\": 2048,\n  \"model_type\": \"opt\",\n  \"num_attention_heads\": 12,\n  \"num_hidden_layers\": 12,\n  \"pad_token_id\": 1,\n  \"prefix\": \"\",\n  \"transformers_version\": \"4.57.6\",\n  \"use_cache\": true,\n  \"vocab_size\": 50272,\n  \"word_embed_proj_dim\": 768\n}\n",
            "max_logprobs": 20,
            "max_model_len": 512,
            "model": "facebook/opt-125m",
            "model_impl": "auto",
            "model_weights": "",
            "override_generation_config": [],
            "quantization": null,
            "revision": null,
            "trust_remote_code": false
        },
        "observability": {},
        "parallel": {
            "all2all_backend": "allgather_reducescatter",
            "cp_kv_cache_interleave_size": 1,
            "data_parallel_size": 1,
            "dbo_decode_token_threshold": 32,
            "dbo_prefill_token_threshold": 512,
            "dcp_kv_cache_interleave_size": 1,
            "decode_context_parallel_size": 1,
            "disable_nccl_for_dp_synchronization": true,
            "enable_dbo": false,
            "enable_eplb": false,
            "enable_expert_parallel": false,
            "eplb_config": [
                "vllm.config.parallel.EPLBConfig",
                [
                    [
                        "log_balancedness",
                        false
                    ],
                    [
                        "log_balancedness_interval",
                        1
                    ],
                    [
                        "num_redundant_experts",
                        0
                    ],
                    [
                        "policy",
                        "default"
                    ],
                    [
                        "step_interval",
                        3000
                    ],
                    [
                        "use_async",
                        false
                    ],
                    [
                        "window_size",
                        1000
                    ]
                ]
            ],
            "expert_placement_strategy": "linear",
            "is_moe_model": false,
            "pipeline_parallel_size": 1,
            "prefill_context_parallel_size": 1,
            "tensor_parallel_size": 1,
            "ubatch_size": 0,
            "world_size": 1
        },
        "profiler": {},
        "scheduler": {
            "max_num_batched_tokens": 8192
        },
        "speculative": {},
        "structured_outputs": {},
        "version": "0.1.dev13348+g662fae03f"
    },
    "config_hash": "d214331261f16cfadc46ef545cb17529866e3ac02f55f290393c561abdeb679e",
    "env": {
        "CMAKE_BUILD_TYPE": null,
        "CUDA_HOME": null,
        "K_SCALE_CONSTANT": 200,
        "NVCC_THREADS": null,
        "Q_SCALE_CONSTANT": 200,
        "RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES": null,
        "RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES": null,
        "RAY_EXPERIMENTAL_NOSET_HABANA_VISIBLE_MODULES": null,
        "RAY_EXPERIMENTAL_NOSET_HIP_VISIBLE_DEVICES": null,
        "RAY_EXPERIMENTAL_NOSET_NEURON_RT_VISIBLE_CORES": null,
        "RAY_EXPERIMENTAL_NOSET_ONEAPI_DEVICE_SELECTOR": null,
        "RAY_EXPERIMENTAL_NOSET_RBLN_RT_VISIBLE_DEVICES": null,
        "RAY_EXPERIMENTAL_NOSET_ROCR_VISIBLE_DEVICES": null,
        "RAY_EXPERIMENTAL_NOSET_TPU_VISIBLE_CHIPS": null,
        "VERBOSE": false,
        "VLLM_ALL2ALL_BACKEND": null,
        "VLLM_ALLOW_CHUNKED_LOCAL_ATTN_WITH_HYBRID_KV_CACHE": true,
        "VLLM_ALLOW_INSECURE_SERIALIZATION": false,
        "VLLM_ALLOW_LONG_MAX_MODEL_LEN": false,
        "VLLM_ALLOW_RUNTIME_LORA_UPDATING": false,
        "VLLM_ALLREDUCE_USE_SYMM_MEM": true,
        "VLLM_API_KEY": null,
        "VLLM_BLOCKSCALE_FP8_GEMM_FLASHINFER": false,
        "VLLM_COMPILE_CACHE_SAVE_FORMAT": "binary",
        "VLLM_COMPUTE_NANS_IN_LOGITS": false,
        "VLLM_CONFIGURE_LOGGING": true,
        "VLLM_CONFIG_ROOT": "/home/ubuntu/.config/vllm",
        "VLLM_CUDART_SO_PATH": null,
        "VLLM_CUSTOM_SCOPES_FOR_PROFILING": false,
        "VLLM_DBO_COMM_SMS": 20,
        "VLLM_DEBUG_MFU_METRICS": false,
        "VLLM_DEBUG_WORKSPACE": false,
        "VLLM_DEEPEPLL_NVFP4_DISPATCH": false,
        "VLLM_DEEPEP_BUFFER_SIZE_MB": 1024,
        "VLLM_DEEPEP_HIGH_THROUGHPUT_FORCE_INTRA_NODE": false,
        "VLLM_DEEPEP_LOW_LATENCY_USE_MNNVL": false,
        "VLLM_DEEP_GEMM_WARMUP": "relax",
        "VLLM_DISABLED_KERNELS": [],
        "VLLM_DISABLE_COMPILE_CACHE": false,
        "VLLM_DISABLE_LOG_LOGO": false,
        "VLLM_DISABLE_PYNCCL": false,
        "VLLM_DISABLE_SHARED_EXPERTS_STREAM": false,
        "VLLM_DOCKER_BUILD_CONTEXT": false,
        "VLLM_DP_RANK": 0,
        "VLLM_DP_RANK_LOCAL": 0,
        "VLLM_DP_SIZE": 1,
        "VLLM_ENABLE_CUDAGRAPH_GC": false,
        "VLLM_ENABLE_FUSED_MOE_ACTIVATION_CHUNKING": true,
        "VLLM_ENABLE_INDUCTOR_COORDINATE_DESCENT_TUNING": true,
        "VLLM_ENABLE_INDUCTOR_MAX_AUTOTUNE": true,
        "VLLM_ENABLE_MOE_DP_CHUNK": true,
        "VLLM_ENABLE_RESPONSES_API_STORE": false,
        "VLLM_ENGINE_READY_TIMEOUT_S": 600,
        "VLLM_FLASHINFER_ALLREDUCE_FUSION_THRESHOLDS_MB": [],
        "VLLM_FLASHINFER_MOE_BACKEND": "latency",
        "VLLM_FLASHINFER_WORKSPACE_BUFFER_SIZE": 413138944,
        "VLLM_FLOAT32_MATMUL_PRECISION": "highest",
        "VLLM_FUSED_MOE_CHUNK_SIZE": 16384,
        "VLLM_GC_DEBUG": "",
        "VLLM_GPT_OSS_HARMONY_SYSTEM_INSTRUCTIONS": false,
        "VLLM_GPT_OSS_SYSTEM_TOOL_MCP_LABELS": [],
        "VLLM_HAS_FLASHINFER_CUBIN": false,
        "VLLM_KV_CACHE_LAYOUT": null,
        "VLLM_KV_EVENTS_USE_INT_BLOCK_HASHES": true,
        "VLLM_LOG_BATCHSIZE_INTERVAL": -1.0,
        "VLLM_LOG_MODEL_INSPECTION": false,
        "VLLM_LOOPBACK_IP": "",
        "VLLM_LORA_DISABLE_PDL": false,
        "VLLM_LORA_RESOLVER_CACHE_DIR": null,
        "VLLM_MAIN_CUDA_VERSION": "12.9",
        "VLLM_MARLIN_INPUT_DTYPE": null,
        "VLLM_MARLIN_USE_ATOMIC_ADD": false,
        "VLLM_MAX_TOKENS_PER_EXPERT_FP4_MOE": 163840,
        "VLLM_MLA_DISABLE": false,
        "VLLM_MM_HASHER_ALGORITHM": "blake3",
        "VLLM_MOE_DP_CHUNK_SIZE": 256,
        "VLLM_MOE_ROUTING_SIMULATION_STRATEGY": "",
        "VLLM_MOE_USE_DEEP_GEMM": true,
        "VLLM_MOONCAKE_ABORT_REQUEST_TIMEOUT": 480,
        "VLLM_MOONCAKE_BOOTSTRAP_PORT": 8998,
        "VLLM_MORIIO_CONNECTOR_READ_MODE": false,
        "VLLM_MORIIO_NUM_WORKERS": 1,
        "VLLM_MORIIO_POST_BATCH_SIZE": -1,
        "VLLM_MORIIO_QP_PER_TRANSFER": 1,
        "VLLM_MQ_MAX_CHUNK_BYTES_MB": 16,
        "VLLM_MSGPACK_ZERO_COPY_THRESHOLD": 256,
        "VLLM_MXFP4_USE_MARLIN": null,
        "VLLM_NCCL_INCLUDE_PATH": null,
        "VLLM_NCCL_SO_PATH": null,
        "VLLM_NIXL_ABORT_REQUEST_TIMEOUT": 480,
        "VLLM_NIXL_SIDE_CHANNEL_HOST": "localhost",
        "VLLM_NIXL_SIDE_CHANNEL_PORT": 5600,
        "VLLM_NVFP4_GEMM_BACKEND": null,
        "VLLM_NVTX_SCOPES_FOR_PROFILING": false,
        "VLLM_PATTERN_MATCH_DEBUG": null,
        "VLLM_PLUGINS": null,
        "VLLM_PP_LAYER_PARTITION": null,
        "VLLM_PROCESS_NAME_PREFIX": "VLLM",
        "VLLM_PROFILER_DELAY_ITERS": null,
        "VLLM_PROFILER_MAX_ITERS": null,
        "VLLM_RAY_BUNDLE_INDICES": "",
        "VLLM_RAY_DP_PACK_STRATEGY": "strict",
        "VLLM_RAY_PER_WORKER_GPUS": 1.0,
        "VLLM_ROCM_CUSTOM_PAGED_ATTN": true,
        "VLLM_ROCM_FP8_MFMA_PAGE_ATTN": false,
        "VLLM_ROCM_FP8_PADDING": true,
        "VLLM_ROCM_MOE_PADDING": true,
        "VLLM_ROCM_QUICK_REDUCE_CAST_BF16_TO_FP16": true,
        "VLLM_ROCM_QUICK_REDUCE_MAX_SIZE_BYTES_MB": null,
        "VLLM_ROCM_QUICK_REDUCE_QUANTIZATION": "NONE",
        "VLLM_ROCM_SHUFFLE_KV_CACHE_LAYOUT": false,
        "VLLM_ROCM_SLEEP_MEM_CHUNK_SIZE": 256,
        "VLLM_ROCM_USE_AITER": false,
        "VLLM_ROCM_USE_AITER_FP4BMM": true,
        "VLLM_ROCM_USE_AITER_FP4_ASM_GEMM": false,
        "VLLM_ROCM_USE_AITER_FP8BMM": true,
        "VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS": false,
        "VLLM_ROCM_USE_AITER_LINEAR": true,
        "VLLM_ROCM_USE_AITER_MHA": true,
        "VLLM_ROCM_USE_AITER_MLA": true,
        "VLLM_ROCM_USE_AITER_MOE": true,
        "VLLM_ROCM_USE_AITER_PAGED_ATTN": false,
        "VLLM_ROCM_USE_AITER_RMSNORM": true,
        "VLLM_ROCM_USE_AITER_TRITON_GEMM": true,
        "VLLM_ROCM_USE_AITER_TRITON_ROPE": false,
        "VLLM_ROCM_USE_AITER_UNIFIED_ATTENTION": false,
        "VLLM_ROCM_USE_SKINNY_GEMM": true,
        "VLLM_RPC_TIMEOUT": 10000,
        "VLLM_SHARED_EXPERTS_STREAM_TOKEN_THRESHOLD": 256,
        "VLLM_SKIP_P2P_CHECK": true,
        "VLLM_SKIP_PRECOMPILED_VERSION_SUFFIX": false,
        "VLLM_TARGET_DEVICE": "cuda",
        "VLLM_TEST_FORCE_FP8_MARLIN": false,
        "VLLM_TOOL_JSON_ERROR_AUTOMATIC_RETRY": false,
        "VLLM_TOOL_PARSE_REGEX_TIMEOUT_SECONDS": 1,
        "VLLM_TORCH_CUDA_PROFILE": null,
        "VLLM_TORCH_PROFILER_DIR": "",
        "VLLM_TORCH_PROFILER_DISABLE_ASYNC_LLM": null,
        "VLLM_TORCH_PROFILER_DUMP_CUDA_TIME_TOTAL": null,
        "VLLM_TORCH_PROFILER_RECORD_SHAPES": null,
        "VLLM_TORCH_PROFILER_USE_GZIP": null,
        "VLLM_TORCH_PROFILER_WITH_FLOPS": null,
        "VLLM_TORCH_PROFILER_WITH_PROFILE_MEMORY": null,
        "VLLM_TORCH_PROFILER_WITH_STACK": null,
        "VLLM_TPU_BUCKET_PADDING_GAP": 0,
        "VLLM_TPU_MOST_MODEL_LEN": null,
        "VLLM_TPU_USING_PATHWAYS": false,
        "VLLM_TRACE_FUNCTION": 0,
        "VLLM_USAGE_SOURCE": "production",
        "VLLM_USE_AOT_COMPILE": false,
        "VLLM_USE_BYTECODE_HOOK": true,
        "VLLM_USE_DEEP_GEMM": false,
        "VLLM_USE_DEEP_GEMM_E8M0": true,
        "VLLM_USE_DEEP_GEMM_TMA_ALIGNED_SCALES": true,
        "VLLM_USE_EXPERIMENTAL_PARSER_CONTEXT": false,
        "VLLM_USE_FBGEMM": false,
        "VLLM_USE_FLASHINFER_MOE_FP16": false,
        "VLLM_USE_FLASHINFER_MOE_FP4": false,
        "VLLM_USE_FLASHINFER_MOE_FP8": false,
        "VLLM_USE_FLASHINFER_MOE_MXFP4_BF16": false,
        "VLLM_USE_FLASHINFER_MOE_MXFP4_MXFP8": false,
        "VLLM_USE_FLASHINFER_MOE_MXFP4_MXFP8_CUTLASS": false,
        "VLLM_USE_FLASHINFER_SAMPLER": false,
        "VLLM_USE_FUSED_MOE_GROUPED_TOPK": true,
        "VLLM_USE_MEGA_AOT_ARTIFACT": false,
        "VLLM_USE_NCCL_SYMM_MEM": false,
        "VLLM_USE_NVFP4_CT_EMULATIONS": false,
        "VLLM_USE_PRECOMPILED": true,
        "VLLM_USE_RAY_COMPILED_DAG_CHANNEL_TYPE": "auto",
        "VLLM_USE_RAY_COMPILED_DAG_OVERLAP_COMM": false,
        "VLLM_USE_RAY_WRAPPED_PP_COMM": true,
        "VLLM_USE_STANDALONE_COMPILE": true,
        "VLLM_USE_TRITON_AWQ": false,
        "VLLM_USE_V2_MODEL_RUNNER": false,
        "VLLM_V1_USE_OUTLINES_CACHE": false,
        "VLLM_XGRAMMAR_CACHE_MB": 512,
        "VLLM_XLA_CACHE_PATH": "/home/ubuntu/.cache/vllm/xla_cache",
        "VLLM_XLA_CHECK_RECOMPILATION": false,
        "VLLM_XLA_USE_SPMD": false,
        "V_SCALE_CONSTANT": 100
    }
}
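
The per-file entries above pair each traced file's path with a content hash, which feeds into the cache key so that changes in traced vLLM or torch source produce a different cache directory. A hypothetical sketch of that collection step (the actual get_code_factors in vLLM may differ):

import hashlib
from pathlib import Path

def get_code_factors(files: list[Path]) -> dict:
    entries = []
    for path in sorted(files):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        entries.append({"hash": digest, "path": str(path)})
    return {"files": entries}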

@mergify

mergify bot commented Jan 27, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @vnadathur.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 27, 2026
Signed-off-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com>
@mergify mergify bot removed the needs-rebase label Jan 27, 2026
@mergify

mergify bot commented Jan 31, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @vnadathur.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 31, 2026
Signed-off-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com>
@mergify mergify bot removed the needs-rebase label Jan 31, 2026
@mergify

mergify bot commented Jan 31, 2026

Hi @vnadathur, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com>
@mergify

mergify bot commented Feb 1, 2026

Hi @vnadathur, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com>
@mergify

mergify bot commented Feb 2, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @vnadathur.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Feb 2, 2026
Signed-off-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com>
@mergify mergify bot removed the needs-rebase label Feb 2, 2026
Signed-off-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com>
@WorldExplored
Contributor

WorldExplored commented Feb 3, 2026

@ProExpertProg

vllm serve output:

{"id":"cmpl-849c2077eae12cbd","object":"text_completion","created":1770111472,"model":"facebook/opt-125m","choices":[{"index":0,"text":" wide G24bs.... I need 10-12 swim parts. Seriously, no","logprobs":null,"finish_reason":"length","stop_reason":null,"token_ids":null,"prompt_logprobs":null,"prompt_token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":4,"total_tokens":20,"completion_tokens":16,"prompt_tokens_details":null},"kv_transfer_params":null}

Signed-off-by: Vikram Nadathur <glvikramn@gmail.com>
@mergify

mergify bot commented Feb 3, 2026

Hi @vnadathur, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com>
@mergify

mergify bot commented Feb 4, 2026

Hi @vnadathur, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com>
@mergify

mergify bot commented Feb 5, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @vnadathur.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Feb 5, 2026

Labels

documentation, llama, needs-rebase, v1

Projects

None yet


4 participants