[Kernel][Quantization][MoE] add marlin kernel support for turing (sm75)#29901
vllm-bot merged 15 commits into vllm-project:main
Conversation
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Code Review
This pull request adds support for the Turing architecture (sm75) to the Marlin kernels, including both dense and MoE variants. The changes involve adding architecture-specific compilation paths in CMake, providing synchronous implementations for cp_async on older architectures, and using m16n8k8 MMA instructions to emulate m16n8k16. The changes look mostly correct and well-structured. However, I've found a few critical issues: a likely debugging leftover in a preprocessor directive that would cause performance regressions on newer GPUs, and the removal of static_asserts that could hide potential shared memory corruption bugs. There is also a minor correctness issue in a CMake file. Please address these points.
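To illustrate the MMA emulation the review describes, here is a small, hypothetical Python sketch (plain-Python matrices, not the actual PTX) showing why stacking two k=8 MMA steps that accumulate into the same output tile gives the same result as one m16n8k16 instruction: the K dimension of a matrix product can be split in half and the two partial products summed.

```python
# Hypothetical illustration (pure Python, not the actual PTX):
# one m16n8k16 MMA == two m16n8k8 MMAs accumulated into the same C tile.

def matmul_acc(A, B, C):
    """C += A @ B for plain list-of-lists matrices."""
    for i in range(len(A)):
        for j in range(len(B[0])):
            C[i][j] += sum(A[i][k] * B[k][j] for k in range(len(B)))
    return C

M, N, K = 16, 8, 16
A = [[(i + k) % 7 for k in range(K)] for i in range(M)]
B = [[(k * j + 1) % 5 for j in range(N)] for k in range(K)]

# Reference: a single k=16 step.
C_ref = matmul_acc(A, B, [[0] * N for _ in range(M)])

# Emulation: split K into two halves and issue two k=8 steps,
# accumulating into the same output tile C.
A_lo = [row[:8] for row in A]          # k = 0..7
A_hi = [row[8:] for row in A]          # k = 8..15
B_lo, B_hi = B[:8], B[8:]

C = [[0] * N for _ in range(M)]
matmul_acc(A_lo, B_lo, C)              # first m16n8k8 step
matmul_acc(A_hi, B_hi, C)              # second m16n8k8 step

assert C == C_ref
```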
@mgoin - could you take a look at this?
mgoin
left a comment
This looks really solid. It seems the added complexity isn't much, just the emulation and fp16_accum. Am I correct that it supports all weight types?
It supports all weight types except MXFP4, which requires BF16 activation, but Turing doesn't support BF16. However, for most practical weights, the value range of E8M0 scales should be within E5M0, so we could also make it support FP16 and add some checks when loading weights.
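A hypothetical sketch of the kind of load-time check described above (names and structure are illustrative, not vLLM's actual API): an E8M0 scale stores a biased exponent e in 0..254 encoding 2**(e - 127), and it is exactly representable in FP16 only when that power of two falls inside FP16's dynamic range.

```python
# Hypothetical load-time check (illustrative, not vLLM's actual API):
# an E8M0 scale stores a biased exponent e in 0..254, encoding 2**(e - 127).
# FP16 can represent a power of two exactly only for exponents in
# [-24, 15]: 2**-24 is the smallest FP16 subnormal, and 2**15 = 32768
# is the largest power of two below the FP16 max of 65504.

FP16_MIN_EXP = -24
FP16_MAX_EXP = 15

def e8m0_fits_in_fp16(biased_exp: int) -> bool:
    """True if the E8M0 scale 2**(biased_exp - 127) is exact in FP16."""
    assert 0 <= biased_exp <= 254, "255 encodes NaN in E8M0"
    return FP16_MIN_EXP <= biased_exp - 127 <= FP16_MAX_EXP

def check_scales(biased_exps):
    """Reject a weight whose scales would overflow/underflow FP16."""
    bad = [e for e in biased_exps if not e8m0_fits_in_fp16(e)]
    if bad:
        raise ValueError(f"E8M0 scales out of FP16 range: {bad}")

check_scales([127, 120, 140])   # typical scales near 1.0: passes
# check_scales([254]) would raise: 2**127 overflows FP16.
```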
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
@mgoin I have added MXFP4 x FP16 support (and added the necessary checks). If you think this support is inappropriate, I can revert it.
Hi @jinzhen-lin, the pre-commit checks have failed. Please run:
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
@jinzhen-lin Personally I think supporting MXFP4 x FP16 is too confusing, especially since MXFP4 is still hardcoded for GPT-OSS at the moment with BF16 weights for the other layers. If you could remove it I would appreciate it. It is impressive you were able to support all the other formats though!
Could you show a benchmark comparing the original GPTQ to this Marlin GEMM on Turing? I'm curious if there is a large speedup. Also, does this potentially mean we can remove
Added. The improvement at small to medium batch sizes is significant. However, since the original GPTQ kernel uses dequant + cuBLAS for m > 8, Marlin is slower than it when the batch size is large.
This reverts commit dada848. Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Reverted. Thank you for your suggestions!
Don't we need to update other places as well? Such as vllm/model_executor/layers/quantization/kernels/mixed_precision/marlin.py, vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a16_fp8.py, vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a16_nvfp4.py, and maybe some compressed-tensors MoE methods, I'm not sure.
I'm going to merge this PR for now since the tests look good (the failures are known), so please cover the capability updates in a follow-up PR. Thanks!
…5) (vllm-project#29901) Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>
Thank you for your contribution, but it still fails to run cpatonn/Qwen3-Next-80B-A3B-Thinking-AWQ-4bit. Before this PR was merged, the error was that the Marlin kernel was missing, but now a new error has emerged. Device: Tesla T10 x4. Logs:
@mokieli It seems that some modules are still running with BF16. Try
Hi @jinzhen-lin, any updates? I tested Qwen3-Coder-Next-FP8 and it showed me the same error message 'torch._dynamo.exc.SkipFrame: BF16 is not supported' |
This PR adds Marlin kernel support for Turing (sm75) (e.g. 2080 Ti / T4).
- Turing doesn't have the cp.async instruction, but we can still use synchronous instructions to read from global memory and write to shared memory.
- Turing doesn't have the m16n8k16 MMA instruction, but it does have the m16n8k8 instruction. We only need to stack the instruction twice to achieve the same effect as m16n8k16.
Kernel Benchmark
2080ti + Dense Marlin + GPTQ Channelwise
2080ti + Dense Marlin + GPTQ Group 128 (Comparing with gptq exllama v2)
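As a rough illustration of the cp.async fallback described in the PR summary, here is a hypothetical Python model of the copy stage (thread count and tile sizes are made up for the example): without cp.async, each "thread" performs an ordinary load from "global memory" followed directly by a store to "shared memory", rather than issuing an asynchronous copy.

```python
# Hypothetical model of the sm75 fallback for cp.async (sizes are made up):
# without cp.async, each thread performs a synchronous load from global
# memory followed by a store to shared memory.

THREADS = 8   # threads cooperating on the copy
VEC = 4       # elements per thread per iteration (like a 16-byte copy)

def copy_tile_sync(global_mem, shared_mem, tile_offset, tile_size):
    """Synchronous global->shared copy, one VEC-wide chunk per thread."""
    for start in range(0, tile_size, THREADS * VEC):
        for tid in range(THREADS):                 # all threads in lockstep
            base = tile_offset + start + tid * VEC
            for v in range(VEC):
                dst = start + tid * VEC + v
                if dst < tile_size:
                    shared_mem[dst] = global_mem[base + v]
    # (a real kernel would __syncthreads() here before the MMA stage)

global_mem = list(range(100))
shared_mem = [0] * 64
copy_tile_sync(global_mem, shared_mem, tile_offset=16, tile_size=64)
assert shared_mem == list(range(16, 80))
```

The trade-off this models: the data must arrive before compute can start, so the load latency is exposed, whereas cp.async on sm80+ lets the copy overlap with computation on the previous tile.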