[ModelBash][DSV3] Add TRTLLM DSV3 Router GEMM kernel (6% B1 Speedup) #34302
robertgshaw2-redhat merged 25 commits into main
Conversation
Port the optimized router GEMM kernel from sglang's sgl-kernel for
DeepSeek V3 MoE models. This kernel is specifically optimized for
small batch sizes (1-16 tokens) common in decode phase.
Key features:
- Computes output = mat_a @ mat_b.T for MoE routing
- Supports bfloat16 input with float32 or bfloat16 output
- Optimized for DSV3 dimensions: hidden_dim=7168, num_experts={256,384}
- Requires SM90+ (Hopper) GPUs and CUDA 12.0+
- Supports Programmatic Dependent Launch (PDL) via TRTLLM_ENABLE_PDL=1
Original kernel adapted from TensorRT-LLM's dsv3RouterGemm implementation.
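As a sanity reference for what the kernel computes (output = mat_a @ mat_b.T over token activations and column-major router weights), here is a minimal pure-Python sketch at toy sizes; the shapes below are illustrative, not the real DSV3 dimensions:

```python
def router_gemm_ref(mat_a, mat_b):
    """Reference for the router GEMM: out[m][n] = sum_k a[m][k] * b[n][k].

    mat_a: num_tokens x hidden_dim activations
    mat_b: num_experts x hidden_dim router weights (so out = A @ B^T)
    """
    num_tokens = len(mat_a)
    hidden_dim = len(mat_a[0])
    num_experts = len(mat_b)
    out = [[0.0] * num_experts for _ in range(num_tokens)]
    for m in range(num_tokens):
        for n in range(num_experts):
            out[m][n] = sum(mat_a[m][k] * mat_b[n][k] for k in range(hidden_dim))
    return out

# Toy example: 2 tokens, hidden_dim 3, 2 experts.
a = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
b = [[1.0, 0.0, 1.0], [2.0, 2.0, 2.0]]
logits = router_gemm_ref(a, b)  # routing logits, one row per token
```

The CUDA kernel produces the same result, but fused with vectorized bf16 loads and a per-block reduction over the hidden dimension.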
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Code Review
This pull request ports an optimized router GEMM kernel for DeepSeek V3 MoE models from sglang. The changes include the CUDA kernel implementation, build system integration, and PyTorch bindings. The kernel is highly specialized for specific model configurations and hardware (SM90+). My review focuses on the new CUDA kernel implementation. I've identified significant code duplication between the float32 and bfloat16 output kernels, which should be refactored for better maintainability. Additionally, there are missing error checks for CUDA API calls, which could lead to unhandled runtime errors.
csrc/moe/dsv3_router_gemm.cu
```cpp
inline int getSMVersion() {
  int device{-1};
  cudaGetDevice(&device);
  int sm_major = 0;
  int sm_minor = 0;
  cudaDeviceGetAttribute(&sm_major, cudaDevAttrComputeCapabilityMajor, device);
  cudaDeviceGetAttribute(&sm_minor, cudaDevAttrComputeCapabilityMinor, device);
  return sm_major * 10 + sm_minor;
}
```
The CUDA API calls cudaGetDevice and cudaDeviceGetAttribute can return errors, but their return values are not being checked. This could lead to silent failures or undefined behavior if an error occurs (e.g., no CUDA device is available). It's important to handle these potential errors by checking the cudaError_t return value.
For example:

```cpp
cudaError_t err = cudaGetDevice(&device);
if (err != cudaSuccess) {
  // Handle error
}
```

Or use a macro for the check, which is a common practice in CUDA projects to reduce boilerplate. Other parts of the vLLM codebase use error checking macros for CUDA calls, and that practice should be followed here for consistency and robustness.
csrc/moe/dsv3_router_gemm.cu
```cpp
// Router GEMM kernel with float32 output
template <typename T, int kBlockSize, int VPT, int kNumTokens, int kNumExperts,
          int kHiddenDim>
__global__ __launch_bounds__(128, 1) void router_gemm_kernel_float_output(
    float* out, T const* mat_a, T const* mat_b) {
  // Each block handles one expert column
  int const n_idx = blockIdx.x;
  int const tid = threadIdx.x;
  constexpr int kWarpSize = 32;
  constexpr int kNumWarps = kBlockSize / kWarpSize;
  constexpr int k_elems_per_k_iteration = VPT * kBlockSize;
  constexpr int k_iterations = kHiddenDim / k_elems_per_k_iteration;

  // Initialize accumulators for all M rows
  float acc[kNumTokens] = {};

  // Shared memory for warp-level reduction
  __shared__ float sm_reduction[kNumTokens][kNumWarps];

  // B matrix is in column-major order
  T const* b_col = mat_b + n_idx * kHiddenDim;

  // Pre-compute k_base values
  int k_bases[k_iterations];
#pragma unroll
  for (int ki = 0; ki < k_iterations; ki++) {
    k_bases[ki] = ki * k_elems_per_k_iteration + tid * VPT;
  }

#if (defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 900))
  asm volatile("griddepcontrol.wait;");
#endif

  // Process the GEMM in chunks
  for (int ki = 0; ki < k_iterations; ki++) {
    int const k_base = k_bases[ki];

    // Load B matrix values using vector load
    uint4 b_vec = *reinterpret_cast<uint4 const*>(b_col + k_base);

    // Convert B values to float
    float b_float[VPT];
    bf16_uint4_to_float8<VPT>(b_vec, b_float);

#pragma unroll
    for (int m_idx = 0; m_idx < kNumTokens; m_idx++) {
      uint4 a_vec = *reinterpret_cast<uint4 const*>(
          mat_a + (m_idx * kHiddenDim) + k_base);

      float a_float[VPT];
      bf16_uint4_to_float8<VPT>(a_vec, a_float);

#pragma unroll
      for (int k = 0; k < VPT; k++) {
        acc[m_idx] += a_float[k] * b_float[k];
      }
    }
  }

  // Warp-level reduction
  int const warpId = tid / 32;
  int const laneId = tid % 32;

  float warp_result[kNumTokens];
#pragma unroll
  for (int m_idx = 0; m_idx < kNumTokens; m_idx++) {
    warp_result[m_idx] = acc[m_idx];
  }

#pragma unroll
  for (int m = 0; m < kNumTokens; m++) {
    float sum = warp_result[m];
    sum += __shfl_xor_sync(0xffffffff, sum, 16);
    sum += __shfl_xor_sync(0xffffffff, sum, 8);
    sum += __shfl_xor_sync(0xffffffff, sum, 4);
    sum += __shfl_xor_sync(0xffffffff, sum, 2);
    sum += __shfl_xor_sync(0xffffffff, sum, 1);

    if (laneId == 0) {
      sm_reduction[m][warpId] = sum;
    }
  }

  __syncthreads();

  // Final reduction across warps
  if (tid == 0) {
#pragma unroll
    for (int m = 0; m < kNumTokens; m++) {
      float final_sum = 0.0f;
#pragma unroll
      for (int w = 0; w < kNumWarps; w++) {
        final_sum += sm_reduction[m][w];
      }
      out[m * kNumExperts + n_idx] = final_sum;
    }
  }

#if (defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 900))
  asm volatile("griddepcontrol.launch_dependents;");
#endif
}

// Router GEMM kernel with bfloat16 output
template <typename T, int kBlockSize, int VPT, int kNumTokens, int kNumExperts,
          int kHiddenDim>
__global__ __launch_bounds__(128, 1) void router_gemm_kernel_bf16_output(
    __nv_bfloat16* out, T const* mat_a, T const* mat_b) {
  int const n_idx = blockIdx.x;
  int const tid = threadIdx.x;
  constexpr int kWarpSize = 32;
  constexpr int kNumWarps = kBlockSize / kWarpSize;
  constexpr int k_elems_per_k_iteration = VPT * kBlockSize;
  constexpr int k_iterations = kHiddenDim / k_elems_per_k_iteration;

  float acc[kNumTokens] = {};
  __shared__ float sm_reduction[kNumTokens][kNumWarps];

  T const* b_col = mat_b + n_idx * kHiddenDim;

  int k_bases[k_iterations];
#pragma unroll
  for (int ki = 0; ki < k_iterations; ki++) {
    k_bases[ki] = ki * k_elems_per_k_iteration + tid * VPT;
  }

#if (defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 900))
  asm volatile("griddepcontrol.wait;");
#endif

  for (int ki = 0; ki < k_iterations; ki++) {
    int const k_base = k_bases[ki];
    uint4 b_vec = *reinterpret_cast<uint4 const*>(b_col + k_base);

    float b_float[VPT];
    bf16_uint4_to_float8<VPT>(b_vec, b_float);

#pragma unroll
    for (int m_idx = 0; m_idx < kNumTokens; m_idx++) {
      uint4 a_vec = *reinterpret_cast<uint4 const*>(
          mat_a + (m_idx * kHiddenDim) + k_base);

      float a_float[VPT];
      bf16_uint4_to_float8<VPT>(a_vec, a_float);

#pragma unroll
      for (int k = 0; k < VPT; k++) {
        acc[m_idx] += a_float[k] * b_float[k];
      }
    }
  }

  int const warpId = tid / 32;
  int const laneId = tid % 32;

  float warp_result[kNumTokens];
#pragma unroll
  for (int m_idx = 0; m_idx < kNumTokens; m_idx++) {
    warp_result[m_idx] = acc[m_idx];
  }

#pragma unroll
  for (int m = 0; m < kNumTokens; m++) {
    float sum = warp_result[m];
    sum += __shfl_xor_sync(0xffffffff, sum, 16);
    sum += __shfl_xor_sync(0xffffffff, sum, 8);
    sum += __shfl_xor_sync(0xffffffff, sum, 4);
    sum += __shfl_xor_sync(0xffffffff, sum, 2);
    sum += __shfl_xor_sync(0xffffffff, sum, 1);

    if (laneId == 0) {
      sm_reduction[m][warpId] = sum;
    }
  }

  __syncthreads();

  if (tid == 0) {
#pragma unroll
    for (int m = 0; m < kNumTokens; m++) {
      float final_sum = 0.0f;
#pragma unroll
      for (int w = 0; w < kNumWarps; w++) {
        final_sum += sm_reduction[m][w];
      }
      out[m * kNumExperts + n_idx] = __float2bfloat16(final_sum);
    }
  }

#if (defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 900))
  asm volatile("griddepcontrol.launch_dependents;");
#endif
}
```
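The warp-level reduction in both kernels is an XOR butterfly: each __shfl_xor_sync step with offsets 16, 8, 4, 2, 1 pairs lane i with lane i ^ offset, so after five steps every lane of the 32-lane warp holds the full sum. A hypothetical pure-Python simulation of that pattern (just to illustrate the data movement, not the hardware intrinsic):

```python
WARP_SIZE = 32

def shfl_xor(vals, mask):
    """Simulate __shfl_xor_sync: lane i reads the value held by lane i ^ mask."""
    return [vals[i ^ mask] for i in range(WARP_SIZE)]

def warp_reduce_sum(vals):
    """XOR-butterfly reduction: after offsets 16, 8, 4, 2, 1,
    every lane holds the sum over all 32 lanes."""
    for mask in (16, 8, 4, 2, 1):
        partner = shfl_xor(vals, mask)
        vals = [v + p for v, p in zip(vals, partner)]
    return vals

lanes = [float(i) for i in range(WARP_SIZE)]  # lane i contributes i
reduced = warp_reduce_sum(lanes)              # sum of 0..31 on every lane
```

In the kernel, only lane 0 of each warp then writes its copy of the sum into shared memory for the final cross-warp reduction.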
The kernels router_gemm_kernel_float_output and router_gemm_kernel_bf16_output are nearly identical, with the only difference being the output data type and the final store operation. This large amount of duplicated code increases maintenance overhead and the risk of introducing inconsistencies.
To improve maintainability, these two kernels should be refactored into a single templated kernel. You can introduce a helper struct OutputWriter templated on the output type to handle the final store operation.
Here's a sketch of the proposed refactoring:

```cpp
template <typename T_out>
struct OutputWriter;

template <>
struct OutputWriter<float> {
  __device__ __forceinline__ static void write(float* out, int index,
                                               float value) {
    out[index] = value;
  }
};

template <>
struct OutputWriter<__nv_bfloat16> {
  __device__ __forceinline__ static void write(__nv_bfloat16* out, int index,
                                               float value) {
    out[index] = __float2bfloat16(value);
  }
};

template <typename T, typename T_out, int kBlockSize, int VPT, int kNumTokens,
          int kNumExperts, int kHiddenDim>
__global__ __launch_bounds__(128, 1) void router_gemm_kernel(
    T_out* out, T const* mat_a, T const* mat_b) {
  // ... common kernel logic ...

  // In the final reduction section
  if (tid == 0) {
#pragma unroll
    for (int m = 0; m < kNumTokens; m++) {
      float final_sum = 0.0f;
#pragma unroll
      for (int w = 0; w < kNumWarps; w++) {
        final_sum += sm_reduction[m][w];
      }
      OutputWriter<T_out>::write(out, m * kNumExperts + n_idx, final_sum);
    }
  }

  // ... rest of common kernel logic ...
}
```

Then invokeRouterGemmFloatOutput and invokeRouterGemmBf16Output can call this unified router_gemm_kernel with the appropriate output type (float or __nv_bfloat16). This will eliminate about 100 lines of redundant code.
Can we use the router gemm interface already present in flashinfer?
Signed-off-by: Robert Shaw <robshaw@redhat.com>
cc @pavanimajety - looks like these don't support SM90. Any idea of the plan here?
TODO:
CMakeLists.txt
```cmake
endif()

# DeepSeek V3 router GEMM kernel - requires SM90+
cuda_archs_loose_intersection(DSV3_ROUTER_GEMM_ARCHS "9.0a;10.0a" "${CUDA_ARCHS}")
```
This isn't compatible with CUDA 13 and is missing Blackwell Ultra; it should be something like:

```cmake
if(${CMAKE_CUDA_COMPILER_VERSION} VERSION_GREATER_EQUAL 13.0)
  cuda_archs_loose_intersection(DSV3_ROUTER_GEMM_ARCHS "9.0a;10.0f;11.0f" "${CUDA_ARCHS}")
else()
  cuda_archs_loose_intersection(DSV3_ROUTER_GEMM_ARCHS "9.0a;10.0a;10.1a;10.3a" "${CUDA_ARCHS}")
endif()
```
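The reviewer's version split amounts to choosing a supported-arch list per toolkit version and intersecting it with the build's requested arches. A hypothetical Python sketch of that logic; the function name is illustrative, and a plain set intersection only approximates what vLLM's cuda_archs_loose_intersection CMake helper does:

```python
def dsv3_router_gemm_archs(cuda_archs, compiler_version):
    """Pick the arch list for the kernel based on the CUDA compiler version.

    CUDA >= 13 can use family-wide codes (10.0f/11.0f), while older
    toolkits must enumerate the arch-specific Blackwell variants.
    """
    major = int(compiler_version.split(".")[0])
    if major >= 13:
        supported = {"9.0a", "10.0f", "11.0f"}
    else:
        supported = {"9.0a", "10.0a", "10.1a", "10.3a"}
    # "Loose intersection" is approximated here as a plain set intersection.
    return sorted(supported & set(cuda_archs))

# CUDA 12.x build requesting Hopper, Blackwell, and an unsupported arch:
archs = dsv3_router_gemm_archs(["9.0a", "10.0a", "12.0a"], "12.8")
```

Note how an sm12x arch in the request simply drops out of the intersection, which is exactly why builds targeting only 12.0 would not compile these sources.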
```python
def _set_allow_dsv3_router_gemm(self) -> None:
    self.allow_dsv3_router_gemm = (
        current_platform.is_cuda()
        and current_platform.has_device_capability((9, 0))
```
This should be current_platform.is_device_capability(90) or current_platform.is_device_capability_family(100), since we aren't supporting sm120.
It also looks like you need to check against supported n_experts since I only see instantiations for 256 or 384 experts
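Putting both review points together, the gate would need to check arch and shape. A hypothetical sketch of such a guard; the function and constant names are illustrative, not vLLM's actual API, and the family check is simplified to the two arches discussed:

```python
SUPPORTED_NUM_EXPERTS = {256, 384}  # only these instantiations exist in the kernel
SUPPORTED_HIDDEN_DIM = 7168         # DSV3 hidden size
MAX_TOKENS = 16                     # kernel targets small decode batches

def can_use_dsv3_router_gemm(num_tokens, hidden_dim, num_experts, sm_version):
    """Return True only for shapes and arches the kernel is compiled for."""
    return (
        sm_version in (90, 100)     # SM90 / SM100 family only, per the review
        and 1 <= num_tokens <= MAX_TOKENS
        and hidden_dim == SUPPORTED_HIDDEN_DIM
        and num_experts in SUPPORTED_NUM_EXPERTS
    )

ok = can_use_dsv3_router_gemm(1, 7168, 256, 90)    # DSV3 decode shape
bad = can_use_dsv3_router_gemm(1, 7168, 128, 90)   # unsupported expert count
```

Anything failing the guard would fall back to the generic routing GEMM path.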
csrc/moe/dsv3_router_gemm_entry.cu
```cpp
      "output must be float32 or bf16");

  auto const sm = getSMVersion();
  TORCH_CHECK(sm >= 90, "required CUDA ARCH >= SM_90");
```
Do you know if this would work on SM120? Better to be explicit if we don't know
Hey all, FYI - there's a flashinfer PR ready that removes the restriction for non-SM100, in case we want to switch to the flashinfer implementation: flashinfer-ai/flashinfer#2576
I think we still expect these kernels to improve and evolve (new HW arch) in FI, so it would be great to consider invoking them directly with Flashinfer (perhaps with the 0.6.5 update). Not blocking for this PR though; I'll keep track.
@xinli-sw - sounds good. We can add it once flashinfer hits 0.6.5.
This commit somehow breaks model loading on Spark; attaching the log.
@robertgshaw2-redhat - this is the second DSV3-related PR that breaks vLLM on DGX Spark (and other sm12x). I believe you need to guard it properly. @mgoin, @johnnynunez - FYI. EDIT: the first PR was this one: #34758
Thank you for reporting @eugr @stavinsky, and sorry for the disruption. I should have a fix here: #35123
Thanks, sorry for the issues.
Always happy to help, guys.
No problem, it's a big project with very wide hardware support. Stuff happens.
I have to say, I don't quite get why this did not break on SM89, where we run a lot of tests.
@robertgshaw2-redhat It is because the CI image is built with a wide-ranging TORCH_CUDA_ARCH_LIST, basically including all source files and cases across CUDA arches. You would only run into this issue if you build for just your arch, i.e. TORCH_CUDA_ARCH_LIST=12.0, since you wouldn't build those source files.
…llm-project#34302) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>
…llm-project#34302) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Andrii Skliar <askliar@nvidia.com>
…llm-project#34302) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: EricccYang <yangyang4991@gmail.com>
Port the optimized router GEMM kernel from sglang's sgl-kernel for DeepSeek V3 MoE models. This kernel is specifically optimized for small batch sizes (1-16 tokens) common in decode phase. The kernel is originally adapted from TRTLLM.
Key features:
- Computes output = mat_a @ mat_b.T for MoE routing
- Supports bfloat16 input with float32 or bfloat16 output
- Optimized for DSV3 dimensions: hidden_dim=7168, num_experts={256,384}
- Requires SM90+ (Hopper) GPUs and CUDA 12.0+
- Supports Programmatic Dependent Launch (PDL) via TRTLLM_ENABLE_PDL=1

Original kernel adapted from TensorRT-LLM's dsv3RouterGemm implementation.
5.5% E2E Speedup for Batch 1 Decode.
Purpose
Test Plan
eval (run with concurrency 10 to hit the low batch size):

```shell
lm_eval \
  --model local-completions \
  --tasks gsm8k \
  --model_args "model={{MODEL}},base_url=http://localhost:{{PORT}}/v1/completions,num_concurrent=10,tokenized_requests=False" \
  --limit 100
```

benchmark:

```shell
vllm bench serve \
  --port {{PORT}} \
  --model {{MODEL}} \
  --dataset-name random \
  --input-len 2 \
  --output-len 100 \
  --max-concurrency 1 \
  --num-prompts 10 \
  --seed $(date +%s) \
  --temperature 0.0
```

Test Result
local-completions ({'model': 'nvidia/DeepSeek-V3.1-NVFP4', 'base_url': 'http://localhost:8001/v1/completions', 'num_concurrent': 10, 'tokenized_requests': False}), gen_kwargs: ({}), limit: 100.0, num_fewshot: None, batch_size: 1

| Tasks | Version | Filter           | n-shot | Metric      |   | Value |   | Stderr |
|-------|--------:|------------------|-------:|-------------|---|------:|---|-------:|
| gsm8k |       3 | flexible-extract |      5 | exact_match | ↑ |  0.97 | ± | 0.0171 |
|       |         | strict-match     |      5 | exact_match | ↑ |  0.97 | ± | 0.0171 |

Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.