[Feature] Adapt DispathGmmCombineDecode operator to align with weight scale dtype of small operators. [RFC: issue 5476] #5755
Conversation
Code Review
This pull request adapts the DispathGmmCombineDecode operator to support different data types for weight scales, enhancing flexibility. The changes are consistently applied across C++ kernel definitions, templates, and the corresponding Python code. While the overall direction is good, I've identified a critical issue in two C++ headers related to uninitialized tensor access that could lead to undefined behavior, and a high-severity issue with incorrect static assertions for buffer sizing.
```cpp
auto &ubRawScale = ubRawScaleList[ubListId];
auto layoutRawUbScale = LayoutScale::template MakeLayoutInUb<ElementRawScale>(scaleTileShape);
```
There is a critical issue here. The ubRawScaleList member is not initialized in the constructor when ElementRawScale is the same as ElementFp32Scale (e.g., float). Accessing it here via ubRawScaleList[ubListId] leads to using an uninitialized LocalTensor, which is undefined behavior.
To fix this, you should move the declarations of ubRawScale and layoutRawUbScale inside the if constexpr blocks where they are actually used (at lines 307 and 334).
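A minimal, compilable sketch of the suggested pattern, using hypothetical stand-in types: a `std::vector` plays the role of the `LocalTensor` list, and `short`/`float` stand in for `ElementRawScale`/`ElementFp32Scale`. The point is that the type-dependent buffer is only allocated, and only referenced, inside the `if constexpr` branch that actually needs it, so no reference to an uninitialized object exists in the same-type configuration.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

template <typename ElementRawScale, typename ElementFp32Scale>
struct ScaleStage {
    std::vector<ElementFp32Scale> ubFp32ScaleList;  // always allocated
    std::vector<ElementRawScale> ubRawScaleList;    // only allocated when the types differ

    explicit ScaleStage(std::size_t n) : ubFp32ScaleList(n) {
        if constexpr (!std::is_same_v<ElementRawScale, ElementFp32Scale>) {
            ubRawScaleList.resize(n);  // mirrors the conditional UB allocation
        }
    }

    float Load(std::size_t i) {
        // Declaration moved INSIDE the branch: when both element types are the
        // same, this code is compiled out and the raw list is never touched.
        if constexpr (!std::is_same_v<ElementRawScale, ElementFp32Scale>) {
            auto &ubRawScale = ubRawScaleList[i];  // safe: allocated in this config
            ubFp32ScaleList[i] = static_cast<ElementFp32Scale>(ubRawScale);
        }
        return ubFp32ScaleList[i];
    }
};
```

In the same-type instantiation (`ScaleStage<float, float>`), `ubRawScaleList` stays empty and no reference to it is ever formed, which is exactly what moving the declarations into the `if constexpr` blocks achieves in the kernel headers.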
```cpp
auto &ubRawScale = ubRawScaleList[ubListId];
auto layoutRawUbScale = LayoutScale::template MakeLayoutInUb<ElementRawScale>(scaleTileShape);
```
There is a critical issue here. The ubRawScaleList member is not initialized in the constructor when ElementRawScale is the same as ElementFp32Scale (e.g., float). Accessing it here via ubRawScaleList[ubListId] leads to using an uninitialized LocalTensor, which is undefined behavior.
To fix this, you should move the declarations of ubRawScale and layoutRawUbScale inside the if constexpr blocks where ubRawScale is actually used (at lines 225 and 251).
```cpp
static_assert((UB_STAGES * (TileShape::COUNT * sizeof(ElementC) + TileShape::COLUMN * (sizeof(ElementRawScale) + sizeof(ElementFp32Scale)) +
                            TileShape::ROW * sizeof(ElementPerTokenScale) + TileShape::COUNT * sizeof(ElementD)) +
               (TileShape::COUNT + TileShape::COUNT) * sizeof(float) + TileShape::ROW * BYTE_PER_BLK) <=
                  ArchTag::UB_SIZE,
              "TileShape is too large to fit in UB");
```

The `static_assert` for UB size is incorrect. When `ElementRawScale` is the same as `ElementFp32Scale` (i.e., `float`), only `ubFp32ScaleList` is allocated. However, this `static_assert` calculates the size as if both `ubRawScaleList` and `ubFp32ScaleList` are allocated, by using `sizeof(ElementRawScale) + sizeof(ElementFp32Scale)`. This makes the check overly restrictive and may cause compilation to fail for valid configurations.

Suggested change:

```cpp
static_assert((UB_STAGES * (TileShape::COUNT * sizeof(ElementC) +
                            (std::is_same_v<ElementRawScale, ElementFp32Scale> ? 0 : TileShape::COLUMN * sizeof(ElementRawScale)) +
                            TileShape::COLUMN * sizeof(ElementFp32Scale) +
                            TileShape::ROW * sizeof(ElementPerTokenScale) + TileShape::COUNT * sizeof(ElementD)) +
               (TileShape::COUNT + TileShape::COUNT) * sizeof(float) + TileShape::ROW * BYTE_PER_BLK) <=
                  ArchTag::UB_SIZE,
              "TileShape is too large to fit in UB");
```

The same `static_assert` appears in the second header:

```cpp
static_assert((UB_STAGES * (TileShape::COUNT * sizeof(ElementC) + TileShape::COLUMN * (sizeof(ElementRawScale) + sizeof(ElementFp32Scale)) +
                            TileShape::ROW * sizeof(ElementPerTokenScale) + TileShape::COUNT * sizeof(ElementD)) +
               (TileShape::COUNT + TileShape::COUNT) * sizeof(float) + TileShape::ROW * BYTE_PER_BLK) <=
                  ArchTag::UB_SIZE,
              "TileShape is too large to fit in UB");
```

The same issue applies here: the budget charges for both scale buffers even when `ElementRawScale` and `ElementFp32Scale` are identical and only `ubFp32ScaleList` is allocated. The same `std::is_same_v<ElementRawScale, ElementFp32Scale> ? 0 : TileShape::COLUMN * sizeof(ElementRawScale)` guard should be applied here as well.
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.
…scale dtype of small operators Signed-off-by: wangqiankun <wangqiankun13@huawei.com>
…scale dtype of small operators. [RFC: issue 5476] (vllm-project#5755)

### What this PR does / why we need it?
[Feature] Adapt the DispathGmmCombineDecode operator to align with the weight scale dtype of small operators.

- **Before**: the weight scale must be float32.
- **After**: the weight scale can be float32/float16 when x is float16, and float32/bfloat16 when x is float32/bfloat16. The w1 scale can also use a different dtype from the w2 scale.

For more information about this operator, please refer to RFC: issue vllm-project#5476.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

#### Perf
> When the scale is of type fp16 or bf16, it is cast to fp32 internally within the operator, while the subsequent computations remain unchanged. This PR therefore introduces an additional cast operation but halves the memory copy operations for the scale. Furthermore, since the scale data is only a few KB in size and participates in relatively few computations, its impact is almost negligible compared to major operations like matrix multiplication, so the theoretical performance change should be minimal.

Tested single-operator cases from qwen3-235b:
- single A3 node (ep16), 64 MoE experts, 4 experts/die (like qwen3-235b ep32)
- batch = 18/32, token_hidden_size 4096, moe_intermediate_size 1536

The test was run for 100 rounds, and the average of the last 95 rounds was taken.

| | bs18 (us) | bs32 (us) |
| ----- | ----- | ----- |
| Without this PR | 96.28 | 108.83 |
| With this PR | 96.06 | 107.90 |

Note: single-operator benchmarks represent an ideal scenario. They are usually only useful for gauging relative changes and may not fully align with performance data observed within the full model.

#### Acc
Tested qwen3-235b eplb on a single A3 node (ep16), with dispatch_gmm_combine_decode:

| dataset | version | metric | mode | vllm-api-stream-chat |
| ----- | ----- | ----- | ----- | ----- |
| aime2024 | 604a78 | accuracy | gen | 83.33 |

- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@2f4e654

Signed-off-by: wangqiankun <wangqiankun13@huawei.com>
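The dtype rule described above can be summarized in a small sketch. The enum and helper below are hypothetical, not part of the operator's API; they just encode the stated table: an fp32 scale is always allowed, an fp16 scale is allowed only when x is fp16, and a bf16 scale is allowed only when x is fp32 or bf16. Because each scale is checked independently, the w1 and w2 scales may use different dtypes.

```cpp
#include <cassert>

enum class DType { Fp32, Fp16, Bf16 };

// Hypothetical validity check mirroring the rule in the PR description.
constexpr bool ScaleDtypeOk(DType x, DType scale) {
    if (scale == DType::Fp32) return true;             // fp32 scale always allowed
    if (x == DType::Fp16) return scale == DType::Fp16; // x fp16 -> scale fp32/fp16
    return scale == DType::Bf16;                       // x fp32/bf16 -> scale fp32/bf16
}
```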