
fix bug which causes the following RuntimeError when call apply_weights… #2

Merged
hliuca merged 1 commit into ROCm:0.3.0-rocm from yaomingamd:0.3.0-rocm
Feb 23, 2024

@yaomingamd

… with bias tensor

vllm/model_executor/layers/linear.py", line 70, in apply_weights
if bias:
RuntimeError: Boolean value of Tensor with more than one value is ambiguous
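
The traceback above can be reproduced outside vLLM. The sketch below is illustrative only: `apply_weights_buggy` and `apply_weights_fixed` are hypothetical stand-ins, not the actual `apply_weights` method in `linear.py`, and the `is not None` check is an assumption about the shape of the fix, since the diff is not shown on this page. It shows why truth-testing a multi-element bias tensor raises, and one common way to avoid it.

```python
from typing import Optional

import torch
import torch.nn.functional as F


def apply_weights_buggy(weight: torch.Tensor,
                        x: torch.Tensor,
                        bias: Optional[torch.Tensor] = None) -> torch.Tensor:
    """Hypothetical stand-in that mirrors the failing `if bias:` check."""
    out = F.linear(x, weight)
    # Truth-testing a tensor with more than one element raises RuntimeError.
    if bias:
        return out + bias
    return out


def apply_weights_fixed(weight: torch.Tensor,
                        x: torch.Tensor,
                        bias: Optional[torch.Tensor] = None) -> torch.Tensor:
    """Assumed fix: test for presence of the bias, not its truth value."""
    if bias is not None:
        return F.linear(x, weight, bias)
    return F.linear(x, weight)


if __name__ == "__main__":
    x = torch.ones(2, 4)
    weight = torch.ones(3, 4)
    bias = torch.arange(3.0)  # three elements, so bool(bias) is ambiguous

    try:
        apply_weights_buggy(weight, x, bias)
    except RuntimeError as err:
        print(f"buggy version raised: {err}")

    print(apply_weights_fixed(weight, x, bias))
```

An explicit `is not None` test also avoids a quieter failure mode: a one-element bias tensor holding `0.0` would not raise under `if bias:` but would evaluate as false, so the bias would be skipped silently.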

@hliuca hliuca self-assigned this Feb 23, 2024
@hliuca hliuca merged commit c6dc9da into ROCm:0.3.0-rocm Feb 23, 2024
AdrianAbeyta pushed a commit that referenced this pull request Mar 8, 2024
gshtras pushed a commit that referenced this pull request Sep 27, 2024
mawong-amd pushed a commit that referenced this pull request May 2, 2025
wip

wip & debug

update

cleanup

use quark realquantizer for pack/quant/dequant

comment on cudagraph issue; remove prints

Keep only 1 place importing quark

cudagraph issue resolved; dq weight at load time for efficiency

Signed-off-by: Bowen Bao <bowenbao@amd.com>

lint

Signed-off-by: Bowen Bao <bowenbao@amd.com>

turn on emulation based on platform

Signed-off-by: Bowen Bao <bowenbao@amd.com>

add fused moe support - ugly wip

running version

Add envar if dequant weight at load time

Signed-off-by: Bowen Bao <bowenbao@amd.com>

Mxfp4 memory leak fixes (#2)
mawong-amd pushed a commit that referenced this pull request May 14, 2025
wip

wip & debug

update

cleanup

use quark realquantizer for pack/quant/dequant

comment on cudagraph issue; remove prints

Keep only 1 place importing quark

cudagraph issue resolved; dq weight at load time for efficiency

Signed-off-by: Bowen Bao <bowenbao@amd.com>

lint

Signed-off-by: Bowen Bao <bowenbao@amd.com>

turn on emulation based on platform

Signed-off-by: Bowen Bao <bowenbao@amd.com>

add fused moe support - ugly wip

running version

Add envar if dequant weight at load time

Signed-off-by: Bowen Bao <bowenbao@amd.com>

Mxfp4 memory leak fixes (#2)

Signed-off-by: Felix Marty <felmarty@amd.com>
mawong-amd pushed a commit that referenced this pull request May 14, 2025
wip & debug

update

cleanup

use quark realquantizer for pack/quant/dequant

comment on cudagraph issue; remove prints

Keep only 1 place importing quark

cudagraph issue resolved; dq weight at load time for efficiency

Signed-off-by: Bowen Bao <bowenbao@amd.com>

lint

Signed-off-by: Bowen Bao <bowenbao@amd.com>

turn on emulation based on platform

Signed-off-by: Bowen Bao <bowenbao@amd.com>

add fused moe support - ugly wip

running version

Add envar if dequant weight at load time

Signed-off-by: Bowen Bao <bowenbao@amd.com>

Mxfp4 memory leak fixes (#2)

Fix VLLM_QUARK_EMU_MEM_OPT route

Signed-off-by: Felix Marty <felmarty@amd.com>
mawong-amd pushed a commit that referenced this pull request May 14, 2025
charlifu pushed a commit that referenced this pull request Oct 9, 2025
Rohan138 pushed a commit that referenced this pull request Jan 28, 2026
…3058)

Signed-off-by: ramos <49182011+nemoramo@users.noreply.github.com>
Signed-off-by: mayufeng <mayufeng@example.com>
Co-authored-by: mayufeng <mayufeng@example.com>

2 participants