13 changes: 13 additions & 0 deletions vllm/model_executor/layers/fused_moe/rocm_aiter_fused_moe.py
@@ -1,18 +1,31 @@
# SPDX-License-Identifier: Apache-2.0
from functools import cache
from typing import List, Optional

import torch

import vllm.envs as envs
from vllm.config import get_current_vllm_config
from vllm.model_executor.model_loader.utils import get_architecture_class_name
from vllm.platforms import current_platform

SUPPORTED_MODEL_ARCHS = [
    "MixtralForCausalLM", "DeepseekForCausalLM", "DeepseekV2ForCausalLM",
    "DeepseekV3ForCausalLM"
]


@cache
def is_rocm_aiter_moe_enabled() -> bool:
    model_cls_name = get_architecture_class_name(
        get_current_vllm_config().model_config)
Comment on lines +20 to +21
Member:
It looks like is_rocm_aiter_moe_enabled is called during the actual forward pass, when (IIUC) it's not valid to call get_current_vllm_config(), as it's not set (e.g. in dispatch_fused_experts_func). That should be resolved before landing; otherwise users will get spammed with warnings.

Contributor Author (@charlifu, Apr 14, 2025):
That's why I added the cache decorator: it will use the cached value during the actual forward pass.
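A minimal illustration of the memoization being relied on here (toy function, not the vLLM code):

# Toy example: functools.cache runs the body once and then reuses the
# stored result, so later calls never re-read any configuration.
from functools import cache

@cache
def enabled() -> bool:
    print("body executed")  # printed only on the first call
    return True

enabled()  # prints "body executed" and computes the result
enabled()  # returns the cached True; the body does not run again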

Member:
I see. But then is_rocm_aiter_moe_enabled would not be valid when instantiating a second LLM.
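A toy reproduction of this concern (the module-level _current_arch stands in for the global vllm config):

# The first call caches the result; a second "LLM" with a different
# architecture then silently reuses the first result.
from functools import cache

_current_arch = "MixtralForCausalLM"   # first LLM: supported MoE arch

@cache
def moe_enabled() -> bool:
    return _current_arch in ("MixtralForCausalLM", "DeepseekV3ForCausalLM")

print(moe_enabled())                   # True, now cached for the process

_current_arch = "LlamaForCausalLM"     # second LLM: unsupported arch
print(moe_enabled())                   # still True -- stale cached value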

Contributor Author (@charlifu):
Oh, yeah. You are right.

Member:
Maybe we should just set the current vllm config during the forward pass in the model runner. @youkaichao WDYT?
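For reference, a hedged sketch of that suggestion; only set_current_vllm_config comes from vllm.config, while the runner class and method names here are hypothetical:

from vllm.config import set_current_vllm_config

class ModelRunnerSketch:
    def __init__(self, vllm_config, model):
        self.vllm_config = vllm_config
        self.model = model

    def execute_model(self, *args, **kwargs):
        # Setting the config for the duration of the forward pass would make
        # get_current_vllm_config() valid inside layer code such as
        # is_rocm_aiter_moe_enabled().
        with set_current_vllm_config(self.vllm_config):
            return self.model(*args, **kwargs)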

    return current_platform.is_rocm() \
        and envs.VLLM_ROCM_USE_AITER_MOE \
        and envs.VLLM_ROCM_USE_AITER \
        and model_cls_name in SUPPORTED_MODEL_ARCHS


@cache
def is_rocm_aiter_block_scaled_moe_enabled() -> bool:
    return is_rocm_aiter_moe_enabled() and \
        envs.VLLM_ROCM_USE_AITER_FP8_BLOCK_SCALED_MOE
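For context, an illustrative snippet (not part of the PR) that reads the same vllm.envs values as the gating above, to show which condition enables or blocks the AITER MoE path on a given machine:

import vllm.envs as envs
from vllm.platforms import current_platform

# Every check must be truthy (plus a supported model architecture)
# for the AITER MoE path to be selected.
checks = {
    "platform is ROCm": current_platform.is_rocm(),
    "VLLM_ROCM_USE_AITER": envs.VLLM_ROCM_USE_AITER,
    "VLLM_ROCM_USE_AITER_MOE": envs.VLLM_ROCM_USE_AITER_MOE,
    "VLLM_ROCM_USE_AITER_FP8_BLOCK_SCALED_MOE":
        envs.VLLM_ROCM_USE_AITER_FP8_BLOCK_SCALED_MOE,
}
for name, value in checks.items():
    print(f"{name}: {value}")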
4 changes: 3 additions & 1 deletion vllm/model_executor/model_loader/loader.py
@@ -468,7 +468,9 @@ def load_model(self, vllm_config: VllmConfig) -> nn.Module:
"Following weights were not initialized from "
f"checkpoint: {weights_not_loaded}")

-        _process_weights_after_loading(model, model_config, target_device)
+        with set_current_vllm_config(vllm_config):
Contributor:
I don't fully appreciate the implications of setting the config around _process_weights_after_loading. Could you explain a bit why it's necessary?

Contributor Author (@charlifu, Apr 14, 2025):
The current_vllm_config value is not set when _process_weights_after_loading is being called. Currently, this value is set only when creating the model class, so we do not have the information when determining whether to use aiter moe.

We definitely have other options:

  • Add a model config parameter to the _process_weights_after_loading and is_rocm_aiter_moe_enabled functions. We might have to add more cases here.
  • Add a private member holding the model config to the fused_moe layer class and set the value when creating the layer class.

Both require changing the interface. If you are concerned that exposing the current vllm config during the execution of _process_weights_after_loading could cause potential issues, I prefer the second option, since we call the is_rocm_aiter_moe_enabled function during model execution as well (see the sketch below).
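A hedged sketch of that second option (FusedMoESketch and its members are hypothetical stand-ins for the real fused_moe layer class):

# Capture the architecture name at layer-construction time, while the vllm
# config is guaranteed to be set, instead of looking it up globally later.
from vllm.config import get_current_vllm_config
from vllm.model_executor.layers.fused_moe.rocm_aiter_fused_moe import (
    SUPPORTED_MODEL_ARCHS)
from vllm.model_executor.model_loader.utils import get_architecture_class_name

class FusedMoESketch:
    def __init__(self):
        # Valid here: layers are instantiated under set_current_vllm_config.
        self._model_cls_name = get_architecture_class_name(
            get_current_vllm_config().model_config)

    def _aiter_moe_for_this_model(self) -> bool:
        return self._model_cls_name in SUPPORTED_MODEL_ARCHS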

+            _process_weights_after_loading(model, model_config,
+                                           target_device)

        return model.eval()
