[EPLB] EPLB Config Renaming (#5533)
Conversation
Code Review
This pull request refactors several configuration parameters for the Expert Parallelism Load Balancer (EPLB) to improve clarity. The changes are applied consistently across documentation, tests, and source code. My review focuses on ensuring the documentation is clear and accurate after these changes. I found some issues in the additional_config.md file, including duplicate entries and formatting errors, which could confuse users.
| Name | Type | Default | Description |
|------|------|---------|-------------|
| `lmhead_tensor_parallel_size` | int | `None` | The custom tensor parallel size of lmhead. Restriction: can only be used when `tensor_parallel=1`. |
| `oproj_tensor_parallel_size` | int | `None` | The custom tensor parallel size of oproj. |
| `multistream_overlap_shared_expert` | bool | `False` | Whether to enable the multistream shared expert. This option only takes effect on MoE models with shared experts. |
| `dynamic_eplb` | bool | `False` | Whether to enable dynamic EPLB. |
| `expert_heat_collection_interval` | int | `400` | The number of forward iterations after which EPLB begins. |
| `algorithm_execution_interval` | int | `30` | The number of forward iterations within which the EPLB worker finishes its CPU tasks. In our tests, the default value of 30 covers most cases. |
| `expert_map_record_path` | str | `None` | Save the expert load calculation results to a new expert table in the specified directory. |
| `num_redundant_experts` | int | `0` | Specify redundant experts during initialization. |
| `dump_config` | str | `None` | Configuration file path for msprobe dump (eager mode). |
| `enable_async_exponential` | int | `0` | Whether to enable async exponential overlap. To enable it, set this config to 1. |
The configuration table in this section has several issues that could confuse users:
- Duplicate configurations:
  - `enable_async_exponential` is listed on line 35 (as a `bool`) and again on line 46 (as an `int`).
  - `dump_config` (line 45) appears to be a duplicate of `dump_config_path` (line 34), as they have identical descriptions. The parameter actually used in the code is `dump_config_path`.
  - `lmhead_tensor_parallel_size` (line 37) and `oproj_tensor_parallel_size` (line 38) are already documented under the `finegrained_tp_config` section below.
- Formatting and typos:
  - The table formatting is misaligned from line 41 onwards, making it difficult to read.
  - There is a typo in the description for `algorithm_execution_interval` on line 42 ("The forward iterations...").

To improve clarity and correctness, I suggest replacing lines 37-46 to include only the relevant, correctly formatted, and de-duplicated configurations related to this PR's scope.
| Name | Type | Default | Description |
|------|------|---------|-------------|
| `multistream_overlap_shared_expert` | bool | `False` | Whether to enable the multistream shared expert. This option only takes effect on MoE models with shared experts. |
| `dynamic_eplb` | bool | `False` | Whether to enable dynamic EPLB. |
| `expert_heat_collection_interval` | int | `400` | The number of forward iterations after which EPLB begins. |
| `algorithm_execution_interval` | int | `30` | The number of forward iterations within which the EPLB worker finishes its CPU tasks. In our tests, the default value of 30 covers most cases. |
| `expert_map_record_path` | str | `None` | Save the expert load calculation results to a new expert table in the specified directory. |
| `num_redundant_experts` | int | `0` | Specify redundant experts during initialization. |
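The nested `eplb_config` dict described above could be parsed into a typed config object roughly as follows. This is a minimal, hypothetical sketch (the class name, method, and parsing logic are illustrative, not the actual vllm-ascend implementation); the field names and defaults mirror the table.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class EplbConfig:
    # Defaults mirror the documentation table above.
    dynamic_eplb: bool = False
    expert_heat_collection_interval: int = 400
    algorithm_execution_interval: int = 30
    expert_map_record_path: Optional[str] = None
    num_redundant_experts: int = 0

    @classmethod
    def from_additional_config(cls, additional_config: dict) -> "EplbConfig":
        # Only keys present in the nested "eplb_config" dict override defaults;
        # unknown keys are ignored rather than raising.
        raw: dict = additional_config.get("eplb_config", {})
        known = {k: v for k, v in raw.items() if k in cls.__dataclass_fields__}
        return cls(**known)
```

For example, `EplbConfig.from_additional_config({"eplb_config": {"dynamic_eplb": True, "num_redundant_experts": 16}})` would keep the documented defaults for the remaining fields.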
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
CI passed here: https://github.com/vllm-project/vllm-ascend/actions/runs/20952178776?pr=5533
Signed-off-by: shenchuxiaofugui <1311027364@qq.com>
Conflict was resolved.
### What this PR does / why we need it?
1. Rename `num_iterations_eplb_update` to `expert_heat_collection_interval`.
2. Rename `num_wait_worker_iterations` to `algorithm_execution_interval`.
3. Rename `init_redundancy_expert` to `num_redundant_experts`, matching the name of the equivalent variable in vLLM.
4. Delete `gate_eplb` because we don't need this feature.
5. Move the EPLB config into a dict in the additional config.
6. Depends on pr5817.
### Does this PR introduce _any_ user-facing change?
Before this PR:
`--additional-config '{"dynamic_eplb":true,
"num_iterations_eplb_update": 4000, "num_wait_worker_iterations": 150,
"init_redundancy_expert": 16, "expert_map_path": "xxx.json"}'`
After this PR:
`--additional-config
'{"eplb_config":{"dynamic_eplb":true,"expert_heat_collection_interval":4000,
"algorithm_execution_interval":150,"num_redundant_experts": 16,
"expert_map_path": "xxx.json"}}'`
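The flat-to-nested migration described by the two command lines above can be sketched as a small helper. This is hypothetical illustrative code, not part of this PR: the function name `migrate_additional_config` is made up, and the key mappings are taken directly from the rename list in this description.

```python
# Old flat key -> new key inside the nested "eplb_config" dict.
_RENAMES = {
    "num_iterations_eplb_update": "expert_heat_collection_interval",
    "num_wait_worker_iterations": "algorithm_execution_interval",
    "init_redundancy_expert": "num_redundant_experts",
}
# All flat keys that move under "eplb_config" (renamed or not).
_EPLB_KEYS = {"dynamic_eplb", "expert_map_path", *_RENAMES}


def migrate_additional_config(old: dict) -> dict:
    """Move flat EPLB keys into a nested eplb_config dict, applying renames."""
    new = {k: v for k, v in old.items() if k not in _EPLB_KEYS}
    eplb = {_RENAMES.get(k, k): v for k, v in old.items() if k in _EPLB_KEYS}
    if eplb:
        new["eplb_config"] = eplb
    return new
```

Applied to the old-style config above, this produces exactly the new-style nested form; non-EPLB keys in the additional config pass through untouched.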
### How was this patch tested?
#### Test: qwen3-235b EPLB with `num_redundant_experts=16`
Without pr5817:
| dataset | version | metric | mode | vllm-api-general-chat |
|----- | ----- | ----- | ----- | -----|
| aime2024 | 604a78 | accuracy | gen | 83.33 |

With pr5817:
| dataset | version | metric | mode | vllm-api-general-chat |
|----- | ----- | ----- | ----- | -----|
| aime2024 | 604a78 | accuracy | gen | 86.67 |
- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@45c1ca1
Signed-off-by: shenchuxiaofugui <1311027364@qq.com>
### What this PR does / why we need it?
#5533 Add a wrapper for the EPLB startup configuration; this is a forward-compatible update.
### Does this PR introduce _any_ user-facing change?
Before this PR:
`--additional-config '{"dynamic_eplb":true, "num_iterations_eplb_update": 4000, "num_wait_worker_iterations": 150, "init_redundancy_expert": 16, "expert_map_path": "xxx.json"}'`
After this PR:
`--additional-config '{"eplb_config":{"dynamic_eplb":true,"expert_heat_collection_interval":4000, "algorithm_execution_interval":150,"num_redundant_experts": 16, "expert_map_path": "xxx.json"}}'`
### How was this patch tested?
qwen3-30b dialogue (model output excerpt):
> Okay, the user is asking, "What is deep learning?" I need to explain this in a clear and concise way. Let me start by recalling what I know about deep learning. It's a subset of machine learning, right? So first, I should mention that it's part of machine learning, which is a branch of AI. Then, the key point is that deep learning uses neural networks with multiple layers. The term "deep" refers to the number of layers in the network.
>
> I should explain what neural networks are. Maybe start with the basics: they're inspired by the human brain, with layers of nodes (neurons). Each layer processes data and passes it to the next. The more layers, the deeper the network. But I need to make sure not to get too technical here.
>
> Examples would help. Maybe mention applications like image recognition, speech recognition, natural language processing. For instance, when you use a smartphone's facial recognition, that's deep learning. Or when you ask a virtual assistant like Siri or Alexa, that's also deep learning in action.
>
> I should also touch on how deep learning works. It requires a lot of data and computational power. The process involves training the network with labeled data, adjusting the weights of the connections between neurons through backpropagation. The more data and layers, the better the model can learn complex patterns.
>
> Wait, but the user might not know what backpropagation is. Maybe I should avoid that term unless necessary.

Signed-off-by: shenchuxiaofugui <1311027364@qq.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>