feat: support DP inside vLLM for EP #1081
Conversation
terrykong
left a comment
lgtm. @parthchadha to review
fix type
fix rank
fix local rank

Signed-off-by: Yuki Huang <[email protected]>
Force-pushed from 93dfe12 to 14b45e1 (Compare)
Walkthrough: configs replace the boolean `enable_expert_parallel` with an integer `expert_parallel_size`.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Orchestrator
    participant VirtualCluster
    participant WorkerGroups
    participant Worker as "vLLM Worker"
    Orchestrator->>VirtualCluster: init/create placement groups (retries/backoff, timeout)
    VirtualCluster-->>Orchestrator: placement groups ready + (ip,port) per bundle
    Orchestrator->>WorkerGroups: request workers (bundle_indices_list)
    loop per (pg_idx,bundle_idx)
        WorkerGroups->>VirtualCluster: get_available_address_and_port(pg_idx,bundle_idx)
        VirtualCluster-->>WorkerGroups: (ip, port)
    end
    WorkerGroups->>Worker: spawn with env { AVAILABLE_ADDR_LIST, AVAILABLE_PORT_LIST, ... }
    Worker->>Worker: read cfg.expert_parallel_size and tensor_parallel_size
    alt expert_parallel_size > tensor_parallel_size
        Worker->>Worker: compute DP sizing, set VLLM_DP_RANK / VLLM_DP_RANK_LOCAL
        Worker->>Worker: set VLLM_DP_MASTER_IP and VLLM_DP_MASTER_PORT from AVAILABLE_* lists
    end
    Worker-->>Orchestrator: worker ready
```
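The DP-sizing step in the diagram can be sketched as a pure function. Only the environment-variable names come from the diagram; the function name and the rank arithmetic are illustrative assumptions, not the repo's actual implementation:

```python
def vllm_dp_env(ep_size: int, tp_size: int, global_rank: int,
                local_rank: int, master_ip: str, master_port: int) -> dict:
    """Hypothetical sketch of the env a worker might set when EP > TP.

    vLLM decomposes EP as EP = DP * TP, so the internal DP size is EP // TP.
    The rank formulas here are assumptions for illustration.
    """
    if ep_size <= tp_size:
        return {}  # no vLLM-internal DP needed
    dp_size = ep_size // tp_size
    return {
        "VLLM_DP_SIZE": str(dp_size),
        "VLLM_DP_RANK": str((global_rank // tp_size) % dp_size),
        "VLLM_DP_RANK_LOCAL": str(local_rank // tp_size),
        "VLLM_DP_MASTER_IP": master_ip,
        "VLLM_DP_MASTER_PORT": str(master_port),
    }
```

For example, with EP=8 and TP=2 a worker at global rank 6 would land in DP rank 3 of 4.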
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
examples/configs/recipes/llm/grpo-deepscaler-1.5b-24K.yaml (1)
41-49: No 'enable_expert_parallel' config key found; fix vllm config typing
- Sweep: no configs/tests contain an "enable_expert_parallel" config key — it's only a runtime variable in nemo_rl/models/generation/vllm/vllm_worker.py (set from expert_parallel_size and passed to vLLM).
- Action (critical): nemo_rl/models/generation/vllm/config.py currently declares "expert_parallel_size: bool" (line ~23) but configs/tests use integers. Change that to "expert_parallel_size: int" and run validation/tests.
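The suggested typing fix amounts to a one-line annotation change. A minimal sketch of the schema (the field subset shown here is hypothetical; only `expert_parallel_size` is taken from the comment above):

```python
from typing import TypedDict

class VllmCfg(TypedDict, total=False):
    # Abbreviated, hypothetical subset of the schema in
    # nemo_rl/models/generation/vllm/config.py.
    tensor_parallel_size: int
    pipeline_parallel_size: int
    expert_parallel_size: int  # was declared `bool`; configs pass ints, so use `int`

# Configs/tests pass integers, which now type-check against the annotation.
cfg: VllmCfg = {"tensor_parallel_size": 4, "expert_parallel_size": 4}
```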
🧹 Nitpick comments (22)
examples/configs/recipes/llm/grpo-gspo-deepscaler-1.5b-8K.yaml (1)
105-109: Consistent migration to expert_parallel_size. EP=1 with TP=1 is a no-op; OK. Add the EP % TP note for clarity.

```diff
- expert_parallel_size: 1
+ # EP > 1 requires EP % TP == 0
+ expert_parallel_size: 1
```

examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-megatron.yaml (1)
151-154: Migration LGTM; EP disabled by default. Here TP=4; keep a comment to avoid misconfig when enabling EP later.

```diff
- expert_parallel_size: 1
+ # If you enable EP, set expert_parallel_size to a multiple of TP (=4 here)
+ expert_parallel_size: 1
```

examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-2n8g-fsdp2tp1-noncolocated.yaml (1)
99-101: Rename is correct; matches PR semantics (EP=1 => vLLM DP off). Optional: add the EP % TP constraint comment.

```diff
- expert_parallel_size: 1
+ # EP > 1 requires EP % TP == 0
+ expert_parallel_size: 1
```

examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt.v3.yaml (1)
99-101: Good: default EP off; TP=4 here. Add a guardrail comment to prevent invalid EP later.

```diff
- expert_parallel_size: 1
+ # If enabling EP, choose a multiple of TP (=4)
+ expert_parallel_size: 1
```

examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-4n8g-fsdp2tp1-long.v3.yaml (1)
99-101: Change is consistent; EP off by default. Same optional comment suggestion.

```diff
- expert_parallel_size: 1
+ # EP > 1 requires EP % TP == 0
+ expert_parallel_size: 1
```

examples/configs/recipes/llm/grpo-math-qwen3-30ba3b-megatron-tp4-32k.yaml (1)
133-138: Rename aligns with code; a maintainer note for the EP/TP constraint is recommended. TP=4 in this config; a comment will help future edits.

```diff
- expert_parallel_size: 1
+ # To enable EP, set to a multiple of TP (=4). Otherwise keep at 1.
+ expert_parallel_size: 1
```

examples/configs/recipes/llm/grpo-deepscaler-1.5b-8K.yaml (1)
104-108: LGTM; EP=1 keeps previous behavior. Add the EP % TP reminder to reduce misconfiguration.

```diff
- expert_parallel_size: 1
+ # EP > 1 requires EP % TP == 0
+ expert_parallel_size: 1
```

examples/configs/evals/eval.yaml (1)
24-24: EP size switch ACK; default safe for single-GPU eval. Value 1 keeps EP disabled. Consider adding a comment that EP > 1 on eval requires EP % TP == 0 and enough GPUs.
examples/configs/recipes/llm/grpo-gemma3-27b-it-8n8g-fsdp2tp8-actckpt-long.yaml (1)
99-99: Migration is correct; keep EP constraints visible to users. Optional: add a YAML comment noting EP > 1 requires EP % TP == 0 and may enable vLLM-internal DP = EP/TP.
examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-megatron.yaml (1)
129-129: Good rename; default preserves behavior. No functional change. If you plan to exercise the new DP-in-vLLM path in CI, consider adding a variant with EP > 1 and TP = 1 for coverage.
examples/configs/recipes/llm/grpo-qwen2.5-math-1.5b-instruct-1n8g-fsdp2tp1.v3.yaml (1)
99-99: Rename LGTM; add a minimal guardrail note (optional). A short inline comment like "1 disables EP; >1 requires EP % TP == 0" could prevent misconfigurations.
examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-fsdp2tp4sp.v3.yaml (1)
99-99: Config update is correct. Given TP=4, if someone sets EP > 1 later, remind them that EP must be a multiple of 4; DP_in_vLLM = EP/4. Optional comment.
examples/configs/recipes/llm/grpo-deepscaler-1.5b-24K.yaml (1)
41-49: Rename to expert_parallel_size looks good; default preserves behavior. Setting expert_parallel_size: 1 is backward-compatible. When users tune EP > 1, please document that EP must be divisible by TP (EP % TP == 0) so vLLM DP = EP/TP is integral.
examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-fsdp2tp1.v3.yaml (1)
95-102: Good default for EP. expert_parallel_size: 1 preserves previous behavior; note in docs that EP > 1 requires EP % TP == 0.
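The EP/TP constraint repeated across these config comments can be captured in one helper. This is a sketch of the rule only, not the repo's actual function:

```python
def vllm_dp_size(expert_parallel_size: int, tensor_parallel_size: int) -> int:
    """Return the vLLM-internal DP size implied by EP and TP (vLLM: EP = DP * TP).

    Sketch of the constraint: EP=1 disables EP; EP>1 must be a multiple of TP.
    """
    ep, tp = expert_parallel_size, tensor_parallel_size
    if ep < 1 or tp < 1:
        raise ValueError("parallel sizes must be >= 1")
    if ep > 1 and ep % tp != 0:
        raise ValueError(
            f"expert_parallel_size={ep} must be a multiple of tensor_parallel_size={tp}"
        )
    return max(1, ep // tp)  # EP=1, TP>1 still means DP=1 (EP disabled)
```

For the TP8/EP64 setting exercised in this PR, this yields a vLLM-internal DP of 8.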
tests/unit/environments/test_code_environment.py (1)
51-62: Use a boolean for enforce_eager (avoid a string). The current value is the string "False"; prefer the boolean False to match runtime expectations.
Apply this diff:
```diff
     "gpu_memory_utilization": 0.6,
-    "enforce_eager": "False",
+    "enforce_eager": False,
```

tests/unit/models/generation/test_vllm_generation.py (2)
53-65: EP key migration OK; also fix the enforce_eager type. The key rename is correct. Please switch enforce_eager from a string to a boolean for consistency.
```diff
     "load_format": "auto",
-    "enforce_eager": "False",
+    "enforce_eager": False,
```
53-65: Add a minimal EP>1 test to cover the new DP-in-vLLM path. Parametrize a case with TP=1, EP=2 to validate layout checks and env wiring.
```python
# Outside this block: add a new test
def test_vllm_ep_gt_1_layout(cluster, tokenizer):
    cfg = deepcopy(basic_vllm_test_config)
    cfg = configure_generation_config(cfg, tokenizer, is_eval=True)
    cfg["vllm_cfg"]["tensor_parallel_size"] = 1
    cfg["vllm_cfg"]["expert_parallel_size"] = 2  # triggers vLLM-internal DP
    v = VllmGeneration(cluster, cfg)
    v.finish_generation()  # init path exercises EP layout
    v.shutdown()
```

tests/unit/environments/test_retriever.py (1)
51-61: Set enforce_eager to the boolean False. Avoid quoting booleans in the config dict.
```diff
     "gpu_memory_utilization": 0.6,
-    "enforce_eager": "False",
+    "enforce_eager": False,
```

nemo_rl/models/generation/vllm/vllm_generation.py (1)
154-163: Make vLLM DP env setting robust; optional NCCL resiliency
- Gate `VLLM_DP_SIZE` on the computed `vllm_dp_size > 1` (covers future changes).
- Optional: set `NCCL_ASYNC_ERROR_HANDLING=1` for safer comms.

Apply this diff:
```diff
 if not self.cfg["colocated"]["enabled"]:
     env_vars["NCCL_CUMEM_ENABLE"] = "1"
-# We should use vLLM DP if ep_size > tp_size since EP_SIZE = DP_SIZE * TP_SIZE in vLLM.
-# See details in https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/data_parallel.py
-if self.ep_size > self.tp_size:
-    env_vars["VLLM_DP_SIZE"] = str(self.vllm_dp_size)
+# Use vLLM internal DP when DP_SIZE > 1 (EP = DP * TP).
+if self.vllm_dp_size > 1:
+    env_vars["VLLM_DP_SIZE"] = str(self.vllm_dp_size)
+    # Optional resiliency:
+    # env_vars.setdefault("NCCL_ASYNC_ERROR_HANDLING", "1")
```

nemo_rl/distributed/virtual_cluster.py (3)
354-372: Validate pg_idx and bundle_idx before indexing. Out-of-range indices will surface as IndexError; fail fast with a clear message.

Apply this diff:
```diff
-placement_groups = self.get_placement_groups()
-if len(placement_groups) == 1:
-    pg = placement_groups[0]
-else:
-    pg = placement_groups[pg_idx]
+placement_groups = self.get_placement_groups()
+if not placement_groups:
+    raise RuntimeError("No placement groups available")
+if len(placement_groups) == 1:
+    pg = placement_groups[0]
+else:
+    if not (0 <= pg_idx < len(placement_groups)):
+        raise ValueError(
+            f"pg_idx out of range: {pg_idx} (available={len(placement_groups)})"
+        )
+    pg = placement_groups[pg_idx]
+if not (0 <= bundle_idx < pg.bundle_count):
+    raise ValueError(
+        f"bundle_idx out of range: {bundle_idx} (bundle_count={pg.bundle_count})"
+    )
```
375-383: Confirm Ray behavior with num_cpus=0 + PG scheduling across Ray versions. Setting `num_cpus=0` is intentional to avoid contention, but older Ray versions had quirks with PG enforcement on zero-resource tasks. If you target older clusters, consider reserving a fractional CPU (e.g., 0.01) on the bundle. Would you like a small compatibility-matrix check script for your supported Ray versions?
386-388: Minor: tighten the error message or use a dedicated exception. The message is fine; to appease linters, shorten it or define a custom exception type.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (32)
- examples/configs/evals/eval.yaml (1 hunks)
- examples/configs/grpo_math_1B.yaml (1 hunks)
- examples/configs/grpo_sliding_puzzle.yaml (1 hunks)
- examples/configs/recipes/llm/grpo-deepscaler-1.5b-24K.yaml (1 hunks)
- examples/configs/recipes/llm/grpo-deepscaler-1.5b-8K.yaml (1 hunks)
- examples/configs/recipes/llm/grpo-gemma3-1b-it-1n8g-fsdp2tp1.yaml (1 hunks)
- examples/configs/recipes/llm/grpo-gemma3-27b-it-8n8g-fsdp2tp8-actckpt-long.yaml (1 hunks)
- examples/configs/recipes/llm/grpo-gspo-deepscaler-1.5b-8K.yaml (1 hunks)
- examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8.yaml (1 hunks)
- examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-2n8g-fsdp2tp1-noncolocated.yaml (1 hunks)
- examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-4n8g-fsdp2tp1-long.v3.yaml (1 hunks)
- examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-fsdp2tp1.v3.yaml (1 hunks)
- examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-megatron.yaml (1 hunks)
- examples/configs/recipes/llm/grpo-math-qwen3-30ba3b-megatron-tp4-32k.yaml (1 hunks)
- examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt-long.v3.yaml (1 hunks)
- examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt.v3.yaml (1 hunks)
- examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-fsdp2tp4sp.v3.yaml (1 hunks)
- examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-megatron.yaml (1 hunks)
- examples/configs/recipes/llm/grpo-qwen2.5-math-1.5b-instruct-1n8g-fsdp2tp1.v3.yaml (1 hunks)
- examples/configs/recipes/vlm/vlm_grpo-qwen2.5-vl-3b-instruct-clevr-1n2g-dtensor2tp1.v1.yaml (1 hunks)
- examples/configs/recipes/vlm/vlm_grpo-smolvlm2-2.2b-instruct-clevr-1n2g-dtensor2tp1.v1.yaml (1 hunks)
- examples/configs/vlm_grpo_3B.yaml (1 hunks)
- nemo_rl/distributed/virtual_cluster.py (2 hunks)
- nemo_rl/distributed/worker_groups.py (2 hunks)
- nemo_rl/models/generation/vllm/config.py (1 hunks)
- nemo_rl/models/generation/vllm/vllm_generation.py (4 hunks)
- nemo_rl/models/generation/vllm/vllm_worker.py (2 hunks)
- tests/unit/environments/test_code_environment.py (1 hunks)
- tests/unit/environments/test_retriever.py (1 hunks)
- tests/unit/experience/test_rollouts.py (1 hunks)
- tests/unit/models/generation/test_vllm_generation.py (1 hunks)
- tests/unit/models/generation/test_vllm_large_model.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
nemo_rl/distributed/worker_groups.py (1)
nemo_rl/distributed/virtual_cluster.py (1)
get_available_address_and_port(354-388)
nemo_rl/models/generation/vllm/vllm_generation.py (3)
nemo_rl/distributed/worker_groups.py (1)
dp_size (590-592)
tests/unit/models/generation/test_vllm_generation.py (1)
cluster (211-222)
nemo_rl/distributed/virtual_cluster.py (1)
world_size (348-349)
🪛 Ruff (0.12.2)
nemo_rl/distributed/worker_groups.py
463-463: Loop control variable group_idx not used within loop body
(B007)
464-464: Loop control variable local_rank not used within loop body
(B007)
nemo_rl/models/generation/vllm/vllm_worker.py
347-347: Use of possibly insecure function; consider using ast.literal_eval
(S307)
348-348: Use of possibly insecure function; consider using ast.literal_eval
(S307)
nemo_rl/distributed/virtual_cluster.py
386-388: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Lint check
🔇 Additional comments (11)
examples/configs/grpo_sliding_puzzle.yaml (1)
44-46: Rename to expert_parallel_size looks good; preserves default off (1). File: examples/configs/grpo_sliding_puzzle.yaml (lines 44-46)
Add a short inline note that when setting expert_parallel_size > 1 it must be a multiple of tensor_parallel_size.
```diff
- expert_parallel_size: 1
+ # Set >1 to enable EP. Constraint: expert_parallel_size % tensor_parallel_size == 0
+ expert_parallel_size: 1
```

Verification not completed: the ripgrep run returned "No files were searched". Re-run from the repo root to confirm there are no lingering enable_expert_parallel keys and that an alias/deprecation path exists. Suggested command:

```shell
rg -n --hidden -S --no-ignore 'enable_expert_parallel|expert_parallel_size' -g '!**/site-packages/**'
```

examples/configs/recipes/llm/grpo-gemma3-1b-it-1n8g-fsdp2tp1.yaml (1)
98-98: Rename LGTM. Defaults are sane; nothing further.
examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8.yaml (1)
128-128: Migration OK; FP8 generation unaffected. Nothing blocking. If you intend to demo EP>1 here, ensure kernel/layout support matches the FP8 path.
examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt-long.v3.yaml (1)
99-99: expert_parallel_size default preserved — add YAML hint & confirm TP mismatch
- vllm_worker sets enable_expert_parallel = expert_parallel_size > 1 and passes it through (nemo_rl/models/generation/vllm/vllm_worker.py:138, 386) — expert_parallel_size: 1 preserves current behavior.
- Add a one-line YAML comment next to examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt-long.v3.yaml (around line 99) stating: "If >1, expert_parallel_size must be a multiple of generation.vllm_cfg.tensor_parallel_size; vLLM internal DP = EP / TP."
- Quick rg run did not surface tensor_parallel_size entries under examples/configs — confirm whether the dtensor_cfg.tensor_parallel_size=8 vs generation.vllm_cfg.tensor_parallel_size=4 (train vs gen) mismatch is intentional; if not, align or document.
examples/configs/recipes/vlm/vlm_grpo-qwen2.5-vl-3b-instruct-clevr-1n2g-dtensor2tp1.v1.yaml (1)
110-117: EP size key migration OK. expert_parallel_size: 1 aligns with the new runtime logic and keeps EP disabled by default.
examples/configs/vlm_grpo_3B.yaml (1)
111-118: Consistent EP config. Good: expert_parallel_size replaces the old boolean and stays at 1.
examples/configs/recipes/vlm/vlm_grpo-smolvlm2-2.2b-instruct-clevr-1n2g-dtensor2tp1.v1.yaml (1)
109-116: Looks good. EP key rename applied consistently with value 1.
examples/configs/grpo_math_1B.yaml (1)
181-182: YAML update is consistent with the new EP semantics. The comment is correct (EP must be a multiple of TP).
tests/unit/experience/test_rollouts.py (1)
201-202: Config rename to expert_parallel_size is correct here. Keeps default EP=1 for tests.
nemo_rl/models/generation/vllm/vllm_worker.py (1)
137-139: Deriving enable_expert_parallel from expert_parallel_size is correct. This aligns with the new config contract (EP enabled iff size > 1).
Confirm all config sources now pass an int for expert_parallel_size (no strings).
nemo_rl/distributed/virtual_cluster.py (1)
390-397: Delegation wrapper looks good. Backwards compatibility via delegation to (pg_idx=0, bundle_idx=0) is clean.
Actionable comments posted: 0
♻️ Duplicate comments (3)
nemo_rl/models/generation/vllm/vllm_generation.py (3)
58-69: Compute sizes as ints, raise instead of assert, and defer the vLLM-DP calculation. Casts avoid accidental str math; explicit exceptions survive `-O`; compute `vllm_dp_size` only after EP/TP validation.

```diff
-        self.tp_size = self.cfg["vllm_cfg"]["tensor_parallel_size"]
-        self.pp_size = self.cfg["vllm_cfg"]["pipeline_parallel_size"]
-        self.ep_size = self.cfg["vllm_cfg"]["expert_parallel_size"]
-        self.model_parallel_size = self.tp_size * self.pp_size
-
-        assert cluster.world_size() % self.model_parallel_size == 0, (
-            "World size must be a multiple of model parallel size. "
-            f"Got world size {cluster.world_size()} and model parallel size (TP * PP) {self.model_parallel_size}."
-        )
-        self.dp_size = cluster.world_size() // self.model_parallel_size
-        self.vllm_dp_size = self.ep_size // self.tp_size
+        vllm_cfg = self.cfg["vllm_cfg"]
+        self.tp_size = int(vllm_cfg["tensor_parallel_size"])
+        self.pp_size = int(vllm_cfg["pipeline_parallel_size"])
+        self.ep_size = int(vllm_cfg["expert_parallel_size"])
+        self.model_parallel_size = self.tp_size * self.pp_size
+
+        total_mp = self.model_parallel_size
+        if cluster.world_size() % total_mp != 0:
+            raise ValueError(
+                f"world_size={cluster.world_size()} is not divisible by TP*PP={self.tp_size}*{self.pp_size}={total_mp}"
+            )
+        self.dp_size = cluster.world_size() // total_mp
```
70-88: Replace assert-based config checks; validate EP vs TP before computing vLLM-DP. Asserts can be optimized out; make the checks deterministic and only then set `vllm_dp_size`.

```diff
-        if self.pp_size > 1:
-            assert self.cfg["vllm_cfg"]["async_engine"], (
-                "When pipeline_parallel_size > 1, async_engine must be set to True in the vLLM configuration. "
-                "You can enable it by adding `policy.generation.vllm_cfg.async_engine=true` to your command."
-            )
+        if self.pp_size > 1 and not vllm_cfg["async_engine"]:
+            raise ValueError(
+                "When pipeline_parallel_size > 1, set vllm_cfg.async_engine=true."
+            )
-        if self.ep_size > 1:
-            assert self.ep_size % self.tp_size == 0, (
-                "When EP > 1, EP must be a multiple of TP since vLLM's EP = DP * TP. "
-                "Please update your configuration to set expert_parallel_size to a multiple of tensor_parallel_size."
-            )
-            if self.ep_size != self.tp_size:
-                # vLLM's EP = DP * TP, so here we need to use DP inside vLLM.
-                assert not self.cfg["vllm_cfg"]["async_engine"], (
-                    "vLLM async_engine has some issues when using DP inside vLLM. "
-                    "Please update your configuration to set `policy.generation.vllm_cfg.async_engine=false`. "
-                    "See https://github.com/NVIDIA-NeMo/RL/issues/1101 for more details."
-                )
+        if self.ep_size > 1:
+            if self.ep_size % self.tp_size != 0:
+                raise ValueError(
+                    "When EP > 1, expert_parallel_size must be a multiple of tensor_parallel_size (EP = DP * TP in vLLM)."
+                )
+            if self.ep_size != self.tp_size and vllm_cfg["async_engine"]:
+                raise ValueError(
+                    "async_engine is not supported when using DP inside vLLM (EP != TP). Set vllm_cfg.async_engine=false. "
+                    "See https://github.com/NVIDIA-NeMo/RL/issues/1101 for more details."
+                )
+        # vLLM DP size computed after EP/TP validation
+        self.vllm_dp_size = max(1, self.ep_size // self.tp_size)
```
199-201: Don't use assert for the DP-size contract; raise a deterministic error. Prevents silent bypass under `python -O`.

```diff
-        assert self.dp_size == self.worker_group.dp_size, (
-            f"Data parallel size mismatch. Expected {self.dp_size}, got {self.worker_group.dp_size}"
-        )
+        if self.dp_size != self.worker_group.dp_size:
+            raise RuntimeError(
+                f"Data-parallel size mismatch: expected {self.dp_size}, got {self.worker_group.dp_size}"
+            )
```
🧹 Nitpick comments (1)
nemo_rl/models/generation/vllm/vllm_generation.py (1)
93-114: Fix non-interpolated constants in error messages. These lines reference placeholders without f-strings; users will see braces literally.

```diff
-                "does not return logprobs after top_k filtering. Values >= {TOP_K_THRESHOLD} are allowed "
+                f"does not return logprobs after top_k filtering. Values >= {TOP_K_THRESHOLD} are allowed "
 ...
-                "does not return logprobs after top_p filtering. Values >= {TOP_P_THRESHOLD} are allowed "
+                f"does not return logprobs after top_p filtering. Values >= {TOP_P_THRESHOLD} are allowed "
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
nemo_rl/models/generation/vllm/vllm_generation.py(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
nemo_rl/models/generation/vllm/vllm_generation.py (3)
tests/unit/models/generation/test_vllm_generation.py (1)
cluster (211-222)
nemo_rl/distributed/virtual_cluster.py (1)
world_size (348-349)
nemo_rl/distributed/worker_groups.py (1)
dp_size(590-592)
🔇 Additional comments (2)
nemo_rl/models/generation/vllm/vllm_generation.py (2)
127-132: NamedSharding reshape looks good given the divisibility guard. With the explicit divisibility check above, this reshape is safe.
160-169: Env wiring for VLLM_DP_SIZE is correct and scoped. Setting VLLM_DP_SIZE only when EP > TP matches the DP-inside-vLLM contract.
This commit passed as part of this merge queue run: https://github.com/NVIDIA-NeMo/RL/actions/runs/17828559136. Will manually merge this one.
What does this PR do ?
- Rename `enable_expert_parallel` to `expert_parallel_size`.
- Configs that previously used `enable_expert_parallel` are all set to `expert_parallel_size: 1` by default, so nothing is affected.

How DP on the generation side works after this PR:
Issues
Related #908.
Test Result
green: main branch baseline, TP32
blue: feature branch, using the same setting as baseline
red/pink: feature branch, TP8EP64
Convergence

Time cost

The EP setting is not fine tuned, so the generation time is slower.
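For reference, the TP8EP64 run above corresponds to a generation config along these lines. This is a sketch: the `generation.vllm_cfg` path matches the keys discussed in the review, but the surrounding structure is abbreviated.

```yaml
generation:
  vllm_cfg:
    tensor_parallel_size: 8
    # Integer EP size replaces the old `enable_expert_parallel` boolean.
    # EP must be a multiple of TP; vLLM-internal DP = EP / TP = 8 here.
    expert_parallel_size: 64
```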
Summary by CodeRabbit
New Features
Reliability
Refactor
Tests