> ⚠️ **Caution**: Review failed. The pull request is closed.

## Walkthrough

Adds tokenizer propagation into the parallel-thinking model path and a new config field, `start_assistant_response_key`.

## Changes
## Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant C as Caller
    participant G as GenerationTask
    participant PT as ParallelThinkingTask
    participant PB as PromptBuilder
    participant M as Model (generate_async)
    participant DS as Downstream (GenSelect/GenSynthesis)
    rect rgba(230,240,255,0.4)
        note over C,G: High-level generate flow with tokenizer propagation
        C->>G: request generate(input, **kwargs)
        G->>PT: get_parallel_thinking_model(tokenizer=self.tokenizer)
        G->>PT: generate_async(input, **kwargs)
        PT->>PB: get_prompt(..., tokenizer=PT.tokenizer)
        PB-->>PT: prompt
        PT->>PT: strip {temperature,tokens_to_generate,prompt} from kwargs
        PT->>M: generate_async(prompt, **remaining_kwargs)
        alt results produced
            M-->>PT: outputs
            PT->>DS: post-process / synthesis
            DS-->>PT: result (includes generation)
            PT-->>G: result
            G-->>C: result
        else empty result
            note right of PT: ensure "generation": ""
            PT-->>G: { ..., generation: "" }
            G-->>C: { ..., generation: "" }
        end
    end
```
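The kwargs-stripping step in the diagram above (removing keys that `generate_async` already receives explicitly) can be sketched as follows. The helper name is hypothetical, not the repository's actual code:

```python
def strip_conflicting_kwargs(kwargs, conflicting=("temperature", "tokens_to_generate", "prompt")):
    """Return a copy of kwargs without keys that generate_async sets explicitly."""
    return {k: v for k, v in kwargs.items() if k not in conflicting}

# Extra generation parameters like `tools` survive; conflicting keys are dropped.
remaining = strip_conflicting_kwargs(
    {"temperature": 0.6, "prompt": "ignored", "tools": [{"name": "calculator"}]}
)
```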
```mermaid
sequenceDiagram
    autonumber
    participant Caller as Caller
    participant PT as ParallelThinkingTask
    participant GS as _run_genselect
    participant PB as PromptBuilder
    participant M as Model
    rect rgba(240,255,240,0.4)
        note over Caller,PT: GenSelect flow with kwargs propagation
        Caller->>PT: genselect(input, **kwargs)
        PT->>GS: _run_genselect(input, **kwargs)
        GS->>PB: get_prompt(..., tokenizer=PT.tokenizer)
        PB-->>GS: prompt
        GS->>GS: remove conflicting generation keys from kwargs
        GS->>M: generate_async(prompt, **remaining_kwargs)
        alt selection found
            M-->>GS: candidates
            GS-->>PT: selection (with generation)
            PT-->>Caller: selection
        else no candidates
            note right of GS: return with "generation": ""
            GS-->>PT: { ..., generation: "" }
            PT-->>Caller: { ..., generation: "" }
        end
    end
```
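The two GenSelect outcomes (a selection with a `generation` field, or an empty-candidate fallback that still carries the key) can be sketched as below. The function name and the length-based selection heuristic are illustrative stand-ins, not the repository's logic:

```python
def run_genselect(candidates, solution_key="generation"):
    """Pick one candidate, always returning a dict that contains a 'generation' key."""
    if not candidates:
        return {"generation": ""}  # empty-result branch keeps the key present
    best = max(candidates, key=len)  # stand-in for the real selection logic
    result = {solution_key: best}
    result.setdefault("generation", best)  # guarantee the key even if solution_key differs
    return result
```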
**Estimated code review effort:** 🎯 3 (Moderate) | ⏱️ ~25 minutes

## Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| :---: | :--- | :--- | :--- |
| Docstring Coverage | ⚠️ Warning | | |
✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| :---: | :--- | :--- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title succinctly indicates that this PR addresses fixes in the parallel thinking functionality within the inference model, which aligns with the changes made to argument passing and configuration in that module. It is clear, concise, and directly related to the main modifications without extraneous detail. |

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (1)
Actionable comments posted: 0
🧹 Nitpick comments (2)
nemo_skills/inference/model/parallel_thinking.py (2)
**82-87**: Verify tokenizer is set when `use_completions_api` or `start_assistant_response_key` is enabled.

The `tokenizer` parameter is now passed directly to `__init__` and stored as `self.tokenizer`. However, if `self.cfg.use_completions_api` is `True` or `self.cfg.start_assistant_response_key` is set, but `tokenizer` is `None`, downstream calls to `get_prompt()` and `prompt.fill()` may fail with a `ValueError` (see `nemo_skills/prompt/utils.py` lines 252-256).

While the error message from `prompt.fill()` is clear, consider adding an explicit validation guard in `__init__` to catch this misconfiguration early:

```python
if (self.cfg.use_completions_api or self.cfg.start_assistant_response_key) and tokenizer is None:
    raise ValueError(
        "`tokenizer` must be provided when `use_completions_api` is True "
        "or `start_assistant_response_key` is set."
    )
```

This addresses the concern raised in the previous review and improves the user experience by providing immediate feedback on misconfiguration.
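To see the suggested guard in context, here is a minimal, runnable stand-in for the constructor. The class is illustrative only (the real `ParallelThinkingTask` takes a structured config, not a dict):

```python
class ParallelThinkingTaskSketch:
    """Illustrative stand-in for ParallelThinkingTask showing the suggested guard."""

    def __init__(self, cfg, tokenizer=None):
        # Fail fast at construction time instead of waiting for prompt.fill()
        # to raise deep inside a generate_async call.
        if (cfg.get("use_completions_api") or cfg.get("start_assistant_response_key")) and tokenizer is None:
            raise ValueError(
                "`tokenizer` must be provided when `use_completions_api` is True "
                "or `start_assistant_response_key` is set."
            )
        self.cfg = cfg
        self.tokenizer = tokenizer
```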
**211-220**: LGTM with optional style improvement.

The call to `prompt.fill()` now correctly forwards `start_assistant_response_key` and `chat_template_kwargs`, addressing the PR objectives. The duplicate key removal (lines 217-219) prevents conflicts when kwargs overlap with explicit parameters in `generate_async`.

Optional: Consider using `.pop()` for a more idiomatic approach (as suggested by static analysis):

```diff
-        for duplicate_key in ["temperature", "tokens_to_generate", "prompt"]:
-            if duplicate_key in kwargs:
-                del kwargs[duplicate_key]
+        for duplicate_key in ["temperature", "tokens_to_generate", "prompt"]:
+            kwargs.pop(duplicate_key, None)
```
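The `.pop()` idiom behaves the same as the `in`-check-plus-`del` it replaces, with one fewer dictionary lookup; the sample values below are invented:

```python
kwargs = {"temperature": 0.7, "prompt": "p", "tools": ["search"]}

for duplicate_key in ["temperature", "tokens_to_generate", "prompt"]:
    # Second argument suppresses KeyError when the key is absent,
    # so no membership test is needed.
    kwargs.pop(duplicate_key, None)
```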
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- `nemo_skills/inference/generate.py` (1 hunks)
- `nemo_skills/inference/model/__init__.py` (2 hunks)
- `nemo_skills/inference/model/parallel_thinking.py` (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
nemo_skills/inference/model/__init__.py (2)
nemo_skills/inference/model/parallel_thinking.py (1)
- `ParallelThinkingTask` (76-395)

nemo_skills/inference/chat_interface/core.py (1)
- `cfg` (181-182)
nemo_skills/inference/model/parallel_thinking.py (2)
nemo_skills/inference/model/base.py (1)
- `BaseModel` (32-513)

nemo_skills/prompt/utils.py (2)
- `get_prompt` (370-403)
- `fill` (241-303)
🪛 Ruff (0.13.2)
nemo_skills/inference/model/parallel_thinking.py
219-219: Use `.pop()` instead of `key in dict` followed by `del dict[key]`.
Replace the `if` statement with `.pop(..., None)`.
(RUF051)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: pre-commit
- GitHub Check: unit-tests
🔇 Additional comments (8)
nemo_skills/inference/model/parallel_thinking.py (6)
**58-58**: LGTM! New config field for assistant response prefix.

The addition of `start_assistant_response_key` enables prepending a value to the assistant response during prompt generation, which is necessary for tool usage scenarios as stated in the PR objectives.
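A config fragment carrying this field might look like the sketch below. Only the field name comes from the diff; the class name, the sibling field, and the example key `"partial_answer"` are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ParallelThinkingConfigSketch:
    """Illustrative config fragment; other fields of the real config are omitted."""
    use_completions_api: bool = False
    # When set, the value stored under this data key is prepended to the
    # assistant response while building the prompt.
    start_assistant_response_key: Optional[str] = None
```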
**91-97**: LGTM! Consistent tokenizer propagation.

The calls to `get_prompt()` now correctly use `tokenizer=self.tokenizer`, aligning with the updated `__init__` signature and ensuring tokenizer context flows through prompt generation.
**221-227**: LGTM! Kwargs propagation ensures downstream parameters are forwarded.

The addition of `**kwargs` on line 226 ensures that generation-related parameters (e.g., `tools`, `reasoning_effort`) are correctly propagated to the model's `generate_async` call, addressing the PR objectives for tool usage during GenSelect.
**352-362**: LGTM! Ensures required `generation` key is present.

The addition of `"generation": ""` on line 355 ensures that the result dict always contains a `generation` key, which is required by downstream code in `inference/generate.py` (line 490: `output[self.cfg.generation_key] = output.pop("generation")`). This prevents potential `KeyError` exceptions in empty-result scenarios.
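The downstream contract can be sketched as follows: the `output.pop("generation")` rename in `generate.py` only works if the key is guaranteed upstream. The helper name is hypothetical:

```python
def finalize_output(output, generation_key="generation"):
    """Rename the 'generation' field, tolerating empty results that lack it."""
    # Mirrors the fix under review: guarantee the key before popping it,
    # so empty-result dicts do not raise KeyError.
    output.setdefault("generation", "")
    output[generation_key] = output.pop("generation")
    return output
```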
**364-376**: LGTM! Kwargs propagation in GenSelect path.

The addition of `**kwargs` on line 366 ensures that generation-related parameters are correctly propagated through the GenSelect path, aligning with the PR objectives to support tool usage during GenSelect generation.
**387-394**: LGTM! Ensures `generation` key is always present.

Lines 390-393 ensure that the `generation` key is always present in the result dict, even when `solution_key` is different. This is consistent with the empty-result branch (line 355) and prevents downstream errors in `inference/generate.py`.

nemo_skills/inference/generate.py (1)
**370-383**: LGTM! Tokenizer propagation to parallel thinking model.

Line 381 correctly passes `tokenizer=self.tokenizer` to `get_parallel_thinking_model`, ensuring that the parallel thinking path has access to the tokenizer context when needed (e.g., for completions API or tool usage). This aligns with the broader PR changes to support tokenizer-aware prompt generation.
**71-95**: LGTM! Public API updated to accept tokenizer.

Lines 75 and 93-95 correctly update the `get_parallel_thinking_model` signature to accept an optional `tokenizer` parameter and forward it to the `ParallelThinkingTask` constructor. This aligns with the broader PR changes to support tokenizer-aware prompt generation in the parallel thinking path.
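The signature change amounts to threading an optional keyword argument through a factory. A minimal sketch, with a stub task class standing in for the real `ParallelThinkingTask`:

```python
class _ParallelThinkingTaskStub:
    """Stand-in for ParallelThinkingTask; stores whatever it is constructed with."""

    def __init__(self, cfg, tokenizer=None):
        self.cfg = cfg
        self.tokenizer = tokenizer

def get_parallel_thinking_model_sketch(cfg, tokenizer=None):
    """Factory sketch: the optional tokenizer is forwarded to the task constructor."""
    return _ParallelThinkingTaskStub(cfg, tokenizer=tokenizer)
```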
Signed-off-by: Shubham Toshniwal <stoshniwal@nvidia.com>
A few arguments were not being passed during GenSelect generation, and these are particularly necessary for tool usage during GenSelect. Testing with gpt-oss distilled models revealed these shortcomings. This PR fixes them.
**Summary by CodeRabbit**

- New Features
- Bug Fixes
- Refactor