
Parallel thinking fixes #887

Merged
shtoshni merged 8 commits into main from shtoshni/parallel_thinking_fixes
Oct 3, 2025

Conversation


shtoshni (Contributor) commented Oct 3, 2025

A few arguments were not being passed during GenSelect generation, some of which are necessary for tool usage during GenSelect. Testing with gpt-oss distilled models revealed these shortcomings; this PR fixes them.

Summary by CodeRabbit

  • New Features

    • Added a config option to set the assistant response key and enable tokenizer-awareness in parallel-thinking flows.
  • Bug Fixes

    • Empty-result branches now include a "generation" field to meet downstream expectations.
    • Generation-related options are consistently forwarded through all execution paths to avoid dropped settings.
  • Refactor

    • Streamlined generation-parameter handling by stripping conflicting keys before model calls and preserving generation fields across branches.

shtoshni requested a review from Kipok on October 3, 2025 19:04

coderabbitai bot commented Oct 3, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

Adds tokenizer propagation into the parallel-thinking model path, a new config field start_assistant_response_key, forwards prompt/template kwargs into prompt filling, strips conflicting generation keys before model calls, and ensures empty-result branches include "generation": "" while propagating extra kwargs through generate_async and GenSelect.
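The "strips conflicting generation keys before model calls" step described in the walkthrough can be sketched as follows. This is an illustrative snippet, not the actual NeMo-Skills code; the helper name `prepare_model_kwargs` is hypothetical:

```python
# Hypothetical helper (not the real NeMo-Skills API): drop keys that would
# conflict with explicitly passed generation parameters before a model call.
def prepare_model_kwargs(kwargs):
    cleaned = dict(kwargs)  # copy so the caller's dict is untouched
    for duplicate_key in ("temperature", "tokens_to_generate", "prompt"):
        cleaned.pop(duplicate_key, None)  # .pop(..., None) never raises
    return cleaned
```

Because `temperature`, `tokens_to_generate`, and `prompt` are passed explicitly to the model, removing them from the forwarded kwargs avoids `TypeError: got multiple values for keyword argument` at call time.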

Changes

| Cohort / File(s) | Summary of changes |
| --- | --- |
| **Parallel Thinking Inference**<br>`nemo_skills/inference/model/parallel_thinking.py` | Add the `start_assistant_response_key` config field; propagate the tokenizer into prompt creation; forward prompt/template kwargs into prompt filling; strip conflicting generation keys before model calls; ensure empty-result branches include `"generation": ""`. |
| **Model factory**<br>`nemo_skills/inference/model/__init__.py` | Add an optional tokenizer parameter to `get_parallel_thinking_model(...)` and forward it to `ParallelThinkingTask(...)` initialization. |
| **Generation entrypoint**<br>`nemo_skills/inference/generate.py` | `GenerationTask.setup_llm` now calls `get_parallel_thinking_model(..., tokenizer=self.tokenizer)` to pass the tokenizer into the parallel-thinking wrapper. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  participant C as Caller
  participant G as GenerationTask
  participant PT as ParallelThinkingTask
  participant PB as PromptBuilder
  participant M as Model (generate_async)
  participant DS as Downstream (GenSelect/GenSynthesis)

  rect rgba(230,240,255,0.4)
    note over C,G: High-level generate flow with tokenizer propagation
    C->>G: request generate(input, **kwargs)
    G->>PT: get_parallel_thinking_model(tokenizer=self.tokenizer)
    G->>PT: generate_async(input, **kwargs)
    PT->>PB: get_prompt(..., tokenizer=PT.tokenizer)
    PB-->>PT: prompt
    PT->>PT: strip {temperature,tokens_to_generate,prompt} from kwargs
    PT->>M: generate_async(prompt, **remaining_kwargs)
    alt results produced
      M-->>PT: outputs
      PT->>DS: post-process / synthesis
      DS-->>PT: result (includes generation)
      PT-->>G: result
      G-->>C: result
    else empty result
      note right of PT: ensure "generation": ""
      PT-->>G: { ..., generation: "" }
      G-->>C: { ..., generation: "" }
    end
  end
```
```mermaid
sequenceDiagram
  autonumber
  participant Caller as Caller
  participant PT as ParallelThinkingTask
  participant GS as _run_genselect
  participant PB as PromptBuilder
  participant M as Model

  rect rgba(240,255,240,0.4)
    note over Caller,PT: GenSelect flow with kwargs propagation
    Caller->>PT: genselect(input, **kwargs)
    PT->>GS: _run_genselect(input, **kwargs)
    GS->>PB: get_prompt(..., tokenizer=PT.tokenizer)
    PB-->>GS: prompt
    GS->>GS: remove conflicting generation keys from kwargs
    GS->>M: generate_async(prompt, **remaining_kwargs)
    alt selection found
      M-->>GS: candidates
      GS-->>PT: selection (with generation)
      PT-->>Caller: selection
    else no candidates
      note right of GS: return with "generation": ""
      GS-->>PT: { ..., generation: "" }
      PT-->>Caller: { ..., generation: "" }
    end
  end
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I twitch my whiskers at a new key's light,
Tokens hop through tunnels, tidy and bright.
Prompts now whisper where responses start,
Kwargs pass along their tiny cart.
If answers hide, I leave a gentle trace—"generation": ""—a cozy place. 🐇✨

## Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| :---: | :--- | :--- | :--- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 62.50%, which is below the required threshold of 80.00%. | Run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| :---: | :--- | :--- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title succinctly indicates that this PR fixes the parallel thinking functionality in the inference model, which aligns with the changes to argument passing and configuration in that module. It is clear, concise, and free of extraneous detail. |
📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eae82c9 and 5ab33a3.

📒 Files selected for processing (1)
  • nemo_skills/inference/model/parallel_thinking.py (5 hunks)


@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
nemo_skills/inference/model/parallel_thinking.py (2)

82-87: Verify tokenizer is set when use_completions_api or start_assistant_response_key is enabled.

The tokenizer parameter is now passed directly to __init__ and stored as self.tokenizer. However, if self.cfg.use_completions_api is True or self.cfg.start_assistant_response_key is set, but tokenizer is None, downstream calls to get_prompt() and prompt.fill() may fail with a ValueError (see nemo_skills/prompt/utils.py lines 252-256).

While the error message from prompt.fill() is clear, consider adding an explicit validation guard in __init__ to catch this misconfiguration early:

```python
if (self.cfg.use_completions_api or self.cfg.start_assistant_response_key) and tokenizer is None:
    raise ValueError(
        "`tokenizer` must be provided when `use_completions_api` is True "
        "or `start_assistant_response_key` is set."
    )
```

This addresses the concern raised in the previous review and improves the user experience by providing immediate feedback on misconfiguration.


211-220: LGTM with optional style improvement.

The call to prompt.fill() now correctly forwards start_assistant_response_key and chat_template_kwargs, addressing the PR objectives. The duplicate key removal (lines 217-219) prevents conflicts when kwargs overlap with explicit parameters in generate_async.

Optional: Consider using .pop() for a more idiomatic approach (as suggested by static analysis):

```diff
-        for duplicate_key in ["temperature", "tokens_to_generate", "prompt"]:
-            if duplicate_key in kwargs:
-                del kwargs[duplicate_key]
+        for duplicate_key in ["temperature", "tokens_to_generate", "prompt"]:
+            kwargs.pop(duplicate_key, None)
```
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 865c127 and eae82c9.

📒 Files selected for processing (3)
  • nemo_skills/inference/generate.py (1 hunks)
  • nemo_skills/inference/model/__init__.py (2 hunks)
  • nemo_skills/inference/model/parallel_thinking.py (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
nemo_skills/inference/model/__init__.py (2)
nemo_skills/inference/model/parallel_thinking.py (1)
  • ParallelThinkingTask (76-395)
nemo_skills/inference/chat_interface/core.py (1)
  • cfg (181-182)
nemo_skills/inference/model/parallel_thinking.py (2)
nemo_skills/inference/model/base.py (1)
  • BaseModel (32-513)
nemo_skills/prompt/utils.py (2)
  • get_prompt (370-403)
  • fill (241-303)
🪛 Ruff (0.13.2)
nemo_skills/inference/model/parallel_thinking.py

219-219: Use pop instead of key in dict followed by del dict[key]

Replace if statement with .pop(..., None)

(RUF051)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: pre-commit
  • GitHub Check: unit-tests
🔇 Additional comments (8)
nemo_skills/inference/model/parallel_thinking.py (6)

58-58: LGTM! New config field for assistant response prefix.

The addition of start_assistant_response_key enables prepending a value to the assistant response during prompt generation, which is necessary for tool usage scenarios as stated in the PR objectives.


91-97: LGTM! Consistent tokenizer propagation.

The calls to get_prompt() now correctly use tokenizer=self.tokenizer, aligning with the updated __init__ signature and ensuring tokenizer context flows through prompt generation.


221-227: LGTM! Kwargs propagation ensures downstream parameters are forwarded.

The addition of **kwargs on line 226 ensures that generation-related parameters (e.g., tools, reasoning_effort) are correctly propagated to the model's generate_async call, addressing the PR objectives for tool usage during GenSelect.
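The forwarding pattern this comment describes can be sketched as follows. This is a hypothetical snippet, not the real NeMo-Skills `generate_async` signature; `model_call` and `fake_model` are illustrative stand-ins:

```python
import asyncio

# Hypothetical sketch: every extra keyword (e.g. tools, reasoning_effort)
# passed to the wrapper reaches the underlying model call unchanged.
async def generate_async(prompt, model_call, **kwargs):
    return await model_call(prompt=prompt, **kwargs)

# Stand-in for a real model endpoint; records which kwargs it received.
async def fake_model(prompt, **kwargs):
    return {"prompt": prompt, "received": sorted(kwargs)}

result = asyncio.run(generate_async("2+2=?", fake_model, tools=[], reasoning_effort="high"))
```

Dropping `**kwargs` from the inner call is exactly the kind of silent parameter loss this PR fixes: the wrapper would still run, but `tools` and `reasoning_effort` would never reach the model.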


352-362: LGTM! Ensures required generation key is present.

The addition of "generation": "" on line 355 ensures that the result dict always contains a generation key, which is required by downstream code in inference/generate.py (line 490: output[self.cfg.generation_key] = output.pop("generation")). This prevents potential KeyError exceptions in empty-result scenarios.


364-376: LGTM! Kwargs propagation in GenSelect path.

The addition of **kwargs on line 366 ensures that generation-related parameters are correctly propagated through the GenSelect path, aligning with the PR objectives to support tool usage during GenSelect generation.


387-394: LGTM! Ensures generation key is always present.

Lines 390-393 ensure that the generation key is always present in the result dict, even when solution_key is different. This is consistent with the empty-result branch (line 355) and prevents downstream errors in inference/generate.py.

nemo_skills/inference/generate.py (1)

370-383: LGTM! Tokenizer propagation to parallel thinking model.

Line 381 correctly passes tokenizer=self.tokenizer to get_parallel_thinking_model, ensuring that the parallel thinking path has access to the tokenizer context when needed (e.g., for completions API or tool usage). This aligns with the broader PR changes to support tokenizer-aware prompt generation.

nemo_skills/inference/model/__init__.py (1)

71-95: LGTM! Public API updated to accept tokenizer.

Lines 75 and 93-95 correctly update the get_parallel_thinking_model signature to accept an optional tokenizer parameter and forward it to the ParallelThinkingTask constructor. This aligns with the broader PR changes to support tokenizer-aware prompt generation in the parallel thinking path.
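The factory-level change can be sketched as an optional parameter threaded through to the wrapped task. Class and function names below are simplified stand-ins, not the real signatures:

```python
# Simplified stand-ins for ParallelThinkingTask / get_parallel_thinking_model:
# the factory accepts an optional tokenizer and forwards it to the task,
# so callers that have no tokenizer keep working unchanged.
class ParallelThinkingTaskSketch:
    def __init__(self, cfg, tokenizer=None):
        self.cfg = cfg
        self.tokenizer = tokenizer  # stored for later get_prompt() calls

def get_parallel_thinking_model_sketch(cfg, tokenizer=None):
    return ParallelThinkingTaskSketch(cfg, tokenizer=tokenizer)
```

Defaulting `tokenizer=None` keeps the public API backward compatible while letting `GenerationTask.setup_llm` pass its tokenizer through.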

Signed-off-by: Shubham Toshniwal <stoshniwal@nvidia.com>
shtoshni merged commit f94fdbd into main on Oct 3, 2025
4 of 6 checks passed
shtoshni deleted the shtoshni/parallel_thinking_fixes branch on October 3, 2025 20:29
SeanNaren pushed a commit to SeanNaren/NeMo-Skills that referenced this pull request Oct 9, 2025
Signed-off-by: Shubham Toshniwal <stoshniwal@nvidia.com>
Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
SeanNaren pushed a commit that referenced this pull request Oct 9, 2025
Signed-off-by: Shubham Toshniwal <stoshniwal@nvidia.com>
Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: Shubham Toshniwal <stoshniwal@nvidia.com>
Signed-off-by: dgitman <dgitman@nvidia.com>
