
Update for parallel thinking #929

Merged
shtoshni merged 6 commits into main from shtoshni/parallel_thinking_update
Oct 11, 2025
Conversation


@shtoshni shtoshni commented Oct 11, 2025

  • Endpoint fixes for parallel thinking
  • Count num tokens addition
  • Corner cases
  • Slight refactoring

Summary by CodeRabbit

  • New Features

    • Optional prompt token counting; input token totals included in results when enabled.
    • Multi-solution retrieval with optional filtering of incomplete solutions and aggregated token statistics.
  • Changes

    • Default endpoint type switched from “chat” to “text.”
    • Parallel thinking settings (mode, endpoint type, window size, solution key, filtering) now consistently applied during inference when the mode is set.

Shubham Toshniwal added 6 commits October 10, 2025 12:55
Signed-off-by: Shubham Toshniwal <stoshniwal@nvidia.com>

coderabbitai bot commented Oct 11, 2025

Walkthrough

Propagates parallel_thinking config fields into inference overrides. In ParallelThinkingTask, adds prompt token counting, changes default endpoint_type to text, and introduces _get_multiple_solutions to gather and optionally filter solutions, computing token counts and totals. Ensures num_input_tokens is attached to outputs when enabled. No other control flow changes.

Changes

Cohort / File(s) Summary
Inference override propagation
nemo_skills/inference/generate.py
When parallel_thinking.mode is set, passes endpoint_type, mode, window_size, solution_key, and filter_incomplete_solutions into inference_override_config for get_parallel_thinking_model. No other flow changes.
ParallelThinking enhancements
nemo_skills/inference/model/parallel_thinking.py
Adds count_prompt_tokens flag and optional HF tokenizer to count input tokens; carries num_input_tokens through results. Changes default endpoint_type from chat to text. Adds async helper _get_multiple_solutions to fetch/generate, optionally filter incomplete solutions, and compute total_generated_tokens; integrates into generate_async pathway.
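The override propagation and new config fields summarized above can be pictured roughly as follows (field names are taken from this PR's summary; the dataclass and helper are simplified stand-ins for illustration, not the repository's actual code):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ParallelThinkingSettings:
    """Simplified stand-in for the parallel_thinking config section."""
    mode: Optional[str] = None
    endpoint_type: str = "text"  # default changed from "chat" in this PR
    window_size: int = 4
    solution_key: str = "solution"
    filter_incomplete_solutions: bool = True

def build_inference_overrides(pt: ParallelThinkingSettings) -> dict:
    """Propagate parallel-thinking fields into the inference override
    config only when a mode is actually set."""
    if pt.mode is None:
        return {}
    return {
        "mode": pt.mode,
        "endpoint_type": pt.endpoint_type,
        "window_size": pt.window_size,
        "solution_key": pt.solution_key,
        "filter_incomplete_solutions": pt.filter_incomplete_solutions,
    }
```

Guarding on `mode` keeps the overrides empty for runs that do not use parallel thinking, matching the "when the mode is set" condition described above.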

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Caller
  participant Inference as Inference.generate
  participant PTFactory as get_parallel_thinking_model
  participant PT as ParallelThinkingTask
  note over Inference,PTFactory: New: propagate endpoint_type, mode, window_size,<br/>solution_key, filter_incomplete_solutions when mode != None

  Caller->>Inference: generate(prompt, config)
  Inference->>PTFactory: get_parallel_thinking_model(override_config)
  PTFactory-->>Inference: PT instance
  Inference->>PT: generate_async(prompt, ...)
  alt count_prompt_tokens == True
    PT->>PT: init HF tokenizer
  end
  note over PT: New: assemble solutions
  PT->>PT: _get_multiple_solutions(prompt, rng, filter_incomplete)
  alt solutions from cache
    PT-->>PT: load pre-generated solutions
  else generate on-the-fly
    PT->>PT: call underlying LLM for solutions
  end
  opt count_prompt_tokens
    PT->>PT: compute num_input_tokens for prompt
  end
  PT-->>Inference: results (solutions, totals, num_input_tokens?)
  Inference-->>Caller: final output

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

I nibble on prompts, count tokens with cheer,
Hop through solutions, both far and near.
Text lanes by default, my whiskers align,
Filtering the stray thoughts, keeping them fine.
With bundles of answers in clovery rows—
Thump! Another clean run, and off the rabbit goes. 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 75.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
  • Title Check: ❓ Inconclusive. The title "Update for parallel thinking" is related to the changeset but is overly generic: it does not convey the main enhancements (token-counting support, the new multi-solution workflow, and configuration propagation). Consider a more specific title, for example "Add token counting and multi-solution support to parallel thinking workflow."
✅ Passed checks (1 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
nemo_skills/inference/model/parallel_thinking.py (1)

106-109: LGTM! Tokenizer initialization is correct.

The tokenizer initialization logic correctly validates that the tokenizer can be loaded when prompt token counting is enabled. The error message is clear and helpful.

If you prefer to address the static analyzer hint (TRY003), you could move the message into a custom exception class, though the current approach is acceptable:

class TokenizerInitializationError(ValueError):
    """Raised when the tokenizer cannot be initialized for prompt token counting."""

    def __init__(self):
        super().__init__("Failed to load tokenizer; a tokenizer is required when prompt token counting is enabled.")

# Then use:
raise TokenizerInitializationError()

Based on learnings from static analysis hints.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 85fe307 and 9e1bf9c.

📒 Files selected for processing (2)
  • nemo_skills/inference/generate.py (1 hunks)
  • nemo_skills/inference/model/parallel_thinking.py (8 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
nemo_skills/inference/generate.py (1)
nemo_skills/inference/chat_interface/core.py (1)
  • cfg (181-182)
nemo_skills/inference/model/parallel_thinking.py (2)
nemo_skills/prompt/utils.py (1)
  • get_token_count (310-369)
nemo_skills/inference/model/base.py (2)
  • EndpointType (38-41)
  • generate_async (213-315)
🪛 Ruff (0.13.3)
nemo_skills/inference/model/parallel_thinking.py

109-109: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: unit-tests
  • GitHub Check: pre-commit
🔇 Additional comments (8)
nemo_skills/inference/generate.py (1)

400-405: LGTM! Configuration propagation is correct.

The additional fields propagated from parallel_thinking config to inference_override_config align well with the parallel thinking functionality requirements. The approach ensures that parallel thinking-specific settings are properly passed to the underlying model.

nemo_skills/inference/model/parallel_thinking.py (7)

27-29: LGTM! Required imports for token counting.

The additional imports support the new prompt token counting feature.


62-63: LGTM! Clean feature addition.

The count_prompt_tokens field provides a clean way to enable token counting with a sensible default.


201-239: LGTM! Well-structured multi-solution retrieval.

The _get_multiple_solutions method cleanly handles both offline (pre-loaded) and online (generate-on-the-fly) solution workflows. The filtering logic correctly identifies incomplete solutions by checking for unclosed thinking markers, and the token counting aggregation is accurate.
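As a rough illustration of the filtering and aggregation described here (the "</think>" closing marker and both helper names are assumptions for the sketch, not the PR's exact code):

```python
def filter_complete_solutions(solutions, end_marker="</think>"):
    """Drop solutions whose thinking block was never closed, i.e. the
    generation was likely cut off mid-thought."""
    return [s for s in solutions if end_marker in s["generation"]]

def total_generated_tokens(solutions):
    """Aggregate per-solution token counts into a single total."""
    return sum(s.get("num_generated_tokens", 0) for s in solutions)
```

The same two helpers work for both the offline (pre-loaded) and online (generate-on-the-fly) paths, since each produces a list of solution dicts.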


267-284: LGTM! Token counting integration is correct.

The token counting logic properly measures input tokens for the parallel thinking prompt and integrates cleanly with the existing generation flow. The addition of endpoint_type to the duplicate keys list correctly prevents parameter conflicts since the endpoint type should be determined by the model's configuration.
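A minimal sketch of the input-token measurement (the whitespace tokenizer below is only a stand-in for a real HF tokenizer, and the helper name is hypothetical):

```python
class WhitespaceTokenizer:
    """Stand-in for a HuggingFace tokenizer; only encode() is used here."""
    def encode(self, text, add_special_tokens=False):
        return text.split()

def count_prompt_tokens(tokenizer, prompt: str) -> int:
    """Number of input tokens the model will see for this prompt."""
    return len(tokenizer.encode(prompt, add_special_tokens=False))
```

With a real tokenizer loaded via transformers' AutoTokenizer, the same call pattern would measure the assembled parallel-thinking prompt before generation.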


370-384: LGTM! Appropriate edge case handling.

The empty solutions case is handled correctly with sensible defaults. Setting num_input_tokens to None is appropriate since no meaningful prompt was processed in this scenario.


408-409: LGTM! Correct token count propagation.

The num_input_tokens is correctly propagated from the parallel thinking result to the final output when prompt token counting is enabled.


57-57: Confirm endpoint_type default change
Default switched from EndpointType.chat to EndpointType.text in ParallelThinkingConfig; verify chat-based prompts still route correctly or revert if necessary.

@shtoshni shtoshni merged commit 55943a2 into main Oct 11, 2025
7 checks passed
@shtoshni shtoshni deleted the shtoshni/parallel_thinking_update branch October 11, 2025 04:23
dgtm777 pushed a commit that referenced this pull request Oct 29, 2025
Signed-off-by: Shubham Toshniwal <stoshniwal@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: Shubham Toshniwal <stoshniwal@nvidia.com>
Signed-off-by: dgitman <dgitman@nvidia.com>
