Trust remote code in tokenizer by Kipok · Pull Request #1146 · NVIDIA-NeMo/Skills

Kipok · 2025-12-27T04:43:32Z

Summary by CodeRabbit

New Features
- Improved tokenizer compatibility across the inference system, enabling support for a broader range of tokenizer implementations in evaluation, generation, prompt processing, and parallel thinking workflows.


Signed-off-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>

Signed-off-by: Igor Gitman <igitman@nvidia.com>

coderabbitai · 2025-12-27T04:45:52Z

Walkthrough

Adds `trust_remote_code=True` to the `AutoTokenizer.from_pretrained()` calls in six files across the inference and prompt modules, so that tokenizers whose model repositories ship custom Python code can be loaded.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Inference module tokenizer initialization: `nemo_skills/inference/eval/bfcl.py`, `nemo_skills/inference/generate.py`, `nemo_skills/inference/prover.py` | Each file now passes `trust_remote_code=True` to `AutoTokenizer.from_pretrained()` calls, allowing tokenizer loading with remote code execution enabled. |
| Tokenizer utilities: `nemo_skills/inference/model/parallel_thinking.py`, `nemo_skills/inference/model/utils.py` | Tokenizer initialization in both files updated with `trust_remote_code=True` parameter; `WrapperAutoTokenizer` and prompt token counting now use this flag. |
| Prompt utilities: `nemo_skills/prompt/utils.py` | Prompt string-based tokenizer construction now includes `trust_remote_code=True` when calling `AutoTokenizer.from_pretrained()`. |
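The change summarized in the table can be sketched roughly as follows. This is an illustration only, not the code from the PR: `load_tokenizer` and its `loader` parameter are hypothetical names introduced here so the call shape can be exercised without downloading anything. The only real API assumed is `AutoTokenizer.from_pretrained(..., trust_remote_code=...)` from `transformers`.

```python
from typing import Any, Callable, Optional


def load_tokenizer(
    name_or_path: str,
    trust_remote_code: bool = True,
    loader: Optional[Callable[..., Any]] = None,
) -> Any:
    """Hypothetical helper: load a tokenizer and forward trust_remote_code.

    `loader` is an illustrative injection point (not part of NeMo-Skills);
    when omitted, the real Hugging Face loader is used.
    """
    if loader is None:
        # Imported lazily so this sketch can be imported without transformers.
        from transformers import AutoTokenizer

        loader = AutoTokenizer.from_pretrained
    # The PR's change amounts to adding this keyword argument at each call site.
    return loader(name_or_path, trust_remote_code=trust_remote_code)
```

Centralizing the flag in one helper like this is a design choice the PR itself does not make; per the table above, it sets the argument directly at each of the six call sites.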

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Update for parallel thinking #929 — Also modifies tokenizer initialization in parallel_thinking.py and generate.py for token-counting functionality with similar patterns.
Parallel thinking fixes #887 — Modifies tokenizer handling and threading in parallel_thinking.py and generate.py, affecting related tokenizer construction and usage paths.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 55.56%, which is below the required threshold of 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Trust remote code in tokenizer' directly and clearly summarizes the main change across all modified files: enabling `trust_remote_code=True` in `AutoTokenizer` initialization. |

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e85e995 and dfe7c46.

📒 Files selected for processing (6)

nemo_skills/inference/eval/bfcl.py
nemo_skills/inference/generate.py
nemo_skills/inference/model/parallel_thinking.py
nemo_skills/inference/model/utils.py
nemo_skills/inference/prover.py
nemo_skills/prompt/utils.py

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: unit-tests
GitHub Check: pre-commit

🔇 Additional comments (6)

nemo_skills/inference/generate.py (1)

341-341: LGTM - Enables loading tokenizers with custom code.

The addition of trust_remote_code=True is consistent with the same change across the codebase. This is necessary for tokenizers that include custom Python implementations (e.g., newer models like Qwen, DeepSeek).

Note that this parameter allows execution of arbitrary Python code from the model repository. This is acceptable when loading models from trusted sources, but users should be aware of this behavior when pointing to untrusted model paths.
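One way to act on the caveat above is to enable the flag only for model sources the operator has decided to trust. The sketch below is an assumption, not something this PR implements: `should_trust_remote_code` and the `TRUSTED_ORGS` allowlist are hypothetical, and the organization names are placeholders an operator would replace.

```python
from typing import Iterable

# Assumption: an illustrative allowlist of Hub organizations; not part of NeMo-Skills.
TRUSTED_ORGS = frozenset({"nvidia", "Qwen", "deepseek-ai"})


def should_trust_remote_code(
    model_id: str, trusted_orgs: Iterable[str] = TRUSTED_ORGS
) -> bool:
    """Return True only for ids of the form 'org/name' with an allowlisted org.

    Bare names, absolute paths, and deeper relative paths are rejected.
    Limitation: a relative local directory such as 'nvidia/foo' is
    indistinguishable from a Hub id and would be accepted.
    """
    parts = model_id.split("/")
    if len(parts) != 2 or not all(parts):
        return False
    return parts[0] in set(trusted_orgs)
```

The result could then be passed as `trust_remote_code=should_trust_remote_code(model_id)` to `AutoTokenizer.from_pretrained()`. Pinning the `revision` argument to a reviewed commit is a complementary precaution, since the remote code in a repository can change after it has been inspected.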

nemo_skills/inference/prover.py (1)

110-110: LGTM - Consistent with the PR-wide tokenizer trust pattern.

nemo_skills/inference/eval/bfcl.py (1)

141-141: LGTM - Consistent with the PR-wide tokenizer trust pattern.

nemo_skills/prompt/utils.py (1)

115-115: LGTM - Consistent with the PR-wide tokenizer trust pattern.

nemo_skills/inference/model/utils.py (1)

91-91: LGTM - Consistent with the PR-wide tokenizer trust pattern.

nemo_skills/inference/model/parallel_thinking.py (1)

108-108: LGTM - Consistent with the PR-wide tokenizer trust pattern.

Signed-off-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com> Signed-off-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com> Signed-off-by: dlord <dlord@nvidia.com>

Signed-off-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com> Signed-off-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com> Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

Signed-off-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com> Signed-off-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>

Signed-off-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com> Signed-off-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com> Signed-off-by: dgitman <dgitman@nvidia.com>

fzyzcjy and others added 2 commits December 25, 2025 10:50

Update utils.py

1e085ad

Signed-off-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>

Trust remote code in tokenizer

dfe7c46

Signed-off-by: Igor Gitman <igitman@nvidia.com>

Kipok mentioned this pull request Dec 27, 2025

Fix fail to run Nemo Skills when need trust remote code #1142

Closed

Kipok merged commit 91edc30 into main Dec 27, 2025
5 checks passed

Kipok deleted the igitman/trust-remote-code-tokenizer branch December 27, 2025 05:19

Kipok merged 2 commits into main from igitman/trust-remote-code-tokenizer
