Trust remote code in tokenizer #1146

Merged
Kipok merged 2 commits into main from igitman/trust-remote-code-tokenizer
Dec 27, 2025

Kipok commented Dec 27, 2025

Summary by CodeRabbit

  • New Features
    • Improved tokenizer compatibility across the inference system, enabling support for a broader range of tokenizer implementations in evaluation, generation, prompt processing, and parallel thinking workflows.

fzyzcjy and others added 2 commits December 25, 2025 10:50
Signed-off-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>

coderabbitai bot commented Dec 27, 2025

Walkthrough

Adds the trust_remote_code=True argument to AutoTokenizer.from_pretrained() calls in six files across the inference and prompt modules, so that tokenizers whose model repository ships custom Python code can be loaded.
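
As a rough illustration of the pattern described above, the sketch below forwards trust_remote_code=True to AutoTokenizer.from_pretrained(). The helper names tokenizer_load_kwargs and load_tokenizer are hypothetical and do not appear in the PR; the actual call sites in nemo_skills may pass additional arguments.

```python
from typing import Any, Dict


def tokenizer_load_kwargs(trust_remote_code: bool = True) -> Dict[str, Any]:
    """Build the keyword arguments forwarded to AutoTokenizer.from_pretrained().

    Hypothetical helper for illustration; the PR passes the flag inline.
    """
    return {"trust_remote_code": trust_remote_code}


def load_tokenizer(name_or_path: str, trust_remote_code: bool = True):
    """Load a tokenizer, optionally allowing custom code from the model repo."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(
        name_or_path, **tokenizer_load_kwargs(trust_remote_code)
    )
```

Calling load_tokenizer("<model name or path>") would then behave like the updated call sites, assuming the transformers package is installed and the model is reachable.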

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Inference module tokenizer initialization: nemo_skills/inference/eval/bfcl.py, nemo_skills/inference/generate.py, nemo_skills/inference/prover.py | Each file now passes trust_remote_code=True to AutoTokenizer.from_pretrained() calls, allowing tokenizer loading with remote code execution enabled. |
| Tokenizer utilities: nemo_skills/inference/model/parallel_thinking.py, nemo_skills/inference/model/utils.py | Tokenizer initialization in both files updated with trust_remote_code=True parameter; WrapperAutoTokenizer and prompt token counting now use this flag. |
| Prompt utilities: nemo_skills/prompt/utils.py | Prompt string-based tokenizer construction now includes trust_remote_code=True when calling AutoTokenizer.from_pretrained(). |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • Update for parallel thinking #929 — Also modifies tokenizer initialization in parallel_thinking.py and generate.py for token-counting functionality with similar patterns.
  • Parallel thinking fixes #887 — Modifies tokenizer handling and threading in parallel_thinking.py and generate.py, affecting related tokenizer construction and usage paths.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 55.56% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Trust remote code in tokenizer' directly and clearly summarizes the main change across all modified files: enabling trust_remote_code=True in AutoTokenizer initialization. |

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e85e995 and dfe7c46.

📒 Files selected for processing (6)
  • nemo_skills/inference/eval/bfcl.py
  • nemo_skills/inference/generate.py
  • nemo_skills/inference/model/parallel_thinking.py
  • nemo_skills/inference/model/utils.py
  • nemo_skills/inference/prover.py
  • nemo_skills/prompt/utils.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: unit-tests
  • GitHub Check: pre-commit
🔇 Additional comments (6)
nemo_skills/inference/generate.py (1)

341-341: LGTM - Enables loading tokenizers with custom code.

The addition of trust_remote_code=True is consistent with the same change across the codebase. This is necessary for tokenizers that include custom Python implementations (e.g., newer models like Qwen, DeepSeek).

Note that this parameter allows execution of arbitrary Python code from the model repository. This is acceptable when loading models from trusted sources, but users should be aware of this behavior when pointing to untrusted model paths.
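
One way to act on the caution above is to enable the flag only for sources the user has chosen to trust. The sketch below is not part of this PR, which sets trust_remote_code=True unconditionally; the TRUSTED_PREFIXES allowlist and the should_trust_remote_code helper are hypothetical, and the listed organizations are only illustrative entries.

```python
from typing import Iterable

# Hypothetical allowlist of Hugging Face Hub organizations the user trusts.
TRUSTED_PREFIXES = ("Qwen/", "deepseek-ai/")


def should_trust_remote_code(
    model_id: str, trusted_prefixes: Iterable[str] = TRUSTED_PREFIXES
) -> bool:
    """Return True only when model_id starts with an allowlisted prefix.

    Local paths and unknown organizations fall through to False, so custom
    tokenizer code from them would not be executed.
    """
    return any(model_id.startswith(prefix) for prefix in trusted_prefixes)
```

The result could be passed as the trust_remote_code argument of AutoTokenizer.from_pretrained(). A prefix check like this is deliberately simple and would need extending, for example to cover trusted local directories.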

nemo_skills/inference/prover.py (1)

110-110: LGTM - Consistent with the PR-wide tokenizer trust pattern.

nemo_skills/inference/eval/bfcl.py (1)

141-141: LGTM - Consistent with the PR-wide tokenizer trust pattern.

nemo_skills/prompt/utils.py (1)

115-115: LGTM - Consistent with the PR-wide tokenizer trust pattern.

nemo_skills/inference/model/utils.py (1)

91-91: LGTM - Consistent with the PR-wide tokenizer trust pattern.

nemo_skills/inference/model/parallel_thinking.py (1)

108-108: LGTM - Consistent with the PR-wide tokenizer trust pattern.


@Kipok Kipok merged commit 91edc30 into main Dec 27, 2025
5 checks passed
@Kipok Kipok deleted the igitman/trust-remote-code-tokenizer branch December 27, 2025 05:19
blahblahasdf pushed a commit to blahblahasdf/Skills that referenced this pull request Jan 8, 2026
Signed-off-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
Signed-off-by: dlord <dlord@nvidia.com>
hsiehjackson pushed a commit that referenced this pull request Jan 13, 2026
Signed-off-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>
wasiahmad pushed a commit that referenced this pull request Feb 4, 2026
Signed-off-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
Signed-off-by: dgitman <dgitman@nvidia.com>
2 participants