Conversation
Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
📝 Walkthrough

The changes introduce a hybrid caching system that combines in-memory storage with periodic disk persistence. StableLiteLLMCache replaces the previous disk-based litellm cache and provides deterministic cache key generation. Script generation is updated to throttle concurrent executions using an async semaphore.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant HybridCache
    participant MemoryStore as Memory Store
    participant DiskThread as Background Thread
    participant Disk
    Client->>HybridCache: set_cache(key, value)
    HybridCache->>MemoryStore: store value (with lock)
    HybridCache->>HybridCache: mark dirty flag
    Client->>HybridCache: get_cache(key)
    HybridCache->>MemoryStore: retrieve value (with lock)
    HybridCache-->>Client: return value
    DiskThread->>HybridCache: periodic check (save_interval)
    alt dirty flag set
        DiskThread->>MemoryStore: read all entries (with lock)
        DiskThread->>Disk: atomic write temp file + rename
        DiskThread->>HybridCache: clear dirty flag
    end
    Client->>HybridCache: force_save()
    HybridCache->>MemoryStore: read all entries (with lock)
    HybridCache->>Disk: atomic write temp file + rename
    Client->>HybridCache: disconnect()
    HybridCache->>DiskThread: stop background thread
    HybridCache->>Disk: final save (if needed)
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
```python
data = {
    "cache_dict": self.cache_dict.copy(),
}
self._dirty = False
```

The `_dirty` flag is cleared before the disk write completes. If the write fails (lines 78-82), the cache will incorrectly believe it's already saved and won't retry.

```diff
-        self._dirty = False
+        self._dirty = False  # Move this after successful write
```
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@nemo_skills/inference/generate.py`:
- Line 40: The background save thread in the HybridCache (created via
StableLiteLLMCache / HybridCache instance) must be stopped before deleting its
storage directory: instead of only calling force_save() then shutil.rmtree(),
retain the cache instance and call its _shutdown() method to stop the background
thread and perform a final save, then call shutil.rmtree() to remove the
directory; update any code that currently only calls force_save() to call
cache._shutdown() (or cache.shutdown wrapper if available) prior to deleting the
directory.
In `@nemo_skills/inference/litellm_hybrid_cache.py`:
- Around line 101-153: The cache methods currently only validate kwargs in
set_cache via _check_no_ttl, letting unsupported kwargs silently pass in
get_cache, batch_get_cache, increment_cache and async variants; replace
_check_no_ttl with a new _validate_kwargs(self, **kwargs) that raises ValueError
when kwargs is non-empty and call _validate_kwargs at the start of all cache
entry points (set_cache, async_set_cache, async_set_cache_pipeline, get_cache,
async_get_cache, batch_get_cache, async_batch_get_cache, increment_cache,
async_increment) so every method fails fast on unsupported parameters.
🧹 Nitpick comments (1)
nemo_skills/inference/litellm_hybrid_cache.py (1)
68-83: Consider clearing `_dirty` only after a successful write.
If the write fails, `_dirty` is already false and future saves won't retry; clearing after `os.replace` (and only if no new updates happened) avoids that edge case.

🔧 Suggested adjustment

```diff
 def _save_to_disk(self):
     """Save cache to disk."""
     with self._lock:
         if not self._dirty:
             return
         data = {
             "cache_dict": self.cache_dict.copy(),
         }
-        self._dirty = False
     temp_path = self.cache_file_path + ".tmp"
     Path(self.cache_file_path).parent.mkdir(parents=True, exist_ok=True)
     with open(temp_path, "wb") as f:
         pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
     os.replace(temp_path, self.cache_file_path)
+    with self._lock:
+        if self.cache_dict == data["cache_dict"]:
+            self._dirty = False
```
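The retry bug this nitpick describes can be shown with a toy model (a hypothetical `Saver` class, not the code under review): clearing the dirty flag before the write means a failed write is never retried, while clearing it only after success keeps the data eligible for the next save attempt.

```python
class Saver:
    """Toy model of the dirty-flag ordering; simplified and hypothetical."""

    def __init__(self, clear_dirty_early):
        self.clear_dirty_early = clear_dirty_early
        self.dirty = True       # there is unsaved data
        self.saved = False      # whether data ever reached "disk"
        self.fail_next = True   # simulate one transient write failure

    def save(self):
        if not self.dirty:
            return              # nothing to do (or so we believe)
        if self.clear_dirty_early:
            self.dirty = False  # bug: cleared before the write completes
        try:
            if self.fail_next:
                self.fail_next = False
                raise OSError("disk full")   # simulated write failure
            self.saved = True
            self.dirty = False  # correct: cleared only after success
        except OSError:
            pass                # swallow the error, hope for a retry


buggy = Saver(clear_dirty_early=True)
buggy.save()   # write fails, but dirty is already False
buggy.save()   # no-op: the data never reaches disk
assert buggy.saved is False

fixed = Saver(clear_dirty_early=False)
fixed.save()   # write fails, dirty stays True
fixed.save()   # retry succeeds
assert fixed.saved is True
```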
```python
    get_evaluator_class,
    supports_single_eval,
)
from nemo_skills.inference.litellm_hybrid_cache import StableLiteLLMCache
```
Stop the background save thread before deleting the cache directory to prevent race conditions.
The HybridCache instance runs a background thread that periodically saves to disk every 300 seconds. Calling force_save() only saves the current state but leaves the thread running. When rmtree() deletes the directory immediately after, the background thread may still attempt to write to a non-existent path or recreate the directory.
Store the cache instance and call its _shutdown() method before deleting the directory. This stops the background thread and performs a final save in the correct order:
Suggested change

```diff
 def setup_litellm_cache(self):
     if self.cfg.enable_litellm_cache:
         # One cache per (output_file_name, chunk_id) pair
         output_file_name = Path(self.cfg.output_file).name
         self.litellm_cache_dir = (
             Path(self.cfg.output_file).parent / "litellm_cache" / f"{output_file_name}_{self.cfg.chunk_id or 0}"
         )
-        litellm.cache = StableLiteLLMCache(cache_file_path=str(self.litellm_cache_dir / "cache.pkl"))
+        self.litellm_cache = StableLiteLLMCache(
+            cache_file_path=str(self.litellm_cache_dir / "cache.pkl")
+        )
+        litellm.cache = self.litellm_cache

 def cleanup_litellm_cache(self):
     if self.cfg.enable_litellm_cache:
-        litellm.cache.cache.force_save()
+        self.litellm_cache.cache._shutdown()
         shutil.rmtree(self.litellm_cache_dir)
```
shutil.rmtree(self.litellm_cache_dir)🤖 Prompt for AI Agents
In `@nemo_skills/inference/generate.py` at line 40, The background save thread in
the HybridCache (created via StableLiteLLMCache / HybridCache instance) must be
stopped before deleting its storage directory: instead of only calling
force_save() then shutil.rmtree(), retain the cache instance and call its
_shutdown() method to stop the background thread and perform a final save, then
call shutil.rmtree() to remove the directory; update any code that currently
only calls force_save() to call cache._shutdown() (or cache.shutdown wrapper if
available) prior to deleting the directory.
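The cleanup-order concern can be illustrated with a self-contained toy (a hypothetical `TinyCache`, not the real class): stop the background writer thread first, and only then delete its directory, so no write can race with the delete.

```python
import os
import shutil
import tempfile
import threading


class TinyCache:
    """Starts a background thread that periodically writes into dirpath."""

    def __init__(self, dirpath, interval=0.01):
        self.dirpath = dirpath
        self.interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._save_loop, daemon=True)
        self._thread.start()

    def _save_loop(self):
        while not self._stop.wait(self.interval):
            # This write would fail (or recreate files) if the directory
            # were deleted while the thread was still running.
            with open(os.path.join(self.dirpath, "cache.pkl"), "wb") as f:
                f.write(b"data")

    def shutdown(self):
        self._stop.set()     # ask the thread to stop
        self._thread.join()  # wait until it has actually exited


cache_dir = tempfile.mkdtemp()
cache = TinyCache(cache_dir)
cache.shutdown()           # stop the writer first...
shutil.rmtree(cache_dir)   # ...then deleting the directory cannot race
```

Deleting the directory before `shutdown()` would leave a window where the loop's `open()` raises (or resurrects the path), which is exactly the race the review comment describes.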
```python
def set_cache(self, key, value, **kwargs):
    """Set a value in the cache."""
    self._check_no_ttl(**kwargs)
    with self._lock:
        self.cache_dict[key] = value
        self._dirty = True

async def async_set_cache(self, key, value, **kwargs):
    """Async set - delegates to sync implementation since we're using in-memory."""
    self.set_cache(key=key, value=value, **kwargs)

async def async_set_cache_pipeline(self, cache_list, **kwargs):
    """Set multiple cache entries."""
    for cache_key, cache_value in cache_list:
        self.set_cache(key=cache_key, value=cache_value, **kwargs)

def get_cache(self, key, **kwargs):
    """Get a value from the cache."""
    with self._lock:
        if key not in self.cache_dict:
            return None
        cached_response = self.cache_dict[key]
        if isinstance(cached_response, str):
            try:
                cached_response = json.loads(cached_response)
            except json.JSONDecodeError:
                pass
        return cached_response

async def async_get_cache(self, key, **kwargs):
    """Async get - delegates to sync implementation."""
    return self.get_cache(key=key, **kwargs)

def batch_get_cache(self, keys: list, **kwargs):
    """Get multiple values from cache."""
    return [self.get_cache(key=k, **kwargs) for k in keys]

async def async_batch_get_cache(self, keys: list, **kwargs):
    """Async batch get."""
    return self.batch_get_cache(keys=keys, **kwargs)

def increment_cache(self, key, value: int, **kwargs) -> int:
    """Increment a cache value."""
    with self._lock:
        init_value = self.get_cache(key=key) or 0
        new_value = init_value + value
        self.set_cache(key, new_value, **kwargs)
        return new_value

async def async_increment(self, key, value: float, **kwargs) -> float:
    """Async increment."""
    return self.increment_cache(key, int(value), **kwargs)
```
🌐 Web query:
LiteLLM cache interface documentation - what kwargs are supported by get_cache set_cache increment_cache methods

💡 Result:
LiteLLM's cache backends expose a small "core" interface (get_cache, set_cache, increment_cache) and then accept cache-control knobs via kwargs (passed through from the cache={...} dict on a request).

Cache-control kwargs (passed via cache={...} on a request), which affect `get_cache()` / cache lookup and cache write behavior:

- `ttl` (int, seconds) – per-entry TTL override (used on write) [1][2]
- `s-maxage` (int, seconds) – freshness requirement for accepting an entry (validated after a hit / on read) [1][2]
- `no-cache` (bool) – bypass cache read (still allows write unless `no-store`) [1][2]
- `no-store` (bool) – prevent cache write [1][2]
- `namespace` (str) – per-request key prefix / cache segmentation [2]

Method-level kwargs / parameters you can rely on (the concrete backends differ, but the docs are most consistent about these):

- `get_cache(key, ...)` – reads; influenced by `no-cache`, `s-maxage`, `namespace` [2]
- `set_cache(key, value, ttl=None, ...)` – writes; TTL commonly supported as an argument [3]
- `increment_cache(key, ...)` – used for counters (rate limits, etc.); the docs don't enumerate a separate, richer kwarg set beyond the same TTL/namespace concepts used throughout caching [2]

Sources: [1] https://docs.litellm.ai/docs/caching/all_caches • [2] https://deepwiki.com/BerriAI/litellm/5.3-dynamic-cache-controls • [3] https://readmex.com/en-US/BerriAI/litellm/page-6d9ab5a93-b77f-4311-b15f-ac9ae2a3ec34
🏁 Script executed:
rg "\.get_cache\(|\.batch_get_cache\(|\.increment_cache\(" --type py -A 2 -B 2Repository: NVIDIA-NeMo/Skills
Length of output: 2494
🏁 Script executed:
# Also check calls to async variants
rg "\.async_get_cache\(|\.async_batch_get_cache\(|\.async_increment\(" --type py -A 2 -B 2Repository: NVIDIA-NeMo/Skills
Length of output: 44
🏁 Script executed:
# Search for external calls to HybridCache methods
rg "HybridCache|StableLiteLLMCache" --type py -A 3 | head -100Repository: NVIDIA-NeMo/Skills
Length of output: 2256
🏁 Script executed:
# Check how LiteLLM uses cache interface - look for cache.get_cache or cache.set_cache patterns
rg "cache\.(get_cache|set_cache|batch_get_cache|increment_cache|async_get_cache)" --type pyRepository: NVIDIA-NeMo/Skills
Length of output: 44
🏁 Script executed:
# Search for where cache methods are actually invoked from LiteLLM
# Check the StableLiteLLMCache parent class inheritance and method overrides
rg "def (get_cache|set_cache|batch_get_cache|increment_cache)" nemo_skills/inference/litellm_hybrid_cache.pyRepository: NVIDIA-NeMo/Skills
Length of output: 312
🏁 Script executed:
# Check if HybridCache methods are actually called by LiteLLM by searching generate.py
cat -n nemo_skills/inference/generate.py | grep -A 30 "StableLiteLLMCache\|litellm.cache"Repository: NVIDIA-NeMo/Skills
Length of output: 7738
Validate kwargs across all cache methods to fail fast on unsupported parameters.
Currently, get_cache(), batch_get_cache(), and async variants silently drop **kwargs, while only set_cache() validates via _check_no_ttl(). This inconsistency violates the guideline to avoid silently ignoring user-passed parameters. Since HybridCache is in-memory persistent storage without support for cache-control features (ttl, namespace, s-maxage, no-cache, no-store), all methods should validate and reject any unsupported kwargs.
Replace _check_no_ttl() with a comprehensive validation method that fails on any kwargs:
```python
def _validate_kwargs(self, **kwargs):
    if kwargs:
        raise ValueError(f"Unsupported cache kwargs: {', '.join(kwargs)}")
```

Apply this to all cache methods: get_cache(), batch_get_cache(), increment_cache(), and their async variants—not just set_cache().
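A minimal sketch of the fail-fast behavior this suggests, using a hypothetical stand-in class (the method name `_validate_kwargs` comes from the suggestion above, not from the code under review):

```python
class KwargsValidator:
    """Toy model of the proposed fail-fast kwargs validation."""

    def _validate_kwargs(self, **kwargs):
        # Reject any parameters the in-memory cache cannot honor,
        # instead of silently dropping them.
        if kwargs:
            raise ValueError(f"Unsupported cache kwargs: {', '.join(kwargs)}")

    def get_cache(self, key, **kwargs):
        self._validate_kwargs(**kwargs)
        return None  # actual lookup elided in this sketch


cache = KwargsValidator()
cache.get_cache("some-key")  # fine: no extra kwargs
try:
    cache.get_cache("some-key", ttl=60)
except ValueError as err:
    print(err)  # Unsupported cache kwargs: ttl
```

Failing fast this way surfaces misconfigurations (e.g. a caller expecting `ttl` to work) at the first call rather than letting entries silently persist forever.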
🧰 Tools
🪛 Ruff (0.14.14)
[warning] 117-117: Unused method argument: kwargs
(ARG002)
🤖 Prompt for AI Agents
In `@nemo_skills/inference/litellm_hybrid_cache.py` around lines 101 - 153, The
cache methods currently only validate kwargs in set_cache via _check_no_ttl,
letting unsupported kwargs silently pass in get_cache, batch_get_cache,
increment_cache and async variants; replace _check_no_ttl with a new
_validate_kwargs(self, **kwargs) that raises ValueError when kwargs is non-empty
and call _validate_kwargs at the start of all cache entry points (set_cache,
async_set_cache, async_set_cache_pipeline, get_cache, async_get_cache,
batch_get_cache, async_batch_get_cache, increment_cache, async_increment) so
every method fails fast on unsupported parameters.
Signed-off-by: dgitman <dgitman@nvidia.com>
Fixing the following: