
Add noc reasoning tutorial#1278

Merged
gwarmstrong merged 32 commits into main from acanaveras/add-noc-reasoning-tutorial
Feb 27, 2026

Conversation


gwarmstrong (Collaborator) commented Feb 25, 2026

Re-opened from #1242 using an origin branch so CI can run with secrets access.

Original PR by @acanaveras:

This is the Telco team. Per our conversation two weeks ago, we would like to add this tutorial, which creates a reasoning model for Telco NOC incidents.

See #1242 for full review history.

Summary by CodeRabbit

  • New Features
    • Added a complete NOC Reasoning Agent suite: end-to-end data pipelines, prompt libraries, tooling catalog, batch/resumable agent runners, evaluation scripts, visualization and token-analysis utilities, and synthetic-data generation/processing tools.
  • Documentation
    • Added a comprehensive tutorial covering setup, data preparation, synthetic data generation, reasoning-trace workflows, training, evaluation, and deployment guidance.

acanaveras and others added 24 commits February 13, 2026 03:28
Signed-off-by: Amparo Canaveras <acanaveras@nvidia.com>
Signed-off-by: Amparo Canaveras <acanaveras@nvidia.com>
…ripts

Signed-off-by: Amparo Canaveras <acanaveras@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
- Move __init__.py creation and PYTHONPATH export into a dedicated
  "Configure the environment" subsection in the tutorial
- Remove per-command PYTHONPATH=$PWD prefixes (now set once at top)
- Add all 40 synthetic fault_category values to ALLOWED_PROBLEM_CODES
  in filter_rows.py and create_input_jsonl_from_incidents.py
- Add matching reasoning process mappings in reasoning_processes.py

Signed-off-by: Amparo Canaveras <acanaveras@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Python 3.3+ namespace packages make __init__.py unnecessary.
The try/except fallback in create_input_jsonl handles relative imports
gracefully. Only PYTHONPATH is needed for all script imports.

Signed-off-by: Amparo Canaveras <acanaveras@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: rajeshwarid179 <rdevaramani@nvidia.com>
Signed-off-by: rajeshwarid179 <rdevaramani@nvidia.com>
Signed-off-by: rajeshwarid179 <rdevaramani@nvidia.com>
Signed-off-by: rajeshwarid179 <rdevaramani@nvidia.com>
Signed-off-by: rajeshwarid179 <rdevaramani@nvidia.com>
Co-authored-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: acanaveras <142839082+acanaveras@users.noreply.github.com>
Co-authored-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: acanaveras <142839082+acanaveras@users.noreply.github.com>
Co-authored-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: acanaveras <142839082+acanaveras@users.noreply.github.com>
Co-authored-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: acanaveras <142839082+acanaveras@users.noreply.github.com>
Co-authored-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: acanaveras <142839082+acanaveras@users.noreply.github.com>
Co-authored-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: acanaveras <142839082+acanaveras@users.noreply.github.com>
Co-authored-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: acanaveras <142839082+acanaveras@users.noreply.github.com>
Co-authored-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: acanaveras <142839082+acanaveras@users.noreply.github.com>
Signed-off-by: rajeshwarid179 <rdevaramani@nvidia.com>
Signed-off-by: rajeshwarid179 <rdevaramani@nvidia.com>
Co-authored-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: rajeshwarid179 <rdevaramani@nvidia.com>
Signed-off-by: rajeshwarid179 <rdevaramani@nvidia.com>
Signed-off-by: rajeshwarid179 <rdevaramani@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>

coderabbitai bot commented Feb 25, 2026

📝 Walkthrough

Adds a NOC Reasoning Agent feature: documentation, configs, prompt templates, and many new scripts for data ingestion, filtering, synthetic data generation, ReAct agent orchestration (single and batch), tool bindings, reasoning-trace compilation, evaluation, splitting, visualization, and CLI utilities.

Changes

Cohort / File(s) Summary
Documentation
docs/tutorials/posts/noc-reasoning-agent.md, .gitignore
Add comprehensive tutorial for the NOC reasoning pipeline; add site/ to .gitignore.
Configs
recipes/noc-reasoning-agent/configs/config.ini, recipes/noc-reasoning-agent/configs/noc_reasoning_sft.yaml, recipes/noc-reasoning-agent/configs/noc_reasoning_sft_6.yaml
New configuration files for model/data paths and detailed SFT data-prep pipelines (processors, filters, QC, manifest output).
Prompts
recipes/noc-reasoning-agent/prompts/...
prompt_incident.yaml, prompt_reasoning.yaml, shortened_prompt_reasoning.yaml, formatting_prompt.yaml
Add multiple prompt templates defining personas, tool schemas, reasoning protocols, output formats, and examples for incident diagnosis, chain-of-thought, and auditing outputs.
Agent Orchestration
recipes/noc-reasoning-agent/scripts/create_agent_with_tools.py, recipes/noc-reasoning-agent/scripts/create_agent_with_tools_batch.py, recipes/noc-reasoning-agent/scripts/ns_pipelines/prepare_react_agent.py
New resumable ReAct agent scripts (single and batch) with model/tokenizer loading, LangChain integration, tool-call parsing/mapping, checkpointing, error handling, and JSONL I/O.
Tool Implementations
recipes/noc-reasoning-agent/scripts/tools.py
Add ~18 domain tool helper functions and exported lists (ALL_TOOLS, ALL_TOOLS_STRING) used to simulate or bind tool responses from row data.
Data Filtering & Categorization
recipes/noc-reasoning-agent/scripts/filtering/filter_rows.py, recipes/noc-reasoning-agent/scripts/filtering/match_keywords.py
CSV/incident filtering utilities (auto-recovery, soft_solve, problem/close-code filters) and keyword-based close-code categorization.
Synthetic & Pipeline Scripts
recipes/noc-reasoning-agent/scripts/ns_pipelines/generate_synthetic_data.py
Synthetic data generation entrypoint wired to NeMo/vLLM pipeline with CLI controls for model, GPUs, temperature, and token generation.
Evaluation
recipes/noc-reasoning-agent/scripts/evaluation/...
evaluation_with_judge.py, problem_code_evaluation.py, score.py
Evaluation tooling: ROUGE/BERTScore batching, LLM-based judges for reasoning/conclusions, close-code matching utilities, and aggregated scoring script.
Reasoning Trace Processing & Utilities
recipes/noc-reasoning-agent/scripts/utils/...
format_reasoning_json.py, create_input_jsonl_from_incidents.py, schema_columns.py, split_incident_data.py, split_mocktools_answers.py, token_usage.py
Add schema constants, CSV→JSONL conversion, reasoning trace extraction/compilation/token conversion, dataset splitting, merging mock-tool answers, and token-usage analysis with plotting.
Visualization
recipes/noc-reasoning-agent/scripts/visualization/...
extract_representation_columns.py, extract_scores.py, generate_trace_visualization.py
Add enrichment, sampling, score extraction, and HTML trace visualization utilities that render parsed reasoning traces, work notes, and parsed steps.
Batching & IO helpers
recipes/noc-reasoning-agent/scripts/... (multiple new scripts)
Multiple CLI entrypoints, JSONL read/write helpers, resume/checkpoint helpers, and parsing utilities added across scripts for robust pipeline orchestration.

Sequence Diagram(s)

sequenceDiagram
    participant Input as JSONL Input
    participant Agent as ReAct Agent Script
    participant LLM as Language Model
    participant Tools as Local Tool Functions
    participant Checkpoint as Resume Store
    participant Output as JSONL Output

    Input->>Agent: load next row
    Agent->>Checkpoint: check processed indices / load history
    Agent->>LLM: send system + user messages (context, prompts)
    LLM-->>Agent: return generation
    alt generation contains <tool_call>
        Agent->>Tools: parse tool_call → invoke bound tool(row)
        Tools-->>Agent: return tool response
        Agent->>LLM: send observation + tool response (continue)
        LLM-->>Agent: return next generation (loop)
    else final answer produced
        Agent->>Output: write agent_response + metadata
        Agent->>Checkpoint: save processed index / state
    end
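The loop in the diagram can be sketched as a minimal resumable runner. This is a sketch only: `llm`, `tools`, the `<tool_call>` wire format, and the field names are stand-ins for the PR's actual interfaces, not the real implementation.

```python
import json
import re
from pathlib import Path

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def run_resumable(rows, llm, tools, out_path, state_path):
    """Minimal sketch of the batch loop in the diagram above.

    llm: callable(messages) -> str; tools: dict name -> callable(row, **kwargs).
    Processed indices are persisted to state_path so reruns skip finished rows.
    """
    done = set(json.loads(Path(state_path).read_text())) if Path(state_path).exists() else set()
    with open(out_path, "a") as out:
        for idx, row in enumerate(rows):
            if idx in done:
                continue  # resume: skip already-processed indices
            messages = [{"role": "user", "content": row["incident"]}]
            while True:
                generation = llm(messages)
                m = TOOL_CALL_RE.search(generation)
                if not m:
                    break  # final answer produced, exit the ReAct loop
                call = json.loads(m.group(1))
                observation = tools[call["name"]](row, **call.get("arguments", {}))
                # feed the tool observation back and continue generating
                messages += [
                    {"role": "assistant", "content": generation},
                    {"role": "tool", "content": json.dumps(observation)},
                ]
            out.write(json.dumps({"index": idx, "agent_response": generation}) + "\n")
            done.add(idx)
            Path(state_path).write_text(json.dumps(sorted(done)))
```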

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested labels

enhancement

Suggested reviewers

  • Kipok
  • ekmb
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage: ⚠️ Warning. Docstring coverage is 62.38%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
Title Check: ✅ Passed. The title "Add noc reasoning tutorial" directly describes the main change: introducing a comprehensive tutorial for a NOC reasoning agent that diagnoses telecom incidents.



coderabbitai bot left a comment

Actionable comments posted: 2

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

🟠 Major comments (28)
recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py-1-1 (1)

1-1: ⚠️ Potential issue | 🟠 Major

Add the required copyright header to unblock CI.

Pipeline reports a blocking error for missing copyright header in the first 10 lines.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py` at line 1,
This file (split_incident_data.py) is missing the required copyright header in
the first 10 lines; add the project's standard copyright/license header comment
at the very top of the file (before any code such as the existing import
argparse) so the header appears within the first 10 lines and matches the
repository's canonical header format.
recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py-8-10 (1)

8-10: ⚠️ Potential issue | 🟠 Major

Validate split parameters before computing train/test sizes.

Line 26/27 can silently produce invalid splits when test_size is out of range (e.g., negative or >1), and max_examples <= 0 also produces confusing outcomes.

Proposed fix
 def split_generation_field(
     input_folder: str, train_out: str, test_out: str, test_size: float = 0.2, seed: int = 42, max_examples=10000
 ):
+    if not (0.0 < test_size < 1.0):
+        raise ValueError(f"test_size must be in (0, 1), got {test_size}")
+    if max_examples <= 0:
+        raise ValueError(f"max_examples must be > 0, got {max_examples}")
+
     train_path = Path(train_out)
     test_path = Path(test_out)

Also applies to: 26-27

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py` around
lines 8 - 10, In split_generation_field, add input validation at the top to
guard against invalid split params: verify that test_size is a float in the
range (0.0, 1.0) or at least 0.0 <= test_size <= 1.0 per your intended behavior
(reject negative or >1 values) and raise a ValueError with a clear message if
out of range; also validate max_examples is an int > 0 (raise ValueError if <=
0) and optionally validate seed is an int; place these checks in the
split_generation_field function before any computation of train/test sizes so
downstream math cannot produce confusing results.
recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py-15-16 (1)

15-16: ⚠️ Potential issue | 🟠 Major

Fix --input_dir contract mismatch (default run currently fails).

Line 15 treats input_folder as a directory containing iteration_0.jsonl, but Line 88 describes/defaults it as a JSONL file (output_incident.jsonl). With defaults, the script resolves output_incident.jsonl/iteration_0.jsonl and crashes.

Proposed fix
-    parser.add_argument("--input_dir", help="Path to input_dir JSONL file", default="output_incident.jsonl")
+    parser.add_argument(
+        "--input_dir",
+        help="Path to directory containing iteration_*.jsonl files",
+        default="output",
+    )
     args = parser.parse_args()

+    if not Path(args.input_dir).is_dir():
+        raise ValueError(f"--input_dir must be an existing directory: {args.input_dir}")
+
     split_generation_field(args.input_dir, args.train_output, args.test_output)

Also applies to: 88-88

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py` around
lines 15 - 16, The code assumes input_folder is a directory and unconditionally
opens os.path.join(input_folder, "iteration_0.jsonl"), which breaks when the CLI
default passes a JSONL file name; update the opening logic to handle both a file
path and a directory: compute an input_path variable that is input_folder if
os.path.isfile(input_folder) else os.path.join(input_folder,
"iteration_0.jsonl"), then open input_path (the loop that currently opens with
open(os.path.join(input_folder, "iteration_0.jsonl"), ...)). Also ensure any CLI
help/defaults (where input_folder is defined) remain as-is or are updated
consistently so the script accepts either a file or a directory.
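The file-or-directory handling the prompt describes fits in one helper (`resolve_input_path` is a hypothetical name, not a function in the PR):

```python
import os

def resolve_input_path(input_arg, default_name="iteration_0.jsonl"):
    """Accept either a JSONL file path or a directory containing default_name."""
    if os.path.isfile(input_arg):
        return input_arg
    # treat the argument as a directory and look for the expected file inside it
    return os.path.join(input_arg, default_name)
```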
recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py-61-63 (1)

61-63: ⚠️ Potential issue | 🟠 Major

Guard background access to avoid hard-failing on sparse rows.

Line 61 uses row["background"] directly. If a row is missing that key, the whole run aborts with KeyError instead of skipping or handling the bad row.

Proposed fix
                     if resolution:
                         row["expected"] = f"Close Code: [{resolution}]"
-                        row["initial_background"] = row["background"]
+                        background = row.get("background")
+                        if background is None:
+                            continue
+                        row["initial_background"] = background
                         row["background"] = "\n<think>\n"
                         current_iteration_test.append(row)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py` around
lines 61 - 63, The code directly indexes row["background"] which can raise
KeyError for sparse rows; update the block around assigning
row["initial_background"] and row["background"] (and before appending to
current_iteration_test) to guard access with row.get("background") or an
explicit "if 'background' in row" check: if present, set
row["initial_background"] = row["background"] and row["background"] =
"\n<think>\n" and append; otherwise handle the bad row by either skipping
appending (continue), setting a safe default (e.g., empty string) for
row["initial_background"], or logging the missing key so the run does not
hard-fail. Ensure changes reference the same variables row,
"initial_background", "background", and current_iteration_test.
recipes/noc-reasoning-agent/scripts/visualization/extract_representation_columns.py-28-28 (1)

28-28: ⚠️ Potential issue | 🟠 Major

NaN values from pandas produce invalid JSON at serialization time.

to_dict(orient="index") preserves pandas float('nan') for any missing cell in time_to_resolve, solved_category, or solved_reason. Python's json module serializes nan as a bare NaN token, which is not valid JSON (and it raises ValueError when allow_nan=False). Any row whose CSV entry has at least one empty cell will therefore produce unparseable output from json.dumps on Lines 60 and 151.

🐛 Proposed fix — sanitize NaN to None when building the index
-csv_idx = df.set_index(id_col)[["time_to_resolve", "solved_category", "solved_reason"]].to_dict(orient="index")
+import math
+
+def _nan_to_none(v):
+    return None if (isinstance(v, float) and math.isnan(v)) else v
+
+csv_idx = {
+    k: {field: _nan_to_none(val) for field, val in row.items()}
+    for k, row in df.set_index(id_col)[["time_to_resolve", "solved_category", "solved_reason"]].to_dict(orient="index").items()
+}

Alternatively, fill NaN before building the dict:

-csv_idx = df.set_index(id_col)[["time_to_resolve", "solved_category", "solved_reason"]].to_dict(orient="index")
+csv_idx = (
+    df.set_index(id_col)[["time_to_resolve", "solved_category", "solved_reason"]]
+    .where(df[["time_to_resolve", "solved_category", "solved_reason"]].notna(), other=None)
+    .to_dict(orient="index")
+)

Or the simplest one-liner:

-csv_idx = df.set_index(id_col)[["time_to_resolve", "solved_category", "solved_reason"]].to_dict(orient="index")
+_slice = df.set_index(id_col)[["time_to_resolve", "solved_category", "solved_reason"]]
+csv_idx = _slice.where(_slice.notna(), other=None).to_dict(orient="index")
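A quick stdlib check of the serialization behavior at issue (json.dumps emits a non-standard NaN token by default and refuses only under allow_nan=False; sanitizing to None yields valid JSON):

```python
import json

# default json.dumps accepts nan but emits a bare NaN token,
# which strict JSON parsers reject
assert json.dumps({"x": float("nan")}) == '{"x": NaN}'

# with allow_nan=False the stdlib refuses outright
try:
    json.dumps({"x": float("nan")}, allow_nan=False)
    raise AssertionError("expected ValueError")
except ValueError:
    pass

# sanitizing nan to None produces standards-compliant output
assert json.dumps({"x": None}) == '{"x": null}'
```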
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@recipes/noc-reasoning-agent/scripts/visualization/extract_representation_columns.py`
at line 28, The current creation of csv_idx keeps pandas NaN values which
json.dumps will reject; update the csv_idx construction (the expression that
uses
df.set_index(id_col)[["time_to_resolve","solved_category","solved_reason"]].to_dict(orient="index"))
to sanitize missing values to Python None before calling to_dict—e.g., after
selecting those columns convert NaN to None (using pandas' .replace or
.where/.isna) so that csv_idx contains None instead of float('nan'), preserving
the same id_col indexing and column names.
recipes/noc-reasoning-agent/scripts/ns_pipelines/generate_synthetic_data.py-1-1 (1)

1-1: ⚠️ Potential issue | 🟠 Major

Add the required copyright header.

CI reports this file is missing the mandatory header in the first 10 lines.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/ns_pipelines/generate_synthetic_data.py`
at line 1, Add the project's mandatory copyright header to the top of
recipes/noc-reasoning-agent/scripts/ns_pipelines/generate_synthetic_data.py so
it appears within the first 10 lines; update the file header above the existing
import argparse (and any shebang or encoding lines) to match the repository's
standard copyright/license block used elsewhere.
recipes/noc-reasoning-agent/prompts/shortened_prompt_reasoning.yaml-30-45 (1)

30-45: ⚠️ Potential issue | 🟠 Major

Use one canonical tool name across references and examples.

The tool catalog and few-shot examples currently use different names for the same action (Remote_Unlock/Restart/Restore vs Execute_Remote_Action). This inconsistency will hurt tool-call reliability.

Also applies to: 99-100, 122-123

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/prompts/shortened_prompt_reasoning.yaml` around
lines 30 - 45, The prompt uses inconsistent tool names (e.g.,
Remote_Unlock/Restart/Restore vs Execute_Remote_Action); pick one canonical tool
identifier and replace all variants in the tool catalog and examples (references
around the listed tool "Remote_Unlock/Restart/Restore(Element, Action)" and any
mentions at lines ~99-100 and ~122-123) so every example, description, and tool
reference uses that single name and signature (e.g.,
Execute_Remote_Action(Element, Action) or Remote_Unlock_Restart_Restore(Element,
Action)), updating any example calls and return descriptions to match the chosen
canonical symbol.
recipes/noc-reasoning-agent/scripts/visualization/extract_scores.py-1-1 (1)

1-1: ⚠️ Potential issue | 🟠 Major

Add the required copyright header.

CI is currently failing on this file because the header is missing in the first 10 lines.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/visualization/extract_scores.py` at line
1, Add the required project copyright/header to the top of the file so it
appears within the first 10 lines (i.e., place the header above the existing
import json). Ensure the header matches the repository's standard
copyright/license block (same wording and year/owner format used in other source
files) and save the file.
recipes/noc-reasoning-agent/scripts/ns_pipelines/generate_synthetic_data.py-8-8 (1)

8-8: ⚠️ Potential issue | 🟠 Major

Unify output directory path usage.

The script creates outputs/sdg_reason but writes to /workspace/outputs/sdg_reason/; these can diverge and cause unexpected file placement/errors.

Also applies to: 22-22

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/ns_pipelines/generate_synthetic_data.py`
at line 8, The script creates the directory with
os.makedirs("outputs/sdg_reason", exist_ok=True) but later writes files to
"/workspace/outputs/sdg_reason/", causing inconsistent paths; unify by
introducing a single output_dir variable (e.g., output_dir =
os.path.join("outputs", "sdg_reason") or configurable BASE_OUTPUT) and use that
variable both in the os.makedirs call and in all file write paths (replace
hardcoded "/workspace/outputs/sdg_reason/" usages), ensuring any functions or
writes in generate_synthetic_data.py reference this output_dir consistently.
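The single-variable approach could look like the sketch below (`BASE_OUTPUT` and `out_path` are hypothetical names, not identifiers from the script):

```python
import os

# one source of truth for the output location; BASE_OUTPUT is a hypothetical
# override so container runs (e.g. /workspace) and local runs resolve the same way
BASE_OUTPUT = os.environ.get("BASE_OUTPUT", "outputs")
OUTPUT_DIR = os.path.join(BASE_OUTPUT, "sdg_reason")
os.makedirs(OUTPUT_DIR, exist_ok=True)

def out_path(name):
    # every file write goes through the same variable the makedirs call used,
    # so the created directory and the write target cannot diverge
    return os.path.join(OUTPUT_DIR, name)
```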
recipes/noc-reasoning-agent/scripts/ns_pipelines/generate_synthetic_data.py-25-25 (1)

25-25: ⚠️ Potential issue | 🟠 Major

--llm selection is disconnected from the model actually used.

The CLI only accepts qwen2.5-32b-instruct, but generation is hardcoded to openai/gpt-oss-120b; plus the OSS branch is unreachable with current choices.

💡 Suggested fix
-def generate_synthetic_data(args, cluster, num_gpus, step=None, input_format_file=None):
+def generate_synthetic_data(args, cluster, num_gpus, model, step=None, input_format_file=None):
@@
-        model="openai/gpt-oss-120b",
+        model=model,
@@
-    parser.add_argument(
-        "--llm", type=str, default="qwen2.5-32b-instruct",
-        choices=["qwen2.5-32b-instruct"],
-        help="The LLM to use for generation",
-    )
+    parser.add_argument(
+        "--llm",
+        type=str,
+        default="qwen2.5-32b-instruct",
+        choices=["qwen2.5-32b-instruct", "gpt-oss-120b"],
+        help="The LLM to use for generation",
+    )
@@
-    if args.llm == "qwen2.5-32b-instruct":
+    model_map = {
+        "qwen2.5-32b-instruct": "Qwen/Qwen3-32B",
+        "gpt-oss-120b": "openai/gpt-oss-120b",
+    }
+    if args.llm == "qwen2.5-32b-instruct":
         generate_synthetic_data(
-            args, cluster, num_gpus, step=1, input_format_file="/workspace/outputs/sdg/formatted_output.json"
+            args, cluster, num_gpus, model=model_map[args.llm], step=1,
+            input_format_file="/workspace/outputs/sdg/formatted_output.json"
         )
     else:
         generate_synthetic_data_oss_gpt(args, cluster, num_gpus)

Also applies to: 49-51, 60-65
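The suggested mapping can be exercised in isolation; the CLI-name to model-id pairs below mirror the review diff and should be treated as assumptions, not the script's confirmed values:

```python
import argparse

# hypothetical CLI-name -> model-id map, mirroring the review's suggestion
MODEL_MAP = {
    "qwen2.5-32b-instruct": "Qwen/Qwen3-32B",
    "gpt-oss-120b": "openai/gpt-oss-120b",
}

def resolve_model(argv=None):
    # choices derived from the map, so adding a model updates the CLI too
    parser = argparse.ArgumentParser()
    parser.add_argument("--llm", default="qwen2.5-32b-instruct",
                        choices=sorted(MODEL_MAP))
    args = parser.parse_args(argv)
    return MODEL_MAP[args.llm]
```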

recipes/noc-reasoning-agent/prompts/formatting_prompt.yaml-104-109 (1)

104-109: ⚠️ Potential issue | 🟠 Major

Output contract is contradictory (JSON only vs /think).

The prompt asks for strict JSON with no surrounding text, but the footer adds /think. That creates non-JSON output and parser failures.

Also applies to: 189-191

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/prompts/formatting_prompt.yaml` around lines 104
- 109, The prompt's "Strict JSON Output Format" requirement conflicts with the
footer's "/think" token, causing non-JSON output; edit the
formatting_prompt.yaml to remove or relocate the "/think" footer so the final
output can be a pure JSON array (update the "Strict JSON Output Format" section
and any footer text containing "/think"), and apply the same removal/adjustment
to the other occurrence noted around lines 189-191 to ensure no non-JSON tokens
remain.
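The parser failure is easy to reproduce: any trailing non-JSON token makes strict parsing fail (the payload below is a made-up example, not output from the prompt):

```python
import json

# a JSON array with the stray footer token appended, as the contradictory prompt produces
raw = '[{"step": 1, "action": "diagnose"}]\n/think'
try:
    json.loads(raw)
    raise AssertionError("expected JSONDecodeError")
except json.JSONDecodeError:
    pass  # "Extra data": the /think footer breaks strict parsing

# with the footer removed, the array parses cleanly
assert json.loads(raw.split("\n/think")[0]) == [{"step": 1, "action": "diagnose"}]
```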
recipes/noc-reasoning-agent/prompts/prompt_reasoning.yaml-30-37 (1)

30-37: ⚠️ Potential issue | 🟠 Major

Align tool names between the tool list and few-shot examples.

The current mismatch (Remote_Unlock/Restart/Restore vs Execute_Remote_Action) introduces ambiguity in generated tool calls.

Also applies to: 100-101, 131-132

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/prompts/prompt_reasoning.yaml` around lines 30 -
37, The tool list contains names that don't match the few-shot examples (e.g.,
Remote_Unlock/Restart/Restore vs Execute_Remote_Action); update the tool
definitions and any occurrences in the examples to use the canonical names used
in the few-shot calls (rename Remote_Unlock/Restart/Restore ->
Execute_Remote_Action and align other tools such as Orchestration_tool,
Check_Apply_Configuration, Triage_Toolkit_Tool, Check_remote_files to the exact
identifiers used in the examples), ensuring every tool declaration and every
example call in this prompt file use the same identifier strings so generated
tool calls are unambiguous.
recipes/noc-reasoning-agent/configs/noc_reasoning_sft.yaml-58-58 (1)

58-58: ⚠️ Potential issue | 🟠 Major

remove_no_think_tags value conflicts with its comment.

The setting is false while the comment says “Enabled.” This can silently keep samples without </think> and degrade format consistency.

💡 Suggested fix
-  remove_no_think_tags: false         # Enabled, as requested
+  remove_no_think_tags: true          # Enabled, as requested
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/configs/noc_reasoning_sft.yaml` at line 58, The
YAML key remove_no_think_tags is set to false while its inline comment says
"Enabled," causing a mismatch; update the configuration so the value matches the
intended behavior—either set remove_no_think_tags: true to enable removal of
</think> tags or change the comment to reflect that it's disabled—ensure the
change is applied to the remove_no_think_tags entry in noc_reasoning_sft.yaml so
samples are processed consistently.
recipes/noc-reasoning-agent/prompts/prompt_incident.yaml-14-14 (1)

14-14: ⚠️ Potential issue | 🟠 Major

apply_configuration should require config_type in the schema.

Right now the JSON schema marks config_type optional, which can produce incomplete tool calls for this action.

💡 Suggested fix
-  {{"type": "function", "function": {{"name": "apply_configuration", "description": "...", "parameters": {{"type": "object", "properties": {{"element_id": {{"type": "string", "description": "The identifier of the element to validate and apply configuration to."}}, "config_type": {{"type": "string", "description": "The type of configuration to apply."}}}}, "required": ["element_id"]}}}}
+  {{"type": "function", "function": {{"name": "apply_configuration", "description": "...", "parameters": {{"type": "object", "properties": {{"element_id": {{"type": "string", "description": "The identifier of the element to validate and apply configuration to."}}, "config_type": {{"type": "string", "description": "The type of configuration to apply."}}}}, "required": ["element_id", "config_type"]}}}}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/prompts/prompt_incident.yaml` at line 14, The
JSON schema for the function apply_configuration currently omits config_type
from the required fields; update the function definition in prompt_incident.yaml
(apply_configuration) so that the schema's "required" array includes
"config_type" (in addition to "element_id") to ensure config_type is enforced
when generating tool calls.
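The effect of adding config_type to required can be checked with a minimal validator (`missing_required` is a hypothetical helper; a real setup would use a JSON Schema library):

```python
# schema shape matching the fix above, with config_type enforced
APPLY_CONFIGURATION_SCHEMA = {
    "type": "object",
    "properties": {
        "element_id": {"type": "string"},
        "config_type": {"type": "string"},
    },
    "required": ["element_id", "config_type"],
}

def missing_required(arguments, schema):
    # report which required parameters a tool call failed to supply
    return [k for k in schema["required"] if k not in arguments]
```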
recipes/noc-reasoning-agent/scripts/visualization/extract_scores.py-36-44 (1)

36-44: ⚠️ Potential issue | 🟠 Major

Guard sample display columns before slicing to match defensive practices used for metrics.

This block can crash with KeyError when one of the display fields is absent in the JSONL rows. Note that lines 20–24 defensively check for available metrics, but lines 36–44 assume all display columns exist without validation.

💡 Suggested fix
 columns_to_display = ["expected_answer", "agent_response", "llm_judge_reason"]
-df_subset = df[columns_to_display].head(10)
+available_display_cols = [c for c in columns_to_display if c in df.columns]
+if not available_display_cols:
+    raise ValueError("No sample display columns found in the JSONL file")
+df_subset = df[available_display_cols].head(10)

 # Display neatly
 for idx, row in df_subset.iterrows():
     print(f"\n--- Sample {idx + 1} ---")
-    print(f"True Answer (expected_answer): {row['expected_answer']}")
-    print(f"Model Answer (agent_response): {row['agent_response']}")
-    print(f"Judge Explanation (llm_judge_reason): {row['llm_judge_reason']}")
+    for col in available_display_cols:
+        print(f"{col}: {row.get(col, '<missing>')}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/visualization/extract_scores.py` around
lines 36 - 44, The sample display assumes all columns in columns_to_display
exist and can raise KeyError; update the logic around columns_to_display /
df_subset to defensively filter to only columns present in df (e.g., compute
available_display_columns = [c for c in columns_to_display if c in df.columns]
and slice df with that), and when iterating rows (df_subset / row) use safe
access (row.get(...) or fallback text) for each field like "expected_answer",
"agent_response", and "llm_judge_reason" so missing fields are handled
gracefully and the print loop never crashes.
recipes/noc-reasoning-agent/scripts/filtering/match_keywords.py-254-261 (1)

254-261: ⚠️ Potential issue | 🟠 Major

pattern parameter is unused — precompiled regex patterns are dead code.

find_keyword_matches accepts a pattern argument (the compiled regex) but never uses it — it only does a normalized string lookup in the keywords list. Meanwhile, hw_pattern/sw_pattern are compiled on lines 281-282 and passed on lines 290/296 but silently discarded. Either the regex matching was the intended behavior (in which case this is a bug), or the exact-match is correct and the regex compilation + parameter should be removed.

Option A: Remove dead code (if exact-match is intended)
-def find_keyword_matches(row, pattern, keywords):
+def find_keyword_matches(row, keywords):
     """Finds which specific keywords from a list match within a DataFrame row."""
     resolution_method = row.get("resolution_method", row.get("close_code", ""))
     if resolution_method and str(resolution_method).lower().replace(" ", "") in keywords:
         return True
     return False
 ...
-    hw_pattern = re.compile("|".join(re.escape(k) for k in hw_keywords), flags=re.IGNORECASE)
-    sw_pattern = re.compile("|".join(re.escape(k) for k in sw_keywords), flags=re.IGNORECASE)
 ...
-        hw_matches = find_keyword_matches(row, hw_pattern, hw_keywords)
+        hw_matches = find_keyword_matches(row, hw_keywords)
 ...
-        sw_matches = find_keyword_matches(row, sw_pattern, sw_keywords)
+        sw_matches = find_keyword_matches(row, sw_keywords)
Option B: Actually use the regex (if substring matching was intended)
-def find_keyword_matches(row, pattern, keywords):
+def find_keyword_matches(row, pattern):
     """Finds which specific keywords from a list match within a DataFrame row."""
     resolution_method = row.get("resolution_method", row.get("close_code", ""))
-    if resolution_method and str(resolution_method).lower().replace(" ", "") in keywords:
+    if resolution_method and pattern.search(str(resolution_method)):
         return True
     return False

Also applies to: 280-282, 288-299

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/filtering/match_keywords.py` around lines
254 - 261, The function find_keyword_matches currently accepts a pattern param
and compiled hw_pattern/sw_pattern are created elsewhere but never used; either
remove the dead regex code or apply the regex match — choose one: if exact
normalized-keyword equality is intended, remove the pattern parameter from
find_keyword_matches and delete hw_pattern/sw_pattern creation and their callers
(replace calls that pass pattern with just keywords), making resolution_method
normalization and keyword lookup the sole logic; otherwise, if substring/regex
matching was intended, update find_keyword_matches to use the passed-in pattern
(e.g., convert resolution_method to str and call pattern.search(...) or
pattern.match(...) on it after normalization) and keep hw_pattern/sw_pattern and
their usage at the call sites (lines creating hw_pattern/sw_pattern and lines
passing them to find_keyword_matches).
recipes/noc-reasoning-agent/scripts/utils/create_input_jsonl_from_incidents.py-13-74 (1)

13-74: 🛠️ Refactor suggestion | 🟠 Major

ALLOWED_PROBLEM_CODES is duplicated verbatim with filter_rows.py.

This exact same list (legacy + synthetic problem codes) appears in both filter_rows.py (as ALLOWED_PROBLEM_CODES_LEGACY + ALLOWED_PROBLEM_CODES_SYNTHETIC) and here. A single source of truth in one module (e.g., schema_columns.py or filter_rows.py) that both scripts import from would eliminate the maintenance risk of the lists drifting apart.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@recipes/noc-reasoning-agent/scripts/utils/create_input_jsonl_from_incidents.py`
around lines 13 - 74, ALLOWED_PROBLEM_CODES in
create_input_jsonl_from_incidents.py is a verbatim duplicate of the lists used
in filter_rows.py; extract the canonical lists (e.g.,
ALLOWED_PROBLEM_CODES_LEGACY and ALLOWED_PROBLEM_CODES_SYNTHETIC or a single
ALLOWED_PROBLEM_CODES) into a shared module such as schema_columns.py (or make
filter_rows.py the source), export the constant(s) from that module, then update
create_input_jsonl_from_incidents.py and filter_rows.py to import the shared
constant(s) instead of defining them locally and remove the duplicated
definition in create_input_jsonl_from_incidents.py.
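The single-source-of-truth refactor suggested above could look like the following sketch. The code values shown are placeholders, not the recipe's real 40-entry list; the structure is what matters:

```python
# schema_columns.py — hypothetical shared module for the canonical code lists.
# Values here are illustrative placeholders, not the recipe's actual codes.
ALLOWED_PROBLEM_CODES_LEGACY = ["hardware_failure", "software_failure"]
ALLOWED_PROBLEM_CODES_SYNTHETIC = ["fiber_cut", "power_outage"]

# Combined list consumed by scripts that do not care about the distinction.
ALLOWED_PROBLEM_CODES = ALLOWED_PROBLEM_CODES_LEGACY + ALLOWED_PROBLEM_CODES_SYNTHETIC

# filter_rows.py and create_input_jsonl_from_incidents.py would then import
# the constant instead of redefining it:
#   from scripts.utils.schema_columns import ALLOWED_PROBLEM_CODES
```

Any new synthetic category is then added in exactly one place, and the two scripts cannot drift apart.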
recipes/noc-reasoning-agent/scripts/utils/token_usage.py-90-91 (1)

90-91: ⚠️ Potential issue | 🟠 Major

Unclosed file handle — resource leak when counting lines.

open(jsonl_path, ...) on line 91 is not wrapped in a with statement or explicitly closed. This leaks a file descriptor.

Proposed fix
     with open(jsonl_path, "r", encoding="utf-8") as f:
-        total_lines = sum(1 for _ in open(jsonl_path, "r", encoding="utf-8"))
+        total_lines = sum(1 for _ in f)
+        f.seek(0)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/utils/token_usage.py` around lines 90 -
91, The line that computes total_lines opens jsonl_path a second time without a
with, leaking a file descriptor; modify the code so you only open the file once
(reuse the already-open file object f) or wrap the second open in its own with.
For example, change total_lines = sum(1 for _ in open(jsonl_path, ...)) to
total_lines = sum(1 for _ in f) (or use a separate with-open for jsonl_path) so
the file handle is properly closed; reference the jsonl_path variable and the
total_lines calculation to locate the change.
recipes/noc-reasoning-agent/scripts/utils/split_mocktools_answers.py-56-57 (1)

56-57: ⚠️ Potential issue | 🟠 Major

Error messages written to stdout may corrupt JSONL output.

When --output is not specified, data is printed to stdout (lines 113-114). But JSON decode errors are also printed to stdout (lines 57 and 71), which would interleave error messages with JSONL data and break downstream consumers.

🐛 Proposed fix: write errors to stderr
+import sys
+
 ...
-                    print(f"Error decoding JSON in file1 on line {line_num}: '{line}'. Error: {e}")
+                    print(f"Error decoding JSON in file1 on line {line_num}: '{line}'. Error: {e}", file=sys.stderr)
 ...
-                    print(f"Error decoding JSON in file2 on line {line_num}: '{line}'. Error: {e}")
+                    print(f"Error decoding JSON in file2 on line {line_num}: '{line}'. Error: {e}", file=sys.stderr)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/utils/split_mocktools_answers.py` around
lines 56 - 57, The error printouts from the JSONDecodeError handlers (the "Error
decoding JSON in file1..." and the similar handler) are writing to stdout and
will corrupt the JSONL output emitted when --output is unset; change those
prints to write to stderr (e.g., print(..., file=sys.stderr)) and add an import
for sys if needed, ensuring both JSONDecodeError exception handlers and any
other diagnostic prints use stderr, while leaving unchanged the normal JSONL print logic for the case where --output is unset.
recipes/noc-reasoning-agent/scripts/ns_pipelines/prepare_react_agent.py-22-53 (1)

22-53: ⚠️ Potential issue | 🟠 Major

get_tools has inconsistent return types and an unhandled json.loads exception.

Two issues:

  1. The function returns either a tuple (None, None) (line 51) or a single dict (line 53). Callers must know to compare against (None, None) (line 72), which is fragile and unconventional. Prefer returning None or an empty dict for the "no tools" case and always a dict otherwise.

  2. The try/except around json.loads on line 39 was commented out (lines 33, 46-47). If any <tool_call> block contains malformed JSON, json.loads will raise an unhandled JSONDecodeError that crashes the entire script.

Proposed fix
 def get_tools(text):
     matches = {}
 
     tool_calls = re.findall(r"<tool_call>(.*?)</tool_call>", text, flags=re.DOTALL)
     tool_response = re.findall(r"<tool_response>(.*?)</tool_response>", text, flags=re.DOTALL)
-    # print(tool_calls)
     if len(tool_calls) != len(tool_response):
         raise ValueError(f"Mismatch: {len(tool_calls)} tool_calls vs {len(tool_response)} tool_responses")
     for i in range(len(tool_calls)):
-        # try:
         tool_block = tool_calls[i]
         response_block = tool_response[i]
         tool_json_str = tool_block.strip()
-
-        tool_data = json.loads(tool_json_str)
-        response = response_block.strip()
-        tool_name = tool_data["name"]
-        arguments = tool_data["arguments"]
-
-        matches[tool_name] = {"arguments": arguments, "response": response}
-
-    # except json.JSONDecodeError as e:
-    #     print(f"Skipping invalid JSON: {e}")
+        try:
+            tool_data = json.loads(tool_json_str)
+        except json.JSONDecodeError as e:
+            print(f"Skipping invalid tool JSON: {e}")
+            continue
+        response = response_block.strip()
+        tool_name = tool_data["name"]
+        arguments = tool_data["arguments"]
+        matches[tool_name] = {"arguments": arguments, "response": response}
 
     if not matches:
-        # print("No tools!")
-        return None, None
-    # print(matches)
+        return None
     return matches

Then update the caller:

-                        if matches == (None, None):
+                        if matches is None:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/ns_pipelines/prepare_react_agent.py`
around lines 22 - 53, The get_tools function currently returns inconsistent
types and does not handle JSON decoding errors; change get_tools to always
return a single consistent type (prefer returning an empty dict {} when no tools
are found instead of (None, None)) and wrap the json.loads call inside a
try/except catching json.JSONDecodeError so malformed <tool_call> blocks are
skipped (optionally log or collect the error) rather than raising; update the
loop that processes tool_calls/tool_response (references: variables tool_calls,
tool_response, tool_block, tool_data, response, tool_name, arguments) to
continue on decode errors and ensure the final return is always a dict (possibly
empty) and update callers to expect a dict return.
recipes/noc-reasoning-agent/scripts/create_agent_with_tools.py-101-210 (1)

101-210: 🛠️ Refactor suggestion | 🟠 Major

Tool definitions, bind_tools, MemorySaver, and create_react_agent are all recreated on every row — significant overhead.

All 11 @tool functions, llm_with_tools, checkpointer, and react_agent are re-created inside the loop on every iteration. This is extremely expensive. The @tool closures also trigger Ruff B023 (late-binding closure over loop variable row). While safe here because tools are consumed in the same iteration, the pattern is fragile.

Consider the approach used in the batch variant (create_agent_with_tools_batch.py): create the agent once outside the loop and bind row data at call time via a lookup dict.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/create_agent_with_tools.py` around lines
101 - 210, The tool functions (Check_Alarm_Status, Check_Element_Neighbors, ...,
Check_Remote_Dump_Files), llm_with_tools = chat_llm.bind_tools, checkpointer =
MemorySaver(), and react_agent = create_react_agent are being recreated per-row
and capture the loop variable row; move the tool definitions and agent creation
out of the loop so they are defined once, then at call time pass a row-specific
lookup (e.g., a dict mapping keys like "Check_Alarm_Status_answer") or wrap each
tool in a small factory that accepts the row as an argument to avoid
late-binding of row; update code that invokes the tools to supply the current
row's lookup so bind_tools/llm and react_agent are reused across iterations.
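The factory pattern the review recommends can be sketched as follows. The tool name and the `Check_Alarm_Status_answer` key are assumptions based on the review text, and the real tools would be decorated with LangChain's `@tool`; the sketch only shows the closure discipline that avoids the B023 late-binding hazard:

```python
def make_tool(answer_key: str):
    """Build a tool whose response is looked up per call, not per definition."""
    def tool_fn(current_row: dict) -> str:
        # Early binding: answer_key is fixed when make_tool runs, so the
        # closure never captures a mutating loop variable like `row`.
        return current_row.get(answer_key, "no data")
    return tool_fn

# Created once, outside any per-row loop.
check_alarm_status = make_tool("Check_Alarm_Status_answer")

rows = [
    {"Check_Alarm_Status_answer": "CRITICAL: LOS on port 3"},
    {"Check_Alarm_Status_answer": "all clear"},
]
# The same tool object is reused for every row; only the lookup data changes.
results = [check_alarm_status(r) for r in rows]
```

Binding the agent and tools once and passing row data at invocation time also removes the per-iteration cost of `bind_tools`, `MemorySaver()`, and `create_react_agent`.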
recipes/noc-reasoning-agent/scripts/visualization/generate_trace_visualization.py-206-209 (1)

206-209: ⚠️ Potential issue | 🟠 Major

df.query() with f-string is fragile and injectable.

If selected_criteria contains a single quote (e.g., Alzheimer's), the query string breaks. Use boolean indexing instead.

Proposed fix
     if selected_criteria:
-        filtered_df = df.query(f"category == '{selected_criteria}'")
+        filtered_df = df[df["category"] == selected_criteria]
     else:
         filtered_df = df
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@recipes/noc-reasoning-agent/scripts/visualization/generate_trace_visualization.py`
around lines 206 - 209, The code uses df.query(f"category ==
'{selected_criteria}'") which is fragile and can break or be injectable when
selected_criteria contains quotes; change to boolean indexing by replacing the
query usage so filtered_df is assigned via df[df['category'] ==
selected_criteria] (or df.loc[df['category'] == selected_criteria]) when
selected_criteria is truthy, otherwise keep filtered_df = df; update the logic
around the selected_criteria check to handle empty/None values consistently and
avoid constructing query strings.
recipes/noc-reasoning-agent/scripts/visualization/generate_trace_visualization.py-162-187 (1)

162-187: ⚠️ Potential issue | 🟠 Major

Unescaped HTML injection in reasoning_trace and finish_action.

reasoning_trace (line 178) and finish_action (line 184) are inserted directly into the HTML output without escaping via esc(). If the model output contains characters like <script>, this produces a malformed or exploitable HTML document. While this is an offline report, it's still a correctness issue — any <, >, or & in the trace will break HTML rendering.

Proposed fix
         <h3>Full Reasoning Trace</h3>
-            <div class='work-notes-container'>{reasoning_trace}</div>
+            <div class='work-notes-container'><pre>{esc(reasoning_trace)}</pre></div>
         <h3>Thoughts, Observations, Actions</h3>
         <div class="steps">
             {"".join(steps_html)}
         </div>
         <h3> Closing Notes</h3>
-            <div class='work-notes-container'>{finish_action}</div>
+            <div class='work-notes-container'><pre>{esc(finish_action)}</pre></div>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@recipes/noc-reasoning-agent/scripts/visualization/generate_trace_visualization.py`
around lines 162 - 187, The HTML currently inserts raw model output into the
template via reasoning_trace and finish_action without escaping, which can break
rendering or allow injection; update the template generation in the function
that builds the details block (where variables reasoning_trace and finish_action
are used) to pass both through the existing esc() function (e.g.,
esc(reasoning_trace) and esc(finish_action)) before interpolation, so any '<',
'>', '&' etc. are HTML-escaped while keeping the surrounding container divs
unchanged.
recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py-49-65 (1)

49-65: ⚠️ Potential issue | 🟠 Major

Silent data corruption when <|message|> marker is absent.

When rfind("<|message|>") returns -1 (marker not found), line 50 computes text[-1 + 11:] = text[10:], silently skipping the first 10 characters instead of handling the missing marker. Similarly, on line 53, rfind("]") + 1 turns a "not found" (-1) into 0, making the guard on line 55 (end_index != -1) always true — it should check end_index > 0.

Proposed fix
                     start_index = text.rfind("<|message|>")
+                    if start_index == -1:
+                        print(f"Warning: '<|message|>' marker not found for incident {number}")
+                        continue
                     text = text[start_index + len("<|message|>") :]
                     start_index = text.find("[")
-                    # Find the last position of the JSON array ']' to ensure we get the whole thing
                     end_index = text.rfind("]") + 1
 
-                    if start_index != -1 and end_index != -1:
+                    if start_index != -1 and end_index > 0:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py` around
lines 49 - 65, The code silently corrupts data when the "<|message|>" or closing
']' is missing because it slices using rfind results without checking for -1;
update the logic around the start_index/end_index handling: first check result
of text.rfind("<|message|>") and bail (log and continue) if it is -1 before
slicing text, then after locating the '[' with text.find("[") ensure that
start_index != -1 and compute end_index = text.rfind("]"); require end_index >
start_index (not end_index != -1) before slicing to json_string, and only then
call json.loads and assign to responses[number]; keep the existing prints/logs
when markers are missing.
recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py-75-90 (1)

75-90: ⚠️ Potential issue | 🟠 Major

Same rfind returning -1 bug in extract_final_thinking_processes.

Line 86: if "final<|message|>" is not found, rfind returns -1, and slicing from text[-1 + 16:] = text[15:] silently corrupts the data. Add a guard.

Proposed fix
-            thinking = text[text.rfind("final<|message|>") + len("final<|message|>") :]
+            marker_idx = text.rfind("final<|message|>")
+            if marker_idx == -1:
+                print(f"Warning: 'final<|message|>' marker not found for incident {number}, step {step_number}")
+                continue
+            thinking = text[marker_idx + len("final<|message|>") :]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py` around
lines 75 - 90, In extract_final_thinking_processes, guard against
text.rfind("final<|message|>") returning -1 before slicing: compute idx =
text.rfind("final<|message|>") and if idx == -1 set thinking to the original
text (or an empty/placeholder string) instead of slicing, otherwise slice using
idx + len("final<|message|>"); update data["generation"] and
responses[number][step_number] with that safe thinking value; reference function
extract_final_thinking_processes, variable text, the literal "final<|message|>",
and helper _incident_id to locate the code to change.
recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py-201-207 (1)

201-207: ⚠️ Potential issue | 🟠 Major

Parameter key is accepted but never used.

_first_pos(key, named_key=None) takes key as the first argument, but the function body never references it. Every call site passes a string like "site_or_element_id" that appears to be the intended named_key. This looks like a bug where key should either be removed or used as the default for named_key.

Proposed fix (if `key` was meant to be the named key)
-    def _first_pos(key, named_key=None):
+    def _first_pos(named_key=None):
         """Return named arg, first positional arg, or 'all' as default."""
-        if named_key:
-            val = kv_args.get(named_key)
-            if val:
-                return val
+        if named_key and named_key in kv_args:
+            return kv_args[named_key]
         return pos_args[0] if pos_args else "all"

Then update all call sites, e.g.:

-    arg_dict = {"site_or_element_id": _first_pos("site_or_element_id")}
+    arg_dict = {"site_or_element_id": _first_pos(named_key="site_or_element_id")}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py` around
lines 201 - 207, The function _first_pos currently accepts a first parameter
named key but never uses it; update the function so the first parameter acts as
the default named key by setting named_key = named_key or key, then use
kv_args.get(named_key) (and keep returning pos_args[0] or "all"); this preserves
all existing call sites that pass the intended named key as the first argument
and avoids removing or renaming parameters. Ensure references inside _first_pos
use named_key, and run tests to confirm behavior unchanged for callers passing a
separate named_key argument.
recipes/noc-reasoning-agent/scripts/evaluation/evaluation_with_judge.py-133-135 (1)

133-135: ⚠️ Potential issue | 🟠 Major

llm_judge_final_output receives the full reasoning part, not the extracted conclusion.

generated_reasoning_part contains the entire generated reasoning sequence. The function's prompt asks the LLM to evaluate "the generated resolution at the end", implying only the final conclusion/answer should be passed — not the full chain-of-thought. Passing the whole reasoning block will likely produce inaccurate scores.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/evaluation/evaluation_with_judge.py`
around lines 133 - 135, The call to llm_judge_final_output is passing the entire
generated_reasoning_part (full chain-of-thought) instead of just the final
conclusion; update the code to extract only the final conclusion/answer from
generated_reasoning_part (e.g., parse the last non-empty sentence/lines or the
text after a "Conclusion:" marker) and pass that extracted conclusion to
llm_judge_final_output (keep the original conclusion_expected as-is) so the
judge evaluates only the final resolution rather than the whole reasoning trace.
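One way to extract just the conclusion, assuming the trace ends with a marker such as `Conclusion:` (that marker name is a guess, not the recipe's confirmed format), with a last-non-empty-line fallback:

```python
def extract_conclusion(reasoning: str) -> str:
    """Return only the final conclusion from a reasoning trace.

    Prefers text after a 'Conclusion:' marker (an assumed convention);
    otherwise falls back to the last non-empty line of the trace.
    """
    marker = "Conclusion:"
    idx = reasoning.rfind(marker)
    if idx != -1:
        return reasoning[idx + len(marker):].strip()
    lines = [ln.strip() for ln in reasoning.splitlines() if ln.strip()]
    return lines[-1] if lines else ""

trace = (
    "Step 1: checked alarms at the site.\n"
    "Step 2: confirmed power loss on the rectifier.\n"
    "Conclusion: dispatch field team to restore power."
)
print(extract_conclusion(trace))  # → dispatch field team to restore power.
```

The judge would then receive `extract_conclusion(generated_reasoning_part)` instead of the full chain-of-thought.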
recipes/noc-reasoning-agent/scripts/evaluation/evaluation_with_judge.py-10-29 (1)

10-29: ⚠️ Potential issue | 🟠 Major

Wrap all module-level side effects in a __main__ guard.

parse_args(), pd.read_json(), the ChatNVIDIA client, and the evaluation loops all execute at import time. This makes the module untestable and causes crashes whenever it is imported by another script.

♻️ Proposed fix
-# Parse arguments for input JSONL path
-parser = argparse.ArgumentParser(...)
-...
-args = parser.parse_args()
-
-# Load the input JSONL
-df = pd.read_json(args.input_jsonl, lines=True)
-...
-llm = ChatNVIDIA(base_url=args.nim_url, model=args.model)
-rouge = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
+def main():
+    parser = argparse.ArgumentParser(description="Evaluation Pipeline for Agent Responses")
+    parser.add_argument("input_jsonl", ...)
+    ...
+    args = parser.parse_args()
+
+    df = pd.read_json(args.input_jsonl, lines=True)
+    llm = ChatNVIDIA(base_url=args.nim_url, model=args.model)
+    rouge = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
+    # ... rest of evaluation logic ...
+
+if __name__ == "__main__":
+    main()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/evaluation/evaluation_with_judge.py`
around lines 10 - 29, The file runs parsing, pd.read_json, ChatNVIDIA(...) and
rouge_scorer.RougeScorer(...) at import time; move all module-level side effects
into a main() function and call it under if __name__ == "__main__": so imports
are side-effect free—specifically wrap the argparse usage
(parser.parse_args()/args), the DataFrame load (pd.read_json), the LLM client
creation (ChatNVIDIA(...)), the ROUGE init (rouge_scorer.RougeScorer(...)), and
the evaluation loop inside main(), export only pure helper functions at module
level, and invoke main() inside the __main__ guard.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8a32b13 and a154947.

⛔ Files ignored due to path filters (1)
  • recipes/noc-reasoning-agent/data/synthetic_incidents.csv is excluded by !**/*.csv
📒 Files selected for processing (29)
  • .gitignore
  • docs/tutorials/posts/noc-reasoning-agent.md
  • recipes/noc-reasoning-agent/configs/config.ini
  • recipes/noc-reasoning-agent/configs/noc_reasoning_sft.yaml
  • recipes/noc-reasoning-agent/configs/noc_reasoning_sft_6.yaml
  • recipes/noc-reasoning-agent/prompts/formatting_prompt.yaml
  • recipes/noc-reasoning-agent/prompts/prompt_incident.yaml
  • recipes/noc-reasoning-agent/prompts/prompt_reasoning.yaml
  • recipes/noc-reasoning-agent/prompts/shortened_prompt_reasoning.yaml
  • recipes/noc-reasoning-agent/scripts/create_agent_with_tools.py
  • recipes/noc-reasoning-agent/scripts/create_agent_with_tools_batch.py
  • recipes/noc-reasoning-agent/scripts/evaluation/evaluation_with_judge.py
  • recipes/noc-reasoning-agent/scripts/evaluation/problem_code_evaluation.py
  • recipes/noc-reasoning-agent/scripts/evaluation/score.py
  • recipes/noc-reasoning-agent/scripts/filtering/filter_rows.py
  • recipes/noc-reasoning-agent/scripts/filtering/match_keywords.py
  • recipes/noc-reasoning-agent/scripts/ns_pipelines/generate_synthetic_data.py
  • recipes/noc-reasoning-agent/scripts/ns_pipelines/prepare_react_agent.py
  • recipes/noc-reasoning-agent/scripts/tools.py
  • recipes/noc-reasoning-agent/scripts/utils/create_input_jsonl_from_incidents.py
  • recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py
  • recipes/noc-reasoning-agent/scripts/utils/reasoning_processes.py
  • recipes/noc-reasoning-agent/scripts/utils/schema_columns.py
  • recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py
  • recipes/noc-reasoning-agent/scripts/utils/split_mocktools_answers.py
  • recipes/noc-reasoning-agent/scripts/utils/token_usage.py
  • recipes/noc-reasoning-agent/scripts/visualization/extract_representation_columns.py
  • recipes/noc-reasoning-agent/scripts/visualization/extract_scores.py
  • recipes/noc-reasoning-agent/scripts/visualization/generate_trace_visualization.py

Add Apache 2.0 copyright headers to all recipe Python files,
fix ruff lint issues (unsorted imports, unused import), apply
ruff formatting, and strip trailing whitespace.

Signed-off-by: George Armstrong <georgea@nvidia.com>
…orial' into acanaveras/add-noc-reasoning-tutorial
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

🟠 Major comments (21)
recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py-32-33 (1)

32-33: ⚠️ Potential issue | 🟠 Major

Fail fast on missing incident IDs instead of silently accepting None.

At lines 32, 55, and 70, the .get() fallback can return None if both keys are absent, which can silently corrupt split membership.

🔧 Proposed fix
+def _get_incident_id(row: dict) -> str:
+    if "incident_identifier" in row:
+        return row["incident_identifier"]
+    if "number" in row:
+        return row["number"]
+    raise KeyError("Missing incident id: expected 'incident_identifier' or 'number'")
+
 def split_generation_field(
     input_folder: str, train_out: str, test_out: str, test_size: float = 0.2, seed: int = 42, max_examples=10000
 ):
@@
-            number = row.get("incident_identifier", row.get("number"))
+            number = _get_incident_id(row)
             incidents.append(number)
@@
-                number = row.get("incident_identifier", row.get("number"))
+                number = _get_incident_id(row)
                 if number in train_set:
                     current_iteration_train.append(row)
@@
-                number = row.get("incident_identifier", row.get("number"))
+                number = _get_incident_id(row)
                 if number in test_set:

As per coding guidelines **/*.py: "Use direct dictionary key access (e.g., data[key_name]) instead of .get() when the key is expected to be present, to fail with a clear error if the key is missing rather than silently corrupting data".

Also applies to: 55-56, 70-71

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py` around
lines 32 - 33, Replace the tolerant .get() lookups that can return None with
direct key access so missing IDs raise immediately: for each occurrence where
you do number = row.get("incident_identifier", row.get("number")) (and the
similar occurrences at the other two places), change to use a conditional that
accesses keys directly, e.g. if "incident_identifier" in row: number =
row["incident_identifier"] else: number = row["number"]; this way a missing
"number" or "incident_identifier" will raise a KeyError instead of silently
producing None when you later call incidents.append(number) (apply the same
pattern for the other two occurrences).
recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py-102-103 (1)

102-103: ⚠️ Potential issue | 🟠 Major

--input_dir semantics are inconsistent with actual usage.

At lines 102-103, the help text and default imply a JSONL file, but the code calls os.path.join(input_folder, "iteration_0.jsonl"), so it expects a directory. The current default likely fails on first run.

🔧 Proposed fix
-    parser.add_argument("--input_dir", help="Path to input_dir JSONL file", default="output_incident.jsonl")
+    parser.add_argument(
+        "--input_dir",
+        required=True,
+        help="Path to directory containing iteration_*.jsonl files",
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py` around
lines 102 - 103, Update the CLI flag and its help/default to reflect that
input_dir is a directory (not a file) and ensure callers expect a folder
containing iteration_0.jsonl: change the parser.add_argument for "--input_dir"
to have help text like "Path to input directory containing iteration_0.jsonl"
and set a default that is a directory (e.g., "output_incident") or make the
default None and validate it; also add a validation step near where input_folder
is used (the code that does os.path.join(input_folder, "iteration_0.jsonl")) to
raise a clear error if the provided path is not a directory or the expected file
is missing.
recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py-105-107 (1)

105-107: ⚠️ Potential issue | 🟠 Major

--preview is accepted but ignored.

At lines 105-107, users can pass --preview, but at line 110 it is never used; the parameter is silently ignored.

🔧 Proposed fix (remove until implemented)
-    parser.add_argument(
-        "--preview", type=int, default=2, help="Number of examples to preview before confirmation (default: 2)"
-    )
     args = parser.parse_args()
 
     split_generation_field(args.input_dir, args.train_output, args.test_output)

As per coding guidelines **/*.py: "Avoid silently ignoring user-passed parameters; fail if a required parameter is not specified or an unsupported parameter is provided."

Also applies to: 110-110

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py` around
lines 105 - 107, The --preview argparse option (parser.add_argument("--preview",
...)) is declared but never used; remove this argument declaration or wire the
parsed preview value into the preview/confirmation logic that runs after parsing
(where the script currently previews examples at line ~110). Specifically,
either delete the parser.add_argument call for "--preview" or pass args.preview
into the function/method that displays the preview (or replace the hard-coded
preview count there) so the parsed preview value from parser is actually
applied.
recipes/noc-reasoning-agent/scripts/utils/split_mocktools_answers.py-65-72 (1)

65-72: ⚠️ Potential issue | 🟠 Major

Do not continue after JSON decode failures.

Line 70 and Line 84 currently log and continue, which can silently emit partial merged datasets. Fail fast on malformed JSONL rows so data issues are visible.

Proposed fix
-                try:
-                    d = json.loads(line)
-                    num = d.get("number")
-                    if num:
-                        data1[num] = d
-                except json.JSONDecodeError as e:
-                    print(f"Error decoding JSON in file1 on line {line_num}: '{line}'. Error: {e}")
+                d = json.loads(line)
+                num = d["number"]
+                data1[num] = d
...
-                try:
-                    d = json.loads(line)
-                    num = d.get("number")
-                    if num:
-                        data2[num] = d.get("generation", "")
-                except json.JSONDecodeError as e:
-                    print(f"Error decoding JSON in file2 on line {line_num}: '{line}'. Error: {e}")
+                d = json.loads(line)
+                num = d["number"]
+                data2[num] = d["generation"]

As per coding guidelines, "Don't catch exceptions when they are not normally expected to be raised; let unexpected exceptions propagate so that failures are visible rather than silent".

Also applies to: 79-85
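The fail-fast contract the review asks for can be sketched with the stdlib alone (the record contents below are illustrative, not taken from the dataset):

```python
import json

good = '{"number": "INC0001", "generation": "replaced line card"}'
bad = '{"number": "INC0002"'  # truncated row, as from an interrupted write

record = json.loads(good)
print(record["number"])  # direct key access raises KeyError if the field is absent

# With no try/except, a malformed row aborts the merge immediately
# instead of producing a silently partial dataset:
try:
    json.loads(bad)
except json.JSONDecodeError as e:
    print(f"merge aborted at malformed JSONL row: {e.msg}")
```

Letting the `JSONDecodeError` propagate (rather than catching it at all) is the behavior the proposed fix implements.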

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/utils/split_mocktools_answers.py` around
lines 65 - 72, The try/except that catches json.JSONDecodeError around
json.loads(line) and prints an error allows the script to continue and produce a
silently partial merged dataset; change the handlers in both places that catch
JSONDecodeError (the blocks populating data1 and data2) to fail fast: remove the
swallow-and-continue behavior and instead re-raise the exception (or call
sys.exit with a non-zero code) after logging so malformed JSONL rows abort
processing immediately; locate the json.loads(...) usages and their
corresponding except json.JSONDecodeError handlers to implement this change.
recipes/noc-reasoning-agent/scripts/utils/split_mocktools_answers.py-67-69 (1)

67-69: ⚠️ Potential issue | 🟠 Major

Use direct key access for required record fields.

Line 67, Line 81, Line 83, and Line 109 use .get() for fields that the merge depends on (number, generation). This can silently drop or degrade records instead of failing with a clear schema error.

As per coding guidelines, "Use direct dictionary key access (e.g., data[key_name]) instead of .get() when the key is expected to be present, to fail with a clear error if the key is missing rather than silently corrupting data".

Also applies to: 81-83, 109-109

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/utils/split_mocktools_answers.py` around
lines 67 - 69, Replace uses of d.get("number") and d.get("generation") with
direct indexing (d["number"], d["generation"]) so missing required keys raise
immediately; update occurrences around where num is assigned (variable num from
dict d) and where generation is read/used (the code that indexes data1/data2
with generation) to use d["..."] and, if desired, add an explicit check that
raises a clear ValueError with context before indexing to provide a friendlier
schema error.
recipes/noc-reasoning-agent/scripts/visualization/extract_representation_columns.py-55-57 (1)

55-57: 🛠️ Refactor suggestion | 🟠 Major

Use direct key access on info — all three keys are guaranteed to exist.

csv_idx is built from a DataFrame that was already validated (lines 37–40) to contain time_to_resolve, solved_category, and solved_reason. Using .get() here silently returns None for typos or future renames instead of raising a visible KeyError.

♻️ Proposed fix
-        row["time_to_resolve"] = info.get("time_to_resolve")
-        row["solved_category"] = info.get("solved_category")
-        row["solved_reason"] = info.get("solved_reason")
+        row["time_to_resolve"] = info["time_to_resolve"]
+        row["solved_category"] = info["solved_category"]
+        row["solved_reason"] = info["solved_reason"]

As per coding guidelines, "Use direct dictionary key access (e.g., data[key_name]) instead of .get() when the key is expected to be present, to fail with a clear error if the key is missing rather than silently corrupting data."
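The difference in failure mode is easy to demonstrate (the dict below is illustrative):

```python
info = {"time_to_resolve": "4h", "solved_category": "hardware", "solved_reason": "card swap"}

print(info.get("solved_categry"))  # typo in the key -> None, propagates silently downstream

try:
    info["solved_categry"]  # same typo -> immediate KeyError at the source
except KeyError as err:
    print(f"missing key: {err}")
```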

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@recipes/noc-reasoning-agent/scripts/visualization/extract_representation_columns.py`
around lines 55 - 57, The three assignments use info.get(...) which can silently
return None; change them to direct key access so missing keys raise immediately:
replace the uses of info.get("time_to_resolve"), info.get("solved_category"),
and info.get("solved_reason") with info["time_to_resolve"],
info["solved_category"], and info["solved_reason"] respectively (the variables
involved are row and info; csv_idx is already validated earlier).
recipes/noc-reasoning-agent/scripts/visualization/generate_trace_visualization.py-219-220 (1)

219-220: ⚠️ Potential issue | 🟠 Major

df.query with an f-string will raise a parse error if selected_criteria contains a single quote.

A value such as O'Brien produces the invalid expression category == 'O'Brien', causing pandas to throw pandas.errors.UndefinedVariableError or a ValueError. Use the @ local-variable syntax to avoid constructing the query string manually.

🐛 Proposed fix
     if selected_criteria:
-        filtered_df = df.query(f"category == '{selected_criteria}'")
+        filtered_df = df[df["category"] == selected_criteria]
     else:
         filtered_df = df
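Both safe alternatives can be checked against a value with an embedded quote (the DataFrame contents here are made up):

```python
import pandas as pd

df = pd.DataFrame({"category": ["O'Brien", "Network"], "severity": [2, 1]})
selected = "O'Brien"

# f"category == '{selected}'" would build the invalid expression category == 'O'Brien'
via_query = df.query("category == @selected")   # @ binds the local variable safely
via_mask = df[df["category"] == selected]       # boolean mask, no expression parsing at all

assert via_query.equals(via_mask)
print(len(via_query))  # 1
```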
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@recipes/noc-reasoning-agent/scripts/visualization/generate_trace_visualization.py`
around lines 219 - 220: the current use of df.query with an f-string (in the
selected_criteria -> filtered_df block) can fail when selected_criteria contains
quotes; change the filtering to avoid string interpolation and instead use
pandas' local-variable syntax or boolean masking (e.g., df.query("category
== @selected_criteria") or df[df["category"] == selected_criteria]) so the query
won't be malformed when selected_criteria contains characters like single
quotes.
recipes/noc-reasoning-agent/scripts/visualization/generate_trace_visualization.py-96-104 (1)

96-104: ⚠️ Potential issue | 🟠 Major

Potential IndexError in parse_work_notes when input ends with a bare timestamp.

re.split with a capturing group produces ["pre", ts1, body1, ts2, body2, ...]. When the source text ends with a delimiter match and no trailing body (e.g., a trailing timestamp with no note text), parts has an even length and parts[i + 1] on the final iteration goes out of bounds.

🐛 Proposed fix
     i = 1
     while i < len(parts):
         timestamp = parts[i].strip()
-        note_text = parts[i + 1].strip()
-        if note_text:  # Only add entries that have content
+        note_text = parts[i + 1].strip() if i + 1 < len(parts) else ""
+        if note_text:
             notes.append({"timestamp": timestamp, "note": note_text})
         i += 2
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@recipes/noc-reasoning-agent/scripts/visualization/generate_trace_visualization.py`
around lines 96 - 104, parse_work_notes can IndexError when parts ends with a
trailing timestamp; update the loop in parse_work_notes to ensure you never
access parts[i + 1] out of bounds by changing the iteration to only run while i
+ 1 < len(parts) (or use a for-range like range(1, len(parts) - 1, 2)), then
keep the existing logic that strips and appends {"timestamp": ..., "note": ...}
only when note_text is non-empty; reference parse_work_notes and the local
variables parts, i, timestamp, note_text when locating the change.
recipes/noc-reasoning-agent/scripts/evaluation/problem_code_evaluation.py-160-162 (1)

160-162: ⚠️ Potential issue | 🟠 Major

Replace broad except Exception with specific exception types.

Per coding guidelines, unexpected exceptions should propagate rather than be swallowed. The only reliably expected exception inside this block is json.JSONDecodeError from json.loads. Catching Exception silently masks logic errors (e.g., AttributeError, IndexError) as mere "failed" lines.

🔧 Proposed fix
-        except Exception as e:
-            print("Error:", e)
-            failed += 1
+        except json.JSONDecodeError as e:
+            print("Error parsing JSON:", e)
+            failed += 1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/evaluation/problem_code_evaluation.py`
around lines 160 - 162, Replace the broad "except Exception as e" that surrounds
the json.loads call so only json.JSONDecodeError is caught and treated as a test
failure; change the handler to catch json.JSONDecodeError (increment failed and
print the error) and let any other exceptions propagate (do not swallow) so
logic/attribute/index errors surface—look for the except block surrounding
json.loads and the "failed" counter in problem_code_evaluation.py and update
accordingly.
recipes/noc-reasoning-agent/scripts/evaluation/evaluation_with_judge.py-69-77 (1)

69-77: ⚠️ Potential issue | 🟠 Major

Replace broad except Exception with specific types.

Per coding guidelines, unexpected exceptions should propagate. llm.invoke can raise network/auth errors, and the int(...) cast can raise ValueError on a malformed LLM response — these are the only expected failures. Catch them explicitly instead of masking all exceptions silently.

🔧 Proposed fix
-    except Exception as e:
-        print(f"Error in LLM judge: {e}")
-        return 0, "Error"
+    except (ValueError, IndexError) as e:
+        print(f"Error parsing LLM judge response: {e}")
+        return 0, "Error"

Apply the same change to lines 100–108.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/evaluation/evaluation_with_judge.py`
around lines 69 - 77, The current broad except in the LLM judge block masks
unexpected errors; modify the try/except so you only catch the expected
failures: catch the LLM client network/auth related exceptions raised by
llm.invoke (e.g., ConnectionError, TimeoutError, or the specific
AuthenticationError type your LLM client uses) and ValueError from the int(...)
cast, log/print the error and return the fallback (0, "Error"), but re-raise any
other exceptions so unexpected failures propagate; update the same pattern for
the analogous block around lines 100–108. Reference the llm.invoke call and the
int(output.split("Score:")...) parsing/Reasoning extraction when making the
change.
recipes/noc-reasoning-agent/scripts/visualization/extract_scores.py-19-20 (1)

19-20: ⚠️ Potential issue | 🟠 Major

Replace the hardcoded path with a CLI argument.

The comment on line 19 explicitly calls this out, but the fix should be argparse rather than a manual source edit. As-is, every run requires source modification.

🔧 Proposed fix
-# Replace with your actual JSONL file path
-file_path = "evaluations.jsonl"
+import argparse
+
+parser = argparse.ArgumentParser(description="Extract and display evaluation scores from a JSONL file")
+parser.add_argument("input_jsonl", help="Path to evaluations JSONL file")
+args = parser.parse_args()
+file_path = args.input_jsonl
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/visualization/extract_scores.py` around
lines 19 - 20, Replace the hardcoded file_path variable by adding argparse to
parse a CLI flag (e.g., --file or -f) and set file_path from the parsed args
(defaulting to "evaluations.jsonl"); update the top-level code that currently
assigns file_path to instead use args.file and ensure the script defines/uses an
entrypoint (if not present, add an if __name__ == "__main__": main() pattern) so
users can run the script with a custom JSONL path via the new argument.
recipes/noc-reasoning-agent/scripts/evaluation/evaluation_with_judge.py-49-108 (1)

49-108: 🛠️ Refactor suggestion | 🟠 Major

llm_judge_final_output and llm_judge_reasoning are near-identical — merge into one function.

The two functions share identical structure, error handling, and parsing logic; only the prompt text differs. This is a clear DRY violation.

♻️ Proposed refactor
-def llm_judge_final_output(expected, generated):
-    prompt = f"""
-    Evaluate how well the generated resolution at the end matches the expected resolution on a scale of 1-5:
-    ...
-    Expected: {expected}
-    Generated: {generated}
-    ...
-    """
-    try:
-        response = llm.invoke(prompt)
-        output = response.content.strip()
-        score = int(output.split("Score:")[1].split("\n")[0].strip())
-        reasoning = output.split("Reasoning:")[1].strip()
-        return score, reasoning
-    except Exception as e:
-        print(f"Error in LLM judge: {e}")
-        return 0, "Error"
-
-
-def llm_judge_reasoning(expected, generated):
-    prompt = f"""
-    Evaluate how well the generated reasoning is, including tools used, resolution matches the expected resolution on a scale of 1-5:
-    ...
-    Expected: {expected}
-    Generated: {generated}
-    ...
-    """
-    try:
-        response = llm.invoke(prompt)
-        output = response.content.strip()
-        score = int(output.split("Score:")[1].split("\n")[0].strip())
-        reasoning = output.split("Reasoning:")[1].strip()
-        return score, reasoning
-    except Exception as e:
-        print(f"Error in LLM judge: {e}")
-        return 0, "Error"
+JUDGE_PROMPTS = {
+    "final_output": (
+        "Evaluate how well the generated resolution at the end matches the expected resolution on a scale of 1-5:\n"
+        "- 5: Perfect match in content.\n"
+        "- 4: High similarity, minor differences.\n"
+        "- 3: Moderate match, key elements present but some deviations.\n"
+        "- 2: Low match, major differences.\n"
+        "- 1: No match."
+    ),
+    "reasoning": (
+        "Evaluate how well the generated reasoning is, including tools used, resolution matches the expected resolution on a scale of 1-5:\n"
+        "- 5: Perfect match in content, structure, and actions.\n"
+        "- 4: High similarity, minor differences.\n"
+        "- 3: Moderate match, key elements present but some deviations.\n"
+        "- 2: Low match, major differences.\n"
+        "- 1: No match."
+    ),
+}
+
+
+def llm_judge(expected: str, generated: str, judge_type: str) -> tuple[int, str]:
+    """Invoke the LLM judge and return (score, reasoning)."""
+    criteria = JUDGE_PROMPTS[judge_type]
+    prompt = (
+        f"{criteria}\n\n"
+        f"Expected: {expected}\nGenerated: {generated}\n\n"
+        "Provide only the score (1-5) and a brief reasoning.\n"
+        "Format: Score: X\\nReasoning: ..."
+    )
+    response = llm.invoke(prompt)
+    output = response.content.strip()
+    score = int(output.split("Score:")[1].split("\n")[0].strip())
+    reasoning = output.split("Reasoning:")[1].strip()
+    return score, reasoning

Then at the call sites:

-    reasoning_judge_score, reasoning_judge_reason = llm_judge_reasoning(reasoning_expected, generated_reasoning_part)
-    conclusion_judge_score, conclusion_judge_reason = llm_judge_final_output(
-        conclusion_expected, generated_reasoning_part
-    )
+    reasoning_judge_score, reasoning_judge_reason = llm_judge(reasoning_expected, generated_reasoning_part, "reasoning")
+    conclusion_judge_score, conclusion_judge_reason = llm_judge(conclusion_expected, generated_reasoning_part, "final_output")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/evaluation/evaluation_with_judge.py`
around lines 49 - 108, Both llm_judge_final_output and llm_judge_reasoning
duplicate parsing/error logic; consolidate them into a single helper (e.g.,
llm_judge) that accepts the prompt body or a prompt_type and the expected and
generated strings, reuses the existing llm.invoke/error parsing logic (response
= llm.invoke(prompt); parse Score and Reasoning) and returns (score, reasoning),
then replace calls to llm_judge_final_output and llm_judge_reasoning with the
new llm_judge(expected, generated, prompt_template_or_type) so only the prompt
text differs while parsing/error handling lives in one place.
recipes/noc-reasoning-agent/scripts/filtering/filter_rows.py-120-125 (1)

120-125: ⚠️ Potential issue | 🟠 Major

UnicodeDecodeError can never be raised with encoding="latin1" — the except branch is dead code, and the fallback encoding order is inverted.

latin1 (ISO-8859-1) is a complete single-byte encoding that maps every possible byte value (0x00–0xFF) to a Unicode code point, so pd.read_csv with encoding="latin1" will never raise UnicodeDecodeError. The except branch is unreachable.

Additionally, the fallback pd.read_csv(path) (which defaults to UTF-8) is more restrictive than latin1, not less — so this would never rescue a file that latin1 couldn't handle; it would only fail on files that latin1 already read successfully.

The correct pattern for a graceful fallback is to try utf-8 first (strict, correct for modern files), then fall back to latin1 (permissive, for legacy files with non-UTF-8 bytes):

🐛 Proposed fix
 def _load_csv(path: str) -> pd.DataFrame:
-    """Load CSV, falling back from latin1 to default encoding."""
-    try:
-        return pd.read_csv(path, encoding="latin1")
-    except UnicodeDecodeError:
-        return pd.read_csv(path)
+    """Load CSV, falling back to latin1 for files with non-UTF-8 bytes."""
+    try:
+        return pd.read_csv(path)
+    except UnicodeDecodeError:
+        return pd.read_csv(path, encoding="latin1")

As per coding guidelines, "Don't catch exceptions when they are not normally expected to be raised."
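A quick stdlib check shows why the except branch is dead:

```python
# latin1 assigns every byte value 0x00-0xFF a code point, so decoding cannot fail;
# utf-8 is strict about byte sequences and rejects invalid input.
every_byte = bytes(range(256))
assert len(every_byte.decode("latin1")) == 256  # always succeeds

try:
    b"\xff\xfe".decode("utf-8")  # invalid UTF-8 start bytes
except UnicodeDecodeError:
    print("utf-8 rejects bytes that latin1 would happily decode")
```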

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/filtering/filter_rows.py` around lines
120 - 125: in _load_csv, attempting read_csv with encoding="latin1" first makes
the except branch unreachable; instead, try the strict utf-8 read first and fall
back to latin1 on UnicodeDecodeError. Update the _load_csv function to call
pd.read_csv(path) (UTF-8) inside a try and catch UnicodeDecodeError to then call
pd.read_csv(path, encoding="latin1"); ensure the function name _load_csv and
pd.read_csv calls are used so the change is localized.
recipes/noc-reasoning-agent/scripts/filtering/match_keywords.py-268-275 (1)

268-275: ⚠️ Potential issue | 🟠 Major

pattern parameter is accepted but never used — and the precompiled regexes on lines 295–296 are silently discarded.

find_keyword_matches only checks str(resolution_method).lower().replace(" ", "") in keywords (an exact set-membership test). The compiled hw_pattern/sw_pattern regex objects are passed in but ignored, making the two re.compile calls on lines 295–296 dead code.

Two valid fixes:

Option A — drop the unused parameter and the regex compilation (simpler, matches actual behavior):

♻️ Option A: remove unused parameter and dead code
-def find_keyword_matches(row, pattern, keywords):
+def find_keyword_matches(row, keywords):
     """Finds which specific keywords from a list match within a DataFrame row."""
     resolution_method = row.get("resolution_method", row.get("close_code", ""))
     if resolution_method and str(resolution_method).lower().replace(" ", "") in keywords:
         return True
     return False
-    # Precompile regex patterns for performance. This matches any of the phrases.
-    hw_pattern = re.compile("|".join(re.escape(k) for k in hw_keywords), flags=re.IGNORECASE)
-    sw_pattern = re.compile("|".join(re.escape(k) for k in sw_keywords), flags=re.IGNORECASE)
-
     # Convert all data to string type for safe searching
     str_df = df.astype(str)
     ...
-        hw_matches = find_keyword_matches(row, hw_pattern, hw_keywords)
+        hw_matches = find_keyword_matches(row, hw_keywords)
     ...
-        sw_matches = find_keyword_matches(row, sw_pattern, sw_keywords)
+        sw_matches = find_keyword_matches(row, sw_keywords)

Option B — actually use the regex (if broader substring matching was intended):

♻️ Option B: use the pattern inside find_keyword_matches
 def find_keyword_matches(row, pattern, keywords):
     """Finds which specific keywords from a list match within a DataFrame row."""
     resolution_method = row.get("resolution_method", row.get("close_code", ""))
-    if resolution_method and str(resolution_method).lower().replace(" ", "") in keywords:
+    if resolution_method and pattern.search(str(resolution_method)):
         return True
     return False
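If Option B is chosen, the behavioral difference matters: the exact set-membership test only matches whole normalized values, while the compiled pattern also finds keywords embedded in longer text (the keyword list below is illustrative):

```python
import re

keywords = ["hardware failure", "card fault"]
pattern = re.compile("|".join(re.escape(k) for k in keywords), flags=re.IGNORECASE)

resolution = "Resolved after Card Fault on slot 3"

# Exact set membership against normalized keywords: no match.
normalized = {k.lower().replace(" ", "") for k in keywords}
print(resolution.lower().replace(" ", "") in normalized)  # False

# Substring regex search: matches "Card Fault" case-insensitively.
print(bool(pattern.search(resolution)))  # True
```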
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/filtering/match_keywords.py` around lines
268 - 275, The function find_keyword_matches currently ignores its pattern
parameter and discards the precompiled regexes (hw_pattern / sw_pattern); update
find_keyword_matches to actually use the passed-in pattern by normalizing
resolution_method to a string (e.g., lowercased and stripped) and performing
pattern.search(resolution_method_str) (or pattern.match as appropriate) in
addition to, or instead of, the exact membership test against keywords; modify
callers that pass hw_pattern / sw_pattern so the function uses that regex, or if
you prefer Option A remove the unused pattern parameter and delete the
hw_pattern / sw_pattern compilation to avoid dead code — locate the change in
find_keyword_matches and the code that constructs hw_pattern / sw_pattern.
recipes/noc-reasoning-agent/scripts/utils/token_usage.py-80-93 (1)

80-93: ⚠️ Potential issue | 🟠 Major

When --yaml is provided, enforce file/dependency/key validity instead of skipping.

Current behavior silently ignores the user's intent (a missing file or missing PyYAML both degrade to returning None) and uses .get("user") where the schema requires the key to be present.

Suggested fix
 def read_yaml_prompt(path: Optional[str]) -> Optional[str]:
     if not path:
         return None
     if not os.path.exists(path):
-        print(f"[warn] YAML file not found: {path}")
-        return None
+        raise FileNotFoundError(f"YAML file not found: {path}")
     if yaml is None:
-        print("[warn] PyYAML not installed; skipping YAML parsing.")
-        return None
+        raise RuntimeError("PyYAML is required when --yaml is provided.")
     with open(path, "r", encoding="utf-8") as f:
         data = yaml.safe_load(f)
-    # Your structure shows top-level key 'user'
-    return data.get("user") if isinstance(data, dict) else None
+    if not isinstance(data, dict):
+        raise TypeError(f"YAML root must be a mapping: {path}")
+    return data["user"]

As per coding guidelines, "Use direct dictionary key access (e.g., data[key_name]) instead of .get() when the key is expected to be present, to fail with a clear error if the key is missing rather than silently corrupting data."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/utils/token_usage.py` around lines 80 -
93, In read_yaml_prompt, enforce the caller's intent by failing fast instead of
silently returning None: when a non-empty path is provided, raise a
FileNotFoundError if the file does not exist (don’t print and continue), raise
an ImportError/RuntimeError if the yaml module is unavailable, and parse the
file then access the expected top-level key using direct indexing (data["user"])
so a KeyError is raised if the schema is wrong; update error messages to include
the path and clear context and keep the function signature and name
(read_yaml_prompt) the same.
recipes/noc-reasoning-agent/scripts/utils/token_usage.py-119-124 (1)

119-124: ⚠️ Potential issue | 🟠 Major

Require generation to exist and be a string.

Using row.get("generation", "") plus a no-op pass on non-string values silently records invalid token lengths.

Suggested fix
-            gen = row.get("generation", "")
+            gen = row["generation"]
@@
             if not isinstance(gen, str):
-                # Try a nested alternative if your data uses it (customize as needed):
-                # gen = row.get("output", {}).get("text", "")
-                pass
+                raise TypeError(f"Line {i}: 'generation' must be a string.")

As per coding guidelines, "Use direct dictionary key access (e.g., data[key_name]) instead of .get() when the key is expected to be present, to fail with a clear error if the key is missing rather than silently corrupting data."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/utils/token_usage.py` around lines 119 -
124, Replace the silent .get() and no-op branch with strict validation: access
the key directly via row["generation"] (so a KeyError surfaces if missing),
assign it to gen, then if not isinstance(gen, str) raise a clear
TypeError/ValueError (including identifying info from row if available) instead
of passing; update the block that currently sets gen = row.get("generation", "")
and the subsequent non-string branch to perform this direct access and explicit
type check so invalid/missing generation values fail loudly.
recipes/noc-reasoning-agent/scripts/utils/token_usage.py-53-64 (1)

53-64: ⚠️ Potential issue | 🟠 Major

Fail fast when --model is provided but tokenizer loading fails.

Right now a user-supplied model can silently degrade to whitespace tokenization, which invalidates analysis output. Per the coding guidelines, explicitly provided parameters should fail fast rather than silently degrade.

Suggested fix
 def load_tokenizer(model_name: Optional[str]):
     """
     Try to load a HF tokenizer. If unavailable (e.g., no internet/cache),
     return None and we'll fall back to whitespace tokenization.
     """
-    if not model_name or AutoTokenizer is None:
+    if not model_name:
         return None
-    try:
-        tok = AutoTokenizer.from_pretrained(model_name, local_files_only=True)
-        return tok
-    except Exception:
-        # Try again without local_files_only (may fail if no internet)
-        try:
-            tok = AutoTokenizer.from_pretrained(model_name)
-            return tok
-        except Exception:
-            return None
+    if AutoTokenizer is None:
+        raise RuntimeError("--model was provided but transformers is not installed.")
+    try:
+        return AutoTokenizer.from_pretrained(model_name, local_files_only=True)
+    except OSError:
+        return AutoTokenizer.from_pretrained(model_name)

And update the usage check:

     tokenizer = load_tokenizer(args.model)
-    if tokenizer is None:
+    if tokenizer is None and args.model is None:
         print("[warn] Could not load tokenizer; falling back to whitespace token counts.")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/utils/token_usage.py` around lines 53 -
64, The tokenizer loader currently falls back silently when
AutoTokenizer.from_pretrained fails; change the logic in
recipes/noc-reasoning-agent/scripts/utils/token_usage.py so that if model_name
is provided (the user explicitly set --model) and AutoTokenizer.from_pretrained
(both with local_files_only=True and the retry without it) raises, you raise a
descriptive exception (e.g., RuntimeError) instead of returning None or falling
back to whitespace tokenization; keep the same try/retry calls to
AutoTokenizer.from_pretrained but re-raise with context on final failure. Also
update the caller/usage check that currently accepts a None tokenizer so that
when a user-supplied model_name was provided and the loader raises/returns None
it fails fast and surfaces the error to the user rather than proceeding with a
whitespace tokenizer.
recipes/noc-reasoning-agent/scripts/utils/token_usage.py-72-75 (1)

72-75: ⚠️ Potential issue | 🟠 Major

Remove the bare except Exception: pass block.

Silent exception handling around tokenizer.encode() hides real tokenizer/runtime failures and causes token counts to degrade silently to whitespace-based fallback. This violates the coding guideline: "Don't catch exceptions when they are not normally expected to be raised; let unexpected exceptions propagate so that failures are visible rather than silent." Across the entire codebase, tokenizer.encode() is called directly without exception handling everywhere else, confirming that exceptions are not expected during normal operation.

Suggested fix
     if tokenizer is not None:
         # Use encode to match model token count (fast + accurate)
-        try:
-            return len(tokenizer.encode(text, add_special_tokens=False))
-        except Exception:
-            pass
+        return len(tokenizer.encode(text, add_special_tokens=False))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/utils/token_usage.py` around lines 72 -
75, Remove the silent try/except around tokenizer.encode so exceptions are not
swallowed: delete the try/except block and call tokenizer.encode(text,
add_special_tokens=False) directly and return len(...) (i.e., replace the
try/except with a direct return len(tokenizer.encode(text,
add_special_tokens=False))); do not add a bare except or silent fallback—let
tokenizer.encode propagate errors so failures are visible.
recipes/noc-reasoning-agent/scripts/ns_pipelines/generate_synthetic_data.py-39-39 (1)

39-39: ⚠️ Potential issue | 🟠 Major

Hardcoded model silently overrides the user-selected --llm.

model="openai/gpt-oss-120b" is always used regardless of args.llm. When main() routes here because args.llm == "qwen2.5-32b-instruct", the user's selection is completely ignored at the model-serving level. The --llm argument should drive both the routing logic and the model identifier passed to generate().

🐛 Proposed fix
 def generate_synthetic_data(args, cluster, num_gpus, step=None, input_format_file=None):
     os.makedirs("outputs/sdg_reason", exist_ok=True)
     generate(
         ctx=wrap_arguments(
             f"++prompt_config=/workspace/data/prompt_reasoning.yaml "
             f"++inference.temperature={args.temperature} "
             f"++inference.tokens_to_generate={args.tokens_to_generate} "
             f"++code_execution=false "
             f"++skip_filled=false "
             f"++use_completions_api=true "
             f"++input_file={input_format_file} "
         ),
         cluster=cluster,
         server_type="vllm",
         input_file=input_format_file,
         output_dir="/workspace/outputs/sdg_reason/",
         expname="incident-generation",
-        model="openai/gpt-oss-120b",
+        model=args.model,
         rerun_done=True,
         server_gpus=num_gpus,
     )

Then pass model through from main():

 if args.llm == "qwen2.5-32b-instruct":
     generate_synthetic_data(
-        args, cluster, num_gpus, step=1, input_format_file="/workspace/outputs/sdg/formatted_output.json"
+        args, cluster, num_gpus, step=1,
+        input_format_file="/workspace/outputs/sdg/formatted_output.json",
     )

where args carries a --model attribute (or one derived from it) that maps the LLM alias to its model path, or simply forward args.llm directly as the model string after proper name resolution.

As per coding guidelines: "Avoid silently ignoring user-passed parameters; fail if a required parameter is not specified or an unsupported parameter is provided."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/ns_pipelines/generate_synthetic_data.py`
at line 39, The hardcoded model="openai/gpt-oss-120b" inside the branch
overrides the user-selected LLM; update the branch that calls generate(...) so
it accepts and forwards a model parameter derived from main() (use args.llm or a
resolved mapping) instead of the literal string, and validate that the passed
model is supported; modify main() to pass the chosen model into the function
that invokes generate(...) (refer to main(), args.llm, the generate(...) call,
and the local model variable) so the CLI selection drives the actual model
identifier.
recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py-89-104 (1)

89-104: ⚠️ Potential issue | 🟠 Major

rfind returning -1 causes silent data corruption in extract_final_thinking_processes.

Line 100: if "final<|message|>" is not found in text, rfind returns -1, and -1 + len("final<|message|>") evaluates to 15. The slice text[15:] would silently return garbage data instead of signaling an error.

Unlike extract_formatted_json_steps, this function also has no try/except around json.loads(line) (line 93), so a malformed line will crash the entire run.
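The silent-slice behavior is easy to reproduce in isolation (the sample text is illustrative):

```python
marker = "final<|message|>"
text = "a response with no marker at all, just prose"

idx = text.rfind(marker)
print(idx)  # -1 when the marker is absent

# -1 + len(marker) == 15, so the slice silently returns a mid-string suffix:
sliced = text[idx + len(marker):]
print(repr(sliced))  # " no marker at all, just prose" -- garbage, not an error
```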

Proposed fix
 def extract_final_thinking_processes(input_file):
     responses = {}
     with open(input_file, "r", encoding="utf-8") as f:
         for line in f:
-            data = json.loads(line)
+            try:
+                data = json.loads(line)
+            except json.JSONDecodeError:
+                print(f"Skipping invalid line: {line.strip()}")
+                continue
             text = data["generation"]
             number = _incident_id(data)
             step_number = data["step_number"]
             if number not in responses:
                 responses[number] = {}
 
-            thinking = text[text.rfind("final<|message|>") + len("final<|message|>") :]
+            marker = "final<|message|>"
+            idx = text.rfind(marker)
+            if idx == -1:
+                print(f"Warning: marker not found for incident {number}, step {step_number}")
+                continue
+            thinking = text[idx + len(marker) :]
             data["generation"] = thinking
             responses[number][step_number] = thinking
 
     return responses
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py` around
lines 89 - 104, In extract_final_thinking_processes, guard against rfind
returning -1 and add JSON parse error handling: wrap the json.loads(line) call
in a try/except to skip or log malformed lines, use the index returned by
text.rfind("final<|message|>") (or text.find) and check if it's >= 0 before
slicing, and only set data["generation"] and responses[number][step_number] when
the marker is found; reference functions/vars: extract_final_thinking_processes,
json.loads, _incident_id, text.rfind("final<|message|>"), data["generation"],
responses, and step_number.
recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py-215-221 (1)

215-221: ⚠️ Potential issue | 🟠 Major

key parameter is never used — likely a bug in argument lookup.

_first_pos(key, named_key=None) ignores the key parameter entirely. All call sites pass a single positional argument (e.g., _first_pos("site_or_element_id")), which binds to key but is never referenced. The intent was almost certainly to use key as the kv_args lookup when named_key is not provided.

Proposed fix
-    def _first_pos(key, named_key=None):
+    def _first_pos(named_key):
         """Return named arg, first positional arg, or 'all' as default."""
-        if named_key:
-            val = kv_args.get(named_key)
-            if val:
-                return val
+        val = kv_args.get(named_key)
+        if val:
+            return val
         return pos_args[0] if pos_args else "all"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py` around
lines 215 - 221, The helper _first_pos currently ignores its key parameter
causing lookups to use only named_key or fall back to pos_args; update
_first_pos (and any call sites expecting _first_pos("site_or_element_id")) so it
uses key when named_key is not provided — i.e., attempt to get
kv_args[named_key] if named_key is given, otherwise get kv_args[key]; if that
lookup yields a truthy value return it, else fall back to pos_args[0] or "all".
Ensure references to _first_pos, key, named_key, kv_args, and pos_args are used
exactly as described.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a154947 and c0a2a43.

📒 Files selected for processing (20)
  • recipes/noc-reasoning-agent/scripts/create_agent_with_tools.py
  • recipes/noc-reasoning-agent/scripts/create_agent_with_tools_batch.py
  • recipes/noc-reasoning-agent/scripts/evaluation/evaluation_with_judge.py
  • recipes/noc-reasoning-agent/scripts/evaluation/problem_code_evaluation.py
  • recipes/noc-reasoning-agent/scripts/evaluation/score.py
  • recipes/noc-reasoning-agent/scripts/filtering/filter_rows.py
  • recipes/noc-reasoning-agent/scripts/filtering/match_keywords.py
  • recipes/noc-reasoning-agent/scripts/ns_pipelines/generate_synthetic_data.py
  • recipes/noc-reasoning-agent/scripts/ns_pipelines/prepare_react_agent.py
  • recipes/noc-reasoning-agent/scripts/tools.py
  • recipes/noc-reasoning-agent/scripts/utils/create_input_jsonl_from_incidents.py
  • recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py
  • recipes/noc-reasoning-agent/scripts/utils/reasoning_processes.py
  • recipes/noc-reasoning-agent/scripts/utils/schema_columns.py
  • recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py
  • recipes/noc-reasoning-agent/scripts/utils/split_mocktools_answers.py
  • recipes/noc-reasoning-agent/scripts/utils/token_usage.py
  • recipes/noc-reasoning-agent/scripts/visualization/extract_representation_columns.py
  • recipes/noc-reasoning-agent/scripts/visualization/extract_scores.py
  • recipes/noc-reasoning-agent/scripts/visualization/generate_trace_visualization.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • recipes/noc-reasoning-agent/scripts/utils/schema_columns.py
  • recipes/noc-reasoning-agent/scripts/utils/create_input_jsonl_from_incidents.py
  • recipes/noc-reasoning-agent/scripts/evaluation/score.py

Comment on lines +66 to +71
def find_finish_action(generation_text):
    """Extract the Finish[...] action text from a generation string."""
    last_index = generation_text.rfind("Finish")
    if last_index != -1:
        return generation_text[last_index:].strip()
    return ""

⚠️ Potential issue | 🔴 Critical

find_finish_action will raise AttributeError when "generation" is absent.

incident_data.get("generation") returns None when the key is missing, and find_finish_action immediately calls .rfind() on the argument without a None guard. The same value is also assigned to reasoning_trace on line 116 and later passed to parse_steps — though parse_steps handles None safely via reasoning or "", the find_finish_action path does not.

🐛 Proposed fix
 def find_finish_action(generation_text):
     """Extract the Finish[...] action text from a generation string."""
+    if not generation_text:
+        return ""
     last_index = generation_text.rfind("Finish")
     if last_index != -1:
         return generation_text[last_index:].strip()
     return ""

Alternatively, call with an explicit default at the call site:

-    finish_action = find_finish_action(incident_data.get("generation"))
+    finish_action = find_finish_action(incident_data.get("generation", ""))

Also applies to: 116-117
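The guarded version behaves sensibly for both the missing-key case and normal input; a quick sketch with made-up strings:

```python
def find_finish_action(generation_text):
    """Extract the Finish[...] action text, treating a missing generation as empty."""
    if not generation_text:
        return ""
    last_index = generation_text.rfind("Finish")
    if last_index != -1:
        return generation_text[last_index:].strip()
    return ""

print(find_finish_action(None))                               # ""
print(find_finish_action({}.get("generation")))               # "" — the missing-key path
print(find_finish_action("Thought: done. Finish[resolved]"))  # "Finish[resolved]"
```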

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@recipes/noc-reasoning-agent/scripts/visualization/generate_trace_visualization.py`
around lines 66 - 71, find_finish_action currently calls
generation_text.rfind("Finish") and will raise AttributeError if passed None
(e.g., incident_data.get("generation") may return None); update
find_finish_action (or its call sites) to treat a missing generation as an empty
string by coalescing generation_text to "" before calling .rfind (or return ""
immediately if generation_text is falsy). Ensure the same defensive behavior is
applied where reasoning_trace is set from incident_data.get("generation") before
passing into parse_steps so find_finish_action and downstream logic never call
string methods on None.

Apply review suggestion from PR #1242 to use ```bash instead of
bare ``` for all shell command code blocks in the NOC reasoning
agent tutorial.

Signed-off-by: George Armstrong <georgea@nvidia.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/tutorials/posts/noc-reasoning-agent.md`:
- Around line 261-267: The blockquote under "Re-running generation pipelines"
contains empty quoted lines that trigger MD028; open the blockquote starting
with "**Note — Re-running generation pipelines:**" and remove any blank lines
that begin with ">" so the entire quote remains contiguous (ensure the lines
showing `ns generate` behavior, the rm example, and the note about other output
directories are all directly adjacent with no blank quoted lines).
- Around line 80-87: Add a single blank line between the two adjacent Markdown
tables that start with the header "| Software | Purpose |" and the following
table (currently immediately before "## Setup") to satisfy markdownlint rule
MD058; locate the block that renders the first table and insert one empty line
after its final row so there is a blank line before the next table/heading.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c0a2a43 and e5d4815.

📒 Files selected for processing (1)
  • docs/tutorials/posts/noc-reasoning-agent.md

Comment on lines +80 to +87
| Software | Purpose |
| --- | --- |
| **Docker** | Containerization for consistent environments |
| **<a href="https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html" target="_blank">NVIDIA Container Toolkit</a>** | Allows Docker containers to access GPU resources |
| **<a href="https://slurm.schedmd.com/" target="_blank">Slurm</a> with <a href="https://github.com/NVIDIA/pyxis" target="_blank">NVIDIA/pyxis</a>** (optional) | Cluster job scheduler for distributed workloads |
| **Python 3.10+** | Required Python version |
| **NeMo-Skills CLI** | Main interface for running pipelines |
## Setup

⚠️ Potential issue | 🟡 Minor

Add a blank line between adjacent tables to satisfy markdownlint.

Line 80 starts a second table immediately after the first one. Add a blank line between them to resolve MD058.

Suggested fix
 | **VRAM** | Sufficient for model size | Stores model weights and activations |
 | **Multi-GPU** | Recommended | Enables model and batch parallelism |
+
 | Software | Purpose |
 | --- | --- |
 | **Docker** | Containerization for consistent environments |
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 86-86: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/tutorials/posts/noc-reasoning-agent.md` around lines 80 - 87, Add a
single blank line between the two adjacent Markdown tables that start with the
header "| Software | Purpose |" and the following table (currently immediately
before "## Setup") to satisfy markdownlint rule MD058; locate the block that
renders the first table and insert one empty line after its final row so there
is a blank line before the next table/heading.

Comment on lines +261 to +267
> **Note — Re-running generation pipelines:**
> `ns generate` creates both an `output.jsonl` and an `output.jsonl.done` sentinel file in the output directory. If you need to re-run a generation step from scratch, delete **both** files before restarting:
>
> `rm outputs/sdg/output.jsonl outputs/sdg/output.jsonl.done`
>
> The same applies to any `ns generate` output directory (e.g., `outputs/sdg_reason/`). Without deleting these files, the pipeline will skip generation and reuse the existing results.


⚠️ Potential issue | 🟡 Minor

Remove blank quoted lines inside the blockquote note.

The blockquote under “Re-running generation pipelines” includes empty quoted lines, which triggers MD028 (Line 267). Keep the blockquote contiguous.

Suggested fix
 > **Note — Re-running generation pipelines:**
 > `ns generate` creates both an `output.jsonl` and an `output.jsonl.done` sentinel file in the output directory. If you need to re-run a generation step from scratch, delete **both** files before restarting:
->
 > `rm outputs/sdg/output.jsonl outputs/sdg/output.jsonl.done`
->
 > The same applies to any `ns generate` output directory (e.g., `outputs/sdg_reason/`). Without deleting these files, the pipeline will skip generation and reuse the existing results.
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 267-267: Blank line inside blockquote

(MD028, no-blanks-blockquote)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/tutorials/posts/noc-reasoning-agent.md` around lines 261 - 267, The
blockquote under "Re-running generation pipelines" contains empty quoted lines
that trigger MD028; open the blockquote starting with "**Note — Re-running
generation pipelines:**" and remove any blank lines that begin with ">" so the
entire quote remains contiguous (ensure the lines showing `ns generate`
behavior, the rm example, and the note about other output directories are all
directly adjacent with no blank quoted lines).

@gwarmstrong gwarmstrong merged commit f6e3505 into main Feb 27, 2026
5 checks passed
@gwarmstrong gwarmstrong deleted the acanaveras/add-noc-reasoning-tutorial branch February 27, 2026 18:17
gwarmstrong added a commit that referenced this pull request Mar 4, 2026
Signed-off-by: Amparo Canaveras <acanaveras@nvidia.com>
Signed-off-by: rajeshwarid179 <rdevaramani@nvidia.com>
Signed-off-by: acanaveras <142839082+acanaveras@users.noreply.github.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
Co-authored-by: Amparo Canaveras <acanaveras@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: acanaveras <142839082+acanaveras@users.noreply.github.com>
Co-authored-by: rajeshwarid179 <rdevaramani@nvidia.com>
sgunasekar added a commit that referenced this pull request Mar 11, 2026
commit a5da597
Author: Igor Gitman <igitman@nvidia.com>
Date:   Fri Mar 6 12:13:36 2026 -0800

    Revert "Eval kit support  (#1239)" (#1294)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit b237e33
Author: George <37293288+Jorjeous@users.noreply.github.com>
Date:   Fri Mar 6 20:25:37 2026 +0400

    Eval kit support  (#1239)

    Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
    Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com>
    Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

commit dc28bbf
Author: George Armstrong <georgea@nvidia.com>
Date:   Thu Mar 5 10:17:44 2026 -0800

    Python direct tool calling without MCP (#1286)

    Signed-off-by: George Armstrong <georgea@nvidia.com>

commit 12454dd
Author: Sadegh Mahdavi <smahdavi4@gmail.com>
Date:   Wed Mar 4 13:06:21 2026 -0800

    Allow het servers for nemo-rl jobs (#1223)

    Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
    Co-authored-by: Igor Gitman <igitman@nvidia.com>

commit 8884a68
Author: Prasoon Varshney <prasoon1995@gmail.com>
Date:   Wed Mar 4 10:24:02 2026 -0800

    Support source_lang param for translation recipe (#1290)

    Signed-off-by: Prasoon Varshney <prasoonv@nvidia.com>
    Co-authored-by: George Armstrong <georgea@nvidia.com>

commit 4618b19
Author: Meriem B. <113170426+ka00ri@users.noreply.github.com>
Date:   Wed Mar 4 18:59:28 2026 +0100

    Add MMLU-Pro 10% optimized subset for checkpoint selection (#1285)

    Signed-off-by: Meriem Boubdir <mboubdir@nvidia.com>
    Co-authored-by: George Armstrong <georgea@nvidia.com>

commit 5ac8609
Author: Talor Abramovich <talor19@gmail.com>
Date:   Wed Mar 4 02:30:06 2026 +0200

    Add SPEED-Bench (within repo) (#1279)

    Signed-off-by: Talor Abramovich <talora@nvidia.com>
    Signed-off-by: talora <talora@nvidia.com>
    Signed-off-by: Talor Abramovich <talor19@gmail.com>
    Signed-off-by: George Armstrong <georgea@nvidia.com>
    Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
    Co-authored-by: George Armstrong <georgea@nvidia.com>
    Co-authored-by: Igor Gitman <igor.a.gitman@gmail.com>

commit c31eec5
Author: George Armstrong <georgea@nvidia.com>
Date:   Tue Mar 3 12:18:15 2026 -0800

    Fix os.getlogin() crash in ns setup (#1289)

    Signed-off-by: George Armstrong <georgea@nvidia.com>

commit c228e66
Author: George Armstrong <georgea@nvidia.com>
Date:   Tue Mar 3 11:04:54 2026 -0800

    Fix streaming TypeError when delta.content is None (#1267) (#1288)

    Signed-off-by: George Armstrong <georgea@nvidia.com>

commit aa47923
Author: Matvei Novikov <mnovikov@nvidia.com>
Date:   Mon Mar 2 16:28:41 2026 -0800

    Add LibTrace recipe for generating domain-specific reasoning data (#1224)

    Signed-off-by: jubick1337 <mnovikov@nvidia.com>
    Signed-off-by: mnovikov <mnovikov@nvidia.com>
    Co-authored-by: George Armstrong <georgea@nvidia.com>

commit 313cad7
Author: Stephen Ge <stepheng@nvidia.com>
Date:   Mon Mar 2 18:28:49 2026 -0500

    fix: clean parse-failure retries in prover (#1284)

    Signed-off-by: Stephen Ge <stepheng@nvidia.com>

commit 813cfa3
Author: George Armstrong <georgea@nvidia.com>
Date:   Mon Mar 2 15:10:08 2026 -0800

    tst: rollback inference-api to integrate (#1287)

    Signed-off-by: George Armstrong <georgea@nvidia.com>

commit 31735f9
Author: Valentin Mendelev <vmendelev@nvidia.com>
Date:   Mon Mar 2 23:11:25 2026 +0100

    Add backend-agnostic unified inference server with NeMo ASR and TTS backends (#1250)

    Signed-off-by: Valentin Mendelev <vmendelev@nvidia.com>

commit d4ef8c0
Author: George <37293288+Jorjeous@users.noreply.github.com>
Date:   Fri Feb 27 23:58:54 2026 +0400

    Update promt_config to working with openai format + inline setup (#1210)

    Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
    Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com>
    Co-authored-by: Igor Gitman <igitman@nvidia.com>
    Co-authored-by: George Armstrong <georgea@nvidia.com>

commit e879cbc
Author: George Armstrong <georgea@nvidia.com>
Date:   Fri Feb 27 10:41:23 2026 -0800

    Update noc tutorial (#1282)

    Signed-off-by: George Armstrong <georgea@nvidia.com>

commit f6e3505
Author: George Armstrong <georgea@nvidia.com>
Date:   Fri Feb 27 10:17:33 2026 -0800

    Add noc reasoning tutorial (#1278)

    Signed-off-by: Amparo Canaveras <acanaveras@nvidia.com>
    Signed-off-by: rajeshwarid179 <rdevaramani@nvidia.com>
    Signed-off-by: acanaveras <142839082+acanaveras@users.noreply.github.com>
    Signed-off-by: George Armstrong <georgea@nvidia.com>
    Co-authored-by: Amparo Canaveras <acanaveras@nvidia.com>
    Co-authored-by: Cursor <cursoragent@cursor.com>
    Co-authored-by: acanaveras <142839082+acanaveras@users.noreply.github.com>
    Co-authored-by: rajeshwarid179 <rdevaramani@nvidia.com>

commit fc2072a
Author: Jiacheng Xu <jcxu@utexas.edu>
Date:   Fri Feb 27 10:10:25 2026 -0800

    CritPt generation add prompt_format=None (#1280)

    Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com>
    Co-authored-by: George Armstrong <georgea@nvidia.com>

commit c8abe5d
Author: Igor Gitman <igitman@nvidia.com>
Date:   Fri Feb 27 09:31:26 2026 -0800

    New slurm customization parameters (account, containers) (#1209)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>
    Signed-off-by: George Armstrong <georgea@nvidia.com>
    Co-authored-by: George Armstrong <georgea@nvidia.com>

commit 2b38cce
Author: George Armstrong <georgea@nvidia.com>
Date:   Wed Feb 25 17:59:52 2026 -0800

    Add nemo-skills-core subpackage for lightweight installs (#1229)

    Signed-off-by: George Armstrong <georgea@nvidia.com>

commit 9fa8e83
Author: Dheeraj Peri <peri.dheeraj@gmail.com>
Date:   Wed Feb 25 12:56:35 2026 -0800

    feat: add custom judge type support for external repo integration (#1274)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>
    Signed-off-by: bzantium <ryumin93@gmail.com>
    Signed-off-by: Dheeraj Peri <dperi@nvidia.com>
    Signed-off-by: suriya <sgunasekar@nvidia.com>
    Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com>
    Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
    Co-authored-by: Igor Gitman <igitman@nvidia.com>
    Co-authored-by: Minho Ryu <ryumin93@gmail.com>
    Co-authored-by: Yongqiang Wang <yongqiang.seagull@gmail.com>
    Co-authored-by: Suriya Gunasekar <sgunasekar@users.noreply.github.com>
    Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
    Co-authored-by: Jiacheng Xu <jcxu@utexas.edu>
    Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com>

commit 8a32b13
Author: Igor Gitman <igitman@nvidia.com>
Date:   Tue Feb 24 15:24:42 2026 -0800

    Exclude numb3rs form test_eval.py (#1275)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit 6da2219
Author: George <37293288+Jorjeous@users.noreply.github.com>
Date:   Mon Feb 23 18:37:46 2026 +0400

    Numb3rs ds addition (#1174)

    Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>

commit ad034b5
Author: Suriya Gunasekar <sgunasekar@users.noreply.github.com>
Date:   Sun Feb 22 11:55:24 2026 -0800

    Add DSBench-DA evaluation (#1254)

    Squash merge of changes during code-review.
    Signed-off-by: suriya <sgunasekar@nvidia.com>

commit 7593ab3
Author: Jiacheng Xu <jcxu@utexas.edu>
Date:   Fri Feb 20 16:42:01 2026 -0800

    Add CritPt benchmark (#1200)

    Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com>
    Co-authored-by: Igor Gitman <igitman@nvidia.com>

commit 58c31b2
Author: Suriya Gunasekar <sgunasekar@users.noreply.github.com>
Date:   Fri Feb 20 16:19:22 2026 -0800

    Fix no_answer metric overcounting in _compute_pass_at_k (#1245)

    Signed-off-by: suriya <sgunasekar@nvidia.com>
    Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
    Co-authored-by: Igor Gitman <igitman@nvidia.com>

commit 1f1a2e7
Author: Igor Gitman <igitman@nvidia.com>
Date:   Fri Feb 20 15:58:40 2026 -0800

    Fix incorrect prompt tokens count due to HF api update (#1264)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit 8ebc6f5
Author: Igor Gitman <igitman@nvidia.com>
Date:   Fri Feb 20 09:05:33 2026 -0800

    Remove deprecated dataset group (#1263)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit ea4177f
Author: Yongqiang Wang <yongqiang.seagull@gmail.com>
Date:   Thu Feb 19 19:57:25 2026 -0500

    fix deps (#1258)

commit 60905a7
Author: Minho Ryu <ryumin93@gmail.com>
Date:   Fri Feb 20 09:39:39 2026 +0900

    Add aime26 (#1256)

    Signed-off-by: bzantium <ryumin93@gmail.com>

commit b28afc5
Author: Igor Gitman <igitman@nvidia.com>
Date:   Thu Feb 19 16:18:25 2026 -0800

    Rename custom -> external benchmarks (#1262)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit 6cc9c45
Author: Igor Gitman <igitman@nvidia.com>
Date:   Thu Feb 19 16:10:33 2026 -0800

    Add reference to internal benchmarks repo (#1261)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit 5202af6
Author: Igor Gitman <igitman@nvidia.com>
Date:   Thu Feb 19 16:08:05 2026 -0800

    Remove incorrect presence-penalty setting (#1259)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit 144c70b
Author: Igor Gitman <igitman@nvidia.com>
Date:   Thu Feb 19 15:26:33 2026 -0800

    Adding an option to store benchmarks in external repo (#1240)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>
    Co-authored-by: George Armstrong <georgea@nvidia.com>

commit 10e6e39
Author: George <37293288+Jorjeous@users.noreply.github.com>
Date:   Thu Feb 19 19:57:21 2026 +0400

    update vllm miltimodal for api calls convenience (#1213)

    Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
    Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com>
    Co-authored-by: mmkrtchyan <mmkrtchyan@nvidia.com>

commit 1ba4219
Author: Nick Ludwig <nliudvig@nvidia.com>
Date:   Wed Feb 18 03:28:23 2026 +0400

    Fix --server_container not being applied to dependent jobs (#1244)

    Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com>
    Co-authored-by: Igor Gitman <igitman@nvidia.com>

commit 9517614
Author: Wasi Ahmad <wasiahmad@ucla.edu>
Date:   Mon Feb 16 11:13:24 2026 -0800

    Support mini-swe-agent as agent harness (#1212)

    Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
    Signed-off-by: i-vainn <imoshkov@nvidia.com>
    Signed-off-by: George Armstrong <georgea@nvidia.com>
    Signed-off-by: Charlie Truong <chtruong@nvidia.com>
    Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com>
    Signed-off-by: Grigor Nalbandyan <gnalbandyan@nvidia.com>
    Signed-off-by: bzantium <ryumin93@gmail.com>
    Signed-off-by: Stephen Ge <stepheng@nvidia.com>
    Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com>
    Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
    Signed-off-by: Mateusz Winiarek <mwiniarek@nvidia.com>
    Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com>
    Signed-off-by: Wei Du <wedu@nvidia.com>
    Signed-off-by: Igor Gitman <igitman@nvidia.com>
    Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com>
    Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
    Signed-off-by: Arkadiusz Nowaczynski <anowaczynski@nvidia.com>
    Signed-off-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
    Co-authored-by: Ivan <imoshkov@nvidia.com>
    Co-authored-by: George Armstrong <georgea@nvidia.com>
    Co-authored-by: Charlie Truong <chtruong@nvidia.com>
    Co-authored-by: Nick Ludwig <nliudvig@nvidia.com>
    Co-authored-by: Wojciech Prazuch <wojciechprazuch3@gmail.com>
    Co-authored-by: gnalbandyan <153070076+gnalbandyan@users.noreply.github.com>
    Co-authored-by: Minho Ryu <ryumin93@gmail.com>
    Co-authored-by: Stephen Ge <stepheng@nvidia.com>
    Co-authored-by: Jiacheng Xu <jcxu@utexas.edu>
    Co-authored-by: Jiacheng Xu <jiachengx@nvidia.com>
    Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com>
    Co-authored-by: Sanyam Kapoor <sanyamk@nvidia.com>
    Co-authored-by: Mateusz Winiarek <72758259+Froxyy-dev@users.noreply.github.com>
    Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
    Co-authored-by: Meline Mkrtchyan <72409758+melllinia@users.noreply.github.com>
    Co-authored-by: Wei Du <wedu@nvidia.com>
    Co-authored-by: Igor Gitman <igitman@nvidia.com>
    Co-authored-by: Sean Naren <snarenthiran@nvidia.com>
    Co-authored-by: Mehrzad Samadi <mehrzadsamadi@gmail.com>
    Co-authored-by: anowaczynski-nvidia <anowaczynski@nvidia.com>
    Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>

commit a3d44dc
Author: Suriya Gunasekar <sgunasekar@users.noreply.github.com>
Date:   Fri Feb 13 22:32:15 2026 -0800

    Add --installation_command support to prepare_data (#1243)

    Signed-off-by: suriya <sgunasekar@nvidia.com>
    Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

commit e80d524
Author: George Armstrong <georgea@nvidia.com>
Date:   Thu Feb 12 17:26:00 2026 -0800

    Fix CI disk space for Docker image builds (#1241)

    Signed-off-by: George Armstrong <georgea@nvidia.com>

commit d22236c
Author: Sadegh Mahdavi <smahdavi4@gmail.com>
Date:   Wed Feb 11 17:55:00 2026 -0800

    Fix answerbench prompt parsing (#1235)

    Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>

commit 2401628
Author: George Armstrong <georgea@nvidia.com>
Date:   Wed Feb 11 14:56:43 2026 -0800

    feat: add lockfiles for reproducible sandbox builds (#1233)

    Signed-off-by: George Armstrong <georgea@nvidia.com>

commit 5a0a84d
Author: Wasi Ahmad <wasiahmad@ucla.edu>
Date:   Wed Feb 11 13:30:03 2026 -0800

    removing datasets version restriction for LCB eval (#1230)

    Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

commit ef0a890
Author: gnalbandyan <153070076+gnalbandyan@users.noreply.github.com>
Date:   Wed Feb 11 12:03:16 2026 +0400

    Gnalbandyan/add physics (#1214)

    Signed-off-by: Grigor Nalbandyan <gnalbandyan@nvidia.com>
    Signed-off-by: gnalbandyan <153070076+gnalbandyan@users.noreply.github.com>

commit bd9d30c
Author: Wasi Ahmad <wasiahmad@ucla.edu>
Date:   Tue Feb 10 15:13:27 2026 -0800

    LCB generic prompting (#1215)

    Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

commit 7d6c49a
Author: Sadegh Mahdavi <smahdavi4@gmail.com>
Date:   Sat Feb 7 08:45:46 2026 -0800

    Add support for different variations of nemo-rl (#1220)

    Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>

commit b19ba96
Author: George Armstrong <georgea@nvidia.com>
Date:   Fri Feb 6 21:40:56 2026 -0800

    Add multi-node sandbox support for SLURM clusters (#1218)

    Signed-off-by: George Armstrong <georgea@nvidia.com>

commit 8950bb0
Author: anowaczynski-nvidia <anowaczynski@nvidia.com>
Date:   Sat Feb 7 01:38:00 2026 +0100

    support structured outputs in hle judge for optional AA compatibility (#1186)

    Signed-off-by: Arkadiusz Nowaczynski <anowaczynski@nvidia.com>
    Signed-off-by: Igor Gitman <igitman@nvidia.com>
    Co-authored-by: Igor Gitman <igitman@nvidia.com>

commit b84f7a2
Author: Igor Gitman <igitman@nvidia.com>
Date:   Fri Feb 6 14:51:02 2026 -0800

    A small update on running tests docs (#1219)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit 8e838e1
Author: George Armstrong <georgea@nvidia.com>
Date:   Thu Feb 5 18:01:35 2026 -0800

    feat: add flag to disable sandbox replay (#1217)

    Signed-off-by: George Armstrong <georgea@nvidia.com>

commit 5fd9085
Author: Igor Gitman <igitman@nvidia.com>
Date:   Thu Feb 5 15:57:01 2026 -0800

    Add an option to limit number of tool calls (#1216)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit d820200
Author: Igor Gitman <igitman@nvidia.com>
Date:   Tue Feb 3 10:43:55 2026 -0800

    Add arena-hard v2 (#1205)

    Signed-off-by: bzantium <ryumin93@gmail.com>
    Signed-off-by: Igor Gitman <igitman@nvidia.com>
    Co-authored-by: bzantium <ryumin93@gmail.com>

commit a30920e
Author: Igor Gitman <igitman@nvidia.com>
Date:   Mon Feb 2 10:53:55 2026 -0800

    Fix mkdocs warnings (#1204)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit 19d7788
Author: Ivan <imoshkov@nvidia.com>
Date:   Mon Feb 2 23:25:13 2026 +0500

    Fix infinite wait in sandbox.wait_for_sandbox (#1206)

    Signed-off-by: i-vainn <imoshkov@nvidia.com>

commit 3e65fbf
Author: Sadegh Mahdavi <smahdavi4@gmail.com>
Date:   Fri Jan 30 19:38:38 2026 -0800

    Improve tts (#1203)

    Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>

commit 250c862
Author: Nick Ludwig <nliudvig@nvidia.com>
Date:   Fri Jan 30 22:12:29 2026 +0400

    SWE-bench: fix SWE-agent hanging, adjust expected scores (#1202)

    Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com>

commit 7ded756
Author: Ivan <imoshkov@nvidia.com>
Date:   Fri Jan 30 09:57:41 2026 +0500

    Add proper token counting to code execution model (#1184)

    Signed-off-by: i-vainn <imoshkov@nvidia.com>
    Co-authored-by: Igor Gitman <igitman@nvidia.com>

commit b986304
Author: Igor Gitman <igitman@nvidia.com>
Date:   Thu Jan 29 17:57:07 2026 -0800

    Upgrade containers (#1198)

    Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
    Signed-off-by: Igor Gitman <igitman@nvidia.com>
    Co-authored-by: Sadegh Mahdavi <smahdavi@nvidia.com>

commit 3b44f02
Author: Dan Lord <blahblahasdf@gmail.com>
Date:   Thu Jan 29 16:40:47 2026 -0800

    Fix incorrect string format (#1199)

    Signed-off-by: dlord <dlord@nvidia.com>

commit c4854b8
Author: Sadegh Mahdavi <smahdavi4@gmail.com>
Date:   Thu Jan 29 13:43:36 2026 -0800

    Update nemo-rl to latest (#1087)

    Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
    Signed-off-by: Igor Gitman <igitman@nvidia.com>
    Co-authored-by: Igor Gitman <igitman@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: Amparo Canaveras <acanaveras@nvidia.com>
Signed-off-by: rajeshwarid179 <rdevaramani@nvidia.com>
Signed-off-by: acanaveras <142839082+acanaveras@users.noreply.github.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
Co-authored-by: Amparo Canaveras <acanaveras@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: acanaveras <142839082+acanaveras@users.noreply.github.com>
Co-authored-by: rajeshwarid179 <rdevaramani@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: Amparo Canaveras <acanaveras@nvidia.com>
Signed-off-by: rajeshwarid179 <rdevaramani@nvidia.com>
Signed-off-by: acanaveras <142839082+acanaveras@users.noreply.github.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
Co-authored-by: Amparo Canaveras <acanaveras@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: acanaveras <142839082+acanaveras@users.noreply.github.com>
Co-authored-by: rajeshwarid179 <rdevaramani@nvidia.com>
Signed-off-by: dgitman <dgitman@nvidia.com>