Conversation
Signed-off-by: Amparo Canaveras <acanaveras@nvidia.com>
…ripts Signed-off-by: Amparo Canaveras <acanaveras@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com>
- Move __init__.py creation and PYTHONPATH export into a dedicated "Configure the environment" subsection in the tutorial
- Remove per-command PYTHONPATH=$PWD prefixes (now set once at top)
- Add all 40 synthetic fault_category values to ALLOWED_PROBLEM_CODES in filter_rows.py and create_input_jsonl_from_incidents.py
- Add matching reasoning process mappings in reasoning_processes.py

Signed-off-by: Amparo Canaveras <acanaveras@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Python 3.3+ namespace packages make __init__.py unnecessary. The try/except fallback in create_input_jsonl handles relative imports gracefully. Only PYTHONPATH is needed for all script imports. Signed-off-by: Amparo Canaveras <acanaveras@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: rajeshwarid179 <rdevaramani@nvidia.com>
Co-authored-by: George Armstrong <georgea@nvidia.com> Signed-off-by: acanaveras <142839082+acanaveras@users.noreply.github.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
📝 Walkthrough
Adds a NOC Reasoning Agent feature: documentation, configs, prompt templates, and many new scripts for data ingestion, filtering, synthetic data generation, ReAct agent orchestration (single and batch), tool bindings, reasoning-trace compilation, evaluation, splitting, visualization, and CLI utilities.
Sequence Diagram(s)
sequenceDiagram
participant Input as JSONL Input
participant Agent as ReAct Agent Script
participant LLM as Language Model
participant Tools as Local Tool Functions
participant Checkpoint as Resume Store
participant Output as JSONL Output
Input->>Agent: load next row
Agent->>Checkpoint: check processed indices / load history
Agent->>LLM: send system + user messages (context, prompts)
LLM-->>Agent: return generation
alt generation contains <tool_call>
Agent->>Tools: parse tool_call → invoke bound tool(row)
Tools-->>Agent: return tool response
Agent->>LLM: send observation + tool response (continue)
LLM-->>Agent: return next generation (loop)
else final answer produced
Agent->>Output: write agent_response + metadata
Agent->>Checkpoint: save processed index / state
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 2
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
🟠 Major comments (28)
recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py-1-1 (1)
1-1: ⚠️ Potential issue | 🟠 Major
Add the required copyright header to unblock CI.
Pipeline reports a blocking error for missing copyright header in the first 10 lines.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py` at line 1, This file (split_incident_data.py) is missing the required copyright header in the first 10 lines; add the project's standard copyright/license header comment at the very top of the file (before any code such as the existing import argparse) so the header appears within the first 10 lines and matches the repository's canonical header format.

recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py-8-10 (1)
8-10: ⚠️ Potential issue | 🟠 Major
Validate split parameters before computing train/test sizes.
Lines 26-27 can silently produce invalid splits when `test_size` is out of range (e.g., negative or >1), and `max_examples <= 0` also produces confusing outcomes.
Proposed fix
```diff
 def split_generation_field(
     input_folder: str,
     train_out: str,
     test_out: str,
     test_size: float = 0.2,
     seed: int = 42,
     max_examples=10000
 ):
+    if not (0.0 < test_size < 1.0):
+        raise ValueError(f"test_size must be in (0, 1), got {test_size}")
+    if max_examples <= 0:
+        raise ValueError(f"max_examples must be > 0, got {max_examples}")
+
     train_path = Path(train_out)
     test_path = Path(test_out)
```
Also applies to: 26-27
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py` around lines 8 - 10, In split_generation_field, add input validation at the top to guard against invalid split params: verify that test_size is a float in the range (0.0, 1.0) or at least 0.0 <= test_size <= 1.0 per your intended behavior (reject negative or >1 values) and raise a ValueError with a clear message if out of range; also validate max_examples is an int > 0 (raise ValueError if <= 0) and optionally validate seed is an int; place these checks in the split_generation_field function before any computation of train/test sizes so downstream math cannot produce confusing results.

recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py-15-16 (1)
15-16: ⚠️ Potential issue | 🟠 Major
Fix `--input_dir` contract mismatch (default run currently fails).
Line 15 treats `input_folder` as a directory containing `iteration_0.jsonl`, but Line 88 describes/defaults it as a JSONL file (`output_incident.jsonl`). With defaults, the script resolves `output_incident.jsonl/iteration_0.jsonl` and crashes.
Proposed fix
```diff
-    parser.add_argument("--input_dir", help="Path to input_dir JSONL file", default="output_incident.jsonl")
+    parser.add_argument(
+        "--input_dir",
+        help="Path to directory containing iteration_*.jsonl files",
+        default="output",
+    )
     args = parser.parse_args()
+    if not Path(args.input_dir).is_dir():
+        raise ValueError(f"--input_dir must be an existing directory: {args.input_dir}")
+
     split_generation_field(args.input_dir, args.train_output, args.test_output)
```
Also applies to: 88-88
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py` around lines 15 - 16, The code assumes input_folder is a directory and unconditionally opens os.path.join(input_folder, "iteration_0.jsonl"), which breaks when the CLI default passes a JSONL file name; update the opening logic to handle both a file path and a directory: compute an input_path variable that is input_folder if os.path.isfile(input_folder) else os.path.join(input_folder, "iteration_0.jsonl"), then open input_path (the loop that currently opens with open(os.path.join(input_folder, "iteration_0.jsonl"), ...)). Also ensure any CLI help/defaults (where input_folder is defined) remain as-is or are updated consistently so the script accepts either a file or a directory.

recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py-61-63 (1)
61-63: ⚠️ Potential issue | 🟠 Major
Guard `background` access to avoid hard-failing on sparse rows.
Line 61 uses `row["background"]` directly. If a row is missing that key, the whole run aborts with `KeyError` instead of skipping or handling the bad row.
Proposed fix
```diff
 if resolution:
     row["expected"] = f"Close Code: [{resolution}]"
-    row["initial_background"] = row["background"]
+    background = row.get("background")
+    if background is None:
+        continue
+    row["initial_background"] = background
     row["background"] = "\n<think>\n"
     current_iteration_test.append(row)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py` around lines 61 - 63, The code directly indexes row["background"] which can raise KeyError for sparse rows; update the block around assigning row["initial_background"] and row["background"] (and before appending to current_iteration_test) to guard access with row.get("background") or an explicit "if 'background' in row" check: if present, set row["initial_background"] = row["background"] and row["background"] = "\n<think>\n" and append; otherwise handle the bad row by either skipping appending (continue), setting a safe default (e.g., empty string) for row["initial_background"], or logging the missing key so the run does not hard-fail. Ensure changes reference the same variables row, "initial_background", "background", and current_iteration_test.

recipes/noc-reasoning-agent/scripts/visualization/extract_representation_columns.py-28-28 (1)
28-28: ⚠️ Potential issue | 🟠 Major
`NaN` values from pandas break JSON serialization.
`to_dict(orient="index")` preserves pandas `float('nan')` for any missing cell in `time_to_resolve`, `solved_category`, or `solved_reason`. Python's `json` module serializes `nan` as a bare `NaN` token (which is not valid JSON and breaks strict parsers), and raises `ValueError` only when `allow_nan=False` is passed. Either way, any row whose CSV entry has at least one empty cell will produce invalid output or fail at the `json.dumps` calls on Lines 60 and 151.
🐛 Proposed fix — sanitize NaN to None when building the index
```diff
-csv_idx = df.set_index(id_col)[["time_to_resolve", "solved_category", "solved_reason"]].to_dict(orient="index")
+import math
+
+def _nan_to_none(v):
+    return None if (isinstance(v, float) and math.isnan(v)) else v
+
+csv_idx = {
+    k: {field: _nan_to_none(val) for field, val in row.items()}
+    for k, row in df.set_index(id_col)[["time_to_resolve", "solved_category", "solved_reason"]].to_dict(orient="index").items()
+}
```
Alternatively, fill NaN before building the dict:
```diff
-csv_idx = df.set_index(id_col)[["time_to_resolve", "solved_category", "solved_reason"]].to_dict(orient="index")
+csv_idx = (
+    df.set_index(id_col)[["time_to_resolve", "solved_category", "solved_reason"]]
+    .where(df[["time_to_resolve", "solved_category", "solved_reason"]].notna(), other=None)
+    .to_dict(orient="index")
+)
```
Or the simplest one-liner:
```diff
-csv_idx = df.set_index(id_col)[["time_to_resolve", "solved_category", "solved_reason"]].to_dict(orient="index")
+_slice = df.set_index(id_col)[["time_to_resolve", "solved_category", "solved_reason"]]
+csv_idx = _slice.where(_slice.notna(), other=None).to_dict(orient="index")
```
🤖 Prompt for AI Agents
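The serialization behavior driving this comment can be reproduced without pandas; a minimal sketch (the `time_to_resolve` key is just an example):

```python
import json

# Default: json.dumps emits a bare NaN token, which is not valid JSON
# and breaks strict parsers downstream.
doc = json.dumps({"time_to_resolve": float("nan")})
print(doc)  # {"time_to_resolve": NaN}

# With allow_nan=False the serializer fails fast instead.
try:
    json.dumps({"time_to_resolve": float("nan")}, allow_nan=False)
except ValueError as err:
    print("rejected:", err)

# Sanitizing NaN to None yields valid JSON.
clean = json.dumps({"time_to_resolve": None})
print(clean)  # {"time_to_resolve": null}
```

Mapping NaN to `None` before serialization, as the proposed fixes do, produces `null` and keeps the output parseable everywhere.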
Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/visualization/extract_representation_columns.py` at line 28, The current creation of csv_idx keeps pandas NaN values which json.dumps will reject; update the csv_idx construction (the expression that uses df.set_index(id_col)[["time_to_resolve","solved_category","solved_reason"]].to_dict(orient="index")) to sanitize missing values to Python None before calling to_dict—e.g., after selecting those columns convert NaN to None (using pandas' .replace or .where/.isna) so that csv_idx contains None instead of float('nan'), preserving the same id_col indexing and column names.

recipes/noc-reasoning-agent/scripts/ns_pipelines/generate_synthetic_data.py-1-1 (1)
1-1: ⚠️ Potential issue | 🟠 Major
Add the required copyright header.
CI reports this file is missing the mandatory header in the first 10 lines.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/ns_pipelines/generate_synthetic_data.py` at line 1, Add the project's mandatory copyright header to the top of recipes/noc-reasoning-agent/scripts/ns_pipelines/generate_synthetic_data.py so it appears within the first 10 lines; update the file header above the existing import argparse (and any shebang or encoding lines) to match the repository's standard copyright/license block used elsewhere.

recipes/noc-reasoning-agent/prompts/shortened_prompt_reasoning.yaml-30-45 (1)
30-45: ⚠️ Potential issue | 🟠 Major
Use one canonical tool name across references and examples.
The tool catalog and few-shot examples currently use different names for the same action (`Remote_Unlock/Restart/Restore` vs `Execute_Remote_Action`). This inconsistency will hurt tool-call reliability.
Also applies to: 99-100, 122-123
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/prompts/shortened_prompt_reasoning.yaml` around lines 30 - 45, The prompt uses inconsistent tool names (e.g., Remote_Unlock/Restart/Restore vs Execute_Remote_Action); pick one canonical tool identifier and replace all variants in the tool catalog and examples (references around the listed tool "Remote_Unlock/Restart/Restore(Element, Action)" and any mentions at lines ~99-100 and ~122-123) so every example, description, and tool reference uses that single name and signature (e.g., Execute_Remote_Action(Element, Action) or Remote_Unlock_Restart_Restore(Element, Action)), updating any example calls and return descriptions to match the chosen canonical symbol.

recipes/noc-reasoning-agent/scripts/visualization/extract_scores.py-1-1 (1)
1-1: ⚠️ Potential issue | 🟠 Major
Add the required copyright header.
CI is currently failing on this file because the header is missing in the first 10 lines.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/visualization/extract_scores.py` at line 1, Add the required project copyright/header to the top of the file so it appears within the first 10 lines (i.e., place the header above the existing import json). Ensure the header matches the repository's standard copyright/license block (same wording and year/owner format used in other source files) and save the file.

recipes/noc-reasoning-agent/scripts/ns_pipelines/generate_synthetic_data.py-8-8 (1)
8-8: ⚠️ Potential issue | 🟠 Major
Unify output directory path usage.
The script creates `outputs/sdg_reason` but writes to `/workspace/outputs/sdg_reason/`; these can diverge and cause unexpected file placement/errors.
Also applies to: 22-22
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/ns_pipelines/generate_synthetic_data.py` at line 8, The script creates the directory with os.makedirs("outputs/sdg_reason", exist_ok=True) but later writes files to "/workspace/outputs/sdg_reason/", causing inconsistent paths; unify by introducing a single output_dir variable (e.g., output_dir = os.path.join("outputs", "sdg_reason") or configurable BASE_OUTPUT) and use that variable both in the os.makedirs call and in all file write paths (replace hardcoded "/workspace/outputs/sdg_reason/" usages), ensuring any functions or writes in generate_synthetic_data.py reference this output_dir consistently.

recipes/noc-reasoning-agent/scripts/ns_pipelines/generate_synthetic_data.py-25-25 (1)
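The single-variable pattern this fix describes can be sketched standalone (the directory name mirrors the review; everything else is illustrative):

```python
import os

# One variable owns the output location; every write path derives from it,
# so the makedirs call and the writers can never diverge.
output_dir = os.path.join("outputs", "sdg_reason")
os.makedirs(output_dir, exist_ok=True)

out_path = os.path.join(output_dir, "formatted_output.json")
print(out_path)
```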
25-25: ⚠️ Potential issue | 🟠 Major
`--llm` selection is disconnected from the model actually used.
The CLI only accepts `qwen2.5-32b-instruct`, but generation is hardcoded to `openai/gpt-oss-120b`; plus the OSS branch is unreachable with current choices.
💡 Suggested fix
```diff
-def generate_synthetic_data(args, cluster, num_gpus, step=None, input_format_file=None):
+def generate_synthetic_data(args, cluster, num_gpus, model, step=None, input_format_file=None):
@@
-        model="openai/gpt-oss-120b",
+        model=model,
@@
-    parser.add_argument(
-        "--llm", type=str, default="qwen2.5-32b-instruct",
-        choices=["qwen2.5-32b-instruct"],
-        help="The LLM to use for generation",
-    )
+    parser.add_argument(
+        "--llm",
+        type=str,
+        default="qwen2.5-32b-instruct",
+        choices=["qwen2.5-32b-instruct", "gpt-oss-120b"],
+        help="The LLM to use for generation",
+    )
@@
-    if args.llm == "qwen2.5-32b-instruct":
+    model_map = {
+        "qwen2.5-32b-instruct": "Qwen/Qwen3-32B",
+        "gpt-oss-120b": "openai/gpt-oss-120b",
+    }
+    if args.llm == "qwen2.5-32b-instruct":
         generate_synthetic_data(
-            args, cluster, num_gpus, step=1, input_format_file="/workspace/outputs/sdg/formatted_output.json"
+            args, cluster, num_gpus, model=model_map[args.llm], step=1,
+            input_format_file="/workspace/outputs/sdg/formatted_output.json"
         )
     else:
         generate_synthetic_data_oss_gpt(args, cluster, num_gpus)
```
Also applies to: 49-51, 60-65
recipes/noc-reasoning-agent/prompts/formatting_prompt.yaml-104-109 (1)
104-109: ⚠️ Potential issue | 🟠 Major
Output contract is contradictory (`JSON only` vs `/think`).
The prompt asks for strict JSON with no surrounding text, but the footer adds `/think`. That creates non-JSON output and parser failures.
Also applies to: 189-191
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/prompts/formatting_prompt.yaml` around lines 104 - 109, The prompt's "Strict JSON Output Format" requirement conflicts with the footer's "/think" token, causing non-JSON output; edit the formatting_prompt.yaml to remove or relocate the "/think" footer so the final output can be a pure JSON array (update the "Strict JSON Output Format" section and any footer text containing "/think"), and apply the same removal/adjustment to the other occurrence noted around lines 189-191 to ensure no non-JSON tokens remain.

recipes/noc-reasoning-agent/prompts/prompt_reasoning.yaml-30-37 (1)
30-37: ⚠️ Potential issue | 🟠 Major
Align tool names between the tool list and few-shot examples.
The current mismatch (`Remote_Unlock/Restart/Restore` vs `Execute_Remote_Action`) introduces ambiguity in generated tool calls.
Also applies to: 100-101, 131-132
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/prompts/prompt_reasoning.yaml` around lines 30 - 37, The tool list contains names that don't match the few-shot examples (e.g., Remote_Unlock/Restart/Restore vs Execute_Remote_Action); update the tool definitions and any occurrences in the examples to use the canonical names used in the few-shot calls (rename Remote_Unlock/Restart/Restore -> Execute_Remote_Action and align other tools such as Orchestration_tool, Check_Apply_Configuration, Triage_Toolkit_Tool, Check_remote_files to the exact identifiers used in the examples), ensuring every tool declaration and every example call in this prompt file use the same identifier strings so generated tool calls are unambiguous.

recipes/noc-reasoning-agent/configs/noc_reasoning_sft.yaml-58-58 (1)
58-58: ⚠️ Potential issue | 🟠 Major
`remove_no_think_tags` value conflicts with its comment.
The setting is `false` while the comment says "Enabled." This can silently keep samples without `</think>` and degrade format consistency.
💡 Suggested fix
```diff
-  remove_no_think_tags: false  # Enabled, as requested
+  remove_no_think_tags: true  # Enabled, as requested
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/configs/noc_reasoning_sft.yaml` at line 58, The YAML key remove_no_think_tags is set to false while its inline comment says "Enabled," causing a mismatch; update the configuration so the value matches the intended behavior—either set remove_no_think_tags: true to enable removal of </think> tags or change the comment to reflect that it's disabled—ensure the change is applied to the remove_no_think_tags entry in noc_reasoning_sft.yaml so samples are processed consistently.

recipes/noc-reasoning-agent/prompts/prompt_incident.yaml-14-14 (1)
14-14: ⚠️ Potential issue | 🟠 Major
`apply_configuration` should require `config_type` in the schema.
Right now the JSON schema marks `config_type` optional, which can produce incomplete tool calls for this action.
💡 Suggested fix
```diff
-  {{"type": "function", "function": {{"name": "apply_configuration", "description": "...", "parameters": {{"type": "object", "properties": {{"element_id": {{"type": "string", "description": "The identifier of the element to validate and apply configuration to."}}, "config_type": {{"type": "string", "description": "The type of configuration to apply."}}}}, "required": ["element_id"]}}}}
+  {{"type": "function", "function": {{"name": "apply_configuration", "description": "...", "parameters": {{"type": "object", "properties": {{"element_id": {{"type": "string", "description": "The identifier of the element to validate and apply configuration to."}}, "config_type": {{"type": "string", "description": "The type of configuration to apply."}}}}, "required": ["element_id", "config_type"]}}}}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/prompts/prompt_incident.yaml` at line 14, The JSON schema for the function apply_configuration currently omits config_type from the required fields; update the function definition in prompt_incident.yaml (apply_configuration) so that the schema's "required" array includes "config_type" (in addition to "element_id") to ensure config_type is enforced when generating tool calls.

recipes/noc-reasoning-agent/scripts/visualization/extract_scores.py-36-44 (1)
36-44: ⚠️ Potential issue | 🟠 Major
Guard sample display columns before slicing to match defensive practices used for metrics.
This block can crash with `KeyError` when one of the display fields is absent in the JSONL rows. Note that lines 20-24 defensively check for available metrics, but lines 36-44 assume all display columns exist without validation.
💡 Suggested fix
```diff
 columns_to_display = ["expected_answer", "agent_response", "llm_judge_reason"]
-df_subset = df[columns_to_display].head(10)
+available_display_cols = [c for c in columns_to_display if c in df.columns]
+if not available_display_cols:
+    raise ValueError("No sample display columns found in the JSONL file")
+df_subset = df[available_display_cols].head(10)

 # Display neatly
 for idx, row in df_subset.iterrows():
     print(f"\n--- Sample {idx + 1} ---")
-    print(f"True Answer (expected_answer): {row['expected_answer']}")
-    print(f"Model Answer (agent_response): {row['agent_response']}")
-    print(f"Judge Explanation (llm_judge_reason): {row['llm_judge_reason']}")
+    for col in available_display_cols:
+        print(f"{col}: {row.get(col, '<missing>')}")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/visualization/extract_scores.py` around lines 36 - 44, The sample display assumes all columns in columns_to_display exist and can raise KeyError; update the logic around columns_to_display / df_subset to defensively filter to only columns present in df (e.g., compute available_display_columns = [c for c in columns_to_display if c in df.columns] and slice df with that), and when iterating rows (df_subset / row) use safe access (row.get(...) or fallback text) for each field like "expected_answer", "agent_response", and "llm_judge_reason" so missing fields are handled gracefully and the print loop never crashes.

recipes/noc-reasoning-agent/scripts/filtering/match_keywords.py-254-261 (1)
254-261: ⚠️ Potential issue | 🟠 Major
`pattern` parameter is unused — precompiled regex patterns are dead code.
`find_keyword_matches` accepts a `pattern` argument (the compiled regex) but never uses it — it only does a normalized string lookup in the `keywords` list. Meanwhile, `hw_pattern`/`sw_pattern` are compiled on lines 281-282 and passed on lines 290/296 but silently discarded. Either the regex matching was the intended behavior (in which case this is a bug), or the exact-match is correct and the regex compilation + parameter should be removed.
Option A: Remove dead code (if exact-match is intended)
```diff
-def find_keyword_matches(row, pattern, keywords):
+def find_keyword_matches(row, keywords):
     """Finds which specific keywords from a list match within a DataFrame row."""
     resolution_method = row.get("resolution_method", row.get("close_code", ""))
     if resolution_method and str(resolution_method).lower().replace(" ", "") in keywords:
         return True
     return False
 ...
-    hw_pattern = re.compile("|".join(re.escape(k) for k in hw_keywords), flags=re.IGNORECASE)
-    sw_pattern = re.compile("|".join(re.escape(k) for k in sw_keywords), flags=re.IGNORECASE)
 ...
-    hw_matches = find_keyword_matches(row, hw_pattern, hw_keywords)
+    hw_matches = find_keyword_matches(row, hw_keywords)
 ...
-    sw_matches = find_keyword_matches(row, sw_pattern, sw_keywords)
+    sw_matches = find_keyword_matches(row, sw_keywords)
```
Option B: Actually use the regex (if substring matching was intended)
```diff
-def find_keyword_matches(row, pattern, keywords):
+def find_keyword_matches(row, pattern):
     """Finds which specific keywords from a list match within a DataFrame row."""
     resolution_method = row.get("resolution_method", row.get("close_code", ""))
-    if resolution_method and str(resolution_method).lower().replace(" ", "") in keywords:
+    if resolution_method and pattern.search(str(resolution_method)):
         return True
     return False
```
Also applies to: 280-282, 288-299
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/filtering/match_keywords.py` around lines 254 - 261, The function find_keyword_matches currently accepts a pattern param and compiled hw_pattern/sw_pattern are created elsewhere but never used; either remove the dead regex code or apply the regex match — choose one: if exact normalized-keyword equality is intended, remove the pattern parameter from find_keyword_matches and delete hw_pattern/sw_pattern creation and their callers (replace calls that pass pattern with just keywords), making resolution_method normalization and keyword lookup the sole logic; otherwise, if substring/regex matching was intended, update find_keyword_matches to use the passed-in pattern (e.g., convert resolution_method to str and call pattern.search(...) or pattern.match(...) on it after normalization) and keep hw_pattern/sw_pattern and their usage at the call sites (lines creating hw_pattern/sw_pattern and lines passing them to find_keyword_matches).

recipes/noc-reasoning-agent/scripts/utils/create_input_jsonl_from_incidents.py-13-74 (1)
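The behavioral gap between the two options is easy to see in isolation; in this sketch the keyword values and sample string are made up:

```python
import re

# Option A's data: keywords pre-normalized to lowercase with spaces removed.
normalized_keywords = ["hardwarereplacement", "reboot"]
# Option B's data: a regex over the raw phrases, case-insensitive.
raw_keywords = ["hardware replacement", "reboot"]
pattern = re.compile("|".join(re.escape(k) for k in raw_keywords), flags=re.IGNORECASE)

value = "Scheduled Hardware Replacement - completed"

# Option A: normalized exact match fails because of the surrounding text.
exact = value.lower().replace(" ", "") in normalized_keywords
# Option B: regex substring match succeeds on the embedded phrase.
substring = bool(pattern.search(value))

print(exact, substring)  # False True
```

The two options therefore match different row populations, which is why the comment insists the author pick one deliberately rather than leave the regex as dead code.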
13-74: 🛠️ Refactor suggestion | 🟠 Major
`ALLOWED_PROBLEM_CODES` is duplicated verbatim with `filter_rows.py`.
This exact same list (legacy + synthetic problem codes) appears in both `filter_rows.py` (as `ALLOWED_PROBLEM_CODES_LEGACY` + `ALLOWED_PROBLEM_CODES_SYNTHETIC`) and here. A single source of truth in one module (e.g., `schema_columns.py` or `filter_rows.py`) that both scripts import from would eliminate the maintenance risk of the lists drifting apart.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/utils/create_input_jsonl_from_incidents.py` around lines 13 - 74, ALLOWED_PROBLEM_CODES in create_input_jsonl_from_incidents.py is a verbatim duplicate of the lists used in filter_rows.py; extract the canonical lists (e.g., ALLOWED_PROBLEM_CODES_LEGACY and ALLOWED_PROBLEM_CODES_SYNTHETIC or a single ALLOWED_PROBLEM_CODES) into a shared module such as schema_columns.py (or make filter_rows.py the source), export the constant(s) from that module, then update create_input_jsonl_from_incidents.py and filter_rows.py to import the shared constant(s) instead of defining them locally and remove the duplicated definition in create_input_jsonl_from_incidents.py.

recipes/noc-reasoning-agent/scripts/utils/token_usage.py-90-91 (1)
90-91: ⚠️ Potential issue | 🟠 Major
Unclosed file handle — resource leak when counting lines.
`open(jsonl_path, ...)` on line 91 is not wrapped in a `with` statement or explicitly closed. This leaks a file descriptor.
Proposed fix
```diff
 with open(jsonl_path, "r", encoding="utf-8") as f:
-    total_lines = sum(1 for _ in open(jsonl_path, "r", encoding="utf-8"))
+    total_lines = sum(1 for _ in f)
+    f.seek(0)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/utils/token_usage.py` around lines 90 - 91, The line that computes total_lines opens jsonl_path a second time without a with, leaking a file descriptor; modify the code so you only open the file once (reuse the already-open file object f) or wrap the second open in its own with. For example, change total_lines = sum(1 for _ in open(jsonl_path, ...)) to total_lines = sum(1 for _ in f) (or use a separate with-open for jsonl_path) so the file handle is properly closed; reference the jsonl_path variable and the total_lines calculation to locate the change.

recipes/noc-reasoning-agent/scripts/utils/split_mocktools_answers.py-56-57 (1)
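The corrected single-handle pattern can be exercised end to end; the two-line JSONL content below is made up for the demonstration:

```python
import tempfile

# Write a throwaway two-line JSONL file.
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False, encoding="utf-8") as tmp:
    tmp.write('{"a": 1}\n{"a": 2}\n')
    jsonl_path = tmp.name

# One handle: count lines, rewind, then keep reading; nothing is leaked.
with open(jsonl_path, "r", encoding="utf-8") as f:
    total_lines = sum(1 for _ in f)  # consumes the iterator
    f.seek(0)                        # rewind so the same handle is reusable
    first_line = f.readline().strip()

print(total_lines, first_line)  # 2 {"a": 1}
```

Reusing the handle plus `seek(0)` avoids both the leaked descriptor and a second pass over the filesystem.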
56-57: ⚠️ Potential issue | 🟠 Major
Error messages written to stdout may corrupt JSONL output.
When `--output` is not specified, data is printed to stdout (lines 113-114). But JSON decode errors are also printed to stdout (lines 57, 71), which would interleave error messages with JSONL data and break downstream consumers.
🐛 Proposed fix: write errors to stderr
```diff
+import sys
+
 ...
-            print(f"Error decoding JSON in file1 on line {line_num}: '{line}'. Error: {e}")
+            print(f"Error decoding JSON in file1 on line {line_num}: '{line}'. Error: {e}", file=sys.stderr)
 ...
-            print(f"Error decoding JSON in file2 on line {line_num}: '{line}'. Error: {e}")
+            print(f"Error decoding JSON in file2 on line {line_num}: '{line}'. Error: {e}", file=sys.stderr)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/utils/split_mocktools_answers.py` around lines 56 - 57, The error printouts from the JSONDecodeError handlers (the "Error decoding JSON in file1..." and the similar handler) are writing to stdout and will corrupt the JSONL output emitted when --output is unset; change those prints to write to stderr (e.g., print(..., file=sys.stderr)) and add an import for sys if needed, ensuring both JSONDecodeError exception handlers and any other diagnostic prints use stderr while leaving the normal JSONL print logic for the no-`--output` case unchanged.

recipes/noc-reasoning-agent/scripts/ns_pipelines/prepare_react_agent.py-22-53 (1)
22-53: ⚠️ Potential issue | 🟠 Major
`get_tools` has inconsistent return types and an unhandled `json.loads` exception.
Two issues:
1. The function returns either a tuple `(None, None)` (line 51) or a single `dict` (line 53). Callers must know to compare against `(None, None)` (line 72), which is fragile and unconventional. Prefer returning `None` or an empty dict for the "no tools" case and always a dict otherwise.
2. The `try/except` around `json.loads` on line 39 was commented out (lines 33, 46-47). If any `<tool_call>` block contains malformed JSON, `json.loads` will raise an unhandled `JSONDecodeError` that crashes the entire script.
Proposed fix
def get_tools(text): matches = {} tool_calls = re.findall(r"<tool_call>(.*?)</tool_call>", text, flags=re.DOTALL) tool_response = re.findall(r"<tool_response>(.*?)</tool_response>", text, flags=re.DOTALL) - # print(tool_calls) if len(tool_calls) != len(tool_response): raise ValueError(f"Mismatch: {len(tool_calls)} tool_calls vs {len(tool_response)} tool_responses") for i in range(len(tool_calls)): - # try: tool_block = tool_calls[i] response_block = tool_response[i] tool_json_str = tool_block.strip() - - tool_data = json.loads(tool_json_str) - response = response_block.strip() - tool_name = tool_data["name"] - arguments = tool_data["arguments"] - - matches[tool_name] = {"arguments": arguments, "response": response} - - # except json.JSONDecodeError as e: - # print(f"Skipping invalid JSON: {e}") + try: + tool_data = json.loads(tool_json_str) + except json.JSONDecodeError as e: + print(f"Skipping invalid tool JSON: {e}") + continue + response = response_block.strip() + tool_name = tool_data["name"] + arguments = tool_data["arguments"] + matches[tool_name] = {"arguments": arguments, "response": response} if not matches: - # print("No tools!") - return None, None - # print(matches) + return None return matchesThen update the caller:
- if matches == (None, None): + if matches is None:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/ns_pipelines/prepare_react_agent.py` around lines 22 - 53, The get_tools function currently returns inconsistent types and does not handle JSON decoding errors; change get_tools to always return a single consistent type (prefer returning an empty dict {} when no tools are found instead of (None, None)) and wrap the json.loads call inside a try/except catching json.JSONDecodeError so malformed <tool_call> blocks are skipped (optionally log or collect the error) rather than raising; update the loop that processes tool_calls/tool_response (references: variables tool_calls, tool_response, tool_block, tool_data, response, tool_name, arguments) to continue on decode errors and ensure the final return is always a dict (possibly empty) and update callers to expect a dict return.recipes/noc-reasoning-agent/scripts/create_agent_with_tools.py-101-210 (1)
recipes/noc-reasoning-agent/scripts/create_agent_with_tools.py-101-210 (1)

101-210: 🛠️ Refactor suggestion | 🟠 Major

Tool definitions, `bind_tools`, `MemorySaver`, and `create_react_agent` are all recreated on every row — significant overhead.

All 11 `@tool` functions, `llm_with_tools`, `checkpointer`, and `react_agent` are re-created inside the loop on every iteration. This is extremely expensive. The `@tool` closures also trigger Ruff B023 (late-binding closure over loop variable `row`). While safe here because tools are consumed in the same iteration, the pattern is fragile.

Consider the approach used in the batch variant (`create_agent_with_tools_batch.py`): create the agent once outside the loop and bind row data at call time via a lookup dict.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/create_agent_with_tools.py` around lines 101 - 210: the tool functions (`Check_Alarm_Status`, `Check_Element_Neighbors`, ..., `Check_Remote_Dump_Files`), `llm_with_tools = chat_llm.bind_tools`, `checkpointer = MemorySaver()`, and `react_agent = create_react_agent` are being recreated per-row and capture the loop variable `row`; move the tool definitions and agent creation out of the loop so they are defined once, then at call time pass a row-specific lookup (e.g., a dict mapping keys like `"Check_Alarm_Status_answer"`) or wrap each tool in a small factory that accepts the row as an argument to avoid late binding of `row`; update the code that invokes the tools to supply the current row's lookup so `bind_tools`/llm and `react_agent` are reused across iterations.
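Independent of LangChain, the two patterns can be contrasted in plain Python (the key name `Check_Alarm_Status_answer` is taken from the script; everything else is illustrative):

```python
rows = [
    {"Check_Alarm_Status_answer": "alarm active"},
    {"Check_Alarm_Status_answer": "alarm cleared"},
]

# Fragile: closures created inside the loop late-bind `row` (Ruff B023).
handlers = []
for row in rows:
    handlers.append(lambda: row["Check_Alarm_Status_answer"])
late_bound = [h() for h in handlers]  # every handler sees the *last* row

# Robust: define the tool once; pass the row-specific lookup at call time.
def check_alarm_status(row_answers: dict) -> str:
    return row_answers["Check_Alarm_Status_answer"]

per_row = [check_alarm_status(r) for r in rows]
```

The robust version is also why creating the agent once outside the loop is safe: the expensive objects are reused and only the per-row data varies.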
recipes/noc-reasoning-agent/scripts/visualization/generate_trace_visualization.py-206-209 (1)

206-209: ⚠️ Potential issue | 🟠 Major

`df.query()` with an f-string is fragile and injectable.

If `selected_criteria` contains a single quote (e.g., `Alzheimer's`), the query string breaks. Use boolean indexing instead.

Proposed fix

```diff
 if selected_criteria:
-    filtered_df = df.query(f"category == '{selected_criteria}'")
+    filtered_df = df[df["category"] == selected_criteria]
 else:
     filtered_df = df
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/visualization/generate_trace_visualization.py` around lines 206 - 209: the code uses `df.query(f"category == '{selected_criteria}'")`, which is fragile and can break or be injectable when `selected_criteria` contains quotes; change to boolean indexing by replacing the query usage so `filtered_df` is assigned via `df[df['category'] == selected_criteria]` (or `df.loc[df['category'] == selected_criteria]`) when `selected_criteria` is truthy, otherwise keep `filtered_df = df`; update the logic around the `selected_criteria` check to handle empty/None values consistently and avoid constructing query strings.
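A quick, self-contained illustration of the hazard and two safe alternatives (toy data, not the script's actual DataFrame):

```python
import pandas as pd

df = pd.DataFrame({"category": ["Alzheimer's", "Stroke"], "score": [1, 2]})
selected_criteria = "Alzheimer's"

# The f-string version builds the invalid expression
#   category == 'Alzheimer's'
# and df.query(...) raises a parse error on the stray quote.

# Boolean indexing compares values directly, so quoting never matters:
filtered_df = df[df["category"] == selected_criteria]

# df.query can also be used safely via the @local-variable syntax:
filtered_query = df.query("category == @selected_criteria")
```

Both forms return the same rows; boolean indexing is the simpler default.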
recipes/noc-reasoning-agent/scripts/visualization/generate_trace_visualization.py-162-187 (1)

162-187: ⚠️ Potential issue | 🟠 Major

Unescaped HTML injection in `reasoning_trace` and `finish_action`.

`reasoning_trace` (line 178) and `finish_action` (line 184) are inserted directly into the HTML output without escaping via `esc()`. If the model output contains characters like `<script>`, this produces a malformed or exploitable HTML document. While this is an offline report, it's still a correctness issue — any `<`, `>`, or `&` in the trace will break HTML rendering.

Proposed fix

```diff
     <h3>Full Reasoning Trace</h3>
-    <div class='work-notes-container'>{reasoning_trace}</div>
+    <div class='work-notes-container'><pre>{esc(reasoning_trace)}</pre></div>
     <h3>Thoughts, Observations, Actions</h3>
     <div class="steps">
     {"".join(steps_html)}
     </div>
     <h3> Closing Notes</h3>
-    <div class='work-notes-container'>{finish_action}</div>
+    <div class='work-notes-container'><pre>{esc(finish_action)}</pre></div>
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/visualization/generate_trace_visualization.py` around lines 162 - 187: the HTML currently inserts raw model output into the template via `reasoning_trace` and `finish_action` without escaping, which can break rendering or allow injection; update the template generation in the function that builds the details block (where the variables `reasoning_trace` and `finish_action` are used) to pass both through the existing `esc()` function (e.g., `esc(reasoning_trace)` and `esc(finish_action)`) before interpolation, so any `<`, `>`, `&` etc. are HTML-escaped while keeping the surrounding container divs unchanged.
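The fix relies on the script's existing `esc()` helper; a minimal stand-in built on the standard library behaves like this (sketch, assuming `esc()` wraps `html.escape`):

```python
import html

def esc(text) -> str:
    """Escape &, <, > (and quotes) so model output is inert in HTML."""
    return html.escape(str(text))

reasoning_trace = 'Check <script>alert("x")</script> & retry'
rendered = f"<div class='work-notes-container'><pre>{esc(reasoning_trace)}</pre></div>"
```

The `<pre>` wrapper in the proposed fix additionally preserves line breaks in the escaped trace.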
recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py-49-65 (1)

49-65: ⚠️ Potential issue | 🟠 Major

Silent data corruption when the `<|message|>` marker is absent.

When `rfind("<|message|>")` returns `-1` (marker not found), line 50 computes `text[-1 + 11:]` = `text[10:]`, silently skipping the first 10 characters instead of handling the missing marker. Similarly, on line 53, `rfind("]") + 1` turns a "not found" (`-1`) into `0`, making the guard on line 55 (`end_index != -1`) always true — it should check `end_index > 0`.

Proposed fix

```diff
         start_index = text.rfind("<|message|>")
+        if start_index == -1:
+            print(f"Warning: '<|message|>' marker not found for incident {number}")
+            continue
         text = text[start_index + len("<|message|>") :]
         start_index = text.find("[")
-        # Find the last position of the JSON array ']' to ensure we get the whole thing
         end_index = text.rfind("]") + 1
-        if start_index != -1 and end_index != -1:
+        if start_index != -1 and end_index > 0:
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py` around lines 49 - 65: the code silently corrupts data when the `<|message|>` marker or closing `]` is missing because it slices using `rfind` results without checking for `-1`; update the logic around the `start_index`/`end_index` handling: first check the result of `text.rfind("<|message|>")` and bail (log and continue) if it is `-1` before slicing `text`, then after locating the `[` with `text.find("[")` ensure that `start_index != -1` and compute `end_index = text.rfind("]")`; require `end_index > start_index` (not `end_index != -1`) before slicing to `json_string`, and only then call `json.loads` and assign to `responses[number]`; keep the existing prints/logs when markers are missing.
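The failure mode is easy to reproduce in isolation: `str.rfind` signals "not found" with `-1`, which is also a valid (negative) slice index:

```python
text = "hello world"   # no marker present
marker = "<|message|>"  # len(marker) == 11

# Unguarded: rfind's -1 sentinel silently becomes the slice text[10:].
corrupted = text[text.rfind(marker) + len(marker):]

# Guarded: treat -1 explicitly before slicing.
idx = text.rfind(marker)
safe = text[idx + len(marker):] if idx != -1 else text
```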
recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py-75-90 (1)

75-90: ⚠️ Potential issue | 🟠 Major

Same `rfind` returning `-1` bug in `extract_final_thinking_processes`.

Line 86: if `"final<|message|>"` is not found, `rfind` returns `-1`, and slicing from `text[-1 + 16:]` = `text[15:]` silently corrupts the data. Add a guard.

Proposed fix

```diff
-        thinking = text[text.rfind("final<|message|>") + len("final<|message|>") :]
+        marker_idx = text.rfind("final<|message|>")
+        if marker_idx == -1:
+            print(f"Warning: 'final<|message|>' marker not found for incident {number}, step {step_number}")
+            continue
+        thinking = text[marker_idx + len("final<|message|>") :]
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py` around lines 75 - 90: in `extract_final_thinking_processes`, guard against `text.rfind("final<|message|>")` returning `-1` before slicing: compute `idx = text.rfind("final<|message|>")` and if `idx == -1` set `thinking` to the original text (or an empty/placeholder string) instead of slicing, otherwise slice using `idx + len("final<|message|>")`; update `data["generation"]` and `responses[number][step_number]` with that safe `thinking` value; reference the function `extract_final_thinking_processes`, the variable `text`, the literal `"final<|message|>"`, and the helper `_incident_id` to locate the code to change.
recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py-201-207 (1)

201-207: ⚠️ Potential issue | 🟠 Major

Parameter `key` is accepted but never used.

`_first_pos(key, named_key=None)` takes `key` as the first argument, but the function body never references it. Every call site passes a string like `"site_or_element_id"` that appears to be the intended `named_key`. This looks like a bug where `key` should either be removed or used as the default for `named_key`.

Proposed fix (if `key` was meant to be the named key)

```diff
-    def _first_pos(key, named_key=None):
+    def _first_pos(named_key=None):
         """Return named arg, first positional arg, or 'all' as default."""
-        if named_key:
-            val = kv_args.get(named_key)
-            if val:
-                return val
+        if named_key and named_key in kv_args:
+            return kv_args[named_key]
         return pos_args[0] if pos_args else "all"
```

Then update all call sites, e.g.:

```diff
-    arg_dict = {"site_or_element_id": _first_pos("site_or_element_id")}
+    arg_dict = {"site_or_element_id": _first_pos(named_key="site_or_element_id")}
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py` around lines 201 - 207: the function `_first_pos` currently accepts a first parameter named `key` but never uses it; update the function so the first parameter acts as the default named key by setting `named_key = named_key or key`, then use `kv_args.get(named_key)` (and keep returning `pos_args[0]` or `"all"`); this preserves all existing call sites that pass the intended named key as the first argument and avoids removing or renaming parameters. Ensure references inside `_first_pos` use `named_key`, and run tests to confirm behavior is unchanged for callers passing a separate `named_key` argument.
recipes/noc-reasoning-agent/scripts/evaluation/evaluation_with_judge.py-133-135 (1)

133-135: ⚠️ Potential issue | 🟠 Major

`llm_judge_final_output` receives the full reasoning part, not the extracted conclusion.

`generated_reasoning_part` contains the entire generated reasoning sequence. The function's prompt asks the LLM to evaluate "the generated resolution at the end", implying only the final conclusion/answer should be passed — not the full chain-of-thought. Passing the whole reasoning block will likely produce inaccurate scores.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/evaluation/evaluation_with_judge.py` around lines 133 - 135: the call to `llm_judge_final_output` is passing the entire `generated_reasoning_part` (full chain-of-thought) instead of just the final conclusion; update the code to extract only the final conclusion/answer from `generated_reasoning_part` (e.g., parse the last non-empty sentence/lines or the text after a "Conclusion:" marker) and pass that extracted conclusion to `llm_judge_final_output` (keep the original `conclusion_expected` as-is) so the judge evaluates only the final resolution rather than the whole reasoning trace.
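One possible extraction helper, shown only as a sketch: the `"Conclusion:"` marker and the last-paragraph fallback are assumptions about the trace format, not the script's actual contract:

```python
def extract_conclusion(reasoning: str) -> str:
    """Return the text after the last 'Conclusion:' marker (assumed format),
    else the last non-empty paragraph, else an empty string."""
    marker = "Conclusion:"
    idx = reasoning.rfind(marker)
    if idx != -1:
        return reasoning[idx + len(marker):].strip()
    paragraphs = [p.strip() for p in reasoning.split("\n\n") if p.strip()]
    return paragraphs[-1] if paragraphs else ""

trace = "Step 1: check alarms.\n\nStep 2: check links.\n\nConclusion: replace the faulty SFP."
```

Whatever the real delimiter is, the point stands: pass the judge only the extracted resolution, not the whole trace.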
recipes/noc-reasoning-agent/scripts/evaluation/evaluation_with_judge.py-10-29 (1)

10-29: ⚠️ Potential issue | 🟠 Major

Wrap all module-level side effects in a `__main__` guard.

`parse_args()`, `pd.read_json()`, the `ChatNVIDIA` client, and the evaluation loops all execute at import time. This makes the module untestable and causes crashes whenever it is imported by another script.

♻️ Proposed fix

```diff
-# Parse arguments for input JSONL path
-parser = argparse.ArgumentParser(...)
-...
-args = parser.parse_args()
-
-# Load the input JSONL
-df = pd.read_json(args.input_jsonl, lines=True)
-...
-llm = ChatNVIDIA(base_url=args.nim_url, model=args.model)
-rouge = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
+def main():
+    parser = argparse.ArgumentParser(description="Evaluation Pipeline for Agent Responses")
+    parser.add_argument("input_jsonl", ...)
+    ...
+    args = parser.parse_args()
+
+    df = pd.read_json(args.input_jsonl, lines=True)
+    llm = ChatNVIDIA(base_url=args.nim_url, model=args.model)
+    rouge = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
+    # ... rest of evaluation logic ...
+
+if __name__ == "__main__":
+    main()
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/evaluation/evaluation_with_judge.py` around lines 10 - 29: the file runs argument parsing, `pd.read_json`, `ChatNVIDIA(...)`, and `rouge_scorer.RougeScorer(...)` at import time; move all module-level side effects into a `main()` function and call it under `if __name__ == "__main__":` so imports are side-effect free; specifically wrap the argparse usage (`parser.parse_args()`/`args`), the DataFrame load (`pd.read_json`), the LLM client creation (`ChatNVIDIA(...)`), the ROUGE init (`rouge_scorer.RougeScorer(...)`), and the evaluation loop inside `main()`, export only pure helper functions at module level, and invoke `main()` inside the `__main__` guard.
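The same skeleton in isolation (argument names are illustrative, not the script's exact CLI):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Parser construction is a pure function: importable and testable."""
    parser = argparse.ArgumentParser(description="Evaluation pipeline")
    parser.add_argument("input_jsonl", help="Path to input JSONL file")
    parser.add_argument("--model", default="example-model")
    return parser

def main(argv=None):
    """All side effects (I/O, network clients, loops) live here."""
    args = build_parser().parse_args(argv)
    print(f"Evaluating {args.input_jsonl} with {args.model}")

if __name__ == "__main__":
    main(["sample.jsonl"])  # demo invocation; a real CLI passes no argv
```

Because `main()` accepts an `argv` list, tests can drive the full pipeline without touching `sys.argv`.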
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`recipes/noc-reasoning-agent/data/synthetic_incidents.csv` is excluded by `!**/*.csv`
📒 Files selected for processing (29)
- `.gitignore`
- `docs/tutorials/posts/noc-reasoning-agent.md`
- `recipes/noc-reasoning-agent/configs/config.ini`
- `recipes/noc-reasoning-agent/configs/noc_reasoning_sft.yaml`
- `recipes/noc-reasoning-agent/configs/noc_reasoning_sft_6.yaml`
- `recipes/noc-reasoning-agent/prompts/formatting_prompt.yaml`
- `recipes/noc-reasoning-agent/prompts/prompt_incident.yaml`
- `recipes/noc-reasoning-agent/prompts/prompt_reasoning.yaml`
- `recipes/noc-reasoning-agent/prompts/shortened_prompt_reasoning.yaml`
- `recipes/noc-reasoning-agent/scripts/create_agent_with_tools.py`
- `recipes/noc-reasoning-agent/scripts/create_agent_with_tools_batch.py`
- `recipes/noc-reasoning-agent/scripts/evaluation/evaluation_with_judge.py`
- `recipes/noc-reasoning-agent/scripts/evaluation/problem_code_evaluation.py`
- `recipes/noc-reasoning-agent/scripts/evaluation/score.py`
- `recipes/noc-reasoning-agent/scripts/filtering/filter_rows.py`
- `recipes/noc-reasoning-agent/scripts/filtering/match_keywords.py`
- `recipes/noc-reasoning-agent/scripts/ns_pipelines/generate_synthetic_data.py`
- `recipes/noc-reasoning-agent/scripts/ns_pipelines/prepare_react_agent.py`
- `recipes/noc-reasoning-agent/scripts/tools.py`
- `recipes/noc-reasoning-agent/scripts/utils/create_input_jsonl_from_incidents.py`
- `recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py`
- `recipes/noc-reasoning-agent/scripts/utils/reasoning_processes.py`
- `recipes/noc-reasoning-agent/scripts/utils/schema_columns.py`
- `recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py`
- `recipes/noc-reasoning-agent/scripts/utils/split_mocktools_answers.py`
- `recipes/noc-reasoning-agent/scripts/utils/token_usage.py`
- `recipes/noc-reasoning-agent/scripts/visualization/extract_representation_columns.py`
- `recipes/noc-reasoning-agent/scripts/visualization/extract_scores.py`
- `recipes/noc-reasoning-agent/scripts/visualization/generate_trace_visualization.py`
recipes/noc-reasoning-agent/scripts/visualization/extract_representation_columns.py
…naveras/add-noc-reasoning-tutorial
Add Apache 2.0 copyright headers to all recipe Python files, fix ruff lint issues (unsorted imports, unused import), apply ruff formatting, and strip trailing whitespace. Signed-off-by: George Armstrong <georgea@nvidia.com>
…orial' into acanaveras/add-noc-reasoning-tutorial
Actionable comments posted: 1
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
🟠 Major comments (21)
recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py-32-33 (1)
32-33: ⚠️ Potential issue | 🟠 Major

Fail fast on missing incident IDs instead of silently accepting `None`.

At Line 32, Line 55, and Line 70, the `.get()` fallback can return `None` if both keys are absent, which can silently corrupt split membership.

🔧 Proposed fix

```diff
+def _get_incident_id(row: dict) -> str:
+    if "incident_identifier" in row:
+        return row["incident_identifier"]
+    if "number" in row:
+        return row["number"]
+    raise KeyError("Missing incident id: expected 'incident_identifier' or 'number'")
+
 def split_generation_field(
     input_folder: str, train_out: str, test_out: str, test_size: float = 0.2, seed: int = 42, max_examples=10000
 ):
@@
-        number = row.get("incident_identifier", row.get("number"))
+        number = _get_incident_id(row)
         incidents.append(number)
@@
-        number = row.get("incident_identifier", row.get("number"))
+        number = _get_incident_id(row)
         if number in train_set:
             current_iteration_train.append(row)
@@
-        number = row.get("incident_identifier", row.get("number"))
+        number = _get_incident_id(row)
         if number in test_set:
```

As per coding guidelines for `**/*.py`: "Use direct dictionary key access (e.g., `data[key_name]`) instead of `.get()` when the key is expected to be present, to fail with a clear error if the key is missing rather than silently corrupting data."

Also applies to: 55-56, 70-71

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py` around lines 32 - 33: replace the tolerant `.get()` lookups that can return `None` with direct key access so missing IDs raise immediately: for each occurrence of `number = row.get("incident_identifier", row.get("number"))` (including the two similar occurrences later in the file), access the keys directly, e.g. `number = row["incident_identifier"] if "incident_identifier" in row else row["number"]`; this way a missing `"number"` or `"incident_identifier"` raises a `KeyError` instead of silently producing `None` when you later call `incidents.append(number)` (apply the same pattern for the other two occurrences).
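The behavioral difference in miniature:

```python
row = {"status": "closed"}  # neither "incident_identifier" nor "number" present

# Tolerant lookup: quietly yields None, corrupting split membership later.
silent = row.get("incident_identifier", row.get("number"))

# Direct access: fails immediately, at the true source of the problem.
try:
    row["incident_identifier"]
except KeyError as err:
    caught = err
```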
recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py-102-103 (1)

102-103: ⚠️ Potential issue | 🟠 Major

`--input_dir` semantics are inconsistent with actual usage.

At Lines 102-103, the help text and default imply a JSONL file, but the code uses `os.path.join(input_folder, "iteration_0.jsonl")`, so it expects a directory. The current default likely fails on first run.

🔧 Proposed fix

```diff
-    parser.add_argument("--input_dir", help="Path to input_dir JSONL file", default="output_incident.jsonl")
+    parser.add_argument(
+        "--input_dir",
+        required=True,
+        help="Path to directory containing iteration_*.jsonl files",
+    )
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py` around lines 102 - 103: update the CLI flag and its help/default to reflect that `input_dir` is a directory (not a file) and ensure callers expect a folder containing `iteration_0.jsonl`: change the `parser.add_argument` for `"--input_dir"` to have help text like "Path to input directory containing iteration_0.jsonl" and set a default that is a directory (e.g., `"output_incident"`) or make the default `None` and validate it; also add a validation step near where `input_folder` is used (the code that does `os.path.join(input_folder, "iteration_0.jsonl")`) to raise a clear error if the provided path is not a directory or the expected file is missing.
recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py-105-107 (1)

105-107: ⚠️ Potential issue | 🟠 Major

`--preview` is accepted but ignored.

At Lines 105-107, users can pass `--preview`, but at Line 110 it is never used. The parameter is silently ignored.

🔧 Proposed fix (remove until implemented)

```diff
-    parser.add_argument(
-        "--preview", type=int, default=2, help="Number of examples to preview before confirmation (default: 2)"
-    )
     args = parser.parse_args()
     split_generation_field(args.input_dir, args.train_output, args.test_output)
```

As per coding guidelines for `**/*.py`: "Avoid silently ignoring user-passed parameters; fail if a required parameter is not specified or an unsupported parameter is provided."

Also applies to: 110-110

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py` around lines 105 - 107: the `--preview` argparse option (`parser.add_argument("--preview", ...)`) is declared but never used; remove this argument declaration or wire the parsed preview value into the preview/confirmation logic that runs after parsing (where the script currently previews examples at line ~110). Specifically, either delete the `parser.add_argument` call for `"--preview"` or pass `args.preview` into the function that displays the preview (or replace the hard-coded preview count there) so the parsed value is actually applied.
recipes/noc-reasoning-agent/scripts/utils/split_mocktools_answers.py-65-72 (1)

65-72: ⚠️ Potential issue | 🟠 Major

Do not continue after JSON decode failures.

Line 70 and Line 84 currently log and continue, which can silently emit partial merged datasets. Fail fast on malformed JSONL rows so data issues are visible.

Proposed fix

```diff
-            try:
-                d = json.loads(line)
-                num = d.get("number")
-                if num:
-                    data1[num] = d
-            except json.JSONDecodeError as e:
-                print(f"Error decoding JSON in file1 on line {line_num}: '{line}'. Error: {e}")
+            d = json.loads(line)
+            num = d["number"]
+            data1[num] = d
 ...
-            try:
-                d = json.loads(line)
-                num = d.get("number")
-                if num:
-                    data2[num] = d.get("generation", "")
-            except json.JSONDecodeError as e:
-                print(f"Error decoding JSON in file2 on line {line_num}: '{line}'. Error: {e}")
+            d = json.loads(line)
+            num = d["number"]
+            data2[num] = d["generation"]
```

As per coding guidelines: "Don't catch exceptions when they are not normally expected to be raised; let unexpected exceptions propagate so that failures are visible rather than silent."

Also applies to: 79-85

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/utils/split_mocktools_answers.py` around lines 65 - 72: the try/except that catches `json.JSONDecodeError` around `json.loads(line)` and prints an error allows the script to continue and produce a silently partial merged dataset; change the handlers in both places that catch `JSONDecodeError` (the blocks populating `data1` and `data2`) to fail fast: remove the swallow-and-continue behavior and instead re-raise the exception (or call `sys.exit` with a non-zero code) after logging so malformed JSONL rows abort processing immediately; locate the `json.loads(...)` usages and their corresponding `except json.JSONDecodeError` handlers to implement this change.
recipes/noc-reasoning-agent/scripts/utils/split_mocktools_answers.py-67-69 (1)

67-69: ⚠️ Potential issue | 🟠 Major

Use direct key access for required record fields.

Line 67, Line 81, Line 83, and Line 109 use `.get()` for fields that the merge depends on (`number`, `generation`). This can silently drop or degrade records instead of failing with a clear schema error.

As per coding guidelines: "Use direct dictionary key access (e.g., `data[key_name]`) instead of `.get()` when the key is expected to be present, to fail with a clear error if the key is missing rather than silently corrupting data."

Also applies to: 81-83, 109-109

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/utils/split_mocktools_answers.py` around lines 67 - 69: replace uses of `d.get("number")` and `d.get("generation")` with direct indexing (`d["number"]`, `d["generation"]`) so missing required keys raise immediately; update the occurrences where `num` is assigned (variable `num` from dict `d`) and where `generation` is read/used (the code that indexes `data1`/`data2` with `generation`) to use `d["..."]` and, if desired, add an explicit check that raises a clear `ValueError` with context before indexing to provide a friendlier schema error.
recipes/noc-reasoning-agent/scripts/visualization/extract_representation_columns.py-55-57 (1)

55-57: 🛠️ Refactor suggestion | 🟠 Major

Use direct key access on `info` — all three keys are guaranteed to exist.

`csv_idx` is built from a DataFrame that was already validated (lines 37-40) to contain `time_to_resolve`, `solved_category`, and `solved_reason`. Using `.get()` here silently returns `None` for typos or future renames instead of raising a visible `KeyError`.

♻️ Proposed fix

```diff
-        row["time_to_resolve"] = info.get("time_to_resolve")
-        row["solved_category"] = info.get("solved_category")
-        row["solved_reason"] = info.get("solved_reason")
+        row["time_to_resolve"] = info["time_to_resolve"]
+        row["solved_category"] = info["solved_category"]
+        row["solved_reason"] = info["solved_reason"]
```

As per coding guidelines: "Use direct dictionary key access (e.g., `data[key_name]`) instead of `.get()` when the key is expected to be present, to fail with a clear error if the key is missing rather than silently corrupting data."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/visualization/extract_representation_columns.py` around lines 55 - 57: the three assignments use `info.get(...)`, which can silently return `None`; change them to direct key access so missing keys raise immediately: replace `info.get("time_to_resolve")`, `info.get("solved_category")`, and `info.get("solved_reason")` with `info["time_to_resolve"]`, `info["solved_category"]`, and `info["solved_reason"]` respectively (the variables involved are `row` and `info`; `csv_idx` is already validated earlier).
recipes/noc-reasoning-agent/scripts/visualization/generate_trace_visualization.py-219-220 (1)

219-220: ⚠️ Potential issue | 🟠 Major

`df.query` with an f-string will raise a parse error if `selected_criteria` contains a single quote.

A value such as `O'Brien` produces the invalid expression `category == 'O'Brien'`, causing pandas to throw `pandas.errors.UndefinedVariableError` or a `ValueError`. Use boolean indexing, or the `@`-local-variable syntax, instead of constructing the query string manually.

🐛 Proposed fix

```diff
 if selected_criteria:
-    filtered_df = df.query(f"category == '{selected_criteria}'")
+    filtered_df = df[df["category"] == selected_criteria]
 else:
     filtered_df = df
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/visualization/generate_trace_visualization.py` around lines 219 - 220: the current use of `df.query` with an f-string (in the `selected_criteria` -> `filtered_df` block) can fail when `selected_criteria` contains quotes; change the filtering to avoid string interpolation and instead use pandas' local-variable syntax or boolean masking (e.g., `df.query("category == @selected_criteria")` or `df[df["category"] == selected_criteria]`) so the query won't be malformed when `selected_criteria` contains characters like single quotes.
recipes/noc-reasoning-agent/scripts/visualization/generate_trace_visualization.py-96-104 (1)

96-104: ⚠️ Potential issue | 🟠 Major

Potential `IndexError` in `parse_work_notes` when input ends with a bare timestamp.

`re.split` with a capturing group produces `["pre", ts1, body1, ts2, body2, ...]`. When the source text ends with a delimiter match and no trailing body (e.g., a trailing timestamp with no note text), `parts` has an even length and `parts[i + 1]` on the final iteration goes out of bounds.

🐛 Proposed fix

```diff
     i = 1
     while i < len(parts):
         timestamp = parts[i].strip()
-        note_text = parts[i + 1].strip()
-        if note_text:  # Only add entries that have content
+        note_text = parts[i + 1].strip() if i + 1 < len(parts) else ""
+        if note_text:
             notes.append({"timestamp": timestamp, "note": note_text})
         i += 2
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/visualization/generate_trace_visualization.py` around lines 96 - 104: `parse_work_notes` can raise `IndexError` when `parts` ends with a trailing timestamp; update the loop in `parse_work_notes` to ensure you never access `parts[i + 1]` out of bounds by only iterating while `i + 1 < len(parts)` (or using a `range(1, len(parts) - 1, 2)` loop), then keep the existing logic that strips and appends `{"timestamp": ..., "note": ...}` only when `note_text` is non-empty; reference `parse_work_notes` and the local variables `parts`, `i`, `timestamp`, `note_text` when locating the change.
recipes/noc-reasoning-agent/scripts/evaluation/problem_code_evaluation.py-160-162 (1)

160-162: ⚠️ Potential issue | 🟠 Major

Replace broad `except Exception` with specific exception types.

Per coding guidelines, unexpected exceptions should propagate rather than be swallowed. The only reliably expected exception inside this block is `json.JSONDecodeError` from `json.loads`. Catching `Exception` silently masks logic errors (e.g., `AttributeError`, `IndexError`) as mere "failed" lines.

🔧 Proposed fix

```diff
-        except Exception as e:
-            print("Error:", e)
-            failed += 1
+        except json.JSONDecodeError as e:
+            print("Error parsing JSON:", e)
+            failed += 1
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/evaluation/problem_code_evaluation.py` around lines 160 - 162: replace the broad `except Exception as e` that surrounds the `json.loads` call so only `json.JSONDecodeError` is caught and treated as a failure; change the handler to catch `json.JSONDecodeError` (increment `failed` and print the error) and let any other exceptions propagate (do not swallow) so logic/attribute/index errors surface; look for the `except` block surrounding `json.loads` and the `failed` counter in `problem_code_evaluation.py` and update accordingly.
69-77:⚠️ Potential issue | 🟠 MajorReplace broad
except Exceptionwith specific types.Per coding guidelines, unexpected exceptions should propagate.
llm.invokecan raise network/auth errors, and theint(...)cast can raiseValueErroron a malformed LLM response — these are the only expected failures. Catch them explicitly instead of masking all exceptions silently.🔧 Proposed fix
- except Exception as e: - print(f"Error in LLM judge: {e}") - return 0, "Error" + except (ValueError, IndexError) as e: + print(f"Error parsing LLM judge response: {e}") + return 0, "Error"Apply the same change to lines 100–108.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/evaluation/evaluation_with_judge.py` around lines 69 - 77, the current broad except in the LLM judge block masks unexpected errors; modify the try/except so you only catch the expected failures: catch the LLM client network/auth related exceptions raised by llm.invoke (e.g., ConnectionError, TimeoutError, or the specific AuthenticationError type your LLM client uses) and ValueError from the int(...) cast, log/print the error and return the fallback (0, "Error"), but re-raise any other exceptions so unexpected failures propagate; update the same pattern for the analogous block around lines 100–108. Reference the llm.invoke call and the int(output.split("Score:")...) parsing/Reasoning extraction when making the change.

recipes/noc-reasoning-agent/scripts/visualization/extract_scores.py-19-20 (1)
19-20: ⚠️ Potential issue | 🟠 Major

**Replace the hardcoded path with a CLI argument.**

The comment on line 19 explicitly calls this out, but the fix should be `argparse` rather than a manual source edit. As-is, every run requires modifying the source.

🔧 Proposed fix

```diff
-# Replace with your actual JSONL file path
-file_path = "evaluations.jsonl"
+import argparse
+
+parser = argparse.ArgumentParser(description="Extract and display evaluation scores from a JSONL file")
+parser.add_argument("input_jsonl", help="Path to evaluations JSONL file")
+args = parser.parse_args()
+file_path = args.input_jsonl
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/visualization/extract_scores.py` around lines 19 - 20, replace the hardcoded file_path variable by adding argparse to parse a CLI flag (e.g., --file or -f) and set file_path from the parsed args (defaulting to "evaluations.jsonl"); update the top-level code that currently assigns file_path to instead use args.file and ensure the script defines/uses an entrypoint (if not present, add an if __name__ == "__main__": main() pattern) so users can run the script with a custom JSONL path via the new argument.

recipes/noc-reasoning-agent/scripts/evaluation/evaluation_with_judge.py-49-108 (1)
49-108: 🛠️ Refactor suggestion | 🟠 Major

**`llm_judge_final_output` and `llm_judge_reasoning` are near-identical — merge into one function.**

The two functions share identical structure, error handling, and parsing logic; only the prompt text differs. This is a clear DRY violation.

♻️ Proposed refactor

```diff
-def llm_judge_final_output(expected, generated):
-    prompt = f"""
-    Evaluate how well the generated resolution at the end matches the expected resolution on a scale of 1-5:
-    ...
-    Expected: {expected}
-    Generated: {generated}
-    ...
-    """
-    try:
-        response = llm.invoke(prompt)
-        output = response.content.strip()
-        score = int(output.split("Score:")[1].split("\n")[0].strip())
-        reasoning = output.split("Reasoning:")[1].strip()
-        return score, reasoning
-    except Exception as e:
-        print(f"Error in LLM judge: {e}")
-        return 0, "Error"
-
-
-def llm_judge_reasoning(expected, generated):
-    prompt = f"""
-    Evaluate how well the generated reasoning is, including tools used, resolution matches the expected resolution on a scale of 1-5:
-    ...
-    Expected: {expected}
-    Generated: {generated}
-    ...
-    """
-    try:
-        response = llm.invoke(prompt)
-        output = response.content.strip()
-        score = int(output.split("Score:")[1].split("\n")[0].strip())
-        reasoning = output.split("Reasoning:")[1].strip()
-        return score, reasoning
-    except Exception as e:
-        print(f"Error in LLM judge: {e}")
-        return 0, "Error"
+JUDGE_PROMPTS = {
+    "final_output": (
+        "Evaluate how well the generated resolution at the end matches the expected resolution on a scale of 1-5:\n"
+        "- 5: Perfect match in content.\n"
+        "- 4: High similarity, minor differences.\n"
+        "- 3: Moderate match, key elements present but some deviations.\n"
+        "- 2: Low match, major differences.\n"
+        "- 1: No match."
+    ),
+    "reasoning": (
+        "Evaluate how well the generated reasoning is, including tools used, resolution matches the expected resolution on a scale of 1-5:\n"
+        "- 5: Perfect match in content, structure, and actions.\n"
+        "- 4: High similarity, minor differences.\n"
+        "- 3: Moderate match, key elements present but some deviations.\n"
+        "- 2: Low match, major differences.\n"
+        "- 1: No match."
+    ),
+}
+
+
+def llm_judge(expected: str, generated: str, judge_type: str) -> tuple[int, str]:
+    """Invoke the LLM judge and return (score, reasoning)."""
+    criteria = JUDGE_PROMPTS[judge_type]
+    prompt = (
+        f"{criteria}\n\n"
+        f"Expected: {expected}\nGenerated: {generated}\n\n"
+        "Provide only the score (1-5) and a brief reasoning.\n"
+        "Format: Score: X\\nReasoning: ..."
+    )
+    response = llm.invoke(prompt)
+    output = response.content.strip()
+    score = int(output.split("Score:")[1].split("\n")[0].strip())
+    reasoning = output.split("Reasoning:")[1].strip()
+    return score, reasoning
```

Then at the call sites:

```diff
-    reasoning_judge_score, reasoning_judge_reason = llm_judge_reasoning(reasoning_expected, generated_reasoning_part)
-    conclusion_judge_score, conclusion_judge_reason = llm_judge_final_output(
-        conclusion_expected, generated_reasoning_part
-    )
+    reasoning_judge_score, reasoning_judge_reason = llm_judge(reasoning_expected, generated_reasoning_part, "reasoning")
+    conclusion_judge_score, conclusion_judge_reason = llm_judge(conclusion_expected, generated_reasoning_part, "final_output")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/evaluation/evaluation_with_judge.py` around lines 49 - 108, both llm_judge_final_output and llm_judge_reasoning duplicate parsing/error logic; consolidate them into a single helper (e.g., llm_judge) that accepts the prompt body or a prompt_type and the expected and generated strings, reuses the existing llm.invoke/error parsing logic (response = llm.invoke(prompt); parse Score and Reasoning) and returns (score, reasoning), then replace calls to llm_judge_final_output and llm_judge_reasoning with the new llm_judge(expected, generated, prompt_template_or_type) so only the prompt text differs while parsing/error handling lives in one place.

recipes/noc-reasoning-agent/scripts/filtering/filter_rows.py-120-125 (1)
120-125: ⚠️ Potential issue | 🟠 Major

**`UnicodeDecodeError` can never be raised with `encoding="latin1"` — the except branch is dead code, and the fallback encoding order is inverted.**

`latin1` (ISO-8859-1) is a complete single-byte encoding that maps every possible byte value (0x00–0xFF) to a Unicode code point, so `pd.read_csv` with `encoding="latin1"` will never raise `UnicodeDecodeError`. The except branch is unreachable.

Additionally, the fallback `pd.read_csv(path)` (which defaults to UTF-8) is more restrictive than `latin1`, not less — so it would never rescue a file that `latin1` couldn't handle; it would only fail on files that `latin1` already read successfully.

The correct pattern for a graceful fallback is to try `utf-8` first (strict, correct for modern files), then fall back to `latin1` (permissive, for legacy files with non-UTF-8 bytes):

🐛 Proposed fix

```diff
 def _load_csv(path: str) -> pd.DataFrame:
-    """Load CSV, falling back from latin1 to default encoding."""
-    try:
-        return pd.read_csv(path, encoding="latin1")
-    except UnicodeDecodeError:
-        return pd.read_csv(path)
+    """Load CSV, falling back to latin1 for files with non-UTF-8 bytes."""
+    try:
+        return pd.read_csv(path)
+    except UnicodeDecodeError:
+        return pd.read_csv(path, encoding="latin1")
```

As per coding guidelines, "Don't catch exceptions when they are not normally expected to be raised."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/filtering/filter_rows.py` around lines 120 - 125, in _load_csv, attempting read_csv with encoding="latin1" first makes the except branch unreachable; instead try the strict utf-8 read first and fall back to latin1 on UnicodeDecodeError. Update the _load_csv function to call pd.read_csv(path) (UTF-8) inside a try and catch UnicodeDecodeError to then call pd.read_csv(path, encoding="latin1"); ensure the function name _load_csv and pd.read_csv calls are used so the change is localized.

recipes/noc-reasoning-agent/scripts/filtering/match_keywords.py-268-275 (1)
268-275: ⚠️ Potential issue | 🟠 Major

**`pattern` parameter is accepted but never used — and the precompiled regexes on lines 295–296 are silently discarded.**

`find_keyword_matches` only checks `str(resolution_method).lower().replace(" ", "") in keywords` (an exact set-membership test). The compiled `hw_pattern`/`sw_pattern` regex objects are passed in but ignored, making the two `re.compile` calls on lines 295–296 dead code.

Two valid fixes:

Option A — drop the unused parameter and the regex compilation (simpler, matches actual behavior):

♻️ Option A: remove unused parameter and dead code

```diff
-def find_keyword_matches(row, pattern, keywords):
+def find_keyword_matches(row, keywords):
     """Finds which specific keywords from a list match within a DataFrame row."""
     resolution_method = row.get("resolution_method", row.get("close_code", ""))
     if resolution_method and str(resolution_method).lower().replace(" ", "") in keywords:
         return True
     return False

-    # Precompile regex patterns for performance. This matches any of the phrases.
-    hw_pattern = re.compile("|".join(re.escape(k) for k in hw_keywords), flags=re.IGNORECASE)
-    sw_pattern = re.compile("|".join(re.escape(k) for k in sw_keywords), flags=re.IGNORECASE)
-
     # Convert all data to string type for safe searching
     str_df = df.astype(str)
     ...
-    hw_matches = find_keyword_matches(row, hw_pattern, hw_keywords)
+    hw_matches = find_keyword_matches(row, hw_keywords)
     ...
-    sw_matches = find_keyword_matches(row, sw_pattern, sw_keywords)
+    sw_matches = find_keyword_matches(row, sw_keywords)
```

Option B — actually use the regex (if broader substring matching was intended):

♻️ Option B: use the pattern inside find_keyword_matches

```diff
 def find_keyword_matches(row, pattern, keywords):
     """Finds which specific keywords from a list match within a DataFrame row."""
     resolution_method = row.get("resolution_method", row.get("close_code", ""))
-    if resolution_method and str(resolution_method).lower().replace(" ", "") in keywords:
+    if resolution_method and pattern.search(str(resolution_method)):
         return True
     return False
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/filtering/match_keywords.py` around lines 268 - 275, the function find_keyword_matches currently ignores its pattern parameter and discards the precompiled regexes (hw_pattern / sw_pattern); update find_keyword_matches to actually use the passed-in pattern by normalizing resolution_method to a string (e.g., lowercased and stripped) and performing pattern.search(resolution_method_str) (or pattern.match as appropriate) in addition to, or instead of, the exact membership test against keywords; modify callers that pass hw_pattern / sw_pattern so the function uses that regex, or if you prefer Option A remove the unused pattern parameter and delete the hw_pattern / sw_pattern compilation to avoid dead code — locate the change in find_keyword_matches and the code that constructs hw_pattern / sw_pattern.

recipes/noc-reasoning-agent/scripts/utils/token_usage.py-80-93 (1)
80-93: ⚠️ Potential issue | 🟠 Major

**When `--yaml` is provided, enforce file/dependency/key validity instead of skipping.**

Current behavior silently ignores user intent (missing file, missing PyYAML) and uses `.get("user")` where the schema key is expected.

Suggested fix

```diff
 def read_yaml_prompt(path: Optional[str]) -> Optional[str]:
     if not path:
         return None
     if not os.path.exists(path):
-        print(f"[warn] YAML file not found: {path}")
-        return None
+        raise FileNotFoundError(f"YAML file not found: {path}")
     if yaml is None:
-        print("[warn] PyYAML not installed; skipping YAML parsing.")
-        return None
+        raise RuntimeError("PyYAML is required when --yaml is provided.")
     with open(path, "r", encoding="utf-8") as f:
         data = yaml.safe_load(f)
-    # Your structure shows top-level key 'user'
-    return data.get("user") if isinstance(data, dict) else None
+    if not isinstance(data, dict):
+        raise TypeError(f"YAML root must be a mapping: {path}")
+    return data["user"]
```

As per coding guidelines, "Use direct dictionary key access (e.g., `data[key_name]`) instead of `.get()` when the key is expected to be present, to fail with a clear error if the key is missing rather than silently corrupting data."
Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/utils/token_usage.py` around lines 80 - 93, in read_yaml_prompt, enforce the caller's intent by failing fast instead of silently returning None: when a non-empty path is provided, raise a FileNotFoundError if the file does not exist (don't print and continue), raise an ImportError/RuntimeError if the yaml module is unavailable, and parse the file then access the expected top-level key using direct indexing (data["user"]) so a KeyError is raised if the schema is wrong; update error messages to include the path and clear context and keep the function signature and name (read_yaml_prompt) the same.

recipes/noc-reasoning-agent/scripts/utils/token_usage.py-119-124 (1)
119-124: ⚠️ Potential issue | 🟠 Major

**Require `generation` to exist and be a string.**

Using `row.get("generation", "")` plus a no-op `pass` on non-string values silently records invalid token lengths.

Suggested fix

```diff
-        gen = row.get("generation", "")
+        gen = row["generation"]
@@
         if not isinstance(gen, str):
-            # Try a nested alternative if your data uses it (customize as needed):
-            # gen = row.get("output", {}).get("text", "")
-            pass
+            raise TypeError(f"Line {i}: 'generation' must be a string.")
```

As per coding guidelines, "Use direct dictionary key access (e.g., `data[key_name]`) instead of `.get()` when the key is expected to be present, to fail with a clear error if the key is missing rather than silently corrupting data."
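A quick sketch of why the guideline matters (using a hypothetical record, not the script's real data): `.get()` quietly substitutes a default, so a zero token count gets recorded for a broken row, while direct indexing fails loudly at the offending record.

```python
row = {"input": "ticket text"}  # hypothetical record missing "generation"

# .get() silently yields an empty string; downstream code would count 0 tokens
silent = row.get("generation", "")

# direct access surfaces the schema problem immediately
try:
    row["generation"]
    raised = False
except KeyError:
    raised = True

print(repr(silent), raised)
```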
Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/utils/token_usage.py` around lines 119 - 124, replace the silent .get() and no-op branch with strict validation: access the key directly via row["generation"] (so a KeyError surfaces if missing), assign it to gen, then if not isinstance(gen, str) raise a clear TypeError/ValueError (including identifying info from row if available) instead of passing; update the block that currently sets gen = row.get("generation", "") and the subsequent non-string branch to perform this direct access and explicit type check so invalid/missing generation values fail loudly.

recipes/noc-reasoning-agent/scripts/utils/token_usage.py-53-64 (1)
53-64: ⚠️ Potential issue | 🟠 Major

**Fail fast when `--model` is provided but tokenizer loading fails.**

Right now a user-supplied model can silently degrade to whitespace tokenization, which invalidates the analysis output. Per the coding guidelines, explicitly provided parameters should fail fast rather than silently degrade.

Suggested fix

```diff
 def load_tokenizer(model_name: Optional[str]):
     """
     Try to load a HF tokenizer. If unavailable (e.g., no internet/cache),
     return None and we'll fall back to whitespace tokenization.
     """
-    if not model_name or AutoTokenizer is None:
+    if not model_name:
         return None
-    try:
-        tok = AutoTokenizer.from_pretrained(model_name, local_files_only=True)
-        return tok
-    except Exception:
-        # Try again without local_files_only (may fail if no internet)
-        try:
-            tok = AutoTokenizer.from_pretrained(model_name)
-            return tok
-        except Exception:
-            return None
+    if AutoTokenizer is None:
+        raise RuntimeError("--model was provided but transformers is not installed.")
+    try:
+        return AutoTokenizer.from_pretrained(model_name, local_files_only=True)
+    except OSError:
+        return AutoTokenizer.from_pretrained(model_name)
```

And update the usage check:

```diff
     tokenizer = load_tokenizer(args.model)
-    if tokenizer is None:
+    if tokenizer is None and args.model is None:
         print("[warn] Could not load tokenizer; falling back to whitespace token counts.")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/utils/token_usage.py` around lines 53 - 64, the tokenizer loader currently falls back silently when AutoTokenizer.from_pretrained fails; change the logic in recipes/noc-reasoning-agent/scripts/utils/token_usage.py so that if model_name is provided (the user explicitly set --model) and AutoTokenizer.from_pretrained (both with local_files_only=True and the retry without it) raises, you raise a descriptive exception (e.g., RuntimeError) instead of returning None or falling back to whitespace tokenization; keep the same try/retry calls to AutoTokenizer.from_pretrained but re-raise with context on final failure. Also update the caller/usage check that currently accepts a None tokenizer so that when a user-supplied model_name was provided and the loader raises/returns None it fails fast and surfaces the error to the user rather than proceeding with a whitespace tokenizer.

recipes/noc-reasoning-agent/scripts/utils/token_usage.py-72-75 (1)
72-75: ⚠️ Potential issue | 🟠 Major

**Remove the bare `except Exception: pass` block.**

Silent exception handling around `tokenizer.encode()` hides real tokenizer/runtime failures and causes token counts to degrade silently to the whitespace-based fallback. This violates the coding guideline: "Don't catch exceptions when they are not normally expected to be raised; let unexpected exceptions propagate so that failures are visible rather than silent." Across the entire codebase, `tokenizer.encode()` is called directly without exception handling everywhere else, confirming that exceptions are not expected during normal operation.

Suggested fix

```diff
     if tokenizer is not None:
         # Use encode to match model token count (fast + accurate)
-        try:
-            return len(tokenizer.encode(text, add_special_tokens=False))
-        except Exception:
-            pass
+        return len(tokenizer.encode(text, add_special_tokens=False))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/utils/token_usage.py` around lines 72 - 75, remove the silent try/except around tokenizer.encode so exceptions are not swallowed: delete the try/except block and call tokenizer.encode(text, add_special_tokens=False) directly and return len(...) (i.e., replace the try/except with a direct return len(tokenizer.encode(text, add_special_tokens=False))); do not add a bare except or silent fallback — let tokenizer.encode propagate errors so failures are visible.

recipes/noc-reasoning-agent/scripts/ns_pipelines/generate_synthetic_data.py-39-39 (1)
39-39: ⚠️ Potential issue | 🟠 Major

**Hardcoded `model` silently overrides the user-selected `--llm`.**

`model="openai/gpt-oss-120b"` is always used regardless of `args.llm`. When `main()` routes here because `args.llm == "qwen2.5-32b-instruct"`, the user's selection is completely ignored at the model-serving level. The `--llm` argument should drive both the routing logic and the model identifier passed to `generate()`.

🐛 Proposed fix

```diff
 def generate_synthetic_data(args, cluster, num_gpus, step=None, input_format_file=None):
     os.makedirs("outputs/sdg_reason", exist_ok=True)
     generate(
         ctx=wrap_arguments(
             f"++prompt_config=/workspace/data/prompt_reasoning.yaml "
             f"++inference.temperature={args.temperature} "
             f"++inference.tokens_to_generate={args.tokens_to_generate} "
             f"++code_execution=false "
             f"++skip_filled=false "
             f"++use_completions_api=true "
             f"++input_file={input_format_file} "
         ),
         cluster=cluster,
         server_type="vllm",
         input_file=input_format_file,
         output_dir="/workspace/outputs/sdg_reason/",
         expname="incident-generation",
-        model="openai/gpt-oss-120b",
+        model=args.model,
         rerun_done=True,
         server_gpus=num_gpus,
     )
```

Then pass `model` through from `main()`:

```diff
     if args.llm == "qwen2.5-32b-instruct":
         generate_synthetic_data(
-            args, cluster, num_gpus, step=1, input_format_file="/workspace/outputs/sdg/formatted_output.json"
+            args, cluster, num_gpus, step=1,
+            input_format_file="/workspace/outputs/sdg/formatted_output.json",
         )
```

where `args` carries a `--model` (or derived) attribute mapping the LLM alias to its model path, or simply forward `args.llm` directly as the `model` string (after proper name resolution).

As per coding guidelines: "Avoid silently ignoring user-passed parameters; fail if a required parameter is not specified or an unsupported parameter is provided."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/ns_pipelines/generate_synthetic_data.py` at line 39, the hardcoded model="openai/gpt-oss-120b" inside the branch overrides the user-selected LLM; update the branch that calls generate(...) so it accepts and forwards a model parameter derived from main() (use args.llm or a resolved mapping) instead of the literal string, and validate that the passed model is supported; modify main() to pass the chosen model into the function that invokes generate(...) (refer to main(), args.llm, the generate(...) call, and the local model variable) so the CLI selection drives the actual model identifier.

recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py-89-104 (1)
89-104: ⚠️ Potential issue | 🟠 Major

**`rfind` returning `-1` causes silent data corruption in `extract_final_thinking_processes`.**

Line 100: if `"final<|message|>"` is not found in `text`, `rfind` returns `-1`, and `-1 + len("final<|message|>")` evaluates to `15`. The slice `text[15:]` would silently return garbage data instead of signaling an error.

Unlike `extract_formatted_json_steps`, this function also has no `try/except` around `json.loads(line)` (line 93), so a malformed line will crash the entire run.

Proposed fix

```diff
 def extract_final_thinking_processes(input_file):
     responses = {}
     with open(input_file, "r", encoding="utf-8") as f:
         for line in f:
-            data = json.loads(line)
+            try:
+                data = json.loads(line)
+            except json.JSONDecodeError:
+                print(f"Skipping invalid line: {line.strip()}")
+                continue
             text = data["generation"]
             number = _incident_id(data)
             step_number = data["step_number"]
             if number not in responses:
                 responses[number] = {}
-            thinking = text[text.rfind("final<|message|>") + len("final<|message|>") :]
+            marker = "final<|message|>"
+            idx = text.rfind(marker)
+            if idx == -1:
+                print(f"Warning: marker not found for incident {number}, step {step_number}")
+                continue
+            thinking = text[idx + len(marker) :]
             data["generation"] = thinking
             responses[number][step_number] = thinking
     return responses
```
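The failure mode is easy to reproduce in isolation (the generation text below is made up): when the marker is absent, `rfind` signals failure with `-1` rather than an exception, so `-1 + len(marker)` slices from the middle of the string instead of raising.

```python
marker = "final<|message|>"  # 16 characters
text = "some generation text without the marker"

idx = text.rfind(marker)
# marker absent: rfind returns -1, and -1 + 16 == 15 becomes a
# mid-string slice index instead of an error
naive = text[idx + len(marker):]

print(idx, repr(naive))
```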
Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py` around lines 89 - 104, in extract_final_thinking_processes, guard against rfind returning -1 and add JSON parse error handling: wrap the json.loads(line) call in a try/except to skip or log malformed lines, use the index returned by text.rfind("final<|message|>") (or text.find) and check that it is >= 0 before slicing, and only set data["generation"] and responses[number][step_number] when the marker is found; reference functions/vars: extract_final_thinking_processes, json.loads, _incident_id, text.rfind("final<|message|>"), data["generation"], responses, and step_number.

recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py-215-221 (1)
215-221: ⚠️ Potential issue | 🟠 Major

**`key` parameter is never used — likely a bug in argument lookup.**

`_first_pos(key, named_key=None)` ignores the `key` parameter entirely. All call sites pass a single positional argument (e.g., `_first_pos("site_or_element_id")`), which binds to `key` but is never referenced. The intent was almost certainly to use `key` as the `kv_args` lookup when `named_key` is not provided.

Proposed fix

```diff
-    def _first_pos(key, named_key=None):
+    def _first_pos(named_key):
         """Return named arg, first positional arg, or 'all' as default."""
-        if named_key:
-            val = kv_args.get(named_key)
-            if val:
-                return val
+        val = kv_args.get(named_key)
+        if val:
+            return val
         return pos_args[0] if pos_args else "all"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py` around lines 215 - 221, The helper _first_pos currently ignores its key parameter causing lookups to use only named_key or fall back to pos_args; update _first_pos (and any call sites expecting _first_pos("site_or_element_id")) so it uses key when named_key is not provided — i.e., attempt to get kv_args[named_key] if named_key is given, otherwise get kv_args[key]; if that lookup yields a truthy value return it, else fall back to pos_args[0] or "all". Ensure references to _first_pos, key, named_key, kv_args, and pos_args are used exactly as described.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (20)

- recipes/noc-reasoning-agent/scripts/create_agent_with_tools.py
- recipes/noc-reasoning-agent/scripts/create_agent_with_tools_batch.py
- recipes/noc-reasoning-agent/scripts/evaluation/evaluation_with_judge.py
- recipes/noc-reasoning-agent/scripts/evaluation/problem_code_evaluation.py
- recipes/noc-reasoning-agent/scripts/evaluation/score.py
- recipes/noc-reasoning-agent/scripts/filtering/filter_rows.py
- recipes/noc-reasoning-agent/scripts/filtering/match_keywords.py
- recipes/noc-reasoning-agent/scripts/ns_pipelines/generate_synthetic_data.py
- recipes/noc-reasoning-agent/scripts/ns_pipelines/prepare_react_agent.py
- recipes/noc-reasoning-agent/scripts/tools.py
- recipes/noc-reasoning-agent/scripts/utils/create_input_jsonl_from_incidents.py
- recipes/noc-reasoning-agent/scripts/utils/format_reasoning_json.py
- recipes/noc-reasoning-agent/scripts/utils/reasoning_processes.py
- recipes/noc-reasoning-agent/scripts/utils/schema_columns.py
- recipes/noc-reasoning-agent/scripts/utils/split_incident_data.py
- recipes/noc-reasoning-agent/scripts/utils/split_mocktools_answers.py
- recipes/noc-reasoning-agent/scripts/utils/token_usage.py
- recipes/noc-reasoning-agent/scripts/visualization/extract_representation_columns.py
- recipes/noc-reasoning-agent/scripts/visualization/extract_scores.py
- recipes/noc-reasoning-agent/scripts/visualization/generate_trace_visualization.py
🚧 Files skipped from review as they are similar to previous changes (3)
- recipes/noc-reasoning-agent/scripts/utils/schema_columns.py
- recipes/noc-reasoning-agent/scripts/utils/create_input_jsonl_from_incidents.py
- recipes/noc-reasoning-agent/scripts/evaluation/score.py
```python
def find_finish_action(generation_text):
    """Extract the Finish[...] action text from a generation string."""
    last_index = generation_text.rfind("Finish")
    if last_index != -1:
        return generation_text[last_index:].strip()
    return ""
```
**`find_finish_action` will raise `AttributeError` when `"generation"` is absent.**

`incident_data.get("generation")` returns `None` when the key is missing, and `find_finish_action` immediately calls `.rfind()` on the argument without a `None` guard. The same value is also assigned to `reasoning_trace` on line 116 and later passed to `parse_steps` — though `parse_steps` handles `None` safely via `reasoning or ""`, the `find_finish_action` path does not.
🐛 Proposed fix

```diff
 def find_finish_action(generation_text):
     """Extract the Finish[...] action text from a generation string."""
+    if not generation_text:
+        return ""
     last_index = generation_text.rfind("Finish")
     if last_index != -1:
         return generation_text[last_index:].strip()
     return ""
```

Alternatively, call with an explicit default at the call site:

```diff
-    finish_action = find_finish_action(incident_data.get("generation"))
+    finish_action = find_finish_action(incident_data.get("generation", ""))
```
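The guarded version can be exercised directly; this sketch reproduces the fixed function with illustrative inputs (the incident trace text is made up).

```python
def find_finish_action(generation_text):
    """Extract the Finish[...] action text from a generation string."""
    if not generation_text:  # covers None (missing "generation" key) and ""
        return ""
    last_index = generation_text.rfind("Finish")
    if last_index != -1:
        return generation_text[last_index:].strip()
    return ""

# hypothetical generation text
trace = "Thought: link down. Action: Finish[replace optical module]"
print(find_finish_action(trace))
print(repr(find_finish_action(None)))
```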
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@recipes/noc-reasoning-agent/scripts/visualization/generate_trace_visualization.py`
around lines 66 - 71, find_finish_action currently calls
generation_text.rfind("Finish") and will raise AttributeError if passed None
(e.g., incident_data.get("generation") may return None); update
find_finish_action (or its call sites) to treat a missing generation as an empty
string by coalescing generation_text to "" before calling .rfind (or return ""
immediately if generation_text is falsy). Ensure the same defensive behavior is
applied where reasoning_trace is set from incident_data.get("generation") before
passing into parse_steps so find_finish_action and downstream logic never call
string methods on None.
Apply review suggestion from PR #1242 to use ```bash instead of bare ``` for all shell command code blocks in the NOC reasoning agent tutorial. Signed-off-by: George Armstrong <georgea@nvidia.com>
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/tutorials/posts/noc-reasoning-agent.md`:
- Around line 261-267: The blockquote under "Re-running generation pipelines"
contains empty quoted lines that trigger MD028; open the blockquote starting
with "**Note — Re-running generation pipelines:**" and remove any blank lines
that begin with ">" so the entire quote remains contiguous (ensure the lines
showing `ns generate` behavior, the rm example, and the note about other output
directories are all directly adjacent with no blank quoted lines).
- Around line 80-87: Add a single blank line between the two adjacent Markdown
tables that start with the header "| Software | Purpose |" and the following
table (currently immediately before "## Setup") to satisfy markdownlint rule
MD058; locate the block that renders the first table and insert one empty line
after its final row so there is a blank line before the next table/heading.
| Software | Purpose |
| --- | --- |
| **Docker** | Containerization for consistent environments |
| **<a href="https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html" target="_blank">NVIDIA Container Toolkit</a>** | Allows Docker containers to access GPU resources |
| **<a href="https://slurm.schedmd.com/" target="_blank">Slurm</a> with <a href="https://github.com/NVIDIA/pyxis" target="_blank">NVIDIA/pyxis</a>** (optional) | Cluster job scheduler for distributed workloads |
| **Python 3.10+** | Required Python version |
| **NeMo-Skills CLI** | Main interface for running pipelines |

## Setup
Add a blank line between adjacent tables to satisfy markdownlint.
Line 80 starts a second table immediately after the first one. Add a blank line between them to resolve MD058.
Suggested fix

```diff
 | **VRAM** | Sufficient for model size | Stores model weights and activations |
 | **Multi-GPU** | Recommended | Enables model and batch parallelism |
+
 | Software | Purpose |
 | --- | --- |
 | **Docker** | Containerization for consistent environments |
```

🧰 Tools
🪛 markdownlint-cli2 (0.21.0)
[warning] 86-86: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/tutorials/posts/noc-reasoning-agent.md` around lines 80 - 87, Add a
single blank line between the two adjacent Markdown tables that start with the
header "| Software | Purpose |" and the following table (currently immediately
before "## Setup") to satisfy markdownlint rule MD058; locate the block that
renders the first table and insert one empty line after its final row so there
is a blank line before the next table/heading.
> **Note — Re-running generation pipelines:**
> `ns generate` creates both an `output.jsonl` and an `output.jsonl.done` sentinel file in the output directory. If you need to re-run a generation step from scratch, delete **both** files before restarting:
>
> `rm outputs/sdg/output.jsonl outputs/sdg/output.jsonl.done`
>
> The same applies to any `ns generate` output directory (e.g., `outputs/sdg_reason/`). Without deleting these files, the pipeline will skip generation and reuse the existing results.
|
|
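The cleanup the note prescribes can be scripted so it is safe to run even when the files are already gone. A hypothetical helper, assuming only what the note states (that `ns generate` skips work while `output.jsonl` and its `.done` sentinel exist):

```python
from pathlib import Path

def reset_generation_output(output_dir: str) -> list[str]:
    """Delete output.jsonl and its .done sentinel so `ns generate` re-runs.

    Returns the names of the files that were actually removed, so a caller
    can tell whether anything needed cleaning.
    """
    removed = []
    for name in ("output.jsonl", "output.jsonl.done"):
        path = Path(output_dir) / name
        if path.exists():
            path.unlink()
            removed.append(name)
    return removed
```

Calling it twice is harmless; the second call simply returns an empty list.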
Remove blank quoted lines inside the blockquote note.
The blockquote under “Re-running generation pipelines” includes empty quoted lines, which triggers MD028 (Line 267). Keep the blockquote contiguous.
Suggested fix
> **Note — Re-running generation pipelines:**
> `ns generate` creates both an `output.jsonl` and an `output.jsonl.done` sentinel file in the output directory. If you need to re-run a generation step from scratch, delete **both** files before restarting:
->
> `rm outputs/sdg/output.jsonl outputs/sdg/output.jsonl.done`
->
> The same applies to any `ns generate` output directory (e.g., `outputs/sdg_reason/`). Without deleting these files, the pipeline will skip generation and reuse the existing results.
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)
[warning] 267-267: Blank line inside blockquote
(MD028, no-blanks-blockquote)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/tutorials/posts/noc-reasoning-agent.md` around lines 261 - 267, The
blockquote under "Re-running generation pipelines" contains empty quoted lines
that trigger MD028; open the blockquote starting with "**Note — Re-running
generation pipelines:**" and remove any blank lines that begin with ">" so the
entire quote remains contiguous (ensure the lines showing `ns generate`
behavior, the rm example, and the note about other output directories are all
directly adjacent with no blank quoted lines).
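The fix the prompt describes, dropping quote-marker-only lines so the blockquote stays contiguous, amounts to a one-line filter. A sketch, not the markdownlint autofix:

```python
def drop_blank_quote_lines(lines):
    """Remove lines that are only a '>' marker (with optional whitespace),
    which is what the MD028 (no-blanks-blockquote) finding points at here.
    Genuinely blank lines outside the blockquote are left alone."""
    return [line for line in lines if line.strip() != ">"]

note = ["> **Note:**", ">", "> rm outputs/sdg/output.jsonl", ">", "> Done."]
print(drop_blank_quote_lines(note))  # the three contiguous quoted lines remain
```

After the filter the quoted lines are directly adjacent, which is exactly the state the prompt asks for.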
Signed-off-by: Amparo Canaveras <acanaveras@nvidia.com> Signed-off-by: rajeshwarid179 <rdevaramani@nvidia.com> Signed-off-by: acanaveras <142839082+acanaveras@users.noreply.github.com> Signed-off-by: George Armstrong <georgea@nvidia.com> Co-authored-by: Amparo Canaveras <acanaveras@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: acanaveras <142839082+acanaveras@users.noreply.github.com> Co-authored-by: rajeshwarid179 <rdevaramani@nvidia.com>
Commits included in this update:

- a5da597 (Igor Gitman, Mar 6, 2026): Revert "Eval kit support (#1239)" (#1294)
- b237e33 (George, Mar 6, 2026): Eval kit support (#1239)
- dc28bbf (George Armstrong, Mar 5, 2026): Python direct tool calling without MCP (#1286)
- 12454dd (Sadegh Mahdavi, Mar 4, 2026): Allow het servers for nemo-rl jobs (#1223)
- 8884a68 (Prasoon Varshney, Mar 4, 2026): Support source_lang param for translation recipe (#1290)
- 4618b19 (Meriem B., Mar 4, 2026): Add MMLU-Pro 10% optimized subset for checkpoint selection (#1285)
- 5ac8609 (Talor Abramovich, Mar 4, 2026): Add SPEED-Bench (within repo) (#1279)
- c31eec5 (George Armstrong, Mar 3, 2026): Fix os.getlogin() crash in ns setup (#1289)
- c228e66 (George Armstrong, Mar 3, 2026): Fix streaming TypeError when delta.content is None (#1267) (#1288)
- aa47923 (Matvei Novikov, Mar 2, 2026): Add LibTrace recipe for generating domain-specific reasoning data (#1224)
- 313cad7 (Stephen Ge, Mar 2, 2026): fix: clean parse-failure retries in prover (#1284)
- 813cfa3 (George Armstrong, Mar 2, 2026): tst: rollback inference-api to integrate (#1287)
- 31735f9 (Valentin Mendelev, Mar 2, 2026): Add backend-agnostic unified inference server with NeMo ASR and TTS backends (#1250)
- d4ef8c0 (George, Feb 27, 2026): Update promt_config to working with openai format + inline setup (#1210)
- e879cbc (George Armstrong, Feb 27, 2026): Update noc tutorial (#1282)
- f6e3505 (George Armstrong, Feb 27, 2026): Add noc reasoning tutorial (#1278)
- fc2072a (Jiacheng Xu, Feb 27, 2026): CritPt generation add prompt_format=None (#1280)
- c8abe5d (Igor Gitman, Feb 27, 2026): New slurm customization parameters (account, containers) (#1209)
- 2b38cce (George Armstrong, Feb 25, 2026): Add nemo-skills-core subpackage for lightweight installs (#1229)
- 9fa8e83 (Dheeraj Peri, Feb 25, 2026): feat: add custom judge type support for external repo integration (#1274)
- 8a32b13 (Igor Gitman, Feb 24, 2026): Exclude numb3rs form test_eval.py (#1275)
- 6da2219 (George, Feb 23, 2026): Numb3rs ds addition (#1174)
- ad034b5 (Suriya Gunasekar, Feb 22, 2026): Add DSBench-DA evaluation (#1254)
- 7593ab3 (Jiacheng Xu, Feb 20, 2026): Add CritPt benchmark (#1200)
- 58c31b2 (Suriya Gunasekar, Feb 20, 2026): Fix no_answer metric overcounting in _compute_pass_at_k (#1245)
- 1f1a2e7 (Igor Gitman, Feb 20, 2026): Fix incorrect prompt tokens count due to HF api update (#1264)
- 8ebc6f5 (Igor Gitman, Feb 20, 2026): Remove deprecated dataset group (#1263)
- ea4177f (Yongqiang Wang, Feb 19, 2026): fix deps (#1258)
- 60905a7 (Minho Ryu, Feb 20, 2026): Add aime26 (#1256)
- b28afc5 (Igor Gitman, Feb 19, 2026): Rename custom -> external benchmarks (#1262)
- 6cc9c45 (Igor Gitman, Feb 19, 2026): Add reference to internal benchmarks repo (#1261)
- 5202af6 (Igor Gitman, Feb 19, 2026): Remove incorrect presence-penalty setting (#1259)
- 144c70b (Igor Gitman, Feb 19, 2026): Adding an option to store benchmarks in external repo (#1240)
- 10e6e39 (George, Feb 19, 2026): update vllm miltimodal for api calls convenience (#1213)
- 1ba4219 (Nick Ludwig, Feb 18, 2026): Fix --server_container not being applied to dependent jobs (#1244)
- 9517614 (Wasi Ahmad, Feb 16, 2026): Support mini-swe-agent as agent harness (#1212)
- a3d44dc (Suriya Gunasekar, Feb 13, 2026): Add --installation_command support to prepare_data (#1243)
- e80d524 (George Armstrong, Feb 12, 2026): Fix CI disk space for Docker image builds (#1241)
- d22236c (Sadegh Mahdavi, Feb 11, 2026): Fix answerbench prompt parsing (#1235)
- 2401628 (George Armstrong, Feb 11, 2026): feat: add lockfiles for reproducible sandbox builds (#1233)
- 5a0a84d (Wasi Ahmad, Feb 11, 2026): removing datasets version restriction for LCB eval (#1230)
- ef0a890 (gnalbandyan, Feb 11, 2026): Gnalbandyan/add physics (#1214)
- bd9d30c (Wasi Ahmad, Feb 10, 2026): LCB generic prompting (#1215)
- 7d6c49a (Sadegh Mahdavi, Feb 7, 2026): Add support for different variations of nemo-rl (#1220)
- b19ba96 (George Armstrong, Feb 6, 2026): Add multi-node sandbox support for SLURM clusters (#1218)
- 8950bb0 (anowaczynski-nvidia, Feb 7, 2026): support structured outputs in hle judge for optional AA compatibility (#1186)
- b84f7a2 (Igor Gitman, Feb 6, 2026): A small update on running tests docs (#1219)
- 8e838e1 (George Armstrong, Feb 5, 2026): feat: add flag to disable sandbox replay (#1217)
- 5fd9085 (Igor Gitman, Feb 5, 2026): Add an option to limit number of tool calls (#1216)
- d820200 (Igor Gitman, Feb 3, 2026): Add arena-hard v2 (#1205)
- a30920e (Igor Gitman, Feb 2, 2026): Fix mkdocs warnings (#1204)
- 19d7788 (Ivan, Feb 2, 2026): Fix infinite wait in sandbox.wait_for_sandbox (#1206)
- 3e65fbf (Sadegh Mahdavi, Jan 30, 2026): Improve tts (#1203)
- 250c862 (Nick Ludwig, Jan 30, 2026): SWE-bench: fix SWE-agent hanging, adjust expected scores (#1202)
- 7ded756 (Ivan, Jan 30, 2026): Add proper token counting to code execution model (#1184)
- b986304 (Igor Gitman, Jan 29, 2026): Upgrade containers (#1198)
- 3b44f02 (Dan Lord, Jan 29, 2026): Fix incorrect string format (#1199)
- c4854b8 (Sadegh Mahdavi, Jan 29, 2026): Update nemo-rl to latest (#1087)
Signed-off-by: dgitman <dgitman@nvidia.com>
Re-opened from #1242 using an origin branch so CI can run with secrets access.
Original PR by @acanaveras; see #1242 for full review history.