Skip to content

Conversation

@oryx1729
Copy link
Contributor

@oryx1729 oryx1729 commented Jul 21, 2025

Description

Adds a blueprint demonstrating how to build a deep research agent using Haystack Framework that combines web search and Retrieval-Augmented Generation (RAG) using the NeMo-Agent-Toolkit.

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Summary by CodeRabbit

  • New Features

    • Adds a Haystack Deep Research Agent example: web search + RAG + orchestrated agent, OpenSearch integration, optional startup indexing, sample data, and configurable agent/LLM settings.
  • Documentation

    • Adds a comprehensive README with architecture, setup, configuration, usage examples, and troubleshooting.
  • Tests

    • Adds config validation and conditional end-to-end tests for the example workflow.
  • Chores

    • Adds packaging/metadata, component registration, sample data pointers, and a CI allowlist entry.

@copy-pr-bot
Copy link

copy-pr-bot bot commented Jul 21, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@mdemoret-nv mdemoret-nv added enhancement non-breaking Non-breaking change labels Jul 21, 2025
@mdemoret-nv
Copy link
Collaborator

@oryx1729 Can you update your commits to contain the DCO? More info here: https://github.com/NVIDIA/NeMo-Agent-Toolkit/pull/461/checks?check_run_id=46372766560

@oryx1729
Copy link
Contributor Author

Hi @mdemoret-nv, @mpangrazzi is on vacation and will be back next week. Is it be possible to start the review and have the DCO done when he's back?

@mdemoret-nv
Copy link
Collaborator

@oryx1729 No problem.

@mdemoret-nv mdemoret-nv added feature request New feature or request and removed enhancement labels Jul 25, 2025
@mpangrazzi
Copy link
Contributor

@mdemoret-nv Hi! Commits should be compliant with DCO now.

@mpangrazzi mpangrazzi force-pushed the develop branch 2 times, most recently from 70b2319 to 6ccb74a Compare August 21, 2025 08:55
Copy link
Collaborator

@mdemoret-nv mdemoret-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be better to utilize the LLM top level object.

Copy link
Member

@willkill07 willkill07 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good.

Please ensure that all "public" functions have full documentation and that all functions have type annotations. It really helps with clarity.

I can imagine in the future a complete haystack plugin with various features :)

@mpangrazzi
Copy link
Contributor

@willkill07 @mdemoret-nv Thank you! I've just did another iteration following your suggestions.

Copy link
Member

@willkill07 willkill07 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late, untimely review on this. I forgot to follow up!

@willkill07
Copy link
Member

@mpangrazzi and @oryx1729 -- I have left a review. I'm sorry that it took me (too) long to respond in a timely manner.

@mpangrazzi mpangrazzi requested a review from a team as a code owner September 19, 2025 12:34
mpangrazzi and others added 11 commits September 19, 2025 15:46
Signed-off-by: oryx1729 <[email protected]>
Signed-off-by: Michele Pangrazzi <[email protected]>
…iaChatGenerator ; Rewrote tests

Signed-off-by: Michele Pangrazzi <[email protected]>
…_haystack_deep_research_agent/register.py

Co-authored-by: Will Killian <[email protected]>
Signed-off-by: Michele Pangrazzi <[email protected]>
…_haystack_deep_research_agent/configs/config.yml

Co-authored-by: Will Killian <[email protected]>
Signed-off-by: Michele Pangrazzi <[email protected]>
…_haystack_deep_research_agent/configs/config.yml

Co-authored-by: Will Killian <[email protected]>
Signed-off-by: Michele Pangrazzi <[email protected]>
…_haystack_deep_research_agent/configs/config.yml

Co-authored-by: Will Killian <[email protected]>
Signed-off-by: Michele Pangrazzi <[email protected]>
…est_haystack_deep_research_agent.py

Co-authored-by: Will Killian <[email protected]>
Signed-off-by: Michele Pangrazzi <[email protected]>
Signed-off-by: Michele Pangrazzi <[email protected]>
mpangrazzi and others added 11 commits September 22, 2025 17:01
…ct.toml

Co-authored-by: Will Killian <[email protected]>
Signed-off-by: Michele Pangrazzi <[email protected]>
…_haystack_deep_research_agent/__init__.py

Co-authored-by: Will Killian <[email protected]>
Signed-off-by: Michele Pangrazzi <[email protected]>
…ct.toml

Co-authored-by: Will Killian <[email protected]>
Signed-off-by: Michele Pangrazzi <[email protected]>
Co-authored-by: Will Killian <[email protected]>
Signed-off-by: Michele Pangrazzi <[email protected]>
Co-authored-by: Will Killian <[email protected]>
Signed-off-by: Michele Pangrazzi <[email protected]>
Signed-off-by: Michele Pangrazzi <[email protected]>
Signed-off-by: Michele Pangrazzi <[email protected]>
@coderabbitai
Copy link

coderabbitai bot commented Sep 22, 2025

Walkthrough

Adds a Haystack Deep Research Agent example: package metadata, configs, sample data, three Haystack pipelines (indexing, search, RAG), workflow registration, README, tests, and a CI path-check allowlist entry integrating OpenSearch, SerperDev web search, and NVIDIA NIM LLMs.

Changes

Cohort / File(s) Summary
Documentation
examples/basic/frameworks/haystack_deep_research_agent/README.md
New README describing architecture, setup, configuration, usage, examples, and troubleshooting for the Haystack Deep Research Agent example.
Packaging & Entry Point
examples/basic/frameworks/haystack_deep_research_agent/pyproject.toml
Adds pyproject declaring package metadata, dependencies, build system, editable UV source mapping, and a nat.components entry point.
Configuration
.../src/nat_haystack_deep_research_agent/configs/config.yml, examples/basic/frameworks/haystack_deep_research_agent/configs
Adds YAML config declaring two NIM LLMs (rag_llm, agent_llm) and workflow parameters (_type: haystack_deep_research_agent, max_agent_steps, search_top_k, rag_top_k, opensearch_url, index_on_startup, data_dir).
Data Assets
.../src/nat_haystack_deep_research_agent/data/*
Adds example data files (Git LFS pointers) for indexing: carbonara.md, sample_document.txt, and data dir entry.
Package Exports
.../src/nat_haystack_deep_research_agent/__init__.py, .../pipelines/__init__.py
Re-exports helper functions: run_startup_indexing, create_rag_tool, and create_search_tool at package and pipelines package levels.
Pipelines: Indexing
.../pipelines/indexing.py
New startup indexing module: gathers PDFs/texts, builds Haystack pipeline (converters, DocumentCleaner, DocumentSplitter, DocumentWriter), supports fallback example data dir, logs progress, and writes to OpenSearch with SKIP policy.
Pipelines: RAG
.../pipelines/rag.py
New create_rag_tool(document_store, *, top_k, generator) building a RAG pipeline with OpenSearch retriever, prompt assembly, NvidiaChatGenerator, returning (ComponentTool, Pipeline).
Pipelines: Web Search
.../pipelines/search.py
New create_search_tool(top_k) building a SerperDevWebSearch → LinkContentFetcher → HTMLToDocument → OutputAdapter pipeline, wrapped as a ComponentTool emitting templated search results.
Workflow Registration
.../register.py
Adds HaystackDeepResearchWorkflowConfig and registers haystack_deep_research_agent workflow: LLM validation (NIM), optional startup indexing, wiring search + rag tools into an Agent, warmup, and an async response handler.
Tests
.../tests/test_haystack_deep_research_agent.py
Adds config YAML validation test and a conditional async e2e test that requires env vars and OpenSearch reachability; includes _opensearch_reachable helper.
CI Path Allowlist
ci/scripts/path_checks.py
Adds an allowlist entry permitting references from the example README to a specific data PDF to avoid path-check violations.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant WF as Workflow
  participant Agent
  participant Search as Search Tool
  participant RAG as RAG Tool
  participant OS as OpenSearch
  participant WebAPI as Web Search API
  participant LLM_A as Agent LLM
  participant LLM_R as RAG LLM

  User->>WF: submit(query)
  activate WF
  WF->>Agent: Run(query)
  activate Agent

  alt uses web
    Agent->>Search: search(query)
    activate Search
    Search->>WebAPI: web.search(query)
    WebAPI-->>Search: links
    Search-->>Agent: search_result
    deactivate Search
  end

  alt uses docs
    Agent->>RAG: retrieve+generate(query)
    activate RAG
    RAG->>OS: retrieve(top_k)
    OS-->>RAG: documents
    RAG->>LLM_R: generate(context+query)
    LLM_R-->>RAG: rag_answer
    RAG-->>Agent: rag_result
    deactivate RAG
  end

  Agent->>LLM_A: compose final reply
  LLM_A-->>Agent: final_message
  Agent-->>WF: response_text
  deactivate Agent
  WF-->>User: response
  deactivate WF
Loading
sequenceDiagram
  autonumber
  participant WF as Workflow Startup
  participant IDX as Indexing Pipeline
  participant FS as Data Dir
  participant OS as OpenSearch

  WF->>WF: check index_on_startup
  alt index_on_startup == true
    WF->>IDX: run_startup_indexing(data_dir)
    activate IDX
    IDX->>FS: scan for PDFs & text
    FS-->>IDX: file lists
    IDX->>OS: write documents (skip duplicates)
    OS-->>IDX: acks
    IDX-->>WF: totals
    deactivate IDX
  else
    WF-->>WF: skip indexing
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title uses an imperative verb, clearly summarizes the addition of a blueprint for the Haystack Deep Research Agent, and stays within the recommended length, satisfying the style requirements.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5b14e5a and 5184836.

📒 Files selected for processing (1)
  • ci/scripts/path_checks.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • ci/scripts/path_checks.py

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

🧹 Nitpick comments (19)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/carbonara.md (3)

1-3: SPDX header vs LFS pointer: add linter exclusion for data pointers.

SPDX header is required for .md, but adding it here would break the LFS pointer (must start with version ...). Update the header-check to ignore LFS-managed files under .../data/** instead of modifying this file.


1-1: markdownlint false positive (MD034 no-bare-urls) on LFS pointer.

This isn’t prose Markdown. Exclude examples/**/data/** from markdownlint or disable MD034 for that glob.

You can use a config like:

# .markdownlint-cli2.yaml
globs:
  - "**/*.md"
  - "!examples/**/data/**"

1-3: Optional: prevent edit races on data assets.

Consider enabling Git LFS file locking for .../data/** in contributor docs to avoid concurrent edits on binary-backed content.

examples/basic/frameworks/haystack_deep_research_agent/pyproject.toml (1)

22-22: Optional: enrich classifiers.

Add Python versions and license classifier to improve packaging metadata.

-classifiers = ["Programming Language :: Python"]
+classifiers = [
+  "Programming Language :: Python",
+  "Programming Language :: Python :: 3.11",
+  "Programming Language :: Python :: 3.12",
+  "License :: OSI Approved :: Apache Software License",
+]
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py (3)

33-35: Shorten exception message to satisfy ruff TRY003.

Keep it concise; details belong in docs/logs.

-    if generator is None:
-        raise ValueError("NvidiaChatGenerator instance must be provided via builder-configured LLM.")
+    if generator is None:
+        raise ValueError("generator is required")

46-46: Be explicit about required_variables for ChatPromptBuilder.

"*" may not be portable across Haystack versions. List the variables to reduce runtime surprises.

-    prompt_builder = ChatPromptBuilder(template=[ChatMessage.from_user(template)], required_variables="*")
+    prompt_builder = ChatPromptBuilder(
+        template=[ChatMessage.from_user(template)],
+        required_variables=["documents", "query"],
+    )

53-55: Connect explicit ports to avoid implicit default-port coupling.

Being explicit improves robustness across upstream changes.

-    rag_pipeline.connect("retriever", "prompt_builder.documents")
-    rag_pipeline.connect("prompt_builder", "llm")
+    rag_pipeline.connect("retriever", "prompt_builder.documents")
+    rag_pipeline.connect("prompt_builder.messages", "llm.messages")

If NvidiaChatGenerator uses a different input (e.g., "prompt"), update the right-hand side accordingly.

examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/configs/config.yml (1)

24-31: Consider gating heavy dependencies for CI (index_on_startup).

Indexing on startup + OpenSearch can slow/fail CI. Gate with a flag or auto-detect availability.

-  index_on_startup: true
+  index_on_startup: ${INDEX_ON_STARTUP:-false}

Ensure tests set INDEX_ON_STARTUP=true only when OpenSearch is available (or provide an in-memory store fallback).

examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.py (1)

5-10: Narrow the import guard and log instead of silent pass; also export via all instead of noqa.

Catching Exception and pass hides real errors. Prefer ImportError and a debug log. Define all for re-exports and drop noqa.

+# Re-export pipelines helpers for convenience
+import logging
+logger = logging.getLogger(__name__)
-# Re-export pipelines helpers for convenience
-try:
-    from .pipelines.indexing import run_startup_indexing  # noqa: F401
-    from .pipelines.rag import create_rag_tool  # noqa: F401
-    from .pipelines.search import create_search_tool  # noqa: F401
-except Exception:  # pragma: no cover - optional during install time
-    pass
+__all__ = ["run_startup_indexing", "create_rag_tool", "create_search_tool"]
+try:
+    from .pipelines.indexing import run_startup_indexing
+    from .pipelines.rag import create_rag_tool
+    from .pipelines.search import create_search_tool
+except ImportError as e:  # pragma: no cover - optional during install time
+    logger.debug("Optional re-exports not available: %s", e)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.py (1)

2-4: Export via all and remove unused noqa directives.

Ruff flags RUF100. Use all for explicit re-exports.

-from .indexing import run_startup_indexing  # noqa: F401
-from .rag import create_rag_tool  # noqa: F401
-from .search import create_search_tool  # noqa: F401
+from .indexing import run_startup_indexing
+from .rag import create_rag_tool
+from .search import create_search_tool
+
+__all__ = ["run_startup_indexing", "create_rag_tool", "create_search_tool"]
examples/basic/frameworks/haystack_deep_research_agent/README.md (2)

20-21: Tiny copy edit.

Double space after “framework”.

-This example demonstrates how to build a deep research agent using Haystack framework  that combines web search and Retrieval Augmented Generation (RAG) capabilities using the NeMo-Agent-Toolkit.
+This example demonstrates how to build a deep research agent using the Haystack framework that combines web search and Retrieval-Augmented Generation (RAG) using the NeMo Agent Toolkit.

191-193: Clarify reference name.

Use “NeMo Agent Toolkit” consistently (avoid hyphen in prose).

-See Haystack's NvidiaChatGenerator docs: [NvidiaChatGenerator](https://docs.haystack.deepset.ai/docs/nvidiachatgenerator)
+See Haystack’s NvidiaChatGenerator docs: [NvidiaChatGenerator](https://docs.haystack.deepset.ai/docs/nvidiachatgenerator)
examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py (1)

69-95: Config keys assertion is fine. Consider YAML parse for structure.

Optional: parse YAML and assert structurally rather than substring checks.

Would you like a follow-up patch that loads YAML and validates fields/types?

examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py (1)

93-94: Log with stack trace when swallowing errors.

Per project guidelines, use logger.exception() when not re-raising.

-except Exception as e:  # pragma: no cover
-    logger.warning("Indexing pipeline failed or was skipped due to an error: %s", str(e))
+except Exception as e:  # pragma: no cover
+    logger.exception("Indexing pipeline failed or was skipped due to an error: %s", e)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py (5)

51-58: Add explicit return type for workflow coroutine.

Type hints are required by guidelines.

-async def haystack_deep_research_agent_workflow(config: HaystackDeepResearchWorkflowConfig, builder: Builder):
+from typing import AsyncIterator, Awaitable, Callable
+
+async def haystack_deep_research_agent_workflow(
+    config: HaystackDeepResearchWorkflowConfig, builder: Builder
+) -> AsyncIterator[Callable[[str], Awaitable[str]]]:

70-81: Use parameterized logging (avoid f-strings) and consistent style.

Also OK to keep info level.

-    logger.info(f"Starting Haystack Deep Research Agent workflow with config: {config}")
+    logger.info("Starting Haystack Deep Research Agent workflow with config: %s", config)
@@
-    logger.info("Connected to OpenSearch successfully")
+    logger.info("Connected to OpenSearch successfully")
@@
-    if config.index_on_startup:
-        run_startup_indexing(document_store=document_store, data_dir=config.data_dir, logger=logger)
+    if config.index_on_startup:
+        run_startup_indexing(document_store=document_store, data_dir=config.data_dir, logger=logger)

94-98: Prefer concise error messages inside exceptions.

TRY003: keep messages short; current messages are fine but could be simplified. Optional.


151-156: GeneratorExit is not exceptional here—log at warning level.

Avoid logger.exception to prevent duplicate stack traces.

-    except GeneratorExit:
-        logger.exception("Workflow exited early!", exc_info=True)
+    except GeneratorExit:
+        logger.warning("Workflow exited early!")

66-69: Optional: import directly from local package submodules to avoid circular import via package init.

Keeps boundaries explicit if init adds logic later.

-from nat_haystack_deep_research_agent import create_rag_tool
-from nat_haystack_deep_research_agent import create_search_tool
-from nat_haystack_deep_research_agent import run_startup_indexing
+from nat_haystack_deep_research_agent.pipelines.rag import create_rag_tool
+from nat_haystack_deep_research_agent.pipelines.search import create_search_tool
+from nat_haystack_deep_research_agent.pipelines.indexing import run_startup_indexing
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 31f8f13 and 45c81dd.

📒 Files selected for processing (14)
  • examples/basic/frameworks/haystack_deep_research_agent/README.md (1 hunks)
  • examples/basic/frameworks/haystack_deep_research_agent/configs (1 hunks)
  • examples/basic/frameworks/haystack_deep_research_agent/data (1 hunks)
  • examples/basic/frameworks/haystack_deep_research_agent/pyproject.toml (1 hunks)
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.py (1 hunks)
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/configs/config.yml (1 hunks)
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/carbonara.md (1 hunks)
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/sample_document.txt (1 hunks)
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.py (1 hunks)
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py (1 hunks)
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py (1 hunks)
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.py (1 hunks)
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py (1 hunks)
  • examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (11)
**/*.{py,sh,md,yml,yaml,toml,ini,json,ipynb,txt,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.{py,sh,md,yml,yaml,toml,ini,json,ipynb,txt,rst}: Every file must start with the standard SPDX Apache-2.0 header; keep copyright years up‑to‑date
All source files must include the SPDX Apache‑2.0 header; do not bypass CI header checks

Files:

  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/carbonara.md
  • examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/configs/config.yml
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/sample_document.txt
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.py
  • examples/basic/frameworks/haystack_deep_research_agent/README.md
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py
  • examples/basic/frameworks/haystack_deep_research_agent/pyproject.toml
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py
**/*.{py,md}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Never hard‑code version numbers in code or docs; versions are derived by setuptools‑scm

Files:

  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/carbonara.md
  • examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.py
  • examples/basic/frameworks/haystack_deep_research_agent/README.md
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards. - For Python code, follow
    PEP 20 and
    PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values.
    Example:
    def my_function(param1: int, param2: str) -> bool:
        pass
  • For Python exception handling, ensure proper stack trace preservation:
    • When re-raising exceptions: use bare raise statements to maintain the original stack trace,
      and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
    • When catching and logging exceptions without re-raising: always use logger.exception()
      to capture the full stack trace information.

Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any

words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file, words that might appear to be
spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,

and should contain an Apache License 2.0 header comment at the top of each file.

  • Confirm that copyright years are up-to date whenever a file is changed.

Files:

  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/carbonara.md
  • examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/configs/config.yml
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/sample_document.txt
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.py
  • examples/basic/frameworks/haystack_deep_research_agent/README.md
  • examples/basic/frameworks/haystack_deep_research_agent/configs
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py
  • examples/basic/frameworks/haystack_deep_research_agent/pyproject.toml
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py
  • examples/basic/frameworks/haystack_deep_research_agent/data
examples/**/*

⚙️ CodeRabbit configuration file

examples/**/*: - This directory contains example code and usage scenarios for the toolkit, at a minimum an example should
contain a README.md or file README.ipynb.

  • If an example contains Python code, it should be placed in a subdirectory named src/ and should
    contain a pyproject.toml file. Optionally, it might also contain scripts in a scripts/ directory.
  • If an example contains YAML files, they should be placed in a subdirectory named configs/. - If an example contains sample data files, they should be placed in a subdirectory named data/, and should
    be checked into git-lfs.

Files:

  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/carbonara.md
  • examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/configs/config.yml
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/sample_document.txt
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.py
  • examples/basic/frameworks/haystack_deep_research_agent/README.md
  • examples/basic/frameworks/haystack_deep_research_agent/configs
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py
  • examples/basic/frameworks/haystack_deep_research_agent/pyproject.toml
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py
  • examples/basic/frameworks/haystack_deep_research_agent/data
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.py: Follow PEP 8/20 style; format with yapf (column_limit=120) and use 4-space indentation; end files with a single newline
Run ruff (ruff check --fix) per pyproject.toml; fix warnings unless explicitly ignored; ruff is linter-only
Use snake_case for functions/variables, PascalCase for classes, and UPPER_CASE for constants
Treat pyright warnings as errors during development
Exception handling: preserve stack traces and avoid duplicate logging
When re-raising exceptions, use bare raise and log with logger.error(), not logger.exception()
When catching and not re-raising, log with logger.exception() to capture stack trace
Validate and sanitize all user input; prefer httpx with SSL verification and follow OWASP Top‑10
Use async/await for I/O-bound work; profile CPU-heavy paths with cProfile/mprof; cache with functools.lru_cache or external cache; leverage NumPy vectorization when beneficial

**/*.py: Programmatic use: create TestLLMConfig(response_seq=[...], delay_ms=...), add with builder.add_llm("", cfg).
When retrieving the test LLM wrapper, use builder.get_llm(name, wrapper_type=LLMFrameworkEnum.) and call the framework’s method (e.g., ainvoke, achat, call).

Files:

  • examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py
**/tests/test_*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Test files must be named test_*.py and placed under a tests/ folder

Files:

  • examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py
**/tests/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/tests/**/*.py: Test functions must use the test_ prefix and snake_case
Extract repeated test code into pytest fixtures; fixtures should set name=... in @pytest.fixture and functions named with fixture_ prefix
Mark expensive tests with @pytest.mark.slow or @pytest.mark.integration
Use pytest with pytest-asyncio for async code; mock external services with pytest_httpserver or unittest.mock

Files:

  • examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py
**/*.{py,yaml,yml}

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

**/*.{py,yaml,yml}: Configure response_seq as a list of strings; values cycle per call, and [] yields an empty string.
Configure delay_ms to inject per-call artificial latency in milliseconds for nat_test_llm.

Files:

  • examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/configs/config.yml
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py
**/configs/**/*.y?(a)ml

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Configuration files consumed by code must live in a neighbouring configs/ directory

Files:

  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/configs/config.yml
**/*.{yaml,yml}

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

In workflow/config YAML, set llms.._type: nat_test_llm to stub responses.

Files:

  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/configs/config.yml
**/README.{md,ipynb}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Each documentation README must follow the NeMo Agent toolkit naming rules and must not use deprecated names

Files:

  • examples/basic/frameworks/haystack_deep_research_agent/README.md
🧠 Learnings (2)
📚 Learning: 2025-08-28T23:22:41.742Z
Learnt from: CR
PR: NVIDIA/NeMo-Agent-Toolkit#0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-08-28T23:22:41.742Z
Learning: Applies to **/configs/**/*.y?(a)ml : Configuration files consumed by code must live in a neighbouring configs/ directory

Applied to files:

  • examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py
📚 Learning: 2025-08-28T23:22:41.742Z
Learnt from: CR
PR: NVIDIA/NeMo-Agent-Toolkit#0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-08-28T23:22:41.742Z
Learning: Applies to packages/*/pyproject.toml : Package pyproject.toml must depend on nvidia-nat or a package starting with nvidia-nat- using ~=<two-digit> versions (e.g., ~=1.0)

Applied to files:

  • examples/basic/frameworks/haystack_deep_research_agent/pyproject.toml
🧬 Code graph analysis (4)
examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py (1)
src/nat/runtime/loader.py (1)
  • load_workflow (97-117)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.py (3)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py (1)
  • run_startup_indexing (35-94)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py (1)
  • create_rag_tool (13-69)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.py (1)
  • create_search_tool (13-60)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.py (3)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py (1)
  • run_startup_indexing (35-94)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py (1)
  • create_rag_tool (13-69)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.py (1)
  • create_search_tool (13-60)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py (6)
src/nat/builder/builder.py (1)
  • Builder (64-281)
src/nat/data_models/function.py (1)
  • FunctionBaseConfig (26-27)
src/nat/llm/nim_llm.py (1)
  • NIMModelConfig (34-52)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py (1)
  • create_rag_tool (13-69)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.py (1)
  • create_search_tool (13-60)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py (1)
  • run_startup_indexing (35-94)
🪛 markdownlint-cli2 (0.18.1)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/carbonara.md

1-1: Bare URL used

(MD034, no-bare-urls)

🪛 Ruff (0.13.1)
examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py

28-30: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.

(S310)


32-32: Do not catch blind exception: Exception

(BLE001)


59-59: Do not call getattr with a constant attribute value. It is not any safer than normal property access.

Replace getattr with attribute access

(B009)

examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py

35-35: Avoid specifying long messages outside the exception class

(TRY003)

examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.py

6-6: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


7-7: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


8-8: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


9-10: try-except-pass detected, consider logging the exception

(S110)


9-9: Do not catch blind exception: Exception

(BLE001)

examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.py

2-2: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


3-3: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


4-4: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py

95-95: Avoid specifying long messages outside the exception class

(TRY003)


97-97: Avoid specifying long messages outside the exception class

(TRY003)


139-139: Unnecessary key check before dictionary access

Replace with dict.get

(RUF019)


147-147: Do not catch blind exception: Exception

(BLE001)


148-148: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


148-148: Use explicit conversion flag

Replace with conversion flag

(RUF010)


149-149: Use explicit conversion flag

Replace with conversion flag

(RUF010)

examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py

93-93: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (14)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/carbonara.md (1)

1-3: LGTM: correctly tracked via Git LFS.

Pointer looks valid (version/oid/size). Meets examples/data → Git LFS guideline.

Please confirm .gitattributes includes this path so future updates stay on LFS, e.g.:

examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/** filter=lfs diff=lfs merge=lfs -text
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/sample_document.txt (1)

1-3: LFS pointer and data placement look correct — git‑lfs verification required

  • 3-line LFS pointer present in examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/sample_document.txt.
  • Verification environment lacks git‑lfs (git lfs command failed); run locally to confirm the file is tracked by LFS:
git lfs ls-files | rg -n 'sample_document\.txt'
examples/basic/frameworks/haystack_deep_research_agent/pyproject.toml (5)

11-19: Version compatibility sanity check for Haystack stack.

Please confirm these pins co-install cleanly (haystack-ai 2.17–2.18 with opensearch-haystack ~=4.2.0 and nvidia-haystack ~=0.3.0).

I can help adjust ranges if you hit resolver conflicts.


12-12: Good: complies with internal dependency policy.

Dependency on nvidia-nat uses ~= per learnings.


27-28: Ensure module path exists and add trailing newline.

Verify src/nat_haystack_deep_research_agent/register.py exports register(). Also keep a trailing newline (per prior review).

 [project.entry-points.'nat.components']
 nat_haystack_deep_research_agent = "nat_haystack_deep_research_agent.register"
+

24-25: uv source mapping looks correct for monorepo dev.

Editable path to repo root is appropriate.


5-6: setuptools-scm root: confirm intended repo root.

Setting root to "../../../.." makes versioning depend on the monorepo’s VCS root. Confirm builds from this subdir pick up tags as expected.

examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py (2)

1-2: License header looks good.

SPDX and year are correct and present.


62-67: Fix outputs_to_string mapping (likely reversed/mismatched).

SuperComponent maps "llm.replies" → "rag_result" (examples/.../pipelines/rag.py lines 56–60). outputs_to_string is set to {"source": "rag_result"} (lines 62–66) but should map the SuperComponent output key ("rag_result") to a callable that converts that output to a string. Replace with:

rag_tool = ComponentTool(
    name="rag",
    description="Use this tool to search in our internal database of documents.",
    component=rag_component,
    outputs_to_string={
        "rag_result": (lambda replies: replies[0].content if replies else "")
    },
)

Also update the same pattern in examples/.../pipelines/search.py (outputs_to_string={"source": "search_result"}). Confirm the ComponentTool.outputs_to_string API for the Haystack version you target (mapping[str, Callable[[Any], str]] vs. another signature) and adjust the converter accordingly.

examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/configs/config.yml (1)

29-31: Avoid hard-coded localhost URL and absolute data_dir — make them env-configurable with relative defaults.

Use env-backed defaults so the example is portable and CI-friendly.

-  opensearch_url: http://localhost:9200
+  # Prefer env with a sensible default; your loader should expand env vars.
+  opensearch_url: ${OPENSEARCH_URL:-http://localhost:9200}
@@
-  data_dir: /data
+  # Keep relative to the example repo by default.
+  data_dir: ${NAT_DATA_DIR:-data}

Confirm your config loader supports ${VAR:-default} expansion; if not, use the framework’s interpolation format or document an env-only approach.

examples/basic/frameworks/haystack_deep_research_agent/data (1)

1-1: Confirm LFS tracking and align runtime mount/config — do not delete example data

Findings:

  • .gitattributes contains a rule for examples//data/ -> filter=lfs.
  • Example files exist: examples/basic/frameworks/haystack_deep_research_agent/data/carbonara.md and sample_document.txt.
  • Config sets data_dir: /data at examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/configs/config.yml (line 31).
  • run_startup_indexing in the example's src reads data_dir and falls back to the example data directory if /data is missing.

Actions:

  • Verify locally that those example files are actually tracked by Git LFS (git lfs ls-files). git-lfs was not available in the verification sandbox, so LFS tracking could not be confirmed.
  • If not LFS-tracked: add the files to Git LFS and ensure .gitattributes is committed before pushing.
  • Add a short README.md in examples/basic/frameworks/haystack_deep_research_agent/data describing the sample files and the LFS requirement.
  • Ensure runtime either mounts examples/.../data to /data (matches current config) or change data_dir in config.yml to a relative path resolved at runtime.

Relevant locations: examples/basic/frameworks/haystack_deep_research_agent/data/, examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/configs/config.yml, and the example indexing code (run_startup_indexing) in the example src.

examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.py (1)

13-60: LGTM: well‑factored search tool pipeline.

Clear inputs/outputs, typed signature, and reasonable defaults.

examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py (2)

27-39: Minor: tighten and format system prompt (no action required).

Content is clear; keep as-is unless you want me to trim indentation.


90-108: Consider using builder.get_llm(..., wrapper_type=LLMFrameworkEnum.HAYSTACK) to obtain generators.

Aligns with repo guidance; if not available, current approach is fine.

Do we have a Haystack wrapper enum in LLMFrameworkEnum for direct retrieval? If yes, I can draft a patch.

Signed-off-by: Michele Pangrazzi <[email protected]>
@mpangrazzi
Copy link
Contributor

@willkill07 I've updated the branch with your suggestions, lastly updated some aiq imports. I see you've also added an AI-based reviewer, would wait your feedback before addressing suggestions. Thank you!

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (4)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py (4)

27-39: Dedent the system prompt to avoid leading indentation in LLM input.
Use textwrap.dedent and strip to remove incidental whitespace.

Apply this diff:

@@
-import logging
+import logging
+import textwrap
@@
-class HaystackDeepResearchWorkflowConfig(FunctionBaseConfig, name="haystack_deep_research_agent"):  # type: ignore
-    system_prompt: str = """
-    You are a deep research assistant.
-    You create comprehensive research reports to answer the user's questions.
-    You use the 'search' tool to answer any questions by using web search.
-    You use the 'rag' tool to answer any questions by using retrieval augmented generation on your internal document DB.
-    You perform multiple searches until you have the information you need to answer the question.
-    Make sure you research different aspects of the question.
-    Use markdown to format your response.
-    When you use information from the websearch results, cite your sources using markdown links.
-    When you use information from the document database, cite the text used from the source document.
-    It is important that you cite accurately.
-    """
+class HaystackDeepResearchWorkflowConfig(FunctionBaseConfig, name="haystack_deep_research_agent"):  # type: ignore
+    system_prompt: str = textwrap.dedent(
+        """
+        You are a deep research assistant.
+        You create comprehensive research reports to answer the user's questions.
+        You use the 'search' tool to answer any questions by using web search.
+        You use the 'rag' tool to answer any questions by using retrieval augmented generation on your internal document DB.
+        You perform multiple searches until you have the information you need to answer the question.
+        Make sure you research different aspects of the question.
+        Use markdown to format your response.
+        When you use information from the websearch results, cite your sources using markdown links.
+        When you use information from the document database, cite the text used from the source document.
+        It is important that you cite accurately.
+        """
+    ).strip()

76-78: Clarify OpenSearch log message.
Object construction ≠ successful connection; adjust wording.

Apply this diff:

-    logger.info("Connected to OpenSearch successfully")
+    logger.info("Initialized OpenSearchDocumentStore")

94-98: Shorten exception messages per TRY003 (ruff) and keep them factual.

Apply this diff:

-    if not isinstance(rag_llm_cfg, NIMModelConfig):
-        raise TypeError("llms.rag_llm must be of type 'nim'.")
-    if not isinstance(agent_llm_cfg, NIMModelConfig):
-        raise TypeError("llms.agent_llm must be of type 'nim'.")
+    if not isinstance(rag_llm_cfg, NIMModelConfig):
+        raise TypeError("rag_llm must be a NIMModelConfig")
+    if not isinstance(agent_llm_cfg, NIMModelConfig):
+        raise TypeError("agent_llm must be a NIMModelConfig")

153-154: Don’t use logger.exception for normal generator shutdown.
Log at info/debug without stack trace.

Apply this diff:

-    except GeneratorExit:
-        logger.exception("Workflow exited early!", exc_info=True)
+    except GeneratorExit:
+        logger.info("Workflow exited early.")
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 45c81dd and 4a9e81f.

📒 Files selected for processing (1)
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.py: Follow PEP 8/20 style; format with yapf (column_limit=120) and use 4-space indentation; end files with a single newline
Run ruff (ruff check --fix) per pyproject.toml; fix warnings unless explicitly ignored; ruff is linter-only
Use snake_case for functions/variables, PascalCase for classes, and UPPER_CASE for constants
Treat pyright warnings as errors during development
Exception handling: preserve stack traces and avoid duplicate logging
When re-raising exceptions, use bare raise and log with logger.error(), not logger.exception()
When catching and not re-raising, log with logger.exception() to capture stack trace
Validate and sanitize all user input; prefer httpx with SSL verification and follow OWASP Top‑10
Use async/await for I/O-bound work; profile CPU-heavy paths with cProfile/mprof; cache with functools.lru_cache or external cache; leverage NumPy vectorization when beneficial

**/*.py: Programmatic use: create TestLLMConfig(response_seq=[...], delay_ms=...), add with builder.add_llm("", cfg).
When retrieving the test LLM wrapper, use builder.get_llm(name, wrapper_type=LLMFrameworkEnum.) and call the framework’s method (e.g., ainvoke, achat, call).

Files:

  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py
**/*.{py,sh,md,yml,yaml,toml,ini,json,ipynb,txt,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.{py,sh,md,yml,yaml,toml,ini,json,ipynb,txt,rst}: Every file must start with the standard SPDX Apache-2.0 header; keep copyright years up‑to‑date
All source files must include the SPDX Apache‑2.0 header; do not bypass CI header checks

Files:

  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py
**/*.{py,md}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Never hard‑code version numbers in code or docs; versions are derived by setuptools‑scm

Files:

  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py
**/*.{py,yaml,yml}

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

**/*.{py,yaml,yml}: Configure response_seq as a list of strings; values cycle per call, and [] yields an empty string.
Configure delay_ms to inject per-call artificial latency in milliseconds for nat_test_llm.

Files:

  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards. - For Python code, follow
    PEP 20 and
    PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values.
    Example:
    def my_function(param1: int, param2: str) -> bool:
        pass
  • For Python exception handling, ensure proper stack trace preservation:
    • When re-raising exceptions: use bare raise statements to maintain the original stack trace,
      and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
    • When catching and logging exceptions without re-raising: always use logger.exception()
      to capture the full stack trace information.

Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any

words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file, words that might appear to be
spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,

and should contain an Apache License 2.0 header comment at the top of each file.

  • Confirm that copyright years are up-to date whenever a file is changed.

Files:

  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py
examples/**/*

⚙️ CodeRabbit configuration file

examples/**/*: - This directory contains example code and usage scenarios for the toolkit, at a minimum an example should
contain a README.md or file README.ipynb.

  • If an example contains Python code, it should be placed in a subdirectory named src/ and should
    contain a pyproject.toml file. Optionally, it might also contain scripts in a scripts/ directory.
  • If an example contains YAML files, they should be placed in a subdirectory named configs/. - If an example contains sample data files, they should be placed in a subdirectory named data/, and should
    be checked into git-lfs.

Files:

  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py
🧬 Code graph analysis (1)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py (6)
src/nat/builder/builder.py (1)
  • Builder (64-281)
src/nat/data_models/function.py (1)
  • FunctionBaseConfig (26-27)
src/nat/llm/nim_llm.py (1)
  • NIMModelConfig (34-52)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py (1)
  • create_rag_tool (13-69)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.py (1)
  • create_search_tool (13-60)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py (1)
  • run_startup_indexing (35-94)
🪛 Ruff (0.13.1)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py

95-95: Avoid specifying long messages outside the exception class

(TRY003)


97-97: Avoid specifying long messages outside the exception class

(TRY003)


139-139: Unnecessary key check before dictionary access

Replace with dict.get

(RUF019)


147-147: Do not catch blind exception: Exception

(BLE001)


148-148: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


148-148: Use explicit conversion flag

Replace with conversion flag

(RUF010)


149-149: Use explicit conversion flag

Replace with conversion flag

(RUF010)

🔇 Additional comments (4)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py (4)

1-16: SPDX header present and correct.
Year and license look good.


19-23: Correct package prefix (nat.*) and scoped imports.
This fixes the earlier aiq.* issue.


83-89: Confirm API key sourcing for NIM.
You ignore cfg.api_key and rely on NVIDIA_API_KEY env var. If cfg.api_key can be set in Builder, prefer it (with env fallback).

Would you like me to wire cfg.api_key (e.g., Secret.from_token(cfg.api_key) if present) and fall back to Secret.from_env_var("NVIDIA_API_KEY")?


121-149: Harden logging and error handling; don’t leak internal errors to users.
Use dict.get, parameterized logs, and logger.exception; return a generic message. Mirrors earlier feedback.

Apply this diff:

@@
-        try:
-            logger.info(f"Processing research query: {input_message}")
+        try:
+            logger.info("Processing research query: %s", input_message)
@@
-            agent_output = agent.run(messages=messages)
+            agent_output = agent.run(messages=messages)
@@
-            # Extract response
-            if "messages" in agent_output and agent_output["messages"]:
-                response = agent_output["messages"][-1].text
-                logger.info("Research query completed successfully")
-                return response
-            else:
-                logger.warning(f"No response generated for query: {input_message}")
-                return "I apologize, but I was unable to generate a response for your query."
+            # Extract response
+            messages_out = agent_output.get("messages") or []
+            if messages_out:
+                response = messages_out[-1].text
+                logger.info("Research query completed successfully")
+                return response
+            logger.warning("No response generated for query: %s", input_message)
+            return "I apologize, but I was unable to generate a response for your query."
@@
-        except Exception as e:
-            logger.error(f"Workflow execution failed: {str(e)}")
-            return f"I apologize, but an error occurred during research: {str(e)}"
+        except Exception:
+            logger.exception("Workflow execution failed")
+            return "I apologize, but an unexpected error occurred during research."

@willkill07
Copy link
Member

/ok to test 4a9e81f

@willkill07
Copy link
Member

@mpangrazzi we have some CI checks that are failing.

  1. Several files are missing copyright headers:
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/init.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/init.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/init.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.py
  1. There are wording issues in examples/basic/frameworks/haystack_deep_research_agent/README.md:
    156:66  error  Did you really mean 'venv'?     Vale.Spelling 
    209:81  error  Use 'Pydantic' instead of       Vale.Terms    
                   'pydantic'.
    
    Here, I would spell out virtual environment and correct to Pydantic
  2. We need to add <!-- vale off --> to the top of examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/carbonara.md and for good measure a <!-- vale on --> to the end.
  3. We need to add a pair to the ALLOWLISTED_FILE_PATH_PAIRS set in ci/scripts/path_checks.py file:
    (
      r"^examples/basic/frameworks/haystack_deep_research_agent/README.md",
      r"^examples/basic/frameworks/haystack_deep_research_agent/data/bedrock-ug.pdf",
    ),

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (7)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/carbonara.md (2)

1-3: Add Vale guards to the underlying Markdown content (not the LFS pointer).

Per CI feedback, ensure the actual carbonara.md blob starts with <!-- vale off --> and ends with <!-- vale on -->. Do not insert these into the three-line LFS pointer; update the real Markdown content so the pointer’s oid/size change accordingly.

I can provide a quick patch snippet showing the exact placement if you share the markdown body.


1-3: Optional: Fix bare URLs instead of disabling lint.

If you prefer keeping Vale on, convert bare URLs to proper markdown links [text](url) to address MD034.

examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.py (1)

16-18: Remove unnecessary noqa directives.

Static analysis indicates F401 rules are not enabled, making these noqa directives unnecessary.

-from .indexing import run_startup_indexing  # noqa: F401
-from .rag import create_rag_tool  # noqa: F401
-from .search import create_search_tool  # noqa: F401
+from .indexing import run_startup_indexing
+from .rag import create_rag_tool
+from .search import create_search_tool
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py (1)

45-47: Move exception message to custom exception class.

Long messages in exception constructors can make code harder to maintain. Consider creating a custom exception class or using a shorter message.

     if generator is None:
-        raise ValueError("NvidiaChatGenerator instance must be provided via builder-configured LLM.")
+        raise ValueError("NvidiaChatGenerator instance must be provided")
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.py (1)

18-20: Remove unnecessary noqa directives.

Static analysis indicates F401 rules are not enabled, making these noqa directives unnecessary.

-    from .pipelines.indexing import run_startup_indexing  # noqa: F401
-    from .pipelines.rag import create_rag_tool  # noqa: F401
-    from .pipelines.search import create_search_tool  # noqa: F401
+    from .pipelines.indexing import run_startup_indexing
+    from .pipelines.rag import create_rag_tool
+    from .pipelines.search import create_search_tool
examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py (1)

50-51: Use direct attribute access instead of getattr with constant.

Using getattr with a string constant provides no safety benefit over normal attribute access.

-    loader_mod = importlib.import_module("nat.runtime.loader")
-    load_workflow = getattr(loader_mod, "load_workflow")
+    loader_mod = importlib.import_module("nat.runtime.loader")
+    load_workflow = loader_mod.load_workflow
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py (1)

88-91: Move long exception messages to custom exception classes.

Long inline error messages should be moved to custom exception classes for better maintainability.

     if not isinstance(rag_llm_cfg, NIMModelConfig):
-        raise TypeError("llms.rag_llm must be of type 'nim'.")
+        raise TypeError("llms.rag_llm must be of type 'nim'")
     if not isinstance(agent_llm_cfg, NIMModelConfig):
-        raise TypeError("llms.agent_llm must be of type 'nim'.")
+        raise TypeError("llms.agent_llm must be of type 'nim'")
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a9e81f and 5b14e5a.

📒 Files selected for processing (10)
  • ci/scripts/path_checks.py (1 hunks)
  • examples/basic/frameworks/haystack_deep_research_agent/README.md (1 hunks)
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.py (1 hunks)
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/carbonara.md (1 hunks)
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.py (1 hunks)
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py (1 hunks)
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py (1 hunks)
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.py (1 hunks)
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py (1 hunks)
  • examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • examples/basic/frameworks/haystack_deep_research_agent/README.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.py
🧰 Additional context used
📓 Path-based instructions (6)
**/*.{py,yaml,yml}

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

**/*.{py,yaml,yml}: Configure response_seq as a list of strings; values cycle per call, and [] yields an empty string.
Configure delay_ms to inject per-call artificial latency in milliseconds for nat_test_llm.

Files:

  • ci/scripts/path_checks.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py
  • examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.py
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

**/*.py: Programmatic use: create TestLLMConfig(response_seq=[...], delay_ms=...), add with builder.add_llm("", cfg).
When retrieving the test LLM wrapper, use builder.get_llm(name, wrapper_type=LLMFrameworkEnum.) and call the framework’s method (e.g., ainvoke, achat, call).

**/*.py: In code comments/identifiers use NAT abbreviations as specified: nat for API namespace/CLI, nvidia-nat for package name, NAT for env var prefixes; do not use these abbreviations in documentation
Follow PEP 20 and PEP 8; run yapf with column_limit=120; use 4-space indentation; end files with a single trailing newline
Run ruff check --fix as linter (not formatter) using pyproject.toml config; fix warnings unless explicitly ignored
Respect naming: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants
Treat pyright warnings as errors during development
Exception handling: use bare raise to re-raise; log with logger.error() when re-raising to avoid duplicate stack traces; use logger.exception() when catching without re-raising
Provide Google-style docstrings for every public module, class, function, and CLI command; first line concise and ending with a period; surround code entities with backticks
Validate and sanitize all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work; profile CPU-heavy paths with cProfile or mprof before optimizing; cache expensive computations with functools.lru_cache or external cache; leverage NumPy vectorized operations when beneficial

Files:

  • ci/scripts/path_checks.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py
  • examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.py
{scripts/**,ci/scripts/**}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Shell or utility scripts belong in scripts/ or ci/scripts/ and must not be mixed with library code

Files:

  • ci/scripts/path_checks.py
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards. - For Python code, follow
    PEP 20 and
    PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values.
    Example:
    def my_function(param1: int, param2: str) -> bool:
        pass
  • For Python exception handling, ensure proper stack trace preservation:
    • When re-raising exceptions: use bare raise statements to maintain the original stack trace,
      and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
    • When catching and logging exceptions without re-raising: always use logger.exception()
      to capture the full stack trace information.

Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any

words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file, words that might appear to be
spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,

and should contain an Apache License 2.0 header comment at the top of each file.

  • Confirm that copyright years are up-to date whenever a file is changed.

Files:

  • ci/scripts/path_checks.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py
  • examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/carbonara.md
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.py
examples/**/*

⚙️ CodeRabbit configuration file

examples/**/*: - This directory contains example code and usage scenarios for the toolkit, at a minimum an example should
contain a README.md or file README.ipynb.

  • If an example contains Python code, it should be placed in a subdirectory named src/ and should
    contain a pyproject.toml file. Optionally, it might also contain scripts in a scripts/ directory.
  • If an example contains YAML files, they should be placed in a subdirectory named configs/. - If an example contains sample data files, they should be placed in a subdirectory named data/, and should
    be checked into git-lfs.

Files:

  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py
  • examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.py
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/carbonara.md
  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.py
**/data/**

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Large/binary assets must be committed with Git-LFS and placed in a neighbouring data/ folder

Files:

  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/carbonara.md
🧠 Learnings (2)
📚 Learning: 2025-09-23T18:39:15.023Z
Learnt from: CR
PR: NVIDIA/NeMo-Agent-Toolkit#0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-09-23T18:39:15.023Z
Learning: Applies to {tests/**/*.py,examples/*/tests/**/*.py} : Use pytest (with pytest-asyncio for async); name test files test_*.py; test functions start with test_; extract repeated code into fixtures; fixtures must set name in decorator and be named with fixture_ prefix

Applied to files:

  • examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py
📚 Learning: 2025-09-23T18:39:15.023Z
Learnt from: CR
PR: NVIDIA/NeMo-Agent-Toolkit#0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-09-23T18:39:15.023Z
Learning: Every file must start with the standard SPDX Apache-2.0 header; keep copyright years up to date

Applied to files:

  • examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.py
🧬 Code graph analysis (4)
examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py (2)
src/nat/runtime/loader.py (1)
  • load_workflow (97-117)
src/nat/runtime/session.py (1)
  • workflow (84-85)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py (6)
src/nat/builder/builder.py (1)
  • Builder (64-281)
src/nat/data_models/function.py (1)
  • FunctionBaseConfig (26-27)
src/nat/llm/nim_llm.py (1)
  • NIMModelConfig (34-52)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py (1)
  • create_rag_tool (25-81)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.py (1)
  • create_search_tool (25-72)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py (1)
  • run_startup_indexing (47-106)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.py (3)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py (1)
  • run_startup_indexing (47-106)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py (1)
  • create_rag_tool (25-81)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.py (1)
  • create_search_tool (25-72)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.py (3)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py (1)
  • run_startup_indexing (47-106)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py (1)
  • create_rag_tool (25-81)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.py (1)
  • create_search_tool (25-72)
🪛 Ruff (0.13.1)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py

105-105: Do not catch blind exception: Exception

(BLE001)

examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py

27-27: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.

(S310)


29-29: Do not catch blind exception: Exception

(BLE001)


51-51: Do not call getattr with a constant attribute value. It is not any safer than normal property access.

Replace getattr with attribute access

(B009)

examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py

89-89: Avoid specifying long messages outside the exception class

(TRY003)


91-91: Avoid specifying long messages outside the exception class

(TRY003)


133-133: Unnecessary key check before dictionary access

Replace with dict.get

(RUF019)


141-141: Do not catch blind exception: Exception

(BLE001)


142-142: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


142-142: Use explicit conversion flag

Replace with conversion flag

(RUF010)


143-143: Use explicit conversion flag

Replace with conversion flag

(RUF010)

examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py

47-47: Avoid specifying long messages outside the exception class

(TRY003)

examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.py

18-18: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


19-19: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


20-20: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


21-22: try-except-pass detected, consider logging the exception

(S110)


21-21: Do not catch blind exception: Exception

(BLE001)

examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.py

16-16: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


17-17: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


18-18: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

🪛 markdownlint-cli2 (0.18.1)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/carbonara.md

1-1: Bare URL used

(MD034, no-bare-urls)

🔇 Additional comments (15)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/carbonara.md (1)

1-3: LGTM: Data asset correctly tracked via Git LFS in data/

Pointer format and placement comply with repo guidelines for example data.

Please confirm .gitattributes correctly tracks this path pattern with LFS (likely examples/**/data/**), so future updates keep this file under LFS.

ci/scripts/path_checks.py (1)

87-90: LGTM!

The allowlist entry correctly permits references from the Haystack Deep Research Agent README to the bedrock-ug.pdf data file, preventing false path check violations. The implementation follows the repository's established pattern for allowlisting file-path pairs.

examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.py (1)

1-15: LGTM!

The Apache 2.0 license header is complete and properly formatted.

examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py (1)

25-30: Add OpenSearchDocumentStore type annotation.

The document_store parameter is untyped but should be properly annotated per repository guidelines that all parameters must be typed.

+from haystack_integrations.document_stores.opensearch import OpenSearchDocumentStore
@@
 def create_rag_tool(
-    document_store,
+    document_store: OpenSearchDocumentStore,
     *,
     top_k: int = 15,
     generator: NvidiaChatGenerator | None = None,
 ) -> tuple[ComponentTool, Pipeline]:
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py (3)

27-30: LGTM!

Clean helper function that efficiently gathers source files by type using pathlib globbing.


33-44: LGTM!

Pipeline configuration is well-structured with appropriate document processing steps - cleaning, sentence-based splitting with reasonable parameters, and duplicate-skipping policy for the document store.


47-106: LGTM!

Robust indexing implementation with good error handling, fallback directory logic, separate processing for PDF and text sources, and proper logging throughout the workflow.

examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.py (1)

17-22: Graceful import handling is appropriate.

The try/except pattern allows the package to be importable even when optional dependencies are missing during installation, which is good practice for optional re-exports.

examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py (3)

25-30: Fix security and error handling issues in URL health check.

The OpenSearch documentation confirms that cluster health endpoints use HTTP/HTTPS URLs, but the current implementation has security and error handling problems that should be addressed.

+from urllib.parse import urlsplit

 def _opensearch_reachable(url: str) -> bool:
     try:
+        # Only allow http/https schemes for security
+        parts = urlsplit(url)
+        if parts.scheme not in {"http", "https"}:
+            return False
         with urllib.request.urlopen(f"{url.rstrip('/')}/_cluster/health", timeout=1) as resp:
-            return 200 <= getattr(resp, "status", 0) < 300
-    except Exception:
+            return 200 <= resp.status < 300
+    except (urllib.error.URLError, urllib.error.HTTPError, TimeoutError, ValueError):
         return False

46-58: Fix config path and module import in e2e test.

The test references incorrect paths and module names that will cause runtime failures.

+@pytest.mark.asyncio
 async def test_full_workflow_e2e() -> None:
     config_file = (
         Path(__file__).resolve().parents[1]
-        / "src"
-        / "aiq_haystack_deep_research_agent"
         / "configs"
         / "config.yml"
     )

-    loader_mod = importlib.import_module("aiq.runtime.loader")
-    load_workflow = getattr(loader_mod, "load_workflow")
+    loader_mod = importlib.import_module("nat.runtime.loader")
+    load_workflow = getattr(loader_mod, "load_workflow")

61-82: LGTM!

Clean configuration validation test that checks for required keys using string presence validation, which is appropriate for YAML structure verification.

examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py (4)

26-47: LGTM!

Well-structured configuration class with comprehensive system prompt and sensible defaults. The configuration appropriately extends FunctionBaseConfig and includes all necessary parameters for the workflow.


50-57: Add explicit return type annotation.

The async generator function lacks an explicit return type annotation, which violates repository guidelines requiring all functions to be typed.

+from collections.abc import AsyncGenerator, Awaitable, Callable
@@
 @register_function(config_type=HaystackDeepResearchWorkflowConfig)
-async def haystack_deep_research_agent_workflow(config: HaystackDeepResearchWorkflowConfig, builder: Builder):
+async def haystack_deep_research_agent_workflow(
+    config: HaystackDeepResearchWorkflowConfig, builder: Builder
+) -> AsyncGenerator[Callable[[str], Awaitable[str]], None]:

68-68: Avoid logging full config objects to prevent sensitive data leakage.

Logging the entire config object may expose sensitive fields like API keys.

-    logger.info(f"Starting Haystack Deep Research Agent workflow with config: {config}")
+    logger.info(
+        "Starting Haystack Deep Research Agent workflow "
+        "(max_agent_steps=%d, search_top_k=%d, rag_top_k=%d, index_on_startup=%s, data_dir=%s)",
+        config.max_agent_steps,
+        config.search_top_k,
+        config.rag_top_k,
+        config.index_on_startup,
+        config.data_dir,
+    )

125-143: Improve error handling and message access pattern.

The current exception handling and dictionary access patterns need hardening to prevent user-facing error leakage and use safer dict access.

         try:
-            logger.info(f"Processing research query: {input_message}")
+            logger.info("Processing research query: %s", input_message)

             # Create messages
             messages = [ChatMessage.from_user(input_message)]
             agent_output = agent.run(messages=messages)

             # Extract response
-            if "messages" in agent_output and agent_output["messages"]:
-                response = agent_output["messages"][-1].text
+            messages_out = agent_output.get("messages") or []
+            if messages_out:
+                response = messages_out[-1].text
                 logger.info("Research query completed successfully")
                 return response
             else:
-                logger.warning(f"No response generated for query: {input_message}")
+                logger.warning("No response generated for query: %s", input_message)
                 return "I apologize, but I was unable to generate a response for your query."

         except Exception as e:
-            logger.error(f"Workflow execution failed: {str(e)}")
-            return f"I apologize, but an error occurred during research: {str(e)}"
+            logger.exception("Workflow execution failed: %s", e)
+            return "I apologize, but an unexpected error occurred during research."

@willkill07
Copy link
Member

/ok to test 5b14e5a

@mpangrazzi
Copy link
Contributor

@willkill07 I should have fixed most of the things, but still there's an invalid path:

- examples/basic/frameworks/haystack_deep_research_agent/README.md:93:6 -> examples/basic/frameworks/haystack_deep_research_agent/data/bedrock-ug.pdf

Why this happens even if I've added the file paths in ci/scripts/path_checks.py as you suggested? Any idea?

@willkill07
Copy link
Member

Sorry I gave you bad guidance!

The tuple should go under IGNORED_FILE_PATH_PAIRS rather than ALLOWLISTED_FILE_PATH_PAIRS

Signed-off-by: Michele Pangrazzi <[email protected]>
@mpangrazzi
Copy link
Contributor

@willkill07 no worries, makes more sense now 😉 Checked locally with ./ci/scripts/checks.sh seems ok now.

@willkill07
Copy link
Member

/ok to test 5184836

@willkill07
Copy link
Member

/merge

@rapids-bot rapids-bot bot merged commit a3c7296 into NVIDIA:develop Sep 25, 2025
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature request New feature or request non-breaking Non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants