-
Notifications
You must be signed in to change notification settings - Fork 440
Add a blueprint for Haystack Deep Research Agent #461
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@oryx1729 Can you update your commits to contain the DCO? More info here: https://github.com/NVIDIA/NeMo-Agent-Toolkit/pull/461/checks?check_run_id=46372766560 |
|
Hi @mdemoret-nv, @mpangrazzi is on vacation and will be back next week. Is it be possible to start the review and have the DCO done when he's back? |
|
@oryx1729 No problem. |
|
@mdemoret-nv Hi! Commits should be compliant with DCO now. |
70b2319 to
6ccb74a
Compare
mdemoret-nv
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be better to utilize the LLM top level object.
...meworks/haystack_deep_research_agent/src/aiq_haystack_deep_research_agent/configs/config.yml
Outdated
Show resolved
Hide resolved
willkill07
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks good.
Please ensure that all "public" functions have full documentation and that all functions have type annotations. It really helps with clarity.
I can imagine in the future a complete haystack plugin with various features :)
examples/basic/frameworks/haystack_deep_research_agent/README.md
Outdated
Show resolved
Hide resolved
examples/basic/frameworks/haystack_deep_research_agent/README.md
Outdated
Show resolved
Hide resolved
...meworks/haystack_deep_research_agent/src/aiq_haystack_deep_research_agent/configs/config.yml
Show resolved
Hide resolved
|
@willkill07 @mdemoret-nv Thank you! I've just did another iteration following your suggestions. |
willkill07
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late, untimely review on this. I forgot to follow up!
...sic/frameworks/haystack_deep_research_agent/src/aiq_haystack_deep_research_agent/register.py
Outdated
Show resolved
Hide resolved
examples/basic/frameworks/haystack_deep_research_agent/README.md
Outdated
Show resolved
Hide resolved
examples/basic/frameworks/haystack_deep_research_agent/data/carbonara.md
Outdated
Show resolved
Hide resolved
...meworks/haystack_deep_research_agent/src/aiq_haystack_deep_research_agent/configs/config.yml
Outdated
Show resolved
Hide resolved
...meworks/haystack_deep_research_agent/src/aiq_haystack_deep_research_agent/configs/config.yml
Outdated
Show resolved
Hide resolved
...meworks/haystack_deep_research_agent/src/aiq_haystack_deep_research_agent/configs/config.yml
Outdated
Show resolved
Hide resolved
...sic/frameworks/haystack_deep_research_agent/src/aiq_haystack_deep_research_agent/register.py
Outdated
Show resolved
Hide resolved
...les/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py
Outdated
Show resolved
Hide resolved
|
@mpangrazzi and @oryx1729 -- I have left a review. I'm sorry that it took me (too) long to respond in a timely manner. |
Signed-off-by: Michele Pangrazzi <[email protected]>
Signed-off-by: oryx1729 <[email protected]> Signed-off-by: Michele Pangrazzi <[email protected]>
…iaChatGenerator ; Rewrote tests Signed-off-by: Michele Pangrazzi <[email protected]>
Signed-off-by: Michele Pangrazzi <[email protected]>
Signed-off-by: Michele Pangrazzi <[email protected]>
…_haystack_deep_research_agent/register.py Co-authored-by: Will Killian <[email protected]> Signed-off-by: Michele Pangrazzi <[email protected]>
…_haystack_deep_research_agent/configs/config.yml Co-authored-by: Will Killian <[email protected]> Signed-off-by: Michele Pangrazzi <[email protected]>
…_haystack_deep_research_agent/configs/config.yml Co-authored-by: Will Killian <[email protected]> Signed-off-by: Michele Pangrazzi <[email protected]>
…_haystack_deep_research_agent/configs/config.yml Co-authored-by: Will Killian <[email protected]> Signed-off-by: Michele Pangrazzi <[email protected]>
…est_haystack_deep_research_agent.py Co-authored-by: Will Killian <[email protected]> Signed-off-by: Michele Pangrazzi <[email protected]>
Signed-off-by: Michele Pangrazzi <[email protected]>
…ct.toml Co-authored-by: Will Killian <[email protected]> Signed-off-by: Michele Pangrazzi <[email protected]>
…_haystack_deep_research_agent/__init__.py Co-authored-by: Will Killian <[email protected]> Signed-off-by: Michele Pangrazzi <[email protected]>
Co-authored-by: Will Killian <[email protected]> Signed-off-by: Michele Pangrazzi <[email protected]>
Co-authored-by: Will Killian <[email protected]> Signed-off-by: Michele Pangrazzi <[email protected]>
…ct.toml Co-authored-by: Will Killian <[email protected]> Signed-off-by: Michele Pangrazzi <[email protected]>
Co-authored-by: Will Killian <[email protected]> Signed-off-by: Michele Pangrazzi <[email protected]>
Co-authored-by: Will Killian <[email protected]> Signed-off-by: Michele Pangrazzi <[email protected]>
Co-authored-by: Will Killian <[email protected]> Signed-off-by: Michele Pangrazzi <[email protected]>
Signed-off-by: Michele Pangrazzi <[email protected]>
Signed-off-by: Michele Pangrazzi <[email protected]>
WalkthroughAdds a Haystack Deep Research Agent example: package metadata, configs, sample data, three Haystack pipelines (indexing, search, RAG), workflow registration, README, tests, and a CI path-check allowlist entry integrating OpenSearch, SerperDev web search, and NVIDIA NIM LLMs. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User
participant WF as Workflow
participant Agent
participant Search as Search Tool
participant RAG as RAG Tool
participant OS as OpenSearch
participant WebAPI as Web Search API
participant LLM_A as Agent LLM
participant LLM_R as RAG LLM
User->>WF: submit(query)
activate WF
WF->>Agent: Run(query)
activate Agent
alt uses web
Agent->>Search: search(query)
activate Search
Search->>WebAPI: web.search(query)
WebAPI-->>Search: links
Search-->>Agent: search_result
deactivate Search
end
alt uses docs
Agent->>RAG: retrieve+generate(query)
activate RAG
RAG->>OS: retrieve(top_k)
OS-->>RAG: documents
RAG->>LLM_R: generate(context+query)
LLM_R-->>RAG: rag_answer
RAG-->>Agent: rag_result
deactivate RAG
end
Agent->>LLM_A: compose final reply
LLM_A-->>Agent: final_message
Agent-->>WF: response_text
deactivate Agent
WF-->>User: response
deactivate WF
sequenceDiagram
autonumber
participant WF as Workflow Startup
participant IDX as Indexing Pipeline
participant FS as Data Dir
participant OS as OpenSearch
WF->>WF: check index_on_startup
alt index_on_startup == true
WF->>IDX: run_startup_indexing(data_dir)
activate IDX
IDX->>FS: scan for PDFs & text
FS-->>IDX: file lists
IDX->>OS: write documents (skip duplicates)
OS-->>IDX: acks
IDX-->>WF: totals
deactivate IDX
else
WF-->>WF: skip indexing
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests
📜 Recent review detailsConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 11
🧹 Nitpick comments (19)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/carbonara.md (3)
1-3: SPDX header vs LFS pointer: add linter exclusion for data pointers.SPDX header is required for .md, but adding it here would break the LFS pointer (must start with
version ...). Update the header-check to ignore LFS-managed files under.../data/**instead of modifying this file.
1-1: markdownlint false positive (MD034 no-bare-urls) on LFS pointer.This isn’t prose Markdown. Exclude
examples/**/data/**from markdownlint or disable MD034 for that glob.You can use a config like:
# .markdownlint-cli2.yaml globs: - "**/*.md" - "!examples/**/data/**"
1-3: Optional: prevent edit races on data assets.Consider enabling Git LFS file locking for
.../data/**in contributor docs to avoid concurrent edits on binary-backed content.examples/basic/frameworks/haystack_deep_research_agent/pyproject.toml (1)
22-22: Optional: enrich classifiers.Add Python versions and license classifier to improve packaging metadata.
-classifiers = ["Programming Language :: Python"] +classifiers = [ + "Programming Language :: Python", + "Programming Language :: Python :: 3.11", + "Programming Language :: Python :: 3.12", + "License :: OSI Approved :: Apache Software License", +]examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py (3)
33-35: Shorten exception message to satisfy ruff TRY003.Keep it concise; details belong in docs/logs.
- if generator is None: - raise ValueError("NvidiaChatGenerator instance must be provided via builder-configured LLM.") + if generator is None: + raise ValueError("generator is required")
46-46: Be explicit about required_variables for ChatPromptBuilder."*" may not be portable across Haystack versions. List the variables to reduce runtime surprises.
- prompt_builder = ChatPromptBuilder(template=[ChatMessage.from_user(template)], required_variables="*") + prompt_builder = ChatPromptBuilder( + template=[ChatMessage.from_user(template)], + required_variables=["documents", "query"], + )
53-55: Connect explicit ports to avoid implicit default-port coupling.Being explicit improves robustness across upstream changes.
- rag_pipeline.connect("retriever", "prompt_builder.documents") - rag_pipeline.connect("prompt_builder", "llm") + rag_pipeline.connect("retriever", "prompt_builder.documents") + rag_pipeline.connect("prompt_builder.messages", "llm.messages")If NvidiaChatGenerator uses a different input (e.g., "prompt"), update the right-hand side accordingly.
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/configs/config.yml (1)
24-31: Consider gating heavy dependencies for CI (index_on_startup).Indexing on startup + OpenSearch can slow/fail CI. Gate with a flag or auto-detect availability.
- index_on_startup: true + index_on_startup: ${INDEX_ON_STARTUP:-false}Ensure tests set INDEX_ON_STARTUP=true only when OpenSearch is available (or provide an in-memory store fallback).
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.py (1)
5-10: Narrow the import guard and log instead of silent pass; also export via all instead of noqa.Catching Exception and pass hides real errors. Prefer ImportError and a debug log. Define all for re-exports and drop noqa.
+# Re-export pipelines helpers for convenience +import logging +logger = logging.getLogger(__name__) -# Re-export pipelines helpers for convenience -try: - from .pipelines.indexing import run_startup_indexing # noqa: F401 - from .pipelines.rag import create_rag_tool # noqa: F401 - from .pipelines.search import create_search_tool # noqa: F401 -except Exception: # pragma: no cover - optional during install time - pass +__all__ = ["run_startup_indexing", "create_rag_tool", "create_search_tool"] +try: + from .pipelines.indexing import run_startup_indexing + from .pipelines.rag import create_rag_tool + from .pipelines.search import create_search_tool +except ImportError as e: # pragma: no cover - optional during install time + logger.debug("Optional re-exports not available: %s", e)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.py (1)
2-4: Export via all and remove unused noqa directives.Ruff flags RUF100. Use all for explicit re-exports.
-from .indexing import run_startup_indexing # noqa: F401 -from .rag import create_rag_tool # noqa: F401 -from .search import create_search_tool # noqa: F401 +from .indexing import run_startup_indexing +from .rag import create_rag_tool +from .search import create_search_tool + +__all__ = ["run_startup_indexing", "create_rag_tool", "create_search_tool"]examples/basic/frameworks/haystack_deep_research_agent/README.md (2)
20-21: Tiny copy edit.Double space after “framework”.
-This example demonstrates how to build a deep research agent using Haystack framework that combines web search and Retrieval Augmented Generation (RAG) capabilities using the NeMo-Agent-Toolkit. +This example demonstrates how to build a deep research agent using the Haystack framework that combines web search and Retrieval-Augmented Generation (RAG) using the NeMo Agent Toolkit.
191-193: Clarify reference name.Use “NeMo Agent Toolkit” consistently (avoid hyphen in prose).
-See Haystack's NvidiaChatGenerator docs: [NvidiaChatGenerator](https://docs.haystack.deepset.ai/docs/nvidiachatgenerator) +See Haystack’s NvidiaChatGenerator docs: [NvidiaChatGenerator](https://docs.haystack.deepset.ai/docs/nvidiachatgenerator)examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py (1)
69-95: Config keys assertion is fine. Consider YAML parse for structure.Optional: parse YAML and assert structurally rather than substring checks.
Would you like a follow-up patch that loads YAML and validates fields/types?
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py (1)
93-94: Log with stack trace when swallowing errors.Per project guidelines, use logger.exception() when not re-raising.
-except Exception as e: # pragma: no cover - logger.warning("Indexing pipeline failed or was skipped due to an error: %s", str(e)) +except Exception as e: # pragma: no cover + logger.exception("Indexing pipeline failed or was skipped due to an error: %s", e)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py (5)
51-58: Add explicit return type for workflow coroutine.Type hints are required by guidelines.
-async def haystack_deep_research_agent_workflow(config: HaystackDeepResearchWorkflowConfig, builder: Builder): +from typing import AsyncIterator, Awaitable, Callable + +async def haystack_deep_research_agent_workflow( + config: HaystackDeepResearchWorkflowConfig, builder: Builder +) -> AsyncIterator[Callable[[str], Awaitable[str]]]:
70-81: Use parameterized logging (avoid f-strings) and consistent style.Also OK to keep info level.
- logger.info(f"Starting Haystack Deep Research Agent workflow with config: {config}") + logger.info("Starting Haystack Deep Research Agent workflow with config: %s", config) @@ - logger.info("Connected to OpenSearch successfully") + logger.info("Connected to OpenSearch successfully") @@ - if config.index_on_startup: - run_startup_indexing(document_store=document_store, data_dir=config.data_dir, logger=logger) + if config.index_on_startup: + run_startup_indexing(document_store=document_store, data_dir=config.data_dir, logger=logger)
94-98: Prefer concise error messages inside exceptions.TRY003: keep messages short; current messages are fine but could be simplified. Optional.
151-156: GeneratorExit is not exceptional here—log at warning level.Avoid logger.exception to prevent duplicate stack traces.
- except GeneratorExit: - logger.exception("Workflow exited early!", exc_info=True) + except GeneratorExit: + logger.warning("Workflow exited early!")
66-69: Optional: import directly from local package submodules to avoid circular import via package init.Keeps boundaries explicit if init adds logic later.
-from nat_haystack_deep_research_agent import create_rag_tool -from nat_haystack_deep_research_agent import create_search_tool -from nat_haystack_deep_research_agent import run_startup_indexing +from nat_haystack_deep_research_agent.pipelines.rag import create_rag_tool +from nat_haystack_deep_research_agent.pipelines.search import create_search_tool +from nat_haystack_deep_research_agent.pipelines.indexing import run_startup_indexing
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
examples/basic/frameworks/haystack_deep_research_agent/README.md(1 hunks)examples/basic/frameworks/haystack_deep_research_agent/configs(1 hunks)examples/basic/frameworks/haystack_deep_research_agent/data(1 hunks)examples/basic/frameworks/haystack_deep_research_agent/pyproject.toml(1 hunks)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.py(1 hunks)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/configs/config.yml(1 hunks)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/carbonara.md(1 hunks)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/sample_document.txt(1 hunks)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.py(1 hunks)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py(1 hunks)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py(1 hunks)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.py(1 hunks)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py(1 hunks)examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (11)
**/*.{py,sh,md,yml,yaml,toml,ini,json,ipynb,txt,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.{py,sh,md,yml,yaml,toml,ini,json,ipynb,txt,rst}: Every file must start with the standard SPDX Apache-2.0 header; keep copyright years up‑to‑date
All source files must include the SPDX Apache‑2.0 header; do not bypass CI header checks
Files:
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/carbonara.mdexamples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/configs/config.ymlexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/sample_document.txtexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.pyexamples/basic/frameworks/haystack_deep_research_agent/README.mdexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.pyexamples/basic/frameworks/haystack_deep_research_agent/pyproject.tomlexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py
**/*.{py,md}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Never hard‑code version numbers in code or docs; versions are derived by setuptools‑scm
Files:
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/carbonara.mdexamples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.pyexamples/basic/frameworks/haystack_deep_research_agent/README.mdexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py
**/*
⚙️ CodeRabbit configuration file
**/*: # Code Review Instructions
- Ensure the code follows best practices and coding standards. - For Python code, follow
PEP 20 and
PEP 8 for style guidelines.- Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values.
Example:def my_function(param1: int, param2: str) -> bool: pass- For Python exception handling, ensure proper stack trace preservation:
- When re-raising exceptions: use bare
raisestatements to maintain the original stack trace,
and uselogger.error()(notlogger.exception()) to avoid duplicate stack trace output.- When catching and logging exceptions without re-raising: always use
logger.exception()
to capture the full stack trace information.Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any
words listed in the
ci/vale/styles/config/vocabularies/nat/reject.txtfile, words that might appear to be
spelling mistakes but are listed in theci/vale/styles/config/vocabularies/nat/accept.txtfile are OK.Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,
and should contain an Apache License 2.0 header comment at the top of each file.
- Confirm that copyright years are up-to date whenever a file is changed.
Files:
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/carbonara.mdexamples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/configs/config.ymlexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/sample_document.txtexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.pyexamples/basic/frameworks/haystack_deep_research_agent/README.mdexamples/basic/frameworks/haystack_deep_research_agent/configsexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.pyexamples/basic/frameworks/haystack_deep_research_agent/pyproject.tomlexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.pyexamples/basic/frameworks/haystack_deep_research_agent/data
examples/**/*
⚙️ CodeRabbit configuration file
examples/**/*: - This directory contains example code and usage scenarios for the toolkit, at a minimum an example should
contain a README.md or file README.ipynb.
- If an example contains Python code, it should be placed in a subdirectory named
src/and should
contain apyproject.tomlfile. Optionally, it might also contain scripts in ascripts/directory.- If an example contains YAML files, they should be placed in a subdirectory named
configs/. - If an example contains sample data files, they should be placed in a subdirectory nameddata/, and should
be checked into git-lfs.
Files:
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/carbonara.mdexamples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/configs/config.ymlexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/sample_document.txtexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.pyexamples/basic/frameworks/haystack_deep_research_agent/README.mdexamples/basic/frameworks/haystack_deep_research_agent/configsexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.pyexamples/basic/frameworks/haystack_deep_research_agent/pyproject.tomlexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.pyexamples/basic/frameworks/haystack_deep_research_agent/data
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.py: Follow PEP 8/20 style; format with yapf (column_limit=120) and use 4-space indentation; end files with a single newline
Run ruff (ruff check --fix) per pyproject.toml; fix warnings unless explicitly ignored; ruff is linter-only
Use snake_case for functions/variables, PascalCase for classes, and UPPER_CASE for constants
Treat pyright warnings as errors during development
Exception handling: preserve stack traces and avoid duplicate logging
When re-raising exceptions, use bareraiseand log with logger.error(), not logger.exception()
When catching and not re-raising, log with logger.exception() to capture stack trace
Validate and sanitize all user input; prefer httpx with SSL verification and follow OWASP Top‑10
Use async/await for I/O-bound work; profile CPU-heavy paths with cProfile/mprof; cache with functools.lru_cache or external cache; leverage NumPy vectorization when beneficial
**/*.py: Programmatic use: create TestLLMConfig(response_seq=[...], delay_ms=...), add with builder.add_llm("", cfg).
When retrieving the test LLM wrapper, use builder.get_llm(name, wrapper_type=LLMFrameworkEnum.) and call the framework’s method (e.g., ainvoke, achat, call).
Files:
examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py
**/tests/test_*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Test files must be named test_*.py and placed under a tests/ folder
Files:
examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py
**/tests/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/tests/**/*.py: Test functions must use the test_ prefix and snake_case
Extract repeated test code into pytest fixtures; fixtures should set name=... in @pytest.fixture and functions named with fixture_ prefix
Mark expensive tests with @pytest.mark.slow or @pytest.mark.integration
Use pytest with pytest-asyncio for async code; mock external services with pytest_httpserver or unittest.mock
Files:
examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py
**/*.{py,yaml,yml}
📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)
**/*.{py,yaml,yml}: Configure response_seq as a list of strings; values cycle per call, and [] yields an empty string.
Configure delay_ms to inject per-call artificial latency in milliseconds for nat_test_llm.
Files:
examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/configs/config.ymlexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py
**/configs/**/*.y?(a)ml
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Configuration files consumed by code must live in a neighbouring configs/ directory
Files:
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/configs/config.yml
**/*.{yaml,yml}
📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)
In workflow/config YAML, set llms.._type: nat_test_llm to stub responses.
Files:
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/configs/config.yml
**/README.{md,ipynb}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Each documentation README must follow the NeMo Agent toolkit naming rules and must not use deprecated names
Files:
examples/basic/frameworks/haystack_deep_research_agent/README.md
🧠 Learnings (2)
📚 Learning: 2025-08-28T23:22:41.742Z
Learnt from: CR
PR: NVIDIA/NeMo-Agent-Toolkit#0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-08-28T23:22:41.742Z
Learning: Applies to **/configs/**/*.y?(a)ml : Configuration files consumed by code must live in a neighbouring configs/ directory
Applied to files:
examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py
📚 Learning: 2025-08-28T23:22:41.742Z
Learnt from: CR
PR: NVIDIA/NeMo-Agent-Toolkit#0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-08-28T23:22:41.742Z
Learning: Applies to packages/*/pyproject.toml : Package pyproject.toml must depend on nvidia-nat or a package starting with nvidia-nat- using ~=<two-digit> versions (e.g., ~=1.0)
Applied to files:
examples/basic/frameworks/haystack_deep_research_agent/pyproject.toml
🧬 Code graph analysis (4)
examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py (1)
src/nat/runtime/loader.py (1)
load_workflow(97-117)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.py (3)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py (1)
run_startup_indexing(35-94)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py (1)
create_rag_tool(13-69)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.py (1)
create_search_tool(13-60)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.py (3)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py (1)
run_startup_indexing(35-94)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py (1)
create_rag_tool(13-69)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.py (1)
create_search_tool(13-60)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py (6)
src/nat/builder/builder.py (1)
Builder(64-281)src/nat/data_models/function.py (1)
FunctionBaseConfig(26-27)src/nat/llm/nim_llm.py (1)
NIMModelConfig(34-52)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py (1)
create_rag_tool(13-69)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.py (1)
create_search_tool(13-60)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py (1)
run_startup_indexing(35-94)
🪛 markdownlint-cli2 (0.18.1)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/carbonara.md
1-1: Bare URL used
(MD034, no-bare-urls)
🪛 Ruff (0.13.1)
examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py
28-30: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
32-32: Do not catch blind exception: Exception
(BLE001)
59-59: Do not call getattr with a constant attribute value. It is not any safer than normal property access.
Replace getattr with attribute access
(B009)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py
35-35: Avoid specifying long messages outside the exception class
(TRY003)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.py
6-6: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
7-7: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
8-8: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
9-10: try-except-pass detected, consider logging the exception
(S110)
9-9: Do not catch blind exception: Exception
(BLE001)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.py
2-2: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
3-3: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
4-4: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py
95-95: Avoid specifying long messages outside the exception class
(TRY003)
97-97: Avoid specifying long messages outside the exception class
(TRY003)
139-139: Unnecessary key check before dictionary access
Replace with dict.get
(RUF019)
147-147: Do not catch blind exception: Exception
(BLE001)
148-148: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
148-148: Use explicit conversion flag
Replace with conversion flag
(RUF010)
149-149: Use explicit conversion flag
Replace with conversion flag
(RUF010)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py
93-93: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (14)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/carbonara.md (1)
1-3: LGTM: correctly tracked via Git LFS.Pointer looks valid (version/oid/size). Meets examples/data → Git LFS guideline.
Please confirm
.gitattributesincludes this path so future updates stay on LFS, e.g.:examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/** filter=lfs diff=lfs merge=lfs -textexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/sample_document.txt (1)
1-3: LFS pointer and data placement look correct — git‑lfs verification required
- 3-line LFS pointer present in examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/sample_document.txt.
- Verification environment lacks git‑lfs (git lfs command failed); run locally to confirm the file is tracked by LFS:
git lfs ls-files | rg -n 'sample_document\.txt'examples/basic/frameworks/haystack_deep_research_agent/pyproject.toml (5)
11-19: Version compatibility sanity check for Haystack stack.Please confirm these pins co-install cleanly (haystack-ai 2.17–2.18 with opensearch-haystack ~=4.2.0 and nvidia-haystack ~=0.3.0).
I can help adjust ranges if you hit resolver conflicts.
12-12: Good: complies with internal dependency policy.Dependency on nvidia-nat uses ~= per learnings.
27-28: Ensure module path exists and add trailing newline.Verify src/nat_haystack_deep_research_agent/register.py exports register(). Also keep a trailing newline (per prior review).
[project.entry-points.'nat.components'] nat_haystack_deep_research_agent = "nat_haystack_deep_research_agent.register" +
24-25: uv source mapping looks correct for monorepo dev.Editable path to repo root is appropriate.
5-6: setuptools-scm root: confirm intended repo root.Setting root to "../../../.." makes versioning depend on the monorepo’s VCS root. Confirm builds from this subdir pick up tags as expected.
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py (2)
1-2: License header looks good.SPDX and year are correct and present.
62-67: Fix outputs_to_string mapping (likely reversed/mismatched).SuperComponent maps "llm.replies" → "rag_result" (examples/.../pipelines/rag.py lines 56–60). outputs_to_string is set to {"source": "rag_result"} (lines 62–66) but should map the SuperComponent output key ("rag_result") to a callable that converts that output to a string. Replace with:
rag_tool = ComponentTool( name="rag", description="Use this tool to search in our internal database of documents.", component=rag_component, outputs_to_string={ "rag_result": (lambda replies: replies[0].content if replies else "") }, )Also update the same pattern in examples/.../pipelines/search.py (outputs_to_string={"source": "search_result"}). Confirm the ComponentTool.outputs_to_string API for the Haystack version you target (mapping[str, Callable[[Any], str]] vs. another signature) and adjust the converter accordingly.
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/configs/config.yml (1)
29-31: Avoid hard-coded localhost URL and absolute data_dir — make them env-configurable with relative defaults.Use env-backed defaults so the example is portable and CI-friendly.
- opensearch_url: http://localhost:9200 + # Prefer env with a sensible default; your loader should expand env vars. + opensearch_url: ${OPENSEARCH_URL:-http://localhost:9200} @@ - data_dir: /data + # Keep relative to the example repo by default. + data_dir: ${NAT_DATA_DIR:-data}Confirm your config loader supports ${VAR:-default} expansion; if not, use the framework’s interpolation format or document an env-only approach.
examples/basic/frameworks/haystack_deep_research_agent/data (1)
1-1: Confirm LFS tracking and align runtime mount/config — do not delete example dataFindings:
- .gitattributes contains a rule for examples//data/ -> filter=lfs.
- Example files exist: examples/basic/frameworks/haystack_deep_research_agent/data/carbonara.md and sample_document.txt.
- Config sets data_dir: /data at examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/configs/config.yml (line 31).
- run_startup_indexing in the example's src reads data_dir and falls back to the example data directory if /data is missing.
Actions:
- Verify locally that those example files are actually tracked by Git LFS (git lfs ls-files). git-lfs was not available in the verification sandbox, so LFS tracking could not be confirmed.
- If not LFS-tracked: add the files to Git LFS and ensure .gitattributes is committed before pushing.
- Add a short README.md in examples/basic/frameworks/haystack_deep_research_agent/data describing the sample files and the LFS requirement.
- Ensure runtime either mounts examples/.../data to /data (matches current config) or change data_dir in config.yml to a relative path resolved at runtime.
Relevant locations: examples/basic/frameworks/haystack_deep_research_agent/data/, examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/configs/config.yml, and the example indexing code (run_startup_indexing) in the example src.
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.py (1)
13-60: LGTM: well‑factored search tool pipeline.Clear inputs/outputs, typed signature, and reasonable defaults.
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py (2)
27-39: Minor: tighten and format system prompt (no action required).Content is clear; keep as-is unless you want me to trim indentation.
90-108: Consider using builder.get_llm(..., wrapper_type=LLMFrameworkEnum.HAYSTACK) to obtain generators.Aligns with repo guidance; if not available, current approach is fine.
Do we have a Haystack wrapper enum in LLMFrameworkEnum for direct retrieval? If yes, I can draft a patch.
...meworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/configs/config.yml
Show resolved
Hide resolved
...s/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/sample_document.txt
Show resolved
Hide resolved
...rameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py
Show resolved
Hide resolved
...sic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py
Outdated
Show resolved
Hide resolved
...sic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py
Show resolved
Hide resolved
...les/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py
Show resolved
Hide resolved
...les/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py
Outdated
Show resolved
Hide resolved
Signed-off-by: Michele Pangrazzi <[email protected]>
|
@willkill07 I've updated the branch with your suggestions, lastly updated some |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (4)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py (4)
27-39: Dedent the system prompt to avoid leading indentation in LLM input.
Use textwrap.dedent and strip to remove incidental whitespace.Apply this diff:
@@ -import logging +import logging +import textwrap @@ -class HaystackDeepResearchWorkflowConfig(FunctionBaseConfig, name="haystack_deep_research_agent"): # type: ignore - system_prompt: str = """ - You are a deep research assistant. - You create comprehensive research reports to answer the user's questions. - You use the 'search' tool to answer any questions by using web search. - You use the 'rag' tool to answer any questions by using retrieval augmented generation on your internal document DB. - You perform multiple searches until you have the information you need to answer the question. - Make sure you research different aspects of the question. - Use markdown to format your response. - When you use information from the websearch results, cite your sources using markdown links. - When you use information from the document database, cite the text used from the source document. - It is important that you cite accurately. - """ +class HaystackDeepResearchWorkflowConfig(FunctionBaseConfig, name="haystack_deep_research_agent"): # type: ignore + system_prompt: str = textwrap.dedent( + """ + You are a deep research assistant. + You create comprehensive research reports to answer the user's questions. + You use the 'search' tool to answer any questions by using web search. + You use the 'rag' tool to answer any questions by using retrieval augmented generation on your internal document DB. + You perform multiple searches until you have the information you need to answer the question. + Make sure you research different aspects of the question. + Use markdown to format your response. + When you use information from the websearch results, cite your sources using markdown links. + When you use information from the document database, cite the text used from the source document. + It is important that you cite accurately. + """ + ).strip()
76-78: Clarify OpenSearch log message.
Object construction ≠ successful connection; adjust wording.Apply this diff:
- logger.info("Connected to OpenSearch successfully") + logger.info("Initialized OpenSearchDocumentStore")
94-98: Shorten exception messages per TRY003 (ruff) and keep them factual.Apply this diff:
- if not isinstance(rag_llm_cfg, NIMModelConfig): - raise TypeError("llms.rag_llm must be of type 'nim'.") - if not isinstance(agent_llm_cfg, NIMModelConfig): - raise TypeError("llms.agent_llm must be of type 'nim'.") + if not isinstance(rag_llm_cfg, NIMModelConfig): + raise TypeError("rag_llm must be a NIMModelConfig") + if not isinstance(agent_llm_cfg, NIMModelConfig): + raise TypeError("agent_llm must be a NIMModelConfig")
153-154: Don’t use logger.exception for normal generator shutdown.
Log at info/debug without stack trace.Apply this diff:
- except GeneratorExit: - logger.exception("Workflow exited early!", exc_info=True) + except GeneratorExit: + logger.info("Workflow exited early.")
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.py: Follow PEP 8/20 style; format with yapf (column_limit=120) and use 4-space indentation; end files with a single newline
Run ruff (ruff check --fix) per pyproject.toml; fix warnings unless explicitly ignored; ruff is linter-only
Use snake_case for functions/variables, PascalCase for classes, and UPPER_CASE for constants
Treat pyright warnings as errors during development
Exception handling: preserve stack traces and avoid duplicate logging
When re-raising exceptions, use bareraiseand log with logger.error(), not logger.exception()
When catching and not re-raising, log with logger.exception() to capture stack trace
Validate and sanitize all user input; prefer httpx with SSL verification and follow OWASP Top‑10
Use async/await for I/O-bound work; profile CPU-heavy paths with cProfile/mprof; cache with functools.lru_cache or external cache; leverage NumPy vectorization when beneficial
**/*.py: Programmatic use: create TestLLMConfig(response_seq=[...], delay_ms=...), add with builder.add_llm("", cfg).
When retrieving the test LLM wrapper, use builder.get_llm(name, wrapper_type=LLMFrameworkEnum.) and call the framework’s method (e.g., ainvoke, achat, call).
Files:
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py
**/*.{py,sh,md,yml,yaml,toml,ini,json,ipynb,txt,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.{py,sh,md,yml,yaml,toml,ini,json,ipynb,txt,rst}: Every file must start with the standard SPDX Apache-2.0 header; keep copyright years up‑to‑date
All source files must include the SPDX Apache‑2.0 header; do not bypass CI header checks
Files:
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py
**/*.{py,md}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Never hard‑code version numbers in code or docs; versions are derived by setuptools‑scm
Files:
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py
**/*.{py,yaml,yml}
📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)
**/*.{py,yaml,yml}: Configure response_seq as a list of strings; values cycle per call, and [] yields an empty string.
Configure delay_ms to inject per-call artificial latency in milliseconds for nat_test_llm.
Files:
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py
**/*
⚙️ CodeRabbit configuration file
**/*: # Code Review Instructions
- Ensure the code follows best practices and coding standards. - For Python code, follow
PEP 20 and
PEP 8 for style guidelines.- Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values.
Example:def my_function(param1: int, param2: str) -> bool: pass- For Python exception handling, ensure proper stack trace preservation:
- When re-raising exceptions: use bare
raisestatements to maintain the original stack trace,
and uselogger.error()(notlogger.exception()) to avoid duplicate stack trace output.- When catching and logging exceptions without re-raising: always use
logger.exception()
to capture the full stack trace information.Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any
words listed in the
ci/vale/styles/config/vocabularies/nat/reject.txtfile, words that might appear to be
spelling mistakes but are listed in theci/vale/styles/config/vocabularies/nat/accept.txtfile are OK.Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,
and should contain an Apache License 2.0 header comment at the top of each file.
- Confirm that copyright years are up-to date whenever a file is changed.
Files:
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py
examples/**/*
⚙️ CodeRabbit configuration file
examples/**/*: - This directory contains example code and usage scenarios for the toolkit, at a minimum an example should
contain a README.md or file README.ipynb.
- If an example contains Python code, it should be placed in a subdirectory named
src/and should
contain apyproject.tomlfile. Optionally, it might also contain scripts in ascripts/directory.- If an example contains YAML files, they should be placed in a subdirectory named
configs/. - If an example contains sample data files, they should be placed in a subdirectory nameddata/, and should
be checked into git-lfs.
Files:
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py
🧬 Code graph analysis (1)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py (6)
src/nat/builder/builder.py (1)
Builder(64-281)src/nat/data_models/function.py (1)
FunctionBaseConfig(26-27)src/nat/llm/nim_llm.py (1)
NIMModelConfig(34-52)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py (1)
create_rag_tool(13-69)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.py (1)
create_search_tool(13-60)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py (1)
run_startup_indexing(35-94)
🪛 Ruff (0.13.1)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py
95-95: Avoid specifying long messages outside the exception class
(TRY003)
97-97: Avoid specifying long messages outside the exception class
(TRY003)
139-139: Unnecessary key check before dictionary access
Replace with dict.get
(RUF019)
147-147: Do not catch blind exception: Exception
(BLE001)
148-148: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
148-148: Use explicit conversion flag
Replace with conversion flag
(RUF010)
149-149: Use explicit conversion flag
Replace with conversion flag
(RUF010)
🔇 Additional comments (4)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py (4)
1-16: SPDX header present and correct.
Year and license look good.
19-23: Correct package prefix (nat.*) and scoped imports.
This fixes the earlier aiq.* issue.
83-89: Confirm API key sourcing for NIM.
You ignore cfg.api_key and rely on NVIDIA_API_KEY env var. If cfg.api_key can be set in Builder, prefer it (with env fallback).Would you like me to wire cfg.api_key (e.g., Secret.from_token(cfg.api_key) if present) and fall back to Secret.from_env_var("NVIDIA_API_KEY")?
121-149: Harden logging and error handling; don’t leak internal errors to users.
Use dict.get, parameterized logs, and logger.exception; return a generic message. Mirrors earlier feedback.Apply this diff:
@@ - try: - logger.info(f"Processing research query: {input_message}") + try: + logger.info("Processing research query: %s", input_message) @@ - agent_output = agent.run(messages=messages) + agent_output = agent.run(messages=messages) @@ - # Extract response - if "messages" in agent_output and agent_output["messages"]: - response = agent_output["messages"][-1].text - logger.info("Research query completed successfully") - return response - else: - logger.warning(f"No response generated for query: {input_message}") - return "I apologize, but I was unable to generate a response for your query." + # Extract response + messages_out = agent_output.get("messages") or [] + if messages_out: + response = messages_out[-1].text + logger.info("Research query completed successfully") + return response + logger.warning("No response generated for query: %s", input_message) + return "I apologize, but I was unable to generate a response for your query." @@ - except Exception as e: - logger.error(f"Workflow execution failed: {str(e)}") - return f"I apologize, but an error occurred during research: {str(e)}" + except Exception: + logger.exception("Workflow execution failed") + return "I apologize, but an unexpected error occurred during research."
...sic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py
Show resolved
Hide resolved
...sic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py
Show resolved
Hide resolved
|
/ok to test 4a9e81f |
|
@mpangrazzi we have some CI checks that are failing.
|
Signed-off-by: Michele Pangrazzi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (7)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/carbonara.md (2)
1-3: Add Vale guards to the underlying Markdown content (not the LFS pointer).Per CI feedback, ensure the actual
carbonara.mdblob starts with<!-- vale off -->and ends with<!-- vale on -->. Do not insert these into the three-line LFS pointer; update the real Markdown content so the pointer’s oid/size change accordingly.I can provide a quick patch snippet showing the exact placement if you share the markdown body.
1-3: Optional: Fix bare URLs instead of disabling lint.If you prefer keeping Vale on, convert bare URLs to proper markdown links
[text](url)to address MD034.examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.py (1)
16-18: Remove unnecessary noqa directives.Static analysis indicates F401 rules are not enabled, making these
noqadirectives unnecessary.-from .indexing import run_startup_indexing # noqa: F401 -from .rag import create_rag_tool # noqa: F401 -from .search import create_search_tool # noqa: F401 +from .indexing import run_startup_indexing +from .rag import create_rag_tool +from .search import create_search_toolexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py (1)
45-47: Move exception message to custom exception class.Long messages in exception constructors can make code harder to maintain. Consider creating a custom exception class or using a shorter message.
if generator is None: - raise ValueError("NvidiaChatGenerator instance must be provided via builder-configured LLM.") + raise ValueError("NvidiaChatGenerator instance must be provided")examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.py (1)
18-20: Remove unnecessary noqa directives.Static analysis indicates F401 rules are not enabled, making these
noqadirectives unnecessary.- from .pipelines.indexing import run_startup_indexing # noqa: F401 - from .pipelines.rag import create_rag_tool # noqa: F401 - from .pipelines.search import create_search_tool # noqa: F401 + from .pipelines.indexing import run_startup_indexing + from .pipelines.rag import create_rag_tool + from .pipelines.search import create_search_toolexamples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py (1)
50-51: Use direct attribute access instead of getattr with constant.Using
getattrwith a string constant provides no safety benefit over normal attribute access.- loader_mod = importlib.import_module("nat.runtime.loader") - load_workflow = getattr(loader_mod, "load_workflow") + loader_mod = importlib.import_module("nat.runtime.loader") + load_workflow = loader_mod.load_workflowexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py (1)
88-91: Move long exception messages to custom exception classes.Long inline error messages should be moved to custom exception classes for better maintainability.
if not isinstance(rag_llm_cfg, NIMModelConfig): - raise TypeError("llms.rag_llm must be of type 'nim'.") + raise TypeError("llms.rag_llm must be of type 'nim'") if not isinstance(agent_llm_cfg, NIMModelConfig): - raise TypeError("llms.agent_llm must be of type 'nim'.") + raise TypeError("llms.agent_llm must be of type 'nim'")
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
ci/scripts/path_checks.py(1 hunks)examples/basic/frameworks/haystack_deep_research_agent/README.md(1 hunks)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.py(1 hunks)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/carbonara.md(1 hunks)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.py(1 hunks)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py(1 hunks)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py(1 hunks)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.py(1 hunks)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py(1 hunks)examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- examples/basic/frameworks/haystack_deep_research_agent/README.md
🚧 Files skipped from review as they are similar to previous changes (1)
- examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.py
🧰 Additional context used
📓 Path-based instructions (6)
**/*.{py,yaml,yml}
📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)
**/*.{py,yaml,yml}: Configure response_seq as a list of strings; values cycle per call, and [] yields an empty string.
Configure delay_ms to inject per-call artificial latency in milliseconds for nat_test_llm.
Files:
ci/scripts/path_checks.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.pyexamples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.py
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)
**/*.py: Programmatic use: create TestLLMConfig(response_seq=[...], delay_ms=...), add with builder.add_llm("", cfg).
When retrieving the test LLM wrapper, use builder.get_llm(name, wrapper_type=LLMFrameworkEnum.) and call the framework’s method (e.g., ainvoke, achat, call).
**/*.py: In code comments/identifiers use NAT abbreviations as specified: nat for API namespace/CLI, nvidia-nat for package name, NAT for env var prefixes; do not use these abbreviations in documentation
Follow PEP 20 and PEP 8; run yapf with column_limit=120; use 4-space indentation; end files with a single trailing newline
Run ruff check --fix as linter (not formatter) using pyproject.toml config; fix warnings unless explicitly ignored
Respect naming: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants
Treat pyright warnings as errors during development
Exception handling: use bare raise to re-raise; log with logger.error() when re-raising to avoid duplicate stack traces; use logger.exception() when catching without re-raising
Provide Google-style docstrings for every public module, class, function, and CLI command; first line concise and ending with a period; surround code entities with backticks
Validate and sanitize all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work; profile CPU-heavy paths with cProfile or mprof before optimizing; cache expensive computations with functools.lru_cache or external cache; leverage NumPy vectorized operations when beneficial
Files:
ci/scripts/path_checks.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.pyexamples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.py
{scripts/**,ci/scripts/**}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Shell or utility scripts belong in scripts/ or ci/scripts/ and must not be mixed with library code
Files:
ci/scripts/path_checks.py
**/*
⚙️ CodeRabbit configuration file
**/*: # Code Review Instructions
- Ensure the code follows best practices and coding standards. - For Python code, follow
PEP 20 and
PEP 8 for style guidelines.- Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values.
Example:def my_function(param1: int, param2: str) -> bool: pass- For Python exception handling, ensure proper stack trace preservation:
- When re-raising exceptions: use bare
raisestatements to maintain the original stack trace,
and uselogger.error()(notlogger.exception()) to avoid duplicate stack trace output.- When catching and logging exceptions without re-raising: always use
logger.exception()
to capture the full stack trace information.Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any
words listed in the
ci/vale/styles/config/vocabularies/nat/reject.txtfile, words that might appear to be
spelling mistakes but are listed in theci/vale/styles/config/vocabularies/nat/accept.txtfile are OK.Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,
and should contain an Apache License 2.0 header comment at the top of each file.
- Confirm that copyright years are up-to date whenever a file is changed.
Files:
ci/scripts/path_checks.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.pyexamples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/carbonara.mdexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.py
examples/**/*
⚙️ CodeRabbit configuration file
examples/**/*: - This directory contains example code and usage scenarios for the toolkit, at a minimum an example should
contain a README.md or file README.ipynb.
- If an example contains Python code, it should be placed in a subdirectory named
src/and should
contain apyproject.tomlfile. Optionally, it might also contain scripts in ascripts/directory.- If an example contains YAML files, they should be placed in a subdirectory named
configs/. - If an example contains sample data files, they should be placed in a subdirectory nameddata/, and should
be checked into git-lfs.
Files:
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.pyexamples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.pyexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/carbonara.mdexamples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.py
**/data/**
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Large/binary assets must be committed with Git-LFS and placed in a neighbouring data/ folder
Files:
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/carbonara.md
🧠 Learnings (2)
📚 Learning: 2025-09-23T18:39:15.023Z
Learnt from: CR
PR: NVIDIA/NeMo-Agent-Toolkit#0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-09-23T18:39:15.023Z
Learning: Applies to {tests/**/*.py,examples/*/tests/**/*.py} : Use pytest (with pytest-asyncio for async); name test files test_*.py; test functions start with test_; extract repeated code into fixtures; fixtures must set name in decorator and be named with fixture_ prefix
Applied to files:
examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py
📚 Learning: 2025-09-23T18:39:15.023Z
Learnt from: CR
PR: NVIDIA/NeMo-Agent-Toolkit#0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-09-23T18:39:15.023Z
Learning: Every file must start with the standard SPDX Apache-2.0 header; keep copyright years up to date
Applied to files:
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.py
🧬 Code graph analysis (4)
examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py (2)
src/nat/runtime/loader.py (1)
load_workflow(97-117)src/nat/runtime/session.py (1)
workflow(84-85)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py (6)
src/nat/builder/builder.py (1)
Builder(64-281)src/nat/data_models/function.py (1)
FunctionBaseConfig(26-27)src/nat/llm/nim_llm.py (1)
NIMModelConfig(34-52)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py (1)
create_rag_tool(25-81)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.py (1)
create_search_tool(25-72)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py (1)
run_startup_indexing(47-106)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.py (3)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py (1)
run_startup_indexing(47-106)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py (1)
create_rag_tool(25-81)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.py (1)
create_search_tool(25-72)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.py (3)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py (1)
run_startup_indexing(47-106)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py (1)
create_rag_tool(25-81)examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/search.py (1)
create_search_tool(25-72)
🪛 Ruff (0.13.1)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py
105-105: Do not catch blind exception: Exception
(BLE001)
examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py
27-27: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
29-29: Do not catch blind exception: Exception
(BLE001)
51-51: Do not call getattr with a constant attribute value. It is not any safer than normal property access.
Replace getattr with attribute access
(B009)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py
89-89: Avoid specifying long messages outside the exception class
(TRY003)
91-91: Avoid specifying long messages outside the exception class
(TRY003)
133-133: Unnecessary key check before dictionary access
Replace with dict.get
(RUF019)
141-141: Do not catch blind exception: Exception
(BLE001)
142-142: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
142-142: Use explicit conversion flag
Replace with conversion flag
(RUF010)
143-143: Use explicit conversion flag
Replace with conversion flag
(RUF010)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py
47-47: Avoid specifying long messages outside the exception class
(TRY003)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.py
18-18: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
19-19: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
20-20: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
21-22: try-except-pass detected, consider logging the exception
(S110)
21-21: Do not catch blind exception: Exception
(BLE001)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.py
16-16: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
17-17: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
18-18: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
🪛 markdownlint-cli2 (0.18.1)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/carbonara.md
1-1: Bare URL used
(MD034, no-bare-urls)
🔇 Additional comments (15)
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/data/carbonara.md (1)
1-3: LGTM: Data asset correctly tracked via Git LFS in data/Pointer format and placement comply with repo guidelines for example data.
Please confirm
.gitattributescorrectly tracks this path pattern with LFS (likelyexamples/**/data/**), so future updates keep this file under LFS.ci/scripts/path_checks.py (1)
87-90: LGTM!The allowlist entry correctly permits references from the Haystack Deep Research Agent README to the bedrock-ug.pdf data file, preventing false path check violations. The implementation follows the repository's established pattern for allowlisting file-path pairs.
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/__init__.py (1)
1-15: LGTM!The Apache 2.0 license header is complete and properly formatted.
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/rag.py (1)
25-30: Add OpenSearchDocumentStore type annotation.The document_store parameter is untyped but should be properly annotated per repository guidelines that all parameters must be typed.
+from haystack_integrations.document_stores.opensearch import OpenSearchDocumentStore @@ def create_rag_tool( - document_store, + document_store: OpenSearchDocumentStore, *, top_k: int = 15, generator: NvidiaChatGenerator | None = None, ) -> tuple[ComponentTool, Pipeline]:examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/pipelines/indexing.py (3)
27-30: LGTM!Clean helper function that efficiently gathers source files by type using pathlib globbing.
33-44: LGTM!Pipeline configuration is well-structured with appropriate document processing steps - cleaning, sentence-based splitting with reasonable parameters, and duplicate-skipping policy for the document store.
47-106: LGTM!Robust indexing implementation with good error handling, fallback directory logic, separate processing for PDF and text sources, and proper logging throughout the workflow.
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/__init__.py (1)
17-22: Graceful import handling is appropriate.The try/except pattern allows the package to be importable even when optional dependencies are missing during installation, which is good practice for optional re-exports.
examples/basic/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py (3)
25-30: Fix security and error handling issues in URL health check.The OpenSearch documentation confirms that cluster health endpoints use HTTP/HTTPS URLs, but the current implementation has security and error handling problems that should be addressed.
+from urllib.parse import urlsplit def _opensearch_reachable(url: str) -> bool: try: + # Only allow http/https schemes for security + parts = urlsplit(url) + if parts.scheme not in {"http", "https"}: + return False with urllib.request.urlopen(f"{url.rstrip('/')}/_cluster/health", timeout=1) as resp: - return 200 <= getattr(resp, "status", 0) < 300 - except Exception: + return 200 <= resp.status < 300 + except (urllib.error.URLError, urllib.error.HTTPError, TimeoutError, ValueError): return False
46-58: Fix config path and module import in e2e test.The test references incorrect paths and module names that will cause runtime failures.
+@pytest.mark.asyncio async def test_full_workflow_e2e() -> None: config_file = ( Path(__file__).resolve().parents[1] - / "src" - / "aiq_haystack_deep_research_agent" / "configs" / "config.yml" ) - loader_mod = importlib.import_module("aiq.runtime.loader") - load_workflow = getattr(loader_mod, "load_workflow") + loader_mod = importlib.import_module("nat.runtime.loader") + load_workflow = getattr(loader_mod, "load_workflow")
61-82: LGTM!Clean configuration validation test that checks for required keys using string presence validation, which is appropriate for YAML structure verification.
examples/basic/frameworks/haystack_deep_research_agent/src/nat_haystack_deep_research_agent/register.py (4)
26-47: LGTM!Well-structured configuration class with comprehensive system prompt and sensible defaults. The configuration appropriately extends
FunctionBaseConfigand includes all necessary parameters for the workflow.
50-57: Add explicit return type annotation.The async generator function lacks an explicit return type annotation, which violates repository guidelines requiring all functions to be typed.
+from collections.abc import AsyncGenerator, Awaitable, Callable @@ @register_function(config_type=HaystackDeepResearchWorkflowConfig) -async def haystack_deep_research_agent_workflow(config: HaystackDeepResearchWorkflowConfig, builder: Builder): +async def haystack_deep_research_agent_workflow( + config: HaystackDeepResearchWorkflowConfig, builder: Builder +) -> AsyncGenerator[Callable[[str], Awaitable[str]], None]:
68-68: Avoid logging full config objects to prevent sensitive data leakage.Logging the entire config object may expose sensitive fields like API keys.
- logger.info(f"Starting Haystack Deep Research Agent workflow with config: {config}") + logger.info( + "Starting Haystack Deep Research Agent workflow " + "(max_agent_steps=%d, search_top_k=%d, rag_top_k=%d, index_on_startup=%s, data_dir=%s)", + config.max_agent_steps, + config.search_top_k, + config.rag_top_k, + config.index_on_startup, + config.data_dir, + )
125-143: Improve error handling and message access pattern.The current exception handling and dictionary access patterns need hardening to prevent user-facing error leakage and use safer dict access.
try: - logger.info(f"Processing research query: {input_message}") + logger.info("Processing research query: %s", input_message) # Create messages messages = [ChatMessage.from_user(input_message)] agent_output = agent.run(messages=messages) # Extract response - if "messages" in agent_output and agent_output["messages"]: - response = agent_output["messages"][-1].text + messages_out = agent_output.get("messages") or [] + if messages_out: + response = messages_out[-1].text logger.info("Research query completed successfully") return response else: - logger.warning(f"No response generated for query: {input_message}") + logger.warning("No response generated for query: %s", input_message) return "I apologize, but I was unable to generate a response for your query." except Exception as e: - logger.error(f"Workflow execution failed: {str(e)}") - return f"I apologize, but an error occurred during research: {str(e)}" + logger.exception("Workflow execution failed: %s", e) + return "I apologize, but an unexpected error occurred during research."
|
/ok to test 5b14e5a |
|
@willkill07 I should have fixed most of the things, but still there's an invalid path: Why this happens even if I've added the file paths in |
|
Sorry I gave you bad guidance! The tuple should go under |
Signed-off-by: Michele Pangrazzi <[email protected]>
|
@willkill07 no worries, makes more sense now 😉 Checked locally with |
|
/ok to test 5184836 |
|
/merge |
Description
Adds a blueprint demonstrating how to build a deep research agent using Haystack Framework that combines web search and Retrieval-Augmented Generation (RAG) using the NeMo-Agent-Toolkit.
By Submitting this PR I confirm:
Summary by CodeRabbit
New Features
Documentation
Tests
Chores