@willkill07 (Member) commented Oct 5, 2025

Description

Updates the Observability, Evaluation, and Profiling example notebook.
Closes

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Summary by CodeRabbit

  • New Features

    • Added an end-to-end notebook for observability, evaluation, and profiling with Phoenix-based tracing.
    • Introduced a unified workflow combining data analysis, visualization, and RAG agents.
    • Enabled evaluation and profiling runs with metrics, profiler outputs, and charts.
  • Documentation

    • Added step-by-step setup, installation, API key handling, and run commands.
    • Clarified local vs. hosted execution and observability guidance.
  • Chores

    • Included sample retail sales data, product catalog, and evaluation dataset.
    • Added ready-to-use workflow, evaluation, and profiling configurations.
    • Expanded accepted vocabulary to include "Gantt".

@willkill07 requested a review from a team as a code owner October 5, 2025 22:15

@coderabbitai (bot) commented Oct 5, 2025

Walkthrough

Expands the observability/evaluation/profiling notebook to add Phoenix-based telemetry, NAT tool definitions, workflow configs, evaluation/profiling configurations, datasets, and instructions to run workflows, evals, and profiling end-to-end, including artifact generation and output directory organization.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Notebook expansion | examples/notebooks/4_observability_evaluation_and_profiling.ipynb | Major content expansion: prerequisites, API keys, installation, Phoenix observability setup, workflow orchestration, evaluation and profiling steps, artifact generation. |
| NAT tool modules | retail_sales_agent/tools/* | New tool definitions for data analysis, RAG, and visualization (configs, register_function decorators, async handlers) to integrate with NAT. |
| Workflow configurations | retail_sales_agent/config.yml | Adds llm/embedders, function registrations, and components for data analysis, visualization, and RAG agents; unified workflow wiring. |
| Evaluation dataset | retail_sales_agent/data/eval_data.json | Introduces evaluation dataset scaffold with multiple test cases. |
| Evaluation configuration | retail_sales_agent/config_eval.yml | Adds evaluators: rag_accuracy, rag_groundedness, rag_relevance, trajectory_accuracy; eval run setup. |
| Profiling configuration | retail_sales_agent/config_profile.yml | Adds profiler options (token/runtime forecasts, LLM metrics, concurrency/bottleneck analyses) and output paths. |
| Phoenix observability config | phoenix_config.yml | Augmentation steps and copy/append workflow to enable telemetry tracing for Phoenix. |
| RAG content | retail_sales_agent/rag/product_catalog.md | Adds product catalog content used for RAG. |
| Sample data | retail_sales_agent/data/retail_sales_data.csv | Adds retail sales CSV with Date, StoreID, Product, UnitsSold, Revenue, Promotion. |
| Vale vocabulary | ci/vale/styles/config/vocabularies/nat/accept.txt | Adds accepted vocabulary entry Gantt. |
| Artifacts/output paths | .../profile_output/*, gantt_chart.png | Describes profiler outputs and chart generation; notebook cells generate these artifacts. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant U as User
  participant NB as Notebook
  participant NAT as NAT CLI/Runtime
  participant WF as Workflow (Agents + Tools)
  participant PH as Phoenix
  participant DS as Data

  U->>NB: Run setup cells
  NB->>NAT: nat run -c config.yml
  NAT->>WF: Initialize agents/tools
  WF->>DS: Load CSV / product_catalog
  WF->>PH: Emit telemetry (traces/logs)
  WF->>WF: Analyze data / RAG / visualize
  WF-->>NAT: Results + artifacts
  NAT-->>NB: Output_dir with results
  NB-->>U: Display results/paths

  rect rgb(235,245,255)
  note over PH: Phoenix observability (new)
  end
sequenceDiagram
  autonumber
  participant U as User
  participant NB as Notebook
  participant NAT as NAT CLI
  participant EV as Evaluators
  participant PH as Phoenix

  U->>NB: Trigger eval
  NB->>NAT: nat eval -c config_eval.yml --data eval_data.json
  NAT->>EV: Run rag_* and trajectory evaluators
  EV->>PH: Send telemetry (optional)
  EV-->>NAT: Metrics/summaries
  NAT-->>NB: Eval reports
sequenceDiagram
  autonumber
  participant U as User
  participant NB as Notebook
  participant NAT as NAT CLI
  participant PR as Profiler
  participant PH as Phoenix

  U->>NB: Trigger profiling
  NB->>NAT: nat profile -c config_profile.yml
  NAT->>PR: Collect runtime/LLM/concurrency data
  PR->>PH: Emit traces/metrics
  PR-->>NB: profile_output + gantt_chart.png

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

feature request, non-breaking

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title uses the imperative verb “Improve,” clearly communicates the main objective of enhancing the developer journey, and is concise at 66 characters. It correctly includes the “feat:” prefix to denote a new feature as per repository conventions. The descriptive scope of “example notebooks (part 2)” aligns with the changes in the pull request. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1aef2ac and ac37cad.

📒 Files selected for processing (1)
  • ci/vale/styles/config/vocabularies/nat/accept.txt (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*

⚙️ CodeRabbit configuration file

**/*: Code Review Instructions

  • Ensure the code follows best practices and coding standards.
  • For Python code, follow PEP 20 and PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues.
  • Python methods should use type hints for all parameters and return values.
    Example:
    def my_function(param1: int, param2: str) -> bool:
        pass
  • For Python exception handling, ensure proper stack trace preservation:
    • When re-raising exceptions: use bare raise statements to maintain the original stack trace,
      and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
    • When catching and logging exceptions without re-raising: always use logger.exception()
      to capture the full stack trace information.
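
The two exception-handling rules above can be sketched as follows. This is a minimal illustration, not code from the PR; `parse_units` and `parse_units_or_default` are hypothetical helpers chosen only to show the two logging patterns side by side.

```python
import logging

logger = logging.getLogger(__name__)


def parse_units(raw: str) -> int:
    """Re-raising case: log a short message, then let the original traceback propagate."""
    try:
        return int(raw)
    except ValueError:
        # logger.error() records no traceback here, so the caller that finally
        # handles the exception is the only place the stack trace is printed.
        logger.error("Could not parse units from %r", raw)
        raise  # bare raise preserves the original stack trace


def parse_units_or_default(raw: str, default: int = 0) -> int:
    """Non-re-raising case: logger.exception() keeps the traceback that would otherwise be lost."""
    try:
        return int(raw)
    except ValueError:
        logger.exception("Could not parse units from %r; using default", raw)
        return default
```

In the first function the exception still reaches the caller, so logging the traceback there would duplicate it. In the second the exception is swallowed, so the log record is the only place the traceback survives.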

Documentation Review Instructions

  • Verify that documentation and comments are clear and comprehensive.
  • Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
  • Verify that the documentation doesn't contain any offensive or outdated terms.
  • Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file, words that might appear to be spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc.

  • All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.
  • Confirm that copyright years are up-to date whenever a file is changed.

Files:

  • ci/vale/styles/config/vocabularies/nat/accept.txt
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: CI Pipeline / Check

@willkill07 added the improvement and non-breaking labels Oct 5, 2025
@coderabbitai (bot) added the feature request label Oct 5, 2025
@willkill07 removed the feature request label Oct 5, 2025
@coderabbitai (bot) left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
examples/notebooks/4_observability_evaluation_and_profiling.ipynb (1)

1219-1240: Avoid binding Phoenix to 0.0.0.0 by default

Setting PHOENIX_HOST=0.0.0.0 exposes the Phoenix UI on every network interface. On shared or cloud notebook environments this opens an unauthenticated observability surface to anyone who can reach the machine, which is risky. Default to 127.0.0.1 (loopback) and add an explicit warning or opt-in instructions if external access is truly required.
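
One way to apply this suggestion in a notebook cell is sketched below. The notebook's actual cell is not shown in this conversation, so `configure_phoenix_host` and its `allow_external` flag are hypothetical, and the sketch assumes the `PHOENIX_HOST` environment variable is read when Phoenix starts.

```python
import os

LOOPBACK = "127.0.0.1"
ALL_INTERFACES = "0.0.0.0"


def configure_phoenix_host(allow_external: bool = False) -> str:
    """Set PHOENIX_HOST, defaulting to loopback unless external access is explicitly requested."""
    host = ALL_INTERFACES if allow_external else LOOPBACK
    if allow_external:
        # Explicit opt-in: make the exposure visible in the cell output.
        print("WARNING: the Phoenix UI will be reachable on every network interface.")
    os.environ["PHOENIX_HOST"] = host
    return host


# Safe default: only processes on this machine can reach the UI.
configure_phoenix_host()
```

Calling `configure_phoenix_host(allow_external=True)` would restore the original behavior, but only as a deliberate choice with a visible warning.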

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b96d3d3 and 1aef2ac.

📒 Files selected for processing (1)
  • examples/notebooks/4_observability_evaluation_and_profiling.ipynb (7 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*

⚙️ CodeRabbit configuration file

Files:

  • examples/notebooks/4_observability_evaluation_and_profiling.ipynb
examples/**/*

⚙️ CodeRabbit configuration file

examples/**/*:

  • This directory contains example code and usage scenarios for the toolkit. At a minimum, an example should contain a README.md or README.ipynb file.
  • If an example contains Python code, it should be placed in a subdirectory named src/ and should contain a pyproject.toml file. Optionally, it might also contain scripts in a scripts/ directory.
  • If an example contains YAML files, they should be placed in a subdirectory named configs/.
  • If an example contains sample data files, they should be placed in a subdirectory named data/, and should be checked into git-lfs.

Files:

  • examples/notebooks/4_observability_evaluation_and_profiling.ipynb
🪛 Ruff (0.13.3)
examples/notebooks/4_observability_evaluation_and_profiling.ipynb

66-66: Redefinition of unused FunctionInfo from line 22

Remove definition: FunctionInfo

(F811)


77-77: Unused function argument: builder

(ARG001)


113-113: Redefinition of unused FunctionInfo from line 66

Remove definition: FunctionInfo

(F811)


166-166: Redefinition of unused FunctionInfo from line 113

Remove definition: FunctionInfo

(F811)


185-185: Loop control variable root overrides iterable it iterates

(B020)


185-185: Loop control variable dirs not used within loop body

Rename unused dirs to _dirs

(B007)


229-229: Do not catch blind exception: Exception

(BLE001)


230-230: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


231-231: Use explicit conversion flag

Replace with conversion flag

(RUF010)


242-242: Redefinition of unused FunctionInfo from line 166

Remove definition: FunctionInfo

(F811)


299-299: Unused function argument: arg

(ARG001)


332-332: Unused function argument: arg

(ARG001)


360-360: Redefinition of unused llama_index_rag_tool from line 191

(F811)
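
Several of the findings above (B020 and B007 at line 185, BLE001, TRY400, and RUF010 at lines 229-231) have small mechanical fixes. The flagged notebook code is not shown here, so the sketch below uses hypothetical functions; it assumes the line-185 loop is an `os.walk` traversal whose loop variable reuses the name of the directory being walked.

```python
import json
import logging
import os

logger = logging.getLogger(__name__)


def list_markdown_files(data_dir: str) -> list[str]:
    """Collect .md files under data_dir without shadowing the argument."""
    found: list[str] = []
    # B020: a distinct loop name (current_dir) leaves data_dir untouched.
    # B007: the unused directory list is named _dirs.
    for current_dir, _dirs, files in os.walk(data_dir):
        for name in files:
            if name.endswith(".md"):
                found.append(os.path.join(current_dir, name))
    return sorted(found)


def summarize_eval_data(raw_json: str) -> str:
    """Report how many evaluation cases parsed, or a readable error message."""
    try:
        cases = json.loads(raw_json)
    except json.JSONDecodeError as exc:  # BLE001: catch the specific failure, not Exception
        logger.exception("Failed to parse evaluation data")  # TRY400: keeps the traceback
        return f"Error: {exc!s}"  # RUF010: conversion flag instead of str(exc)
    return f"Loaded {len(cases)} cases"
```

The F811 and ARG001 findings are not covered by this sketch.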

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: CI Pipeline / Check

Signed-off-by: Will Killian <[email protected]>
@dagardner-nv (Contributor)

/merge

@willkill07 (Member, Author)

/merge

@rapids-bot (bot) merged commit 14cfda4 into NVIDIA:release/1.3 Oct 5, 2025
17 checks passed
@willkill07 deleted the wkk_observability-notebook branch October 23, 2025 18:16