Add an E2E test for Simple Calculator Galileo observability example #1095

dagardner-nv · 2025-10-23T17:14:15Z

Description

Add a dev dependency for galileo and langsmit (currently langsmith exists as a transitive dependency, but the langsmith E2E test depends on it directly).

By Submitting this PR I confirm:

I am familiar with the Contributing Guidelines.
We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
- Any contribution which contains commits that are not Signed-Off will not be accepted.
When the PR is ready for review, new or existing tests cover these changes.
When the PR is ready for review, the documentation is up to date with these changes.

Summary by CodeRabbit

New Features
- Added end-to-end observability/tracing integration to verify sessions, traces, and spans.
Tests
- Added an integration test exercising the full observability workflow.
- Simplified completion checks by replacing manual flags with polling-based waits.
- Expanded test fixtures to provision Galileo and LangSmith projects/log streams for observability tests.
Chores
- Added development dependencies for Galileo and LangSmith to support observability testing.

Signed-off-by: David Gardner <[email protected]>

…to david-galileo-e2e Signed-off-by: David Gardner <[email protected]>

…s currently a transitive dependency Signed-off-by: David Gardner <[email protected]>

…fter the test, along with a fixture to create a log stream in that project Signed-off-by: David Gardner <[email protected]>

Signed-off-by: David Gardner <[email protected]>

coderabbitai · 2025-10-23T17:14:26Z

Walkthrough

Adds Galileo observability integration: new Galileo fixtures, a new end-to-end Galileo tracing test, a LangSmith test polling simplification, and dev dependency additions for Galileo and LangSmith.

Changes

Cohort / File(s)	Summary
Test fixtures `packages/nvidia_nat_test/src/nat/test/plugin.py`	Added `project_name` fixture; added `galileo_api_key` (session), `galileo_project`, and `galileo_log_stream` fixtures; updated `langsmith_project_name_fixture` to accept `project_name`; extended `TYPE_CHECKING` imports for Galileo types; Galileo fixtures skip or error based on availability.
Integration test `examples/observability/simple_calculator_observability/tests/test_simple_calc_observability.py`	Added `test_galileo_full_workflow` to configure Galileo, run the workflow, poll Galileo for sessions/traces/spans and assert expected counts; replaced manual completion flag in existing LangSmith test with polling; added Galileo types to `TYPE_CHECKING` imports.
Dev dependencies `pyproject.toml`	Added `galileo~=1.27` and `langsmith` to dev dependencies (langsmith noted as a transitive dependency of nat).

Sequence Diagram(s)

sequenceDiagram
    participant Test as E2E Test
    participant Runner as Workflow Runner
    participant Galileo as Galileo API

    Note over Test,Galileo: Provisioning
    Test->>Galileo: create project (project_name)
    Galileo-->>Test: project id
    Test->>Galileo: create log stream
    Galileo-->>Test: log_stream id

    Note over Test,Runner: Execute workflow with Galileo tracing configured
    Test->>Runner: start workflow (with Galileo project/log_stream)
    Runner->>Galileo: emit sessions/traces/spans (async)

    Note over Test,Galileo: Polling & validation
    loop poll until data appears
      Test->>Galileo: query sessions/traces/spans
      Galileo-->>Test: results
    end

    Note over Test,Galileo: Cleanup
    Test->>Galileo: delete project
    Galileo-->>Test: ack

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 44.44% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The pull request title "Add an E2E test for Simple Calculator Galileo observability example" accurately captures the main objective of the changeset. The title is concise at 67 characters (well under the 72-character limit), uses imperative mood with the verb "Add," and is fully descriptive of the primary change. While the PR also includes supporting changes such as fixture refactoring, test modifications, and dependency updates, these are infrastructure changes necessary to enable the main Galileo E2E test addition. The title correctly emphasizes the core purpose of the PR.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

pyproject.toml (1)

1-1: Missing required SPDX Apache-2.0 header.

All source files, including .toml files, must include the SPDX Apache-2.0 header at the top of the file.

Add the Apache-2.0 header at the beginning of the file:

+# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 [build-system]

As per coding guidelines

🧹 Nitpick comments (2)

examples/observability/simple_calculator_observability/tests/test_simple_calc_observability.py (1)
179-183: Consider polling for traces and spans as well.

The test polls for sessions but immediately queries traces and spans without polling. If Galileo has eventual consistency or ingestion delay, this could lead to flaky test failures.

Consider applying the same polling pattern to traces and spans:
traces_result = None
deadline = time.time() + 10
while (traces_result is None or len(traces_result.records) == 0) and time.time() < deadline:
    await asyncio.sleep(0.5)
    traces_result = galileo.search.get_traces(project_id=galileo_project.id, log_stream_id=galileo_log_stream.id)

assert traces_result and len(traces_result.records) == 1
packages/nvidia_nat_test/src/nat/test/plugin.py (1)
294-297: Consider adding explicit scope to the fixture.

The fixture defaults to function scope, but it's used by both langsmith_project_name_fixture and galileo_project_fixture. If multiple fixtures in the same test need the same project name, an explicit scope would make the behavior clearer.
-@pytest.fixture(name="project_name")
+@pytest.fixture(name="project_name", scope="module")
 def project_name_fixture() -> str:
     # Create a unique project name for each test run
     return f"nat-e2e-test-{time.time()}-{random.random()}"
Note: The static analysis warning about random.random() not being cryptographically secure is a false positive here, as cryptographic security is not required for test project name generation.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f2615a3 and da6d931.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (3)

examples/observability/simple_calculator_observability/tests/test_simple_calc_observability.py (2 hunks)
packages/nvidia_nat_test/src/nat/test/plugin.py (2 hunks)
pyproject.toml (1 hunks)

🧰 Additional context used

📓 Path-based instructions (8)

**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.py: In code comments use the abbreviations: nat (API namespace/CLI), nvidia-nat (package), NAT (env var prefixes); never use these abbreviations in documentation
Follow PEP 20 and PEP 8 for Python style
Run yapf with column_limit=120; yapf is used for formatting (run second)
Indent with 4 spaces (no tabs) and end each file with a single trailing newline
Use ruff (ruff check --fix) as a linter (not formatter) per pyproject.toml; fix warnings unless explicitly ignored
Respect Python naming schemes: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants
When re-raising exceptions, use bare raise to preserve stack trace; log with logger.error(), not logger.exception()
When catching and logging without re-raising, use logger.exception() to capture full stack trace
Provide Google-style docstrings for every public module, class, function, and CLI command
Docstring first line must be a concise description ending with a period
Validate and sanitize all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work (HTTP, DB, file I/O)
Cache expensive computations with functools.lru_cache or an external cache when appropriate
Leverage NumPy vectorized operations when beneficial and feasible

Files:

examples/observability/simple_calculator_observability/tests/test_simple_calc_observability.py
packages/nvidia_nat_test/src/nat/test/plugin.py

{**/*.py,**/*.sh,**/*.md,**/*.toml,**/*.y?(a)ml,**/*.json,**/*.txt,**/*.ini,**/*.cfg,**/*.ipynb}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{**/*.py,**/*.sh,**/*.md,**/*.toml,**/*.y?(a)ml,**/*.json,**/*.txt,**/*.ini,**/*.cfg,**/*.ipynb}: Every file must start with the standard SPDX Apache-2.0 header
Confirm copyright years are up to date when a file is changed
All source files must include the SPDX Apache-2.0 header template (copy from an existing file)

Files:

examples/observability/simple_calculator_observability/tests/test_simple_calc_observability.py
packages/nvidia_nat_test/src/nat/test/plugin.py
pyproject.toml

**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions
Ensure the code follows best practices and coding standards. - For Python code, follow
PEP 20 and
PEP 8 for style guidelines.
Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values.
Example:
def my_function(param1: int, param2: str) -> bool:
    pass
For Python exception handling, ensure proper stack trace preservation:

When re-raising exceptions: use bare raise statements to maintain the original stack trace,
and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.

When catching and logging exceptions without re-raising: always use logger.exception()
to capture the full stack trace information.
Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any

words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file, words that might appear to be
spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,

and should contain an Apache License 2.0 header comment at the top of each file.

Confirm that copyright years are up-to date whenever a file is changed.

Files:

examples/observability/simple_calculator_observability/tests/test_simple_calc_observability.py
packages/nvidia_nat_test/src/nat/test/plugin.py
pyproject.toml

examples/**/*

⚙️ CodeRabbit configuration file

examples/**/*: - This directory contains example code and usage scenarios for the toolkit, at a minimum an example should
contain a README.md or file README.ipynb.

If an example contains Python code, it should be placed in a subdirectory named src/ and should
contain a pyproject.toml file. Optionally, it might also contain scripts in a scripts/ directory.

If an example contains YAML files, they should be placed in a subdirectory named configs/. - If an example contains sample data files, they should be placed in a subdirectory named data/, and should
be checked into git-lfs.

Files:

examples/observability/simple_calculator_observability/tests/test_simple_calc_observability.py

{src/**/*.py,packages/*/src/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{src/**/*.py,packages/*/src/**/*.py}: All importable Python code must live under src/ or packages//src/
All public APIs must have Python 3.11+ type hints on parameters and return values
Prefer typing/collections.abc abstractions (e.g., Sequence over list)
Use typing.Annotated for units or metadata when useful
Treat pyright warnings as errors during development

Files:

packages/nvidia_nat_test/src/nat/test/plugin.py

packages/*/src/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

If a package contains Python code, it must have tests in a tests/ directory at the same level as pyproject.toml

Files:

packages/nvidia_nat_test/src/nat/test/plugin.py

packages/**/*

⚙️ CodeRabbit configuration file

packages/**/*: - This directory contains optional plugin packages for the toolkit, each should contain a pyproject.toml file. - The pyproject.toml file should declare a dependency on nvidia-nat or another package with a name starting
with nvidia-nat-. This dependency should be declared using ~=<version>, and the version should be a two
digit version (ex: ~=1.0).

Not all packages contain Python code, if they do they should also contain their own set of tests, in a
tests/ directory at the same level as the pyproject.toml file.

Files:

packages/nvidia_nat_test/src/nat/test/plugin.py

{**/pyproject.toml,uv.lock}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

New dependencies must be added to both pyproject.toml (alphabetically) and uv.lock via ‘uv pip install --sync’

Files:

pyproject.toml

🪛 Ruff (0.14.1)

packages/nvidia_nat_test/src/nat/test/plugin.py

297-297: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)

320-320: Unused function argument: galileo_api_key

(ARG001)

331-331: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: CI Pipeline / Check

🔇 Additional comments (6)

pyproject.toml (1)

226-226: LGTM!

The Galileo dependency is properly added with an appropriate version constraint using ~=1.27.

examples/observability/simple_calculator_observability/tests/test_simple_calc_observability.py (2)

30-31: LGTM!

The TYPE_CHECKING imports for Galileo modules follow best practices for type annotations without introducing runtime dependencies.

143-151: LGTM!

The simplified polling mechanism is cleaner and more readable. It correctly waits for the LangSmith run to be ingested with a 10-second timeout and appropriate assertions.

packages/nvidia_nat_test/src/nat/test/plugin.py (3)

30-31: LGTM!

The TYPE_CHECKING imports for Galileo modules follow best practices for type annotations.

300-305: LGTM!

The updated fixture signature properly accepts the project_name parameter, allowing for consistent naming across different observability backend tests.

308-316: LGTM!

The galileo_api_key fixture follows the established pattern for API key fixtures in this file, with appropriate session scope and proper documentation.

examples/observability/simple_calculator_observability/tests/test_simple_calc_observability.py

packages/nvidia_nat_test/src/nat/test/plugin.py

pyproject.toml

Signed-off-by: David Gardner <[email protected]>

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between da6d931 and 3c9aef8.

📒 Files selected for processing (1)

packages/nvidia_nat_test/src/nat/test/plugin.py (2 hunks)

🧰 Additional context used

📓 Path-based instructions (6)

**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.py: In code comments use the abbreviations: nat (API namespace/CLI), nvidia-nat (package), NAT (env var prefixes); never use these abbreviations in documentation
Follow PEP 20 and PEP 8 for Python style
Run yapf with column_limit=120; yapf is used for formatting (run second)
Indent with 4 spaces (no tabs) and end each file with a single trailing newline
Use ruff (ruff check --fix) as a linter (not formatter) per pyproject.toml; fix warnings unless explicitly ignored
Respect Python naming schemes: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants
When re-raising exceptions, use bare raise to preserve stack trace; log with logger.error(), not logger.exception()
When catching and logging without re-raising, use logger.exception() to capture full stack trace
Provide Google-style docstrings for every public module, class, function, and CLI command
Docstring first line must be a concise description ending with a period
Validate and sanitize all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work (HTTP, DB, file I/O)
Cache expensive computations with functools.lru_cache or an external cache when appropriate
Leverage NumPy vectorized operations when beneficial and feasible

Files:

packages/nvidia_nat_test/src/nat/test/plugin.py

{src/**/*.py,packages/*/src/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{src/**/*.py,packages/*/src/**/*.py}: All importable Python code must live under src/ or packages//src/
All public APIs must have Python 3.11+ type hints on parameters and return values
Prefer typing/collections.abc abstractions (e.g., Sequence over list)
Use typing.Annotated for units or metadata when useful
Treat pyright warnings as errors during development

Files:

packages/nvidia_nat_test/src/nat/test/plugin.py

packages/*/src/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

If a package contains Python code, it must have tests in a tests/ directory at the same level as pyproject.toml

Files:

packages/nvidia_nat_test/src/nat/test/plugin.py

{**/*.py,**/*.sh,**/*.md,**/*.toml,**/*.y?(a)ml,**/*.json,**/*.txt,**/*.ini,**/*.cfg,**/*.ipynb}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{**/*.py,**/*.sh,**/*.md,**/*.toml,**/*.y?(a)ml,**/*.json,**/*.txt,**/*.ini,**/*.cfg,**/*.ipynb}: Every file must start with the standard SPDX Apache-2.0 header
Confirm copyright years are up to date when a file is changed
All source files must include the SPDX Apache-2.0 header template (copy from an existing file)

Files:

packages/nvidia_nat_test/src/nat/test/plugin.py

**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions
Ensure the code follows best practices and coding standards. - For Python code, follow
PEP 20 and
PEP 8 for style guidelines.
Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values.
Example:
def my_function(param1: int, param2: str) -> bool:
    pass
For Python exception handling, ensure proper stack trace preservation:

When re-raising exceptions: use bare raise statements to maintain the original stack trace,
and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.

When catching and logging exceptions without re-raising: always use logger.exception()
to capture the full stack trace information.
Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any

words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file, words that might appear to be
spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,

and should contain an Apache License 2.0 header comment at the top of each file.

Confirm that copyright years are up-to date whenever a file is changed.

Files:

packages/nvidia_nat_test/src/nat/test/plugin.py

packages/**/*

⚙️ CodeRabbit configuration file

packages/**/*: - This directory contains optional plugin packages for the toolkit, each should contain a pyproject.toml file. - The pyproject.toml file should declare a dependency on nvidia-nat or another package with a name starting
with nvidia-nat-. This dependency should be declared using ~=<version>, and the version should be a two
digit version (ex: ~=1.0).

Not all packages contain Python code, if they do they should also contain their own set of tests, in a
tests/ directory at the same level as the pyproject.toml file.

Files:

packages/nvidia_nat_test/src/nat/test/plugin.py

🪛 Ruff (0.14.1)

packages/nvidia_nat_test/src/nat/test/plugin.py

298-298: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)

321-321: Unused function argument: galileo_api_key

(ARG001)

🔇 Additional comments (3)

packages/nvidia_nat_test/src/nat/test/plugin.py (3)

302-306: LGTM - Good refactoring.

The signature change to use the shared project_name fixture promotes consistency across observability backends (LangSmith and Galileo).

320-336: LGTM - Past review issues addressed.

The exception handling now correctly preserves the stack trace with from e, and the fixture follows the established pattern for project lifecycle management.

Note: The galileo_api_key parameter is intentionally unused - it's a fixture dependency ensuring the API key environment variable is set. The static analysis warning (ARG001) is a false positive.

339-347: LGTM - Docstring added.

The docstring from the past review has been added, and the fixture correctly relies on the project lifecycle for cleanup.

packages/nvidia_nat_test/src/nat/test/plugin.py

…into david-galileo-e2e Signed-off-by: David Gardner <[email protected]>

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (1)

examples/observability/simple_calculator_observability/tests/test_simple_calc_observability.py (1)

188-200: Add return type and a concise Google-style docstring to the Galileo test.

Required by the coding guidelines for functions. Keeps style consistent with prior feedback.

 @pytest.mark.integration
 @pytest.mark.usefixtures("galileo_api_key")
 async def test_galileo_full_workflow(config_dir: Path,
                                      galileo_project: "galileo.projects.Project",
                                      galileo_log_stream: "galileo.log_streams.LogStream",
                                      question: str,
-                                     expected_answer: str):
+                                     expected_answer: str) -> None:
+    """
+    Verify Galileo tracing end-to-end.
+
+    Runs the simple calculator workflow with Galileo tracing enabled and checks that a session,
+    a single trace, and multiple spans are created for the project's log stream.
+    """
     config_file = config_dir / "config-galileo.yml"

As per coding guidelines

🧹 Nitpick comments (2)

examples/observability/simple_calculator_observability/tests/test_simple_calc_observability.py (2)

176-186: Make LangSmith polling more robust and fail clearly on timeout.

Increase the wait window and add an explicit failure to aid triage. Keeps assertion semantics intact.

-    deadline = time.time() + 10
+    deadline = time.time() + 30
     while len(runlist) == 0 and time.time() < deadline:
         # Wait for traces to be ingested
         await asyncio.sleep(0.5)
         runs = langsmith_client.list_runs(project_name=langsmith_project_name, is_root=True)
         runlist = [run for run in runs]

-    # Since we have a newly created project, the above workflow should have created exactly one root run
+    if len(runlist) == 0:
+        pytest.fail(f"Timed out waiting for a root run in LangSmith project '{langsmith_project_name}'.")
+    # Since we have a newly created project, the above workflow should have created exactly one root run
     assert len(runlist) == 1

200-218: Reduce flakiness: poll for traces and spans, avoid blocking the event loop, and provide clear failures.

Currently only sessions are polled; traces/spans are fetched once and may race with ingestion. Also, synchronous SDK calls in an async test can block. Poll all three with asyncio.to_thread and clearer failure messages.

     await run_workflow(config=config, question=question, expected_answer=expected_answer)

-    import galileo.search
-
-    sessions = []
-    deadline = time.time() + 10
-    while len(sessions) == 0 and time.time() < deadline:
-        # Wait for traces to be ingested
-        await asyncio.sleep(0.5)
-        results = galileo.search.get_sessions(project_id=galileo_project.id, log_stream_id=galileo_log_stream.id)
-        sessions = results.records or []
-
-    assert len(sessions) == 1
-
-    traces = galileo.search.get_traces(project_id=galileo_project.id, log_stream_id=galileo_log_stream.id)
-    assert len(traces.records) == 1
-
-    spans = galileo.search.get_spans(project_id=galileo_project.id, log_stream_id=galileo_log_stream.id)
-    assert len(spans.records) > 1
+    import galileo.search
+
+    # Sessions
+    sessions = []
+    deadline = time.time() + 30
+    while not sessions and time.time() < deadline:
+        await asyncio.sleep(0.5)
+        results = await asyncio.to_thread(
+            galileo.search.get_sessions, project_id=galileo_project.id, log_stream_id=galileo_log_stream.id
+        )
+        sessions = results.records or []
+    if not sessions:
+        pytest.fail(f"Timed out waiting for Galileo sessions in {galileo_project.name}/{galileo_log_stream.name}.")
+    assert len(sessions) == 1
+
+    # Traces
+    traces = []
+    deadline = time.time() + 30
+    while not traces and time.time() < deadline:
+        await asyncio.sleep(0.5)
+        t_res = await asyncio.to_thread(
+            galileo.search.get_traces, project_id=galileo_project.id, log_stream_id=galileo_log_stream.id
+        )
+        traces = t_res.records or []
+    if not traces:
+        pytest.fail(f"Timed out waiting for Galileo traces in {galileo_project.name}/{galileo_log_stream.name}.")
+    assert len(traces) == 1
+
+    # Spans
+    spans = []
+    deadline = time.time() + 30
+    while not spans and time.time() < deadline:
+        await asyncio.sleep(0.5)
+        s_res = await asyncio.to_thread(
+            galileo.search.get_spans, project_id=galileo_project.id, log_stream_id=galileo_log_stream.id
+        )
+        spans = s_res.records or []
+    if not spans:
+        pytest.fail(f"Timed out waiting for Galileo spans in {galileo_project.name}/{galileo_log_stream.name}.")
+    assert len(spans) > 1

Please confirm whether galileo_project/galileo_log_stream fixtures guarantee fresh, empty resources per test run; if not, asserting exactly one session/trace may flake. If they are shared, consider asserting “>= 1” or filtering by a unique attribute/timestamp. Based on learnings

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3c9aef8 and 02aaff0.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (1)

examples/observability/simple_calculator_observability/tests/test_simple_calc_observability.py (2 hunks)

🧰 Additional context used

📓 Path-based instructions (4)

**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.py: In code comments use the abbreviations: nat (API namespace/CLI), nvidia-nat (package), NAT (env var prefixes); never use these abbreviations in documentation
Follow PEP 20 and PEP 8 for Python style
Run yapf with column_limit=120; yapf is used for formatting (run second)
Indent with 4 spaces (no tabs) and end each file with a single trailing newline
Use ruff (ruff check --fix) as a linter (not formatter) per pyproject.toml; fix warnings unless explicitly ignored
Respect Python naming schemes: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants
When re-raising exceptions, use bare raise to preserve stack trace; log with logger.error(), not logger.exception()
When catching and logging without re-raising, use logger.exception() to capture full stack trace
Provide Google-style docstrings for every public module, class, function, and CLI command
Docstring first line must be a concise description ending with a period
Validate and sanitize all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work (HTTP, DB, file I/O)
Cache expensive computations with functools.lru_cache or an external cache when appropriate
Leverage NumPy vectorized operations when beneficial and feasible

Files:

examples/observability/simple_calculator_observability/tests/test_simple_calc_observability.py

{**/*.py,**/*.sh,**/*.md,**/*.toml,**/*.y?(a)ml,**/*.json,**/*.txt,**/*.ini,**/*.cfg,**/*.ipynb}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{**/*.py,**/*.sh,**/*.md,**/*.toml,**/*.y?(a)ml,**/*.json,**/*.txt,**/*.ini,**/*.cfg,**/*.ipynb}: Every file must start with the standard SPDX Apache-2.0 header
Confirm copyright years are up to date when a file is changed
All source files must include the SPDX Apache-2.0 header template (copy from an existing file)

Files:

examples/observability/simple_calculator_observability/tests/test_simple_calc_observability.py

**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions
Ensure the code follows best practices and coding standards. - For Python code, follow
PEP 20 and
PEP 8 for style guidelines.
Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values.
Example:
def my_function(param1: int, param2: str) -> bool:
    pass
For Python exception handling, ensure proper stack trace preservation:

When re-raising exceptions: use bare raise statements to maintain the original stack trace,
and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.

When catching and logging exceptions without re-raising: always use logger.exception()
to capture the full stack trace information.
Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any

words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file, words that might appear to be
spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,

and should contain an Apache License 2.0 header comment at the top of each file.

Confirm that copyright years are up-to date whenever a file is changed.

Files:

examples/observability/simple_calculator_observability/tests/test_simple_calc_observability.py

examples/**/*

⚙️ CodeRabbit configuration file

examples/**/*: - This directory contains example code and usage scenarios for the toolkit, at a minimum an example should
contain a README.md or file README.ipynb.

If an example contains Python code, it should be placed in a subdirectory named src/ and should
contain a pyproject.toml file. Optionally, it might also contain scripts in a scripts/ directory.

If an example contains YAML files, they should be placed in a subdirectory named configs/. - If an example contains sample data files, they should be placed in a subdirectory named data/, and should
be checked into git-lfs.

Files:

examples/observability/simple_calculator_observability/tests/test_simple_calc_observability.py

🔇 Additional comments (1)

examples/observability/simple_calculator_observability/tests/test_simple_calc_observability.py (1)

31-32: LGTM on type-check-only imports for Galileo.

Safe forward-ref usage; no runtime import cost. No changes required.

dagardner-nv added 6 commits October 23, 2025 08:21

Add fixture for the Galileo API key

8cda426

Signed-off-by: David Gardner <[email protected]>

Add simple E2E test

0355603

Signed-off-by: David Gardner <[email protected]>

Merge branch 'release/1.3' of github.com:NVIDIA/NeMo-Agent-Toolkit in…

4321b01

…to david-galileo-e2e Signed-off-by: David Gardner <[email protected]>

Add dev deps for galileo (Apache v2) and langsmith (MIT). Langsmith i…

06c05a6

…s currently a transitive dependency Signed-off-by: David Gardner <[email protected]>

Add a fixture for creating a unique galileo project and cleaning up a…

b197c1c

…fter the test, along with a fixture to create a log stream in that project Signed-off-by: David Gardner <[email protected]>

Verify that the workflow published traces and spans

da6d931

Signed-off-by: David Gardner <[email protected]>

dagardner-nv self-assigned this Oct 23, 2025

dagardner-nv added improvement Improvement to existing functionality non-breaking Non-breaking change labels Oct 23, 2025

dagardner-nv marked this pull request as ready for review October 23, 2025 17:15

dagardner-nv requested review from a team as code owners October 23, 2025 17:15

coderabbitai bot reviewed Oct 23, 2025

View reviewed changes

dagardner-nv added 2 commits October 23, 2025 10:28

Apply code rabbit suggestions

9be26da

Signed-off-by: David Gardner <[email protected]>

Lint fixes

3c9aef8

Signed-off-by: David Gardner <[email protected]>

coderabbitai bot reviewed Oct 23, 2025

View reviewed changes

packages/nvidia_nat_test/src/nat/test/plugin.py Show resolved Hide resolved

dagardner-nv changed the base branch from release/1.3 to develop October 23, 2025 18:01

Merge branch 'develop' of https://github.com/NVIDIA/NeMo-Agent-Toolkit …

02aaff0

…into david-galileo-e2e Signed-off-by: David Gardner <[email protected]>

coderabbitai bot reviewed Oct 23, 2025

View reviewed changes

Merge branch 'develop' into david-galileo-e2e

a97b506

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Add an E2E test for Simple Calculator Galileo observability example #1095

Add an E2E test for Simple Calculator Galileo observability example #1095

dagardner-nv commented Oct 23, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Oct 23, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,

Uh oh!

Uh oh!

coderabbitai bot left a comment

Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Add an E2E test for Simple Calculator Galileo observability example #1095

Are you sure you want to change the base?

Add an E2E test for Simple Calculator Galileo observability example #1095

Conversation

dagardner-nv commented Oct 23, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

By Submitting this PR I confirm:

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Oct 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

dagardner-nv commented Oct 23, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 23, 2025 •

edited

Loading