
Conversation

Collaborator

@bsatapat-jpg bsatapat-jpg commented Sep 2, 2025

[DESC] [0] Generated 75 relevant test cases
[1] All test cases pass successfully.

Summary by CodeRabbit

  • Tests
    • Added comprehensive test suites for CLI, config, evaluation, and metrics with robust fixtures, environment setup, and automatic test categorization (unit, integration, slow, component tags).
    • Introduced a test runner CLI to filter by type/markers, toggle verbosity, and optionally collect coverage.
  • Documentation
    • Added a tests README detailing structure, how to run/select tests, environment variables, mocking guidance, CI tips, and troubleshooting.
  • Chores
    • Added centralized Pytest configuration: structured discovery, verbose output, strict markers, colorized output, warning filters, minimum version, timeout, and predefined markers; optional coverage support.

…ramework

[DESC] [0] Generated 75 relevant test cases
       [1] All test cases pass successfully.
Contributor

coderabbitai bot commented Sep 2, 2025

Walkthrough

Adds centralized pytest configuration, a test runner script, extensive pytest fixtures and hooks, and comprehensive test suites for CLI, config, evaluation, and metrics. Also includes testing documentation. No production code changes or public API modifications.

Changes

Cohort / File(s) Summary
Pytest Configuration
pytest.ini
New pytest config: discovery rules, verbosity/traceback, strict markers, warnings filters, minversion, timeout; registers markers; optional coverage flags (commented).
Test Scaffolding & Fixtures
tests/conftest.py
Adds fixtures for sample configs/data, temp files/dirs, environment setup; mocks LLM manager; registers markers; auto-tagging via collection hook (see the illustrative fixture sketch after this table).
Test Runner
tests/run_tests.py
New CLI runner building and executing pytest commands with type/marker filters, verbosity, optional coverage; includes argument parsing and plugin checks.
Test Suites
tests/test_cli.py, tests/test_config.py, tests/test_evaluation.py, tests/test_metrics.py
Comprehensive unit/integration tests for CLI behavior, configuration loading/validation, evaluation pipeline, metrics frameworks and parsing; heavy use of mocks/fixtures.
Testing Docs
tests/README.md
New README detailing test structure, running options, markers, fixtures, environment variables, mocking, CI notes, and troubleshooting.
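
To make the tests/conftest.py row above more concrete, here is a minimal, hypothetical sketch of what the mocked-LLM-manager fixture could look like. The fixture name mock_llm_manager and the LLMManager.get_litellm_params interface appear elsewhere in this PR; the spec usage and the returned parameter values below are illustrative assumptions, not the actual implementation.

```python
# Illustrative sketch only -- assumes LLMManager exposes get_litellm_params();
# the concrete return values are invented for this example.
from unittest.mock import MagicMock

import pytest

from lightspeed_evaluation.core.llm.manager import LLMManager


@pytest.fixture
def mock_llm_manager():
    """Provide a MagicMock standing in for LLMManager in unit tests."""
    manager = MagicMock(spec=LLMManager)
    # get_litellm_params() is called by the metric backends; return a
    # plausible parameter dict so callers do not need a real API key.
    manager.get_litellm_params.return_value = {
        "model": "gpt-4o-mini",  # assumed default model
        "temperature": 0.0,
        "max_tokens": 512,
    }
    return manager
```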

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Runner as tests/run_tests.py
  participant Pytest as pytest
  participant Plugins as Markers/Plugins
  participant OS as OS/Subprocess

  User->>Runner: main(args)
  Runner->>Runner: parse args / validate plugins
  Runner->>OS: subprocess.run(["pytest", ...])
  OS->>Pytest: launch
  Pytest->>Plugins: load config & markers (pytest.ini, conftest.py)
  Pytest->>Pytest: collect tests (apply auto-marking)
  Pytest->>OS: execute tests (per markers/filters)
  OS-->>Runner: return code
  Runner-->>User: exit with code
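
As a rough companion to the diagram, the sketch below approximates how such a runner could assemble the pytest command, execute it, and propagate the exit code. It is not the actual tests/run_tests.py; the flag names (--type, --markers, --coverage) and the coverage options are assumptions.

```python
# Simplified sketch of the runner flow shown in the diagram above.
# Not the actual tests/run_tests.py; argument names are assumptions.
import argparse
import subprocess
import sys
from pathlib import Path
from typing import Optional


def build_command(test_type: Optional[str], markers: Optional[str],
                  verbose: bool, coverage: bool) -> list:
    """Assemble a pytest invocation from the CLI options."""
    cmd = [sys.executable, "-m", "pytest"]
    cmd.append("-v" if verbose else "-q")
    if markers:
        cmd += ["-m", markers]
    if coverage:
        # Hypothetical coverage flags; requires pytest-cov.
        cmd += ["--cov=src", "--cov-report=term-missing"]
    if test_type:
        path = Path(test_type)
        if not path.is_absolute():
            path = Path("tests") / path
        cmd.append(str(path))
    return cmd


def main() -> int:
    parser = argparse.ArgumentParser(description="Convenience test runner")
    parser.add_argument("--type", dest="test_type", default=None)
    parser.add_argument("--markers", default=None)
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("--coverage", action="store_true")
    args = parser.parse_args()
    cmd = build_command(args.test_type, args.markers, args.verbose, args.coverage)
    # Exit with pytest's return code so CI sees failures.
    return subprocess.run(cmd).returncode


if __name__ == "__main__":
    sys.exit(main())
```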

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Generic eval tool #28 — Introduces/updates the core evaluation components and CLI that these tests and the runner exercise.

Suggested reviewers

  • tisnik
  • VladimirKadlec
  • Anxhela21

Poem

hop hop! I set the tests to run,
markers lined beneath the sun.
Fixtures bloom in temp-y glades,
pytest sings through mocked cascades.
Green carrots sprout—results align! 🥕
If red appears, I’ll debug fine—
then hop back happy, pipelines shine.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/test_config.py (1)

436-514: Remove duplicate TestConfigurationScenarios class.

Defining the same class name twice in the module overrides the first and risks silently dropping tests. Keep a single definition.

- class TestConfigurationScenarios:
-     """Test realistic configuration scenarios."""
-     ...
+ # Duplicate block removed (class TestConfigurationScenarios defined once above).
🧹 Nitpick comments (21)
pytest.ini (1)

11-16: Avoid verbosity conflicts between pytest.ini and tests/run_tests.py.

tests/run_tests.py adds -q/-v; keeping -v here yields mixed flags. Prefer runner-controlled verbosity.

Apply:

 addopts = 
-    -v
     --tb=short
     --strict-markers
     --disable-warnings
     --color=yes
tests/run_tests.py (2)

19-24: Let pytest.ini drive verbosity; drop default -q here.

This prevents -q/-v conflicts when ini sets -v.

Apply:

-    if verbose:
-        cmd.append("-v")
-    else:
-        cmd.append("-q")
+    if verbose:
+        cmd.append("-v")

61-63: Make relative test-path handling cross-platform and robust.

Avoid string-prefix checks with "/". Use pathlib parts.

Apply:

-        if not test_type.startswith("/") and not test_type.startswith("tests/"):
-            test_type = str(tests_dir / test_type)
-        cmd.append(test_type)
+        p = Path(test_type)
+        if not p.is_absolute() and (not p.parts or p.parts[0] != "tests"):
+            p = tests_dir / p
+        cmd.append(str(p))
tests/README.md (1)

7-16: Add language to fenced code block (markdownlint MD040).

Improves rendering and tooling.

Apply:

-```
+```text
 tests/
 ├── README.md                 # This file
 ├── conftest.py              # Pytest configuration and shared fixtures
 ├── run_tests.py             # Test runner script for convenient test execution
 ├── test_evaluation.py       # Main evaluation tests
 ├── test_config.py           # Configuration loading and validation tests
 ├── test_metrics.py          # Metrics evaluation tests
 └── test_cli.py              # Command-line interface tests

tests/test_config.py (3)

212-214: Isolate loader test from metric mapping side effects.

load_system_config calls populate_metric_mappings. Patch it to a no-op to avoid unintended state or import-time errors during this unit test.

Apply:

-    @patch('lightspeed_evaluation.core.config.loader.setup_logging')
-    def test_load_system_config_with_mock(self, mock_setup_logging):
+    @patch('lightspeed_evaluation.core.config.loader.populate_metric_mappings')
+    @patch('lightspeed_evaluation.core.config.loader.setup_logging')
+    def test_load_system_config_with_mock(self, mock_setup_logging, mock_populate_metric_mappings):
         """Test loading system config with mocked dependencies."""

283-291: Assert all documented fallback environment variables.

Also assert DEEPEVAL_DISABLE_PROGRESS_BAR to fully validate fallback behavior.

         # Should set fallback values
         assert os.environ.get("DEEPEVAL_TELEMETRY_OPT_OUT") == "YES"
         assert os.environ.get("LITELLM_LOG_LEVEL") == "ERROR"
+        assert os.environ.get("DEEPEVAL_DISABLE_PROGRESS_BAR") == "YES"

345-362: Parametrize provider cases for clarity.

Use pytest.mark.parametrize instead of a manual loop to improve readability and failure reporting.

-    def test_system_config_with_different_providers(self):
-        """Test SystemConfig with different LLM providers."""
-        providers_config = [
-            {"provider": "openai", "model": "gpt-4o-mini", "temperature": 0.0},
-            {"provider": "anthropic", "model": "claude-3-sonnet", "temperature": 0.1},
-        ]
-        for config_data in providers_config:
+    @pytest.mark.parametrize(
+        "config_data",
+        [
+            {"provider": "openai", "model": "gpt-4o-mini", "temperature": 0.0},
+            {"provider": "anthropic", "model": "claude-3-sonnet", "temperature": 0.1},
+        ],
+    )
+    def test_system_config_with_different_providers(self, config_data):
         ...
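
Purely for illustration, a complete standalone version of that parametrized test could look like the sketch below. The original test body is elided ("...") in the diff above, so the LLMConfig(**config_data) construction and the assertions are assumptions.

```python
# Hypothetical completion of the parametrized test above.
# Assumes LLMConfig accepts provider/model/temperature as constructor fields.
import pytest

from lightspeed_evaluation.core.config.models import LLMConfig


@pytest.mark.parametrize(
    "config_data",
    [
        {"provider": "openai", "model": "gpt-4o-mini", "temperature": 0.0},
        {"provider": "anthropic", "model": "claude-3-sonnet", "temperature": 0.1},
    ],
)
def test_system_config_with_different_providers(config_data):
    """Each provider configuration should round-trip through LLMConfig."""
    config = LLMConfig(**config_data)
    assert config.provider == config_data["provider"]
    assert config.model == config_data["model"]
    assert config.temperature == config_data["temperature"]
```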
tests/test_evaluation.py (3)

55-56: Avoid brittle assumption about sample data length.

Hard-coding len(evaluation_data) == 3 couples the test to external sample files. Prefer asserting structure/content rather than exact count.

-        assert len(evaluation_data) == 3  # Based on the sample data
+        assert len(evaluation_data) >= 1

421-427: Remove local pytest_configure; keep marker registration centralized in conftest.py.

Redundant marker registration can confuse contributors.

-# Pytest configuration
-def pytest_configure(config):
-    """Configure pytest with custom markers."""
-    config.addinivalue_line(
-        "markers", "integration: mark test as integration test"
-    )

179-183: Patch at the boundary you control.

Consider patching MetricsManager.evaluate_metric instead of a specific framework’s class to reduce coupling to internal modules.

If you want, I can provide a minimal patch that stubs MetricsManager.evaluate_metric and returns framework-specific tuples based on framework/metric_name.
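
For example, a minimal sketch of such a stub, assuming an evaluate_metric(framework, metric_name, request) style signature (the real driver API may differ):

```python
# Hypothetical stub at the MetricsManager boundary; the evaluate_metric
# signature below is an assumption, not the verified driver API.
from unittest.mock import patch

from lightspeed_evaluation.drivers.evaluation import MetricsManager


def fake_evaluate_metric(self, framework, metric_name, request):
    """Return a canned (score, reason) tuple keyed by framework/metric."""
    canned = {
        ("ragas", "faithfulness"): (0.9, "stubbed ragas result"),
        ("deepeval", "answer_relevancy"): (0.8, "stubbed deepeval result"),
    }
    return canned.get((framework, metric_name), (0.5, "stubbed default result"))


def test_driver_with_stubbed_metrics_manager(sample_evaluation_data):
    """Example shape of a test that avoids patching framework internals."""
    with patch.object(MetricsManager, "evaluate_metric", new=fake_evaluate_metric):
        # Exercise the evaluation driver here and assert on the aggregated
        # results; the details depend on the real EvaluationDriver API.
        ...
```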

tests/conftest.py (1)

221-226: Return a pathlib.Path for temp_output_dir.

Most call sites use Path. Returning Path(temp_dir) avoids repeated conversions.

-    with tempfile.TemporaryDirectory() as temp_dir:
-        yield temp_dir
+    with tempfile.TemporaryDirectory() as temp_dir:
+        yield Path(temp_dir)
tests/test_cli.py (6)

51-62: Also assert environment initialization is invoked.

Verifies CLI calls setup_environment_variables on the happy path.

                 result = main()
-                assert result == 0  # Should return success code
+                assert result == 0  # Should return success code
+                mock_setup_env.assert_called_once()

Also applies to: 93-97


103-132: Assert env setup on --output-dir flow as well.

Keeps behavior consistent across success scenarios.

             ]):
                 result = main()
                 assert result == 0
+                mock_setup_env.assert_called_once()

Also applies to: 125-131


356-373: Verify run_evaluation receives the overridden output dir.

Tightens the contract in the real-world scenario test.

                     result = main()
-                    assert result == 0
+                    assert result == 0
+                    mock_run_eval.assert_called_once_with(
+                        'config/system.yaml',
+                        'config/evaluation_data.yaml',
+                        '/custom/output/path'
+                    )

Also applies to: 361-372


301-311: Patch Path.exists directly to avoid over-mocking Path.

Reduces side effects from replacing Path with a MagicMock.

-                with patch('lightspeed_evaluation.runner.evaluation.Path') as mock_path:
-                    # Mock Path.exists to return False for default paths
-                    mock_path.return_value.exists.return_value = False
+                with patch('lightspeed_evaluation.runner.evaluation.Path.exists', return_value=False):

312-329: Same: patch Path.exists directly.

-            with patch('lightspeed_evaluation.runner.evaluation.setup_environment_variables'):
-                with patch('lightspeed_evaluation.runner.evaluation.Path') as mock_path:
-                    # Mock Path.exists to return False
-                    mock_path.return_value.exists.return_value = False
+            with patch('lightspeed_evaluation.runner.evaluation.setup_environment_variables'):
+                with patch('lightspeed_evaluation.runner.evaluation.Path.exists', return_value=False):

336-351: Same: patch Path.exists directly.

-            with patch('lightspeed_evaluation.runner.evaluation.setup_environment_variables'):
-                with patch('lightspeed_evaluation.runner.evaluation.Path') as mock_path:
-                    mock_path.return_value.exists.return_value = False
+            with patch('lightspeed_evaluation.runner.evaluation.setup_environment_variables'):
+                with patch('lightspeed_evaluation.runner.evaluation.Path.exists', return_value=False):
tests/test_metrics.py (4)

36-43: Assert the specific exception type (LLMError) for missing API key.

Makes the test stricter and documents expected contract.

-from lightspeed_evaluation.core.llm.manager import LLMConfig, LLMManager
+from lightspeed_evaluation.core.llm.manager import LLMConfig, LLMManager, LLMError
@@
-        with patch.dict(os.environ, {}, clear=True):
-            with pytest.raises(Exception, match="OPENAI_API_KEY"):
-                LLMManager(config)
+        with patch.dict(os.environ, {}, clear=True):
+            with pytest.raises(LLMError, match="OPENAI_API_KEY"):
+                LLMManager(config)

Also applies to: 8-10


90-119: Also verify litellm.completion is called with the resolved model.

Guards the wiring between LLMManager params and the prompt call.

         score, reason = metrics.evaluate("answer_correctness", None, scope)
 
         assert score == 0.85
         assert "Custom answer correctness" in reason
+        # Ensure completion was invoked with expected params
+        mock_completion.assert_called_once()
+        _, kwargs = mock_completion.call_args
+        assert kwargs.get("model") == "gpt-4o-mini"
+        assert "messages" in kwargs

373-405: Mirror the completion-call assertion in the documentation scenario.

Keeps consistency with the earlier custom-metric test.

         score, reason = metrics.evaluate("answer_correctness", None, scope)
 
         assert score == 0.88
         assert "technical information" in reason
+        mock_completion.assert_called_once()

136-145: Add a test to ensure CustomMetrics rejects conversation scope.

Covers the documented constraint for turn-only metrics.

         assert "Unsupported custom metric" in reason
+
+    def test_answer_correctness_rejects_conversation_scope(self, mock_llm_manager):
+        """Custom metric should not run at conversation-level."""
+        metrics = CustomMetrics(mock_llm_manager)
+        scope = EvaluationScope(
+            turn_idx=0,
+            turn_data=TurnData(turn_id=1, query="q", response="r"),
+            is_conversation=True,
+        )
+        score, reason = metrics.evaluate("answer_correctness", None, scope)
+        assert score is None
+        assert "turn" in reason.lower()
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 2bbafdd and eb20bf4.

📒 Files selected for processing (8)
  • pytest.ini (1 hunks)
  • tests/README.md (1 hunks)
  • tests/conftest.py (1 hunks)
  • tests/run_tests.py (1 hunks)
  • tests/test_cli.py (1 hunks)
  • tests/test_config.py (1 hunks)
  • tests/test_evaluation.py (1 hunks)
  • tests/test_metrics.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
tests/conftest.py (5)
src/lightspeed_evaluation/core/config/models.py (3)
  • EvaluationData (47-130)
  • TurnData (8-44)
  • LLMConfig (172-202)
src/lightspeed_evaluation/core/config/loader.py (1)
  • SystemConfig (151-190)
src/lightspeed_evaluation/core/llm/manager.py (2)
  • LLMManager (13-150)
  • get_litellm_params (131-139)
tests/test_evaluation.py (1)
  • pytest_configure (422-426)
tests/test_metrics.py (4)
  • mock_llm_manager (70-81)
  • mock_llm_manager (151-160)
  • mock_llm_manager (241-250)
  • mock_llm_manager (360-371)
tests/test_metrics.py (8)
src/lightspeed_evaluation/core/config/models.py (3)
  • EvaluationData (47-130)
  • TurnData (8-44)
  • LLMConfig (172-202)
src/lightspeed_evaluation/core/llm/manager.py (2)
  • LLMManager (13-150)
  • get_litellm_params (131-139)
src/lightspeed_evaluation/core/metrics/custom.py (2)
  • CustomMetrics (29-251)
  • _parse_score_response (91-130)
src/lightspeed_evaluation/core/metrics/deepeval.py (1)
  • DeepEvalMetrics (19-138)
src/lightspeed_evaluation/core/metrics/ragas.py (1)
  • RagasMetrics (23-263)
src/lightspeed_evaluation/core/output/statistics.py (1)
  • EvaluationScope (11-16)
tests/conftest.py (1)
  • mock_llm_manager (82-101)
src/lightspeed_evaluation/drivers/evaluation.py (2)
  • MetricsManager (77-115)
  • get_supported_frameworks (113-115)
tests/test_config.py (3)
src/lightspeed_evaluation/core/config/loader.py (4)
  • ConfigLoader (193-275)
  • SystemConfig (151-190)
  • setup_environment_variables (30-47)
  • load_system_config (202-259)
src/lightspeed_evaluation/core/config/validator.py (2)
  • DataValidator (11-82)
  • load_evaluation_data (19-37)
src/lightspeed_evaluation/core/config/models.py (4)
  • EvaluationData (47-130)
  • LLMConfig (172-202)
  • TurnData (8-44)
  • from_dict (193-202)
tests/test_cli.py (3)
src/lightspeed_evaluation/core/config/loader.py (1)
  • load_system_config (202-259)
src/lightspeed_evaluation/core/config/validator.py (1)
  • load_evaluation_data (19-37)
src/lightspeed_evaluation/core/output/generator.py (1)
  • generate_reports (32-80)
tests/test_evaluation.py (6)
src/lightspeed_evaluation/core/config/loader.py (3)
  • ConfigLoader (193-275)
  • load_system_config (202-259)
  • get_llm_config_dict (261-275)
src/lightspeed_evaluation/core/config/validator.py (3)
  • DataValidator (11-82)
  • load_evaluation_data (19-37)
  • validate_evaluation_data (39-54)
src/lightspeed_evaluation/drivers/evaluation.py (1)
  • EvaluationDriver (118-322)
src/lightspeed_evaluation/core/output/generator.py (2)
  • OutputHandler (15-244)
  • generate_reports (32-80)
src/lightspeed_evaluation/core/config/models.py (2)
  • EvaluationData (47-130)
  • TurnData (8-44)
tests/conftest.py (1)
  • pytest_configure (256-278)
🪛 LanguageTool
tests/README.md

[grammar] ~23-~23: There might be a mistake here.
Context: ...ading, validation, and environment setup - metrics: Metric evaluation (Ragas, DeepEval, Cu...

(QB_NEW_EN)


[grammar] ~24-~24: There might be a mistake here.
Context: ...ric evaluation (Ragas, DeepEval, Custom) - cli: Command-line interface and argument pa...

(QB_NEW_EN)


[grammar] ~25-~25: There might be a mistake here.
Context: ...mand-line interface and argument parsing - output: Report generation and output handling ...

(QB_NEW_EN)


[grammar] ~29-~29: There might be a mistake here.
Context: ...Fast unit tests with mocked dependencies - integration: Integration tests using real configura...

(QB_NEW_EN)


[grammar] ~30-~30: There might be a mistake here.
Context: ...ion tests using real configuration files - slow: Tests that take longer to run (usually...

(QB_NEW_EN)


[grammar] ~138-~138: There might be a mistake here.
Context: ...g**: Pre-configured SystemConfig object - **sample_llm_config**: Pre-configured LLMConfig object - **s...

(QB_NEW_EN)


[grammar] ~139-~139: There might be a mistake here.
Context: ...nfig**: Pre-configured LLMConfig object - **sample_turn_data**: Sample conversation turn data - **sam...

(QB_NEW_EN)


[grammar] ~140-~140: There might be a mistake here.
Context: ...n_data**: Sample conversation turn data - **sample_evaluation_data`**: Complete evaluation data structure - *...

(QB_NEW_EN)


[grammar] ~141-~141: There might be a mistake here.
Context: ...a**: Complete evaluation data structure - **mock_llm_manager**: Mocked LLM manager for testing - **te...

(QB_NEW_EN)


[grammar] ~142-~142: There might be a mistake here.
Context: ...nager**: Mocked LLM manager for testing - **temp_config_files**: Temporary configuration files - **tem...

(QB_NEW_EN)


[grammar] ~143-~143: There might be a mistake here.
Context: ..._files**: Temporary configuration files - **temp_output_dir`**: Temporary output directory ## Test Co...

(QB_NEW_EN)


[grammar] ~169-~169: There might be a mistake here.
Context: ... data - Use descriptive test names: test_load_system_config_with_valid_file - Group related tests: Use test classes ...

(QB_NEW_EN)


[style] ~234-~234: Consider an alternative adverb to strengthen your wording.
Context: ...encies are mocked - Temporary files are properly cleaned up - Tests are deterministic an...

(PROPERLY_THOROUGHLY)

🪛 markdownlint-cli2 (0.17.2)
tests/README.md

7-7: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (2)
tests/test_metrics.py (2)

147-173: LGTM on Ragas manager wiring and supported metrics.

Constructor patching is correct; validates mapping without importing heavy deps.


252-263: LGTM on DeepEval manager wiring and supported metrics.

Good isolation via patch; covers the registry.

@@ -0,0 +1,41 @@
[tool:pytest]

⚠️ Potential issue

Wrong INI section header — pytest will ignore this file.

In pytest.ini the section must be [pytest], not [tool:pytest].

Apply:

-[tool:pytest]
+[pytest]
🤖 Prompt for AI Agents
In pytest.ini around line 1, the INI section header is incorrect: change the
section header from "[tool:pytest]" to "[pytest]" so pytest will recognize and
load the configuration; update that single line accordingly and keep the rest of
the file intact.

Comment on lines +18 to +26
# Markers
markers =
integration: Integration tests that require real config files
slow: Tests that take a long time to run
unit: Fast unit tests
config: Configuration-related tests
metrics: Metric evaluation tests
output: Output generation tests


⚠️ Potential issue

Unregistered marker used by suite and runner ("cli").

With --strict-markers enabled, -m cli will error unless registered.

Apply:

 markers =
     integration: Integration tests that require real config files
     slow: Tests that take a long time to run
     unit: Fast unit tests
     config: Configuration-related tests
     metrics: Metric evaluation tests
     output: Output generation tests
+    cli: CLI-related tests
🤖 Prompt for AI Agents
In pytest.ini around lines 18 to 26, the test suite uses an unregistered pytest
marker "cli" which causes failures when --strict-markers is enabled; add a new
marker entry for "cli" under the markers list (e.g., add a line like "cli:
Command-line interface tests") so the marker is registered and the -m cli
selection and runner no longer error.

minversion = 6.0

# Test timeout (in seconds)
timeout = 300

💡 Verification agent

🧩 Analysis chain

timeout requires pytest-timeout plugin — verify availability.

If pytest-timeout isn’t installed, this option is ignored.

Run:


🏁 Script executed:

#!/bin/bash
python - <<'PY'
import importlib.util as u
print("pytest-timeout installed:", bool(u.find_spec("pytest_timeout")))
PY

Length of output: 42


Install pytest-timeout plugin – the timeout = 300 setting in pytest.ini (line 31) is ignored without it (pytest-timeout not detected).

🤖 Prompt for AI Agents
In pytest.ini around line 31, the timeout = 300 setting is ignored because the
pytest-timeout plugin is not installed; add pytest-timeout to the project's
test/dev dependencies (e.g. requirements-dev.txt, pyproject.toml/poetry, or
Pipfile), update the lockfile, and run the package install so pytest can detect
the plugin and honor the timeout setting; optionally verify by running pytest -q
to confirm the timeout plugin appears in the collected plugins list.

Comment on lines +282 to +301
def pytest_collection_modifyitems(config, items):
"""Modify test collection to add markers based on test names and locations."""
for item in items:
# Add markers based on test file names
if "test_config" in item.fspath.basename:
item.add_marker(pytest.mark.config)
elif "test_metrics" in item.fspath.basename:
item.add_marker(pytest.mark.metrics)
elif "test_cli" in item.fspath.basename:
item.add_marker(pytest.mark.cli)
elif "test_output" in item.fspath.basename:
item.add_marker(pytest.mark.output)

# Add markers based on test names
if "integration" in item.name:
item.add_marker(pytest.mark.integration)
elif "slow" in item.name:
item.add_marker(pytest.mark.slow)
else:
item.add_marker(pytest.mark.unit)

💡 Verification agent

🧩 Analysis chain

Fix pytest 8 compatibility: use item.path, not deprecated fspath.

item.fspath was removed in pytest 8; use item.path with a fallback. This prevents collection-time AttributeError.

Run this to find other occurrences of fspath:

Apply this diff:

 def pytest_collection_modifyitems(config, items):
     """Modify test collection to add markers based on test names and locations."""
     for item in items:
-        # Add markers based on test file names
-        if "test_config" in item.fspath.basename:
+        # Add markers based on test file names (pytest>=8 uses item.path; fallback to fspath)
+        filename = item.path.name if hasattr(item, "path") else os.path.basename(str(item.fspath))
+        if "test_config" in filename:
             item.add_marker(pytest.mark.config)
-        elif "test_metrics" in item.fspath.basename:
+        elif "test_metrics" in filename:
             item.add_marker(pytest.mark.metrics)
-        elif "test_cli" in item.fspath.basename:
+        elif "test_cli" in filename:
             item.add_marker(pytest.mark.cli)
-        elif "test_output" in item.fspath.basename:
+        elif "test_output" in filename:
             item.add_marker(pytest.mark.output)
         
         # Add markers based on test names
         if "integration" in item.name:
             item.add_marker(pytest.mark.integration)
         elif "slow" in item.name:
             item.add_marker(pytest.mark.slow)
         else:
             item.add_marker(pytest.mark.unit)

🏁 Script executed:

#!/bin/bash
rg -nC1 '\bfspath\b' tests

Length of output: 667


🏁 Script executed:

#!/bin/bash
rg -n "import os" tests/conftest.py
rg -n "import pytest" tests/conftest.py

Length of output: 107


Replace deprecated item.fspath with item.path (with fallback) for pytest 8 compatibility
item.fspath was removed in pytest 8 and will raise an AttributeError; use item.path.name when available, falling back to os.path.basename(str(item.fspath)).

 def pytest_collection_modifyitems(config, items):
     """Modify test collection to add markers based on test names and locations."""
     for item in items:
-        # Add markers based on test file names
-        if "test_config" in item.fspath.basename:
+        # Add markers based on test file names (pytest>=8 uses item.path; fallback to fspath)
+        filename = item.path.name if hasattr(item, "path") else os.path.basename(str(item.fspath))
+        if "test_config" in filename:
             item.add_marker(pytest.mark.config)
-        elif "test_metrics" in item.fspath.basename:
+        elif "test_metrics" in filename:
             item.add_marker(pytest.mark.metrics)
-        elif "test_cli" in item.fspath.basename:
+        elif "test_cli" in filename:
             item.add_marker(pytest.mark.cli)
-        elif "test_output" in item.fspath.basename:
+        elif "test_output" in filename:
             item.add_marker(pytest.mark.output)

No other fspath usages found in tests/.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In tests/conftest.py around lines 282 to 301, replace uses of the removed pytest
8 attribute item.fspath by reading the filename via item.path.name when
available, with a fallback to os.path.basename(str(item.fspath)) to preserve
compatibility; implement this by computing a local variable like filename =
getattr(item, "path", None).name if getattr(item, "path", None) else
os.path.basename(str(item.fspath)) (and add import os at top if missing), then
use that filename variable in the existing filename-based marker checks instead
of item.fspath.basename.

Comment on lines +154 to +162
if args.coverage:
try:
subprocess.run([python_exe, "-m", "pytest_cov", "--version"],
check=True, capture_output=True)
except subprocess.CalledProcessError:
print("Warning: pytest-cov is not installed. Coverage disabled.")
print(f"Install it with: {python_exe} -m pip install pytest-cov")
args.coverage = False


⚠️ Potential issue

pytest-cov detection is incorrect and may disable coverage even when installed.

python -m pytest_cov often fails (module has no main). Check importability instead.

Apply:

-    if args.coverage:
-        try:
-            subprocess.run([python_exe, "-m", "pytest_cov", "--version"], 
-                          check=True, capture_output=True)
-        except subprocess.CalledProcessError:
-            print("Warning: pytest-cov is not installed. Coverage disabled.")
-            print(f"Install it with: {python_exe} -m pip install pytest-cov")
-            args.coverage = False
+    if args.coverage:
+        try:
+            import importlib.util as _ilu  # noqa: F401
+            has_cov = _ilu.find_spec("pytest_cov") is not None
+        except Exception:
+            has_cov = False
+        if not has_cov:
+            print("Warning: pytest-cov is not installed. Coverage disabled.")
+            print(f"Install it with: {python_exe} -m pip install pytest-cov")
+            args.coverage = False
🤖 Prompt for AI Agents
In tests/run_tests.py around lines 154 to 162, the current detection uses
subprocess.run(["python", "-m", "pytest_cov", "--version"]) which incorrectly
fails for packages that don't provide a __main__; replace this with an import
check: attempt to import pytest_cov via importlib.import_module("pytest_cov")
(or a simple import) inside a try/except ImportError and only disable coverage
and print the install instructions if ImportError is raised, leaving
args.coverage True when import succeeds; ensure other exceptions are not
silently treated as "not installed" (only catch ImportError).

Comment on lines +327 to +334
def side_effect(metric_name, conv_data, scope):
if conv_data.conversation_group_id == "high_quality_conv":
return (0.92, "High quality response with good faithfulness")
else:
return (0.45, "Low quality response, lacks detail")

mock_ragas.side_effect = side_effect


🛠️ Refactor suggestion

Make mock side_effect signature robust.

Don’t assume the exact signature of RagasMetrics.evaluate. Accept *args, **kwargs and derive the conversation id from the request to prevent breakage if internals change.

-            def side_effect(metric_name, conv_data, scope):
-                if conv_data.conversation_group_id == "high_quality_conv":
-                    return (0.92, "High quality response with good faithfulness")
-                else:
-                    return (0.45, "Low quality response, lacks detail")
+            def side_effect(*args, **kwargs):
+                # request is expected to be the last positional argument
+                request = args[-1]
+                conv_id = getattr(request.conv_data, "conversation_group_id", None)
+                return (
+                    (0.92, "High quality response with good faithfulness")
+                    if conv_id == "high_quality_conv"
+                    else (0.45, "Low quality response, lacks detail")
+                )
🤖 Prompt for AI Agents
In tests/test_evaluation.py around lines 327 to 334, the mock side_effect
currently assumes a fixed signature and direct access to
conv_data.conversation_group_id; change the side_effect to accept *args,
**kwargs, extract the request/conv_data robustly (e.g. look for a positional arg
that has conversation_group_id attribute or a kwargs key like 'conv_data' or
'request'), read the conversation id defensively (use getattr or dict.get with a
default), then return the same tuples based on that id; ensure the function
handles missing/None values gracefully so future signature changes won't break
the test.

Collaborator

@asamal4 asamal4 left a comment

Thank you for adding test cases.
These are well suited as integration tests (and definitely relevant), but I would like us to also add unit tests, adding test cases at the per-file level as much as possible.

@bsatapat-jpg bsatapat-jpg deleted the dev branch September 2, 2025 14:18