add support for fail_on_invalid_data option #94
Conversation
Walkthrough
Adds config flag `fail_on_invalid_data`.
Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Config as Config
    participant Validator as DataValidator
    participant Data as Turn/Eval Data
    participant Processor as ConversationProcessor
    participant Evaluator as MetricsEvaluator
    Config->>Validator: __init__(api_enabled, fail_on_invalid_data)
    Validator->>Validator: validate(data)
    alt fail_on_invalid_data = true
        Validator-->>Config: return False on validation error
        Note over Validator: validation fails (hard)
    else fail_on_invalid_data = false
        Validator->>Data: add_invalid_metric(metric)
        Validator->>Validator: log notice, continue
    end
    Processor->>Data: is_metric_invalid(metric)?
    alt metric invalid
        Processor->>Processor: log error, emit ERROR result, skip metric
    else metric valid
        Processor->>Evaluator: evaluate(EvaluationRequest)
        Evaluator-->>Processor: result / None
    end
```
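To make the diagram concrete, here is a minimal Python sketch of the two halves of that flow. It is illustrative only: the class name, the `_metric_is_known` check, and the tuple-shaped results are hypothetical stand-ins rather than the project's actual DataValidator/ConversationProcessor code; only the `fail_on_invalid_data` flag and the `add_invalid_metric`/`is_metric_invalid` hooks come from this PR.

```python
import logging

logger = logging.getLogger(__name__)


class ValidatorSketch:
    """Hypothetical, simplified stand-in for the real DataValidator."""

    def __init__(self, api_enabled: bool, fail_on_invalid_data: bool = True):
        self.api_enabled = api_enabled
        self.fail_on_invalid_data = fail_on_invalid_data

    def validate(self, conversations) -> bool:
        for conv in conversations:
            for turn in conv.turns:
                for metric in turn.turn_metrics or []:
                    if self._metric_is_known(metric):
                        continue
                    if self.fail_on_invalid_data:
                        # Hard mode: the whole run is rejected.
                        logger.error("Invalid metric %r - validation fails", metric)
                        return False
                    # Soft mode: remember the bad metric on the turn and keep going.
                    turn.add_invalid_metric(metric)
                    logger.warning("Invalid metric %r - will surface as ERROR", metric)
        return True

    @staticmethod
    def _metric_is_known(metric: str) -> bool:
        # Placeholder check; the real validator applies richer rules.
        return ":" in metric


def evaluate_turn_metrics(turn, metrics, evaluator):
    """Processor-side sketch: emit ERROR for invalid metrics, evaluate the rest."""
    results = []
    for metric in metrics:
        if turn.is_metric_invalid(metric):
            logger.error("Invalid turn metric %r - emitting ERROR result", metric)
            results.append((metric, "ERROR"))
            continue
        results.append((metric, evaluator.evaluate_metric(metric, turn)))
    return results
```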
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Branch force-pushed from adef500 to 7eb05b7 (Compare).
Thanks @VladimirKadlec for working on this 👍 I agree with comments from Asutosh.
Branch force-pushed from 3a498c2 to 059c7c9 (Compare).
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/lightspeed_evaluation/core/models/system.py (1)
265-268: Improve field description for clarity. The description has a minor grammatical issue (missing comma) and could be more explicit about the behavior in both states.
Consider this improved description:
```diff
 fail_on_invalid_data: bool = Field(
     default=True,
-    description="If False don't fail on invalid conversations",
+    description="When True, raise errors on validation failures; when False, log warnings and continue processing",
 )
```
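For reference, here is a self-contained sketch of how such a field behaves in a Pydantic model. The `ExampleValidationSettings` class name is hypothetical; the real field lives in src/lightspeed_evaluation/core/models/system.py.

```python
from pydantic import BaseModel, Field


class ExampleValidationSettings(BaseModel):
    """Hypothetical container used only to illustrate the flag's default and override."""

    fail_on_invalid_data: bool = Field(
        default=True,
        description=(
            "When True, raise errors on validation failures; "
            "when False, log warnings and continue processing"
        ),
    )


# Default: hard failure on invalid data.
print(ExampleValidationSettings().fail_on_invalid_data)  # True

# Overridden, e.g. from a parsed system.yaml section.
print(ExampleValidationSettings(fail_on_invalid_data=False).fail_on_invalid_data)  # False
```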
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- README.md (1 hunks)
- config/system.yaml (2 hunks)
- src/lightspeed_evaluation/core/models/data.py (2 hunks)
- src/lightspeed_evaluation/core/models/system.py (1 hunks)
- src/lightspeed_evaluation/core/system/validator.py (6 hunks)
- src/lightspeed_evaluation/pipeline/evaluation/pipeline.py (1 hunks)
- src/lightspeed_evaluation/pipeline/evaluation/processor.py (2 hunks)
- src/lightspeed_evaluation/runner/evaluation.py (1 hunks)
- tests/unit/pipeline/evaluation/test_processor.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- README.md
- config/system.yaml
- src/lightspeed_evaluation/runner/evaluation.py
- src/lightspeed_evaluation/core/system/validator.py
- src/lightspeed_evaluation/core/models/data.py
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/core/system/validator.py:146-155
Timestamp: 2025-09-18T23:59:37.026Z
Learning: In the lightspeed-evaluation project, the DataValidator in `src/lightspeed_evaluation/core/system/validator.py` is intentionally designed to validate only explicitly provided user evaluation data, not resolved metrics that include system defaults. When turn_metrics is None, the system falls back to system config defaults, and this validation separation is by design.
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/pipeline/evaluation/errors.py:33-36
Timestamp: 2025-09-19T00:37:23.798Z
Learning: In the lightspeed-evaluation codebase, metric resolution (including applying defaults when turn_metrics is None) happens upstream in ConversationProcessor.process_conversation() using MetricManager.resolve_metrics(), not in the EvaluationErrorHandler. The error handler only marks explicitly defined metrics as ERROR.
📚 Learning: 2025-09-18T23:59:37.026Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/core/system/validator.py:146-155
Timestamp: 2025-09-18T23:59:37.026Z
Learning: In the lightspeed-evaluation project, the DataValidator in `src/lightspeed_evaluation/core/system/validator.py` is intentionally designed to validate only explicitly provided user evaluation data, not resolved metrics that include system defaults. When turn_metrics is None, the system falls back to system config defaults, and this validation separation is by design.
Applied to files:
- src/lightspeed_evaluation/pipeline/evaluation/pipeline.py
- src/lightspeed_evaluation/core/models/system.py
- src/lightspeed_evaluation/pipeline/evaluation/processor.py
- tests/unit/pipeline/evaluation/test_processor.py
📚 Learning: 2025-09-10T15:48:14.671Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 47
File: src/lightspeed_evaluation/core/output/generator.py:43-49
Timestamp: 2025-09-10T15:48:14.671Z
Learning: In the lightspeed-evaluation framework, system configuration uses Pydantic data models (SystemConfig, OutputConfig, LoggingConfig, etc.) rather than plain dictionaries. Components like OutputHandler receive properly structured Pydantic models, so direct attribute access (e.g., system_config.output.enabled_outputs) is the correct approach.
Applied to files:
src/lightspeed_evaluation/pipeline/evaluation/pipeline.py
📚 Learning: 2025-09-19T00:37:23.798Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/pipeline/evaluation/errors.py:33-36
Timestamp: 2025-09-19T00:37:23.798Z
Learning: In the lightspeed-evaluation codebase, metric resolution (including applying defaults when turn_metrics is None) happens upstream in ConversationProcessor.process_conversation() using MetricManager.resolve_metrics(), not in the EvaluationErrorHandler. The error handler only marks explicitly defined metrics as ERROR.
Applied to files:
src/lightspeed_evaluation/pipeline/evaluation/processor.py
📚 Learning: 2025-09-19T12:32:06.403Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/pipeline/evaluation/errors.py:18-31
Timestamp: 2025-09-19T12:32:06.403Z
Learning: When analyzing method calls, always examine the complete call site including all parameters before suggesting fixes. In the lightspeed-evaluation codebase, mark_all_metrics_as_error in processor.py correctly passes both resolved_turn_metrics and resolved_conversation_metrics parameters.
Applied to files:
- src/lightspeed_evaluation/pipeline/evaluation/processor.py
- tests/unit/pipeline/evaluation/test_processor.py
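As a side note on the resolution behavior these learnings describe, the None-falls-back-to-defaults rule can be pictured as a one-line helper. This is a hypothetical sketch, not the actual MetricManager.resolve_metrics implementation:

```python
from typing import Optional


def resolve_metrics(turn_metrics: Optional[list[str]], system_defaults: list[str]) -> list[str]:
    # None means the user did not specify metrics for the turn,
    # so the system config defaults apply.
    return system_defaults if turn_metrics is None else turn_metrics


# Example: an unspecified turn picks up the defaults.
assert resolve_metrics(None, ["ragas:faithfulness"]) == ["ragas:faithfulness"]
```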
🧬 Code graph analysis (2)
src/lightspeed_evaluation/pipeline/evaluation/pipeline.py (1)
src/lightspeed_evaluation/core/system/validator.py (1)
DataValidator(79-331)
src/lightspeed_evaluation/pipeline/evaluation/processor.py (1)
src/lightspeed_evaluation/core/models/data.py (2)
is_metric_invalid(95-97), is_metric_invalid(355-357)
🔇 Additional comments (2)
src/lightspeed_evaluation/pipeline/evaluation/pipeline.py (1)
65-68: LGTM! The DataValidator initialization correctly passes the `fail_on_invalid_data` configuration parameter, aligning with the PR objective to add configurable validation behavior.
tests/unit/pipeline/evaluation/test_processor.py (1)
1-394: Excellent test coverage! The test suite comprehensively covers the new invalid metric handling functionality with well-structured test cases including:
- Valid metrics scenarios
- Invalid metric detection and skipping
- Mixed valid/invalid metrics
- Edge cases (empty metrics, evaluator returning None)
- Verification of correct request construction and metric ordering
The tests properly use fixtures, mocks, and log capture to validate behavior.
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- src/lightspeed_evaluation/pipeline/evaluation/processor.py (2 hunks)
- tests/unit/pipeline/evaluation/test_processor.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/lightspeed_evaluation/pipeline/evaluation/processor.py
🧰 Additional context used
📓 Path-based instructions (1)
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest with pytest-mock (mocker fixture), not unittest.mock, for all mocking
Test files should use naming convention `test_*.py` for files, `test_*` for functions, and `Test*` for classes
Files:
tests/unit/pipeline/evaluation/test_processor.py
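To illustrate those guidelines, here is a minimal hypothetical test using the pytest-mock `mocker` fixture instead of unittest.mock (requires pytest and pytest-mock; the test body is not taken from this PR):

```python
# Naming follows the convention: test_*.py file, Test* class, test_* function.
class TestEvaluatorInvocation:
    def test_evaluate_metric_is_called_once(self, mocker):
        # Build a stand-in evaluator with pytest-mock's mocker fixture.
        evaluator = mocker.Mock()
        evaluator.evaluate_metric.return_value = "PASS"

        result = evaluator.evaluate_metric("ragas:faithfulness")

        assert result == "PASS"
        evaluator.evaluate_metric.assert_called_once_with("ragas:faithfulness")
```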
🧠 Learnings (3)
📓 Common learnings
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/core/system/validator.py:146-155
Timestamp: 2025-09-18T23:59:37.026Z
Learning: In the lightspeed-evaluation project, the DataValidator in `src/lightspeed_evaluation/core/system/validator.py` is intentionally designed to validate only explicitly provided user evaluation data, not resolved metrics that include system defaults. When turn_metrics is None, the system falls back to system config defaults, and this validation separation is by design.
📚 Learning: 2025-09-19T12:32:06.403Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/pipeline/evaluation/errors.py:18-31
Timestamp: 2025-09-19T12:32:06.403Z
Learning: When analyzing method calls, always examine the complete call site including all parameters before suggesting fixes. In the lightspeed-evaluation codebase, mark_all_metrics_as_error in processor.py correctly passes both resolved_turn_metrics and resolved_conversation_metrics parameters.
Applied to files:
tests/unit/pipeline/evaluation/test_processor.py
📚 Learning: 2025-09-18T23:59:37.026Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/core/system/validator.py:146-155
Timestamp: 2025-09-18T23:59:37.026Z
Learning: In the lightspeed-evaluation project, the DataValidator in `src/lightspeed_evaluation/core/system/validator.py` is intentionally designed to validate only explicitly provided user evaluation data, not resolved metrics that include system defaults. When turn_metrics is None, the system falls back to system config defaults, and this validation separation is by design.
Applied to files:
tests/unit/pipeline/evaluation/test_processor.py
🧬 Code graph analysis (1)
tests/unit/pipeline/evaluation/test_processor.py (8)
src/lightspeed_evaluation/core/models/data.py (8)
EvaluationData(309-367), EvaluationRequest(426-479), EvaluationResult(370-409), TurnData(35-306), add_invalid_metric(91-93), add_invalid_metric(351-353), is_metric_invalid(95-97), is_metric_invalid(355-357)
src/lightspeed_evaluation/core/models/system.py (1)
SystemConfig(271-301)
src/lightspeed_evaluation/core/system/loader.py (1)
ConfigLoader(70-125)
src/lightspeed_evaluation/pipeline/evaluation/processor.py (3)
ConversationProcessor(37-299), ProcessorComponents(27-34), _evaluate_turn(181-214)
src/lightspeed_evaluation/pipeline/evaluation/evaluator.py (1)
MetricsEvaluator(25-172)
src/lightspeed_evaluation/pipeline/evaluation/amender.py (1)
APIDataAmender(13-80)
src/lightspeed_evaluation/pipeline/evaluation/errors.py (1)
EvaluationErrorHandler(10-201)
src/lightspeed_evaluation/core/metrics/manager.py (1)
MetricManager(17-166)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: tests (3.11)
- GitHub Check: tests (3.13)
- GitHub Check: tests (3.12)
- GitHub Check: mypy
🔇 Additional comments (11)
tests/unit/pipeline/evaluation/test_processor.py (11)
1-19: LGTM! Imports are well-organized and appropriate for the test module. Correctly uses pytest as required by the coding guidelines.
21-121: LGTM! Fixtures are well-structured and follow pytest best practices. Correctly uses the `mocker` fixture from pytest-mock as required by coding guidelines. Good use of fixture composition for assembling the processor with all its dependencies.
126-150: LGTM! Comprehensive test of the happy path with valid metrics. Correctly verifies result count, types, evaluator invocation count, and metric identifier ordering.
152-184: LGTM! Well-structured test for invalid metric handling. Correctly verifies that invalid metrics produce ERROR results without invoking the evaluator, while valid metrics are still evaluated. Good use of `caplog` to verify error logging.
218-254: LGTM! Excellent test for mixed valid/invalid metrics. Correctly verifies that result order is preserved, invalid metrics produce ERROR results at the correct position, and only valid metrics trigger evaluator calls.
256-273: LGTM! Good edge case test for empty metrics list. Correctly verifies no results and no evaluator invocation.
275-299: LGTM! Thorough verification of EvaluationRequest construction. Correctly checks that all key fields (conv_data, metric_identifier, turn_id, turn_idx) are properly populated from the input data.
301-324: LGTM! Good test for handling evaluator returning None. Correctly reconfigures the mock and verifies graceful handling with empty results while confirming the evaluator is still invoked.
326-347: LGTM! Good test for turn index handling in multi-turn conversations. Correctly verifies that turn_idx and turn_id are properly set when evaluating a non-first turn.
349-374: LGTM! Excellent test for metric evaluation order preservation. Correctly verifies that metrics are evaluated in the order provided, which is important for consistent behavior.
376-401: LGTM! Comprehensive test for TurnData's invalid metric tracking. While this tests the data model directly rather than the processor, it's appropriately placed here since this functionality is central to the processor's behavior. Good coverage of initial state, additions, and idempotent behavior.
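For context on that last test, idempotent invalid-metric tracking can be implemented with a set. The following is a hedged sketch, not the repository's actual TurnData code; only the add_invalid_metric/is_metric_invalid names come from the PR.

```python
from dataclasses import dataclass, field


@dataclass
class TurnDataSketch:
    """Hypothetical simplification of TurnData's invalid-metric bookkeeping."""

    invalid_metrics: set[str] = field(default_factory=set)

    def add_invalid_metric(self, metric: str) -> None:
        # Adding the same metric twice is a no-op, which gives the
        # idempotent behavior the test above checks.
        self.invalid_metrics.add(metric)

    def is_metric_invalid(self, metric: str) -> bool:
        return metric in self.invalid_metrics


turn = TurnDataSketch()
assert not turn.is_metric_invalid("ragas:faithfulness")
turn.add_invalid_metric("ragas:faithfulness")
turn.add_invalid_metric("ragas:faithfulness")  # idempotent
assert turn.is_metric_invalid("ragas:faithfulness")
```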
```python
    def test_evaluate_turn_with_all_invalid_metrics(
        self, processor, mock_metrics_evaluator, caplog
    ):
        """Test _evaluate_turn with all metrics invalid - should return empty results."""
        turn_data = TurnData(
            turn_id="1",
            query="What is Python?",
            response="Python is a programming language.",
            contexts=["Context"],
        )
        conv_data = EvaluationData(conversation_group_id="test_conv", turns=[turn_data])

        # Mark all metrics as invalid
        turn_data.add_invalid_metric("ragas:faithfulness")
        turn_data.add_invalid_metric("custom:answer_correctness")

        turn_metrics = ["ragas:faithfulness", "custom:answer_correctness"]

        with caplog.at_level(logging.ERROR):
            results = processor._evaluate_turn(conv_data, 0, turn_data, turn_metrics)

        assert len(results) == 2
        assert results[0].result == "ERROR"
        assert results[1].result == "ERROR"

        # Verify evaluate_metric was never called
        assert mock_metrics_evaluator.evaluate_metric.call_count == 0

        # Verify errors were logged for both invalid metrics
        assert "Invalid turn metric 'ragas:faithfulness'" in caplog.text
        assert "Invalid turn metric 'custom:answer_correctness'" in caplog.text
```
Fix misleading docstring.
The docstring states "should return empty results" but the test correctly expects 2 ERROR results (one for each invalid metric). The test logic is correct—invalid metrics produce ERROR results rather than being omitted entirely.
Apply this diff to correct the docstring:
- """Test _evaluate_turn with all metrics invalid - should return empty results."""
+ """Test _evaluate_turn with all metrics invalid - should return ERROR results."""📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
    def test_evaluate_turn_with_all_invalid_metrics(
        self, processor, mock_metrics_evaluator, caplog
    ):
        """Test _evaluate_turn with all metrics invalid - should return ERROR results."""
        turn_data = TurnData(
            turn_id="1",
            query="What is Python?",
            response="Python is a programming language.",
            contexts=["Context"],
        )
        conv_data = EvaluationData(conversation_group_id="test_conv", turns=[turn_data])

        # Mark all metrics as invalid
        turn_data.add_invalid_metric("ragas:faithfulness")
        turn_data.add_invalid_metric("custom:answer_correctness")

        turn_metrics = ["ragas:faithfulness", "custom:answer_correctness"]

        with caplog.at_level(logging.ERROR):
            results = processor._evaluate_turn(conv_data, 0, turn_data, turn_metrics)

        assert len(results) == 2
        assert results[0].result == "ERROR"
        assert results[1].result == "ERROR"

        # Verify evaluate_metric was never called
        assert mock_metrics_evaluator.evaluate_metric.call_count == 0

        # Verify errors were logged for both invalid metrics
        assert "Invalid turn metric 'ragas:faithfulness'" in caplog.text
        assert "Invalid turn metric 'custom:answer_correctness'" in caplog.text
```
🤖 Prompt for AI Agents
tests/unit/pipeline/evaluation/test_processor.py lines 186-216: the test
docstring is misleading — it says "should return empty results" but the test
asserts two ERROR results for the two invalid metrics; update the docstring to
accurately state that invalid metrics produce ERROR results (e.g., "Test
_evaluate_turn with all invalid metrics - should return ERROR results for each
invalid metric.") and ensure wording matches the assertions.
Now the changes seem to be aligned. Thank you.
LGTM
There is a minor docstring issue though (already highlighted by CodeRabbit).
LCORE-647
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Documentation