
Conversation


@asamal4 asamal4 commented Oct 7, 2025

Create a common module for custom llm to have an abstract layer for litellm.
This can be anything in future (so no need to mention litellm everywhere)

Summary by CodeRabbit

  • New Features

    • Public base LLM client exposed, providing a unified call interface for custom LLM providers.
  • Refactor

    • Replaced provider-specific parameters with generic llm_params across managers and metrics.
    • Renamed public API get_litellm_params → get_llm_params.
  • Improvements

    • Unified call flow for custom LLMs, consistent temperature/timeout/max_tokens handling, and standardized error handling.
  • Tests

    • Unit tests updated to reflect the renamed API.


coderabbitai bot commented Oct 7, 2025

Walkthrough

Adds a new BaseCustomLLM class and exposes it publicly; generalizes parameter names from litellm_params → llm_params across managers and metrics; refactors Ragas and metric flows to use BaseCustomLLM.call; renames LLMManager.get_litellm_params → get_llm_params; updates tests accordingly.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Public export update | src/lightspeed_evaluation/core/llm/__init__.py | Imports and exposes BaseCustomLLM via __all__. |
| New base LLM client | src/lightspeed_evaluation/core/llm/custom.py | Adds BaseCustomLLM with __init__ and call(...) that builds completion params, invokes litellm.completion, aggregates choice contents, and raises LLMError on failure or an empty single result. |
| LLM manager API rename & tests | src/lightspeed_evaluation/core/llm/manager.py, tests/unit/core/llm/test_manager.py | Renames get_litellm_params → get_llm_params; docstrings updated; returned dict now includes model, temperature, max_tokens, timeout, num_retries. Test updated to assert the new method and keys. |
| DeepEval manager param rename | src/lightspeed_evaluation/core/llm/deepeval.py, src/lightspeed_evaluation/core/metrics/deepeval.py | Constructor and usages switched from litellm_params → llm_params; model info now reads from llm_params. |
| Ragas LLM refactor & manager | src/lightspeed_evaluation/core/llm/ragas.py, src/lightspeed_evaluation/core/metrics/ragas.py | Adds RagasCustomLLM(BaseRagasLLM, BaseCustomLLM) and uses self.call(...) for generation; sources temperature from llm_params; errors raise LLMError; adds is_finished() (returns True); manager ctor accepts llm_params. |
| Custom metrics refactor | src/lightspeed_evaluation/core/metrics/custom.py | Replaces direct litellm usage with BaseCustomLLM.call; _call_llm signature simplified to _call_llm(prompt); _evaluate_answer_correctness catches LLMError and returns a normalized error string. |
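The shape described in the walkthrough can be condensed into a sketch. This is not the PR's actual code: the completion function is injected here so the example stays self-contained (the real class in src/lightspeed_evaluation/core/llm/custom.py calls litellm.completion directly), and the parameter names mirror the get_llm_params() keys listed above (model, temperature, max_tokens, timeout, num_retries).

```python
from typing import Any, Callable, Optional, Union


class LLMError(Exception):
    """Stands in for core.system.exceptions.LLMError from the PR."""


class BaseCustomLLM:
    """Sketch of the shared client: build params, call completion, aggregate choices."""

    def __init__(self, model_name: str, llm_params: dict, completion_fn: Callable[..., Any]):
        self.model_name = model_name
        self.llm_params = llm_params
        self._completion = completion_fn  # litellm.completion in the real module

    def call(
        self,
        prompt: str,
        n: int = 1,
        temperature: Optional[float] = None,
        return_single: bool = True,
    ) -> Union[str, list]:
        """Build completion params, invoke the backend, and aggregate choice contents."""
        params: dict[str, Any] = {
            "model": self.model_name,
            "messages": [{"role": "user", "content": prompt}],
            "n": n,
            "temperature": (
                temperature if temperature is not None else self.llm_params.get("temperature")
            ),
        }
        # Forward optional keys only when they are actually set.
        for key in ("max_tokens", "timeout", "num_retries"):
            if self.llm_params.get(key) is not None:
                params[key] = self.llm_params[key]
        try:
            response = self._completion(**params)
            texts = [choice["message"]["content"] for choice in response["choices"]]
        except Exception as exc:
            raise LLMError(f"LLM call failed: {exc}") from exc
        if return_single and n == 1:
            if not texts or not texts[0]:
                raise LLMError("LLM returned an empty response")
            return texts[0]
        return texts
```

With the backend injected, both the Ragas bridge and the custom metrics can share the same call path, which is the deduplication the PR description points at.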

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Evaluator
  participant Metrics as Metrics/Manager
  participant Mgr as LLMManager
  participant Custom as BaseCustomLLM
  participant Lite as litellm

  Evaluator->>Metrics: evaluate(...)
  Metrics->>Mgr: get_model_name(), get_llm_params()
  Mgr-->>Metrics: model_name, llm_params
  Metrics->>Custom: instantiate(model_name, llm_params)
  Metrics->>Custom: call(prompt, n=1, temperature=?, return_single=True)
  Custom->>Lite: completion(model, messages, temperature, n, max_tokens, timeout, num_retries, ...)
  Lite-->>Custom: response(choices[*].message.content)
  Custom-->>Metrics: "text" or ["texts"]
  note right of Custom: on error -> raise LLMError
sequenceDiagram
  autonumber
  actor RagasFlow
  participant RagasMgr
  participant RagasCustom
  participant BaseCustom
  participant Lite

  RagasFlow->>RagasMgr: generate_text(prompt, n, temperature)
  RagasMgr->>RagasCustom: generate_text(...)
  RagasCustom->>BaseCustom: call(prompt, n, temperature, return_single=False)
  BaseCustom->>Lite: completion(...)
  Lite-->>BaseCustom: response
  BaseCustom-->>RagasCustom: list[string] texts
  RagasCustom-->>RagasMgr: LLMResult (Generation objects)
  note right of RagasCustom: is_finished(response) -> True

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • VladimirKadlec
  • tisnik

Poem

hop hop, I debug and sing,
a new base call on soft spring.
llm_params lead the trail,
choices trimmed, responses sail.
metrics clap — the rabbit's tale 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title succinctly refers to the core change of adding a shared custom LLM component, which matches the PR's objective of introducing a generic abstraction layer for LLM integrations. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1aab90d and 0871015.

📒 Files selected for processing (7)
  • src/lightspeed_evaluation/core/llm/deepeval.py (2 hunks)
  • src/lightspeed_evaluation/core/llm/manager.py (3 hunks)
  • src/lightspeed_evaluation/core/llm/ragas.py (4 hunks)
  • src/lightspeed_evaluation/core/metrics/custom.py (4 hunks)
  • src/lightspeed_evaluation/core/metrics/deepeval.py (1 hunks)
  • src/lightspeed_evaluation/core/metrics/ragas.py (1 hunks)
  • tests/unit/core/llm/test_manager.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/lightspeed_evaluation/core/metrics/ragas.py
  • src/lightspeed_evaluation/core/metrics/deepeval.py
🧰 Additional context used
🧬 Code graph analysis (4)
src/lightspeed_evaluation/core/metrics/custom.py (3)
src/lightspeed_evaluation/core/llm/custom.py (2)
  • BaseCustomLLM (10-75)
  • call (18-75)
src/lightspeed_evaluation/core/llm/manager.py (1)
  • get_llm_params (95-103)
src/lightspeed_evaluation/core/system/exceptions.py (1)
  • LLMError (24-25)
src/lightspeed_evaluation/core/llm/ragas.py (2)
src/lightspeed_evaluation/core/llm/custom.py (2)
  • BaseCustomLLM (10-75)
  • call (18-75)
src/lightspeed_evaluation/core/system/exceptions.py (1)
  • LLMError (24-25)
src/lightspeed_evaluation/core/llm/deepeval.py (1)
src/lightspeed_evaluation/core/llm/ragas.py (2)
  • get_llm (99-101)
  • get_model_info (103-108)
tests/unit/core/llm/test_manager.py (1)
src/lightspeed_evaluation/core/llm/manager.py (1)
  • get_llm_params (95-103)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: mypy
🔇 Additional comments (5)
tests/unit/core/llm/test_manager.py (1)

69-91: Test exercises new parameter surface

Great coverage to lock in the generalized get_llm_params() payload—matches the manager contract exactly.

src/lightspeed_evaluation/core/llm/deepeval.py (1)

14-42: DeepEval manager wiring stays aligned

Passing the normalized llm_params straight into LiteLLMModel keeps temperature, token, timeout, and retry behavior in sync with the core manager—a clean removal of the LiteLLM-specific naming.

src/lightspeed_evaluation/core/metrics/custom.py (1)

39-72: BaseCustomLLM integration looks solid

Love the move to instantiate BaseCustomLLM once and reuse its call()—the normalized return plus graceful fallback when a list slips through keeps callers happy. The extra LLMError guard downstream should make diagnostics cleaner.

src/lightspeed_evaluation/core/llm/manager.py (1)

16-103: Renamed accessor returns richer config

The generalized get_llm_params() surface—including retries and timeout—fits the new abstraction nicely, and the docstring updates keep the intent clear.

src/lightspeed_evaluation/core/llm/ragas.py (1)

40-47: Stop handling preserved in Ragas bridge

Forwarding stop through call_kwargs while leaning on BaseCustomLLM keeps the Ragas integration feature-complete and avoids the regression we worried about earlier.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
src/lightspeed_evaluation/core/llm/custom.py (1)

56-75: Consider more specific exception handling.

The broad except Exception at Line 74 catches all exceptions, including KeyboardInterrupt and SystemExit. While this is wrapped in LLMError, it could mask critical system-level issues during debugging. Consider catching more specific exceptions that litellm might raise (e.g., litellm.exceptions.APIError, litellm.exceptions.Timeout).

If litellm provides specific exception types, consider refactoring:

         try:
             response = litellm.completion(**call_params)
             # ... response processing ...
-        except Exception as e:
-            raise LLMError(f"LLM call failed: {str(e)}") from e
+        except (litellm.APIError, litellm.Timeout, ValueError) as e:
+            raise LLMError(f"LLM call failed: {str(e)}") from e
+        except Exception as e:
+            # Log unexpected exceptions for debugging
+            raise LLMError(f"Unexpected LLM error: {str(e)}") from e
src/lightspeed_evaluation/core/metrics/custom.py (1)

67-72: Simplify unnecessary list handling.

At Line 70-71, there's a check for isinstance(result, list) even though return_single=True is passed to self.llm.call(). According to the BaseCustomLLM.call() implementation, when return_single=True and n=1, it always returns a single string, never a list.

Simplify the method:

     def _call_llm(self, prompt: str) -> str:
         """Make an LLM call with the configured parameters."""
-        result = self.llm.call(prompt, return_single=True)
-        if isinstance(result, list):
-            return result[0] if result else ""
-        return result
+        return self.llm.call(prompt, return_single=True)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 94c8236 and a028301.

📒 Files selected for processing (9)
  • src/lightspeed_evaluation/core/llm/__init__.py (2 hunks)
  • src/lightspeed_evaluation/core/llm/custom.py (1 hunks)
  • src/lightspeed_evaluation/core/llm/deepeval.py (2 hunks)
  • src/lightspeed_evaluation/core/llm/manager.py (3 hunks)
  • src/lightspeed_evaluation/core/llm/ragas.py (4 hunks)
  • src/lightspeed_evaluation/core/metrics/custom.py (4 hunks)
  • src/lightspeed_evaluation/core/metrics/deepeval.py (1 hunks)
  • src/lightspeed_evaluation/core/metrics/ragas.py (1 hunks)
  • tests/unit/core/llm/test_manager.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (8)
src/lightspeed_evaluation/core/metrics/ragas.py (1)
src/lightspeed_evaluation/core/llm/manager.py (2)
  • get_model_name (91-93)
  • get_llm_params (95-103)
src/lightspeed_evaluation/core/llm/__init__.py (1)
src/lightspeed_evaluation/core/llm/custom.py (1)
  • BaseCustomLLM (10-75)
src/lightspeed_evaluation/core/llm/deepeval.py (1)
src/lightspeed_evaluation/core/llm/ragas.py (2)
  • get_llm (95-97)
  • get_model_info (99-104)
tests/unit/core/llm/test_manager.py (1)
src/lightspeed_evaluation/core/llm/manager.py (1)
  • get_llm_params (95-103)
src/lightspeed_evaluation/core/llm/custom.py (1)
src/lightspeed_evaluation/core/system/exceptions.py (1)
  • LLMError (24-25)
src/lightspeed_evaluation/core/metrics/deepeval.py (1)
src/lightspeed_evaluation/core/llm/manager.py (2)
  • get_model_name (91-93)
  • get_llm_params (95-103)
src/lightspeed_evaluation/core/llm/ragas.py (2)
src/lightspeed_evaluation/core/llm/custom.py (2)
  • BaseCustomLLM (10-75)
  • call (18-75)
src/lightspeed_evaluation/core/system/exceptions.py (1)
  • LLMError (24-25)
src/lightspeed_evaluation/core/metrics/custom.py (3)
src/lightspeed_evaluation/core/llm/custom.py (2)
  • BaseCustomLLM (10-75)
  • call (18-75)
src/lightspeed_evaluation/core/llm/manager.py (2)
  • get_model_name (91-93)
  • get_llm_params (95-103)
src/lightspeed_evaluation/core/system/exceptions.py (1)
  • LLMError (24-25)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: mypy
🔇 Additional comments (8)
src/lightspeed_evaluation/core/metrics/deepeval.py (1)

29-29: LGTM!

The parameter retrieval correctly uses the renamed get_llm_params() API, aligning with the generalized LLM parameter handling introduced in this PR.

src/lightspeed_evaluation/core/metrics/ragas.py (1)

39-39: LGTM!

The parameter retrieval correctly uses the renamed get_llm_params() API, consistent with the broader refactor.

src/lightspeed_evaluation/core/llm/manager.py (1)

95-103: LGTM!

The method rename from get_litellm_params to get_llm_params correctly generalizes the API, making it provider-agnostic while maintaining the same return structure and parameter semantics.

tests/unit/core/llm/test_manager.py (1)

69-91: LGTM!

The test correctly validates the renamed get_llm_params() method and ensures all expected parameter keys and values are returned.

src/lightspeed_evaluation/core/llm/deepeval.py (1)

14-26: LGTM!

The parameter handling correctly transitions from litellm_params to llm_params, maintaining consistency with the broader refactor while preserving the same initialization logic.

src/lightspeed_evaluation/core/metrics/custom.py (2)

39-41: LGTM!

The initialization correctly uses BaseCustomLLM with model name and LLM parameters from the manager, properly abstracting the LLM layer.


218-230: LGTM!

The error handling correctly catches LLMError and returns an informative message, improving robustness when LLM calls fail.

src/lightspeed_evaluation/core/llm/custom.py (1)

44-53: Avoid passing None for optional parameters
Ensure max_tokens and timeout aren’t set to None in call_params—either omit keys when their values are None or supply explicit defaults—so you don’t inadvertently pass None into litellm.completion.
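One way to implement that suggestion (a hypothetical helper, not code from the PR): assemble the kwargs for litellm.completion so that optional keys are included only when they carry a value.

```python
from typing import Any, Optional


def build_call_params(
    model: str,
    messages: list,
    temperature: float,
    n: int = 1,
    max_tokens: Optional[int] = None,
    timeout: Optional[float] = None,
    num_retries: Optional[int] = None,
) -> dict[str, Any]:
    """Assemble completion kwargs, omitting optional keys that are unset.

    This avoids passing e.g. max_tokens=None through to the completion call.
    """
    params: dict[str, Any] = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "n": n,
    }
    optional = {"max_tokens": max_tokens, "timeout": timeout, "num_retries": num_retries}
    # Drop None entries so only explicitly configured values reach the backend.
    params.update({k: v for k, v in optional.items() if v is not None})
    return params
```

Explicit defaults are the alternative; omitting the keys defers to whatever defaults the backend applies.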


asamal4 commented Oct 8, 2025

@VladimirKadlec @tisnik PTAL
First PR to simplify custom metric (mostly refactoring to eliminate duplicate code)


@VladimirKadlec VladimirKadlec left a comment


We currently have the following classes holding the same params -- the params for the LLM -- and propagating them down to the lower level:
(Config.LLMConfig) -> RagasLLMManager -> RagasCustomLLM -> finally calls litellm.completion
This PR adds another guy to the chain (we have to go deeper :-) ):
(Config.LLMConfig) -> RagasLLMManager -> RagasCustomLLM -> BaseCustomLLM -> finally calls litellm.completion

Not sure why we need it; just consider whether the path from the config to the LLM instance is getting too long.

Other than that, LGTM.


asamal4 commented Oct 8, 2025

@VladimirKadlec
We are adding BaseCustomLLM to create a separate module for litellm, for the reasons below:
  • remove duplicate code (the same completion call happens in both RAGAS and the custom metric)
  • a single place for the litellm completion call (an easy switch to something else in the future, if required)

Regarding RagasLLMManager -> RagasCustomLLM: technically we don't need both; the only reason we have RagasCustomLLM is to make it clear that we are not using anything provided by RAGAS. For deepeval we don't have this, because it already provides a litellm method.


@tisnik tisnik left a comment


LGTM

@tisnik tisnik merged commit 2ededad into lightspeed-core:main Oct 8, 2025
15 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Nov 13, 2025
