
Conversation


@asamal4 asamal4 commented Oct 7, 2025

Create a common module for custom llm to have an abstract layer for litellm.
This can be anything in future (so no need to mention litellm everywhere)

Summary by CodeRabbit

  • New Features

    • Public base LLM client exposed, providing a unified call interface for custom LLM providers.
  • Refactor

    • Replaced provider-specific parameters with generic llm_params across managers and metrics.
    • Renamed public API get_litellm_params → get_llm_params.
  • Improvements

    • Unified call flow for custom LLMs, consistent temperature/timeout/max_tokens handling, and standardized error handling.
  • Tests

    • Unit tests updated to reflect the renamed API.


coderabbitai bot commented Oct 7, 2025

Walkthrough

Adds a new BaseCustomLLM class and exposes it publicly; generalizes parameter names from litellm_params → llm_params across managers and metrics; refactors Ragas and metric flows to use BaseCustomLLM.call; renames LLMManager.get_litellm_params → get_llm_params; updates tests accordingly.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Public export update | src/lightspeed_evaluation/core/llm/__init__.py | Imports and exposes BaseCustomLLM via __all__. |
| New base LLM client | src/lightspeed_evaluation/core/llm/custom.py | Adds BaseCustomLLM with __init__ and call(...) that builds completion params, invokes litellm.completion, aggregates choice contents, and raises LLMError on failure or an empty single result. |
| LLM manager API rename & tests | src/lightspeed_evaluation/core/llm/manager.py, tests/unit/core/llm/test_manager.py | Renames get_litellm_params → get_llm_params; docstrings updated; returned dict now includes model, temperature, max_tokens, timeout, num_retries. Test updated to assert the new method and keys. |
| DeepEval manager param rename | src/lightspeed_evaluation/core/llm/deepeval.py, src/lightspeed_evaluation/core/metrics/deepeval.py | Constructor and usages switched from litellm_params → llm_params; model info now reads from llm_params. |
| Ragas LLM refactor & manager | src/lightspeed_evaluation/core/llm/ragas.py, src/lightspeed_evaluation/core/metrics/ragas.py | Adds RagasCustomLLM(BaseRagasLLM, BaseCustomLLM) and uses self.call(...) for generation; sources temperature from llm_params; errors raise LLMError; adds is_finished() (returns True); manager ctor accepts llm_params. |
| Custom metrics refactor | src/lightspeed_evaluation/core/metrics/custom.py | Replaces direct litellm usage with BaseCustomLLM.call; _call_llm signature simplified to _call_llm(prompt); _evaluate_answer_correctness catches LLMError and returns a normalized error string. |
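The shape described in the walkthrough can be condensed into a sketch. This is not the PR's actual code: the completion function is injected here so the example stays self-contained (the real class in src/lightspeed_evaluation/core/llm/custom.py calls litellm.completion directly), and the parameter names mirror the get_llm_params() keys listed above (model, temperature, max_tokens, timeout, num_retries).

```python
from typing import Any, Callable, Optional, Union


class LLMError(Exception):
    """Stands in for core.system.exceptions.LLMError from the PR."""


class BaseCustomLLM:
    """Sketch of the shared client: build params, call completion, aggregate choices."""

    def __init__(self, model_name: str, llm_params: dict, completion_fn: Callable[..., Any]):
        self.model_name = model_name
        self.llm_params = llm_params
        self._completion = completion_fn  # litellm.completion in the real module

    def call(
        self,
        prompt: str,
        n: int = 1,
        temperature: Optional[float] = None,
        return_single: bool = True,
    ) -> Union[str, list]:
        """Build completion params, invoke the backend, and aggregate choice contents."""
        params: dict[str, Any] = {
            "model": self.model_name,
            "messages": [{"role": "user", "content": prompt}],
            "n": n,
            "temperature": (
                temperature if temperature is not None else self.llm_params.get("temperature")
            ),
        }
        # Forward optional keys only when they are actually set.
        for key in ("max_tokens", "timeout", "num_retries"):
            if self.llm_params.get(key) is not None:
                params[key] = self.llm_params[key]
        try:
            response = self._completion(**params)
            texts = [choice["message"]["content"] for choice in response["choices"]]
        except Exception as exc:
            raise LLMError(f"LLM call failed: {exc}") from exc
        if return_single and n == 1:
            if not texts or not texts[0]:
                raise LLMError("LLM returned an empty response")
            return texts[0]
        return texts
```

With the backend injected, both the Ragas bridge and the custom metrics can share the same call path, which is the deduplication the PR description points at.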

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Evaluator
  participant Metrics as Metrics/Manager
  participant Mgr as LLMManager
  participant Custom as BaseCustomLLM
  participant Lite as litellm

  Evaluator->>Metrics: evaluate(...)
  Metrics->>Mgr: get_model_name(), get_llm_params()
  Mgr-->>Metrics: model_name, llm_params
  Metrics->>Custom: instantiate(model_name, llm_params)
  Metrics->>Custom: call(prompt, n=1, temperature=?, return_single=True)
  Custom->>Lite: completion(model, messages, temperature, n, max_tokens, timeout, num_retries, ...)
  Lite-->>Custom: response(choices[*].message.content)
  Custom-->>Metrics: "text" or ["texts"]
  note right of Custom: on error -> raise LLMError
sequenceDiagram
  autonumber
  actor RagasFlow
  participant RagasMgr
  participant RagasCustom
  participant BaseCustom
  participant Lite

  RagasFlow->>RagasMgr: generate_text(prompt, n, temperature)
  RagasMgr->>RagasCustom: generate_text(...)
  RagasCustom->>BaseCustom: call(prompt, n, temperature, return_single=False)
  BaseCustom->>Lite: completion(...)
  Lite-->>BaseCustom: response
  BaseCustom-->>RagasCustom: list[string] texts
  RagasCustom-->>RagasMgr: LLMResult (Generation objects)
  note right of RagasCustom: is_finished(response) -> True

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • VladimirKadlec
  • tisnik

Poem

hop hop, I debug and sing,
a new base call on soft spring.
llm_params lead the trail,
choices trimmed, responses sail.
metrics clap — the rabbit's tale 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title succinctly refers to the core change of adding a shared custom LLM component, which matches the PR's objective of introducing a generic abstraction layer for LLM integrations. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1aab90d and 0871015.

📒 Files selected for processing (7)
  • src/lightspeed_evaluation/core/llm/deepeval.py (2 hunks)
  • src/lightspeed_evaluation/core/llm/manager.py (3 hunks)
  • src/lightspeed_evaluation/core/llm/ragas.py (4 hunks)
  • src/lightspeed_evaluation/core/metrics/custom.py (4 hunks)
  • src/lightspeed_evaluation/core/metrics/deepeval.py (1 hunks)
  • src/lightspeed_evaluation/core/metrics/ragas.py (1 hunks)
  • tests/unit/core/llm/test_manager.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/lightspeed_evaluation/core/metrics/ragas.py
  • src/lightspeed_evaluation/core/metrics/deepeval.py
🧰 Additional context used
🧬 Code graph analysis (4)
src/lightspeed_evaluation/core/metrics/custom.py (3)
src/lightspeed_evaluation/core/llm/custom.py (2)
  • BaseCustomLLM (10-75)
  • call (18-75)
src/lightspeed_evaluation/core/llm/manager.py (1)
  • get_llm_params (95-103)
src/lightspeed_evaluation/core/system/exceptions.py (1)
  • LLMError (24-25)
src/lightspeed_evaluation/core/llm/ragas.py (2)
src/lightspeed_evaluation/core/llm/custom.py (2)
  • BaseCustomLLM (10-75)
  • call (18-75)
src/lightspeed_evaluation/core/system/exceptions.py (1)
  • LLMError (24-25)
src/lightspeed_evaluation/core/llm/deepeval.py (1)
src/lightspeed_evaluation/core/llm/ragas.py (2)
  • get_llm (99-101)
  • get_model_info (103-108)
tests/unit/core/llm/test_manager.py (1)
src/lightspeed_evaluation/core/llm/manager.py (1)
  • get_llm_params (95-103)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: mypy
🔇 Additional comments (5)
tests/unit/core/llm/test_manager.py (1)

69-91: Test exercises new parameter surface

Great coverage to lock in the generalized get_llm_params() payload—matches the manager contract exactly.

src/lightspeed_evaluation/core/llm/deepeval.py (1)

14-42: DeepEval manager wiring stays aligned

Passing the normalized llm_params straight into LiteLLMModel keeps temperature, token, timeout, and retry behavior in sync with the core manager—a clean removal of the LiteLLM-specific naming.

src/lightspeed_evaluation/core/metrics/custom.py (1)

39-72: BaseCustomLLM integration looks solid

Love the move to instantiate BaseCustomLLM once and reuse its call()—the normalized return plus graceful fallback when a list slips through keeps callers happy. The extra LLMError guard downstream should make diagnostics cleaner.

src/lightspeed_evaluation/core/llm/manager.py (1)

16-103: Renamed accessor returns richer config

The generalized get_llm_params() surface—including retries and timeout—fits the new abstraction nicely, and the docstring updates keep the intent clear.

src/lightspeed_evaluation/core/llm/ragas.py (1)

40-47: Stop handling preserved in Ragas bridge

Forwarding stop through call_kwargs while leaning on BaseCustomLLM keeps the Ragas integration feature-complete and avoids the regression we worried about earlier.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
src/lightspeed_evaluation/core/llm/custom.py (1)

56-75: Consider more specific exception handling.

The broad except Exception at Line 74 catches all exceptions, including KeyboardInterrupt and SystemExit. While this is wrapped in LLMError, it could mask critical system-level issues during debugging. Consider catching more specific exceptions that litellm might raise (e.g., litellm.exceptions.APIError, litellm.exceptions.Timeout).

If litellm provides specific exception types, consider refactoring:

         try:
             response = litellm.completion(**call_params)
             # ... response processing ...
-        except Exception as e:
-            raise LLMError(f"LLM call failed: {str(e)}") from e
+        except (litellm.APIError, litellm.Timeout, ValueError) as e:
+            raise LLMError(f"LLM call failed: {str(e)}") from e
+        except Exception as e:
+            # Log unexpected exceptions for debugging
+            raise LLMError(f"Unexpected LLM error: {str(e)}") from e
src/lightspeed_evaluation/core/metrics/custom.py (1)

67-72: Simplify unnecessary list handling.

At Line 70-71, there's a check for isinstance(result, list) even though return_single=True is passed to self.llm.call(). According to the BaseCustomLLM.call() implementation, when return_single=True and n=1, it always returns a single string, never a list.

Simplify the method:

     def _call_llm(self, prompt: str) -> str:
         """Make an LLM call with the configured parameters."""
-        result = self.llm.call(prompt, return_single=True)
-        if isinstance(result, list):
-            return result[0] if result else ""
-        return result
+        return self.llm.call(prompt, return_single=True)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 94c8236 and a028301.

📒 Files selected for processing (9)
  • src/lightspeed_evaluation/core/llm/__init__.py (2 hunks)
  • src/lightspeed_evaluation/core/llm/custom.py (1 hunks)
  • src/lightspeed_evaluation/core/llm/deepeval.py (2 hunks)
  • src/lightspeed_evaluation/core/llm/manager.py (3 hunks)
  • src/lightspeed_evaluation/core/llm/ragas.py (4 hunks)
  • src/lightspeed_evaluation/core/metrics/custom.py (4 hunks)
  • src/lightspeed_evaluation/core/metrics/deepeval.py (1 hunks)
  • src/lightspeed_evaluation/core/metrics/ragas.py (1 hunks)
  • tests/unit/core/llm/test_manager.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (8)
src/lightspeed_evaluation/core/metrics/ragas.py (1)
src/lightspeed_evaluation/core/llm/manager.py (2)
  • get_model_name (91-93)
  • get_llm_params (95-103)
src/lightspeed_evaluation/core/llm/__init__.py (1)
src/lightspeed_evaluation/core/llm/custom.py (1)
  • BaseCustomLLM (10-75)
src/lightspeed_evaluation/core/llm/deepeval.py (1)
src/lightspeed_evaluation/core/llm/ragas.py (2)
  • get_llm (95-97)
  • get_model_info (99-104)
tests/unit/core/llm/test_manager.py (1)
src/lightspeed_evaluation/core/llm/manager.py (1)
  • get_llm_params (95-103)
src/lightspeed_evaluation/core/llm/custom.py (1)
src/lightspeed_evaluation/core/system/exceptions.py (1)
  • LLMError (24-25)
src/lightspeed_evaluation/core/metrics/deepeval.py (1)
src/lightspeed_evaluation/core/llm/manager.py (2)
  • get_model_name (91-93)
  • get_llm_params (95-103)
src/lightspeed_evaluation/core/llm/ragas.py (2)
src/lightspeed_evaluation/core/llm/custom.py (2)
  • BaseCustomLLM (10-75)
  • call (18-75)
src/lightspeed_evaluation/core/system/exceptions.py (1)
  • LLMError (24-25)
src/lightspeed_evaluation/core/metrics/custom.py (3)
src/lightspeed_evaluation/core/llm/custom.py (2)
  • BaseCustomLLM (10-75)
  • call (18-75)
src/lightspeed_evaluation/core/llm/manager.py (2)
  • get_model_name (91-93)
  • get_llm_params (95-103)
src/lightspeed_evaluation/core/system/exceptions.py (1)
  • LLMError (24-25)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: mypy
🔇 Additional comments (8)
src/lightspeed_evaluation/core/metrics/deepeval.py (1)

29-29: LGTM!

The parameter retrieval correctly uses the renamed get_llm_params() API, aligning with the generalized LLM parameter handling introduced in this PR.

src/lightspeed_evaluation/core/metrics/ragas.py (1)

39-39: LGTM!

The parameter retrieval correctly uses the renamed get_llm_params() API, consistent with the broader refactor.

src/lightspeed_evaluation/core/llm/manager.py (1)

95-103: LGTM!

The method rename from get_litellm_params to get_llm_params correctly generalizes the API, making it provider-agnostic while maintaining the same return structure and parameter semantics.

tests/unit/core/llm/test_manager.py (1)

69-91: LGTM!

The test correctly validates the renamed get_llm_params() method and ensures all expected parameter keys and values are returned.

src/lightspeed_evaluation/core/llm/deepeval.py (1)

14-26: LGTM!

The parameter handling correctly transitions from litellm_params to llm_params, maintaining consistency with the broader refactor while preserving the same initialization logic.

src/lightspeed_evaluation/core/metrics/custom.py (2)

39-41: LGTM!

The initialization correctly uses BaseCustomLLM with model name and LLM parameters from the manager, properly abstracting the LLM layer.


218-230: LGTM!

The error handling correctly catches LLMError and returns an informative message, improving robustness when LLM calls fail.

src/lightspeed_evaluation/core/llm/custom.py (1)

44-53: Avoid passing None for optional parameters
Ensure max_tokens and timeout aren’t set to None in call_params—either omit keys when their values are None or supply explicit defaults—so you don’t inadvertently pass None into litellm.completion.
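One way to implement that suggestion (a hypothetical helper, not code from the PR): assemble the kwargs for litellm.completion so that optional keys are included only when they carry a value.

```python
from typing import Any, Optional


def build_call_params(
    model: str,
    messages: list,
    temperature: float,
    n: int = 1,
    max_tokens: Optional[int] = None,
    timeout: Optional[float] = None,
    num_retries: Optional[int] = None,
) -> dict[str, Any]:
    """Assemble completion kwargs, omitting optional keys that are unset.

    This avoids passing e.g. max_tokens=None through to the completion call.
    """
    params: dict[str, Any] = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "n": n,
    }
    optional = {"max_tokens": max_tokens, "timeout": timeout, "num_retries": num_retries}
    # Drop None entries so only explicitly configured values reach the backend.
    params.update({k: v for k, v in optional.items() if v is not None})
    return params
```

Explicit defaults are the alternative; omitting the keys defers to whatever defaults the backend applies.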


asamal4 commented Oct 8, 2025

@VladimirKadlec @tisnik PTAL
First PR to simplify custom metric (mostly refactoring to eliminate duplicate code)


@VladimirKadlec VladimirKadlec left a comment


We currently have the following classes holding the same params -- the params for the LLM -- and propagating them down to the lower level:
(Config.LLMConfig) -> RagasLLMManager -> RagasCustomLLM -> finally calls litellm.completion
This PR adds another guy to the chain (we have to go deeper :-) ):
(Config.LLMConfig) -> RagasLLMManager -> RagasCustomLLM -> BaseCustomLLM -> finally calls litellm.completion

Not sure why we need it; just consider whether the path from the config to the LLM instance is getting too long.

Other than that, LGTM.


asamal4 commented Oct 8, 2025

@VladimirKadlec
We are adding BaseCustomLLM to create a separate module for litellm, for the reasons below:
  • remove duplicate code (the same completion call happens in both RAGAS and the custom metric)
  • a single place for the litellm completion call (an easy switch to something else in the future, if required)

Regarding RagasLLMManager -> RagasCustomLLM: technically we don't need both; the only reason we have RagasCustomLLM is to make it clear that we are not using anything provided by RAGAS. For deepeval we don't have this, because it already provides a litellm method.


@tisnik tisnik left a comment


LGTM

@tisnik tisnik merged commit 2ededad into lightspeed-core:main Oct 8, 2025
15 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Nov 13, 2025
