fix: Add OpenAI API compatibility with prepare_llm_params function#691
fix: Add OpenAI API compatibility with prepare_llm_params function#691leoric-crown wants to merge 1 commit intocoleam00:mainfrom
Conversation
- Add prepare_llm_params() function to handle provider-specific API compatibility - Handle OpenAI max_tokens → max_completion_tokens deprecation - Exclude temperature parameter for reasoning models (o1, gpt-5 series) - Update code_storage_service.py and contextual_embedding_service.py to use compatibility function - Maintain backward compatibility with other providers (Ollama, Google) This fixes API errors when using newer OpenAI models while establishing a maintainable pattern for future API changes. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
WalkthroughCentralizes LLM API parameter handling via a new prepare_llm_params helper and updates contextual embedding and code storage services to use it. Adds reasoning-model checks and max_tokens mapping for OpenAI. Extends code storage to support 3072-dim embeddings. Minor import reorders and whitespace edits. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant CES as ContextualEmbeddingService
participant CSS as CodeStorageService
participant LPS as LLMProviderService
participant LLM as OpenAI API
rect rgba(230,240,255,0.4)
CES->>LPS: prepare_llm_params(provider, model, {temperature, max_tokens})
CSS->>LPS: prepare_llm_params(provider, model, {temperature, max_tokens})
note over LPS: Normalize params<br/>- If OpenAI: max_tokens→max_completion_tokens<br/>- If reasoning model: remove temperature
LPS-->>CES: params
LPS-->>CSS: params
end
CES->>LLM: chat.completions(**params)
CSS->>LLM: chat.completions(**params)
LLm-->>CES: response
LLm-->>CSS: response
note over CSS: Also maps embedding dims incl. 3072 path
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (3)
python/src/server/services/llm_provider_service.py (2)
417-429: Broaden reasoning-model detection (o3/o4/GPT‑5 families) and future-proofCurrent patterns miss o3/o4 series mentioned in docs/guides; expand detection so temperature/etc. are correctly pruned. (learn.microsoft.com)
Apply this diff:
-def _is_reasoning_model(model: str) -> bool: +def _is_reasoning_model(model: str) -> bool: @@ - reasoning_patterns = ["o1", "o1-preview", "o1-mini", "gpt-5"] - model_lower = model.lower() - return any(pattern in model_lower for pattern in reasoning_patterns) + m = model.lower() + # Includes o‑series (o1/o2/o3/o4/…) and GPT‑5 family + return m.startswith(("o1", "o2", "o3", "o4", "o5", "gpt-5"))
388-389: Nit: Return type should be precise for mypyUse dict[str, Any] to satisfy typing guidelines (Any is already imported).
python/src/server/services/storage/code_storage_service.py (1)
510-512: Remove redundant local importasyncio is already imported at module top (Line 7). Drop the inner import to avoid duplication.
Apply this diff:
- import asyncio
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
python/src/server/services/embeddings/contextual_embedding_service.py(8 hunks)python/src/server/services/llm_provider_service.py(1 hunks)python/src/server/services/storage/code_storage_service.py(7 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
python/src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
python/src/**/*.py: Fail fast on critical conditions: service startup failures, missing configuration/env vars, database connection/auth failures, critical dependencies unavailable
Never accept or store corrupted data (e.g., zero embeddings, null foreign keys, malformed JSON); skip failed items entirely and continue processing
For batch/background operations, continue processing but log detailed per-item failures; for external APIs use retries with exponential backoff and then fail clearly
Error messages must include context, use specific exception types, preserve full stack traces (logging with exc_info=True), include relevant IDs/URLs, and never return None to indicate failure—raise instead; for batch ops report success counts and detailed failures
Backend uses Python 3.12 with a 120-character line length
Avoid introducing WebSocket support in the backend; updates are handled via HTTP polling
Adhere to Ruff lint rules (e.g., no unused imports) and provide type hints to satisfy MyPy
python/src/**/*.py: Fail fast on service startup failures (credentials, DB, service init); crash with clear errors
Treat missing configuration (env vars/invalid settings) as fatal; stop the system
Do not hide database connection failures; bubble up and surface clearly
Authentication/authorization failures must halt the operation and be visible
Never silently accept bad data; let Pydantic validation errors raise
If critical dependencies are unavailable, fail immediately
Reject invalid data that could corrupt state (e.g., zero embeddings, null FKs, malformed JSON)
Batch processing should complete remaining items but log detailed per-item failures
Background tasks (e.g., embedding generation) should finish queues while logging failures
Treat optional features as skippable: log and skip when disabled rather than crashing
External API calls: use retry with exponential backoff; on final failure, raise with clear service/context info
Never accept corrupted data during partial-failure work...
Files:
python/src/server/services/llm_provider_service.pypython/src/server/services/embeddings/contextual_embedding_service.pypython/src/server/services/storage/code_storage_service.py
python/src/server/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use specific exception classes and FastAPI exception handlers to produce rich JSON error responses
python/src/server/**/*.py: Provide rich error responses in FastAPI (custom exceptions and handlers that return structured JSON with detail/type)
Implement HTTP polling support and ETag caching on endpoints to enable 304 responses
Files:
python/src/server/services/llm_provider_service.pypython/src/server/services/embeddings/contextual_embedding_service.pypython/src/server/services/storage/code_storage_service.py
python/src/server/{api_routes,services}/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Follow Service Layer pattern: API routes delegate to services, which handle business logic and call the database layer
Files:
python/src/server/services/llm_provider_service.pypython/src/server/services/embeddings/contextual_embedding_service.pypython/src/server/services/storage/code_storage_service.py
**/*.{py,ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Never return None/null to indicate failure; raise an exception with details instead
Files:
python/src/server/services/llm_provider_service.pypython/src/server/services/embeddings/contextual_embedding_service.pypython/src/server/services/storage/code_storage_service.py
python/src/server/services/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Place business logic in service layer under python/src/server/services/
Files:
python/src/server/services/llm_provider_service.pypython/src/server/services/embeddings/contextual_embedding_service.pypython/src/server/services/storage/code_storage_service.py
🧬 Code graph analysis (2)
python/src/server/services/embeddings/contextual_embedding_service.py (1)
python/src/server/services/llm_provider_service.py (1)
prepare_llm_params(388-414)
python/src/server/services/storage/code_storage_service.py (1)
python/src/server/services/llm_provider_service.py (3)
get_llm_client(42-170)prepare_llm_params(388-414)get_embedding_model(221-273)
🔇 Additional comments (7)
python/src/server/services/embeddings/contextual_embedding_service.py (3)
13-13: LGTM: Centralized param builder importImporting prepare_llm_params here keeps call sites consistent.
193-196: LGTM: Batch path uses the same compatibility layerConsistent usage in batch reduces drift; nice.
Also applies to: 207-207
254-254: Acceptable fallback behaviorReturning (chunk, False) on final failure aligns with “log and continue” for batch-ish flows.
python/src/server/services/storage/code_storage_service.py (4)
521-522: LGTM: Use get_llm_client + prepare_llm_params inside async variantThis avoids circular imports and standardizes param handling.
559-562: LGTM with dependency on prepare_llm_params fixCall site is correct; it will pick up the reasoning‑model rules once the helper is adjusted as proposed.
If you want to be extra safe on Google’s OpenAI‑compat layer, note it accepts both maxTokens and maxCompletionTokens; our shim leaving max_tokens for non‑reasoning models is fine. (ai.google.dev)
Also applies to: 573-573
858-861: LGTM: get_embedding_model import/useKeeps embedding model tracking accurate.
915-927: Verify DB schema includes embedding_3072 before mergeRepo search returned only code references to embedding_3072 (e.g., python/src/server/services/storage/code_storage_service.py mapping) and no SQL migration creating that column — confirm the migration exists and is deployed to Supabase, or add a safe fallback/guard to use embedding_1536 to avoid 400/500 on writes.
| # Prepare compatible parameters for the API call | ||
| params = prepare_llm_params(provider or "openai", model, | ||
| temperature=0.3, max_tokens=200) | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
LGTM: Params now flow through compatibility shim
Passing params via prepare_llm_params before chat.completions.create is the right direction. This will work once prepare_llm_params gates max_tokens only for reasoning models as suggested.
To catch any lingering direct uses, run:
Also applies to: 81-81
🏁 Script executed:
#!/bin/bash
# Find chat.completions calls still passing max_tokens/temperature directly
rg -nP -C2 --type=py '\bchat\.completions\.create\('
rg -nP --type=py '\bmax_tokens\s*='
rg -nP --type=py '\btemperature\s*='Length of output: 8943
Fix remaining direct max_tokens/temperature usages before approving
Direct calls still pass max_tokens/temperature to client.chat.completions.create — switch these to use prepare_llm_params (and gate max_tokens only for reasoning models), then re-run the verification script.
- python/src/server/services/embeddings/contextual_embedding_service.py:70 (temperature=0.3, max_tokens=200)
- python/src/server/services/embeddings/contextual_embedding_service.py:195 (temperature=0, max_tokens=100 * len(chunks))
- python/src/server/services/storage/code_storage_service.py:561 (max_tokens=500, temperature=0.3)
- python/src/server/services/provider_discovery_service.py:154 (max_tokens=50)
- python/src/server/services/ollama/model_discovery_service.py:587 (max_tokens=1)
- python/src/server/services/ollama/model_discovery_service.py:611 (max_tokens=10)
- python/src/server/services/ollama/model_discovery_service.py:613 (temperature=0.1)
- python/src/server/services/ollama/model_discovery_service.py:670 (max_tokens=1)
- python/src/server/services/ollama/model_discovery_service.py:844 (max_tokens=50)
- python/src/server/services/ollama/model_discovery_service.py:878 (max_tokens=100)
- python/src/server/api_routes/ollama_api.py:883 (max_tokens=50)
- python/src/server/api_routes/ollama_api.py:928 (max_tokens=100)
- python/src/server/api_routes/ollama_api.py:930 (temperature=0.1)
🤖 Prompt for AI Agents
In python/src/server/services/embeddings/contextual_embedding_service.py around
lines 68-71, the call is directly passing temperature=0.3 and max_tokens=200;
replace these direct arguments with a call to prepare_llm_params(provider or
"openai", model, ...) and merge the returned params into the client call, and
ensure max_tokens is only included when the model is a reasoning-type model
(gate it behind the existing reasoning-model check) so that temperature and
max_tokens are not passed directly to client.chat.completions.create; apply the
same pattern to the other listed locations and re-run the verification script.
| def prepare_llm_params(provider: str, model: str, **kwargs) -> dict: | ||
| """ | ||
| Prepare LLM API parameters with automatic compatibility handling. | ||
|
|
||
| Handles: | ||
| - OpenAI max_tokens → max_completion_tokens deprecation | ||
| - Reasoning model temperature exclusion (o1, gpt-5 series) | ||
|
|
||
| Args: | ||
| provider: LLM provider name (openai, ollama, google) | ||
| model: Model name to check for special requirements | ||
| **kwargs: Original API parameters | ||
|
|
||
| Returns: | ||
| dict: Compatible parameters ready for API call | ||
| """ | ||
| params = kwargs.copy() | ||
|
|
||
| # Handle OpenAI parameter deprecation | ||
| if provider == "openai" and "max_tokens" in params: | ||
| params["max_completion_tokens"] = params.pop("max_tokens") | ||
|
|
||
| # Handle reasoning model restrictions | ||
| if model and _is_reasoning_model(model): | ||
| params.pop("temperature", None) | ||
|
|
||
| return params | ||
|
|
There was a problem hiding this comment.
Blocker: Don’t remap max_tokens globally; gate by reasoning models and drop unsupported params
Unconditionally converting max_tokens → max_completion_tokens risks 400s on non‑reasoning chat models. For o‑series/GPT‑5 reasoning models, use max_completion_tokens and strip unsupported params (temperature/top_p/penalties/logprobs/logit_bias/n). For non‑reasoning chat models, keep max_tokens (or translate back if only max_completion_tokens was supplied). This aligns with OpenAI guidance and observed API behavior. (community.openai.com)
Apply this diff:
-def prepare_llm_params(provider: str, model: str, **kwargs) -> dict:
+def prepare_llm_params(provider: str, model: str, **kwargs) -> dict[str, Any]:
@@
- params = kwargs.copy()
-
- # Handle OpenAI parameter deprecation
- if provider == "openai" and "max_tokens" in params:
- params["max_completion_tokens"] = params.pop("max_tokens")
-
- # Handle reasoning model restrictions
- if model and _is_reasoning_model(model):
- params.pop("temperature", None)
-
- return params
+ params = dict(kwargs)
+ is_reasoning = bool(model and _is_reasoning_model(model))
+
+ # OpenAI compatibility handling
+ if provider == "openai":
+ if is_reasoning:
+ # o‑/GPT‑5 reasoning models: require max_completion_tokens; drop unsupported knobs
+ if "max_tokens" in params and "max_completion_tokens" not in params:
+ params["max_completion_tokens"] = params.pop("max_tokens")
+ for k in (
+ "temperature",
+ "top_p",
+ "presence_penalty",
+ "frequency_penalty",
+ "logprobs",
+ "top_logprobs",
+ "logit_bias",
+ "n",
+ ):
+ params.pop(k, None)
+ else:
+ # Legacy/non‑reasoning chat models: prefer max_tokens for broad compatibility
+ if "max_completion_tokens" in params and "max_tokens" not in params:
+ params["max_tokens"] = params.pop("max_completion_tokens")
+
+ return params📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def prepare_llm_params(provider: str, model: str, **kwargs) -> dict: | |
| """ | |
| Prepare LLM API parameters with automatic compatibility handling. | |
| Handles: | |
| - OpenAI max_tokens → max_completion_tokens deprecation | |
| - Reasoning model temperature exclusion (o1, gpt-5 series) | |
| Args: | |
| provider: LLM provider name (openai, ollama, google) | |
| model: Model name to check for special requirements | |
| **kwargs: Original API parameters | |
| Returns: | |
| dict: Compatible parameters ready for API call | |
| """ | |
| params = kwargs.copy() | |
| # Handle OpenAI parameter deprecation | |
| if provider == "openai" and "max_tokens" in params: | |
| params["max_completion_tokens"] = params.pop("max_tokens") | |
| # Handle reasoning model restrictions | |
| if model and _is_reasoning_model(model): | |
| params.pop("temperature", None) | |
| return params | |
| def prepare_llm_params(provider: str, model: str, **kwargs) -> dict[str, Any]: | |
| """ | |
| Prepare LLM API parameters with automatic compatibility handling. | |
| Handles: | |
| - OpenAI max_tokens → max_completion_tokens deprecation | |
| - Reasoning model temperature exclusion (o1, gpt-5 series) | |
| Args: | |
| provider: LLM provider name (openai, ollama, google) | |
| model: Model name to check for special requirements | |
| **kwargs: Original API parameters | |
| Returns: | |
| dict: Compatible parameters ready for API call | |
| """ | |
| params = dict(kwargs) | |
| is_reasoning = bool(model and _is_reasoning_model(model)) | |
| # OpenAI compatibility handling | |
| if provider == "openai": | |
| if is_reasoning: | |
| # o-/GPT-5 reasoning models: require max_completion_tokens; drop unsupported knobs | |
| if "max_tokens" in params and "max_completion_tokens" not in params: | |
| params["max_completion_tokens"] = params.pop("max_tokens") | |
| for k in ( | |
| "temperature", | |
| "top_p", | |
| "presence_penalty", | |
| "frequency_penalty", | |
| "logprobs", | |
| "top_logprobs", | |
| "logit_bias", | |
| "n", | |
| ): | |
| params.pop(k, None) | |
| else: | |
| # Legacy/non-reasoning chat models: prefer max_tokens for broad compatibility | |
| if "max_completion_tokens" in params and "max_tokens" not in params: | |
| params["max_tokens"] = params.pop("max_completion_tokens") | |
| return params |
|
this looks like a nice quick fix to allow gpt-5 support will be a nice to have before we migrate to the responses api thank you for this contribution @leoric-crown |
|
Yeah this is great, thanks @leoric-crown! @Wirasm have you had a chance to review or just providing your initial comments here? |
|
Closing this PR as the underlying OpenAI API compatibility issues were resolved in a580fdf (PR #736). While this PR proposed a centralized
Both approaches solved the same core problem (OpenAI API parameter deprecation), but the merged solution integrated it into a larger provider architecture refactor. Look forward to contributing more! |
…based retries (#729) * feat: safe session continuity across workflow steps with forkSession-based retries (#691) Retries previously discarded all prior conversation context by passing resumeSessionId=undefined, because naive session resume would append to the original session file and corrupt it on a crash. This adds forkSession:true support so the SDK copies the prior session history into a new file before appending — leaving the source untouched — making it safe to always pass the prior session ID on retry. Changes: - Add forkSession?: boolean to WorkflowAssistantOptions (deps.ts) and AssistantRequestOptions (types/index.ts) - Pass forkSession to SDK Options in ClaudeClient with enhanced resuming_session log - Sequential executor: set shouldForkSession=true when resuming; inject into stepOptions; fix retry to always pass resumeSessionId - DAG executor: set shouldForkSession=true when resuming in executeNodeInternal; fix retry to always pass resumeSessionId; update isFresh comment - Fix resolvedOptions to explicitly set persistSession:false for Claude workflows (was silently defaulting to SDK's true) - Add context:'shared' union to DagNodeBase.context type; update clearContext docstring Fixes #691 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: remove persistSession:false from workflow resolvedOptions to preserve cross-step session continuity SDK docs confirm that persistSession:false prevents session files from being written to disk, making resume impossible for subsequent steps. Removing this option restores the SDK default (true), so step 1's session ID can be used by step 2 to resume. Also clarify context:'shared' docstring to note it has no effect on parallel layer nodes, and add unit tests asserting that retries pass forkSession:true when a prior session exists. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: add persistSession: false to Claude resolvedOptions Implements the missing persistSession fix claimed in the PR description. Without this, the SDK defaults to true and writes session transcripts to disk. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(loader): preserve context: 'shared' in DAG node YAML parsing The PR added 'shared' to DagNodeBase.context type but the loader only preserved 'fresh', silently dropping 'shared' during YAML parsing. This meant context: shared in workflow YAML was ignored at load time. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…based retries (coleam00#729) * feat: safe session continuity across workflow steps with forkSession-based retries (coleam00#691) Retries previously discarded all prior conversation context by passing resumeSessionId=undefined, because naive session resume would append to the original session file and corrupt it on a crash. This adds forkSession:true support so the SDK copies the prior session history into a new file before appending — leaving the source untouched — making it safe to always pass the prior session ID on retry. Changes: - Add forkSession?: boolean to WorkflowAssistantOptions (deps.ts) and AssistantRequestOptions (types/index.ts) - Pass forkSession to SDK Options in ClaudeClient with enhanced resuming_session log - Sequential executor: set shouldForkSession=true when resuming; inject into stepOptions; fix retry to always pass resumeSessionId - DAG executor: set shouldForkSession=true when resuming in executeNodeInternal; fix retry to always pass resumeSessionId; update isFresh comment - Fix resolvedOptions to explicitly set persistSession:false for Claude workflows (was silently defaulting to SDK's true) - Add context:'shared' union to DagNodeBase.context type; update clearContext docstring Fixes coleam00#691 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: remove persistSession:false from workflow resolvedOptions to preserve cross-step session continuity SDK docs confirm that persistSession:false prevents session files from being written to disk, making resume impossible for subsequent steps. Removing this option restores the SDK default (true), so step 1's session ID can be used by step 2 to resume. Also clarify context:'shared' docstring to note it has no effect on parallel layer nodes, and add unit tests asserting that retries pass forkSession:true when a prior session exists. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: add persistSession: false to Claude resolvedOptions Implements the missing persistSession fix claimed in the PR description. Without this, the SDK defaults to true and writes session transcripts to disk. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(loader): preserve context: 'shared' in DAG node YAML parsing The PR added 'shared' to DagNodeBase.context type but the loader only preserved 'fresh', silently dropping 'shared' during YAML parsing. This meant context: shared in workflow YAML was ignored at load time. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…based retries (coleam00#729) * feat: safe session continuity across workflow steps with forkSession-based retries (coleam00#691) Retries previously discarded all prior conversation context by passing resumeSessionId=undefined, because naive session resume would append to the original session file and corrupt it on a crash. This adds forkSession:true support so the SDK copies the prior session history into a new file before appending — leaving the source untouched — making it safe to always pass the prior session ID on retry. Changes: - Add forkSession?: boolean to WorkflowAssistantOptions (deps.ts) and AssistantRequestOptions (types/index.ts) - Pass forkSession to SDK Options in ClaudeClient with enhanced resuming_session log - Sequential executor: set shouldForkSession=true when resuming; inject into stepOptions; fix retry to always pass resumeSessionId - DAG executor: set shouldForkSession=true when resuming in executeNodeInternal; fix retry to always pass resumeSessionId; update isFresh comment - Fix resolvedOptions to explicitly set persistSession:false for Claude workflows (was silently defaulting to SDK's true) - Add context:'shared' union to DagNodeBase.context type; update clearContext docstring Fixes coleam00#691 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: remove persistSession:false from workflow resolvedOptions to preserve cross-step session continuity SDK docs confirm that persistSession:false prevents session files from being written to disk, making resume impossible for subsequent steps. Removing this option restores the SDK default (true), so step 1's session ID can be used by step 2 to resume. Also clarify context:'shared' docstring to note it has no effect on parallel layer nodes, and add unit tests asserting that retries pass forkSession:true when a prior session exists. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: add persistSession: false to Claude resolvedOptions Implements the missing persistSession fix claimed in the PR description. Without this, the SDK defaults to true and writes session transcripts to disk. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(loader): preserve context: 'shared' in DAG node YAML parsing The PR added 'shared' to DagNodeBase.context type but the loader only preserved 'fresh', silently dropping 'shared' during YAML parsing. This meant context: shared in workflow YAML was ignored at load time. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Fixes OpenAI API compatibility issues when using newer models by adding a centralized
prepare_llm_params()function that handles provider-specific API parameter transformations.Problems Solved
max_tokens→max_completion_tokenstransitionType of Change
Affected Services
Changes Made
prepare_llm_params()function inllm_provider_service.pyfor centralized API compatibility handling_is_reasoning_model()helper function to detect models requiring parameter restrictionscode_storage_service.py: Modified code summary generation to use compatibility functioncontextual_embedding_service.py: Modified both single and batch context generation callsUsage Example
Testing
Manual Testing Steps
OpenAI API compatibility testing:
Reasoning model temperature exclusion:
Backward compatibility verification:
Code Quality Checks
Test Results: All modified files pass linting checks. Functions are properly typed and follow existing code patterns.
Architecture Rationale
This implementation follows the existing functional programming patterns in the codebase:
Checklist
Breaking Changes
None. This change is fully backward compatible:
prepare_llm_params()function is additive and doesn't modify existing interfacesFuture Considerations
This solution addresses immediate needs while opening discussion for more robust patterns:
Alternative Patterns Worth Considering
1. Automatic Compatibility via Enhanced Context Manager
Trade-off: Hidden magic vs. explicit transformations
2. Linting/Static Analysis Integration
chat.completions.createcallsDiscussion Points
Related Issues
Addresses similar concerns raised in #687 regarding LLM client management patterns.
🤖 Generated with Claude Code