
fix: Add OpenAI API compatibility with prepare_llm_params function#691

Closed
leoric-crown wants to merge 1 commit into coleam00:main from leoric-crown:fix/openai-api-compatibility

Conversation


leoric-crown commented Sep 17, 2025

Summary

Fixes OpenAI API compatibility issues when using newer models by adding a centralized prepare_llm_params() function that handles provider-specific API parameter transformations.

Problems Solved

  • OpenAI parameter deprecation: max_tokens → max_completion_tokens transition
  • Reasoning model restrictions: Temperature parameter exclusion for o1, gpt-5 series models
  • Provider compatibility: Maintains backward compatibility with Ollama, Google providers

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Affected Services

  • Backend/API
  • Frontend/UI
  • Database
  • Documentation
  • Infrastructure

Changes Made

  • Added prepare_llm_params() function in llm_provider_service.py for centralized API compatibility handling
  • Added _is_reasoning_model() helper function to detect models requiring parameter restrictions (sketched after this list)
  • Updated code_storage_service.py: Modified code summary generation to use compatibility function
  • Updated contextual_embedding_service.py: Modified both single and batch context generation calls
  • Applied linting fixes: Auto-fixed whitespace and formatting issues
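
For reference, _is_reasoning_model() is roughly a case-insensitive substring check over the model name. A minimal sketch (the exact pattern list appears in the review thread below):

def _is_reasoning_model(model: str) -> bool:
    """Detect models that reject the temperature parameter (o1 and gpt-5 series)."""
    reasoning_patterns = ["o1", "o1-preview", "o1-mini", "gpt-5"]
    model_lower = model.lower()
    return any(pattern in model_lower for pattern in reasoning_patterns)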

Usage Example

# Before (fails with newer OpenAI API):
response = await client.chat.completions.create(
    model="gpt-4",
    max_tokens=500,      # ❌ Deprecated parameter
    temperature=0.3      # ❌ Not supported by reasoning models
)

# After (automatic compatibility):
params = prepare_llm_params("openai", "gpt-4", max_tokens=500, temperature=0.3)
response = await client.chat.completions.create(model="gpt-4", **params)
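
For illustration, how the same call behaves for other providers and for reasoning models, per the rules above (model names are placeholders):

# Non-OpenAI providers pass through unchanged (backward compatible):
params = prepare_llm_params("ollama", "llama3.1", max_tokens=500, temperature=0.3)
# -> {"max_tokens": 500, "temperature": 0.3}

# OpenAI reasoning models additionally drop temperature:
params = prepare_llm_params("openai", "o1-mini", max_tokens=500, temperature=0.3)
# -> {"max_completion_tokens": 500}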

Testing

Manual Testing Steps

  1. OpenAI API compatibility testing:

    # Start services
    docker compose up --build -d
    
    # Test crawling with OpenAI provider (should work without max_tokens errors)
    curl -X POST "http://localhost:8181/api/knowledge-items/crawl" \
      -H "Content-Type: application/json" \
      -d '{"url": "https://docs.anthropic.com", "knowledge_type": "technical"}'
  2. Reasoning model temperature exclusion:

    • Configure system to use a gpt-5 or o1 model
    • Verify that API calls complete without temperature parameter errors
    • Check logs for proper parameter transformation
  3. Backward compatibility verification:

    • Test with Ollama provider (should continue using max_tokens)
    • Test with Google provider (should continue using max_tokens)
    • Verify no regression in existing functionality (a unit-test sketch follows)
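
A unit-test sketch (not included in this PR) capturing the transformations described above; the import path is an assumption based on the repo layout shown in the lint commands below:

# test_prepare_llm_params.py -- hedged sketch, not part of this PR
from src.server.services.llm_provider_service import prepare_llm_params


def test_openai_max_tokens_renamed():
    params = prepare_llm_params("openai", "gpt-4", max_tokens=500, temperature=0.3)
    assert params == {"max_completion_tokens": 500, "temperature": 0.3}


def test_reasoning_model_drops_temperature():
    params = prepare_llm_params("openai", "o1-mini", max_tokens=500, temperature=0.3)
    assert params == {"max_completion_tokens": 500}


def test_other_providers_unchanged():
    params = prepare_llm_params("ollama", "llama3.1", max_tokens=500, temperature=0.3)
    assert params == {"max_tokens": 500, "temperature": 0.3}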

Code Quality Checks

# Linting verification
cd python && uv run ruff check src/server/services/llm_provider_service.py
cd python && uv run ruff check src/server/services/storage/code_storage_service.py  
cd python && uv run ruff check src/server/services/embeddings/contextual_embedding_service.py

# Type checking
cd python && uv run mypy src/server/services/llm_provider_service.py

Test Results: All modified files pass linting checks. Functions are properly typed and follow existing code patterns.

Architecture Rationale

This implementation follows the existing functional programming patterns in the codebase:

  • ✅ No classes, pure functions only
  • ✅ Explicit transformations (developers see what's happening)
  • ✅ Single responsibility with clear separation of concerns
  • ✅ Easily testable and extensible

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (function docstrings)
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works (manual testing performed)
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

Breaking Changes

None. This change is fully backward compatible:

  • Existing API calls without max_tokens/temperature continue to work unchanged
  • Other providers (Ollama, Google) maintain their existing parameter usage
  • The prepare_llm_params() function is additive and doesn't modify existing interfaces

Future Considerations

This solution addresses immediate needs while opening discussion for more robust patterns:

Alternative Patterns Worth Considering

1. Automatic Compatibility via Enhanced Context Manager

@asynccontextmanager
async def get_llm_client(provider: str | None = None, **kwargs):
    # Automatically wrap client methods for compatibility

Trade-off: Hidden magic vs. explicit transformations
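
A rough sketch of what that automatic wrapping could look like (hypothetical; _create_raw_client stands in for the existing client setup and is not a real helper in this codebase):

from contextlib import asynccontextmanager


@asynccontextmanager
async def get_llm_client(provider: str | None = None, **kwargs):
    # Hypothetical: obtain the client exactly as the current code does
    client = await _create_raw_client(provider, **kwargs)  # placeholder, not a real function
    original_create = client.chat.completions.create

    async def create_with_compat(*, model: str, **call_kwargs):
        # Route every call through the compatibility shim transparently
        params = prepare_llm_params(provider or "openai", model, **call_kwargs)
        return await original_create(model=model, **params)

    client.chat.completions.create = create_with_compat
    try:
        yield client
    finally:
        await client.close()  # assuming the client exposes an async close()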

2. Linting/Static Analysis Integration

  • Custom Ruff rules to detect raw chat.completions.create calls
  • Enforce compatibility function usage at build time

Discussion Points

  • Maintainability: As more providers evolve, should we move toward automatic compatibility?
  • Developer Experience: Is the explicit approach worth the extra lines?
  • Architecture Evolution: Balance functional purity with practical usability

Related Issues

Addresses similar concerns raised in #687 regarding LLM client management patterns.

🤖 Generated with Claude Code

- Add prepare_llm_params() function to handle provider-specific API compatibility
- Handle OpenAI max_tokens → max_completion_tokens deprecation
- Exclude temperature parameter for reasoning models (o1, gpt-5 series)
- Update code_storage_service.py and contextual_embedding_service.py to use compatibility function
- Maintain backward compatibility with other providers (Ollama, Google)

This fixes API errors when using newer OpenAI models while establishing a maintainable pattern for future API changes.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

coderabbitai Bot commented Sep 17, 2025

Walkthrough

Centralizes LLM API parameter handling via a new prepare_llm_params helper and updates contextual embedding and code storage services to use it. Adds reasoning-model checks and max_tokens mapping for OpenAI. Extends code storage to support 3072-dim embeddings. Minor import reorders and whitespace edits.

Changes

  • LLM parameter preparation utility (python/src/server/services/llm_provider_service.py): Added prepare_llm_params(provider, model, **kwargs). Maps OpenAI max_tokens → max_completion_tokens, strips temperature for reasoning models (o1/o1-preview/o1-mini/gpt-5), via private _is_reasoning_model.
  • Contextual embedding calls standardized (python/src/server/services/embeddings/contextual_embedding_service.py): Replaced inline temperature/max_tokens with params built by prepare_llm_params for single and batch embedding generation; minor whitespace cleanup.
  • Code storage summarization and embeddings updates (python/src/server/services/storage/code_storage_service.py): Switched LLM calls to use prepare_llm_params; updated imports. Added 3072-dim embedding handling with embedding_3072; fallback unchanged. Minor import reorder, whitespace edits.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant CES as ContextualEmbeddingService
  participant CSS as CodeStorageService
  participant LPS as LLMProviderService
  participant LLM as OpenAI API

  rect rgba(230,240,255,0.4)
    CES->>LPS: prepare_llm_params(provider, model, {temperature, max_tokens})
    CSS->>LPS: prepare_llm_params(provider, model, {temperature, max_tokens})
    note over LPS: Normalize params<br/>- If OpenAI: max_tokens→max_completion_tokens<br/>- If reasoning model: remove temperature
    LPS-->>CES: params
    LPS-->>CSS: params
  end

  CES->>LLM: chat.completions(**params)
  CSS->>LLM: chat.completions(**params)
  LLM-->>CES: response
  LLM-->>CSS: response

  note over CSS: Also maps embedding dims incl. 3072 path

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

A nibble of params, a hop through the code,
I bundled the tokens for a smoother road.
Reasoning winds? I lower the heat,
Map max to completion—so tidy, so neat.
With 3k dims, I leap and I sing—
Consistent calls for a springy spring! 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Title Check (✅ Passed): The title accurately and concisely summarizes the primary change — adding OpenAI API compatibility via a new prepare_llm_params function — and uses the conventional "fix:" prefix so it's clear to reviewers; it references the main artifact modified and is on-topic for the changeset. Consider shortening for brevity (optional).
  • Docstring Coverage (✅ Passed): Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
  • Description Check (✅ Passed): The PR description closely follows the repository's template and includes the key sections: Summary, Changes Made, Type of Change, Affected Services, Testing (manual steps and lint/type-check commands), Checklist, and Breaking Changes; it clearly describes the new prepare_llm_params function, usage examples, and rationale. The testing section provides manual reproduction steps and lint/type-check commands, and the checklist claims tests and linting pass. One minor omission is that Test Evidence does not include verbatim command outputs or CI run links to substantiate the claimed results, but the description is otherwise sufficiently detailed for review.



coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (3)
python/src/server/services/llm_provider_service.py (2)

417-429: Broaden reasoning-model detection (o3/o4/GPT‑5 families) and future-proof

Current patterns miss o3/o4 series mentioned in docs/guides; expand detection so temperature/etc. are correctly pruned. (learn.microsoft.com)

Apply this diff:

-def _is_reasoning_model(model: str) -> bool:
+def _is_reasoning_model(model: str) -> bool:
@@
-    reasoning_patterns = ["o1", "o1-preview", "o1-mini", "gpt-5"]
-    model_lower = model.lower()
-    return any(pattern in model_lower for pattern in reasoning_patterns)
+    m = model.lower()
+    # Includes o‑series (o1/o2/o3/o4/…) and GPT‑5 family
+    return m.startswith(("o1", "o2", "o3", "o4", "o5", "gpt-5"))

388-389: Nit: Return type should be precise for mypy

Use dict[str, Any] to satisfy typing guidelines (Any is already imported).

python/src/server/services/storage/code_storage_service.py (1)

510-512: Remove redundant local import

asyncio is already imported at module top (Line 7). Drop the inner import to avoid duplication.

Apply this diff:

-    import asyncio
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b383c8c and 926ffd2.

📒 Files selected for processing (3)
  • python/src/server/services/embeddings/contextual_embedding_service.py (8 hunks)
  • python/src/server/services/llm_provider_service.py (1 hunks)
  • python/src/server/services/storage/code_storage_service.py (7 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
python/src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

python/src/**/*.py: Fail fast on critical conditions: service startup failures, missing configuration/env vars, database connection/auth failures, critical dependencies unavailable
Never accept or store corrupted data (e.g., zero embeddings, null foreign keys, malformed JSON); skip failed items entirely and continue processing
For batch/background operations, continue processing but log detailed per-item failures; for external APIs use retries with exponential backoff and then fail clearly
Error messages must include context, use specific exception types, preserve full stack traces (logging with exc_info=True), include relevant IDs/URLs, and never return None to indicate failure—raise instead; for batch ops report success counts and detailed failures
Backend uses Python 3.12 with a 120-character line length
Avoid introducing WebSocket support in the backend; updates are handled via HTTP polling
Adhere to Ruff lint rules (e.g., no unused imports) and provide type hints to satisfy MyPy

python/src/**/*.py: Fail fast on service startup failures (credentials, DB, service init); crash with clear errors
Treat missing configuration (env vars/invalid settings) as fatal; stop the system
Do not hide database connection failures; bubble up and surface clearly
Authentication/authorization failures must halt the operation and be visible
Never silently accept bad data; let Pydantic validation errors raise
If critical dependencies are unavailable, fail immediately
Reject invalid data that could corrupt state (e.g., zero embeddings, null FKs, malformed JSON)
Batch processing should complete remaining items but log detailed per-item failures
Background tasks (e.g., embedding generation) should finish queues while logging failures
Treat optional features as skippable: log and skip when disabled rather than crashing
External API calls: use retry with exponential backoff; on final failure, raise with clear service/context info
Never accept corrupted data during partial-failure work...

Files:

  • python/src/server/services/llm_provider_service.py
  • python/src/server/services/embeddings/contextual_embedding_service.py
  • python/src/server/services/storage/code_storage_service.py
python/src/server/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use specific exception classes and FastAPI exception handlers to produce rich JSON error responses

python/src/server/**/*.py: Provide rich error responses in FastAPI (custom exceptions and handlers that return structured JSON with detail/type)
Implement HTTP polling support and ETag caching on endpoints to enable 304 responses

Files:

  • python/src/server/services/llm_provider_service.py
  • python/src/server/services/embeddings/contextual_embedding_service.py
  • python/src/server/services/storage/code_storage_service.py
python/src/server/{api_routes,services}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Follow Service Layer pattern: API routes delegate to services, which handle business logic and call the database layer

Files:

  • python/src/server/services/llm_provider_service.py
  • python/src/server/services/embeddings/contextual_embedding_service.py
  • python/src/server/services/storage/code_storage_service.py
**/*.{py,ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Never return None/null to indicate failure; raise an exception with details instead

Files:

  • python/src/server/services/llm_provider_service.py
  • python/src/server/services/embeddings/contextual_embedding_service.py
  • python/src/server/services/storage/code_storage_service.py
python/src/server/services/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Place business logic in service layer under python/src/server/services/

Files:

  • python/src/server/services/llm_provider_service.py
  • python/src/server/services/embeddings/contextual_embedding_service.py
  • python/src/server/services/storage/code_storage_service.py
🧬 Code graph analysis (2)
python/src/server/services/embeddings/contextual_embedding_service.py (1)
python/src/server/services/llm_provider_service.py (1)
  • prepare_llm_params (388-414)
python/src/server/services/storage/code_storage_service.py (1)
python/src/server/services/llm_provider_service.py (3)
  • get_llm_client (42-170)
  • prepare_llm_params (388-414)
  • get_embedding_model (221-273)
🔇 Additional comments (7)
python/src/server/services/embeddings/contextual_embedding_service.py (3)

13-13: LGTM: Centralized param builder import

Importing prepare_llm_params here keeps call sites consistent.


193-196: LGTM: Batch path uses the same compatibility layer

Consistent usage in batch reduces drift; nice.

Also applies to: 207-207


254-254: Acceptable fallback behavior

Returning (chunk, False) on final failure aligns with “log and continue” for batch-ish flows.

python/src/server/services/storage/code_storage_service.py (4)

521-522: LGTM: Use get_llm_client + prepare_llm_params inside async variant

This avoids circular imports and standardizes param handling.


559-562: LGTM with dependency on prepare_llm_params fix

Call site is correct; it will pick up the reasoning‑model rules once the helper is adjusted as proposed.

If you want to be extra safe on Google’s OpenAI‑compat layer, note it accepts both maxTokens and maxCompletionTokens; our shim leaving max_tokens for non‑reasoning models is fine. (ai.google.dev)

Also applies to: 573-573


858-861: LGTM: get_embedding_model import/use

Keeps embedding model tracking accurate.


915-927: Verify DB schema includes embedding_3072 before merge

Repo search returned only code references to embedding_3072 (e.g., python/src/server/services/storage/code_storage_service.py mapping) and no SQL migration creating that column — confirm the migration exists and is deployed to Supabase, or add a safe fallback/guard to use embedding_1536 to avoid 400/500 on writes.

Comment on lines +68 to +71
# Prepare compatible parameters for the API call
params = prepare_llm_params(provider or "openai", model,
                            temperature=0.3, max_tokens=200)


⚠️ Potential issue

🧩 Analysis chain

LGTM: Params now flow through compatibility shim

Passing params via prepare_llm_params before chat.completions.create is the right direction. This will work once prepare_llm_params gates max_tokens only for reasoning models as suggested.

To catch any lingering direct uses, run:

Also applies to: 81-81


🏁 Script executed:

#!/bin/bash
# Find chat.completions calls still passing max_tokens/temperature directly
rg -nP -C2 --type=py '\bchat\.completions\.create\('
rg -nP --type=py '\bmax_tokens\s*='
rg -nP --type=py '\btemperature\s*='

Length of output: 8943


Fix remaining direct max_tokens/temperature usages before approving

Direct calls still pass max_tokens/temperature to client.chat.completions.create — switch these to use prepare_llm_params (and gate max_tokens only for reasoning models), then re-run the verification script.

  • python/src/server/services/embeddings/contextual_embedding_service.py:70 (temperature=0.3, max_tokens=200)
  • python/src/server/services/embeddings/contextual_embedding_service.py:195 (temperature=0, max_tokens=100 * len(chunks))
  • python/src/server/services/storage/code_storage_service.py:561 (max_tokens=500, temperature=0.3)
  • python/src/server/services/provider_discovery_service.py:154 (max_tokens=50)
  • python/src/server/services/ollama/model_discovery_service.py:587 (max_tokens=1)
  • python/src/server/services/ollama/model_discovery_service.py:611 (max_tokens=10)
  • python/src/server/services/ollama/model_discovery_service.py:613 (temperature=0.1)
  • python/src/server/services/ollama/model_discovery_service.py:670 (max_tokens=1)
  • python/src/server/services/ollama/model_discovery_service.py:844 (max_tokens=50)
  • python/src/server/services/ollama/model_discovery_service.py:878 (max_tokens=100)
  • python/src/server/api_routes/ollama_api.py:883 (max_tokens=50)
  • python/src/server/api_routes/ollama_api.py:928 (max_tokens=100)
  • python/src/server/api_routes/ollama_api.py:930 (temperature=0.1)
🤖 Prompt for AI Agents
In python/src/server/services/embeddings/contextual_embedding_service.py around
lines 68-71, the call is directly passing temperature=0.3 and max_tokens=200;
replace these direct arguments with a call to prepare_llm_params(provider or
"openai", model, ...) and merge the returned params into the client call, and
ensure max_tokens is only included when the model is a reasoning-type model
(gate it behind the existing reasoning-model check) so that temperature and
max_tokens are not passed directly to client.chat.completions.create; apply the
same pattern to the other listed locations and re-run the verification script.

Comment on lines +388 to +415
def prepare_llm_params(provider: str, model: str, **kwargs) -> dict:
    """
    Prepare LLM API parameters with automatic compatibility handling.

    Handles:
    - OpenAI max_tokens → max_completion_tokens deprecation
    - Reasoning model temperature exclusion (o1, gpt-5 series)

    Args:
        provider: LLM provider name (openai, ollama, google)
        model: Model name to check for special requirements
        **kwargs: Original API parameters

    Returns:
        dict: Compatible parameters ready for API call
    """
    params = kwargs.copy()

    # Handle OpenAI parameter deprecation
    if provider == "openai" and "max_tokens" in params:
        params["max_completion_tokens"] = params.pop("max_tokens")

    # Handle reasoning model restrictions
    if model and _is_reasoning_model(model):
        params.pop("temperature", None)

    return params


⚠️ Potential issue

Blocker: Don’t remap max_tokens globally; gate by reasoning models and drop unsupported params

Unconditionally converting max_tokens → max_completion_tokens risks 400s on non‑reasoning chat models. For o‑series/GPT‑5 reasoning models, use max_completion_tokens and strip unsupported params (temperature/top_p/penalties/logprobs/logit_bias/n). For non‑reasoning chat models, keep max_tokens (or translate back if only max_completion_tokens was supplied). This aligns with OpenAI guidance and observed API behavior. (community.openai.com)

Apply this diff:

-def prepare_llm_params(provider: str, model: str, **kwargs) -> dict:
+def prepare_llm_params(provider: str, model: str, **kwargs) -> dict[str, Any]:
@@
-    params = kwargs.copy()
-
-    # Handle OpenAI parameter deprecation
-    if provider == "openai" and "max_tokens" in params:
-        params["max_completion_tokens"] = params.pop("max_tokens")
-
-    # Handle reasoning model restrictions
-    if model and _is_reasoning_model(model):
-        params.pop("temperature", None)
-
-    return params
+    params = dict(kwargs)
+    is_reasoning = bool(model and _is_reasoning_model(model))
+
+    # OpenAI compatibility handling
+    if provider == "openai":
+        if is_reasoning:
+            # o‑/GPT‑5 reasoning models: require max_completion_tokens; drop unsupported knobs
+            if "max_tokens" in params and "max_completion_tokens" not in params:
+                params["max_completion_tokens"] = params.pop("max_tokens")
+            for k in (
+                "temperature",
+                "top_p",
+                "presence_penalty",
+                "frequency_penalty",
+                "logprobs",
+                "top_logprobs",
+                "logit_bias",
+                "n",
+            ):
+                params.pop(k, None)
+        else:
+            # Legacy/non‑reasoning chat models: prefer max_tokens for broad compatibility
+            if "max_completion_tokens" in params and "max_tokens" not in params:
+                params["max_tokens"] = params.pop("max_completion_tokens")
+
+    return params
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def prepare_llm_params(provider: str, model: str, **kwargs) -> dict:
    """
    Prepare LLM API parameters with automatic compatibility handling.

    Handles:
    - OpenAI max_tokens → max_completion_tokens deprecation
    - Reasoning model temperature exclusion (o1, gpt-5 series)

    Args:
        provider: LLM provider name (openai, ollama, google)
        model: Model name to check for special requirements
        **kwargs: Original API parameters

    Returns:
        dict: Compatible parameters ready for API call
    """
    params = kwargs.copy()

    # Handle OpenAI parameter deprecation
    if provider == "openai" and "max_tokens" in params:
        params["max_completion_tokens"] = params.pop("max_tokens")

    # Handle reasoning model restrictions
    if model and _is_reasoning_model(model):
        params.pop("temperature", None)

    return params

def prepare_llm_params(provider: str, model: str, **kwargs) -> dict[str, Any]:
    """
    Prepare LLM API parameters with automatic compatibility handling.

    Handles:
    - OpenAI max_tokens → max_completion_tokens deprecation
    - Reasoning model temperature exclusion (o1, gpt-5 series)

    Args:
        provider: LLM provider name (openai, ollama, google)
        model: Model name to check for special requirements
        **kwargs: Original API parameters

    Returns:
        dict: Compatible parameters ready for API call
    """
    params = dict(kwargs)
    is_reasoning = bool(model and _is_reasoning_model(model))

    # OpenAI compatibility handling
    if provider == "openai":
        if is_reasoning:
            # o-/GPT-5 reasoning models: require max_completion_tokens; drop unsupported knobs
            if "max_tokens" in params and "max_completion_tokens" not in params:
                params["max_completion_tokens"] = params.pop("max_tokens")
            for k in (
                "temperature",
                "top_p",
                "presence_penalty",
                "frequency_penalty",
                "logprobs",
                "top_logprobs",
                "logit_bias",
                "n",
            ):
                params.pop(k, None)
        else:
            # Legacy/non-reasoning chat models: prefer max_tokens for broad compatibility
            if "max_completion_tokens" in params and "max_tokens" not in params:
                params["max_tokens"] = params.pop("max_completion_tokens")

    return params

Wirasm (Collaborator) commented Sep 18, 2025

This looks like a nice quick fix to allow gpt-5 support.

Will be a nice-to-have before we migrate to the Responses API.

Thank you for this contribution @leoric-crown

coleam00 (Owner) commented

Yeah this is great, thanks @leoric-crown! @Wirasm have you had a chance to review, or are you just providing your initial comments here?

leoric-crown (Author) commented

Closing this PR as the underlying OpenAI API compatibility issues were resolved in a580fdf (PR #736).

While this PR proposed a centralized prepare_llm_params() function approach, the merged solution implemented reasoning model detection (requires_max_completion_tokens(), prepare_chat_completion_params()) as part of a broader multi-provider support feature that includes:

  • GPT-5, o1, o3, and Grok-3 reasoning model support
  • Automatic max_tokens → max_completion_tokens conversion for reasoning models
  • Temperature handling for reasoning models
  • OpenRouter, Anthropic, and Grok provider integration

Both approaches solved the same core problem (OpenAI API parameter deprecation), but the merged solution integrated it into a larger provider architecture refactor.
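
Purely for illustration, the merged helpers named above might look roughly like this (a guess based only on their names and the behavior described here; see PR #736 for the actual implementation):

def requires_max_completion_tokens(model: str) -> bool:
    # Hypothetical: reasoning-model families mentioned above (GPT-5, o1, o3, Grok-3)
    return model.lower().startswith(("gpt-5", "o1", "o3", "grok-3"))


def prepare_chat_completion_params(provider: str, model: str, **kwargs) -> dict:
    # Hypothetical sketch, not the merged code
    params = dict(kwargs)
    if requires_max_completion_tokens(model):
        if "max_tokens" in params:
            params["max_completion_tokens"] = params.pop("max_tokens")
        params.pop("temperature", None)
    return params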

Look forward to contributing more!

coleam00 added a commit that referenced this pull request Apr 7, 2026
…based retries (#729)

* feat: safe session continuity across workflow steps with forkSession-based retries (#691)

Retries previously discarded all prior conversation context by passing
resumeSessionId=undefined, because naive session resume would append to
the original session file and corrupt it on a crash. This adds
forkSession:true support so the SDK copies the prior session history into
a new file before appending — leaving the source untouched — making it
safe to always pass the prior session ID on retry.

Changes:
- Add forkSession?: boolean to WorkflowAssistantOptions (deps.ts) and AssistantRequestOptions (types/index.ts)
- Pass forkSession to SDK Options in ClaudeClient with enhanced resuming_session log
- Sequential executor: set shouldForkSession=true when resuming; inject into stepOptions; fix retry to always pass resumeSessionId
- DAG executor: set shouldForkSession=true when resuming in executeNodeInternal; fix retry to always pass resumeSessionId; update isFresh comment
- Fix resolvedOptions to explicitly set persistSession:false for Claude workflows (was silently defaulting to SDK's true)
- Add context:'shared' union to DagNodeBase.context type; update clearContext docstring

Fixes #691

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: remove persistSession:false from workflow resolvedOptions to preserve cross-step session continuity

SDK docs confirm that persistSession:false prevents session files from being written to disk,
making resume impossible for subsequent steps. Removing this option restores the SDK default
(true), so step 1's session ID can be used by step 2 to resume. Also clarify context:'shared'
docstring to note it has no effect on parallel layer nodes, and add unit tests asserting that
retries pass forkSession:true when a prior session exists.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: add persistSession: false to Claude resolvedOptions

Implements the missing persistSession fix claimed in the PR description.
Without this, the SDK defaults to true and writes session transcripts to disk.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(loader): preserve context: 'shared' in DAG node YAML parsing

The PR added 'shared' to DagNodeBase.context type but the loader only
preserved 'fresh', silently dropping 'shared' during YAML parsing.
This meant context: shared in workflow YAML was ignored at load time.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Tyone88 pushed a commit to Tyone88/Archon that referenced this pull request Apr 16, 2026
…based retries (coleam00#729)

joaobmonteiro pushed a commit to joaobmonteiro/Archon that referenced this pull request Apr 26, 2026
…based retries (coleam00#729)

