
fix: Add OpenAI API compatibility with prepare_llm_params function#691

Closed
leoric-crown wants to merge 1 commit into coleam00:main from leoric-crown:fix/openai-api-compatibility

Conversation


leoric-crown commented Sep 17, 2025

Summary

Fixes OpenAI API compatibility issues when using newer models by adding a centralized prepare_llm_params() function that handles provider-specific API parameter transformations.

Problems Solved

  • OpenAI parameter deprecation: max_tokens → max_completion_tokens transition
  • Reasoning model restrictions: Temperature parameter exclusion for o1, gpt-5 series models
  • Provider compatibility: Maintains backward compatibility with Ollama, Google providers

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Affected Services

  • Backend/API
  • Frontend/UI
  • Database
  • Documentation
  • Infrastructure

Changes Made

  • Added prepare_llm_params() function in llm_provider_service.py for centralized API compatibility handling
  • Added _is_reasoning_model() helper function to detect models requiring parameter restrictions (sketched after this list)
  • Updated code_storage_service.py: Modified code summary generation to use compatibility function
  • Updated contextual_embedding_service.py: Modified both single and batch context generation calls
  • Applied linting fixes: Auto-fixed whitespace and formatting issues
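
For reference, _is_reasoning_model() is roughly a case-insensitive substring check over the model name. A minimal sketch (the exact pattern list appears in the review thread below):

def _is_reasoning_model(model: str) -> bool:
    """Detect models that reject the temperature parameter (o1 and gpt-5 series)."""
    reasoning_patterns = ["o1", "o1-preview", "o1-mini", "gpt-5"]
    model_lower = model.lower()
    return any(pattern in model_lower for pattern in reasoning_patterns)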

Usage Example

# Before (fails with newer OpenAI API):
response = await client.chat.completions.create(
    model="gpt-4",
    max_tokens=500,      # ❌ Deprecated parameter
    temperature=0.3      # ❌ Not supported by reasoning models
)

# After (automatic compatibility):
params = prepare_llm_params("openai", "gpt-4", max_tokens=500, temperature=0.3)
response = await client.chat.completions.create(model="gpt-4", **params)
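
For illustration, how the same call behaves for other providers and for reasoning models, per the rules above (model names are placeholders):

# Non-OpenAI providers pass through unchanged (backward compatible):
params = prepare_llm_params("ollama", "llama3.1", max_tokens=500, temperature=0.3)
# -> {"max_tokens": 500, "temperature": 0.3}

# OpenAI reasoning models additionally drop temperature:
params = prepare_llm_params("openai", "o1-mini", max_tokens=500, temperature=0.3)
# -> {"max_completion_tokens": 500}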

Testing

Manual Testing Steps

  1. OpenAI API compatibility testing:

    # Start services
    docker compose up --build -d
    
    # Test crawling with OpenAI provider (should work without max_tokens errors)
    curl -X POST "http://localhost:8181/api/knowledge-items/crawl" \
      -H "Content-Type: application/json" \
      -d '{"url": "https://docs.anthropic.com", "knowledge_type": "technical"}'
  2. Reasoning model temperature exclusion:

    • Configure system to use a gpt-5 or o1 model
    • Verify that API calls complete without temperature parameter errors
    • Check logs for proper parameter transformation
  3. Backward compatibility verification:

    • Test with Ollama provider (should continue using max_tokens)
    • Test with Google provider (should continue using max_tokens)
    • Verify no regression in existing functionality (a unit-test sketch follows)
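
A unit-test sketch (not included in this PR) capturing the transformations described above; the import path is an assumption based on the repo layout shown in the lint commands below:

# test_prepare_llm_params.py -- hedged sketch, not part of this PR
from src.server.services.llm_provider_service import prepare_llm_params


def test_openai_max_tokens_renamed():
    params = prepare_llm_params("openai", "gpt-4", max_tokens=500, temperature=0.3)
    assert params == {"max_completion_tokens": 500, "temperature": 0.3}


def test_reasoning_model_drops_temperature():
    params = prepare_llm_params("openai", "o1-mini", max_tokens=500, temperature=0.3)
    assert params == {"max_completion_tokens": 500}


def test_other_providers_unchanged():
    params = prepare_llm_params("ollama", "llama3.1", max_tokens=500, temperature=0.3)
    assert params == {"max_tokens": 500, "temperature": 0.3}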

Code Quality Checks

# Linting verification
cd python && uv run ruff check src/server/services/llm_provider_service.py
cd python && uv run ruff check src/server/services/storage/code_storage_service.py  
cd python && uv run ruff check src/server/services/embeddings/contextual_embedding_service.py

# Type checking
cd python && uv run mypy src/server/services/llm_provider_service.py

Test Results: All modified files pass linting checks. Functions are properly typed and follow existing code patterns.

Architecture Rationale

This implementation follows the existing functional programming patterns in the codebase:

  • ✅ No classes, pure functions only
  • ✅ Explicit transformations (developers see what's happening)
  • ✅ Single responsibility with clear separation of concerns
  • ✅ Easily testable and extensible

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (function docstrings)
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works (manual testing performed)
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

Breaking Changes

None. This change is fully backward compatible:

  • Existing API calls without max_tokens/temperature continue to work unchanged
  • Other providers (Ollama, Google) maintain their existing parameter usage
  • The prepare_llm_params() function is additive and doesn't modify existing interfaces

Future Considerations

This solution addresses immediate needs while opening discussion for more robust patterns:

Alternative Patterns Worth Considering

1. Automatic Compatibility via Enhanced Context Manager

@asynccontextmanager
async def get_llm_client(provider: str | None = None, **kwargs):
    # Automatically wrap client methods for compatibility

Trade-off: Hidden magic vs. explicit transformations
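
A rough sketch of what that automatic wrapping could look like (hypothetical; _create_raw_client stands in for the existing client setup and is not a real helper in this codebase):

from contextlib import asynccontextmanager


@asynccontextmanager
async def get_llm_client(provider: str | None = None, **kwargs):
    # Hypothetical: obtain the client exactly as the current code does
    client = await _create_raw_client(provider, **kwargs)  # placeholder, not a real function
    original_create = client.chat.completions.create

    async def create_with_compat(*, model: str, **call_kwargs):
        # Route every call through the compatibility shim transparently
        params = prepare_llm_params(provider or "openai", model, **call_kwargs)
        return await original_create(model=model, **params)

    client.chat.completions.create = create_with_compat
    try:
        yield client
    finally:
        await client.close()  # assuming the client exposes an async close()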

2. Linting/Static Analysis Integration

  • Custom Ruff rules to detect raw chat.completions.create calls
  • Enforce compatibility function usage at build time

Discussion Points

  • Maintainability: As more providers evolve, should we move toward automatic compatibility?
  • Developer Experience: Is the explicit approach worth the extra lines?
  • Architecture Evolution: Balance functional purity with practical usability

Related Issues

Addresses similar concerns raised in #687 regarding LLM client management patterns.

🤖 Generated with Claude Code

- Add prepare_llm_params() function to handle provider-specific API compatibility
- Handle OpenAI max_tokens → max_completion_tokens deprecation
- Exclude temperature parameter for reasoning models (o1, gpt-5 series)
- Update code_storage_service.py and contextual_embedding_service.py to use compatibility function
- Maintain backward compatibility with other providers (Ollama, Google)

This fixes API errors when using newer OpenAI models while establishing a maintainable pattern for future API changes.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

coderabbitai Bot commented Sep 17, 2025

Walkthrough

Centralizes LLM API parameter handling via a new prepare_llm_params helper and updates contextual embedding and code storage services to use it. Adds reasoning-model checks and max_tokens mapping for OpenAI. Extends code storage to support 3072-dim embeddings. Minor import reorders and whitespace edits.

Changes

  • LLM parameter preparation utility (python/src/server/services/llm_provider_service.py): Added prepare_llm_params(provider, model, **kwargs). Maps OpenAI max_tokens → max_completion_tokens, strips temperature for reasoning models (o1/o1-preview/o1-mini/gpt-5), via private _is_reasoning_model.
  • Contextual embedding calls standardized (python/src/server/services/embeddings/contextual_embedding_service.py): Replaced inline temperature/max_tokens with params built by prepare_llm_params for single and batch embedding generation; minor whitespace cleanup.
  • Code storage summarization and embeddings updates (python/src/server/services/storage/code_storage_service.py): Switched LLM calls to use prepare_llm_params; updated imports. Added 3072-dim embedding handling with embedding_3072; fallback unchanged. Minor import reorder, whitespace edits.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant CES as ContextualEmbeddingService
  participant CSS as CodeStorageService
  participant LPS as LLMProviderService
  participant LLM as OpenAI API

  rect rgba(230,240,255,0.4)
    CES->>LPS: prepare_llm_params(provider, model, {temperature, max_tokens})
    CSS->>LPS: prepare_llm_params(provider, model, {temperature, max_tokens})
    note over LPS: Normalize params<br/>- If OpenAI: max_tokens→max_completion_tokens<br/>- If reasoning model: remove temperature
    LPS-->>CES: params
    LPS-->>CSS: params
  end

  CES->>LLM: chat.completions(**params)
  CSS->>LLM: chat.completions(**params)
  LLM-->>CES: response
  LLM-->>CSS: response

  note over CSS: Also maps embedding dims incl. 3072 path

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

A nibble of params, a hop through the code,
I bundled the tokens for a smoother road.
Reasoning winds? I lower the heat,
Map max to completion—so tidy, so neat.
With 3k dims, I leap and I sing—
Consistent calls for a springy spring! 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Title Check (✅ Passed): The title accurately and concisely summarizes the primary change — adding OpenAI API compatibility via a new prepare_llm_params function — and uses the conventional "fix:" prefix so it's clear to reviewers; it references the main artifact modified and is on-topic for the changeset. Consider shortening for brevity (optional).
  • Docstring Coverage (✅ Passed): Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
  • Description Check (✅ Passed): The PR description closely follows the repository's template and includes the key sections: Summary, Changes Made, Type of Change, Affected Services, Testing (manual steps and lint/type-check commands), Checklist, and Breaking Changes; it clearly describes the new prepare_llm_params function, usage examples, and rationale. The testing section provides manual reproduction steps and lint/type-check commands, and the checklist claims tests and linting pass. One minor omission is that Test Evidence does not include verbatim command outputs or CI run links to substantiate the claimed results, but the description is otherwise sufficiently detailed for review.



coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (3)
python/src/server/services/llm_provider_service.py (2)

417-429: Broaden reasoning-model detection (o3/o4/GPT‑5 families) and future-proof

Current patterns miss o3/o4 series mentioned in docs/guides; expand detection so temperature/etc. are correctly pruned. (learn.microsoft.com)

Apply this diff:

-def _is_reasoning_model(model: str) -> bool:
+def _is_reasoning_model(model: str) -> bool:
@@
-    reasoning_patterns = ["o1", "o1-preview", "o1-mini", "gpt-5"]
-    model_lower = model.lower()
-    return any(pattern in model_lower for pattern in reasoning_patterns)
+    m = model.lower()
+    # Includes o‑series (o1/o2/o3/o4/…) and GPT‑5 family
+    return m.startswith(("o1", "o2", "o3", "o4", "o5", "gpt-5"))

388-389: Nit: Return type should be precise for mypy

Use dict[str, Any] to satisfy typing guidelines (Any is already imported).

python/src/server/services/storage/code_storage_service.py (1)

510-512: Remove redundant local import

asyncio is already imported at module top (Line 7). Drop the inner import to avoid duplication.

Apply this diff:

-    import asyncio
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b383c8c and 926ffd2.

📒 Files selected for processing (3)
  • python/src/server/services/embeddings/contextual_embedding_service.py (8 hunks)
  • python/src/server/services/llm_provider_service.py (1 hunks)
  • python/src/server/services/storage/code_storage_service.py (7 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
python/src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

python/src/**/*.py: Fail fast on critical conditions: service startup failures, missing configuration/env vars, database connection/auth failures, critical dependencies unavailable
Never accept or store corrupted data (e.g., zero embeddings, null foreign keys, malformed JSON); skip failed items entirely and continue processing
For batch/background operations, continue processing but log detailed per-item failures; for external APIs use retries with exponential backoff and then fail clearly
Error messages must include context, use specific exception types, preserve full stack traces (logging with exc_info=True), include relevant IDs/URLs, and never return None to indicate failure—raise instead; for batch ops report success counts and detailed failures
Backend uses Python 3.12 with a 120-character line length
Avoid introducing WebSocket support in the backend; updates are handled via HTTP polling
Adhere to Ruff lint rules (e.g., no unused imports) and provide type hints to satisfy MyPy

python/src/**/*.py: Fail fast on service startup failures (credentials, DB, service init); crash with clear errors
Treat missing configuration (env vars/invalid settings) as fatal; stop the system
Do not hide database connection failures; bubble up and surface clearly
Authentication/authorization failures must halt the operation and be visible
Never silently accept bad data; let Pydantic validation errors raise
If critical dependencies are unavailable, fail immediately
Reject invalid data that could corrupt state (e.g., zero embeddings, null FKs, malformed JSON)
Batch processing should complete remaining items but log detailed per-item failures
Background tasks (e.g., embedding generation) should finish queues while logging failures
Treat optional features as skippable: log and skip when disabled rather than crashing
External API calls: use retry with exponential backoff; on final failure, raise with clear service/context info
Never accept corrupted data during partial-failure work...

Files:

  • python/src/server/services/llm_provider_service.py
  • python/src/server/services/embeddings/contextual_embedding_service.py
  • python/src/server/services/storage/code_storage_service.py
python/src/server/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use specific exception classes and FastAPI exception handlers to produce rich JSON error responses

python/src/server/**/*.py: Provide rich error responses in FastAPI (custom exceptions and handlers that return structured JSON with detail/type)
Implement HTTP polling support and ETag caching on endpoints to enable 304 responses

Files:

  • python/src/server/services/llm_provider_service.py
  • python/src/server/services/embeddings/contextual_embedding_service.py
  • python/src/server/services/storage/code_storage_service.py
python/src/server/{api_routes,services}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Follow Service Layer pattern: API routes delegate to services, which handle business logic and call the database layer

Files:

  • python/src/server/services/llm_provider_service.py
  • python/src/server/services/embeddings/contextual_embedding_service.py
  • python/src/server/services/storage/code_storage_service.py
**/*.{py,ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Never return None/null to indicate failure; raise an exception with details instead

Files:

  • python/src/server/services/llm_provider_service.py
  • python/src/server/services/embeddings/contextual_embedding_service.py
  • python/src/server/services/storage/code_storage_service.py
python/src/server/services/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Place business logic in service layer under python/src/server/services/

Files:

  • python/src/server/services/llm_provider_service.py
  • python/src/server/services/embeddings/contextual_embedding_service.py
  • python/src/server/services/storage/code_storage_service.py
🧬 Code graph analysis (2)
python/src/server/services/embeddings/contextual_embedding_service.py (1)
python/src/server/services/llm_provider_service.py (1)
  • prepare_llm_params (388-414)
python/src/server/services/storage/code_storage_service.py (1)
python/src/server/services/llm_provider_service.py (3)
  • get_llm_client (42-170)
  • prepare_llm_params (388-414)
  • get_embedding_model (221-273)
🔇 Additional comments (7)
python/src/server/services/embeddings/contextual_embedding_service.py (3)

13-13: LGTM: Centralized param builder import

Importing prepare_llm_params here keeps call sites consistent.


193-196: LGTM: Batch path uses the same compatibility layer

Consistent usage in batch reduces drift; nice.

Also applies to: 207-207


254-254: Acceptable fallback behavior

Returning (chunk, False) on final failure aligns with “log and continue” for batch-ish flows.

python/src/server/services/storage/code_storage_service.py (4)

521-522: LGTM: Use get_llm_client + prepare_llm_params inside async variant

This avoids circular imports and standardizes param handling.


559-562: LGTM with dependency on prepare_llm_params fix

Call site is correct; it will pick up the reasoning‑model rules once the helper is adjusted as proposed.

If you want to be extra safe on Google’s OpenAI‑compat layer, note it accepts both maxTokens and maxCompletionTokens; our shim leaving max_tokens for non‑reasoning models is fine. (ai.google.dev)

Also applies to: 573-573


858-861: LGTM: get_embedding_model import/use

Keeps embedding model tracking accurate.


915-927: Verify DB schema includes embedding_3072 before merge

Repo search returned only code references to embedding_3072 (e.g., python/src/server/services/storage/code_storage_service.py mapping) and no SQL migration creating that column — confirm the migration exists and is deployed to Supabase, or add a safe fallback/guard to use embedding_1536 to avoid 400/500 on writes.

Comment on lines +68 to +71
# Prepare compatible parameters for the API call
params = prepare_llm_params(provider or "openai", model,
                            temperature=0.3, max_tokens=200)


⚠️ Potential issue

🧩 Analysis chain

LGTM: Params now flow through compatibility shim

Passing params via prepare_llm_params before chat.completions.create is the right direction. This will work once prepare_llm_params gates max_tokens only for reasoning models as suggested.

To catch any lingering direct uses, run:

Also applies to: 81-81


🏁 Script executed:

#!/bin/bash
# Find chat.completions calls still passing max_tokens/temperature directly
rg -nP -C2 --type=py '\bchat\.completions\.create\('
rg -nP --type=py '\bmax_tokens\s*='
rg -nP --type=py '\btemperature\s*='

Length of output: 8943


Fix remaining direct max_tokens/temperature usages before approving

Direct calls still pass max_tokens/temperature to client.chat.completions.create — switch these to use prepare_llm_params (and gate max_tokens only for reasoning models), then re-run the verification script.

  • python/src/server/services/embeddings/contextual_embedding_service.py:70 (temperature=0.3, max_tokens=200)
  • python/src/server/services/embeddings/contextual_embedding_service.py:195 (temperature=0, max_tokens=100 * len(chunks))
  • python/src/server/services/storage/code_storage_service.py:561 (max_tokens=500, temperature=0.3)
  • python/src/server/services/provider_discovery_service.py:154 (max_tokens=50)
  • python/src/server/services/ollama/model_discovery_service.py:587 (max_tokens=1)
  • python/src/server/services/ollama/model_discovery_service.py:611 (max_tokens=10)
  • python/src/server/services/ollama/model_discovery_service.py:613 (temperature=0.1)
  • python/src/server/services/ollama/model_discovery_service.py:670 (max_tokens=1)
  • python/src/server/services/ollama/model_discovery_service.py:844 (max_tokens=50)
  • python/src/server/services/ollama/model_discovery_service.py:878 (max_tokens=100)
  • python/src/server/api_routes/ollama_api.py:883 (max_tokens=50)
  • python/src/server/api_routes/ollama_api.py:928 (max_tokens=100)
  • python/src/server/api_routes/ollama_api.py:930 (temperature=0.1)
🤖 Prompt for AI Agents
In python/src/server/services/embeddings/contextual_embedding_service.py around
lines 68-71, the call is directly passing temperature=0.3 and max_tokens=200;
replace these direct arguments with a call to prepare_llm_params(provider or
"openai", model, ...) and merge the returned params into the client call, and
ensure max_tokens is only included when the model is a reasoning-type model
(gate it behind the existing reasoning-model check) so that temperature and
max_tokens are not passed directly to client.chat.completions.create; apply the
same pattern to the other listed locations and re-run the verification script.

Comment on lines +388 to +415
def prepare_llm_params(provider: str, model: str, **kwargs) -> dict:
    """
    Prepare LLM API parameters with automatic compatibility handling.

    Handles:
    - OpenAI max_tokens → max_completion_tokens deprecation
    - Reasoning model temperature exclusion (o1, gpt-5 series)

    Args:
        provider: LLM provider name (openai, ollama, google)
        model: Model name to check for special requirements
        **kwargs: Original API parameters

    Returns:
        dict: Compatible parameters ready for API call
    """
    params = kwargs.copy()

    # Handle OpenAI parameter deprecation
    if provider == "openai" and "max_tokens" in params:
        params["max_completion_tokens"] = params.pop("max_tokens")

    # Handle reasoning model restrictions
    if model and _is_reasoning_model(model):
        params.pop("temperature", None)

    return params


⚠️ Potential issue

Blocker: Don’t remap max_tokens globally; gate by reasoning models and drop unsupported params

Unconditionally converting max_tokens → max_completion_tokens risks 400s on non‑reasoning chat models. For o‑series/GPT‑5 reasoning models, use max_completion_tokens and strip unsupported params (temperature/top_p/penalties/logprobs/logit_bias/n). For non‑reasoning chat models, keep max_tokens (or translate back if only max_completion_tokens was supplied). This aligns with OpenAI guidance and observed API behavior. (community.openai.com)

Apply this diff:

-def prepare_llm_params(provider: str, model: str, **kwargs) -> dict:
+def prepare_llm_params(provider: str, model: str, **kwargs) -> dict[str, Any]:
@@
-    params = kwargs.copy()
-
-    # Handle OpenAI parameter deprecation
-    if provider == "openai" and "max_tokens" in params:
-        params["max_completion_tokens"] = params.pop("max_tokens")
-
-    # Handle reasoning model restrictions
-    if model and _is_reasoning_model(model):
-        params.pop("temperature", None)
-
-    return params
+    params = dict(kwargs)
+    is_reasoning = bool(model and _is_reasoning_model(model))
+
+    # OpenAI compatibility handling
+    if provider == "openai":
+        if is_reasoning:
+            # o‑/GPT‑5 reasoning models: require max_completion_tokens; drop unsupported knobs
+            if "max_tokens" in params and "max_completion_tokens" not in params:
+                params["max_completion_tokens"] = params.pop("max_tokens")
+            for k in (
+                "temperature",
+                "top_p",
+                "presence_penalty",
+                "frequency_penalty",
+                "logprobs",
+                "top_logprobs",
+                "logit_bias",
+                "n",
+            ):
+                params.pop(k, None)
+        else:
+            # Legacy/non‑reasoning chat models: prefer max_tokens for broad compatibility
+            if "max_completion_tokens" in params and "max_tokens" not in params:
+                params["max_tokens"] = params.pop("max_completion_tokens")
+
+    return params
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def prepare_llm_params(provider: str, model: str, **kwargs) -> dict:
    """
    Prepare LLM API parameters with automatic compatibility handling.

    Handles:
    - OpenAI max_tokens → max_completion_tokens deprecation
    - Reasoning model temperature exclusion (o1, gpt-5 series)

    Args:
        provider: LLM provider name (openai, ollama, google)
        model: Model name to check for special requirements
        **kwargs: Original API parameters

    Returns:
        dict: Compatible parameters ready for API call
    """
    params = kwargs.copy()

    # Handle OpenAI parameter deprecation
    if provider == "openai" and "max_tokens" in params:
        params["max_completion_tokens"] = params.pop("max_tokens")

    # Handle reasoning model restrictions
    if model and _is_reasoning_model(model):
        params.pop("temperature", None)

    return params

def prepare_llm_params(provider: str, model: str, **kwargs) -> dict[str, Any]:
    """
    Prepare LLM API parameters with automatic compatibility handling.

    Handles:
    - OpenAI max_tokens → max_completion_tokens deprecation
    - Reasoning model temperature exclusion (o1, gpt-5 series)

    Args:
        provider: LLM provider name (openai, ollama, google)
        model: Model name to check for special requirements
        **kwargs: Original API parameters

    Returns:
        dict: Compatible parameters ready for API call
    """
    params = dict(kwargs)
    is_reasoning = bool(model and _is_reasoning_model(model))

    # OpenAI compatibility handling
    if provider == "openai":
        if is_reasoning:
            # o-/GPT-5 reasoning models: require max_completion_tokens; drop unsupported knobs
            if "max_tokens" in params and "max_completion_tokens" not in params:
                params["max_completion_tokens"] = params.pop("max_tokens")
            for k in (
                "temperature",
                "top_p",
                "presence_penalty",
                "frequency_penalty",
                "logprobs",
                "top_logprobs",
                "logit_bias",
                "n",
            ):
                params.pop(k, None)
        else:
            # Legacy/non-reasoning chat models: prefer max_tokens for broad compatibility
            if "max_completion_tokens" in params and "max_tokens" not in params:
                params["max_tokens"] = params.pop("max_completion_tokens")

    return params

Wirasm (Collaborator) commented Sep 18, 2025

This looks like a nice quick fix to allow gpt-5 support.

Will be a nice-to-have before we migrate to the Responses API.

Thank you for this contribution @leoric-crown

coleam00 (Owner) commented

Yeah this is great, thanks @leoric-crown! @Wirasm have you had a chance to review, or are you just providing your initial comments here?

leoric-crown (Author) commented

Closing this PR as the underlying OpenAI API compatibility issues were resolved in a580fdf (PR #736).

While this PR proposed a centralized prepare_llm_params() function approach, the merged solution implemented reasoning model detection (requires_max_completion_tokens(), prepare_chat_completion_params()) as part of a broader multi-provider support feature that includes:

  • GPT-5, o1, o3, and Grok-3 reasoning model support
  • Automatic max_tokens → max_completion_tokens conversion for reasoning models
  • Temperature handling for reasoning models
  • OpenRouter, Anthropic, and Grok provider integration

Both approaches solved the same core problem (OpenAI API parameter deprecation), but the merged solution integrated it into a larger provider architecture refactor.
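
Purely for illustration, the merged helpers named above might look roughly like this (a guess based only on their names and the behavior described here; see PR #736 for the actual implementation):

def requires_max_completion_tokens(model: str) -> bool:
    # Hypothetical: reasoning-model families mentioned above (GPT-5, o1, o3, Grok-3)
    return model.lower().startswith(("gpt-5", "o1", "o3", "grok-3"))


def prepare_chat_completion_params(provider: str, model: str, **kwargs) -> dict:
    # Hypothetical sketch, not the merged code
    params = dict(kwargs)
    if requires_max_completion_tokens(model):
        if "max_tokens" in params:
            params["max_completion_tokens"] = params.pop("max_tokens")
        params.pop("temperature", None)
    return params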

Look forward to contributing more!

coleam00 added a commit that referenced this pull request Apr 7, 2026
…based retries (#729)

* feat: safe session continuity across workflow steps with forkSession-based retries (#691)

Retries previously discarded all prior conversation context by passing
resumeSessionId=undefined, because naive session resume would append to
the original session file and corrupt it on a crash. This adds
forkSession:true support so the SDK copies the prior session history into
a new file before appending — leaving the source untouched — making it
safe to always pass the prior session ID on retry.

Changes:
- Add forkSession?: boolean to WorkflowAssistantOptions (deps.ts) and AssistantRequestOptions (types/index.ts)
- Pass forkSession to SDK Options in ClaudeClient with enhanced resuming_session log
- Sequential executor: set shouldForkSession=true when resuming; inject into stepOptions; fix retry to always pass resumeSessionId
- DAG executor: set shouldForkSession=true when resuming in executeNodeInternal; fix retry to always pass resumeSessionId; update isFresh comment
- Fix resolvedOptions to explicitly set persistSession:false for Claude workflows (was silently defaulting to SDK's true)
- Add context:'shared' union to DagNodeBase.context type; update clearContext docstring

Fixes #691

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: remove persistSession:false from workflow resolvedOptions to preserve cross-step session continuity

SDK docs confirm that persistSession:false prevents session files from being written to disk,
making resume impossible for subsequent steps. Removing this option restores the SDK default
(true), so step 1's session ID can be used by step 2 to resume. Also clarify context:'shared'
docstring to note it has no effect on parallel layer nodes, and add unit tests asserting that
retries pass forkSession:true when a prior session exists.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: add persistSession: false to Claude resolvedOptions

Implements the missing persistSession fix claimed in the PR description.
Without this, the SDK defaults to true and writes session transcripts to disk.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(loader): preserve context: 'shared' in DAG node YAML parsing

The PR added 'shared' to DagNodeBase.context type but the loader only
preserved 'fresh', silently dropping 'shared' during YAML parsing.
This meant context: shared in workflow YAML was ignored at load time.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Tyone88 pushed a commit to Tyone88/Archon that referenced this pull request Apr 16, 2026
…based retries (coleam00#729)

joaobmonteiro pushed a commit to joaobmonteiro/Archon that referenced this pull request Apr 26, 2026
…based retries (coleam00#729)

