fix: auto-fill reasoning_content for moonshot kimi reasoning models #23580

Merged
krrishdholakia merged 1 commit into BerriAI:main from pradyyadav:fix/moonshot-reasoning-content-multi-turn
Mar 14, 2026

Conversation

@pradyyadav
Contributor

@pradyyadav pradyyadav commented Mar 13, 2026

Relevant issues

Fixes #21672

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory. Adding at least 1 test is a hard requirement (see details)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🐛 Bug Fix

Changes

Moonshot's kimi-k2.5 and related reasoning models require reasoning_content on every assistant message that has tool_calls in multi-turn conversations. Without it the Moonshot API returns 400 Bad Request.
The root cause was that LiteLLM had no supports_reasoning: true flag for Moonshot reasoning models, so no special handling was applied before forwarding the request.

  • Added "supports_reasoning": true to kimi-k2.5, kimi-k2-thinking, and kimi-k2-thinking-turbo in model_prices_and_context_window.json and the bundled backup file
  • Added fill_reasoning_content() to MoonshotChatConfig that runs before every API call: promotes reasoning_content from provider_specific_fields if available, otherwise injects a space placeholder and logs a warning
  • Added 4 unit tests in tests/test_litellm/llms/moonshot/ covering space injection, no overwrite, promotion from provider_specific_fields, and non-reasoning models left untouched
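
The behaviour described in the list above can be sketched roughly as follows. This is a minimal standalone illustration on plain dicts, not the merged MoonshotChatConfig method: the module-level function, the PLACEHOLDER name, and the logger name are assumptions made for the example, and the falsy check on reasoning_content is a choice made here rather than something the description confirms.

```python
import logging
from typing import Any, Dict, List

logger = logging.getLogger("moonshot_sketch")  # hypothetical logger name

PLACEHOLDER = " "  # single-space placeholder, as described in the PR


def fill_reasoning_content(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return a new list in which assistant tool-call messages carry reasoning_content."""
    patched_messages: List[Dict[str, Any]] = []
    for msg in messages:
        needs_fill = (
            msg.get("role") == "assistant"
            and bool(msg.get("tool_calls"))
            and not msg.get("reasoning_content")  # assumption: None and "" count as missing
        )
        if not needs_fill:
            patched_messages.append(msg)
            continue

        patched = dict(msg)  # shallow copy so the caller's dict is not mutated
        provider_fields = patched.get("provider_specific_fields") or {}
        stored = provider_fields.get("reasoning_content")
        if stored:
            # Promote the value preserved from the original response.
            patched["reasoning_content"] = stored
        else:
            logger.warning("reasoning_content missing on assistant tool-call message; injecting placeholder")
            patched["reasoning_content"] = PLACEHOLDER
        patched_messages.append(patched)
    return patched_messages
```

Messages that are not assistant tool-call messages, or that already carry a non-empty reasoning_content, are passed through as the same objects.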

@pradyyadav
Contributor Author

@greptileai

@pradyyadav pradyyadav changed the title from "fix(moonshot): auto-fill reasoning_content for kimi reasoning models …" to "fix: auto-fill reasoning_content for moonshot kimi reasoning models" on Mar 13, 2026
@greptile-apps
Contributor

greptile-apps bot commented Mar 13, 2026

Greptile Summary

This PR fixes a 400 Bad Request error from the Moonshot API by ensuring reasoning_content is present on every assistant message that contains tool_calls in multi-turn conversations with Moonshot reasoning models (kimi-k2.5, kimi-k2-thinking, kimi-k2-thinking-turbo).

The implementation correctly follows the LiteLLM pattern: capability flags are added to model_prices_and_context_window.json (and the backup) so supports_reasoning() can gate the behaviour without any hardcoded model-name lists. A new fill_reasoning_content method is added to MoonshotChatConfig and called from transform_request — it promotes a stored value from provider_specific_fields when available, otherwise injects a single-space placeholder and warns the caller. Five unit tests are included (all mock-based) covering injection, no-overwrite, promotion, empty-list, and end-to-end wiring.
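
To illustrate the capability-flag pattern mentioned above, here is a small sketch of gating behaviour on a JSON-style flag instead of a hardcoded model list. The MODEL_INFO dict and the model_supports_reasoning helper are hypothetical stand-ins for the JSON file and for supports_reasoning(); the "moonshot/moonshot-v1-8k" entry is included only as an illustrative non-reasoning model.

```python
from typing import Any, Dict

# Hypothetical in-memory stand-in for model_prices_and_context_window.json.
MODEL_INFO: Dict[str, Dict[str, Any]] = {
    "moonshot/kimi-k2.5": {"supports_reasoning": True},
    "moonshot/kimi-k2-thinking": {"supports_reasoning": True},
    "moonshot/kimi-k2-thinking-turbo": {"supports_reasoning": True},
    "moonshot/moonshot-v1-8k": {},  # illustrative entry without the flag
}


def model_supports_reasoning(model: str, provider: str) -> bool:
    """Look up the capability flag; unknown models default to False."""
    key = model if model.startswith(f"{provider}/") else f"{provider}/{model}"
    return bool(MODEL_INFO.get(key, {}).get("supports_reasoning", False))
```

With this shape, enabling the behaviour for another model is a data change (one more flag in the map) rather than a code change.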

Key findings:

  • Logic edge case: The guard "reasoning_content" not in msg (line 161) triggers injection only when the key is absent, so it silently forwards reasoning_content: None or reasoning_content: "" to the API if either value is ever present in history. A falsy check (not msg.get("reasoning_content")) would be more robust.
  • The provider_specific_fields cleanup after promotion is handled correctly; the shallow-copy pattern prevents mutation of the caller's original message dicts.
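
The difference between the two guards in the first finding can be seen in a small standalone sketch. The two helper names below are hypothetical and exist only for this comparison; they are not part of the PR.

```python
from typing import Any, Dict


def needs_fill_key_presence(msg: Dict[str, Any]) -> bool:
    """Guard that only looks at whether the key exists."""
    return (
        msg.get("role") == "assistant"
        and bool(msg.get("tool_calls"))
        and "reasoning_content" not in msg
    )


def needs_fill_falsy(msg: Dict[str, Any]) -> bool:
    """Guard that also treats None and "" as missing."""
    return (
        msg.get("role") == "assistant"
        and bool(msg.get("tool_calls"))
        and not msg.get("reasoning_content")
    )


base = {"role": "assistant", "tool_calls": [{"id": "call_1"}]}
null_valued = {**base, "reasoning_content": None}
empty_valued = {**base, "reasoning_content": ""}
```

For null_valued and empty_valued the key-presence guard returns False (no injection), while the falsy guard returns True; for a message without the key, both return True.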

Confidence Score: 4/5

  • Safe to merge with a minor fix — one edge case in the reasoning_content presence check could cause the same 400 error it aims to prevent when the field is explicitly None.
  • The approach is architecturally sound and follows LiteLLM patterns. The logic correctly handles the primary use cases (absent key, stored in provider_specific_fields, already present). The only remaining gap is the key-presence check not catching None/empty-string values, which is a narrow but real edge case. Test coverage is solid and all tests are mock-based.
  • litellm/llms/moonshot/chat/transformation.py — specifically line 161's "reasoning_content" not in msg condition.

Important Files Changed

Filename: Overview
  • litellm/llms/moonshot/chat/transformation.py: Adds fill_reasoning_content helper and wires it into transform_request for supports_reasoning Moonshot models; one logic issue: key-presence check doesn't guard against reasoning_content: None, which would still propagate a null value to the API.
  • litellm/model_prices_and_context_window_backup.json: Adds "supports_reasoning": true to moonshot/kimi-k2.5, moonshot/kimi-k2-thinking, and moonshot/kimi-k2-thinking-turbo; correctly follows the pattern of storing model capabilities in the JSON so supports_reasoning() picks them up without hardcoding.
  • model_prices_and_context_window.json: Mirrors the same three "supports_reasoning": true additions to the canonical JSON file; changes look consistent with the backup.
  • tests/test_litellm/llms/moonshot/test_moonshot_chat_transformation.py: Adds five unit tests covering the happy path (space injection, no-overwrite, provider_specific_fields promotion, empty tool_calls list), as well as end-to-end wiring through transform_request; all tests use mocks so they respect the no-real-network-calls rule.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant MoonshotChatConfig
    participant supports_reasoning
    participant fill_reasoning_content
    participant OpenAIGPTConfig
    participant MoonshotAPI

    Caller->>MoonshotChatConfig: transform_request(model, messages, ...)
    MoonshotChatConfig->>supports_reasoning: supports_reasoning(model, "moonshot")
    supports_reasoning-->>MoonshotChatConfig: true/false

    alt reasoning model (kimi-k2.5 / kimi-k2-thinking / kimi-k2-thinking-turbo)
        MoonshotChatConfig->>fill_reasoning_content: fill_reasoning_content(messages)
        loop each assistant message with tool_calls
            alt reasoning_content absent (key not in msg)
                alt provider_specific_fields["reasoning_content"] present
                    fill_reasoning_content->>fill_reasoning_content: promote to top-level, clean provider_specific_fields
                else no stored value
                    fill_reasoning_content->>fill_reasoning_content: inject " " placeholder, log warning
                end
            else reasoning_content already present
                fill_reasoning_content->>fill_reasoning_content: pass through unchanged
            end
        end
        fill_reasoning_content-->>MoonshotChatConfig: patched messages
    end

    MoonshotChatConfig->>OpenAIGPTConfig: super().transform_request(...)
    OpenAIGPTConfig-->>MoonshotChatConfig: request body dict
    MoonshotChatConfig-->>Caller: request body dict
    Caller->>MoonshotAPI: POST /v1/chat/completions

Last reviewed commit: 268616b

Comment on lines +476 to +505
"""For non-reasoning models, transform_request leaves messages unchanged."""
config = MoonshotChatConfig()

messages = [
    {"role": "user", "content": "Hello"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {"id": "call_1", "type": "function", "function": {"name": "fn", "arguments": "{}"}}
        ],
    },
]

with patch(
    "litellm.llms.moonshot.chat.transformation.supports_reasoning",
    return_value=False,
):
    result = config.transform_request(
        model="moonshot-v1-8k",
        messages=messages,
        optional_params={},
        litellm_params={},
        headers={},
    )

# reasoning_content must not have been injected
for msg in result["messages"]:
    assert "reasoning_content" not in msg

No integration test for reasoning model path through transform_request

The four new tests call fill_reasoning_content directly or patch supports_reasoning to False (the non-reasoning path). There is no test that exercises the full transform_request pipeline for an actual reasoning model without mocking supports_reasoning.

This means the integration wiring — specifically, that transform_request actually invokes fill_reasoning_content when supports_reasoning returns True for a Moonshot reasoning model — is untested. A regression here (e.g., checking the wrong provider string) would not be caught by the current suite.

Consider adding a test similar to test_non_reasoning_model_messages_untouched but with the mock returning True (or using an actual reasoning model name like kimi-k2-thinking):

def test_reasoning_model_fill_called_from_transform_request(self):
    """transform_request injects reasoning_content for reasoning models."""
    config = MoonshotChatConfig()
    messages = [
        {"role": "user", "content": "Call a tool"},
        {
            "role": "assistant",
            "content": None,
            "tool_calls": [
                {"id": "c1", "type": "function", "function": {"name": "fn", "arguments": "{}"}}
            ],
        },
    ]
    with patch(
        "litellm.llms.moonshot.chat.transformation.supports_reasoning",
        return_value=True,
    ):
        result = config.transform_request(
            model="kimi-k2-thinking",
            messages=messages,
            optional_params={},
            litellm_params={},
            headers={},
        )
    assert result["messages"][1].get("reasoning_content") == " "

Comment on lines +201 to +203
# Moonshot reasoning models: fill in reasoning_content before the API call
if supports_reasoning(model=model, custom_llm_provider="moonshot"):
    messages = self.fill_reasoning_content(messages)

kimi-thinking-preview is gated by supports_reasoning but excludes tools

fill_reasoning_content is invoked whenever supports_reasoning(model, "moonshot") is True. kimi-thinking-preview is now being given "supports_reasoning": true in the JSON, but that same model explicitly has tools and tool_choice removed from get_supported_openai_params (line 100). Because the function only patches assistant messages that have a non-empty tool_calls list, it will always be a no-op for kimi-thinking-preview in normal usage.

However, if a caller passes in a conversation history that contains tool-call messages originally generated by a different model (e.g. kimi-k2-thinking), fill_reasoning_content will silently inject reasoning_content into those messages before forwarding to the API. The API will still likely reject the request for having unsupported tool calls — but the injected reasoning_content may obscure the root cause.

Consider documenting or guarding against this edge-case, for example:

# Only patch tool-call messages if the model actually supports tool calls
if (
    supports_reasoning(model=model, custom_llm_provider="moonshot")
    and "tools" in self.get_supported_openai_params(model)
):
    messages = self.fill_reasoning_content(messages)

Comment on lines +163 to +167
patched = dict(cast(dict, msg))
provider_fields = patched.get("provider_specific_fields") or {}
stored = provider_fields.get("reasoning_content")
if stored:
    patched["reasoning_content"] = stored

Promoted reasoning_content left duplicated in provider_specific_fields

When reasoning_content is found in provider_specific_fields and promoted to the top level (line 167), the original entry inside provider_specific_fields is not removed. This means the patched message dict ends up containing the value in two places simultaneously.

If LiteLLM does not strip provider_specific_fields before serializing the request body, the Moonshot API will receive an unexpected extra field. More practically, if any downstream code reads provider_specific_fields to check whether reasoning_content has already been handled, it will still see the value there and may act on it again.

Consider cleaning up the promoted key after copying it to the top level:

if stored:
    patched["reasoning_content"] = stored
    # Remove from provider_specific_fields to avoid duplication
    pf = dict(provider_fields)
    pf.pop("reasoning_content", None)
    patched["provider_specific_fields"] = pf

Comment on lines +158 to +162
if (
    msg.get("role") == "assistant"
    and isinstance(msg.get("tool_calls"), list)
    and "reasoning_content" not in msg
):

Empty tool_calls list triggers unintended injection

isinstance(msg.get("tool_calls"), list) returns True for an empty list []. This means an assistant message carrying "tool_calls": [] (e.g., an incorrectly serialised history entry) would have a space placeholder injected — even though there are no actual tool calls.

Use a truthiness check instead, which is falsy for both None and []:

Suggested change
-if (
-    msg.get("role") == "assistant"
-    and isinstance(msg.get("tool_calls"), list)
-    and "reasoning_content" not in msg
-):
+if (
+    msg.get("role") == "assistant"
+    and msg.get("tool_calls")
+    and "reasoning_content" not in msg
+):
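
A short standalone sketch of why the two checks differ for an empty list. The helper names are hypothetical and used only for this comparison.

```python
from typing import Any, Dict


def has_tool_calls_isinstance(msg: Dict[str, Any]) -> bool:
    """Original check: true for any list, including an empty one."""
    return isinstance(msg.get("tool_calls"), list)


def has_tool_calls_truthy(msg: Dict[str, Any]) -> bool:
    """Suggested check: false for None, a missing key, and []."""
    return bool(msg.get("tool_calls"))


empty = {"role": "assistant", "content": "done", "tool_calls": []}
real = {"role": "assistant", "content": None, "tool_calls": [{"id": "call_1"}]}
```

Only the isinstance variant treats the empty message as having tool calls; both variants agree on real and on messages with no tool_calls key.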

"For best results, preserve `reasoning_content` from the original "
"assistant response when building multi-turn conversation history."
)
patched["reasoning_content"] = " "

Magic string for reasoning content placeholder

The value " " (single space) injected as the minimum placeholder is a magic string embedded directly in the code. If the Moonshot API ever tightens validation to require a non-whitespace value, this would need to be tracked down across usages. Define a module-level constant to make the intent explicit and simplify future changes:

Suggested change
-patched["reasoning_content"] = " "
+patched["reasoning_content"] = _REASONING_PLACEHOLDER

With the constant defined at the top of the file:

# Minimum value accepted by the Moonshot API when reasoning_content is unavailable
_REASONING_PLACEHOLDER = " "

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +170 to +172
cleaned_provider_fields = dict(provider_fields)
cleaned_provider_fields.pop("reasoning_content", None)
patched["provider_specific_fields"] = cleaned_provider_fields

Empty provider_specific_fields dict left in message after promotion

When reasoning_content was the only key in provider_specific_fields, after pop the resulting cleaned_provider_fields will be {}. The code then writes patched["provider_specific_fields"] = {}, leaving an empty dict on the message. Any downstream consumer (e.g., response logging, another middleware) that does if msg.get("provider_specific_fields"): would treat this as a no-op, but it's unexpected to find an explicitly empty dict where the original may have been absent. Consider removing the key entirely when the result is empty:

Suggested change
-cleaned_provider_fields = dict(provider_fields)
-cleaned_provider_fields.pop("reasoning_content", None)
-patched["provider_specific_fields"] = cleaned_provider_fields
+cleaned_provider_fields = dict(provider_fields)
+cleaned_provider_fields.pop("reasoning_content", None)
+if cleaned_provider_fields:
+    patched["provider_specific_fields"] = cleaned_provider_fields
+else:
+    patched.pop("provider_specific_fields", None)
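
Put together as a runnable unit, the promotion and cleanup steps might look like the sketch below. The standalone promote_reasoning_content function is a hypothetical wrapper written for this illustration; in the PR the logic lives inside fill_reasoning_content.

```python
from typing import Any, Dict


def promote_reasoning_content(msg: Dict[str, Any]) -> Dict[str, Any]:
    """Copy msg, move reasoning_content to the top level, and tidy provider_specific_fields."""
    patched = dict(msg)  # shallow copy; the caller's dict stays untouched
    provider_fields = patched.get("provider_specific_fields") or {}
    stored = provider_fields.get("reasoning_content")
    if not stored:
        return patched

    patched["reasoning_content"] = stored
    cleaned = dict(provider_fields)  # copy before popping to avoid mutating the original
    cleaned.pop("reasoning_content", None)
    if cleaned:
        patched["provider_specific_fields"] = cleaned
    else:
        # Nothing left: drop the key instead of leaving an empty dict behind.
        patched.pop("provider_specific_fields", None)
    return patched
```

Copying provider_fields before the pop matters because the shallow copy of the message still shares the nested dict with the caller's original.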

@@ -149,6 +141,48 @@ def map_openai_params(
optional_params["temperature"] = 0.3
return optional_params

def fill_reasoning_content(self, messages: List[AllMessageValues]) -> List[AllMessageValues]:

Public method name for an internal helper

fill_reasoning_content has no leading underscore, making it appear as part of the public API surface of MoonshotChatConfig. It is only called from transform_request within the same class and is exclusively tested via direct invocation in unit tests. Consider renaming it to _fill_reasoning_content to signal that it is an internal implementation detail, which also makes future refactoring safer.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +158 to +162
if (
    msg.get("role") == "assistant"
    and msg.get("tool_calls")
    and "reasoning_content" not in msg
):

reasoning_content: None bypasses placeholder injection

The condition "reasoning_content" not in msg only checks for key presence, not value. If an assistant message has "reasoning_content": None (e.g., from a deserialised response where the field was null, or manually constructed history), the if branch is skipped and None is forwarded to the Moonshot API, which is likely to result in the same 400 Bad Request the PR is trying to fix.

Replacing the key-presence check with a falsy check handles the None and "" cases alongside the absent-key case:

Suggested change
-if (
-    msg.get("role") == "assistant"
-    and msg.get("tool_calls")
-    and "reasoning_content" not in msg
-):
+if (
+    msg.get("role") == "assistant"
+    and msg.get("tool_calls")
+    and not msg.get("reasoning_content")
+):

@krrishdholakia krrishdholakia merged commit a94b961 into BerriAI:main Mar 14, 2026
27 of 37 checks passed
RheagalFire pushed a commit that referenced this pull request Mar 15, 2026
…der (#23663)

* fix: forward extra_headers to HuggingFace embedding calls (#23525)

Fixes #23502

The huggingface_embed.embedding() call was not receiving the headers
parameter, causing extra_headers (e.g., X-HF-Bill-To) to be silently
dropped. Other providers (openrouter, vercel_ai_gateway, bedrock) already
pass headers correctly. This fix adds headers=headers to match the
behavior of other providers.

Co-authored-by: Jah-yee <sparklab@outlook.com>

* fix: add getPopupContainer to Select components in fallback modal to fix z-index issue (#23516)

The model dropdown menus in the Add Fallbacks modal were rendering behind
the modal overlay because Ant Design portals Select dropdowns to document.body
by default. By setting getPopupContainer to attach the dropdown to its parent
element, the dropdown inherits the modal's stacking context and renders above
the modal.

Fixes #17895

* PR #22867 added _remove_scope_from_cache_control for Bedrock and Azur… (#23183)

* PR #22867 added _remove_scope_from_cache_control for Bedrock and Azure AI but omitted Vertex AI. This applies the same pattern to VertexAIPartnerModelsAnthropicMessagesConfig."

* PR #22867 added _remove_scope_from_cache_control to AzureAnthropicMessagesConfig
 but missed VertexAIPartnerModelsAnthropicMessagesConfi Rather than duplicating the method again, moved it up to the base AnthropicMessagesConfig so all providers
  inherit it, and removed the now-redundant copy from the Azure AI subclass.

---------

Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>

* fix: auto-fill reasoning_content for moonshot kimi reasoning models in multi-turn tool calling (#23580)

* Handle response.failed, response.incomplete, and response.cancelled (#23492)

* Handle response.failed, response.incomplete, and response.cancelled terminal events in background streaming

Previously the background streaming task only handled response.completed and
hardcoded the final status to "completed". This missed three other terminal
event types from the OpenAI streaming spec, causing failed/incomplete/cancelled
responses to be incorrectly marked as completed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude

* Remove unused terminal_response_data variable

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude

* Address code review: derive fallback status from event type, rewrite tests as integration tests

1. Replace hardcoded "completed" fallback in response_data.get("status")
   with _event_to_status lookup so that response.incomplete and
   response.cancelled events get the correct fallback if the response
   body ever omits the status field.

2. Replace duplicated-logic unit tests with integration tests that
   exercise background_streaming_task directly using mocked streaming
   responses and assert on the final update_state call arguments.
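
The fallback described in item 1 could look roughly like the sketch below. The mapping contents and the final_status helper are assumptions for illustration: the commit message names an _event_to_status lookup but does not show it, and the status strings here are simply derived from the event names.

```python
from typing import Any, Dict

# Assumed mapping from terminal streaming event type to final status.
EVENT_TO_STATUS: Dict[str, str] = {
    "response.completed": "completed",
    "response.failed": "failed",
    "response.incomplete": "incomplete",
    "response.cancelled": "cancelled",
}


def final_status(event_type: str, response_data: Dict[str, Any]) -> str:
    """Prefer the status in the response body; fall back to one derived from the event type."""
    fallback = EVENT_TO_STATUS[event_type]  # raises KeyError for non-terminal events
    return response_data.get("status", fallback)
```

The point of the change is that a body without a status field no longer defaults to "completed" for every terminal event.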

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude

* Remove dead mock_processor and unused mock_response parameter from test helper

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude

* Remove FastAPI and UserAPIKeyAuth imports from test file

These types were only used as Mock(spec=...) arguments. Drop the spec
constraints and remove the top-level imports to avoid pulling FastAPI
into test files outside litellm/proxy/.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude

* Log warning when streaming response has no body_iterator

If base_process_llm_request returns a non-streaming response (no
body_iterator), log a warning since this likely indicates a
misconfiguration or provider error rather than a successful completion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(security): bump tar to 7.5.11 and tornado to 6.5.5 (#23602)

* fix(security): bump tar to 7.5.11 and tornado to 6.5.5

- tar >=7.5.11: fixes CVE-2026-31802 (HIGH) in node-pkg
- tornado >=6.5.5: fixes CVE-2026-31958 (HIGH) and GHSA-78cv-mqj4-43f7 (MEDIUM) in python-pkg

Addresses vulnerabilities found in ghcr.io/berriai/litellm:main-v1.82.0-stable Trivy scan.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: document tar override is enforced via Dockerfile, not npm

* fix: revert invalid JSON comment in package.json tar override

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* [Feat] - Ishaan main merge branch  (#23596)

* fix(bedrock): respect s3_region_name for batch file uploads (#23569)

* fix(bedrock): respect s3_region_name for batch file uploads (GovCloud fix)

* fix: s3_region_name always wins over aws_region_name for S3 signing (Greptile feedback)

* fix: _filter_headers_for_aws_signature - Bedrock KB (#23571)

* fix: _filter_headers_for_aws_signature

* fix: filter None header values in all post-signing re-merge paths

Addresses Greptile feedback: None-valued headers were being filtered
during SigV4 signing but re-merged back into the final headers dict
afterward, which would cause downstream HTTP client failures.

Made-with: Cursor

* feat(router): tag_regex routing — route by User-Agent regex without per-developer tag config (#23594)

* feat(router): add tag_regex support for header-based routing

Adds a new `tag_regex` field to litellm_params that lets operators route
requests based on regex patterns matched against request headers — primarily
User-Agent — without requiring per-developer tag configuration.

Use case: route all Claude Code traffic (User-Agent: claude-code/x.y.z) to
a dedicated deployment by setting:

  tag_regex:
    - "^User-Agent: claude-code\\/"

in the deployment's litellm_params. Works alongside existing `tags` routing;
exact tag match takes precedence over regex match. Unmatched requests fall
through to deployments tagged `default`.

The matched deployment, pattern, and user_agent are recorded in
`metadata["tag_routing"]` so they flow through to SpendLogs automatically.
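
A minimal sketch of the matching idea, assuming each header is rendered as a "Name: value" string before the patterns are applied (which is what the example pattern above implies). The first_matching_pattern helper is hypothetical and is not the router's actual API.

```python
import re
from typing import Dict, List, Optional


def first_matching_pattern(patterns: List[str], headers: Dict[str, str]) -> Optional[str]:
    """Return the first pattern that matches any 'Name: value' header string, else None."""
    header_strings = [f"{name}: {value}" for name, value in headers.items()]
    for pattern in patterns:
        try:
            compiled = re.compile(pattern)  # compile once per pattern
        except re.error:
            continue  # skip invalid patterns instead of failing the request
        if any(compiled.search(header) for header in header_strings):
            return pattern
    return None


CLAUDE_CODE = r"^User-Agent: claude-code\/"
matched = first_matching_pattern([CLAUDE_CODE], {"User-Agent": "claude-code/1.2.3"})
```

Because User-Agent is client-supplied, a match like this is only a traffic classification hint, not an access-control decision.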

* fix(tag_regex): address backwards-compat, metadata overwrite, and warning noise

Three issues from code review:

1. Backwards-compat: `has_tag_filter` was widened to activate on any non-empty
   User-Agent, which would raise ValueError for existing deployments using plain
   tags without a `default` fallback. Fix: only activate header-based regex
   filtering when at least one candidate deployment has `tag_regex` configured.

2. Metadata overwrite: `metadata["tag_routing"]` was overwritten for every
   matching deployment in the loop, leaving inaccurate provenance when multiple
   deployments match. Fix: write only for the first match.

3. Warning noise: an invalid regex pattern logged one warning per header string
   rather than once per pattern. Fix: compile first (catching re.error once),
   then iterate over header strings.

Also adds two new tests covering these cases, and adds docs page for
tag_regex routing with a Claude Code walk-through.

* refactor(tag_regex): remove unnecessary _healthy_list copy

* docs: merge tag_regex section into tag_routing.md, remove standalone page

- Add ## Regex-based tag routing (tag_regex) section to existing
  tag_routing.md instead of a separate page
- Remove tag_regex_routing.md standalone doc (odd UX to have a separate
  page for a sub-feature)
- Remove proxy/tag_regex_routing from sidebars.js
- Add match_any=False debug warning in tag_based_routing.py when regex
  routing fires under strict mode (regex always uses OR semantics)

* fix(tag_regex): address greptile review - security docs, strict-mode enforcement, validation order

- Strengthen security note in tag_routing.md: explicitly state User-Agent
  is client-supplied and can be set to any value; frame tag_regex as a
  traffic classification hint, not an access-control mechanism
- Move tag_regex startup validation before _add_deployment() so an invalid
  pattern never leaves partial router state
- Enforce match_any=False strict-tag policy: when a deployment has both
  tags and tag_regex and the strict tag check fails, skip the regex fallback
  rather than silently bypassing the operator's intent
- Extract per-deployment match logic into _match_deployment() helper to
  keep get_deployments_for_tag() readable
- Add two new tests: strict-mode blocks regex fallback, regex-only
  deployment still matches under match_any=False

* fix(ci): apply Black formatting to 14 files and stabilize flaky caplog tests

- Run Black formatter on 14 files that were failing the lint check
- Replace caplog-based assertions in TestAliasConflicts with
  unittest.mock.patch on verbose_logger.warning for xdist compatibility
- The caplog fixture can produce empty text in pytest-xdist workers
  in certain CI environments, causing flaky test failures

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: tiktoken cache nonroot offline (#23498)

* fix: restore offline tiktoken cache for non-root envs

Made-with: Cursor

* chore: mkdir for custom tiktoken cache dir

Made-with: Cursor

* test: patch tiktoken.get_encoding in custom-dir test to avoid network

Made-with: Cursor

* test: clear CUSTOM_TIKTOKEN_CACHE_DIR in helper for test isolation

Made-with: Cursor

* test: restore default_encoding module state after custom-dir test

Made-with: Cursor

* fix: normalize content_filtered finish_reason (#23564)

Map provider finish_reason "content_filtered" to the OpenAI-compatible "content_filter" and extend core_helpers tests to cover this case.
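
As a sketch, the normalization amounts to a single mapping step. The normalize_finish_reason helper name is hypothetical; only the "content_filtered" to "content_filter" mapping comes from the commit message.

```python
def normalize_finish_reason(finish_reason: str) -> str:
    """Map the provider-specific value to the OpenAI-compatible one; pass others through."""
    if finish_reason == "content_filtered":
        return "content_filter"
    return finish_reason
```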

Made-with: Cursor

* fix: Fixes #23185 (#23647)

* fix: merge annotations from all streaming chunks in stream_chunk_builder

Previously, stream_chunk_builder only took annotations from the first
chunk that contained them, losing any annotations from later chunks.

This is a problem because providers like Gemini/Vertex AI send grounding
metadata (converted to annotations) in the final streaming chunk, while
other providers may spread annotations across multiple chunks.

Changes:
- Collect and merge annotations from ALL annotation-bearing chunks
  instead of only using the first one
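
A rough sketch of the merge, assuming chunks are plain dicts with annotations under choices[i]["delta"]["annotations"]. That shape and the merge_annotations helper are assumptions for illustration, not the actual stream_chunk_builder code.

```python
from typing import Any, Dict, List


def merge_annotations(chunks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Collect annotations from every chunk, in arrival order."""
    merged: List[Dict[str, Any]] = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            annotations = (choice.get("delta") or {}).get("annotations")
            if annotations:
                merged.extend(annotations)
    return merged


chunks = [
    {"choices": [{"delta": {"content": "Hello"}}]},
    {"choices": [{"delta": {"annotations": [{"type": "url_citation", "url": "https://a.example"}]}}]},
    {"choices": [{"delta": {"annotations": [{"type": "url_citation", "url": "https://b.example"}]}}]},
]
all_annotations = merge_annotations(chunks)
```

Taking annotations from only the first annotation-bearing chunk would drop the second citation in this example.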

---------

Co-authored-by: RoomWithOutRoof <166608075+Jah-yee@users.noreply.github.com>
Co-authored-by: Jah-yee <sparklab@outlook.com>
Co-authored-by: Ethan T. <ethanchang32@gmail.com>
Co-authored-by: Awais Qureshi <awais.qureshi@arbisoft.com>
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: Pradyumna Yadav <pradyumna.aky@gmail.com>
Co-authored-by: xianzongxie-stripe <87151258+xianzongxie-stripe@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Joe Reyna <joseph.reyna@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Co-authored-by: milan-berri <milan@berri.ai>

Development

Successfully merging this pull request may close these issues.

[Bug]: Moonshot/Kimi K2.5 - reasoning_content is missing in assistant tool call message during multi-turn tool calling

2 participants