Skip to content

fix(bedrock): correct modelInput format for Converse API batch models#21656

Merged
krrishdholakia merged 1 commit intoBerriAI:litellm_oss_staging_02_21_2026from
hztBUAA:fix/bedrock-batch-converse-format
Feb 21, 2026
Merged

fix(bedrock): correct modelInput format for Converse API batch models#21656
krrishdholakia merged 1 commit intoBerriAI:litellm_oss_staging_02_21_2026from
hztBUAA:fix/bedrock-batch-converse-format

Conversation

@hztBUAA
Copy link
Contributor

@hztBUAA hztBUAA commented Feb 20, 2026

Summary

Fixes #21596

When litellm.create_file() / litellm.acreate_file() is called with custom_llm_provider="bedrock" for Converse API models (e.g. Amazon Nova), the uploaded JSONL batch file contained incorrect modelInput format:

  1. image_url content blocks were passed through unchanged instead of being converted to Bedrock's {"image": {"format": "...", "source": {"bytes": "..."}}} format
  2. Inference parameters (max_tokens, temperature, top_p) were left at the top level instead of being wrapped inside inferenceConfig

This caused Bedrock batch jobs to fail at inference time for Nova and other Converse API models.

Root cause: BedrockFilesConfig._map_openai_to_bedrock_params() only had a transformation path for Anthropic models. All other providers fell through to a raw passthrough else branch.

Fix: Added a new code path for Converse API providers (currently Nova) that routes through the existing AmazonConverseConfig.transform_request(), which correctly:

  • Converts image_url blocks to Bedrock Converse image format
  • Wraps max_tokens/temperature/top_p inside inferenceConfig
  • Transforms messages to Converse content block structure

The fix uses a CONVERSE_INVOKE_PROVIDERS tuple that can be extended as other providers adopt the same schema. Anthropic and OpenAI-compatible model paths remain unchanged.

Test plan

  • test_nova_text_only_uses_converse_format - Verifies Nova text-only requests get inferenceConfig wrapping
  • test_nova_image_content_uses_converse_image_blocks - Verifies image_url blocks are converted to Bedrock image format
  • test_anthropic_still_works_after_nova_fix - Regression test for Anthropic models
  • test_openai_passthrough_still_works - Regression test for OpenAI-compatible models
  • test_transform_openai_jsonl_content_to_bedrock_jsonl_content - Existing test still passes

🤖 Generated with Claude Code

For Bedrock batch file uploads targeting Converse API models (e.g. Nova),
the modelInput was incorrectly using raw OpenAI passthrough format. This
caused image_url blocks to be kept as-is (not converted to Bedrock image
format) and inference params like max_tokens to remain at the top level
instead of being wrapped in inferenceConfig.

Route Nova (and future Converse-format providers) through
AmazonConverseConfig.transform_request() so that:
- image_url content blocks are converted to Bedrock image format
- max_tokens/temperature/top_p are wrapped in inferenceConfig
- messages use the Converse content block structure

Fixes BerriAI#21596
@vercel
Copy link

vercel bot commented Feb 20, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
litellm Ready Ready Preview, Comment Feb 20, 2026 6:03am

Request Review

@chatgpt-codex-connector
Copy link

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@CLAassistant
Copy link

CLAassistant commented Feb 20, 2026

CLA assistant check
All committers have signed the CLA.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 20, 2026

Greptile Summary

This PR fixes a critical bug in Bedrock batch file transformation for Nova (Converse API) models. The issue was that _map_openai_to_bedrock_params() had only Anthropic-specific and generic passthrough paths, causing Nova batch requests to use the wrong format.

Key Changes:

  • Added CONVERSE_INVOKE_PROVIDERS = ("nova",) tuple to identify models using Converse API format
  • Added new code path routing Nova requests through AmazonConverseConfig.transform_request()
  • This correctly converts image_url content blocks to Bedrock's {"image": {"format": "...", "source": {"bytes": "..."}}} format
  • Wraps inference parameters (max_tokens, temperature, top_p) inside inferenceConfig as required by Converse API

Test Coverage:

  • 4 new tests validate Nova text-only and image transformations
  • Regression tests confirm Anthropic and OpenAI-compatible models unchanged
  • Tests verify proper inferenceConfig structure and image block format

Architecture:
The fix properly reuses existing transformation infrastructure. Nova models now follow the same transformation path as real-time Converse API requests, ensuring consistency between batch and real-time inference.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The fix correctly addresses the root cause by routing Nova models through the appropriate transformation config. Implementation properly reuses existing, battle-tested transformation logic from AmazonConverseConfig. Comprehensive tests cover all code paths including regressions. Changes are isolated to the files transformation layer with no impact on runtime request handling. The approach is extensible - additional Converse API providers can be easily added to the tuple.
  • No files require special attention

Important Files Changed

Filename Overview
litellm/llms/bedrock/files/transformation.py Added Converse API transformation path for Nova models. Routes Nova batch requests through AmazonConverseConfig to properly convert image_url blocks and wrap inference params in inferenceConfig. Anthropic and OpenAI-compatible model paths remain unchanged.
tests/test_litellm/llms/bedrock/files/test_bedrock_files_transformation.py Added 4 comprehensive tests: Nova text-only format validation, Nova image content transformation, Anthropic regression test, and OpenAI passthrough regression test. Tests verify correct inferenceConfig wrapping and image block transformations.
tests/test_litellm/llms/bedrock/files/expected_bedrock_batch_completions.jsonl Updated expected output to include anthropic_beta: [] field in Anthropic responses, matching current transformation behavior.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[OpenAI JSONL Request] --> B[_map_openai_to_bedrock_params]
    B --> C{Determine Provider}
    C -->|anthropic| D[AmazonAnthropicClaudeConfig]
    C -->|nova| E[AmazonConverseConfig]
    C -->|Other providers| F[Passthrough]
    
    D --> D1[Transform messages]
    D --> D2[Keep max_tokens at top level]
    D --> D3[Add anthropic_version]
    D1 --> G[Bedrock Batch Record]
    D2 --> G
    D3 --> G
    
    E --> E1[Transform messages via<br/>bedrock_converse_messages_pt]
    E --> E2[Convert image_url to<br/>Bedrock image format]
    E --> E3[Wrap params in inferenceConfig]
    E1 --> G
    E2 --> G
    E3 --> G
    
    F --> F1[Keep messages unchanged]
    F --> F2[Keep params at top level]
    F1 --> G
    F2 --> G
    
    G --> H[Upload to S3]
Loading

Last reviewed commit: 27d695b

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@krrishdholakia krrishdholakia changed the base branch from main to litellm_oss_staging_02_21_2026 February 21, 2026 06:27
@krrishdholakia krrishdholakia merged commit 4cc58fe into BerriAI:litellm_oss_staging_02_21_2026 Feb 21, 2026
13 of 21 checks passed
krrishdholakia added a commit that referenced this pull request Feb 24, 2026
…21970)

* auth_with_role_name add region_name arg for cross-account sts

* update tests to include case with aws_region_name for _auth_with_aws_role

* Only pass region_name to STS client when aws_region_name is set

* Add optional aws_sts_endpoint to _auth_with_aws_role

* Parametrize ambient-credentials test for no opts, region_name, and aws_sts_endpoint

* consistently passing region and endpoint args into explicit credentials irsa

* fix env var leakage

* fix: bedrock openai-compatible imported-model should also have model arn encoded

* feat: show proxy url in ModelHub (#21660)

* fix(bedrock): correct modelInput format for Converse API batch models (#21656)

* fix(proxy): add model_ids param to access group endpoints for precise deployment tagging (#21655)

POST /access_group/new and PUT /access_group/{name}/update now accept an
optional model_ids list that targets specific deployments by their unique
model_id, instead of tagging every deployment that shares a model_name.

When model_ids is provided it takes priority over model_names, giving
API callers the same single-deployment precision that the UI already has
via PATCH /model/{model_id}/update.

Backward compatible: model_names continues to work as before.

Closes #21544

* feat(proxy): add custom favicon support\n\nAdd ability to configure a custom favicon for the litellm proxy UI.\n\n- Add favicon_url field to UIThemeConfig model\n- Add LITELLM_FAVICON_URL env var support\n- Add /get_favicon endpoint to serve custom favicons\n- Update ThemeContext to dynamically set favicon\n- Add favicon URL input to UI theme settings page\n- Add comprehensive tests\n\nCloses #8323 (#21653)

* fix(bedrock): prevent double UUID in create_file S3 key (#21650)

In create_file for Bedrock, get_complete_file_url is called twice:
once in the sync handler (generating UUID-1 for api_base) and once
inside transform_create_file_request (generating UUID-2 for the
actual S3 upload). The Bedrock provider correctly writes UUID-2 into
litellm_params["upload_url"], but the sync handler unconditionally
overwrites it with api_base (UUID-1). This causes the returned
file_id to point to a non-existent S3 key.

Fix: only set upload_url to api_base when transform_create_file_request
has not already set it, preserving the Bedrock provider's value.

Closes #21546

* feat(semantic-cache): support configurable vector dimensions for Qdrant (#21649)

Add vector_size parameter to QdrantSemanticCache and expose it through
the Cache facade as qdrant_semantic_cache_vector_size. This allows users
to use embedding models with dimensions other than the default 1536,
enabling cheaper/stronger models like Stella (1024d), bge-en-icl (4096d),
voyage, cohere, etc.

The parameter defaults to QDRANT_VECTOR_SIZE (env var or 1536) for
backward compatibility. When creating new collections, the configured
vector_size is used instead of the hardcoded constant.

Closes #9377

* fix(utils): normalize camelCase thinking param keys to snake_case (#21762)

Clients like OpenCode's @ai-sdk/openai-compatible send budgetTokens
(camelCase) instead of budget_tokens in the thinking parameter, causing
validation errors. Add early normalization in completion().

* feat: add optional digest mode for Slack alert types (#21683)

Adds per-alert-type digest mode that aggregates duplicate alerts
within a configurable time window and emits a single summary message
with count, start/end timestamps.

Configuration via general_settings.alert_type_config:
  alert_type_config:
    llm_requests_hanging:
      digest: true
      digest_interval: 86400

Digest key: (alert_type, request_model, api_base)
Default interval: 24 hours
Window type: fixed interval

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add blog_posts.json and local backup

* feat: add GetBlogPosts utility with GitHub fetch and local fallback

Adds GetBlogPosts class that fetches blog posts from GitHub with a 1-hour
in-process TTL cache, validates the response, and falls back to the bundled
blog_posts_backup.json on any network or validation failure.

* test: add cache reset fixture and LITELLM_LOCAL_BLOG_POSTS test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add GET /public/litellm_blog_posts endpoint

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: log fallback warning in blog posts endpoint and tighten test

* feat: add disable_show_blog to UISettings

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add useUISettings and useDisableShowBlog hooks

* fix: rename useUISettings to useUISettingsFlags to avoid naming collision

* fix: use existing useUISettings hook in useDisableShowBlog to avoid cache duplication

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add BlogDropdown component with react-query and error/retry state

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: enforce 5-post limit in BlogDropdown and add cap test

* fix: add retry, stable post key, enabled guard in BlogDropdown

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add BlogDropdown to navbar after Docs link

* feat: add network_mock transport for benchmarking proxy overhead without real API calls

Intercepts at httpx transport layer so the full proxy path (auth, routing,
OpenAI SDK, response transformation) is exercised with zero-latency responses.
Activated via `litellm_settings: { network_mock: true }` in proxy config.

* Litellm dev 02 19 2026 p2 (#21871)

* feat(ui/): new guardrails monitor 'demo

mock representation of what guardrails monitor looks like

* fix: ui updates

* style(ui/): fix styling

* feat: enable running ai monitor on individual guardrails

* feat: add backend logic for guardrail monitoring

* fix(guardrails/usage_endpoints.py): fix usage dashboard

* fix(budget): fix timezone config lookup and replace hardcoded timezone map with ZoneInfo (#21754)

* fix(budget): fix timezone config lookup and replace hardcoded timezone map with ZoneInfo

* fix(budget): update stale docstring on get_budget_reset_time

* fix: add missing return type annotations to iterator protocol methods in streaming_handler (#21750)

* fix: add return type annotations to iterator protocol methods in streaming_handler

Add missing return type annotations to __iter__, __aiter__, __next__, and __anext__ methods in CustomStreamWrapper and related classes.

- __iter__(self) -> Iterator["ModelResponseStream"]
- __aiter__(self) -> AsyncIterator["ModelResponseStream"]
- __next__(self) -> "ModelResponseStream"
- __anext__(self) -> "ModelResponseStream"

Also adds AsyncIterator and Iterator to typing imports.

Fixes issue with PLR0915 noqa comments and ensures proper type checking support.
Related to: #8304

* fix: add ruff PLR0915 noqa for files with too many statements

* Add gollem Go agent framework cookbook example (#21747)

Show how to use gollem, a production Go agent framework, with
LiteLLM proxy for multi-provider LLM access including tool use
and streaming.

* fix: avoid mutating caller-owned dicts in SpendUpdateQueue aggregation (#21742)

* fix(vertex_ai): enable context-1m-2025-08-07 beta header (#21870)

* server root path regression doc

* fixing syntax

* fix: replace Zapier webhook with Google Form for survey submission (#21621)

* Replace Zapier webhook with Google Form for survey submission

* Add back error logging for survey submission debugging

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>

* Revert "Merge pull request #21140 from BerriAI/litellm_perf_user_api_key_auth"

This reverts commit 0e1db3f, reversing
changes made to 7e2d6f2.

* test_vertex_ai_gemini_2_5_pro_streaming

* UI new build

* fix rendering

* ui new build

* docs fix

* docs fix

* docs fix

* docs fix

* docs fix

* docs fix

* docs fix

* docs fix

* release note docs

* docs

* adding image

* fix(vertex_ai): enable context-1m-2025-08-07 beta header

The `context-1m-2025-08-07` Anthropic beta header was set to `null` for vertex_ai,
causing it to be filtered out when users set `extra_headers: {anthropic-beta: context-1m-2025-08-07}`.

This prevented using Claude's 1M context window feature via Vertex AI, resulting in
`prompt is too long: 460500 tokens > 200000 maximum` errors.

Fixes #21861

---------

Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>

* Revert "fix(vertex_ai): enable context-1m-2025-08-07 beta header (#21870)" (#21876)

This reverts commit bce078a.

* docs(ui): add pre-PR checklist to UI contributing guide

Add testing and build verification steps per maintainer feedback
from @yjiang-litellm. Contributors should run their related tests
per-file and ensure npm run build passes before opening PRs.

* Fix entries with fast and us/

* Add tests for fast and us

* Add support for Priority PayGo for vertex ai and gemini

* Add model pricing

* fix: ensure arrival_time is set before calculating queue time

* Fix: Anthropic model wildcard access issue

* Add incident report

* Add ability to see which model cost map is getting used

* Fix name of title

* Readd tpm limit

* State management fixes for CheckBatchCost

* Fix PR review comments

* State management fixes for CheckBatchCost - Address greptile comments

* fix mypy issues:

* Add Noma guardrails v2 based on custom guardrails (#21400)

* Fix code qa issues

* Fix mypy issues

* Fix mypy issues

* Fix test_aaamodel_prices_and_context_window_json_is_valid

* fix: update calendly on repo

* fix(tests): use counter-based mock for time.time in prisma self-heal test

The test used a fixed side_effect list for time.time(), but the number
of calls varies by Python version, causing StopIteration on 3.12 and
AssertionError on 3.14. Replace with an infinite counter-based callable
and assert the timestamp was updated rather than checking for an exact
value.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(tests): use absolute path for model_prices JSON in validation test

The test used a relative path 'litellm/model_prices_and_context_window.json'
which only works when pytest runs from a specific working directory.
Use os.path based on __file__ to resolve the path reliably.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Update tests/test_litellm/test_utils.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix(tests): use os.path instead of Path to avoid NameError

Path is not imported at module level. Use os.path.join which is already
available.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* clean up mock transport: remove streaming, add defensive parsing

* docs: add Google GenAI SDK tutorial (JS & Python) (#21885)

* docs: add Google GenAI SDK tutorial for JS and Python

Add tutorial for using Google's official GenAI SDK (@google/genai for JS,
google-genai for Python) with LiteLLM proxy. Covers pass-through and
native router endpoints, streaming, multi-turn chat, and multi-provider
routing via model_group_alias. Also updates pass-through docs to use the
new SDK replacing the deprecated @google/generative-ai.

* fix(docs): correct Python SDK env var name in GenAI tutorial

GOOGLE_GENAI_API_KEY does not exist in the google-genai SDK.
The correct env var is GEMINI_API_KEY (or GOOGLE_API_KEY).
Also note that the Python SDK has no base URL env var.

* fix(docs): replace non-existent GOOGLE_GENAI_BASE_URL env var in interactions.md

The Python google-genai SDK does not read GOOGLE_GENAI_BASE_URL.
Use http_options={"base_url": "..."} in code instead.

* docs: add network mock benchmarking section

* docs: tweak benchmarks wording

* fix: add auth headers and empty latencies guard to benchmark script

* refactor: use method-level import for MockOpenAITransport

* fix: guard print_aggregate against empty latencies

* fix: add INCOMPLETE status to Interactions API enum and test

Google added INCOMPLETE to the Interactions API OpenAPI spec status enum.
Update both the Status3 enum in the SDK types and the test's expected
values to match.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Guardrail Monitor - measure guardrail reliability in prod  (#21944)

* fix: fix log viewer for guardrail monitoring

* feat(ui/): fix rendering logs per guardrail

* fix: fix viewing logs on overview tab of guardrail

* fix: log viewer

* fix: fix naming to align with metric

* docs: add performance & reliability section to v1.81.14 release notes

* fix(tests): make RPM limit test sequential to avoid race condition

Concurrent requests via run_in_executor + asyncio.gather caused a race
condition where more requests slipped through the rate limiter than
expected, leading to flaky test failures (e.g. 3 successes instead of 2
with rpm_limit=2).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: Singapore guardrail policies (PDPA + MAS AI Risk Management) (#21948)

* feat: Singapore PDPA PII protection guardrail policy template

Add Singapore Personal Data Protection Act (PDPA) guardrail support:

Regex patterns (patterns.json):
- sg_nric: NRIC/FIN detection ([STFGM] + 7 digits + checksum letter)
- sg_phone: Singapore phone numbers (+65/0065/65 prefix)
- sg_postal_code: 6-digit postal codes (contextual)
- passport_singapore: Passport numbers (E/K + 7 digits, contextual)
- sg_uen: Unique Entity Numbers (3 formats)
- sg_bank_account: Bank account numbers (dash format, contextual)

YAML policy templates (5 sub-guardrails):
- sg_pdpa_personal_identifiers: s.13 Consent
- sg_pdpa_sensitive_data: Advisory Guidelines
- sg_pdpa_do_not_call: Part IX DNC Registry
- sg_pdpa_data_transfer: s.26 overseas transfers
- sg_pdpa_profiling_automated_decisions: Model AI Governance Framework

Policy template entry in policy_templates.json with 9 guardrail definitions
(4 regex-based + 5 YAML conditional keyword matching).

Tests:
- test_sg_patterns.py: regex pattern unit tests
- test_sg_pdpa_guardrails.py: conditional keyword matching tests (100+ cases)

* feat: MAS AI Risk Management Guidelines guardrail policy template

Add Monetary Authority of Singapore (MAS) AI Risk Management Guidelines
guardrail support for financial institutions:

YAML policy templates (5 sub-guardrails):
- sg_mas_fairness_bias: Blocks discriminatory financial AI (credit/loans/insurance by protected attributes)
- sg_mas_transparency_explainability: Blocks opaque/unexplainable AI for consequential financial decisions
- sg_mas_human_oversight: Blocks fully automated financial decisions without human-in-the-loop
- sg_mas_data_governance: Blocks unauthorized sharing/mishandling of financial customer data
- sg_mas_model_security: Blocks adversarial attacks, model poisoning, inversion on financial AI

Policy template entry in policy_templates.json with 5 guardrail definitions.
Aligned with MAS FEAT Principles, Project MindForge, and NIST AI RMF.

Tests:
- test_sg_mas_ai_guardrails.py: conditional keyword matching tests (100+ cases)

* fix: address SG pattern review feedback

- Update NRIC lowercase test for IGNORECASE runtime behavior
- Add keyword context guard to sg_uen pattern to reduce false positives

* docs: clarify MAS AIRM timeline references

- Explicitly mark MAS AIRM as Nov 2025 consultation draft
- Add 2018 qualifier for FEAT principles in MAS policy descriptions
- Update MAS guardrail wording to avoid release-year ambiguity

* chore: commit resolved MAS policy conflicts

* test:

* chore:

* Add OpenAI Agents SDK tutorial with LiteLLM Proxy to docs  (#21221)

* Add OpenAI Agents SDK tutorial to docs

* Update OpenAI Agents SDK tutorial to use LiteLLM environment variables

* Enhance OpenAI Agents SDK tutorial with built-in LiteLLM extension details and updated configuration steps. Adjust section headings for clarity and improve the flow of information regarding model setup and usage.

* adjust blog posts to fetch from github first

* feat(videos): add variant parameter to video content download (#21955)

openai videos models support the features to download variants.
See more details here: https://developers.openai.com/api/docs/guides/video-generation#use-image-references.
Plumb variant (e.g. "thumbnail", "spritesheet") through the full
video content download chain: avideo_content → video_content →
video_content_handler → transform_video_content_request. OpenAI
appends ?variant=<value> to the GET URL; other providers accept
the parameter in their signature but ignore it.

* fixing path

* adjust blog post path

* Revert duplicate issue checker to text-based matching, remove duplicate PR workflow

Remove the Claude Code-powered duplicate PR detection workflow and revert
the duplicate issue checker back to wow-actions/potential-duplicates with
text similarity matching.

* ui changes

* adding tests

* adjust default aggregation threshold

* fix(videos): pass api_key from litellm_params to video remix handlers (#21965)

video_remix_handler and async_video_remix_handler were not falling back
to litellm_params.api_key when the api_key parameter was None, causing
Authorization: Bearer None to be sent to the provider. This matches the
pattern already used by async_video_generation_handler.

* adding testing coverage + fixing flaky tests

* fix(ollama): thread api_base through get_model_info and add graceful fallback

When users pass api_base to litellm.completion() for Ollama, the model
info fetch (context window, function_calling support) was ignoring the
user's api_base and only reading OLLAMA_API_BASE env var or defaulting
to localhost:11434. This caused confusing errors in logs when Ollama
runs on a remote server.

Thread api_base from litellm_params through the get_model_info call
chain so OllamaConfig.get_model_info() uses the correct server. Also
return safe defaults instead of raising when the server is unreachable.

Fixes #21967

---------

Co-authored-by: An Tang <ta@stripe.com>
Co-authored-by: janfrederickk <75388864+janfrederickk@users.noreply.github.com>
Co-authored-by: Zhenting Huang <3061613175@qq.com>
Co-authored-by: Darien Kindlund <darien@kindlund.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
Co-authored-by: Ryan Crabbe <rcrabbe@berkeley.edu>
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: LeeJuOh <56071126+LeeJuOh@users.noreply.github.com>
Co-authored-by: Monesh Ram <31161039+WhoisMonesh@users.noreply.github.com>
Co-authored-by: Trevor Prater <trevor.prater@gmail.com>
Co-authored-by: The Mavik <179817126+themavik@users.noreply.github.com>
Co-authored-by: Edwin Isac <33712823+edwiniac@users.noreply.github.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: Harshit Jain <harshitjain0562@gmail.com>
Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com>
Co-authored-by: Ephrim Stanley <ephrim.stanley@point72.com>
Co-authored-by: TomAlon <tom@noma.security>
Co-authored-by: Julio Quinteros Pro <jquinter@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com>
Co-authored-by: Ron Zhong <ron-zhong@hotmail.com>
Co-authored-by: Arindam Majumder <109217591+Arindam200@users.noreply.github.com>
Co-authored-by: Lei Nie <lenie@quora.com>
krrishdholakia added a commit that referenced this pull request Feb 24, 2026
…voke (#21964)

* auth_with_role_name add region_name arg for cross-account sts

* update tests to include case with aws_region_name for _auth_with_aws_role

* Only pass region_name to STS client when aws_region_name is set

* Add optional aws_sts_endpoint to _auth_with_aws_role

* Parametrize ambient-credentials test for no opts, region_name, and aws_sts_endpoint

* consistently passing region and endpoint args into explicit credentials irsa

* fix env var leakage

* fix: bedrock openai-compatible imported-model should also have model arn encoded

* feat: show proxy url in ModelHub (#21660)

* fix(bedrock): correct modelInput format for Converse API batch models (#21656)

* fix(proxy): add model_ids param to access group endpoints for precise deployment tagging (#21655)

POST /access_group/new and PUT /access_group/{name}/update now accept an
optional model_ids list that targets specific deployments by their unique
model_id, instead of tagging every deployment that shares a model_name.

When model_ids is provided it takes priority over model_names, giving
API callers the same single-deployment precision that the UI already has
via PATCH /model/{model_id}/update.

Backward compatible: model_names continues to work as before.

Closes #21544

* feat(proxy): add custom favicon support\n\nAdd ability to configure a custom favicon for the litellm proxy UI.\n\n- Add favicon_url field to UIThemeConfig model\n- Add LITELLM_FAVICON_URL env var support\n- Add /get_favicon endpoint to serve custom favicons\n- Update ThemeContext to dynamically set favicon\n- Add favicon URL input to UI theme settings page\n- Add comprehensive tests\n\nCloses #8323 (#21653)

* fix(bedrock): prevent double UUID in create_file S3 key (#21650)

In create_file for Bedrock, get_complete_file_url is called twice:
once in the sync handler (generating UUID-1 for api_base) and once
inside transform_create_file_request (generating UUID-2 for the
actual S3 upload). The Bedrock provider correctly writes UUID-2 into
litellm_params["upload_url"], but the sync handler unconditionally
overwrites it with api_base (UUID-1). This causes the returned
file_id to point to a non-existent S3 key.

Fix: only set upload_url to api_base when transform_create_file_request
has not already set it, preserving the Bedrock provider's value.

Closes #21546

* feat(semantic-cache): support configurable vector dimensions for Qdrant (#21649)

Add vector_size parameter to QdrantSemanticCache and expose it through
the Cache facade as qdrant_semantic_cache_vector_size. This allows users
to use embedding models with dimensions other than the default 1536,
enabling cheaper/stronger models like Stella (1024d), bge-en-icl (4096d),
voyage, cohere, etc.

The parameter defaults to QDRANT_VECTOR_SIZE (env var or 1536) for
backward compatibility. When creating new collections, the configured
vector_size is used instead of the hardcoded constant.

Closes #9377

* fix(utils): normalize camelCase thinking param keys to snake_case (#21762)

Clients like OpenCode's @ai-sdk/openai-compatible send budgetTokens
(camelCase) instead of budget_tokens in the thinking parameter, causing
validation errors. Add early normalization in completion().

* feat: add optional digest mode for Slack alert types (#21683)

Adds per-alert-type digest mode that aggregates duplicate alerts
within a configurable time window and emits a single summary message
with count, start/end timestamps.

Configuration via general_settings.alert_type_config:
  alert_type_config:
    llm_requests_hanging:
      digest: true
      digest_interval: 86400

Digest key: (alert_type, request_model, api_base)
Default interval: 24 hours
Window type: fixed interval

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add blog_posts.json and local backup

* feat: add GetBlogPosts utility with GitHub fetch and local fallback

Adds GetBlogPosts class that fetches blog posts from GitHub with a 1-hour
in-process TTL cache, validates the response, and falls back to the bundled
blog_posts_backup.json on any network or validation failure.

* test: add cache reset fixture and LITELLM_LOCAL_BLOG_POSTS test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add GET /public/litellm_blog_posts endpoint

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: log fallback warning in blog posts endpoint and tighten test

* feat: add disable_show_blog to UISettings

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add useUISettings and useDisableShowBlog hooks

* fix: rename useUISettings to useUISettingsFlags to avoid naming collision

* fix: use existing useUISettings hook in useDisableShowBlog to avoid cache duplication

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add BlogDropdown component with react-query and error/retry state

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: enforce 5-post limit in BlogDropdown and add cap test

* fix: add retry, stable post key, enabled guard in BlogDropdown

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add BlogDropdown to navbar after Docs link

* feat: add network_mock transport for benchmarking proxy overhead without real API calls

Intercepts at httpx transport layer so the full proxy path (auth, routing,
OpenAI SDK, response transformation) is exercised with zero-latency responses.
Activated via `litellm_settings: { network_mock: true }` in proxy config.

* Litellm dev 02 19 2026 p2 (#21871)

* feat(ui/): new guardrails monitor 'demo

mock representation of what guardrails monitor looks like

* fix: ui updates

* style(ui/): fix styling

* feat: enable running ai monitor on individual guardrails

* feat: add backend logic for guardrail monitoring

* fix(guardrails/usage_endpoints.py): fix usage dashboard

* fix(budget): fix timezone config lookup and replace hardcoded timezone map with ZoneInfo (#21754)

* fix(budget): fix timezone config lookup and replace hardcoded timezone map with ZoneInfo

* fix(budget): update stale docstring on get_budget_reset_time

* fix: add missing return type annotations to iterator protocol methods in streaming_handler (#21750)

* fix: add return type annotations to iterator protocol methods in streaming_handler

Add missing return type annotations to __iter__, __aiter__, __next__, and __anext__ methods in CustomStreamWrapper and related classes.

- __iter__(self) -> Iterator["ModelResponseStream"]
- __aiter__(self) -> AsyncIterator["ModelResponseStream"]
- __next__(self) -> "ModelResponseStream"
- __anext__(self) -> "ModelResponseStream"

Also adds AsyncIterator and Iterator to typing imports.

Fixes issue with PLR0915 noqa comments and ensures proper type checking support.
Related to: #8304

* fix: add ruff PLR0915 noqa for files with too many statements

* Add gollem Go agent framework cookbook example (#21747)

Show how to use gollem, a production Go agent framework, with
LiteLLM proxy for multi-provider LLM access including tool use
and streaming.

* fix: avoid mutating caller-owned dicts in SpendUpdateQueue aggregation (#21742)

* fix(vertex_ai): enable context-1m-2025-08-07 beta header (#21870)

* server root path regression doc

* fixing syntax

* fix: replace Zapier webhook with Google Form for survey submission (#21621)

* Replace Zapier webhook with Google Form for survey submission

* Add back error logging for survey submission debugging

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>

* Revert "Merge pull request #21140 from BerriAI/litellm_perf_user_api_key_auth"

This reverts commit 0e1db3f, reversing
changes made to 7e2d6f2.

* test_vertex_ai_gemini_2_5_pro_streaming

* UI new build

* fix rendering

* ui new build

* docs fix

* docs fix

* docs fix

* docs fix

* docs fix

* docs fix

* docs fix

* docs fix

* release note docs

* docs

* adding image

* fix(vertex_ai): enable context-1m-2025-08-07 beta header

The `context-1m-2025-08-07` Anthropic beta header was set to `null` for vertex_ai,
causing it to be filtered out when users set `extra_headers: {anthropic-beta: context-1m-2025-08-07}`.

This prevented using Claude's 1M context window feature via Vertex AI, resulting in
`prompt is too long: 460500 tokens > 200000 maximum` errors.

Fixes #21861

---------

Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>

* Revert "fix(vertex_ai): enable context-1m-2025-08-07 beta header (#21870)" (#21876)

This reverts commit bce078a.

* docs(ui): add pre-PR checklist to UI contributing guide

Add testing and build verification steps per maintainer feedback
from @yjiang-litellm. Contributors should run their related tests
per-file and ensure npm run build passes before opening PRs.

* Fix entries with fast and us/

* Add tests for fast and us

* Add support for Priority PayGo for vertex ai and gemini

* Add model pricing

* fix: ensure arrival_time is set before calculating queue time

* Fix: Anthropic model wildcard access issue

* Add incident report

* Add ability to see which model cost map is getting used

* Fix name of title

* Readd tpm limit

* State management fixes for CheckBatchCost

* Fix PR review comments

* State management fixes for CheckBatchCost - Address greptile comments

* fix mypy issues:

* Add Noma guardrails v2 based on custom guardrails (#21400)

* Fix code qa issues

* Fix mypy issues

* Fix mypy issues

* Fix test_aaamodel_prices_and_context_window_json_is_valid

* fix: update calendly on repo

* fix(tests): use counter-based mock for time.time in prisma self-heal test

The test used a fixed side_effect list for time.time(), but the number
of calls varies by Python version, causing StopIteration on 3.12 and
AssertionError on 3.14. Replace with an infinite counter-based callable
and assert the timestamp was updated rather than checking for an exact
value.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(tests): use absolute path for model_prices JSON in validation test

The test used a relative path 'litellm/model_prices_and_context_window.json'
which only works when pytest runs from a specific working directory.
Use os.path based on __file__ to resolve the path reliably.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Update tests/test_litellm/test_utils.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix(tests): use os.path instead of Path to avoid NameError

Path is not imported at module level. Use os.path.join which is already
available.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* clean up mock transport: remove streaming, add defensive parsing

* docs: add Google GenAI SDK tutorial (JS & Python) (#21885)

* docs: add Google GenAI SDK tutorial for JS and Python

Add tutorial for using Google's official GenAI SDK (@google/genai for JS,
google-genai for Python) with LiteLLM proxy. Covers pass-through and
native router endpoints, streaming, multi-turn chat, and multi-provider
routing via model_group_alias. Also updates pass-through docs to use the
new SDK replacing the deprecated @google/generative-ai.

* fix(docs): correct Python SDK env var name in GenAI tutorial

GOOGLE_GENAI_API_KEY does not exist in the google-genai SDK.
The correct env var is GEMINI_API_KEY (or GOOGLE_API_KEY).
Also note that the Python SDK has no base URL env var.

* fix(docs): replace non-existent GOOGLE_GENAI_BASE_URL env var in interactions.md

The Python google-genai SDK does not read GOOGLE_GENAI_BASE_URL.
Use http_options={"base_url": "..."} in code instead.

* docs: add network mock benchmarking section

* docs: tweak benchmarks wording

* fix: add auth headers and empty latencies guard to benchmark script

* refactor: use method-level import for MockOpenAITransport

* fix: guard print_aggregate against empty latencies

* fix: add INCOMPLETE status to Interactions API enum and test

Google added INCOMPLETE to the Interactions API OpenAPI spec status enum.
Update both the Status3 enum in the SDK types and the test's expected
values to match.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Guardrail Monitor - measure guardrail reliability in prod  (#21944)

* fix: fix log viewer for guardrail monitoring

* feat(ui/): fix rendering logs per guardrail

* fix: fix viewing logs on overview tab of guardrail

* fix: log viewer

* fix: fix naming to align with metric

* docs: add performance & reliability section to v1.81.14 release notes

* fix(tests): make RPM limit test sequential to avoid race condition

Concurrent requests via run_in_executor + asyncio.gather caused a race
condition where more requests slipped through the rate limiter than
expected, leading to flaky test failures (e.g. 3 successes instead of 2
with rpm_limit=2).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: Singapore guardrail policies (PDPA + MAS AI Risk Management) (#21948)

* feat: Singapore PDPA PII protection guardrail policy template

Add Singapore Personal Data Protection Act (PDPA) guardrail support:

Regex patterns (patterns.json):
- sg_nric: NRIC/FIN detection ([STFGM] + 7 digits + checksum letter)
- sg_phone: Singapore phone numbers (+65/0065/65 prefix)
- sg_postal_code: 6-digit postal codes (contextual)
- passport_singapore: Passport numbers (E/K + 7 digits, contextual)
- sg_uen: Unique Entity Numbers (3 formats)
- sg_bank_account: Bank account numbers (dash format, contextual)

YAML policy templates (5 sub-guardrails):
- sg_pdpa_personal_identifiers: s.13 Consent
- sg_pdpa_sensitive_data: Advisory Guidelines
- sg_pdpa_do_not_call: Part IX DNC Registry
- sg_pdpa_data_transfer: s.26 overseas transfers
- sg_pdpa_profiling_automated_decisions: Model AI Governance Framework

Policy template entry in policy_templates.json with 9 guardrail definitions
(4 regex-based + 5 YAML conditional keyword matching).

Tests:
- test_sg_patterns.py: regex pattern unit tests
- test_sg_pdpa_guardrails.py: conditional keyword matching tests (100+ cases)

* feat: MAS AI Risk Management Guidelines guardrail policy template

Add Monetary Authority of Singapore (MAS) AI Risk Management Guidelines
guardrail support for financial institutions:

YAML policy templates (5 sub-guardrails):
- sg_mas_fairness_bias: Blocks discriminatory financial AI (credit/loans/insurance by protected attributes)
- sg_mas_transparency_explainability: Blocks opaque/unexplainable AI for consequential financial decisions
- sg_mas_human_oversight: Blocks fully automated financial decisions without human-in-the-loop
- sg_mas_data_governance: Blocks unauthorized sharing/mishandling of financial customer data
- sg_mas_model_security: Blocks adversarial attacks, model poisoning, inversion on financial AI

Policy template entry in policy_templates.json with 5 guardrail definitions.
Aligned with MAS FEAT Principles, Project MindForge, and NIST AI RMF.

Tests:
- test_sg_mas_ai_guardrails.py: conditional keyword matching tests (100+ cases)

* fix: address SG pattern review feedback

- Update NRIC lowercase test for IGNORECASE runtime behavior
- Add keyword context guard to sg_uen pattern to reduce false positives

* docs: clarify MAS AIRM timeline references

- Explicitly mark MAS AIRM as Nov 2025 consultation draft
- Add 2018 qualifier for FEAT principles in MAS policy descriptions
- Update MAS guardrail wording to avoid release-year ambiguity

* chore: commit resolved MAS policy conflicts

* test:

* chore:

* Add OpenAI Agents SDK tutorial with LiteLLM Proxy to docs  (#21221)

* Add OpenAI Agents SDK tutorial to docs

* Update OpenAI Agents SDK tutorial to use LiteLLM environment variables

* Enhance OpenAI Agents SDK tutorial with built-in LiteLLM extension details and updated configuration steps. Adjust section headings for clarity and improve the flow of information regarding model setup and usage.

* adjust blog posts to fetch from github first

* feat(videos): add variant parameter to video content download (#21955)

openai videos models support the features to download variants.
See more details here: https://developers.openai.com/api/docs/guides/video-generation#use-image-references.
Plumb variant (e.g. "thumbnail", "spritesheet") through the full
video content download chain: avideo_content → video_content →
video_content_handler → transform_video_content_request. OpenAI
appends ?variant=<value> to the GET URL; other providers accept
the parameter in their signature but ignore it.

* fixing path

* adjust blog post path

* Revert duplicate issue checker to text-based matching, remove duplicate PR workflow

Remove the Claude Code-powered duplicate PR detection workflow and revert
the duplicate issue checker back to wow-actions/potential-duplicates with
text similarity matching.

* ui changes

* adding tests

* fix(anthropic): sanitize tool_use IDs in assistant messages

Apply _sanitize_anthropic_tool_use_id to tool_use blocks in
convert_to_anthropic_tool_invoke, not just tool_result blocks.
IDs from external frameworks (e.g. MiniMax) may contain characters
like colons that violate Anthropic's ^[a-zA-Z0-9_-]+$ pattern.

Adds test for invalid ID sanitization in tool_use blocks.

---------

Co-authored-by: An Tang <ta@stripe.com>
Co-authored-by: janfrederickk <75388864+janfrederickk@users.noreply.github.com>
Co-authored-by: Zhenting Huang <3061613175@qq.com>
Co-authored-by: Cesar Garcia <128240629+Chesars@users.noreply.github.com>
Co-authored-by: Darien Kindlund <darien@kindlund.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
Co-authored-by: Ryan Crabbe <rcrabbe@berkeley.edu>
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: LeeJuOh <56071126+LeeJuOh@users.noreply.github.com>
Co-authored-by: Monesh Ram <31161039+WhoisMonesh@users.noreply.github.com>
Co-authored-by: Trevor Prater <trevor.prater@gmail.com>
Co-authored-by: The Mavik <179817126+themavik@users.noreply.github.com>
Co-authored-by: Edwin Isac <33712823+edwiniac@users.noreply.github.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Chesars <cesarponce19544@gmail.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: Harshit Jain <harshitjain0562@gmail.com>
Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com>
Co-authored-by: Ephrim Stanley <ephrim.stanley@point72.com>
Co-authored-by: TomAlon <tom@noma.security>
Co-authored-by: Julio Quinteros Pro <jquinter@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com>
Co-authored-by: Ron Zhong <ron-zhong@hotmail.com>
Co-authored-by: Arindam Majumder <109217591+Arindam200@users.noreply.github.com>
Co-authored-by: Lei Nie <lenie@quora.com>
@hung-phan
Copy link

This still have some issues and will be rejected by bedrock @hztBUAA . It adds additionalModelRequestFields

{"modelInput":{"messages":[{"role":"user","content":[{"text":"Explain the importance of cash flow management for businesses."}]}],"additionalModelRequestFields":{},"system":[{"text":"You are a business consultant with expertise in strategy, management, and economics."}],"inferenceConfig":{"maxTokens":1000}},"error":{"errorCode":400,"errorMessage":"Malformed input request: #: extraneous key [additionalModelRequestFields] is not permitted, reformat your input and try again.","expired":false,"retryable":false},"recordId":"request-45"}

damhau pushed a commit to damhau/litellm that referenced this pull request Feb 26, 2026
…erriAI#21970)

* auth_with_role_name add region_name arg for cross-account sts

* update tests to include case with aws_region_name for _auth_with_aws_role

* Only pass region_name to STS client when aws_region_name is set

* Add optional aws_sts_endpoint to _auth_with_aws_role

* Parametrize ambient-credentials test for no opts, region_name, and aws_sts_endpoint

* consistently passing region and endpoint args into explicit credentials irsa

* fix env var leakage

* fix: bedrock openai-compatible imported-model should also have model arn encoded

* feat: show proxy url in ModelHub (BerriAI#21660)

* fix(bedrock): correct modelInput format for Converse API batch models (BerriAI#21656)

* fix(proxy): add model_ids param to access group endpoints for precise deployment tagging (BerriAI#21655)

POST /access_group/new and PUT /access_group/{name}/update now accept an
optional model_ids list that targets specific deployments by their unique
model_id, instead of tagging every deployment that shares a model_name.

When model_ids is provided it takes priority over model_names, giving
API callers the same single-deployment precision that the UI already has
via PATCH /model/{model_id}/update.

Backward compatible: model_names continues to work as before.

Closes BerriAI#21544

* feat(proxy): add custom favicon support\n\nAdd ability to configure a custom favicon for the litellm proxy UI.\n\n- Add favicon_url field to UIThemeConfig model\n- Add LITELLM_FAVICON_URL env var support\n- Add /get_favicon endpoint to serve custom favicons\n- Update ThemeContext to dynamically set favicon\n- Add favicon URL input to UI theme settings page\n- Add comprehensive tests\n\nCloses BerriAI#8323 (BerriAI#21653)

* fix(bedrock): prevent double UUID in create_file S3 key (BerriAI#21650)

In create_file for Bedrock, get_complete_file_url is called twice:
once in the sync handler (generating UUID-1 for api_base) and once
inside transform_create_file_request (generating UUID-2 for the
actual S3 upload). The Bedrock provider correctly writes UUID-2 into
litellm_params["upload_url"], but the sync handler unconditionally
overwrites it with api_base (UUID-1). This causes the returned
file_id to point to a non-existent S3 key.

Fix: only set upload_url to api_base when transform_create_file_request
has not already set it, preserving the Bedrock provider's value.

Closes BerriAI#21546

* feat(semantic-cache): support configurable vector dimensions for Qdrant (BerriAI#21649)

Add vector_size parameter to QdrantSemanticCache and expose it through
the Cache facade as qdrant_semantic_cache_vector_size. This allows users
to use embedding models with dimensions other than the default 1536,
enabling cheaper/stronger models like Stella (1024d), bge-en-icl (4096d),
voyage, cohere, etc.

The parameter defaults to QDRANT_VECTOR_SIZE (env var or 1536) for
backward compatibility. When creating new collections, the configured
vector_size is used instead of the hardcoded constant.

Closes BerriAI#9377

* fix(utils): normalize camelCase thinking param keys to snake_case (BerriAI#21762)

Clients like OpenCode's @ai-sdk/openai-compatible send budgetTokens
(camelCase) instead of budget_tokens in the thinking parameter, causing
validation errors. Add early normalization in completion().

* feat: add optional digest mode for Slack alert types (BerriAI#21683)

Adds per-alert-type digest mode that aggregates duplicate alerts
within a configurable time window and emits a single summary message
with count, start/end timestamps.

Configuration via general_settings.alert_type_config:
  alert_type_config:
    llm_requests_hanging:
      digest: true
      digest_interval: 86400

Digest key: (alert_type, request_model, api_base)
Default interval: 24 hours
Window type: fixed interval

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add blog_posts.json and local backup

* feat: add GetBlogPosts utility with GitHub fetch and local fallback

Adds GetBlogPosts class that fetches blog posts from GitHub with a 1-hour
in-process TTL cache, validates the response, and falls back to the bundled
blog_posts_backup.json on any network or validation failure.

* test: add cache reset fixture and LITELLM_LOCAL_BLOG_POSTS test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add GET /public/litellm_blog_posts endpoint

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: log fallback warning in blog posts endpoint and tighten test

* feat: add disable_show_blog to UISettings

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add useUISettings and useDisableShowBlog hooks

* fix: rename useUISettings to useUISettingsFlags to avoid naming collision

* fix: use existing useUISettings hook in useDisableShowBlog to avoid cache duplication

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add BlogDropdown component with react-query and error/retry state

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: enforce 5-post limit in BlogDropdown and add cap test

* fix: add retry, stable post key, enabled guard in BlogDropdown

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add BlogDropdown to navbar after Docs link

* feat: add network_mock transport for benchmarking proxy overhead without real API calls

Intercepts at httpx transport layer so the full proxy path (auth, routing,
OpenAI SDK, response transformation) is exercised with zero-latency responses.
Activated via `litellm_settings: { network_mock: true }` in proxy config.

* Litellm dev 02 19 2026 p2 (BerriAI#21871)

* feat(ui/): new guardrails monitor 'demo

mock representation of what guardrails monitor looks like

* fix: ui updates

* style(ui/): fix styling

* feat: enable running ai monitor on individual guardrails

* feat: add backend logic for guardrail monitoring

* fix(guardrails/usage_endpoints.py): fix usage dashboard

* fix(budget): fix timezone config lookup and replace hardcoded timezone map with ZoneInfo (BerriAI#21754)

* fix(budget): fix timezone config lookup and replace hardcoded timezone map with ZoneInfo

* fix(budget): update stale docstring on get_budget_reset_time

* fix: add missing return type annotations to iterator protocol methods in streaming_handler (BerriAI#21750)

* fix: add return type annotations to iterator protocol methods in streaming_handler

Add missing return type annotations to __iter__, __aiter__, __next__, and __anext__ methods in CustomStreamWrapper and related classes.

- __iter__(self) -> Iterator["ModelResponseStream"]
- __aiter__(self) -> AsyncIterator["ModelResponseStream"]
- __next__(self) -> "ModelResponseStream"
- __anext__(self) -> "ModelResponseStream"

Also adds AsyncIterator and Iterator to typing imports.

Fixes issue with PLR0915 noqa comments and ensures proper type checking support.
Related to: BerriAI#8304

* fix: add ruff PLR0915 noqa for files with too many statements

* Add gollem Go agent framework cookbook example (BerriAI#21747)

Show how to use gollem, a production Go agent framework, with
LiteLLM proxy for multi-provider LLM access including tool use
and streaming.

* fix: avoid mutating caller-owned dicts in SpendUpdateQueue aggregation (BerriAI#21742)

* fix(vertex_ai): enable context-1m-2025-08-07 beta header (BerriAI#21870)

* server root path regression doc

* fixing syntax

* fix: replace Zapier webhook with Google Form for survey submission (BerriAI#21621)

* Replace Zapier webhook with Google Form for survey submission

* Add back error logging for survey submission debugging

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>

* Revert "Merge pull request BerriAI#21140 from BerriAI/litellm_perf_user_api_key_auth"

This reverts commit 0e1db3f, reversing
changes made to 7e2d6f2.

* test_vertex_ai_gemini_2_5_pro_streaming

* UI new build

* fix rendering

* ui new build

* docs fix

* docs fix

* docs fix

* docs fix

* docs fix

* docs fix

* docs fix

* docs fix

* release note docs

* docs

* adding image

* fix(vertex_ai): enable context-1m-2025-08-07 beta header

The `context-1m-2025-08-07` Anthropic beta header was set to `null` for vertex_ai,
causing it to be filtered out when users set `extra_headers: {anthropic-beta: context-1m-2025-08-07}`.

This prevented using Claude's 1M context window feature via Vertex AI, resulting in
`prompt is too long: 460500 tokens > 200000 maximum` errors.

Fixes BerriAI#21861

---------

Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>

* Revert "fix(vertex_ai): enable context-1m-2025-08-07 beta header (BerriAI#21870)" (BerriAI#21876)

This reverts commit bce078a.

* docs(ui): add pre-PR checklist to UI contributing guide

Add testing and build verification steps per maintainer feedback
from @yjiang-litellm. Contributors should run their related tests
per-file and ensure npm run build passes before opening PRs.

* Fix entries with fast and us/

* Add tests for fast and us

* Add support for Priority PayGo for vertex ai and gemini

* Add model pricing

* fix: ensure arrival_time is set before calculating queue time

* Fix: Anthropic model wildcard access issue

* Add incident report

* Add ability to see which model cost map is getting used

* Fix name of title

* Readd tpm limit

* State management fixes for CheckBatchCost

* Fix PR review comments

* State management fixes for CheckBatchCost - Address greptile comments

* fix mypy issues:

* Add Noma guardrails v2 based on custom guardrails (BerriAI#21400)

* Fix code qa issues

* Fix mypy issues

* Fix mypy issues

* Fix test_aaamodel_prices_and_context_window_json_is_valid

* fix: update calendly on repo

* fix(tests): use counter-based mock for time.time in prisma self-heal test

The test used a fixed side_effect list for time.time(), but the number
of calls varies by Python version, causing StopIteration on 3.12 and
AssertionError on 3.14. Replace with an infinite counter-based callable
and assert the timestamp was updated rather than checking for an exact
value.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(tests): use absolute path for model_prices JSON in validation test

The test used a relative path 'litellm/model_prices_and_context_window.json'
which only works when pytest runs from a specific working directory.
Use os.path based on __file__ to resolve the path reliably.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Update tests/test_litellm/test_utils.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix(tests): use os.path instead of Path to avoid NameError

Path is not imported at module level. Use os.path.join which is already
available.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* clean up mock transport: remove streaming, add defensive parsing

* docs: add Google GenAI SDK tutorial (JS & Python) (BerriAI#21885)

* docs: add Google GenAI SDK tutorial for JS and Python

Add tutorial for using Google's official GenAI SDK (@google/genai for JS,
google-genai for Python) with LiteLLM proxy. Covers pass-through and
native router endpoints, streaming, multi-turn chat, and multi-provider
routing via model_group_alias. Also updates pass-through docs to use the
new SDK replacing the deprecated @google/generative-ai.

* fix(docs): correct Python SDK env var name in GenAI tutorial

GOOGLE_GENAI_API_KEY does not exist in the google-genai SDK.
The correct env var is GEMINI_API_KEY (or GOOGLE_API_KEY).
Also note that the Python SDK has no base URL env var.

* fix(docs): replace non-existent GOOGLE_GENAI_BASE_URL env var in interactions.md

The Python google-genai SDK does not read GOOGLE_GENAI_BASE_URL.
Use http_options={"base_url": "..."} in code instead.

* docs: add network mock benchmarking section

* docs: tweak benchmarks wording

* fix: add auth headers and empty latencies guard to benchmark script

* refactor: use method-level import for MockOpenAITransport

* fix: guard print_aggregate against empty latencies

* fix: add INCOMPLETE status to Interactions API enum and test

Google added INCOMPLETE to the Interactions API OpenAPI spec status enum.
Update both the Status3 enum in the SDK types and the test's expected
values to match.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Guardrail Monitor - measure guardrail reliability in prod  (BerriAI#21944)

* fix: fix log viewer for guardrail monitoring

* feat(ui/): fix rendering logs per guardrail

* fix: fix viewing logs on overview tab of guardrail

* fix: log viewer

* fix: fix naming to align with metric

* docs: add performance & reliability section to v1.81.14 release notes

* fix(tests): make RPM limit test sequential to avoid race condition

Concurrent requests via run_in_executor + asyncio.gather caused a race
condition where more requests slipped through the rate limiter than
expected, leading to flaky test failures (e.g. 3 successes instead of 2
with rpm_limit=2).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: Singapore guardrail policies (PDPA + MAS AI Risk Management) (BerriAI#21948)

* feat: Singapore PDPA PII protection guardrail policy template

Add Singapore Personal Data Protection Act (PDPA) guardrail support:

Regex patterns (patterns.json):
- sg_nric: NRIC/FIN detection ([STFGM] + 7 digits + checksum letter)
- sg_phone: Singapore phone numbers (+65/0065/65 prefix)
- sg_postal_code: 6-digit postal codes (contextual)
- passport_singapore: Passport numbers (E/K + 7 digits, contextual)
- sg_uen: Unique Entity Numbers (3 formats)
- sg_bank_account: Bank account numbers (dash format, contextual)

YAML policy templates (5 sub-guardrails):
- sg_pdpa_personal_identifiers: s.13 Consent
- sg_pdpa_sensitive_data: Advisory Guidelines
- sg_pdpa_do_not_call: Part IX DNC Registry
- sg_pdpa_data_transfer: s.26 overseas transfers
- sg_pdpa_profiling_automated_decisions: Model AI Governance Framework

Policy template entry in policy_templates.json with 9 guardrail definitions
(4 regex-based + 5 YAML conditional keyword matching).

Tests:
- test_sg_patterns.py: regex pattern unit tests
- test_sg_pdpa_guardrails.py: conditional keyword matching tests (100+ cases)

* feat: MAS AI Risk Management Guidelines guardrail policy template

Add Monetary Authority of Singapore (MAS) AI Risk Management Guidelines
guardrail support for financial institutions:

YAML policy templates (5 sub-guardrails):
- sg_mas_fairness_bias: Blocks discriminatory financial AI (credit/loans/insurance by protected attributes)
- sg_mas_transparency_explainability: Blocks opaque/unexplainable AI for consequential financial decisions
- sg_mas_human_oversight: Blocks fully automated financial decisions without human-in-the-loop
- sg_mas_data_governance: Blocks unauthorized sharing/mishandling of financial customer data
- sg_mas_model_security: Blocks adversarial attacks, model poisoning, inversion on financial AI

Policy template entry in policy_templates.json with 5 guardrail definitions.
Aligned with MAS FEAT Principles, Project MindForge, and NIST AI RMF.

Tests:
- test_sg_mas_ai_guardrails.py: conditional keyword matching tests (100+ cases)

* fix: address SG pattern review feedback

- Update NRIC lowercase test for IGNORECASE runtime behavior
- Add keyword context guard to sg_uen pattern to reduce false positives

* docs: clarify MAS AIRM timeline references

- Explicitly mark MAS AIRM as Nov 2025 consultation draft
- Add 2018 qualifier for FEAT principles in MAS policy descriptions
- Update MAS guardrail wording to avoid release-year ambiguity

* chore: commit resolved MAS policy conflicts

* test:

* chore:

* Add OpenAI Agents SDK tutorial with LiteLLM Proxy to docs  (BerriAI#21221)

* Add OpenAI Agents SDK tutorial to docs

* Update OpenAI Agents SDK tutorial to use LiteLLM environment variables

* Enhance OpenAI Agents SDK tutorial with built-in LiteLLM extension details and updated configuration steps. Adjust section headings for clarity and improve the flow of information regarding model setup and usage.

* adjust blog posts to fetch from github first

* feat(videos): add variant parameter to video content download (BerriAI#21955)

openai videos models support the features to download variants.
See more details here: https://developers.openai.com/api/docs/guides/video-generation#use-image-references.
Plumb variant (e.g. "thumbnail", "spritesheet") through the full
video content download chain: avideo_content → video_content →
video_content_handler → transform_video_content_request. OpenAI
appends ?variant=<value> to the GET URL; other providers accept
the parameter in their signature but ignore it.

* fixing path

* adjust blog post path

* Revert duplicate issue checker to text-based matching, remove duplicate PR workflow

Remove the Claude Code-powered duplicate PR detection workflow and revert
the duplicate issue checker back to wow-actions/potential-duplicates with
text similarity matching.

* ui changes

* adding tests

* adjust default aggregation threshold

* fix(videos): pass api_key from litellm_params to video remix handlers (BerriAI#21965)

video_remix_handler and async_video_remix_handler were not falling back
to litellm_params.api_key when the api_key parameter was None, causing
Authorization: Bearer None to be sent to the provider. This matches the
pattern already used by async_video_generation_handler.

* adding testing coverage + fixing flaky tests

* fix(ollama): thread api_base through get_model_info and add graceful fallback

When users pass api_base to litellm.completion() for Ollama, the model
info fetch (context window, function_calling support) was ignoring the
user's api_base and only reading OLLAMA_API_BASE env var or defaulting
to localhost:11434. This caused confusing errors in logs when Ollama
runs on a remote server.

Thread api_base from litellm_params through the get_model_info call
chain so OllamaConfig.get_model_info() uses the correct server. Also
return safe defaults instead of raising when the server is unreachable.

Fixes BerriAI#21967

---------

Co-authored-by: An Tang <ta@stripe.com>
Co-authored-by: janfrederickk <75388864+janfrederickk@users.noreply.github.com>
Co-authored-by: Zhenting Huang <3061613175@qq.com>
Co-authored-by: Darien Kindlund <darien@kindlund.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
Co-authored-by: Ryan Crabbe <rcrabbe@berkeley.edu>
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: LeeJuOh <56071126+LeeJuOh@users.noreply.github.com>
Co-authored-by: Monesh Ram <31161039+WhoisMonesh@users.noreply.github.com>
Co-authored-by: Trevor Prater <trevor.prater@gmail.com>
Co-authored-by: The Mavik <179817126+themavik@users.noreply.github.com>
Co-authored-by: Edwin Isac <33712823+edwiniac@users.noreply.github.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: Harshit Jain <harshitjain0562@gmail.com>
Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com>
Co-authored-by: Ephrim Stanley <ephrim.stanley@point72.com>
Co-authored-by: TomAlon <tom@noma.security>
Co-authored-by: Julio Quinteros Pro <jquinter@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com>
Co-authored-by: Ron Zhong <ron-zhong@hotmail.com>
Co-authored-by: Arindam Majumder <109217591+Arindam200@users.noreply.github.com>
Co-authored-by: Lei Nie <lenie@quora.com>
damhau pushed a commit to damhau/litellm that referenced this pull request Feb 26, 2026
…voke (BerriAI#21964)

* auth_with_role_name add region_name arg for cross-account sts

* update tests to include case with aws_region_name for _auth_with_aws_role

* Only pass region_name to STS client when aws_region_name is set

* Add optional aws_sts_endpoint to _auth_with_aws_role

* Parametrize ambient-credentials test for no opts, region_name, and aws_sts_endpoint

* consistently passing region and endpoint args into explicit credentials irsa

* fix env var leakage

* fix: bedrock openai-compatible imported-model should also have model arn encoded

* feat: show proxy url in ModelHub (BerriAI#21660)

* fix(bedrock): correct modelInput format for Converse API batch models (BerriAI#21656)

* fix(proxy): add model_ids param to access group endpoints for precise deployment tagging (BerriAI#21655)

POST /access_group/new and PUT /access_group/{name}/update now accept an
optional model_ids list that targets specific deployments by their unique
model_id, instead of tagging every deployment that shares a model_name.

When model_ids is provided it takes priority over model_names, giving
API callers the same single-deployment precision that the UI already has
via PATCH /model/{model_id}/update.

Backward compatible: model_names continues to work as before.

Closes BerriAI#21544

* feat(proxy): add custom favicon support\n\nAdd ability to configure a custom favicon for the litellm proxy UI.\n\n- Add favicon_url field to UIThemeConfig model\n- Add LITELLM_FAVICON_URL env var support\n- Add /get_favicon endpoint to serve custom favicons\n- Update ThemeContext to dynamically set favicon\n- Add favicon URL input to UI theme settings page\n- Add comprehensive tests\n\nCloses BerriAI#8323 (BerriAI#21653)

* fix(bedrock): prevent double UUID in create_file S3 key (BerriAI#21650)

In create_file for Bedrock, get_complete_file_url is called twice:
once in the sync handler (generating UUID-1 for api_base) and once
inside transform_create_file_request (generating UUID-2 for the
actual S3 upload). The Bedrock provider correctly writes UUID-2 into
litellm_params["upload_url"], but the sync handler unconditionally
overwrites it with api_base (UUID-1). This causes the returned
file_id to point to a non-existent S3 key.

Fix: only set upload_url to api_base when transform_create_file_request
has not already set it, preserving the Bedrock provider's value.

Closes BerriAI#21546

* feat(semantic-cache): support configurable vector dimensions for Qdrant (BerriAI#21649)

Add vector_size parameter to QdrantSemanticCache and expose it through
the Cache facade as qdrant_semantic_cache_vector_size. This allows users
to use embedding models with dimensions other than the default 1536,
enabling cheaper/stronger models like Stella (1024d), bge-en-icl (4096d),
voyage, cohere, etc.

The parameter defaults to QDRANT_VECTOR_SIZE (env var or 1536) for
backward compatibility. When creating new collections, the configured
vector_size is used instead of the hardcoded constant.

Closes BerriAI#9377

* fix(utils): normalize camelCase thinking param keys to snake_case (BerriAI#21762)

Clients like OpenCode's @ai-sdk/openai-compatible send budgetTokens
(camelCase) instead of budget_tokens in the thinking parameter, causing
validation errors. Add early normalization in completion().

* feat: add optional digest mode for Slack alert types (BerriAI#21683)

Adds per-alert-type digest mode that aggregates duplicate alerts
within a configurable time window and emits a single summary message
with count, start/end timestamps.

Configuration via general_settings.alert_type_config:
  alert_type_config:
    llm_requests_hanging:
      digest: true
      digest_interval: 86400

Digest key: (alert_type, request_model, api_base)
Default interval: 24 hours
Window type: fixed interval

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add blog_posts.json and local backup

* feat: add GetBlogPosts utility with GitHub fetch and local fallback

Adds GetBlogPosts class that fetches blog posts from GitHub with a 1-hour
in-process TTL cache, validates the response, and falls back to the bundled
blog_posts_backup.json on any network or validation failure.

* test: add cache reset fixture and LITELLM_LOCAL_BLOG_POSTS test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add GET /public/litellm_blog_posts endpoint

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: log fallback warning in blog posts endpoint and tighten test

* feat: add disable_show_blog to UISettings

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add useUISettings and useDisableShowBlog hooks

* fix: rename useUISettings to useUISettingsFlags to avoid naming collision

* fix: use existing useUISettings hook in useDisableShowBlog to avoid cache duplication

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add BlogDropdown component with react-query and error/retry state

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: enforce 5-post limit in BlogDropdown and add cap test

* fix: add retry, stable post key, enabled guard in BlogDropdown

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add BlogDropdown to navbar after Docs link

* feat: add network_mock transport for benchmarking proxy overhead without real API calls

Intercepts at httpx transport layer so the full proxy path (auth, routing,
OpenAI SDK, response transformation) is exercised with zero-latency responses.
Activated via `litellm_settings: { network_mock: true }` in proxy config.

* Litellm dev 02 19 2026 p2 (BerriAI#21871)

* feat(ui/): new guardrails monitor 'demo

mock representation of what guardrails monitor looks like

* fix: ui updates

* style(ui/): fix styling

* feat: enable running ai monitor on individual guardrails

* feat: add backend logic for guardrail monitoring

* fix(guardrails/usage_endpoints.py): fix usage dashboard

* fix(budget): fix timezone config lookup and replace hardcoded timezone map with ZoneInfo (BerriAI#21754)

* fix(budget): fix timezone config lookup and replace hardcoded timezone map with ZoneInfo

* fix(budget): update stale docstring on get_budget_reset_time

* fix: add missing return type annotations to iterator protocol methods in streaming_handler (BerriAI#21750)

* fix: add return type annotations to iterator protocol methods in streaming_handler

Add missing return type annotations to __iter__, __aiter__, __next__, and __anext__ methods in CustomStreamWrapper and related classes.

- __iter__(self) -> Iterator["ModelResponseStream"]
- __aiter__(self) -> AsyncIterator["ModelResponseStream"]
- __next__(self) -> "ModelResponseStream"
- __anext__(self) -> "ModelResponseStream"

Also adds AsyncIterator and Iterator to typing imports.

Fixes issue with PLR0915 noqa comments and ensures proper type checking support.
Related to: BerriAI#8304

* fix: add ruff PLR0915 noqa for files with too many statements

* Add gollem Go agent framework cookbook example (BerriAI#21747)

Show how to use gollem, a production Go agent framework, with
LiteLLM proxy for multi-provider LLM access including tool use
and streaming.

* fix: avoid mutating caller-owned dicts in SpendUpdateQueue aggregation (BerriAI#21742)

* fix(vertex_ai): enable context-1m-2025-08-07 beta header (BerriAI#21870)

* server root path regression doc

* fixing syntax

* fix: replace Zapier webhook with Google Form for survey submission (BerriAI#21621)

* Replace Zapier webhook with Google Form for survey submission

* Add back error logging for survey submission debugging

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>

* Revert "Merge pull request BerriAI#21140 from BerriAI/litellm_perf_user_api_key_auth"

This reverts commit 0e1db3f, reversing
changes made to 7e2d6f2.

* test_vertex_ai_gemini_2_5_pro_streaming

* UI new build

* fix rendering

* ui new build

* docs fix

* docs fix

* docs fix

* docs fix

* docs fix

* docs fix

* docs fix

* docs fix

* release note docs

* docs

* adding image

* fix(vertex_ai): enable context-1m-2025-08-07 beta header

The `context-1m-2025-08-07` Anthropic beta header was set to `null` for vertex_ai,
causing it to be filtered out when users set `extra_headers: {anthropic-beta: context-1m-2025-08-07}`.

This prevented using Claude's 1M context window feature via Vertex AI, resulting in
`prompt is too long: 460500 tokens > 200000 maximum` errors.

Fixes BerriAI#21861

---------

Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>

* Revert "fix(vertex_ai): enable context-1m-2025-08-07 beta header (BerriAI#21870)" (BerriAI#21876)

This reverts commit bce078a.

* docs(ui): add pre-PR checklist to UI contributing guide

Add testing and build verification steps per maintainer feedback
from @yjiang-litellm. Contributors should run their related tests
per-file and ensure npm run build passes before opening PRs.

* Fix entries with fast and us/

* Add tests for fast and us

* Add support for Priority PayGo for vertex ai and gemini

* Add model pricing

* fix: ensure arrival_time is set before calculating queue time

* Fix: Anthropic model wildcard access issue

* Add incident report

* Add ability to see which model cost map is getting used

* Fix name of title

* Readd tpm limit

* State management fixes for CheckBatchCost

* Fix PR review comments

* State management fixes for CheckBatchCost - Address greptile comments

* fix mypy issues:

* Add Noma guardrails v2 based on custom guardrails (BerriAI#21400)

* Fix code qa issues

* Fix mypy issues

* Fix mypy issues

* Fix test_aaamodel_prices_and_context_window_json_is_valid

* fix: update calendly on repo

* fix(tests): use counter-based mock for time.time in prisma self-heal test

The test used a fixed side_effect list for time.time(), but the number
of calls varies by Python version, causing StopIteration on 3.12 and
AssertionError on 3.14. Replace with an infinite counter-based callable
and assert the timestamp was updated rather than checking for an exact
value.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(tests): use absolute path for model_prices JSON in validation test

The test used a relative path 'litellm/model_prices_and_context_window.json'
which only works when pytest runs from a specific working directory.
Use os.path based on __file__ to resolve the path reliably.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Update tests/test_litellm/test_utils.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix(tests): use os.path instead of Path to avoid NameError

Path is not imported at module level. Use os.path.join which is already
available.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* clean up mock transport: remove streaming, add defensive parsing

* docs: add Google GenAI SDK tutorial (JS & Python) (BerriAI#21885)

* docs: add Google GenAI SDK tutorial for JS and Python

Add tutorial for using Google's official GenAI SDK (@google/genai for JS,
google-genai for Python) with LiteLLM proxy. Covers pass-through and
native router endpoints, streaming, multi-turn chat, and multi-provider
routing via model_group_alias. Also updates pass-through docs to use the
new SDK replacing the deprecated @google/generative-ai.

* fix(docs): correct Python SDK env var name in GenAI tutorial

GOOGLE_GENAI_API_KEY does not exist in the google-genai SDK.
The correct env var is GEMINI_API_KEY (or GOOGLE_API_KEY).
Also note that the Python SDK has no base URL env var.

* fix(docs): replace non-existent GOOGLE_GENAI_BASE_URL env var in interactions.md

The Python google-genai SDK does not read GOOGLE_GENAI_BASE_URL.
Use http_options={"base_url": "..."} in code instead.

* docs: add network mock benchmarking section

* docs: tweak benchmarks wording

* fix: add auth headers and empty latencies guard to benchmark script

* refactor: use method-level import for MockOpenAITransport

* fix: guard print_aggregate against empty latencies

* fix: add INCOMPLETE status to Interactions API enum and test

Google added INCOMPLETE to the Interactions API OpenAPI spec status enum.
Update both the Status3 enum in the SDK types and the test's expected
values to match.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Guardrail Monitor - measure guardrail reliability in prod  (BerriAI#21944)

* fix: fix log viewer for guardrail monitoring

* feat(ui/): fix rendering logs per guardrail

* fix: fix viewing logs on overview tab of guardrail

* fix: log viewer

* fix: fix naming to align with metric

* docs: add performance & reliability section to v1.81.14 release notes

* fix(tests): make RPM limit test sequential to avoid race condition

Concurrent requests via run_in_executor + asyncio.gather caused a race
condition where more requests slipped through the rate limiter than
expected, leading to flaky test failures (e.g. 3 successes instead of 2
with rpm_limit=2).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: Singapore guardrail policies (PDPA + MAS AI Risk Management) (BerriAI#21948)

* feat: Singapore PDPA PII protection guardrail policy template

Add Singapore Personal Data Protection Act (PDPA) guardrail support:

Regex patterns (patterns.json):
- sg_nric: NRIC/FIN detection ([STFGM] + 7 digits + checksum letter)
- sg_phone: Singapore phone numbers (+65/0065/65 prefix)
- sg_postal_code: 6-digit postal codes (contextual)
- passport_singapore: Passport numbers (E/K + 7 digits, contextual)
- sg_uen: Unique Entity Numbers (3 formats)
- sg_bank_account: Bank account numbers (dash format, contextual)

YAML policy templates (5 sub-guardrails):
- sg_pdpa_personal_identifiers: s.13 Consent
- sg_pdpa_sensitive_data: Advisory Guidelines
- sg_pdpa_do_not_call: Part IX DNC Registry
- sg_pdpa_data_transfer: s.26 overseas transfers
- sg_pdpa_profiling_automated_decisions: Model AI Governance Framework

Policy template entry in policy_templates.json with 9 guardrail definitions
(4 regex-based + 5 YAML conditional keyword matching).

Tests:
- test_sg_patterns.py: regex pattern unit tests
- test_sg_pdpa_guardrails.py: conditional keyword matching tests (100+ cases)

* feat: MAS AI Risk Management Guidelines guardrail policy template

Add Monetary Authority of Singapore (MAS) AI Risk Management Guidelines
guardrail support for financial institutions:

YAML policy templates (5 sub-guardrails):
- sg_mas_fairness_bias: Blocks discriminatory financial AI (credit/loans/insurance by protected attributes)
- sg_mas_transparency_explainability: Blocks opaque/unexplainable AI for consequential financial decisions
- sg_mas_human_oversight: Blocks fully automated financial decisions without human-in-the-loop
- sg_mas_data_governance: Blocks unauthorized sharing/mishandling of financial customer data
- sg_mas_model_security: Blocks adversarial attacks, model poisoning, inversion on financial AI

Policy template entry in policy_templates.json with 5 guardrail definitions.
Aligned with MAS FEAT Principles, Project MindForge, and NIST AI RMF.

Tests:
- test_sg_mas_ai_guardrails.py: conditional keyword matching tests (100+ cases)

* fix: address SG pattern review feedback

- Update NRIC lowercase test for IGNORECASE runtime behavior
- Add keyword context guard to sg_uen pattern to reduce false positives

* docs: clarify MAS AIRM timeline references

- Explicitly mark MAS AIRM as Nov 2025 consultation draft
- Add 2018 qualifier for FEAT principles in MAS policy descriptions
- Update MAS guardrail wording to avoid release-year ambiguity

* chore: commit resolved MAS policy conflicts

* test:

* chore:

* Add OpenAI Agents SDK tutorial with LiteLLM Proxy to docs  (BerriAI#21221)

* Add OpenAI Agents SDK tutorial to docs

* Update OpenAI Agents SDK tutorial to use LiteLLM environment variables

* Enhance OpenAI Agents SDK tutorial with built-in LiteLLM extension details and updated configuration steps. Adjust section headings for clarity and improve the flow of information regarding model setup and usage.

* adjust blog posts to fetch from github first

* feat(videos): add variant parameter to video content download (BerriAI#21955)

openai videos models support the features to download variants.
See more details here: https://developers.openai.com/api/docs/guides/video-generation#use-image-references.
Plumb variant (e.g. "thumbnail", "spritesheet") through the full
video content download chain: avideo_content → video_content →
video_content_handler → transform_video_content_request. OpenAI
appends ?variant=<value> to the GET URL; other providers accept
the parameter in their signature but ignore it.

* fixing path

* adjust blog post path

* Revert duplicate issue checker to text-based matching, remove duplicate PR workflow

Remove the Claude Code-powered duplicate PR detection workflow and revert
the duplicate issue checker back to wow-actions/potential-duplicates with
text similarity matching.

* ui changes

* adding tests

* fix(anthropic): sanitize tool_use IDs in assistant messages

Apply _sanitize_anthropic_tool_use_id to tool_use blocks in
convert_to_anthropic_tool_invoke, not just tool_result blocks.
IDs from external frameworks (e.g. MiniMax) may contain characters
like colons that violate Anthropic's ^[a-zA-Z0-9_-]+$ pattern.

Adds test for invalid ID sanitization in tool_use blocks.

---------

Co-authored-by: An Tang <ta@stripe.com>
Co-authored-by: janfrederickk <75388864+janfrederickk@users.noreply.github.com>
Co-authored-by: Zhenting Huang <3061613175@qq.com>
Co-authored-by: Cesar Garcia <128240629+Chesars@users.noreply.github.com>
Co-authored-by: Darien Kindlund <darien@kindlund.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
Co-authored-by: Ryan Crabbe <rcrabbe@berkeley.edu>
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: LeeJuOh <56071126+LeeJuOh@users.noreply.github.com>
Co-authored-by: Monesh Ram <31161039+WhoisMonesh@users.noreply.github.com>
Co-authored-by: Trevor Prater <trevor.prater@gmail.com>
Co-authored-by: The Mavik <179817126+themavik@users.noreply.github.com>
Co-authored-by: Edwin Isac <33712823+edwiniac@users.noreply.github.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Chesars <cesarponce19544@gmail.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: Harshit Jain <harshitjain0562@gmail.com>
Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com>
Co-authored-by: Ephrim Stanley <ephrim.stanley@point72.com>
Co-authored-by: TomAlon <tom@noma.security>
Co-authored-by: Julio Quinteros Pro <jquinter@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com>
Co-authored-by: Ron Zhong <ron-zhong@hotmail.com>
Co-authored-by: Arindam Majumder <109217591+Arindam200@users.noreply.github.com>
Co-authored-by: Lei Nie <lenie@quora.com>
Sameerlite added a commit that referenced this pull request Mar 3, 2026
…21970)

* auth_with_role_name add region_name arg for cross-account sts

* update tests to include case with aws_region_name for _auth_with_aws_role

* Only pass region_name to STS client when aws_region_name is set

* Add optional aws_sts_endpoint to _auth_with_aws_role

* Parametrize ambient-credentials test for no opts, region_name, and aws_sts_endpoint

* consistently passing region and endpoint args into explicit credentials irsa

* fix env var leakage

* fix: bedrock openai-compatible imported-model should also have model arn encoded

* feat: show proxy url in ModelHub (#21660)

* fix(bedrock): correct modelInput format for Converse API batch models (#21656)

* fix(proxy): add model_ids param to access group endpoints for precise deployment tagging (#21655)

POST /access_group/new and PUT /access_group/{name}/update now accept an
optional model_ids list that targets specific deployments by their unique
model_id, instead of tagging every deployment that shares a model_name.

When model_ids is provided it takes priority over model_names, giving
API callers the same single-deployment precision that the UI already has
via PATCH /model/{model_id}/update.

Backward compatible: model_names continues to work as before.

Closes #21544

* feat(proxy): add custom favicon support\n\nAdd ability to configure a custom favicon for the litellm proxy UI.\n\n- Add favicon_url field to UIThemeConfig model\n- Add LITELLM_FAVICON_URL env var support\n- Add /get_favicon endpoint to serve custom favicons\n- Update ThemeContext to dynamically set favicon\n- Add favicon URL input to UI theme settings page\n- Add comprehensive tests\n\nCloses #8323 (#21653)

* fix(bedrock): prevent double UUID in create_file S3 key (#21650)

In create_file for Bedrock, get_complete_file_url is called twice:
once in the sync handler (generating UUID-1 for api_base) and once
inside transform_create_file_request (generating UUID-2 for the
actual S3 upload). The Bedrock provider correctly writes UUID-2 into
litellm_params["upload_url"], but the sync handler unconditionally
overwrites it with api_base (UUID-1). This causes the returned
file_id to point to a non-existent S3 key.

Fix: only set upload_url to api_base when transform_create_file_request
has not already set it, preserving the Bedrock provider's value.

Closes #21546

* feat(semantic-cache): support configurable vector dimensions for Qdrant (#21649)

Add vector_size parameter to QdrantSemanticCache and expose it through
the Cache facade as qdrant_semantic_cache_vector_size. This allows users
to use embedding models with dimensions other than the default 1536,
enabling cheaper/stronger models like Stella (1024d), bge-en-icl (4096d),
voyage, cohere, etc.

The parameter defaults to QDRANT_VECTOR_SIZE (env var or 1536) for
backward compatibility. When creating new collections, the configured
vector_size is used instead of the hardcoded constant.

Closes #9377

* fix(utils): normalize camelCase thinking param keys to snake_case (#21762)

Clients like OpenCode's @ai-sdk/openai-compatible send budgetTokens
(camelCase) instead of budget_tokens in the thinking parameter, causing
validation errors. Add early normalization in completion().

* feat: add optional digest mode for Slack alert types (#21683)

Adds per-alert-type digest mode that aggregates duplicate alerts
within a configurable time window and emits a single summary message
with count, start/end timestamps.

Configuration via general_settings.alert_type_config:
  alert_type_config:
    llm_requests_hanging:
      digest: true
      digest_interval: 86400

Digest key: (alert_type, request_model, api_base)
Default interval: 24 hours
Window type: fixed interval

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add blog_posts.json and local backup

* feat: add GetBlogPosts utility with GitHub fetch and local fallback

Adds GetBlogPosts class that fetches blog posts from GitHub with a 1-hour
in-process TTL cache, validates the response, and falls back to the bundled
blog_posts_backup.json on any network or validation failure.

* test: add cache reset fixture and LITELLM_LOCAL_BLOG_POSTS test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add GET /public/litellm_blog_posts endpoint

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: log fallback warning in blog posts endpoint and tighten test

* feat: add disable_show_blog to UISettings

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add useUISettings and useDisableShowBlog hooks

* fix: rename useUISettings to useUISettingsFlags to avoid naming collision

* fix: use existing useUISettings hook in useDisableShowBlog to avoid cache duplication

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add BlogDropdown component with react-query and error/retry state

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: enforce 5-post limit in BlogDropdown and add cap test

* fix: add retry, stable post key, enabled guard in BlogDropdown

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add BlogDropdown to navbar after Docs link

* feat: add network_mock transport for benchmarking proxy overhead without real API calls

Intercepts at httpx transport layer so the full proxy path (auth, routing,
OpenAI SDK, response transformation) is exercised with zero-latency responses.
Activated via `litellm_settings: { network_mock: true }` in proxy config.

* Litellm dev 02 19 2026 p2 (#21871)

* feat(ui/): new guardrails monitor 'demo

mock representation of what guardrails monitor looks like

* fix: ui updates

* style(ui/): fix styling

* feat: enable running ai monitor on individual guardrails

* feat: add backend logic for guardrail monitoring

* fix(guardrails/usage_endpoints.py): fix usage dashboard

* fix(budget): fix timezone config lookup and replace hardcoded timezone map with ZoneInfo (#21754)

* fix(budget): fix timezone config lookup and replace hardcoded timezone map with ZoneInfo

* fix(budget): update stale docstring on get_budget_reset_time

* fix: add missing return type annotations to iterator protocol methods in streaming_handler (#21750)

* fix: add return type annotations to iterator protocol methods in streaming_handler

Add missing return type annotations to __iter__, __aiter__, __next__, and __anext__ methods in CustomStreamWrapper and related classes.

- __iter__(self) -> Iterator["ModelResponseStream"]
- __aiter__(self) -> AsyncIterator["ModelResponseStream"]
- __next__(self) -> "ModelResponseStream"
- __anext__(self) -> "ModelResponseStream"

Also adds AsyncIterator and Iterator to typing imports.

Fixes issue with PLR0915 noqa comments and ensures proper type checking support.
Related to: #8304

* fix: add ruff PLR0915 noqa for files with too many statements

* Add gollem Go agent framework cookbook example (#21747)

Show how to use gollem, a production Go agent framework, with
LiteLLM proxy for multi-provider LLM access including tool use
and streaming.

* fix: avoid mutating caller-owned dicts in SpendUpdateQueue aggregation (#21742)

* fix(vertex_ai): enable context-1m-2025-08-07 beta header (#21870)

* server root path regression doc

* fixing syntax

* fix: replace Zapier webhook with Google Form for survey submission (#21621)

* Replace Zapier webhook with Google Form for survey submission

* Add back error logging for survey submission debugging

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>

* Revert "Merge pull request #21140 from BerriAI/litellm_perf_user_api_key_auth"

This reverts commit 0e1db3f, reversing
changes made to 7e2d6f2.

* test_vertex_ai_gemini_2_5_pro_streaming

* UI new build

* fix rendering

* ui new build

* docs fix

* docs fix

* docs fix

* docs fix

* docs fix

* docs fix

* docs fix

* docs fix

* release note docs

* docs

* adding image

* fix(vertex_ai): enable context-1m-2025-08-07 beta header

The `context-1m-2025-08-07` Anthropic beta header was set to `null` for vertex_ai,
causing it to be filtered out when users set `extra_headers: {anthropic-beta: context-1m-2025-08-07}`.

This prevented using Claude's 1M context window feature via Vertex AI, resulting in
`prompt is too long: 460500 tokens > 200000 maximum` errors.

Fixes #21861

---------

Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>

* Revert "fix(vertex_ai): enable context-1m-2025-08-07 beta header (#21870)" (#21876)

This reverts commit bce078a.

* docs(ui): add pre-PR checklist to UI contributing guide

Add testing and build verification steps per maintainer feedback
from @yjiang-litellm. Contributors should run their related tests
per-file and ensure npm run build passes before opening PRs.

* Fix entries with fast and us/

* Add tests for fast and us

* Add support for Priority PayGo for vertex ai and gemini

* Add model pricing

* fix: ensure arrival_time is set before calculating queue time

* Fix: Anthropic model wildcard access issue

* Add incident report

* Add ability to see which model cost map is getting used

* Fix name of title

* Readd tpm limit

* State management fixes for CheckBatchCost

* Fix PR review comments

* State management fixes for CheckBatchCost - Address greptile comments

* fix mypy issues:

* Add Noma guardrails v2 based on custom guardrails (#21400)

* Fix code qa issues

* Fix mypy issues

* Fix mypy issues

* Fix test_aaamodel_prices_and_context_window_json_is_valid

* fix: update calendly on repo

* fix(tests): use counter-based mock for time.time in prisma self-heal test

The test used a fixed side_effect list for time.time(), but the number
of calls varies by Python version, causing StopIteration on 3.12 and
AssertionError on 3.14. Replace with an infinite counter-based callable
and assert the timestamp was updated rather than checking for an exact
value.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(tests): use absolute path for model_prices JSON in validation test

The test used a relative path 'litellm/model_prices_and_context_window.json'
which only works when pytest runs from a specific working directory.
Use os.path based on __file__ to resolve the path reliably.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Update tests/test_litellm/test_utils.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix(tests): use os.path instead of Path to avoid NameError

Path is not imported at module level. Use os.path.join which is already
available.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* clean up mock transport: remove streaming, add defensive parsing

* docs: add Google GenAI SDK tutorial (JS & Python) (#21885)

* docs: add Google GenAI SDK tutorial for JS and Python

Add tutorial for using Google's official GenAI SDK (@google/genai for JS,
google-genai for Python) with LiteLLM proxy. Covers pass-through and
native router endpoints, streaming, multi-turn chat, and multi-provider
routing via model_group_alias. Also updates pass-through docs to use the
new SDK replacing the deprecated @google/generative-ai.

* fix(docs): correct Python SDK env var name in GenAI tutorial

GOOGLE_GENAI_API_KEY does not exist in the google-genai SDK.
The correct env var is GEMINI_API_KEY (or GOOGLE_API_KEY).
Also note that the Python SDK has no base URL env var.

* fix(docs): replace non-existent GOOGLE_GENAI_BASE_URL env var in interactions.md

The Python google-genai SDK does not read GOOGLE_GENAI_BASE_URL.
Use http_options={"base_url": "..."} in code instead.

* docs: add network mock benchmarking section

* docs: tweak benchmarks wording

* fix: add auth headers and empty latencies guard to benchmark script

* refactor: use method-level import for MockOpenAITransport

* fix: guard print_aggregate against empty latencies

* fix: add INCOMPLETE status to Interactions API enum and test

Google added INCOMPLETE to the Interactions API OpenAPI spec status enum.
Update both the Status3 enum in the SDK types and the test's expected
values to match.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Guardrail Monitor - measure guardrail reliability in prod  (#21944)

* fix: fix log viewer for guardrail monitoring

* feat(ui/): fix rendering logs per guardrail

* fix: fix viewing logs on overview tab of guardrail

* fix: log viewer

* fix: fix naming to align with metric

* docs: add performance & reliability section to v1.81.14 release notes

* fix(tests): make RPM limit test sequential to avoid race condition

Concurrent requests via run_in_executor + asyncio.gather caused a race
condition where more requests slipped through the rate limiter than
expected, leading to flaky test failures (e.g. 3 successes instead of 2
with rpm_limit=2).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: Singapore guardrail policies (PDPA + MAS AI Risk Management) (#21948)

* feat: Singapore PDPA PII protection guardrail policy template

Add Singapore Personal Data Protection Act (PDPA) guardrail support:

Regex patterns (patterns.json):
- sg_nric: NRIC/FIN detection ([STFGM] + 7 digits + checksum letter)
- sg_phone: Singapore phone numbers (+65/0065/65 prefix)
- sg_postal_code: 6-digit postal codes (contextual)
- passport_singapore: Passport numbers (E/K + 7 digits, contextual)
- sg_uen: Unique Entity Numbers (3 formats)
- sg_bank_account: Bank account numbers (dash format, contextual)

YAML policy templates (5 sub-guardrails):
- sg_pdpa_personal_identifiers: s.13 Consent
- sg_pdpa_sensitive_data: Advisory Guidelines
- sg_pdpa_do_not_call: Part IX DNC Registry
- sg_pdpa_data_transfer: s.26 overseas transfers
- sg_pdpa_profiling_automated_decisions: Model AI Governance Framework

Policy template entry in policy_templates.json with 9 guardrail definitions
(4 regex-based + 5 YAML conditional keyword matching).

Tests:
- test_sg_patterns.py: regex pattern unit tests
- test_sg_pdpa_guardrails.py: conditional keyword matching tests (100+ cases)

* feat: MAS AI Risk Management Guidelines guardrail policy template

Add Monetary Authority of Singapore (MAS) AI Risk Management Guidelines
guardrail support for financial institutions:

YAML policy templates (5 sub-guardrails):
- sg_mas_fairness_bias: Blocks discriminatory financial AI (credit/loans/insurance by protected attributes)
- sg_mas_transparency_explainability: Blocks opaque/unexplainable AI for consequential financial decisions
- sg_mas_human_oversight: Blocks fully automated financial decisions without human-in-the-loop
- sg_mas_data_governance: Blocks unauthorized sharing/mishandling of financial customer data
- sg_mas_model_security: Blocks adversarial attacks, model poisoning, inversion on financial AI

Policy template entry in policy_templates.json with 5 guardrail definitions.
Aligned with MAS FEAT Principles, Project MindForge, and NIST AI RMF.

Tests:
- test_sg_mas_ai_guardrails.py: conditional keyword matching tests (100+ cases)

* fix: address SG pattern review feedback

- Update NRIC lowercase test for IGNORECASE runtime behavior
- Add keyword context guard to sg_uen pattern to reduce false positives

* docs: clarify MAS AIRM timeline references

- Explicitly mark MAS AIRM as Nov 2025 consultation draft
- Add 2018 qualifier for FEAT principles in MAS policy descriptions
- Update MAS guardrail wording to avoid release-year ambiguity

* chore: commit resolved MAS policy conflicts

* test:

* chore:

* Add OpenAI Agents SDK tutorial with LiteLLM Proxy to docs  (#21221)

* Add OpenAI Agents SDK tutorial to docs

* Update OpenAI Agents SDK tutorial to use LiteLLM environment variables

* Enhance OpenAI Agents SDK tutorial with built-in LiteLLM extension details and updated configuration steps. Adjust section headings for clarity and improve the flow of information regarding model setup and usage.

* adjust blog posts to fetch from github first

* feat(videos): add variant parameter to video content download (#21955)

openai videos models support the features to download variants.
See more details here: https://developers.openai.com/api/docs/guides/video-generation#use-image-references.
Plumb variant (e.g. "thumbnail", "spritesheet") through the full
video content download chain: avideo_content → video_content →
video_content_handler → transform_video_content_request. OpenAI
appends ?variant=<value> to the GET URL; other providers accept
the parameter in their signature but ignore it.

* fixing path

* adjust blog post path

* Revert duplicate issue checker to text-based matching, remove duplicate PR workflow

Remove the Claude Code-powered duplicate PR detection workflow and revert
the duplicate issue checker back to wow-actions/potential-duplicates with
text similarity matching.

* ui changes

* adding tests

* adjust default aggregation threshold

* fix(videos): pass api_key from litellm_params to video remix handlers (#21965)

video_remix_handler and async_video_remix_handler were not falling back
to litellm_params.api_key when the api_key parameter was None, causing
Authorization: Bearer None to be sent to the provider. This matches the
pattern already used by async_video_generation_handler.

* adding testing coverage + fixing flaky tests

* fix(ollama): thread api_base through get_model_info and add graceful fallback

When users pass api_base to litellm.completion() for Ollama, the model
info fetch (context window, function_calling support) was ignoring the
user's api_base and only reading OLLAMA_API_BASE env var or defaulting
to localhost:11434. This caused confusing errors in logs when Ollama
runs on a remote server.

Thread api_base from litellm_params through the get_model_info call
chain so OllamaConfig.get_model_info() uses the correct server. Also
return safe defaults instead of raising when the server is unreachable.

Fixes #21967

---------

Co-authored-by: An Tang <ta@stripe.com>
Co-authored-by: janfrederickk <75388864+janfrederickk@users.noreply.github.com>
Co-authored-by: Zhenting Huang <3061613175@qq.com>
Co-authored-by: Darien Kindlund <darien@kindlund.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
Co-authored-by: Ryan Crabbe <rcrabbe@berkeley.edu>
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: LeeJuOh <56071126+LeeJuOh@users.noreply.github.com>
Co-authored-by: Monesh Ram <31161039+WhoisMonesh@users.noreply.github.com>
Co-authored-by: Trevor Prater <trevor.prater@gmail.com>
Co-authored-by: The Mavik <179817126+themavik@users.noreply.github.com>
Co-authored-by: Edwin Isac <33712823+edwiniac@users.noreply.github.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: Harshit Jain <harshitjain0562@gmail.com>
Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com>
Co-authored-by: Ephrim Stanley <ephrim.stanley@point72.com>
Co-authored-by: TomAlon <tom@noma.security>
Co-authored-by: Julio Quinteros Pro <jquinter@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com>
Co-authored-by: Ron Zhong <ron-zhong@hotmail.com>
Co-authored-by: Arindam Majumder <109217591+Arindam200@users.noreply.github.com>
Co-authored-by: Lei Nie <lenie@quora.com>
Sameerlite added a commit that referenced this pull request Mar 3, 2026
…voke (#21964)

* auth_with_role_name add region_name arg for cross-account sts

* update tests to include case with aws_region_name for _auth_with_aws_role

* Only pass region_name to STS client when aws_region_name is set

* Add optional aws_sts_endpoint to _auth_with_aws_role

* Parametrize ambient-credentials test for no opts, region_name, and aws_sts_endpoint

* consistently passing region and endpoint args into explicit credentials irsa

* fix env var leakage

* fix: bedrock openai-compatible imported-model should also have model arn encoded

* feat: show proxy url in ModelHub (#21660)

* fix(bedrock): correct modelInput format for Converse API batch models (#21656)

* fix(proxy): add model_ids param to access group endpoints for precise deployment tagging (#21655)

POST /access_group/new and PUT /access_group/{name}/update now accept an
optional model_ids list that targets specific deployments by their unique
model_id, instead of tagging every deployment that shares a model_name.

When model_ids is provided it takes priority over model_names, giving
API callers the same single-deployment precision that the UI already has
via PATCH /model/{model_id}/update.

Backward compatible: model_names continues to work as before.

Closes #21544

* feat(proxy): add custom favicon support\n\nAdd ability to configure a custom favicon for the litellm proxy UI.\n\n- Add favicon_url field to UIThemeConfig model\n- Add LITELLM_FAVICON_URL env var support\n- Add /get_favicon endpoint to serve custom favicons\n- Update ThemeContext to dynamically set favicon\n- Add favicon URL input to UI theme settings page\n- Add comprehensive tests\n\nCloses #8323 (#21653)

* fix(bedrock): prevent double UUID in create_file S3 key (#21650)

In create_file for Bedrock, get_complete_file_url is called twice:
once in the sync handler (generating UUID-1 for api_base) and once
inside transform_create_file_request (generating UUID-2 for the
actual S3 upload). The Bedrock provider correctly writes UUID-2 into
litellm_params["upload_url"], but the sync handler unconditionally
overwrites it with api_base (UUID-1). This causes the returned
file_id to point to a non-existent S3 key.

Fix: only set upload_url to api_base when transform_create_file_request
has not already set it, preserving the Bedrock provider's value.

Closes #21546

* feat(semantic-cache): support configurable vector dimensions for Qdrant (#21649)

Add vector_size parameter to QdrantSemanticCache and expose it through
the Cache facade as qdrant_semantic_cache_vector_size. This allows users
to use embedding models with dimensions other than the default 1536,
enabling cheaper/stronger models like Stella (1024d), bge-en-icl (4096d),
voyage, cohere, etc.

The parameter defaults to QDRANT_VECTOR_SIZE (env var or 1536) for
backward compatibility. When creating new collections, the configured
vector_size is used instead of the hardcoded constant.

Closes #9377

* fix(utils): normalize camelCase thinking param keys to snake_case (#21762)

Clients like OpenCode's @ai-sdk/openai-compatible send budgetTokens
(camelCase) instead of budget_tokens in the thinking parameter, causing
validation errors. Add early normalization in completion().

* feat: add optional digest mode for Slack alert types (#21683)

Adds per-alert-type digest mode that aggregates duplicate alerts
within a configurable time window and emits a single summary message
with count, start/end timestamps.

Configuration via general_settings.alert_type_config:
  alert_type_config:
    llm_requests_hanging:
      digest: true
      digest_interval: 86400

Digest key: (alert_type, request_model, api_base)
Default interval: 24 hours
Window type: fixed interval

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add blog_posts.json and local backup

* feat: add GetBlogPosts utility with GitHub fetch and local fallback

Adds GetBlogPosts class that fetches blog posts from GitHub with a 1-hour
in-process TTL cache, validates the response, and falls back to the bundled
blog_posts_backup.json on any network or validation failure.

* test: add cache reset fixture and LITELLM_LOCAL_BLOG_POSTS test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add GET /public/litellm_blog_posts endpoint

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: log fallback warning in blog posts endpoint and tighten test

* feat: add disable_show_blog to UISettings

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add useUISettings and useDisableShowBlog hooks

* fix: rename useUISettings to useUISettingsFlags to avoid naming collision

* fix: use existing useUISettings hook in useDisableShowBlog to avoid cache duplication

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add BlogDropdown component with react-query and error/retry state

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: enforce 5-post limit in BlogDropdown and add cap test

* fix: add retry, stable post key, enabled guard in BlogDropdown

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add BlogDropdown to navbar after Docs link

* feat: add network_mock transport for benchmarking proxy overhead without real API calls

Intercepts at httpx transport layer so the full proxy path (auth, routing,
OpenAI SDK, response transformation) is exercised with zero-latency responses.
Activated via `litellm_settings: { network_mock: true }` in proxy config.

* Litellm dev 02 19 2026 p2 (#21871)

* feat(ui/): new guardrails monitor 'demo

mock representation of what guardrails monitor looks like

* fix: ui updates

* style(ui/): fix styling

* feat: enable running ai monitor on individual guardrails

* feat: add backend logic for guardrail monitoring

* fix(guardrails/usage_endpoints.py): fix usage dashboard

* fix(budget): fix timezone config lookup and replace hardcoded timezone map with ZoneInfo (#21754)

* fix(budget): fix timezone config lookup and replace hardcoded timezone map with ZoneInfo

* fix(budget): update stale docstring on get_budget_reset_time

* fix: add missing return type annotations to iterator protocol methods in streaming_handler (#21750)

* fix: add return type annotations to iterator protocol methods in streaming_handler

Add missing return type annotations to __iter__, __aiter__, __next__, and __anext__ methods in CustomStreamWrapper and related classes.

- __iter__(self) -> Iterator["ModelResponseStream"]
- __aiter__(self) -> AsyncIterator["ModelResponseStream"]
- __next__(self) -> "ModelResponseStream"
- __anext__(self) -> "ModelResponseStream"

Also adds AsyncIterator and Iterator to typing imports.

Fixes issue with PLR0915 noqa comments and ensures proper type checking support.
Related to: #8304

* fix: add ruff PLR0915 noqa for files with too many statements

* Add gollem Go agent framework cookbook example (#21747)

Show how to use gollem, a production Go agent framework, with
LiteLLM proxy for multi-provider LLM access including tool use
and streaming.

* fix: avoid mutating caller-owned dicts in SpendUpdateQueue aggregation (#21742)

* fix(vertex_ai): enable context-1m-2025-08-07 beta header (#21870)

* server root path regression doc

* fixing syntax

* fix: replace Zapier webhook with Google Form for survey submission (#21621)

* Replace Zapier webhook with Google Form for survey submission

* Add back error logging for survey submission debugging

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>

* Revert "Merge pull request #21140 from BerriAI/litellm_perf_user_api_key_auth"

This reverts commit 0e1db3f, reversing
changes made to 7e2d6f2.

* test_vertex_ai_gemini_2_5_pro_streaming

* UI new build

* fix rendering

* ui new build

* docs fix

* docs fix

* docs fix

* docs fix

* docs fix

* docs fix

* docs fix

* docs fix

* release note docs

* docs

* adding image

* fix(vertex_ai): enable context-1m-2025-08-07 beta header

The `context-1m-2025-08-07` Anthropic beta header was set to `null` for vertex_ai,
causing it to be filtered out when users set `extra_headers: {anthropic-beta: context-1m-2025-08-07}`.

This prevented using Claude's 1M context window feature via Vertex AI, resulting in
`prompt is too long: 460500 tokens > 200000 maximum` errors.

Fixes #21861

---------

Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>

* Revert "fix(vertex_ai): enable context-1m-2025-08-07 beta header (#21870)" (#21876)

This reverts commit bce078a.

* docs(ui): add pre-PR checklist to UI contributing guide

Add testing and build verification steps per maintainer feedback
from @yjiang-litellm. Contributors should run their related tests
per-file and ensure npm run build passes before opening PRs.

* Fix entries with fast and us/

* Add tests for fast and us

* Add support for Priority PayGo for vertex ai and gemini

* Add model pricing

* fix: ensure arrival_time is set before calculating queue time

* Fix: Anthropic model wildcard access issue

* Add incident report

* Add ability to see which model cost map is getting used

* Fix name of title

* Readd tpm limit

* State management fixes for CheckBatchCost

* Fix PR review comments

* State management fixes for CheckBatchCost - Address greptile comments

* fix mypy issues:

* Add Noma guardrails v2 based on custom guardrails (#21400)

* Fix code qa issues

* Fix mypy issues

* Fix mypy issues

* Fix test_aaamodel_prices_and_context_window_json_is_valid

* fix: update calendly on repo

* fix(tests): use counter-based mock for time.time in prisma self-heal test

The test used a fixed side_effect list for time.time(), but the number
of calls varies by Python version, causing StopIteration on 3.12 and
AssertionError on 3.14. Replace with an infinite counter-based callable
and assert the timestamp was updated rather than checking for an exact
value.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(tests): use absolute path for model_prices JSON in validation test

The test used a relative path 'litellm/model_prices_and_context_window.json'
which only works when pytest runs from a specific working directory.
Use os.path based on __file__ to resolve the path reliably.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Update tests/test_litellm/test_utils.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix(tests): use os.path instead of Path to avoid NameError

Path is not imported at module level. Use os.path.join which is already
available.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* clean up mock transport: remove streaming, add defensive parsing

* docs: add Google GenAI SDK tutorial (JS & Python) (#21885)

* docs: add Google GenAI SDK tutorial for JS and Python

Add tutorial for using Google's official GenAI SDK (@google/genai for JS,
google-genai for Python) with LiteLLM proxy. Covers pass-through and
native router endpoints, streaming, multi-turn chat, and multi-provider
routing via model_group_alias. Also updates pass-through docs to use the
new SDK replacing the deprecated @google/generative-ai.

* fix(docs): correct Python SDK env var name in GenAI tutorial

GOOGLE_GENAI_API_KEY does not exist in the google-genai SDK.
The correct env var is GEMINI_API_KEY (or GOOGLE_API_KEY).
Also note that the Python SDK has no base URL env var.

* fix(docs): replace non-existent GOOGLE_GENAI_BASE_URL env var in interactions.md

The Python google-genai SDK does not read GOOGLE_GENAI_BASE_URL.
Use http_options={"base_url": "..."} in code instead.

* docs: add network mock benchmarking section

* docs: tweak benchmarks wording

* fix: add auth headers and empty latencies guard to benchmark script

* refactor: use method-level import for MockOpenAITransport

* fix: guard print_aggregate against empty latencies

* fix: add INCOMPLETE status to Interactions API enum and test

Google added INCOMPLETE to the Interactions API OpenAPI spec status enum.
Update both the Status3 enum in the SDK types and the test's expected
values to match.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Guardrail Monitor - measure guardrail reliability in prod  (#21944)

* fix: fix log viewer for guardrail monitoring

* feat(ui/): fix rendering logs per guardrail

* fix: fix viewing logs on overview tab of guardrail

* fix: log viewer

* fix: fix naming to align with metric

* docs: add performance & reliability section to v1.81.14 release notes

* fix(tests): make RPM limit test sequential to avoid race condition

Concurrent requests via run_in_executor + asyncio.gather caused a race
condition where more requests slipped through the rate limiter than
expected, leading to flaky test failures (e.g. 3 successes instead of 2
with rpm_limit=2).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: Singapore guardrail policies (PDPA + MAS AI Risk Management) (#21948)

* feat: Singapore PDPA PII protection guardrail policy template

Add Singapore Personal Data Protection Act (PDPA) guardrail support:

Regex patterns (patterns.json):
- sg_nric: NRIC/FIN detection ([STFGM] + 7 digits + checksum letter)
- sg_phone: Singapore phone numbers (+65/0065/65 prefix)
- sg_postal_code: 6-digit postal codes (contextual)
- passport_singapore: Passport numbers (E/K + 7 digits, contextual)
- sg_uen: Unique Entity Numbers (3 formats)
- sg_bank_account: Bank account numbers (dash format, contextual)

YAML policy templates (5 sub-guardrails):
- sg_pdpa_personal_identifiers: s.13 Consent
- sg_pdpa_sensitive_data: Advisory Guidelines
- sg_pdpa_do_not_call: Part IX DNC Registry
- sg_pdpa_data_transfer: s.26 overseas transfers
- sg_pdpa_profiling_automated_decisions: Model AI Governance Framework

Policy template entry in policy_templates.json with 9 guardrail definitions
(4 regex-based + 5 YAML conditional keyword matching).

Tests:
- test_sg_patterns.py: regex pattern unit tests
- test_sg_pdpa_guardrails.py: conditional keyword matching tests (100+ cases)

* feat: MAS AI Risk Management Guidelines guardrail policy template

Add Monetary Authority of Singapore (MAS) AI Risk Management Guidelines
guardrail support for financial institutions:

YAML policy templates (5 sub-guardrails):
- sg_mas_fairness_bias: Blocks discriminatory financial AI (credit/loans/insurance by protected attributes)
- sg_mas_transparency_explainability: Blocks opaque/unexplainable AI for consequential financial decisions
- sg_mas_human_oversight: Blocks fully automated financial decisions without human-in-the-loop
- sg_mas_data_governance: Blocks unauthorized sharing/mishandling of financial customer data
- sg_mas_model_security: Blocks adversarial attacks, model poisoning, inversion on financial AI

Policy template entry in policy_templates.json with 5 guardrail definitions.
Aligned with MAS FEAT Principles, Project MindForge, and NIST AI RMF.

Tests:
- test_sg_mas_ai_guardrails.py: conditional keyword matching tests (100+ cases)

* fix: address SG pattern review feedback

- Update NRIC lowercase test for IGNORECASE runtime behavior
- Add keyword context guard to sg_uen pattern to reduce false positives

* docs: clarify MAS AIRM timeline references

- Explicitly mark MAS AIRM as Nov 2025 consultation draft
- Add 2018 qualifier for FEAT principles in MAS policy descriptions
- Update MAS guardrail wording to avoid release-year ambiguity

* chore: commit resolved MAS policy conflicts

* test:

* chore:

* Add OpenAI Agents SDK tutorial with LiteLLM Proxy to docs  (#21221)

* Add OpenAI Agents SDK tutorial to docs

* Update OpenAI Agents SDK tutorial to use LiteLLM environment variables

* Enhance OpenAI Agents SDK tutorial with built-in LiteLLM extension details and updated configuration steps. Adjust section headings for clarity and improve the flow of information regarding model setup and usage.

* adjust blog posts to fetch from github first

* feat(videos): add variant parameter to video content download (#21955)

openai videos models support the features to download variants.
See more details here: https://developers.openai.com/api/docs/guides/video-generation#use-image-references.
Plumb variant (e.g. "thumbnail", "spritesheet") through the full
video content download chain: avideo_content → video_content →
video_content_handler → transform_video_content_request. OpenAI
appends ?variant=<value> to the GET URL; other providers accept
the parameter in their signature but ignore it.

* fixing path

* adjust blog post path

* Revert duplicate issue checker to text-based matching, remove duplicate PR workflow

Remove the Claude Code-powered duplicate PR detection workflow and revert
the duplicate issue checker back to wow-actions/potential-duplicates with
text similarity matching.

* ui changes

* adding tests

* fix(anthropic): sanitize tool_use IDs in assistant messages

Apply _sanitize_anthropic_tool_use_id to tool_use blocks in
convert_to_anthropic_tool_invoke, not just tool_result blocks.
IDs from external frameworks (e.g. MiniMax) may contain characters
like colons that violate Anthropic's ^[a-zA-Z0-9_-]+$ pattern.

Adds test for invalid ID sanitization in tool_use blocks.

---------

Co-authored-by: An Tang <ta@stripe.com>
Co-authored-by: janfrederickk <75388864+janfrederickk@users.noreply.github.com>
Co-authored-by: Zhenting Huang <3061613175@qq.com>
Co-authored-by: Cesar Garcia <128240629+Chesars@users.noreply.github.com>
Co-authored-by: Darien Kindlund <darien@kindlund.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
Co-authored-by: Ryan Crabbe <rcrabbe@berkeley.edu>
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: LeeJuOh <56071126+LeeJuOh@users.noreply.github.com>
Co-authored-by: Monesh Ram <31161039+WhoisMonesh@users.noreply.github.com>
Co-authored-by: Trevor Prater <trevor.prater@gmail.com>
Co-authored-by: The Mavik <179817126+themavik@users.noreply.github.com>
Co-authored-by: Edwin Isac <33712823+edwiniac@users.noreply.github.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Chesars <cesarponce19544@gmail.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: Harshit Jain <harshitjain0562@gmail.com>
Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com>
Co-authored-by: Ephrim Stanley <ephrim.stanley@point72.com>
Co-authored-by: TomAlon <tom@noma.security>
Co-authored-by: Julio Quinteros Pro <jquinter@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com>
Co-authored-by: Ron Zhong <ron-zhong@hotmail.com>
Co-authored-by: Arindam Majumder <109217591+Arindam200@users.noreply.github.com>
Co-authored-by: Lei Nie <lenie@quora.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

4 participants