feat: Singapore guardrail policies (PDPA + MAS AI Risk Management) by ron-zhong · Pull Request #21948 · BerriAI/litellm

ron-zhong · 2026-02-23T18:27:41Z

Summary

Add Singapore PDPA-based guardrail policy template with regex + conditional keyword guardrails
Add MAS AI Risk Management Guidelines-based guardrail policy template for financial institutions
Add Singapore-specific regex patterns (NRIC/FIN, phone, postal code, passport, UEN, bank account)
Add policy template entries to policy registry and backup registry
Add focused tests for Singapore PDPA and MAS guardrails

What changed

PDPA (Personal Data Protection Act)

Added 6 regex patterns in the content filter patterns registry:
- sg_nric — Singapore NRIC/FIN format detection ([STFGM] + 7 digits + checksum letter)
- sg_phone — Singapore phone numbers (supports +65, 0065, 65 prefixes)
- sg_postal_code — Singapore 6-digit postal code detection (contextual)
- passport_singapore — Singapore passport numbers (E/K + 7 digits, contextual)
- sg_uen — Singapore Unique Entity Number detection (all 3 common formats)
- sg_bank_account — Singapore-style bank account numbers (dash format, contextual)
Added 5 YAML policy templates:
- sg_pdpa_personal_identifiers
- sg_pdpa_sensitive_data
- sg_pdpa_do_not_call
- sg_pdpa_data_transfer
- sg_pdpa_profiling_automated_decisions
Added pdpa-singapore policy template entry to policy_templates.json and litellm/policy_templates_backup.json
Added tests:
- tests/test_litellm/proxy/guardrails/guardrail_hooks/content_filter/test_sg_patterns.py
- tests/guardrails_tests/test_sg_pdpa_guardrails.py
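
The pattern descriptions above can be sketched as plain Python regexes. This is a minimal illustration, not the contents of patterns.json: the names `SG_NRIC`, `SG_PHONE`, and `find_sg_identifiers` are hypothetical, the phone pattern assumes 8-digit numbers starting with 3, 6, 8, or 9, and the NRIC checksum letter is matched by shape only, not validated.

```python
import re

# Hypothetical patterns reconstructed from the descriptions in this PR;
# the actual entries in patterns.json may differ.
# NRIC/FIN: prefix letter S/T/F/G/M, 7 digits, one trailing letter (shape only).
SG_NRIC = re.compile(r"\b[STFGM]\d{7}[A-Z]\b", re.IGNORECASE)

# Phone: +65 / 0065 / 65 prefix, then 8 digits (assumed to start with 3, 6, 8, or 9).
SG_PHONE = re.compile(
    r"(?<!\d)(?:\+65|0065|65)[\s-]?[3689]\d{3}[\s-]?\d{4}(?!\d)"
)


def find_sg_identifiers(text: str) -> dict[str, list[str]]:
    """Return NRIC/FIN-shaped and phone-shaped substrings found in text."""
    return {
        "sg_nric": SG_NRIC.findall(text),
        "sg_phone": SG_PHONE.findall(text),
    }


if __name__ == "__main__":
    print(find_sg_identifiers("NRIC S1234567A, call +65 9123 4567"))
```

The "contextual" patterns in the list (postal code, passport, bank account) would additionally need a nearby keyword before a match counts, which this sketch does not cover.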

MAS (Monetary Authority of Singapore) AI Risk Management Guidelines

Added 5 YAML policy templates:
- sg_mas_fairness_bias
- sg_mas_transparency_explainability
- sg_mas_human_oversight
- sg_mas_data_governance
- sg_mas_model_security
Added mas-ai-risk-management policy template entry to policy_templates.json and litellm/policy_templates_backup.json
Added tests:
- tests/guardrails_tests/test_sg_mas_ai_guardrails.py

Notes

Followed existing policy template and guardrail test patterns from prior regulatory templates.
Kept changes data-driven (YAML + JSON + tests), with no core runtime logic changes.

Add Singapore Personal Data Protection Act (PDPA) guardrail support:

Regex patterns (patterns.json):
- sg_nric: NRIC/FIN detection ([STFGM] + 7 digits + checksum letter)
- sg_phone: Singapore phone numbers (+65/0065/65 prefix)
- sg_postal_code: 6-digit postal codes (contextual)
- passport_singapore: Passport numbers (E/K + 7 digits, contextual)
- sg_uen: Unique Entity Numbers (3 formats)
- sg_bank_account: Bank account numbers (dash format, contextual)

YAML policy templates (5 sub-guardrails):
- sg_pdpa_personal_identifiers: s.13 Consent
- sg_pdpa_sensitive_data: Advisory Guidelines
- sg_pdpa_do_not_call: Part IX DNC Registry
- sg_pdpa_data_transfer: s.26 overseas transfers
- sg_pdpa_profiling_automated_decisions: Model AI Governance Framework

Policy template entry in policy_templates.json with 9 guardrail definitions (4 regex-based + 5 YAML conditional keyword matching).

Tests:
- test_sg_patterns.py: regex pattern unit tests
- test_sg_pdpa_guardrails.py: conditional keyword matching tests (100+ cases)

Add Monetary Authority of Singapore (MAS) AI Risk Management Guidelines guardrail support for financial institutions:

YAML policy templates (5 sub-guardrails):
- sg_mas_fairness_bias: Blocks discriminatory financial AI (credit/loans/insurance by protected attributes)
- sg_mas_transparency_explainability: Blocks opaque/unexplainable AI for consequential financial decisions
- sg_mas_human_oversight: Blocks fully automated financial decisions without human-in-the-loop
- sg_mas_data_governance: Blocks unauthorized sharing/mishandling of financial customer data
- sg_mas_model_security: Blocks adversarial attacks, model poisoning, inversion on financial AI

Policy template entry in policy_templates.json with 5 guardrail definitions. Aligned with MAS FEAT Principles, Project MindForge, and NIST AI RMF.

Tests:
- test_sg_mas_ai_guardrails.py: conditional keyword matching tests (100+ cases)

CLAassistant · 2026-02-23T18:27:48Z

All committers have signed the CLA.

greptile-apps · 2026-02-23T18:36:23Z

Greptile Summary

Adds two Singapore-specific guardrail policy templates — PDPA (Personal Data Protection Act) and MAS AI Risk Management (for financial institutions) — along with 6 new regex patterns for Singapore PII detection (NRIC/FIN, phone, postal code, passport, UEN, bank account).

PDPA template includes 9 guardrails: 4 regex-based PII masking guardrails and 5 keyword-based conditional matching guardrails covering consent (s.13), sensitive data (Advisory Guidelines), DNC Registry (Part IX), overseas data transfers (s.26), and automated profiling (Model AI Governance Framework).
MAS AI Risk Management template includes 5 keyword-based conditional matching guardrails covering fairness/bias, transparency/explainability, human oversight, data governance, and model security.
All changes are data-driven (YAML + JSON) with no core runtime logic modifications.
Tests are comprehensive (156 regex pattern tests + ~130 conditional keyword matching test cases across PDPA and MAS), follow existing patterns from prior regulatory templates (EU AI Act), and make no network calls.
Minor inconsistency: policy_templates.json and litellm/policy_templates_backup.json have divergent MAS guardrail descriptions (em dash — vs hyphen -) and different templateData.description wording for the MAS template. This divergence already exists for other templates in the backup file, so it may be by design.

Confidence Score: 4/5

This PR is safe to merge — it adds only data-driven configuration (YAML, JSON) and tests with no runtime logic changes.
Score of 4 reflects: all changes are data-driven (no runtime code modifications), comprehensive test coverage for both regex patterns and keyword matching, follows established patterns from prior regulatory templates. Minor deduction for the inconsistency between policy_templates.json and the backup file, but this is a pre-existing pattern in the codebase.
The litellm/policy_templates_backup.json has description inconsistencies with policy_templates.json for the MAS guardrail entries (em dash vs hyphen, different templateData.description wording).

Important Files Changed

Filename	Overview
litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/patterns.json	Adds 6 Singapore PII regex patterns (NRIC, phone, postal code, passport, UEN, bank account). Patterns are well-designed with keyword_pattern context filters where appropriate. The sg_uen pattern now includes keyword_pattern (addressing prior review feedback). Minor: file still lacks trailing newline (pre-existing).
policy_templates.json	Adds two new policy template entries: `pdpa-singapore` (9 guardrails covering PII, sensitive data, DNC, data transfer, profiling) and `mas-ai-risk-management` (5 guardrails covering fairness, transparency, oversight, data governance, model security). Descriptions are consistent within the file. Minor inconsistency with backup file (em dash vs hyphen and templateData.description wording).
litellm/policy_templates_backup.json	Backup copy of policy templates with same two new entries. Contains description inconsistencies with the primary file: uses hyphens where primary uses em dashes, and the MAS templateData.description wording differs. This file is a backup and already diverges from primary in other ways (has example_sentences, extra templates).
litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/policy_templates/sg_pdpa_personal_identifiers.yaml	PDPA s.13 policy template for blocking unauthorized collection of Singapore personal identifiers. Well-structured with identifier_words, additional_block_words, always_block_keywords, and exceptions. Follows existing patterns.
litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/policy_templates/sg_mas_fairness_bias.yaml	MAS fairness and bias policy template. Blocks discriminatory AI practices in financial services based on protected attributes, with Singapore-specific ethnic groups (CMIO). Well-structured exceptions for fairness audits and research.
litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/policy_templates/sg_mas_model_security.yaml	MAS model security policy template. Blocks adversarial attacks, data poisoning, model inversion, and exfiltration targeting financial AI systems. Appropriate exceptions for red-teaming and security research.
tests/test_litellm/proxy/guardrails/guardrail_hooks/content_filter/test_sg_patterns.py	Unit tests for the 6 Singapore PII regex patterns. Tests valid/invalid format detection for NRIC, phone, postal code, passport, UEN, and bank account. Previous review feedback about case-insensitive test has been addressed (line 36 now expects match). No network calls — pure regex testing.
tests/guardrails_tests/test_sg_pdpa_guardrails.py	Integration tests for 5 PDPA sub-guardrails with parametrized test cases covering always-block, conditional matching, exceptions, and no-match scenarios. No network calls. Follows existing patterns from EU AI Act tests.
tests/guardrails_tests/test_sg_mas_ai_guardrails.py	Integration tests for 5 MAS sub-guardrails with parametrized test cases covering always-block, conditional matching, exceptions, and no-match scenarios. No network calls. Follows existing patterns from other guardrail tests.
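
The YAML fields named in the overview above (identifier_words, additional_block_words, always_block_keywords, exceptions) suggest a conditional keyword match. The following is a minimal sketch of how such a check could work, not LiteLLM's implementation: the `KeywordPolicy` class, the `should_block` function, the sentence-level scoping, and the rule that exceptions take precedence are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class KeywordPolicy:
    """Hypothetical in-memory form of one YAML policy template."""
    identifier_words: list[str]
    additional_block_words: list[str]
    always_block_keywords: list[str] = field(default_factory=list)
    exceptions: list[str] = field(default_factory=list)


def should_block(text: str, policy: KeywordPolicy) -> bool:
    """Return True if text trips the policy under the assumed rules."""
    lowered = text.lower()

    # Assumption: an exception phrase anywhere in the text allows the request.
    if any(phrase in lowered for phrase in policy.exceptions):
        return False

    # Always-block keywords need no further context.
    if any(keyword in lowered for keyword in policy.always_block_keywords):
        return True

    # Conditional match: an identifier word and a block word in one sentence.
    for sentence in re.split(r"[.!?]+", lowered):
        has_identifier = any(w in sentence for w in policy.identifier_words)
        has_block_word = any(w in sentence for w in policy.additional_block_words)
        if has_identifier and has_block_word:
            return True
    return False
```

A policy built with identifier word "nric" and block word "scrape" would then block "Scrape every NRIC from this forum" but allow "What is an NRIC?", which mirrors the always-block, conditional, exception, and no-match scenarios the tests are described as covering.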

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph PDPA["PDPA Singapore Policy Template"]
        PII["pdpa-sg-pii-identifiers\n(regex: NRIC, passport)"]
        CONTACT["pdpa-sg-contact-information\n(regex: phone, postal, email)"]
        FIN["pdpa-sg-financial-data\n(regex: bank account, credit card)"]
        BIZ["pdpa-sg-business-identifiers\n(regex: UEN)"]
        PI["pdpa-sg-personal-identifiers\n(keyword: s.13 Consent)"]
        SD["pdpa-sg-sensitive-data\n(keyword: Advisory Guidelines)"]
        DNC["pdpa-sg-do-not-call\n(keyword: Part IX DNC)"]
        DT["pdpa-sg-data-transfer\n(keyword: s.26 Transfer)"]
        PROF["pdpa-sg-profiling-automated-decisions\n(keyword: AI Governance)"]
    end

    subgraph MAS["MAS AI Risk Management Policy Template"]
        FB["mas-sg-fairness-bias\n(keyword)"]
        TE["mas-sg-transparency-explainability\n(keyword)"]
        HO["mas-sg-human-oversight\n(keyword)"]
        DG["mas-sg-data-governance\n(keyword)"]
        MS["mas-sg-model-security\n(keyword)"]
    end

    subgraph Patterns["patterns.json — New Regex Patterns"]
        NRIC["sg_nric"]
        PHONE["sg_phone"]
        POSTAL["sg_postal_code"]
        PASS["passport_singapore"]
        UEN["sg_uen"]
        BANK["sg_bank_account"]
    end

    PII --> NRIC
    PII --> PASS
    CONTACT --> PHONE
    CONTACT --> POSTAL
    FIN --> BANK
    BIZ --> UEN

    INPUT["User Input"] --> PDPA
    INPUT --> MAS
    PDPA -->|MASK/BLOCK| OUTPUT["Filtered Output"]
    MAS -->|BLOCK| OUTPUT

Last reviewed commit: fe00f46

greptile-apps

16 files reviewed, 2 comments


tests/test_litellm/proxy/guardrails/guardrail_hooks/content_filter/test_sg_patterns.py

litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/patterns.json

- Update NRIC lowercase test for IGNORECASE runtime behavior
- Add keyword context guard to sg_uen pattern to reduce false positives

ron-zhong · 2026-02-23T18:55:18Z

Addressed both points from the review:

Failing lowercase NRIC test

Updated tests/test_litellm/proxy/guardrails/guardrail_hooks/content_filter/test_sg_patterns.py to align with re.IGNORECASE behavior in get_compiled_pattern():
- test_lowercase_letter_prefix_rejected -> test_lowercase_letter_prefix_detected_case_insensitive
- assertion updated to expect a match for s1234567A
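
The renamed test can be illustrated with a small self-contained sketch. It assumes an NRIC pattern of the shape described in this PR and uses a hypothetical stand-in, `compile_case_insensitive`, for the runtime compile step; the real get_compiled_pattern() is not reproduced here.

```python
import re

# Assumed NRIC/FIN shape; the actual regex in patterns.json may differ.
NRIC_PATTERN = r"\b[STFGM]\d{7}[A-Z]\b"


def compile_case_insensitive(pattern: str) -> re.Pattern:
    """Hypothetical stand-in for a compile step that applies re.IGNORECASE."""
    return re.compile(pattern, re.IGNORECASE)


def test_lowercase_letter_prefix_detected_case_insensitive() -> None:
    compiled = compile_case_insensitive(NRIC_PATTERN)
    # With IGNORECASE, the lowercase prefix still matches.
    assert compiled.search("s1234567A") is not None
    # Without the flag, the same input would be rejected.
    assert re.compile(NRIC_PATTERN).search("s1234567A") is None
```

The second assertion shows why the original "rejected" expectation only held for a pattern compiled without the flag.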

sg_uen false-positive risk (missing context)

Updated litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/patterns.json for sg_uen with keyword gating:
- "keyword_pattern": "\\b(?:UEN|company\\s*registration|business\\s*registration|ACRA|entity\\s*number|ROC|ROS)\\b"
- "allow_word_numbers": false

Also added a case-insensitive UEN test (12345678a) to reflect runtime matching behavior.
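
A rough sketch of what keyword gating means in practice, under stated assumptions: the keyword regex is the one quoted above with JSON escaping removed, while `UEN_VALUE` and `find_uen` are hypothetical, the three UEN shapes are my reading of the common formats, and gating on the whole input text (rather than a window around the match) is a simplification.

```python
import re

# Keyword pattern quoted in this comment, with JSON escaping removed.
UEN_KEYWORDS = re.compile(
    r"\b(?:UEN|company\s*registration|business\s*registration"
    r"|ACRA|entity\s*number|ROC|ROS)\b",
    re.IGNORECASE,
)

# Assumed UEN shapes: 8 digits + letter, 9 digits + letter,
# or T/S/R + 2 digits + 2 letters + 4 digits + letter.
UEN_VALUE = re.compile(
    r"\b(?:\d{8}[A-Z]|\d{9}[A-Z]|[TSR]\d{2}[A-Z]{2}\d{4}[A-Z])\b",
    re.IGNORECASE,
)


def find_uen(text: str) -> list[str]:
    """Return UEN-shaped values only when a context keyword is present."""
    if not UEN_KEYWORDS.search(text):
        return []  # no context keyword, so treat any match as a false positive
    return UEN_VALUE.findall(text)


if __name__ == "__main__":
    print(find_uen("Order ref 12345678A shipped"))
    print(find_uen("Company UEN: 12345678a"))
```

Without the gate, any 8-digit reference followed by a letter would be flagged, which is the false-positive risk raised in the review.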

- Explicitly mark MAS AIRM as Nov 2025 consultation draft
- Add 2018 qualifier for FEAT principles in MAS policy descriptions
- Update MAS guardrail wording to avoid release-year ambiguity

ron-zhong · 2026-02-23T18:59:40Z

Files updated in this follow-up commit:

policy_templates.json
litellm/policy_templates_backup.json
litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/policy_templates/sg_mas_fairness_bias.yaml
litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/policy_templates/sg_mas_human_oversight.yaml
litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/policy_templates/sg_mas_transparency_explainability.yaml

Commit: 6db71ad

ron-zhong · 2026-02-23T19:06:27Z

Resolved conflicts have been committed and pushed in caf9baa459.

Ready for re-review.

cc @ishaan-jaff @krrishdholakia

krrishdholakia · 2026-02-23T19:14:37Z

@greptile please re-review this

greptile-apps

16 files reviewed, 1 comment


litellm/policy_templates_backup.json

ron-zhong · 2026-02-23T19:36:10Z

Addressed this review feedback in the latest push:

Synced MAS guardrail descriptions between policy_templates.json and litellm/policy_templates_backup.json (using em dashes consistently).

Both registry entries are now aligned.
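
A drift check between the two registry files could look roughly like the sketch below. It is illustrative only: it assumes each file is a JSON array of template objects carrying `id` and `description` keys, which may not match the real schema (nested fields such as templateData.description would need their own comparison), and the function names are hypothetical.

```python
import json
from pathlib import Path


def load_templates(path: str) -> list[dict]:
    """Load a registry file; assumes the top level is a JSON array."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def index_descriptions(templates: list[dict]) -> dict[str, str]:
    """Map template id -> description (key names are assumptions)."""
    return {t["id"]: t.get("description", "") for t in templates}


def description_mismatches(primary: list[dict], backup: list[dict]) -> list[str]:
    """Return ids present in both registries whose descriptions differ."""
    p = index_descriptions(primary)
    b = index_descriptions(backup)
    return sorted(tid for tid in p.keys() & b.keys() if p[tid] != b[tid])
```

Called as `description_mismatches(load_templates("policy_templates.json"), load_templates("litellm/policy_templates_backup.json"))`, an empty list would indicate the shared entries are aligned; templates that exist only in the backup are ignored on purpose, since the review notes the backup carries extra entries.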

ron-zhong · 2026-02-23T19:57:50Z

Hi @krrishdholakia, will there be an image build that we can download (docker pull) for an internal preview before merging our contributions to the release branch? Thank you.

(Updated) I've found the build here, thanks.

https://github.com/BerriAI/litellm/releases/tag/litellmv1.81.15.presidio.dev

shin-bot-litellm · 2026-02-23T20:54:34Z

Hey @ron-zhong! Great PR on the Singapore guardrail policies — really solid work adding PDPA and MAS AI Risk Management support.

Your background as Solution Architect / DevOps Engineer at MAS is impressive. We're hiring engineers at LiteLLM and thought you might be interested!

If you're open to chatting, here's Ishaan's (our CTO) Calendly: https://calendly.com/ishaan-berri2/litellm-founding-engineer-round-1-interview-clone

Either way, thanks for the contribution!

…21970) * auth_with_role_name add region_name arg for cross-account sts * update tests to include case with aws_region_name for _auth_with_aws_role * Only pass region_name to STS client when aws_region_name is set * Add optional aws_sts_endpoint to _auth_with_aws_role * Parametrize ambient-credentials test for no opts, region_name, and aws_sts_endpoint * consistently passing region and endpoint args into explicit credentials irsa * fix env var leakage * fix: bedrock openai-compatible imported-model should also have model arn encoded * feat: show proxy url in ModelHub (#21660) * fix(bedrock): correct modelInput format for Converse API batch models (#21656) * fix(proxy): add model_ids param to access group endpoints for precise deployment tagging (#21655) POST /access_group/new and PUT /access_group/{name}/update now accept an optional model_ids list that targets specific deployments by their unique model_id, instead of tagging every deployment that shares a model_name. When model_ids is provided it takes priority over model_names, giving API callers the same single-deployment precision that the UI already has via PATCH /model/{model_id}/update. Backward compatible: model_names continues to work as before. Closes #21544 * feat(proxy): add custom favicon support\n\nAdd ability to configure a custom favicon for the litellm proxy UI.\n\n- Add favicon_url field to UIThemeConfig model\n- Add LITELLM_FAVICON_URL env var support\n- Add /get_favicon endpoint to serve custom favicons\n- Update ThemeContext to dynamically set favicon\n- Add favicon URL input to UI theme settings page\n- Add comprehensive tests\n\nCloses #8323 (#21653) * fix(bedrock): prevent double UUID in create_file S3 key (#21650) In create_file for Bedrock, get_complete_file_url is called twice: once in the sync handler (generating UUID-1 for api_base) and once inside transform_create_file_request (generating UUID-2 for the actual S3 upload). 
The Bedrock provider correctly writes UUID-2 into litellm_params["upload_url"], but the sync handler unconditionally overwrites it with api_base (UUID-1). This causes the returned file_id to point to a non-existent S3 key. Fix: only set upload_url to api_base when transform_create_file_request has not already set it, preserving the Bedrock provider's value. Closes #21546 * feat(semantic-cache): support configurable vector dimensions for Qdrant (#21649) Add vector_size parameter to QdrantSemanticCache and expose it through the Cache facade as qdrant_semantic_cache_vector_size. This allows users to use embedding models with dimensions other than the default 1536, enabling cheaper/stronger models like Stella (1024d), bge-en-icl (4096d), voyage, cohere, etc. The parameter defaults to QDRANT_VECTOR_SIZE (env var or 1536) for backward compatibility. When creating new collections, the configured vector_size is used instead of the hardcoded constant. Closes #9377 * fix(utils): normalize camelCase thinking param keys to snake_case (#21762) Clients like OpenCode's @ai-sdk/openai-compatible send budgetTokens (camelCase) instead of budget_tokens in the thinking parameter, causing validation errors. Add early normalization in completion(). * feat: add optional digest mode for Slack alert types (#21683) Adds per-alert-type digest mode that aggregates duplicate alerts within a configurable time window and emits a single summary message with count, start/end timestamps. 
Configuration via general_settings.alert_type_config: alert_type_config: llm_requests_hanging: digest: true digest_interval: 86400 Digest key: (alert_type, request_model, api_base) Default interval: 24 hours Window type: fixed interval Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add blog_posts.json and local backup * feat: add GetBlogPosts utility with GitHub fetch and local fallback Adds GetBlogPosts class that fetches blog posts from GitHub with a 1-hour in-process TTL cache, validates the response, and falls back to the bundled blog_posts_backup.json on any network or validation failure. * test: add cache reset fixture and LITELLM_LOCAL_BLOG_POSTS test Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat: add GET /public/litellm_blog_posts endpoint Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: log fallback warning in blog posts endpoint and tighten test * feat: add disable_show_blog to UISettings Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat: add useUISettings and useDisableShowBlog hooks * fix: rename useUISettings to useUISettingsFlags to avoid naming collision * fix: use existing useUISettings hook in useDisableShowBlog to avoid cache duplication Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat: add BlogDropdown component with react-query and error/retry state Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: enforce 5-post limit in BlogDropdown and add cap test * fix: add retry, stable post key, enabled guard in BlogDropdown Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat: add BlogDropdown to navbar after Docs link * feat: add network_mock transport for benchmarking proxy overhead without real API calls Intercepts at httpx transport layer so the full proxy path (auth, routing, OpenAI SDK, response transformation) is exercised with zero-latency responses. Activated via `litellm_settings: { network_mock: true }` in proxy config. 
* Litellm dev 02 19 2026 p2 (#21871) * feat(ui/): new guardrails monitor 'demo mock representation of what guardrails monitor looks like * fix: ui updates * style(ui/): fix styling * feat: enable running ai monitor on individual guardrails * feat: add backend logic for guardrail monitoring * fix(guardrails/usage_endpoints.py): fix usage dashboard * fix(budget): fix timezone config lookup and replace hardcoded timezone map with ZoneInfo (#21754) * fix(budget): fix timezone config lookup and replace hardcoded timezone map with ZoneInfo * fix(budget): update stale docstring on get_budget_reset_time * fix: add missing return type annotations to iterator protocol methods in streaming_handler (#21750) * fix: add return type annotations to iterator protocol methods in streaming_handler Add missing return type annotations to __iter__, __aiter__, __next__, and __anext__ methods in CustomStreamWrapper and related classes. - __iter__(self) -> Iterator["ModelResponseStream"] - __aiter__(self) -> AsyncIterator["ModelResponseStream"] - __next__(self) -> "ModelResponseStream" - __anext__(self) -> "ModelResponseStream" Also adds AsyncIterator and Iterator to typing imports. Fixes issue with PLR0915 noqa comments and ensures proper type checking support. Related to: #8304 * fix: add ruff PLR0915 noqa for files with too many statements * Add gollem Go agent framework cookbook example (#21747) Show how to use gollem, a production Go agent framework, with LiteLLM proxy for multi-provider LLM access including tool use and streaming. 
* fix: avoid mutating caller-owned dicts in SpendUpdateQueue aggregation (#21742) * fix(vertex_ai): enable context-1m-2025-08-07 beta header (#21870) * server root path regression doc * fixing syntax * fix: replace Zapier webhook with Google Form for survey submission (#21621) * Replace Zapier webhook with Google Form for survey submission * Add back error logging for survey submission debugging --------- Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> * Revert "Merge pull request #21140 from BerriAI/litellm_perf_user_api_key_auth" This reverts commit 0e1db3f, reversing changes made to 7e2d6f2. * test_vertex_ai_gemini_2_5_pro_streaming * UI new build * fix rendering * ui new build * docs fix * docs fix * docs fix * docs fix * docs fix * docs fix * docs fix * docs fix * release note docs * docs * adding image * fix(vertex_ai): enable context-1m-2025-08-07 beta header The `context-1m-2025-08-07` Anthropic beta header was set to `null` for vertex_ai, causing it to be filtered out when users set `extra_headers: {anthropic-beta: context-1m-2025-08-07}`. This prevented using Claude's 1M context window feature via Vertex AI, resulting in `prompt is too long: 460500 tokens > 200000 maximum` errors. Fixes #21861 --------- Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com> Co-authored-by: milan-berri <milan@berri.ai> Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> * Revert "fix(vertex_ai): enable context-1m-2025-08-07 beta header (#21870)" (#21876) This reverts commit bce078a. * docs(ui): add pre-PR checklist to UI contributing guide Add testing and build verification steps per maintainer feedback from @yjiang-litellm. Contributors should run their related tests per-file and ensure npm run build passes before opening PRs. 
* Fix entries with fast and us/ * Add tests for fast and us * Add support for Priority PayGo for vertex ai and gemini * Add model pricing * fix: ensure arrival_time is set before calculating queue time * Fix: Anthropic model wildcard access issue * Add incident report * Add ability to see which model cost map is getting used * Fix name of title * Readd tpm limit * State management fixes for CheckBatchCost * Fix PR review comments * State management fixes for CheckBatchCost - Address greptile comments * fix mypy issues: * Add Noma guardrails v2 based on custom guardrails (#21400) * Fix code qa issues * Fix mypy issues * Fix mypy issues * Fix test_aaamodel_prices_and_context_window_json_is_valid * fix: update calendly on repo * fix(tests): use counter-based mock for time.time in prisma self-heal test The test used a fixed side_effect list for time.time(), but the number of calls varies by Python version, causing StopIteration on 3.12 and AssertionError on 3.14. Replace with an infinite counter-based callable and assert the timestamp was updated rather than checking for an exact value. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(tests): use absolute path for model_prices JSON in validation test The test used a relative path 'litellm/model_prices_and_context_window.json' which only works when pytest runs from a specific working directory. Use os.path based on __file__ to resolve the path reliably. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Update tests/test_litellm/test_utils.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix(tests): use os.path instead of Path to avoid NameError Path is not imported at module level. Use os.path.join which is already available. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * clean up mock transport: remove streaming, add defensive parsing * docs: add Google GenAI SDK tutorial (JS & Python) (#21885) * docs: add Google GenAI SDK tutorial for JS and Python Add tutorial for using Google's official GenAI SDK (@google/genai for JS, google-genai for Python) with LiteLLM proxy. Covers pass-through and native router endpoints, streaming, multi-turn chat, and multi-provider routing via model_group_alias. Also updates pass-through docs to use the new SDK replacing the deprecated @google/generative-ai. * fix(docs): correct Python SDK env var name in GenAI tutorial GOOGLE_GENAI_API_KEY does not exist in the google-genai SDK. The correct env var is GEMINI_API_KEY (or GOOGLE_API_KEY). Also note that the Python SDK has no base URL env var. * fix(docs): replace non-existent GOOGLE_GENAI_BASE_URL env var in interactions.md The Python google-genai SDK does not read GOOGLE_GENAI_BASE_URL. Use http_options={"base_url": "..."} in code instead. * docs: add network mock benchmarking section * docs: tweak benchmarks wording * fix: add auth headers and empty latencies guard to benchmark script * refactor: use method-level import for MockOpenAITransport * fix: guard print_aggregate against empty latencies * fix: add INCOMPLETE status to Interactions API enum and test Google added INCOMPLETE to the Interactions API OpenAPI spec status enum. Update both the Status3 enum in the SDK types and the test's expected values to match. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Guardrail Monitor - measure guardrail reliability in prod (#21944) * fix: fix log viewer for guardrail monitoring * feat(ui/): fix rendering logs per guardrail * fix: fix viewing logs on overview tab of guardrail * fix: log viewer * fix: fix naming to align with metric * docs: add performance & reliability section to v1.81.14 release notes * fix(tests): make RPM limit test sequential to avoid race condition Concurrent requests via run_in_executor + asyncio.gather caused a race condition where more requests slipped through the rate limiter than expected, leading to flaky test failures (e.g. 3 successes instead of 2 with rpm_limit=2). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: Singapore guardrail policies (PDPA + MAS AI Risk Management) (#21948) * feat: Singapore PDPA PII protection guardrail policy template Add Singapore Personal Data Protection Act (PDPA) guardrail support: Regex patterns (patterns.json): - sg_nric: NRIC/FIN detection ([STFGM] + 7 digits + checksum letter) - sg_phone: Singapore phone numbers (+65/0065/65 prefix) - sg_postal_code: 6-digit postal codes (contextual) - passport_singapore: Passport numbers (E/K + 7 digits, contextual) - sg_uen: Unique Entity Numbers (3 formats) - sg_bank_account: Bank account numbers (dash format, contextual) YAML policy templates (5 sub-guardrails): - sg_pdpa_personal_identifiers: s.13 Consent - sg_pdpa_sensitive_data: Advisory Guidelines - sg_pdpa_do_not_call: Part IX DNC Registry - sg_pdpa_data_transfer: s.26 overseas transfers - sg_pdpa_profiling_automated_decisions: Model AI Governance Framework Policy template entry in policy_templates.json with 9 guardrail definitions (4 regex-based + 5 YAML conditional keyword matching). 
Tests: - test_sg_patterns.py: regex pattern unit tests - test_sg_pdpa_guardrails.py: conditional keyword matching tests (100+ cases) * feat: MAS AI Risk Management Guidelines guardrail policy template Add Monetary Authority of Singapore (MAS) AI Risk Management Guidelines guardrail support for financial institutions: YAML policy templates (5 sub-guardrails): - sg_mas_fairness_bias: Blocks discriminatory financial AI (credit/loans/insurance by protected attributes) - sg_mas_transparency_explainability: Blocks opaque/unexplainable AI for consequential financial decisions - sg_mas_human_oversight: Blocks fully automated financial decisions without human-in-the-loop - sg_mas_data_governance: Blocks unauthorized sharing/mishandling of financial customer data - sg_mas_model_security: Blocks adversarial attacks, model poisoning, inversion on financial AI Policy template entry in policy_templates.json with 5 guardrail definitions. Aligned with MAS FEAT Principles, Project MindForge, and NIST AI RMF. Tests: - test_sg_mas_ai_guardrails.py: conditional keyword matching tests (100+ cases) * fix: address SG pattern review feedback - Update NRIC lowercase test for IGNORECASE runtime behavior - Add keyword context guard to sg_uen pattern to reduce false positives * docs: clarify MAS AIRM timeline references - Explicitly mark MAS AIRM as Nov 2025 consultation draft - Add 2018 qualifier for FEAT principles in MAS policy descriptions - Update MAS guardrail wording to avoid release-year ambiguity * chore: commit resolved MAS policy conflicts * test: * chore: * Add OpenAI Agents SDK tutorial with LiteLLM Proxy to docs (#21221) * Add OpenAI Agents SDK tutorial to docs * Update OpenAI Agents SDK tutorial to use LiteLLM environment variables * Enhance OpenAI Agents SDK tutorial with built-in LiteLLM extension details and updated configuration steps. Adjust section headings for clarity and improve the flow of information regarding model setup and usage. 
* adjust blog posts to fetch from github first * feat(videos): add variant parameter to video content download (#21955) openai videos models support the features to download variants. See more details here: https://developers.openai.com/api/docs/guides/video-generation#use-image-references. Plumb variant (e.g. "thumbnail", "spritesheet") through the full video content download chain: avideo_content → video_content → video_content_handler → transform_video_content_request. OpenAI appends ?variant=<value> to the GET URL; other providers accept the parameter in their signature but ignore it. * fixing path * adjust blog post path * Revert duplicate issue checker to text-based matching, remove duplicate PR workflow Remove the Claude Code-powered duplicate PR detection workflow and revert the duplicate issue checker back to wow-actions/potential-duplicates with text similarity matching. * ui changes * adding tests * adjust default aggregation threshold * fix(videos): pass api_key from litellm_params to video remix handlers (#21965) video_remix_handler and async_video_remix_handler were not falling back to litellm_params.api_key when the api_key parameter was None, causing Authorization: Bearer None to be sent to the provider. This matches the pattern already used by async_video_generation_handler. * adding testing coverage + fixing flaky tests * fix(ollama): thread api_base through get_model_info and add graceful fallback When users pass api_base to litellm.completion() for Ollama, the model info fetch (context window, function_calling support) was ignoring the user's api_base and only reading OLLAMA_API_BASE env var or defaulting to localhost:11434. This caused confusing errors in logs when Ollama runs on a remote server. Thread api_base from litellm_params through the get_model_info call chain so OllamaConfig.get_model_info() uses the correct server. Also return safe defaults instead of raising when the server is unreachable. 
Fixes #21967 --------- Co-authored-by: An Tang <ta@stripe.com> Co-authored-by: janfrederickk <75388864+janfrederickk@users.noreply.github.com> Co-authored-by: Zhenting Huang <3061613175@qq.com> Co-authored-by: Darien Kindlund <darien@kindlund.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com> Co-authored-by: Ryan Crabbe <rcrabbe@berkeley.edu> Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> Co-authored-by: LeeJuOh <56071126+LeeJuOh@users.noreply.github.com> Co-authored-by: Monesh Ram <31161039+WhoisMonesh@users.noreply.github.com> Co-authored-by: Trevor Prater <trevor.prater@gmail.com> Co-authored-by: The Mavik <179817126+themavik@users.noreply.github.com> Co-authored-by: Edwin Isac <33712823+edwiniac@users.noreply.github.com> Co-authored-by: milan-berri <milan@berri.ai> Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> Co-authored-by: Sameer Kankute <sameer@berri.ai> Co-authored-by: Harshit Jain <harshitjain0562@gmail.com> Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com> Co-authored-by: Ephrim Stanley <ephrim.stanley@point72.com> Co-authored-by: TomAlon <tom@noma.security> Co-authored-by: Julio Quinteros Pro <jquinter@gmail.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com> Co-authored-by: Ron Zhong <ron-zhong@hotmail.com> Co-authored-by: Arindam Majumder <109217591+Arindam200@users.noreply.github.com> Co-authored-by: Lei Nie <lenie@quora.com>
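The Ollama fix described in the commit message above (use the caller's api_base before the environment variable or localhost, and return safe defaults when the server is unreachable) can be sketched as follows. This is a minimal illustration, not LiteLLM's actual code: `resolve_api_base`, `get_model_info_safe`, the `fetch` callable, the `http://` scheme on the default, and the shape of the default dict are all assumptions made for the example.

```python
import os
from typing import Callable, Optional

# Default host named in the commit message; the http:// scheme is an assumption.
DEFAULT_OLLAMA_BASE = "http://localhost:11434"


def resolve_api_base(api_base: Optional[str] = None) -> str:
    """Prefer the caller's api_base, then OLLAMA_API_BASE, then localhost."""
    return api_base or os.environ.get("OLLAMA_API_BASE") or DEFAULT_OLLAMA_BASE


def get_model_info_safe(
    model: str,
    fetch: Callable[[str, str], dict],  # hypothetical: fetch(api_base, model) -> info dict
    api_base: Optional[str] = None,
) -> dict:
    """Fetch model info from the resolved server, falling back to defaults."""
    base = resolve_api_base(api_base)
    try:
        return fetch(base, model)
    except OSError:
        # ConnectionError and socket timeouts are OSError subclasses.
        # Hypothetical safe defaults instead of raising to the caller.
        return {"key": model, "max_tokens": None, "supports_function_calling": False}
```

Passing the fetcher in as a callable keeps the sketch testable without a running Ollama server; a real implementation would issue the HTTP request itself.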

…voke (#21964) * auth_with_role_name add region_name arg for cross-account sts * update tests to include case with aws_region_name for _auth_with_aws_role * Only pass region_name to STS client when aws_region_name is set * Add optional aws_sts_endpoint to _auth_with_aws_role * Parametrize ambient-credentials test for no opts, region_name, and aws_sts_endpoint * consistently passing region and endpoint args into explicit credentials irsa * fix env var leakage * fix: bedrock openai-compatible imported-model should also have model arn encoded * feat: show proxy url in ModelHub (#21660) * fix(bedrock): correct modelInput format for Converse API batch models (#21656) * fix(proxy): add model_ids param to access group endpoints for precise deployment tagging (#21655) POST /access_group/new and PUT /access_group/{name}/update now accept an optional model_ids list that targets specific deployments by their unique model_id, instead of tagging every deployment that shares a model_name. When model_ids is provided it takes priority over model_names, giving API callers the same single-deployment precision that the UI already has via PATCH /model/{model_id}/update. Backward compatible: model_names continues to work as before. Closes #21544 * feat(proxy): add custom favicon support Add ability to configure a custom favicon for the litellm proxy UI. - Add favicon_url field to UIThemeConfig model - Add LITELLM_FAVICON_URL env var support - Add /get_favicon endpoint to serve custom favicons - Update ThemeContext to dynamically set favicon - Add favicon URL input to UI theme settings page - Add comprehensive tests Closes #8323 (#21653) * fix(bedrock): prevent double UUID in create_file S3 key (#21650) In create_file for Bedrock, get_complete_file_url is called twice: once in the sync handler (generating UUID-1 for api_base) and once inside transform_create_file_request (generating UUID-2 for the actual S3 upload). 
The Bedrock provider correctly writes UUID-2 into litellm_params["upload_url"], but the sync handler unconditionally overwrites it with api_base (UUID-1). This causes the returned file_id to point to a non-existent S3 key. Fix: only set upload_url to api_base when transform_create_file_request has not already set it, preserving the Bedrock provider's value. Closes #21546 * feat(semantic-cache): support configurable vector dimensions for Qdrant (#21649) Add vector_size parameter to QdrantSemanticCache and expose it through the Cache facade as qdrant_semantic_cache_vector_size. This allows users to use embedding models with dimensions other than the default 1536, enabling cheaper/stronger models like Stella (1024d), bge-en-icl (4096d), voyage, cohere, etc. The parameter defaults to QDRANT_VECTOR_SIZE (env var or 1536) for backward compatibility. When creating new collections, the configured vector_size is used instead of the hardcoded constant. Closes #9377 * fix(utils): normalize camelCase thinking param keys to snake_case (#21762) Clients like OpenCode's @ai-sdk/openai-compatible send budgetTokens (camelCase) instead of budget_tokens in the thinking parameter, causing validation errors. Add early normalization in completion(). * feat: add optional digest mode for Slack alert types (#21683) Adds per-alert-type digest mode that aggregates duplicate alerts within a configurable time window and emits a single summary message with count, start/end timestamps. 
Configuration via general_settings.alert_type_config: `alert_type_config: { llm_requests_hanging: { digest: true, digest_interval: 86400 } }`. Digest key: (alert_type, request_model, api_base). Default interval: 24 hours. Window type: fixed interval. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add blog_posts.json and local backup * feat: add GetBlogPosts utility with GitHub fetch and local fallback Adds GetBlogPosts class that fetches blog posts from GitHub with a 1-hour in-process TTL cache, validates the response, and falls back to the bundled blog_posts_backup.json on any network or validation failure. * test: add cache reset fixture and LITELLM_LOCAL_BLOG_POSTS test Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat: add GET /public/litellm_blog_posts endpoint Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: log fallback warning in blog posts endpoint and tighten test * feat: add disable_show_blog to UISettings Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat: add useUISettings and useDisableShowBlog hooks * fix: rename useUISettings to useUISettingsFlags to avoid naming collision * fix: use existing useUISettings hook in useDisableShowBlog to avoid cache duplication Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat: add BlogDropdown component with react-query and error/retry state Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: enforce 5-post limit in BlogDropdown and add cap test * fix: add retry, stable post key, enabled guard in BlogDropdown Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat: add BlogDropdown to navbar after Docs link * feat: add network_mock transport for benchmarking proxy overhead without real API calls Intercepts at httpx transport layer so the full proxy path (auth, routing, OpenAI SDK, response transformation) is exercised with zero-latency responses. Activated via `litellm_settings: { network_mock: true }` in proxy config. 
* Litellm dev 02 19 2026 p2 (#21871) * feat(ui/): new guardrails monitor 'demo mock representation of what guardrails monitor looks like * fix: ui updates * style(ui/): fix styling * feat: enable running ai monitor on individual guardrails * feat: add backend logic for guardrail monitoring * fix(guardrails/usage_endpoints.py): fix usage dashboard * fix(budget): fix timezone config lookup and replace hardcoded timezone map with ZoneInfo (#21754) * fix(budget): fix timezone config lookup and replace hardcoded timezone map with ZoneInfo * fix(budget): update stale docstring on get_budget_reset_time * fix: add missing return type annotations to iterator protocol methods in streaming_handler (#21750) * fix: add return type annotations to iterator protocol methods in streaming_handler Add missing return type annotations to __iter__, __aiter__, __next__, and __anext__ methods in CustomStreamWrapper and related classes. - __iter__(self) -> Iterator["ModelResponseStream"] - __aiter__(self) -> AsyncIterator["ModelResponseStream"] - __next__(self) -> "ModelResponseStream" - __anext__(self) -> "ModelResponseStream" Also adds AsyncIterator and Iterator to typing imports. Fixes issue with PLR0915 noqa comments and ensures proper type checking support. Related to: #8304 * fix: add ruff PLR0915 noqa for files with too many statements * Add gollem Go agent framework cookbook example (#21747) Show how to use gollem, a production Go agent framework, with LiteLLM proxy for multi-provider LLM access including tool use and streaming. 
* fix: avoid mutating caller-owned dicts in SpendUpdateQueue aggregation (#21742) * fix(vertex_ai): enable context-1m-2025-08-07 beta header (#21870) * server root path regression doc * fixing syntax * fix: replace Zapier webhook with Google Form for survey submission (#21621) * Replace Zapier webhook with Google Form for survey submission * Add back error logging for survey submission debugging --------- Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> * Revert "Merge pull request #21140 from BerriAI/litellm_perf_user_api_key_auth" This reverts commit 0e1db3f, reversing changes made to 7e2d6f2. * test_vertex_ai_gemini_2_5_pro_streaming * UI new build * fix rendering * ui new build * docs fix * docs fix * docs fix * docs fix * docs fix * docs fix * docs fix * docs fix * release note docs * docs * adding image * fix(vertex_ai): enable context-1m-2025-08-07 beta header The `context-1m-2025-08-07` Anthropic beta header was set to `null` for vertex_ai, causing it to be filtered out when users set `extra_headers: {anthropic-beta: context-1m-2025-08-07}`. This prevented using Claude's 1M context window feature via Vertex AI, resulting in `prompt is too long: 460500 tokens > 200000 maximum` errors. Fixes #21861 --------- Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com> Co-authored-by: milan-berri <milan@berri.ai> Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> * Revert "fix(vertex_ai): enable context-1m-2025-08-07 beta header (#21870)" (#21876) This reverts commit bce078a. * docs(ui): add pre-PR checklist to UI contributing guide Add testing and build verification steps per maintainer feedback from @yjiang-litellm. Contributors should run their related tests per-file and ensure npm run build passes before opening PRs. 
* Fix entries with fast and us/ * Add tests for fast and us * Add support for Priority PayGo for vertex ai and gemini * Add model pricing * fix: ensure arrival_time is set before calculating queue time * Fix: Anthropic model wildcard access issue * Add incident report * Add ability to see which model cost map is getting used * Fix name of title * Readd tpm limit * State management fixes for CheckBatchCost * Fix PR review comments * State management fixes for CheckBatchCost - Address greptile comments * fix mypy issues: * Add Noma guardrails v2 based on custom guardrails (#21400) * Fix code qa issues * Fix mypy issues * Fix mypy issues * Fix test_aaamodel_prices_and_context_window_json_is_valid * fix: update calendly on repo * fix(tests): use counter-based mock for time.time in prisma self-heal test The test used a fixed side_effect list for time.time(), but the number of calls varies by Python version, causing StopIteration on 3.12 and AssertionError on 3.14. Replace with an infinite counter-based callable and assert the timestamp was updated rather than checking for an exact value. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(tests): use absolute path for model_prices JSON in validation test The test used a relative path 'litellm/model_prices_and_context_window.json' which only works when pytest runs from a specific working directory. Use os.path based on __file__ to resolve the path reliably. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Update tests/test_litellm/test_utils.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix(tests): use os.path instead of Path to avoid NameError Path is not imported at module level. Use os.path.join which is already available. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * clean up mock transport: remove streaming, add defensive parsing * docs: add Google GenAI SDK tutorial (JS & Python) (#21885) * docs: add Google GenAI SDK tutorial for JS and Python Add tutorial for using Google's official GenAI SDK (@google/genai for JS, google-genai for Python) with LiteLLM proxy. Covers pass-through and native router endpoints, streaming, multi-turn chat, and multi-provider routing via model_group_alias. Also updates pass-through docs to use the new SDK replacing the deprecated @google/generative-ai. * fix(docs): correct Python SDK env var name in GenAI tutorial GOOGLE_GENAI_API_KEY does not exist in the google-genai SDK. The correct env var is GEMINI_API_KEY (or GOOGLE_API_KEY). Also note that the Python SDK has no base URL env var. * fix(docs): replace non-existent GOOGLE_GENAI_BASE_URL env var in interactions.md The Python google-genai SDK does not read GOOGLE_GENAI_BASE_URL. Use http_options={"base_url": "..."} in code instead. * docs: add network mock benchmarking section * docs: tweak benchmarks wording * fix: add auth headers and empty latencies guard to benchmark script * refactor: use method-level import for MockOpenAITransport * fix: guard print_aggregate against empty latencies * fix: add INCOMPLETE status to Interactions API enum and test Google added INCOMPLETE to the Interactions API OpenAPI spec status enum. Update both the Status3 enum in the SDK types and the test's expected values to match. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Guardrail Monitor - measure guardrail reliability in prod (#21944) * fix: fix log viewer for guardrail monitoring * feat(ui/): fix rendering logs per guardrail * fix: fix viewing logs on overview tab of guardrail * fix: log viewer * fix: fix naming to align with metric * docs: add performance & reliability section to v1.81.14 release notes * fix(tests): make RPM limit test sequential to avoid race condition Concurrent requests via run_in_executor + asyncio.gather caused a race condition where more requests slipped through the rate limiter than expected, leading to flaky test failures (e.g. 3 successes instead of 2 with rpm_limit=2). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: Singapore guardrail policies (PDPA + MAS AI Risk Management) (#21948) * feat: Singapore PDPA PII protection guardrail policy template Add Singapore Personal Data Protection Act (PDPA) guardrail support: Regex patterns (patterns.json): - sg_nric: NRIC/FIN detection ([STFGM] + 7 digits + checksum letter) - sg_phone: Singapore phone numbers (+65/0065/65 prefix) - sg_postal_code: 6-digit postal codes (contextual) - passport_singapore: Passport numbers (E/K + 7 digits, contextual) - sg_uen: Unique Entity Numbers (3 formats) - sg_bank_account: Bank account numbers (dash format, contextual) YAML policy templates (5 sub-guardrails): - sg_pdpa_personal_identifiers: s.13 Consent - sg_pdpa_sensitive_data: Advisory Guidelines - sg_pdpa_do_not_call: Part IX DNC Registry - sg_pdpa_data_transfer: s.26 overseas transfers - sg_pdpa_profiling_automated_decisions: Model AI Governance Framework Policy template entry in policy_templates.json with 9 guardrail definitions (4 regex-based + 5 YAML conditional keyword matching). 
Tests: - test_sg_patterns.py: regex pattern unit tests - test_sg_pdpa_guardrails.py: conditional keyword matching tests (100+ cases) * feat: MAS AI Risk Management Guidelines guardrail policy template Add Monetary Authority of Singapore (MAS) AI Risk Management Guidelines guardrail support for financial institutions: YAML policy templates (5 sub-guardrails): - sg_mas_fairness_bias: Blocks discriminatory financial AI (credit/loans/insurance by protected attributes) - sg_mas_transparency_explainability: Blocks opaque/unexplainable AI for consequential financial decisions - sg_mas_human_oversight: Blocks fully automated financial decisions without human-in-the-loop - sg_mas_data_governance: Blocks unauthorized sharing/mishandling of financial customer data - sg_mas_model_security: Blocks adversarial attacks, model poisoning, inversion on financial AI Policy template entry in policy_templates.json with 5 guardrail definitions. Aligned with MAS FEAT Principles, Project MindForge, and NIST AI RMF. Tests: - test_sg_mas_ai_guardrails.py: conditional keyword matching tests (100+ cases) * fix: address SG pattern review feedback - Update NRIC lowercase test for IGNORECASE runtime behavior - Add keyword context guard to sg_uen pattern to reduce false positives * docs: clarify MAS AIRM timeline references - Explicitly mark MAS AIRM as Nov 2025 consultation draft - Add 2018 qualifier for FEAT principles in MAS policy descriptions - Update MAS guardrail wording to avoid release-year ambiguity * chore: commit resolved MAS policy conflicts * test: * chore: * Add OpenAI Agents SDK tutorial with LiteLLM Proxy to docs (#21221) * Add OpenAI Agents SDK tutorial to docs * Update OpenAI Agents SDK tutorial to use LiteLLM environment variables * Enhance OpenAI Agents SDK tutorial with built-in LiteLLM extension details and updated configuration steps. Adjust section headings for clarity and improve the flow of information regarding model setup and usage. 
* adjust blog posts to fetch from github first * feat(videos): add variant parameter to video content download (#21955) openai videos models support the features to download variants. See more details here: https://developers.openai.com/api/docs/guides/video-generation#use-image-references. Plumb variant (e.g. "thumbnail", "spritesheet") through the full video content download chain: avideo_content → video_content → video_content_handler → transform_video_content_request. OpenAI appends ?variant=<value> to the GET URL; other providers accept the parameter in their signature but ignore it. * fixing path * adjust blog post path * Revert duplicate issue checker to text-based matching, remove duplicate PR workflow Remove the Claude Code-powered duplicate PR detection workflow and revert the duplicate issue checker back to wow-actions/potential-duplicates with text similarity matching. * ui changes * adding tests * fix(anthropic): sanitize tool_use IDs in assistant messages Apply _sanitize_anthropic_tool_use_id to tool_use blocks in convert_to_anthropic_tool_invoke, not just tool_result blocks. IDs from external frameworks (e.g. MiniMax) may contain characters like colons that violate Anthropic's ^[a-zA-Z0-9_-]+$ pattern. Adds test for invalid ID sanitization in tool_use blocks. 
--------- Co-authored-by: An Tang <ta@stripe.com> Co-authored-by: janfrederickk <75388864+janfrederickk@users.noreply.github.com> Co-authored-by: Zhenting Huang <3061613175@qq.com> Co-authored-by: Cesar Garcia <128240629+Chesars@users.noreply.github.com> Co-authored-by: Darien Kindlund <darien@kindlund.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com> Co-authored-by: Ryan Crabbe <rcrabbe@berkeley.edu> Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> Co-authored-by: LeeJuOh <56071126+LeeJuOh@users.noreply.github.com> Co-authored-by: Monesh Ram <31161039+WhoisMonesh@users.noreply.github.com> Co-authored-by: Trevor Prater <trevor.prater@gmail.com> Co-authored-by: The Mavik <179817126+themavik@users.noreply.github.com> Co-authored-by: Edwin Isac <33712823+edwiniac@users.noreply.github.com> Co-authored-by: milan-berri <milan@berri.ai> Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> Co-authored-by: Chesars <cesarponce19544@gmail.com> Co-authored-by: Sameer Kankute <sameer@berri.ai> Co-authored-by: Harshit Jain <harshitjain0562@gmail.com> Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com> Co-authored-by: Ephrim Stanley <ephrim.stanley@point72.com> Co-authored-by: TomAlon <tom@noma.security> Co-authored-by: Julio Quinteros Pro <jquinter@gmail.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com> Co-authored-by: Ron Zhong <ron-zhong@hotmail.com> Co-authored-by: Arindam Majumder <109217591+Arindam200@users.noreply.github.com> Co-authored-by: Lei Nie <lenie@quora.com>
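The tool_use ID sanitization mentioned in the commit message above can be illustrated with a short sketch. The `^[a-zA-Z0-9_-]+$` constraint comes from the commit message; the helper name `sanitize_tool_use_id` and the choice to replace each invalid character with an underscore are assumptions for illustration and may differ from what `_sanitize_anthropic_tool_use_id` actually does.

```python
import re

# Any character outside the set allowed by ^[a-zA-Z0-9_-]+$
_INVALID_ID_CHARS = re.compile(r"[^a-zA-Z0-9_-]")


def sanitize_tool_use_id(tool_use_id: str) -> str:
    """Replace disallowed characters with underscores (hypothetical strategy)."""
    return _INVALID_ID_CHARS.sub("_", tool_use_id)


def sanitize_tool_use_blocks(blocks: list) -> list:
    """Return copies of content blocks with tool_use IDs sanitized."""
    cleaned = []
    for block in blocks:
        if block.get("type") == "tool_use" and "id" in block:
            # Copy so the caller's dict is not mutated.
            block = {**block, "id": sanitize_tool_use_id(block["id"])}
        cleaned.append(block)
    return cleaned
```

Whatever replacement rule is used, tool_use and tool_result blocks need the same rule applied so that the IDs still pair up after sanitization. The sketch does not handle an empty ID, which would still fail the pattern.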

…erriAI#21948) * feat: Singapore PDPA PII protection guardrail policy template Add Singapore Personal Data Protection Act (PDPA) guardrail support: Regex patterns (patterns.json): - sg_nric: NRIC/FIN detection ([STFGM] + 7 digits + checksum letter) - sg_phone: Singapore phone numbers (+65/0065/65 prefix) - sg_postal_code: 6-digit postal codes (contextual) - passport_singapore: Passport numbers (E/K + 7 digits, contextual) - sg_uen: Unique Entity Numbers (3 formats) - sg_bank_account: Bank account numbers (dash format, contextual) YAML policy templates (5 sub-guardrails): - sg_pdpa_personal_identifiers: s.13 Consent - sg_pdpa_sensitive_data: Advisory Guidelines - sg_pdpa_do_not_call: Part IX DNC Registry - sg_pdpa_data_transfer: s.26 overseas transfers - sg_pdpa_profiling_automated_decisions: Model AI Governance Framework Policy template entry in policy_templates.json with 9 guardrail definitions (4 regex-based + 5 YAML conditional keyword matching). Tests: - test_sg_patterns.py: regex pattern unit tests - test_sg_pdpa_guardrails.py: conditional keyword matching tests (100+ cases) * feat: MAS AI Risk Management Guidelines guardrail policy template Add Monetary Authority of Singapore (MAS) AI Risk Management Guidelines guardrail support for financial institutions: YAML policy templates (5 sub-guardrails): - sg_mas_fairness_bias: Blocks discriminatory financial AI (credit/loans/insurance by protected attributes) - sg_mas_transparency_explainability: Blocks opaque/unexplainable AI for consequential financial decisions - sg_mas_human_oversight: Blocks fully automated financial decisions without human-in-the-loop - sg_mas_data_governance: Blocks unauthorized sharing/mishandling of financial customer data - sg_mas_model_security: Blocks adversarial attacks, model poisoning, inversion on financial AI Policy template entry in policy_templates.json with 5 guardrail definitions. Aligned with MAS FEAT Principles, Project MindForge, and NIST AI RMF. 
Tests: - test_sg_mas_ai_guardrails.py: conditional keyword matching tests (100+ cases) * fix: address SG pattern review feedback - Update NRIC lowercase test for IGNORECASE runtime behavior - Add keyword context guard to sg_uen pattern to reduce false positives * docs: clarify MAS AIRM timeline references - Explicitly mark MAS AIRM as Nov 2025 consultation draft - Add 2018 qualifier for FEAT principles in MAS policy descriptions - Update MAS guardrail wording to avoid release-year ambiguity * chore: commit resolved MAS policy conflicts * test: * chore:
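As a rough illustration of the sg_nric shape described above ([STFGM] + 7 digits + checksum letter, matched case-insensitively at runtime), the following sketch shows one way such a pattern could look. The regex and the helper `find_nric_candidates` are assumptions for this example, not the pattern shipped in patterns.json, and the sketch checks format only; it does not validate the checksum letter.

```python
import re

# Hypothetical approximation: prefix letter, 7 digits, trailing letter.
# IGNORECASE mirrors the runtime behavior noted in the review-feedback commit.
SG_NRIC_SKETCH = re.compile(r"\b[STFGM]\d{7}[A-Z]\b", re.IGNORECASE)


def find_nric_candidates(text: str) -> list:
    """Return substrings that look like an NRIC/FIN (format check only)."""
    return [m.group(0) for m in SG_NRIC_SKETCH.finditer(text)]


if __name__ == "__main__":
    print(find_nric_candidates("Applicant NRIC: S1234567D, ref A1234567D"))
```

The word boundaries keep the pattern from firing inside longer alphanumeric tokens, which is one simple way to limit false positives without keyword context.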

…erriAI#21970) * auth_with_role_name add region_name arg for cross-account sts * update tests to include case with aws_region_name for _auth_with_aws_role * Only pass region_name to STS client when aws_region_name is set * Add optional aws_sts_endpoint to _auth_with_aws_role * Parametrize ambient-credentials test for no opts, region_name, and aws_sts_endpoint * consistently passing region and endpoint args into explicit credentials irsa * fix env var leakage * fix: bedrock openai-compatible imported-model should also have model arn encoded * feat: show proxy url in ModelHub (BerriAI#21660) * fix(bedrock): correct modelInput format for Converse API batch models (BerriAI#21656) * fix(proxy): add model_ids param to access group endpoints for precise deployment tagging (BerriAI#21655) POST /access_group/new and PUT /access_group/{name}/update now accept an optional model_ids list that targets specific deployments by their unique model_id, instead of tagging every deployment that shares a model_name. When model_ids is provided it takes priority over model_names, giving API callers the same single-deployment precision that the UI already has via PATCH /model/{model_id}/update. Backward compatible: model_names continues to work as before. Closes BerriAI#21544 * feat(proxy): add custom favicon support Add ability to configure a custom favicon for the litellm proxy UI. - Add favicon_url field to UIThemeConfig model - Add LITELLM_FAVICON_URL env var support - Add /get_favicon endpoint to serve custom favicons - Update ThemeContext to dynamically set favicon - Add favicon URL input to UI theme settings page - Add comprehensive tests Closes BerriAI#8323 (BerriAI#21653) * fix(bedrock): prevent double UUID in create_file S3 key (BerriAI#21650) In create_file for Bedrock, get_complete_file_url is called twice: once in the sync handler (generating UUID-1 for api_base) and once inside transform_create_file_request (generating UUID-2 for the actual S3 upload). 
The Bedrock provider correctly writes UUID-2 into litellm_params["upload_url"], but the sync handler unconditionally overwrites it with api_base (UUID-1). This causes the returned file_id to point to a non-existent S3 key. Fix: only set upload_url to api_base when transform_create_file_request has not already set it, preserving the Bedrock provider's value. Closes BerriAI#21546 * feat(semantic-cache): support configurable vector dimensions for Qdrant (BerriAI#21649) Add vector_size parameter to QdrantSemanticCache and expose it through the Cache facade as qdrant_semantic_cache_vector_size. This allows users to use embedding models with dimensions other than the default 1536, enabling cheaper/stronger models like Stella (1024d), bge-en-icl (4096d), voyage, cohere, etc. The parameter defaults to QDRANT_VECTOR_SIZE (env var or 1536) for backward compatibility. When creating new collections, the configured vector_size is used instead of the hardcoded constant. Closes BerriAI#9377 * fix(utils): normalize camelCase thinking param keys to snake_case (BerriAI#21762) Clients like OpenCode's @ai-sdk/openai-compatible send budgetTokens (camelCase) instead of budget_tokens in the thinking parameter, causing validation errors. Add early normalization in completion(). * feat: add optional digest mode for Slack alert types (BerriAI#21683) Adds per-alert-type digest mode that aggregates duplicate alerts within a configurable time window and emits a single summary message with count, start/end timestamps. 
Configuration via general_settings.alert_type_config: `alert_type_config: { llm_requests_hanging: { digest: true, digest_interval: 86400 } }`. Digest key: (alert_type, request_model, api_base). Default interval: 24 hours. Window type: fixed interval. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add blog_posts.json and local backup * feat: add GetBlogPosts utility with GitHub fetch and local fallback Adds GetBlogPosts class that fetches blog posts from GitHub with a 1-hour in-process TTL cache, validates the response, and falls back to the bundled blog_posts_backup.json on any network or validation failure. * test: add cache reset fixture and LITELLM_LOCAL_BLOG_POSTS test Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat: add GET /public/litellm_blog_posts endpoint Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: log fallback warning in blog posts endpoint and tighten test * feat: add disable_show_blog to UISettings Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat: add useUISettings and useDisableShowBlog hooks * fix: rename useUISettings to useUISettingsFlags to avoid naming collision * fix: use existing useUISettings hook in useDisableShowBlog to avoid cache duplication Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat: add BlogDropdown component with react-query and error/retry state Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: enforce 5-post limit in BlogDropdown and add cap test * fix: add retry, stable post key, enabled guard in BlogDropdown Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat: add BlogDropdown to navbar after Docs link * feat: add network_mock transport for benchmarking proxy overhead without real API calls Intercepts at httpx transport layer so the full proxy path (auth, routing, OpenAI SDK, response transformation) is exercised with zero-latency responses. Activated via `litellm_settings: { network_mock: true }` in proxy config. 
* Litellm dev 02 19 2026 p2 (BerriAI#21871) * feat(ui/): new guardrails monitor 'demo mock representation of what guardrails monitor looks like * fix: ui updates * style(ui/): fix styling * feat: enable running ai monitor on individual guardrails * feat: add backend logic for guardrail monitoring * fix(guardrails/usage_endpoints.py): fix usage dashboard * fix(budget): fix timezone config lookup and replace hardcoded timezone map with ZoneInfo (BerriAI#21754) * fix(budget): fix timezone config lookup and replace hardcoded timezone map with ZoneInfo * fix(budget): update stale docstring on get_budget_reset_time * fix: add missing return type annotations to iterator protocol methods in streaming_handler (BerriAI#21750) * fix: add return type annotations to iterator protocol methods in streaming_handler Add missing return type annotations to __iter__, __aiter__, __next__, and __anext__ methods in CustomStreamWrapper and related classes. - __iter__(self) -> Iterator["ModelResponseStream"] - __aiter__(self) -> AsyncIterator["ModelResponseStream"] - __next__(self) -> "ModelResponseStream" - __anext__(self) -> "ModelResponseStream" Also adds AsyncIterator and Iterator to typing imports. Fixes issue with PLR0915 noqa comments and ensures proper type checking support. Related to: BerriAI#8304 * fix: add ruff PLR0915 noqa for files with too many statements * Add gollem Go agent framework cookbook example (BerriAI#21747) Show how to use gollem, a production Go agent framework, with LiteLLM proxy for multi-provider LLM access including tool use and streaming. 
* fix: avoid mutating caller-owned dicts in SpendUpdateQueue aggregation (BerriAI#21742) * fix(vertex_ai): enable context-1m-2025-08-07 beta header (BerriAI#21870) * server root path regression doc * fixing syntax * fix: replace Zapier webhook with Google Form for survey submission (BerriAI#21621) * Replace Zapier webhook with Google Form for survey submission * Add back error logging for survey submission debugging --------- Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> * Revert "Merge pull request BerriAI#21140 from BerriAI/litellm_perf_user_api_key_auth" This reverts commit 0e1db3f, reversing changes made to 7e2d6f2. * test_vertex_ai_gemini_2_5_pro_streaming * UI new build * fix rendering * ui new build * docs fix * docs fix * docs fix * docs fix * docs fix * docs fix * docs fix * docs fix * release note docs * docs * adding image * fix(vertex_ai): enable context-1m-2025-08-07 beta header The `context-1m-2025-08-07` Anthropic beta header was set to `null` for vertex_ai, causing it to be filtered out when users set `extra_headers: {anthropic-beta: context-1m-2025-08-07}`. This prevented using Claude's 1M context window feature via Vertex AI, resulting in `prompt is too long: 460500 tokens > 200000 maximum` errors. Fixes BerriAI#21861 --------- Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com> Co-authored-by: milan-berri <milan@berri.ai> Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> * Revert "fix(vertex_ai): enable context-1m-2025-08-07 beta header (BerriAI#21870)" (BerriAI#21876) This reverts commit bce078a. * docs(ui): add pre-PR checklist to UI contributing guide Add testing and build verification steps per maintainer feedback from @yjiang-litellm. Contributors should run their related tests per-file and ensure npm run build passes before opening PRs. 
* Fix entries with fast and us/ * Add tests for fast and us * Add support for Priority PayGo for vertex ai and gemini * Add model pricing * fix: ensure arrival_time is set before calculating queue time * Fix: Anthropic model wildcard access issue * Add incident report * Add ability to see which model cost map is getting used * Fix name of title * Readd tpm limit * State management fixes for CheckBatchCost * Fix PR review comments * State management fixes for CheckBatchCost - Address greptile comments * fix mypy issues: * Add Noma guardrails v2 based on custom guardrails (BerriAI#21400) * Fix code qa issues * Fix mypy issues * Fix mypy issues * Fix test_aaamodel_prices_and_context_window_json_is_valid * fix: update calendly on repo * fix(tests): use counter-based mock for time.time in prisma self-heal test The test used a fixed side_effect list for time.time(), but the number of calls varies by Python version, causing StopIteration on 3.12 and AssertionError on 3.14. Replace with an infinite counter-based callable and assert the timestamp was updated rather than checking for an exact value. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(tests): use absolute path for model_prices JSON in validation test The test used a relative path 'litellm/model_prices_and_context_window.json' which only works when pytest runs from a specific working directory. Use os.path based on __file__ to resolve the path reliably. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Update tests/test_litellm/test_utils.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix(tests): use os.path instead of Path to avoid NameError Path is not imported at module level. Use os.path.join which is already available. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * clean up mock transport: remove streaming, add defensive parsing * docs: add Google GenAI SDK tutorial (JS & Python) (BerriAI#21885) * docs: add Google GenAI SDK tutorial for JS and Python Add tutorial for using Google's official GenAI SDK (@google/genai for JS, google-genai for Python) with LiteLLM proxy. Covers pass-through and native router endpoints, streaming, multi-turn chat, and multi-provider routing via model_group_alias. Also updates pass-through docs to use the new SDK replacing the deprecated @google/generative-ai. * fix(docs): correct Python SDK env var name in GenAI tutorial GOOGLE_GENAI_API_KEY does not exist in the google-genai SDK. The correct env var is GEMINI_API_KEY (or GOOGLE_API_KEY). Also note that the Python SDK has no base URL env var. * fix(docs): replace non-existent GOOGLE_GENAI_BASE_URL env var in interactions.md The Python google-genai SDK does not read GOOGLE_GENAI_BASE_URL. Use http_options={"base_url": "..."} in code instead. * docs: add network mock benchmarking section * docs: tweak benchmarks wording * fix: add auth headers and empty latencies guard to benchmark script * refactor: use method-level import for MockOpenAITransport * fix: guard print_aggregate against empty latencies * fix: add INCOMPLETE status to Interactions API enum and test Google added INCOMPLETE to the Interactions API OpenAPI spec status enum. Update both the Status3 enum in the SDK types and the test's expected values to match. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Guardrail Monitor - measure guardrail reliability in prod (BerriAI#21944) * fix: fix log viewer for guardrail monitoring * feat(ui/): fix rendering logs per guardrail * fix: fix viewing logs on overview tab of guardrail * fix: log viewer * fix: fix naming to align with metric * docs: add performance & reliability section to v1.81.14 release notes * fix(tests): make RPM limit test sequential to avoid race condition Concurrent requests via run_in_executor + asyncio.gather caused a race condition where more requests slipped through the rate limiter than expected, leading to flaky test failures (e.g. 3 successes instead of 2 with rpm_limit=2). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: Singapore guardrail policies (PDPA + MAS AI Risk Management) (BerriAI#21948) * feat: Singapore PDPA PII protection guardrail policy template Add Singapore Personal Data Protection Act (PDPA) guardrail support: Regex patterns (patterns.json): - sg_nric: NRIC/FIN detection ([STFGM] + 7 digits + checksum letter) - sg_phone: Singapore phone numbers (+65/0065/65 prefix) - sg_postal_code: 6-digit postal codes (contextual) - passport_singapore: Passport numbers (E/K + 7 digits, contextual) - sg_uen: Unique Entity Numbers (3 formats) - sg_bank_account: Bank account numbers (dash format, contextual) YAML policy templates (5 sub-guardrails): - sg_pdpa_personal_identifiers: s.13 Consent - sg_pdpa_sensitive_data: Advisory Guidelines - sg_pdpa_do_not_call: Part IX DNC Registry - sg_pdpa_data_transfer: s.26 overseas transfers - sg_pdpa_profiling_automated_decisions: Model AI Governance Framework Policy template entry in policy_templates.json with 9 guardrail definitions (4 regex-based + 5 YAML conditional keyword matching). 
Tests: - test_sg_patterns.py: regex pattern unit tests - test_sg_pdpa_guardrails.py: conditional keyword matching tests (100+ cases) * feat: MAS AI Risk Management Guidelines guardrail policy template Add Monetary Authority of Singapore (MAS) AI Risk Management Guidelines guardrail support for financial institutions: YAML policy templates (5 sub-guardrails): - sg_mas_fairness_bias: Blocks discriminatory financial AI (credit/loans/insurance by protected attributes) - sg_mas_transparency_explainability: Blocks opaque/unexplainable AI for consequential financial decisions - sg_mas_human_oversight: Blocks fully automated financial decisions without human-in-the-loop - sg_mas_data_governance: Blocks unauthorized sharing/mishandling of financial customer data - sg_mas_model_security: Blocks adversarial attacks, model poisoning, inversion on financial AI Policy template entry in policy_templates.json with 5 guardrail definitions. Aligned with MAS FEAT Principles, Project MindForge, and NIST AI RMF. Tests: - test_sg_mas_ai_guardrails.py: conditional keyword matching tests (100+ cases) * fix: address SG pattern review feedback - Update NRIC lowercase test for IGNORECASE runtime behavior - Add keyword context guard to sg_uen pattern to reduce false positives * docs: clarify MAS AIRM timeline references - Explicitly mark MAS AIRM as Nov 2025 consultation draft - Add 2018 qualifier for FEAT principles in MAS policy descriptions - Update MAS guardrail wording to avoid release-year ambiguity * chore: commit resolved MAS policy conflicts * test: * chore: * Add OpenAI Agents SDK tutorial with LiteLLM Proxy to docs (BerriAI#21221) * Add OpenAI Agents SDK tutorial to docs * Update OpenAI Agents SDK tutorial to use LiteLLM environment variables * Enhance OpenAI Agents SDK tutorial with built-in LiteLLM extension details and updated configuration steps. Adjust section headings for clarity and improve the flow of information regarding model setup and usage. 
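For reference, the NRIC/FIN shape described in the commit message above ([STFGM] + 7 digits + checksum letter, matched case-insensitively at runtime) can be sketched as below. This is a minimal illustration only: the pattern string, `NRIC_SKETCH`, `find_nric_candidates`, and `mask_nric` are hypothetical names and are not the entries shipped in patterns.json, and the sketch checks only the format, not the checksum letter.

```python
import re

# Hypothetical sketch of the described shape: prefix letter, 7 digits, one letter.
# IGNORECASE mirrors the "IGNORECASE runtime behavior" noted in the review fix.
NRIC_SKETCH = re.compile(r"\b[STFGM]\d{7}[A-Z]\b", re.IGNORECASE)


def find_nric_candidates(text: str) -> list:
    """Return every substring that has the NRIC/FIN format (no checksum validation)."""
    return [m.group(0) for m in NRIC_SKETCH.finditer(text)]


def mask_nric(text: str) -> str:
    """Replace each format match with a fixed placeholder."""
    return NRIC_SKETCH.sub("[NRIC_REDACTED]", text)
```

Because the sketch is format-only, it will also flag strings that merely look like an NRIC; a stricter guardrail would need checksum validation or surrounding keyword context.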
* adjust blog posts to fetch from github first * feat(videos): add variant parameter to video content download (BerriAI#21955) openai videos models support the features to download variants. See more details here: https://developers.openai.com/api/docs/guides/video-generation#use-image-references. Plumb variant (e.g. "thumbnail", "spritesheet") through the full video content download chain: avideo_content → video_content → video_content_handler → transform_video_content_request. OpenAI appends ?variant=<value> to the GET URL; other providers accept the parameter in their signature but ignore it. * fixing path * adjust blog post path * Revert duplicate issue checker to text-based matching, remove duplicate PR workflow Remove the Claude Code-powered duplicate PR detection workflow and revert the duplicate issue checker back to wow-actions/potential-duplicates with text similarity matching. * ui changes * adding tests * adjust default aggregation threshold * fix(videos): pass api_key from litellm_params to video remix handlers (BerriAI#21965) video_remix_handler and async_video_remix_handler were not falling back to litellm_params.api_key when the api_key parameter was None, causing Authorization: Bearer None to be sent to the provider. This matches the pattern already used by async_video_generation_handler. * adding testing coverage + fixing flaky tests * fix(ollama): thread api_base through get_model_info and add graceful fallback When users pass api_base to litellm.completion() for Ollama, the model info fetch (context window, function_calling support) was ignoring the user's api_base and only reading OLLAMA_API_BASE env var or defaulting to localhost:11434. This caused confusing errors in logs when Ollama runs on a remote server. Thread api_base from litellm_params through the get_model_info call chain so OllamaConfig.get_model_info() uses the correct server. Also return safe defaults instead of raising when the server is unreachable. 
Fixes BerriAI#21967 --------- Co-authored-by: An Tang <ta@stripe.com> Co-authored-by: janfrederickk <75388864+janfrederickk@users.noreply.github.com> Co-authored-by: Zhenting Huang <3061613175@qq.com> Co-authored-by: Darien Kindlund <darien@kindlund.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com> Co-authored-by: Ryan Crabbe <rcrabbe@berkeley.edu> Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> Co-authored-by: LeeJuOh <56071126+LeeJuOh@users.noreply.github.com> Co-authored-by: Monesh Ram <31161039+WhoisMonesh@users.noreply.github.com> Co-authored-by: Trevor Prater <trevor.prater@gmail.com> Co-authored-by: The Mavik <179817126+themavik@users.noreply.github.com> Co-authored-by: Edwin Isac <33712823+edwiniac@users.noreply.github.com> Co-authored-by: milan-berri <milan@berri.ai> Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> Co-authored-by: Sameer Kankute <sameer@berri.ai> Co-authored-by: Harshit Jain <harshitjain0562@gmail.com> Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com> Co-authored-by: Ephrim Stanley <ephrim.stanley@point72.com> Co-authored-by: TomAlon <tom@noma.security> Co-authored-by: Julio Quinteros Pro <jquinter@gmail.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com> Co-authored-by: Ron Zhong <ron-zhong@hotmail.com> Co-authored-by: Arindam Majumder <109217591+Arindam200@users.noreply.github.com> Co-authored-by: Lei Nie <lenie@quora.com>
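The counter-based `time.time` mock mentioned in the prisma self-heal test fix above can be sketched roughly as follows. The helper names (`make_counter_clock`, `sample_clock`) and the start/step values are assumptions for illustration, not the code in the test file. The point is that a callable `side_effect` never runs out of values, whereas a fixed list raises `StopIteration` once the number of `time.time()` calls exceeds its length.

```python
import itertools
import time
from unittest import mock


def make_counter_clock(start: float = 1000.0, step: float = 1.0):
    """Build a fake time.time() that advances by `step` on every call."""
    counter = itertools.count()

    def fake_time() -> float:
        return start + step * next(counter)

    return fake_time


def sample_clock(calls: int) -> list:
    """Call time.time() `calls` times while it is patched with the counter clock."""
    with mock.patch("time.time", side_effect=make_counter_clock()):
        return [time.time() for _ in range(calls)]
```

With this kind of clock a test can assert that a timestamp moved forward, rather than comparing against an exact value that depends on how many times the interpreter happens to call `time.time()`.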

…voke (BerriAI#21964) * auth_with_role_name add region_name arg for cross-account sts * update tests to include case with aws_region_name for _auth_with_aws_role * Only pass region_name to STS client when aws_region_name is set * Add optional aws_sts_endpoint to _auth_with_aws_role * Parametrize ambient-credentials test for no opts, region_name, and aws_sts_endpoint * consistently passing region and endpoint args into explicit credentials irsa * fix env var leakage * fix: bedrock openai-compatible imported-model should also have model arn encoded * feat: show proxy url in ModelHub (BerriAI#21660) * fix(bedrock): correct modelInput format for Converse API batch models (BerriAI#21656) * fix(proxy): add model_ids param to access group endpoints for precise deployment tagging (BerriAI#21655) POST /access_group/new and PUT /access_group/{name}/update now accept an optional model_ids list that targets specific deployments by their unique model_id, instead of tagging every deployment that shares a model_name. When model_ids is provided it takes priority over model_names, giving API callers the same single-deployment precision that the UI already has via PATCH /model/{model_id}/update. Backward compatible: model_names continues to work as before. Closes BerriAI#21544 * feat(proxy): add custom favicon support Add ability to configure a custom favicon for the litellm proxy UI. - Add favicon_url field to UIThemeConfig model - Add LITELLM_FAVICON_URL env var support - Add /get_favicon endpoint to serve custom favicons - Update ThemeContext to dynamically set favicon - Add favicon URL input to UI theme settings page - Add comprehensive tests Closes BerriAI#8323 (BerriAI#21653) * fix(bedrock): prevent double UUID in create_file S3 key (BerriAI#21650) In create_file for Bedrock, get_complete_file_url is called twice: once in the sync handler (generating UUID-1 for api_base) and once inside transform_create_file_request (generating UUID-2 for the actual S3 upload). 
The Bedrock provider correctly writes UUID-2 into litellm_params["upload_url"], but the sync handler unconditionally overwrites it with api_base (UUID-1). This causes the returned file_id to point to a non-existent S3 key. Fix: only set upload_url to api_base when transform_create_file_request has not already set it, preserving the Bedrock provider's value. Closes BerriAI#21546 * feat(semantic-cache): support configurable vector dimensions for Qdrant (BerriAI#21649) Add vector_size parameter to QdrantSemanticCache and expose it through the Cache facade as qdrant_semantic_cache_vector_size. This allows users to use embedding models with dimensions other than the default 1536, enabling cheaper/stronger models like Stella (1024d), bge-en-icl (4096d), voyage, cohere, etc. The parameter defaults to QDRANT_VECTOR_SIZE (env var or 1536) for backward compatibility. When creating new collections, the configured vector_size is used instead of the hardcoded constant. Closes BerriAI#9377 * fix(utils): normalize camelCase thinking param keys to snake_case (BerriAI#21762) Clients like OpenCode's @ai-sdk/openai-compatible send budgetTokens (camelCase) instead of budget_tokens in the thinking parameter, causing validation errors. Add early normalization in completion(). * feat: add optional digest mode for Slack alert types (BerriAI#21683) Adds per-alert-type digest mode that aggregates duplicate alerts within a configurable time window and emits a single summary message with count, start/end timestamps. 
Configuration via general_settings.alert_type_config: alert_type_config: llm_requests_hanging: digest: true digest_interval: 86400 Digest key: (alert_type, request_model, api_base) Default interval: 24 hours Window type: fixed interval Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add blog_posts.json and local backup * feat: add GetBlogPosts utility with GitHub fetch and local fallback Adds GetBlogPosts class that fetches blog posts from GitHub with a 1-hour in-process TTL cache, validates the response, and falls back to the bundled blog_posts_backup.json on any network or validation failure. * test: add cache reset fixture and LITELLM_LOCAL_BLOG_POSTS test Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat: add GET /public/litellm_blog_posts endpoint Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: log fallback warning in blog posts endpoint and tighten test * feat: add disable_show_blog to UISettings Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat: add useUISettings and useDisableShowBlog hooks * fix: rename useUISettings to useUISettingsFlags to avoid naming collision * fix: use existing useUISettings hook in useDisableShowBlog to avoid cache duplication Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat: add BlogDropdown component with react-query and error/retry state Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: enforce 5-post limit in BlogDropdown and add cap test * fix: add retry, stable post key, enabled guard in BlogDropdown Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat: add BlogDropdown to navbar after Docs link * feat: add network_mock transport for benchmarking proxy overhead without real API calls Intercepts at httpx transport layer so the full proxy path (auth, routing, OpenAI SDK, response transformation) is exercised with zero-latency responses. Activated via `litellm_settings: { network_mock: true }` in proxy config. 
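The Slack digest mode described in this commit message (digest key `(alert_type, request_model, api_base)`, fixed interval window, one summary with count and start/end timestamps) might be modelled roughly as follows. This is a sketch under stated assumptions: `AlertDigest`, `DigestEntry`, `record`, and `flush_due` are hypothetical names, and the real implementation in the proxy may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical digest key, following the commit message: (alert_type, request_model, api_base)
DigestKey = Tuple[str, str, str]


@dataclass
class DigestEntry:
    count: int
    first_seen: float
    last_seen: float


class AlertDigest:
    """Aggregate duplicate alerts per key inside a fixed window (default 24 hours)."""

    def __init__(self, interval_seconds: float = 86400.0) -> None:
        self.interval_seconds = interval_seconds
        self._entries: Dict[DigestKey, DigestEntry] = {}

    def record(self, alert_type: str, request_model: str, api_base: str, now: float) -> None:
        """Count one alert; the window for a key opens at its first alert."""
        key = (alert_type, request_model, api_base)
        entry = self._entries.get(key)
        if entry is None:
            self._entries[key] = DigestEntry(count=1, first_seen=now, last_seen=now)
        else:
            entry.count += 1
            entry.last_seen = now

    def flush_due(self, now: float) -> List[dict]:
        """Emit one summary per key whose fixed window has elapsed, then reset that key."""
        summaries = []
        for key in list(self._entries):
            entry = self._entries[key]
            if now - entry.first_seen >= self.interval_seconds:
                summaries.append(
                    {"key": key, "count": entry.count,
                     "start": entry.first_seen, "end": entry.last_seen}
                )
                del self._entries[key]
        return summaries
```

The window here is fixed rather than sliding: later duplicates update the count and end timestamp but do not extend the deadline set by the first alert.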
* Litellm dev 02 19 2026 p2 (BerriAI#21871) * feat(ui/): new guardrails monitor 'demo mock representation of what guardrails monitor looks like * fix: ui updates * style(ui/): fix styling * feat: enable running ai monitor on individual guardrails * feat: add backend logic for guardrail monitoring * fix(guardrails/usage_endpoints.py): fix usage dashboard * fix(budget): fix timezone config lookup and replace hardcoded timezone map with ZoneInfo (BerriAI#21754) * fix(budget): fix timezone config lookup and replace hardcoded timezone map with ZoneInfo * fix(budget): update stale docstring on get_budget_reset_time * fix: add missing return type annotations to iterator protocol methods in streaming_handler (BerriAI#21750) * fix: add return type annotations to iterator protocol methods in streaming_handler Add missing return type annotations to __iter__, __aiter__, __next__, and __anext__ methods in CustomStreamWrapper and related classes. - __iter__(self) -> Iterator["ModelResponseStream"] - __aiter__(self) -> AsyncIterator["ModelResponseStream"] - __next__(self) -> "ModelResponseStream" - __anext__(self) -> "ModelResponseStream" Also adds AsyncIterator and Iterator to typing imports. Fixes issue with PLR0915 noqa comments and ensures proper type checking support. Related to: BerriAI#8304 * fix: add ruff PLR0915 noqa for files with too many statements * Add gollem Go agent framework cookbook example (BerriAI#21747) Show how to use gollem, a production Go agent framework, with LiteLLM proxy for multi-provider LLM access including tool use and streaming. 
* fix: avoid mutating caller-owned dicts in SpendUpdateQueue aggregation (BerriAI#21742) * fix(vertex_ai): enable context-1m-2025-08-07 beta header (BerriAI#21870) * server root path regression doc * fixing syntax * fix: replace Zapier webhook with Google Form for survey submission (BerriAI#21621) * Replace Zapier webhook with Google Form for survey submission * Add back error logging for survey submission debugging --------- Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> * Revert "Merge pull request BerriAI#21140 from BerriAI/litellm_perf_user_api_key_auth" This reverts commit 0e1db3f, reversing changes made to 7e2d6f2. * test_vertex_ai_gemini_2_5_pro_streaming * UI new build * fix rendering * ui new build * docs fix * docs fix * docs fix * docs fix * docs fix * docs fix * docs fix * docs fix * release note docs * docs * adding image * fix(vertex_ai): enable context-1m-2025-08-07 beta header The `context-1m-2025-08-07` Anthropic beta header was set to `null` for vertex_ai, causing it to be filtered out when users set `extra_headers: {anthropic-beta: context-1m-2025-08-07}`. This prevented using Claude's 1M context window feature via Vertex AI, resulting in `prompt is too long: 460500 tokens > 200000 maximum` errors. Fixes BerriAI#21861 --------- Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com> Co-authored-by: milan-berri <milan@berri.ai> Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> * Revert "fix(vertex_ai): enable context-1m-2025-08-07 beta header (BerriAI#21870)" (BerriAI#21876) This reverts commit bce078a. * docs(ui): add pre-PR checklist to UI contributing guide Add testing and build verification steps per maintainer feedback from @yjiang-litellm. Contributors should run their related tests per-file and ensure npm run build passes before opening PRs. 
* Fix entries with fast and us/ * Add tests for fast and us * Add support for Priority PayGo for vertex ai and gemini * Add model pricing * fix: ensure arrival_time is set before calculating queue time * Fix: Anthropic model wildcard access issue * Add incident report * Add ability to see which model cost map is getting used * Fix name of title * Readd tpm limit * State management fixes for CheckBatchCost * Fix PR review comments * State management fixes for CheckBatchCost - Address greptile comments * fix mypy issues: * Add Noma guardrails v2 based on custom guardrails (BerriAI#21400) * Fix code qa issues * Fix mypy issues * Fix mypy issues * Fix test_aaamodel_prices_and_context_window_json_is_valid * fix: update calendly on repo * fix(tests): use counter-based mock for time.time in prisma self-heal test The test used a fixed side_effect list for time.time(), but the number of calls varies by Python version, causing StopIteration on 3.12 and AssertionError on 3.14. Replace with an infinite counter-based callable and assert the timestamp was updated rather than checking for an exact value. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(tests): use absolute path for model_prices JSON in validation test The test used a relative path 'litellm/model_prices_and_context_window.json' which only works when pytest runs from a specific working directory. Use os.path based on __file__ to resolve the path reliably. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Update tests/test_litellm/test_utils.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix(tests): use os.path instead of Path to avoid NameError Path is not imported at module level. Use os.path.join which is already available. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * clean up mock transport: remove streaming, add defensive parsing * docs: add Google GenAI SDK tutorial (JS & Python) (BerriAI#21885) * docs: add Google GenAI SDK tutorial for JS and Python Add tutorial for using Google's official GenAI SDK (@google/genai for JS, google-genai for Python) with LiteLLM proxy. Covers pass-through and native router endpoints, streaming, multi-turn chat, and multi-provider routing via model_group_alias. Also updates pass-through docs to use the new SDK replacing the deprecated @google/generative-ai. * fix(docs): correct Python SDK env var name in GenAI tutorial GOOGLE_GENAI_API_KEY does not exist in the google-genai SDK. The correct env var is GEMINI_API_KEY (or GOOGLE_API_KEY). Also note that the Python SDK has no base URL env var. * fix(docs): replace non-existent GOOGLE_GENAI_BASE_URL env var in interactions.md The Python google-genai SDK does not read GOOGLE_GENAI_BASE_URL. Use http_options={"base_url": "..."} in code instead. * docs: add network mock benchmarking section * docs: tweak benchmarks wording * fix: add auth headers and empty latencies guard to benchmark script * refactor: use method-level import for MockOpenAITransport * fix: guard print_aggregate against empty latencies * fix: add INCOMPLETE status to Interactions API enum and test Google added INCOMPLETE to the Interactions API OpenAPI spec status enum. Update both the Status3 enum in the SDK types and the test's expected values to match. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Guardrail Monitor - measure guardrail reliability in prod (BerriAI#21944) * fix: fix log viewer for guardrail monitoring * feat(ui/): fix rendering logs per guardrail * fix: fix viewing logs on overview tab of guardrail * fix: log viewer * fix: fix naming to align with metric * docs: add performance & reliability section to v1.81.14 release notes * fix(tests): make RPM limit test sequential to avoid race condition Concurrent requests via run_in_executor + asyncio.gather caused a race condition where more requests slipped through the rate limiter than expected, leading to flaky test failures (e.g. 3 successes instead of 2 with rpm_limit=2). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: Singapore guardrail policies (PDPA + MAS AI Risk Management) (BerriAI#21948) * feat: Singapore PDPA PII protection guardrail policy template Add Singapore Personal Data Protection Act (PDPA) guardrail support: Regex patterns (patterns.json): - sg_nric: NRIC/FIN detection ([STFGM] + 7 digits + checksum letter) - sg_phone: Singapore phone numbers (+65/0065/65 prefix) - sg_postal_code: 6-digit postal codes (contextual) - passport_singapore: Passport numbers (E/K + 7 digits, contextual) - sg_uen: Unique Entity Numbers (3 formats) - sg_bank_account: Bank account numbers (dash format, contextual) YAML policy templates (5 sub-guardrails): - sg_pdpa_personal_identifiers: s.13 Consent - sg_pdpa_sensitive_data: Advisory Guidelines - sg_pdpa_do_not_call: Part IX DNC Registry - sg_pdpa_data_transfer: s.26 overseas transfers - sg_pdpa_profiling_automated_decisions: Model AI Governance Framework Policy template entry in policy_templates.json with 9 guardrail definitions (4 regex-based + 5 YAML conditional keyword matching). 
Tests: - test_sg_patterns.py: regex pattern unit tests - test_sg_pdpa_guardrails.py: conditional keyword matching tests (100+ cases) * feat: MAS AI Risk Management Guidelines guardrail policy template Add Monetary Authority of Singapore (MAS) AI Risk Management Guidelines guardrail support for financial institutions: YAML policy templates (5 sub-guardrails): - sg_mas_fairness_bias: Blocks discriminatory financial AI (credit/loans/insurance by protected attributes) - sg_mas_transparency_explainability: Blocks opaque/unexplainable AI for consequential financial decisions - sg_mas_human_oversight: Blocks fully automated financial decisions without human-in-the-loop - sg_mas_data_governance: Blocks unauthorized sharing/mishandling of financial customer data - sg_mas_model_security: Blocks adversarial attacks, model poisoning, inversion on financial AI Policy template entry in policy_templates.json with 5 guardrail definitions. Aligned with MAS FEAT Principles, Project MindForge, and NIST AI RMF. Tests: - test_sg_mas_ai_guardrails.py: conditional keyword matching tests (100+ cases) * fix: address SG pattern review feedback - Update NRIC lowercase test for IGNORECASE runtime behavior - Add keyword context guard to sg_uen pattern to reduce false positives * docs: clarify MAS AIRM timeline references - Explicitly mark MAS AIRM as Nov 2025 consultation draft - Add 2018 qualifier for FEAT principles in MAS policy descriptions - Update MAS guardrail wording to avoid release-year ambiguity * chore: commit resolved MAS policy conflicts * test: * chore: * Add OpenAI Agents SDK tutorial with LiteLLM Proxy to docs (BerriAI#21221) * Add OpenAI Agents SDK tutorial to docs * Update OpenAI Agents SDK tutorial to use LiteLLM environment variables * Enhance OpenAI Agents SDK tutorial with built-in LiteLLM extension details and updated configuration steps. Adjust section headings for clarity and improve the flow of information regarding model setup and usage. 
* adjust blog posts to fetch from github first * feat(videos): add variant parameter to video content download (BerriAI#21955) openai videos models support the features to download variants. See more details here: https://developers.openai.com/api/docs/guides/video-generation#use-image-references. Plumb variant (e.g. "thumbnail", "spritesheet") through the full video content download chain: avideo_content → video_content → video_content_handler → transform_video_content_request. OpenAI appends ?variant=<value> to the GET URL; other providers accept the parameter in their signature but ignore it. * fixing path * adjust blog post path * Revert duplicate issue checker to text-based matching, remove duplicate PR workflow Remove the Claude Code-powered duplicate PR detection workflow and revert the duplicate issue checker back to wow-actions/potential-duplicates with text similarity matching. * ui changes * adding tests * fix(anthropic): sanitize tool_use IDs in assistant messages Apply _sanitize_anthropic_tool_use_id to tool_use blocks in convert_to_anthropic_tool_invoke, not just tool_result blocks. IDs from external frameworks (e.g. MiniMax) may contain characters like colons that violate Anthropic's ^[a-zA-Z0-9_-]+$ pattern. Adds test for invalid ID sanitization in tool_use blocks. 
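As an illustration of the tool_use ID fix above, the sketch below rewrites an ID so that it satisfies the `^[a-zA-Z0-9_-]+$` pattern quoted in the commit message. The replacement strategy (substituting `_` for each disallowed character, and returning `_` for an empty result) is an assumption; the behavior of the actual `_sanitize_anthropic_tool_use_id` helper is not shown in this thread, and the function names here are hypothetical.

```python
import re

# Characters outside the allowed set quoted in the commit message.
_DISALLOWED = re.compile(r"[^a-zA-Z0-9_-]")
_ALLOWED_ID = re.compile(r"[a-zA-Z0-9_-]+")


def is_valid_tool_use_id(tool_id: str) -> bool:
    """True when the whole ID consists only of allowed characters."""
    return _ALLOWED_ID.fullmatch(tool_id) is not None


def sanitize_tool_use_id_sketch(tool_id: str) -> str:
    """Replace each disallowed character with '_' (assumed strategy, for illustration)."""
    cleaned = _DISALLOWED.sub("_", tool_id)
    return cleaned or "_"  # keep the result non-empty so it still matches the pattern
```

Whatever strategy is used, the same mapping has to be applied to both the tool_use block and its matching tool_result block, otherwise the two IDs no longer pair up.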
--------- Co-authored-by: An Tang <ta@stripe.com> Co-authored-by: janfrederickk <75388864+janfrederickk@users.noreply.github.com> Co-authored-by: Zhenting Huang <3061613175@qq.com> Co-authored-by: Cesar Garcia <128240629+Chesars@users.noreply.github.com> Co-authored-by: Darien Kindlund <darien@kindlund.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com> Co-authored-by: Ryan Crabbe <rcrabbe@berkeley.edu> Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> Co-authored-by: LeeJuOh <56071126+LeeJuOh@users.noreply.github.com> Co-authored-by: Monesh Ram <31161039+WhoisMonesh@users.noreply.github.com> Co-authored-by: Trevor Prater <trevor.prater@gmail.com> Co-authored-by: The Mavik <179817126+themavik@users.noreply.github.com> Co-authored-by: Edwin Isac <33712823+edwiniac@users.noreply.github.com> Co-authored-by: milan-berri <milan@berri.ai> Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> Co-authored-by: Chesars <cesarponce19544@gmail.com> Co-authored-by: Sameer Kankute <sameer@berri.ai> Co-authored-by: Harshit Jain <harshitjain0562@gmail.com> Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com> Co-authored-by: Ephrim Stanley <ephrim.stanley@point72.com> Co-authored-by: TomAlon <tom@noma.security> Co-authored-by: Julio Quinteros Pro <jquinter@gmail.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com> Co-authored-by: Ron Zhong <ron-zhong@hotmail.com> Co-authored-by: Arindam Majumder <109217591+Arindam200@users.noreply.github.com> Co-authored-by: Lei Nie <lenie@quora.com>

ron-zhong · 2026-02-26T08:41:55Z

Relevant PRs:

…21970) * auth_with_role_name add region_name arg for cross-account sts * update tests to include case with aws_region_name for _auth_with_aws_role * Only pass region_name to STS client when aws_region_name is set * Add optional aws_sts_endpoint to _auth_with_aws_role * Parametrize ambient-credentials test for no opts, region_name, and aws_sts_endpoint * consistently passing region and endpoint args into explicit credentials irsa * fix env var leakage * fix: bedrock openai-compatible imported-model should also have model arn encoded * feat: show proxy url in ModelHub (#21660) * fix(bedrock): correct modelInput format for Converse API batch models (#21656) * fix(proxy): add model_ids param to access group endpoints for precise deployment tagging (#21655) POST /access_group/new and PUT /access_group/{name}/update now accept an optional model_ids list that targets specific deployments by their unique model_id, instead of tagging every deployment that shares a model_name. When model_ids is provided it takes priority over model_names, giving API callers the same single-deployment precision that the UI already has via PATCH /model/{model_id}/update. Backward compatible: model_names continues to work as before. Closes #21544 * feat(proxy): add custom favicon support Add ability to configure a custom favicon for the litellm proxy UI. - Add favicon_url field to UIThemeConfig model - Add LITELLM_FAVICON_URL env var support - Add /get_favicon endpoint to serve custom favicons - Update ThemeContext to dynamically set favicon - Add favicon URL input to UI theme settings page - Add comprehensive tests Closes #8323 (#21653) * fix(bedrock): prevent double UUID in create_file S3 key (#21650) In create_file for Bedrock, get_complete_file_url is called twice: once in the sync handler (generating UUID-1 for api_base) and once inside transform_create_file_request (generating UUID-2 for the actual S3 upload). 
The Bedrock provider correctly writes UUID-2 into litellm_params["upload_url"], but the sync handler unconditionally overwrites it with api_base (UUID-1). This causes the returned file_id to point to a non-existent S3 key. Fix: only set upload_url to api_base when transform_create_file_request has not already set it, preserving the Bedrock provider's value. Closes #21546 * feat(semantic-cache): support configurable vector dimensions for Qdrant (#21649) Add vector_size parameter to QdrantSemanticCache and expose it through the Cache facade as qdrant_semantic_cache_vector_size. This allows users to use embedding models with dimensions other than the default 1536, enabling cheaper/stronger models like Stella (1024d), bge-en-icl (4096d), voyage, cohere, etc. The parameter defaults to QDRANT_VECTOR_SIZE (env var or 1536) for backward compatibility. When creating new collections, the configured vector_size is used instead of the hardcoded constant. Closes #9377 * fix(utils): normalize camelCase thinking param keys to snake_case (#21762) Clients like OpenCode's @ai-sdk/openai-compatible send budgetTokens (camelCase) instead of budget_tokens in the thinking parameter, causing validation errors. Add early normalization in completion(). * feat: add optional digest mode for Slack alert types (#21683) Adds per-alert-type digest mode that aggregates duplicate alerts within a configurable time window and emits a single summary message with count, start/end timestamps. 
Configuration via general_settings.alert_type_config: alert_type_config: llm_requests_hanging: digest: true digest_interval: 86400 Digest key: (alert_type, request_model, api_base) Default interval: 24 hours Window type: fixed interval Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add blog_posts.json and local backup * feat: add GetBlogPosts utility with GitHub fetch and local fallback Adds GetBlogPosts class that fetches blog posts from GitHub with a 1-hour in-process TTL cache, validates the response, and falls back to the bundled blog_posts_backup.json on any network or validation failure. * test: add cache reset fixture and LITELLM_LOCAL_BLOG_POSTS test Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat: add GET /public/litellm_blog_posts endpoint Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: log fallback warning in blog posts endpoint and tighten test * feat: add disable_show_blog to UISettings Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat: add useUISettings and useDisableShowBlog hooks * fix: rename useUISettings to useUISettingsFlags to avoid naming collision * fix: use existing useUISettings hook in useDisableShowBlog to avoid cache duplication Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat: add BlogDropdown component with react-query and error/retry state Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: enforce 5-post limit in BlogDropdown and add cap test * fix: add retry, stable post key, enabled guard in BlogDropdown Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat: add BlogDropdown to navbar after Docs link * feat: add network_mock transport for benchmarking proxy overhead without real API calls Intercepts at httpx transport layer so the full proxy path (auth, routing, OpenAI SDK, response transformation) is exercised with zero-latency responses. Activated via `litellm_settings: { network_mock: true }` in proxy config. 
* Litellm dev 02 19 2026 p2 (#21871) * feat(ui/): new guardrails monitor 'demo mock representation of what guardrails monitor looks like * fix: ui updates * style(ui/): fix styling * feat: enable running ai monitor on individual guardrails * feat: add backend logic for guardrail monitoring * fix(guardrails/usage_endpoints.py): fix usage dashboard * fix(budget): fix timezone config lookup and replace hardcoded timezone map with ZoneInfo (#21754) * fix(budget): fix timezone config lookup and replace hardcoded timezone map with ZoneInfo * fix(budget): update stale docstring on get_budget_reset_time * fix: add missing return type annotations to iterator protocol methods in streaming_handler (#21750) * fix: add return type annotations to iterator protocol methods in streaming_handler Add missing return type annotations to __iter__, __aiter__, __next__, and __anext__ methods in CustomStreamWrapper and related classes. - __iter__(self) -> Iterator["ModelResponseStream"] - __aiter__(self) -> AsyncIterator["ModelResponseStream"] - __next__(self) -> "ModelResponseStream" - __anext__(self) -> "ModelResponseStream" Also adds AsyncIterator and Iterator to typing imports. Fixes issue with PLR0915 noqa comments and ensures proper type checking support. Related to: #8304 * fix: add ruff PLR0915 noqa for files with too many statements * Add gollem Go agent framework cookbook example (#21747) Show how to use gollem, a production Go agent framework, with LiteLLM proxy for multi-provider LLM access including tool use and streaming. 
* fix: avoid mutating caller-owned dicts in SpendUpdateQueue aggregation (#21742) * fix(vertex_ai): enable context-1m-2025-08-07 beta header (#21870) * server root path regression doc * fixing syntax * fix: replace Zapier webhook with Google Form for survey submission (#21621) * Replace Zapier webhook with Google Form for survey submission * Add back error logging for survey submission debugging --------- Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> * Revert "Merge pull request #21140 from BerriAI/litellm_perf_user_api_key_auth" This reverts commit 0e1db3f, reversing changes made to 7e2d6f2. * test_vertex_ai_gemini_2_5_pro_streaming * UI new build * fix rendering * ui new build * docs fix * docs fix * docs fix * docs fix * docs fix * docs fix * docs fix * docs fix * release note docs * docs * adding image * fix(vertex_ai): enable context-1m-2025-08-07 beta header The `context-1m-2025-08-07` Anthropic beta header was set to `null` for vertex_ai, causing it to be filtered out when users set `extra_headers: {anthropic-beta: context-1m-2025-08-07}`. This prevented using Claude's 1M context window feature via Vertex AI, resulting in `prompt is too long: 460500 tokens > 200000 maximum` errors. Fixes #21861 --------- Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com> Co-authored-by: milan-berri <milan@berri.ai> Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> * Revert "fix(vertex_ai): enable context-1m-2025-08-07 beta header (#21870)" (#21876) This reverts commit bce078a. * docs(ui): add pre-PR checklist to UI contributing guide Add testing and build verification steps per maintainer feedback from @yjiang-litellm. Contributors should run their related tests per-file and ensure npm run build passes before opening PRs. 
* Fix entries with fast and us/ * Add tests for fast and us * Add support for Priority PayGo for vertex ai and gemini * Add model pricing * fix: ensure arrival_time is set before calculating queue time * Fix: Anthropic model wildcard access issue * Add incident report * Add ability to see which model cost map is getting used * Fix name of title * Readd tpm limit * State management fixes for CheckBatchCost * Fix PR review comments * State management fixes for CheckBatchCost - Address greptile comments * fix mypy issues: * Add Noma guardrails v2 based on custom guardrails (#21400) * Fix code qa issues * Fix mypy issues * Fix mypy issues * Fix test_aaamodel_prices_and_context_window_json_is_valid * fix: update calendly on repo * fix(tests): use counter-based mock for time.time in prisma self-heal test The test used a fixed side_effect list for time.time(), but the number of calls varies by Python version, causing StopIteration on 3.12 and AssertionError on 3.14. Replace with an infinite counter-based callable and assert the timestamp was updated rather than checking for an exact value. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(tests): use absolute path for model_prices JSON in validation test The test used a relative path 'litellm/model_prices_and_context_window.json' which only works when pytest runs from a specific working directory. Use os.path based on __file__ to resolve the path reliably. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Update tests/test_litellm/test_utils.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix(tests): use os.path instead of Path to avoid NameError Path is not imported at module level. Use os.path.join which is already available. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * clean up mock transport: remove streaming, add defensive parsing * docs: add Google GenAI SDK tutorial (JS & Python) (#21885) * docs: add Google GenAI SDK tutorial for JS and Python Add tutorial for using Google's official GenAI SDK (@google/genai for JS, google-genai for Python) with LiteLLM proxy. Covers pass-through and native router endpoints, streaming, multi-turn chat, and multi-provider routing via model_group_alias. Also updates pass-through docs to use the new SDK replacing the deprecated @google/generative-ai. * fix(docs): correct Python SDK env var name in GenAI tutorial GOOGLE_GENAI_API_KEY does not exist in the google-genai SDK. The correct env var is GEMINI_API_KEY (or GOOGLE_API_KEY). Also note that the Python SDK has no base URL env var. * fix(docs): replace non-existent GOOGLE_GENAI_BASE_URL env var in interactions.md The Python google-genai SDK does not read GOOGLE_GENAI_BASE_URL. Use http_options={"base_url": "..."} in code instead. * docs: add network mock benchmarking section * docs: tweak benchmarks wording * fix: add auth headers and empty latencies guard to benchmark script * refactor: use method-level import for MockOpenAITransport * fix: guard print_aggregate against empty latencies * fix: add INCOMPLETE status to Interactions API enum and test Google added INCOMPLETE to the Interactions API OpenAPI spec status enum. Update both the Status3 enum in the SDK types and the test's expected values to match. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Guardrail Monitor - measure guardrail reliability in prod (#21944) * fix: fix log viewer for guardrail monitoring * feat(ui/): fix rendering logs per guardrail * fix: fix viewing logs on overview tab of guardrail * fix: log viewer * fix: fix naming to align with metric * docs: add performance & reliability section to v1.81.14 release notes * fix(tests): make RPM limit test sequential to avoid race condition Concurrent requests via run_in_executor + asyncio.gather caused a race condition where more requests slipped through the rate limiter than expected, leading to flaky test failures (e.g. 3 successes instead of 2 with rpm_limit=2). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: Singapore guardrail policies (PDPA + MAS AI Risk Management) (#21948) * feat: Singapore PDPA PII protection guardrail policy template Add Singapore Personal Data Protection Act (PDPA) guardrail support: Regex patterns (patterns.json): - sg_nric: NRIC/FIN detection ([STFGM] + 7 digits + checksum letter) - sg_phone: Singapore phone numbers (+65/0065/65 prefix) - sg_postal_code: 6-digit postal codes (contextual) - passport_singapore: Passport numbers (E/K + 7 digits, contextual) - sg_uen: Unique Entity Numbers (3 formats) - sg_bank_account: Bank account numbers (dash format, contextual) YAML policy templates (5 sub-guardrails): - sg_pdpa_personal_identifiers: s.13 Consent - sg_pdpa_sensitive_data: Advisory Guidelines - sg_pdpa_do_not_call: Part IX DNC Registry - sg_pdpa_data_transfer: s.26 overseas transfers - sg_pdpa_profiling_automated_decisions: Model AI Governance Framework Policy template entry in policy_templates.json with 9 guardrail definitions (4 regex-based + 5 YAML conditional keyword matching). 
Tests: - test_sg_patterns.py: regex pattern unit tests - test_sg_pdpa_guardrails.py: conditional keyword matching tests (100+ cases) * feat: MAS AI Risk Management Guidelines guardrail policy template Add Monetary Authority of Singapore (MAS) AI Risk Management Guidelines guardrail support for financial institutions: YAML policy templates (5 sub-guardrails): - sg_mas_fairness_bias: Blocks discriminatory financial AI (credit/loans/insurance by protected attributes) - sg_mas_transparency_explainability: Blocks opaque/unexplainable AI for consequential financial decisions - sg_mas_human_oversight: Blocks fully automated financial decisions without human-in-the-loop - sg_mas_data_governance: Blocks unauthorized sharing/mishandling of financial customer data - sg_mas_model_security: Blocks adversarial attacks, model poisoning, inversion on financial AI Policy template entry in policy_templates.json with 5 guardrail definitions. Aligned with MAS FEAT Principles, Project MindForge, and NIST AI RMF. Tests: - test_sg_mas_ai_guardrails.py: conditional keyword matching tests (100+ cases) * fix: address SG pattern review feedback - Update NRIC lowercase test for IGNORECASE runtime behavior - Add keyword context guard to sg_uen pattern to reduce false positives * docs: clarify MAS AIRM timeline references - Explicitly mark MAS AIRM as Nov 2025 consultation draft - Add 2018 qualifier for FEAT principles in MAS policy descriptions - Update MAS guardrail wording to avoid release-year ambiguity * chore: commit resolved MAS policy conflicts * test: * chore: * Add OpenAI Agents SDK tutorial with LiteLLM Proxy to docs (#21221) * Add OpenAI Agents SDK tutorial to docs * Update OpenAI Agents SDK tutorial to use LiteLLM environment variables * Enhance OpenAI Agents SDK tutorial with built-in LiteLLM extension details and updated configuration steps. Adjust section headings for clarity and improve the flow of information regarding model setup and usage. 
* adjust blog posts to fetch from github first * feat(videos): add variant parameter to video content download (#21955) openai videos models support the features to download variants. See more details here: https://developers.openai.com/api/docs/guides/video-generation#use-image-references. Plumb variant (e.g. "thumbnail", "spritesheet") through the full video content download chain: avideo_content → video_content → video_content_handler → transform_video_content_request. OpenAI appends ?variant=<value> to the GET URL; other providers accept the parameter in their signature but ignore it. * fixing path * adjust blog post path * Revert duplicate issue checker to text-based matching, remove duplicate PR workflow Remove the Claude Code-powered duplicate PR detection workflow and revert the duplicate issue checker back to wow-actions/potential-duplicates with text similarity matching. * ui changes * adding tests * adjust default aggregation threshold * fix(videos): pass api_key from litellm_params to video remix handlers (#21965) video_remix_handler and async_video_remix_handler were not falling back to litellm_params.api_key when the api_key parameter was None, causing Authorization: Bearer None to be sent to the provider. This matches the pattern already used by async_video_generation_handler. * adding testing coverage + fixing flaky tests * fix(ollama): thread api_base through get_model_info and add graceful fallback When users pass api_base to litellm.completion() for Ollama, the model info fetch (context window, function_calling support) was ignoring the user's api_base and only reading OLLAMA_API_BASE env var or defaulting to localhost:11434. This caused confusing errors in logs when Ollama runs on a remote server. Thread api_base from litellm_params through the get_model_info call chain so OllamaConfig.get_model_info() uses the correct server. Also return safe defaults instead of raising when the server is unreachable. 
Fixes #21967 --------- Co-authored-by: An Tang <ta@stripe.com> Co-authored-by: janfrederickk <75388864+janfrederickk@users.noreply.github.com> Co-authored-by: Zhenting Huang <3061613175@qq.com> Co-authored-by: Darien Kindlund <darien@kindlund.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com> Co-authored-by: Ryan Crabbe <rcrabbe@berkeley.edu> Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> Co-authored-by: LeeJuOh <56071126+LeeJuOh@users.noreply.github.com> Co-authored-by: Monesh Ram <31161039+WhoisMonesh@users.noreply.github.com> Co-authored-by: Trevor Prater <trevor.prater@gmail.com> Co-authored-by: The Mavik <179817126+themavik@users.noreply.github.com> Co-authored-by: Edwin Isac <33712823+edwiniac@users.noreply.github.com> Co-authored-by: milan-berri <milan@berri.ai> Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> Co-authored-by: Sameer Kankute <sameer@berri.ai> Co-authored-by: Harshit Jain <harshitjain0562@gmail.com> Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com> Co-authored-by: Ephrim Stanley <ephrim.stanley@point72.com> Co-authored-by: TomAlon <tom@noma.security> Co-authored-by: Julio Quinteros Pro <jquinter@gmail.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com> Co-authored-by: Ron Zhong <ron-zhong@hotmail.com> Co-authored-by: Arindam Majumder <109217591+Arindam200@users.noreply.github.com> Co-authored-by: Lei Nie <lenie@quora.com>

…voke (#21964) * auth_with_role_name add region_name arg for cross-account sts * update tests to include case with aws_region_name for _auth_with_aws_role * Only pass region_name to STS client when aws_region_name is set * Add optional aws_sts_endpoint to _auth_with_aws_role * Parametrize ambient-credentials test for no opts, region_name, and aws_sts_endpoint * consistently passing region and endpoint args into explicit credentials irsa * fix env var leakage * fix: bedrock openai-compatible imported-model should also have model arn encoded * feat: show proxy url in ModelHub (#21660) * fix(bedrock): correct modelInput format for Converse API batch models (#21656) * fix(proxy): add model_ids param to access group endpoints for precise deployment tagging (#21655) POST /access_group/new and PUT /access_group/{name}/update now accept an optional model_ids list that targets specific deployments by their unique model_id, instead of tagging every deployment that shares a model_name. When model_ids is provided it takes priority over model_names, giving API callers the same single-deployment precision that the UI already has via PATCH /model/{model_id}/update. Backward compatible: model_names continues to work as before. Closes #21544 * feat(proxy): add custom favicon support\n\nAdd ability to configure a custom favicon for the litellm proxy UI.\n\n- Add favicon_url field to UIThemeConfig model\n- Add LITELLM_FAVICON_URL env var support\n- Add /get_favicon endpoint to serve custom favicons\n- Update ThemeContext to dynamically set favicon\n- Add favicon URL input to UI theme settings page\n- Add comprehensive tests\n\nCloses #8323 (#21653) * fix(bedrock): prevent double UUID in create_file S3 key (#21650) In create_file for Bedrock, get_complete_file_url is called twice: once in the sync handler (generating UUID-1 for api_base) and once inside transform_create_file_request (generating UUID-2 for the actual S3 upload). 
* adjust blog posts to fetch from github first * feat(videos): add variant parameter to video content download (#21955) openai videos models support the features to download variants. See more details here: https://developers.openai.com/api/docs/guides/video-generation#use-image-references. Plumb variant (e.g. "thumbnail", "spritesheet") through the full video content download chain: avideo_content → video_content → video_content_handler → transform_video_content_request. OpenAI appends ?variant=<value> to the GET URL; other providers accept the parameter in their signature but ignore it. * fixing path * adjust blog post path * Revert duplicate issue checker to text-based matching, remove duplicate PR workflow Remove the Claude Code-powered duplicate PR detection workflow and revert the duplicate issue checker back to wow-actions/potential-duplicates with text similarity matching. * ui changes * adding tests * fix(anthropic): sanitize tool_use IDs in assistant messages Apply _sanitize_anthropic_tool_use_id to tool_use blocks in convert_to_anthropic_tool_invoke, not just tool_result blocks. IDs from external frameworks (e.g. MiniMax) may contain characters like colons that violate Anthropic's ^[a-zA-Z0-9_-]+$ pattern. Adds test for invalid ID sanitization in tool_use blocks. 
--------- Co-authored-by: An Tang <ta@stripe.com> Co-authored-by: janfrederickk <75388864+janfrederickk@users.noreply.github.com> Co-authored-by: Zhenting Huang <3061613175@qq.com> Co-authored-by: Cesar Garcia <128240629+Chesars@users.noreply.github.com> Co-authored-by: Darien Kindlund <darien@kindlund.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com> Co-authored-by: Ryan Crabbe <rcrabbe@berkeley.edu> Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> Co-authored-by: LeeJuOh <56071126+LeeJuOh@users.noreply.github.com> Co-authored-by: Monesh Ram <31161039+WhoisMonesh@users.noreply.github.com> Co-authored-by: Trevor Prater <trevor.prater@gmail.com> Co-authored-by: The Mavik <179817126+themavik@users.noreply.github.com> Co-authored-by: Edwin Isac <33712823+edwiniac@users.noreply.github.com> Co-authored-by: milan-berri <milan@berri.ai> Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> Co-authored-by: Chesars <cesarponce19544@gmail.com> Co-authored-by: Sameer Kankute <sameer@berri.ai> Co-authored-by: Harshit Jain <harshitjain0562@gmail.com> Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com> Co-authored-by: Ephrim Stanley <ephrim.stanley@point72.com> Co-authored-by: TomAlon <tom@noma.security> Co-authored-by: Julio Quinteros Pro <jquinter@gmail.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com> Co-authored-by: Ron Zhong <ron-zhong@hotmail.com> Co-authored-by: Arindam Majumder <109217591+Arindam200@users.noreply.github.com> Co-authored-by: Lei Nie <lenie@quora.com>
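The tool_use ID fix in the commit message above can be illustrated with a minimal sketch. It assumes that invalid characters are replaced with an underscore; the helper name `sanitize_tool_use_id` is hypothetical and is not the `_sanitize_anthropic_tool_use_id` function from the repository, whose exact behavior is not shown on this page.

```python
import re

# Characters outside the set allowed by the ^[a-zA-Z0-9_-]+$ pattern
# quoted in the commit message.
_INVALID_ID_CHARS = re.compile(r"[^a-zA-Z0-9_-]")


def sanitize_tool_use_id(tool_use_id: str) -> str:
    """Hypothetical helper: replace each disallowed character with "_".

    The underscore replacement is an assumption made for illustration.
    """
    return _INVALID_ID_CHARS.sub("_", tool_use_id)


def is_valid_tool_use_id(tool_use_id: str) -> bool:
    """Check an ID against the pattern quoted in the commit message."""
    return re.fullmatch(r"[a-zA-Z0-9_-]+", tool_use_id) is not None
```

Whatever replacement rule is chosen, the same mapping has to be applied to both the tool_use block and its matching tool_result block, otherwise the two IDs no longer pair up. A plain character replacement can also map two distinct IDs to the same string, and an empty ID stays invalid, so this sketch does not cover those cases.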

ron-zhong added 2 commits February 24, 2026 02:16

vercel bot deployed to Preview February 23, 2026 18:29 View deployment

greptile-apps bot reviewed Feb 23, 2026

tests/test_litellm/proxy/guardrails/guardrail_hooks/content_filter/test_sg_patterns.py (outdated, resolved)

litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/patterns.json (resolved)

fix: address SG pattern review feedback

fd52a67

- Update NRIC lowercase test for IGNORECASE runtime behavior
- Add keyword context guard to sg_uen pattern to reduce false positives
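
The two review fixes above can be illustrated with a minimal sketch. The regexes, keyword list, and window size below are assumptions made for illustration and are not the patterns shipped in patterns.json; `find_nric` and `find_uen` are hypothetical helper names.

```python
import re

# Assumed NRIC/FIN shape from the PR summary: [STFGM] + 7 digits + checksum letter.
# Compiled with IGNORECASE, so lowercase input matches at runtime as well.
NRIC_RE = re.compile(r"\b[STFGM]\d{7}[A-Z]\b", re.IGNORECASE)

# Assumed, simplified UEN shape (8 or 9 digits plus a letter); the PR
# summary says the real pattern covers three formats.
UEN_RE = re.compile(r"\b\d{8,9}[A-Z]\b", re.IGNORECASE)

# Hypothetical context keywords for the guard.
UEN_CONTEXT_RE = re.compile(
    r"\b(?:uen|unique entity number|business registration)\b", re.IGNORECASE
)


def find_nric(text: str) -> list[str]:
    """Return NRIC/FIN-shaped tokens, regardless of letter case."""
    return [m.group(0) for m in NRIC_RE.finditer(text)]


def find_uen(text: str, window: int = 40) -> list[str]:
    """Return UEN-shaped tokens only when a context keyword appears nearby."""
    hits = []
    for m in UEN_RE.finditer(text):
        context = text[max(0, m.start() - window): m.end() + window]
        if UEN_CONTEXT_RE.search(context):
            hits.append(m.group(0))
    return hits


if __name__ == "__main__":
    print(find_nric("customer id s1234567d"))
    print(find_uen("Company UEN: 201912345K"))
    print(find_uen("order ref 201912345K shipped"))
```

Without the context guard, any 8 or 9 digit reference followed by a letter would be flagged, which is the false-positive case the review feedback points at.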

vercel bot deployed to Preview February 23, 2026 18:56 View deployment

docs: clarify MAS AIRM timeline references

6db71ad

- Explicitly mark MAS AIRM as Nov 2025 consultation draft
- Add 2018 qualifier for FEAT principles in MAS policy descriptions
- Update MAS guardrail wording to avoid release-year ambiguity

vercel bot deployed to Preview February 23, 2026 19:00 View deployment

chore: commit resolved MAS policy conflicts

caf9baa

vercel bot deployed to Preview February 23, 2026 19:07 View deployment

test:

fe00f46

vercel bot deployed to Preview February 23, 2026 19:18 View deployment

greptile-apps bot reviewed Feb 23, 2026

litellm/policy_templates_backup.json (outdated, resolved)

ron-zhong added 2 commits February 24, 2026 03:33

Merge branch 'main' into litellm_feat/sg-guardrail-policies

fc0709b

chore:

3a75d73

vercel bot deployed to Preview February 23, 2026 19:36 View deployment

ron-zhong requested a review from krrishdholakia February 23, 2026 19:58

krrishdholakia merged commit 73fd5a4 into BerriAI:main Feb 23, 2026
29 of 30 checks passed

Conversation

ron-zhong commented Feb 23, 2026 • edited

Summary

What changed

PDPA (Personal Data Protection Act)

MAS (Monetary Authority of Singapore) AI Risk Management Guidelines

Notes

Reference:

Related PRs:

vercel bot commented Feb 23, 2026 • edited

CLAassistant commented Feb 23, 2026 • edited

greptile-apps bot commented Feb 23, 2026 • edited

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

greptile-apps bot left a comment

ron-zhong commented Feb 23, 2026

ron-zhong commented Feb 23, 2026 • edited

ron-zhong commented Feb 23, 2026

krrishdholakia commented Feb 23, 2026

greptile-apps bot left a comment

ron-zhong commented Feb 23, 2026 • edited

ron-zhong commented Feb 23, 2026 • edited

shin-bot-litellm commented Feb 23, 2026

ron-zhong commented Feb 26, 2026

4 participants
