
Add semgrep & Fix OOMs #20912

Merged
AlexsanderHamir merged 3 commits into main from litellm_adding_semgrep
Feb 11, 2026

Conversation

@AlexsanderHamir
Contributor

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/litellm/ directory (adding at least 1 test is a hard requirement; see details)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention.
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🧹 Refactoring

Changes

  • Added semgrep for extensive custom rules.
  • Fixed OOMs across integrations.

@vercel

vercel bot commented Feb 11, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Status | Deployment | Actions | Updated (UTC)
litellm | Ready | Ready | Preview, Comment | Feb 11, 2026 1:50am

Request Review

@AlexsanderHamir AlexsanderHamir merged commit b7993b1 into main Feb 11, 2026
10 of 12 checks passed
@greptile-apps
Contributor

greptile-apps bot commented Feb 11, 2026

Greptile Overview

Greptile Summary

Adds Semgrep to CI as a release gate and bounds all asyncio.Queue() instances with a configurable maxsize (default 1000 via LITELLM_ASYNCIO_QUEUE_MAXSIZE) to prevent unbounded memory growth.

  • Semgrep CI integration: New job runs custom rules from .semgrep/rules/ on main and litellm_* branches, gating publish_to_pypi.
  • Bounded queues: GCS bucket logger, spend update queues, and daily spend update queues now use bounded asyncio.Queue(maxsize=...).
  • Dead aggregation checks: The spend queue aggregation safety net (qsize >= MAX_SIZE_IN_MEMORY_QUEUE=2000) is now unreachable because put() blocks at maxsize=1000. This means the in-memory compaction logic will never trigger, and under high load the blocking put() will stall the request path until the periodic consumer drains the queue.
  • GCS race condition: The full() → flush_queue() → put() pattern is not atomic, and the flush_queue() override skips the flush_lock. Concurrent producers can cause put() to still block despite the flush attempt.
  • No tests added: The PR checklist indicates tests were added, but the diff contains no test files.
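
The bounded-queue pattern the summary describes can be sketched as follows. This is a minimal sketch, not the project's code: the real constant lives in litellm/constants.py, and the helper name make_bounded_queue is hypothetical.

```python
import asyncio
import os

def make_bounded_queue() -> asyncio.Queue:
    """Build an asyncio.Queue whose capacity is capped by an env var.

    Sketch of the pattern described above: default 1000, overridable via
    LITELLM_ASYNCIO_QUEUE_MAXSIZE so operators can tune the memory bound.
    """
    maxsize = int(os.getenv("LITELLM_ASYNCIO_QUEUE_MAXSIZE", "1000"))
    return asyncio.Queue(maxsize=maxsize)
```

With maxsize set, await put() applies backpressure once the queue is full, which is exactly the behavior the review comments below flag as a risk on the request path.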

Confidence Score: 2/5

  • The queue bounding introduces blocking backpressure on the request path and makes the existing aggregation safety net unreachable — needs adjustment before merging.
  • The core goal (preventing OOM via bounded queues) is sound, but the implementation creates new issues: blocking put() stalls request processing, the aggregation checks become dead code due to maxsize < threshold mismatch, and the GCS flush has a TOCTOU race. No tests were added despite the checklist claim.
  • litellm/proxy/db/db_transaction_queue/base_update_queue.py, litellm/proxy/db/db_transaction_queue/spend_update_queue.py, litellm/proxy/db/db_transaction_queue/daily_spend_update_queue.py, litellm/integrations/gcs_bucket/gcs_bucket.py

Important Files Changed

Filename Overview
.circleci/config.yml Adds semgrep CI job with custom rules only, gates publish_to_pypi on semgrep passing. Clean addition.
cookbook/nova_sonic_realtime.py Bounds audio queue with configurable maxsize (default 10,000). Uses shared env var name LITELLM_ASYNCIO_QUEUE_MAXSIZE but different default (10,000 vs 1,000).
litellm/constants.py Adds LITELLM_ASYNCIO_QUEUE_MAXSIZE constant (default 1000). Clean addition.
litellm/integrations/gcs_bucket/gcs_bucket.py Bounds GCS log queue and adds pre-put flush. Race condition: full() + flush_queue() + put() is not atomic; concurrent producers can cause put() to still block. Also flush_queue() override does not use the flush_lock from the parent class.
litellm/proxy/db/db_transaction_queue/base_update_queue.py Bounds base queue to LITELLM_ASYNCIO_QUEUE_MAXSIZE (1000). add_update() uses blocking put() which will block callers on the request path when queue is full.
litellm/proxy/db/db_transaction_queue/daily_spend_update_queue.py Overrides queue with same bounded maxsize. Aggregation check (qsize >= 2000) is unreachable since queue maxsize is 1000.
litellm/proxy/db/db_transaction_queue/spend_update_queue.py Same issue as daily_spend_update_queue: aggregation check at 2000 is unreachable with maxsize 1000.

Sequence Diagram

sequenceDiagram
    participant Req as Request Handler
    participant SQ as SpendUpdateQueue
    participant AQ as asyncio.Queue(maxsize=1000)
    participant Sched as Scheduler (10-15s)
    participant DB as Database

    Req->>SQ: add_update(spend_item)
    SQ->>AQ: await put(item)
    alt Queue not full
        AQ-->>SQ: item added
        SQ->>SQ: check qsize >= 2000 (unreachable)
    else Queue full (1000 items)
        AQ-->>SQ: BLOCKS until space available
        Note over Req,SQ: Request stalls here
    end

    loop Every 10-15 seconds
        Sched->>SQ: flush_all_updates_from_in_memory_queue()
        SQ->>AQ: get() up to 1000 items
        AQ-->>SQ: drained items
        SQ->>DB: write aggregated spend
        Note over AQ: Space freed, blocked put() resumes
    end
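The blocking branch in the diagram can be reproduced in isolation. This standalone sketch (queue size shrunk to 2 for brevity) shows await put() suspending on a full queue; wait_for turns the stall into an observable timeout:

```python
import asyncio

async def demo_blocking_put() -> bool:
    """Show that put() on a full asyncio.Queue suspends the caller."""
    q: asyncio.Queue = asyncio.Queue(maxsize=2)
    await q.put("a")
    await q.put("b")  # queue is now at maxsize
    try:
        # With no consumer running, this put() never completes;
        # wait_for converts the stall into a TimeoutError we can observe.
        await asyncio.wait_for(q.put("c"), timeout=0.1)
        return False
    except asyncio.TimeoutError:
        return True
```

The request handler in the diagram stalls at exactly this point until the periodic consumer frees space.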


@greptile-apps greptile-apps bot left a comment


10 files reviewed, 5 comments


Comment on lines +76 to 78
if self.log_queue.full():
    await self.flush_queue()
await self.log_queue.put(


TOCTOU race: between full() returning True and put() executing, concurrent coroutines can re-fill the queue after the flush. The put() on line 78 can still block. Also, the overridden flush_queue() (line 381) doesn't acquire self.flush_lock, so concurrent flushes from periodic_flush() and this manual flush can race.

Consider using put_nowait() in a try/except QueueFull block to avoid blocking the logging path.
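
The suggested non-blocking variant might look like this (the helper name enqueue_or_drop is hypothetical, and the drop handling is a placeholder for whatever the logger would actually do):

```python
import asyncio

def enqueue_or_drop(q: asyncio.Queue, item: object) -> bool:
    """Enqueue without blocking; report False instead of stalling when full."""
    try:
        q.put_nowait(item)
        return True
    except asyncio.QueueFull:
        # Here the logger could bump a dropped-events counter or kick off
        # an async flush, rather than blocking the logging path.
        return False
```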

from typing import Optional

# Bounded queue size for audio chunks (configurable via env to avoid unbounded memory)
AUDIO_QUEUE_MAXSIZE = int(os.getenv("LITELLM_ASYNCIO_QUEUE_MAXSIZE", 10_000))


This uses the same env var LITELLM_ASYNCIO_QUEUE_MAXSIZE as the proxy queues but with a different default (10,000 vs 1,000). This could confuse users who set the env var expecting uniform behavior — proxy queues get 1000 from constants.py while this cookbook gets 10,000 from the hardcoded fallback here.

@greptile-apps
Contributor

greptile-apps bot commented Feb 11, 2026

Additional Comments (3)

litellm/proxy/db/db_transaction_queue/base_update_queue.py
await self.update_queue.put(update) will block the caller when the queue is full (maxsize=1000). Since add_update is called on the request path (spend tracking), this introduces backpressure that stalls request processing until the periodic consumer (every ~10-15s) drains the queue. Consider using put_nowait() with a try/except asyncio.QueueFull to either drop, log, or aggregate instead of blocking the request.
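
One way to realize the "aggregate instead of blocking" suggestion, sketched with a hypothetical helper and placeholder aggregation logic (real code would merge spend amounts per key/team rather than just bundling dicts):

```python
import asyncio

async def add_update_nonblocking(q: asyncio.Queue, update: dict) -> None:
    """Hypothetical add_update() variant: compact on overflow, never block."""
    try:
        q.put_nowait(update)
    except asyncio.QueueFull:
        # Drain everything currently queued and collapse it into a single
        # record, freeing capacity without stalling the request path.
        drained = []
        while not q.empty():
            drained.append(q.get_nowait())
        q.put_nowait({"aggregated": drained + [update]})
```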


litellm/proxy/db/db_transaction_queue/spend_update_queue.py
self.MAX_SIZE_IN_MEMORY_QUEUE defaults to 2000, but the queue maxsize is now 1000. put() on line 40 blocks at 1000 items, so qsize() can never reach 2000 — this aggregation check is now dead code.

Either lower MAX_SIZE_IN_MEMORY_QUEUE to be below LITELLM_ASYNCIO_QUEUE_MAXSIZE, or trigger aggregation before/at the maxsize boundary.
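
For the first option, the invariant is simply that the compaction threshold stays strictly below the queue bound. A sketch, with illustrative constants (the 80% trigger is an arbitrary choice for the example, not the project's actual setting):

```python
import asyncio

QUEUE_MAXSIZE = 1000  # mirrors LITELLM_ASYNCIO_QUEUE_MAXSIZE's default
# Trigger compaction well before put() can start blocking.
MAX_SIZE_IN_MEMORY_QUEUE = int(QUEUE_MAXSIZE * 0.8)

def should_aggregate(q: asyncio.Queue) -> bool:
    """Reachable safety net: fires at 800 items, below the 1000 cap."""
    return q.qsize() >= MAX_SIZE_IN_MEMORY_QUEUE
```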


litellm/proxy/db/db_transaction_queue/daily_spend_update_queue.py
Same issue as SpendUpdateQueue: qsize() >= 2000 is unreachable since the queue blocks put() at maxsize=1000. The aggregation safety net is dead code.

krrishdholakia pushed a commit that referenced this pull request Feb 11, 2026
…e OpenAI (#20883)

* feat: add opus 4.5 and 4.6 to use output_format param

* generate poetry lock with 2.3.2 poetry

* restore poetry lock

* e2e tests, key delete, update tpm rpm, and regenerate

* Split e2e ui testing for browser

* new login with sso button in login page

* option to hide usage indicator

* fix(cloudzero): update CBF field mappings per LIT-1907 (#20906)

* fix(cloudzero): update CBF field mappings per LIT-1907

Phase 1 field updates for CloudZero integration:

ADD/UPDATE:
- resource/account: Send concat(api_key_alias, '|', api_key_prefix)
- resource/service: Send model_group instead of service_type
- resource/usage_family: Send provider instead of hardcoded 'llm-usage'
- action/operation: NEW - Send team_id
- resource/id: Send model name instead of CZRN
- resource/tag:organization_alias: Add if exists
- resource/tag:project_alias: Add if exists
- resource/tag:user_alias: Add if exists

REMOVE:
- resource/tag:total_tokens: Removed
- resource/tag:team_id: Removed (team_id now in action/operation)

Fixes LIT-1907

* Update litellm/integrations/cloudzero/transform.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix: define api_key_alias variable, update CBFRecord docstring

- Fix F821 lint error: api_key_alias was used but not defined
- Update CBFRecord docstring to reflect LIT-1907 field mappings
- Remove unused Optional import

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Add banner notifying of breaking change

* Add semgrep & Fix OOMs (#20912)

* [Feat] Policies - Allow connecting Policies to Tags, Simulating Policies, Viewing how many keys, teams it applies on  (#20904)

* init schema with TAGS

* ui: add policy test

* resolvePoliciesCall

* add_policy_sources_to_metadata + headers

* types Policy

* preview Impact

* def _describe_match_reason(

* match based on TAGs

* TestTagBasedAttachments

* test fixes

* add policy_resolve_router

* add_guardrails_from_policy_engine

* TestMatchAttribution

* refactor

* fix

* fix: address Greptile review feedback on policy resolve endpoints

- Track unnamed keys/teams as separate counts instead of inflating
  affected_keys_count with duplicate "(unnamed key)" placeholders.
  Added unnamed_keys_count and unnamed_teams_count to response.
- Push alias pattern matching to DB via _build_alias_where() which
  converts exact patterns to Prisma "in" and suffix wildcards to
  "startsWith" filters.
- Gate sync_policies_from_db/sync_attachments_from_db behind
  force_sync query param (default false) to avoid 2 DB round-trips
  on every /policies/resolve request.
- Remove worktree-only conftest.py that cleared sys.modules at import
  time — no longer needed since code moved to main repo.
- Rename MAX_ESTIMATE_IMPACT_ROWS → MAX_POLICY_ESTIMATE_IMPACT_ROWS.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: eliminate duplicate DB queries and fix header delimiter ambiguity

- Fetch teams table once in estimate_attachment_impact and reuse for
  both tag-based and alias-based lookups (was querying teams twice when
  both tag_patterns and team_patterns were provided).
- Convert tag/team filter functions from async DB queries to sync
  filters that operate on pre-fetched data (_filter_keys_by_tags,
  _filter_teams_by_tags).
- Fix comma ambiguity in x-litellm-policy-sources header: use '; '
  as entry delimiter since matched_via values can contain commas.
- Use '+' as the within-value separator in matched_via reason strings
  (e.g. "tag:healthcare+team:health-team") to avoid conflict with
  header delimiters.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Update litellm/proxy/policy_engine/policy_resolve_endpoints.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix: type error & better error handling (#20689)

* [Docs] Add docs guide for using policies  (#20914)

* init schema with TAGS

* ui: add policy test

* resolvePoliciesCall

* add_policy_sources_to_metadata + headers

* types Policy

* preview Impact

* def _describe_match_reason(

* match based on TAGs

* TestTagBasedAttachments

* test fixes

* add policy_resolve_router

* add_guardrails_from_policy_engine

* TestMatchAttribution

* refactor

* fix

* fix: address Greptile review feedback on policy resolve endpoints

- Track unnamed keys/teams as separate counts instead of inflating
  affected_keys_count with duplicate "(unnamed key)" placeholders.
  Added unnamed_keys_count and unnamed_teams_count to response.
- Push alias pattern matching to DB via _build_alias_where() which
  converts exact patterns to Prisma "in" and suffix wildcards to
  "startsWith" filters.
- Gate sync_policies_from_db/sync_attachments_from_db behind
  force_sync query param (default false) to avoid 2 DB round-trips
  on every /policies/resolve request.
- Remove worktree-only conftest.py that cleared sys.modules at import
  time — no longer needed since code moved to main repo.
- Rename MAX_ESTIMATE_IMPACT_ROWS → MAX_POLICY_ESTIMATE_IMPACT_ROWS.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: eliminate duplicate DB queries and fix header delimiter ambiguity

- Fetch teams table once in estimate_attachment_impact and reuse for
  both tag-based and alias-based lookups (was querying teams twice when
  both tag_patterns and team_patterns were provided).
- Convert tag/team filter functions from async DB queries to sync
  filters that operate on pre-fetched data (_filter_keys_by_tags,
  _filter_teams_by_tags).
- Fix comma ambiguity in x-litellm-policy-sources header: use '; '
  as entry delimiter since matched_via values can contain commas.
- Use '+' as the within-value separator in matched_via reason strings
  (e.g. "tag:healthcare+team:health-team") to avoid conflict with
  header delimiters.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs v1 guide with UI imgs

* docs fix

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add dashscope/qwen3-max model with tiered pricing (#20919)

Add support for Alibaba Cloud's Qwen3-Max model with:
- 258K input tokens, 65K output tokens
- Tiered pricing based on context window usage (0-32K, 32K-128K, 128K-252K)
- Function calling and tool choice support
- Reasoning capabilities enabled

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix linting

* docs: add Greptile review requirement to PR template (#20762)

* fix(azure): preserve content_policy_violation error details from Azure OpenAI

Closes #20811

Azure OpenAI returns rich error payloads for content policy violations
(inner_error with ResponsibleAIPolicyViolation, content_filter_results,
revised_prompt). Previously these details were lost when:

1. The top-level error code was not "content_policy_violation" but the
   inner_error.code was "ResponsibleAIPolicyViolation" -- the structured
   check only examined the top-level code.

2. The DALL-E image generation polling path stringified the error JSON
   into the message field instead of setting the structured body, making
   it impossible for exception_type() to extract error details.

3. The string-based fallback detector used "invalid_request_error" as a
   content-policy indicator, which is too broad and could misclassify
   regular bad-request errors.

Changes:
- exception_mapping_utils.py: Check inner_error.code for
  ResponsibleAIPolicyViolation when top-level code is not
  content_policy_violation. Replace overly broad "invalid_request_error"
  string match with specific Azure safety-system messages.
- azure.py: Set structured body on AzureOpenAIError in both async and
  sync DALL-E polling paths so exception_type() can inspect error details.
- test_azure_exception_mapping.py: Add regression tests covering the
  exact error payloads from issue #20811.
- Fix pre-existing lint: duplicate PerplexityResponsesConfig dict key,
  unused RouteChecks top-level import.

---------

Co-authored-by: Kelvin Tran <kelvin-tran@users.noreply.github.com>
Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
Co-authored-by: shin-bot-litellm <shin-bot-litellm@berri.ai>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Alexsander Hamir <alexsanderhamirgomesbaptista@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com>
Co-authored-by: ken <122603020@qq.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
yuneng-jiang added a commit that referenced this pull request Feb 12, 2026


1 participant