
Add File deletion criteria with batch references #21456

Merged: Sameerlite merged 3 commits into main from litellm_fix_delete_file_managed_access on Feb 18, 2026

Conversation

@Sameerlite (Contributor) commented Feb 18, 2026

Relevant issues

Fixes LIT-1987

File Deletion Blocking Feature

This feature blocks file deletion API calls when files are referenced by batches in non-terminal states. This ensures that cost tracking is not disrupted by premature file deletion.

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/litellm/ directory (adding at least 1 test is a hard requirement; see details)
  • My PR passes all unit tests when running make test-unit
  • My PR's scope is as isolated as possible; it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable, but needs attention.
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🐛 Bug Fix

Changes

@vercel

vercel bot commented Feb 18, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| litellm | Ready | Preview, Comment | Feb 18, 2026 6:50am |


greptile-apps bot (Contributor) commented Feb 18, 2026

Greptile Summary

This PR adds file deletion blocking when files are referenced by batches in non-terminal states (validating, in_progress, finalizing), gated by whether batch polling is configured. The intent is to prevent premature file deletion from disrupting cost tracking.

  • Adds _is_batch_polling_enabled(), _get_batches_referencing_file(), and _check_file_deletion_allowed() methods to _PROXY_LiteLLMManagedFiles
  • Hooks into afile_delete() to check before proceeding with deletion
  • Includes comprehensive mock-based tests covering the main scenarios

Key issue: The _is_batch_polling_enabled() gate checks if proxy_batch_polling_interval > 0, but this value defaults to 3600 (1 hour) in litellm/constants.py. This means the check will always return True in any running proxy, effectively making file deletion always blocked when there are non-terminal batch references — regardless of whether the user has explicitly opted into batch cost tracking. This appears to be an unintentional behavior change that could break file deletion workflows for all enterprise users.

  • The _get_batches_referencing_file() method queries all non-terminal batches without a limit, then filters in Python — this could be a performance concern at scale
  • The PR checklist items are not checked, including the requirement for unit tests passing

Confidence Score: 2/5

  • This PR has a logic issue where the polling gate always evaluates to true, which could block file deletion for all enterprise users unintentionally.
  • The core logic flaw — _is_batch_polling_enabled() always returning True due to the default proxy_batch_polling_interval of 3600 — means the feature doesn't work as documented. The intent is to gate deletion blocking on whether polling is configured, but the gate is effectively always open. Additionally, the unbounded DB query could cause performance issues at scale.
  • Pay close attention to enterprise/litellm_enterprise/proxy/hooks/managed_files.py — the _is_batch_polling_enabled() method needs to be reworked to correctly detect intentional batch polling configuration.

Important Files Changed

| Filename | Overview |
| --- | --- |
| enterprise/litellm_enterprise/proxy/hooks/managed_files.py | Adds file deletion blocking logic with 3 new methods. Key concern: `_is_batch_polling_enabled()` always returns True since `proxy_batch_polling_interval` defaults to 3600, making the gate ineffective. Also has an unbounded DB query and an unused variable. |
| tests/test_litellm/enterprise/proxy/test_file_deletion_blocking.py | Comprehensive mock-only tests covering batch polling checks, batch reference detection, deletion blocking, and error messages. Tests are well-structured but don't exercise the real default value of `proxy_batch_polling_interval`. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[afile_delete called] --> B[_check_file_deletion_allowed]
    B --> C{_is_batch_polling_enabled?}
    C -->|No: interval=0 or None| D[Allow deletion]
    C -->|Yes: interval > 0| E[_get_batches_referencing_file]
    E --> F[Query ALL non-terminal batches from DB]
    F --> G[get_model_file_id_mapping for provider IDs]
    G --> H[Parse each batch file_object JSON]
    H --> I{Any batch references this file?}
    I -->|No| D
    I -->|Yes| J[Raise HTTPException 400]
    D --> K[Proceed with file deletion]

    style C fill:#ff9999,stroke:#333
    style F fill:#ffcc99,stroke:#333
    style J fill:#ff6666,stroke:#333
```

Last reviewed commit: 8f80b10

greptile-apps bot left a comment

2 files reviewed, 4 comments

Comment on lines +1054 to +1077

```python
def _is_batch_polling_enabled(self) -> bool:
    """
    Check if batch polling is configured, which indicates user wants cost tracking.

    Returns:
        bool: True if batch polling is enabled (interval > 0), False otherwise
    """
    try:
        # Import here to avoid circular dependencies
        import litellm.proxy.proxy_server as proxy_server_module

        proxy_batch_polling_interval = getattr(
            proxy_server_module, 'proxy_batch_polling_interval', None
        )

        # If interval is set and greater than 0, polling is enabled
        if proxy_batch_polling_interval is not None and proxy_batch_polling_interval > 0:
            return True
        return False
    except Exception as e:
        verbose_logger.warning(
            f"Error checking batch polling configuration: {e}. Assuming disabled."
        )
        return False
```
_is_batch_polling_enabled is always true in practice

proxy_batch_polling_interval has a default value of 3600 (from PROXY_BATCH_POLLING_INTERVAL in litellm/constants.py:1301), which means it is always > 0 in any running proxy instance. The _is_batch_polling_enabled() check will effectively always return True, making this gate a no-op.

This means file deletion will be blocked for all enterprise users whenever a non-terminal batch references the file — regardless of whether batch cost tracking is actually meaningful to them. If the intent is to only block when the user has explicitly opted into cost tracking, a different signal is needed (e.g., a dedicated config flag, or checking whether the CheckBatchCost scheduler job was successfully registered).
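The follow-up review indicates the fix moved to checking whether the `check_batch_cost_job` scheduler job is registered. A minimal sketch of that kind of gate, assuming an APScheduler-style object with a `get_job(job_id)` method (the job id and function name here are illustrative, not the exact code that was merged):

```python
# Sketch only: gate on explicit scheduler registration rather than on a
# polling interval that has a non-zero default (3600).
from typing import Optional


def is_batch_cost_tracking_enabled(scheduler: Optional[object]) -> bool:
    """Return True only if the batch cost tracking job was explicitly
    registered on the scheduler."""
    if scheduler is None:
        return False
    try:
        # APScheduler's BaseScheduler.get_job returns None for unknown ids.
        return scheduler.get_job("check_batch_cost_job") is not None
    except Exception:
        # Fail toward "disabled" so file deletion is never blocked by a
        # misconfigured scheduler.
        return False
```

Unlike the interval check, this returns False in a default proxy where the job was never scheduled, so the gate only closes when the user opted into batch cost tracking.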

Comment on lines +1115 to +1120

```python
batches = await self.prisma_client.db.litellm_managedobjecttable.find_many(
    where={
        "file_purpose": "batch",
        "status": {"in": ["validating", "in_progress", "finalizing"]},
    }
)
```

Unbounded DB query fetches all non-terminal batches

This find_many query fetches all non-terminal batches from the database, then filters in Python by parsing each batch's file_object JSON. In a production deployment with many concurrent batches, this could be a performance issue on the critical deletion path. The query has no take limit (unlike the similar query at line 296 which uses fetch_limit).

Consider either:

  1. Adding a take limit to bound the query, or
  2. Restructuring the schema/query to filter by file_id at the database level rather than fetching all non-terminal batches and filtering in application code.

Context Used: Rule from dashboard - What: Avoid creating new database requests or Router objects in the critical request path.

Why: Cre... (source)


```python
if referencing_batches:
    # File is referenced by non-terminal batches and polling is enabled
    batch_ids = [b["batch_id"] for b in referencing_batches]
```

Unused variable batch_ids

batch_ids is computed but never referenced. It should either be removed or used in place of the duplicate list comprehension on line 1177 (which re-extracts b["batch_id"]).

Suggested change:

```diff
-batch_ids = [b["batch_id"] for b in referencing_batches]
+batch_statuses = [f"{b['batch_id']}: {b['status']}" for b in referencing_batches]
```

"""
try:
# Import here to avoid circular dependencies
import litellm.proxy.proxy_server as proxy_server_module

Inline import inside a method

Per the project's code style guidelines (CLAUDE.md): "Avoid imports within methods — place all imports at the top of the file (module-level)." This inline import of litellm.proxy.proxy_server should be moved to the module level if possible, or documented as a necessary circular-dependency workaround.

Note: litellm.proxy.proxy_server is not currently imported at the top of this file. If the circular dependency concern is valid, consider adding a comment explaining why this must be inline.

Context Used: Context from dashboard - CLAUDE.md (source)

@Sameerlite (Contributor, Author) commented:

@greptile-apps

greptile-apps bot commented Feb 18, 2026

Greptile Summary

This PR adds a file deletion blocking feature to the enterprise managed files hook. When a user attempts to delete a file via the API, the system now checks whether any batches in non-terminal states (validating, in_progress, finalizing) reference that file. If so, and if the batch cost tracking scheduler job is registered, the deletion is blocked with a descriptive error message.

  • Adds _is_batch_polling_enabled() which checks for the check_batch_cost_job in the APScheduler instance
  • Adds _get_batches_referencing_file() which queries the litellm_managedobjecttable for non-terminal batches and filters by file ID in application code
  • Adds _check_file_deletion_allowed() which orchestrates the check and raises an HTTPException with batch details if deletion should be blocked
  • The batch lookup query fetches up to 500 non-terminal batches and filters them in Python rather than at the database level, which could be a performance concern in high-batch-volume deployments
  • get_model_file_id_mapping is called twice on the delete path — once during the batch reference check and again during the actual deletion
  • Test suite covers the happy path, blocking, and edge cases but has an unused helper and a stale docstring

Confidence Score: 3/5

  • This PR is generally safe, but the DB query on the file deletion path fetches up to 500 non-terminal batches and filters them in Python, which is a performance concern.
  • The logic is correct and well-tested. The main concerns are: (1) the DB query in _get_batches_referencing_file fetches up to 500 non-terminal batches and filters in Python rather than at the DB level, (2) duplicate get_model_file_id_mapping calls on the delete path. These are performance/efficiency issues rather than correctness bugs, but they affect the critical request path.
  • Pay close attention to enterprise/litellm_enterprise/proxy/hooks/managed_files.py — specifically the _get_batches_referencing_file method's DB query pattern.

Important Files Changed

| Filename | Overview |
| --- | --- |
| enterprise/litellm_enterprise/proxy/hooks/managed_files.py | Adds file deletion blocking logic with three new methods: `_is_batch_polling_enabled` (checks scheduler job), `_get_batches_referencing_file` (fetches up to 500 non-terminal batches and filters in Python), and `_check_file_deletion_allowed` (orchestrates the check). The DB query could be optimized to filter at the database level rather than fetching all non-terminal batches. A duplicate `get_model_file_id_mapping` call occurs on the delete path. |
| tests/test_litellm/enterprise/proxy/test_file_deletion_blocking.py | Comprehensive test suite with 11 tests covering batch polling detection, file-to-batch reference lookups, deletion blocking/allowing logic, early exit optimization, and error message formatting. All tests use mocks correctly. Has one unused helper function (`_make_user_api_key_dict`) and an outdated docstring referencing `proxy_batch_polling_interval`. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant User as API Client
    participant EP as files_endpoints.py
    participant MF as ManagedFiles.afile_delete
    participant Check as _check_file_deletion_allowed
    participant Poll as _is_batch_polling_enabled
    participant DB as Database
    participant Sched as APScheduler

    User->>EP: DELETE /files/{file_id}
    EP->>MF: afile_delete(file_id)
    MF->>Check: _check_file_deletion_allowed(file_id)
    Check->>Poll: _is_batch_polling_enabled()
    Poll->>Sched: get_job('check_batch_cost_job')
    Sched-->>Poll: job or None
    Poll-->>Check: True/False

    alt Polling disabled
        Check-->>MF: return (allow)
    else Polling enabled
        Check->>DB: get_model_file_id_mapping (cache/DB)
        DB-->>Check: provider file IDs
        Check->>DB: find_many(non-terminal batches, take=500)
        DB-->>Check: batch records
        Check->>Check: filter batches by file_id in Python
        alt No referencing batches
            Check-->>MF: return (allow)
        else Referencing batches found
            Check-->>MF: raise HTTPException(400)
            MF-->>EP: HTTPException
            EP-->>User: 400 error with batch details
        end
    end

    MF->>DB: get_model_file_id_mapping (duplicate call)
    MF->>EP: delete from providers + DB
    EP-->>User: 200 OK
```

Last reviewed commit: 9f5580f

greptile-apps bot left a comment

2 files reviewed, 5 comments

Comment on lines +1106 to +1107

```python
model_file_id_mapping = await self.get_model_file_id_mapping(
    [file_id], litellm_parent_otel_span=None
```

Duplicate get_model_file_id_mapping call on deletion path

get_model_file_id_mapping is called here (line 1106) inside _get_batches_referencing_file, and then called again at line 1237 inside afile_delete after the check passes. Each call hits the cache or DB. Consider passing the resolved mapping as a parameter to avoid the redundant lookup, or restructuring so afile_delete resolves the mapping once and passes it into the deletion check.

Context Used: Rule from dashboard - What: Avoid creating new database requests or Router objects in the critical request path.

Why: Cre... (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
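A minimal sketch of the suggested restructuring: resolve the mapping once in `afile_delete` and thread it through the deletion check. Class, method, and parameter names here are illustrative stand-ins, not the actual signatures in managed_files.py:

```python
# Sketch only: one mapping lookup shared by the check and the deletion.
import asyncio
from typing import Dict, List


class ManagedFilesSketch:
    def __init__(self, mapping_store: Dict[str, Dict[str, str]]):
        self._store = mapping_store
        self.lookup_count = 0  # counts how often the expensive lookup runs

    async def get_model_file_id_mapping(
        self, file_ids: List[str]
    ) -> Dict[str, Dict[str, str]]:
        self.lookup_count += 1  # stands in for a cache/DB hit
        return {fid: self._store.get(fid, {}) for fid in file_ids}

    async def _check_file_deletion_allowed(
        self, file_id: str, model_file_id_mapping: Dict[str, Dict[str, str]]
    ) -> None:
        # Reuses the caller's mapping instead of resolving it again.
        _ = model_file_id_mapping.get(file_id, {})

    async def afile_delete(self, file_id: str) -> Dict[str, Dict[str, str]]:
        # Single resolution, shared by the check and the actual deletion.
        mapping = await self.get_model_file_id_mapping([file_id])
        await self._check_file_deletion_allowed(file_id, mapping)
        return mapping  # the real delete would consume this same mapping
```

With this shape, a delete request performs exactly one mapping lookup instead of two.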

Comment on lines +1106 to +1107

```python
model_file_id_mapping = await self.get_model_file_id_mapping(
    [file_id], litellm_parent_otel_span=None
```

Duplicate get_model_file_id_mapping call on delete path

_get_batches_referencing_file calls self.get_model_file_id_mapping() at line 1106 to resolve provider file IDs. Then, after _check_file_deletion_allowed returns (no block), afile_delete calls self.get_model_file_id_mapping() again at line 1237. This results in two cache/DB lookups for the same file ID on every deletion request. Consider passing the result from the first call through or caching it on the instance to avoid the redundant lookup.

Context Used: Rule from dashboard - What: Avoid creating new database requests or Router objects in the critical request path.

Why: Cre... (source)

Comment on lines +33 to +38

```python
def _make_user_api_key_dict(user_id: str = "user-A") -> UserAPIKeyAuth:
    return UserAPIKeyAuth(
        api_key="sk-test",
        user_id=user_id,
        parent_otel_span=None,
    )
```

Unused helper function _make_user_api_key_dict

This helper function is defined but never called by any test in this file. It should be removed to keep the test file clean.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +1 to +8

```python
"""
Tests for file deletion blocking when referenced by non-terminal batches.

This tests the feature where file deletion is blocked when:
1. File is referenced by a batch in non-terminal state (validating, in_progress, finalizing)
2. Batch polling is configured (proxy_batch_polling_interval > 0)

This ensures cost tracking is not disrupted by premature file deletion.
```

Module docstring mentions proxy_batch_polling_interval > 0 which no longer applies

The docstring at line 6 says deletion is blocked when "Batch polling is configured (proxy_batch_polling_interval > 0)", but the actual implementation in _is_batch_polling_enabled() checks whether the check_batch_cost_job scheduler job is registered — not proxy_batch_polling_interval. This docstring should be updated to reflect the actual check.

Comment on lines +1123 to +1130

```python
batches = await self.prisma_client.db.litellm_managedobjecttable.find_many(
    where={
        "file_purpose": "batch",
        "status": {"in": ["validating", "in_progress", "finalizing"]},
    },
    take=MAX_BATCHES_TO_CHECK,
    order={"created_at": "desc"},
)
```

DB query fetches all non-terminal batches without filtering by file_id

This query fetches up to 500 non-terminal batches from the database and then filters them in Python by parsing each batch's file_object JSON. In deployments with many concurrent batches, this puts unnecessary load on both the database and the application. The query has no filter related to the file_id being deleted, so it loads batch records that may be completely unrelated.

Consider adding a database-level filter (e.g., using a JSON contains clause if Prisma supports it, or storing input_file_id as a dedicated indexed column) to narrow the result set before application-level processing. Even an early-exit optimization (already present at line 1135) only helps after fetching and deserializing records.

Context Used: Rule from dashboard - What: Avoid creating new database requests or Router objects in the critical request path.

Why: Cre... (source)
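For context, the application-side matching the review describes amounts to something like the following sketch. The row field names (`batch_id`, `status`, `file_object`, `input_file_id`) are assumed from the review comments, not taken from the actual schema:

```python
# Sketch only: Python-side filtering of already-fetched batch rows,
# illustrating the work the review suggests pushing into the database.
import json
from typing import Dict, List, Set

NON_TERMINAL_STATUSES = {"validating", "in_progress", "finalizing"}


def batches_referencing_file(
    batch_rows: List[Dict[str, str]], provider_file_ids: Set[str]
) -> List[Dict[str, str]]:
    """Return the non-terminal batch rows whose file_object JSON
    references one of the given provider file ids."""
    referencing = []
    for row in batch_rows:
        if row.get("status") not in NON_TERMINAL_STATUSES:
            continue
        try:
            file_obj = json.loads(row["file_object"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or malformed JSON
        if file_obj.get("input_file_id") in provider_file_ids:
            referencing.append(row)
    return referencing
```

Every fetched row is deserialized before it can be rejected, which is why a database-level filter (a JSON `path` clause or a dedicated indexed `input_file_id` column) would do strictly less work per deletion request.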

@Sameerlite Sameerlite merged commit 6f82a3e into main Feb 18, 2026
69 of 84 checks passed
@ishaan-berri ishaan-berri deleted the litellm_fix_delete_file_managed_access branch March 26, 2026 22:29