Add File deletion criteria with batch references #21456
Conversation
Greptile Summary: This PR adds file deletion blocking when files are referenced by batches in non-terminal states (validating, in_progress, finalizing).
Key issue: The _is_batch_polling_enabled() gate always returns True in practice, because proxy_batch_polling_interval defaults to 3600, so deletion is blocked for all users rather than only those who opted into cost tracking.
Confidence Score: 2/5
| Filename | Overview |
|---|---|
| enterprise/litellm_enterprise/proxy/hooks/managed_files.py | Adds file deletion blocking logic with 3 new methods. Key concern: _is_batch_polling_enabled() always returns True since proxy_batch_polling_interval defaults to 3600, making the gate ineffective. Also has an unbounded DB query and an unused variable. |
| tests/test_litellm/enterprise/proxy/test_file_deletion_blocking.py | Comprehensive mock-only tests covering batch polling checks, batch reference detection, deletion blocking, and error messages. Tests are well-structured but don't exercise the real default value of proxy_batch_polling_interval. |
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[afile_delete called] --> B[_check_file_deletion_allowed]
    B --> C{_is_batch_polling_enabled?}
    C -->|No: interval=0 or None| D[Allow deletion]
    C -->|Yes: interval > 0| E[_get_batches_referencing_file]
    E --> F[Query ALL non-terminal batches from DB]
    F --> G[get_model_file_id_mapping for provider IDs]
    G --> H[Parse each batch file_object JSON]
    H --> I{Any batch references this file?}
    I -->|No| D
    I -->|Yes| J[Raise HTTPException 400]
    D --> K[Proceed with file deletion]
    style C fill:#ff9999,stroke:#333
    style F fill:#ffcc99,stroke:#333
    style J fill:#ff6666,stroke:#333
```
Last reviewed commit: 8f80b10
```python
def _is_batch_polling_enabled(self) -> bool:
    """
    Check if batch polling is configured, which indicates user wants cost tracking.

    Returns:
        bool: True if batch polling is enabled (interval > 0), False otherwise
    """
    try:
        # Import here to avoid circular dependencies
        import litellm.proxy.proxy_server as proxy_server_module

        proxy_batch_polling_interval = getattr(
            proxy_server_module, 'proxy_batch_polling_interval', None
        )

        # If interval is set and greater than 0, polling is enabled
        if proxy_batch_polling_interval is not None and proxy_batch_polling_interval > 0:
            return True
        return False
    except Exception as e:
        verbose_logger.warning(
            f"Error checking batch polling configuration: {e}. Assuming disabled."
        )
        return False
```
_is_batch_polling_enabled is always true in practice
proxy_batch_polling_interval has a default value of 3600 (from PROXY_BATCH_POLLING_INTERVAL in litellm/constants.py:1301), which means it is always > 0 in any running proxy instance. The _is_batch_polling_enabled() check will effectively always return True, making this gate a no-op.
This means file deletion will be blocked for all enterprise users whenever a non-terminal batch references the file — regardless of whether batch cost tracking is actually meaningful to them. If the intent is to only block when the user has explicitly opted into cost tracking, a different signal is needed (e.g., a dedicated config flag, or checking whether the CheckBatchCost scheduler job was successfully registered).
```python
batches = await self.prisma_client.db.litellm_managedobjecttable.find_many(
    where={
        "file_purpose": "batch",
        "status": {"in": ["validating", "in_progress", "finalizing"]},
    }
)
```
Unbounded DB query fetches all non-terminal batches
This find_many query fetches all non-terminal batches from the database, then filters in Python by parsing each batch's file_object JSON. In a production deployment with many concurrent batches, this could be a performance issue on the critical deletion path. The query has no take limit (unlike the similar query at line 296 which uses fetch_limit).
Consider either:
- Adding a `take` limit to bound the query, or
- Restructuring the schema/query to filter by `file_id` at the database level rather than fetching all non-terminal batches and filtering in application code.
Context Used: Rule from dashboard - What: Avoid creating new database requests or Router objects in the critical request path.
Why: Cre... (source)
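To make the application-side filtering concrete, here is a hedged sketch of the per-batch JSON check the comment describes. The `input_file_id` field name is an assumption based on the OpenAI batch object shape; the real `file_object` schema may differ.

```python
import json


def batch_references_file(file_object_json, provider_file_ids):
    """Return True if a stored batch's file_object JSON references any of
    the provider file IDs resolved for the file being deleted.

    Field name `input_file_id` is assumed from the OpenAI batch schema.
    """
    try:
        batch = json.loads(file_object_json)
    except (TypeError, ValueError):
        # Malformed or missing JSON: treat as not referencing the file.
        return False
    return batch.get("input_file_id") in set(provider_file_ids)
```

Even with this helper, every fetched row must be deserialized before it can be rejected, which is why a database-level filter is preferable on the deletion path.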
```python
if referencing_batches:
    # File is referenced by non-terminal batches and polling is enabled
    batch_ids = [b["batch_id"] for b in referencing_batches]
```
Unused variable batch_ids
batch_ids is computed but never referenced. It should either be removed or used in place of the duplicate list comprehension on line 1177 (which re-extracts b["batch_id"]).
```diff
- batch_ids = [b["batch_id"] for b in referencing_batches]
+ batch_statuses = [f"{b['batch_id']}: {b['status']}" for b in referencing_batches]
```
| """ | ||
| try: | ||
| # Import here to avoid circular dependencies | ||
| import litellm.proxy.proxy_server as proxy_server_module |
Inline import inside a method
Per the project's code style guidelines (CLAUDE.md): "Avoid imports within methods — place all imports at the top of the file (module-level)." This inline import of litellm.proxy.proxy_server should be moved to the module level if possible, or documented as a necessary circular-dependency workaround.
Note: litellm.proxy.proxy_server is not currently imported at the top of this file. If the circular dependency concern is valid, consider adding a comment explaining why this must be inline.
Context Used: Context from dashboard - CLAUDE.md (source)
Greptile Summary: This PR adds a file deletion blocking feature to the enterprise managed files hook. When a user attempts to delete a file via the API, the system now checks whether any batches in non-terminal states (validating, in_progress, finalizing) still reference that file, and blocks the deletion with a 400 error if so.
Confidence Score: 3/5
|
| Filename | Overview |
|---|---|
| enterprise/litellm_enterprise/proxy/hooks/managed_files.py | Adds file deletion blocking logic with three new methods: _is_batch_polling_enabled (checks scheduler job), _get_batches_referencing_file (fetches up to 500 non-terminal batches and filters in Python), and _check_file_deletion_allowed (orchestrates the check). The DB query could be optimized to filter at the database level rather than fetching all non-terminal batches. A duplicate get_model_file_id_mapping call occurs on the delete path. |
| tests/test_litellm/enterprise/proxy/test_file_deletion_blocking.py | Comprehensive test suite with 11 tests covering batch polling detection, file-to-batch reference lookups, deletion blocking/allowing logic, early exit optimization, and error message formatting. All tests use mocks correctly. Has one unused helper function (_make_user_api_key_dict) and an outdated docstring referencing proxy_batch_polling_interval. |
Sequence Diagram

```mermaid
sequenceDiagram
    participant User as API Client
    participant EP as files_endpoints.py
    participant MF as ManagedFiles.afile_delete
    participant Check as _check_file_deletion_allowed
    participant Poll as _is_batch_polling_enabled
    participant DB as Database
    participant Sched as APScheduler
    User->>EP: DELETE /files/{file_id}
    EP->>MF: afile_delete(file_id)
    MF->>Check: _check_file_deletion_allowed(file_id)
    Check->>Poll: _is_batch_polling_enabled()
    Poll->>Sched: get_job('check_batch_cost_job')
    Sched-->>Poll: job or None
    Poll-->>Check: True/False
    alt Polling disabled
        Check-->>MF: return (allow)
    else Polling enabled
        Check->>DB: get_model_file_id_mapping (cache/DB)
        DB-->>Check: provider file IDs
        Check->>DB: find_many(non-terminal batches, take=500)
        DB-->>Check: batch records
        Check->>Check: filter batches by file_id in Python
        alt No referencing batches
            Check-->>MF: return (allow)
        else Referencing batches found
            Check-->>MF: raise HTTPException(400)
            MF-->>EP: HTTPException
            EP-->>User: 400 error with batch details
        end
    end
    MF->>DB: get_model_file_id_mapping (duplicate call)
    MF->>EP: delete from providers + DB
    EP-->>User: 200 OK
```
Last reviewed commit: 9f5580f
```python
model_file_id_mapping = await self.get_model_file_id_mapping(
    [file_id], litellm_parent_otel_span=None
)
```
Duplicate get_model_file_id_mapping call on deletion path
get_model_file_id_mapping is called here (line 1106) inside _get_batches_referencing_file, and then called again at line 1237 inside afile_delete after the check passes. Each call hits the cache or DB. Consider passing the resolved mapping as a parameter to avoid the redundant lookup, or restructuring so afile_delete resolves the mapping once and passes it into the deletion check.
Context Used: Rule from dashboard - What: Avoid creating new database requests or Router objects in the critical request path.
Why: Cre... (source)
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
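One way to realize the single-lookup suggestion, sketched with hypothetical stand-in methods (a call counter replaces the real cache/DB hit so the saving is observable; names mirror the real methods but the bodies are illustrative only):

```python
import asyncio


class ManagedFilesSketch:
    """Hypothetical sketch: resolve the file-id mapping once in afile_delete
    and pass it into the deletion check, avoiding the second lookup."""

    def __init__(self):
        self.lookup_calls = 0

    async def get_model_file_id_mapping(self, file_ids):
        self.lookup_calls += 1  # stands in for the real cache/DB lookup
        return {fid: {"provider": f"provider-{fid}"} for fid in file_ids}

    async def _check_file_deletion_allowed(self, file_id, model_file_id_mapping):
        # Uses the pre-resolved mapping; no second get_model_file_id_mapping call.
        # (The sketch simply allows deletion when the file resolved at all.)
        return file_id in model_file_id_mapping

    async def afile_delete(self, file_id):
        mapping = await self.get_model_file_id_mapping([file_id])
        if not await self._check_file_deletion_allowed(file_id, mapping):
            raise ValueError("deletion blocked")
        return mapping  # the same mapping is reused for the provider deletes


sketch = ManagedFilesSketch()
result = asyncio.run(sketch.afile_delete("file-1"))
```

With this shape, the deletion path performs exactly one mapping lookup per request instead of two.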
```python
model_file_id_mapping = await self.get_model_file_id_mapping(
    [file_id], litellm_parent_otel_span=None
)
```
Duplicate get_model_file_id_mapping call on delete path
_get_batches_referencing_file calls self.get_model_file_id_mapping() at line 1106 to resolve provider file IDs. Then, after _check_file_deletion_allowed returns (no block), afile_delete calls self.get_model_file_id_mapping() again at line 1237. This results in two cache/DB lookups for the same file ID on every deletion request. Consider passing the result from the first call through or caching it on the instance to avoid the redundant lookup.
Context Used: Rule from dashboard - What: Avoid creating new database requests or Router objects in the critical request path.
Why: Cre... (source)
```python
def _make_user_api_key_dict(user_id: str = "user-A") -> UserAPIKeyAuth:
    return UserAPIKeyAuth(
        api_key="sk-test",
        user_id=user_id,
        parent_otel_span=None,
    )
```
Unused helper function _make_user_api_key_dict
This helper function is defined but never called by any test in this file. It should be removed to keep the test file clean.
| """ | ||
| Tests for file deletion blocking when referenced by non-terminal batches. | ||
|
|
||
| This tests the feature where file deletion is blocked when: | ||
| 1. File is referenced by a batch in non-terminal state (validating, in_progress, finalizing) | ||
| 2. Batch polling is configured (proxy_batch_polling_interval > 0) | ||
|
|
||
| This ensures cost tracking is not disrupted by premature file deletion. |
Module docstring mentions proxy_batch_polling_interval > 0 which no longer applies
The docstring at line 6 says deletion is blocked when "Batch polling is configured (proxy_batch_polling_interval > 0)", but the actual implementation in _is_batch_polling_enabled() checks whether the check_batch_cost_job scheduler job is registered — not proxy_batch_polling_interval. This docstring should be updated to reflect the actual check.
```python
batches = await self.prisma_client.db.litellm_managedobjecttable.find_many(
    where={
        "file_purpose": "batch",
        "status": {"in": ["validating", "in_progress", "finalizing"]},
    },
    take=MAX_BATCHES_TO_CHECK,
    order={"created_at": "desc"},
)
```
DB query fetches all non-terminal batches without filtering by file_id
This query fetches up to 500 non-terminal batches from the database and then filters them in Python by parsing each batch's file_object JSON. In deployments with many concurrent batches, this puts unnecessary load on both the database and the application. The query has no filter related to the file_id being deleted, so it loads batch records that may be completely unrelated.
Consider adding a database-level filter (e.g., using a JSON contains clause if Prisma supports it, or storing input_file_id as a dedicated indexed column) to narrow the result set before application-level processing. Even an early-exit optimization (already present at line 1135) only helps after fetching and deserializing records.
Context Used: Rule from dashboard - What: Avoid creating new database requests or Router objects in the critical request path.
Why: Cre... (source)
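For the dedicated-column option, the query shape would collapse to a single narrow lookup. The sketch below assumes a hypothetical indexed `input_file_id` column (not the current schema) and uses an in-memory stand-in for the Prisma table so the shape is demonstrable:

```python
import asyncio


async def get_blocking_batches(table, provider_file_id):
    """Sketch of the narrowed query if `input_file_id` were a dedicated,
    indexed column (an assumed schema change, not the current layout)."""
    return await table.find_many(
        where={
            "file_purpose": "batch",
            "status": {"in": ["validating", "in_progress", "finalizing"]},
            "input_file_id": provider_file_id,  # filter at the DB, not in Python
        },
        take=1,  # one blocker is enough to reject the deletion
    )


class _FakeTable:
    """In-memory stand-in for the Prisma table, used only for illustration."""

    def __init__(self, rows):
        self._rows = rows

    async def find_many(self, where, take=None):
        rows = [
            r for r in self._rows
            if r["file_purpose"] == where["file_purpose"]
            and r["status"] in where["status"]["in"]
            and r["input_file_id"] == where["input_file_id"]
        ]
        return rows[:take] if take is not None else rows


table = _FakeTable([
    {"batch_id": "b1", "file_purpose": "batch", "status": "in_progress",
     "input_file_id": "file-abc"},
    {"batch_id": "b2", "file_purpose": "batch", "status": "completed",
     "input_file_id": "file-abc"},
])
blockers = asyncio.run(get_blocking_batches(table, "file-abc"))
```

With `take=1` and a column-level filter, the deletion path never deserializes unrelated batch records, regardless of how many batches are in flight.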
Relevant issues
Fixes LIT-1987
File Deletion Blocking Feature
This feature blocks file deletion API calls when files are referenced by batches in non-terminal states. This ensures that cost tracking is not disrupted by premature file deletion.
Please complete all items before asking a LiteLLM maintainer to review your PR
- I have added a relevant unit test in the `tests/litellm/` directory (adding at least 1 test is a hard requirement - see details)
- I have run `make test-unit`
- My PR has been reviewed by `@greptileai` and received a Confidence Score of at least 4/5 before requesting a maintainer review

CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Type
🐛 Bug Fix
Changes