fix(proxy): auto-configure PROMETHEUS_MULTIPROC_DIR for multi-worker setups#20911
jquinter wants to merge 2 commits into BerriAI:main
…setups

When running LiteLLM proxy with multiple uvicorn workers and Prometheus callbacks enabled, automatically create and set `PROMETHEUS_MULTIPROC_DIR` so metrics are correctly aggregated across all worker processes.

Fixes BerriAI#10595
Supersedes BerriAI#11067: reimplemented against current codebase, crediting original author @Penagwin for the approach.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Greptile Overview
Greptile Summary
Auto-configures …
Confidence Score: 3/5
| Filename | Overview |
|---|---|
| litellm/proxy/proxy_cli.py | Adds auto-configuration of PROMETHEUS_MULTIPROC_DIR for multi-worker setups. Logic is correct but only checks callbacks, missing success_callback as another valid configuration path for prometheus. |
| tests/test_litellm/proxy/test_proxy_cli.py | Adds 3 well-structured mock-only tests covering auto-creation, single-worker skip, and existing env var preservation. Proper cleanup in finally blocks. No real network calls. |
…to top

- Also check `litellm_settings.success_callback` for prometheus (not just `callbacks`)
- Move atexit, shutil, tempfile imports to module-level per CLAUDE.md style guide
- Add test for prometheus detection via success_callback

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@greptile-apps wake up dude! can you re-review this?
Greptile Overview
Greptile Summary
Auto-configures …
Confidence Score: 4/5
| Filename | Overview |
|---|---|
| litellm/proxy/proxy_cli.py | Added auto-configuration of PROMETHEUS_MULTIPROC_DIR for multi-worker setups. Checks both callbacks and success_callback, creates a temp directory, sets the env var before workers fork, and registers an atexit cleanup handler. Imports moved to top of file per style guide. |
| tests/test_litellm/proxy/test_proxy_cli.py | Added 4 well-structured mock-only tests covering: auto-creation with callbacks, single-worker no-op, existing env var preservation, and auto-creation with success_callback. Proper env var save/restore in all test cases. |
Summary
When running the LiteLLM proxy with multiple uvicorn workers (`--num_workers > 1`) and Prometheus callbacks enabled, Prometheus metrics are silently lost because each worker process maintains its own metrics registry. This PR auto-detects this scenario in `proxy_cli.py` and creates a temporary shared directory for `PROMETHEUS_MULTIPROC_DIR`, enabling `MultiProcessCollector` (already supported in `prometheus.py`) to aggregate metrics across workers.

- Sets `PROMETHEUS_MULTIPROC_DIR` when `num_workers > 1` and `prometheus` is in `litellm_settings.callbacks`
- Registers an `atexit` handler to clean up the temp directory on shutdown
- Respects an existing `PROMETHEUS_MULTIPROC_DIR` environment variable (does not overwrite)

Fixes #10595
Supersedes #11067 — reimplemented against current codebase. Full credit to @Penagwin for the original approach and thorough investigation.
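For readers who want the shape of the change without opening the diff, here is a minimal sketch of the behaviour described in this summary and in the review tables. It is not the code from `proxy_cli.py`: the function names `uses_prometheus` and `maybe_setup_prometheus_multiproc_dir`, the settings dictionary shape, and the temp-directory prefix are all assumptions made for illustration.

```python
import atexit
import os
import shutil
import tempfile
from typing import Any, Mapping, Optional


def uses_prometheus(litellm_settings: Mapping[str, Any]) -> bool:
    """Return True if 'prometheus' is listed under callbacks or success_callback."""
    for key in ("callbacks", "success_callback"):
        value = litellm_settings.get(key) or []
        if isinstance(value, str):  # tolerate a single string instead of a list
            value = [value]
        if "prometheus" in value:
            return True
    return False


def maybe_setup_prometheus_multiproc_dir(
    num_workers: int, litellm_settings: Mapping[str, Any]
) -> Optional[str]:
    """Hypothetical helper: create and export a shared metrics directory.

    Returns the created path, or None when nothing was changed.
    """
    if num_workers <= 1 or not uses_prometheus(litellm_settings):
        return None
    if os.environ.get("PROMETHEUS_MULTIPROC_DIR"):
        return None  # respect a directory the operator already configured

    # The prefix is an assumption; any writable directory would do.
    multiproc_dir = tempfile.mkdtemp(prefix="litellm_prometheus_")
    # Must be set in the parent before workers start so they inherit it.
    os.environ["PROMETHEUS_MULTIPROC_DIR"] = multiproc_dir
    # Remove the directory when the parent process exits.
    atexit.register(shutil.rmtree, multiproc_dir, ignore_errors=True)
    return multiproc_dir
```

Calling `maybe_setup_prometheus_multiproc_dir(4, {"callbacks": ["prometheus"]})` before launching the workers would, under these assumptions, export the variable once in the parent process so that every worker writes its metric files to the same directory.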
Test plan
- `TestPrometheusMultiprocessSetup` with 3 test cases:
  - `test_prometheus_multiproc_dir_auto_created`: verifies env var is set and directory exists when `num_workers=4` with prometheus callback
  - `test_prometheus_multiproc_dir_not_set_for_single_worker`: verifies env var is NOT set for single worker
  - `test_prometheus_multiproc_dir_respects_existing_env`: verifies pre-existing env var is not overwritten
- `make test-unit` passes

🤖 Generated with Claude Code
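The three cases in the test plan could be sketched roughly as follows. This is an illustrative stand-in, not the tests from `tests/test_litellm/proxy/test_proxy_cli.py`: `setup_multiproc_dir` is a hypothetical simplification of the logic under test, and the class name and the `/custom/metrics` path are made up for the example. Environment save and restore is handled here with `mock.patch.dict`, which is one way to get the cleanup behaviour the review mentions.

```python
import os
import shutil
import tempfile
import unittest
from unittest import mock

ENV_VAR = "PROMETHEUS_MULTIPROC_DIR"


def setup_multiproc_dir(num_workers: int, callbacks: list) -> None:
    """Hypothetical stand-in for the proxy_cli.py logic under test."""
    if num_workers > 1 and "prometheus" in callbacks and not os.environ.get(ENV_VAR):
        os.environ[ENV_VAR] = tempfile.mkdtemp(prefix="litellm_prometheus_")


class TestMultiprocDirSketch(unittest.TestCase):
    def setUp(self):
        # patch.dict snapshots os.environ and restores it when stopped.
        patcher = mock.patch.dict(os.environ)
        patcher.start()
        self.addCleanup(patcher.stop)
        os.environ.pop(ENV_VAR, None)

    def test_auto_created(self):
        setup_multiproc_dir(4, ["prometheus"])
        created = os.environ[ENV_VAR]
        self.addCleanup(shutil.rmtree, created, ignore_errors=True)
        self.assertTrue(os.path.isdir(created))

    def test_not_set_for_single_worker(self):
        setup_multiproc_dir(1, ["prometheus"])
        self.assertNotIn(ENV_VAR, os.environ)

    def test_respects_existing_env(self):
        os.environ[ENV_VAR] = "/custom/metrics"
        setup_multiproc_dir(4, ["prometheus"])
        self.assertEqual(os.environ[ENV_VAR], "/custom/metrics")
```

Restoring the environment in `setUp`/cleanup rather than inside each test keeps a failing assertion from leaking `PROMETHEUS_MULTIPROC_DIR` into later tests.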