
litellm ryan march 20 #24323

Merged
yuneng-jiang merged 18 commits into main from litellm_ryan_march_20
Mar 21, 2026


yuneng-jiang and others added 15 commits March 9, 2026 09:48
Audit logs (CRUD events on keys, teams, users, models) were only stored in
the Prisma DB. This adds a pluggable callback system so audit logs can be
forwarded to external services like S3 for ingestion into security monitoring
tools.

New config key `audit_log_callbacks` under `litellm_settings` reuses the
existing callback infrastructure. Any CustomLogger subclass can opt in by
overriding `async_log_audit_log_event()`. S3Logger (s3_v2) is implemented
as the first handler, storing audit logs under `audit_logs/{date}/` prefix.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
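
For illustration, a minimal sketch of the opt-in pattern described in this commit. `CustomLogger` below is a local stand-in for the LiteLLM base class so the snippet runs on its own; `CollectingAuditLogger` and the payload fields are hypothetical. Only the method name `async_log_audit_log_event()` is taken from the commit message.

```python
import asyncio
from typing import Any, Dict, List


class CustomLogger:
    """Local stand-in for LiteLLM's CustomLogger (assumption, not the real class)."""

    async def async_log_audit_log_event(self, audit_log: Dict[str, Any]) -> None:
        # Default is a no-op, so a subclass opts in only by overriding this method.
        return None


class CollectingAuditLogger(CustomLogger):
    """Hypothetical handler that keeps audit events in memory."""

    def __init__(self) -> None:
        self.events: List[Dict[str, Any]] = []

    async def async_log_audit_log_event(self, audit_log: Dict[str, Any]) -> None:
        self.events.append(audit_log)


# Hypothetical payload; the real StandardAuditLogPayload fields may differ.
logger = CollectingAuditLogger()
asyncio.run(
    logger.async_log_audit_log_event({"action": "created", "table_name": "keys"})
)
```

A real handler would forward the payload to an external service instead of appending it to a list.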
Two fixes based on PR feedback:

1. Move callback dispatch before the prisma_client check so audit logs
   still reach S3/Datadog even if the DB is down. Also changed the
   prisma_client=None case from raising an exception to logging an error
   and returning gracefully.

2. Attach a done_callback to asyncio tasks created for audit log
   callbacks so exceptions are logged through verbose_proxy_logger
   instead of silently swallowed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
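
The second fix can be sketched with plain asyncio. This is an illustrative example, not the code from the PR: `log_task_exception`, `failing_callback`, and the `logged_errors` list are hypothetical names, and a standard `logging` logger stands in for `verbose_proxy_logger`.

```python
import asyncio
import logging
from typing import List

logger = logging.getLogger("audit_log_sketch")  # stand-in for verbose_proxy_logger
logged_errors: List[str] = []  # kept only so the behaviour is easy to inspect


def log_task_exception(task: "asyncio.Task[None]") -> None:
    """Done-callback: report an exception that nobody awaits."""
    if task.cancelled():
        return
    exc = task.exception()
    if exc is not None:
        logged_errors.append(str(exc))
        logger.error("Audit log callback failed: %s", exc)


async def failing_callback() -> None:
    raise RuntimeError("S3 upload failed")


async def main() -> None:
    # Fire-and-forget: the caller never awaits the task's result.
    task = asyncio.create_task(failing_callback())
    task.add_done_callback(log_task_exception)
    # Wait for completion without re-raising, then yield once so that any
    # pending done-callbacks have run before main() returns.
    await asyncio.wait([task])
    await asyncio.sleep(0)


asyncio.run(main())
```

Without the done-callback, the exception would surface only as a "Task exception was never retrieved" message when the task is garbage-collected, if at all.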
Add a CopyOutlined icon next to the truncated User ID that copies
the full UUID to clipboard on click. Follows the existing pattern
used in model_hub_table_columns.tsx.
Made-with: Cursor
[Feature] Add Audit Log Export to External Callbacks
The create key form used getPredefinedTags() which only extracted tags
from existing keys' metadata. If no keys had tags, the dropdown was
empty. Switch to the existing useTags() React Query hook that fetches
from /tag/list, matching the edit key form behavior.
…opdown

Litellm fix create key tags dropdown
…ict dumps

Add key-name-based regex patterns (master_key, database_url, auth_token,
etc.) to SecretRedactionFilter so secrets embedded in dict/config dumps
are redacted by key name, regardless of value format.

Fixes a leak where general_settings containing master_key and
database_url was logged in full because the secret values didn't match
any existing value-format regex pattern.
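
A small sketch of key-name-based redaction, assuming a shortened key list and a `REDACTED` placeholder. The value-matching tail of the regex is copied from the pattern quoted later in this review; `redact` and `KEY_NAME_PATTERN` are hypothetical names, not the filter's actual API.

```python
import re

# Shortened, illustrative key list; the real filter covers more key names.
KEY_NAME_PATTERN = re.compile(
    r"(?:master_key|database_url|auth_token)"
    r"""['\"]?\s*[:=]\s*['\"]?[^\s,'\"})\]{}>]+"""
)


def redact(text: str) -> str:
    """Replace `key: value` pairs for sensitive key names with a placeholder."""
    return KEY_NAME_PATTERN.sub("REDACTED", text)


# The value "abc123" matches no typical secret format, yet it is still
# redacted because the match is driven by the key name.
settings_dump = str({"master_key": "abc123", "port": 4000})
redacted = redact(settings_dump)
```

Here `redacted` is `{'REDACTED', 'port': 4000}`: the key name and its value are replaced together, while unrelated keys are left alone.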
…lobal

fix: global secret redaction via root logger + key-name-based pattern matching
polish: add click-to-copy icon on User ID in internal users table
Hide the red asterisk indicators on Username and Password fields
while keeping the validation rules intact.
polish: remove required asterisks from v3 login form fields
@vercel

vercel bot commented Mar 21, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
litellm Ready Ready Preview, Comment Mar 21, 2026 10:43pm


@codspeed-hq
Contributor

codspeed-hq bot commented Mar 21, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing litellm_ryan_march_20 (3e27ff1) with main (1986f10)


@greptile-apps
Contributor

greptile-apps bot commented Mar 21, 2026

Greptile Summary

This is a semi-daily integration branch bundling several features and polish fixes: a new High Availability Control Plane architecture (docs + React diagram component), Audit Log Export to External Callbacks (dispatching audit events to CustomLogger subclasses like S3), secret-redaction improvements in _logging.py, and several UI fixes (tags dropdown refresh, click-to-copy User ID, login form polish).

Key changes:

  • Audit log callbacks: litellm.audit_log_callbacks is a new list (List[CALLBACK_TYPES]) that dispatches StandardAuditLogPayload to registered loggers fire-and-forget via asyncio.create_task. Callbacks fire before the DB write so they succeed even when the DB is unavailable. The S3Logger batches audit logs under an audit_logs/YYYY-MM-DD/ S3 prefix.
  • Secret redaction: key-name patterns (master_key, database_url, access_token, etc.) are added to _build_secret_patterns() to catch secrets leaked inside dict/config repr strings.
  • Tags fix: CreateKey component now sources tag options from the useTags API hook instead of the stale getPredefinedTags(data) prop-derived state.
  • P1: _dispatch_audit_log_to_callbacks silently skips callbacks that are plain Callable types — a valid member of CALLBACK_TYPES — with no warning, making misconfiguration very hard to diagnose.
  • P2: The useTags mock in create_key_button.test.tsx uses an array for data, but TagListResponse is Record<string, Tag>; the mock accidentally works due to Object.values() behavior on arrays.

Confidence Score: 3/5

  • Generally safe to merge but has a silent-failure path for Callable-typed audit log callbacks that could confuse users in production.
  • The audit log dispatch logic has a clear gap: CALLBACK_TYPES includes Callable, but _dispatch_audit_log_to_callbacks silently ignores any callable that isn't a CustomLogger instance. There are no runtime errors, but a user who registers a callable audit callback will see no invocation and no warning, making it extremely hard to debug. All other changes (docs, UI polish, secret redaction, S3 batching) look correct and are well-tested.
  • litellm/proxy/management_helpers/audit_logs.py (silent callable drop) and ui/litellm-dashboard/src/components/organisms/create_key_button.test.tsx (wrong mock shape for TagListResponse)

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/proxy/management_helpers/audit_logs.py | Core audit log callback dispatch logic added; callable-type callbacks in CALLBACK_TYPES are silently dropped without any warning log, and DB write now happens after callback dispatch (intentionally). |
| litellm/integrations/s3_v2.py | Adds async_log_audit_log_event to S3Logger, correctly batching audit logs using the existing log_queue and s3BatchLoggingElement infrastructure. |
| litellm/proxy/proxy_server.py | Wires audit_log_callbacks config key into the proxy config loader; correctly warns when store_audit_logs is not set. |
| litellm/_logging.py | Adds key-name-based secret redaction patterns for common sensitive config keys like master_key, database_url, access_token, etc. |
| ui/litellm-dashboard/src/components/organisms/create_key_button.tsx | Replaces stale predefinedTags state with live useTags hook; Object.values() usage is correct since TagListResponse is Record<string, Tag>. |
| ui/litellm-dashboard/src/components/organisms/create_key_button.test.tsx | Adds useTags mock and tags dropdown test, but the mock uses an array shape instead of the correct Record<string, Tag> shape defined by TagListResponse. |

Sequence Diagram

sequenceDiagram
    participant PS as proxy_server.py
    participant AL as audit_logs.py
    participant CB as _dispatch_audit_log_to_callbacks
    participant S3 as S3Logger
    participant CL as CustomLogger (subclass)
    participant DB as Prisma DB

    PS->>AL: create_audit_log_for_update(request_data)
    AL->>AL: check store_audit_logs & premium_user
    AL->>AL: serialize updated_values/before_value to JSON
    AL->>CB: await _dispatch_audit_log_to_callbacks(request_data)
    CB->>CB: _build_audit_log_payload(request_data)
    loop for each callback in audit_log_callbacks
        alt callback is CustomLogger instance
            CB-->>CL: asyncio.create_task(async_log_audit_log_event(payload))
        else callback is str (e.g. "s3_v2")
            CB->>CB: _resolve_audit_log_callback(name) [cached]
            CB-->>S3: asyncio.create_task(async_log_audit_log_event(payload))
        else callback is Callable
            CB->>CB: silently skipped (no warning logged)
        end
    end
    AL->>DB: prisma_client.db.litellm_auditlog.create(...)
    S3-->>S3: append to log_queue, flush if batch_size reached

Comments Outside Diff (1)

  1. ui/litellm-dashboard/src/components/organisms/create_key_button.test.tsx, lines 1653-1661

    P2 Test mock uses wrong shape for TagListResponse

    TagListResponse is defined as Record<string, Tag> (a keyed object), not an array. The mock below returns an array, which accidentally works at runtime because Object.values([...]) on an array just returns the same elements — but the mock does not accurately represent the real data contract.

    If the component code changes to access the data by key (e.g., tagsData["production"]) rather than using Object.values(), this test will silently pass with the wrong mock shape while production breaks.

    The mock should match the real TagListResponse shape:

Last reviewed commit: "fix: resolve mypy ty..."

Comment on lines +23 to +43
_audit_log_callback_cache: Dict[str, CustomLogger] = {}


def _resolve_audit_log_callback(name: str) -> Optional[CustomLogger]:
    """Resolve a string callback name to a CustomLogger instance, with caching."""
    if name in _audit_log_callback_cache:
        return _audit_log_callback_cache[name]

    from litellm.litellm_core_utils.litellm_logging import (
        _init_custom_logger_compatible_class,
    )

    instance = _init_custom_logger_compatible_class(
        logging_integration=name,  # type: ignore
        internal_usage_cache=None,
        llm_router=None,
    )

    if instance is not None:
        _audit_log_callback_cache[name] = instance
    return instance
Contributor

P1 Stale callback cache on config reload

_audit_log_callback_cache is a module-level dict that is never cleared. In proxy_server.py, when audit_log_callbacks is re-configured (e.g., on a hot-reload), litellm.audit_log_callbacks is reset to [] and rebuilt — but _audit_log_callback_cache is not invalidated. This means any string-named callback like "s3_v2" will continue resolving to the old cached instance (with stale bucket name, credentials, etc.) after a config reload.

To fix, add a cache-clearing step whenever litellm.audit_log_callbacks is reconfigured. One approach is to expose a _clear_audit_log_callback_cache() helper and call it from proxy_server.py:

def _clear_audit_log_callback_cache() -> None:
    """Invalidate the resolved-callback cache (call on config reload)."""
    _audit_log_callback_cache.clear()

And in proxy_server.py, before rebuilding audit_log_callbacks:

elif key == "audit_log_callbacks":
    from litellm.proxy.management_helpers.audit_logs import _clear_audit_log_callback_cache
    _clear_audit_log_callback_cache()
    litellm.audit_log_callbacks = []
    ...
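
To make the failure mode concrete, here is a self-contained sketch of a name-keyed cache that goes stale after a simulated config reload, and how clearing it fixes that. `FakeLogger`, `resolve`, `clear_cache`, and `current_bucket` are hypothetical stand-ins, not LiteLLM code.

```python
from typing import Dict


class FakeLogger:
    """Hypothetical logger that captures its config at construction time."""

    def __init__(self, bucket: str) -> None:
        self.bucket = bucket


_cache: Dict[str, FakeLogger] = {}
current_bucket = "old-bucket"  # stands in for reloadable proxy config


def resolve(name: str) -> FakeLogger:
    """Resolve a callback name to an instance, caching the first result."""
    if name not in _cache:
        _cache[name] = FakeLogger(bucket=current_bucket)
    return _cache[name]


def clear_cache() -> None:
    """Invalidate cached instances; intended to run on config reload."""
    _cache.clear()


first = resolve("s3_v2")
current_bucket = "new-bucket"  # simulated config reload
stale = resolve("s3_v2")       # still the instance built from the old config
clear_cache()
fresh = resolve("s3_v2")       # rebuilt from the new config
```

After the simulated reload, `stale` is the same object as `first` and still points at `old-bucket`; only the lookup after `clear_cache()` picks up `new-bucket`.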

Comment on lines +2982 to +2986
else:
    verbose_proxy_logger.warning(
        "'audit_log_callbacks' is configured but 'store_audit_logs' is not enabled. "
        "Audit log callbacks will not fire until 'store_audit_logs: true' is added to litellm_settings."
    )
Contributor

P2 Warning message omits the LITELLM_STORE_AUDIT_LOGS env var alternative

create_audit_log_for_update checks litellm.store_audit_logs or get_secret_bool("LITELLM_STORE_AUDIT_LOGS"), so users can enable audit logs via environment variable without touching litellm_settings. The current warning message only mentions store_audit_logs: true, which could cause confusion for users who rely on the env-var approach.

Suggested change
else:
    verbose_proxy_logger.warning(
        "'audit_log_callbacks' is configured but 'store_audit_logs' is not enabled. "
        "Audit log callbacks will not fire until 'store_audit_logs: true' is added to litellm_settings."
    )
    verbose_proxy_logger.warning(
        "'audit_log_callbacks' is configured but 'store_audit_logs' is not enabled. "
        "Audit log callbacks will not fire until 'store_audit_logs: true' is added to litellm_settings "
        "or the LITELLM_STORE_AUDIT_LOGS environment variable is set to 'true'."
    )
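
The two ways of enabling audit logs mentioned in this comment can be sketched as a single check. This is an assumption-laden illustration: `audit_logs_enabled` is a hypothetical helper, and the way it parses the environment variable (case-insensitive comparison with `"true"`) may differ from what `get_secret_bool` actually does.

```python
import os
from typing import Mapping, Optional


def audit_logs_enabled(setting: Optional[bool], env: Mapping[str, str]) -> bool:
    """Return True if either the config setting or the env var enables audit logs."""
    # Assumed parsing: only the string "true" (any case) counts as enabled.
    env_flag = env.get("LITELLM_STORE_AUDIT_LOGS", "").strip().lower() == "true"
    return bool(setting) or env_flag


# Typical call site: pass the config value and the process environment.
enabled_now = audit_logs_enabled(None, os.environ)
```

Because either source is sufficient, a warning that names only the `litellm_settings` key describes half of the condition.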

Extract multiline `or` chain from LiteLLM_AuditLogs constructor to fix
pydantic mypy plugin field-type misattribution, and add explicit
Optional[bool] annotation to avoid variable name shadowing conflict.
Comment on lines +59 to +65
r"(?:master_key|database_url|db_url|connection_string|"
r"private_key|signing_key|encryption_key|"
r"auth_token|access_token|refresh_token|"
r"slack_webhook_url|webhook_url|"
r"database_connection_string|"
r"huggingface_token|jwt_secret)"
r"""['\"]?\s*[:=]\s*['\"]?[^\s,'\"})\]{}>]+""",
Contributor

P2 Missing word boundaries — key-name patterns can over-redact

The key-name patterns (e.g. access_token, refresh_token, webhook_url) are not anchored with word boundaries or negative lookbehinds. This means a key such as oauth_access_token, last_access_token, or my_webhook_url would produce a partial match, e.g.:

"last_access_token": "my-value"
# becomes → "last_REDACTED"

The regex matches access_token" (consuming the trailing " of the key name), then : ", then my-value — leaving a dangling "last_ prefix and a broken closing " in the log line.

Consider adding a word-boundary or negative lookbehind to anchor each key name:

r"(?<!\w)(?:master_key|database_url|db_url|connection_string|"
r"private_key|signing_key|encryption_key|"
r"auth_token|access_token|refresh_token|"
r"slack_webhook_url|webhook_url|"
r"database_connection_string|"
r"huggingface_token|jwt_secret)"
r"""['\"]?\s*[:=]\s*['\"]?[^\s,'\"})\]{}>]+""",

Note: the existing api[_-]?key pattern already sidesteps this by requiring {8,} characters for the value. The new patterns have no such minimum length guard either.
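
The partial-match behaviour can be reproduced with a shortened key list. The sketch below is illustrative only: it uses a `(?<!\w)` lookbehind, which rejects a key name that continues a longer identifier but, unlike a lookbehind that also excludes quote characters, still matches quoted dict keys. The names `unanchored`, `anchored`, and `VALUE` are hypothetical.

```python
import re

# Value-matching tail copied from the pattern under review.
VALUE = r"""['\"]?\s*[:=]\s*['\"]?[^\s,'\"})\]{}>]+"""

# Shortened key list for illustration.
unanchored = re.compile(r"(?:access_token|webhook_url)" + VALUE)
anchored = re.compile(r"(?<!\w)(?:access_token|webhook_url)" + VALUE)

line = '"last_access_token": "my-value"'

# Matches in the middle of the key name and leaves a dangling prefix.
over_redacted = unanchored.sub("REDACTED", line)

# The preceding "_" is a word character, so the anchored pattern does not match.
untouched = anchored.sub("REDACTED", line)

# A quoted key is preceded by a quote, not a word character, so it still matches.
quoted_key = anchored.sub("REDACTED", '"access_token": "my-value"')
```

Note the trade-off: with the anchored pattern, a compound key such as `last_access_token` is no longer mangled, but its value is also no longer redacted unless that key name is listed explicitly.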

@ryan-crabbe ryan-crabbe marked this pull request as draft March 21, 2026 22:25
New docs page covering the HA control plane architecture where each
worker instance has its own DB, Redis, and master key. Includes a
React component diagram, setup configs, SSO notes, and local testing
instructions.
@ryan-crabbe ryan-crabbe marked this pull request as ready for review March 21, 2026 22:52
@yuneng-jiang yuneng-jiang enabled auto-merge March 21, 2026 22:54
@yuneng-jiang yuneng-jiang self-requested a review March 21, 2026 22:54
@yuneng-jiang yuneng-jiang merged commit e3d4c29 into main Mar 21, 2026
99 of 101 checks passed
Comment on lines +90 to +107
for callback in litellm.audit_log_callbacks:
    try:
        resolved: Optional[CustomLogger] = callback if isinstance(callback, CustomLogger) else None
        if isinstance(callback, str):
            resolved = _resolve_audit_log_callback(callback)
            if resolved is None:
                verbose_proxy_logger.warning(
                    "Could not resolve audit log callback: %s", callback
                )
                continue

        if isinstance(resolved, CustomLogger):
            task = asyncio.create_task(resolved.async_log_audit_log_event(payload))
            task.add_done_callback(_audit_log_task_done_callback)
    except Exception as e:
        verbose_proxy_logger.error(
            "Failed dispatching audit log to callback: %s", e
        )
Contributor

P1 Callable callbacks silently dropped without warning

CALLBACK_TYPES is defined as Union[str, Callable, "CustomLogger"] (see litellm/__init__.py), meaning a user can legally add a plain callable to audit_log_callbacks. However, _dispatch_audit_log_to_callbacks only handles CustomLogger instances and strings — if a callback is a Callable, resolved stays None and the entry is silently skipped with no log message whatsoever.

This is a silent failure: a user who registers a callable expecting it to be invoked will see no error and no invocation. At minimum, an else branch with a warning is needed:

for callback in litellm.audit_log_callbacks:
    try:
        resolved: Optional[CustomLogger] = callback if isinstance(callback, CustomLogger) else None
        if isinstance(callback, str):
            resolved = _resolve_audit_log_callback(callback)
            if resolved is None:
                verbose_proxy_logger.warning(
                    "Could not resolve audit log callback: %s", callback
                )
                continue
        elif not isinstance(callback, CustomLogger):
            verbose_proxy_logger.warning(
                "Unsupported audit_log_callback type %s (%r) — only CustomLogger instances "
                "and string names are supported.",
                type(callback).__name__,
                callback,
            )
            continue

        if isinstance(resolved, CustomLogger):
            task = asyncio.create_task(resolved.async_log_audit_log_event(payload))
            task.add_done_callback(_audit_log_task_done_callback)
    except Exception as e:
        ...
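
A runnable, simplified sketch of the behaviour this comment asks for. It is not the PR's code: `CustomLogger` is a local stand-in, `dispatch` and the `warnings` list are hypothetical, string-name resolution is left out, and the tasks are awaited here (rather than fire-and-forget) only to keep the example deterministic.

```python
import asyncio
from typing import Any, Dict, List


class CustomLogger:
    """Local stand-in for LiteLLM's CustomLogger (assumption)."""

    def __init__(self) -> None:
        self.received: List[Dict[str, Any]] = []

    async def async_log_audit_log_event(self, payload: Dict[str, Any]) -> None:
        self.received.append(payload)


warnings: List[str] = []  # stands in for verbose_proxy_logger.warning


async def dispatch(callbacks: List[Any], payload: Dict[str, Any]) -> None:
    """Send the payload to supported callbacks and warn about the rest."""
    tasks = []
    for cb in callbacks:
        if isinstance(cb, CustomLogger):
            tasks.append(asyncio.create_task(cb.async_log_audit_log_event(payload)))
        else:
            # Make the skip visible instead of dropping the callback silently.
            warnings.append(
                f"Unsupported audit_log_callback type {type(cb).__name__}"
            )
    if tasks:
        await asyncio.gather(*tasks)


supported = CustomLogger()
asyncio.run(dispatch([supported, lambda payload: None], {"action": "deleted"}))
```

The plain callable is still not invoked, but the skip now leaves a trace that points at the misconfiguration.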

@ishaan-berri ishaan-berri deleted the litellm_ryan_march_20 branch March 26, 2026 22:29

2 participants