
fix: shorten guardrail benchmark result filenames for Windows long path support #22039

Merged
jquinter merged 2 commits into BerriAI:main from demoray:bcaswell/fix-long-path-filenames
Feb 28, 2026

Conversation

@demoray
Contributor

@demoray demoray commented Feb 24, 2026

Fixes #21941

The result filenames generated by _save_confusion_results contained parentheses, dots, and full YAML filenames, producing paths that exceed the Windows 260-char MAX_PATH limit. Rework the safe_label logic to produce short {topic}_{method_abbrev} filenames (e.g. insults_cf.json) while preserving the full label inside the JSON content.

Rename existing tracked result files to match the new naming convention.
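
To make the path-length concern concrete, here is a rough sketch that compares the old and new filenames under a Windows install. The results directory and both filenames come from this PR; the install prefix is a made-up example, and `full_path_length` is a hypothetical helper, not part of LiteLLM.

```python
from pathlib import PureWindowsPath

MAX_PATH = 260  # the Windows path limit cited in this PR

# Directory as listed in the PR's changed files.
RESULTS_DIR = (
    "litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/"
    "guardrail_benchmarks/results"
)
# Assumption: an illustrative virtualenv location, not taken from the issue.
HYPOTHETICAL_PREFIX = r"C:\Users\someone\projects\app\.venv\Lib\site-packages"


def full_path_length(filename: str) -> int:
    """Length of the absolute Windows path for a result file under the prefix."""
    path = PureWindowsPath(HYPOTHETICAL_PREFIX, *RESULTS_DIR.split("/"), filename)
    return len(str(path))


old_name = "block_insults_-_contentfilter_(denied_insults.yaml).json"
new_name = "insults_cf.json"

saved = full_path_length(old_name) - full_path_length(new_name)
print(f"old: {full_path_length(old_name)} chars (limit {MAX_PATH})")
print(f"new: {full_path_length(new_name)} chars")
print(f"characters saved by the rename: {saved}")
```

Whether the old name actually crosses the limit depends on how deep the install prefix is, so the saving per file matters most for deeply nested environments.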

Relevant issues

#21941

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have Added testing in the tests/litellm/ directory, Adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🐛 Bug Fix
✅ Test

Changes

@vercel

vercel bot commented Feb 24, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
litellm Ready Ready Preview, Comment Feb 28, 2026 3:29am

@demoray
Contributor Author

demoray commented Feb 24, 2026

@greptileai

@greptile-apps
Contributor

greptile-apps bot commented Feb 24, 2026

Greptile Summary

This PR fixes #21941 by shortening guardrail benchmark result filenames to avoid exceeding the Windows 260-character MAX_PATH limit. The _save_confusion_results function in test_eval.py now parses the label string into {topic}_{method_abbrev} format (e.g., insults_cf.json) instead of using the full label with parentheses and YAML filenames. The two existing tracked result files are renamed to match the new convention.

  • Reworked safe_label logic in _save_confusion_results to produce short, filesystem-safe filenames while preserving the full label inside the JSON content
  • Renamed block_insults_-_contentfilter_(denied_insults.yaml).json → insults_cf.json and block_investment_-_contentfilter_(denied_financial_advice.yaml).json → investment_cf.json
  • Minor style note: dots in model version numbers (e.g., claude-haiku-4.5) are silently stripped by the sanitization regex, producing 45 instead of 4-5 in filenames
  • No tests were added in tests/litellm/ as noted in the PR checklist, though the change itself is to the benchmark test file
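
The dot-stripping note can be reproduced with a small sketch. The exact regex in test_eval.py is not shown on this page, so the character class below is an assumption chosen to match the described behaviour, and both function names are hypothetical.

```python
import re


def strip_invalid(name: str) -> str:
    """Assumed behaviour: drop everything except letters, digits, '_' and '-'."""
    return re.sub(r"[^a-z0-9_-]", "", name.lower())


def keep_version_dots(name: str) -> str:
    """One possible alternative: turn dots into hyphens before stripping."""
    return strip_invalid(name.replace(".", "-"))


print(strip_invalid("claude-haiku-4.5"))      # claude-haiku-45
print(keep_version_dots("claude-haiku-4.5"))  # claude-haiku-4-5
```

Replacing dots before sanitizing keeps the version readable at the cost of one extra character per dot.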

Confidence Score: 4/5

  • This PR is safe to merge — it only affects benchmark tooling and result filenames, not production code paths.
  • The changes are limited to benchmark test infrastructure (filename generation logic and file renames). The new safe_label logic correctly handles all current label formats, and file contents are unchanged. One minor cosmetic issue exists with dot-stripping in version numbers, but it doesn't affect correctness. No production code paths are impacted.
  • Minor attention needed on test_eval.py for the dot-stripping behavior in the sanitization regex (line 126).

Important Files Changed

Filename | Overview
litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/guardrail_benchmarks/test_eval.py | Reworked _save_confusion_results to produce shorter filenames. Logic correctly handles all ContentFilter labels. Minor cosmetic issue: dots in model version numbers (e.g., "4.5") are silently stripped during sanitization, producing slightly misleading filenames for LLM judge results.
litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/guardrail_benchmarks/results/insults_cf.json | Renamed from long path with parentheses to short insults_cf.json. Contents unchanged.
litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/guardrail_benchmarks/results/investment_cf.json | Renamed from long path with parentheses to short investment_cf.json. Contents unchanged.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["_save_confusion_results(label)"] --> B["Split label on em dash (—)"]
    B --> C["Extract topic: strip 'block ', lowercase, underscores"]
    B --> D["Extract method_full from second part"]
    D --> E["Parse method_name: remove parenthetical"]
    D --> F["Parse qualifier: extract from parens, drop file extension"]
    E --> G{method_name == 'contentfilter'?}
    G -- Yes --> H["safe_label = {topic}_cf"]
    G -- No --> I{qualifier present?}
    I -- Yes --> J["safe_label = {topic}_{method}_{qualifier}"]
    I -- No --> K["safe_label = {topic}_{method}"]
    H --> L["Sanitize: replace spaces, remove invalid chars, collapse underscores"]
    J --> L
    K --> L
    L --> M["Write {safe_label}.json to results/"]

Last reviewed commit: d3e2439
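
The flowchart above can be read as roughly the following Python. This is a sketch under stated assumptions, not the code merged in test_eval.py: the function name `short_result_name`, the exact label format, the "LLM Judge" label in the usage lines, and the choice to drop only .yaml/.yml extensions are all guesses made for illustration.

```python
import re

EM_DASH = "\u2014"  # the flowchart splits the label on this character


def short_result_name(label: str) -> str:
    """Turn a full benchmark label into a short, filesystem-safe filename."""
    topic_part, _, method_full = label.partition(EM_DASH)

    # Topic: lowercase, drop a leading "block ", spaces become underscores.
    topic = topic_part.strip().lower()
    if topic.startswith("block "):
        topic = topic[len("block "):]
    topic = topic.replace(" ", "_")

    # Method name is the text outside the parenthetical; qualifier is inside it.
    method_full = method_full.strip()
    method_name = re.sub(r"\(.*?\)", "", method_full).strip().lower()
    match = re.search(r"\((.*?)\)", method_full)
    qualifier = match.group(1).strip().lower() if match else ""
    # Assumption: only YAML extensions are dropped from the qualifier.
    qualifier = re.sub(r"\.ya?ml$", "", qualifier)

    if method_name == "contentfilter":
        safe = f"{topic}_cf"
    elif qualifier:
        safe = f"{topic}_{method_name}_{qualifier}"
    else:
        safe = f"{topic}_{method_name}"

    # Sanitize: spaces to underscores, drop invalid chars, collapse underscores.
    safe = safe.replace(" ", "_")
    safe = re.sub(r"[^a-z0-9_-]", "", safe)
    safe = re.sub(r"_+", "_", safe).strip("_")
    return f"{safe}.json"


print(short_result_name(f"Block Insults {EM_DASH} ContentFilter (denied_insults.yaml)"))
print(short_result_name(f"Block Insults {EM_DASH} LLM Judge (claude-haiku-4.5)"))
```

With these assumptions the first call yields insults_cf.json, matching the renamed file in this PR, and the second shows the dot in the version number being stripped by the sanitization step.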

Contributor

@greptile-apps greptile-apps bot left a comment

3 files reviewed, 1 comment

@ghost

ghost commented Feb 25, 2026

Review

1. Does this PR fix the issue it describes?
Yes. Fixes #21941 — guardrail benchmark result filenames exceeded Windows 260-char MAX_PATH limit. New naming scheme uses short {topic}_{method_abbrev} format (e.g., insults_cf.json).

2. Has this issue already been solved elsewhere?
No — this is specific to the guardrail benchmarks directory.

3. Are there other PRs addressing the same problem?
No duplicates found for #21941.

4. Are there other issues this potentially closes?
Not directly, but Windows users running guardrail benchmarks will benefit.

✅ LGTM — clean fix for Windows compatibility.

demoray and others added 2 commits February 28, 2026 00:27
…th support

Fixes BerriAI#21941

The generated result filenames from _save_confusion_results contained
parentheses, dots, and full yaml filenames, producing paths that exceed
the Windows 260-char MAX_PATH limit. Rework the safe_label logic to
produce short {topic}_{method_abbrev} filenames (e.g. insults_cf.json)
while preserving the full label inside the JSON content.

Rename existing tracked result files to match the new naming convention.
…r/guardrail_benchmarks/test_eval.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@jquinter jquinter force-pushed the bcaswell/fix-long-path-filenames branch from 1c5ff70 to bcf9acf Compare February 28, 2026 03:28
@jquinter jquinter merged commit 9db4ab1 into BerriAI:main Feb 28, 2026
27 of 30 checks passed

Development

Successfully merging this pull request may close these issues.

[Bug]: benchmark results should not be in the distributed package

2 participants