fix: shorten guardrail benchmark result filenames for Windows long path support by demoray · Pull Request #22039 · BerriAI/litellm

demoray · 2026-02-24T22:35:10Z

Fixes #21941

The result filenames generated by _save_confusion_results contained parentheses, dots, and full YAML filenames, producing paths that exceed the Windows 260-character MAX_PATH limit. Rework the safe_label logic to produce short {topic}_{method_abbrev} filenames (e.g. insults_cf.json) while preserving the full label inside the JSON content.

Rename existing tracked result files to match the new naming convention.
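
As a rough illustration of the problem, the sketch below compares how much of the 260-character budget the old and new names of one tracked result file consume. The directory and filenames are taken from this PR; the checkout root `C:\Users\someone\source\repos\litellm` is a hypothetical example, and a deeper root leaves correspondingly less room.

```python
from pathlib import PureWindowsPath

MAX_PATH = 260  # classic Windows path limit, in characters

# Hypothetical checkout location; any deeper root eats into the same budget.
root = PureWindowsPath(r"C:\Users\someone\source\repos\litellm")
results_dir = root.joinpath(
    "litellm", "proxy", "guardrails", "guardrail_hooks",
    "litellm_content_filter", "guardrail_benchmarks", "results",
)

old_name = "block_investment_-_contentfilter_(denied_financial_advice.yaml).json"
new_name = "investment_cf.json"

for name in (old_name, new_name):
    full_path = str(results_dir / name)
    # Report the total path length and the headroom left under MAX_PATH.
    print(f"{len(full_path):>3} chars, {MAX_PATH - len(full_path):>3} left: {name}")
```

PureWindowsPath is used so the backslash-joined length is the same regardless of the platform the snippet runs on.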

Relevant issues

#21941

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

I have added testing in the tests/litellm/ directory. Adding at least 1 test is a hard requirement - see details
My PR passes all unit tests on make test-unit
My PR's scope is as isolated as possible; it only solves 1 specific problem
I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

CI (LiteLLM team)

CI status guideline:

50-55 passing tests: main is stable with minor issues.

45-49 passing tests: acceptable but needs attention

<= 40 passing tests: unstable; be careful with your merges and assess the risk.

Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:

Type

🐛 Bug Fix
✅ Test

Changes

vercel · 2026-02-24T22:35:15Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
litellm	Ready	Preview, Comment	Feb 28, 2026 3:29am

demoray · 2026-02-24T22:35:38Z

@greptileai

greptile-apps · 2026-02-24T22:39:50Z

Greptile Summary

This PR fixes #21941 by shortening guardrail benchmark result filenames to avoid exceeding the Windows 260-character MAX_PATH limit. The _save_confusion_results function in test_eval.py now parses the label string into {topic}_{method_abbrev} format (e.g., insults_cf.json) instead of using the full label with parentheses and YAML filenames. The two existing tracked result files are renamed to match the new convention.

Reworked safe_label logic in _save_confusion_results to produce short, filesystem-safe filenames while preserving the full label inside the JSON content
Renamed block_insults_-_contentfilter_(denied_insults.yaml).json → insults_cf.json and block_investment_-_contentfilter_(denied_financial_advice.yaml).json → investment_cf.json
Minor style note: dots in model version numbers (e.g., claude-haiku-4.5) are silently stripped by the sanitization regex, producing 45 instead of 4-5 in filenames
No tests were added in tests/litellm/ as noted in the PR checklist, though the change itself is to the benchmark test file
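
A minimal sketch of the "short filename, full label in the content" idea summarized above. The helper name `save_result`, the JSON schema, and the exact label string are assumptions for illustration, not the code in test_eval.py; the metric values are placeholders, not real results.

```python
import json
import tempfile
from pathlib import Path


def save_result(results_dir: Path, safe_label: str, label: str, metrics: dict) -> Path:
    """Write one result file named after safe_label, keeping the full label inside."""
    results_dir.mkdir(parents=True, exist_ok=True)
    out_path = results_dir / f"{safe_label}.json"
    payload = {"label": label, "metrics": metrics}  # hypothetical schema
    out_path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return out_path


# Assumed label format, inferred from the old filenames; \u2014 is the em dash separator.
full_label = "Block Insults \u2014 ContentFilter (denied_insults.yaml)"

with tempfile.TemporaryDirectory() as tmp:
    path = save_result(
        Path(tmp) / "results",
        "insults_cf",
        full_label,
        {"tp": 0, "fp": 0, "tn": 0, "fn": 0},  # placeholder values
    )
    saved = json.loads(path.read_text(encoding="utf-8"))
    round_trip_ok = saved["label"] == full_label

print(path.name, round_trip_ok)
```

The filename stays short while anything that needs the human-readable label reads it from the file rather than parsing the name.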

Confidence Score: 4/5

This PR is safe to merge — it only affects benchmark tooling and result filenames, not production code paths.
The changes are limited to benchmark test infrastructure (filename generation logic and file renames). The new safe_label logic correctly handles all current label formats, and file contents are unchanged. One minor cosmetic issue exists with dot-stripping in version numbers, but it doesn't affect correctness. No production code paths are impacted.
Minor attention needed on test_eval.py for the dot-stripping behavior in the sanitization regex (line 126).
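
The dot-stripping behavior can be reproduced with a small sketch. The regex here is an assumption (the actual pattern in test_eval.py is not shown in this thread), and `sanitize_keep_versions` is a hypothetical variant, not a change made by this PR.

```python
import re


def sanitize(name: str) -> str:
    """Assumed sanitizer: keep only lowercase letters, digits, '_' and '-'."""
    return re.sub(r"[^a-z0-9_-]", "", name.lower())


def sanitize_keep_versions(name: str) -> str:
    """Hypothetical variant: turn dots into hyphens before dropping other characters."""
    return sanitize(name.replace(".", "-"))


print(sanitize("claude-haiku-4.5"))                # claude-haiku-45
print(sanitize_keep_versions("claude-haiku-4.5"))  # claude-haiku-4-5
```

Replacing the dot first keeps the version number readable at the cost of one extra character in the filename.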

Important Files Changed

Filename	Overview
litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/guardrail_benchmarks/test_eval.py	Reworked `_save_confusion_results` to produce shorter filenames. Logic correctly handles all ContentFilter labels. Minor cosmetic issue: dots in model version numbers (e.g., "4.5") are silently stripped during sanitization, producing slightly misleading filenames for LLM judge results.
litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/guardrail_benchmarks/results/insults_cf.json	Renamed from long path with parentheses to short `insults_cf.json`. Contents unchanged — file content correctly preserved.
litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/guardrail_benchmarks/results/investment_cf.json	Renamed from long path with parentheses to short `investment_cf.json`. Contents unchanged — file content correctly preserved.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["_save_confusion_results(label)"] --> B["Split label on em dash (—)"]
    B --> C["Extract topic: strip 'block ', lowercase, underscores"]
    B --> D["Extract method_full from second part"]
    D --> E["Parse method_name: remove parenthetical"]
    D --> F["Parse qualifier: extract from parens, drop file extension"]
    E --> G{method_name == 'contentfilter'?}
    G -- Yes --> H["safe_label = {topic}_cf"]
    G -- No --> I{qualifier present?}
    I -- Yes --> J["safe_label = {topic}_{method}_{qualifier}"]
    I -- No --> K["safe_label = {topic}_{method}"]
    H --> L["Sanitize: replace spaces, remove invalid chars, collapse underscores"]
    J --> L
    K --> L
    L --> M["Write {safe_label}.json to results/"]
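
The flowchart above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the code merged in this PR: the function name `short_label`, the label format (inferred from the old filenames), the sanitizing regex, and the list of extensions treated as file extensions are all hypothetical.

```python
import re

EM_DASH = "\u2014"  # the separator the flowchart says labels are split on


def short_label(label: str) -> str:
    """Build a short, filesystem-safe name from a benchmark label (illustrative only)."""
    topic_part, _, method_full = label.partition(EM_DASH)

    # Topic: lowercase, drop a leading "block ", spaces to underscores.
    topic = topic_part.strip().lower()
    if topic.startswith("block "):
        topic = topic[len("block "):]
    topic = topic.replace(" ", "_")

    # Method name is the text outside the parentheses; qualifier is the text inside.
    method_full = method_full.strip()
    method_name = re.sub(r"\(.*?\)", "", method_full).strip().lower()
    match = re.search(r"\((.*?)\)", method_full)
    qualifier = match.group(1).strip() if match else ""
    # Assumption: only these suffixes count as file extensions to drop.
    qualifier = re.sub(r"\.(ya?ml|json)$", "", qualifier)

    if method_name == "contentfilter":
        safe = f"{topic}_cf"
    elif qualifier:
        safe = f"{topic}_{method_name}_{qualifier}"
    else:
        safe = f"{topic}_{method_name}"

    # Sanitize: spaces to underscores, drop other unsafe characters, collapse underscores.
    safe = safe.lower().replace(" ", "_")
    safe = re.sub(r"[^a-z0-9_-]", "", safe)
    return re.sub(r"_+", "_", safe).strip("_")


print(short_label(f"Block Insults {EM_DASH} ContentFilter (denied_insults.yaml)"))  # insults_cf
print(short_label(f"Block Investment {EM_DASH} ContentFilter (denied_financial_advice.yaml)"))  # investment_cf
```

With this sketch, a non-ContentFilter label with a versioned qualifier such as `claude-haiku-4.5` ends up with the dot removed, which matches the cosmetic issue noted in the review.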

Last reviewed commit: d3e2439

greptile-apps

3 files reviewed, 1 comment


...lm/proxy/guardrails/guardrail_hooks/litellm_content_filter/guardrail_benchmarks/test_eval.py

ghost · 2026-02-25T16:35:53Z

Review

1. Does this PR fix the issue it describes?
Yes. It fixes #21941: guardrail benchmark result filenames exceeded the Windows 260-character MAX_PATH limit. The new naming scheme uses the short {topic}_{method_abbrev} format (e.g., insults_cf.json).

2. Has this issue already been solved elsewhere?
No — this is specific to the guardrail benchmarks directory.

3. Are there other PRs addressing the same problem?
No duplicates found for #21941.

4. Are there other issues this potentially closes?
Not directly, but Windows users running guardrail benchmarks will benefit.

✅ LGTM — clean fix for Windows compatibility.


vercel bot deployed to Preview February 24, 2026 22:36 View deployment

greptile-apps bot reviewed Feb 24, 2026


vercel bot deployed to Preview February 24, 2026 23:46 View deployment

demoray and others added 2 commits February 28, 2026 00:27

Update litellm/proxy/guardrails/guardrail_hooks/litellm_content_filte…

bcf9acf

…r/guardrail_benchmarks/test_eval.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

jquinter force-pushed the bcaswell/fix-long-path-filenames branch from 1c5ff70 to bcf9acf Compare February 28, 2026 03:28

jquinter merged commit 9db4ab1 into BerriAI:main Feb 28, 2026
27 of 30 checks passed

vercel bot deployed to Preview February 28, 2026 03:29 View deployment

jquinter merged 2 commits into BerriAI:main from demoray:bcaswell/fix-long-path-filenames
