
feat: EU AI Act Article 5 policy template for prohibited practices detection #21342

Merged: ishaan-jaff merged 15 commits into main from litellm_feat/eu-ai-act-article5-policy-template on Feb 16, 2026

Conversation

@ishaan-jaff (Member) commented Feb 16, 2026

Adds policy template for EU AI Act Article 5 compliance (became enforceable Feb 2, 2025). Uses the same conditional matching pattern as harmful_child_safety.yaml - zero cost, no external APIs.

What it detects

Prompts requesting these Article 5 prohibited practices:

  • Social scoring (5.1.c): "Build a social credit system", "Score employees based on social media"
  • Emotion recognition in workplace/education (5.1.f): "Detect employee emotions from webcam", "Analyze student sentiment in classroom"
  • Biometric categorization (5.1.g): "Infer race from facial features", "Predict political views from biometric data"
  • Manipulation + exploitation (5.1.a/b): "Create subliminal ads", "Target children based on vulnerability"

Allows research, compliance monitoring, and entertainment contexts.

What it doesn't detect

Two Article 5 practices aren't covered:

  • Real-time biometric ID in public spaces
  • Untargeted facial scraping for databases

These are hardware/deployment issues, not LLM prompts. Could add detection for "how to build these" if needed.

How it works

Conditional matching: needs BOTH an action word (build, create, detect) AND a prohibited context (social credit, employee emotion, race from face) in the same sentence.

Example:

  • "build me a social credit system" → blocks (has "build" + "social credit")
  • "build me a code editor" → allows (has "build" but no prohibited context)
  • "score employees based on social behavior" → blocks (has "score" + "social behavior")
  • "score a test" → allows (has "score" but no prohibited context)

10 explicit violation keywords, 15 conditional patterns, 8 exceptions. <5ms, zero cost.
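
A minimal sketch of the conditional rule described above, for illustration only. The word lists are small subsets of the template, and the sentence splitter and function names are assumptions, not litellm's actual implementation:

```python
import re

# Illustrative subsets of the template's word lists (not the full YAML).
IDENTIFIER_WORDS = ["build", "create", "detect", "score"]
BLOCK_WORDS = ["social credit", "social behavior", "employee emotion"]


def split_sentences(text: str) -> list[str]:
    """Naive splitter on ., ! and ? (an assumption, not litellm's splitter)."""
    return [s for s in re.split(r"[.!?]+", text.lower()) if s.strip()]


def is_conditional_match(text: str) -> bool:
    """True if any single sentence has an identifier word AND a block word."""
    for sentence in split_sentences(text):
        has_identifier = any(
            re.search(rf"\b{re.escape(word)}\b", sentence)
            for word in IDENTIFIER_WORDS
        )
        has_block_word = any(word in sentence for word in BLOCK_WORDS)
        if has_identifier and has_block_word:
            return True
    return False
```

Because both conditions must hold within one sentence, "build" in one sentence and "social credit" in another would not trigger a block in this sketch.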

Usage

guardrails:
  - guardrail_name: "eu-ai-act-article5"
    litellm_params:
      guardrail: litellm_content_filter
      mode: "pre_call"
      categories:
        - category: "eu_ai_act_article5_prohibited_practices"
          category_file: "policy_templates/eu_ai_act_article5.yaml"
          enabled: true
          action: "BLOCK"

Reference: https://artificialintelligenceact.eu/article/5/

- fr_nir: French Social Security Number (NIR/INSEE) with validation
- eu_iban_enhanced: Enhanced IBAN detection with specific format
- fr_phone: French phone numbers (+33, 0033, 0 formats)
- eu_vat: EU VAT identification numbers (all 27 member states)
- eu_passport_generic: Generic EU passport format
- fr_postal_code: French postal codes with contextual keywords
- Comprehensive GDPR Article 32 compliance policy
- 4 guardrail groups: National IDs, Financial, Contact Info, Business IDs
- Masks French NIR/INSEE, EU IBANs, French phones, EU VAT numbers
- Includes EU passport numbers and email addresses
- Medium complexity template with indigo icon
- Test French NIR validation (sex digit, month range)
- Test enhanced IBAN detection (French, German)
- Test French phone number formats
- Test EU VAT numbers
- Test generic EU passport format
- Test French postal code pattern
- Verify all 6 EU PII patterns are loaded correctly
- Verify patterns are categorized as 'EU PII Patterns'
- Ensure pattern loading consistency
- 4 tests for PII that should be masked (NIR, IBAN, phone, VAT)
- 4 tests for text that should pass through (invalid patterns, no PII)
- 1 bonus test for multiple PII types in same message
- All tests verify correct masking behavior
- Added region field to all 6 templates (EU, AU, Global)
- Updated both main and backup JSON files
- Enables region-based filtering in UI
- Added Radio.Group filter for regions (All, AU, EU, Global)
- Efficient filtering with useMemo hooks
- Clean button-based UI matching existing design
- Defaults missing regions to Global
Add policy template for detecting EU AI Act Article 5 prohibited practices using conditional keyword matching.

Coverage:
- Article 5.1.c: Social scoring systems
- Article 5.1.f: Emotion recognition in workplace/education
- Article 5.1.g: Biometric categorization of protected characteristics
- Article 5.1.a: Harmful manipulation techniques
- Article 5.1.b: Vulnerability exploitation

Implementation:
- Uses proven conditional matching pattern (identifier + block words)
- 10 always-block keywords for explicit violations
- 8 exceptions for research/compliance/entertainment
- Zero cost (<5ms), no external APIs, 100% private
Example configuration showing how to enable EU AI Act Article 5 guardrail.

Comprehensive test coverage:
- 10 always-block keywords (explicit violations)
- 15 conditional matches (identifier + block word)
- 8 exceptions (research, compliance, entertainment)
- 7 no-match cases (legitimate uses)

Tests validate correct blocking/allowing behavior for Article 5 prohibited practices.

@greptile-apps bot (Contributor) commented Feb 16, 2026

Greptile Summary

This PR adds EU AI Act Article 5 compliance detection, GDPR PII protection patterns, and a region filter for policy templates. It introduces a new YAML-based policy template for detecting prohibited practices (social scoring, emotion recognition in workplace/education, biometric categorization, manipulation, vulnerability exploitation), new EU PII regex patterns (French NIR, IBAN, phone, VAT, passport, postal code), a GDPR policy template in policy_templates.json, region tagging for all existing templates, and a UI region filter.

Critical issue found:

  • The eu_ai_act_article5.yaml template defines identifier_words and additional_block_words for conditional matching, but the loading code in content_filter.py (lines 310-312) only activates conditional matching when both identifier_words AND inherit_from are present. Since this template has no inherit_from, the conditional matching logic will never fire; only the always_block_keywords section will be enforced. This means 15 of the 40 test cases (the conditional match cases) will not actually block prohibited content at runtime.

Other issues:

  • The fr_phone regex pattern is missing a leading word boundary, causing false positives when 0 appears mid-string (e.g., 50612345678 matches as 0612345678)
  • The eu_passport_generic pattern (\b[0-9]{2}[A-Z]{2}[0-9]{5}\b) lacks a keyword_pattern for contextual matching, making it prone to false positives on version strings, product codes, and serial numbers
  • litellm/policy_templates_backup.json is missing the new GDPR template that was added to policy_templates.json

Confidence Score: 2/5

  • This PR has a critical logic bug where conditional matching won't activate, rendering the core EU AI Act detection feature non-functional for 15 out of 40 test scenarios.
  • Score of 2 reflects a fundamental gap between the YAML template design and the content_filter.py loading code. The template uses identifier_words + additional_block_words without inherit_from, but the code requires inherit_from to register conditional categories. Additionally, the fr_phone regex has a false-positive bug, and eu_passport_generic lacks contextual filtering. The always_block_keywords and PII pattern portions work correctly, but the conditional matching — the core feature advertised in this PR — is broken.
  • Pay close attention to eu_ai_act_article5.yaml (conditional matching won't activate) and patterns.json (fr_phone false positives, eu_passport_generic false positives)

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/policy_templates/eu_ai_act_article5.yaml | New EU AI Act Article 5 policy template. Critical issue: conditional matching (identifier_words + additional_block_words) won't activate without inherit_from, so only always_block_keywords will work. |
| litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/patterns.json | New EU PII regex patterns added. fr_phone missing leading word boundary causes false matches; eu_passport_generic is overly generic without keyword context. |
| policy_templates.json | Adds region field to existing templates and new GDPR EU PII protection template. Region tagging and template definition look correct. |
| litellm/policy_templates_backup.json | Region fields added to existing templates, but missing the new GDPR template that was added to the main policy_templates.json; the backup is out of sync. |
| tests/guardrails_tests/test_eu_ai_act_article5.py | 40 test cases for EU AI Act template. Tests 11-25 (conditional matches) will likely fail at runtime because the underlying conditional matching code path is not activated without inherit_from. |
| tests/test_litellm/proxy/guardrails/guardrail_hooks/content_filter/test_eu_patterns.py | Unit tests for the new EU PII regex patterns. Tests are mock-only (no network calls), correctly validate pattern matching for NIR, IBAN, phone, VAT, passport, and postal code. |
| tests/test_litellm/proxy/guardrails/guardrail_hooks/content_filter/test_gdpr_policy_e2e.py | E2E tests for GDPR PII masking policy. Unused fastapi HTTPException import. Tests are local-only with no network calls, which is correct for this test directory. |
| tests/test_litellm/proxy/guardrails/guardrail_hooks/content_filter/test_patterns.py | Adds tests verifying EU patterns are loaded and categorized correctly. Clean, no issues. |
| ui/litellm-dashboard/src/components/policies/policy_templates.tsx | Adds region filter UI to policy templates page using Ant Design Radio buttons. Clean implementation with useMemo for derived state. |

Flowchart

flowchart TD
    A[User Prompt] --> B[ContentFilterGuardrail.apply_guardrail]
    B --> C[_filter_single_text]
    C --> D{Check Exceptions}
    D -->|Exception found| E[ALLOW - Skip category]
    D -->|No exception| F{Check Conditional Categories}
    F -->|identifier + block word in same sentence| G[BLOCK]
    F -->|No conditional match| H{Check always_block_keywords}
    H -->|Keyword found| G
    H -->|No keyword match| I{Check Regex Patterns}
    I -->|Pattern match + BLOCK| G
    I -->|Pattern match + MASK| J[Redact & Continue]
    I -->|No match| K{Check Blocked Words}
    K -->|Match| G
    K -->|No match| L[ALLOW]

    style F fill:#ff6b6b,stroke:#333,color:#fff
    style G fill:#e74c3c,stroke:#333,color:#fff
    style L fill:#2ecc71,stroke:#333,color:#fff

    subgraph BUG ["⚠️ Bug: Conditional categories never loaded"]
        F
    end
    style BUG fill:#fff3cd,stroke:#ffc107

Last reviewed commit: 51f8207

@greptile-apps bot (Contributor) left a comment

9 files reviewed, 5 comments

Comment on lines +1 to +156
# EU AI Act Article 5 - Prohibited Practices Detection
# Uses conditional logic: BLOCK if identifier word + block word appear together
# Reference: https://artificialintelligenceact.eu/article/5/
category_name: "eu_ai_act_article5_prohibited_practices"
description: "Detects EU AI Act Article 5 prohibited practices using conditional keyword matching"
default_action: "BLOCK"

# IDENTIFIER WORDS - Actions that could create prohibited systems
# When combined with prohibited contexts (block words), triggers violation
identifier_words:
# System development actions
- "build"
- "create"
- "develop"
- "design"
- "implement"

# Analysis/scoring actions
- "score"
- "rank"
- "rate"
- "assess"
- "evaluate"
- "classify"
- "profile"
- "grade"

# Detection/recognition actions
- "recognize"
- "detect"
- "identify"
- "infer"
- "deduce"
- "predict"
- "determine"
- "analyze"

# ADDITIONAL BLOCK WORDS - Prohibited contexts from Article 5
# When combined with identifier words, indicates violation
additional_block_words:
# Social scoring (Article 5.1.c)
- "social behavior"
- "social credit"
- "social score"
- "trustworthiness"
- "citizen score"
- "reputation score"
- "behavioral score"
- "social media behavior"

# Emotion recognition in workplace/education (Article 5.1.f)
- "employee emotion"
- "worker emotion"
- "staff emotion"
- "workplace emotion"
- "student emotion"
- "classroom emotion"
- "employee sentiment"
- "worker sentiment"
- "student sentiment"
- "employee mood"
- "worker mood"

# Biometric categorization (Article 5.1.h)
- "race from face"
- "ethnicity from face"
- "race from biometric"
- "ethnicity from biometric"
- "political views from"
- "political opinions from"
- "sexual orientation from"
- "religion from biometric"
- "religious beliefs from"

# Predictive profiling
- "crime prediction"
- "criminal behavior"
- "recidivism"
- "employee performance prediction"
- "creditworthiness from social"

# Manipulation (Article 5.1.a)
- "subliminal"
- "subconscious"
- "behavior manipulation"

# Vulnerability exploitation (Article 5.1.b)
- "children vulnerability"
- "elderly vulnerability"
- "disabled vulnerability"

# ALWAYS BLOCK - Explicit prohibited practices (always blocked regardless of context)
always_block_keywords:
# Social scoring systems
- keyword: "social credit system"
  severity: "high"
- keyword: "social scoring system"
  severity: "high"
- keyword: "citizen scoring"
  severity: "high"

# Emotion recognition in workplace/education
- keyword: "emotion recognition in workplace"
  severity: "high"
- keyword: "emotion detection of employees"
  severity: "high"
- keyword: "emotion recognition in classroom"
  severity: "high"
- keyword: "student emotion detection"
  severity: "high"

# Biometric categorization
- keyword: "infer race from face"
  severity: "high"
- keyword: "predict race from facial"
  severity: "high"
- keyword: "infer ethnicity from biometric"
  severity: "high"
- keyword: "predict political opinions from"
  severity: "high"
- keyword: "biometric categorization system"
  severity: "high"

# Predictive profiling
- keyword: "predictive policing"
  severity: "high"
- keyword: "crime prediction algorithm"
  severity: "high"
- keyword: "recidivism prediction"
  severity: "high"

# EXCEPTIONS - Legitimate use cases (always allowed)
exceptions:
# Research and education
- "research on"
- "study on"
- "academic"
- "thesis on"

# Compliance monitoring
- "audit for bias"
- "detect discrimination"
- "compliance monitoring"
- "ethical review"
- "fairness testing"

# Entertainment/product contexts
- "movie"
- "game"
- "product review"
- "customer feedback"

# Meta-discussion
- "explain"
- "what is"
- "article 5"

Conditional matching won't activate without inherit_from

The loading code in content_filter.py at lines 310-312 only registers a conditional category when both identifier_words AND inherit_from are present:

if (
    category_config_obj.identifier_words
    and category_config_obj.inherit_from
):
    self._load_conditional_category(...)

This template has identifier_words and additional_block_words but no inherit_from field. As a result, the _load_conditional_category method is never called, and the identifier_words + additional_block_words conditional matching logic will be completely skipped at runtime. Only the always_block_keywords section will actually be enforced.

This means test cases 11-25 (the conditional matches like "score + social behavior", "detect + employee emotion") will not be blocked as intended. The fix requires either:

  1. Adding inherit_from pointing to an empty or minimal JSON category file, or
  2. Modifying content_filter.py to also handle identifier_words + additional_block_words without requiring inherit_from:
if category_config_obj.identifier_words and (
    category_config_obj.inherit_from or category_config_obj.additional_block_words
):
    self._load_conditional_category(...)

And updating _load_conditional_category to handle the case where inherit_from is None.
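
A minimal sketch of how option 2 might look in isolation. CategoryConfig, uses_conditional_matching, and collect_block_words are hypothetical stand-ins for illustration; they are not the actual classes or methods in content_filter.py:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CategoryConfig:
    """Hypothetical stand-in for the real category config object."""

    identifier_words: List[str] = field(default_factory=list)
    additional_block_words: List[str] = field(default_factory=list)
    inherit_from: Optional[str] = None


def uses_conditional_matching(cfg: CategoryConfig) -> bool:
    """Accept either inherit_from or standalone additional_block_words."""
    return bool(cfg.identifier_words) and bool(
        cfg.inherit_from or cfg.additional_block_words
    )


def collect_block_words(cfg: CategoryConfig, inherited: List[str]) -> List[str]:
    """Merge inherited words (ignored when inherit_from is None) with extras."""
    base = inherited if cfg.inherit_from else []
    return base + cfg.additional_block_words
```

With this shape, a template that sets only identifier_words and additional_block_words is still registered, and the inherited word list is simply treated as empty.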

Comment on lines +415 to +421
{
"name": "fr_phone",
"display_name": "Phone Number (France)",
"pattern": "(?:\\+33|0033|0)[1-9][0-9]{8}\\b",
"category": "EU PII Patterns",
"description": "Detects French phone numbers in various formats (+33, 0033, or 0 prefix followed by 9 digits starting with 1-9)"
},

Missing leading word boundary on fr_phone pattern causes false matches

The fr_phone pattern (?:\+33|0033|0)[1-9][0-9]{8}\b has a trailing \b but no leading \b. When the prefix is 0, this causes false matches inside longer digit strings. For example, 50612345678 will match 0612345678 starting at position 1.

Suggested change
{
"name": "fr_phone",
"display_name": "Phone Number (France)",
"pattern": "(?<!\\d)(?:\\+33|0033|0)[1-9][0-9]{8}\\b",
"category": "EU PII Patterns",
"description": "Detects French phone numbers in various formats (+33, 0033, or 0 prefix followed by 9 digits starting with 1-9)"
},

Note: a plain \b before the group won't work, because + is not a word character and \b\+33 would only match when a word character precedes the +. A \b between the +33/0033 prefix and the next digit cannot be used either: both sides are digits, so the boundary never matches. The negative lookbehind (?<!\d) instead rejects any match that starts immediately after a digit, for all three prefixes.
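
The false match described above can be reproduced with Python's re module. The sketch below compares the original pattern with a variant guarded by a negative lookbehind, which is the approach a later commit in this thread describes; the helper name find is an assumption for illustration:

```python
import re

# Original pattern from patterns.json (no leading guard).
ORIGINAL = re.compile(r"(?:\+33|0033|0)[1-9][0-9]{8}\b")
# One possible fix: reject a match that starts right after another digit.
GUARDED = re.compile(r"(?<!\d)(?:\+33|0033|0)[1-9][0-9]{8}\b")


def find(pattern: re.Pattern, text: str):
    """Return the first matched substring, or None when nothing matches."""
    match = pattern.search(text)
    return match.group(0) if match else None
```

The lookbehind applies to all three prefixes, so the +33 and 0033 forms still match at the start of a string or after whitespace.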

Comment on lines +429 to +435
{
"name": "eu_passport_generic",
"display_name": "Passport Number (EU Generic)",
"pattern": "\\b[0-9]{2}[A-Z]{2}[0-9]{5}\\b",
"category": "EU PII Patterns",
"description": "Detects generic EU passport format (2 digits + 2 letters + 5 digits) - covers France and similar EU formats"
},

eu_passport_generic pattern is highly prone to false positives without keyword context

The pattern \b[0-9]{2}[A-Z]{2}[0-9]{5}\b matches any 9-character string in the format DDLLNNNNN (2 digits + 2 letters + 5 digits). Without a keyword_pattern field (unlike fr_postal_code which correctly requires contextual keywords), this will match version strings like 01RC12345, product SKUs like 99ZZ00001, and many other non-passport strings.
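
A rough sketch of the false positive and of keyword gating as a mitigation. The keyword alternation is the one this review comment proposes; applying it to the whole text and matching it case-insensitively are assumptions here, and may not be how keyword_pattern is evaluated in content_filter.py:

```python
import re

PASSPORT = re.compile(r"\b[0-9]{2}[A-Z]{2}[0-9]{5}\b")
# Assumption: the keyword may appear anywhere in the text, in any letter case.
KEYWORDS = re.compile(
    r"\b(?:passport|passeport|travel\s*document|document\s*number|reisepass)\b",
    re.IGNORECASE,
)


def find_passport_numbers(text: str) -> list[str]:
    """Return passport-like matches only when a context keyword is present."""
    if not KEYWORDS.search(text):
        return []
    return PASSPORT.findall(text)
```

Without the gate, a string such as "build 01RC12345 released" matches the bare pattern; with it, the same text yields no matches.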

Consider adding a keyword_pattern similar to what fr_postal_code uses:

"keyword_pattern": "\\b(?:passport|passeport|travel\\s*document|document\\s*number|reisepass)\\b"


sys.path.insert(0, os.path.abspath("../../"))

from fastapi import HTTPException

Unused import: HTTPException from fastapi

HTTPException is imported but never used in this file.

Suggested change
- from fastapi import HTTPException

@greptile-apps bot (Contributor) commented Feb 16, 2026

Additional Comments (1)

litellm/policy_templates_backup.json
Backup is out of sync with main policy_templates.json

The main policy_templates.json now contains 6 templates (including the new gdpr-eu-pii-protection), but this backup file only has 5. The new GDPR template entry was added to the main file but not to the backup. If this file is intended to be a mirror of policy_templates.json, it should include the same templates.

- Updated loading logic to activate conditional matching when either:
  1. identifier_words + inherit_from (existing pattern)
  2. identifier_words + additional_block_words (new standalone pattern)
- Modified _load_conditional_category to handle standalone templates
- EU AI Act template now works properly without inherit_from
- All 45 tests passing

Fixes Greptile feedback: conditional matching now activates for templates
that define additional_block_words without requiring inherit_from

@ishaan-jaff (Member, Author)

@greptile-apps Please review the changes - I've fixed the critical conditional matching bug you identified. The loading logic now supports standalone templates with additional_block_words without requiring inherit_from.

- patterns.json: add keyword_pattern to eu_vat and eu_passport_generic
- patterns.json: fix fr_phone pattern with leading word boundary
- patterns.json: fix eu_iban_enhanced regex efficiency
- policy_templates.json: remove country-specific passport patterns from GDPR template
- policy_templates_backup.json: sync with main templates file
- test_gdpr_policy_e2e.py: update test setup and fix VAT test text

All tests now pass. Keyword guards prevent false positives.

@greptile-apps bot (Contributor) commented Feb 16, 2026

Greptile Summary

This PR adds EU AI Act Article 5 compliance detection via a new policy template, GDPR Art. 32 EU PII protection patterns, and a region filter for the policy templates UI.

  • EU AI Act Article 5 template (eu_ai_act_article5.yaml): Implements conditional keyword matching (identifier word + prohibited context) for detecting prohibited practices like social scoring, workplace emotion recognition, and biometric categorization. Also includes always_block_keywords for explicit violations and exceptions for research/compliance contexts.
  • Standalone conditional matching support: Modifies content_filter.py to support identifier_words + additional_block_words without requiring inherit_from, enabling the new template to work without a base category file.
  • EU PII regex patterns: Adds 6 new patterns to patterns.json (French NIR, enhanced IBAN, French phone, EU VAT, EU passport, French postal code) for GDPR compliance masking.
  • GDPR policy template: Adds gdpr-eu-pii-protection template to policy_templates.json combining the new PII patterns.
  • Region filter UI: Adds a region-based filter to the policy templates dashboard using Radio.Group.
  • Exception bypass vulnerability: The "explain" exception (and similar short exceptions like "game", "what is") in the Article 5 template allows trivial bypass of all blocking—including always_block_keywords—because exceptions are checked as simple substrings before any violation detection.
  • Backup file drift: litellm/policy_templates_backup.json is missing the new GDPR template present in policy_templates.json.

Confidence Score: 3/5

  • The core code change (standalone conditional matching) is sound, but the Article 5 policy template has a meaningful exception bypass vulnerability that undermines its security guarantees.
  • Score of 3 reflects that the content_filter.py logic change is correct and well-tested, but the eu_ai_act_article5.yaml template has overly broad exceptions (especially "explain") that allow trivial bypass of all blocking including always_block_keywords. The GDPR PII patterns work but have some false-positive surface area (eu_vat). The backup file drift is a minor concern.
  • Pay close attention to litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/policy_templates/eu_ai_act_article5.yaml — the exception list allows trivial bypass of the guardrail.

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/content_filter.py | Extends conditional category loading to support standalone additional_block_words without inherit_from. Logic is sound and well-structured with proper fallbacks and logging. |
| litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/policy_templates/eu_ai_act_article5.yaml | New EU AI Act Article 5 policy template. The "explain" exception (and others like "game", "what is") allows trivial bypass of all blocking including always_block_keywords. |
| litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/patterns.json | Adds 6 new EU PII regex patterns. The eu_vat pattern can false-positive on words starting with EU country codes (e.g., DESK12345678). Prior thread notes on fr_phone and eu_passport_generic also relevant. |
| policy_templates.json | Adds region fields to existing templates and a new GDPR EU PII Protection template. Structure is consistent with existing templates. |
| litellm/policy_templates_backup.json | Adds region fields but is missing the new GDPR template that was added to the main policy_templates.json, creating drift between the two files. |
| tests/guardrails_tests/test_eu_ai_act_article5.py | Comprehensive 40-case test suite for EU AI Act Article 5 conditional matching. Tests cover always-block, conditional, exceptions, and no-match scenarios. Missing adversarial bypass test cases. |
| tests/test_litellm/proxy/guardrails/guardrail_hooks/content_filter/test_gdpr_policy_e2e.py | End-to-end tests for GDPR PII masking policy. Tests verify detection and masking of French NIR, IBAN, phone, VAT, and validates false positive rejection. No real network calls. |
| ui/litellm-dashboard/src/components/policies/policy_templates.tsx | Adds region filter UI using Radio.Group with useMemo for derived state. Clean implementation with proper fallback to "Global" for templates without a region field. |

Flowchart

flowchart TD
    A[Input Text] --> B{Collect exceptions from all loaded categories}
    B --> C{Check conditional categories}
    C --> D{Exception substring found in text?}
    D -->|Yes| E[Skip conditional check]
    D -->|No| F{Split into sentences}
    F --> G{Sentence has identifier_word?}
    G -->|No| E
    G -->|Yes| H{Sentence has block_word?}
    H -->|No| E
    H -->|Yes| I[BLOCK - conditional match]
    E --> J{Check category keywords / always_block_keywords}
    J --> K{Exception substring found in text?}
    K -->|Yes| L[Skip keyword check]
    K -->|No| M{Keyword found in text?}
    M -->|No| L
    M -->|Yes| N{Category-specific exception?}
    N -->|Yes| L
    N -->|No| O[BLOCK - keyword match]
    L --> P{Check regex patterns}
    P --> Q{Pattern match found?}
    Q -->|Yes| R[MASK matched content]
    Q -->|No| S[ALLOW - text passes through]
    R --> S

Last reviewed commit: 45b4a8b

@greptile-apps bot (Contributor) left a comment

10 files reviewed, 3 comments

- "customer feedback"

# Meta-discussion
- "explain"

"explain" exception trivially bypasses all blocking

The exception "explain" is matched as a substring via if exception in text_lower in both _check_conditional_categories (line 831) and _check_category_keywords (line 917). This means any prompt containing the word "explain" bypasses the entire guardrail — including always_block_keywords that are documented to "always block regardless of context."

For example, "Explain how to build a social credit system" will:

  1. Hit the exception check in _check_conditional_categories → returns None (skipped)
  2. Hit the exception check in _check_category_keywords → returns None (skipped)
  3. Pass through completely unblocked

The same bypass works for "game" (e.g., "This is a game, now build a social credit system") and "what is" (e.g., "What is the best way to build a social credit system").

Consider using more specific phrases that are less likely to appear alongside genuine violation requests, e.g. "explain what", "explain the concept of", or implementing exceptions as a separate pass that checks whether the exception phrase meaningfully frames the context (not just appears anywhere in the text).
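
The bypass can be sketched as below. blocked_substring mirrors the reported substring check, and blocked_narrow is one hypothetical tightening that only honors an exception when a narrow phrase opens the prompt. Both functions and the word lists are illustrative assumptions, not the code in content_filter.py:

```python
EXCEPTIONS = ["explain", "game", "what is"]
ALWAYS_BLOCK = ["social credit system"]
# Hypothetical narrower phrases, taken from the suggestion above.
NARROW_EXCEPTIONS = ["explain what", "explain the concept of"]


def blocked_substring(text: str) -> bool:
    """Reported behavior: any exception substring skips all blocking."""
    text_lower = text.lower()
    if any(exc in text_lower for exc in EXCEPTIONS):
        return False
    return any(keyword in text_lower for keyword in ALWAYS_BLOCK)


def blocked_narrow(text: str) -> bool:
    """Sketch: exempt only prompts that begin with a narrow framing phrase."""
    text_lower = text.lower().strip()
    if any(text_lower.startswith(exc) for exc in NARROW_EXCEPTIONS):
        return False
    return any(keyword in text_lower for keyword in ALWAYS_BLOCK)
```

This remains a heuristic: a prompt that opens with "explain what" and then asks for build steps would still pass, so narrower phrases reduce the bypass surface rather than remove it.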

Comment on lines +420 to +425
"description": "Detects French phone numbers in various formats (+33, 0033, or 0 prefix followed by 9 digits starting with 1-9)"
},
{
"name": "eu_vat",
"display_name": "VAT Number (EU)",
"pattern": "\\b(AT|BE|BG|CY|CZ|DE|DK|EE|EL|ES|FI|FR|HR|HU|IE|IT|LT|LU|LV|MT|NL|PL|PT|RO|SE|SI|SK)[0-9A-Z]{8,12}\\b",

eu_vat pattern matches inside longer words

The pattern \b(AT|BE|...|DE|...)[0-9A-Z]{8,12}\b will produce false positive matches on words that start with a valid country code prefix. For example, the string DESK12345678 matches because \b fires before D, (DE) matches the first two characters, and SK12345678 (10 chars) satisfies [0-9A-Z]{8,12}.

Consider adding a negative lookahead after the country code group to ensure it's followed by digits/uppercase only as expected for a VAT number, or requiring a preceding separator (space, punctuation) rather than just \b.
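
The false positive is easy to reproduce in Python. The pattern below is copied from the quoted patterns.json entry; the helper vat_match is an assumption added for illustration:

```python
import re

EU_VAT = re.compile(
    r"\b(AT|BE|BG|CY|CZ|DE|DK|EE|EL|ES|FI|FR|HR|HU|IE|IT|LT|LU|LV|MT|NL|PL|PT|RO|SE|SI|SK)"
    r"[0-9A-Z]{8,12}\b"
)


def vat_match(text: str):
    """Return (country_code, full_match) for the first hit, or None."""
    match = EU_VAT.search(text)
    return (match.group(1), match.group(0)) if match else None
```

Both a plausible VAT number such as DE123456789 and the unrelated token DESK12345678 are reported with country code DE, which is why some form of contextual check is needed.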

@greptile-apps bot (Contributor) commented Feb 16, 2026

Additional Comments (1)

litellm/policy_templates_backup.json
Backup file is missing the new GDPR template

policy_templates.json now contains the gdpr-eu-pii-protection template (added in this PR), but litellm/policy_templates_backup.json does not include it. This creates a drift between the primary and backup files. If the backup is intended to mirror the main file, it should also include the GDPR template.

- Fix fr_phone: use negative lookbehind (?<!\d) to prevent false matches in digit strings
- Add keyword_pattern to eu_passport_generic to reduce false positives
- Add keyword_pattern to eu_vat for contextual matching

All pattern tests passing

@ishaan-jaff (Member, Author)

@greptile-apps Second review iteration complete. All feedback addressed:

Fixes pushed:

  1. ✅ Conditional matching bug - Modified loading logic to support standalone templates with additional_block_words without requiring inherit_from
  2. ✅ fr_phone pattern - Added negative lookbehind (?<!\d) to prevent false matches in longer digit strings (e.g., 50612345678 no longer matches)
  3. ✅ eu_passport_generic false positives - Added keyword_pattern requiring passport-related context words
  4. ✅ eu_vat false positives - Added keyword_pattern for VAT/tax-related context words

Test status:

  • All 45 EU AI Act tests passing (including 15 conditional matching tests)
  • All 10 EU pattern tests passing

Please re-review and provide updated confidence score.

@greptile-apps bot (Contributor) commented Feb 16, 2026

Greptile Summary

This PR adds two compliance-focused guardrail features: an EU AI Act Article 5 prohibited practices detector and a GDPR Art. 32 EU PII protection policy template, along with a region-based filtering UI for policy templates.

  • EU AI Act Article 5 template: New YAML-based policy using conditional keyword matching (identifier word + prohibited context) to detect Article 5 violations including social scoring, workplace emotion recognition, biometric categorization, and manipulation/exploitation. Includes 10 always-block keywords, 15 conditional patterns, and 8 exception categories.
  • Standalone conditional matching: Extends content_filter.py to support conditional categories that use identifier_words + additional_block_words without requiring inherit_from, fixing a gap flagged in prior review.
  • EU PII patterns: Adds 6 new regex patterns (fr_nir, eu_iban_enhanced, fr_phone, eu_vat, eu_passport_generic, fr_postal_code) with contextual keyword_pattern support for VAT and passport patterns.
  • GDPR policy template: New policy template in policy_templates.json bundling the EU PII patterns into 4 guardrail definitions for GDPR Article 32 compliance.
  • Region filter UI: Adds region-based Radio button filtering to the policy templates dashboard component.
  • Tests: 40 parametrized test cases for Article 5, 6 pattern unit tests, and 9 GDPR e2e tests. All tests are local-only with no network calls.
  • Issue: The eu_iban_enhanced regex pattern uses a nested quantifier ([A-Z0-9]?){0,16} which can cause exponential backtracking — should be simplified to [A-Z0-9]{0,16}.
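
A quick way to sanity-check the suggested simplification is to confirm that both forms consume the same span on representative inputs. The prefix `[A-Z]{2}\d{2}` below is a simplified assumption about the shape of the IBAN pattern, not the real eu_iban_enhanced regex, and `span_end` is a hypothetical helper.

```python
import re

# Simplified stand-ins for the tail of the pattern; the real regex may differ.
NESTED = re.compile(r"[A-Z]{2}\d{2}([A-Z0-9]?){0,16}")  # optional group inside a bounded repeat
FLAT = re.compile(r"[A-Z]{2}\d{2}[A-Z0-9]{0,16}")       # same language, single quantifier


def span_end(pattern, text):
    """Return the end offset of a match anchored at the start, or None."""
    m = pattern.match(text)
    return m.end() if m else None


SAMPLES = ["DE89", "GB82WEST", "FR7630006000011234567890189", "not an iban"]

# Both forms should consume exactly the same prefix of every sample.
for sample in SAMPLES:
    assert span_end(NESTED, sample) == span_end(FLAT, sample)
```

The flat form leaves the engine a single way to split the characters across the repeat, which is the point of the fix; the nested form can retry many splits when a later part of the pattern fails.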

Confidence Score: 4/5

  • This PR is safe to merge after fixing the nested quantifier in the eu_iban_enhanced regex pattern, which could cause performance degradation on adversarial input.
  • The core logic changes (standalone conditional matching support) are well-structured and follow existing patterns. The EU AI Act template is comprehensive with 40 test cases. The one concrete issue is the nested quantifier ReDoS vulnerability in eu_iban_enhanced. The broader exception-bypass concern (e.g., "explain", "game") was already discussed in a prior review thread and is not repeated here.
  • litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/patterns.json — the eu_iban_enhanced pattern needs the nested quantifier fix.

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/content_filter.py | Extends conditional category loading to support standalone pattern (identifier_words + additional_block_words without inherit_from). Logic is clean and well-structured. Import reformatting is cosmetic. |
| litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/policy_templates/eu_ai_act_article5.yaml | New EU AI Act Article 5 policy template with identifier words, block words, always-block keywords, and exceptions. Broad exceptions (e.g. "explain", "game") were flagged in a previous review thread. |
| litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/patterns.json | Adds 6 EU PII patterns (fr_nir, eu_iban_enhanced, fr_phone, eu_vat, eu_passport_generic, fr_postal_code). The eu_iban_enhanced pattern has a nested quantifier that can cause exponential backtracking. |
| tests/guardrails_tests/test_eu_ai_act_article5.py | Comprehensive 40-case parametrized test covering always-block keywords, conditional matches, exceptions, and no-match scenarios. Tests are local-only with no network calls. |
| tests/test_litellm/proxy/guardrails/guardrail_hooks/content_filter/test_eu_patterns.py | Unit tests for the 6 new EU PII regex patterns. All tests are local regex matching with no network calls, compliant with the mock-only test rule for this directory. |
| tests/test_litellm/proxy/guardrails/guardrail_hooks/content_filter/test_gdpr_policy_e2e.py | End-to-end tests for GDPR EU PII protection policy with 9 test cases covering masking and non-matching scenarios. Has an unused HTTPException import (flagged in previous thread). |
| policy_templates.json | Adds GDPR EU PII Protection template with 4 guardrail definitions and region fields to all existing templates for UI filtering. |
| ui/litellm-dashboard/src/components/policies/policy_templates.tsx | Adds region-based filtering UI with Radio buttons. Uses useMemo for performance. Clean implementation with Ant Design Radio.Group component. |

Flowchart

flowchart TD
    A[Input Text] --> B{Collect all exceptions<br/>from loaded categories}
    B --> C{Check conditional categories<br/>identifier_word + block_word}
    C -->|Exception found| D[Skip - Return None]
    C -->|Match found| E{Action = BLOCK?}
    E -->|Yes| F[Raise HTTPException 403]
    E -->|No| G[Log warning - MASK not supported]
    C -->|No match| H{Check category keywords<br/>always_block_keywords}
    H -->|Exception found| I[Skip - Return None]
    H -->|Match found| J{Action = BLOCK?}
    J -->|Yes| K[Raise HTTPException 403]
    J -->|No / MASK| L[Mask keyword in text]
    H -->|No match| M{Check regex patterns<br/>EU PII: fr_nir, eu_iban, etc.}
    M -->|Match + keyword_pattern OK| N[MASK: Replace with REDACTED tag]
    M -->|No match| O[Check blocked words]
    O --> P[Return filtered text]

    style F fill:#ff6b6b,color:#fff
    style K fill:#ff6b6b,color:#fff
    style N fill:#ffa94d,color:#fff
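
The conditional branch of the flowchart above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in content_filter.py: `ConditionalCategory` and `should_block` are hypothetical names, the word lists are taken from the examples in the PR description, matching is plain substring search, and a boolean return stands in for raising HTTPException 403.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class ConditionalCategory:
    """Hypothetical container mirroring the YAML fields named in this PR."""
    identifier_words: list[str]
    additional_block_words: list[str]
    exceptions: list[str] = field(default_factory=list)


def should_block(text: str, category: ConditionalCategory) -> bool:
    """True when one sentence holds both an identifier word and a block word."""
    lowered = text.lower()
    # Exceptions are checked first and short-circuit the whole check.
    if any(exc in lowered for exc in category.exceptions):
        return False
    # Naive sentence split on ., ! and ? (an assumption; the real splitter may differ).
    for sentence in re.split(r"[.!?]+", lowered):
        has_identifier = any(w in sentence for w in category.identifier_words)
        has_context = any(w in sentence for w in category.additional_block_words)
        if has_identifier and has_context:
            return True
    return False


# Illustrative word lists only; the shipped template has its own.
ARTICLE5_DEMO = ConditionalCategory(
    identifier_words=["build", "create", "detect", "score"],
    additional_block_words=["social credit", "social behavior", "employee emotion"],
    exceptions=["research"],
)
```

With this demo category, "build me a social credit system" is blocked, "build me a code editor" passes, and a prompt containing "research" is skipped before any matching happens. Substring matching is deliberately crude here (for example, "score" also hits "underscore"); a real implementation would likely want word boundaries.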

Last reviewed commit: 3904312


@greptile-apps greptile-apps bot left a comment


10 files reviewed, 1 comment


…r/patterns.json

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Resolved conflicts:
- patterns.json: added allow_word_numbers field to eu_vat and eu_passport_generic
- test_eu_patterns.py: added test_pattern_requires_keyword_context test
- test_gdpr_policy_e2e.py: updated VAT test comment, added two new contextual guard tests, removed unused HTTPException import
@ishaan-jaff ishaan-jaff merged commit d17bf84 into main Feb 16, 2026
12 of 24 checks passed
ishaan-jaff added a commit that referenced this pull request Feb 17, 2026
Adds the EU AI Act Article 5 - Prohibited Practices template to the policy templates JSON that the UI reads from.

The template uses the eu_ai_act_article5_prohibited_practices category that was added in PR #21342. Blocks prompts requesting:
- Social scoring systems
- Emotion recognition in workplace/education
- Biometric categorization for sensitive attributes
- Predictive profiling and manipulation

Shows up in the UI under EU region filter with High complexity.
ishaan-jaff added a commit that referenced this pull request Feb 17, 2026
* Add EU AI Act Article 5 template to policy templates

* Update policy templates backup with EU AI Act template

Syncs the backup file with the main policy_templates.json to include the EU AI Act Article 5 template.