feat: EU AI Act Article 5 policy template for prohibited practices detection #21342
@@ -0,0 +1,157 @@

```yaml
# EU AI Act Article 5 - Prohibited Practices Detection
# Uses conditional logic: BLOCK if identifier word + block word appear together
# Reference: https://artificialintelligenceact.eu/article/5/
category_name: "eu_ai_act_article5_prohibited_practices"
description: "Detects EU AI Act Article 5 prohibited practices using conditional keyword matching"
default_action: "BLOCK"

# IDENTIFIER WORDS - Actions that could create prohibited systems
# When combined with prohibited contexts (block words), triggers violation
identifier_words:
  # System development actions
  - "build"
  - "create"
  - "develop"
  - "design"
  - "implement"

  # Analysis/scoring actions
  - "score"
  - "rank"
  - "rate"
  - "assess"
  - "evaluate"
  - "classify"
  - "profile"
  - "grade"

  # Detection/recognition actions
  - "recognize"
  - "detect"
  - "identify"
  - "infer"
  - "deduce"
  - "predict"
  - "determine"
  - "analyze"

# ADDITIONAL BLOCK WORDS - Prohibited contexts from Article 5
# When combined with identifier words, indicates violation
additional_block_words:
  # Social scoring (Article 5.1.c)
  - "social behavior"
  - "social credit"
  - "social score"
  - "trustworthiness"
  - "citizen score"
  - "reputation score"
  - "behavioral score"
  - "social media behavior"

  # Emotion recognition in workplace/education (Article 5.1.f)
  - "employee emotion"
  - "worker emotion"
  - "staff emotion"
  - "workplace emotion"
  - "student emotion"
  - "classroom emotion"
  - "employee sentiment"
  - "worker sentiment"
  - "student sentiment"
  - "employee mood"
  - "worker mood"

  # Biometric categorization (Article 5.1.h)
  - "race from face"
  - "ethnicity from face"
  - "race from biometric"
  - "ethnicity from biometric"
  - "political views from"
  - "political opinions from"
  - "sexual orientation from"
  - "religion from biometric"
  - "religious beliefs from"

  # Predictive profiling
  - "crime prediction"
  - "criminal behavior"
  - "recidivism"
  - "employee performance prediction"
  - "creditworthiness from social"

  # Manipulation (Article 5.1.a)
  - "subliminal"
  - "subconscious"
  - "behavior manipulation"

  # Vulnerability exploitation (Article 5.1.b)
  - "children vulnerability"
  - "elderly vulnerability"
  - "disabled vulnerability"

# ALWAYS BLOCK - Explicit prohibited practices (always blocked regardless of context)
always_block_keywords:
  # Social scoring systems
  - keyword: "social credit system"
    severity: "high"
  - keyword: "social scoring system"
    severity: "high"
  - keyword: "citizen scoring"
    severity: "high"

  # Emotion recognition in workplace/education
  - keyword: "emotion recognition in workplace"
    severity: "high"
  - keyword: "emotion detection of employees"
    severity: "high"
  - keyword: "emotion recognition in classroom"
    severity: "high"
  - keyword: "student emotion detection"
    severity: "high"

  # Biometric categorization
  - keyword: "infer race from face"
    severity: "high"
  - keyword: "predict race from facial"
    severity: "high"
  - keyword: "infer ethnicity from biometric"
    severity: "high"
  - keyword: "predict political opinions from"
    severity: "high"
  - keyword: "biometric categorization system"
    severity: "high"

  # Predictive profiling
  - keyword: "predictive policing"
    severity: "high"
  - keyword: "crime prediction algorithm"
    severity: "high"
  - keyword: "recidivism prediction"
    severity: "high"

# EXCEPTIONS - Legitimate use cases (always allowed)
exceptions:
  # Research and education
  - "research on"
  - "study on"
  - "academic"
  - "thesis on"

  # Compliance monitoring
  - "audit for bias"
  - "detect discrimination"
  - "compliance monitoring"
  - "ethical review"
  - "fairness testing"

  # Entertainment/product contexts
  - "movie"
  - "game"
  - "product review"
  - "customer feedback"

  # Meta-discussion
  - "explain"
  - "what is"
  - "article 5"
```
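The conditional rule described in the template's header comment (block only when an identifier word and a block word appear together) can be illustrated with a minimal sketch. The function name `check_prompt` and the three short lists are hypothetical and use only a subset of the template's entries; this is not the project's matching code, and the `exceptions` list is deliberately left out.

```python
# Hypothetical sketch of the conditional rule; not the project's implementation.
IDENTIFIER_WORDS = ["build", "score", "detect"]        # subset of identifier_words
BLOCK_WORDS = ["social behavior", "employee emotion"]  # subset of additional_block_words
ALWAYS_BLOCK = ["social credit system"]                # subset of always_block_keywords


def check_prompt(text: str) -> str:
    """Return "BLOCK" or "ALLOW" using plain substring matching."""
    text_lower = text.lower()

    # Always-block keywords trigger on their own.
    if any(keyword in text_lower for keyword in ALWAYS_BLOCK):
        return "BLOCK"

    # Conditional rule: an identifier word AND a block word must both appear.
    has_identifier = any(word in text_lower for word in IDENTIFIER_WORDS)
    has_block_word = any(word in text_lower for word in BLOCK_WORDS)
    return "BLOCK" if has_identifier and has_block_word else "ALLOW"


if __name__ == "__main__":
    print(check_prompt("Score users on their social behavior"))
    print(check_prompt("Tell me about social behavior in primates"))
```

Under these assumptions, a block word alone (second call) is allowed, while the same block word paired with an identifier word (first call) is blocked.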
Comment on lines +1 to +156 (Contributor)

Conditional matching won't activate without […]

The loading code calls `_load_conditional_category` only when both `identifier_words` and `inherit_from` are set:

```python
if (
    category_config_obj.identifier_words
    and category_config_obj.inherit_from
):
    self._load_conditional_category(...)
```

This template has […] This means test cases 11-25 (the conditional matches such as "score + social behavior" and "detect + employee emotion") will not be blocked as intended. The fix requires either […] or a change to the loading condition along these lines:

```python
if category_config_obj.identifier_words and (
    category_config_obj.inherit_from or category_config_obj.additional_block_words
):
    self._load_conditional_category(...)
```

And updating […]
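To make the difference between the two conditions concrete, here is a small sketch. `CategoryConfig` is a hypothetical stand-in for the config object named in the comment, and the example template assumes, as the YAML in this diff suggests, that `identifier_words` and `additional_block_words` are set while `inherit_from` is not.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CategoryConfig:
    """Hypothetical stand-in for the category config object in the review."""
    identifier_words: List[str] = field(default_factory=list)
    additional_block_words: List[str] = field(default_factory=list)
    inherit_from: Optional[str] = None


def loads_as_conditional_current(cfg: CategoryConfig) -> bool:
    # Condition quoted in the review: both fields must be truthy.
    return bool(cfg.identifier_words and cfg.inherit_from)


def loads_as_conditional_proposed(cfg: CategoryConfig) -> bool:
    # Relaxed condition suggested in the review.
    return bool(
        cfg.identifier_words
        and (cfg.inherit_from or cfg.additional_block_words)
    )


# Assumed shape of this PR's template: no inherit_from key.
template = CategoryConfig(
    identifier_words=["score", "detect"],
    additional_block_words=["social behavior", "employee emotion"],
)

if __name__ == "__main__":
    print(loads_as_conditional_current(template))
    print(loads_as_conditional_proposed(template))
```

With a config shaped like this, only the relaxed condition would route the category through conditional loading.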
```yaml
  - "prohibited by"
```
`"explain"` exception trivially bypasses all blocking

The exception `"explain"` is matched as a substring via `if exception in text_lower` in both `_check_conditional_categories` (line 831) and `_check_category_keywords` (line 917). Any prompt containing the word "explain" therefore bypasses the entire guardrail, including the `always_block_keywords` that are documented to "always block regardless of context."

For example, `"Explain how to build a social credit system"` will be skipped by both checks:

- `_check_conditional_categories` → returns `None` (skipped)
- `_check_category_keywords` → returns `None` (skipped)

The same bypass works for `"game"` (e.g., "This is a game, now build a social credit system") and `"what is"` (e.g., "What is the best way to build a social credit system").

Consider using more specific phrases that are less likely to appear alongside genuine violation requests, e.g. `"explain what"` or `"explain the concept of"`, or implementing exceptions as a separate pass that checks whether the exception phrase meaningfully frames the context (rather than merely appearing anywhere in the text).
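The bypass and the first suggested mitigation can be reproduced in a few lines. This is a simplified sketch, not the guardrail's code: `check` is a hypothetical function that mimics the reported behavior (any exception substring short-circuits the keyword check), and the two exception lists are assumptions drawn from the template and from the phrases proposed above.

```python
# Hypothetical, simplified model of the reported exception handling.
ALWAYS_BLOCK = ["social credit system"]

BROAD_EXCEPTIONS = ["explain", "game", "what is"]                    # as in the template
NARROW_EXCEPTIONS = ["explain what", "explain the concept of"]       # as suggested above


def check(text: str, exceptions: list) -> str:
    """Return "ALLOW" if any exception substring appears, else apply always-block."""
    text_lower = text.lower()

    # Substring test, equivalent to `if exception in text_lower`.
    if any(exception in text_lower for exception in exceptions):
        return "ALLOW"

    if any(keyword in text_lower for keyword in ALWAYS_BLOCK):
        return "BLOCK"
    return "ALLOW"


if __name__ == "__main__":
    prompt = "Explain how to build a social credit system"
    print(check(prompt, BROAD_EXCEPTIONS))
    print(check(prompt, NARROW_EXCEPTIONS))
```

Narrower phrases reduce the bypass surface but do not remove it: a prompt such as "Explain what steps are needed to build a social credit system" would still match `"explain what"`, which is why the separate context-aware pass is the stronger of the two suggestions.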