Description
Currently, our Moderation Guardrails support only English, which limits their effectiveness in detecting and blocking multilingual jailbreak attacks. To address this, we propose extending support to multiple languages using a model-based translation approach.
Why This Is Needed:
Existing translation options like Google Translate and Azure Translator rely on external APIs, so prompts must be sent to a third-party service, which may pose security risks.
Model-based guardrails perform better with English prompts and offer more secure and consistent results.
Multilingual support is essential to ensure robust moderation across diverse user inputs.
Proposed Solution:
Integrate the facebook/m2m100_418M model (M2M100, 418M parameters), which supports translation among 100 languages.
Detect the original language of each incoming prompt, then translate the prompt to English.
Focus on priority languages such as Dutch, French, German, Italian, and Spanish.
Currently available in the development environment.
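The detect-then-translate flow above could look roughly like the following. This is a minimal sketch, not the implementation in the development environment. It assumes the Hugging Face `transformers` library and the `facebook/m2m100_418M` checkpoint. M2M100 does not detect languages itself, so the detector is passed in as a callable; the issue does not say which detector is used. The names `TranslatedPrompt`, `load_m2m100_translator`, and `to_english` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TranslatedPrompt:
    original: str     # prompt as received
    source_lang: str  # language code reported by the detector, e.g. "fr"
    english: str      # text to hand to the English-only guardrails


def load_m2m100_translator(
    model_name: str = "facebook/m2m100_418M",
) -> Callable[[str, str], str]:
    """Return translate(text, src_lang) -> English text, backed by M2M100."""
    # Imported lazily so the rest of this module works without transformers.
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    tokenizer = M2M100Tokenizer.from_pretrained(model_name)
    model = M2M100ForConditionalGeneration.from_pretrained(model_name)

    def translate(text: str, src_lang: str) -> str:
        tokenizer.src_lang = src_lang  # M2M100 needs the source language set
        encoded = tokenizer(text, return_tensors="pt")
        generated = model.generate(
            **encoded,
            forced_bos_token_id=tokenizer.get_lang_id("en"),  # force English output
        )
        return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]

    return translate


def to_english(
    prompt: str,
    detect: Callable[[str], str],
    translate: Callable[[str, str], str],
) -> TranslatedPrompt:
    """Detect the prompt's language and translate it to English if needed."""
    lang = detect(prompt)
    if lang == "en":
        # Already English: skip the model call and pass the prompt through.
        return TranslatedPrompt(prompt, lang, prompt)
    return TranslatedPrompt(prompt, lang, translate(prompt, lang))
```

In use, `to_english(prompt, detect, load_m2m100_translator())` would run before the existing guardrails, and the `english` field would be the text they moderate. Keeping `source_lang` and `original` alongside it lets a blocked prompt be logged in its original form.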
Feature Highlights:
Secure, model-based translation pipeline.
Language detection and prompt translation.
Enables moderation guardrails to handle multilingual inputs effectively.