Refactor translation functionality to use M2M100 model from Hugging Face Transformers (Japanese Translation Support) #63

hasnatelias · 2025-10-07T22:09:18Z

Removed dependency on Azure Translation API and Google Translate API.
Integrated M2M100ForConditionalGeneration and M2M100Tokenizer for translation.
Added language detection using langdetect library.
Updated the translate method to handle text translation and logging.
Improved error handling and fallback mechanism.

This implementation is sufficient for Japanese translation. The sequence diagram below shows the request flow, and the points after it explain why:

sequenceDiagram
    participant Client
    participant router_py as router.py
    participant service_py as service.py
    participant translate_py as translate.py (ModelBasedTranslate)
    participant langdetect
    participant transformers as transformers (M2M100 Model)

    Client->>router_py: POST /rai/v1/moderations (Japanese Prompt)
    activate router_py

    router_py->>service_py: getModerationResult(payload)
    activate service_py

    alt Language is not English
        service_py->>translate_py: translator.translate(prompt)
        activate translate_py

        translate_py->>langdetect: detect(Japanese Prompt)
        activate langdetect
        langdetect-->>translate_py: returns 'ja'
        deactivate langdetect

        translate_py->>transformers: Set source lang to 'ja'
        translate_py->>transformers: Generate translation for 'en'
        activate transformers
        transformers-->>translate_py: returns English text
        deactivate transformers

        translate_py-->>service_py: "Translated English Text", "ja"
        deactivate translate_py
    end

    service_py-->>service_py: Perform moderation on English text
    service_py-->>router_py: Moderation Result
    deactivate service_py

    router_py-->>Client: JSON Response
    deactivate router_py

Model Support: The integrated facebook/m2m100_418M model is a multilingual translation model covering 100 languages, and Japanese (ja) is one of them.

Language Detection: The langdetect library automatically identifies the language of the input text. When a user submits a prompt in Japanese, langdetect returns the language code ja.

Translation Process: The ModelBasedTranslate class uses the detected language code (ja) to set the tokenizer's source language. It then instructs the model to translate the text into English (en), the language the moderation guardrails are designed to process.
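For reviewers unfamiliar with the M2M100 API, the detection and translation steps described above can be sketched roughly as follows. This is a minimal illustration, not the code in translate.py: the function names `normalize_lang_code` and `translate_to_english` are hypothetical, and the mapping of langdetect's Chinese codes is an assumption about one mismatch the real class may or may not handle.

```python
from typing import Tuple

MODEL_NAME = "facebook/m2m100_418M"

# langdetect reports Chinese as "zh-cn"/"zh-tw", while M2M100 expects "zh".
_LANGDETECT_TO_M2M100 = {"zh-cn": "zh", "zh-tw": "zh"}


def normalize_lang_code(code: str) -> str:
    """Map a langdetect code to the code the M2M100 tokenizer expects."""
    code = code.lower()
    return _LANGDETECT_TO_M2M100.get(code, code)


def translate_to_english(text: str) -> Tuple[str, str]:
    """Return (english_text, detected_source_language). Hypothetical helper."""
    # Imported lazily so that importing this module does not load the model stack.
    from langdetect import detect
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    source_lang = normalize_lang_code(detect(text))
    if source_lang == "en":
        # Nothing to translate.
        return text, source_lang

    # Loaded per call for brevity; a real service would load the model once.
    tokenizer = M2M100Tokenizer.from_pretrained(MODEL_NAME)
    model = M2M100ForConditionalGeneration.from_pretrained(MODEL_NAME)

    # Tell the tokenizer which language the input is in.
    tokenizer.src_lang = source_lang
    encoded = tokenizer(text, return_tensors="pt")

    # Force the first generated token to be the English language token.
    generated = model.generate(
        **encoded, forced_bos_token_id=tokenizer.get_lang_id("en")
    )
    english = tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
    return english, source_lang
```

Note that langdetect can return codes outside M2M100's language set, so a production version would likely need to validate the detected code before passing it to the tokenizer.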

The pipeline can therefore receive Japanese text, translate it to English, and pass the result to the moderation checks, which fulfills the requirements of the feature.
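The translate-then-moderate flow, together with a fallback on translation errors, might look like the sketch below. The fallback behavior shown here (moderating the original text and reporting the language as "unknown") is an assumption for illustration; this description does not spell out what the actual fallback does. The name `moderate_prompt` and the injected `translate` and `moderate` callables are likewise hypothetical stand-ins for the real service code.

```python
import logging
from typing import Callable, Tuple

logger = logging.getLogger(__name__)

# Hypothetical signatures: translate returns (english_text, source_lang).
Translator = Callable[[str], Tuple[str, str]]
Moderator = Callable[[str], dict]


def moderate_prompt(prompt: str, translate: Translator, moderate: Moderator) -> dict:
    """Translate the prompt to English if possible, then run moderation on it."""
    try:
        english_text, source_lang = translate(prompt)
    except Exception:
        # Assumed fallback: moderate the untranslated prompt instead of failing.
        logger.exception("Translation failed; moderating the original text")
        english_text, source_lang = prompt, "unknown"

    # Return a new dict so the moderator's result is not mutated.
    return {**moderate(english_text), "source_language": source_lang}
```

Passing the translator and moderator in as callables keeps the control flow testable with stubs, without loading the model.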

Related issue: #21


hasnatelias changed the title ~~Refactor translation functionality to use M2M100 model from Hugging Face Transformers (Japanese Translation SUpport)~~ Refactor translation functionality to use M2M100 model from Hugging Face Transformers (Japanese Translation Support) Oct 7, 2025
