Commit 8fb444d

Mohamed Shabanhodd
authored and committed
improve docs and samples for glossaries and custom models (Azure#18587)
* update the readme
* update readme file
* added custom translation samples
* fix 'no-locale' thing in links
* update glossary docs
* update glossaries
* link to sample glossaries instead of writing code in readme
* update custom model sample linking
* remove relative linking in readme
* make subheadings in bold text to be more readable
* conform with 'Document Translation' naming
* disambiguate container sas url
* capitalize Azure name
* remove misplaced period
* update samples -> custom model
* update async sample -> custom model
* remove localization from url
* update readme with new file types for glossaries
* adding sample glossaries -> xlf
* white space
* use simplified single input method
* update 'job' terminology
* update azure-core naming
* update glossary blob file reference name
* link to supported glossaries table
* remove locale from url
1 parent c4c4f35 commit 8fb444d

6 files changed

+233
-0
lines changed


sdk/translation/azure-ai-translation-document/README.md

Lines changed: 45 additions & 0 deletions
@@ -323,6 +323,40 @@ To see how to use the Document Translation client library with Azure Storage Blo
for your containers, and download the finished translated documents, see this [sample][sample_translation_with_azure_blob].
Note that you will need to install the [azure-storage-blob][azure_storage_blob] library to run this sample.

## Advanced Topics

The following section covers some of the advanced translation features, such as glossaries and custom translation models.

### **Glossaries**
Glossaries are domain-specific dictionaries. For example, if you want to translate medical documents, you may need support for the many words, terms, and idioms of the medical field that are not in the standard translation dictionary, or you may simply need a specific translation for certain terms. This is why Document Translation provides support for glossaries.

#### **How To Create a Glossary File**

Document Translation supports glossaries in the following formats:

|**File Type**|**Extension**|**Description**|**Samples**|
|---------------|---------------|---------------|---------------|
|Tab-Separated Values/TAB|.tsv, .tab|Read more on [wikipedia][tsv_files_wikipedia]|[glossary_sample.tsv][sample_tsv_file]|
|Comma-Separated Values|.csv|Read more on [wikipedia][csv_files_wikipedia]|[glossary_sample.csv][sample_csv_file]|
|Localization Interchange File Format|.xlf, .xliff|Read more on [wikipedia][xlf_files_wikipedia]|[glossary_sample.xlf][sample_xlf_file]|

View all supported formats [here][supported_glossary_formats].
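
As a rough illustration of the two delimited formats in the table, the sketch below writes the same source/target term pairs as CSV and as TSV using Python's standard `csv` module. The term pairs mirror the sample glossaries added in this change. The helper name `write_glossary` is hypothetical, and the one-pair-per-line layout without a header row is an assumption based on those samples, so check the supported formats documentation before relying on it.

```python
import csv

# Source/target term pairs (English -> French), mirroring the sample glossaries.
TERMS = [
    ("skull", "le crâne"),
    ("body", "corps"),
    ("heart", "cœur"),
    ("lungs", "poumons"),
]


def write_glossary(path, terms, delimiter):
    """Write one source/target pair per line, with no header row (assumed layout)."""
    # newline="" lets the csv module control line endings; UTF-8 keeps accents intact.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
        writer.writerows(terms)
```

Calling `write_glossary("glossary_sample.csv", TERMS, ",")` and `write_glossary("glossary_sample.tsv", TERMS, "\t")` would produce one comma-separated and one tab-separated file with the same four entries.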

#### **How To Use Glossaries in Document Translation**
To use glossaries with Document Translation, first upload your glossary file to a blob container, then provide the SAS URL of the glossary file to Document Translation, as shown in the code sample [sample_translation_with_glossaries.py][sample_translation_with_glossaries].
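
The flow described above might look roughly like the following sketch, which assumes the glossary has already been uploaded and a SAS URL generated for it. It assumes that the single-input `begin_translation` call accepts a `glossaries` keyword taking `TranslationGlossary` objects; check the linked sample for the exact call in your installed version. The `AZURE_GLOSSARY_URL` variable, the `infer_glossary_format` helper, and the format names in `GLOSSARY_FORMATS` are hypothetical and are not taken from the samples.

```python
import os
from urllib.parse import urlparse

# Assumed mapping from file extension to the format name passed to the service;
# confirm the accepted names with client.get_supported_glossary_formats().
GLOSSARY_FORMATS = {
    ".tsv": "TSV",
    ".tab": "TSV",
    ".csv": "CSV",
    ".xlf": "XLIFF",
    ".xliff": "XLIFF",
}


def infer_glossary_format(glossary_sas_url):
    """Guess the glossary format from the blob name, ignoring the SAS query string."""
    blob_path = urlparse(glossary_sas_url).path
    extension = os.path.splitext(blob_path)[1].lower()
    return GLOSSARY_FORMATS[extension]


def translate_with_glossary():
    # Imported here so the helper above stays usable without the Azure packages.
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.translation.document import (
        DocumentTranslationClient,
        TranslationGlossary,
    )

    endpoint = os.environ["AZURE_DOCUMENT_TRANSLATION_ENDPOINT"]
    key = os.environ["AZURE_DOCUMENT_TRANSLATION_KEY"]
    source_container_url = os.environ["AZURE_SOURCE_CONTAINER_URL"]
    target_container_url = os.environ["AZURE_TARGET_CONTAINER_URL"]
    # Hypothetical variable: SAS URL of the uploaded glossary blob (a file, not a container).
    glossary_url = os.environ["AZURE_GLOSSARY_URL"]

    client = DocumentTranslationClient(endpoint, AzureKeyCredential(key))
    glossary = TranslationGlossary(
        glossary_url=glossary_url,
        file_format=infer_glossary_format(glossary_url),
    )

    # Translate every document in the source container to French using the glossary.
    poller = client.begin_translation(
        source_container_url,
        target_container_url,
        "fr",
        glossaries=[glossary],
    )
    poller.result()  # Block until the translation operation finishes.
    print("Operation status: {}".format(poller.status()))
```

Note that the glossary URL points at a single blob, while the source and target URLs point at containers; each needs its own SAS token with the appropriate permissions.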


### **Custom Translation Models**
Instead of using Document Translation's default engine, you can translate with your own custom Azure machine learning or deep learning model.

#### **How To Create a Custom Translation Model**
For more information on how to create, provision, and deploy your own custom Azure translation model, follow the instructions in [Build, deploy, and use a custom model for translation][custom_translation_article].

#### **How To Use a Custom Translation Model With Document Translation**
To use a custom translation model with Document Translation, first create and deploy your model,
then follow the code sample [sample_translation_with_custom_model.py][sample_translation_with_custom_model].


## Troubleshooting

### General
@@ -436,6 +470,17 @@ This project has adopted the [Microsoft Open Source Code of Conduct][code_of_con
[sample_translation_with_glossaries_async]: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/translation/azure-ai-translation-document/samples/async_samples/sample_translation_with_glossaries_async.py
[sample_translation_with_azure_blob]: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/translation/azure-ai-translation-document/samples/sample_translation_with_azure_blob.py
[sample_translation_with_azure_blob_async]: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/translation/azure-ai-translation-document/samples/async_samples/sample_translation_with_azure_blob_async.py
[sample_translation_with_custom_model]: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/translation/azure-ai-translation-document/samples/sample_translation_with_custom_model.py
[sample_translation_with_custom_model_async]: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/translation/azure-ai-translation-document/samples/async_samples/sample_translation_with_custom_model_async.py

[supported_glossary_formats]: https://docs.microsoft.com/azure/cognitive-services/translator/document-translation/overview#supported-glossary-formats
[custom_translation_article]: https://docs.microsoft.com/azure/cognitive-services/translator/custom-translator/quickstart-build-deploy-custom-model
[tsv_files_wikipedia]: https://wikipedia.org/wiki/Tab-separated_values
[xlf_files_wikipedia]: https://wikipedia.org/wiki/XLIFF
[csv_files_wikipedia]: https://wikipedia.org/wiki/Comma-separated_values
[sample_tsv_file]: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/translation/azure-ai-translation-document/samples/assets/glossary_sample.tsv
[sample_csv_file]: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/translation/azure-ai-translation-document/samples/assets/glossary_sample.csv
[sample_xlf_file]: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/translation/azure-ai-translation-document/samples/assets/glossary_sample.xlf

[cla]: https://cla.microsoft.com
[code_of_conduct]: https://opensource.microsoft.com/codeofconduct/
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
skull,le crâne
body,corps
heart,cœur
lungs,poumons
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
skull	le crâne
body	corps
heart	cœur
lungs	poumons
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
<?xml version="1.0" encoding="UTF-8"?>
<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.2" xsi:schemaLocation="urn:oasis:names:tc:xliff:document:1.2 http://docs.oasis-open.org/xliff/v1.2/os/xliff-core-1.2-strict.xsd">
    <file original="EN-TA.tmx" source-language="en-US" target-language="fr-FR" datatype="xml">
        <body>
            <trans-unit id="1" datatype="plaintext">
                <source>skull</source>
                <target state="translated">le crâne</target>
            </trans-unit>
            <trans-unit id="2" datatype="plaintext">
                <source>body</source>
                <target state="translated">corps</target>
            </trans-unit>
            <trans-unit id="3" datatype="plaintext">
                <source>heart</source>
                <target state="translated">cœur</target>
            </trans-unit>
            <trans-unit id="4" datatype="plaintext">
                <source>lungs</source>
                <target state="translated">poumons</target>
            </trans-unit>
        </body>
    </file>
</xliff>
Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
# coding=utf-8
# ------------------------------------
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
# ------------------------------------

"""
FILE: sample_translation_with_custom_model_async.py

DESCRIPTION:
    This sample demonstrates how to create a translation operation and apply a custom
    Azure translation model when doing the translation.

    To set up your containers for translation and generate SAS tokens to your containers (or files)
    with the appropriate permissions, see the README.

USAGE:
    python sample_translation_with_custom_model_async.py

    Set the environment variables with your own values before running the sample:
    1) AZURE_DOCUMENT_TRANSLATION_ENDPOINT - the endpoint to your Document Translation resource.
    2) AZURE_DOCUMENT_TRANSLATION_KEY - your Document Translation API key.
    3) AZURE_SOURCE_CONTAINER_URL - the container SAS URL to your source container which has the documents
        to be translated.
    4) AZURE_TARGET_CONTAINER_URL - the container SAS URL to your target container where the translated documents
        will be written.
    5) AZURE_CUSTOM_MODEL_ID - the ID of your Azure custom translation model.
"""

import asyncio


async def sample_translation_with_custom_model_async():
    import os
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.translation.document.aio import DocumentTranslationClient

    endpoint = os.environ["AZURE_DOCUMENT_TRANSLATION_ENDPOINT"]
    key = os.environ["AZURE_DOCUMENT_TRANSLATION_KEY"]
    source_container_url = os.environ["AZURE_SOURCE_CONTAINER_URL"]
    target_container_url = os.environ["AZURE_TARGET_CONTAINER_URL"]
    custom_model_id = os.environ["AZURE_CUSTOM_MODEL_ID"]

    client = DocumentTranslationClient(endpoint, AzureKeyCredential(key))

    async with client:
        # Translate to Spanish, routing the request through the custom model.
        poller = await client.begin_translation(
            source_container_url,
            target_container_url,
            "es",
            category_id=custom_model_id
        )
        result = await poller.result()

        print("Operation status: {}".format(result.status))
        print("Operation created on: {}".format(result.created_on))
        print("Operation last updated on: {}".format(result.last_updated_on))
        print("Total number of translations on documents: {}".format(result.documents_total_count))

        print("\nOf total documents...")
        print("{} failed".format(result.documents_failed_count))
        print("{} succeeded".format(result.documents_succeeded_count))

        # Report the outcome of each document in the operation.
        doc_results = client.list_all_document_statuses(result.id)
        async for document in doc_results:
            print("Document ID: {}".format(document.id))
            print("Document status: {}".format(document.status))
            if document.status == "Succeeded":
                print("Source document location: {}".format(document.source_document_url))
                print("Translated document location: {}".format(document.translated_document_url))
                print("Translated to language: {}\n".format(document.translate_to))
            else:
                print("Error Code: {}, Message: {}\n".format(document.error.code, document.error.message))


async def main():
    await sample_translation_with_custom_model_async()

if __name__ == '__main__':
    # asyncio.run creates and closes the event loop (Python 3.7+).
    asyncio.run(main())
Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
# coding=utf-8
# ------------------------------------
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
# ------------------------------------

"""
FILE: sample_translation_with_custom_model.py

DESCRIPTION:
    This sample demonstrates how to create a translation operation and apply a custom
    Azure translation model when doing the translation.

    To set up your containers for translation and generate SAS tokens to your containers (or files)
    with the appropriate permissions, see the README.

USAGE:
    python sample_translation_with_custom_model.py

    Set the environment variables with your own values before running the sample:
    1) AZURE_DOCUMENT_TRANSLATION_ENDPOINT - the endpoint to your Document Translation resource.
    2) AZURE_DOCUMENT_TRANSLATION_KEY - your Document Translation API key.
    3) AZURE_SOURCE_CONTAINER_URL - the container SAS URL to your source container which has the documents
        to be translated.
    4) AZURE_TARGET_CONTAINER_URL - the container SAS URL to your target container where the translated documents
        will be written.
    5) AZURE_CUSTOM_MODEL_ID - the ID of your Azure custom translation model.
"""


def sample_translation_with_custom_model():
    import os
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.translation.document import (
        DocumentTranslationClient
    )

    endpoint = os.environ["AZURE_DOCUMENT_TRANSLATION_ENDPOINT"]
    key = os.environ["AZURE_DOCUMENT_TRANSLATION_KEY"]
    source_container_url = os.environ["AZURE_SOURCE_CONTAINER_URL"]
    target_container_url = os.environ["AZURE_TARGET_CONTAINER_URL"]
    custom_model_id = os.environ["AZURE_CUSTOM_MODEL_ID"]

    client = DocumentTranslationClient(endpoint, AzureKeyCredential(key))

    # Translate to Spanish, routing the request through the custom model.
    poller = client.begin_translation(
        source_container_url,
        target_container_url,
        "es",
        category_id=custom_model_id
    )
    result = poller.result()

    print("Operation status: {}".format(result.status))
    print("Operation created on: {}".format(result.created_on))
    print("Operation last updated on: {}".format(result.last_updated_on))
    print("Total number of translations on documents: {}".format(result.documents_total_count))

    print("\nOf total documents...")
    print("{} failed".format(result.documents_failed_count))
    print("{} succeeded".format(result.documents_succeeded_count))

    # Report the outcome of each document in the operation.
    doc_results = client.list_all_document_statuses(result.id)
    for document in doc_results:
        print("Document ID: {}".format(document.id))
        print("Document status: {}".format(document.status))
        if document.status == "Succeeded":
            print("Source document location: {}".format(document.source_document_url))
            print("Translated document location: {}".format(document.translated_document_url))
            print("Translated to language: {}\n".format(document.translate_to))
        else:
            print("Error Code: {}, Message: {}\n".format(document.error.code, document.error.message))


if __name__ == '__main__':
    sample_translation_with_custom_model()
