Add support to translate files #84
Comments
Can you read the files into R at all? Chunk up the text and send it in.
But reading your docs, it doesn't look like a big change, since there is already an option to upload to Cloud Storage.
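A minimal sketch of the "chunk up the text" idea, written in Python to match the code posted elsewhere in this thread. The function name `chunk_text` and the `max_chars` default are hypothetical; the real per-request limit should be taken from the Translation API documentation. It splits at sentence ends where possible and hard-splits only when a single sentence exceeds the limit.

```python
import re


def chunk_text(text: str, max_chars: int = 5000) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    max_chars is an illustrative default, not an official API limit.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at fixed offsets.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be sent as a separate translation request and the results joined in order. This only preserves plain text, so it does not help when tables or layout need to survive.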
If I just want the text, it's indeed not difficult - but I need to translate the full files so that the formatting remains (somewhat) intact, as they contain tables. I did not understand how to send the file with the request, so I ended up using this Python code:

import os

from google.cloud import translate

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "key.json"
os.environ["GOOGLE_PROJECT_ID"] = "XXXX"


def translate_pdf(file_path: str, destination: str, target_lang: str = 'en', source_lang: str = ''):
    # Validate the input before calling the API.
    if not os.path.isfile(file_path):
        raise ValueError("Error: The file does not exist or is not a regular file.")
    if not file_path.lower().endswith('.pdf'):
        raise ValueError("Error: The file is not a PDF file.")

    client = translate.TranslationServiceClient()
    location = "us-central1"
    parent = f"projects/{os.environ['GOOGLE_PROJECT_ID']}/locations/{location}"

    # Supported file types: https://cloud.google.com/translate/docs/supported-formats
    with open(file_path, "rb") as document:
        document_content = document.read()

    document_input_config = {
        "content": document_content,
        "mime_type": "application/pdf",
    }

    response = client.translate_document(
        request={
            "parent": parent,
            "target_language_code": target_lang,
            "source_language_code": source_lang,
            "document_input_config": document_input_config,
        }
    )

    # Write the translated document to the destination path.
    with open(destination, 'wb') as f:
        f.write(response.document_translation.byte_stream_outputs[0])
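The function above only accepts PDFs and hard-codes the MIME type. As a rough sketch of how the same request could cover other document types, the MIME type can be looked up from the file extension. The helper name `guess_mime_type` and the table below are assumptions for illustration; the list is a partial guess and should be checked against the supported-formats page linked in the code.

```python
import os

# Assumed subset of document formats; verify against
# https://cloud.google.com/translate/docs/supported-formats before relying on it.
MIME_TYPES = {
    ".pdf": "application/pdf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def guess_mime_type(file_path: str) -> str:
    """Return the MIME type for a document path, based on its extension."""
    ext = os.path.splitext(file_path)[1].lower()
    try:
        return MIME_TYPES[ext]
    except KeyError:
        raise ValueError(f"Unsupported file extension: {ext!r}") from None
```

The result would replace the literal "application/pdf" in document_input_config, and the `.pdf` check would be dropped in favour of the ValueError raised here.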
Thanks! This is helpful.
@MarkEdmondson1234 I was trying to browse through the documentation at this link:
There were two mirrored websites, so I'm a bit confused as to why, but this one is still live: https://docs.ropensci.org/googleLanguageR/
@MarkEdmondson1234 I tried the following:
This stops working pretty quickly. I suspect that the PDF produced gets time-related metadata added, so two outputs won't be 100% identical. Even the file size differs by a couple of bytes. I'll try to rework the test using pdftools - although this will add a package dependency.
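One way to probe the metadata suspicion, sketched in Python to match the code elsewhere in this thread: blank out the date fields and the trailer ID before comparing bytes. This is not the pdftools approach mentioned above, just a quick diagnostic. The function names are hypothetical, and it assumes the metadata sits uncompressed in the file; dates inside compressed object streams or XMP packets would not be caught.

```python
import re

# Info-dictionary date entries, e.g. /CreationDate (D:20230101120000Z)
_DATE_FIELDS = re.compile(rb"/(CreationDate|ModDate)\s*\([^)]*\)")
# Trailer file identifier, e.g. /ID [<abc123> <def456>]
_TRAILER_ID = re.compile(rb"/ID\s*\[\s*<[0-9A-Fa-f]*>\s*<[0-9A-Fa-f]*>\s*\]")


def normalize_pdf_bytes(data: bytes) -> bytes:
    """Blank out fields that commonly change between otherwise identical PDFs."""
    data = _DATE_FIELDS.sub(rb"/\1 ()", data)
    return _TRAILER_ID.sub(rb"/ID []", data)


def pdfs_match(a: bytes, b: bytes) -> bool:
    """Compare two PDFs after normalizing volatile metadata."""
    return normalize_pdf_bytes(a) == normalize_pdf_bytes(b)
```

If two translated outputs still differ after this, the difference is somewhere other than these fields, and comparing extracted text is likely the more robust test.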
I am trying to translate some files with the Google Translate API. I don't think that is currently supported - but it would be a great option for gl_translate, as I have not found any R code to do it and am a bit daunted by the API docs (https://cloud.google.com/translate/docs/advanced/translate-documents). Might that be possible?