In today’s interconnected world, effective communication often requires overcoming language barriers. Organizations routinely produce documents, spreadsheets, and presentations, which are essential for sharing information and maintaining records. However, as we engage with diverse audiences, the demand for accurate translation of these materials has surged. While some larger companies may employ professional translators, this can be both time-consuming and costly. Although there are various online tools available for text translation, few provide a secure and efficient way to translate complete documents while preserving their original formatting.
Amazon Translate now facilitates the translation of Office Open XML files, including DOCX, PPTX, and XLSX formats. This fully managed neural machine translation service offers high-quality and cost-effective translation in 55 different languages. For a comprehensive list of supported languages, refer to the Supported Languages and Language Codes section. The document translation feature is accessible wherever batch translation is enabled; for further details, see Asynchronous Batch Processing.
In this article, we walk through translating documents using the AWS Management Console. You can also run document translation through the Amazon Translate BatchTranslation API from the AWS Command Line Interface (AWS CLI) or an AWS SDK.
Overview of the Solution
This article outlines the following steps:
- Create an AWS Identity and Access Management (IAM) role that allows access to your Amazon Simple Storage Service (Amazon S3) buckets.
- Organize your documents by file type and language.
- Execute the batch translation.
Creating an IAM Role for S3 Access
We will create a role that permits access to all S3 buckets within your account to facilitate document translation. This role will be provided to Amazon Translate, enabling the service to access your designated input and output S3 locations. For more information, refer to the AWS Identity and Access Management Documentation.
- Log in to your AWS account.
- In the IAM console, under Access Management, select Roles.
- Click on Create role.
- Choose Another AWS account.
- Enter your Account ID.
- Proceed to the next page.
- In the Filter policies section, search for and add the AmazonS3FullAccess policy.
- Proceed to the next page.
- Assign a name to the role, such as TranslateBatchAPI.
- Navigate to the role you just created.
- On the Trust relationships tab, select Edit trust relationship.
- Enter the following service principals:
"Service": [ "translate.aws.internal", "translate.amazonaws.com" ],
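As a rough illustration of where those service principals sit, the sketch below builds a complete trust policy document in Python. The surrounding structure (the `Version` string, the `Allow` effect, and the `sts:AssumeRole` action) is the standard IAM trust policy shape and is an assumption here, since the article shows only the `Service` entry. The function name `build_trust_policy` is hypothetical.

```python
import json


def build_trust_policy():
    """Return a trust policy that lets Amazon Translate assume the role.

    Only the two service principals come from the article; the rest is the
    standard IAM trust policy layout and is assumed.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": [
                        "translate.aws.internal",
                        "translate.amazonaws.com",
                    ]
                },
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # Print the JSON so it can be pasted into the trust relationship editor.
    print(json.dumps(build_trust_policy(), indent=2))
```

The printed JSON can be pasted into the trust relationship editor. If you script the setup instead, the same document could be passed as the `PolicyDocument` argument of the boto3 IAM client's `update_assume_role_policy` call.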
Organizing Your Documents
To use Amazon Translate’s batch translation feature, documents must be stored in a folder within an S3 bucket. Batch translation does not work if the files are located in the root of the S3 bucket or if they are nested. Upload the documents you want to translate into a folder within an S3 bucket, organized by file type (DOCX, PPTX, XLSX) and language. If you have multiple documents of different types to translate, create a separate S3 prefix for each document type in a single language.
- From the Amazon S3 console, select Create bucket.
- Follow the prompts to create your buckets.
In this example, we will create two buckets named input-translate-bucket and output-translate-bucket.
The buckets will include the following folders for each file type:
- docx
- pptx
- xlsx
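The folder layout above can be sketched as a small helper that decides which prefix a file belongs under. This is only an illustration: the function name `s3_key_for` is hypothetical, and the sketch computes the object key without uploading anything.

```python
from pathlib import PurePosixPath

# File types the article organizes into separate folders.
SUPPORTED_TYPES = {"docx", "pptx", "xlsx"}


def s3_key_for(filename):
    """Return the S3 object key '<type>/<name>' for a supported document.

    Raises ValueError for any other extension so that unsupported files
    never end up in an input folder.
    """
    path = PurePosixPath(filename)
    ext = path.suffix.lower().lstrip(".")
    if ext not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported file type: {filename!r}")
    # Keep only the file name so the document sits directly in its folder.
    return f"{ext}/{path.name}"


if __name__ == "__main__":
    for name in ["report.docx", "slides/deck.pptx", "budget.XLSX"]:
        print(name, "->", s3_key_for(name))
```

Each returned key could then be used as the destination when uploading to input-translate-bucket, for example with the boto3 S3 client's `upload_file` method.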
Executing Batch Translation
To carry out your batch translation, follow these steps:
- Open the Amazon Translate console and select Batch Translation.
- Click on Create job.
For this instance, we will focus on translating documents in DOCX format.
- For Name, enter BatchTranslation.
- For Source language, select English (en).
- For Target language, choose Spanish (es).
- For Input S3 location, specify s3://input-translate-bucket/docx/.
- For File format, select docx.
- For Output S3 location, input s3://output-translate-bucket/.
- For Access permissions, opt for Use an existing IAM role.
- For IAM role, enter TranslateBatchAPI.
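For readers who prefer the SDK, the console settings above map roughly onto a `StartTextTranslationJob` request. The sketch below assumes boto3 is installed and credentials are configured. The helper names `build_job_request` and `start_job` are hypothetical, and the role ARN in the usage lines uses a placeholder account ID that you would replace with your own.

```python
# MIME types that identify Office Open XML input to Amazon Translate.
CONTENT_TYPES = {
    "docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def build_job_request(file_format, role_arn,
                      input_bucket="input-translate-bucket",
                      output_bucket="output-translate-bucket",
                      source="en", target="es"):
    """Assemble the keyword arguments for start_text_translation_job."""
    return {
        "JobName": "BatchTranslation",
        "InputDataConfig": {
            "S3Uri": f"s3://{input_bucket}/{file_format}/",
            "ContentType": CONTENT_TYPES[file_format],
        },
        "OutputDataConfig": {"S3Uri": f"s3://{output_bucket}/"},
        "DataAccessRoleArn": role_arn,
        "SourceLanguageCode": source,
        "TargetLanguageCodes": [target],
    }


def start_job(request):
    """Submit the job and return its ID. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the request builder works without the SDK

    client = boto3.client("translate")
    return client.start_text_translation_job(**request)["JobId"]


if __name__ == "__main__":
    # Placeholder account ID; substitute the ARN of your TranslateBatchAPI role.
    role = "arn:aws:iam::123456789012:role/TranslateBatchAPI"
    print(build_job_request("docx", role))
```

Keeping the request builder separate from the network call makes it easy to inspect the request before submitting it, and to reuse it for the pptx and xlsx folders by changing only the first argument.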
Because this is an asynchronous translation, the job starts only after the necessary machine resources have been allocated, which may take up to 15 minutes. For additional information on starting batch translation jobs, see Starting a Batch Translation Job.
Upon completion of the translation, you will find the output in a designated folder within your S3 bucket.
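Because the job runs asynchronously, a script usually has to poll until it finishes before reading the output folder. The following is a minimal sketch of such a loop, not a definitive implementation. The function name `wait_for_job`, the polling interval, and the poll limit are assumptions. The `describe` argument is expected to behave like the boto3 Translate client's `describe_text_translation_job` method.

```python
import time

# Job statuses after which the job will not change state again.
TERMINAL_STATUSES = {"COMPLETED", "COMPLETED_WITH_ERROR", "FAILED", "STOPPED"}


def wait_for_job(describe, job_id, poll_seconds=60, max_polls=120,
                 sleep=time.sleep):
    """Poll a batch translation job until it reaches a terminal status.

    `describe` is any callable that accepts JobId=... and returns a response
    shaped like describe_text_translation_job. `sleep` is injectable so the
    loop can be exercised without real waiting.
    """
    for _ in range(max_polls):
        response = describe(JobId=job_id)
        status = response["TextTranslationJobProperties"]["JobStatus"]
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

With boto3 installed, one way to call it is `wait_for_job(boto3.client("translate").describe_text_translation_job, job_id)`. Passing the describe call in as a parameter keeps the loop independent of the SDK.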
Conclusion
In this article, we walked through asynchronous batch translation of documents in DOCX format. The same procedure can be applied to translate spreadsheets and presentations. You are charged only for the number of characters translated in each format. Document translation is available in all regions that support batch translation. If you are new to Amazon Translate, consider the Free Tier, which allows 2 million characters per month for the first 12 months, starting with your initial translation request.