In today’s fast-paced translation and localization sectors, companies are under constant pressure to deliver faster at lower cost. To meet this challenge, many organizations are turning to Machine Translation (MT) to augment their teams of translators. MT uses automated software to translate text without human intervention.
A notable innovation in this area is the ability to customize translated output to a specific language style or domain terminology, according to client preferences. Previously, organizations had to build custom models to achieve this. With Amazon Translate’s Active Custom Translation (ACT) feature, customers can add configurable MT capabilities to their translation workflows without building them from scratch.
This article outlines a complete automated translation workflow, including strategies for managing data within the ACT process. The solution integrates Amazon Translate with other Amazon Web Services (AWS) services such as AWS DataSync and AWS Lambda. Before delving into the architecture, let’s clarify some fundamental concepts from the translation and localization industry.
Fundamental Translation Concepts
Translation Memory: It is standard practice to reuse previously translated and reviewed output as an input to machine translation systems. This data is generally referred to as Translation Memory and is stored and exchanged in standardized formats (TMX, TSV, or CSV).
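As a minimal sketch of what a translation memory file can look like, the snippet below serializes segment pairs as TSV. It assumes a layout in which the first row holds the language codes and each later row holds one source segment and its translation; the function name and the sample segments are hypothetical and chosen only for illustration.

```python
import csv
import io


def translation_memory_to_tsv(pairs, source_lang, target_lang):
    """Serialize (source, target) segment pairs as TSV text.

    Assumed layout: a header row of language codes, then one pair per row.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow([source_lang, target_lang])
    for source, target in pairs:
        # Skip incomplete pairs: a segment without a translation adds nothing.
        if source.strip() and target.strip():
            writer.writerow([source.strip(), target.strip()])
    return buffer.getvalue()


tsv_text = translation_memory_to_tsv(
    [("Hello world", "Bonjour le monde"), ("Save", "Enregistrer"), ("Draft", "")],
    "en",
    "fr",
)
```

The untranslated "Draft" segment is dropped, so only reviewed pairs end up in the file.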
Source Text: Input data for translation is frequently exchanged as XML Localization Interchange File Format (XLIFF) documents. Recently, Amazon Translate added support for XLIFF documents for batch processing.
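To make the XLIFF structure more concrete, here is a small sketch that reads the language pair and the source segments from an XLIFF 1.2 document using only the Python standard library. The sample document and the helper names are assumptions made for illustration; real files exported from a TMS usually carry more attributes and inline markup.

```python
import xml.etree.ElementTree as ET

# Namespace used by XLIFF 1.2 documents.
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

SAMPLE_XLIFF = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="ui.json" source-language="en" target-language="fr" datatype="plaintext">
    <body>
      <trans-unit id="1"><source>Save</source></trans-unit>
      <trans-unit id="2"><source>Cancel</source></trans-unit>
    </body>
  </file>
</xliff>"""


def language_pair(xliff_text):
    """Return (source_language, target_language) from the first <file> element."""
    root = ET.fromstring(xliff_text)
    file_el = root.find("x:file", XLIFF_NS)
    if file_el is None:
        raise ValueError("no <file> element found")
    return file_el.get("source-language"), file_el.get("target-language")


def source_segments(xliff_text):
    """Map each trans-unit id to the text of its <source> element."""
    root = ET.fromstring(xliff_text)
    return {
        unit.get("id"): unit.findtext("x:source", default="", namespaces=XLIFF_NS)
        for unit in root.iterfind(".//x:trans-unit", XLIFF_NS)
    }


pair = language_pair(SAMPLE_XLIFF)
segments = source_segments(SAMPLE_XLIFF)
```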
Translation Workflow Overview: As illustrated in Figure 1, the standard translation process involves machine translation and translation memory. Once the output is finalized and reviewed, it becomes part of the organization’s intellectual property (IP) and can be reintegrated into the workflow for future translation tasks.
Translation Assistant Solution Overview
When utilizing Amazon Translate in batch mode, the following steps are essential:
- Collect and prepare translation input data for the job.
- Monitor the processing and retrieval of outputs.
- Implement necessary processes to integrate your Translation Management System (TMS) with AWS.
These steps can entail numerous manual tasks, such as downloading large files, uploading them to Amazon Simple Storage Service (S3), and configuring jobs. The solution depicted in Figure 2 automates these tasks.
Translation Automation Processes
- Upload translation job input data (source files, custom terminology, translation memory files).
- Begin the preprocessing stage by scanning input files and identifying language pairs.
- Generate an Amazon Simple Queue Service (SQS) message for each language pair and translation project.
- Create S3 buckets and prefixes for each translation job.
- Set up an Amazon Translate job.
- Initiate a post-processing workflow, as shown in Figure 3 (utilizing AWS Step Functions).
- Transfer the translation output to the output bucket.
- Send an Amazon SNS notification to update on job completion status.
- Download translated files back into the client’s environment.
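The preprocessing steps in the list above (identifying language pairs and generating one queue message per language pair and project) can be sketched as follows. The key layout `<project>/<source>_<target>/<file>` is a hypothetical naming convention, not one prescribed by the solution, and the function name is illustrative. Each returned string could be used as the body of an SQS message.

```python
import json
from collections import defaultdict


def build_queue_messages(object_keys):
    """Group S3 keys into one JSON message per project and language pair.

    Assumes keys follow the hypothetical layout <project>/<source>_<target>/<file>.
    """
    batches = defaultdict(list)
    for key in object_keys:
        parts = key.split("/")
        if len(parts) < 3 or "_" not in parts[1]:
            continue  # Ignore keys that do not follow the assumed convention.
        source_lang, target_lang = parts[1].split("_", 1)
        batches[(parts[0], source_lang, target_lang)].append(key)
    return [
        json.dumps(
            {"project": project, "source": source, "target": target, "files": sorted(files)}
        )
        for (project, source, target), files in sorted(batches.items())
    ]


messages = build_queue_messages(
    [
        "acme/en_fr/ui.xlf",
        "acme/en_fr/help.xlf",
        "acme/en_de/ui.xlf",
        "readme.txt",
    ]
)
```

Using an underscore between the two language codes keeps codes that contain a hyphen, such as zh-TW, intact.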
In this scenario, translators work from their company’s internal infrastructure, although their TMS may also be cloud-hosted. They first gather translation input data from their TMS and upload the files to a shared file server. These files can be in XLIFF, TMX, or CSV format. AWS DataSync is used to orchestrate and initiate the data transfer from on-premises to an Amazon S3 staging bucket. AWS DataSync offers several advantages:
- A low-code solution that manages the upload/download of translation data to/from AWS.
- Scheduling capabilities for both upstream and downstream synchronizations, optimizing translation job batching and cost efficiency for Amazon Translate.
- A centralized access point for translation data, minimizing the need for managing user accounts and permissions.
Once the files are uploaded into the input bucket, DataSync triggers an event through Amazon EventBridge, which invokes an AWS Lambda function that pushes a message to an Amazon SQS queue containing the list of files for the current batch. This decoupling of data upload from processing enhances scalability, service quota control, and error handling.
The queue activates another Lambda function that organizes the file structure in S3 for each translation job. Naming conventions can be utilized to differentiate jobs. This function also prepares the translation memory and custom terminology as necessary before creating and submitting the translation job.
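As a rough illustration of the job-submission step, the function below assembles the request for an Amazon Translate batch job as a plain dictionary. The parameter names follow the StartTextTranslationJob API; the bucket layout, the job naming convention, and the function name are assumptions for this sketch. In a Lambda function, the result could be passed to the boto3 Translate client as `start_text_translation_job(**request)`.

```python
import re


def build_translate_job_request(project, source_lang, target_lang, bucket, role_arn,
                                parallel_data=None, terminology=None):
    """Build keyword arguments for a batch translation job (hypothetical layout)."""
    # Naming convention (assumed): <project>-<source>-<target>, restricted to safe characters.
    job_name = re.sub(r"[^A-Za-z0-9_-]", "-", f"{project}-{source_lang}-{target_lang}")
    prefix = f"{project}/{source_lang}_{target_lang}"
    request = {
        "JobName": job_name,
        "InputDataConfig": {
            "S3Uri": f"s3://{bucket}/{prefix}/input/",
            "ContentType": "application/x-xliff+xml",  # XLIFF input documents
        },
        "OutputDataConfig": {"S3Uri": f"s3://{bucket}/{prefix}/output/"},
        "DataAccessRoleArn": role_arn,
        "SourceLanguageCode": source_lang,
        "TargetLanguageCodes": [target_lang],
    }
    # Translation memory (parallel data) and custom terminology are optional.
    if parallel_data:
        request["ParallelDataNames"] = [parallel_data]
    if terminology:
        request["TerminologyNames"] = [terminology]
    return request


request = build_translate_job_request(
    "acme", "en", "fr", "example-jobs-bucket",
    "arn:aws:iam::123456789012:role/ExampleTranslateRole",
    parallel_data="acme-en-fr-tm",
)
```

Keeping the request as data makes the naming convention easy to unit test before any AWS call is made.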
Post-Processing AWS Step Functions Workflow
Amazon Translate is capable of generating events in EventBridge upon job completion or failure. This feature is leveraged to invoke a post-processing AWS Step Functions workflow. For instance, certain clients need to mark machine-translated segments within an XLIFF file for quick identification by their translators. The flow depicted in the state machine performs the following:
- Validates the output from Amazon Translate, ensuring all segments were successfully translated.
- Enhances the translation data by flagging machine-translated segments through a comparison of input and output.
- Copies output to a staging bucket in preparation for final upload.
- Sends SNS notifications to inform operators that the batch is complete.
This solution is entirely serverless, so there is no infrastructure or software platform to maintain. This allows you to concentrate on business logic and on what sets you apart from competitors.
As the number of translation projects grows over time, you can leverage Amazon S3 storage classes to optimize document archiving. A translation service provider can define specific rules per customer or project, which are applied automatically as data is copied into S3. Files can then be transitioned to more cost-effective storage tiers and expired after predefined retention periods.
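As an illustration, per-customer archiving rules can be expressed as an S3 lifecycle configuration. The sketch below builds that configuration as a plain dictionary in the shape accepted by the S3 PutBucketLifecycleConfiguration API; the customer prefixes, day counts, and function name are hypothetical values chosen for the example.

```python
def lifecycle_rule(customer, archive_after_days, expire_after_days, storage_class="GLACIER"):
    """Build one lifecycle rule for everything stored under <customer>/."""
    if expire_after_days <= archive_after_days:
        raise ValueError("objects must be archived before they expire")
    return {
        "ID": f"archive-{customer}",
        "Filter": {"Prefix": f"{customer}/"},
        "Status": "Enabled",
        # Move objects to a cheaper storage class after the given number of days.
        "Transitions": [{"Days": archive_after_days, "StorageClass": storage_class}],
        # Delete objects once the retention period has passed.
        "Expiration": {"Days": expire_after_days},
    }


lifecycle_configuration = {
    "Rules": [
        lifecycle_rule("acme", 90, 730),
        lifecycle_rule("globex", 30, 365, storage_class="STANDARD_IA"),
    ]
}
```

With boto3, this dictionary could be supplied as the `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration`.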
Conclusion
In this article, we’ve presented a solution that automates the collection and transfer of translation data. It also handles the scheduling and orchestration of translation jobs, which improves productivity, reduces cost, and shortens time-to-market. By building on AWS, you can minimize maintenance while creating a highly scalable and economical solution. Thanks to the AWS pay-as-you-go model, you can evaluate costs per project, which can inform your pricing strategy and benefit your customers.
To learn more about Amazon Translate and get started, you can explore these additional blog posts:
- Create a serverless pipeline to translate large documents with Amazon Translate
- How a company saved substantially on translation costs with Amazon Translate