Accelerate Translation Projects with a Fully Automated Translation System Assistant | Amazon VGT2 Las Vegas


Translation and localization businesses, like many others, grapple with the demand for rapid turnaround times at minimal costs. To tackle this issue, organizations have increasingly turned to Machine Translation (MT), which employs automated software to translate text without human intervention. A notable innovation in this field is Active Custom Translation (ACT), which allows for the customization of translations to align with specific linguistic styles or terminology based on client requirements. Historically, companies had to construct custom models to integrate ACT into their translation systems. However, Amazon Translate now offers an Active Custom Translation feature that enables clients to seamlessly incorporate configurable MT capabilities without the need for extensive development.

This article outlines an end-to-end automated translation workflow and provides guidelines for managing data within the ACT process. The solution combines Amazon Translate with other Amazon Web Services (AWS) offerings such as AWS DataSync and AWS Lambda. Before delving into the architecture of this solution, let’s clarify some essential concepts from the translation and localization sector.

Fundamental Translation Concepts

  • Translation Memory: It is standard practice to reuse previously approved translations as input for machine translation systems. This data, often referred to as Translation Memory, is stored and exchanged in standardized formats (TMX, TSV, or CSV).
  • Source Text: Translation input data is typically provided as XML Localization Interchange File Format (XLIFF) documents. Amazon Translate supports XLIFF documents in batch translation jobs.

The standard translation flow, illustrated in Figure 1, combines machine translation with translation memory. Once the output is reviewed and finalized, it becomes part of the organization’s intellectual property (IP) and can be fed back into future translation jobs.

Overview of the Translation Assistant Solution

When utilizing Amazon Translate in batch mode, the following steps are essential:

  1. Compile the translation input data and make it accessible to the translation job.
  2. Monitor processing and retrieve the output.
  3. Implement the processes needed to connect your Translation Management System (TMS) with AWS.

As depicted in Figure 2, this process can involve numerous manual tasks, including downloading large files, uploading them to Amazon Simple Storage Service (Amazon S3), and configuring jobs.

Translation Automation Activities Include:

  • Uploading input data for the translation job (source files, custom terminology, translation memory).
  • Initiating the preprocessing step to scan input files and identify language pairs.
  • Creating an Amazon Simple Queue Service (SQS) message for each language pair and translation project.
  • Setting up S3 buckets and prefixes for every translation job.
  • Creating and submitting an Amazon Translate job.
  • Initiating a post-processing AWS Step Functions workflow, as shown in Figure 3.
  • Copying the translation output to the designated output bucket.
  • Publishing an Amazon SNS notification to communicate job completion status.
  • Downloading translated files back into the customer’s environment.
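
The preprocessing and queueing steps in this list could be sketched roughly as follows. This is a minimal illustration, not the solution's actual code: it assumes XLIFF 1.2 input whose first <file> element carries source-language and target-language attributes, and the function names and message fields (project, source, target, files) are hypothetical.

```python
import json
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumption: inputs are XLIFF 1.2 documents using the standard namespace.
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}


def language_pair(xliff_text):
    """Return (source, target) language codes from the first <file> element."""
    root = ET.fromstring(xliff_text)
    file_el = root.find("x:file", XLIFF_NS)
    if file_el is None:
        raise ValueError("no <file> element found")
    source = file_el.get("source-language")
    target = file_el.get("target-language")
    if not source or not target:
        raise ValueError("missing source-language or target-language")
    return source, target


def build_messages(project_id, documents):
    """Group documents by language pair and return one JSON body per pair.

    `documents` maps an object key to the XLIFF text stored under that key.
    """
    groups = defaultdict(list)
    for key, text in documents.items():
        groups[language_pair(text)].append(key)
    return [
        json.dumps(
            {"project": project_id, "source": src, "target": tgt, "files": sorted(keys)}
        )
        for (src, tgt), keys in sorted(groups.items())
    ]
```

Each returned string would become the body of one SQS message, so a project that mixes English-to-French and English-to-German files yields two messages and, later, two translation jobs.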

In this architecture, translators operate from their organization’s internal infrastructure, although their TMS can also be cloud-based. They gather translation input data from their TMS and transfer the files to a shared file server. These files may be formatted as XLIFF, TMX, or CSV. We utilize AWS DataSync to orchestrate the data transfer from on-premises systems to an Amazon S3 staging bucket. AWS DataSync offers several advantages, including:

  • A low-code solution for managing the upload/download of translation data to/from AWS.
  • The ability to schedule synchronization for both upstream and downstream, optimizing the batching of translation jobs while controlling costs associated with Amazon Translate.
  • A centralized access point to translation data, minimizing the need to manage multiple user accounts.

Once the files are uploaded to the input bucket, DataSync triggers an event through Amazon EventBridge. This event activates an AWS Lambda function that sends a message to an Amazon SQS queue, listing the files to be translated in the current batch. SQS decouples the data upload from the actual processing, which improves scalability, helps the solution stay within service quotas, and simplifies error handling.
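
A Lambda function for this step might look like the sketch below. It rests on several assumptions that should be checked against your own setup: that the EventBridge event carries the source aws.datasync with a detail.State of SUCCESS, that the bucket and queue are passed through the hypothetical environment variables INPUT_BUCKET and QUEUE_URL, and that the message layout shown is acceptable to the consumer.

```python
import json
import os

# Illustrative list of file types the batch should pick up.
SUPPORTED_SUFFIXES = (".xlf", ".xliff", ".tmx", ".csv")


def transfer_succeeded(event):
    """True when the event reports a successful DataSync task execution.

    Assumption: the event shape is {"source": "aws.datasync", "detail": {"State": ...}}.
    """
    return (
        event.get("source") == "aws.datasync"
        and event.get("detail", {}).get("State") == "SUCCESS"
    )


def batch_message(bucket, keys):
    """Build the SQS message body listing the translatable files in this batch."""
    files = sorted(k for k in keys if k.lower().endswith(SUPPORTED_SUFFIXES))
    return json.dumps({"bucket": bucket, "files": files})


def handler(event, context):
    """Lambda entry point: list uploaded objects and enqueue one batch message."""
    if not transfer_succeeded(event):
        return {"queued": 0}
    import boto3  # imported lazily so the helpers above work without AWS access

    bucket = os.environ["INPUT_BUCKET"]  # hypothetical environment variable
    queue_url = os.environ["QUEUE_URL"]  # hypothetical environment variable
    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    keys = [
        obj["Key"]
        for page in paginator.paginate(Bucket=bucket)
        for obj in page.get("Contents", [])
    ]
    body = batch_message(bucket, keys)
    boto3.client("sqs").send_message(QueueUrl=queue_url, MessageBody=body)
    return {"queued": len(json.loads(body)["files"])}
```

Keeping the message-building logic in plain functions makes it easy to unit test without touching AWS, while the handler stays a thin wrapper around the S3 and SQS calls.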

The queue subsequently triggers another Lambda function that organizes a file hierarchy in S3 for each translation job. It employs naming conventions to differentiate jobs and prepares translation memory and custom terminology as necessary, before finally creating and submitting the translation job.
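
The naming convention and job submission could be sketched as follows. The prefix layout (jobs/<name>/input/ and jobs/<name>/output/), the job-name scheme, and the helper names are assumptions made for illustration; the request keys are those of the boto3 start_text_translation_job call, and the IAM role ARN must be one that Amazon Translate can assume in your account.

```python
import re


def job_name(project_id, source, target, batch_id):
    """Derive a job name; characters outside a conservative set become '_'."""
    raw = f"{project_id}-{source}-{target}-{batch_id}"
    return re.sub(r"[^A-Za-z0-9_.\-]", "_", raw)[:256]


def job_prefixes(bucket, name):
    """Hypothetical per-job S3 layout: one input and one output prefix."""
    return {
        "input": f"s3://{bucket}/jobs/{name}/input/",
        "output": f"s3://{bucket}/jobs/{name}/output/",
    }


def build_job_request(bucket, project_id, source, target, batch_id, role_arn,
                      parallel_data=None, terminology=None):
    """Assemble keyword arguments for start_text_translation_job."""
    name = job_name(project_id, source, target, batch_id)
    prefixes = job_prefixes(bucket, name)
    request = {
        "JobName": name,
        "InputDataConfig": {
            "S3Uri": prefixes["input"],
            "ContentType": "application/x-xliff+xml",  # XLIFF input
        },
        "OutputDataConfig": {"S3Uri": prefixes["output"]},
        "DataAccessRoleArn": role_arn,
        "SourceLanguageCode": source,
        "TargetLanguageCodes": [target],
    }
    if parallel_data:  # name of a parallel data resource used for ACT
        request["ParallelDataNames"] = [parallel_data]
    if terminology:  # name of a custom terminology resource
        request["TerminologyNames"] = [terminology]
    return request


def submit(request):
    """Submit the job and return its ID (requires boto3 and AWS credentials)."""
    import boto3

    return boto3.client("translate").start_text_translation_job(**request)["JobId"]
```

For example, build_job_request("my-bucket", "acme 42", "en", "fr", "b001", role_arn) produces a job named acme_42-en-fr-b001 that reads from s3://my-bucket/jobs/acme_42-en-fr-b001/input/.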

The Post-Processing AWS Step Functions Workflow

Amazon Translate can emit events to EventBridge when a job completes or fails. We use this capability to start a post-processing AWS Step Functions workflow that handles client-specific requirements. For example, some clients require machine-translated segments within an XLIFF file to be flagged for manual review by translators.
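
An EventBridge rule for this trigger could use a pattern like the one below. The source, detail-type, and detail field names reflect our reading of the Amazon Translate event format and should be verified against the current documentation; the matcher and the state machine input shape are hypothetical helpers that only exist to make the sketch testable locally.

```python
# Event pattern for terminal job states (field names are assumptions to verify).
JOB_STATE_PATTERN = {
    "source": ["aws.translate"],
    "detail-type": ["Translate TextTranslationJob State Change"],
    "detail": {"jobStatus": ["COMPLETED", "COMPLETED_WITH_ERROR", "FAILED"]},
}


def matches(pattern, event):
    """Minimal matcher for exact-value patterns (a small subset of EventBridge rules)."""
    for key, expected in pattern.items():
        actual = event.get(key)
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


def state_machine_input(event):
    """Reduce the event to the fields a post-processing workflow might need."""
    detail = event["detail"]
    return {"jobId": detail["jobId"], "succeeded": detail["jobStatus"] == "COMPLETED"}
```

In a deployed rule, the pattern would be serialized to JSON and the Step Functions state machine set as the rule target, so no matching code runs in your own functions.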

The flow implemented in the state machine performs the following tasks (as shown in Figure 3):

  • Verifies the output of Amazon Translate, ensuring completeness and confirming that all segments were translated successfully.
  • Enriches the translation data by flagging machine-translated segments through comparison of input and output.
  • Copies the output to a staging bucket in preparation for the final upload.
  • Sends SNS notifications to inform operators that the batch is complete.

This solution is entirely serverless, allowing you to focus on your core business logic without the burden of maintaining infrastructure or software platforms. As the volume of translation projects grows over time, you can use Amazon S3 storage classes to optimize document archiving. Translation service providers can define rules per client or per project that are applied automatically as data is transferred into S3. Consequently, files can be moved to more affordable storage tiers and expired after predefined retention periods.
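
As an illustration of such archiving rules, the sketch below builds an S3 lifecycle configuration for a single client prefix. The prefix layout, the default day counts, and the choice of STANDARD_IA followed by GLACIER are assumptions for the example; the dictionary follows the shape expected by the boto3 put_bucket_lifecycle_configuration call.

```python
def lifecycle_config(prefix, retention_days, ia_after=30, archive_after=90):
    """Build a lifecycle configuration that tiers, then expires, objects under a prefix."""
    # Conservative sanity check: at least 30 days before infrequent access,
    # at least 30 more before archiving, and expiry only after archiving.
    if not (30 <= ia_after and ia_after + 30 <= archive_after < retention_days):
        raise ValueError("expected 30 <= ia_after, ia_after + 30 <= archive_after < retention_days")
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/').replace('/', '-')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after, "StorageClass": "STANDARD_IA"},
                    {"Days": archive_after, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": retention_days},
            }
        ]
    }


def apply_lifecycle(bucket, config):
    """Apply the configuration (requires boto3 and AWS credentials).

    Note: this call replaces any lifecycle configuration already on the bucket.
    """
    import boto3

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )
```

Because the call replaces the whole bucket configuration, a provider managing many clients would typically collect one rule per client prefix into a single Rules list before applying it.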

Conclusion

In this article, we’ve outlined a solution that automates the collection and transfer of translation data while facilitating the scheduling and orchestration of translation jobs. This leads to enhanced productivity, reduced costs, and quicker time-to-market. Building on AWS reduces the maintenance burden and yields a scalable, cost-effective solution. Thanks to the AWS pay-as-you-go model, you can evaluate costs per project, incorporate them into your pricing model, and communicate them to your customers.


