Leveraging Amazon Translate for Multilingual Support in Amazon Kendra

Amazon Kendra is an intelligent search service that utilizes machine learning (ML) to deliver precise and user-friendly search capabilities. While it natively supports English, this article outlines various strategies to extend language support to non-English users. We explore these methods through a question-answer chatbot (Q&A bot) scenario, enabling users to submit queries in any language supported by Amazon Translate. Amazon Kendra then searches a range of documents and returns results in the language of the original query. Essential to this process are Amazon Comprehend and Amazon Translate.

Our implementation of the Q&A bot relies on Amazon Simple Storage Service (Amazon S3) for storing documents before they are ingested into Amazon Kendra. We utilize Amazon Comprehend to identify the dominant language of the query, facilitating accurate translation of both queries and responses. Amazon Translate handles the translation to and from English, while Amazon Lex powers the conversational interface.

For all queries submitted in languages other than English, translations occur before the queries reach Amazon Kendra. The responses users receive are also translated accordingly. We have prepared predefined Spanish translations for certain responses while conducting real-time translations for other languages. Metadata attributes linked to each document help guide these predefined Spanish translations.

We illustrate our techniques through three use cases, assuming that all target languages are supported by Amazon Translate. First, for Spanish-speaking users, each document (in our Q&A bot context, we utilize concise documents) is translated into Spanish by Amazon Translate and reviewed by humans. This pre-translation is crucial for enhancing Amazon Kendra’s document ranking model.

Second, responses from the reading comprehension model are translated in real time for every language except English. The same applies to results from the document ranking model. Later in this post, we provide further insights into implementing real-time translation for Amazon Kendra’s various models.

Third, for English-speaking users, no translation is necessary, allowing for seamless interaction between the user’s query and Amazon Kendra’s responses.

The following exchange highlights the three use cases, starting with English and followed by Spanish, French, and Italian.

Translation Considerations and Prerequisites

We execute the following steps for document preparation:

  1. Process the document through Amazon Translate to produce a Spanish version and title.
  2. Conduct a manual review of the translation and make desired adjustments.
  3. Create a metadata file that includes the Spanish translation of the document.
  4. Ingest the English document along with the metadata file into Amazon Kendra.

Here’s a sample metadata file for the document:

{
    "Attributes": {
        "_created_at": "2020-10-28T16:48:26.059730Z",
        "_source_uri": "https://aws.amazon.com/kendra/faqs/",
        "spanish_text": "R: Amazon Kendra es un servicio de búsqueda empresarial muy preciso y fácil de usar que funciona con Machine Learning.",
        "spanish_title": "P: ¿Qué es Amazon Kendra?"
    },
    "Title": "Q: What is Amazon Kendra?",
    "ContentType": "PLAIN_TEXT"
}

This example includes predefined attributes like _created_at and _source_uri, along with custom attributes such as spanish_text and spanish_title.
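As a rough illustration of the preparation steps, the sketch below drafts a Spanish translation with Amazon Translate and assembles a metadata dictionary shaped like the sample above. The function names `draft_spanish` and `build_metadata` are hypothetical, and the Translate client is passed in (for example `boto3.client("translate")`) rather than created here. The draft is only a starting point: the manual review step still applies before the file is written.

```python
import json
from datetime import datetime, timezone


def draft_spanish(translate_client, english_text):
    """Return a machine-translated Spanish draft for human review."""
    response = translate_client.translate_text(
        Text=english_text,
        SourceLanguageCode="en",
        TargetLanguageCode="es",
    )
    return response["TranslatedText"]


def build_metadata(title, source_uri, spanish_title, spanish_text, created_at=None):
    """Build a metadata dict with the same keys as the sample file."""
    if created_at is None:
        created_at = datetime.now(timezone.utc)
    return {
        "Attributes": {
            "_created_at": created_at.strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
            "_source_uri": source_uri,
            # Custom attributes holding the reviewed Spanish translation
            "spanish_text": spanish_text,
            "spanish_title": spanish_title,
        },
        "Title": title,
        "ContentType": "PLAIN_TEXT",
    }


def to_json(metadata):
    """Serialize the metadata, keeping accented characters readable."""
    return json.dumps(metadata, ensure_ascii=False, indent=4)
```

Passing the client as a parameter keeps the helper easy to exercise with a stub while the translations are still under review.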

For queries in Spanish, these attributes help formulate the response for the user. The title itself can serve as a potential user query, giving you control over translations. If your documents are in a different language, you must use Amazon Translate to convert them into English before ingestion into Amazon Kendra.

While we have yet to explore translation in other scenarios with diverse document types and answers, we believe the techniques outlined here can be adapted for further evaluation of translation accuracy.

Amazon Kendra Processing Overview

With the documents in place, we proceed to build a chatbot using Amazon Lex. This chatbot identifies the language with Amazon Comprehend, translates the user’s query into English, submits it to the Amazon Kendra index, and then translates the results back into the original language. This methodology can be applied to any supported language.
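The flow described above can be sketched as a single function. This is a minimal sketch under stated assumptions, not the post's actual implementation: the function name `answer_in_user_language` and the `min_confidence` parameter are hypothetical, the three clients are passed in (for example from `boto3.client("comprehend")`, `boto3.client("translate")`, and `boto3.client("kendra")`), only the first Amazon Kendra result is used, and the language code returned by Amazon Comprehend is assumed to be one that Amazon Translate accepts.

```python
def answer_in_user_language(query, index_id, comprehend, translate, kendra,
                            min_confidence=0.5):
    """Detect the query language, search in English, answer in that language."""
    # 1. Identify the dominant language, falling back to English
    detected = comprehend.detect_dominant_language(Text=query)["Languages"][0]
    language = detected["LanguageCode"] if detected["Score"] > min_confidence else "en"

    # 2. Translate the query into English when needed
    english_query = query
    if language != "en":
        english_query = translate.translate_text(
            Text=query,
            SourceLanguageCode=language,
            TargetLanguageCode="en",
        )["TranslatedText"]

    # 3. Submit the English query to the Amazon Kendra index
    items = kendra.query(IndexId=index_id, QueryText=english_query).get("ResultItems", [])
    if not items:
        return None
    answer = items[0]["DocumentExcerpt"]["Text"]

    # 4. Translate the result back into the user's language
    if language != "en":
        answer = translate.translate_text(
            Text=answer,
            SourceLanguageCode="en",
            TargetLanguageCode=language,
        )["TranslatedText"]
    return answer
```

English queries skip both translation calls, which matches the third use case described earlier.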

We utilize the built-in Amazon S3 connector for document ingestion and the FAQ ingestion process for question-answer pairs. The documents ingested are in English, and we manually create a corresponding description in Spanish, attaching it as a metadata attribute. Ideally, your documents should be in English.

If your documents contain an overview section, Amazon Translate can help generate this metadata description attribute. Should your documents be in another language, you must translate them into English using Amazon Translate before ingestion into Amazon Kendra. The following diagram illustrates our architecture.

The next steps to implement this solution include:

  1. Download the documents and metadata files, decompress the archive, and store them in an S3 bucket, which serves as the source for your Amazon Kendra S3 connector.
  2. Set up Amazon Kendra by creating an index and a data source, adding attributes, and ingesting the example data from Amazon S3.
  3. Configure the fulfillment Lambda function.
  4. Set up the chatbot.
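Step 2 can also be scripted rather than done in the console. The following is a hedged sketch that assumes an index already exists and is active, and that the bucket and IAM role are already in place. The function name `configure_kendra` and the data source name are hypothetical, and the Amazon Kendra client is passed in (for example `boto3.client("kendra")`). In a real deployment you would likely need to wait for the index to finish updating before creating the data source.

```python
def configure_kendra(kendra, index_id, bucket_name, role_arn):
    """Register the custom attributes, attach the S3 bucket, and start a sync."""
    # Declare the custom Spanish attributes so they can be returned with results
    kendra.update_index(
        Id=index_id,
        DocumentMetadataConfigurationUpdates=[
            {
                "Name": name,
                "Type": "STRING_VALUE",
                "Search": {"Displayable": True},
            }
            for name in ("spanish_text", "spanish_title")
        ],
    )

    # Point an S3 data source at the bucket holding documents and metadata files
    data_source = kendra.create_data_source(
        Name="kendra-faq-documents",  # hypothetical name
        IndexId=index_id,
        Type="S3",
        Configuration={"S3Configuration": {"BucketName": bucket_name}},
        RoleArn=role_arn,
    )

    # Ingest the example data
    kendra.start_data_source_sync_job(Id=data_source["Id"], IndexId=index_id)
    return data_source["Id"]
```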

Understanding Translation in the Fulfillment Lambda Function

The Lambda function is divided into three key sections to process and respond to user queries: language detection, query submission, and result translation.

Language Detection

In this section, we use Amazon Comprehend to determine the dominant language of the query. For this article, we read the user input from the inputTranscript key of the event that Amazon Lex passes to the function. If Amazon Comprehend’s confidence score for the detected language is 0.50 or lower, the function defaults to English. The following code snippet demonstrates this process:

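import boto3

comprehend = boto3.client('comprehend')

# Amazon Lex passes the user's text in the inputTranscript key of the Lambda event
query = event['inputTranscript']
response = comprehend.detect_dominant_language(Text=query)
# Confidence score of the first language in the response
confidence = response["Languages"][0]['Score']
if confidence > 0.50:
    language = response["Languages"][0]['LanguageCode']
else:
    # Default to English if confidence is insufficient
    language = "en"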

Submitting a Query

Next, we submit the query to Amazon Kendra. If the detected language is not English, the query is first translated into English with Amazon Translate, because the index holds English documents.
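Once Amazon Kendra returns results, each one has to be put into the user's language. The sketch below illustrates one way to apply the rules described earlier: English results are returned unchanged, Spanish results use the predefined `spanish_text` attribute when it is present, and every other language is translated in real time. The helper names `get_attribute` and `localize_result` are hypothetical, the Translate client is passed in, and the sketch assumes the custom attribute is returned in the result's `DocumentAttributes`.

```python
def get_attribute(item, key):
    """Return the string value of a document attribute, or None if absent."""
    for attribute in item.get("DocumentAttributes", []):
        if attribute["Key"] == key:
            return attribute["Value"].get("StringValue")
    return None


def localize_result(item, language, translate):
    """Return the text of one Amazon Kendra result in the user's language."""
    english_text = item["DocumentExcerpt"]["Text"]
    if language == "en":
        # No translation needed for English-speaking users
        return english_text
    if language == "es":
        # Prefer the human-reviewed Spanish translation stored as metadata
        predefined = get_attribute(item, "spanish_text")
        if predefined:
            return predefined
    # Fall back to real-time translation for all other cases
    return translate.translate_text(
        Text=english_text,
        SourceLanguageCode="en",
        TargetLanguageCode=language,
    )["TranslatedText"]
```

Falling back to real-time translation when the Spanish attribute is missing keeps the bot responsive for documents that have not been reviewed yet.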
