The Amazon EU Design and Construction (Amazon D&C) team is responsible for engineering the design and construction of Amazon warehouses throughout Europe and the MENA region. The project design and deployment process necessitates numerous Requests for Information (RFIs) related to engineering specifications and project-specific guidelines. These requests can range from basic retrieval of baseline design values to comprehensive reviews of value engineering proposals and compliance analyses. Currently, these inquiries are managed by a Central Technical Team composed of subject matter experts (SMEs) who provide answers to intricate technical questions for all project stakeholders throughout the project lifecycle.
To expedite the acquisition of critical information for their engineering designs, the team is seeking a generative AI-powered question-answering solution. Importantly, these applications extend beyond the Amazon D&C team to encompass Global Engineering Services involved in project deployment. A generative AI question-answering system will facilitate quick access to essential information, thereby streamlining engineering design and project management processes for all stakeholders.
Presently, most generative AI solutions for question answering rely on Retrieval Augmented Generation (RAG) techniques. RAG utilizes large language model (LLM) embedding and vectorization to search documents, clusters the search results to create context, and employs this context as an enhanced prompt for inference with a foundation model. However, this approach proves to be less effective for the highly technical documents from Amazon D&C, which often contain considerable unstructured data like Excel sheets, tables, lists, figures, and images. In light of this, the question-answering task greatly benefits from fine-tuning the LLM with the relevant documents to increase model quality and accuracy.
To tackle these challenges, we introduce a novel framework that combines RAG with fine-tuned LLMs. This solution leverages Amazon SageMaker JumpStart as the primary service for model fine-tuning and inference. This blog outlines the solution and shares insights and best practices gleaned from real-world implementations. We will also evaluate the performance of different methodologies and open-source LLMs in our use case and explore the balance between model efficacy and computational resource expenses.
Solution Overview
The solution comprises several components illustrated in the accompanying architecture diagram:
- Content Repository: The D&C content encompasses a variety of human-readable documents in formats such as PDFs, Excel files, wiki pages, and more. In this solution, we have stored these documents in an Amazon Simple Storage Service (S3) bucket to serve as a knowledge base for information retrieval and inference. Future plans include the development of integration adapters for direct content access.
- RAG Framework with a Fine-Tuned LLM: This consists of several subcomponents:
- RAG Framework: This retrieves relevant data from documents, augments the prompt by incorporating the retrieved data as context, and processes the augmented prompt with a fine-tuned LLM to generate outputs.
- Fine-Tuned LLM: We created a training dataset from the documents and content, conducting fine-tuning on the foundation model. Post-tuning, the model acquired knowledge from the D&C content, enabling it to respond independently to queries.
- Prompt Validation Module: This component assesses the semantic alignment between the user’s prompt and the fine-tuning dataset. If the LLM has been fine-tuned for the query, the model can be prompted for a response. Otherwise, RAG can be employed to generate an answer.
- LangChain: We utilize LangChain to orchestrate a workflow that responds to incoming questions.
- End-User UI: This chatbot interface captures users’ inquiries and presents answers generated from the RAG and LLM responses.
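The routing logic of the prompt validation module can be sketched as a similarity check between the incoming question and the fine-tuning dataset. The snippet below is a minimal, self-contained illustration: it uses a simple bag-of-words cosine similarity and a hypothetical threshold, whereas a production system would reuse the same embedding model as the retrieval pipeline.

```python
import math
import re
from collections import Counter

def cosine_similarity(a: Counter, b: Counter) -> float:
    # Dot product over shared terms, normalized by both vector magnitudes.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def route_prompt(question: str, tuning_corpus: list[str], threshold: float = 0.3) -> str:
    # If the question is semantically close to the fine-tuning dataset,
    # the fine-tuned LLM can answer directly; otherwise fall back to RAG.
    q_vec = Counter(re.findall(r"\w+", question.lower()))
    best = max(
        cosine_similarity(q_vec, Counter(re.findall(r"\w+", doc.lower())))
        for doc in tuning_corpus
    )
    return "fine-tuned-llm" if best >= threshold else "rag"

# Illustrative fine-tuning corpus entry (hypothetical content).
corpus = ["what is the baseline slab thickness for a warehouse floor"]
print(route_prompt("What is the baseline slab thickness?", corpus))        # fine-tuned-llm
print(route_prompt("Summarize the latest sustainability report", corpus))  # rag
```

The threshold value would need to be calibrated against held-out questions; too low a value routes everything to the fine-tuned model, too high a value routes everything to RAG.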
In the subsequent sections, we will demonstrate how to establish the RAG workflow and construct the fine-tuned models.
RAG with Foundation Models via SageMaker JumpStart
RAG merges dense retrieval techniques with sequence-to-sequence (seq2seq) foundation models. For effective question answering from Amazon D&C documents, we must prepare the following beforehand:
- Embedding and indexing documents using an LLM embedding model: We divided multiple documents into smaller segments based on their chapter and section structures, employing the GPT-J-6B embedding model on SageMaker JumpStart to create indexes, which we then stored in a FAISS vector store.
- A pre-trained foundation model to generate responses from prompts: We experimented with the Flan-T5 XL, Flan-T5 XXL, and Falcon-7B models on SageMaker JumpStart.
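The chunk-embed-search preparation above can be sketched end to end. The example below is self-contained so it runs without AWS access: the `embed` function is a deterministic hashed bag-of-words stand-in for the GPT-J-6B embedding endpoint, and a brute-force cosine search stands in for the FAISS index. Only the section-based chunking mirrors the actual approach; everything else is a simplified substitute.

```python
import hashlib
import math
import re

def embed(text: str, dim: int = 64) -> list[float]:
    # Stand-in for the GPT-J-6B embedding model on SageMaker JumpStart:
    # a deterministic hashed bag-of-words vector, normalized to unit length.
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def split_by_section(document: str) -> list[str]:
    # Chunk on numbered section headings (e.g. "1.1 Plumbing"), mirroring
    # the chapter/section-based segmentation described above.
    parts = re.split(r"\n(?=\d+(?:\.\d+)*\s)", document)
    return [p.strip() for p in parts if p.strip()]

def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Brute-force cosine search; FAISS plays this role in the real pipeline.
    q = embed(query)
    scored = sorted(chunks, key=lambda c: -sum(a * b for a, b in zip(q, embed(c))))
    return scored[:k]

# Hypothetical specification text, for illustration only.
doc = ("1 General\nScope of works.\n"
       "1.1 Plumbing\nDrinking fountains require a trap.\n"
       "1.2 Electrical\nAll panels must be labeled.")
chunks = split_by_section(doc)
print(top_k("plumbing requirements for drinking fountains", chunks, k=1))
```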
The question-answering process is executed through LangChain, a framework designed for applications powered by language models. The workflow entails the following steps:
- Receive a question from the user.
- Conduct a semantic search on the indexed documents using FAISS to obtain the top K relevant document chunks.
- Define the prompt template, such as:
Answer based on context:\n\n{context}\n\n{question}
- Enhance the retrieved document chunks as the {context} and the user question as the {question} in the prompt.
- Prompt the foundation model using the constructed zero-shot prompt.
- Return the model’s output to the user.
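The six steps above can be sketched as a single function. In this minimal illustration the retriever and the foundation-model call are stubbed out; in the real pipeline they are the FAISS semantic search and a SageMaker JumpStart endpoint (Flan-T5 or Falcon), orchestrated through LangChain.

```python
# Template from step 3, with the retrieved chunks filling {context}
# and the user question filling {question}.
PROMPT_TEMPLATE = "Answer based on context:\n\n{context}\n\n{question}"

def retrieve_top_k(question: str, k: int = 3) -> list[str]:
    # Stub for the FAISS semantic search (step 2); returns hypothetical
    # document chunks for illustration.
    chunks = [
        "Drinking fountains shall be provided per the plumbing specification.",
        "Water coolers require a dedicated supply line.",
    ]
    return chunks[:k]

def invoke_model(prompt: str) -> str:
    # Stub for the zero-shot call to the foundation model (step 5);
    # a real deployment would invoke a SageMaker endpoint here.
    return f"[model response to a {len(prompt)}-character prompt]"

def answer(question: str) -> str:
    chunks = retrieve_top_k(question)                                   # step 2
    context = "\n".join(chunks)                                         # step 4
    prompt = PROMPT_TEMPLATE.format(context=context, question=question) # steps 3-4
    return invoke_model(prompt)                                         # steps 5-6

print(answer("What are the plumbing requirements for drinking fountains?"))
```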
Through testing 125 questions related to Amazon D&C requirements and specifications, RAG successfully produced accurate responses for various queries. For instance, when asked about the plumbing requirements for drinking fountains and water coolers, RAG with the Flan-T5-XXL model delivered a precise answer derived from the relevant document sections.