Improving the quality of responses to user inquiries is crucial for AI applications, particularly those focused on enhancing user satisfaction. For instance, a human resources chat assistant must adhere to company policies and maintain a consistent tone, and deviations can be corrected using user feedback. This article illustrates how Amazon Bedrock, combined with a user feedback dataset and few-shot prompting techniques, can improve responses and increase user satisfaction. By utilizing Amazon Titan Text Embeddings v2, we achieve statistically significant improvements in response quality, making this approach valuable for applications that require accurate and tailored answers.
Recent research has underscored the importance of feedback and prompting in refining AI responses. The article “Prompt Optimization with Human Feedback” outlines a systematic approach to learning from user feedback to iteratively refine models for better alignment and reliability. In a similar vein, “Black-Box Prompt Optimization: Aligning Large Language Models without Model Training” shows how retrieval-augmented chain-of-thought prompting can enhance few-shot learning by integrating relevant context, thus improving reasoning and response quality. Building on these concepts, our work leverages the Amazon Titan Text Embeddings v2 model to optimize responses based on user feedback and few-shot prompting, leading to measurable improvements in user satisfaction. Notably, Amazon Bedrock offers an automatic prompt optimization feature that adjusts and optimizes prompts without requiring additional input from users. In this post, we will demonstrate how to use open-source software libraries for more tailored optimization based on user feedback and few-shot prompting.
We have devised a practical solution using Amazon Bedrock that automatically improves chat assistant responses based on user feedback. The solution uses embeddings and few-shot prompting techniques. To validate its effectiveness, we used a publicly available user feedback dataset; in an enterprise setting, you can use your organization's own user-generated feedback data instead. In our tests, we observed a 3.67% increase in user satisfaction scores. The essential steps are:
- Retrieve a publicly available user feedback dataset (for this example, the Unified Feedback Dataset on Hugging Face).
- Generate embeddings for the queries with Amazon Titan Text Embeddings v2 so that semantically similar examples can be retrieved.
- Use similar queries as examples in a few-shot prompt to create optimized prompts.
- Compare these optimized prompts against direct large language model (LLM) calls.
- Validate the enhancement in response quality using a paired sample t-test.
The following diagram provides an overview of the system.
The key advantages of using Amazon Bedrock include:
- Zero infrastructure management – Deploy and scale without dealing with complex machine learning (ML) infrastructure.
- Cost-effective – Pay solely for what you use with the Amazon Bedrock pay-as-you-go pricing model.
- Enterprise-grade security – Utilize AWS’s built-in security and compliance features.
- Straightforward integration – Seamlessly integrate existing applications and open-source tools.
- Multiple model options – Access a variety of foundation models (FMs) tailored for different use cases.
The subsequent sections delve deeper into these steps, offering code snippets from the notebook to illustrate the process.
Prerequisites
To implement this solution, you need an AWS account with access to Amazon Bedrock, Python 3.8 or higher, and configured AWS credentials.
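Before moving on, you can verify that your environment can reach Amazon Bedrock. The following is a minimal sketch; the package list, Region, and model check mirror what is used later in this post and may need adjusting for your setup.

# Assumed package set for this walkthrough (install once):
# pip install boto3 datasets pandas numpy scikit-learn scipy langchain-aws
import boto3
# Confirm that your credentials can reach Bedrock in your chosen Region
bedrock = boto3.client("bedrock", region_name="us-east-1")
models = bedrock.list_foundation_models()["modelSummaries"]
print(f"Foundation models available: {len(models)}")
# Check that Titan Text Embeddings v2 is among them
print(any(m["modelId"] == "amazon.titan-embed-text-v2:0" for m in models))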
Data Collection
We downloaded a user feedback dataset from Hugging Face, specifically llm-blender/Unified-Feedback. This dataset includes fields such as conv_A_user (the user query) and conv_A_rating (a binary rating; 0 indicates dislike, and 1 indicates like). The following code retrieves the dataset while focusing on the fields needed for embedding generation and feedback analysis. It can be run in an Amazon SageMaker notebook or any Jupyter notebook with access to Amazon Bedrock.
from datasets import load_dataset
import pandas as pd
# Load the dataset and specify the subset
dataset = load_dataset("llm-blender/Unified-Feedback", "synthetic-instruct-gptj-pairwise")
# Access the 'train' split
train_dataset = dataset["train"]
# Convert the dataset to a Pandas DataFrame
df = train_dataset.to_pandas()
# Flatten the nested conv_A conversation structure into user and assistant columns
df['conv_A_user'] = df['conv_A'].apply(lambda x: x[0]['content'] if len(x) > 0 else None)
df['conv_A_assistant'] = df['conv_A'].apply(lambda x: x[1]['content'] if len(x) > 1 else None)
# Drop the original nested columns now that they are no longer needed
df = df.drop(columns=['conv_A', 'conv_B'])
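As a quick sanity check (not shown in the original walkthrough), you can inspect the flattened columns and the balance of like and dislike ratings before generating embeddings:

# Preview the flattened user queries, assistant responses, and ratings
print(df[['conv_A_user', 'conv_A_assistant', 'conv_A_rating']].head())
# Check how many liked (1) versus disliked (0) responses the dataset contains
print(df['conv_A_rating'].value_counts())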
Data Sampling and Embedding Generation
To effectively manage the process, we sampled 6,000 queries from the dataset and used Amazon Titan Text Embeddings v2 to create embeddings, transforming text into high-dimensional representations for similarity comparisons. Here’s the code for this step:
import boto3
from langchain_aws import BedrockEmbeddings
# Take a reproducible sample of 6,000 queries
df = df.sample(n=6000, random_state=42).reset_index(drop=True)
# AWS credentials and Region
session = boto3.Session()
region = 'us-east-1'
# Initialize the Amazon Bedrock runtime client and the Titan Text Embeddings v2 model
boto3_bedrock = boto3.client('bedrock-runtime', region_name=region)
titan_embed_v2 = BedrockEmbeddings(client=boto3_bedrock, model_id="amazon.titan-embed-text-v2:0")
# Function to convert text to an embedding vector
def get_embeddings(text):
    response = titan_embed_v2.embed_query(text)
    return response  # Returns the embedding vector
# Apply the function to the user query column and store the result in a new column
df['conv_A_user_vec'] = df['conv_A_user'].apply(get_embeddings)
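Because generating 6,000 embeddings takes time and incurs model invocations, you may want to cache the results. The following is a minimal sketch; the file name is illustrative.

# Persist the DataFrame, including the embedding column, so later sessions
# can reload it without calling the embedding model again
df.to_pickle("unified_feedback_with_embeddings.pkl")
# In a later session: df = pd.read_pickle("unified_feedback_with_embeddings.pkl")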
Few-Shot Prompting with Similarity Search
The following steps outline this part of the process:
- Sample 100 queries from the dataset for testing. This sample size allows us to run multiple trials to validate our solution.
- Calculate cosine similarity (a measure of similarity between two non-zero vectors) between the embeddings of these test queries and the stored 6,000 embeddings.
- Select the top k queries most similar to each test query to serve as few-shot examples. We set k = 10 to balance computational efficiency and diversity of examples.
Here’s the code that performs these operations:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
# Cosine similarity between two embedding vectors
def compute_cosine_similarity(embedding1, embedding2):
    embedding1 = np.array(embedding1).reshape(1, -1)  # Reshape to a 2D array
    embedding2 = np.array(embedding2).reshape(1, -1)  # Reshape to a 2D array
    return cosine_similarity(embedding1, embedding2)[0][0]
# Retrieve the top matching conversations for a query
def get_matched_convo(query, df):
    # Embed the incoming query with Titan Text Embeddings v2
    query_embedding = get_embeddings(query)
    # Compute similarity between the query and each stored embedding
    df['similarity'] = df['conv_A_user_vec'].apply(lambda x: compute_cosine_similarity(query_embedding, x))
    # Sort rows by similarity score in descending order
    df_sorted = df.sort_values(by='similarity', ascending=False)
    # Keep the top 10 matches as few-shot examples
    top_matches = df_sorted.head(10)
    # Return the matched queries, responses, ratings, and similarity scores
    return top_matches[['conv_A_user', 'conv_A_assistant', 'conv_A_rating', 'similarity']]
This code provides few-shot context for each test query by using cosine similarity to retrieve the closest matches. These example queries and their feedback serve as additional context to guide prompt optimization.
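This section does not show the prompt assembly itself, so the following is a hedged sketch of how the retrieved matches could be folded into a few-shot prompt and sent to a Bedrock model through the Converse API. The prompt wording, helper names, and model ID are illustrative assumptions, not the exact implementation from the notebook.

def build_few_shot_prompt(query, examples):
    # Turn the top-matching conversations into labeled few-shot examples
    blocks = []
    for _, row in examples.iterrows():
        feedback = "liked" if row['conv_A_rating'] == 1 else "disliked"
        blocks.append(
            f"User: {row['conv_A_user']}\n"
            f"Assistant: {row['conv_A_assistant']}\n"
            f"Feedback: the user {feedback} this answer"
        )
    context = "\n\n".join(blocks)
    return ("Use the following past conversations and their user feedback as guidance.\n\n"
            f"{context}\n\nNow answer the new user query:\n{query}")

def answer_with_few_shot(query, df):
    # Retrieve the 10 most similar past conversations as few-shot examples
    examples = get_matched_convo(query, df)
    prompt = build_few_shot_prompt(query, examples)
    # Model ID is illustrative; use any Bedrock text model you have access to
    response = boto3_bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]

# Example usage
# print(answer_with_few_shot("How do I request parental leave?", df))

Comparing the answers from this few-shot path with answers from a direct model call on the same queries yields the two sets of responses whose quality is then scored and compared.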
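Finally, the last step of the overview validates the improvement with a paired sample t-test. The snippet below is a minimal sketch, assuming you have collected per-query satisfaction scores for the same test queries under both the baseline and the few-shot optimized prompts; the score values shown are placeholders.

from scipy import stats
# Placeholder per-query satisfaction scores for the same test queries,
# rated under the baseline prompt and the few-shot optimized prompt
baseline_scores = [0.72, 0.65, 0.80, 0.55, 0.70]
optimized_scores = [0.78, 0.69, 0.82, 0.60, 0.71]
# Paired t-test: each query is scored under both prompting strategies
t_stat, p_value = stats.ttest_rel(optimized_scores, baseline_scores)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
# A p-value below 0.05 indicates the improvement is statistically significant

Because every query is evaluated under both strategies, the paired test controls for per-query difficulty, which is why it is preferred here over an unpaired comparison.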