APIs (application programming interfaces) define how different applications or platforms interact with one another, and REST (REpresentational State Transfer) is a common architectural style for designing them. In OLTP (online transaction processing) workloads, APIs are invoked frequently and typically exchange small payloads, whereas OLAP (online analytical processing) workloads involve fewer calls with much larger payloads, sometimes ranging from 100 MB to several GBs. This difference introduces new challenges, such as the need for asynchronous processing and careful management of compute resources.
In this article, we guide you through creating an application API using the Amazon Redshift Data API, AWS Lambda, and Amazon API Gateway. The API handles asynchronous processing of user requests, sends notifications, stores processed data in Amazon Simple Storage Service (Amazon S3), and provides a presigned URL so users or applications can download datasets securely over HTTPS. We also provide an AWS CloudFormation template to help set up the resources, available in our GitHub repository.
Overview of the Solution
In our scenario, Amazon IXD – VGT2 operates a flower-selling website, acmeflowers.com, where they gather customer reviews. The site features a self-service inventory system that allows various suppliers to send flowers and related materials when their stock is low.
Amazon IXD – VGT2 employs Amazon Redshift as their data warehouse. Real-time updates and alterations to their inventory are streamed into Amazon Redshift, ensuring an accurate reflection of stock availability. The PRODUCT_INVENTORY table contains this updated information. The goal is to provide partners with access to inventory data in a secure and cost-effective manner. If partners utilize Amazon Redshift, cross-account data sharing could be a viable solution; otherwise, the method outlined here can be employed.
The following diagram presents our solution architecture:
The workflow consists of these steps:
- The client application submits a request to API Gateway and receives a request ID in return.
- API Gateway invokes the request receiver Lambda function.
- The request receiver function performs the following actions (see the sketch after this list):
  - Updates the status in an Amazon DynamoDB control table.
  - Sends the request to an Amazon Simple Queue Service (Amazon SQS) queue.
- A second Lambda function, the request processor, performs these actions:
  - Polls Amazon SQS.
  - Updates the status in the DynamoDB table.
  - Runs a SQL query on Amazon Redshift.
- Amazon Redshift exports the results to an S3 bucket.
- A third Lambda function, the poller, verifies the results’ status in the DynamoDB table.
- The poller function retrieves the results from Amazon S3.
- The poller sends the requestor a presigned URL, via Amazon Simple Email Service (Amazon SES), to download the file.
- The requestor retrieves the file using the provided URL.
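To make Steps 2-4 concrete, the following is a minimal sketch, in Python with boto3, of what the request receiver function might look like. The table name, queue URL, and payload shape are illustrative assumptions, not the sample application's actual resource names:

import json
import os
import uuid

import boto3

dynamodb = boto3.resource("dynamodb")
sqs = boto3.client("sqs")

# Assumed resource names for illustration; the sample's real names differ.
TABLE_NAME = os.environ.get("CONTROL_TABLE", "request-control-table")
QUEUE_URL = os.environ.get("REQUEST_QUEUE_URL")

def lambda_handler(event, context):
    request_id = str(uuid.uuid4())
    body = json.loads(event.get("body") or "{}")

    # Record the new request in the DynamoDB control table.
    dynamodb.Table(TABLE_NAME).put_item(
        Item={"request_id": request_id, "status": "SUBMITTED", "sku": body.get("sku")}
    )

    # Queue the request for the request processor function.
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"request_id": request_id, "sku": body.get("sku")}),
    )

    # Return the request ID so the caller can poll for status later.
    return {"statusCode": 202, "body": json.dumps({"request_id": request_id})}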
The workflow also includes steps for checking the request status at various stages:
- The client application or user sends the request ID generated in Step 1 to API Gateway.
- API Gateway calls the status check Lambda function.
- The function reads the status from the DynamoDB control table (see the sketch following this list).
- The status is returned to the requestor through API Gateway.
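The status check function is essentially a thin read path over the same control table. A minimal sketch, under the same assumed table name and key schema as above:

import json
import os

import boto3

dynamodb = boto3.resource("dynamodb")

# Assumed table name for illustration.
TABLE_NAME = os.environ.get("CONTROL_TABLE", "request-control-table")

def lambda_handler(event, context):
    # API Gateway is assumed to pass the request ID as a query string parameter.
    request_id = (event.get("queryStringParameters") or {}).get("request_id")
    if not request_id:
        return {"statusCode": 400, "body": json.dumps({"error": "request_id is required"})}

    item = dynamodb.Table(TABLE_NAME).get_item(Key={"request_id": request_id}).get("Item")
    if not item:
        return {"statusCode": 404, "body": json.dumps({"error": "unknown request_id"})}

    return {"statusCode": 200, "body": json.dumps({"request_id": request_id, "status": item["status"]})}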
Prerequisites
To deploy this example application, ensure you have the following:
- An AWS account
- The AWS SAM CLI
- Python 3.9
- Node 17.3
- An AWS Identity and Access Management (IAM) role with appropriate permissions
- An Amazon Redshift cluster with a database and table
Before deploying the sample application, complete these prerequisite steps:
- Run the following SQL on the Amazon Redshift cluster using the query editor to create the schema and table and load sample data:

create schema rsdataapi;

create table rsdataapi.product_detail(
  sku varchar(20),
  product_id int,
  product_name varchar(50),
  product_description varchar(50)
);

insert into rsdataapi.product_detail values ('FLOWER12',12345,'Flowers - Rose','Flowers-Rose');
insert into rsdataapi.product_detail values ('FLOWER13',12346,'Flowers - Jasmine','Flowers-Jasmine');
insert into rsdataapi.product_detail values ('FLOWER14',12347,'Flowers - Other','Flowers-Other');
- Configure AWS Secrets Manager to securely store Amazon Redshift credentials.
- Set up Amazon SES with an email address or distribution list for sending and receiving status updates.
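With the prerequisites in place, you can optionally confirm that the table is reachable through the Redshift Data API, the same asynchronous interface the request processor function uses. A minimal sketch with boto3; the cluster identifier, database name, and secret ARN are placeholders for your own values:

import time

import boto3

client = boto3.client("redshift-data")

# Placeholders: substitute your cluster, database, and Secrets Manager ARN.
resp = client.execute_statement(
    ClusterIdentifier="my-redshift-cluster",
    Database="dev",
    SecretArn="arn:aws:secretsmanager:us-east-1:123456789012:secret:my-redshift-secret",
    Sql="select count(*) from rsdataapi.product_detail;",
)

# The Data API is asynchronous: execute_statement returns immediately,
# so poll describe_statement until the query reaches a terminal state.
while True:
    desc = client.describe_statement(Id=resp["Id"])
    if desc["Status"] in ("FINISHED", "FAILED", "ABORTED"):
        break
    time.sleep(1)

if desc["Status"] == "FINISHED":
    result = client.get_statement_result(Id=resp["Id"])
    print("Row count:", result["Records"][0][0]["longValue"])
else:
    print("Statement did not finish:", desc.get("Error"))

This execute-then-poll pattern is the same one the request processor relies on, which is why the workflow tracks status in DynamoDB rather than holding an API Gateway connection open for large, long-running queries.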
Deploying the Application
To deploy the application, follow these steps:
- Clone the repository and download the sample source code to your environment where AWS SAM is installed:

git clone https://github.com/aws-samples/redshift-application-api

- Change into the project directory containing the template.yaml file, and make sure Python 3.9 is on your PATH (the path below assumes a Homebrew install on macOS):

cd redshift-application-api/assets
export PATH=$PATH:/usr/local/opt/python@3.9/bin

- Modify the API .yaml file to incorporate your AWS account number and the Region for deployment (the sed syntax below is for macOS/BSD):

sed -i '' "s/<input_region>/us-east-1/g" *API.yaml
sed -i '' "s/<input_accountid>/<provide your AWS account id without dashes>/g" *API.yaml

- Use AWS SAM to build the application:

sam build

- Deploy the application to your account using AWS SAM. Ensure you follow Amazon S3 naming conventions, providing globally unique names for S3 buckets:

sam deploy -g
During SAM deployment, you’ll need to provide the following configuration parameters:
- RSClusterID: The identifier for your existing Amazon Redshift cluster.
- RSDataFetchQ: The query for fetching data from your Amazon Redshift tables (for example, select * from rsdataapi.product_detail where sku = <input passed from the API>).
- RSDataFileS3BucketName: The S3 bucket to which datasets are unloaded from Amazon Redshift.
- RSDatabaseName: The database name on your Amazon Redshift cluster.
- RSS3CopyRoleArn: The IAM role for Amazon Redshift that permits file transfers between Amazon Redshift and Amazon S3. This role must be associated with your Amazon Redshift cluster.
- RSSecret: The ARN for your Amazon Redshift credentials stored in Secrets Manager.
- RSUser: The username for connecting to the Amazon Redshift cluster.
- RsFileArchiveBucket: The S3 bucket for downloading the zipped dataset, distinct from your upload bucket.
- RsS3CodeRepo: The S3 bucket containing the packaged or .zip files.
- RsSingedURLExpTime: The expiry duration in seconds for the presigned URL to download the dataset from Amazon S3.
- RsSourceEmailAddress: The configured email address for sending completion notifications via Amazon SES.
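The RsSingedURLExpTime parameter feeds the presigned URL that the poller function emails out. As a rough illustration of that final step, here is a hedged sketch; the bucket, key, and email addresses are placeholders:

import boto3

s3 = boto3.client("s3")
ses = boto3.client("ses")

# Presign a time-limited GET for the zipped dataset; ExpiresIn plays the
# role of the RsSingedURLExpTime parameter (placeholder bucket and key).
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-archive-bucket", "Key": "results/request-id.zip"},
    ExpiresIn=3600,
)

# Email the link through Amazon SES (placeholder addresses; the source
# address corresponds to RsSourceEmailAddress and must be verified in SES).
ses.send_email(
    Source="notifications@example.com",
    Destination={"ToAddresses": ["requestor@example.com"]},
    Message={
        "Subject": {"Data": "Your dataset is ready"},
        "Body": {"Text": {"Data": "Download your file before the link expires: " + url}},
    },
)

When the requestor follows the link before it expires, the download completes over HTTPS without any AWS credentials, which is what makes presigned URLs a good fit for sharing datasets with external partners.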