Developing and Implementing Custom Connectors for Amazon Redshift with Amazon Lookout for Metrics

Amazon Lookout for Metrics identifies anomalies in your time series data, uncovers their underlying causes, and allows for swift action. Built on the same technology that powers Amazon.com, Lookout for Metrics embodies two decades of expertise in outlier detection and machine learning (ML). For further insights on data considerations when setting up an anomaly detector, check out our related blog post here.

In this article, we will explore the process of building and deploying custom connectors for Amazon Redshift utilizing Lookout for Metrics.

Understanding Time Series Data

Time series data is essential for tracking and analyzing values as they fluctuate over time. A straightforward example is monitoring stock prices over a set duration, or keeping track of daily customer visits to a garage. Such metrics allow for the identification of trends and patterns, aiding in strategic decision-making. Lookout for Metrics lets you organize this data into a tabular format, similar to a spreadsheet or database table, providing historical values to learn from and continuous values to monitor.
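
To make the tabular shape concrete, here is a minimal sketch using pandas; the column names (timestamp, platform, marketplace, revenue) are illustrative stand-ins, not a required schema:

```python
import pandas as pd

# A tiny, hypothetical time series: one measure (revenue) tracked over time,
# with dimension columns (platform, marketplace) that slice the metric.
df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2021-01-01 00:00", "2021-01-01 01:00", "2021-01-01 02:00"]
        ),
        "platform": ["pc_web", "pc_web", "mobile_app"],
        "marketplace": ["JP", "US", "JP"],
        "revenue": [1250.0, 980.5, 1432.25],
    }
)
print(df)
```

Lookout for Metrics treats the timestamp column as the time axis, the numeric column as a measure to monitor, and the remaining columns as dimensions that split the measure into individual time series.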

Integrating Your Data with Lookout for Metrics

Since its inception, Lookout for Metrics has supported ingesting data from several AWS services, including:

  • Amazon CloudWatch
  • Amazon Redshift
  • Amazon Relational Database Service (Amazon RDS)
  • Amazon Simple Storage Service (Amazon S3)

Additionally, it accommodates external data sources like Salesforce, Marketo, Dynatrace, ServiceNow, Google Analytics, and Amplitude through Amazon AppFlow. These connectors are designed for the continuous delivery of new data to Lookout for Metrics, which is essential for constructing a model for anomaly detection.

Native connectors are a quick starting point for integrating with CloudWatch, Amazon S3, and external services via Amazon AppFlow. They work especially well for relational database management system (RDBMS) data when it is stored in a single table, or when you can create a procedure to populate and maintain that table.

Scenarios for Utilizing a Custom Connector

If you require additional flexibility, Lookout for Metrics custom connectors are the way to go. This option is particularly useful when your data necessitates an extract, transform, and load (ETL) process, such as merging multiple tables, converting values into a composite form, or executing complex post-processing before delivering data to Lookout for Metrics. If you are starting with RDBMS data and wish to provide historical samples for Lookout for Metrics to learn from, a custom connector is recommended. This approach allows for a substantial volume of historical data to be fed in first, bypassing cold start requirements and yielding a more refined model sooner.

In this article, we will employ Amazon Redshift as our RDBMS, but this methodology can be adapted for other systems. Use custom connectors in the following instances:

  • When your data is distributed across multiple tables.
  • If you need to execute more intricate transformations or calculations to fit the detector’s configuration.
  • To leverage all historical data for training your detector.

Conversely, you can opt for built-in connectors if:

  • Your data is confined to a single table containing the information relevant to your anomaly detector.
  • You are comfortable forgoing your historical data and waiting for the cold start period to pass before anomaly detection begins.

Solution Overview

All content discussed in this article is accessible in our GitHub repository. For this guide, we will assume your data resides in Amazon Redshift across several tables, and you wish to connect it with Lookout for Metrics for anomaly detection.

The following diagram outlines our solution architecture.

At a high level, we start with an AWS CloudFormation template that deploys the following components:

  • An Amazon SageMaker notebook instance to implement the custom connector solution.
  • An AWS Step Functions workflow, where the initial step conducts a historical crawl of your data, followed by a step that configures your detector (the trained model and endpoint for Lookout for Metrics); a sketch of this configuration step follows the list.
  • An S3 bucket to store all your AWS Lambda functions (not shown in the architecture diagram).
  • Another S3 bucket to store both historical and continuous data.
  • A CloudFormation template and Lambda function to schedule data crawling.
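
For orientation, the following is a minimal boto3 sketch of what the detector-configuration step amounts to. It is not the repository's exact code; the detector name, role ARN, bucket paths, and column names are all illustrative assumptions:

```python
import boto3

lookout = boto3.client("lookoutmetrics")

# Create the detector; the frequency controls how often Lookout for Metrics
# checks for new data (hourly here, purely illustrative).
detector = lookout.create_anomaly_detector(
    AnomalyDetectorName="ecommerce-detector",  # hypothetical name
    AnomalyDetectorConfig={"AnomalyDetectorFrequency": "PT1H"},
)

# Attach a metric set describing where the data lives and which columns are
# the measure, the dimensions, and the timestamp.
lookout.create_metric_set(
    AnomalyDetectorArn=detector["AnomalyDetectorArn"],
    MetricSetName="ecommerce-metrics",
    MetricSetFrequency="PT1H",
    MetricList=[{"MetricName": "revenue", "AggregationFunction": "SUM"}],
    DimensionList=["platform", "marketplace"],
    TimestampColumn={
        "ColumnName": "timestamp",
        "ColumnFormat": "yyyy-MM-dd HH:mm:ss",
    },
    MetricSource={
        "S3SourceConfig": {
            "RoleArn": "arn:aws:iam::123456789012:role/LookoutMetricsRole",
            # Historical data lets the detector skip the cold start period.
            "HistoricalDataPathList": ["s3://your-bucket/ecommerce/backtest/"],
            # Templated path where the continuous crawler drops new intervals.
            "TemplatedPathList": [
                "s3://your-bucket/ecommerce/live/{{yyyyMMdd}}/{{HHmm}}/"
            ],
            "FileFormatDescriptor": {
                "CsvFormatDescriptor": {"ContainsHeader": True, "Delimiter": ","}
            },
        }
    },
)

# Activate the detector so it trains on the historical data and then begins
# continuous monitoring.
lookout.activate_anomaly_detector(
    AnomalyDetectorArn=detector["AnomalyDetectorArn"]
)
```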

To tailor this solution for your environment, update the following (illustrative sketches follow the list):

  • A JSON configuration template defining how your data should appear to Lookout for Metrics and the name of your AWS Secrets Manager location for authentication credentials.
  • A SQL query to retrieve your historical data.
  • A SQL query to retrieve your continuous data.
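
As a rough sketch of those three pieces (field names and queries here are illustrative, not the repository's exact schema), they might look like the following:

```python
# Hypothetical stand-ins for the three customization points; consult the
# GitHub repository for the exact file names and configuration schema.
params = {
    "timestamp_column": "timestamp",
    "dimension_columns": ["platform", "marketplace"],
    "measure_columns": ["revenue"],
    # Name of the Secrets Manager entry holding the Redshift credentials.
    "secret_name": "redshift-lookoutmetrics-credentials",
}

# Historical query: pull everything available so the detector can train
# without waiting out the cold start period.
historical_query = """
    SELECT ts AS timestamp, platform, marketplace, revenue
    FROM ecommerce_data
"""

# Continuous query: pull only the most recent interval on each crawl
# (Amazon Redshift SQL shown; adjust the window to your detector frequency).
continuous_query = """
    SELECT ts AS timestamp, platform, marketplace, revenue
    FROM ecommerce_data
    WHERE ts >= DATEADD(hour, -1, GETDATE())
"""
```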

Upon modifying these components, you can deploy the template and be operational within an hour.

Deploying the Solution

To provide a comprehensive exploration of this solution, we offer a CloudFormation template that sets up a production-like Amazon Redshift cluster, preloaded with sample data for testing with Lookout for Metrics. This dataset simulates an e-commerce scenario and extends roughly two years into the future from the date of this publication.

Creating Your Amazon Redshift Cluster

Deploy the provided template to establish the following resources within your account:

  • An Amazon Redshift cluster within a VPC.
  • Secrets Manager for authentication.
  • A SageMaker notebook instance to execute all setup processes for the Amazon Redshift database and initial data loading.
  • An S3 bucket designated for data loading into Amazon Redshift.

The following diagram illustrates the interaction of these components.

The template stores your database credentials in Secrets Manager, and the SageMaker notebook's lifecycle configuration reads them on startup. Once the notebook boots, automation creates the tables in your Amazon Redshift cluster and loads data from Amazon S3 for use with our custom connector.
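
A condensed sketch of that startup automation follows, assuming the redshift_connector Python driver; the secret name, table definition, bucket path, and IAM role are placeholders:

```python
import json

import boto3
import redshift_connector

# Read the cluster credentials that the CloudFormation template stored in
# Secrets Manager (the secret name here is a placeholder).
secret = boto3.client("secretsmanager").get_secret_value(
    SecretId="redshift-lookoutmetrics-credentials"
)
creds = json.loads(secret["SecretString"])

conn = redshift_connector.connect(
    host=creds["host"],
    database=creds["dbname"],
    user=creds["username"],
    password=creds["password"],
)
cur = conn.cursor()

# Create one of the tables and bulk-load the sample data staged in S3.
cur.execute("CREATE TABLE IF NOT EXISTS platform (id INT, name VARCHAR(64))")
cur.execute(
    """
    COPY platform
    FROM 's3://your-bucket/sample-data/platform.csv'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
    CSV IGNOREHEADER 1
    """
)
conn.commit()
```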

To deploy these resources, follow these steps:

  1. Choose Launch Stack.
  2. Choose Next.
  3. Keep the stack details at their defaults and choose Next again.
  4. Leave the stack options at their defaults and choose Next once more.
  5. Confirm that you acknowledge AWS CloudFormation may create IAM resources, then choose Create stack.

The deployment process will take a few minutes. Monitor its progress through the AWS CloudFormation console.

When the status indicates CREATE_COMPLETE, you are set to deploy the rest of the solution.
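
If you would rather poll programmatically than watch the console, a quick boto3 check looks like this (replace the stack name with whatever you chose at launch):

```python
import boto3

cfn = boto3.client("cloudformation")

# Hypothetical stack name; use the one you entered when launching the template.
stack = cfn.describe_stacks(StackName="redshift-lookoutmetrics-demo")["Stacks"][0]
print(stack["StackStatus"])  # proceed once this prints CREATE_COMPLETE
```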

Data Structure

We have split our standard e-commerce dataset into three tables so the custom connector can join them back together later. In a real-world scenario, your data may be similarly normalized across multiple tables.

The first table captures the user’s platform, indicating the type of device users utilize, such as a phone or web browser.

ID Name
1 pc_web

The subsequent table illustrates our marketplace, indicating user locations.

ID Name
1 JP
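
To make the connector's ETL concrete, here is a hedged sketch of the kind of join that flattens these dimension tables back into the single-table shape Lookout for Metrics expects. The fact table name (ecommerce_data) and its foreign-key columns are assumptions for illustration:

```python
# Hypothetical join used by the custom connector's historical and continuous
# queries; replace table and column names with your own schema.
flatten_query = """
    SELECT d.ts AS timestamp,
           p.name AS platform,
           m.name AS marketplace,
           d.revenue
    FROM ecommerce_data d
    JOIN platform p ON p.id = d.platform_id
    JOIN marketplace m ON m.id = d.marketplace_id
"""
```

With the data flattened this way, the detector configuration described earlier maps directly onto the query's output columns: one timestamp, two dimensions, and one measure.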
