Amazon VGT2 Las Vegas: Streamlining Analytics with Amazon Athena and Teradata for Enhanced Query Federation


The Teradata connector for Amazon Athena lets you run SQL queries against data stored in Teradata Vantage. The connector consists of two AWS Lambda functions: one handles metadata and the other reads records. Federated queries run SQL across diverse data sources, and Amazon Athena uses data source connectors to make these queries possible. When you submit a query against a data source, Athena invokes the corresponding connector to determine which parts of the tables must be read, manage parallelism, and push down filter predicates.
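To make the division of labor concrete, here is a toy, in-memory model of the flow just described. A "metadata" step plans splits, and a "record" step reads each split in parallel and applies the filter predicate before returning rows. This is not the Athena Query Federation SDK. The names `Split`, `plan_splits`, `read_split`, and `federated_scan` are hypothetical and exist only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, int]


@dataclass(frozen=True)
class Split:
    """One independently readable slice of a table; here, a half-open row range."""
    start: int
    end: int


def plan_splits(row_count: int, split_size: int) -> List[Split]:
    """Metadata role (toy): decide which segments of the table must be read."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    return [Split(i, min(i + split_size, row_count)) for i in range(0, row_count, split_size)]


def read_split(table: List[Row], split: Split, predicate: Callable[[Row], bool]) -> List[Row]:
    """Record role (toy): read one split and filter at the source,
    so only matching rows are sent back to the query engine."""
    return [row for row in table[split.start:split.end] if predicate(row)]


def federated_scan(table: List[Row], predicate: Callable[[Row], bool],
                   split_size: int = 2, workers: int = 4) -> List[Row]:
    """Plan splits, read them in parallel, and concatenate results in split order."""
    splits = plan_splits(len(table), split_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda s: read_split(table, s, predicate), splits))
    return [row for part in parts for row in part]
```

In the real service, Athena fans splits out across Lambda invocations rather than local threads. How a connector derives its splits depends on the source, for example its partitioning.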

This article outlines the steps to query data in Teradata Vantage from Amazon Athena using the Athena connector for Teradata. It also shows how to run a federated query in Athena that joins data in Teradata Vantage with data in Amazon Simple Storage Service (Amazon S3). Teradata is an AWS Specialization Partner and AWS Marketplace Seller that holds the AWS Data and Analytics Competency. It provides cloud-first enterprise analytics solutions designed to deliver customer insights at scale.

Customer Challenge

A specialized analytics engine like Teradata often runs alongside other data stores as part of a broader data lake architecture. A single query engine such as Amazon Athena can securely query tables across multiple data stores through federation. This creates a scalable, decoupled environment and reduces the need to copy data between systems.

With Athena, analysts and machine learning (ML) engineers can access and combine data from sources that cover different time frames. For example, they can merge historical analytics data with real-time transactional data and use the result to train models that meet business needs.

About Amazon Athena

Amazon Athena is a serverless interactive query service that simplifies the analysis of data stored in Amazon S3 using standard SQL. It accommodates various data formats, including CSV, JSON, ORC, Avro, and Parquet. Athena is suitable for quick, ad-hoc querying and can also manage intricate analyses without the necessity for an extract, transform, load (ETL) process. If your data resides outside of S3, you can utilize Athena Federated Query to execute SQL queries across relational, non-relational, and custom data sources.
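The following is a minimal sketch of a federated join as described above. It combines a Teradata table exposed through the connector with an S3-backed table in the default `AwsDataCatalog`, and builds the request for Athena's `StartQueryExecution` API. All names are hypothetical: the catalog `tdvantage` assumes the data source created later in this article, and the tables `retail.customers` and `sales_lake.orders`, the column names, and the results bucket are placeholders. Check how the catalog name appears in your Athena console before running anything like this.

```python
# Hypothetical names throughout: "tdvantage" is assumed to be the Athena data source
# (catalog) for the Teradata connector; retail.customers lives in Teradata, and
# sales_lake.orders is an S3-backed table registered in the default AwsDataCatalog.
FEDERATED_JOIN_SQL = """
SELECT c.customer_id,
       c.customer_name,
       SUM(o.order_total) AS lifetime_value
FROM "tdvantage"."retail"."customers" AS c
JOIN "awsdatacatalog"."sales_lake"."orders" AS o
  ON c.customer_id = o.customer_id
WHERE o.order_date >= DATE '2023-01-01'
GROUP BY c.customer_id, c.customer_name
""".strip()


def start_query_request(sql: str, output_location: str, workgroup: str = "primary") -> dict:
    """Build keyword arguments for boto3's Athena start_query_execution()."""
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an s3:// URI")
    return {
        "QueryString": sql,
        "WorkGroup": workgroup,
        "ResultConfiguration": {"OutputLocation": output_location},
    }
```

With AWS credentials configured, you could pass the result to `boto3.client("athena").start_query_execution(**request)`. Keeping the request builder separate from the network call makes the query easy to review and test offline.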

About Teradata Vantage

Teradata Vantage is a platform for pervasive data intelligence that delivers real-time insights to users and systems across the organization. It is designed to make use of all of a business's data, regardless of scale or complexity. Vantage combines descriptive, predictive, and prescriptive analytics, machine learning functions, and visualization tools in a single platform that surfaces business intelligence at scale, wherever the data resides.

Solution Architecture

VantageCloud Enterprise and VantageCloud Lake on AWS are fully managed services deployed in Teradata's AWS accounts. Customers deploy AWS services in their own accounts and connect them to the Teradata-managed Vantage accounts through supported connectivity options such as AWS Transit Gateway (TGW), AWS PrivateLink, or AWS Site-to-Site VPN.
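Whichever connectivity option you choose, the connector's Lambda function must be able to reach the Vantage endpoint over the network. A quick TCP reachability probe can help confirm this. Run it from the same VPC subnets the connector will use, for example from a temporary test function. This sketch assumes Teradata's default database port, 1025. The host name in the usage line is hypothetical.

```python
import socket

TERADATA_DEFAULT_PORT = 1025  # Teradata's standard database port; yours may differ


def can_reach(host: str, port: int = TERADATA_DEFAULT_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Only the network path is checked (routing, security groups, TGW/PrivateLink);
    no Teradata login or credentials are involved.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False
```

For example, `can_reach("vantage.example.internal")` would return `False` if a security group or route table is blocking the path. That is often easier to diagnose than a JDBC timeout inside the connector.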

Below is a diagram illustrating two connectivity methods: AWS Transit Gateway and AWS PrivateLink. For additional connectivity options, please refer to the Teradata documentation.

Prerequisites

To follow this guide, familiarity with AWS concepts, Amazon Athena, and Teradata Vantage is expected. The following accounts and systems are required:

  • An AWS account (a free account can be used to begin).
  • A Teradata Vantage environment.
  • An Amazon S3 bucket to function as a spill bucket for data exceeding AWS Lambda function response size limits. More information can be found in the Athena Connector documentation.
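The spill bucket exists because a Lambda response has a size cap: AWS Lambda's synchronous invocation payload quota is 6 MB. The toy sketch below shows the decision a connector makes. It returns a result inline when it fits and otherwise writes it to the spill location and returns only a pointer. `InMemorySpillBucket` and `build_response` are hypothetical stand-ins. The real connector writes to Amazon S3 and uses its own block format rather than JSON.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List

# AWS Lambda's synchronous invocation response payload quota (6 MB).
LAMBDA_RESPONSE_LIMIT_BYTES = 6 * 1024 * 1024


@dataclass
class InMemorySpillBucket:
    """Hypothetical stand-in for the S3 spill bucket, used for illustration only."""
    objects: Dict[str, bytes] = field(default_factory=dict)

    def put(self, key: str, body: bytes) -> None:
        self.objects[key] = body


def build_response(rows: List[Dict[str, Any]], bucket: InMemorySpillBucket, key: str,
                   limit: int = LAMBDA_RESPONSE_LIMIT_BYTES) -> Dict[str, Any]:
    """Return rows inline if they fit under `limit`; otherwise spill them and return a pointer."""
    payload = json.dumps(rows).encode("utf-8")
    if len(payload) <= limit:
        return {"rows": rows}
    bucket.put(key, payload)
    return {"spill_key": key, "size_bytes": len(payload)}
```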

Deploying the Teradata Connector Using Amazon Athena

Amazon Athena utilizes data source connectors running on AWS Lambda to perform federated queries. A data source connector translates the target data source into a format compatible with Athena. The Athena Teradata Connector is one of the pre-built connectors offered by AWS.

To use the Teradata connector with Athena, you must first create a Lambda layer that contains the Teradata JDBC driver:

  1. Download the latest Teradata JDBC driver.
  2. Extract the terajdbc4.jar file from the downloaded package.
  3. On your local system, create the folder structure java/lib and place the .jar file in it.
  4. Zip the entire folder structure containing the terajdbc4.jar file.
  5. In the AWS Lambda console, choose Layers, and then create a new layer.

For Name, enter a name for the layer (e.g., TeradataJDBCDriver). Select the option to upload a .zip file, upload the zipped folder containing the Teradata JDBC driver, and choose Create. On the layer's details page, copy the layer's Amazon Resource Name (ARN). You will need it later.
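The packaging steps above can be scripted. The sketch below places the driver under `java/lib/` inside the .zip. Lambda extracts layers to `/opt`, and for Java runtimes `/opt/java/lib` is on the class path. The function name `build_layer_zip` and the file names are illustrative. The demo uses a placeholder file, not the real driver.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import List


def build_layer_zip(jar_path: str, zip_path: str) -> List[str]:
    """Package a JDBC driver .jar as java/lib/<jar> for a Java Lambda layer.

    Returns the archive's entry names so the layout can be checked before upload.
    """
    jar = Path(jar_path)
    if not jar.is_file():
        raise FileNotFoundError(f"driver jar not found: {jar}")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(jar, arcname=f"java/lib/{jar.name}")
    with zipfile.ZipFile(zip_path) as zf:
        return zf.namelist()


if __name__ == "__main__":
    # Self-contained demo: a placeholder file stands in for the real terajdbc4.jar.
    with tempfile.TemporaryDirectory() as tmp:
        fake_jar = Path(tmp) / "terajdbc4.jar"
        fake_jar.write_bytes(b"placeholder")
        print(build_layer_zip(str(fake_jar), str(Path(tmp) / "teradata-jdbc-layer.zip")))
```

Instead of uploading through the console, you could publish the resulting archive with `aws lambda publish-layer-version --layer-name TeradataJDBCDriver --zip-file fileb://teradata-jdbc-layer.zip`. The command's output includes the `LayerVersionArn` to copy.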

Next, deploy the Teradata connector. You can do this from the Athena console or from the AWS Serverless Application Repository. This article uses the Athena console.

  1. Go to the Athena console.
  2. If the console navigation pane is hidden, click the expansion menu on the left.
  3. In the navigation pane, select Data sources under Administration.
  4. Click Create data source and type Teradata in the search bar.

Choose Teradata if it isn't already selected, and proceed to the next step. On the data source details page, enter a name (e.g., TDVantage) and choose Create Lambda function. This opens the AthenaTeradataConnector – version **** page in a new tab, which lists the connector's details and the parameters you need to configure.
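One of the parameters on that page is the connection string the connector uses to reach Vantage. At the time of writing, the connector documentation shows the form `teradata://` followed by a Teradata JDBC URL. A `${secret_name}` placeholder in that form is resolved from AWS Secrets Manager, which keeps credentials out of the string. Treat the format below as an assumption and verify it against the current connector documentation. The host, database, and secret names are hypothetical.

```python
def build_connection_string(host: str, database: str, secret_name: str,
                            tmode: str = "ANSI", charset: str = "UTF8") -> str:
    """Assemble a Teradata connector connection string (format assumed, see lead-in).

    The ${...} placeholder is meant to be replaced by the connector with the
    user name and password stored in the named Secrets Manager secret.
    """
    # Commas separate JDBC URL parameters, so they must not appear in these values.
    if "," in host or "," in database:
        raise ValueError("host and database must not contain commas")
    jdbc_url = f"jdbc:teradata://{host}/TMODE={tmode},CHARSET={charset},DATABASE={database}"
    return f"teradata://{jdbc_url},${{{secret_name}}}"
```

For example, `build_connection_string("vantage.example.internal", "sales", "Teradata/Vantage")` yields `teradata://jdbc:teradata://vantage.example.internal/TMODE=ANSI,CHARSET=UTF8,DATABASE=sales,${Teradata/Vantage}`.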

This comprehensive integration empowers organizations to maximize their data analytics capabilities, driving efficiency and enhancing decision-making processes.
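After the connector's Lambda function exists, you can also register the Athena data source programmatically instead of through the console. Athena's `CreateDataCatalog` API accepts `Type="LAMBDA"` with a `function` parameter for a single connector function, or with `metadata-function` and `record-function` parameters when metadata and record handling are split across two functions. The sketch below only builds and validates the request. The catalog name and ARN in the test are hypothetical.

```python
import re
from typing import Dict

# Unqualified Lambda function ARN, e.g. arn:aws:lambda:us-east-1:123456789012:function:name
_FUNCTION_ARN = re.compile(r"^arn:aws[a-z-]*:lambda:[a-z0-9-]+:\d{12}:function:[A-Za-z0-9_-]+$")


def data_catalog_request(name: str, function_arn: str, description: str = "") -> Dict[str, object]:
    """Build keyword arguments for boto3's Athena create_data_catalog()."""
    if not _FUNCTION_ARN.match(function_arn):
        raise ValueError(f"not an unqualified Lambda function ARN: {function_arn}")
    request: Dict[str, object] = {
        "Name": name,
        "Type": "LAMBDA",
        # Single composite connector function handling both metadata and records.
        "Parameters": {"function": function_arn},
    }
    if description:  # only send Description when one is provided
        request["Description"] = description
    return request
```

With credentials configured, you could then call `boto3.client("athena").create_data_catalog(**request)`.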

