Learn About Amazon VGT2 Learning Manager Chanci Turner
on 19 DEC 2023
in Training, Amazon IXD – VGT2, Employee Development, Career Growth
In today’s fast-paced business environment, organizations are inundated with diverse data forms. These include structured datasets from relational database management systems (RDBMS) or enterprise resource planning (ERP) systems, semi-structured datasets such as web logs and clickstream data, and unstructured datasets like images and videos. This complexity necessitates that large organizations establish a centralized data lake to store and analyze various data formats effectively.
Amazon Web Services (AWS) offers a secure, scalable, and cost-effective suite of services that empower businesses to build their data lakes in the cloud, allowing them to analyze all types of data, including information from Internet of Things (IoT) devices. However, many organizations still keep critical, rarely accessed data within commercial database engines like Oracle and Microsoft SQL Server. A prime example is an audit management system where audit data is stored in BLOB or CLOB columns in an RDBMS.
This presents a significant opportunity for companies to migrate data securely to AWS, where it can be stored in Amazon Simple Storage Service (Amazon S3) using AWS Database Migration Service (DMS). Leveraging AWS Big Data Services for data transformation and analytics can lead to reduced operational costs and lower licensing fees for commercial databases.
Customer Success Story
While collaborating with a major healthcare provider in the U.S., AWS and Cognizant—an AWS Partner Network (APN) Premier Consulting Partner—identified rising operational costs associated with managing their data. The engagement focused on migrating their audit management system, which was seldom utilized and hosted on an on-premises Oracle database.
A proof of concept (POC) was swiftly developed to showcase the ease of transferring data to Amazon S3, which serves as the central storage component for an AWS data lake solution. AWS Glue was used for necessary data transformations, while reports were generated using existing business intelligence tools via Amazon Athena ODBC drivers. The POC was approved, and Cognizant is currently assisting in the full migration of their audit management system to the AWS data lake solution.
In this article, we will outline the process for migrating similar datasets to Amazon S3 using DMS and AWS analytics services for performance measurement and reporting. We will also share a simple process flow and best practices to facilitate the migration.
Building a Cost-Optimized Solution
A successful approach for many companies transitioning to cloud technologies is to identify critical but infrequently used datasets and assess their business impact before initiating the migration. Here’s how we constructed a cost-optimized solution for our client.
Data Migration
The client’s audit data, previously stored in an Oracle database, was migrated to Amazon S3 using DMS. Multiple DMS tasks were executed in parallel to accelerate the data transfer.
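To illustrate how parallel DMS tasks can be split up, here is a minimal sketch of the table-mapping documents a DMS task accepts. The schema and table names (`AUDIT_OWNER`, `AUDIT_TRAIL_*`) are hypothetical placeholders, and the exact split into tasks would depend on your table sizes.

```python
import json

def build_table_mapping(schema, table_pattern, rule_id):
    """Return a DMS table-mapping document selecting tables to migrate."""
    return {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": rule_id,
                "rule-name": f"include-{table_pattern}",
                "object-locator": {
                    "schema-name": schema,
                    "table-name": table_pattern,
                },
                "rule-action": "include",
            }
        ]
    }

# One mapping per parallel DMS task: the largest table gets its own task,
# while the remaining tables share a wildcard selection.
mappings = [
    build_table_mapping("AUDIT_OWNER", "AUDIT_TRAIL_LARGE", "1"),
    build_table_mapping("AUDIT_OWNER", "AUDIT_TRAIL_%", "2"),
]
print(json.dumps(mappings[0], indent=2))
```

Each mapping document would be attached to its own replication task so the transfers run concurrently.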
Data Discovery
AWS Glue simplified data discovery by crawling data sources and creating a data catalog known as the AWS Glue Data Catalog. Metadata was stored as tables in the catalog and utilized in developing the extract, transform, load (ETL) jobs.
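As a sketch of the crawler setup, the parameters below could be passed to boto3's Glue client as `client.create_crawler(**crawler_params)`. The bucket, IAM role, and database names are hypothetical.

```python
# Hypothetical configuration for a Glue crawler that catalogs the
# migrated audit data in S3 and writes tables into the Data Catalog.
crawler_params = {
    "Name": "audit-data-crawler",
    "Role": "GlueServiceRole-audit",      # IAM role with S3 read access
    "DatabaseName": "audit_catalog",      # target Glue Data Catalog database
    "Targets": {
        "S3Targets": [{"Path": "s3://example-audit-bucket/raw/"}]
    },
    "TablePrefix": "raw_",
    # Run nightly so newly migrated tables are cataloged automatically.
    "Schedule": "cron(0 2 * * ? *)",
}
```

The resulting catalog tables are then referenced by name from the ETL jobs and from Athena.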
Extract, Transform, Load (ETL)
AWS Glue was employed to create and schedule two specific ETL jobs. The first job encoded the CLOB and multi-line data, while the second job partitioned and converted the data to Parquet format—enhancing cost-efficiency while maintaining data analysis capabilities.
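The core of the first job can be sketched in plain Python: Base64-encode any CLOB or multi-line column so each record serializes cleanly on a single line. In a real Glue job this logic would run inside a PySpark transformation; the field names here are hypothetical.

```python
import base64

def encode_clob_fields(record, clob_fields):
    """Base64-encode multi-line/CLOB fields so each record stays on one line."""
    out = dict(record)
    for field in clob_fields:
        value = out.get(field)
        if value is not None:
            out[field] = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return out

record = {"audit_id": 101, "detail": "line one\nline two"}
encoded = encode_clob_fields(record, ["detail"])

# Round-trip check: decoding restores the original multi-line text.
decoded = base64.b64decode(encoded["detail"]).decode("utf-8")
```

The encoded column no longer contains embedded newlines, which is what makes the downstream Parquet conversion and SQL querying reliable.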
Analytics
Amazon Athena facilitates straightforward analysis of data in S3 using standard SQL. Storing data in Parquet format significantly reduces analysis time and query costs. Tools like IBM Cognos or SAP BusinessObjects can use Athena ODBC drivers to visualize audit data effectively.
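A query along these lines could decode the Base64-encoded CLOB column at query time; `from_base64` and `from_utf8` are built-in functions in Athena's Presto engine, while the table, column, and partition names are hypothetical.

```python
# Sketch of an Athena (Presto) SQL query over the partitioned Parquet
# audit table, decoding the Base64-encoded CLOB column on read.
athena_query = """
SELECT audit_id,
       from_utf8(from_base64(detail)) AS detail_text
FROM audit_catalog.audit_trail
WHERE year = '2023' AND month = '11'   -- prune partitions to cut scan cost
LIMIT 100
"""
```

Filtering on the partition columns keeps Athena from scanning the full dataset, which is where most of the query-cost savings come from.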
The architecture of our cost-optimized solution is illustrated in the accompanying diagrams. Amazon S3 serves as the central repository, while DMS migrates datasets from on-premises databases to S3. AWS Glue catalogs the data, and ETL jobs cleanse it by encoding CLOB and multi-line data, removing duplicates, and storing everything in Parquet format.
Lessons Learned from Migration
Migrating data from relational databases to Amazon S3 requires meticulous processes for organizing the data in the target system. Here are some best practices for efficient and cost-effective data retrieval:
Data Migration Strategy
A well-defined data migration strategy is crucial when using DMS to transfer data to S3. Key considerations include:
- Classifying tables based on size.
- Identifying multi-line or CLOB columns for Base64 encoding during ETL.
- Determining partition keys based on frequent queries or reporting needs.
- Establishing the number of parallel DMS tasks for data transfer.
- Defining a validation strategy post-migration.
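The first and fourth considerations above can be combined into a simple rule: give each large table its own DMS task and batch the small ones together. A minimal sketch, with a hypothetical size threshold and table names:

```python
def classify_tables(tables, large_threshold_gb=50):
    """Split tables into 'large' (own DMS task) and 'small' (shared task)."""
    groups = {"large": [], "small": []}
    for name, size_gb in tables.items():
        key = "large" if size_gb >= large_threshold_gb else "small"
        groups[key].append(name)
    return groups

# Hypothetical table sizes in GB from the source Oracle schema.
tables = {"AUDIT_TRAIL": 400, "AUDIT_META": 2, "AUDIT_USERS": 1}
groups = classify_tables(tables)
```

The `large` group then maps to one DMS task per table, while the `small` group can share a single task with a wildcard selection rule.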
Data Transformation
The data transformation process improves query performance. Essential aspects include:
- Utilizing AWS Glue to partition data and store it in S3 in Parquet format.
- Implementing ETL processes for encoding multi-line or CLOB data for effective querying using Amazon Athena. It is important to decode the columns when querying with Athena or other SQL tools.
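The partitioning in the first point above typically means writing Hive-style key prefixes, which both Glue and Athena understand. A minimal sketch with a hypothetical bucket and table name, partitioning by year and month of the audit timestamp:

```python
from datetime import datetime

def partition_path(bucket, table, ts):
    """Build a Hive-style S3 key prefix (year/month) for a Parquet file."""
    return (f"s3://{bucket}/{table}/"
            f"year={ts.year}/month={ts.month:02d}/")

path = partition_path("example-audit-bucket", "audit_trail",
                      datetime(2023, 11, 5))
```

Queries that filter on `year` and `month` then scan only the matching prefixes, which reduces both latency and per-query cost.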
Using the AWS TCO (Total Cost of Ownership) Calculator revealed that maintaining one server with two CPU cores, 32 GB of memory, and 1 TB of storage costs about $4,800 per month, excluding database licensing and infrastructure expenses. The estimated annual costs, including infrastructure, software, and operations, amounted to $71,000. After adopting the solution proposed by Cognizant, the client's monthly expenses fell to a few hundred dollars, covering operation of the data lake and analytics through AWS Big Data services such as Amazon Athena.