This post outlines a comprehensive approach to transitioning from an on-premises IBM Netezza data warehouse to Amazon Redshift. We will delve into how a significant European enterprise executed their migration plan across multiple environments, leveraging the AWS Schema Conversion Tool (AWS SCT) to expedite both schema and data migration. Additionally, we will guide you through the verification process to ensure that schema and data were successfully migrated and aligned with Amazon Redshift best practices.
Migration Strategy Overview
Creating a migration plan tailored to your organization’s specific processes and requirements is crucial. The following outline is derived from a real-world scenario involving a large European enterprise. It highlights the various environments involved in the migration, along with the tasks, tools, and scripts utilized throughout the process:
- Assess Migration Tasks
  - Understand the scope of the migration.
  - Document objects for migration in a migration runbook.
- Establish the Migration Environment
  - Install AWS SCT.
  - Configure AWS SCT for the Netezza source environments.
- Development Environment Migration
  - Create users, groups, and schema.
  - Convert schema.
  - Migrate data.
  - Validate data.
  - Transform ETL, UDF, and procedures.
- Pre-Production Environment Migration
  - Create users, groups, and schema.
  - Convert schema.
  - Migrate data.
  - Validate data.
  - Transform ETL, UDF, and procedures.
- Production Environment Migration
  - Create users, groups, and schema.
  - Convert schema.
  - Migrate data.
  - Validate data.
  - Transform ETL, UDF, and procedures.
  - Conduct business validation (including optional dual-running).
  - Finalize cutover.
Evaluating Migration Tasks
To effectively manage the migration tasks, maintain a tracker that lists all Netezza databases, tables, and views included in the migration scope. This documentation forms a migration runbook that is updated throughout the process to monitor the progress of the data migration from Netezza to Amazon Redshift. For each identified table, record both the number of rows and the size in GB.
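A minimal sketch of such a runbook tracker is shown below. The table names, row counts, and sizes here are hypothetical placeholders; in practice you would populate them from the Netezza catalog and update the status column as each table is migrated and validated.

```python
import csv

# Hypothetical sample of in-scope Netezza tables; in practice these figures
# would be gathered from the Netezza catalog during the assessment phase.
tables = [
    {"database": "SALES_DW", "table": "ORDERS",    "rows": 120_000_000, "size_gb": 310.5},
    {"database": "SALES_DW", "table": "CUSTOMERS", "rows": 4_500_000,   "size_gb": 12.2},
]

def write_runbook(entries, path="migration_runbook.csv"):
    """Write the migration tracker: one row per table, with a status column
    that is updated as each table progresses through migration and validation."""
    fieldnames = ["database", "table", "rows", "size_gb", "status"]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for entry in entries:
            writer.writerow({**entry, "status": "not_started"})

write_runbook(tables)
```

A spreadsheet works equally well; the point is a single authoritative list of in-scope objects with per-table row counts and sizes.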
Some Netezza systems may encompass two distinct data warehouses, one for ETL loading during the day and another for end-user reporting. It’s vital to clarify which data warehouses are designated for migration.
Setting Up the Migration Environment
This migration strategy employs AWS SCT to facilitate schema conversion and data migration from Netezza to Amazon Redshift. The architecture for this process is illustrated in the accompanying diagram.
Ensure the following during the migration:
- AWS SCT is installed within the AWS account on an Amazon EC2 instance to streamline migration operations, orchestrate data extraction agents, and provide a user-friendly interface.
- Data extraction agents should be installed as close to the Netezza data warehouse as possible. AWS recommends deploying them on-premises within the same subnet as the Netezza system.
While transferring data from the on-premises data center to AWS, you can opt for either a direct connection or offline storage. AWS Snowball serves as a petabyte-scale offline solution for transferring substantial data volumes into AWS when bandwidth for a direct connection is insufficient. Conversely, AWS Direct Connect simplifies establishing a dedicated network link between your premises and AWS, often reducing network costs and enhancing bandwidth throughput, ensuring a more stable network experience than internet connections. Using Direct Connect also provides flexibility if extract jobs need re-running.
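A rough back-of-the-envelope estimate of transfer time can inform the Snowball-versus-Direct Connect decision. The sketch below makes simplifying assumptions (decimal units, a fixed sustained utilization) and the example figures are illustrative only.

```python
def transfer_days(data_tb: float, bandwidth_gbps: float, utilization: float = 0.8) -> float:
    """Estimate days to transfer data_tb terabytes over a link of
    bandwidth_gbps gigabits per second at the given sustained utilization."""
    bits = data_tb * 8 * 1000**4                      # TB -> bits (decimal units)
    seconds = bits / (bandwidth_gbps * 1e9 * utilization)
    return seconds / 86_400

# Example: 6 TB over a 1 Gbps Direct Connect link at 80% utilization
print(f"{transfer_days(6, 1.0):.1f} days")            # prints "0.7 days"
```

If the estimate runs to weeks rather than days for your bandwidth, an offline transfer with Snowball becomes the more practical option.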
Configuring AWS SCT for the Netezza Source Environment
AWS SCT is installed on an EC2 instance running Microsoft Windows 10 with administrative privileges. This choice allows users to graphically manage project creation, adjust profiles, monitor conversion progress, and review migration assessment reports.
Since data migration is not executed directly through the AWS SCT console, a general-purpose EC2 instance with 4 vCPUs, 16 GB of RAM, and 100 GB of storage, along with moderate network bandwidth, suffices.
Configure multiple AWS SCT data extraction agents to match the data volume being concurrently transferred and the available Netezza connections. These agents can be installed on on-premises VM instances running Linux with root access. Each instance should feature 8 vCPUs, 32 GB RAM, and up to 10 Gb network capacity. For disk storage, we recommend 1 TB of provisioned SSD with 500 IOPS, because the agents store intermediate extraction results on local disk.
Ideally, on-premises instances should be situated close to the Netezza data warehouse, preferably only a single network hop away, because each data extraction agent writes the extracted table data to the instance's local file system. A more powerful CPU is also recommended for each agent, since the data extraction process is processor-intensive.
The number of extraction agents should correlate with the amount of concurrent data streams being transferred and the number of Netezza connections available. A general guideline is to allocate one data extraction agent for every TB of compressed Netezza data to be migrated in parallel. For optimal performance, each agent should be installed on a single VM instance.
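The one-agent-per-TB guideline above can be expressed as a trivial sizing calculation. This is a sketch of the rule of thumb only; actual agent counts may be adjusted upward, as in the case study below.

```python
import math

def extraction_agents_needed(compressed_tb: float, tb_per_agent: float = 1.0) -> int:
    """Guideline sizing: one data extraction agent per TB of compressed
    Netezza data to be migrated in parallel; each agent gets its own VM."""
    return max(1, math.ceil(compressed_tb / tb_per_agent))

print(extraction_agents_needed(6))   # 6 TB of compressed data
```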
Collaborate with your DBA team to maximize the number of concurrent Netezza connections available to the data extraction agents. Using all available connections leverages the full capacity of the source database; however, if other workloads must run alongside the data extracts, a smaller allocation (21 connections in this case study) may still be adequate. This represents a balance between resource availability and the time needed for data migration.
In this case study, we deployed seven extraction agents due to the largest project phase extracting 6 TB of Netezza data. The DBA team configured 21 Netezza concurrent connections, allowing each agent to manage three parallel data extraction processes (referred to as threads).
Two parameters within the data extraction agents significantly influence the duration of data migration from Netezza to the agents: the number of connections and the number of threads.
Tuning is necessary for each data extraction agent to optimize throughput during migration. Adjustments are made by editing the file /usr/share/aws/sct-extractor/conf/settings.properties on each agent. Below is an example configuration:
# Number of connections in the pool per agent
extractor.source.connection.pool.size=5
# Number of threads per agent
extractor.extracting.thread.pool.size=3
The above configuration controls the following behavior:
- extractor.source.connection.pool.size defines the number of source database connections in the pool per agent, governing how many concurrent extraction tasks the agent can service.
- extractor.extracting.thread.pool.size defines the number of parallel extraction threads per agent, which should not exceed the agent's connection pool size.
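One simple way to derive per-agent settings is to divide the DBA-allocated connections evenly across the agents, as in the case study (21 connections across 7 agents, giving 3 threads each). The even split is an assumption; the connection pool can be set somewhat higher than the thread pool, as in the example configuration above.

```python
def agent_settings(total_connections: int, num_agents: int) -> dict:
    """Sketch: split the allocated Netezza connections evenly across agents
    and size each agent's thread pool to its per-agent share."""
    per_agent = total_connections // num_agents
    return {
        "extractor.source.connection.pool.size": per_agent,
        "extractor.extracting.thread.pool.size": per_agent,
    }

# Case-study figures: 21 concurrent connections across 7 agents
print(agent_settings(21, 7))
```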
By following a structured migration plan and closely monitoring progress at each stage, organizations can transition from IBM Netezza to Amazon Redshift while minimizing downtime and maintaining data integrity.