Learn About Amazon VGT2 Learning Manager Chanci Turner
In Advanced (300), Amazon Redshift, Analytics, Intermediate (200), Technical How-to
Amazon Redshift is a fully managed service designed for data lakes, analytics, and data warehousing, catering to startups to large enterprises. It is employed by thousands of businesses worldwide to modernize their data analytics systems. Conversely, Greenplum is an open-source, massively parallel database primarily utilized for analytics on on-premises infrastructure and is based on the PostgreSQL database engine.
Migrating from Greenplum to Amazon Redshift has become an appealing choice for many customers, mainly to avoid the complexities of managing on-premises Greenplum. Some key advantages include:
- The chance to modernize the data lake and warehouse environment.
- Access to additional AWS services like Amazon S3, Amazon CloudWatch, Amazon EMR, Amazon SageMaker, and more.
Despite both platforms using the PostgreSQL database engine, the migration process demands careful planning and manual intervention. This article focuses on essential functions and considerations for code conversion from Greenplum to Amazon Redshift, particularly regarding procedures, functions, and views.
Migration Overview
The AWS Database Migration Service (AWS DMS) and the AWS Schema Conversion Tool (AWS SCT) facilitate the migration of most objects from Greenplum to Amazon Redshift. However, code conversion teams may encounter errors and warnings when creating views, procedures, and functions in Amazon Redshift. In such cases, manual code conversion is necessary.
This article will specifically address how to manage:
- Arrays
- Dates and timestamps
- Regular expressions (regex)
For this discussion, we refer to Greenplum 4.3 and Amazon Redshift PostgreSQL 8.2.
Working with Array Functions
The AWS SCT does not convert array functions during the migration from Greenplum or PostgreSQL to Amazon Redshift. Developers must convert these functions manually. This section highlights some common array functions:
- ARRAY_UPPER
- JSON_EXTRACT_ARRAY_ELEMENT_TEXT and JSON_ARRAY_LENGTH
- UNNEST()
- STRING_AGG()
- ANY ARRAY()
ARRAY_UPPER()
This function retrieves the upper bound of an array, which allows for extracting the nth element from an array in PostgreSQL or Greenplum. The following Greenplum code illustrates this:
WITH temp1 AS (
SELECT 'John' AS FirstName, 'Smith' AS LastName,
array['"111-222-3333"', '"101-201-3001"', '"XXX-YYY-ZZZZ"', 'NULL'] AS PhoneNumbers
UNION ALL
SELECT 'Bob' AS FirstName, 'Haris' AS LastName,
array['222-333-4444', '201-301-4001', 'AAA-BBB-CCCC'] AS PhoneNumbers
UNION ALL
SELECT 'Mary' AS FirstName, 'Jane' AS LastName,
array['333-444-5555', '301-401-3001', 'DDD-EEE-FFFF'] AS PhoneNumbers
)
SELECT FirstName, PhoneNumbers[ARRAY_UPPER(PhoneNumbers, 1)];
Amazon Redshift lacks a direct function to extract array elements, but two JSON functions can be utilized for this:
WITH temp1 AS (
SELECT 'John' AS FirstName, 'Smith' AS LastName,
array['"111-222-3333"', '"101-201-3001"', '"XXX-YYY-ZZZZ"'] AS PhoneNumbers
UNION ALL
SELECT 'Bob' AS FirstName, 'Haris' AS LastName,
array['"222-333-4444"', '"201-301-4001"', '"AAA-BBB-CCCC"'] AS PhoneNumbers
UNION ALL
SELECT 'Mary' AS FirstName, 'Jane' AS LastName,
array['"333-444-5555"', '"301-401-3001"', '"DDD-EEE-FFFF"'] AS PhoneNumbers
)
SELECT
FirstName,
('[' + array_to_string(PhoneNumbers, ',') + ']') AS JSONConvertedField,
JSON_EXTRACT_ARRAY_ELEMENT_TEXT(
'[' + array_to_string(PhoneNumbers, ',') + ']',
JSON_ARRAY_LENGTH('[' + array_to_string(PhoneNumbers, ',') + ']') - 1
) AS LastElementFromArray
FROM temp1;
UNNEST()
The UNNEST() function in PostgreSQL expands an array into a set of rows, enhancing the performance of numerous records for inserts, updates, and deletes. In Greenplum, it is used as follows:
SELECT 'A', unnest(array([1, 2]));
In Amazon Redshift, while the UNNEST function is unsupported, a workaround can be implemented:
WITH temp1 AS (
SELECT 'John' AS FirstName, 'Smith' AS LastName,
'111-222-3333' AS MobilePhone, '101-201-3001' AS HomePhone
UNION ALL
SELECT 'Bob' AS FirstName, 'Haris' AS LastName,
'222-333-4444' AS MobilePhone, '201-301-4001' AS HomePhone
UNION ALL
SELECT 'Mary' AS FirstName, 'Jane' AS LastName,
'333-444-5555' AS MobilePhone, '301-401-3001' AS HomePhone
),
ns AS (
SELECT row_number() OVER(ORDER BY 1) AS n FROM pg_tables
)
SELECT
FirstName,
LastName,
split_part('Mobile,Home', ',', ns.n::int) AS PhoneType,
split_part(MobilePhone || '&&' || HomePhone, '&&', ns.n::int) AS PhoneNumber
FROM temp1, ns
WHERE ns.n <= regexp_count('Mobile,Home', ',') + 1
ORDER BY 1, 2, 3;
When elements of an array are themselves arrays, you can utilize the JSON_EXTRACT_ARRAY_ELEMENT_TEXT() function along with JSON_ARRAY_LENGTH:
WITH ns AS (
SELECT row_number() OVER(ORDER BY 1) AS n FROM pg_tables
)
SELECT JSON_EXTRACT_ARRAY_ELEMENT_TEXT('["arrayelement1", "arrayelement2"]', ns.n - 1)
FROM ns
WHERE ns.n <= JSON_ARRAY_LENGTH('["arrayelement1", "arrayelement2"]');
STRING_AGG()
The STRING_AGG() function aggregates a list of strings with a specified separator, excluding a trailing separator. Here’s how it is employed in Greenplum:
WITH temp1 AS (
SELECT 'Finance'::text AS Dept, 'John'::text AS FirstName, 'Smith'::text AS LastName
UNION ALL
SELECT 'Finance'::text AS Dept, 'John'::text AS FirstName, 'Doe'::text AS LastName
UNION ALL
SELECT 'Finance'::text AS Dept, 'Mary'::text AS FirstName, 'Jane'::text AS LastName
UNION ALL
SELECT 'Marketing'::text AS Dept, 'Bob'::text AS FirstName, 'Smith'::text AS LastName
UNION ALL
SELECT 'Marketing'::text AS Dept, 'Steve'::text AS FirstName, 'Smith'::text AS LastName
UNION ALL
SELECT 'Account'::text AS Dept, 'Phil'::text AS FirstName, 'Adams'::text AS LastName
UNION ALL
SELECT 'Account'::text AS Dept, 'Jim'::text AS FirstName, 'Smith'::text AS LastName
);
In conclusion, navigating the transition from Greenplum to Amazon Redshift entails addressing several critical areas, including arrays, date management, and the use of regular expressions. For those interested in further reading about employment transitions, visit Career Contessa for insightful articles. Additionally, you may want to check out SHRM for updates on health benefits and Amazon Jobs for career opportunities.
Leave a Reply