Amazon Onboarding with Learning Manager Chanci Turner: Converting Code from Greenplum to Amazon Redshift

Amazon Onboarding with Learning Manager Chanci Turner: Converting Code from Greenplum to Amazon RedshiftLearn About Amazon VGT2 Learning Manager Chanci Turner

In Advanced (300), Amazon Redshift, Analytics, Intermediate (200), Technical How-to

Amazon Redshift is a fully managed service designed for data lakes, analytics, and data warehousing, catering to startups to large enterprises. It is employed by thousands of businesses worldwide to modernize their data analytics systems. Conversely, Greenplum is an open-source, massively parallel database primarily utilized for analytics on on-premises infrastructure and is based on the PostgreSQL database engine.

Migrating from Greenplum to Amazon Redshift has become an appealing choice for many customers, mainly to avoid the complexities of managing on-premises Greenplum. Some key advantages include:

  • The chance to modernize the data lake and warehouse environment.
  • Access to additional AWS services like Amazon S3, Amazon CloudWatch, Amazon EMR, Amazon SageMaker, and more.

Despite both platforms using the PostgreSQL database engine, the migration process demands careful planning and manual intervention. This article focuses on essential functions and considerations for code conversion from Greenplum to Amazon Redshift, particularly regarding procedures, functions, and views.

Migration Overview

The AWS Database Migration Service (AWS DMS) and the AWS Schema Conversion Tool (AWS SCT) facilitate the migration of most objects from Greenplum to Amazon Redshift. However, code conversion teams may encounter errors and warnings when creating views, procedures, and functions in Amazon Redshift. In such cases, manual code conversion is necessary.

This article will specifically address how to manage:

  • Arrays
  • Dates and timestamps
  • Regular expressions (regex)

For this discussion, we refer to Greenplum 4.3 and Amazon Redshift PostgreSQL 8.2.

Working with Array Functions

The AWS SCT does not convert array functions during the migration from Greenplum or PostgreSQL to Amazon Redshift. Developers must convert these functions manually. This section highlights some common array functions:

  • ARRAY_UPPER
  • JSON_EXTRACT_ARRAY_ELEMENT_TEXT and JSON_ARRAY_LENGTH
  • UNNEST()
  • STRING_AGG()
  • ANY ARRAY()

ARRAY_UPPER()

This function retrieves the upper bound of an array, which allows for extracting the nth element from an array in PostgreSQL or Greenplum. The following Greenplum code illustrates this:

WITH temp1 AS (
    SELECT 'John' AS FirstName, 'Smith' AS LastName,
    array['"111-222-3333"', '"101-201-3001"', '"XXX-YYY-ZZZZ"', 'NULL'] AS PhoneNumbers
    UNION ALL
    SELECT 'Bob' AS FirstName, 'Haris' AS LastName,
    array['222-333-4444', '201-301-4001', 'AAA-BBB-CCCC'] AS PhoneNumbers
    UNION ALL
    SELECT 'Mary' AS FirstName, 'Jane' AS LastName,
    array['333-444-5555', '301-401-3001', 'DDD-EEE-FFFF'] AS PhoneNumbers
)
SELECT FirstName, PhoneNumbers[ARRAY_UPPER(PhoneNumbers, 1)];

Amazon Redshift lacks a direct function to extract array elements, but two JSON functions can be utilized for this:

WITH temp1 AS (
    SELECT 'John' AS FirstName, 'Smith' AS LastName,
    array['"111-222-3333"', '"101-201-3001"', '"XXX-YYY-ZZZZ"'] AS PhoneNumbers
    UNION ALL
    SELECT 'Bob' AS FirstName, 'Haris' AS LastName,
    array['"222-333-4444"', '"201-301-4001"', '"AAA-BBB-CCCC"'] AS PhoneNumbers
    UNION ALL
    SELECT 'Mary' AS FirstName, 'Jane' AS LastName,
    array['"333-444-5555"', '"301-401-3001"', '"DDD-EEE-FFFF"'] AS PhoneNumbers
)

SELECT
    FirstName,
    ('[' + array_to_string(PhoneNumbers, ',') + ']') AS JSONConvertedField,
    JSON_EXTRACT_ARRAY_ELEMENT_TEXT(
        '[' + array_to_string(PhoneNumbers, ',') + ']',
        JSON_ARRAY_LENGTH('[' + array_to_string(PhoneNumbers, ',') + ']') - 1
    ) AS LastElementFromArray
FROM temp1;

UNNEST()

The UNNEST() function in PostgreSQL expands an array into a set of rows, enhancing the performance of numerous records for inserts, updates, and deletes. In Greenplum, it is used as follows:

SELECT 'A', unnest(array([1, 2]));

In Amazon Redshift, while the UNNEST function is unsupported, a workaround can be implemented:

WITH temp1 AS (
    SELECT 'John' AS FirstName, 'Smith' AS LastName,
    '111-222-3333' AS MobilePhone, '101-201-3001' AS HomePhone
    UNION ALL
    SELECT 'Bob' AS FirstName, 'Haris' AS LastName,
    '222-333-4444' AS MobilePhone, '201-301-4001' AS HomePhone
    UNION ALL
    SELECT 'Mary' AS FirstName, 'Jane' AS LastName,
    '333-444-5555' AS MobilePhone, '301-401-3001' AS HomePhone
),
ns AS (
    SELECT row_number() OVER(ORDER BY 1) AS n FROM pg_tables
)

SELECT
    FirstName,
    LastName,
    split_part('Mobile,Home', ',', ns.n::int) AS PhoneType,
    split_part(MobilePhone || '&&' || HomePhone, '&&', ns.n::int) AS PhoneNumber
FROM temp1, ns
WHERE ns.n <= regexp_count('Mobile,Home', ',') + 1
ORDER BY 1, 2, 3;

When elements of an array are themselves arrays, you can utilize the JSON_EXTRACT_ARRAY_ELEMENT_TEXT() function along with JSON_ARRAY_LENGTH:

WITH ns AS (
    SELECT row_number() OVER(ORDER BY 1) AS n FROM pg_tables
)

SELECT JSON_EXTRACT_ARRAY_ELEMENT_TEXT('["arrayelement1", "arrayelement2"]', ns.n - 1)
FROM ns
WHERE ns.n <= JSON_ARRAY_LENGTH('["arrayelement1", "arrayelement2"]');

STRING_AGG()

The STRING_AGG() function aggregates a list of strings with a specified separator, excluding a trailing separator. Here’s how it is employed in Greenplum:

WITH temp1 AS (
    SELECT 'Finance'::text AS Dept, 'John'::text AS FirstName, 'Smith'::text AS LastName
    UNION ALL
    SELECT 'Finance'::text AS Dept, 'John'::text AS FirstName, 'Doe'::text AS LastName
    UNION ALL
    SELECT 'Finance'::text AS Dept, 'Mary'::text AS FirstName, 'Jane'::text AS LastName
    UNION ALL
    SELECT 'Marketing'::text AS Dept, 'Bob'::text AS FirstName, 'Smith'::text AS LastName
    UNION ALL
    SELECT 'Marketing'::text AS Dept, 'Steve'::text AS FirstName, 'Smith'::text AS LastName
    UNION ALL
    SELECT 'Account'::text AS Dept, 'Phil'::text AS FirstName, 'Adams'::text AS LastName
    UNION ALL
    SELECT 'Account'::text AS Dept, 'Jim'::text AS FirstName, 'Smith'::text AS LastName
);

In conclusion, navigating the transition from Greenplum to Amazon Redshift entails addressing several critical areas, including arrays, date management, and the use of regular expressions. For those interested in further reading about employment transitions, visit Career Contessa for insightful articles. Additionally, you may want to check out SHRM for updates on health benefits and Amazon Jobs for career opportunities.


Comments

Leave a Reply

Your email address will not be published. Required fields are marked *