Utilizing Logical Replication to Transfer Managed Amazon RDS for PostgreSQL and Amazon Aurora to Self-Managed PostgreSQL

PostgreSQL version 10 introduced numerous features and enhancements, one of which is logical replication based on a publish-and-subscribe architecture. AWS offers two managed PostgreSQL solutions: Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL. This article explores how to leverage this existing framework to establish a self-managed read replica from either Amazon RDS for PostgreSQL or Aurora. A similar approach can be applied to set up a read replica on PostgreSQL hosted on Amazon Elastic Compute Cloud (Amazon EC2), as well as on Amazon RDS for PostgreSQL and Aurora PostgreSQL.

Amazon RDS for PostgreSQL supports the publication and subscription model from engine version 10.4 and later, while Aurora PostgreSQL supports it from engine version 2.2 (compatible with 10.6) and above.

Solution Overview

Typically, Amazon RDS for PostgreSQL and Aurora PostgreSQL databases are configured such that application writes are handled by a master instance, while read operations are delegated to AWS-managed replicas. Nonetheless, certain business scenarios necessitate the creation of an additional self-managed replica instance to support other independent downstream applications that only require access to a specific subset of data, such as select databases or tables. This type of partial replication enables individual replicas to independently manage portions of the workload, thus enhancing the overall scalability of the system.

With the advent of logical replication, built on the publish-and-subscribe framework, the rds_superuser role can set up custom self-managed replication for PostgreSQL versions 10 and above. Logical replication is a database-level solution that can replicate some or all of the tables in a database. Even so, it is still advisable to use AWS-generated read replicas for Amazon RDS for PostgreSQL and Aurora PostgreSQL to serve the read traffic of the primary production application, because they scale elastically beyond the capacity of a single master database instance. For more information, see Working with PostgreSQL Read Replicas in Amazon RDS and Replication with Amazon Aurora PostgreSQL. Later in this article, we discuss additional considerations for logical replication.

Logical replication employs a publish-and-subscribe model in which subscribers pull data from the publications they subscribe to. The process begins by copying an initial snapshot of the existing data from the publisher database. Once the copy is complete, changes made on the publisher (INSERT, UPDATE, and DELETE operations) are relayed to the subscriber in near-real time. The subscriber applies them in the publisher's commit order, which preserves transactional consistency.

This method contrasts with physical replication, where exact block addresses are utilized for byte-for-byte replication.

Steps Involved in the Process:

The publisher instance employs the CREATE PUBLICATION command to designate a set of tables whose data changes are meant to be replicated.
The subscriber instance uses the CREATE SUBSCRIPTION command to specify the name and connection details of the publication.
A successful execution of CREATE SUBSCRIPTION initiates a TCP connection to the publisher instance.
An incoming connection from the subscriber prompts the establishment of a temporary logical replication slot at the publisher (using a logical decoding plugin known as pgoutput).
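The two commands named in the steps above can be sketched as follows. This is a minimal illustration, not a definitive setup script: the publication name, subscription name, table names, and connection string are hypothetical placeholders, and the helpers only build the SQL text so that it can be reviewed before being run with a client of your choice. Table names are assumed to be unqualified (no schema prefix).

```python
from typing import List


def quote_ident(name: str) -> str:
    """Double-quote a SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Single-quote a SQL string literal, escaping embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def create_publication_sql(publication: str, tables: List[str]) -> str:
    """Statement to run on the publisher (the RDS or Aurora instance)."""
    table_list = ", ".join(quote_ident(t) for t in tables)
    return f"CREATE PUBLICATION {quote_ident(publication)} FOR TABLE {table_list};"


def create_subscription_sql(subscription: str, conninfo: str, publication: str) -> str:
    """Statement to run on the subscriber (the self-managed instance)."""
    return (
        f"CREATE SUBSCRIPTION {quote_ident(subscription)} "
        f"CONNECTION {quote_literal(conninfo)} "
        f"PUBLICATION {quote_ident(publication)};"
    )


if __name__ == "__main__":
    # All names and the connection string below are hypothetical.
    print(create_publication_sql("app_pub", ["orders", "customers"]))
    print(create_subscription_sql(
        "app_sub", "host=example.invalid dbname=appdb user=repl", "app_pub"))
```

Building the statements as text keeps the sketch independent of any particular database driver.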

To provide context, a PostgreSQL instance tracks its transactions in a series of ordered binary files called write-ahead logs (WAL), each 16 MB in size by default. From PostgreSQL version 11 onward, you can choose the WAL segment size when the instance is initialized. Instances of Amazon RDS for PostgreSQL version 11 and Aurora engine version 3.0 (compatible with PostgreSQL 11.4) have a WAL size of 64 MB.
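To make the effect of the segment size concrete, the following sketch maps a WAL position (an LSN written as two hexadecimal numbers separated by a slash) to the name of the segment file that contains that byte, for both 16 MB and 64 MB segments. It follows the 24-character naming scheme (timeline, high part, low part) as commonly documented. The function names are hypothetical, and the timeline is assumed to be 1.

```python
def parse_lsn(lsn: str) -> int:
    """Convert an LSN in 'X/Y' hexadecimal form into a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_file_name(lsn: str, timeline: int = 1, segment_size_mb: int = 16) -> str:
    """Name of the WAL segment file that contains the given byte position."""
    segment_size = segment_size_mb * 1024 * 1024
    # Number of segments that fit into one 4 GB "log id" range.
    segments_per_log_id = 0x100000000 // segment_size
    segment_number = parse_lsn(lsn) // segment_size
    return "%08X%08X%08X" % (
        timeline,
        segment_number // segments_per_log_id,
        segment_number % segments_per_log_id,
    )


if __name__ == "__main__":
    print(wal_file_name("0/16B3748", segment_size_mb=16))
    print(wal_file_name("0/16B3748", segment_size_mb=64))
```

With larger segments the same position falls into a lower-numbered file, so fewer but larger files cover the same amount of WAL.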

A replication slot enables the master instance to monitor how far behind the standby is, preventing the deletion of WAL files that the standby may still require. When applied in the context of streaming replication, these slots are referred to as physical replication slots. However, in logical replication, they utilize a decoding plugin (pgoutput, in this case) that converts the changes read from the WAL into the logical replication protocol, filtering the data as specified (according to the publication specification). This decoding produces all persistent changes in a clear, coherent format that can be interpreted without extensive knowledge of the database’s internal state. In the context of logical replication, these slots are termed logical replication slots.
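Because a slot prevents the removal of WAL that its consumer still needs, a stalled subscriber causes WAL to accumulate on the publisher. The sketch below estimates that backlog from two LSN values. It assumes you have already read the slot's restart_lsn from the pg_replication_slots view and the current write position from pg_current_wal_lsn(). The helper names and sample values are hypothetical.

```python
def parse_lsn(lsn: str) -> int:
    """Convert an LSN in 'X/Y' hexadecimal form into a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def retained_wal_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Approximate bytes of WAL the publisher must keep for this slot."""
    return max(0, parse_lsn(current_lsn) - parse_lsn(restart_lsn))


def retained_segments(current_lsn: str, restart_lsn: str, segment_size_mb: int = 16) -> int:
    """Approximate number of whole WAL segments held back by the slot."""
    segment_size = segment_size_mb * 1024 * 1024
    return retained_wal_bytes(current_lsn, restart_lsn) // segment_size


if __name__ == "__main__":
    # Sample values only; real ones come from the publisher.
    print(retained_wal_bytes("0/3000000", "0/1000000"))
    print(retained_segments("0/3000000", "0/1000000"))
```

Tracking this number over time is one way to notice a subscriber that has fallen behind or a slot that is no longer consumed.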

By default, the temporary logical slot is named {sub name}_{sub oid}_sync_{reloid}, where {sub name} is the subscription name specified using CREATE SUBSCRIPTION. This behavior can be modified using the slot_name option of the command.

A snapshot of the data in the existing subscribed tables is taken and transferred to the subscriber using the COPY command.
The initial sync worker at the subscriber receives this snapshot, maps the payload, and applies the necessary operations.

Flow for Transactional Data (Post-Initial Snapshot):

After the initial synchronization, a permanent slot is created (defaulting to the same name as the subscription) via a logical decoding plugin called pgoutput. This slot persists for the duration of the related subscription.

The walsender process begins extracting all persistent changes from the received WALs (a process known as logical decoding).
The plugin converts the changes extracted from the WAL into the logical replication protocol and filters the data according to the publication specification.
The data is subsequently transferred to the apply worker, which maps the payload to local tables and applies the individual changes.
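The flow above can be illustrated with a toy model. This is not how PostgreSQL implements the walsender, pgoutput, or the apply worker. It is a simplified sketch with hypothetical class and function names that shows three ideas from the steps: changes are grouped by transaction, emitted in commit order, and filtered by the set of published tables before being applied to local tables.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional, Set


@dataclass
class Change:
    table: str
    op: str                      # "INSERT", "UPDATE" or "DELETE"
    key: int                     # primary key value of the affected row
    row: Optional[dict] = None   # new row contents (None for DELETE)


@dataclass
class Transaction:
    commit_order: int
    changes: List[Change]


def decode_and_filter(transactions: List[Transaction],
                      published_tables: Set[str]) -> Iterator[Transaction]:
    """Emit transactions in commit order, keeping only published tables."""
    for txn in sorted(transactions, key=lambda t: t.commit_order):
        kept = [c for c in txn.changes if c.table in published_tables]
        if kept:
            yield Transaction(txn.commit_order, kept)


def apply_transaction(subscriber: Dict[str, Dict[int, dict]], txn: Transaction) -> None:
    """Map each change to the local table and apply it."""
    for change in txn.changes:
        table = subscriber.setdefault(change.table, {})
        if change.op == "DELETE":
            table.pop(change.key, None)
        else:
            table[change.key] = change.row


def demo() -> Dict[str, Dict[int, dict]]:
    wal = [
        Transaction(2, [Change("orders", "UPDATE", 1, {"status": "shipped"})]),
        Transaction(1, [Change("orders", "INSERT", 1, {"status": "new"}),
                        Change("audit_log", "INSERT", 7, {"msg": "x"})]),
        Transaction(3, [Change("orders", "INSERT", 2, {"status": "new"}),
                        Change("orders", "DELETE", 2)]),
    ]
    subscriber: Dict[str, Dict[int, dict]] = {}
    for txn in decode_and_filter(wal, {"orders"}):
        apply_transaction(subscriber, txn)
    return subscriber
```

In the demo, the audit_log change never reaches the subscriber because that table is not in the publication, and the order row ends in the state of the last committed update.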

Prerequisites

Before implementing this solution, you must set up logical replication. For detailed instructions for Amazon RDS for PostgreSQL, refer to Logical Replication for PostgreSQL on Amazon RDS. For Aurora PostgreSQL, see Configuring Logical Replication.

Considerations with Logical Replication

Some important considerations when utilizing logical replication include:

Each publication exists in a single database.
As of this writing, publications can only contain tables. The following cannot be replicated:
- Views, materialized views, partition root tables, or foreign tables.
- Large objects. The bytea data type is supported as a workaround.
- Sequences. The data in serial or identity columns backed by sequences is replicated as part of the table, but the state of the sequence itself is not.
Tables can be added to multiple publications as needed.
Publications can have multiple subscribers.
Each subscription receives changes via a single replication slot, along with additional temporary replication slots created for the initial data synchronization of pre-existing tables.
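Schema definitions at the publisher aren’t replicated to the subscriber; you must manually create the object schema at the subscriber to initiate replication for that object. For instance, a table created at the publisher after replication has commenced must also be created at the subscriber, and the subscription refreshed (ALTER SUBSCRIPTION ... REFRESH PUBLICATION), before its changes are replicated.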
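Because schema definitions are not replicated, it can help to compare the published tables with what exists at the subscriber before creating or refreshing a subscription. The following sketch does that on plain dictionaries. How the table and column lists are collected (for example, from information_schema on each side) is left out. The function name and sample data are hypothetical. The check assumes that columns are matched by name and that extra columns at the subscriber are acceptable.

```python
from typing import Dict, List


def missing_on_subscriber(published: Dict[str, List[str]],
                          subscriber: Dict[str, List[str]]) -> Dict[str, str]:
    """Report published tables or columns that the subscriber lacks.

    Both arguments map a table name to its list of column names.
    """
    problems: Dict[str, str] = {}
    for table, columns in published.items():
        if table not in subscriber:
            problems[table] = "table missing"
            continue
        missing = [c for c in columns if c not in subscriber[table]]
        if missing:
            problems[table] = "missing columns: " + ", ".join(missing)
    return problems


if __name__ == "__main__":
    # Sample data only.
    published = {"orders": ["id", "status"], "customers": ["id"]}
    subscriber = {"orders": ["id"]}
    for table, problem in missing_on_subscriber(published, subscriber).items():
        print(table, "->", problem)
```

An empty result means every published table and column has a counterpart at the subscriber. It says nothing about data types or constraints, which would need a separate check.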
