Amazon VGT2 Las Vegas: Understanding Data Validation in AWS Database Migration Service (Part 2)

AWS Database Migration Service (AWS DMS) facilitates swift and secure migration of databases to AWS. With support for a variety of both commercial and open-source databases like Oracle, Microsoft SQL Server, and PostgreSQL, it enables seamless transitions. Whether you’re performing homogeneous migrations, such as Oracle to Oracle, or heterogeneous migrations across different platforms, such as Oracle to PostgreSQL or MySQL to Oracle, AWS DMS has you covered. Recently, it has added a valuable feature for validating data post-migration.

This article provides a concise guide on creating migration tasks that utilize the new data validation feature, which can be easily set up through the AWS DMS console. The migration process involves establishing a replication instance, defining source and target endpoints, and creating a replication task that executes on the replication instance, transferring data from the source to the target.

To set up migration tasks, you can opt for either the AWS Management Console or the AWS Command Line Interface (AWS CLI). For those unfamiliar with the AWS CLI, it’s advisable to review the documentation on how to create an AWS Identity and Access Management (IAM) user, configure the necessary permissions and roles for AWS DMS, and set up the AWS CLI.
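For reference, a minimal AWS CLI sketch of that flow is shown below. The identifiers, instance class, engine names, hostnames, and credentials are placeholders for illustration only; substitute values for your own environment.

# Create a replication instance that the migration task will run on
$ aws dms create-replication-instance \
    --replication-instance-identifier my-dms-instance \
    --replication-instance-class dms.t3.medium \
    --allocated-storage 50

# Define the source and target endpoints (Oracle to PostgreSQL as an example)
$ aws dms create-endpoint \
    --endpoint-identifier oracle-source \
    --endpoint-type source \
    --engine-name oracle \
    --server-name source.example.com --port 1521 --database-name ORCL \
    --username admin --password '<password>'

$ aws dms create-endpoint \
    --endpoint-identifier postgres-target \
    --endpoint-type target \
    --engine-name postgres \
    --server-name target.example.com --port 5432 --database-name appdb \
    --username admin --password '<password>'

# Create the replication task that copies data from the source to the target
$ aws dms create-replication-task \
    --replication-task-identifier my-migration-task \
    --source-endpoint-arn <source-endpoint-arn> \
    --target-endpoint-arn <target-endpoint-arn> \
    --replication-instance-arn <replication-instance-arn> \
    --migration-type full-load-and-cdc \
    --table-mappings file://table-mappings.json \
    --replication-task-settings file://task-settings.json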

Challenge/Use Case

While AWS DMS effectively facilitates both homogeneous and heterogeneous database migrations, many customers want a way to verify data integrity after migration. They often migrate a clone of their production database first and build confidence in the process by comparing data between the source and target systems.

In scenarios where continuous replication is employed (for instance, Oracle to PostgreSQL), ensuring data integrity without loss or corruption is paramount due to the critical nature of production systems.

Solution

Thanks to the new data validation feature offered by AWS DMS, users can now validate replicated data across two databases. The service conducts a comparative analysis of the data between the source and target to confirm its accuracy. This involves executing specific queries on both ends to retrieve data. If the data set is large, AWS DMS can partition the data into smaller, manageable groups based on the primary key, allowing for efficient comparison. This method helps validate a defined amount of data at any given time.

The results of the comparison can guide you in identifying any significant discrepancies between the source and target databases. To enable this feature during task creation, simply select “Enable validation” under Task Settings in the AWS DMS console.

Creating an AWS DMS Data Validation Task via the Console

When setting up a task in the AWS DMS console, you can enable data validation by selecting the corresponding option under Task Settings. The results of the validation can be monitored on the Table statistics tab during both full load and change processing.

Interpreting Data Validation Results

The following metrics are essential for understanding validation results:

  • Validated: Records that are currently in sync between the source and target.
  • Mismatched records: Records that do not currently match; subsequent updates applied to the target can move these records from mismatched back to validated.
  • ValidationPendingRecords: Records that have not yet been validated.
  • ValidationFailedRecords: Records that failed validation, indicating a discrepancy between the source and target.
  • ValidationSuspendedRecords: Records whose validation is suspended because they are being continuously modified, which prevents a meaningful comparison.

Data validation can also be activated via AWS CLI commands by adding the necessary validation settings in the task settings JSON.

{
    "ValidationSettings": {
        "EnableValidation": true
    }
}
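If the task already exists, the same settings file can be applied with the modify-replication-task command (the task must be stopped before its settings can be changed). The task ARN and file name below are placeholders.

$ aws dms modify-replication-task \
    --replication-task-arn arn:aws:dms:us-west-2:aws-account-id:task:5VXX7BZB5XLUKAYQTTSLZKTISY \
    --replication-task-settings file://task-settings.json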

After initiating the AWS DMS task with data validation via the AWS CLI, validation results can be accessed at the table level using the DescribeTableStatistics API call.

$ aws dms describe-table-statistics --replication-task-arn arn:aws:dms:us-west-2:aws-account-id:task:5VXX7BZB5XLUKAYQTTSLZKTISY
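To focus on the validation-related fields in the response, you can add a JMESPath --query filter. This is a sketch; the field names (SchemaName, TableName, ValidationState, ValidationPendingRecords, ValidationFailedRecords, ValidationSuspendedRecords) are those returned by DescribeTableStatistics.

$ aws dms describe-table-statistics \
    --replication-task-arn arn:aws:dms:us-west-2:aws-account-id:task:5VXX7BZB5XLUKAYQTTSLZKTISY \
    --query 'TableStatistics[].{Schema:SchemaName,Table:TableName,State:ValidationState,Pending:ValidationPendingRecords,Failed:ValidationFailedRecords,Suspended:ValidationSuspendedRecords}' \
    --output table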

Any validation failures are recorded in the target database in a table called awsdms_validation_failures, similar to the awsdms_apply_exceptions table used for logging DML application issues. This table captures the failure type and the primary key value for each failed record, or the key range for groups of failed records.

Validation Failure Table Structure

The awsdms_validation_failures table includes the following columns (a sample query against the table follows the list):

  • TaskIdentifier: The unique identifier for the task.
  • TableName: The name of the table being validated.
  • SchemaName: The schema of the table.
  • RuleId: The identifier for the validation rule.
  • StartKey: Primary key for the row or the start of a range, formatted in JSON.
  • EndKey: The end key of the range, applicable only for range records.
  • RecordType: Indicates whether the record is a Row or Range.
  • RecordCount: Total number of records in the range; this does not mean that all of them are out of sync.
  • FailureType: Can be either OutOfSync or CannotCompare.

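To investigate individual failures, you can query the table directly on the target. The following is a minimal sketch that assumes a PostgreSQL target and uses the column names listed above; the exact table and column names can vary by AWS DMS version, so verify them in your target schema first.

$ psql -h target.example.com -U admin -d appdb -c "
    SELECT TableName, SchemaName, RecordType, RecordCount, FailureType
    FROM awsdms_validation_failures
    WHERE FailureType = 'OutOfSync'
    ORDER BY TableName;"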
When encountering mismatched records, it’s crucial to investigate the cause and rectify any issues. Mismatches may arise from various factors:

  1. An update failing to apply to the target due to constraint violations or type conversion issues.
  2. Direct updates to the target database.
  3. Other unknown reasons.

To address OutOfSync records, consider reloading the affected table. If OutOfSync records are not due to known issues, reaching out to the AWS Support team for assistance is recommended.
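A minimal AWS CLI sketch of reloading a single table is shown below; the schema and table names are placeholders. The validate-only reload option re-runs validation without reloading the data, while data-reload reloads the table and then validates it.

$ aws dms reload-tables \
    --replication-task-arn arn:aws:dms:us-west-2:aws-account-id:task:5VXX7BZB5XLUKAYQTTSLZKTISY \
    --tables-to-reload SchemaName=HR,TableName=EMPLOYEES \
    --reload-option data-reload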

Limitations

Be aware of certain limitations when using the AWS DMS data validation feature. For example, validation requires that the table has a primary key or unique index, because rows are compared by key; see the AWS DMS documentation for the full list of restrictions.
