Amazon Onboarding with Learning Manager Chanci Turner

In December 2024, AWS launched Amazon CloudWatch Database Insights for Amazon Aurora (PostgreSQL-compatible and MySQL-compatible) and for Amazon RDS for PostgreSQL, MySQL, MariaDB, SQL Server, and Oracle. It provides database observability aimed at DevOps engineers, application developers, and database administrators, with the goal of streamlining troubleshooting and improving operational efficiency across database fleets.

In this article, we will explore how to leverage CloudWatch Database Insights for troubleshooting RDS and Aurora resources. Our goal is to empower you with the knowledge and skills to tackle complex database issues with ease.

Solution Overview

We will illustrate three scenarios that reflect common troubleshooting challenges encountered by database administrators. These demonstrations will enhance your understanding of how to effectively navigate and utilize CloudWatch Database Insights. Before we dive into these scenarios, we recommend reviewing the article “New Amazon CloudWatch Database Insights: Comprehensive database observability from fleets to instances” for an introduction to CloudWatch Database Insights.

Prerequisites

To access CloudWatch Database Insights in your Management Console, you must first enable this feature. Activate the Advanced Mode of CloudWatch Database Insights for your RDS DB instances or Aurora DB clusters. Please note that there is a cost associated with enabling this feature, so review the CloudWatch pricing page for more information.

CloudWatch Database Insights operates in two modes: Standard Mode, which provides basic database metrics, and Advanced Mode, which offers enhanced features such as execution plan and lock analysis for SQL queries, query statistics, fleet-wide monitoring dashboards, and the ability to view Amazon RDS events within CloudWatch. For a detailed comparison, see the Modes for Database Insights documentation.
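Advanced Mode can also be enabled programmatically rather than through the console. The following is a minimal sketch using boto3 for an Aurora DB cluster. It assumes that Advanced Mode requires Performance Insights to be enabled with a 465-day (15-month) retention period; the cluster identifier `my-aurora-cluster` is a placeholder, not a name from this article. Check the Modes for Database Insights documentation before relying on these values.

```python
def advanced_mode_params(cluster_id: str) -> dict:
    """Build keyword arguments for the RDS ModifyDBCluster call.

    Assumption: Advanced Mode needs Performance Insights enabled with a
    465-day retention period. Verify against the current AWS documentation.
    """
    return {
        "DBClusterIdentifier": cluster_id,
        "DatabaseInsightsMode": "advanced",
        "EnablePerformanceInsights": True,
        "PerformanceInsightsRetentionPeriod": 465,
    }


def enable_advanced_mode(cluster_id: str) -> dict:
    """Apply the settings. Requires boto3 and AWS credentials; incurs cost."""
    import boto3  # imported lazily so the parameter builder works without boto3

    rds = boto3.client("rds")
    return rds.modify_db_cluster(**advanced_mode_params(cluster_id))


if __name__ == "__main__":
    # "my-aurora-cluster" is a hypothetical identifier used for illustration.
    print(advanced_mode_params("my-aurora-cluster"))
```

Keeping the parameter builder separate from the API call makes it possible to review the settings before anything is changed in the account.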

Accessing the CloudWatch Database Insights Console

You can access CloudWatch Database Insights through the Amazon CloudWatch console:

  1. In the CloudWatch console navigation pane, select Insights, then Database Insights.
  2. Choose “Database Instance” on the left panel.
  3. Select the RDS instance you wish to investigate further.

Demo #1: Query Performance Issue Due to a Missing Index

CloudWatch Database Insights can assist in identifying inefficient queries and offer tuning suggestions. In this scenario, we will showcase a query suffering from slow execution due to the absence of an appropriate index in the table.

Our test environment consists of an Aurora PostgreSQL writer instance (db.t3.medium) with a sample dataset. The test table, employees, holds 52 million rows and is roughly 10 GB in size. The application using this database is seeing increased response times. We start with the CloudWatch Database Insights Fleet Health Dashboard, where one of our monitored Aurora PostgreSQL instances shows high DB load, marked by a red hexagon.

Clicking the red hexagon provides further details about the instance, including the DB instance name, the DB load utilization metric, and other metadata such as engine version and compute configuration. The Fleet Health Dashboard not only alerts us to high load but also lists the top queries contributing to it.

To analyze this query further, we select Database Instance on the left panel under Database Views. Here, the database load is shown as Average Active Sessions (AAS). AAS rises above the dashed line that marks the instance's vCPU count, and most of the active sessions are on CPU, shown in green in the graph. More sessions are competing for CPU than the instance has vCPUs, so the workload exceeds what this instance can handle.
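The comparison between AAS and the vCPU line can be expressed as a small calculation. The sketch below is illustrative only: the sample values are hypothetical, not readings from this test, and the vCPU count of 2 is the size of a t3.medium instance.

```python
from statistics import mean


def summarize_db_load(aas_samples: list[float], vcpus: int) -> dict:
    """Summarize Average Active Sessions (AAS) samples against a vCPU count."""
    if not aas_samples:
        raise ValueError("at least one AAS sample is required")
    average = mean(aas_samples)
    return {
        "average_aas": average,
        "peak_aas": max(aas_samples),
        # Number of samples in which sessions outnumbered vCPUs.
        "samples_over_vcpu": sum(1 for s in aas_samples if s > vcpus),
        # Sustained load above the vCPU line suggests CPU contention.
        "cpu_saturated": average > vcpus,
    }


if __name__ == "__main__":
    # Hypothetical AAS readings for illustration only.
    samples = [1.2, 3.8, 4.5, 4.1, 3.9]
    print(summarize_db_load(samples, vcpus=2))
```

A single spike above the line is usually less of a concern than a sustained average above it, which is why the sketch reports both the peak and the average.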

The CPU and IO:DataFilePrefetch wait events account for most of the load. Understanding wait events, which vary by database engine, is crucial for diagnosing workload issues. According to the wait event documentation, the CPU wait event means that a session is either active on CPU or waiting for CPU. We can scroll down to pinpoint the query responsible for this issue.

From our earlier observations on the Fleet Health Dashboard, we confirm that the increase in load is driven by a read query. For certain engines within CloudWatch Database Insights, you can access the query’s execution plan. Upon reviewing the execution plan for the inefficient query, we find it has an exceptionally high cost of 1,483,029 and an execution time of 139,339 milliseconds. It is sorting through nearly 13 million rows just to retrieve a set of 100 rows.

To further validate this query’s poor performance, we can check the statement-level metrics below. Using the settings icon in the top right of the “Top SQL” tab, we select additional metrics for each query, including Storage blk read time (ms)/call and Read time (ms)/call. These metrics show how long each query spends reading blocks from disk, both in total and per execution, during the problematic time range. For more details, see the documentation on SQL statistics for Aurora PostgreSQL.

This particular query takes over 143 seconds per call, and, as indicated by Storage blk read time (ms)/call, it spends most of its time performing block reads from storage. By selecting “SQL metrics,” we can analyze historical data for these metrics over time for the specific SQL query, revealing how many calls were made per second for the offending read query.
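To make the per-call reasoning concrete, the sketch below derives per-call figures from cumulative counters and computes the share of execution time spent on block reads. It assumes the per-call metrics are ratios of cumulative time to call count, in the style of pg_stat_statements; the counter values are hypothetical and chosen only so that the per-call time lands near the 143 seconds mentioned above.

```python
from dataclasses import dataclass


@dataclass
class StatementCounters:
    """Cumulative counters for one SQL statement (hypothetical values)."""
    calls: int
    total_exec_ms: float
    blk_read_ms: float


def per_call_metrics(counters: StatementCounters) -> dict:
    """Derive per-call timings and the fraction of time spent reading blocks."""
    if counters.calls <= 0:
        raise ValueError("calls must be positive")
    exec_per_call = counters.total_exec_ms / counters.calls
    read_per_call = counters.blk_read_ms / counters.calls
    return {
        "exec_ms_per_call": exec_per_call,
        "blk_read_ms_per_call": read_per_call,
        # A value close to 1.0 means the statement is dominated by storage reads.
        "read_share": read_per_call / exec_per_call,
    }


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the article's test.
    sample = StatementCounters(calls=4, total_exec_ms=572_000.0, blk_read_ms=520_000.0)
    print(per_call_metrics(sample))
```

A high read share points toward reducing the number of blocks the query touches, for example with an index, rather than toward adding CPU.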

Scrolling further down, we can view all metrics collected for the SQL query, including the storage blk read metrics, which are alarmingly high for this query. Based on the information gathered, we conclude that adding an index on the salary column for the employees table would be beneficial, as the query uses a predicate on this column.

Using DBeaver, we can check the existing indexes on this table. Currently, it has only a primary key and no secondary indexes. We add an index on the salary column and observe whether this alleviates the load on our Aurora PostgreSQL instance.
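The exact statement used in this test is not reproduced here. As a rough sketch, the index check and the index creation might look like the following. The index name `idx_employees_salary` and the use of `CONCURRENTLY` are assumptions for illustration; only the table name employees and the column name salary come from the scenario.

```python
import re

# Query against the standard pg_indexes view; the table name is passed
# as a bound parameter by whichever PostgreSQL driver is in use.
LIST_INDEXES_SQL = "SELECT indexname, indexdef FROM pg_indexes WHERE tablename = %s;"

_IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]*$")


def create_index_sql(table: str, column: str, concurrently: bool = True) -> str:
    """Build a CREATE INDEX statement for a single-column B-tree index.

    The index name follows a hypothetical idx_<table>_<column> convention.
    """
    for name in (table, column):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    keyword = "CONCURRENTLY " if concurrently else ""
    return f"CREATE INDEX {keyword}idx_{table}_{column} ON {table} ({column});"


if __name__ == "__main__":
    print(LIST_INDEXES_SQL)
    print(create_index_sql("employees", "salary"))
    # prints: CREATE INDEX CONCURRENTLY idx_employees_salary ON employees (salary);
```

Building the index with `CONCURRENTLY` avoids blocking writes on a large table while the index is created, at the cost of a slower build. Note that PostgreSQL does not allow `CREATE INDEX CONCURRENTLY` inside a transaction block, so the session must run it in autocommit mode.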

After adding the index, we see a significant decrease in DB load utilization on our Fleet Health Dashboard. This confirms that the missing index was the main cause of the high load in this scenario.
