Amazon Onboarding with Learning Manager Chanci Turner

Amazon Aurora is a fully managed relational database service that combines the performance, scalability, and availability of high-end commercial databases with the simplicity and cost-effectiveness of open-source solutions. The Amazon Aurora MySQL-Compatible Edition is wire-compatible with MySQL, making it an appealing option for businesses already utilizing MySQL technology.

Storage management in Amazon Aurora MySQL works differently from storage management in traditional MySQL databases. This article describes the types of storage available in Aurora MySQL, how each type is used, and how to monitor storage consumption effectively. We also cover database queries and Amazon CloudWatch metrics that can help you estimate Aurora storage costs.

Types of Storage

Amazon Aurora employs two types of storage:

Cluster volume storage – This shared storage layer is distributed across three Availability Zones within an AWS Region, ensuring durability, fault tolerance, redundancy, and high availability. It stores InnoDB tables and indexes, database metadata, persistent objects such as functions and procedures, and other data like binary logs and relay logs.
Local storage – Each Aurora MySQL instance in the cluster has local storage volumes backed by Amazon Elastic Block Store (Amazon EBS). These local volumes hold non-persistent temporary files and non-InnoDB temporary tables, are used for sorting large datasets, and store engine logs such as the error, audit, and general logs. For details, see Temporary storage limits for Aurora MySQL.
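The split between the two storage types can be summarized as a small lookup table. This is a minimal Python sketch; the category keys are illustrative labels taken from the list above, not identifiers that Aurora exposes.

```python
# Illustrative mapping of data categories to the Aurora storage type that
# holds them, based on the two storage types described above. The keys are
# hypothetical labels, not names exposed by Aurora or MySQL.
CLUSTER_VOLUME = "cluster volume (shared, spans three AZs)"
LOCAL_STORAGE = "local storage (per instance, EBS-backed)"

STORAGE_FOR = {
    "innodb_table": CLUSTER_VOLUME,
    "innodb_index": CLUSTER_VOLUME,
    "metadata": CLUSTER_VOLUME,
    "stored_routine": CLUSTER_VOLUME,  # functions and procedures
    "binary_log": CLUSTER_VOLUME,
    "relay_log": CLUSTER_VOLUME,
    "temporary_file": LOCAL_STORAGE,
    "non_innodb_temp_table": LOCAL_STORAGE,
    "sort_file": LOCAL_STORAGE,  # sorting large datasets
    "error_log": LOCAL_STORAGE,
    "audit_log": LOCAL_STORAGE,
    "general_log": LOCAL_STORAGE,
}


def storage_type(category: str) -> str:
    """Return the storage type that holds the given data category."""
    try:
        return STORAGE_FOR[category]
    except KeyError:
        raise ValueError(f"unknown data category: {category!r}") from None


if __name__ == "__main__":
    for cat in ("binary_log", "sort_file"):
        print(f"{cat}: {storage_type(cat)}")
```

The main practical takeaway is that anything in the first group survives an instance failure and is billed as cluster volume storage, while the second group is per instance and temporary.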

In the following sections, we discuss common sources of storage utilization in an Aurora MySQL cluster and how to use database metrics and metadata to monitor storage usage.

User Tables, Indexes, and Tablespaces

User tables and indexes consume a significant share of the persistent storage space in any relational database system. MySQL uses storage engines to handle SQL operations for different table types, and InnoDB is the default, general-purpose engine. InnoDB is also the only storage engine that Amazon Aurora MySQL supports for persistent database tables, so this article focuses on storage utilization by the InnoDB storage engine.

In traditional MySQL, InnoDB tables are stored in data files called tablespaces, which have the ".ibd" file extension. Although Amazon Aurora doesn't use traditional files and block-based file systems to store InnoDB data, the core concepts still apply. Aurora keeps the idea of InnoDB tablespaces, but these tablespaces exist as objects within Aurora's purpose-built storage volume rather than as files on a block storage device.

By default, the InnoDB storage engine stores each table in its own file-per-table tablespace; this behavior is controlled by the innodb_file_per_table parameter. When the parameter is ON:

Each table is assigned its own tablespace, analogous to an .ibd file in traditional MySQL.
When a tablespace is deleted, the freed database pages become available for reuse.
Aurora can dynamically reclaim these freed pages over time, shrinking the cluster volume and reducing storage costs.

Several database operations, such as dropping, truncating, or optimizing tables, can remove tablespaces and free pages for Aurora to reclaim. This also applies to table partitions, because each partition has its own tablespace. However, the volume size doesn't decrease immediately after these operations. Instead, Aurora reclaims free space gradually in the background, at a rate of up to 10 TB per day. For details, see Storage scaling.
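Because reclamation runs at a rate of up to 10 TB per day, a quick calculation shows the earliest point at which freed space could disappear from the volume size. This minimal sketch assumes the maximum rate; the real rate can be lower, so the result is a lower bound on elapsed time rather than a prediction. The function name is hypothetical.

```python
MAX_RECLAIM_TB_PER_DAY = 10.0  # "up to 10 TB per day", per the text above


def min_days_to_reclaim(freed_tb: float,
                        rate_tb_per_day: float = MAX_RECLAIM_TB_PER_DAY) -> float:
    """Lower bound on the days needed to reclaim `freed_tb` TB of freed pages."""
    if freed_tb < 0 or rate_tb_per_day <= 0:
        raise ValueError("freed_tb must be >= 0 and rate_tb_per_day must be > 0")
    return freed_tb / rate_tb_per_day


if __name__ == "__main__":
    # Dropping tables that together freed 25 TB: at least 2.5 days.
    print(min_days_to_reclaim(25.0))
```

For most workloads the freed amount is far below 10 TB, so the practical delay is short; the calculation matters mainly after very large cleanups.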

When innodb_file_per_table is set to OFF, the behavior changes:

Tables do not have individual tablespaces; instead, their data resides within the system tablespace.
Actions like dropping or truncating a table will free pages within the system tablespace, but will not reduce the system tablespace size. Consequently, Aurora’s dynamic volume resizing cannot reclaim space occupied by those pages.
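To make the difference between the two settings concrete, here is a toy Python model of what happens to freed pages when a table is dropped. It is a deliberate simplification: pages are counted abstractly, reuse of free pages is not modeled, and the class and method names are hypothetical. It does not reflect Aurora internals.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVolume:
    """Hypothetical, simplified model of how dropped tables free space."""
    file_per_table: bool = True
    tables: dict = field(default_factory=dict)  # table name -> data pages
    system_tablespace_pages: int = 0  # size of the system tablespace
    free_system_pages: int = 0        # freed pages reusable inside it
    reclaimable_pages: int = 0        # pages volume resizing could reclaim

    def create_table(self, name: str, pages: int) -> None:
        self.tables[name] = pages
        if not self.file_per_table:
            # No own tablespace: the data lives in the system tablespace,
            # which grows to hold it.
            self.system_tablespace_pages += pages

    def drop_table(self, name: str) -> None:
        pages = self.tables.pop(name)
        if self.file_per_table:
            # The table's tablespace is removed; its pages can be reclaimed.
            self.reclaimable_pages += pages
        else:
            # Pages are freed inside the system tablespace, but the
            # tablespace keeps its size, so nothing becomes reclaimable.
            self.free_system_pages += pages


if __name__ == "__main__":
    for setting in (True, False):
        vol = ToyVolume(file_per_table=setting)
        vol.create_table("orders", 1000)
        vol.drop_table("orders")
        print(f"file_per_table={setting}: reclaimable={vol.reclaimable_pages}, "
              f"system tablespace={vol.system_tablespace_pages} pages")
```

With the setting ON, all 1,000 pages become reclaimable; with it OFF, the system tablespace stays at 1,000 pages even though the table is gone.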

To see how much space tablespaces use, query the INFORMATION_SCHEMA.FILES table, which provides metadata for InnoDB tablespace types, including file-per-table tablespaces and the system tablespace. For example, the following query lists the 10 largest tablespaces and their sizes in GB:

-- List the 10 largest tablespaces by allocated size (extents * extent size), in GB
SELECT FILE_NAME,
       TABLESPACE_NAME,
       ROUND((TOTAL_EXTENTS * EXTENT_SIZE) / 1024 / 1024 / 1024, 4) AS SIZE_GB
FROM INFORMATION_SCHEMA.FILES
ORDER BY SIZE_GB DESC
LIMIT 10;
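To run the same query from a monitoring script, one option is to pass it through any PEP 249 (DB-API 2.0) cursor, such as one created by a MySQL driver. In this sketch the function name top_tablespaces is hypothetical, and the choice of driver and connection is an assumption left to the caller; the SQL is the query shown above with the row limit made configurable.

```python
QUERY = """
SELECT FILE_NAME,
       TABLESPACE_NAME,
       ROUND((TOTAL_EXTENTS * EXTENT_SIZE) / 1024 / 1024 / 1024, 4) AS SIZE_GB
FROM INFORMATION_SCHEMA.FILES
ORDER BY SIZE_GB DESC
LIMIT {limit}
"""


def top_tablespaces(cursor, limit: int = 10):
    """Return (FILE_NAME, TABLESPACE_NAME, SIZE_GB) rows for the largest tablespaces.

    `cursor` is any DB-API 2.0 cursor connected to the Aurora MySQL cluster
    (hypothetical helper; driver choice is up to the caller).
    """
    # The limit is formatted in as an int rather than bound as a parameter,
    # because placeholder syntax varies between DB-API drivers; int() keeps
    # arbitrary text out of the SQL.
    cursor.execute(QUERY.format(limit=int(limit)))
    return list(cursor.fetchall())
```

Keeping the SQL text identical to the interactive query makes it easy to compare script output with what you see in a MySQL client.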

This query works on both Amazon Aurora MySQL version 2 (compatible with MySQL 5.7) and version 3 (compatible with MySQL 8.0). Keep in mind that tablespaces have a minimum size even when they are empty: when innodb_file_per_table is ON, even an empty table or partition occupies a small amount of storage, typically a few megabytes. Unless you plan to manage tens of millions of tables in a single Aurora cluster, this is generally not a concern, and we recommend keeping the default ON setting for innodb_file_per_table whenever possible.
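The note about minimum tablespace size is easier to weigh with some arithmetic. This sketch multiplies a table (or partition) count by an assumed per-tablespace floor; the 4 MiB default is a hypothetical stand-in for "a few megabytes", not a documented Aurora figure.

```python
def empty_tablespace_overhead_gib(num_tablespaces: int,
                                  min_size_mib: float = 4.0) -> float:
    """Estimated storage (GiB) consumed by tablespaces that hold no rows.

    min_size_mib is an assumed per-tablespace floor; check
    INFORMATION_SCHEMA.FILES on your own cluster for the real value.
    """
    return num_tablespaces * min_size_mib / 1024


if __name__ == "__main__":
    for count in (10_000, 1_000_000, 20_000_000):
        gib = empty_tablespace_overhead_gib(count)
        print(f"{count:>12,} empty tables -> {gib:,.1f} GiB")
```

Under this assumption, ten thousand empty tables cost only tens of GiB, while tens of millions reach tens of TiB, which is why the overhead only matters at that extreme.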

We also recommend using the INFORMATION_SCHEMA.FILES table, rather than relying solely on INFORMATION_SCHEMA.TABLES, to calculate the storage space used by tables, indexes, and schemas. INFORMATION_SCHEMA.TABLES may contain outdated cached statistics unless the tables have been analyzed recently. The information_schema_stats_expiry system variable (applicable to Aurora MySQL version 3) defines how long cached statistics are kept before they expire; the default is 86,400 seconds (24 hours). To refresh the cached values for a specific table, run ANALYZE TABLE and then check the statistics in INFORMATION_SCHEMA.TABLES. The accuracy of the refreshed statistics depends on the configuration of the innodb_stats_persistent and innodb_stats_transient_sample_pages parameters.
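Because file-per-table tablespaces are named after the schema and table they belong to, rows from INFORMATION_SCHEMA.FILES can be rolled up into per-table and per-schema totals. This sketch assumes the rows were already fetched as (TABLESPACE_NAME, bytes) pairs, where bytes is TOTAL_EXTENTS * EXTENT_SIZE, and assumes the MySQL naming convention schema/table, with a #p# (MySQL 8.0) or #P# (MySQL 5.7) suffix for partitions. The function names and sample sizes are hypothetical.

```python
import re
from collections import defaultdict

# Partition suffix in file-per-table tablespace names, e.g. "db/t1#p#p0".
# Assumed naming convention; matched case-insensitively to cover #P#.
_PARTITION_SUFFIX = re.compile(r"#p#.*$", re.IGNORECASE)


def size_by_table(rows):
    """Sum (tablespace_name, size_bytes) rows into {schema/table: bytes}."""
    totals = defaultdict(int)
    for name, size_bytes in rows:
        totals[_PARTITION_SUFFIX.sub("", name)] += int(size_bytes)
    return dict(totals)


def size_by_schema(rows):
    """Sum rows into {schema: bytes}.

    Names without a "/" (for example the system tablespace) are kept as-is.
    Special characters in schema names stay in their encoded form.
    """
    totals = defaultdict(int)
    for name, size_bytes in rows:
        schema = name.split("/", 1)[0] if "/" in name else name
        totals[schema] += int(size_bytes)
    return dict(totals)


if __name__ == "__main__":
    sample = [  # made-up sizes for illustration
        ("sales/orders#p#p2023", 3 * 1024**3),
        ("sales/orders#p#p2024", 5 * 1024**3),
        ("sales/customers", 1 * 1024**3),
        ("innodb_system", 12 * 1024**2),
    ]
    for schema, total in size_by_schema(sample).items():
        print(f"{schema}: {total / 1024**3:.3f} GiB")
```

Working from allocated extents avoids the stale-statistics problem described above, at the cost of reporting allocated rather than logically used space.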

Temporary Tables and Temporary Tablespaces

Before discussing temporary tablespaces, it helps to understand what temporary tables are, how they are used, and how Amazon Aurora MySQL version 2 and version 3 handle them differently.

In Aurora MySQL, there are two categories of temporary tables:

Internal (or implicit) temporary tables – Created automatically by the database engine for operations such as sorting, aggregation, derived tables, and common table expressions (CTEs). Users have no direct control over these tables. For details, see Internal Temporary Table Use in MySQL in the MySQL 5.7 and MySQL 8.0 reference manuals.
User-created (or explicit) temporary tables – Created explicitly by users, with the CREATE TEMPORARY TABLE statement, for specific use cases.
