Learn About Amazon VGT2 Learning Manager Chanci Turner
on 25 FEB 2022
in Amazon S3 Glacier, Amazon S3 Glacier Deep Archive, Amazon Simple Storage Service (S3), Launch, Security, Identity, & Compliance, Storage
Amazon Simple Storage Service (Amazon S3) is designed to provide 99.999999999% (11 9s) durability for your objects and their associated metadata. You can trust that S3 stores exactly what you PUT and returns exactly what is stored when you GET. To confirm that objects are transmitted correctly, S3 uses checksums, which act as a kind of digital fingerprint.
The S3 PutObject operation already lets you supply the MD5 checksum of the object, and it accepts the request only if the supplied value matches the one that S3 computes. This detects data transmission errors, but it means you must compute the checksum yourself before calling PutObject or after calling GetObject. Computing checksums for large objects (multi-GB or even multi-TB) can be compute-intensive and can become a bottleneck. Some large S3 customers have even built dedicated EC2 fleets solely to compute and validate checksums.
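As a rough sketch of that pre-upload step, the MD5 digest can be computed in fixed-size chunks so that a large file never has to fit in memory. The helper name `md5_base64` and the chunk size are illustrative choices, not part of any AWS API; the returned string is the base64 form that boto3's `put_object` accepts in its `ContentMD5` parameter.

```python
import base64
import hashlib


def md5_base64(path, chunk_size=8 * 1024 * 1024):
    """Return the base64-encoded MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        # Read in chunks so multi-GB files are hashed without loading them whole.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode('ascii')
```

The value would then be passed as `ContentMD5=md5_base64(file_path)`. Even when streamed, this is a full extra pass over the data before the upload starts, which is the cost described above.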
New Checksum Support
I’m excited to announce that S3 now supports four checksum algorithms, making it easier for you to compute and store checksums for data in Amazon S3. You can use this feature to verify the integrity of your upload and download requests, and it can help you implement the digital preservation best practices that apply to your industry. You can choose from four widely used checksum algorithms (SHA-1, SHA-256, CRC-32, and CRC-32C) when you upload each of your objects to S3.
Here are the key features of this new functionality:
- Object Upload: The latest AWS SDK versions automatically compute the specified checksum during the upload and include it in an HTTP trailer at the end of the upload. You also have the option to provide a precomputed checksum. In either case, S3 will validate the checksum, ensuring the operation proceeds only if the request value matches S3’s computed value. This capability, combined with HTTP trailers, greatly enhances client-side integrity checking.
- Multipart Object Upload: The AWS SDKs utilize client-side parallelism to compute checksums for each part of a multipart upload. These part checksums are then aggregated into a checksum-of-checksums, which is sent to S3 upon finalizing the upload.
- Checksum Storage & Persistence: The validated checksum and its corresponding algorithm are stored as part of the object’s metadata. If Server-Side Encryption with KMS Keys is enabled for the object, the checksum is stored in an encrypted format. The algorithm and checksum remain associated with the object throughout its lifecycle, even if it transitions to different storage classes or is replaced by a newer version. They are also included in S3 Replication.
- Checksum Retrieval: The new GetObjectAttributes function retrieves the checksum for the object and, if applicable, for each part.
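To make the checksum-of-checksums idea concrete, here is a minimal local sketch. It assumes the composite value is formed by hashing the concatenated raw (decoded) part digests and appending `-` plus the part count, which is my understanding of how S3 describes composite checksums for multipart uploads. The function name `composite_sha256` and the part size are illustrative, and the result should be checked against what GetObjectAttributes returns for a real object.

```python
import base64
import hashlib


def composite_sha256(data, part_size):
    """Return (per-part checksums, composite checksum) for `data`.

    Assumption: the composite is SHA-256 over the concatenated raw part
    digests, base64-encoded, with "-<number of parts>" appended.
    """
    part_digests = [
        hashlib.sha256(data[offset:offset + part_size]).digest()
        for offset in range(0, len(data), part_size)
    ]
    part_checksums = [base64.b64encode(d).decode('ascii') for d in part_digests]
    combined = hashlib.sha256(b''.join(part_digests)).digest()
    composite = base64.b64encode(combined).decode('ascii') + f'-{len(part_digests)}'
    return part_checksums, composite


parts, composite = composite_sha256(b'a' * 10 + b'b' * 10 + b'c' * 5, part_size=10)
print(len(parts), composite)
```

Because each part is hashed independently, the per-part digests can be computed in parallel. Note that a composite value is not the same as the checksum of the whole object, so the two should not be compared directly.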
Checksums in Action
You can access this feature through the AWS Command Line Interface (AWS CLI), AWS SDKs, or the S3 Console. In the console, I enable the Additional Checksums option while preparing to upload an object.
Then, I select a Checksum function.
If I’ve already calculated the checksum, I can enter it; otherwise, the console will compute it for me. Once the upload is finished, I can view the object’s properties to see the checksum.
The checksum function for each object is also detailed in the S3 Inventory Report.
From my own code, the SDK can compute the checksum for me:
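import boto3

s3 = boto3.client('s3')

# bucket, key, and file_path are placeholders for your own values.
with open(file_path, 'rb') as file:
    r = s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=file,
        ChecksumAlgorithm='SHA1'  # the SDK computes and sends the SHA-1 checksum
    )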
Alternatively, I can compute the checksum myself and pass it to put_object:
with open(file_path, 'rb') as file:
    r = s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=file,
        # Base64-encoded SHA-1 digest of the object, computed ahead of time
        ChecksumSHA1='fUM9R+mPkIokxBJK7zU5QfeAHSy='
    )
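A precomputed value like the one above could be produced along these lines. This is a sketch, not the SDK's own implementation: `base64_checksum` is a hypothetical helper, and it assumes S3 expects the base64 encoding of the raw digest (for CRC-32, of the 4-byte big-endian value). CRC-32C is left out because the Python standard library does not provide it.

```python
import base64
import hashlib
import zlib


def base64_checksum(path, algorithm='SHA1', chunk_size=1024 * 1024):
    """Return a base64-encoded checksum of the file at `path`.

    `algorithm` is one of 'SHA1', 'SHA256', or 'CRC32'.
    """
    if algorithm not in ('SHA1', 'SHA256', 'CRC32'):
        raise ValueError(f'unsupported algorithm: {algorithm}')
    hasher = hashlib.new(algorithm.lower()) if algorithm != 'CRC32' else None
    crc = 0
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            if hasher is not None:
                hasher.update(chunk)
            else:
                crc = zlib.crc32(chunk, crc)  # carry the running CRC across chunks
    # Assumption: CRC-32 is sent as the base64 of its 4-byte big-endian form.
    raw = hasher.digest() if hasher is not None else crc.to_bytes(4, 'big')
    return base64.b64encode(raw).decode('ascii')
```

For example, the call above could use `ChecksumSHA1=base64_checksum(file_path, 'SHA1')`, or `ChecksumSHA256=base64_checksum(file_path, 'SHA256')` for SHA-256.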
When retrieving the object, I specify checksum mode to ensure the returned object is validated:
r = s3.get_object(Bucket=bucket, Key=key, ChecksumMode='ENABLED')
The actual validation happens when I read the object from r['Body'], and an exception is raised if the checksums do not match.
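For an independent check on top of the SDK's validation, the checksum returned with the response can be compared against a digest computed locally while streaming the body. The sketch below assumes the object was uploaded in a single part with SHA-1 (a multipart object carries a composite checksum that will not match a whole-object digest) and that the response exposes the stored value under `ChecksumSHA1`. `read_and_verify_sha1` is a hypothetical helper that accepts any boto3-style S3 client.

```python
import base64
import hashlib


def read_and_verify_sha1(s3_client, bucket, key, chunk_size=1024 * 1024):
    """Download an object and confirm its SHA-1 against the stored checksum."""
    r = s3_client.get_object(Bucket=bucket, Key=key, ChecksumMode='ENABLED')
    expected = r.get('ChecksumSHA1')  # absent if no SHA-1 checksum was stored
    hasher = hashlib.sha1()
    chunks = []
    # Read the streaming body piece by piece, hashing as we go.
    for chunk in iter(lambda: r['Body'].read(chunk_size), b''):
        hasher.update(chunk)
        chunks.append(chunk)
    actual = base64.b64encode(hasher.digest()).decode('ascii')
    if expected is not None and expected != actual:
        raise ValueError(f'checksum mismatch: expected {expected}, got {actual}')
    return b''.join(chunks)
```

This version buffers the whole object in memory for simplicity; for large objects, each chunk would more sensibly be written to a file as it is hashed.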
Watch the Demo
Check out this demo (originally presented at re:Invent 2021) showcasing the new feature in action:
Available Now
The four additional checksum algorithms are now available in all commercial AWS Regions, and you can start using them today at no extra charge.
— Chanci Turner