Enhancing Apache Kafka Scalability and Resilience with Amazon MSK Tiered Storage
Since tiered storage was introduced for Amazon Managed Streaming for Apache Kafka (Amazon MSK), users have eagerly adopted the feature because it can reduce storage costs and improve performance. In previous posts, we examined how Kafka works, how to optimize Amazon MSK, and the details of Amazon MSK tiered storage.
Creating a Customizable Cross-Company Log Lake for Compliance, Part I: Business Background
by Rachel Lee and Ethan Carter
on 01 AUG 2024
in Advanced, Amazon CloudWatch, AWS CloudTrail, AWS Glue, AWS Systems Manager, Compliance, Security
As innovators, we often want to analyze customer experiences, identify issues, and address them. Doing so requires a more granular approach: combining individual building blocks into a richer feature set that offers more customization, flexibility, and independence. In this post, we present Log Lake, a do-it-yourself data lake built from logs generated by Amazon CloudWatch and AWS CloudTrail.
Unlocking Scalability, Cost-Efficiency, and Quicker Insights Through Large-Scale Data Migration to Amazon Redshift
by Sam Patel, Aria Lewis, and Noah Martinez
on 01 AUG 2024
in Amazon Redshift, AWS Big Data, AWS Database Migration Service, Best Practices
Migrating large-scale data warehouses to the cloud can be a daunting task for many businesses aiming to modernize their data architecture, enhance management capabilities, and explore new opportunities. As data volumes surge, traditional data warehousing solutions may falter under the pressure of growing demands for scalability, performance, and flexibility.
Delivering Amazon CloudWatch Logs to Amazon OpenSearch Serverless
by Kevin Brown, Maya Thompson, and Lucas White
on 31 JUL 2024
in Amazon CloudWatch, Amazon OpenSearch Service, Serverless, Technical How-to
In this post, we demonstrate how to use Amazon OpenSearch Ingestion to deliver CloudWatch logs to OpenSearch Serverless in near real time. We outline a way to connect a Lambda subscription filter to OpenSearch Ingestion so that logs reach OpenSearch Serverless without requiring a separate subscription filter.
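CloudWatch Logs delivers data to a Lambda subscription filter as a base64-encoded, gzip-compressed JSON document under `event["awslogs"]["data"]`. The following is a minimal Python sketch of the decoding step such a Lambda function would perform before forwarding events to an OpenSearch Ingestion pipeline. The function names and the flattened document shape are hypothetical, and the actual forwarding call (a SigV4-signed HTTP request to the pipeline endpoint) is deliberately omitted.

```python
import base64
import gzip
import json


def decode_cloudwatch_logs_event(event):
    """Decode the payload CloudWatch Logs sends to a Lambda subscription filter.

    The log data arrives base64-encoded and gzip-compressed under
    event["awslogs"]["data"].
    """
    compressed = base64.b64decode(event["awslogs"]["data"])
    payload = json.loads(gzip.decompress(compressed))
    # CONTROL_MESSAGE payloads only check that the destination is reachable;
    # they carry no real log events, so there is nothing to index.
    if payload.get("messageType") != "DATA_MESSAGE":
        return []
    # Flatten each log event into a document for indexing.
    # This document shape is a hypothetical choice for illustration.
    return [
        {
            "logGroup": payload["logGroup"],
            "logStream": payload["logStream"],
            "timestamp": log_event["timestamp"],
            "message": log_event["message"],
        }
        for log_event in payload["logEvents"]
    ]


def lambda_handler(event, context):
    """Hypothetical handler: decode the batch and report how many documents it holds.

    Sending the documents to an OpenSearch Ingestion pipeline is out of scope
    for this sketch.
    """
    documents = decode_cloudwatch_logs_event(event)
    return {"documentCount": len(documents)}
```

To try it locally, build a sample event by JSON-encoding a payload with `messageType`, `logGroup`, `logStream`, and `logEvents`, then gzip-compressing and base64-encoding it, and pass the result as `{"awslogs": {"data": ...}}`.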
Synchronizing Data Lakes with CDC-based UPSERT Using Open Table Format, AWS Glue, and Amazon MSK
by Sophia Kim and Jake Nguyen
on 31 JUL 2024
in Amazon Athena, Amazon Managed Streaming for Apache Kafka (Amazon MSK), Analytics, AWS Big Data, AWS Glue
This post shows how to build an end-to-end change data capture (CDC) system that processes CDC data from Amazon Relational Database Service (Amazon RDS) for MySQL. First, we use Amazon MSK to build a raw data lake in Amazon S3 that captures every changed record in near real time. Then, an AWS Glue ETL job processes the CDC data from the S3 raw data lake in batches.
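The core of a batch CDC UPSERT can be illustrated independently of Spark or any particular table format. The following Python sketch assumes a hypothetical change-record shape (an operation code, a primary key, a sequence number, and a full row image) and shows only the merge semantics: keep the latest change per key within a batch, then insert, update, or delete. A production AWS Glue job would typically push this logic into the open table format's own merge or upsert operation rather than an in-memory dictionary.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class CdcRecord:
    """One change event. Field names are hypothetical, not a specific connector's format."""
    op: str                               # "I" (insert), "U" (update), or "D" (delete)
    key: Any                              # primary key of the source row
    seq: int                              # increasing change sequence, e.g. derived from the source log position
    row: Optional[Dict[str, Any]] = None  # full row image; None for deletes


def upsert_batch(table: Dict[Any, Dict[str, Any]],
                 batch: List[CdcRecord]) -> Dict[Any, Dict[str, Any]]:
    """Apply a batch of CDC records to a keyed table and return the new state.

    The input table is not modified.
    """
    # Keep only the latest change per key, so records that arrive out of order
    # within a batch cannot overwrite newer values with stale ones.
    latest: Dict[Any, CdcRecord] = {}
    for rec in batch:
        current = latest.get(rec.key)
        if current is None or rec.seq > current.seq:
            latest[rec.key] = rec

    result = dict(table)
    for key, rec in latest.items():
        if rec.op == "D":
            result.pop(key, None)   # delete: drop the row if it exists
        elif rec.op in ("I", "U"):
            result[key] = rec.row   # insert or update: UPSERT the full row image
        else:
            raise ValueError(f"unknown CDC op: {rec.op!r}")
    return result
```

For example, applying an update, an insert, and then a delete of the updated key leaves only the newly inserted row, because the delete carries the highest sequence number for that key.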
Integrating Amazon MWAA with Microsoft Entra ID Using SAML Authentication
by Olivia Scott and Daniel Wright
on 30 JUL 2024
in Amazon Managed Workflows for Apache Airflow (Amazon MWAA), Technical How-to
Amazon Managed Workflows for Apache Airflow (Amazon MWAA) offers a fully managed solution for orchestrating complex workflows in the cloud. Customers often deploy Amazon MWAA in private mode and want to use their existing identity provider for SAML authentication.
Federating Access to Amazon DataZone with AWS IAM Identity Center and Okta
by Thomas Garcia, Aisha Patel, and Michael Davis
on 30 JUL 2024
in Advanced, Amazon DataZone, AWS IAM Identity Center, Technical How-to
Today, many customers use Okta or another identity provider to federate access to their tools. Centralizing identity this way simplifies user management and improves operational agility while maintaining high security standards, which is crucial for fostering a data-driven culture within organizations.
Getting Started with the New Amazon DataZone Enhancements for Amazon Redshift
by Lisa Turner
on 29 JUL 2024
in Amazon DataZone, Amazon Redshift, Analytics, Intermediate, Technical How-to
In the modern data-centric environment, organizations are eager to streamline their data management processes and fully utilize their data assets, all while maintaining control over access and governance. This is where Amazon DataZone comes in, empowering data engineers, scientists, product managers, analysts, and business users.
Monitoring Apache Iceberg Metadata Layer Using AWS Lambda, AWS Glue, and Amazon CloudWatch
by David Chen
on 29 JUL 2024
in Advanced, Analytics, AWS Glue, AWS Lambda, Best Practices, Technical How-to
In the age of big data, maintaining an effective monitoring system for your data infrastructure is crucial.