Amazon IXD – VGT2 Las Vegas

Resolve private DNS hostnames for Amazon MSK Connect

Amazon MSK Connect provides a fully managed environment for Apache Kafka Connect within AWS. This feature allows users to deploy fully managed connectors designed for Kafka Connect, facilitating the movement of data into or out of popular data storage solutions like Amazon S3 and Amazon DynamoDB.

Migrate data from Azure Blob Storage to Amazon S3 using AWS Glue

by John Smith, Emily Johnson, and Michael Brown
on 20 OCT 2023
in Announcements, AWS Glue

In this article, we walk through migrating data from Azure Blob Storage to Amazon S3 and show how the new AWS Glue connector for Azure Blob Storage works. We outline the steps to set it up, including prerequisites, subscribing to the connector via AWS Marketplace, and creating and running AWS Glue for Apache Spark jobs. We also highlight significant differences from the Azure Data Lake Storage Gen2 connector.

SmugMug’s durable search pipelines for Amazon OpenSearch Service

by Greg White and Sarah Green
on 19 OCT 2023
in Advanced (300), Amazon OpenSearch Service, Analytics, Customer Solutions, Serverless, Technical How-to

SmugMug operates two extensive online photo platforms, SmugMug and Flickr, allowing over 100 million users to securely store, search, share, and sell countless photos. Users expect to search through decades of images, which has made search essential infrastructure. That infrastructure has grown steadily since SmugMug adopted Amazon CloudSearch in 2012.

Load data incrementally from transactional data lakes to data warehouses

by Kevin Lee
on 19 OCT 2023
in Advanced (300), AWS Glue

Data lakes and data warehouses are foundational technologies in modern data architecture. Data lakes accommodate all types of organizational data, irrespective of format or structure. Widely used open table formats such as Apache Hudi, Delta Lake, and Apache Iceberg are integral to building data lakes.

Enhance your security posture by storing Amazon Redshift admin credentials without human intervention using AWS Secrets Manager integration

by Anna Taylor, Mark Wilson, and Lucas Harris
on 18 OCT 2023
in Amazon Redshift, Analytics, AWS Secrets Manager, Expert (400)

Amazon Redshift is a fully managed data warehousing solution that scales from hundreds of gigabytes to petabytes. Today, a diverse range of AWS customers, from Fortune 500 companies to startups, rely on Amazon Redshift for their critical business intelligence (BI) dashboards.

Migrate Microsoft Azure Synapse Analytics to Amazon Redshift using AWS SCT

by David Clark, Rebecca Adams, and Tom Scott
on 18 OCT 2023
in Amazon Redshift, Analytics, AWS Schema Conversion Tool, Enterprise Strategy, Migration, Technical How-to

This article demonstrates the process of migrating a data warehouse from Microsoft Azure Synapse to Redshift Serverless using AWS Schema Conversion Tool (AWS SCT) and its data extraction agents. AWS SCT simplifies heterogeneous database migrations by automatically converting source database code and storage objects into a compatible format for the target database.

Run Apache Hive workloads using Spark SQL with Amazon EMR on EKS

by Jessica Lee and Chris Evans
on 18 OCT 2023
in Advanced (300), Amazon EMR, Amazon EMR on EKS, Technical How-to

Apache Hive is a distributed data warehousing system that facilitates analytics at scale. Spark SQL is a module of Apache Spark designed for structured data processing. Running Hive workloads with Spark SQL offers the simplicity of SQL-like queries while also taking advantage of Spark's speed and performance.

Unleash the power of Snapshot Management to take automated snapshots using Amazon OpenSearch Service

by Rachel Young, Andy Martinez, and Daniel Kim
on 18 OCT 2023
in Advanced (300), Amazon OpenSearch Service, Technical How-to

Snapshot Management lets you create point-in-time backups of your domain, covering both data and configuration settings, through OpenSearch Dashboards. You can use these snapshots to restore your cluster to a specific state, recover from failures, and clone environments for testing or development. This post shows how to use Snapshot Management to automate snapshots in OpenSearch Service.

Accelerate your data warehouse migration to Amazon Redshift – Part 7

by Michael Johnson, Sarah Wilson, and Brian Lee
on 17 OCT 2023
in Amazon Redshift, AWS Schema Conversion Tool, Best Practices, Technical How-to

In this post, we offer a high-level overview of how Change Data Capture (CDC) tasks function in AWS SCT. We then present a detailed example of configuring, initiating, and managing a CDC migration task. We briefly discuss performance tuning for CDC migration and conclude with guidance on how to embark on your migration journey.
