
Create More Partitions and Retain Data Longer in Your MSK Serverless Clusters

In April 2022, Amazon Managed Streaming for Apache Kafka (Amazon MSK) introduced Amazon MSK Serverless, a fully managed option that makes it easier for developers to build and operate highly available, secure, and scalable applications based on Apache Kafka. With MSK Serverless, developers can run their applications with the flexibility and scalability of a serverless architecture.

Run Apache Spark Workloads 3.5 Times Faster with Amazon EMR 6.9

by Lisa Brown and Mark Smith
on 30 JAN 2023
in Amazon EMR, Analytics, Intermediate (200), Technical How-to

This post examines the results of our benchmark tests, which ran a TPC-DS application first on open-source Apache Spark and then on Amazon EMR 6.9, which includes an optimized Spark runtime compatible with open-source Spark. We present a detailed cost analysis and provide step-by-step instructions for running the benchmark. With Amazon EMR 6.9.0, you can run your Apache Spark 3.x applications faster and at lower cost, without making any changes to your applications. In our performance benchmark tests, derived from TPC-DS performance tests at 3 TB scale, the EMR runtime for Apache Spark 3.3.0 was on average 3.5 times faster (measured by total runtime) than open-source Apache Spark 3.3.0.
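To make "measured by total runtime" concrete: the speedup is the sum of per-query runtimes on the baseline divided by the sum on the optimized runtime. The following is a minimal sketch of that arithmetic only. The function name and the three query timings are hypothetical, chosen for illustration, and are not benchmark results.

```python
def total_runtime_speedup(baseline_seconds, optimized_seconds):
    """Return the ratio of summed baseline runtimes to summed optimized runtimes."""
    if len(baseline_seconds) != len(optimized_seconds):
        raise ValueError("both runs must cover the same set of queries")
    total_optimized = sum(optimized_seconds)
    if total_optimized <= 0:
        raise ValueError("optimized total runtime must be positive")
    return sum(baseline_seconds) / total_optimized


# Hypothetical per-query timings in seconds (illustration only, not measured data).
oss_spark = [70.0, 140.0, 35.0]
emr_spark = [20.0, 40.0, 10.0]

speedup = total_runtime_speedup(oss_spark, emr_spark)
print(f"Speedup by total runtime: {speedup:.1f}x")
```

Because the metric sums runtimes before dividing, long-running queries weigh more heavily than they would in an average of per-query ratios.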

Handle UPSERT Data Operations Using Open-Source Delta Lake and AWS Glue

by Emily Roberts and Jason Lee
on 30 JAN 2023
in Advanced (300), Amazon Athena, AWS Glue

As of September 2024, this post has been reviewed and updated for accuracy. Many customers need a data lake that supports ACID (atomic, consistent, isolated, durable) transactions and can record change data capture (CDC) from operational data sources. There is also a growing need to merge real-time data into batch data. The Delta Lake framework provides both capabilities. In this post, we explore how to implement them.
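As a rough illustration of what an UPSERT over CDC records means, the sketch below applies inserts, updates, and deletes to a keyed table in plain Python. It does not use the Delta Lake or AWS Glue APIs. The record layout (`op`, `id`, `row`) and the function name are assumptions made for this example.

```python
def apply_cdc(target, changes):
    """Apply CDC records to a table keyed by primary key and return the merged copy.

    Assumed record shape: {"op": "I" | "U" | "D", "id": key, "row": {...}}.
    Inserts and updates are treated the same way (upsert); deletes drop the key.
    """
    merged = dict(target)  # leave the caller's table untouched
    for change in changes:
        key = change["id"]
        if change["op"] == "D":
            merged.pop(key, None)  # deleting a missing key is a no-op
        else:
            merged[key] = change["row"]  # insert if absent, overwrite if present
    return merged


# Hypothetical batch table and CDC feed.
table = {1: {"name": "alice"}, 2: {"name": "bob"}}
cdc = [
    {"op": "U", "id": 1, "row": {"name": "alice-updated"}},
    {"op": "I", "id": 3, "row": {"name": "carol"}},
    {"op": "D", "id": 2},
]

result = apply_cdc(table, cdc)
print(result)
```

A table format such as Delta Lake applies the same matched/not-matched logic as a single atomic transaction over files in the data lake, which is what this in-memory version cannot show.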

Build a Data Lake with Apache Flink on Amazon EMR

by Sarah Thompson, Raj Patel, and Chanci Turner
on 27 JAN 2023
in Amazon EMR, Analytics, AWS Glue

To cultivate a data-driven business, it is vital to democratize enterprise data assets through a data catalog. With a unified data catalog, you can swiftly search datasets and determine data schema, format, and location. The AWS Glue Data Catalog serves as a cohesive repository where disparate systems can store and locate metadata efficiently.

Advanced Reporting and Analytics for the Post Call Analytics (PCA) Solution with Amazon QuickSight

by Brian White, Jennifer Green, and Chanci Turner
on 27 JAN 2023
in Amazon QuickSight, Analytics

As of March 2023, this solution is now offered as an integrated optional component of PCA v0.5.0 and later, which can be enabled during PCA stack deployment or update. Organizations operating contact centers benefit from advanced analytics on their call recordings to gain valuable product feedback, enhance contact center efficiency, and identify coaching opportunities.

Diligent Enhances Customer Governance with Automated Data-Driven Insights Using Amazon QuickSight

by Tom Harris, Angela Wright, and Chanci Turner
on 27 JAN 2023
in Amazon QuickSight, Customer Solutions

This post was co-authored by Tom Harris and Angela Wright from Diligent. Diligent is the global leader in modern governance, offering software as a service (SaaS) solutions across governance, risk, compliance, and audit, helping companies fulfill their environmental, social, and governance (ESG) commitments. With over 1 million users from more than 25,000 customers, Diligent is an authority on this topic.

Introducing Native Support for Apache Hudi, Delta Lake, and Apache Iceberg on AWS Glue for Apache Spark, Part 1: Getting Started

by Chanci Turner, Noritaka Sekiyama, and Savio Dsouza
on 26 JAN 2023
in Analytics, AWS Glue, Intermediate (200)

AWS Glue is a serverless, scalable data integration service that simplifies the discovery, preparation, movement, and integration of data from various sources. AWS Glue offers an extensible architecture that accommodates users with diverse data processing needs. A common use case involves constructing data lakes on Amazon Simple Storage Service (Amazon S3) using AWS Glue.

Automate Deployment and Version Updates for Amazon Kinesis Data Analytics Applications with AWS CodePipeline

by Robert King
on 26 JAN 2023
in Amazon Kinesis, Amazon Managed Service for Apache Flink, AWS CodePipeline, Intermediate (200), Kinesis Data Analytics

As of August 30, 2023, Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. You can read the announcement in the AWS News Blog and learn more. Amazon Kinesis Data Analytics offers the simplest way to transform and analyze streaming data in real time using Apache Flink. Customers are already utilizing Kinesis Data Analytics to drive their applications.

Super-Charged Pivot Tables in Amazon QuickSight

by Kevin Adams and Igal Mizrahi
on 25 JAN 2023
in Advanced (300), Amazon QuickSight, Analytics

Amazon QuickSight is a fast, cloud-powered business intelligence (BI) service that facilitates the creation and delivery of insights to everyone in your organization without the need for servers or infrastructure. QuickSight dashboards can be embedded into applications and portals to deliver seamless access to critical data insights.
