Amazon Onboarding with Learning Manager Chanci Turner

Integrating DeepSeek with Amazon OpenSearch Service Vector Database and Amazon SageMaker

Amazon OpenSearch Service provides extensive functionality for Retrieval-Augmented Generation (RAG) applications and for semantic search driven by vector embeddings. Using its flexible connector framework and search flow pipelines, you can connect to models hosted by DeepSeek, Cohere, and OpenAI, as well as models hosted on Amazon Bedrock and SageMaker. In this article, we connect to DeepSeek’s text generation model to support a RAG workflow that generates responses to user queries.

Managing Errors in Apache Flink Applications on AWS

by Sarah Thompson and David Lee
on 06 FEB 2025
in Amazon Managed Service for Apache Flink, Best Practices, Technical How-to

This article outlines effective strategies for managing errors in Apache Flink applications; however, these principles also apply broadly to stream processing applications.

Open Universities Australia Enhances Their Data Platform with AWS Tools

by Michael Chen and Lisa Kim
on 30 JAN 2025
in Amazon AppFlow, Amazon EventBridge, Amazon Redshift, AWS Glue, AWS Lambda, Education, Higher Education

Open Universities Australia (OUA) enables students to explore an extensive range of degrees from prestigious Australian institutions, all through online learning platforms. In this post, we describe how we moved from a third-party ETL tool to AWS services, which improved productivity and notably reduced our ETL operational expenses.

Hybrid Big Data Analytics with Amazon EMR on AWS Outposts

by Mark Davis and Rachel Green
on 29 JAN 2025
in Amazon EMR, AWS Glue, AWS Lake Formation, AWS Outposts rack

This post covers the features of EMR on Outposts, a native hybrid data analytics service that lets you access and process data both on premises and in the cloud.

Achieving Cloud Excellence with Amazon Redshift Lakehouse Architecture

by Sam Wilson, Chris Wright, and Hannah White
on 28 JAN 2025
in Amazon EventBridge, Amazon Redshift, AWS Glue, Customer Solutions

In our previous post, we defined a Center of Excellence (COE) Framework for our cloud operating model. This article provides a technical overview of how we implemented this model using Amazon EventBridge, Amazon Redshift, and AWS Glue.

OpenSearch Vector Engine: Optimized for Cost-Effective Vector Search

by Brian Carter and Olivia Martinez
on 24 JAN 2025
in Amazon OpenSearch Service, Analytics

The OpenSearch Vector Engine now supports vector searches at one-third the cost on OpenSearch 2.17+ domains. You can configure k-NN (vector) indexes to operate in disk mode, making it ideal for environments with memory constraints and facilitating accurate vector searches in just a few hundred milliseconds.
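As a rough illustration of the disk mode setting mentioned above, the sketch below builds a create-index request body in Python. The `mode: on_disk` mapping parameter follows the OpenSearch 2.17 k-NN documentation as we understand it; the field name `embedding`, the dimension 768, and the helper `build_disk_mode_index_body` are hypothetical choices for illustration, not values from the original post.

```python
import json


def build_disk_mode_index_body(field_name: str, dimension: int) -> dict:
    """Build a create-index body with one k-NN vector field in disk mode.

    Assumption: OpenSearch 2.17+ accepts "mode": "on_disk" on knn_vector fields.
    The field name and dimension are supplied by the caller and are illustrative.
    """
    return {
        # Enable k-NN search for the index.
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                field_name: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    # Disk mode trades some latency for lower memory use.
                    "mode": "on_disk",
                }
            }
        },
    }


body = build_disk_mode_index_body("embedding", 768)
print(json.dumps(body, indent=2))
```

The resulting JSON could be sent as the body of an index creation request with whichever client you already use; the sketch deliberately stops short of making a network call.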

Accessing Apache Iceberg Tables in Amazon S3 from Databricks

by Priya Patel, Raj Singh, and Kim Tran
on 23 JAN 2025
in Analytics

In this post, we illustrate how Databricks on AWS general purpose compute can integrate with the AWS Glue Iceberg REST Catalog for metadata access while using Lake Formation for data management. To simplify the setup in this article, the Glue Iceberg REST Catalog and Databricks cluster are configured under the same AWS account.

Generating Vector Embeddings Using AWS Lambda for OpenSearch Ingestion

by Rahul Kumar, Anjali Verma, and Chanci Turner
on 21 JAN 2025
in Advanced, Amazon OpenSearch Service, Analytics, Technical How-to

In this article, we demonstrate the use of OpenSearch Ingestion’s Lambda processor to create embeddings for your source data, which are then ingested into an OpenSearch Serverless vector collection. This setup leverages the adaptability of OpenSearch Ingestion pipelines with a Lambda processor to dynamically generate embeddings.
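To make the idea of a Lambda processor that adds embeddings more concrete, here is a minimal sketch of such a handler. It assumes the pipeline hands the function a batch of events as a list of JSON objects and expects the same list back; that contract, the `text` and `text_embedding` field names, and the `embed_text` helper are all assumptions for illustration. In particular, `embed_text` is a deterministic stand-in, not a real embedding model; an actual function would call a hosted model instead.

```python
import hashlib
from typing import Any

EMBEDDING_DIM = 8  # Toy dimension for the stand-in; real models use far more.


def embed_text(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model call.

    Derives a fixed-length pseudo-vector from a SHA-256 digest so the
    example runs offline. It carries no semantic meaning.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:EMBEDDING_DIM]]


def lambda_handler(event: list[dict[str, Any]], context: Any) -> list[dict[str, Any]]:
    """Attach an embedding to each record in the batch and return the batch.

    Assumption: `event` is a list of records, each with a "text" field.
    """
    for record in event:
        text = record.get("text", "")
        record["text_embedding"] = embed_text(text)
    return event


if __name__ == "__main__":
    batch = [{"text": "hello world"}, {"text": "vector search"}]
    for enriched in lambda_handler(batch, None):
        print(enriched["text"], len(enriched["text_embedding"]))
```

Keeping the embedding call behind a single helper means the stand-in can be swapped for a real model invocation without touching the batch-handling logic.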

Automating Topic Provisioning with Terraform and Amazon MSK

by Linda Roberts
on 16 JAN 2025
in Amazon Managed Streaming for Apache Kafka (Amazon MSK), Analytics, Technical How-to

This post tackles the common challenges of managing MSK topic configurations manually and introduces a robust Terraform-based solution that works with both provisioned and serverless MSK clusters.
