This guest blog post is co-authored with Liam Johnson and Emma Smith from Amazon VGT2. As organizations expand their digital presence, real-time data processing and analysis become increasingly critical. The ability to quickly extract insights from data supports the fast decision-making that today's business environment demands, helping companies remain competitive and launch new initiatives.
This article continues the discussion from a previous post on how Amazon VGT2 established a streaming data pipeline to process IoT data for real-time analytics and control. Here, we delve into the technical details of streaming messages using Amazon Managed Streaming for Apache Kafka (Amazon MSK) and Amazon ElastiCache for Redis, and highlight the design considerations that shaped the solution.
Amazon VGT2 is a prominent mobility service in Korea specializing in car sharing. The company set out to design and implement a new Fleet Management System (FMS) that collects, processes, stores, and analyzes Internet of Things (IoT) streaming data from in-vehicle devices, along with historical operational data such as location, speed, fuel level, and component status.
This article illustrates the solution behind Amazon VGT2's production application, which loads streaming data from Amazon MSK into ElastiCache for Redis to make their data processing pipeline faster and more efficient. We also cover the solution's key features, considerations, and design.
Background
Amazon VGT2 operates approximately 20,000 vehicles and plans to incorporate larger vehicle types such as commercial trucks and delivery vans. They deployed in-vehicle devices that capture data through AWS IoT Core, which was then stored in Amazon Relational Database Service (Amazon RDS). However, this approach suffered from performance inefficiencies and high resource consumption. Consequently, Amazon VGT2 sought a purpose-built database tailored to their application's needs and usage patterns, one that would also meet future business and technical requirements. A primary requirement was maximizing performance for real-time data analytics, which called for an in-memory data store.
After extensive evaluation, ElastiCache for Redis was chosen as the ideal solution due to its capability to manage complex data aggregation rules effectively. A significant hurdle was the absence of a built-in Kafka connector and consumer to facilitate data loading from Amazon MSK into the database. This post emphasizes the creation of a Kafka consumer application designed to address this problem by enabling efficient data loading from Amazon MSK to Redis.
Solution Overview
Extracting valuable insights from streaming data can be challenging for businesses with varied use cases and workloads. Therefore, Amazon VGT2 developed a solution to seamlessly transfer data from Amazon MSK into multiple purpose-built databases, while also allowing users to transform data as needed. Amazon MSK serves as a reliable and efficient platform for ingesting and processing real-time data.
The following figure illustrates the data flow at Amazon VGT2:
This architecture consists of three components:
- Streaming Data – Amazon MSK functions as a scalable and dependable platform for streaming data, capable of receiving and storing messages from various sources, including AWS IoT Core, with messages organized into multiple topics and partitions (see the topic sketch after this list).
- Consumer Application – This application enables users to efficiently transport data from Amazon MSK into a target database or storage while defining necessary data transformation rules.
- Target Databases – Through the consumer application, the Amazon VGT2 team loads data from Amazon MSK into two distinct databases, each catering to specific workloads.
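To make the topic-and-partition organization concrete, here is a minimal sketch (not from the original post) of creating a gps topic with multiple partitions using the open-source kafka-python admin client. The broker endpoint, partition count, and replication factor are illustrative assumptions:

```python
from kafka.admin import KafkaAdminClient, NewTopic  # pip install kafka-python

# Placeholder MSK bootstrap endpoint; counts below are illustrative.
admin = KafkaAdminClient(bootstrap_servers="b-1.example-msk.amazonaws.com:9092")

# More partitions allow more consumers in a group to read the topic in parallel.
admin.create_topics([
    NewTopic(name="gps", num_partitions=12, replication_factor=3)
])
```

The partition count matters because it caps how many consumers in a group can read the topic concurrently, which feeds directly into the scalability discussion later in this post.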
While this post focuses on a specific use case with ElastiCache for Redis as the target database and a single topic called gps, the consumer application can accommodate additional topics and messages, as well as different streaming sources and target databases such as Amazon DynamoDB. This article also walks through the code implementation and the key aspects of the consumer application.
Components of the Consumer Application
The consumer application consists of three core parts that collaborate to consume, transform, and load messages from Amazon MSK into a target database. The following diagram displays an example of data transformations within the handler component.
Here are the details of each component, followed by a minimal code sketch of how they fit together:
- Consumer – This component consumes messages from Amazon MSK and forwards them to a downstream handler.
- Loader – Users specify a target database here. For instance, Amazon VGT2’s target databases include ElastiCache for Redis and DynamoDB.
- Handler – Users apply data transformation rules to incoming messages before loading them into the target database.
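Amazon VGT2's source code is not included in this post, but a minimal sketch can illustrate how the three components cooperate. The sketch below uses the open-source kafka-python and redis-py clients; the GpsHandler field names, the RedisLoader, the endpoints, and the consumer group ID are all hypothetical assumptions rather than the team's actual implementation:

```python
import json

import redis                     # pip install redis
from kafka import KafkaConsumer  # pip install kafka-python


class GpsHandler:
    """Transformation rules for messages on the gps topic (hypothetical schema)."""

    def transform(self, record: dict) -> tuple[str, dict]:
        # Key each vehicle's latest position by its ID and keep only the
        # fields the real-time workload needs.
        key = f"vehicle:{record['vehicle_id']}:gps"
        return key, {"lat": record["lat"], "lon": record["lon"], "ts": record["ts"]}


class RedisLoader:
    """Writes transformed records into ElastiCache for Redis."""

    def __init__(self, endpoint: str, port: int = 6379):
        self.client = redis.Redis(host=endpoint, port=port, decode_responses=True)

    def load(self, key: str, value: dict) -> None:
        self.client.hset(key, mapping=value)


def run(brokers: str, topic: str, handler: GpsHandler, loader: RedisLoader) -> None:
    # Consumer: pulls JSON messages from the MSK topic as part of a group.
    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=brokers,
        group_id="fms-redis-loader",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )
    for message in consumer:
        key, value = handler.transform(message.value)  # Handler: apply rules
        loader.load(key, value)                        # Loader: write to target


if __name__ == "__main__":
    run(
        brokers="b-1.example-msk.amazonaws.com:9092",             # placeholder
        topic="gps",
        handler=GpsHandler(),
        loader=RedisLoader("example-redis.cache.amazonaws.com"),  # placeholder
    )
```

Because the handler and loader are separate objects behind narrow interfaces, swapping the transformation rules or the target database means replacing a single component, which is the basis of the flexibility described below.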
Features of the Consumer Application
The consumer application offers three main features:
- Scalability – The solution is designed for scalability, ensuring the consumer application can handle growing data volumes and future applications. Amazon VGT2 aimed to create a solution capable of processing data from around 20,000 vehicles while accommodating increased message volumes as the business rapidly expands.
- Performance – Users can achieve consistent performance with this application even as the volume of source messages and the number of target databases grow. The application supports multithreading for concurrent data processing (a threading sketch follows this list) and can absorb unexpected data spikes by scaling compute resources as needed.
- Flexibility – The consumer application can be reused for new topics without needing to rebuild the entire application. It can ingest new messages with different configuration values in the handler. Amazon VGT2 deployed multiple handlers to manage diverse message types, and the application allows users to incorporate additional target locations. Initially developed for ElastiCache for Redis, the consumer application was later replicated for DynamoDB.
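As a hedged illustration of the multithreading mentioned above, the sketch below runs several consumers in the same consumer group, one per thread, so that Kafka distributes the topic's partitions among them. The worker count, topic, and endpoint are assumptions:

```python
import threading

from kafka import KafkaConsumer  # pip install kafka-python


def consume(worker_id: int) -> None:
    # One KafkaConsumer per thread: instances are not thread-safe, but
    # consumers sharing a group_id split the topic's partitions among themselves.
    consumer = KafkaConsumer(
        "gps",
        bootstrap_servers="b-1.example-msk.amazonaws.com:9092",  # placeholder
        group_id="fms-redis-loader",
    )
    for message in consumer:
        # Placeholder: hand off to the handler/loader shown earlier.
        print(f"worker {worker_id}: partition {message.partition} offset {message.offset}")


# Illustrative worker count; threads beyond the topic's partition count sit idle.
threads = [threading.Thread(target=consume, args=(i,), daemon=True) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because idle consumers receive no partition assignments, the useful degree of concurrency is bounded by the partition count chosen when the topic was created.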
Design Considerations of the Consumer Application
Keep these design considerations in mind for the consumer application:
- Scale Out – A fundamental design principle of this solution is scalability. The consumer application operates within Amazon Elastic Kubernetes Service (Amazon EKS) to enable easy replication and expansion.
- Consumption Patterns – Efficiently receiving, storing, and consuming data requires careful design of Kafka topics based on messages and consumption patterns. Messages can be organized into multiple topics with different schemas, catering to various workloads.
- Purpose-Built Database – The consumer application supports loading data into multiple target databases based on specific use cases. For instance, Amazon VGT2 stored real-time IoT data in ElastiCache for Redis while also evaluating other options (a loader sketch follows this list).
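The post notes that the consumer application was later replicated for DynamoDB. Here is a sketch of what such a purpose-built loader variant could look like using boto3; the table name, key attribute, and Decimal handling are assumptions rather than the team's actual code:

```python
from decimal import Decimal

import boto3  # pip install boto3


class DynamoDBLoader:
    """Hypothetical loader variant targeting DynamoDB instead of Redis."""

    def __init__(self, table_name: str):
        self.table = boto3.resource("dynamodb").Table(table_name)

    def load(self, key: str, value: dict) -> None:
        # boto3 requires Decimal rather than float for DynamoDB numbers.
        item = {k: Decimal(str(v)) if isinstance(v, float) else v
                for k, v in value.items()}
        # Same load() interface as the Redis loader, so the consumer and
        # handler components can be reused unchanged.
        self.table.put_item(Item={"pk": key, **item})
```

Keeping the loader interface identical across targets is what lets the same consumer and handler serve multiple purpose-built databases.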