This article outlines the migration process undertaken by Delhivery, a leading fulfillment platform in India, as they shifted from managing Apache Kafka on Amazon Elastic Compute Cloud (Amazon EC2) to utilizing Amazon Managed Streaming for Apache Kafka (Amazon MSK). As stated by Riya Sharma, a Senior Technical Architect at Delhivery, “We’ve been operational on Amazon MSK for over a year now, and have more than 350 applications functioning seamlessly, producing and consuming data continuously.”
Delhivery operates a vast logistics network throughout India, covering over 18,000 pin codes and 2,500 cities. Their service offerings include express parcel delivery, freight transportation, reverse logistics, cross-border operations, and both B2B and B2C warehousing, supported by advanced technology solutions.
Sharma emphasizes their ambition: “Our aim is to be the operational backbone of commerce in India, leveraging top-tier infrastructure and cutting-edge technology.” Delhivery has successfully fulfilled over 650 million orders, reaching more than 120 million households. Their operations include 24 automated sorting centers, 75 fulfillment centers, and a fleet of over 14,000 vehicles, all supported by a workforce of 40,000 dedicated employees delivering a million packages daily.
Challenges of Managing Apache Kafka
Running self-managed Apache Kafka posed significant operational challenges. Delhivery processes nearly 1 TB of data daily for various analytical tasks, drawing from sources such as shipment tracking, GPS, and client interactions. The data arrives as a constant stream of messages, with spikes reaching 12,000 messages per second, making Kafka a vital component of their infrastructure.
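A quick back-of-envelope check puts those two figures in perspective. The numbers below are derived only from the stated load (1 TB/day, 12,000 msg/s peak); the implied average message size is an illustration, not a figure from Delhivery:

```python
# Back-of-envelope check of the stated load.
tb_per_day = 1
avg_bytes_per_s = tb_per_day * 1024**4 / 86400   # ~12 MB/s sustained average
peak_msgs_per_s = 12_000

# Hypothetical: average message size IF the peak rate were sustained all day
implied_msg_bytes = avg_bytes_per_s / peak_msgs_per_s

print(f"{avg_bytes_per_s / 1024**2:.1f} MB/s average")
print(f"~{implied_msg_bytes:.0f} bytes/message at the peak rate")
```

In practice traffic is bursty, so the real per-message size and peak bandwidth differ; the point is that sustained throughput, not just peak message rate, drives capacity planning.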
The operational demands of managing Kafka and Apache ZooKeeper on EC2 instances became increasingly burdensome. To maintain uptime, Delhivery had to dedicate two developers full-time, which diverted their focus from developing valuable business features. “We needed a managed service to alleviate the infrastructure burden,” explains Sharma. “This would allow our technical team to concentrate on projects that drive business value.”
Regaining Productivity Through Amazon MSK
After evaluating several alternatives to their self-hosted Kafka setup, Delhivery opted for Amazon MSK. This decision enabled them to use native Apache Kafka APIs and run their existing applications on AWS without any code modifications. Amazon MSK simplifies cluster provisioning, configuration, and maintenance, freeing developers to innovate rather than manage infrastructure.
Following guidance from the AWS team, Delhivery undertook several steps:
- Sizing the MSK Cluster
- Migrating Individual Kafka Topics to Amazon MSK
- Monitoring on Amazon MSK
Sizing the MSK Cluster
To effectively size their MSK cluster, Delhivery analyzed their existing workload metrics from their EC2-based Kafka setup. They focused on:
- Ingestion Rate from Producers: They assessed the broker-level metric BytesInPerSec, averaging values across brokers to determine the overall ingestion rate.
- Consumption Rate from Consumers: Similarly, they evaluated BytesOutPerSec to gauge the consumption rate.
- Data Replication Strategy: The highest replication factor was established by comparing the global parameter default.replication.factor with specific topic configurations.
- Data Retention Strategy: They considered the maximum data retention requirements and set a target for disk utilization.
Using the Amazon MSK Sizing and Pricing spreadsheet provided by AWS, they estimated their broker requirements and confirmed through proof-of-concept tests that the suggested sizing was correct.
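The sizing inputs above combine in a straightforward way. The sketch below shows the arithmetic with illustrative figures; every number is an assumption for the example, not a value from Delhivery's cluster or the AWS spreadsheet:

```python
import math

# Assumed workload figures for illustration only
ingestion_mb_per_s = 12.0   # avg of BytesInPerSec across brokers
replication_factor = 3      # max of default.replication.factor and topic overrides
retention_hours = 72        # longest topic retention requirement
target_disk_util = 0.70     # keep broker disks below 70% full
broker_storage_gb = 1000    # storage provisioned per broker (assumed)

# Total data on disk = ingestion rate * retention window * replication factor
stored_gb = ingestion_mb_per_s * 3600 * retention_hours * replication_factor / 1024

# Brokers needed so per-broker usage stays under the utilization target
brokers_for_storage = math.ceil(stored_gb / (broker_storage_gb * target_disk_util))

print(f"Estimated data on disk: {stored_gb:.0f} GB")
print(f"Brokers needed for storage: {brokers_for_storage}")
```

Storage is only one dimension; network throughput and partition counts per broker also bound the cluster size, which is why a proof of concept validated the spreadsheet's recommendation.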
Migrating Individual Kafka Topics to Amazon MSK
Delhivery explored multiple strategies for migrating topics to Amazon MSK:
- MirrorMaker 1.0: This tool facilitated the transfer of data from self-managed clusters to Amazon MSK with minimal downtime.
- Consumer-Based Migration: This method involved reading from the source cluster and writing to Amazon MSK, but required some application downtime.
A combination of these methods was applied during migration. For urgent topics, they employed MirrorMaker 1.0; non-urgent topics were cut over to Amazon MSK on a schedule driven by internal SLAs. The MirrorMaker setup ran as a daemon on an EC2 instance, consuming messages from the source Kafka cluster and republishing them to the target MSK cluster.
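A MirrorMaker 1.0 run of this kind looks roughly like the following. The hostnames, topic names, and file paths here are hypothetical placeholders, not Delhivery's actual configuration:

```shell
# consumer.properties points at the self-managed source cluster (hostnames assumed)
cat > consumer.properties <<'EOF'
bootstrap.servers=ec2-kafka-1:9092,ec2-kafka-2:9092
group.id=mirror-maker-group
auto.offset.reset=earliest
EOF

# producer.properties points at the target MSK bootstrap brokers (placeholder endpoint)
cat > producer.properties <<'EOF'
bootstrap.servers=b-1.example-msk-cluster.kafka.us-east-1.amazonaws.com:9092
acks=all
EOF

# Mirror only the urgent topics; non-urgent producers are repointed directly
kafka-mirror-maker.sh \
  --consumer.config consumer.properties \
  --producer.config producer.properties \
  --whitelist "shipment-tracking|gps-events"
```

Once consumer lag on the mirrored topics reaches zero and producers are repointed at MSK, the daemon can be stopped and the source topics retired.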