How Koo Scaled Its Data Architecture from a Monolithic Relational Database to Amazon DynamoDB
This article features contributions from Alex Johnson at Koo. Koo is a global micro-blogging platform that empowers users to express their thoughts and opinions across a multitude of languages. Launched in March 2020, the app has rapidly gained traction, attracting millions of users eager to share their insights and connect with others.
In a world as diverse as ours, language is vital for effective communication, and Koo has recognized the significance of allowing users to express themselves in their preferred languages. The app’s interface supports several languages, including English, Hindi, Portuguese, Spanish, and more. This commitment to user-friendly communication has been instrumental in Koo’s success.
Koo has garnered accolades from various sectors, winning the Aatmanirbhar App Innovation Challenge organized by the Indian government in 2020 and being named Google PlayStore’s Best Daily Essential App for the same year. Currently, Koo supports over 20 languages, has been downloaded more than 60 million times, is actively utilized by 8,000 notable profiles, and has quickly become the second-largest micro-blogging platform within just three years.
AWS has played a crucial role in Koo’s growth journey by providing a reliable and scalable infrastructure that facilitates seamless expansion. Koo was built in the cloud, adopting microservices, containers, AI, machine learning (ML), and other advanced technologies. Initially, Koo employed a monolithic data architecture, relying on a centralized relational database server for all data operations. As the platform expanded globally and incorporated more languages, user numbers surged from 20 million to 60 million. This explosive growth led to a significant increase in content and interactions (likes, shares, reposts), causing the monolithic relational database to encounter performance and scalability issues.
In this article, we will delve into Koo’s journey from a centralized monolithic relational database to a distributed database architecture. We will explore the architectural and operational challenges faced with the previous setup, the shift to a NoSQL framework suitable for use cases requiring eventual consistency, the adoption of Amazon DynamoDB—a fully managed, serverless, key-value NoSQL database—and the migration process that supported their scalability needs.
Koo’s Initial Monolithic Database Architecture
Koo has streamlined its core functionalities into a set of efficient, scalable, and robust microservices, enabling engineers to ship new features to users frequently. The web and mobile applications interact with these internal microservices via exposed APIs, which are deployed on Amazon Elastic Kubernetes Service (Amazon EKS). Below is an overview of Koo’s initial application and data platform architecture.
Some key microservices include:
- Onboarding: This service facilitates user registration via social media or email accounts.
- Feed: Upon signing in, Koo users expect to see relevant content. This service delivers the latest trending and engaging posts (known as Koos) from users’ networks within milliseconds.
- Recommender: This service tailors the content displayed to users based on various parameters to ensure it is personalized and current.
- Discovery: For new users, this service offers suggestions for following relevant profiles to enhance engagement.
- Notification: To boost interaction with content, this service sends push and email notifications to millions of users.
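To make the Feed service concrete, here is a minimal sketch of how a "latest trending and engaging posts" ranking might work. Everything here is illustrative: the field names, the engagement weights, and the half-life decay are hypothetical choices, not Koo's actual ranking model.

```python
from dataclasses import dataclass

@dataclass
class Koo:
    """A post ('Koo') with engagement counters (hypothetical fields)."""
    koo_id: str
    author_id: str
    created_at: float  # Unix epoch seconds
    likes: int = 0
    shares: int = 0
    reposts: int = 0

def feed_score(koo: Koo, now: float, half_life_s: float = 6 * 3600) -> float:
    """Blend engagement with recency: the score halves every half_life_s."""
    engagement = koo.likes + 2 * koo.shares + 3 * koo.reposts
    age = max(0.0, now - koo.created_at)
    decay = 0.5 ** (age / half_life_s)
    return (1 + engagement) * decay

def build_feed(candidates, now, limit=10):
    """Return the top-scoring posts from a user's network."""
    return sorted(candidates, key=lambda k: feed_score(k, now), reverse=True)[:limit]
```

The decay term is what lets a fresh post with modest engagement outrank a days-old post with many likes, which matches the "latest trending" behavior described above.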
Initially, all transactional data for these microservices was housed within a monolithic relational database managed using Amazon Aurora PostgreSQL-Compatible Edition. This setup involved a three-node cluster (one writer and two readers) distributed across multiple AWS Availability Zones for high availability. Data pipelines were established using AWS Glue jobs and Amazon Managed Streaming for Apache Kafka (Amazon MSK) to capture data changes and relay them to a centralized data lake on Amazon Simple Storage Service (Amazon S3).
Koo also performed data transformations and processing with Amazon EMR, utilizing Apache Spark and Apache Flink jobs to access data from various sources, including Amazon S3 and Apache Iceberg tables. For MLOps, Koo leveraged Amazon SageMaker to deliver high-performance production ML models, which classify content, conduct sentiment analysis, and personalize user experiences. These models continuously process both streaming and batch data, calculating scores that update the central database to enhance users’ feeds.
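A pipeline like the one above typically flattens each change-data-capture event before landing it in the data lake. The sketch below assumes a Debezium-style event envelope (`payload.op`, `payload.before`/`payload.after`, `payload.source`, `payload.ts_ms`); the exact shape of Koo's events is an assumption, not taken from the article.

```python
from datetime import datetime, timezone

def flatten_change_event(event: dict) -> dict:
    """Flatten a CDC change event (Debezium-style envelope, assumed shape)
    into a single flat record suitable for writing to an S3 data lake."""
    payload = event["payload"]
    # Deletes carry the row image in 'before'; inserts/updates in 'after'.
    row = payload["before"] if payload["op"] == "d" else payload["after"]
    return {
        **row,
        "_op": {"c": "insert", "u": "update", "d": "delete"}[payload["op"]],
        "_source_table": payload["source"]["table"],
        "_captured_at": datetime.fromtimestamp(
            payload["ts_ms"] / 1000, tz=timezone.utc
        ).isoformat(),
    }
```

Keeping the operation type and capture timestamp alongside the row lets downstream Spark or Flink jobs reconstruct table state or compute time-windowed aggregates from the lake.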
Challenges Encountered
When Koo launched in India in 2020, it supported four regional languages (Hindi, Kannada, Tamil, and Telugu) in addition to English. As Koo aimed to attract a larger user base, the application architecture needed to scale to accommodate a twenty-fold increase in load. By early 2022, Koo’s expansion into new markets such as Nigeria, the United States, and Brazil, along with the addition of support for over 20 languages, led to performance and scalability challenges.
While Koo’s microservices deployment in Amazon EKS effectively managed these challenges by scaling to meet dynamic demands, the centralized database struggled. The operational issues included:
- Significant drops in freeable memory on reader instances.
- Increased query response times caused by Lock:tuple wait events, with sessions queuing on row-level tuple locks for frequently updated rows.
- Autovacuum runs on large TOAST tables, which triggered blocking that degraded reader performance.
Architecturally, Koo faced constraints due to:
- The limitations imposed by relational databases on the number of readers.
- The infeasibility of vertical scaling, as Koo was already utilizing high-end AWS Graviton2 processor-based instance types.
- High latency resulting from simultaneous reads and writes from various microservices and data pipelines.
- A lack of modularity, as Koo had adopted a single database for all applications, even when some use cases were not suited for a relational model.
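The Lock:tuple contention above is characteristic of a "hot row" such as the like counter of a viral post: every concurrent writer serializes on the same lock. One standard mitigation in key-value designs is write sharding, spreading increments across N shards so that writers rarely collide. The sketch below demonstrates the pattern with in-process locks; it is a general illustration of the technique, not Koo's implementation.

```python
import random
import threading

class ShardedCounter:
    """Spread increments for one hot key (e.g., a viral post's like count)
    across N shards so concurrent writers rarely contend on the same lock.
    Reads aggregate all shards, trading a slightly costlier read for
    contention-free writes."""

    def __init__(self, num_shards: int = 8):
        self.shards = [0] * num_shards
        self.locks = [threading.Lock() for _ in range(num_shards)]

    def increment(self) -> None:
        i = random.randrange(len(self.shards))  # pick a shard at random
        with self.locks[i]:  # only ~1/N of writers contend on any one lock
            self.shards[i] += 1

    def value(self) -> int:
        return sum(self.shards)  # aggregate across shards
```

The same idea carries over to DynamoDB, where appending a shard suffix to a partition key spreads a hot item's write traffic across partitions.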
Redesigning Koo’s Data Architecture
In response to these challenges, Koo recognized the need to overhaul their data persistence layer and reinvent their data architecture. They opted to decompose their data architecture into purpose-built databases tailored for specific use cases. The decision to implement key-value NoSQL databases was strategic, as these databases can effectively manage large volumes of data and accommodate high rates of state changes while maintaining low latency.
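The key-value model described above can be sketched with a small in-memory stand-in for a DynamoDB-style table: items live under a partition key and are ordered by a sort key, so a query like "latest N posts for a user" is a single cheap partition lookup instead of a relational scan. The key names (`user#1`, ISO timestamps as sort keys) are illustrative assumptions; the sketch also assumes sort keys are unique within a partition.

```python
from bisect import insort
from collections import defaultdict

class KeyValueTable:
    """In-memory stand-in for a DynamoDB-style table: items are grouped by
    partition key and kept ordered by sort key (assumed unique per partition),
    so range queries within one partition are cheap."""

    def __init__(self):
        self._partitions = defaultdict(list)  # pk -> sorted [(sk, item), ...]

    def put_item(self, pk: str, sk: str, item: dict) -> None:
        insort(self._partitions[pk], (sk, item))  # insert in sort-key order

    def query(self, pk: str, limit: int = 10, descending: bool = True):
        """Return up to `limit` items for one partition, newest-first by default."""
        items = self._partitions[pk]
        ordered = reversed(items) if descending else iter(items)
        return [item for _, item in list(ordered)[:limit]]
```

With this shape, eventual-consistency use cases such as feeds and notifications read one partition in sorted order, which is exactly the access pattern DynamoDB's partition-key/sort-key model is built for.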