Amazon IXD – VGT2 Las Vegas: Transforming Cloud Costs with Serverless Solutions


dacadoo, a global technology firm based in Switzerland, specializes in digital health engagement and health risk assessment. Their offerings include a software-as-a-service (SaaS) platform that leverages behavioral science, AI, and gamification to enhance health outcomes for users. The company initiated a modernization project aimed at upgrading an API for quantifying health and lifestyle data and a risk engine for calculating mortality and morbidity probabilities based on extensive scientific research.

To evolve from a virtual machine-based API service into a scalable, globally redundant health scoring and risk calculation solution, dacadoo turned to Amazon Web Services (AWS). The service operates in a highly regulated environment and handles sensitive health data from customers worldwide.

The outcome of this transformation was a remarkable 78% reduction in cloud expenses and a mere hour of infrastructure maintenance per year. This efficiency enabled dacadoo to expand its AWS infrastructure while keeping its site reliability engineering (SRE) team lean, thanks to advanced automation and a nimble approach. In this article, we will guide you through dacadoo’s journey of adopting managed services and the architectural decisions made along the way.

Background

The architectural evolution of this solution unfolded in three distinct phases:

  1. Incubation – A single on-premises virtual machine with disaster recovery (DR), located in Switzerland.
  2. Global and Scalable – The deployment of multiple global Kubernetes clusters.
  3. Operational Excellence – Transitioning to a fully serverless and geo-redundant architecture on AWS.

Stage 1: Incubation with a Virtual Machine

Initially, after extensive R&D, the service operated on a single on-premises virtual machine utilizing hypervisor technology for disaster recovery. However, it lacked high availability (HA) features and relied on manual recovery processes. Both the API application and NoSQL database were hosted on the same machine. Software deployment and OS maintenance were conducted manually via Secure Shell (SSH), leading to significant downtime.

The following architecture diagram illustrates the virtual machine that hosted the monolithic application and its database.

Challenges

While the single virtual machine setup was quick and cost-effective, it presented several drawbacks: the health API was limited to Switzerland, maintenance was manual, and software deployment was cumbersome. Database backups depended on VM snapshots, and testing occurred solely on developer workstations.

Stage 2: Global and Scalable with Kubernetes

Recognizing the need for a scalable solution, dacadoo strategically invested in Kubernetes for managing containerized workloads globally. The health score and risk service migrated to Kubernetes, deploying three clusters across three continents to meet low latency requirements for their diverse customer base. The NoSQL database was also positioned close to the workloads to minimize latency and facilitate easier migration.

To streamline operational maintenance, the NoSQL database was adopted as a SaaS offering, with centralized monitoring via Datadog. The entire cloud infrastructure was provisioned using Terraform, encompassing the Kubernetes cluster, NoSQL database, and integration with GitLab and Datadog. This modernization project reflected a transition from virtual machines to Kubernetes with a strong emphasis on automation and a SaaS-first mindset.

The architecture for the container solution with a managed NoSQL database is depicted in the following diagram.

Challenges

Despite its advantages, this approach introduced increased costs due to the deployment of three regional Kubernetes clusters, resulting in 27 cluster nodes and additional expenses linked to managing NoSQL database SaaS instances. The complexity of CI/CD pipelines for multi-environment, multi-cluster deployments added to the operational burden.

Stage 3: Operational Excellence with Serverless

While the Kubernetes architecture effectively addressed many requirements, some features in the dacadoo API service backlog didn’t align perfectly with the existing application architecture. This prompted a comprehensive review of the infrastructure and software architecture, leading to the refactoring of the solution using the latest AWS technologies and best practices.

Solution Requirements

The refactoring process was guided by several requirements:

  • Maintain existing API functionality without modifications.
  • Ensure data processing occurs within a specified region to comply with local data protection regulations.
  • Utilize solely managed serverless services to eliminate weekly patch cycles.
  • Opt for a pay-as-you-go billing model to reduce costs.
  • Delegate authentication to a dedicated service.
  • Employ a well-established web framework with a robust ecosystem.

Refactoring the Apps

The API service consists of two components: a developer portal and the health score and risk calculation API. The database is needed only for API keys, algorithm parameters, quotas, and usage statistics. Health data is processed regionally by the compute layer and is never persisted, which makes a distributed database practical. Amazon DynamoDB global tables fit this model: writes are replicated across the connected Regions, and reads are served locally, keeping latency low enough to meet dacadoo’s service level agreements (SLAs).
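As a rough illustration of how usage statistics might be recorded in such a table, the sketch below builds the parameters for a DynamoDB `UpdateItem` call that increments a per-day request counter. The table name, the `pk`/`sk` key layout, and the `request_count` attribute are assumptions made for this example and are not taken from dacadoo's design. Because global tables reconcile concurrent writes to the same item on a last-writer-wins basis, the sketch keeps one counter item per Region so that Regions never update the same item.

```python
def usage_update_params(table: str, api_key_id: str, region: str, day: str) -> dict:
    """Build UpdateItem parameters that add 1 to a per-Region daily counter.

    All names (key schema, attribute names) are hypothetical.
    """
    return {
        "TableName": table,
        "Key": {
            "pk": {"S": f"KEY#{api_key_id}"},
            # One item per day and Region, so each Region writes only its own item.
            "sk": {"S": f"USAGE#{day}#{region}"},
        },
        # ADD creates the attribute if it is missing, then increments it atomically.
        "UpdateExpression": "ADD request_count :inc",
        "ExpressionAttributeValues": {":inc": {"N": "1"}},
    }


params = usage_update_params("api-usage", "abc123", "eu-central-1", "2024-01-01")
# With boto3 installed, the call would look like:
#   boto3.client("dynamodb", region_name="eu-central-1").update_item(**params)
```

A daily total for one API key is then the sum of the per-Region counters for that day.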

The developer portal, which provides API documentation and API key management, runs on AWS Lambda, which scales automatically and is billed per request. The health and risk API uses algorithms written in C for compute-intensive simulations, wrapped in a REST API built with the Python FastAPI framework. This stateless, request-driven workload also makes AWS Lambda an appropriate choice.
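To make the request flow concrete, here is a minimal sketch of a Lambda handler for an API Gateway proxy event. It is not dacadoo's code: the article describes a FastAPI application in front of C algorithms, whereas this example uses a bare handler and a pure-Python placeholder so that it runs without dependencies. The function `compute_health_score`, its inputs, and its weights are invented purely for illustration.

```python
import json


def compute_health_score(steps_per_day: int, sleep_hours: float) -> float:
    """Hypothetical stand-in for the C scoring routine (made-up formula)."""
    activity = min(steps_per_day / 10_000, 1.0)
    rest = min(sleep_hours / 8.0, 1.0)
    return round(100 * (0.6 * activity + 0.4 * rest), 1)


def handler(event, context):
    """Handle one API Gateway proxy request; nothing is persisted."""
    headers = {"Content-Type": "application/json"}
    try:
        body = json.loads(event.get("body") or "{}")
        score = compute_health_score(
            int(body["steps_per_day"]), float(body["sleep_hours"])
        )
    except (KeyError, TypeError, ValueError):
        # Missing fields, wrong types, or malformed JSON all map to a 400.
        return {
            "statusCode": 400,
            "headers": headers,
            "body": json.dumps({"error": "invalid input"}),
        }
    return {"statusCode": 200, "headers": headers, "body": json.dumps({"score": score})}


ok = handler({"body": json.dumps({"steps_per_day": 10000, "sleep_hours": 8})}, None)
bad = handler({"body": "not json"}, None)
```

The handler keeps no state between invocations, which matches the article's point that health data is processed by the compute layer without persistence.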

Serverless Architecture

HTTP requests reach the Lambda functions through Amazon API Gateway, which is protected by AWS WAF against malicious requests. Static assets are served from an Amazon Simple Storage Service (Amazon S3) bucket through API Gateway; this keeps the architecture simple, because no additional Amazon CloudFront features are needed. Amazon Route 53 provides latency-based routing, directing DNS queries to the endpoint with the lowest latency. This gives regional high availability for requests that are not subject to a data processing location constraint.
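The routing decision, together with the data residency requirement listed earlier, can be pictured with a small model. This is only a simulation of the selection logic, not a call to Route 53, which makes the choice at DNS resolution time using its own latency measurements. The Region names, the latency figures, and the `pinned_region` parameter are assumptions for the example.

```python
from typing import Mapping, Optional


def pick_region(
    latency_ms: Mapping[str, float], pinned_region: Optional[str] = None
) -> str:
    """Return the Region that should serve a request.

    If the client is pinned to a Region for data residency, use it;
    otherwise choose the Region with the lowest measured latency.
    """
    if not latency_ms:
        raise ValueError("no Regions available")
    if pinned_region is not None:
        if pinned_region not in latency_ms:
            raise ValueError(f"pinned Region {pinned_region!r} is not deployed")
        return pinned_region
    return min(latency_ms, key=latency_ms.__getitem__)


# Hypothetical measurements from one client's point of view.
measured = {"eu-central-1": 20.0, "us-east-1": 95.0, "ap-southeast-1": 180.0}
nearest = pick_region(measured)
pinned = pick_region(measured, pinned_region="us-east-1")
```

In this model, a pinned client trades latency for a fixed processing location, while an unpinned client is free to fail over to whichever Region is closest and available.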

API requests are authorized using credentials passed in HTTP headers.
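The article does not say which header or mechanism is used, so the following is only one plausible shape: a Lambda authorizer that checks an API key sent in an `x-api-key` header and answers in the simple response format of API Gateway HTTP APIs (`isAuthorized` plus an optional `context`). The header name, the in-memory key store, and the demo key are all assumptions; a real deployment would look keys up in the database rather than in a module-level dictionary.

```python
import hashlib
import hmac

# Hypothetical key store: SHA-256 digest of each issued key -> client id.
# Only digests are kept, so the raw keys are not stored.
KEY_DIGESTS = {hashlib.sha256(b"demo-key-123").hexdigest(): "client-a"}


def authorizer(event, context=None):
    """Authorize a request based on the x-api-key header (assumed name)."""
    # Header names are case-insensitive, so normalize them first.
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    presented = headers.get("x-api-key", "")
    digest = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    for known_digest, client_id in KEY_DIGESTS.items():
        # Constant-time comparison avoids leaking how many characters matched.
        if hmac.compare_digest(digest, known_digest):
            return {"isAuthorized": True, "context": {"clientId": client_id}}
    return {"isAuthorized": False}


allowed = authorizer({"headers": {"X-API-Key": "demo-key-123"}})
denied = authorizer({"headers": {"x-api-key": "wrong"}})
```

The returned `context` could carry the client id to the backend function, for example to attribute quota usage to the right API key.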



