Building an Enterprise Data Repository Platform for Customer Insights and Analytics with Serverless Architecture

Published on 18 DEC 2020

In today’s data-driven world, organizations grapple with the challenge of managing diverse data types and the surge in data volume. Establishing a centralized data repository for analysis is crucial for gaining insightful customer perspectives and enhancing their overall experience.

Historically, companies have relied on fragmented data systems, conducting analyses in silos that hindered scalability. This approach often demands hefty investments in hardware and software licenses, along with substantial operational costs for maintenance and skilled personnel.

Transferring data across various storage solutions requires an extract, transform, load (ETL) process, which is the backbone of any contemporary data and analytics infrastructure. Amazon Web Services (AWS) offers a comprehensive suite of services that empower organizations to deploy enterprise-grade applications in the cloud, utilizing serverless architecture to streamline complex ETL workflows.

This article describes how Global Learning Solutions and Tech Mahindra collaborated to design and implement an enterprise data repository on AWS. The project uses serverless technologies to build efficient ETL processes that turn raw data into actionable insights.

Solution Overview

For over 70 years, Global Learning Solutions has been transforming educational experiences. By correlating research with practical learning applications, they create innovative content and products that engage students effectively.

With a vast array of learning platforms serving a significant user base, there arose a pressing need for robust data analysis to generate insights regarding user behavior, sales channels, e-learning platforms, and more. These insights are essential for guiding management decisions grounded in reliable data.

The launch of the Enterprise Data Repository (EDR) offers a unified source of aggregated data. EDR empowers analytics teams to extract valuable insights, enhancing the customer journey. The repository accommodates extensive data ingested through coordinated ETL processes, utilizing various serverless technologies, including AWS Step Functions and AWS Lambda. Key elements of the solution encompass data ingestion, curation, transformation logic, orchestration, scheduling, archival, indexing, and reporting.

Customer Requirements

The partnership between Global Learning Solutions and Tech Mahindra aimed to develop a data analytics platform leveraging serverless technologies for ETL processing. The solution had to fulfill the following requirements:

  1. Ingest raw data from a variety of sources, including Google Analytics, web trackers, POS systems, CRM, and external data providers.
  2. Curate raw data according to business needs, as unnecessary data accumulation drives up costs.
  3. Transform and segregate curated data to ensure high quality.
  4. Store large volumes of data while optimizing retrieval performance.
  5. Maintain stringent data security standards.
  6. Provide data insights reporting tailored for various analytics user groups.
  7. Implement a LowOps platform for rapid product development.
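
Requirement 2 (curation) can be illustrated with a short AWS Lambda handler in Python. This is a minimal sketch, not the implementation used in the project: the field names in `KEEP_FIELDS` and `REQUIRED_FIELDS` and the `records` key in the event are hypothetical, chosen only to show how unneeded attributes could be dropped before storage.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical field lists; the real ones would come from the business rules.
KEEP_FIELDS = ("user_id", "event_type", "event_time", "channel")
REQUIRED_FIELDS = ("user_id", "event_time")


def curate_records(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the agreed fields and drop records missing required ones."""
    curated = []
    for record in records:
        # Skip records that lack a required field or carry an empty value.
        if any(record.get(field) in (None, "") for field in REQUIRED_FIELDS):
            continue
        # Copy only the fields the business has asked to retain.
        curated.append({k: record[k] for k in KEEP_FIELDS if k in record})
    return curated


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Lambda entry point; assumes raw records arrive under event["records"]."""
    raw = event.get("records", [])
    curated = curate_records(raw)
    return {"records": curated, "dropped": len(raw) - len(curated)}
```

Returning the number of dropped records alongside the curated ones gives a downstream step a simple data-quality signal to log or alert on.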

Tech Mahindra’s AWS solution addressed Global Learning Solutions’ challenges by modernizing the data repository for smart analytics, enhancing data management and governance, and facilitating comprehensive reporting through AWS native services.

Solution Architecture

The architecture was crafted with scalable, secure ETL workflows using AWS serverless technologies. Amazon Simple Storage Service (S3) and Amazon Redshift serve as the data storage layer, while continuous integration and automated pipelines ensure seamless development and maintain LowOps.
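
One common way to pair Amazon S3 with Amazon Redshift is to write curated files under date-partitioned prefixes and load them with the Redshift `COPY` command. The sketch below only builds the prefix and the SQL text. The bucket, dataset, table, and role names are hypothetical, and the Parquet format is an assumption, since the article does not say which file format was used.

```python
from datetime import date


def partition_prefix(bucket: str, dataset: str, day: date) -> str:
    """Build a date-partitioned S3 prefix for one day's curated files."""
    return f"s3://{bucket}/{dataset}/year={day:%Y}/month={day:%m}/day={day:%d}/"


def copy_statement(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Build a Redshift COPY statement that loads every file under the prefix."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET;"
    )


if __name__ == "__main__":
    # All names below are placeholders for illustration only.
    prefix = partition_prefix("edr-curated", "web_events", date(2020, 12, 18))
    print(copy_statement(
        "analytics.web_events",
        prefix,
        "arn:aws:iam::123456789012:role/edr-redshift-load",
    ))
```

Partitioning by date keeps each load scoped to one day's files, so a failed run can be repeated for that day without rescanning the whole bucket.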

The solution prioritizes security, implementing strict network boundaries and proactive threat monitoring. Key components include:

  • Data Ingestion and Transformation Layer: This layer uses AWS Step Functions, AWS Lambda, and Amazon Elastic Container Service (Amazon ECS) for data processing. AWS Step Functions orchestrates serverless workflows that are triggered by Amazon CloudWatch Events and works in tandem with Lambda to manage ETL jobs.
  • Enterprise Data Repository Layer: A modern data management platform must capture data quickly from many sources. Amazon S3 stores the data within the repository, while Amazon Redshift serves as the data warehouse.
  • Visualization and Reporting Layer: Power BI is used for data analysis and visualization, allowing teams to create reports and dashboards that highlight the insights derived from the data.
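
The orchestration described in the first component can be sketched as an Amazon States Language definition built in Python. This is an illustrative outline under stated assumptions, not the project's actual workflow: the four state names, the Lambda function names, the account ID, the Region, and the retry settings are all hypothetical.

```python
import json
from typing import Any, Dict, List


def lambda_arn(name: str) -> str:
    """Return a placeholder Lambda ARN; account ID and Region are made up."""
    return f"arn:aws:lambda:us-east-1:123456789012:function:{name}"


def task(resource: str, next_state: str = "") -> Dict[str, Any]:
    """Build a Task state that retries failed invocations before giving up."""
    state: Dict[str, Any] = {
        "Type": "Task",
        "Resource": resource,
        "Retry": [{
            "ErrorEquals": ["States.TaskFailed"],
            "IntervalSeconds": 30,
            "MaxAttempts": 2,
            "BackoffRate": 2.0,
        }],
    }
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True  # last state in the workflow
    return state


def build_definition() -> Dict[str, Any]:
    """Assemble a linear ETL workflow: ingest, curate, transform, load."""
    return {
        "Comment": "Hypothetical EDR ETL workflow",
        "StartAt": "Ingest",
        "States": {
            "Ingest": task(lambda_arn("edr-ingest"), "Curate"),
            "Curate": task(lambda_arn("edr-curate"), "Transform"),
            "Transform": task(lambda_arn("edr-transform"), "LoadWarehouse"),
            "LoadWarehouse": task(lambda_arn("edr-load-redshift")),
        },
    }


def execution_order(definition: Dict[str, Any]) -> List[str]:
    """Follow the Next pointers from StartAt to check the chain is complete."""
    order, name = [], definition["StartAt"]
    while True:
        order.append(name)
        state = definition["States"][name]
        if state.get("End"):
            return order
        name = state["Next"]


if __name__ == "__main__":
    definition = build_definition()
    print(" -> ".join(execution_order(definition)))
    print(json.dumps(definition, indent=2))
```

The JSON printed at the end is the kind of document that would be supplied as the state machine definition. A CloudWatch Events rule with a schedule expression such as `cron(0 2 * * ? *)` could then start the workflow nightly; that schedule is also an assumption.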

Samantha Lee, Vice President of Technology at Global Learning Solutions, remarked, “We established this architecture to gather data from multiple sources, cleanse it, and add value, enabling self-service BI for stakeholders and analysts to gain insights. The engineering team effectively utilized AWS services while minimizing costs, transitioning from faith-based decisions to data-driven choices.”

Conclusion

Tech Mahindra’s solution processes and analyzes data effectively, enhancing decision-making capabilities.

