Learn About Amazon VGT2 Learning Manager Chanci Turner
Modern microservices architectures are increasingly favored for building scalable applications. AWS makes these applications easier to build with Amazon DocumentDB (with MongoDB compatibility), a fast, scalable, and fully managed document database service that supports MongoDB workloads. You can deploy your application on it with the code you already have.
With Amazon DocumentDB, you can utilize your existing MongoDB application code, drivers, and tools to run, manage, and scale workloads efficiently. This means enhanced performance, scalability, and availability without the burden of managing underlying infrastructure.
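Because Amazon DocumentDB speaks the MongoDB wire protocol, an existing driver usually needs only a standard MongoDB connection string. The following sketch builds one with the Python standard library. The host, user name, and CA bundle file name are placeholders. The options (TLS, replica set rs0, secondaryPreferred reads, retryWrites=false) reflect what AWS connection examples typically use, so check them against your own cluster's settings.

```python
from urllib.parse import quote_plus, urlencode


def documentdb_uri(user: str, password: str, host: str, port: int = 27017,
                   ca_file: str = "global-bundle.pem") -> str:
    """Build a MongoDB connection string for an Amazon DocumentDB cluster endpoint."""
    # Percent-encode credentials so characters such as '@', ':' or '/' survive.
    auth = f"{quote_plus(user)}:{quote_plus(password)}"
    options = {
        "tls": "true",                           # TLS is enabled on clusters by default
        "tlsCAFile": ca_file,                    # AWS CA bundle (file name is an assumption)
        "replicaSet": "rs0",                     # DocumentDB reports its replica set as rs0
        "readPreference": "secondaryPreferred",  # send reads to replicas when available
        "retryWrites": "false",                  # AWS connection examples disable retryable writes
    }
    return f"mongodb://{auth}@{host}:{port}/?{urlencode(options)}"
```

For example, pymongo.MongoClient(documentdb_uri("appuser", password, cluster_endpoint)) would open a connection from a client running inside the cluster's VPC. Here, appuser and cluster_endpoint are placeholders.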
In this article, we demonstrate how to build an application that surfaces the most significant events and emotions surrounding the release of the movie “Avengers: Endgame” on April 26, 2019. You’ll learn best practices for configuring an AWS Lambda function to query Amazon DocumentDB, and for using AWS Secrets Manager and Amazon API Gateway alongside it.
Overview
E-commerce platforms and online media outlets depend on content and catalog management systems to deliver services to users. These systems require rapid and reliable access to user reviews, images, product ratings, and comments. The flexible document model, diverse data types, indexing capabilities, and robust querying features of Amazon DocumentDB facilitate quick and intuitive content storage and retrieval.
The use case in this article utilizes a sample dataset from the Global Database of Events, Language, and Tone (GDELT) public dataset, which monitors global news in over 100 languages, identifying key elements such as people, locations, organizations, themes, emotions, and events.
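To show the kind of query the finished application can run, here is a hypothetical aggregation pipeline over documents shaped like the GDELT sample shown later in this post. For events added on a given day, it averages AvgTone (GDELT's sentiment score) and sums NumMentions per ActionGeo_CountryCode. The function name and the choice of grouping field are illustrative assumptions. Amazon DocumentDB supports all of the stages the pipeline uses ($match, $group, $sort, $limit).

```python
from typing import Any


def top_tone_by_country(date_added: str, limit: int = 10) -> list[dict[str, Any]]:
    """Aggregation pipeline: average tone and total mentions per country for one day."""
    return [
        # Keep only events GDELT added on the given day, e.g. "20190426",
        # and drop rows with no action location.
        {"$match": {"DATEADDED": date_added, "ActionGeo_CountryCode": {"$ne": ""}}},
        # One output document per country where the action took place.
        {"$group": {
            "_id": "$ActionGeo_CountryCode",
            "avgTone": {"$avg": "$AvgTone"},
            "mentions": {"$sum": "$NumMentions"},
        }},
        # Most-discussed countries first.
        {"$sort": {"mentions": -1}},
        {"$limit": limit},
    ]
```

With PyMongo, list(collection.aggregate(top_tone_by_country("20190426"))) would return up to ten country summaries for release day. Here, collection stands for whichever collection holds the loaded events.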
You will use the following AWS services to construct your application:
- Lambda: This service runs your code without requiring you to provision or manage servers. You pay only for the compute time you consume; there is no charge when your code isn’t running.
- API Gateway: A fully managed service that simplifies the development, publication, maintenance, monitoring, and securing of APIs at any scale. With just a few clicks in the AWS Management Console, you can create REST and WebSocket APIs that serve as a “front door” for accessing data and functionalities from your backend services, including workloads on Amazon EC2, code on Lambda, and more.
- Secrets Manager: This service secures your application’s access credentials, allowing for the easy rotation and management of database credentials, API keys, and other sensitive information throughout their lifecycle. Users and applications can retrieve secrets via Secrets Manager APIs, eliminating the need to hardcode sensitive data.
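As a sketch of retrieving credentials at run time instead of hardcoding them, the function below calls the Secrets Manager GetSecretValue API through boto3 and parses the JSON secret. It assumes the secret follows the key layout AWS documents for Amazon DocumentDB secrets (username, password, host, and port, among others). The secret name in the example is hypothetical, and the client parameter exists only so the function can be exercised with a stand-in.

```python
import json
from typing import Any

# Keys this sketch expects in an Amazon DocumentDB credentials secret.
REQUIRED_KEYS = ("username", "password", "host", "port")


def load_db_secret(secret_id: str, client: Any = None) -> dict[str, Any]:
    """Fetch and parse a DocumentDB credentials secret from Secrets Manager."""
    if client is None:
        import boto3  # imported lazily so the function can be tested with a fake client
        client = boto3.client("secretsmanager")
    # get_secret_value returns the secret payload as a JSON string in "SecretString".
    response = client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    missing = [key for key in REQUIRED_KEYS if key not in secret]
    if missing:
        raise KeyError(f"secret {secret_id!r} is missing keys: {missing}")
    return secret
```

Calling load_db_secret("docdb/gdelt-app") would return a dict whose fields can populate a MongoDB connection. The secret name here is hypothetical.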
For additional insights into serverless architecture, please refer to the AWS documentation.
Walkthrough
To get started, use the AWS CloudFormation template that provisions the required resources, along with the AWS SAM templates referenced in this article. The code is available in the amazon-documentdb-serverless-samples GitHub repository.
The solution outlined here encompasses the following tasks:
- Launch the AWS CloudFormation stack to create a VPC (Amazon VPC) and, inside it, an Amazon DocumentDB cluster and an AWS Cloud9 environment. Amazon VPC lets you launch AWS resources into a virtual network that you define, and AWS Cloud9 is a cloud-based IDE that lets you write, run, and debug code from a browser.
- Integrate Secrets Manager with Amazon DocumentDB to store the database credentials, following the AWS guidance on credential rotation.
- Access the AWS Cloud9 environment and download the necessary code packages and libraries from GitHub.
- Install the PyMongo library and create a Lambda layer.
- Load the sample GDELT dataset from the AWS public data registry into Amazon DocumentDB.
- Deploy the AWS SAM template that provisions the Lambda function and the API Gateway API.
- Use API Gateway to run sample queries against Amazon DocumentDB.
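Lambda extracts layer contents to /opt and, for Python runtimes, adds /opt/python to the import path. A PyMongo layer is therefore a zip whose files sit under a top-level python/ directory. The sketch below installs PyMongo with pip into a build directory and zips it with that prefix. The directory and file names are illustrative. Building on Amazon Linux with the same Python version as the function's runtime (as in the AWS Cloud9 environment) helps keep any compiled extensions compatible.

```python
import subprocess
import sys
import zipfile
from pathlib import Path


def zip_layer(site_packages: Path, zip_path: Path) -> list[str]:
    """Zip installed packages under the python/ prefix that Lambda layers expect."""
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(site_packages.rglob("*")):
            if path.is_file():
                # e.g. build/pymongo/__init__.py -> python/pymongo/__init__.py
                arcname = (Path("python") / path.relative_to(site_packages)).as_posix()
                zf.write(path, arcname)
                names.append(arcname)
    return names


def build_pymongo_layer(build_dir: Path, zip_path: Path) -> list[str]:
    """Install PyMongo into build_dir with pip, then package it as a layer zip."""
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "pymongo", "--target", str(build_dir)],
        check=True,
    )
    return zip_layer(build_dir, zip_path)
```

Running build_pymongo_layer(Path("build/python-deps"), Path("pymongo-layer.zip")) produces a zip that can be referenced as a layer from the AWS SAM template.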
AWS CloudFormation sets up the environment for loading data into Amazon DocumentDB; you then review and deploy the Lambda function and API Gateway code. The process consists of two main steps:
- Processing and loading the GDELT data into the document database.
- Creating the Lambda function and API Gateway API to execute queries against the document database.
Processing and Loading the GDELT Data into Amazon DocumentDB
This step involves the actions illustrated in the accompanying diagram.
A Python program retrieves event data from the GDELT website. The daily datasets are available as compressed CSV files at http://data.gdeltproject.org/events/{yyyymmdd}.export.CSV.zip. Each file is decompressed and parsed, and each row is transformed into a JSON document that is then stored in an Amazon DocumentDB collection.
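Here is a minimal sketch of the download-and-parse step. It assumes the GDELT 1.0 daily event format: 58 tab-separated columns per row (despite the .CSV extension), in the order listed below. The type conversions mirror the sample document that follows, where NumMentions, NumSources, and NumArticles are integers, AvgTone is a float, and every other field stays a string. The function names are hypothetical, and rows that do not match the expected shape are skipped rather than repaired.

```python
import csv
import io
import urllib.request
import zipfile
from datetime import date
from typing import Iterator

# GDELT 1.0 event table columns, in file order (58 fields).
COLUMNS = """GLOBALEVENTID SQLDATE MonthYear Year FractionDate
Actor1Code Actor1Name Actor1CountryCode Actor1KnownGroupCode Actor1EthnicCode
Actor1Religion1Code Actor1Religion2Code Actor1Type1Code Actor1Type2Code Actor1Type3Code
Actor2Code Actor2Name Actor2CountryCode Actor2KnownGroupCode Actor2EthnicCode
Actor2Religion1Code Actor2Religion2Code Actor2Type1Code Actor2Type2Code Actor2Type3Code
IsRootEvent EventCode EventBaseCode EventRootCode QuadClass GoldsteinScale
NumMentions NumSources NumArticles AvgTone
Actor1Geo_Type Actor1Geo_FullName Actor1Geo_CountryCode Actor1Geo_ADM1Code
Actor1Geo_Lat Actor1Geo_Long Actor1Geo_FeatureID
Actor2Geo_Type Actor2Geo_FullName Actor2Geo_CountryCode Actor2Geo_ADM1Code
Actor2Geo_Lat Actor2Geo_Long Actor2Geo_FeatureID
ActionGeo_Type ActionGeo_FullName ActionGeo_CountryCode ActionGeo_ADM1Code
ActionGeo_Lat ActionGeo_Long ActionGeo_FeatureID
DATEADDED SOURCEURL""".split()

INT_FIELDS = ("NumMentions", "NumSources", "NumArticles")


def gdelt_events_url(day: date) -> str:
    """URL of the zipped daily event export for one day."""
    return f"http://data.gdeltproject.org/events/{day:%Y%m%d}.export.CSV.zip"


def parse_events(zip_bytes: bytes) -> Iterator[dict]:
    """Yield one document per well-formed event row in a zipped daily export."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        name = zf.namelist()[0]  # each archive holds a single export file
        with zf.open(name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            # QUOTE_NONE: treat quote characters inside fields as literal text.
            for row in csv.reader(text, delimiter="\t", quoting=csv.QUOTE_NONE):
                if len(row) != len(COLUMNS):
                    continue  # skip rows with an unexpected number of fields
                doc = dict(zip(COLUMNS, row))
                try:
                    for key in INT_FIELDS:
                        doc[key] = int(doc[key])
                    doc["AvgTone"] = float(doc["AvgTone"])
                except ValueError:
                    continue  # skip rows whose numeric fields do not parse
                yield doc


def fetch_events(day: date) -> Iterator[dict]:
    """Download one day's export and yield its parsed documents."""
    with urllib.request.urlopen(gdelt_events_url(day)) as resp:
        yield from parse_events(resp.read())
```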
Documents are written to Amazon DocumentDB in batches. After conversion, a stored document looks like the following (shown in Python dictionary notation, including the _id field assigned on insert):
{
'Actor2KnownGroupCode':'',
'DATEADDED':'20190426',
'Actor1Geo_FeatureID':'1659564',
'Actor2Geo_FeatureID':'',
'GoldsteinScale':'3.4',
'Actor1Type2Code':'',
'Actor1CountryCode':'USA',
'Actor2Geo_Type':'0',
'NumArticles':10,
'IsRootEvent':'0',
'ActionGeo_CountryCode':'US',
'Actor1KnownGroupCode':'',
'Actor2Geo_Long':'',
'Actor1Geo_ADM1Code':'USCA',
'QuadClass':'1',
'Actor1Geo_CountryCode':'US',
'AvgTone':3.90707497360085,
'Actor1Religion2Code':'',
'FractionDate':'2009.3233',
'Actor2Geo_CountryCode':'',
'Actor1EthnicCode':'',
'SQLDATE':'20090428',
'ActionGeo_Long':'-121.494',
'Actor2Type3Code':'',
'Actor2Geo_FullName':'',
'Actor1Type1Code':'',
'Actor1Code':'USA',
'SOURCEURL':'https://thenextweb.com/podium/2019/04/25/an-entrepreneurs-guide-to-sacramentos-startup-scene/',
'MonthYear':'200904',
'NumSources':1,
'ActionGeo_Lat':'38.5816',
'Actor1Type3Code':'',
'Actor2Name':'',
'Actor2Type2Code':'',
'ActionGeo_ADM1Code':'USCA',
'Actor2Religion1Code':'',
'Actor1Geo_Lat':'38.5816',
'Actor2Geo_Lat':'',
'NumMentions':10,
'Actor2EthnicCode':'',
'EventRootCode':'05',
'Actor1Name':'SACRAMENTO',
'ActionGeo_FullName':'Sacramento, California, United States',
'GLOBALEVENTID':'840976753',
'Actor2CountryCode':'',
'EventCode':'051',
'Actor2Code':'',
'Actor2Type1Code':'',
'EventBaseCode':'051',
'Actor1Geo_Type':'3',
'ActionGeo_Type':'3',
'Actor1Geo_Long':'-121.494',
'ActionGeo_FeatureID':'1659564',
'Actor1Religion1Code':'',
'Actor2Religion2Code':'',
'Actor1Geo_FullName':'Sacramento, California, United States',
'Year':'2009',
'Actor2Geo_ADM1Code':'',
'_id':ObjectId('5cd24827ca0e26e6107da9dc')
}
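The batched write can be sketched as follows, assuming PyMongo's Collection.insert_many. The batch size of 1000 is an arbitrary illustrative default, not a value from this post. PyMongo assigns a client-side ObjectId _id to each document that lacks one, which is where the _id field in the sample above comes from.

```python
from itertools import islice
from typing import Any, Iterable, Iterator


def batched(docs: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Group an iterable of documents into lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    it = iter(docs)
    while batch := list(islice(it, size)):
        yield batch


def load_documents(collection: Any, docs: Iterable[dict], size: int = 1000) -> int:
    """Insert documents batch by batch; returns how many were inserted."""
    total = 0
    for batch in batched(docs, size):
        # ordered=False attempts every document in the batch even if one fails;
        # PyMongo then raises BulkWriteError describing the failures.
        result = collection.insert_many(batch, ordered=False)
        total += len(result.inserted_ids)
    return total
```

Combined with a generator of parsed rows, load_documents(collection, rows) streams the day's events into the collection without holding the whole file's documents in memory.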
Creating the Lambda Function and API Gateway to Query Amazon DocumentDB
To query the GDELT data stored in Amazon DocumentDB, this post uses API Gateway with Lambda proxy integration. Query parameters are sent with GET or POST requests and processed by a Lambda function, which supports a variety of searches against Amazon DocumentDB.
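With Lambda proxy integration, API Gateway passes the whole HTTP request to the function as an event. GET parameters arrive in queryStringParameters (null when there are none), and a POST payload arrives as a string in body, base64-encoded when isBase64Encoded is true. The function must return an object with statusCode, headers, and a string body. The sketch below normalizes both request styles into a DocumentDB filter. The whitelist of filterable fields, the limit parameter, and the handle helper are illustrative assumptions, not the sample repository's code.

```python
import base64
import json
from typing import Any, Callable

# Hypothetical whitelist of fields that clients may filter on.
ALLOWED_FIELDS = {"Actor1Name", "ActionGeo_CountryCode", "DATEADDED", "EventRootCode"}
DEFAULT_LIMIT, MAX_LIMIT = 10, 100


def parse_request(event: dict) -> tuple[dict, int]:
    """Turn an API Gateway proxy event (GET or POST) into a (filter, limit) pair."""
    if event.get("httpMethod") == "POST":
        body = event.get("body") or "{}"
        if event.get("isBase64Encoded"):
            body = base64.b64decode(body).decode("utf-8")
        params = json.loads(body)
    else:
        params = event.get("queryStringParameters") or {}
    if not isinstance(params, dict):
        raise ValueError("request parameters must be a JSON object")
    # Clamp the limit so a single request cannot pull the whole collection.
    limit = max(1, min(int(params.get("limit", DEFAULT_LIMIT)), MAX_LIMIT))
    # Keep only whitelisted fields; str() turns operator objects such as
    # {"$ne": ""} into plain strings, so clients cannot inject query operators.
    query = {k: str(v) for k, v in params.items() if k in ALLOWED_FIELDS}
    return query, limit


def proxy_response(status: int, payload: Any) -> dict:
    """Shape a result the way Lambda proxy integration expects."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload, default=str),  # default=str covers ObjectId values
    }


def handle(event: dict, find: Callable[[dict, int], list]) -> dict:
    """Core handler logic; find(query, limit) runs the actual database query."""
    try:
        query, limit = parse_request(event)
    except (ValueError, TypeError) as exc:  # bad JSON, non-numeric limit, etc.
        return proxy_response(400, {"error": str(exc)})
    return proxy_response(200, find(query, limit))
```

Inside the deployed function, lambda_handler(event, context) could return handle(event, lambda q, n: list(collection.find(q).limit(n))), where collection is a PyMongo collection opened outside the handler.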
The Lambda function operates within a VPC to access Amazon DocumentDB, and credentials are securely stored in Secrets Manager, ensuring sensitive information is not hardcoded.
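A common pattern for a Lambda function that talks to a database is to create the client once per execution environment, outside the per-request path. Warm invocations then reuse the existing connection instead of repeating the TLS handshake and authentication. The following is a minimal sketch. It assumes PyMongo is available from a layer and that the secret dict holds host, port, username, and password keys. The CA bundle path is an assumption about how the function is packaged, and make_documentdb_client is a hypothetical helper name.

```python
from typing import Any, Callable, Optional

# Module-level state survives between invocations while the execution environment stays warm.
_client: Optional[Any] = None


def get_client(factory: Callable[[], Any]) -> Any:
    """Return the cached client, creating it with factory() on first use only."""
    global _client
    if _client is None:
        _client = factory()
    return _client


def make_documentdb_client(secret: dict) -> Any:
    """Build a PyMongo client from a parsed credentials secret (hypothetical helper)."""
    from pymongo import MongoClient  # provided at run time by the PyMongo Lambda layer
    return MongoClient(
        host=secret["host"],
        port=int(secret["port"]),
        username=secret["username"],
        password=secret["password"],
        tls=True,
        tlsCAFile="/var/task/global-bundle.pem",  # assumed location of the bundled CA file
        retryWrites=False,
    )
```

In a handler, client = get_client(lambda: make_documentdb_client(secret)) returns the same object on every warm invocation, so the secret lookup and connection setup happen only on a cold start.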