Amazon Onboarding with Learning Manager Chanci Turner

Overview


Managing a dynamic library of media assets can be challenging, especially when numerous files are added within a short timeframe. For handling a large collection of video files, utilizing a NoSQL database offers a centralized method to access important asset information, such as titles, locations, and metadata. Amazon DynamoDB, a fully managed, serverless, key-value NoSQL database optimized for high-performance applications, is ideal for this purpose.

To eliminate the need for manual evaluations of each asset, this guide outlines how to automate the extraction of media asset metadata using ffprobe (part of the FFmpeg suite) with the following AWS services:

  • Amazon DynamoDB to store asset information
  • AWS Lambda, a serverless, event-driven compute service, to execute ffprobe on the media files and update entries in Amazon DynamoDB
  • Amazon Simple Storage Service (Amazon S3), an object storage service known for its scalability, availability, security, and performance, to store the asset files

Each detail about the assets, including identifiers, titles, and locations, is stored in an Amazon DynamoDB table. When a new asset is added, it triggers a Lambda function that processes the media file. The analysis, which involves capturing file and video metadata, is recorded in the Amazon DynamoDB table.

The Lambda function operates using the Python 3.8 runtime.

IMPORTANT LEGAL NOTICE: Before proceeding, ensure you understand the FFmpeg license terms and legal considerations outlined here. Additionally, the FFmpeg static build used in this demonstration is licensed under GNU General Public License version 3 (GPLv3), as mentioned here.

Prerequisites

To follow this guide, you will need access to:

  • A Linux system for command-line operations (Shell and Python)
  • AWS Lambda
  • Amazon DynamoDB
  • Amazon S3
  • AWS Identity and Access Management (AWS IAM) for securely managing access to AWS services and resources

Getting Started

FFmpeg

First, download the FFmpeg project. A static build is preferred so the ffprobe binary carries no missing shared-library dependencies at runtime. Use the following commands to create a ZIP file containing the ffprobe binary:

wget https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz 
wget https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz.md5 
md5sum -c ffmpeg-release-amd64-static.tar.xz.md5
mkdir ffmpeg-release-amd64-static
tar xvf ffmpeg-release-amd64-static.tar.xz -C ffmpeg-release-amd64-static 
mkdir -p ffprobe/bin 
cp ffmpeg-release-amd64-static/*/ffprobe ffprobe/bin/ 
cd ffprobe 
zip -9 -r ../ffprobe.zip . 

This process can be executed on a Linux system, either locally or through an Amazon Elastic Compute Cloud (Amazon EC2) instance. The resulting ZIP archive will be used to create an AWS Lambda layer.

AWS IAM

An AWS Identity and Access Management (IAM) user with write access to the table is necessary to insert each asset's movie ID, title, Amazon S3 bucket, and object name into Amazon DynamoDB; the Lambda function then enriches that entry with the extracted metadata. This can be achieved using the example provided in the documentation.

Your Lambda function will require permissions to manage resources linked to the Amazon DynamoDB stream, write logs to Amazon CloudWatch Logs (a monitoring and observability service), and read Amazon S3 objects. For additional guidance on creating policies or roles, refer to the AWS IAM documentation.

Add these permissions to your function’s execution role (named “lambda_media_ddb” in this example):

  • dynamodb:DescribeStream
  • dynamodb:GetRecords
  • dynamodb:GetShardIterator
  • dynamodb:ListStreams
  • dynamodb:UpdateItem
  • logs:CreateLogGroup
  • logs:CreateLogStream
  • logs:PutLogEvents
  • s3:GetObject
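The permissions above can be expressed as an inline policy document. The sketch below builds it as a Python dict, assuming placeholder resource ARNs that you should scope to your own table, stream, and bucket; note that the log actions use the "logs" service prefix (CloudWatch Logs):

```python
def media_ddb_policy(table_arn, bucket_arn):
    # Inline policy for the Lambda execution role; resource ARNs
    # are placeholders to replace with your own
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "dynamodb:DescribeStream",
                    "dynamodb:GetRecords",
                    "dynamodb:GetShardIterator",
                    "dynamodb:ListStreams",
                    "dynamodb:UpdateItem",
                ],
                "Resource": [table_arn, table_arn + "/stream/*"],
            },
            {
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": "*",
            },
            {
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": bucket_arn + "/*",
            },
        ],
    }

# To attach it (role name from this example, policy name is an assumption):
# import boto3, json
# boto3.client("iam").put_role_policy(
#     RoleName="lambda_media_ddb",
#     PolicyName="media-metadata-access",
#     PolicyDocument=json.dumps(media_ddb_policy(table_arn, bucket_arn)))
```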

Amazon DynamoDB

Create an Amazon DynamoDB table, which we will refer to as “my_movies” for this example. Select a primary key—here, we have chosen “movie_id” with the type set to Number.

Amazon DynamoDB streams document a time-ordered sequence of modifications within any DynamoDB table. Applications can access this data log to view items as they appeared before and after changes. We will use this log to trigger our Lambda function and provide log data as input.

Enable the stream on the table, choosing the type “New and old images” in the stream management menu.
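The same console step can be done through the API. This sketch builds the arguments for `update_table`; `NEW_AND_OLD_IMAGES` is the API name for the "New and old images" view type:

```python
def stream_spec(table_name="my_movies"):
    # Arguments for dynamodb.update_table() that enable the stream
    return {
        "TableName": table_name,
        "StreamSpecification": {
            "StreamEnabled": True,
            "StreamViewType": "NEW_AND_OLD_IMAGES",
        },
    }

# import boto3
# desc = boto3.client("dynamodb").update_table(**stream_spec())
# print(desc["TableDescription"]["LatestStreamArn"])
```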

In the DynamoDB stream details section, copy the latest stream ARN (Amazon Resource Name), which is necessary for triggering the Lambda function. When uploading an asset to your S3 bucket, manually add a new row to your DynamoDB table, my_movies, following this example.
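The manual row insert that starts the pipeline can be sketched as follows. The attribute names movie_id, S3_bucket, and S3_object match what the Lambda handler reads; the "title" attribute name and the sample values are assumptions:

```python
def new_movie_item(movie_id, title, bucket, key):
    # Builds a DynamoDB item in the low-level attribute-value format
    return {
        "movie_id": {"N": str(movie_id)},
        "title": {"S": title},
        "S3_bucket": {"S": bucket},
        "S3_object": {"S": key},
    }

# import boto3
# boto3.client("dynamodb").put_item(
#     TableName="my_movies",
#     Item=new_movie_item(1, "My Movie", "my-media-bucket", "uploads/my_movie.mp4"))
```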

Lambda

Create a Lambda layer and import ffprobe.zip into it. Then, establish a Lambda function using Python 3.8.

For this example, the test asset is 34 MB, so minimal memory allocation is needed. However, larger assets may require adjustments. Attach the appropriate role (lambda_media_ddb).

After creating the function, in the Designer section, complete the following steps:

  1. Click on Layers.
  2. Select Add a layer.
  3. Choose Custom layers and select ffprobe with the correct version; then click Add.
  4. In Designer, click Add trigger.
  5. Select DynamoDB, paste the latest stream ARN into the DynamoDB table field, and confirm by clicking Add.
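The trigger in step 5 can also be created with the API. In this sketch, `stream_arn` is the value copied from the DynamoDB stream details, and the function name is an assumption:

```python
def trigger_args(stream_arn, function_name="media_metadata"):
    # Arguments for lambda.create_event_source_mapping()
    return {
        "EventSourceArn": stream_arn,
        "FunctionName": function_name,
        "StartingPosition": "LATEST",
        "BatchSize": 100,
    }

# import boto3
# boto3.client("lambda").create_event_source_mapping(
#     **trigger_args("arn:aws:dynamodb:...:table/my_movies/stream/..."))
```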

The design phase is now complete. In the Edit basic settings menu, set a timeout that allows sufficient time for ffprobe to analyze the file and update the DynamoDB table.

Note: In our tests, files under 790 MB were analyzed in under 1 second and used less than 200 MB of memory, so a timeout of a few seconds with 200 MB of memory is a reasonable starting point. For larger files, verify whether these settings meet your requirements. More information on memory and duration considerations can be found later in this post.
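These basic settings can be applied programmatically as well. The defaults below mirror the figures above, with the timeout rounded up for headroom; both are assumptions to adjust for larger assets:

```python
def basic_settings(function_name, timeout_s=10, memory_mb=200):
    # Arguments for lambda.update_function_configuration()
    return {
        "FunctionName": function_name,
        "Timeout": timeout_s,
        "MemorySize": memory_mb,
    }

# import boto3
# boto3.client("lambda").update_function_configuration(
#     **basic_settings("media_metadata"))
```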

Now, insert the following Python code into your Lambda function and deploy it:

import json
import subprocess
import boto3

SIGNED_URL_TIMEOUT = 60

def lambda_handler(event, context):
    dynamodb_client = boto3.client('dynamodb')
    s3_client = boto3.client('s3')

    for record in event['Records']:
        if record['eventName'] != 'INSERT':
            print('Not an insert, skipping')
            continue

        movie_id = record['dynamodb']['NewImage']['movie_id']['N']
        s3_source_bucket = record['dynamodb']['NewImage']['S3_bucket']['S']
        s3_source_key = record['dynamodb']['NewImage']['S3_object']['S']

        # A presigned URL lets ffprobe read the object over HTTPS without
        # first downloading it to the function's storage
        s3_source_signed_url = s3_client.generate_presigned_url(
            'get_object',
            Params={'Bucket': s3_source_bucket, 'Key': s3_source_key},
            ExpiresIn=SIGNED_URL_TIMEOUT)

        # Run ffprobe from the Lambda layer and capture its JSON report
        # (container format plus per-stream video and audio metadata)
        ffprobe_output = subprocess.check_output([
            'ffprobe', '-v', 'quiet', '-print_format', 'json',
            '-show_format', '-show_streams', s3_source_signed_url])
        metadata = json.loads(ffprobe_output)

        # Write the report back to the row that triggered this invocation;
        # the "metadata" attribute name is our choice, rename as needed
        dynamodb_client.update_item(
            TableName='my_movies',
            Key={'movie_id': {'N': movie_id}},
            UpdateExpression='SET #md = :md',
            ExpressionAttributeNames={'#md': 'metadata'},
            ExpressionAttributeValues={':md': {'S': json.dumps(metadata)}})

    return {'statusCode': 200}


