Historically, the observation of animal behavior has been a crucial aspect of various fields, including ecology. Understanding how often certain behaviors occur, their timing, and individual variations can be essential for researchers. However, tracking and analyzing these behaviors can be labor-intensive and time-consuming. To streamline this process, a collaboration between a team from a pharmaceutical client, Innovative Biopharma Co., and AWS Solutions Architects led to the development of a solution utilizing Amazon Rekognition Custom Labels. This tool simplifies the task of labeling specific movements in images, enabling the training and creation of models to detect these activities.
In this article, we illustrate how machine learning (ML) can automate this workflow in an engaging manner. We developed a custom model that identifies playful behaviors in cats captured in videos using Amazon Rekognition Custom Labels. Our goal is to contribute to the fields of biology and beyond by sharing our architecture, development process, and the code for our solution.
Overview of Amazon Rekognition Custom Labels
Amazon Rekognition Custom Labels is an automated ML feature that allows users to quickly train custom models for detecting specific objects and scenes in images—no prior ML experience is required. For example, you can create a model to locate your company logos in social media images, identify products on store shelves, or categorize unique machine parts on an assembly line.
Building on the extensive capabilities of Amazon Rekognition, which has been trained on millions of images across various categories, you only need to upload a small dataset of training images—typically a few hundred—that relate to your specific use case. If your images are pre-labeled, Amazon Rekognition Custom Labels can start training with just a few clicks. If not, you can label them directly within the Amazon Rekognition Custom Labels interface or utilize Amazon SageMaker Ground Truth for assistance.
Once Amazon Rekognition begins training with your selected images, it can generate a custom image analysis model within a few hours. The system automatically manages the training data, selects appropriate ML algorithms, conducts the training, and offers performance metrics. You can then access your custom model through the Amazon Rekognition Custom Labels API and integrate it into your applications.
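As a minimal sketch of that integration, the following Python snippet calls the Amazon Rekognition Custom Labels API with boto3 to analyze a single image stored in Amazon S3. The model version ARN, bucket, and key below are placeholders, not values produced by this solution:

```python
import boto3

# Placeholder values; replace with your own model version ARN, bucket, and key.
MODEL_VERSION_ARN = "arn:aws:rekognition:us-east-1:123456789012:project/cat-punch/version/v1/1612345678901"
BUCKET = "my-cat-videos"            # hypothetical bucket
IMAGE_KEY = "frames/frame_0001.jpg"  # hypothetical image key

rekognition = boto3.client("rekognition", region_name="us-east-1")

# The model version must already be in the RUNNING state
# (start it with start_project_version or from the console).
response = rekognition.detect_custom_labels(
    ProjectVersionArn=MODEL_VERSION_ARN,
    Image={"S3Object": {"Bucket": BUCKET, "Name": IMAGE_KEY}},
    MinConfidence=50,
)

# Print each detected label with its confidence and bounding box.
for label in response["CustomLabels"]:
    box = label.get("Geometry", {}).get("BoundingBox", {})
    print(f'{label["Name"]}: {label["Confidence"]:.1f}% {box}')
```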
Solution Overview
The diagram below illustrates the architecture of our solution. With the model established, the entire process of detecting specific behaviors in videos becomes automated; all that’s required is to upload a video file (.mp4).
The workflow consists of the following steps:
- You upload a video file (.mp4) to Amazon Simple Storage Service (Amazon S3), which triggers an AWS Lambda function that starts the Amazon Rekognition Custom Labels inference endpoint and sends a message to Amazon Simple Queue Service (Amazon SQS). Because the inference endpoint takes roughly 10 minutes to initialize, we use a deferred execution approach via Amazon SQS.
- Amazon SQS triggers a Lambda function that checks the status of the inference endpoint and launches an Amazon Elastic Compute Cloud (Amazon EC2) instance when the status is “Running.”
- Amazon CloudWatch Events detects that the EC2 instance is running and invokes a Lambda function that executes a script on the instance using AWS Systems Manager Run Command.
- The script running on the EC2 instance calls the Amazon Rekognition Custom Labels inference endpoint to identify the target behaviors in the uploaded video and saves the results back to Amazon S3 (a sketch of this script appears after this list).
- Once the inferred results file is uploaded to Amazon S3, a Lambda function is activated to terminate the Amazon EC2 instance and the Amazon Rekognition Custom Labels inference endpoint.
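The script that runs on the EC2 instance is, at its core, a loop that samples frames from the uploaded video and sends each one to the inference endpoint. The following Python sketch illustrates that loop; the bucket, keys, model ARN, and sampling interval are assumptions for illustration, and OpenCV is used here purely as one convenient way to read frames — the actual script in the solution may differ:

```python
import json
import boto3
import cv2  # OpenCV, used here to read video frames

# Hypothetical names; the CloudFormation stack in this solution wires in real values.
BUCKET = "my-cat-videos"
VIDEO_KEY = "input/cat.mp4"
RESULT_KEY = "output/cat_results.json"
MODEL_VERSION_ARN = "arn:aws:rekognition:us-east-1:123456789012:project/cat-punch/version/v1/1612345678901"
FRAME_INTERVAL = 10  # analyze every 10th frame to limit API calls

s3 = boto3.client("s3")
rekognition = boto3.client("rekognition", region_name="us-east-1")

# Download the uploaded video from S3 to local disk.
s3.download_file(BUCKET, VIDEO_KEY, "/tmp/input.mp4")

results = []
cap = cv2.VideoCapture("/tmp/input.mp4")
frame_index = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_index % FRAME_INTERVAL == 0:
        # Encode the frame as JPEG bytes and send it to the Custom Labels endpoint.
        _, jpeg = cv2.imencode(".jpg", frame)
        response = rekognition.detect_custom_labels(
            ProjectVersionArn=MODEL_VERSION_ARN,
            Image={"Bytes": jpeg.tobytes()},
            MinConfidence=50,
        )
        results.append({"frame": frame_index, "labels": response["CustomLabels"]})
    frame_index += 1
cap.release()

# Uploading the results file to S3 is what triggers the cleanup Lambda in the last step.
s3.put_object(Bucket=BUCKET, Key=RESULT_KEY, Body=json.dumps(results, default=str))
```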
Prerequisites
To follow along with this walkthrough, ensure you have the following prerequisites:
- An AWS account – You can sign up for a new account if you don’t have one yet.
- A key pair – This is required to log into the EC2 instance that employs Amazon Rekognition Custom Labels for behavior detection. You can either use an existing key pair or create a new one. For more details, refer to Amazon EC2 key pairs and Linux instances.
- A video for inference – This solution utilizes a video (.mp4 format) for analysis, which can be your own or one provided in this post.
Launching Your AWS CloudFormation Stack
When you launch the provided AWS CloudFormation template, you are prompted to enter the following parameters:
- KeyPair – The name of the key pair used for EC2 instance access
- ModelName – The model name designated for Amazon Rekognition Custom Labels
- ProjectARN – The project ARN utilized in Amazon Rekognition Custom Labels
- ProjectVersionARN – The model version ARN used in Amazon Rekognition Custom Labels
- YourCIDR – The CIDR that includes your public IP address
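If you prefer to launch the stack programmatically rather than through the console, a sketch like the following passes those parameters with boto3. The stack name, template file name, and parameter values here are assumptions, not values defined by this solution:

```python
import boto3

cloudformation = boto3.client("cloudformation", region_name="us-east-1")

# "cat-behavior-detection" and "template.yaml" are hypothetical; use the stack
# name and template provided with this post.
with open("template.yaml") as f:
    template_body = f.read()

cloudformation.create_stack(
    StackName="cat-behavior-detection",
    TemplateBody=template_body,
    # CAPABILITY_IAM is typically required when a template creates IAM roles.
    Capabilities=["CAPABILITY_IAM"],
    Parameters=[
        {"ParameterKey": "KeyPair", "ParameterValue": "my-key-pair"},
        {"ParameterKey": "ModelName", "ParameterValue": "cat-punch-model"},
        {"ParameterKey": "ProjectARN", "ParameterValue": "arn:aws:rekognition:us-east-1:123456789012:project/cat-punch/1612345678901"},
        {"ParameterKey": "ProjectVersionARN", "ParameterValue": "arn:aws:rekognition:us-east-1:123456789012:project/cat-punch/version/v1/1612345678901"},
        {"ParameterKey": "YourCIDR", "ParameterValue": "203.0.113.25/32"},
    ],
)
```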
For this post, we are using a video to determine if a cat is “punching” or not. Our object detection model was developed using a pre-annotated dataset, as detailed in the following sections.
This guide operates within the US East (N. Virginia) Region; please ensure you are working in this region while following the instructions.
Adding Annotations to Images from the Video
To create the images used for the model’s training, you need to extract still frames from the video. For our case, we prepared 377 images (the ratio of normal to punching images is roughly 2:1) and annotated them accordingly.
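There are many ways to extract frames; one minimal approach with OpenCV looks like the following. The video file name, output directory, and sampling interval are arbitrary choices for illustration:

```python
import os
import cv2

os.makedirs("frames", exist_ok=True)

# Arbitrary sampling: save one frame every 30 frames (about once per second at 30 fps).
cap = cv2.VideoCapture("cat.mp4")
frame_index = 0
saved = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_index % 30 == 0:
        cv2.imwrite(f"frames/frame_{saved:04d}.jpg", frame)
        saved += 1
    frame_index += 1
cap.release()
print(f"Saved {saved} still images")
```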
Store the series of still images in Amazon S3 and apply annotations. Ground Truth can assist with the annotation process. Since we are creating an object detection model, select “Bounding box” as the task type. For this use case, we define two labels: “normal” for standard sitting behavior and “punch” for playful behavior.
In the annotation phase, you should surround the cat with the “normal” label bounding box when it is not punching, and use the “punch” label bounding box when it is. When a cat punches, its paws may appear blurred, allowing you to determine the action based on the blur and annotate the image accordingly.
Training a Custom ML Model
To begin training your model, follow these steps:
- Create an object detection model using Amazon Rekognition Custom Labels. For detailed instructions, refer to the “Getting Started” guide for Amazon Rekognition Custom Labels.
- When creating a dataset, select “Import images labeled by SageMaker Ground Truth” as the image location.
- Input the output.manifest file path generated by the Ground Truth labeling job.
To locate the path of the output.manifest file, visit the Amazon SageMaker console, navigate to the Labeling jobs page, and select your labeling job; the necessary information is found on the Labeling job summary page.
Once the model has completed training, save the ARN listed in the “Use your model” section at the bottom of the model details page for later use. In our case, the model achieved an F1 score above 0.9 for both normal and punch detection.
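If you prefer to read the model version ARN and its evaluation score programmatically instead of copying them from the console, the DescribeProjectVersions API returns both. The project ARN and version name below are placeholders:

```python
import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")

# Placeholder identifiers; substitute your own project ARN and version name.
PROJECT_ARN = "arn:aws:rekognition:us-east-1:123456789012:project/cat-punch/1612345678901"
VERSION_NAME = "cat-punch.2021-01-01T00.00.00"

response = rekognition.describe_project_versions(
    ProjectArn=PROJECT_ARN, VersionNames=[VERSION_NAME]
)
for version in response["ProjectVersionDescriptions"]:
    print("ARN:     ", version["ProjectVersionArn"])
    print("Status:  ", version["Status"])
    # EvaluationResult is present once training has completed successfully.
    print("F1 score:", version.get("EvaluationResult", {}).get("F1Score"))
```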
Uploading a Video for Inference on Amazon S3
You can now proceed to upload your video file for inference.
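A simple upload with boto3 is enough to kick off the workflow; the local file, bucket, and key names below are placeholders for the ones created by the CloudFormation stack:

```python
import boto3

s3 = boto3.client("s3")

# Placeholder bucket/key; uploading the .mp4 is what triggers the first Lambda function.
s3.upload_file("cat.mp4", "my-cat-videos", "input/cat.mp4")
```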
We hope this guide has provided valuable insights into the process of detecting playful animal behaviors using Amazon Rekognition Custom Labels.