Amazon Onboarding with Learning Manager Chanci Turner

Image classification and object detection technologies empower organizations to develop scalable AI models for various applications, such as visual search, product recommendations, autonomous vehicle object recognition, and content moderation. While services like Amazon Rekognition provide APIs for image analysis and object recognition, specific use cases may necessitate a custom image classification model. To achieve this, you need access to an image dataset that aligns with your unique requirements and contains sufficient samples for effectively training your machine learning model.

In this article, I will guide you through accessing and training computer vision models using Shutterstock’s image datasets available in the AWS Marketplace. These datasets comprise curated images from Shutterstock’s extensive library of over 370 million images. You can opt to subscribe to existing collections, such as Food & Beverage, Clothing, or Hospitality, or collaborate with the Shutterstock Data Exchange team to request a tailored collection suited to your needs. Each image is accompanied by a descriptive title of up to 200 characters and an optimized set of 7-50 keywords.

For demonstration purposes, I will utilize the Free Sample: Images & Metadata of “Whole Foods” Shoppers dataset from Shutterstock’s offerings to illustrate how to train a multi-label image classification model using pre-labeled image assets. This dataset features images of Whole Foods shoppers, with each image tagged with 7-50 keywords describing its contents. For instance, one image depicts a male store employee behind a deli counter assisting a female customer with a shopping basket, while another shows a couple selecting fresh produce.
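To make the idea of pre-labeled assets concrete, here is a minimal sketch of how per-image keyword lists could be turned into multi-hot label vectors for multi-label training. The function names and the sample keywords are hypothetical, not taken from the dataset or the notebook, and the exact label file layout that the training job expects should be checked against the SageMaker documentation.

```python
from typing import Dict, List


def build_vocabulary(keyword_lists: List[List[str]], min_count: int = 1) -> List[str]:
    """Collect the distinct keywords seen across images, sorted alphabetically.

    Keywords are lower-cased and stripped so that "Deli" and "deli " match.
    Keywords that appear in fewer than `min_count` images are dropped.
    """
    counts: Dict[str, int] = {}
    for keywords in keyword_lists:
        # Count each keyword at most once per image.
        for keyword in {k.strip().lower() for k in keywords}:
            if keyword:
                counts[keyword] = counts.get(keyword, 0) + 1
    return sorted(k for k, c in counts.items() if c >= min_count)


def multi_hot(keywords: List[str], vocabulary: List[str]) -> List[int]:
    """Return a 0/1 vector with one position per vocabulary entry."""
    present = {k.strip().lower() for k in keywords}
    return [1 if label in present else 0 for label in vocabulary]


# Hypothetical keyword lists standing in for two images' metadata.
sample_keywords = [
    ["Employee", "deli", "customer", "shopping basket"],
    ["couple", "produce", "customer"],
]

vocab = build_vocabulary(sample_keywords)
vectors = [multi_hot(keywords, vocab) for keywords in sample_keywords]
```

With 7-50 keywords per image, the raw vocabulary can become large, so raising `min_count` is one simple way to keep only labels that occur often enough to learn from.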

I plan to construct my image classification model by employing the Amazon SageMaker image classification algorithm. This supervised learning algorithm supports multi-label classification: it takes an image as input and produces one or more labels as output. It uses a convolutional neural network (ResNet) that can be trained from scratch or fine-tuned with transfer learning when only a limited number of training images is available.
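As a rough illustration of how these options map to training settings, the sketch below assembles a hyperparameter dictionary for the built-in image classification algorithm with multi-label mode and transfer learning switched on. The helper name and the numeric values (layers, epochs, learning rate, batch size) are illustrative assumptions, not the values used in the notebook.

```python
from typing import Dict, Union


def image_classification_hyperparameters(
    num_classes: int,
    num_training_samples: int,
    use_transfer_learning: bool = True,
) -> Dict[str, Union[int, float, str]]:
    """Build a hyperparameter dict for a multi-label image classification job.

    The keys are hyperparameter names of the SageMaker image classification
    algorithm; the values chosen here are placeholders for illustration.
    """
    if num_classes < 1 or num_training_samples < 1:
        raise ValueError("num_classes and num_training_samples must be positive")
    return {
        "multi_label": 1,  # one image may carry several labels
        "num_classes": num_classes,
        "num_training_samples": num_training_samples,
        # 1 fine-tunes a pretrained ResNet; 0 trains from scratch.
        "use_pretrained_model": 1 if use_transfer_learning else 0,
        "num_layers": 18,  # assumed ResNet depth
        "image_shape": "3,224,224",  # channels,height,width
        "epochs": 10,  # assumed
        "learning_rate": 0.001,  # assumed
        # Keep the batch no larger than the training set.
        "mini_batch_size": min(32, num_training_samples),
    }


hyperparameters = image_classification_hyperparameters(
    num_classes=6, num_training_samples=20
)
```

A dictionary like this could be passed to a SageMaker estimator's `set_hyperparameters(**hyperparameters)` call before training.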

Solution Overview

The architecture diagram below illustrates the components of the solution:

The Shutterstock Free Sample: Images & Metadata of the “Whole Foods” Shoppers dataset serves as an example for multi-label image classification using publicly available content from Shutterstock.
An Amazon S3 bucket is designated to store the image training dataset.
A SageMaker notebook is used to build, train, and evaluate the machine learning model.
A SageMaker model endpoint, providing a persistent HTTPS endpoint for making inferences from our model. Alternatively, SageMaker batch transforms can be utilized for predictions across an entire dataset.

In the architecture diagram, images are exported from AWS Data Exchange to an S3 bucket. I will then utilize a SageMaker notebook to prepare the data, train the image classification model, and deploy a model endpoint. After completion, I will evaluate the model using a test image to obtain the recommended labels and their associated confidence scores.
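The last step, turning a model response into recommended labels with confidence scores, can be sketched as follows. This assumes the endpoint returns a JSON array with one probability per class, in the same order as the label vocabulary used for training. The function name, the 0.5 threshold, and the example values are hypothetical.

```python
import json
from typing import List, Tuple


def labels_above_threshold(
    response_body: str,
    class_names: List[str],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Pair each class with its score and keep those at or above the threshold.

    `response_body` is assumed to be a JSON array of per-class probabilities.
    Results are sorted from most to least confident.
    """
    scores = json.loads(response_body)
    if len(scores) != len(class_names):
        raise ValueError("score count does not match the number of classes")
    picked = [
        (name, score) for name, score in zip(class_names, scores) if score >= threshold
    ]
    return sorted(picked, key=lambda pair: pair[1], reverse=True)


# Hypothetical response for a three-class model.
example_body = "[0.91, 0.12, 0.67]"
example_labels = labels_above_threshold(
    example_body, ["customer", "deli", "produce"]
)
```

Because multi-label scores are independent per class, they do not need to sum to 1, and the threshold is a tuning choice: a lower value returns more labels at the cost of more false positives.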

Prerequisites

To implement this solution, you must have the following:

An AWS account. If you do not have one, create one before you begin.

Solution Walkthrough

Step 1: Export the Shutterstock Whole Foods image dataset to an S3 bucket

To begin working with your dataset, subscribe to it and export the data to an S3 bucket.

If you do not yet have an S3 bucket for your image dataset, go to the S3 console and select Create bucket. Note that the Shutterstock Image Datasets are located in US East (Ohio); to avoid cross-region data transfer charges, we recommend selecting US East (Ohio) (us-east-2) for both your bucket and your SageMaker environment. Then subscribe to the Free Sample: Images & Metadata of “Whole Foods” Shoppers dataset through AWS Marketplace by opening its listing and selecting Continue to subscribe.
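If you prefer to script the bucket creation instead of using the console, a minimal sketch with boto3 might look like the following. The helper names and the bucket name are hypothetical, and the code assumes AWS credentials are already configured. Outside us-east-1, S3 requires the Region to be passed as a location constraint, which is why the arguments are built separately.

```python
from typing import Any, Dict


def create_bucket_kwargs(bucket_name: str, region: str = "us-east-2") -> Dict[str, Any]:
    """Build the arguments for S3 create_bucket in the given Region.

    us-east-1 is the one Region where the location constraint must be omitted.
    """
    kwargs: Dict[str, Any] = {"Bucket": bucket_name}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_dataset_bucket(bucket_name: str, region: str = "us-east-2") -> None:
    """Create the bucket that will receive the exported dataset."""
    # Imported here so the helper above can be used without boto3 installed.
    import boto3

    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**create_bucket_kwargs(bucket_name, region))


# Hypothetical bucket name; bucket names must be globally unique.
ohio_kwargs = create_bucket_kwargs("my-shutterstock-images")
```

Calling `create_dataset_bucket("my-shutterstock-images")` would then create the bucket in US East (Ohio), matching the Region of the dataset.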

Next, navigate to the AWS Data Exchange Console. In the navigation pane, under My subscriptions, select Entitled Data. From your entitled data, expand the Free Data Set: ‘Whole Foods’ Shoppers. If you cannot find this dataset, ensure you are in the Ohio (us-east-2) Region. Scroll to view the Revisions of this dataset, which should include two entries: one labeled “Metadata” and another labeled “Images.” Select both revisions and choose Export to Amazon S3. In the dialog box, input the name of your previously created S3 bucket and select Export.

You can check the progress of your export job in the Jobs table at the bottom of the page. When the job status shows as Completed, proceed to the next step.

Step 2: Train, test, and export your model using SageMaker notebooks

I will utilize an Amazon SageMaker notebook instance to train and test my model. SageMaker handles the creation of the instance and associated resources. To train and test your model, follow these steps:

Open the Amazon SageMaker console.
In the navigation pane, select Notebook, then choose Notebook instances and click Create notebook instance.
Assign a name to your notebook instance and select ml.t2.medium under Notebook instance type.
In the Permissions and encryption section, choose Create a new role under IAM role. In the pop-up window, select Specific S3 buckets and enter the name of the S3 bucket for your image dataset.
In the Git repositories section, opt to Clone a public Git repository to this notebook instance only and enter: https://github.com/aws-samples/aws-data-exchange-shutterstock-image-datasets.git. Select Create notebook instance.
The notebook will take a few minutes to initialize. Once the status indicates InService, click Open Jupyter from the notebook actions. When the Jupyter environment loads, the file image-classification-with-shutterstock-datasets.ipynb containing the classification code will appear. Open this file.

Follow the steps outlined in the notebook to create your image classification model. In Step 1, update the images_bucket and images_bucket_prefix variables to correspond to the location of your Shutterstock image dataset.
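Before running the rest of the notebook, it can help to confirm that the two variables point at the exported images. The sketch below lists image objects under a prefix with boto3. The helper names, the file extensions, and the example values are assumptions for illustration; use the bucket and prefix from your own export.

```python
from typing import List

# Assumed image extensions; adjust if the export uses other formats.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def is_image_key(key: str) -> bool:
    """Return True when an S3 object key looks like an image file."""
    return key.lower().endswith(IMAGE_EXTENSIONS)


def list_image_keys(bucket: str, prefix: str) -> List[str]:
    """List image object keys under a prefix, following pagination."""
    # Imported here so is_image_key can be used without boto3 installed.
    import boto3

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    keys: List[str] = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no matching objects.
        for obj in page.get("Contents", []):
            if is_image_key(obj["Key"]):
                keys.append(obj["Key"])
    return keys


# Hypothetical values; replace with your own bucket and export prefix.
images_bucket = "my-shutterstock-images"
images_bucket_prefix = "whole-foods/images/"
```

If `list_image_keys(images_bucket, images_bucket_prefix)` returns an empty list, the prefix most likely does not match the path that the export job wrote to.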

