Transfer Learning with a Custom TensorFlow Container in Amazon SageMaker
In this blog post, we explore how data scientists and developers can leverage Amazon SageMaker, a fully managed machine learning service, to build, train, and deploy machine learning (ML) models into a production-ready environment. Specifically, we will demonstrate the process of implementing transfer learning using a TensorFlow container along with our own code for training and inference.
Transfer learning is a powerful method frequently utilized in computer vision tasks, allowing users to fine-tune an already trained neural network, such as AlexNet or ResNet, for additional custom labels. Amazon SageMaker provides built-in support for transfer learning in image classification, enabling the re-training of a ResNet network with your own labeled image data. For more detailed information, you can check the image classification documentation. To understand the appropriate scenarios for transfer learning and its associated guidelines, refer to this blog post.
While the built-in image classification algorithm in Amazon SageMaker is effective for numerous applications, certain use cases may require a different combination of pre-trained networks and image data. Factors to consider include the similarity of the new dataset to the original, the size of the new dataset, the number of required labels, model accuracy, the footprint of the trained model, and the computational resources necessary for re-training. For instance, if deploying a trained model on a handheld device, a model like MobileNet with a smaller footprint may be preferable. Conversely, if efficiency is key, Xception may offer advantages over VGG16 or Inception.
In this guide, we will take an Inception v3 network pre-trained on the ImageNet dataset and fine-tune it using the Caltech-256 dataset (Griffin, G., Holub, A.D., Perona, P. The Caltech-256. Caltech Technical Report). Amazon SageMaker simplifies the process of bundling your own container and importing it into Amazon Elastic Container Registry (ECR). Alternatively, you can utilize the container provided by Amazon SageMaker at their GitHub repository. We will customize the TensorFlow container with our transfer learning code within the TensorFlow framework, import it into Amazon ECR, and use it for model training and inference.
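Before diving into the setup, it may help to see the transfer-learning pattern in code. The following Keras-style sketch is illustrative only, and is not the code in the archive we download later: it loads Inception v3 with ImageNet weights, freezes the convolutional base, and attaches a new classification head sized for Caltech-256’s 257 labels (256 object categories plus a “clutter” class). The head size and optimizer are arbitrary choices here.

```python
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras import layers, models

NUM_CLASSES = 257  # Caltech-256: 256 object categories plus a "clutter" class

# Inception v3 pre-trained on ImageNet, minus its original classification head
base = InceptionV3(weights="imagenet", include_top=False,
                   pooling="avg", input_shape=(299, 299, 3))
base.trainable = False  # freeze the pre-trained convolutional weights

# New, trainable classification head for the custom labels
model = models.Sequential([
    base,
    layers.Dense(1024, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="rmsprop",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# model.fit(...) would then run over the labeled Caltech-256 images;
# optionally, unfreeze the top Inception blocks afterward for fine-tuning.
```

In the walkthrough below, the equivalent logic lives in the training code we download and bake into the container.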
Now that we’ve set the stage, let’s get started by preparing the environment.
Setting Up the Environment
We will utilize the Jupyter Notebook instance provided by Amazon SageMaker to customize the TensorFlow container. Initially, we will import the container into this notebook environment before registering it in Amazon ECR. This container will be employed for both training and inference. Our process builds upon a previous post that covers the basics of importing custom containers, with the distinction that we are specifically customizing the TensorFlow container.
Launching an Amazon SageMaker Notebook Instance
To kick things off, log in to the AWS Management Console and navigate to the Amazon SageMaker console. The notebook instance comes equipped with everything needed to build a custom Docker container image.
- Open the Amazon SageMaker Dashboard and select “Create notebook instance.”
- For this tutorial, we will:
  - Place the notebook instance in a subnet within a VPC that offers internet access.
  - Choose any instance type from the dropdown; we recommend at least ml.m4.xlarge.
  - Create a new IAM role or use an existing one, ensuring it grants full access to Amazon ECR and Amazon S3 buckets. This requires additional permissions beyond the default access granted when creating a new IAM role for Amazon SageMaker (a minimal CLI sketch for attaching these permissions follows this list). If you wish to further tighten permissions, consult the documentation on Amazon SageMaker Roles.
  - Set up a security group within this VPC that allows access through at least ports 80 and 8888.
  - Enable internet access for this notebook instance. While this suffices for the tutorial, consider reviewing Notebook Instance Security in the documentation for further security best practices.
  - Keep the default settings for the remaining options.
- Click “Create notebook instance” to initiate the launch process. Wait for the status to transition from “Pending” to “InService.”
- Once the instance is operational, select “Open” to access your fully functional Jupyter notebook, which comes pre-configured with various environments.
- Launch a terminal by selecting “Terminal” from the “New” dropdown menu in the top-right corner.
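If the role you chose lacks the Amazon ECR and S3 access mentioned in the IAM step above, one minimal way to grant it is to attach the corresponding AWS managed policies from this terminal with the AWS CLI. This is a sketch, and the role name below is a placeholder for your own:

```sh
$ aws iam attach-role-policy --role-name <your-sagemaker-role> \
      --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess
$ aws iam attach-role-policy --role-name <your-sagemaker-role> \
      --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
```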
Fetching the Amazon SageMaker TensorFlow Container
Amazon has provided containers for TensorFlow, MXNet, and Chainer. We will modify the TensorFlow container to include our training and inference code. Begin by cloning the TensorFlow container Git repository from AWS using the command below:
```sh
$ git clone https://github.com/aws/sagemaker-tensorflow-containers.git
```
This will clone the repository, preparing our environment for customization.
Customizing the TensorFlow Container for Transfer Learning
Now, let’s fetch the source code needed to customize the TensorFlow Docker image. First, navigate to the directory containing the sample Dockerfiles:
```sh
$ cd sagemaker-tensorflow-containers/docker/1.6.0/base
```
Next, execute the following commands to download the TensorFlow code for transfer learning and inference:
```sh
$ wget https://s3.amazonaws.com/aws-machine-learning-blog/artifacts/tensorflow-byom-blog/tfblog-master.zip
$ unzip tfblog-master.zip
$ mv tfblog-master tfblog
$ rm -f tfblog-master.zip
```
After this, you should see a directory structure where the helper scripts and the code for training and inference are stored in a folder named “tfblog.” Inside this directory, you’ll find a file called Dockerfile_blog.cpu, which is a customized version of the Dockerfile.cpu provided in the TensorFlow container repository. Key modifications include:
- Setting the WORKDIR to /opt/program, allowing access to our training and inference code when the container runs.
- Adjusting environment variables to facilitate real-time streaming of STDOUT and STDERR messages to Amazon CloudWatch Logs.
- Incorporating TensorFlow code for transfer learning, inference, and related web access helper code into the container.
Aside from these adjustments, the rest of the Dockerfile remains consistent with what’s available in the Amazon SageMaker repository.
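For orientation, here is a sketch of what those modifications typically look like in a SageMaker bring-your-own container. This is illustrative only; consult Dockerfile_blog.cpu in the downloaded archive for the actual contents:

```dockerfile
# Illustrative sketch; see Dockerfile_blog.cpu in the archive for the real file.

# Stream STDOUT/STDERR to CloudWatch Logs in real time by disabling buffering
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/program:${PATH}"

# Bake the transfer-learning training/inference code into the image
COPY tfblog /opt/program
WORKDIR /opt/program
```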
With the environment and code set up for training and inference, it’s time to build the container.
Building the TensorFlow Container for Amazon SageMaker
We will now build the Docker container image and push it to Amazon ECR.
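A minimal sketch of those steps, assuming the AWS CLI is configured in the notebook’s terminal; the repository name is a hypothetical placeholder, the build path assumes the directory layout described above, and the aws ecr get-login form matches AWS CLI v1 (newer CLIs use aws ecr get-login-password piped to docker login):

```sh
# Names and paths here are placeholders; adjust to your account and layout.
$ algorithm_name=sagemaker-tf-transfer-learning
$ account=$(aws sts get-caller-identity --query Account --output text)
$ region=$(aws configure get region)
$ fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# Create the ECR repository and log Docker in to it
$ aws ecr create-repository --repository-name "${algorithm_name}"
$ $(aws ecr get-login --region "${region}" --no-include-email)

# Build with the customized Dockerfile, then tag and push
$ docker build -t "${algorithm_name}" -f tfblog/Dockerfile_blog.cpu .
$ docker tag "${algorithm_name}" "${fullname}"
$ docker push "${fullname}"
```

Once the push completes, the resulting image URI in Amazon ECR is what you reference when creating Amazon SageMaker training jobs and inference endpoints.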