Note: Amazon Elastic Inference has been discontinued. For similar functionalities, please refer to Amazon SageMaker.
Amazon Elastic Inference lets you attach low-cost GPU-powered acceleration to Amazon EC2 and Amazon SageMaker instances, reducing deep learning inference costs by up to 75%. The EIPredictor API makes Elastic Inference straightforward to use.
In this article, we walk through a step-by-step example of using TensorFlow with Elastic Inference and show the cost and performance benefits of the combination. In our tests, total inference time for the FasterRCNN-ResNet50 model over 40 video frames dropped from approximately 113.699 seconds to around 8.883 seconds, a cost saving of 78.5 percent.
The EIPredictor is built on the TensorFlow Predictor API, keeping the two consistent and portable. Existing TensorFlow projects can adopt Elastic Inference with a single code change: importing and specifying the EIPredictor, as the sketch below illustrates. We elaborate on this later in the post.
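As a hedged illustration of that single change, here is a minimal sketch. The import path is an assumption that varies across EI-enabled TensorFlow builds, and the input key depends on your model's serving signature:

```python
# Minimal sketch of the EIPredictor pattern. The import path below is the
# one used by EI-enabled TensorFlow packages and may differ by version.
import numpy as np
from ei_for_tf.python.predictor.ei_predictor import EIPredictor

# Loads a TensorFlow SavedModel, mirroring the TensorFlow Predictor API.
# use_ei=True routes inference through the attached accelerator;
# use_ei=False falls back to plain TensorFlow for easy A/B comparison.
ei_predictor = EIPredictor(
    model_dir="saved_model/",   # placeholder path to a SavedModel
    use_ei=True,
)

# Inference keeps the Predictor's feed-dict call style.
frame = np.zeros((1, 1080, 1920, 3), dtype=np.uint8)   # one dummy video frame
output_dict = ei_predictor({"inputs": frame})           # key depends on the signature
```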
Advantages of Amazon Elastic Inference
Let’s examine how Elastic Inference stacks up against other EC2 options in terms of performance and cost.
| Instance Type | vCPUs | CPU Memory (GB) | GPU Memory (GB) | FP32 TFLOPS | $/hour | TFLOPS/$/hr |
| --- | --- | --- | --- | --- | --- | --- |
| m5.large | 2 | 8 | – | 0.07 | $0.10 | 0.73 |
| m5.xlarge | 4 | 16 | – | 0.14 | $0.19 | 0.73 |
| m5.2xlarge | 8 | 32 | – | 0.28 | $0.38 | 0.73 |
| m5.4xlarge | 16 | 64 | – | 0.56 | $0.77 | 0.73 |
| c5.4xlarge | 16 | 32 | – | 0.67 | $0.68 | 0.99 |
| p2.xlarge (K80) | 4 | 61 | 12 | 4.30 | $0.90 | 4.78 |
| p3.2xlarge (V100) | 8 | 61 | 16 | 15.70 | $3.06 | 5.13 |
| eia.medium | – | – | 1 | 1.00 | $0.13 | 7.69 |
| eia.large | – | – | 2 | 2.00 | $0.26 | 7.69 |
| eia.xlarge | – | – | 4 | 4.00 | $0.52 | 7.69 |
| m5.xlarge + eia.xlarge | 4 | 16 | 4 | 4.14 | $0.71 | 5.83 |
When comparing compute capability (teraFLOPS), the m5.4xlarge instance delivers 0.56 TFLOPS for $0.77/hour, while the eia.medium provides 1.00 TFLOPS for just $0.13/hour. For those prioritizing raw performance, the p3.2xlarge instance offers the highest compute at 15.7 TFLOPS.
Yet the last column shows that Elastic Inference delivers the most compute per dollar. Elastic Inference accelerators (EIA) must be attached to an EC2 instance, and the final row illustrates one such pairing: an m5.xlarge with an eia.xlarge offers vCPUs and TFLOPS comparable to a p2.xlarge while costing $0.19/hour less. Elastic Inference lets you size compute, memory, and GPU acceleration independently, maximizing value per dollar spent; the framework libraries abstract the attached GPU, so inference calls need no knowledge of the underlying hardware.
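The value column is simply FP32 TFLOPS divided by on-demand price. A quick sketch reproducing a few rows of the table above:

```python
# TFLOPS per dollar-hour = FP32 TFLOPS / on-demand price per hour.
# Values taken directly from the table above.
options = {
    "m5.4xlarge": (0.56, 0.77),
    "p3.2xlarge": (15.70, 3.06),
    "eia.medium": (1.00, 0.13),
    "m5.xlarge + eia.xlarge": (0.14 + 4.00, 0.19 + 0.52),
}
for name, (tflops, price) in options.items():
    print(f"{name}: {tflops / price:.2f} TFLOPS per $/hr")
```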
Video Object Detection Example with EIPredictor
Below is a detailed example of incorporating Elastic Inference with the EIPredictor. We will utilize a FasterRCNN-ResNet50 model, an m5.large CPU instance, and an eia.large accelerator.
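Before walking through the notebook, here is a hypothetical sketch of the kind of timing loop that produces the numbers reported below; the benchmark helper and input key are illustrative, not the notebook's exact code:

```python
# Hypothetical per-frame timing loop; function and key names are illustrative.
import time
import numpy as np

def benchmark(predictor, frames):
    """Run inference over a list of frames and return total and average time."""
    times = []
    for frame in frames:
        start = time.time()
        predictor({"inputs": frame})   # input key depends on the SavedModel signature
        times.append(time.time() - start)
    return sum(times), sum(times) / len(times)

# 40 dummy frames with the shape used in the notebook: (1, 1080, 1920, 3)
frames = [np.zeros((1, 1080, 1920, 3), dtype=np.uint8) for _ in range(40)]
# total, avg = benchmark(ei_predictor, frames)   # ei_predictor from the sketch above
```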
Prerequisites
- Launch Elastic Inference using a setup script.
- An m5.large instance with an attached eia.large accelerator (a scripted-launch sketch follows this list).
- An AMI with Docker installed, such as the AWS Deep Learning AMI (DLAMI). If you choose an AMI without Docker, install it before proceeding.
- Your IAM role must have ECRFullAccess.
- Your VPC security group should have ports 80 and 443 open for both inbound and outbound traffic, as well as port 22 for inbound traffic.
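If you prefer to script this setup, a minimal boto3 sketch along these lines can launch the instance with an accelerator attached. All identifiers below are placeholders, and note that the EC2 API names the accelerator family eia1 (for example, eia1.large):

```python
# Hypothetical sketch: launch an m5.large with an attached eia1.large
# accelerator. AMI ID, key pair, IAM instance profile, and security group
# are placeholders; substitute your own values.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",            # placeholder: a DLAMI with Docker
    InstanceType="m5.large",
    KeyName="my-keypair",                        # placeholder key pair
    MinCount=1,
    MaxCount=1,
    IamInstanceProfile={"Name": "my-ecr-access-profile"},  # needs ECRFullAccess
    SecurityGroupIds=["sg-0123456789abcdef0"],   # ports 80/443 in+out, 22 in
    ElasticInferenceAccelerators=[{"Type": "eia1.large"}],
)
print(response["Instances"][0]["InstanceId"])
```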
Implementing Elastic Inference with TensorFlow
- SSH into your instance with port forwarding for Jupyter notebook. For Ubuntu AMIs:
ssh -i {/path/to/keypair} -L 8888:localhost:8888 ubuntu@{ec2 instance public DNS name}
For Amazon Linux AMIs:
ssh -i {/path/to/keypair} -L 8888:localhost:8888 ec2-user@{ec2 instance public DNS name}
- Clone the repository:
git clone https://github.com/aws-samples/aws-elastic-inference-tensorflow-examples
- Run and access your Jupyter notebook:
cd aws-elastic-inference-tensorflow-examples; ./build_run_ei_container.sh
Wait for the Jupyter notebook to launch, then navigate to localhost:8888 and enter the token provided in the terminal.
- Execute the benchmarked versions of the object detection examples by opening elastic_inference_video_object_detection_tutorial.ipynb and running the cells. Record the session runtimes as follows:
- Without Elastic Inference:
- Model load time (seconds): 8.365
- Number of video frames: 40
- Average inference time (seconds): 2.863
- Total inference time (seconds): 114.508
- With Elastic Inference:
- Model load time (seconds): 21.445
- Number of video frames: 40
- Average inference time (seconds): 0.238
- Total inference time (seconds): 9.509
- Compare the performance and cost of the two runs. The numbers above show an average inference speedup of roughly 12x with Elastic Inference. In a scenario streaming 340 frames of shape (1, 1080, 1920, 3) per video, the m5.large + eia.large setup could process around 44 such videos per hour, assuming the model is loaded once. Without the eia.large accelerator, only three to four videos fit in the same hour, and processing all 44 would take 12 to 15 hours.
- The operating costs reflect this efficiency: an m5.large instance costs $0.096/hour and an eia.large costs $0.26/hour, so inferring the 44 videos costs $0.356 for the hour with Elastic Inference. Performing the same work without Elastic Inference would cost between $1.152 and $1.44 over the 12 to 15 hours, roughly three to four times as much. The sketch below checks this arithmetic.
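The throughput and cost figures above follow directly from the measured per-frame times and on-demand prices; a small sketch to verify them:

```python
# Back-of-the-envelope check of the throughput and cost claims above,
# using only the measured per-frame inference times and on-demand prices
# quoted in this post.
FRAMES_PER_VIDEO = 340

t_ei = 0.238    # seconds/frame, m5.large + eia.large
t_cpu = 2.863   # seconds/frame, m5.large alone

videos_per_hour_ei = 3600 / (FRAMES_PER_VIDEO * t_ei)
print(f"videos/hour with EI: {videos_per_hour_ei:.1f}")        # ~44.5

hours_cpu = 44 * FRAMES_PER_VIDEO * t_cpu / 3600
print(f"hours for 44 videos, CPU only: {hours_cpu:.1f}")       # ~11.9

cost_ei = 0.096 + 0.26            # $/hour for m5.large + eia.large
cost_cpu = 0.096 * hours_cpu      # m5.large alone for the full CPU-only run
print(f"EI cost for one hour of work: ${cost_ei:.3f}")         # $0.356
print(f"CPU-only cost for 44 videos: ${cost_cpu:.2f}")         # ~$1.15
```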