Building an Autonomous Vehicle, Part 4: Implementing Behavioral Cloning with Apache MXNet for Your Self-Driving Car


In the initial entry of our series on autonomous vehicles, you constructed your Donkey vehicle and launched your pilot server on an Amazon EC2 instance. The second post guided you through the process of operating the Donkey car, while the third installment focused on streaming telemetry data from your Donkey vehicle to AWS using AWS IoT.

In this entry, we will delve into the Deep Learning techniques that empower your car’s autonomous driving capabilities, introducing the concept of behavioral cloning using Convolutional Neural Networks (CNNs). CNNs are regarded as the leading modeling method for computer vision tasks, addressing questions your vehicle may face, such as, “Is there a track or an obstacle in front of me?”

  1. Build an Autonomous Vehicle on AWS and Race It at the re:Invent Robocar Rally
  2. Build an Autonomous Vehicle Part 2: Navigating Your Vehicle
  3. Building an Autonomous Vehicle Part 3: Connecting Your Autonomous Vehicle
  4. Building an Autonomous Vehicle Part 4: Implementing Behavioral Cloning with Apache MXNet for Your Self-Driving Car

Preparing Training Data for Donkey on P2

We previously outlined the steps for running the training process in blog post 2. Let’s summarize the essential commands here:

To transfer data from the Raspberry Pi to your Amazon EC2 instance:

$ rsync -rva --progress -e "ssh -i /path/to/key/DonkeyKP-us-east-1.pem" /home/pi/d2/data/ ec2-user@ec2-your-ip.compute-1.amazonaws.com:~/d2/

To initiate the training:

$ python ~/d2/manage.py train --model /path/to/myfirstpilot

To return the trained model to the Raspberry Pi:

$ rsync -rva --progress -e "ssh -i /path/to/key/DonkeyKP-us-east-1.pem" ec2-user@ec2-your-ip.compute-1.amazonaws.com:~/d2/models/ /home/pi/d2/models/

Understanding the Model

In this section, we'll examine what the model learns and how it achieves self-driving capability. The current Donkey configuration uses Keras as its default deep learning framework, and AWS is expanding support for other frameworks such as Apache MXNet, Gluon, and PyTorch. In this post, we will dive deep into how the model works using Apache MXNet.

As previously mentioned, we employ a technique known as behavioral cloning to give the car its self-driving ability. The model learns to drive from training data collected while navigating the track. It is crucial that the majority of this data is of good quality; we want to avoid including images where the car strays off the track or makes incorrect turns, since our goal is to keep the car on the track. Just as a human driver continuously adjusts the steering to keep the vehicle on course, we build a model that estimates the required steering angle from the current conditions, effectively framing the problem as "what steering angle should we take given the input image?"

Real-world driving scenarios are more complex and involve additional factors such as acceleration and gear shifting. To simplify things initially, we keep the throttle fixed while the car drives itself. In practice, we have found that a throttle setting of 25-30% gives the Donkey car its optimal speed.
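To make this framing concrete, here is a minimal sketch of the drive loop we are aiming for. The capture_frame and predict_steering_angle functions are hypothetical stand-ins for the Pi camera and the trained CNN:

import random

FIXED_THROTTLE = 0.25  # keep the throttle constant at roughly 25%

def capture_frame():
    """Hypothetical stand-in for grabbing the current image from the Pi camera."""
    return None

def predict_steering_angle(frame):
    """Hypothetical stand-in for the CNN: 'what steering angle for this image?'"""
    return random.uniform(-1.0, 1.0)

def drive_step():
    # The model only decides steering; the throttle stays fixed.
    frame = capture_frame()
    angle = predict_steering_angle(frame)
    return angle, FIXED_THROTTLE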

To achieve this, we employ a Deep Learning technique called Convolutional Neural Networks (CNNs). CNNs have become the standard for tackling computer vision challenges. They comprise convolutional layers, with each node linked to a small window referred to as a receptive field, enabling local feature extraction from images. Questions like “Is there a track or a person in the image?” can be addressed using these local features. For a more detailed understanding of CNNs, you can refer to this informative resource.

Dataset

For this blog, we use a dataset collected over approximately 15 minutes of driving around the track. As mentioned earlier, we made a cleaning pass and removed images where the car was off the track. The Donkey software provides a convenient web interface for removing "bad" images (command: donkey tubclean <folder containing tubs>). The dataset of images from the car driving on a track can be found here.

Constructing the CNN Model

Using the im2rec.py tool, we convert the image dataset into binary files to enhance processing speed. To delve deeper into the internals of Apache MXNet, check out their tutorial page.
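As a rough illustration of that conversion, a .lst file pairing each image with its recorded steering angle can be generated and then packed with im2rec.py. The image paths and angle values below are placeholders:

# Build an MXNet .lst file (index <tab> label <tab> relative image path), then pack
# it into a binary .rec file with MXNet's im2rec.py tool. The paths and steering
# angles here are placeholders for the records exported from the cleaned tubs.
records = [
    ('tub_1/1_cam-image_array_.jpg', 0.12),
    ('tub_1/2_cam-image_array_.jpg', -0.05),
]

with open('train.lst', 'w') as f:
    for idx, (path, angle) in enumerate(records):
        f.write('{}\t{}\t{}\n'.format(idx, angle, path))

# Then, from the directory that contains the tub folders:
#   $ python im2rec.py train /path/to/tubs
# which produces train.rec and train.idx for the ImageRecordIter used later.

With the record files in place, the network itself is defined as follows: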

import mxnet as mx
import numpy as np

# Input placeholder: camera frames of shape (3, 120, 160)
data = mx.symbol.Variable(name="data")

# First convolutional block: 24 5x5 filters, ReLU, 2x2 max pooling
body = mx.sym.Convolution(data=data, num_filter=24, kernel=(5, 5), stride=(2, 2))
body = mx.sym.Activation(data=body, act_type='relu', name='relu1')
body = mx.symbol.Pooling(data=body, kernel=(2, 2), stride=(2, 2), pool_type='max')

# Second convolutional block: 32 5x5 filters, ReLU, 2x2 max pooling
body = mx.sym.Convolution(data=body, num_filter=32, kernel=(5, 5), stride=(2, 2))
body = mx.sym.Activation(data=body, act_type='relu')
body = mx.symbol.Pooling(data=body, kernel=(2, 2), stride=(2, 2), pool_type='max')

# Flatten the feature maps before the fully connected layers
flatten = mx.symbol.Flatten(data=body)

# Fully connected layers with dropout for regularization
body = mx.symbol.FullyConnected(data=flatten, name='fc0', num_hidden=32)
body = mx.sym.Activation(data=body, act_type='relu', name='relu6')
body = mx.sym.Dropout(data=body, p=0.1)

body = mx.symbol.FullyConnected(data=body, name='fc1', num_hidden=16)
body = mx.sym.Activation(data=body, act_type='relu', name='relu7')

# Single output neuron with a linear regression loss to predict the steering angle.
# The output is named "softmax" only so that its label matches the default
# "softmax_label" name expected by the ImageRecordIter below.
out = mx.symbol.FullyConnected(data=body, name='fc2', num_hidden=1)
out = mx.symbol.LinearRegressionOutput(data=out, name="softmax")

Since we need to predict the steering angle for the car, we use a linear regression output layer with a single output. To monitor training, we use mean absolute error (MAE) as the evaluation metric; because steering angles are continuous values whose distances from one another are meaningful, MAE is a natural way to measure how far our predictions are from the recorded angles.
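To make the metric concrete, here is a small check with made-up steering angle values, using MXNet's built-in MAE metric:

import mxnet as mx

# Compare predicted and recorded steering angles; the values are made up for illustration.
predicted = mx.nd.array([0.10, -0.25, 0.60])   # model output
recorded  = mx.nd.array([0.05, -0.30, 0.55])   # human steering labels

mae = mx.metric.MAE()
mae.update(labels=[recorded], preds=[predicted])
print(mae.get())  # ('mae', 0.05)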

Training the Model

The binary record files are stored in our S3 bucket; before training, we copy them onto the training instance and point the data iterators at the local copies.
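Assuming the record files live in a bucket of your own (the bucket name below is a placeholder), they can be copied to the training instance with the AWS CLI:

$ aws s3 cp s3://your-donkey-bucket/train.rec .
$ aws s3 cp s3://your-donkey-bucket/valid.rec .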

# Get Iterators

def get_iterators(batch_size, data_shape=(3, 120, 160)):
    train = mx.io.ImageRecordIter(
        path_imgrec='train.rec', 
        data_name='data',
        label_name='softmax_label',
        batch_size=batch_size,
        data_shape=data_shape,
        shuffle=True,
        rand_crop=True,
        # note: random mirroring flips the image left/right but not the recorded
        # steering angle, so you may want to disable it for steering regression
        rand_mirror=True)
    val = mx.io.ImageRecordIter(
        path_imgrec='valid.rec',
        data_name='data',
        label_name='softmax_label',
        batch_size=batch_size,
        data_shape=data_shape,
        rand_crop=False,
        rand_mirror=False)
    return (train, val)

batch_size = 16
train_iter, val_iter = get_iterators(batch_size)

# Training

num_gpus = 1
num_epoch = 10

# Bind the network to the GPU(s) and train with the Adam optimizer,
# using MAE on the validation set as the evaluation metric
mod = mx.mod.Module(out, context=[mx.gpu(i) for i in range(num_gpus)])
mod.fit(train_data=train_iter, 
        eval_data=val_iter, 
        eval_metric='mae', 
        optimizer='adam',
        optimizer_params={'learning_rate': 0.0001},
        num_epoch=num_epoch,
        batch_end_callback=mx.callback.Speedometer(batch_size, 100),        
       )
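Once training finishes, the learned weights need to be saved to disk before they can be copied back to the car. A minimal sketch, assuming 'myfirstpilot' is the file prefix you want for the model:

# Writes myfirstpilot-symbol.json and myfirstpilot-0010.params
mod.save_checkpoint('myfirstpilot', num_epoch)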

Evaluation and Simulation

After training the model, we can deploy it on the vehicle and take it for a test drive. A low MAE on the validation set indicates that the model is adequately trained and generalizes well, but it is just as important to observe how the model behaves in real time on the track.
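As a quick sanity check before a test drive, the saved checkpoint can be loaded and queried for a single frame. This is a sketch that assumes the 'myfirstpilot' checkpoint from the previous step and uses a blank placeholder image in place of a real camera frame:

import mxnet as mx
import numpy as np

# Load the checkpoint saved after training (prefix and epoch match save_checkpoint above)
sym, arg_params, aux_params = mx.model.load_checkpoint('myfirstpilot', 10)
mod = mx.mod.Module(symbol=sym, context=mx.cpu(), label_names=None)
mod.bind(for_training=False, data_shapes=[('data', (1, 3, 120, 160))])
mod.set_params(arg_params, aux_params, allow_missing=True)

# Predict a steering angle for one camera frame; a real frame from the Pi camera
# would be a 120x160 RGB image instead of this all-black placeholder.
frame = np.zeros((120, 160, 3), dtype=np.uint8)
batch = mx.nd.array(frame.transpose((2, 0, 1))[np.newaxis, :].astype('float32'))
angle = mod.predict(mx.io.NDArrayIter(data=batch)).asnumpy()[0][0]
print('predicted steering angle:', angle)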

