Enhancing ML Developer Efficiency with Weights & Biases: A Computer Vision Case Study on Amazon SageMaker


As organizations increasingly adopt deep learning methodologies like computer vision and natural language processing, the need for scalable tools to support machine learning (ML) developers becomes crucial. These tools should encompass experiment tracking, lineage, and collaboration. Experiment tracking involves maintaining metadata such as operating systems, infrastructure used, libraries, and input/output datasets—often manually recorded in spreadsheets. Lineage entails documenting the datasets, transformations, and algorithms that contribute to an ML model’s creation. Collaboration is essential, as it allows ML developers to work on shared projects and disseminate their findings to team members and business stakeholders, commonly through emails, screenshots, and presentations.

In this article, we demonstrate a model training example aimed at identifying objects for an autonomous vehicle scenario utilizing Weights & Biases (W&B) alongside Amazon SageMaker. This integration significantly reduces the manual workload for ML developers, enhances transparency throughout the model development process, and fosters team collaboration on projects.

We will run this example within Amazon SageMaker Studio, allowing you to try it out for yourself.

Overview of Weights & Biases

Weights & Biases is designed to assist ML teams in building superior models more rapidly. By adding just a few lines of code in your SageMaker notebook, you can effortlessly debug, compare, and reproduce models—including architecture, hyperparameters, git commits, model weights, GPU usage, datasets, and predictions—all while collaborating with your colleagues.
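
For instance, only a handful of calls are needed to start tracking a run from a notebook. The following is a minimal sketch; the project name mirrors the one used later in this post, and train_one_epoch is a hypothetical placeholder for your training step:

import wandb

# Start a run; values passed in config are recorded as hyperparameters
wandb.init(project="sagemaker_camvid_demo", config={"learning_rate": 1e-3, "epochs": 10})

for epoch in range(wandb.config.epochs):
    train_loss = train_one_epoch()  # hypothetical training step defined elsewhere
    wandb.log({"epoch": epoch, "train_loss": train_loss})

wandb.finish()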

W&B is a trusted resource for over 200,000 ML practitioners from many leading companies and research institutions worldwide. To explore W&B for free, sign up at Weights & Biases, or check out their listing on the AWS Marketplace.

Getting Started with SageMaker Studio

SageMaker Studio is the first fully integrated development environment (IDE) for machine learning. It offers a single web-based interface where ML practitioners and data scientists can build, train, and deploy models with ease.

To begin using Studio, you will need an AWS account along with an AWS Identity and Access Management (IAM) user or role that has permissions to create a Studio domain. You can refer to the Onboard to Amazon SageMaker Domain guide to set up your domain and consult the Studio documentation for an overview of the visual interface and notebooks.

Setting Up the Environment

For this tutorial, we will run our own code by importing notebooks from GitHub. We will use the following GitHub repository as a reference, so let’s clone it and open the first notebook.

You can clone a repository via the terminal or through the Studio UI. To clone from the terminal, open a system terminal (go to the File menu, then choose New and Terminal) and type the following command:

git clone https://github.com/wandb/SageMakerStudio

To clone a repository directly from the Studio UI, refer to the guide on Cloning a Git Repository in SageMaker Studio.

Once cloned, select the 01_data_processing.ipynb notebook. You will see a kernel switcher prompt. This example uses PyTorch, so select the pre-built PyTorch 1.10 Python 3.8 GPU optimized image to start your notebook. Once the kernel is ready, the instance type and kernel are displayed in the top right corner.

Our notebook requires additional dependencies, which are specified in a requirements.txt file. Execute the first cell to install the necessary packages:

%pip install -r requirements.txt

Alternatively, you can create a lifecycle configuration to automatically install these packages each time you start the PyTorch application. For more details, check out the guide on Customizing Amazon SageMaker Studio using Lifecycle Configurations.

Utilizing Weights & Biases in SageMaker Studio

The Weights & Biases (wandb) library is a standard Python package. After installation, it only takes a few lines of code in your training script to start logging experiments. We’ve already installed it through our requirements.txt file, but you can also install it manually with this command:

! pip install wandb
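
After installation, you also need to authenticate with your W&B API key before logging runs. A minimal sketch (wandb.login() prompts for a key interactively, or reads it from the WANDB_API_KEY environment variable):

import wandb

# Authenticate once per environment; the key can also be supplied via WANDB_API_KEY
wandb.login()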

Case Study: Semantic Segmentation for Autonomous Vehicles

Dataset

For this example, we utilize the Cambridge-driving Labeled Video Database (CamVid), which consists of videos annotated with object class semantic labels and accompanying metadata. This database provides ground truth labels that link each pixel to one of 32 semantic classes. We can version our dataset as a wandb.Artifact, making it easy to reference later. Here’s the code to do that:

import wandb

# `path` points to the downloaded CamVid data and `class_labels` maps the
# 32 class indices to names; both are defined earlier in the notebook.
with wandb.init(project="sagemaker_camvid_demo", job_type="upload"):
    artifact = wandb.Artifact(
        name='camvid-dataset',
        type='dataset',
        metadata={
            "url": 'https://s3.amazonaws.com/fast-ai-imagelocal/camvid.tgz',
            "class_labels": class_labels
        },
        description="The Cambridge-driving Labeled Video Database (CamVid) provides a collection of videos with object class semantic labels, complete with metadata. This database offers ground truth labels that associate each pixel with one of 32 semantic classes."
    )
    artifact.add_dir(path)
    wandb.log_artifact(artifact)

You can follow along using the 01_data_processing.ipynb notebook.
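
Once the artifact is logged, later notebooks or teammates can pull the same version of the data back down by referencing it. A minimal sketch, assuming the project and artifact names used above:

import wandb

with wandb.init(project="sagemaker_camvid_demo", job_type="download") as run:
    # Fetch the latest version of the versioned dataset and download it locally
    artifact = run.use_artifact("camvid-dataset:latest")
    data_dir = artifact.download()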

We also log a dataset table, which is a robust DataFrame-like entity that allows querying and analyzing tabular data. You can visualize model predictions and share insights on a central dashboard. Weights & Biases tables support various rich media formats, including images, audio, and waveforms. For a comprehensive list of media formats, refer to Data Types.
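
As a rough sketch of how such a table can be assembled (the column names and the samples iterable are illustrative, not the exact ones used in the notebook):

import wandb

table = wandb.Table(columns=["image", "ground_truth", "split"])

# `samples` is a hypothetical iterable of (image_path, mask_path, split) tuples
for image_path, mask_path, split in samples:
    table.add_data(wandb.Image(image_path), wandb.Image(mask_path), split)

wandb.log({"camvid_dataset": table})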

The following screenshot illustrates a table featuring raw images alongside their ground truth segmentations. You can also interact with this table for a deeper understanding.

Training the Model

Now, we can create a model and train it using our dataset. We’ll employ PyTorch and fastai to quickly prototype a baseline, and then use W&B Sweeps to optimize our hyperparameters. Continue with the 02_semantic_segmentation.ipynb notebook. When prompted for a kernel, select the same one used in the first notebook: PyTorch 1.10 Python 3.8 GPU optimized. Your packages will already be installed since you are using the same application.
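
A sweep is described by a configuration that defines the search strategy and parameter space, and is then run by an agent. The snippet below is a minimal sketch; the parameter names and ranges are illustrative, and train stands in for a training function that calls wandb.init and wandb.log:

import wandb

sweep_config = {
    "method": "bayes",  # Bayesian search over the parameter space
    "metric": {"name": "valid_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"min": 1e-5, "max": 1e-2},
        "batch_size": {"values": [8, 16, 32]},
    },
}

sweep_id = wandb.sweep(sweep_config, project="sagemaker_camvid_demo")
wandb.agent(sweep_id, function=train, count=20)  # run 20 trials with the `train` function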

The model’s objective is to learn a per-pixel annotation of scenes captured from the autonomous agent’s perspective. It must categorize or segment each pixel within a scene into 32 relevant categories, such as road, pedestrian, sidewalk, or cars. You can interact with any of the segmented images in the table to access the segmentation results and categories.
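
The interactive overlays come from logging segmentation masks alongside the images. A rough sketch of that pattern, assuming class_labels maps class indices to names, and raw_image, pred_mask, and gt_mask are placeholders for an input image and its predicted and ground truth masks (2-D arrays of class indices):

import wandb

masked_image = wandb.Image(
    raw_image,  # e.g. a PIL image or NumPy array
    masks={
        "predictions": {"mask_data": pred_mask, "class_labels": class_labels},
        "ground_truth": {"mask_data": gt_mask, "class_labels": class_labels},
    },
)
wandb.log({"segmentation": masked_image})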

Since the fastai library integrates seamlessly with wandb, you can easily include the WandbCallback in your Learner:

from fastai.vision.all import *  # provides Learner, FocalLossFlat, and training utilities
from fastai.callback.wandb import WandbCallback

# `data_loader`, `backbone`, `hidden_dim`, `num_classes`, `metrics`,
# TRAIN_EPOCHS, and LEARNING_RATE are defined earlier in the notebook;
# SegmentationModel is the notebook's UNet-style model built on a timm backbone.
loss_func = FocalLossFlat(axis=1)
model = SegmentationModel(backbone, hidden_dim, num_classes=num_classes)
wandb_callback = WandbCallback(log_preds=True)
learner = Learner(
    data_loader,
    model,
    loss_func=loss_func,
    metrics=metrics,
    cbs=[wandb_callback],
)

learner.fit_one_cycle(TRAIN_EPOCHS, LEARNING_RATE)

For our baseline experiments, we employed a straightforward architecture inspired by the UNet paper, using different backbones from timm.
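
As a rough illustration of that setup (the backbone name is a placeholder, not the notebook’s exact configuration), timm can expose a pretrained backbone as a feature extractor for a UNet-style decoder:

import timm
import torch

# A pretrained backbone that returns intermediate feature maps, which a
# UNet-style decoder can progressively upsample and fuse
backbone = timm.create_model("resnet34", pretrained=True, features_only=True)

features = backbone(torch.randn(1, 3, 224, 224))
for f in features:
    print(f.shape)  # feature maps at decreasing spatial resolutions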

