Illustrative Notebooks in Amazon SageMaker JumpStart

Amazon SageMaker JumpStart serves as the central hub for Machine Learning (ML) in SageMaker, providing access to pre-trained and publicly available models across various problem domains to facilitate your entry into machine learning. Additionally, JumpStart features example notebooks that utilize Amazon SageMaker capabilities, including spot instance training and experimentation, covering a wide array of model types and applications. These example notebooks contain code snippets that illustrate how to implement ML solutions using SageMaker and JumpStart, enabling you to tailor them to your specific requirements and accelerate application development.

Recently, we added 10 new notebooks to JumpStart in Amazon SageMaker Studio. This article highlights those additions. JumpStart currently offers a total of 56 notebooks, covering topics that range from advanced natural language processing (NLP) to mitigating bias in datasets during model training.

The newly added 10 notebooks can assist you in various ways:

  • They provide example code that you can execute directly from the JumpStart UI in Studio to observe functionality.
  • They illustrate the use of various SageMaker and JumpStart APIs.
  • They present a technical solution that can be customized according to your individual needs.

The number of notebooks available through JumpStart grows regularly as more are added. These notebooks are also accessible on GitHub.

Overview of the New Notebooks

The 10 new notebooks are:

  1. In-context learning with AlexaTM 20B – This notebook showcases how to utilize AlexaTM 20B for in-context learning with zero-shot and few-shot learning on five tasks: text summarization, natural language generation, machine translation, extractive question answering, and natural language inference and classification.
  2. Fairness linear learner in SageMaker – Addressing concerns about bias in ML algorithms, this notebook applies fairness principles to appropriately adjust model predictions.
  3. Manage ML experimentation using SageMaker Search – This feature allows you to quickly find and assess the most relevant model training runs among potentially thousands of SageMaker jobs.
  4. SageMaker Neural Topic Model – An unsupervised learning algorithm that describes a set of observations as a mixture of distinct categories.
  5. Predict driving speed violations – This notebook demonstrates using the SageMaker DeepAR algorithm to train models for multiple streets and predict violations from various street cameras.
  6. Breast cancer prediction – Utilizing UCI’s breast cancer diagnostic dataset, this notebook builds a predictive model to determine if a breast mass image indicates a benign or malignant tumor.
  7. Ensemble predictions from multiple models – This notebook illustrates how combining or averaging predictions from various sources enhances forecasting accuracy.
  8. SageMaker asynchronous inference – A new inference option that caters to near-real-time needs, allowing requests to process for up to 15 minutes with payload sizes of up to 1 GB; see the deployment sketch after this list.
  9. TensorFlow bring your own model – Learn to train a TensorFlow model locally and deploy it on SageMaker with this notebook.
  10. Scikit-learn bring your own model – This notebook guides you on using a pre-trained Scikit-learn model with the SageMaker container for rapid endpoint creation.
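
To give a flavor of what these notebooks walk through, the following is a minimal sketch of deploying a model behind a SageMaker asynchronous inference endpoint (item 8), using the SageMaker Python SDK. The container image, model artifact, role, and S3 paths are placeholders; the notebook covers the full setup.

```python
from sagemaker.model import Model
from sagemaker.async_inference import AsyncInferenceConfig

# Placeholder values -- substitute your own container image, model artifact, and role.
model = Model(
    image_uri="<container-image-uri>",
    model_data="s3://<bucket>/model.tar.gz",
    role="<execution-role-arn>",
)

# Responses are written to S3 rather than returned inline, which is what lets
# asynchronous endpoints handle long processing times and large payloads.
async_config = AsyncInferenceConfig(output_path="s3://<bucket>/async-output/")

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    async_inference_config=async_config,
)

# The invocation points at a payload already uploaded to S3 and returns
# immediately; poll the returned response object for the result.
response = predictor.predict_async(input_path="s3://<bucket>/async-input/payload.json")
```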

Prerequisites

To utilize these notebooks, ensure you have access to Studio with an execution role that permits running SageMaker functionalities. A short video linked here will assist you in navigating to JumpStart notebooks.
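
If you are unsure which execution role your Studio session uses, the SageMaker Python SDK can report it:

```python
import sagemaker

# Returns the IAM role ARN attached to the current Studio or notebook session.
role = sagemaker.get_execution_role()
print(role)
```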

In the sections that follow, we will explore each of the 10 new solutions in detail, discussing their unique features.

In-context Learning with AlexaTM 20B

AlexaTM 20B is a large-scale, multitask, multilingual sequence-to-sequence (seq2seq) model trained on a combination of Common Crawl (mC4) and Wikipedia data across 12 languages, utilizing denoising and Causal Language Modeling (CLM) tasks. It achieves exceptional performance on common in-context language tasks such as one-shot summarization and one-shot machine translation, outpacing decoder-only models like OpenAI’s GPT-3 and Google’s PaLM, which are over eight times larger.

In-context learning, or prompting, refers to using an NLP model on a new task without requiring fine-tuning. The model is provided with a few task examples as part of the inference input, a method known as few-shot in-context learning. In certain cases, the model performs adequately without any training data, merely given an explanation of the desired outcome, known as zero-shot in-context learning.

This notebook illustrates how to deploy AlexaTM 20B via the JumpStart API and perform inference. It also demonstrates the model’s application for in-context learning across five tasks: text summarization, natural language generation, machine translation, extractive question answering, and natural language inference and classification.
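
As a hedged sketch of the general workflow (the exact model ID, prompt format, and payload keys are in the notebook; the ones below are illustrative assumptions), deploying a JumpStart model and sending it a prompt looks roughly like this with recent versions of the SageMaker Python SDK:

```python
from sagemaker.jumpstart.model import JumpStartModel

# Illustrative model ID -- look up the exact AlexaTM 20B identifier in the notebook.
model = JumpStartModel(model_id="pytorch-textgeneration1-alexa20b")
predictor = model.deploy()

# Zero-shot causal generation: the model continues the prompt with no examples.
# Payload keys are assumptions for illustration; check the notebook's request schema.
payload = {
    "text_inputs": "[CLM] My name is Lewis and I like to",
    "num_beams": 5,
    "no_repeat_ngram_size": 2,
    "max_length": 50,
}
print(predictor.predict(payload))
```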

The notebook showcases:

  • One-shot text summarization, natural language generation, and machine translation using a single example for each task.
  • Zero-shot extractive question answering and natural language inference and classification, using the model with no training examples.

You can experiment with your own text to see how the model summarizes it, extracts Q&A, or translates between languages.

Fairness Linear Learner in SageMaker

Concerns about bias in ML algorithms have become increasingly prominent because models can reflect existing human prejudices. Many ML applications carry significant social consequences, such as predicting whether to grant a bank loan, setting insurance rates, or targeting advertisements. Unfortunately, algorithms that learn from historical data naturally inherit its past biases. This notebook demonstrates how to address this challenge using SageMaker and fair algorithms in the context of linear learners.

The notebook begins by introducing the concepts and mathematics behind fairness, then downloads data, trains a model, and finally applies fairness principles to adjust the model's predictions accordingly.

In this notebook, you will:

  • Run a standard linear model on UCI’s Adult dataset.
  • Identify unfairness in model predictions.
  • Adjust the data to eliminate bias.
  • Retrain the model.

Try running the example code on your own datasets to detect bias, and then use the provided functions to remove it.
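
As a minimal sketch of the kind of check involved (not the notebook's exact code), the following computes a demographic parity difference: the gap in positive-prediction rates between the two groups defined by a binary sensitive attribute. The function and toy data are illustrative assumptions:

```python
import numpy as np

def demographic_parity_difference(predictions, sensitive_attr):
    """Gap in positive-prediction rates between the two groups
    defined by a binary (0/1) sensitive attribute."""
    predictions = np.asarray(predictions)
    sensitive_attr = np.asarray(sensitive_attr)
    rate_group_0 = predictions[sensitive_attr == 0].mean()
    rate_group_1 = predictions[sensitive_attr == 1].mean()
    return abs(rate_group_0 - rate_group_1)

# Toy example: binary model predictions and group membership.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
print(demographic_parity_difference(preds, groups))  # 0.5 -- a large gap worth investigating
```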

Manage ML Experimentation Using SageMaker Search

SageMaker Search allows you to quickly find and evaluate the most relevant model training runs among potentially thousands of SageMaker jobs. Developing an ML model requires ongoing experimentation: trying new learning algorithms and fine-tuning hyperparameters while observing their impact on model performance and accuracy. This iterative process often produces an overwhelming number of training experiments and model versions, which slows down the discovery of an optimal model.

The notebook assists you in managing this complexity by simplifying the search and evaluation process for your training runs.
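
The Search API is exposed through the low-level boto3 SageMaker client. The following sketch finds completed training jobs that share a tag and ranks them by a validation metric; the tag key and metric name are placeholders that depend on how you label your jobs and which algorithm you train:

```python
import boto3

smclient = boto3.client("sagemaker")

response = smclient.search(
    Resource="TrainingJob",
    SearchExpression={
        "Filters": [
            {"Name": "TrainingJobStatus", "Operator": "Equals", "Value": "Completed"},
            # Placeholder tag -- use whatever key/value you tag your jobs with.
            {"Name": "Tags.project", "Operator": "Equals", "Value": "my-experiment"},
        ]
    },
    # Metric names depend on the algorithm; this one is illustrative.
    SortBy="Metrics.validation:accuracy",
    SortOrder="Descending",
    MaxResults=10,
)

# Print a small leaderboard of the best runs.
for result in response["Results"]:
    job = result["TrainingJob"]
    print(job["TrainingJobName"], job.get("FinalMetricDataList"))
```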
