June 2025: This article has been reviewed and updated for accuracy.
Amazon SageMaker AI Studio is the first fully integrated development environment (IDE) for machine learning (ML). In SageMaker AI Studio, data scientists can quickly set up a JupyterLab Space to explore data, build models, run Amazon SageMaker AI training jobs, and deploy hosted endpoints. A JupyterLab Space comes with a selection of prebuilt images that include the Amazon SageMaker Python SDK and the latest IPython runtime. You can also bring your own custom images into Amazon SageMaker AI Studio and make them available to all users in the authenticated domain. In this article, we show how to bring a custom container image into a JupyterLab Space in SageMaker AI Studio.
Developers and data scientists may need custom images for various reasons:
- Access to specific or the latest versions of popular ML frameworks such as TensorFlow, MXNet, and PyTorch.
- The ability to transfer custom code or algorithms developed locally for rapid iteration and model training.
- Integration with data lakes or on-premises data stores via APIs, requiring the inclusion of corresponding drivers in the image.
- Use of a backend runtime, known as a kernel, other than IPython, such as R or Julia. You can also use this method to install a custom kernel.
In larger enterprises, ML platform administrators often must ensure that any third-party packages and code are pre-approved by security teams and not downloaded directly from the internet. A typical workflow involves the ML Platform team approving a set of packages and frameworks, building a custom container with these approved packages, testing the container for vulnerabilities, and then pushing the resulting image to a private container registry like Amazon Elastic Container Registry (Amazon ECR). From there, the ML platform teams can directly attach these approved images to the Studio domain (see the workflow diagram below). Users can easily select their preferred custom image within Studio and work with it in their JupyterLab Space. This release allows a single Studio domain to contain multiple custom images, with the capability to add new versions or delete images as necessary.
We will now explore how to bring a custom container image to SageMaker AI Studio JupyterLab using this feature. Although we will demonstrate the standard method via the internet, if you are utilizing SageMaker AI Studio to build your Docker image, you can configure VPC endpoints for the services you wish to interact with securely. For further details, check out this blog post on connecting to Amazon services using AWS PrivateLink in Amazon SageMaker.
Prerequisites
Before diving in, ensure you meet the following prerequisites:
- An active AWS account.
- You can either use a local Docker client to build your container image or create it directly from SageMaker AI Studio. In this article, we use a local Docker client to build the image. To learn how to build your container image using SageMaker AI Studio, refer to Using the Amazon SageMaker Studio Image Build CLI to build container images from your Studio notebooks.
- Installation of the AWS Command Line Interface (AWS CLI) on your local machine. For installation instructions, see the AWS documentation.
- A SageMaker AI Studio domain. To create a domain, follow the steps in Use quick setup for Amazon SageMaker AI.
- A private Amazon Elastic Container Registry (Amazon ECR) repository. To create one, see instructions for Creating an Amazon ECR private repository to store images.
Creating Your Dockerfile
To illustrate a common need among data scientists to experiment with new frameworks, we will use the following Dockerfile based on TensorFlow version 2.19.0. You can modify this Dockerfile to suit your needs. Currently, SageMaker AI Studio supports various base images, including Ubuntu, Amazon Linux 2023, and more. The Dockerfile installs the required IPython runtime for running Jupyter notebooks, along with the Amazon SageMaker Python SDK and boto3.
Data scientists and ML engineers often iterate and experiment on local machines using popular IDEs such as Visual Studio Code or PyCharm. You might want to move these scripts to the cloud for scalable training or data processing. These scripts can be included in your Docker container so they are visible in your local storage in SageMaker AI Studio. In the following Dockerfile, we copy the train.py script, which serves as a foundational script for training a basic deep learning model on the MNIST dataset. Feel free to replace this script with your own code or packages.
FROM tensorflow/tensorflow:2.19.0-jupyter
ARG NB_USER="sagemaker-user"
ARG NB_UID=1000
ARG NB_GID=100
RUN apt-get update && \
    apt-get install -y sudo && \
    useradd --create-home --shell /bin/bash --gid "${NB_GID}" --uid ${NB_UID} ${NB_USER}
RUN pip install --upgrade pip
RUN pip install --quiet --no-cache-dir \
    jupyterlab \
    ipykernel \
    boto3 \
    sagemaker
RUN python -m ipykernel install --sys-prefix
USER ${NB_USER}
COPY ./train.py /home/train.py
ENTRYPOINT ["jupyter-lab"]
CMD ["--ServerApp.ip=0.0.0.0", "--ServerApp.port=8888", "--ServerApp.allow_origin=*", "--ServerApp.token=''", "--ServerApp.base_url=/jupyterlab/default"]
The train.py script is as follows:
import tensorflow as tf
import os
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
model = tf.keras.models.Sequential([
    tf.keras.layers.Input((28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=1)
model.evaluate(x_test, y_test)
Instead of a custom script, you may also include other files, such as Python files that access client secrets and environment variables via AWS Secrets Manager or AWS Systems Manager Parameter Store, configuration files for connecting with private PyPI repositories, or additional package management tools.
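Before pushing the image to Amazon ECR, it is worth building and smoke-testing it locally. The commands below are a minimal sketch; the image name tf-jupyterlab and the container name jl-smoke-test are placeholders, not names the article prescribes.

```shell
# Hypothetical local image name and tag; use your own naming convention.
IMAGE_NAME=tf-jupyterlab
IMAGE_TAG=latest

# Build from the directory containing the Dockerfile and train.py.
docker build -t ${IMAGE_NAME}:${IMAGE_TAG} .

# Run detached, then browse to http://localhost:8888/jupyterlab/default to
# confirm JupyterLab starts before pushing the image to Amazon ECR.
docker run --rm -d -p 8888:8888 --name jl-smoke-test ${IMAGE_NAME}:${IMAGE_TAG}
```

When you are done testing, stop the container with docker stop jl-smoke-test.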
Pushing Your Custom Container Image to ECR
To push your custom image, follow these steps:
- Navigate to the Amazon ECR console and access the repository you created.
- On the repository page, choose “View push commands.”
- Execute these commands in the directory containing your Dockerfile to build and push the image with the latest tag.
- Note the image URI for later use.
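The push commands shown in the console follow the general shape below. The account ID (111122223333), Region (us-east-1), and repository name (custom-jupyterlab) are placeholders; substitute your own values, or simply copy the exact commands the console displays.

```shell
# Placeholder values; replace with your own account ID, Region, and repository.
AWS_ACCOUNT_ID=111122223333
AWS_REGION=us-east-1
ECR_REPO=custom-jupyterlab
IMAGE_URI=${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/${ECR_REPO}:latest

# Authenticate the Docker client with your private registry.
aws ecr get-login-password --region ${AWS_REGION} | \
  docker login --username AWS --password-stdin ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com

# Build, tag, and push the image with the latest tag.
docker build -t ${ECR_REPO} .
docker tag ${ECR_REPO}:latest ${IMAGE_URI}
docker push ${IMAGE_URI}

# This is the image URI you note for the attach step.
echo "Image URI: ${IMAGE_URI}"
```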
Attaching Images Using the Studio UI
You can attach your custom image to the Studio domain through the console by following these steps:
- On the Amazon SageMaker AI console, choose “Domains” and open the domain you created.
- On the Domain page, navigate to the Environment tab.
- Click “Attach Image.”
- Enter the ECR image URI and click “Next.”
- Provide the Image name, Image display name, and Description.
- Select the Image type as JupyterLab image.
- Keep other options as default.
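If you prefer to script this step rather than use the console, the same attachment can be done with the AWS CLI. This is a sketch under assumptions: the image name, app image config name, domain ID, role ARN, and ECR URI below are all placeholders for your own values.

```shell
# Placeholder names and IDs; replace with your own values.
IMAGE_NAME=custom-tf-jupyterlab
APP_IMAGE_CONFIG_NAME=custom-tf-jupyterlab-config
DOMAIN_ID=d-xxxxxxxxxxxx
ROLE_ARN=arn:aws:iam::111122223333:role/service-role/AmazonSageMaker-ExecutionRole
ECR_IMAGE_URI=111122223333.dkr.ecr.us-east-1.amazonaws.com/custom-jupyterlab:latest

# Register the SageMaker image and a first version pointing at the ECR URI.
aws sagemaker create-image --image-name ${IMAGE_NAME} --role-arn ${ROLE_ARN}
aws sagemaker create-image-version --image-name ${IMAGE_NAME} --base-image ${ECR_IMAGE_URI}

# Create an app image config for JupyterLab (default settings).
aws sagemaker create-app-image-config \
  --app-image-config-name ${APP_IMAGE_CONFIG_NAME} \
  --jupyter-lab-app-image-config '{}'

# Attach the custom image to the domain's default user settings.
aws sagemaker update-domain --domain-id ${DOMAIN_ID} \
  --default-user-settings "{\"JupyterLabAppSettings\": {\"CustomImages\": [{\"ImageName\": \"${IMAGE_NAME}\", \"AppImageConfigName\": \"${APP_IMAGE_CONFIG_NAME}\"}]}}"
```

To publish an updated image later, push a new tag to ECR and call create-image-version again; Studio users pick up the new version in their JupyterLab Space.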
By following these steps, you can use your custom container images in JupyterLab Spaces within Amazon SageMaker AI Studio.