Running AWS ParallelCluster from AppStream 2.0 and Sharing S3 Data

High Performance Computing (HPC) cluster administrators often seek efficient methods for users to quickly create HPC clusters from a shared Windows desktop while ensuring security, isolation, scalability, and cost-effectiveness. This process can be an integral part of a broader user workflow or a standard procedure followed by HPC users to initiate and monitor their jobs effectively.

Amazon AppStream 2.0 is a fully-managed, secure application streaming service that allows desktop applications to be streamed from AWS to a web browser. AWS ParallelCluster is an AWS-supported open-source cluster management tool that simplifies the deployment and management of HPC clusters in the AWS cloud. Additionally, Amazon S3 is an object storage service known for its scalability, data availability, security, and performance.

This article outlines how to run ParallelCluster within AppStream 2.0 user sessions and provides an example of accessing an S3 bucket from both environments.

Solution Overview

To run ParallelCluster from an AppStream 2.0 streaming session, you need to build an AppStream 2.0 image that has Python and ParallelCluster installed. Accessing an S3 bucket from AppStream 2.0 sessions and ParallelCluster instance nodes is straightforward; just specify the same S3 bucket in the AppStream 2.0 Stack Storage option and in the ParallelCluster configuration file.

Step 1: Creating an AppStream 2.0 Image

  1. Log into the AWS Console.
  2. Navigate to AppStream 2.0, then Images, and select Launch image builder.
  3. Choose an official base image from AWS, such as AppStream-WinServer2019-09-18-2019, and configure your desired settings.
  4. Enable the Default Internet Access option in the Configure Network section, which is essential for installing and upgrading ParallelCluster.
  5. Create the Image Builder instance, and connect as an Administrator once it is Running.

Step 2: Installing ParallelCluster

The following steps will install Python in its own directory and then use it to install ParallelCluster, ensuring a clean installation without affecting any existing Python setups.

  1. Download the preferred Python Windows installer from python.org.
  2. Install Python in a separate folder, e.g., C:\python38, making sure to:
    • Install for all users
    • Include pip
  3. Change directory to where Python is installed.
  4. Execute python -m pip install aws-parallelcluster.
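
The install steps above can also be scripted. The following is a minimal sketch, assuming Python was installed to C:\python38 as in the example. The helper names pip_install_command and install_parallelcluster are hypothetical; the sketch only wraps the python -m pip install aws-parallelcluster command from step 4.

```python
import subprocess
from pathlib import PureWindowsPath


def pip_install_command(python_dir, package="aws-parallelcluster"):
    """Build the 'python -m pip install' command for a dedicated Python folder."""
    # PureWindowsPath keeps Windows path semantics regardless of where this runs.
    python_exe = PureWindowsPath(python_dir) / "python.exe"
    return [str(python_exe), "-m", "pip", "install", package]


def install_parallelcluster(python_dir=r"C:\python38"):
    """Run the install on the image builder (needs internet access)."""
    subprocess.run(pip_install_command(python_dir), check=True)
```

Calling install_parallelcluster() on the image builder runs the same command as step 4, using the dedicated interpreter rather than any other Python on the PATH.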

Step 3: Publishing the Command Line

With ParallelCluster operational, provide the pcluster command to users and build the related AppStream 2.0 image.

  1. Give standard users permission to run pcluster and read its configuration. There are several ways to do this, such as assigning a role to users or to the image, sharing credentials, or embedding them in a custom wrapper script. For this example, we’ll copy the folders C:\Users\ImageBuilderAdmin\.aws and C:\Users\ImageBuilderAdmin\.parallelcluster to the default user homes: C:\Users\Default and C:\Users\DefaultProfileUser.
  2. Add any desired applications, such as an SSH client like PuTTY, to connect to ParallelCluster nodes. Ensure it is published as described below.
  3. Include a Command Prompt in the list of published applications. Open the Image Assistant icon on the AppStream 2.0 session desktop, and add Windows Command Prompt by specifying the executable location as C:\Windows\system32\cmd.exe. For the working directory, choose a folder accessible by all users, such as a newly created C:\temp.
  4. Follow the Image Assistant’s instructions to test and optimize the Command Prompt, build the AppStream 2.0 image, and conduct a user login test to verify everything works correctly.
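
The folder copy in step 1 could be scripted along these lines. This is a sketch only: the function name copy_config_folders is hypothetical, and it assumes the .aws and .parallelcluster folders sit directly under the administrator's home folder, as in the example.

```python
import shutil
from pathlib import Path

# Folder names taken from the example in step 1.
CONFIG_FOLDERS = (".aws", ".parallelcluster")


def copy_config_folders(source_home, target_homes, folders=CONFIG_FOLDERS):
    """Copy each existing config folder from source_home into every target home.

    Returns the list of destination paths that were written.
    """
    copied = []
    for target_home in target_homes:
        for name in folders:
            src = Path(source_home) / name
            if not src.is_dir():
                continue  # nothing to copy, e.g. pcluster was never configured
            dst = Path(target_home) / name
            shutil.copytree(src, dst, dirs_exist_ok=True)  # needs Python 3.8+
            copied.append(dst)
    return copied
```

On the image builder, this might be called as copy_config_folders(r"C:\Users\ImageBuilderAdmin", [r"C:\Users\Default", r"C:\Users\DefaultProfileUser"]) from an administrator session.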

Step 4: Setting Up Fleets and Stacks

Once the AppStream 2.0 image is finalized (no longer in Snapshotting status), you can create fleets.

  1. In the AWS Console, go to AppStream 2.0 → Fleets and click on Create Fleet.
  2. In the Choose an Image step, specify the newly built AppStream 2.0 Image.
  3. In the Configure Fleet step, set the session duration and other parameters according to your preferences. A session duration that allows users to start, work on, and terminate their ParallelCluster instances is recommended. The maximum session duration is 5760 minutes (four days).
  4. In the Configure Network step, ensure Default Internet Access is enabled; otherwise, ParallelCluster cannot contact CloudFormation.
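
To make the session-duration arithmetic concrete, here is a small sketch that converts minutes to seconds and enforces the 5760-minute limit quoted above. The key names are meant to mirror the parameters of the AppStream 2.0 CreateFleet API, but treat that as an assumption to verify against the API reference. The function name, the default instance type, and the fleet and image names in the usage note are hypothetical, and the dictionary is not a complete request.

```python
MAX_SESSION_MINUTES = 5760  # limit quoted in this article (four days)


def fleet_settings(name, image_name, session_minutes,
                   instance_type="stream.standard.medium"):
    """Assemble a subset of fleet settings, converting minutes to seconds."""
    if not 0 < session_minutes <= MAX_SESSION_MINUTES:
        raise ValueError(
            f"session_minutes must be between 1 and {MAX_SESSION_MINUTES}"
        )
    return {
        "Name": name,
        "ImageName": image_name,
        "InstanceType": instance_type,
        "MaxUserDurationInSeconds": session_minutes * 60,
        # Without internet access, pcluster cannot reach CloudFormation.
        "EnableDefaultInternetAccess": True,
    }
```

For example, fleet_settings("pcluster-fleet", "pcluster-image", 5760) yields a MaxUserDurationInSeconds of 345600, which is four days.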

After configuring and running the Fleet, create a stack:

  1. Navigate to AppStream 2.0 → Stacks in the AWS Console and click on Create Stack.
  2. In the Enable Storage step, consider allowing user sessions to access an Amazon S3 bucket for seamless data sharing between ParallelCluster instances and AppStream 2.0 sessions. AppStream auto-generates a bucket for this purpose, pre-filling the option with a name formatted as appstream2---.
  3. Once the stack creation is complete, select its entry and choose Actions → Create streaming URL for one or more users.
  4. Users can then connect via their browsers or the AppStream 2.0 Client, selecting the Command Prompt application to run pcluster commands.

Step 5: Enabling S3 Access in ParallelCluster

To allow ParallelCluster instance nodes to access the same S3 bucket linked to AppStream 2.0, include the following in the [cluster] section of the ParallelCluster configuration file:

  • For read-only access:
    s3_read_resource = arn:aws:s3:::appstream2-012345678910-eu-west-1-012345678910*
  • For read and write access:
    s3_read_write_resource = arn:aws:s3:::appstream2-012345678910-eu-west-1-012345678910*
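
If you prefer to apply this setting programmatically, the sketch below uses Python's standard configparser module. The function names are hypothetical, and the section name is passed in by the caller because it depends on how your configuration file names its cluster section ("cluster default" in the usage note is an assumption).

```python
import configparser
import io


def bucket_arn_pattern(bucket_name):
    """Return the ARN pattern used in this article: bucket name plus '*'."""
    return f"arn:aws:s3:::{bucket_name}*"


def add_s3_access(config_text, bucket_name, section, read_only=False):
    """Return config_text with the S3 resource option set in the given section."""
    # Interpolation is disabled so values are stored and read back literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if not parser.has_section(section):
        parser.add_section(section)
    option = "s3_read_resource" if read_only else "s3_read_write_resource"
    parser.set(section, option, bucket_arn_pattern(bucket_name))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

A call such as add_s3_access(text, "appstream2-012345678910-eu-west-1-012345678910", "cluster default") returns the updated configuration text. Note that configparser does not preserve comments when it rewrites a file, so editing by hand may be preferable for a commented configuration.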

ParallelCluster users typically access instance nodes via SSH. Once logged in, they can manage S3 objects using AWS CLI S3 commands, such as sync, cp, mv, and rm.
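
As an illustration, the sketch below builds AWS CLI S3 command lines for the subcommands mentioned above. It assumes the AWS CLI is installed on the node; the helper names and the object key in the usage note are hypothetical.

```python
import subprocess

ALLOWED_SUBCOMMANDS = {"sync", "cp", "mv", "rm", "ls"}


def s3_uri(bucket, key=""):
    """Build an s3:// URI for a bucket and an optional key or prefix."""
    return f"s3://{bucket}/{key.lstrip('/')}"


def aws_s3_command(subcommand, *args):
    """Build an 'aws s3 <subcommand>' argument list."""
    if subcommand not in ALLOWED_SUBCOMMANDS:
        raise ValueError(f"unsupported subcommand: {subcommand}")
    return ["aws", "s3", subcommand, *args]


def run(command):
    """Execute the command on a node where the AWS CLI is available."""
    return subprocess.run(command, check=True)
```

For instance, run(aws_s3_command("cp", "results.tar.gz", s3_uri(bucket, "results/results.tar.gz"))) would upload a local file, where bucket holds the name of the bucket configured for the cluster.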

Note: Specifying the bucket name followed by an asterisk in the ParallelCluster configuration also allows AWS CLI S3 commands to list the bucket contents using the ls subcommand.
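
The effect of the trailing asterisk can be illustrated with simple wildcard matching. The sketch below uses Python's fnmatch module as an approximation of how a resource pattern matches ARNs; it is not the actual IAM evaluation logic, and the object key is hypothetical.

```python
from fnmatch import fnmatchcase

BUCKET = "appstream2-012345678910-eu-west-1-012345678910"  # example name from above


def arn_matches(pattern, arn):
    """Approximate wildcard matching of a resource ARN (illustration only)."""
    return fnmatchcase(arn, pattern)


bucket_arn = f"arn:aws:s3:::{BUCKET}"                 # what a listing targets
object_arn = f"arn:aws:s3:::{BUCKET}/data/input.dat"  # hypothetical object key

with_asterisk = f"arn:aws:s3:::{BUCKET}*"  # pattern used in this article
objects_only = f"arn:aws:s3:::{BUCKET}/*"  # alternative, shown for contrast

print(arn_matches(with_asterisk, bucket_arn))  # True
print(arn_matches(with_asterisk, object_arn))  # True
print(arn_matches(objects_only, bucket_arn))   # False
```

The pattern ending in an asterisk covers both the bucket itself and the objects inside it, whereas a pattern ending in /* covers only the objects. The shorter pattern would also match any other bucket whose name begins with the same string.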

Final Considerations

Once a cluster is created with ParallelCluster, it is independent of the AppStream 2.0 session that created it. For instance, a user can create a cluster from one AppStream 2.0 session, close that session, and later manage the same cluster from another session.
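
From a later session, a user might check on an existing cluster with calls like the ones sketched below. This assumes the ParallelCluster 2.x command names (pcluster list, pcluster status, pcluster delete); the wrapper function and the cluster name in the usage note are hypothetical.

```python
import subprocess


def pcluster_command(action, cluster_name=None):
    """Build a pcluster CLI call; 'list' takes no cluster name."""
    if action == "list":
        return ["pcluster", "list"]
    if action in {"status", "delete"}:
        if not cluster_name:
            raise ValueError(f"'{action}' needs a cluster name")
        return ["pcluster", action, cluster_name]
    raise ValueError(f"unsupported action: {action}")


def run_pcluster(action, cluster_name=None):
    """Run the command in a session where pcluster is installed and configured."""
    return subprocess.run(pcluster_command(action, cluster_name), check=True)
```

For example, run_pcluster("list") followed by run_pcluster("status", "mycluster") would show the clusters and then the state of one of them, regardless of which session created it.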

ParallelCluster is an AWS-supported open-source project. If you wish to contribute, provide feedback, or report any issues, please refer to the ParallelCluster GitHub issues.

In this post, I covered how to:

  • Install ParallelCluster within an AppStream 2.0 image builder.
  • Create the AppStream 2.0 image and run it effectively.
