How to Operate AWS ParallelCluster from AppStream 2.0 and Share S3 Data

High Performance Computing (HPC) cluster administrators often need to give users a straightforward way to create HPC clusters quickly from a shared Windows desktop while preserving security, isolation, scalability, and cost-effectiveness. Cluster creation may be one step in a broader workflow or a defined procedure that HPC users follow to launch and monitor their jobs.

Amazon AppStream 2.0 is a fully managed, secure application streaming service that enables you to stream desktop applications from AWS to a web browser. AWS ParallelCluster is an open-source cluster management tool supported by AWS, simplifying the deployment and management of HPC clusters in the AWS cloud. Additionally, Amazon S3 is an object storage service offering remarkable scalability, data availability, security, and performance.

This article outlines how to execute ParallelCluster from within AppStream 2.0 user sessions and provides an example of accessing an S3 bucket from both environments.

Solution Overview

You can run ParallelCluster within an AppStream 2.0 streaming session by creating an AppStream 2.0 image that has Python and ParallelCluster installed. To access an S3 bucket from both AppStream 2.0 streaming sessions and ParallelCluster instance nodes, simply specify the same S3 bucket in the AppStream 2.0 Stack Storage option and in the ParallelCluster configuration file.

Step 1: Creating an AppStream 2.0 Image

Log into the AWS Console.
Navigate to AppStream 2.0, then to Images, and select “Launch image builder.”
Choose an official base image provided by AWS, such as AppStream-WinServer2019-09-18-2019, and set your desired configuration parameters.
Enable the Default Internet Access option under Configure Network. This is necessary for the installation and upgrade of ParallelCluster.
Create the Image Builder instance and connect as an Administrator once it is Running.

Step 2: Installing ParallelCluster

This procedure involves first installing Python in a separate folder and then using it to install ParallelCluster. This method avoids altering any existing Python installations.

Download your choice of Python Windows installer from python.org.
Install Python in a distinct folder, e.g., C:\python38, making sure to:
- Install for all users
- Include pip
Change directory to where Python is installed.
Execute python -m pip install aws-parallelcluster.
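The install step can also be scripted. The following is a minimal sketch, assuming Python was installed in C:\python38 as in the example above; the helper names build_install_command and install_parallelcluster are hypothetical. It calls pip through the dedicated interpreter so that any other Python installation on the image is left untouched.

```python
import subprocess
from pathlib import PureWindowsPath
from typing import List


def build_install_command(python_dir: str) -> List[str]:
    """Build the pip command that installs ParallelCluster with one specific interpreter."""
    # Point at python.exe inside the dedicated folder rather than relying on PATH.
    python_exe = PureWindowsPath(python_dir) / "python.exe"
    return [str(python_exe), "-m", "pip", "install", "aws-parallelcluster"]


def install_parallelcluster(python_dir: str) -> None:
    """Run the install; raises CalledProcessError if pip exits with an error."""
    subprocess.run(build_install_command(python_dir), check=True)
```

On the image builder, calling install_parallelcluster(r"C:\python38") would run the same command as the manual step.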

Step 3: Publishing the Command Line

With ParallelCluster installed, you can now make the pcluster command available to users and build the corresponding AppStream 2.0 image.

Grant regular users permission to execute pcluster and read its configuration. This can be done in various ways: by assigning a role to the users or the image, sharing credentials, creating temporary ones, or embedding them in a custom wrapper script or executable. For this example, we'll copy the folders C:\Users\ImageBuilderAdmin\.aws and C:\Users\ImageBuilderAdmin\.parallelcluster to the default user homes: C:\Users\Default and C:\Users\DefaultProfileUser.
Add any desired applications. You may want to include an SSH client such as PuTTY to connect to ParallelCluster nodes, making sure it is published as described below.
Add a Command Prompt to the published applications. Double-click the Image Assistant icon on the AppStream 2.0 session desktop, and add Windows Command Prompt. The executable location is C:\Windows\system32\cmd.exe. For the working directory, specify a folder accessible by all users, e.g., a newly created C:\temp.
Follow the Image Assistant instructions to test and optimize the Command Prompt, build the AppStream 2.0 image, and review your application publication settings.
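The folder copy described in the first step above can be sketched as follows. This is an illustrative helper, not part of ParallelCluster or AppStream 2.0: the function name copy_config_folders is hypothetical, and it assumes Python 3.8 or later (for the dirs_exist_ok argument).

```python
import shutil
from pathlib import Path

# Folders holding the AWS credentials and the ParallelCluster configuration.
CONFIG_FOLDERS = (".aws", ".parallelcluster")


def copy_config_folders(source_home, target_homes, folders=CONFIG_FOLDERS):
    """Copy each configuration folder from source_home into every target home.

    Returns the list of destination folders that were written.
    """
    copied = []
    for target_home in target_homes:
        for name in folders:
            src = Path(source_home) / name
            if not src.is_dir():
                continue  # skip folders that do not exist in the source profile
            dst = Path(target_home) / name
            # dirs_exist_ok=True lets the copy be repeated after a config change.
            shutil.copytree(src, dst, dirs_exist_ok=True)
            copied.append(dst)
    return copied


# Example call on the image builder (paths taken from the step above):
# copy_config_folders(
#     r"C:\Users\ImageBuilderAdmin",
#     [r"C:\Users\Default", r"C:\Users\DefaultProfileUser"],
# )
```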

Step 4: Setting Up Fleets and Stacks

Once the AppStream 2.0 image is completed (no longer in Snapshotting status), you can launch fleets.

In the AWS Console, go to AppStream 2.0 → Fleets, and select “Create Fleet.”
In the Choose an Image step, specify the AppStream 2.0 Image you just created.
In the Configure Fleet step, set the session duration and other parameters as desired. For clarity and simplicity, it’s advisable to set a session duration that allows users to start, work on, and terminate their ParallelCluster instances. The maximum session duration is 5760 minutes (four days).
In the Configure Network step, ensure Default Internet Access is enabled; otherwise, ParallelCluster will be unable to contact CloudFormation.
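For administrators who prefer scripting the fleet, the settings above map onto parameters of the AppStream 2.0 CreateFleet API. The sketch below only builds the parameter dictionary; the helper name build_fleet_settings and the example values are assumptions, and the 5760-minute limit is the maximum quoted in this article rather than a value checked against the service.

```python
MAX_SESSION_MINUTES = 5760  # maximum session duration quoted above (four days)


def build_fleet_settings(name, image_name, instance_type, session_minutes,
                         desired_instances=1):
    """Return keyword arguments suitable for an AppStream 2.0 create_fleet call."""
    if not 0 < session_minutes <= MAX_SESSION_MINUTES:
        raise ValueError(
            f"session_minutes must be between 1 and {MAX_SESSION_MINUTES}"
        )
    return {
        "Name": name,
        "ImageName": image_name,
        "InstanceType": instance_type,
        "ComputeCapacity": {"DesiredInstances": desired_instances},
        # The API expects seconds, while the console shows minutes.
        "MaxUserDurationInSeconds": session_minutes * 60,
        # Needed so that pcluster can reach CloudFormation from the session.
        "EnableDefaultInternetAccess": True,
    }


# Example (requires boto3 and AWS credentials; names are placeholders):
# import boto3
# settings = build_fleet_settings("pcluster-fleet", "my-pcluster-image",
#                                 "stream.standard.medium", 480)
# boto3.client("appstream").create_fleet(**settings)
```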

After configuring and running the Fleet, create a stack:

In the AWS Console, navigate to AppStream 2.0 → Stacks, and click “Create Stack.”
In the Enable Storage step, consider allowing user sessions to access an Amazon S3 bucket, as this location facilitates sharing objects/data between ParallelCluster instances and AppStream 2.0 sessions. AppStream automatically creates a bucket for this purpose, pre-filling this option with the new bucket name in the format: appstream2---.
Once the stack creation process is complete, select its entry and choose Actions → Create streaming URL for one or more users. Users can then connect via their browsers or the AppStream 2.0 Client and select the Command Prompt application. They can now execute pcluster commands from their command prompt.
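Streaming URLs can also be generated programmatically instead of through the console. The following is a sketch that assumes a boto3 AppStream client is passed in; the function name create_streaming_urls and the stack, fleet, and user names in the usage comment are placeholders.

```python
def create_streaming_urls(client, stack_name, fleet_name, user_ids,
                          validity_seconds=3600):
    """Request one streaming URL per user and return a {user_id: url} mapping.

    `client` is expected to behave like boto3.client("appstream").
    """
    urls = {}
    for user_id in user_ids:
        response = client.create_streaming_url(
            StackName=stack_name,
            FleetName=fleet_name,
            UserId=user_id,
            Validity=validity_seconds,  # how long the URL stays valid, in seconds
        )
        urls[user_id] = response["StreamingURL"]
    return urls


# Example (requires boto3 and AWS credentials; names are placeholders):
# import boto3
# urls = create_streaming_urls(boto3.client("appstream"),
#                              "pcluster-stack", "pcluster-fleet",
#                              ["alice", "bob"])
```

Passing the client as an argument keeps the helper easy to exercise with a stub before pointing it at a real account.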

Step 5: Enabling S3 Access in ParallelCluster

You can allow ParallelCluster instance nodes to access the same S3 bucket linked to AppStream 2.0 by setting one of the following options in the [cluster] section of the ParallelCluster configuration file:

To grant read-only access to an S3 bucket:
s3_read_resource = arn:aws:s3:::appstream2-012345678910-eu-west-1-012345678910*
To provide both read and write access:
s3_read_write_resource = arn:aws:s3:::appstream2-012345678910-eu-west-1-012345678910*
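If the configuration file is generated by a script, the option can be rendered with Python's standard configparser module. This is a minimal sketch: the function name render_cluster_s3_option is hypothetical, and the section name "cluster default" is an assumption that should be replaced with the name of your own [cluster] section.

```python
import configparser
import io


def render_cluster_s3_option(bucket_name, read_write=False,
                             section="cluster default"):
    """Render a config fragment granting cluster nodes access to one bucket."""
    option = "s3_read_write_resource" if read_write else "s3_read_resource"
    config = configparser.ConfigParser()
    # The trailing asterisk follows the ARN format shown above.
    config[section] = {option: f"arn:aws:s3:::{bucket_name}*"}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_cluster_s3_option(
        "appstream2-012345678910-eu-west-1-012345678910", read_write=True))
```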

ParallelCluster users typically connect to the instance nodes via SSH. Once logged in, they can access S3 objects using AWS CLI s3 commands such as sync, cp, mv, and rm.

For example, because the ARN in the ParallelCluster configuration ends with an asterisk, it matches both the bucket itself and the objects inside it, so AWS CLI s3 commands can also list the bucket contents with the ls subcommand.
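To illustrate how the configured ARN relates to the commands run on a node, the sketch below converts a bucket-level ARN into the s3:// URI that the AWS CLI expects and assembles the matching command line. The helper names arn_to_s3_uri and build_s3_command are hypothetical, and the commands are only built here, not executed.

```python
ARN_PREFIX = "arn:aws:s3:::"


def arn_to_s3_uri(resource_arn):
    """Convert a bucket-level ARN such as arn:aws:s3:::my-bucket* to s3://my-bucket/."""
    if not resource_arn.startswith(ARN_PREFIX):
        raise ValueError(f"not an S3 ARN: {resource_arn}")
    # Keep only the bucket part and drop the trailing wildcard.
    bucket = resource_arn[len(ARN_PREFIX):].split("/", 1)[0].rstrip("*")
    return f"s3://{bucket}/"


def build_s3_command(subcommand, *args):
    """Assemble an AWS CLI s3 command (ls, cp, mv, rm, sync) as an argument list."""
    return ["aws", "s3", subcommand, *args]


if __name__ == "__main__":
    uri = arn_to_s3_uri(
        "arn:aws:s3:::appstream2-012345678910-eu-west-1-012345678910*")
    print(" ".join(build_s3_command("ls", uri)))
```

The resulting list can be passed to subprocess.run on a cluster node where the AWS CLI is available.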

Final Considerations

Once a ParallelCluster cluster is created, it is entirely independent of the AppStream 2.0 session that created it. For instance, a user can create a cluster from one AppStream 2.0 session, close that session, and later delete the cluster from another session.

Remember that ParallelCluster is an AWS-supported open-source project. If you wish to contribute, provide feedback, or report issues, please refer to the ParallelCluster GitHub issues page.

In summary, this post illustrated how to:

Install ParallelCluster within an AppStream 2.0 image builder.
Create the AppStream 2.0 image and enable users to run commands.
