In September, we introduced AWS ParallelCluster 3, a significant update featuring numerous enhancements and new functionalities. To assist you in transitioning your clusters, we released a guide for migrating from AWS ParallelCluster 2.x to 3.x. Understanding that changing versions can be a daunting task, we aim to complement that official documentation with additional insights and context on several critical areas. This article focuses on the changes to the configuration file format in ParallelCluster 3 and how they relate to the configuration sections in ParallelCluster 2.
The Configuration File for AWS ParallelCluster 3
The foremost change to note is that AWS ParallelCluster 3 now limits a configuration file to a single cluster resource. Previously, users could define multiple cluster configurations in a single file and specify which cluster to operate on via the command line interface (CLI). In ParallelCluster 3, the CLI requires you to provide the configuration file corresponding to the cluster resource you wish to manage. We believe that this association will enhance the readability and maintainability of each file over time.
As you migrate a configuration file from ParallelCluster 2 that encompassed multiple clusters to version 3, you will need to create separate configuration files for each cluster. Any resource settings referenced by more than one cluster definition will need to be repeated in each new configuration file.
Introducing the ParallelCluster Configuration Converter
To facilitate your transition from ParallelCluster version 2 to version 3, we have developed a configuration converter tool, available starting in ParallelCluster 3.0.1. The tool takes a ParallelCluster 2 configuration file as input and generates a ParallelCluster 3 configuration file, handling the transformation of parameter specifications while accounting for functional differences between the two versions. It also emits detailed messages to highlight those differences, including informational notes, warnings, and errors. More details about the tool can be found in the online documentation. The converter will assist you when you're ready to migrate, and in line with ParallelCluster 3's philosophy of one cluster per configuration file, it migrates one cluster section at a time, specified with the `--cluster-template` option on the command line.
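For example, assuming a ParallelCluster 2 file named `config.ini` that defines a `[cluster mycluster]` section (both names here are illustrative), an invocation might look like the following (check `pcluster3-config-converter --help` in your installation for the exact options):

```shell
# Convert only the 'mycluster' section of the v2 config file,
# writing a standalone ParallelCluster 3 YAML configuration.
pcluster3-config-converter \
  --config-file config.ini \
  --cluster-template mycluster \
  --output-file mycluster-pc3.yaml
```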
Syntax Changes
Another significant change is the adoption of YAML syntax instead of INI syntax for configuration files. We believe this enhances readability and maintainability by organizing resource types into a structured hierarchy.
To clarify the differences between ParallelCluster versions 2 and 3, we will analyze the following high-level cluster components: Head Node, Scheduler and Compute Nodes, Storage, and Networking. While these examples may not cover every detail, they will provide a solid understanding of what to look for during your migration.
A Note on Inclusive Language
You may have noticed that we now say "head node" rather than "master node." The language we use reflects our values, and for the past few years we have worked to remove problematic terminology from cluster resources. The major version change in ParallelCluster 3 gave us the opportunity to move away from these non-inclusive naming conventions. This extends to environment variable names as well: for example, MASTER_IP is now PCLUSTER_HEAD_NODE_IP.
Configuration File Sections
The HeadNode Section
The following example shows the configuration options for a cluster head node in each format, first as a ParallelCluster 2 INI file and then as its ParallelCluster 3 YAML counterpart:

AWS ParallelCluster Version 2:

```ini
[vpc public]
vpc_id = vpc-2f09a348
master_subnet_id = subnet-b46032ec
ssh_from = 0.0.0.0/0

[cluster mycluster]
key_name = My_PC3_KeyPair
base_os = alinux2
scheduler = slurm
master_instance_type = c5n.18xlarge
vpc_settings = public
queue_settings = multi-queue,spot,ondemand
```

AWS ParallelCluster Version 3:

```yaml
HeadNode:
  InstanceType: c5.4xlarge
  Networking:
    SubnetId: subnet-b46032ec
  Ssh:
    KeyName: My_PC3_KeyPair
    AllowedIps: 0.0.0.0/0
```
Notice that in ParallelCluster 2, the [cluster] section included settings for the head node, compute nodes, and scheduler, whereas ParallelCluster 3 has a dedicated HeadNode section that exclusively contains head node configurations. Additionally, ParallelCluster 3 requires you to specify only the subnet for deployment, since the VPC can be inferred from it.
We are also moving away from ParallelCluster 2's ad hoc pointers in configuration files. In ParallelCluster 2, an attribute named after a resource type and suffixed with "_settings" pointed to a section defined elsewhere; for instance, the vpc_settings = public attribute pointed to the [vpc public] section. While this method worked for simpler configurations, it became unwieldy as the number of sections and pointers grew. In ParallelCluster 3, related settings are instead nested directly where they are used, which is easier to maintain.
The HeadNode section contains many more configuration options, some of which are analogous to properties in ParallelCluster 2. You can find additional details in the HeadNode section of the documentation. A notable new feature not illustrated in the previous example is the ability to assign IAM permissions that are specific to the head node, separate from those of compute nodes.
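As a minimal sketch of that feature (the policy ARN and instance details here are illustrative), granting the head node an extra managed policy might look like this:

```yaml
HeadNode:
  InstanceType: c5.4xlarge
  Networking:
    SubnetId: subnet-b46032ec
  Ssh:
    KeyName: My_PC3_KeyPair
  Iam:
    # Extra managed policies attached to the head node only;
    # compute node permissions are configured separately.
    AdditionalIamPolicies:
      - Policy: arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
```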
Scheduling and ComputeResources Sections
A common practice in cluster configuration files is to define multiple queues with varying underlying compute resources. In ParallelCluster 2, a [cluster] section contained pointers to one or more [queue] sections, each of which referenced [compute_resource] sections that could overlap with others. This could lead to unintended consequences if a change was made to a [compute_resource].
In contrast, ParallelCluster 3 configuration files utilize a resource hierarchy. The Scheduling section encompasses a set of queues, with each queue containing its own definitions of ComputeResource. Below is an example of a Slurm cluster configuration that illustrates the differences:
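As a minimal sketch (queue names, instance types, the subnet ID, and instance counts here are illustrative), a ParallelCluster 2 multi-queue definition and its ParallelCluster 3 equivalent might look like this:

```ini
[cluster mycluster]
scheduler = slurm
queue_settings = ondemand, spot

[queue ondemand]
compute_resource_settings = c5n18xl

[queue spot]
compute_resource_settings = c5xl
compute_type = spot

[compute_resource c5n18xl]
instance_type = c5n.18xlarge
max_count = 10

[compute_resource c5xl]
instance_type = c5.xlarge
max_count = 20
```

```yaml
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: ondemand
      Networking:
        SubnetIds:
          - subnet-b46032ec
      ComputeResources:
        # Each queue now owns its compute resource definitions,
        # so a change here cannot affect any other queue.
        - Name: c5n18xl
          InstanceType: c5n.18xlarge
          MaxCount: 10
    - Name: spot
      CapacityType: SPOT
      Networking:
        SubnetIds:
          - subnet-b46032ec
      ComputeResources:
        - Name: c5xl
          InstanceType: c5.xlarge
          MaxCount: 20
```

Because every ComputeResource lives under exactly one queue in the version 3 hierarchy, editing one cannot silently alter another queue, which addresses the shared-pointer pitfall described above.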