In September, we announced the launch of AWS ParallelCluster 3, a significant update featuring numerous enhancements and new functionalities. To facilitate the transition of your clusters, we provided the Moving from AWS ParallelCluster 2.x to 3.x guide. We recognize that upgrading can be quite challenging, so we’re expanding on that official documentation with further insights and context regarding some critical aspects. In this article, we’ll concentrate on the changes made to the configuration file format in ParallelCluster 3 and how these correspond to the configuration sections from ParallelCluster 2.
The AWS ParallelCluster 3 Configuration File
The first notable alteration is that AWS ParallelCluster 3 limits a configuration file to define a single cluster resource. Previously, users could define multiple cluster configurations within one file and specify which cluster to operate on via the command line interface (CLI). In ParallelCluster 3, the CLI now requires you to provide the configuration file for the specific cluster resource you wish to manage.
We believe that linking a configuration file to a single cluster, along with other modifications we’ll discuss, will enhance the readability and maintainability of each file over time. Consequently, when migrating a ParallelCluster 2 configuration file that contains multiple clusters to version 3, you will need to create separate configuration files for each cluster. Any resource settings referenced across multiple cluster definitions must be duplicated in each resulting configuration file.
Introducing the ParallelCluster Configuration Converter
To assist you in converting your ParallelCluster configuration file from version 2 to the version 3 specification, we have launched a configuration converter tool, included in ParallelCluster 3.0.1. The tool takes a ParallelCluster 2 configuration file as input and generates a ParallelCluster 3 configuration file, handling the transformation of parameter specifications while accounting for functional differences between ParallelCluster 2 and 3. It emits detailed informational, warning, and error messages to highlight these distinctions. You can find more about the tool in the online documentation. In keeping with ParallelCluster 3's one-cluster-per-configuration-file approach, the converter migrates one cluster section at a time, selected with the `--cluster-template` option.
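As a sketch of what an invocation might look like (file names and the cluster section label are illustrative; check the converter's documentation for the full option list):

```shell
# Convert the cluster defined in the [cluster mycluster] section of a
# ParallelCluster 2 configuration file into a ParallelCluster 3 YAML file.
pcluster3-config-converter \
    --config-file ~/.parallelcluster/config \
    --cluster-template mycluster \
    --output-file cluster-config.yaml
```

The generated YAML file can then be passed to the version 3 CLI when creating the cluster.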
Syntax Changes
Another significant change is that the configuration file now employs YAML syntax instead of INI. We believe this enhances readability and maintainability by organizing resource types under a structured tree layout.
To comprehensively understand the differences between ParallelCluster versions 2 and 3, we will analyze the following fundamental components of a cluster: Head Node, Scheduler and Compute Nodes, Storage, and Networking. While these examples are not exhaustive, they cover the most important options and changes to provide clarity during your migration process.
A Note on Inclusive Language
You will also notice that we have transitioned from using the term “master node” to “head node.” The language we utilize and the names we choose reflect our core values. For the past few years, we have aimed to address some problematic terminology concerning cluster resources. The scope of our changes for version 3 presented us with an excellent opportunity to adopt more inclusive naming conventions.
Throughout the entire product, we now refer to a ‘head node’ instead of a ‘master node’ (this change extends to names for environment variables like MASTER_IP, which is now PCLUSTER_HEAD_NODE_IP).
Configuration File Sections
The HeadNode Section
The snippets below outline the configuration options for a cluster head node, comparing the configuration file formats of ParallelCluster 2 and 3.
AWS ParallelCluster version 2:

```ini
[vpc public]
vpc_id = vpc-2f09a348
master_subnet_id = subnet-b46032ec
ssh_from = 0.0.0.0/0
```

AWS ParallelCluster version 3:

```yaml
HeadNode:
  InstanceType: c5.4xlarge
  Networking:
    SubnetId: subnet-b46032ec
  Ssh:
    KeyName: My_PC3_KeyPair
    AllowedIps: 0.0.0.0/0
```
In ParallelCluster 2, the [cluster] section includes settings for the head node, compute nodes, and the scheduler all within the same section, while it splits the SSH ingress rule and key pair name across the [vpc] and [cluster] sections. By contrast, ParallelCluster 3 has a dedicated HeadNode section that exclusively holds settings pertinent to the head node, without any references to compute nodes or the scheduler. Notably, the ParallelCluster 3 version only requires the subnet for deployment, since the VPC can be inferred from it.
Additionally, we are moving away from ParallelCluster 2's ad hoc pointers in configuration files. In version 2, a section that needed to reference a resource defined elsewhere in the file used an attribute prefixed with the resource type ("vpc" or "queue") and suffixed with "_settings"; the attribute's value served as a pointer to another section within the configuration. In our example, the vpc_settings = public attribute pointed to the [vpc public] section. This approach worked well for simple files, but it became hard to maintain and understand as the number of sections grew. Although ParallelCluster itself never lost track of these pointer references, following them became cumbersome for users. This was particularly evident when defining scheduler queues, which we address in the next section.
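As a minimal sketch of that pointer pattern in the version 2 format (section names are illustrative):

```ini
; ParallelCluster 2: the vpc_settings value is a pointer to the
; [vpc <name>] section elsewhere in the same file.
[cluster default]
vpc_settings = public

[vpc public]
vpc_id = vpc-2f09a348
master_subnet_id = subnet-b46032ec
```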
There are many more configuration options within the HeadNode section, some of which align with ParallelCluster 2 properties. You can find further details on this in the HeadNode section of the documentation. Notably, a new capability not illustrated in the previous example is the ability to set IAM permissions specific to the head node, separate from the compute nodes.
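As a hedged illustration of that head-node-specific IAM capability (the structure follows the ParallelCluster 3 schema, but treat the policy ARN and other values as placeholders):

```yaml
HeadNode:
  InstanceType: c5.4xlarge
  Networking:
    SubnetId: subnet-b46032ec
  Ssh:
    KeyName: My_PC3_KeyPair
  Iam:
    # Extra managed policies attached only to the head node's instance role;
    # compute nodes are configured separately under the Scheduling section.
    AdditionalIamPolicies:
      - Policy: arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
```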
Scheduling and ComputeResources Sections
A common practice for cluster configuration files is to define multiple queues with varying underlying compute resources. In ParallelCluster 2, the [cluster] section contained pointers to one or more [queue] sections. Each [queue] section further pointed to the [compute_resource] sections, which could overlap with other queues. Modifying a [compute_resource] could inadvertently affect another [queue] section.
ParallelCluster 3 configuration files resolve this issue by establishing a resource hierarchy: a Scheduling section encompasses a set of queues, and each queue contains its own ComputeResource definitions, so nothing is shared or pointed to across sections.
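The sketch below shows the general shape of a multi-queue Slurm definition in each format; queue names, instance types, and counts are illustrative rather than taken from a real cluster.

```ini
; ParallelCluster 2: the [cluster] section points to queues, and each
; [queue] section points to [compute_resource] sections that other
; queues may also reference.
[cluster default]
scheduler = slurm
queue_settings = ondemand, spot

[queue ondemand]
compute_resource_settings = c5

[queue spot]
compute_resource_settings = c5
compute_type = spot

[compute_resource c5]
instance_type = c5.xlarge
max_count = 10
```

```yaml
# ParallelCluster 3: queues and their compute resources are nested
# under Scheduling, so each queue owns its resource definitions.
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: ondemand
      ComputeResources:
        - Name: c5
          InstanceType: c5.xlarge
          MaxCount: 10
    - Name: spot
      CapacityType: SPOT
      ComputeResources:
        - Name: c5
          InstanceType: c5.xlarge
          MaxCount: 10
```

Note that in the version 2 file, changing the shared [compute_resource c5] section would silently affect both queues; in the version 3 file, each queue's resources are edited independently.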