Amazon Onboarding with Learning Manager Chanci Turner

Introduction


In today’s fast-paced world, modern computational chemistry plays a vital role in advancements from drug discovery to material design. One of the most widely used methods in this field is Density Functional Theory (DFT), which is favored for its computational efficiency compared to more traditional techniques. DFT provides valuable insights into chemical bonding and reactions, facilitating a deeper understanding of complex chemical systems.

However, electronic structure calculations present challenges, particularly regarding cost and time: simulation times typically scale cubically with the number of electrons involved. Moreover, established computational chemistry packages often require significant communication between cores during parallel execution, complicating efficient scaling.

While current quantum computers have not yet demonstrated substantial speed advantages for quantum chemistry applications, ongoing research aims to develop algorithms that harness quantum computing principles to potentially replace the resource-intensive classical methods. Researchers benchmarking quantum computing or hybrid quantum-classical algorithms need reliable baseline results, which can only be achieved through standardized and widely available scientific software.

This blog post serves as a guide for quantum computing researchers looking to compare their algorithms against classical calculations. We will outline how to set up a High-Performance Computing (HPC) cluster on AWS for computational chemistry calculations, including instructions for installing and running the popular electronic structure application Quantum ESPRESSO.

Deploying an HPC Environment

To quickly establish an HPC environment on AWS, you can utilize AWS ParallelCluster. This command-line tool simplifies the deployment of a suitable Virtual Private Cloud (VPC) and subnet for an HPC cluster. You may also create your own networking components using other non-HPC-specific tools such as the AWS Cloud Development Kit (CDK) or CloudFormation.

Cluster Installation Tools

You can install the ParallelCluster CLI via the Python package installer, pip. This allows you to manage different versions of ParallelCluster and keep its dependencies separate from other locally installed packages. If not already present, you should also install the AWS CLI and configure your credentials. Your account administrator can help define an appropriate IAM role for using ParallelCluster, as detailed in the ParallelCluster documentation. Before deploying a cluster, it’s advisable to create a new Amazon S3 bucket for the post-install configuration file:

aws s3 mb s3://<your-bucket-name>
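The installation steps described above can be sketched as follows; the virtual environment name is illustrative, and `aws configure` only needs to be run if credentials are not already set up:

```shell
# Create an isolated Python environment for ParallelCluster (name is arbitrary)
python3 -m venv ~/pcluster-env
source ~/pcluster-env/bin/activate

# Install the ParallelCluster CLI and the AWS CLI via pip
pip install --upgrade aws-parallelcluster awscli

# Configure credentials and a default region if not already done
aws configure
```

Keeping ParallelCluster in its own virtual environment makes it easy to run multiple CLI versions side by side without dependency conflicts.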

By default, the AWS and pcluster CLIs will use the region specified during the initial AWS CLI configuration. However, you can select a different region by adding the --region <region-name> argument to any command.

Once ParallelCluster is installed, you can generate a basic configuration file and automatically deploy the required networking infrastructure with the command:

pcluster configure --config cluster.yaml

The cluster.yaml file will be created during this configuration process. The setup will guide you through various questions to tailor your cluster and VPC to your use case and ensure that necessary dependencies, such as SSH key availability, are met.

For the purposes of this blog post, you may proceed with default selections for most prompts. If you prefer to use a VPC automatically generated by ParallelCluster, respond “yes” when asked about VPC creation. Otherwise, tools like AWS CDK can assist in deploying the networking components.

After completing the ParallelCluster configuration, a VPC with standard subnets and security groups will be established in your account, along with a basic configuration file in your working directory. You can then customize the cluster.yaml file to meet the specific needs of your application.

The following template is suitable for running quantum chemistry calculations. Simply replace the networking-related placeholders (in angle brackets) with values from your auto-generated cluster.yaml file, and the S3-related placeholders with references to the bucket created earlier:

Image:
  Os: rhel8
HeadNode:
  InstanceType: c5.large
  LocalStorage:
    RootVolume:
      Size: 200
      VolumeType: gp3
      Encrypted: true
  Networking:
    SubnetId: <head-node-subnet>
  Ssh:
    KeyName: <key-name>
  Dcv:
    Enabled: false
  Imds:
    Secured: true
  Iam:
    S3Access:
      - BucketName: <your-config-bucket>
        KeyName: <your-key-prefix>/*
        EnableWriteAccess: false
  CustomActions:
    OnNodeConfigured:
      Script: s3://<your-config-bucket>/<your-key-prefix>/<script-name>
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: compute
      CapacityType: ONDEMAND
      ComputeSettings:
        LocalStorage:
          RootVolume:
            Size: 50
            VolumeType: gp3
      ComputeResources:
        - Name: c6i32xl
          InstanceType: c6i.32xlarge
          MinCount: 0
          MaxCount: 16
          DisableSimultaneousMultithreading: true
          Efa:
            Enabled: true
      Networking:
        SubnetIds:
          - <compute-node-subnet>
        PlacementGroup:
          Enabled: true
    - Name: highmem
      CapacityType: ONDEMAND
      ComputeSettings:
        LocalStorage:
          RootVolume:
            Size: 50
            VolumeType: gp3
      ComputeResources:
        - Name: i3en24xl
          InstanceType: i3en.24xlarge
          MinCount: 0
          MaxCount: 16
          DisableSimultaneousMultithreading: true
          Efa:
            Enabled: true
      Networking:
        SubnetIds:
          - <compute-node-subnet>
        PlacementGroup:
          Enabled: true
SharedStorage:
  - Name: fsx
    StorageType: FsxLustre
    MountDir: /fsx
    FsxLustreSettings:
      StorageCapacity: 1200

The cluster defines two Slurm queues for different workloads: compute, backed by compute-optimized c6i.32xlarge instances, and highmem, backed by i3en.24xlarge instances for memory- and I/O-intensive jobs. Both queues enable the Elastic Fabric Adapter (EFA) and cluster placement groups for low-latency MPI communication, and all nodes share a 1.2 TB Amazon FSx for Lustre file system mounted at /fsx.
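Once the template has been saved as cluster.yaml, the cluster can be created and accessed with the ParallelCluster CLI; the cluster name below is illustrative:

```shell
# Create the cluster from the customized configuration file
pcluster create-cluster --cluster-name qe-cluster --cluster-configuration cluster.yaml

# Check progress; wait until clusterStatus reports CREATE_COMPLETE
pcluster describe-cluster --cluster-name qe-cluster

# SSH into the head node using the key pair selected during configuration
pcluster ssh --cluster-name qe-cluster -i ~/.ssh/<key-name>.pem
```

Cluster creation typically takes several minutes; the compute queues start with zero nodes (MinCount: 0) and scale out only when jobs are submitted.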

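With the cluster running and Quantum ESPRESSO installed (for example, by the post-install script referenced in CustomActions), a calculation can be submitted through Slurm. The following batch script is a sketch: the install path, working directory, and input file name are illustrative, and the rank count assumes the 64 physical cores of a c6i.32xlarge with simultaneous multithreading disabled:

```shell
#!/bin/bash
#SBATCH --job-name=qe-scf
#SBATCH --partition=compute          # the c6i.32xlarge queue from the template
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=64         # one MPI rank per physical core
#SBATCH --output=%x-%j.out
#SBATCH --chdir=/fsx/qe-runs         # run from the shared Lustre file system

# pw.x is the plane-wave SCF code shipped with Quantum ESPRESSO;
# the binary path and input file are placeholders for your installation
srun /fsx/apps/qe/bin/pw.x -in scf.in > scf.out
```

Submit with `sbatch` from the head node; Slurm will provision the requested compute nodes on demand and release them when the job completes.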


