Deploying SAS Grid with Amazon FSx for Lustre on AWS

SAS® is recognized for powerful data science and analytics solutions that serve both enterprises and government entities. SAS Grid is a robust analytics platform designed for high availability and fast processing, with centralized management that distributes workloads efficiently across compute nodes. The application suite covers data management, visual analytics, governance and security, forecasting, text mining, statistical analysis, and environment management. SAS and AWS recently tested a standard SAS Grid Manager workload on AWS using the Amazon FSx for Lustre shared file system. For the full findings, see the whitepaper “Accelerating SAS Using High-Performing File Systems on Amazon Web Services.”

In this article, we describe how to deploy the foundational AWS infrastructure needed to run SAS Grid with FSx for Lustre, an approach that also applies to other applications with significant I/O demands.

System Design Overview

Running high-performance workloads that depend heavily on throughput and are sensitive to network latency requires strategies beyond those used for typical applications. AWS generally recommends spanning multiple Availability Zones for high availability; for latency-sensitive applications, however, high-throughput traffic should remain local to a single Availability Zone to optimize performance. To enhance throughput, you should (an example follows the list):

  • Operate within a virtual private cloud (VPC) using instance types that support enhanced networking
  • Deploy instances within the same Availability Zone
  • Utilize placement groups for instances
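
As a sketch of these recommendations, the following AWS CLI commands create a cluster placement group and launch an enhanced-networking instance into it; the group name, AMI ID, key pair, and subnet ID are placeholders we chose for illustration. Launching all instances into the same subnet keeps them in the same Availability Zone.

$ aws ec2 create-placement-group --group-name sas-grid-pg --strategy cluster
$ aws ec2 run-instances --image-id ami-0123456789abcdef0 \
    --instance-type m5n.8xlarge --count 1 \
    --key-name my-key-pair --subnet-id subnet-0123456789abcdef0 \
    --placement GroupName=sas-grid-pg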

The diagram below illustrates the architecture of SAS Grid integrated with FSx for Lustre on AWS.

The SAS Grid architecture comprises mid-tier nodes, metadata servers, and Grid compute nodes. Mid-tier nodes handle the execution of Platform Web Services (PWS) and Load Sharing Facility (LSF) components, which manage job submissions and their status updates. To effectively run PWS and LSF on mid-tier nodes, high-memory Amazon Elastic Compute Cloud (Amazon EC2) instances are required. For this application, the r5 instance family is an ideal choice.

Metadata servers maintain the repository for metadata definitions of all SAS Grid Manager products and can also be served effectively by the r5 instance family. We suggest meeting or exceeding the recommended memory requirement of 24 GB of RAM or 8 GB per physical core, whichever is larger. Because metadata servers do not demand compute-intensive resources or high I/O bandwidth, the r5 instance family strikes a good balance between cost and performance.
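
To make the metadata sizing rule concrete (using an instance size we picked purely for illustration): an r5.2xlarge has 4 physical cores and 64 GiB of RAM, so the per-core rule calls for 4 × 8 GB = 32 GB, which exceeds the 24 GB floor; the instance's 64 GiB meets that requirement comfortably.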

SAS Grid nodes execute the jobs routed to the grid, and the appropriate EC2 instances depend on the nature, complexity, and volume of the tasks being processed. To satisfy the minimum requirements for SAS Grid workloads, we recommend at least 8 GB of physical RAM per core along with I/O throughput of 100–125 MB/second per physical core. The m5n and r5n EC2 instance families meet both the RAM and throughput criteria. The SASDATA, SASWORK, and UTILLOC libraries can be hosted on a shared file system. If you opt to use instance storage for SASWORK, the i3en instance family offers over 1.2 TB of instance storage, fulfilling this requirement. In the next section, we discuss the throughput testing performed to arrive at these EC2 instance recommendations alongside FSx for Lustre.
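
As a quick check of these minimums (the instance size is our example, not a prescription): an m5n.8xlarge provides 16 physical cores (32 vCPUs) and 128 GiB of RAM, meeting the 8 GB per core target, and implies a required I/O throughput of roughly 16 × 100–125 MB/second, or about 1.6–2.0 GB/second, that the shared file system and network path must sustain.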

Steps to Maximize Storage I/O Performance

SAS Grid requires a shared file system, and we wanted to benchmark the performance of FSx for Lustre against the EC2 instance families that meet the minimum specifications of 8 GB physical RAM per core and 100–125 MB/second throughput per physical core. FSx for Lustre is a fully managed file storage service tailored for applications that require fast storage. Because it is POSIX-compliant, FSx for Lustre works with existing Linux-based applications without modification. FSx for Lustre offers both scratch and persistent file system types; for SAS Grid we advise a persistent file system, because the SASWORK, SASDATA, and UTILLOC data and libraries need long-lived storage with high availability and data durability. Persistent file systems provision throughput per unit of storage, so to achieve the desired aggregate I/O throughput, select a storage capacity and per-unit throughput level that together meet your target; a sketch of the creation command follows.
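
The following AWS CLI command is one way to provision a persistent file system sized like the one used in our testing (the subnet and security group IDs are placeholders): 100,800 GiB at 200 MB/second per TiB of provisioned throughput works out to roughly the 19.688 GB/second aggregate referenced below.

$ aws fsx create-file-system --file-system-type LUSTRE \
    --storage-capacity 100800 \
    --subnet-ids subnet-0123456789abcdef0 \
    --security-group-ids sg-0123456789abcdef0 \
    --lustre-configuration DeploymentType=PERSISTENT_1,PerUnitStorageThroughput=200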

Once the file system is created, we recommend mounting FSx for Lustre with the flock mount option. Below is an example mount command, followed by the resulting entry reported by mount:

$ sudo mount -t lustre -o noatime,flock fs-0123456789abcd.fsx.us-west-2.amazonaws.com@tcp:/za3atbmv /fsx
$ mount -t lustre
172.31.41.37@tcp:/za3atbmv on /fsx type lustre (rw,noatime,seclabel,flock,lazystatfs)
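
To make the mount persist across reboots, an equivalent /etc/fstab entry can be used (a sketch, reusing the example file system DNS name and mount name above; the _netdev option defers mounting until networking is up):

fs-0123456789abcd.fsx.us-west-2.amazonaws.com@tcp:/za3atbmv /fsx lustre noatime,flock,_netdev 0 0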

Throughput Testing and Results

To identify the optimal EC2 instances for running SAS Grid with FSx for Lustre, we executed a series of highly parallel network throughput tests from individual EC2 instances against a 100.8 TiB persistent file system with an aggregate throughput capacity of 19.688 GB/second. The tests were conducted in four AWS Regions using various EC2 instance families (c5, c5n, i3, i3en, m5, m5a, m5ad, m5n, m5dn, r5, r5a, r5ad, r5n, and r5dn). Each test lasted three hours, with the DataWriteBytes metric recorded every minute; only one instance accessed the file system at a time, and we report the p99.9 results. Performance was consistent across all four Regions.
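
For reference, per-minute DataWriteBytes samples can be retrieved from Amazon CloudWatch with a command along these lines (the file system ID and time window are placeholders):

$ aws cloudwatch get-metric-statistics --namespace AWS/FSx \
    --metric-name DataWriteBytes \
    --dimensions Name=FileSystemId,Value=fs-0123456789abcd \
    --start-time 2020-05-01T00:00:00Z --end-time 2020-05-01T03:00:00Z \
    --period 60 --statistics Sum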

Our findings indicated that the i3en, m5n, m5dn, r5n, and r5dn EC2 instance families meet or exceed the minimum recommendations for network performance and memory. For detailed performance results, refer to the whitepaper “Accelerating SAS Using High-Performing File Systems on Amazon Web Services.” The i3 instance family was slightly below the minimum network performance threshold. If you plan to utilize instance storage for SASWORK and UTILLOC libraries, consider the i3en instances.

The m5n and r5n families offer an excellent balance of cost and performance, and we recommend the m5n instance family for SAS Grid nodes. If your workload is heavily memory-bound, however, consider r5n instances, which provide more memory per physical core at a higher price point than m5n instances.

We also executed the rhel_iotest.sh script, available from the SAS technical support samples tool repository (SASTSST), against the same FSx for Lustre configuration. The table below summarizes read and write performance per physical core for various instance sizes in the m5n and r5n families.

Variable network performance peak per physical core:

Instance Type    Read (MB/second)    Write (MB/second)
m5n.large        850.20              357.07
m5n.xlarge       519.46              386.25
m5n.2xlarge      283.01              446.84
m5n.4xlarge      202.89              376.57
m5n.8xlarge      154.98              297.71
r5n.large        906.88              429.93
r5n.xlarge       488.36              455.76
r5n.2xlarge      256.96              471.65

In conclusion, optimizing your SAS Grid deployment on AWS through effective architecture, instance selection, and performance testing can significantly enhance your analytics capabilities.

