A Comprehensive Examination of Tailored Accelerators for Financial Services

Data is a powerful asset, capable of predicting future behaviors, from customer purchasing trends to securities performance. Organizations are constantly vying for an edge by leveraging the data they possess, combining it with their industry insight, and turning it into actionable strategy. The financial services industry (FSI) is a prime example, being both a significant generator and consumer of data and analytics. Every industry has its own characteristics and ways of operating, and FSI is particularly shaped by regulatory frameworks and by competitive pressures that often resemble a zero-sum game. This article is aimed primarily at FSI business leaders, including chief data officers, chief analytics officers, chief investment officers, heads of quant, heads of research, and heads of risk, who are responsible for strategic decisions about infrastructure investment, product development, and competitive strategy. The goal is to provide clarity and insight in a fast-evolving landscape, helping readers identify competitive differentiators and shape a relevant business strategy.

Accelerated computing broadly refers to the use of specialized hardware known as purpose-built accelerators (PBAs). In financial services, nearly every operational area, from quantitative research to fraud detection and real-time trading, can benefit from reduced processing times. Faster calculations can mean more accurate solutions, better customer experiences, or a competitive informational advantage. These applications span basic data processing, analytics, and machine learning (ML). Certain workloads, especially those at the cutting edge of artificial intelligence (AI), are practically infeasible without hardware acceleration. ML typically involves a two-step process: learning followed by inference. Learning is usually conducted offline against extensive historical data, while inference runs online against smaller batches of streaming data. Learning recognizes and captures historical patterns, whereas inference maps current values onto those established patterns. PBAs, such as graphics processing units (GPUs), play critical roles in both stages: a large cluster of GPUs is typically used for the learning phase, with a smaller subset serving inference. The differing computational demands of learning and inference have led some hardware manufacturers to build separate solutions for each stage, while others offer integrated solutions.
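To make the learning and inference stages concrete, here is a minimal sketch in Python using PyTorch; it is not drawn from the article, and the data, network size, and hyperparameters are illustrative assumptions. A small network is fitted offline to synthetic "historical" data, then scores a small "streaming" batch online, using a GPU when one is available.

```python
# Minimal sketch (illustrative only): offline learning on historical data,
# then online inference on a small batch, using a GPU when available.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Offline learning: fit a small network to synthetic "historical" data.
torch.manual_seed(0)
X_hist = torch.randn(10_000, 16)                         # 10k assumed historical observations, 16 features
y_hist = (X_hist.sum(dim=1, keepdim=True) > 0).float()   # synthetic labels for the example

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1)).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(5):                                   # a handful of passes over the history
    optimizer.zero_grad()
    loss = loss_fn(model(X_hist.to(device)), y_hist.to(device))
    loss.backward()
    optimizer.step()

# Online inference: score a small "streaming" batch against the learned patterns.
model.eval()
with torch.no_grad():
    X_live = torch.randn(8, 16).to(device)               # a few new observations arriving online
    scores = torch.sigmoid(model(X_live))
print(scores.squeeze().tolist())
```

In a production setting the offline step would run on a large GPU cluster over far more data, while the online step would be served from a smaller, latency-optimized fleet, mirroring the split described above.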

The article begins with an overview of hardware-accelerated computing, followed by a discussion of the core technologies in this domain. It then explores the significance of accelerated computing for data processing and highlights four key use cases within FSI. Key challenges are identified alongside potential solutions. The piece concludes with three essential takeaways and actionable next steps for leaders in the field.

Understanding Accelerated Computing

Central processing units (CPUs) are designed primarily for processing small amounts of sequential data, while PBAs excel at handling large volumes of parallel data. PBAs can perform certain tasks, such as specific floating-point calculations, far more efficiently than software running on CPUs. This efficiency translates into advantages like reduced latency, higher throughput, and lower energy usage. PBAs fall into three categories: easily reprogrammable chips such as GPUs; semi-fixed field-programmable gate arrays (FPGAs), which can be reprogrammed but not conveniently; and fully fixed application-specific integrated circuits (ASICs), which are custom-designed for a single application and cannot be reprogrammed. Fixed or semi-fixed acceleration is attractive when the data processing logic rarely needs to change. As a general rule, the harder an accelerator is to reprogram, the faster it runs: ranked roughly from fastest to slowest, the options are programming the hardware directly, calling PBA APIs, coding in unmanaged languages like C++, and using managed languages like Python. A recent analysis by Zeta-Alpha of publications involving accelerated compute workloads revealed a breakdown of 91.5% for GPU PBAs, 4% for other PBAs, 4% for FPGAs, and 0.5% for ASICs. This article focuses on the easily reprogrammable PBAs.
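The CPU-versus-PBA throughput gap is easiest to see on a highly parallel floating-point workload such as a dense matrix multiply. The sketch below is illustrative only: it assumes the optional CuPy library and a CUDA-capable GPU, and the matrix size and any resulting timings are examples rather than benchmarks.

```python
# Minimal sketch of the CPU-vs-PBA throughput gap on a data-parallel
# floating-point workload (dense matrix multiply). Illustrative only.
import time
import numpy as np

n = 4096
a_cpu = np.random.rand(n, n).astype(np.float32)
b_cpu = np.random.rand(n, n).astype(np.float32)

t0 = time.perf_counter()
c_cpu = a_cpu @ b_cpu                      # CPU path: a handful of cores
print(f"CPU matmul: {time.perf_counter() - t0:.2f}s")

try:
    import cupy as cp                      # GPU path: thousands of cores in parallel
    a_gpu, b_gpu = cp.asarray(a_cpu), cp.asarray(b_cpu)
    cp.cuda.Stream.null.synchronize()      # make sure the transfers have finished
    t0 = time.perf_counter()
    c_gpu = a_gpu @ b_gpu
    cp.cuda.Stream.null.synchronize()      # wait for the asynchronous GPU work to finish
    print(f"GPU matmul: {time.perf_counter() - t0:.2f}s")
except ImportError:
    print("CuPy not installed; GPU comparison skipped.")
```

The same pattern, moving a data-parallel kernel from a managed CPU library to a PBA-backed one, underlies much of the speedup reported for analytics and ML workloads.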

The evolution of PBAs traces back to 1999 when NVIDIA launched its first product explicitly marketed as a GPU, aimed at enhancing computer graphics and image processing. By 2007, GPUs had evolved into more generalized computing devices, finding applications across scientific and industrial sectors. The emergence of diverse PBAs occurred around 2018, and by 2020, they were widely adopted for parallel problem-solving, such as neural network training. Other available PBAs now include AWS Inferentia, AWS Trainium, Google TPU, and Graphcore IPU. During this period, NVIDIA shifted its strategy from a focus on gaming and graphics to encompass scientific computing and data analytics.

The convergence of advancements in hardware and ML has brought us to today. The 2012 work of Hinton et al. is often credited with setting off ML’s “Cambrian Explosion.” Neural networks had existed since the 1960s without notable success; what changed was the combination of three crucial developments: networks with more layers, a significant rise in the volume of labeled training data, and GPUs capable of processing that data. These factors catalyzed a period of immense progress in ML, and neural networks were rebranded as deep learning. The landmark paper “Attention is All You Need,” published in 2017, introduced a new deep learning architecture based on transformers, which required substantial quantities of PBAs to train on internet-scale data. The release of ChatGPT in November 2022, a large language model built on the transformer architecture, is widely recognized as the starting point of the current generative AI surge.

Technology Overview

This section delves into the various components of accelerated computing technology.

Parallel Computing

Parallel computing involves executing multiple computations simultaneously, and it can be categorized by the granularity of parallelism the hardware supports: a grid of connected instances, multiple processors within a single instance, multiple cores within a single processor, one or more PBAs, or a combination of these approaches. In every case the idea is the same: the problem is divided into independent parts so that each processing element can work on its own segment at the same time, as in the sketch that follows.
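As a small illustration (a sketch, not taken from the article), the following Python snippet divides an embarrassingly parallel, FSI-flavored problem, Monte Carlo pricing of a European call option, into independent chunks and processes them concurrently across CPU cores with the standard-library multiprocessing module. The market parameters, path counts, and chunking are assumptions made for the example.

```python
# Minimal sketch: split a Monte Carlo option-pricing job into independent
# chunks and run them concurrently. All parameters are illustrative.
import math
from multiprocessing import Pool

import numpy as np

S0, K, r, sigma, T = 100.0, 105.0, 0.03, 0.2, 1.0   # assumed market inputs


def price_chunk(args):
    """Price one independent chunk of simulated terminal prices."""
    n_paths, seed = args
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)
    s_t = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * math.sqrt(T) * z)
    payoff = np.maximum(s_t - K, 0.0)
    return math.exp(-r * T) * payoff.mean()


if __name__ == "__main__":
    chunks = [(250_000, seed) for seed in range(8)]  # 8 independent, equal-sized parts
    with Pool() as pool:                             # one worker per available core
        estimates = pool.map(price_chunk, chunks)    # each part solved concurrently
    print(f"European call estimate: {sum(estimates) / len(estimates):.3f}")
```

The same decomposition carries over to the other granularities mentioned above: the independent chunks could just as easily be spread across a grid of instances or across the thousands of cores on a PBA.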

Summary

In summary, understanding the role of purpose-built accelerators in financial services is crucial for business leaders looking to maintain a competitive edge. With the rapid evolution of technology and the increasing volume of data, leveraging accelerated computing can significantly enhance operational efficiency and strategic decision-making.

