KEITH MANTHEY | Field CTO
Reflecting on this year’s Supercomputing conference (SC24) in Atlanta, we are reminded of how broad the “High-Performance Computing” (HPC) category is. HPC spans applications from academic research and state-change analysis to fluid dynamics in the enterprise and algorithmic trading at financial firms. Despite this diversity, one common thread remains: the relentless pursuit of scale and performance.
HPC’s roots lie in what’s often called “embarrassingly parallel” computing, where tasks are broken down into independent, parallel processes. While much has been written about the concept of parallel computing, little attention has been given to the impact of embarrassingly parallel workloads at a massive scale. This oversight is significant, as scale—combined with the ability to parallelize tasks—is the very foundation of HPC’s power.
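To make “embarrassingly parallel” concrete, here is a minimal sketch (illustrative only, not drawn from any particular HPC code) in which each chunk of work is completely independent, so throughput scales almost linearly with the number of workers:

```python
# Minimal sketch of an embarrassingly parallel workload: each chunk is
# independent, so adding workers scales throughput almost linearly.
from multiprocessing import Pool

def simulate_chunk(chunk_id: int) -> float:
    # Stand-in for one independent unit of work (e.g., a Monte Carlo batch).
    total = 0.0
    for i in range(100_000):
        total += ((chunk_id * 100_000 + i) % 97) / 97.0
    return total

if __name__ == "__main__":
    with Pool(processes=8) as pool:
        # No coordination is needed between chunks until the final reduction.
        results = pool.map(simulate_chunk, range(64))
    print(sum(results))
```

On a single node the worker pool is a handful of processes; on a cluster, the same pattern is expressed as thousands of independent jobs or ranks, which is where scale becomes the story.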
The Evolution of Scale in HPC
HPC is fundamentally about decomposing a problem into its smallest independent units of work and federating those tasks at scale. Scale, combined with the ability to parallelize, is the essence of how HPC delivers its performance. However, scaling isn’t just about adding more compute nodes; it’s about designing architectures that can handle the demands of parallelization, resource distribution, and efficiency.
The key elements of scale and their impact on HPC architecture include:
- Resource Distribution: Ensuring seamless orchestration of compute, memory, and storage resources (a minimal sketch of this kind of work distribution follows this list).
- Parallel File Access: Managing billions of files and the accompanying file locks at speeds that keep pace with exascale computing.
- Efficiency: Optimizing GPU utilization and storage architectures to eliminate bottlenecks.
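As a rough illustration of the resource-distribution point above, the following mpi4py sketch (assuming an MPI runtime is available; the chunking scheme is illustrative) scatters task IDs across ranks and reduces the partial results back to rank 0:

```python
# Minimal mpi4py sketch of distributing work across ranks and reducing results.
# Launch with an MPI runtime, e.g.: mpirun -n 4 python distribute.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    # Rank 0 splits the problem into one chunk of task IDs per rank.
    tasks = list(range(1_000))
    chunks = [tasks[i::size] for i in range(size)]
else:
    chunks = None

my_tasks = comm.scatter(chunks, root=0)

# Each rank processes its chunk independently.
partial = sum(t * t for t in my_tasks)

# Partial results are reduced back to rank 0.
total = comm.reduce(partial, op=MPI.SUM, root=0)
if rank == 0:
    print("sum of squares:", total)
```

In production, a scheduler such as Slurm or PBS decides where those ranks land; the architectural principle of decomposing and distributing the work is the same.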
GPUs: A Game Changer and a Challenge
The introduction of GPUs has been a game-changer for HPC. Their ability to distribute work in parallel has significantly accelerated computation and opened up new possibilities for the field. But with this power comes complexity.
Before GPUs became a cornerstone of HPC, CPUs reigned supreme, and their limitations defined the boundaries of what HPC systems could achieve. While powerful for general-purpose computation, CPUs were constrained by core counts, clock speeds, and available memory. These constraints posed significant challenges for large-scale computations, such as data sorting and merging, which demanded more memory than systems could hold directly.
To overcome this, HPC systems relied on “scratch space.” Scratch storage acted as a temporary, high-speed buffer where data could be swapped between memory and disk without crashing the system. Initially, its sole purpose was to facilitate these memory swaps during computation-heavy tasks. Over time, scratch storage evolved into a vital component of HPC, serving as a repository for users’ job files that required fast and frequent access.
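A minimal sketch of the kind of memory-to-disk swapping scratch space was built for is an external merge sort: sorted runs are spilled to a scratch directory and streamed back for the merge. The `SCRATCH` environment variable and run size below are illustrative assumptions, not a fixed convention.

```python
# Minimal sketch of an out-of-core (external) merge sort: data too large for
# memory is spilled as sorted runs to scratch, then stream-merged back.
import heapq
import os
import tempfile

def _spill(sorted_run, scratch_dir):
    # Write one sorted run to a temporary file on scratch; return its path.
    fd, path = tempfile.mkstemp(dir=scratch_dir, suffix=".run")
    with os.fdopen(fd, "w") as f:
        for value in sorted_run:
            f.write(f"{value}\n")
    return path

def _stream(path):
    # Replay a spilled run one record at a time.
    with open(path) as f:
        for line in f:
            yield int(line)

def external_sort(values, run_size=1_000_000, scratch_dir=None):
    # `values` may be a generator far larger than memory; only `run_size`
    # records are held in memory at any point.
    scratch_dir = scratch_dir or os.environ.get("SCRATCH", tempfile.gettempdir())
    run_paths, run = [], []
    for value in values:
        run.append(value)
        if len(run) >= run_size:
            run_paths.append(_spill(sorted(run), scratch_dir))
            run = []
    if run:
        run_paths.append(_spill(sorted(run), scratch_dir))
    # Merge the sorted runs in one streaming pass (one record per run in memory).
    yield from heapq.merge(*(_stream(p) for p in run_paths))
```

On a cluster, `scratch_dir` would point at the parallel file system’s scratch tier rather than a node-local temp directory, which is exactly the fast, frequently accessed role described above.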
Today, GPUs have all but eliminated the need for disk swapping in HPC, moving data processing onto the GPU itself. However, scratch storage remains essential—albeit now focused on enabling parallel file access at lightning-fast speeds.
While GPUs have revolutionized HPC, they also come with significant challenges. GPUs are expensive and complex to manage, particularly when fractionalizing workloads and sharing GPU resources across multiple jobs. Many clusters are designed around specific GPUs to save costs, but this can lead to inefficiencies if the surrounding architecture is not optimized for the hardware.
A study by NERSC revealed that 50% of GPU jobs used less than 25% of GPU memory, highlighting a significant underutilization problem. With careful cluster design and workload optimization, GPU utilization can exceed 75%, dramatically improving efficiency. Are your GPUs utilized fully, or are inefficiencies limiting your ROI?
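One way to answer that question is to sample utilization directly. The sketch below uses NVIDIA’s NVML bindings (the nvidia-ml-py package, imported as pynvml) and assumes NVIDIA GPUs with drivers installed; comparable counters are available through DCGM on managed clusters.

```python
# Minimal sketch: sample per-GPU memory and compute utilization via NVML.
# Requires the nvidia-ml-py package and an NVIDIA driver on the node.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        print(
            f"GPU {i}: memory {mem.used / mem.total:.0%} used, "
            f"compute busy {util.gpu}% over the last sample window"
        )
finally:
    pynvml.nvmlShutdown()
```

Sampling like this over the life of a job is usually the first step toward the kind of cluster design and workload optimization that pushes utilization past 75%.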
Storage: The Backbone of Exascale Computing
The shift to exascale computing has introduced new challenges for storage systems. Systems built around modern GPUs such as NVIDIA’s H100 can drive millions of parallel file opens, creating a monumental scale problem for storage architectures.
From its earliest days, scratch storage has struggled with file locking and how it scales. The Message Passing Interface (MPI) effort began in the early 1990s, and its MPI-IO extensions were created to coordinate shared file access; both remain relevant today. Modern solutions tackle file locking in innovative ways:
- Memory-Based Offsets: Using scale-up memory to offset the memory impact of lock pointers.
- No Locks on Read: Dell PowerScale and others take a file lock only when a write is attempted, bypassing locks on reads.
- Client-Side Lock Distribution: Technologies from VAST Data and Weka use client-side drivers to distribute lock file memory, enabling efficient file access management at scale.
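For context on the MPI point above, here is a minimal mpi4py sketch of MPI-IO shared-file access: every rank writes a fixed-size record at its own offset of one shared file, so no byte-range locks are contended. The file name and record size are illustrative.

```python
# Minimal mpi4py sketch of MPI-IO: each rank writes a fixed-size record at its
# own offset in one shared file, sidestepping contended byte-range locks.
# Launch with an MPI runtime, e.g.: mpirun -n 8 python shared_write.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

RECORD_SIZE = 64  # bytes per rank; fixed so offsets never overlap
record = f"rank {rank:06d} result".ljust(RECORD_SIZE).encode()

fh = MPI.File.Open(comm, "results.dat", MPI.MODE_WRONLY | MPI.MODE_CREATE)
# Collective write: every rank lands at its own, non-overlapping offset.
fh.Write_at_all(rank * RECORD_SIZE, record)
fh.Close()
```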
Each approach affects how storage architectures are designed and highlights the need for thoughtful, scalable solutions. How much conscious design went into your HPC storage architecture?
A Promising Future
Thanks to technological advances, the future of HPC for exascale computing has never been more promising. However, to fully realize this potential, we must address often-overlooked factors such as GPU utilization and storage scalability.
At BlueAlly, we’re dedicated to helping organizations design HPC systems that unlock their full potential. Whether you’re tackling inefficiencies in GPU use or building scalable storage architectures, we’re here to guide you through the complexities of HPC and help you maximize your investment. The future of HPC is bright, and with the right design choices, the possibilities are endless.