
As enterprise infrastructure races to keep pace with modern applications, it has become abundantly clear that storage (and more specifically, storage connectivity) can no longer remain an afterthought. 

AI and machine learning (ML) workloads are exploding in size. Real-time analytics pipelines demand unprecedented throughput. High-performance databases expect millisecond-class responsiveness. Cloud-native platforms continue scaling into hundreds or thousands of microservices, each generating or consuming data continuously. The sheer pressure on the datacenter I/O path has never been higher. 

And yet, many organizations still rely on storage protocols designed more than a decade ago. Protocols like iSCSI and traditional Fibre Channel have served enterprises well, but they were never designed for the massively parallel, ultra-low-latency, data-hungry landscape we operate in today. In an era where a single GPU server can consume multiple terabytes per second of internal bandwidth, traditional storage networking becomes a bottleneck far too quickly. 

That’s why NVMe-over-Fabrics (NVMe-oF) is capturing so much attention. It brings the performance of local NVMe drives into the world of networked storage, without the latency, overhead, and architectural limitations of older protocols. It lets storage operate at the same speed as modern compute. And it lays the foundation for the scalable, disaggregated datacenter architectures that AI, cloud, and edge environments increasingly require. 

What Exactly Is NVMe-oF?

To understand NVMe-oF, we need to look at NVMe itself. 

NVMe (Non-Volatile Memory Express) was created to unlock the performance potential of NAND flash and modern SSDs by using the PCIe interface. Unlike older SATA and SAS protocols, NVMe supports massive parallelism: up to 65,535 I/O queues, each capable of holding tens of thousands of outstanding commands. It is fast, efficient, and built to take advantage of modern multi-core CPUs and high-performance storage media. 

But there has always been a limitation: NVMe works locally. Traditional NVMe drives sit directly on a server’s PCIe bus, meaning only that server can use them. If other servers need high-performance shared storage, organizations turn to network-based protocols like iSCSI or Fibre Channel, which introduce overhead and dilute NVMe’s capability. 

NVMe-oF solves that problem. It extends NVMe across the network, allowing storage devices located anywhere on a high-speed fabric to deliver performance that closely resembles local NVMe. It takes NVMe’s efficient command set and parallel queue architecture and maps it onto different network transports, including: 

  • NVMe-TCP – Runs over standard TCP/IP and Ethernet and is easy to deploy. 
  • NVMe-RDMA (RoCE/iWARP) – Provides ultra-low latency by bypassing kernel overhead. 
  • NVMe-FC – Extends existing Fibre Channel environments with NVMe semantics. 

The result is shared storage that performs like direct-connected NVMe.
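To make that concrete, here is a minimal sketch of how a host attaches a remote namespace over NVMe-TCP using the standard nvme-cli tool, wrapped in Python for illustration. The target address, port, and subsystem NQN are placeholders for the values your storage environment would supply.

```python
# Minimal sketch: attaching a remote NVMe-oF namespace over TCP with nvme-cli.
# The address, port, and subsystem NQN below are placeholders.
import subprocess

TARGET_ADDR = "192.0.2.10"                              # placeholder target address
TARGET_PORT = "4420"                                    # conventional NVMe/TCP port
SUBSYS_NQN = "nqn.2014-08.org.example:shared-pool-01"   # placeholder subsystem NQN

# Establish the fabric connection; the kernel then surfaces the remote
# namespaces as ordinary /dev/nvmeXnY block devices.
subprocess.run(
    ["nvme", "connect", "-t", "tcp",
     "-n", SUBSYS_NQN, "-a", TARGET_ADDR, "-s", TARGET_PORT],
    check=True,
)

# List the NVMe devices now visible to the host, local and fabric-attached alike.
subprocess.run(["nvme", "list"], check=True)
```

From the application's point of view, the fabric-attached namespace behaves like any other NVMe block device.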

 

Why NVMe-oF Matters: The Case for Modernizing Storage Fabrics 

NVMe-oF isn’t just a new protocol. It represents a generational shift in how datacenters design, scale, and operate storage. 

It Delivers Microsecond-Class Latency Across the Network 

Traditional network storage adds significant latency. iSCSI overhead often pushes I/O operations into the millisecond range. Fibre Channel tends to fare better but still adds substantial control-plane overhead and serialization delays. 

NVMe-oF eliminates most of that overhead because: 

  • It uses the NVMe command set, not SCSI emulation 
  • It supports multiple parallel queues, just like local NVMe 
  • RDMA-based fabrics bypass the kernel entirely, lowering CPU usage and latency 

It Enables True Scalability for Modern Applications 

Traditional storage architectures force you to scale compute and storage together. NVMe-oF decouples them, enabling: 

  • Shared NVMe pools accessible to any server
  • Dynamic scaling of storage independent of compute nodes 
  • Composable infrastructure, where resources are allocated on demand 
  • Better utilization of expensive NVMe SSDs 

Instead of over-provisioning NVMe drives in each server—or worse, leaving unused capacity stranded—NVMe-oF allows you to right-size your storage footprint without sacrificing performance. 

It Fully Utilizes High-Performance NVMe SSDs 

Local NVMe is extremely fast, but that speed is often underutilized. A single server runs only a limited set of workloads, so its internal NVMe bandwidth and capacity often sit idle. 

NVMe-oF unlocks that trapped performance by allowing centralized NVMe arrays or disaggregated storage nodes to share bandwidth across multiple compute servers. This increases ROI and improves overall infrastructure efficiency. 

It Works on Modern Ethernet Networks 

This is one of the most practical advantages. While RDMA transports require specific NICs and configurations, NVMe-TCP works on any modern Ethernet network, without requiring specialized hardware or significant network redesign. 

This makes adoption significantly easier and reduces capital expenditure, operational complexity, and vendor lock-in. 

It Aligns Perfectly With AI, GPU, and Real-Time Workloads 

AI and GPU-accelerated workloads are especially sensitive to storage bottlenecks. GPUs must be kept fed with massive streams of data, and any delay slows training, increases costs, and wastes silicon. 

NVMe-oF makes it possible to build high-throughput, low-latency storage backends that keep GPUs constantly busy, dramatically increasing the efficiency of AI clusters.  

 

How NVMe-oF Works 

At its core, NVMe was built to extract full performance from modern SSDs by replacing the bottlenecks of older SCSI-based protocols with a highly parallel design. The key components of that design are: 

  • Submission queues (SQs) – Where hosts place commands to be executed
  • Completion queues (CQs) – Where devices return the results of those commands
  • Massive parallelism – NVMe supports up to 65,535 queues, each with 65,535 outstanding commands, allowing enormous concurrency 

On a local NVMe drive, this queue structure maps directly to CPU cores. Each core can have its own dedicated queue pair, eliminating lock contention and enabling the drive to process I/O at extremely high parallel rates. 

In contrast, legacy protocols like iSCSI and traditional Fibre Channel rely on SCSI’s far more serialized command-processing model, which typically funnels I/O through a single queue or limited queue set. This means multiple I/O operations have to compete for the same limited resources. It also increases latency and restricts the parallelism modern SSDs are capable of. 
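The difference is easiest to see as a data structure. The toy Python model below is purely illustrative (not driver code): it sketches the per-core queue-pair idea, where each core owns its own submission and completion queue and never waits on another core's lock.

```python
# Conceptual sketch of NVMe's per-core queue pairs. Each CPU core gets its own
# submission/completion queue pair, so cores issue and reap I/O independently,
# unlike the shared, serialized command queue typical of SCSI-era protocols.
from collections import deque
from dataclasses import dataclass, field

@dataclass
class QueuePair:
    core_id: int
    submission: deque = field(default_factory=deque)   # host -> device commands
    completion: deque = field(default_factory=deque)   # device -> host results

    def submit(self, command: str) -> None:
        self.submission.append(command)

    def complete_all(self) -> None:
        # The "device" drains the submission queue and posts results to the CQ.
        while self.submission:
            self.completion.append(f"done: {self.submission.popleft()}")

# One independent queue pair per core; no cross-core locking required.
queue_pairs = [QueuePair(core_id=c) for c in range(8)]
for qp in queue_pairs:
    qp.submit(f"read block {qp.core_id * 1024}")
    qp.complete_all()
```

NVMe-oF preserves this queue-pair model across the network, which is why it avoids the serialization penalty that legacy protocols impose.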

Transport Mechanics 

NVMe-oF isn’t tied to any one network technology. Instead, it can operate over multiple fabrics: 

NVMe-TCP – Uses standard Ethernet and TCP/IP, requires no special NICs or switches, and introduces slightly more latency than RDMA, but still far below iSCSI. This is emerging as the mainstream choice because it balances performance and ease of deployment. 

NVMe-RDMA (RoCEv2/iWARP) – Provides ultra-low latency by bypassing the kernel and requires RDMA-capable NICs and appropriate switch configurations. This is ideal for demanding AI/ML or high-performance computing environments.  

NVMe-FC – Uses existing Fibre Channel networks and allows organizations with FC investments to evolve without overhauling infrastructure. While FC is declining in new deployments, many enterprises still rely heavily on it. 

Discovery, Subsystems, and Multipathing 

NVMe-oF adds several architectural components that help hosts find storage targets, establish connections, and ensure those connections stay resilient.  

In a local NVMe setup, the server automatically sees any NVMe device physically attached via PCIe. But in NVMe-oF, the storage is somewhere out on the network, which means the host needs a way to locate it. NVMe-oF uses discovery controllers to provide information such as what storage systems exist on the fabric, how to reach them, and what NVMe subsystems or namespaces they expose.  
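In practice, a host queries the discovery controller before connecting. Below is a minimal sketch using nvme-cli's discover command; the address is a placeholder, and 8009 is the conventional NVMe-oF discovery service port.

```python
# Minimal sketch: querying an NVMe-oF discovery controller over TCP.
import subprocess

DISCOVERY_ADDR = "192.0.2.10"   # placeholder discovery controller address
DISCOVERY_PORT = "8009"         # conventional discovery service port

# Returns a discovery log page listing reachable subsystems, their NQNs,
# transports, and addresses the host can then connect to.
result = subprocess.run(
    ["nvme", "discover", "-t", "tcp", "-a", DISCOVERY_ADDR, "-s", DISCOVERY_PORT],
    capture_output=True, text=True, check=True,
)
print(result.stdout)
```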

An NVMe subsystem is a logical grouping that organizes storage in a way that makes it easier to present to hosts. You can think of a subsystem as a “storage service” that includes one or more NVMe controllers, one or more namespaces, and associated paths and access controls. This abstraction allows a single NVMe-oF storage array to present multiple independent storage pools to different hosts or applications. 

A namespace is the NVMe term for a block storage volume. In a traditional SAN, you might call it a LUN. Namespaces can be dedicated to a single server or shared by multiple servers (with proper coordination); thin- or thick-provisioned; and created or expanded dynamically. They are what applications ultimately read and write data to. 
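To illustrate how subsystems, namespaces, and ports fit together, here is a hedged sketch using the Linux kernel's built-in NVMe target (nvmet) and its configfs interface. The NQN, backing device, and listen address are assumptions, and production deployments typically rely on a vendor array or tooling such as nvmetcli rather than raw configfs writes.

```python
# Hedged sketch: exporting a local NVMe device as an NVMe-oF subsystem and
# namespace via the Linux nvmet configfs interface. Requires root and the
# nvmet/nvmet-tcp kernel modules; all names below are illustrative.
import os

CFG = "/sys/kernel/config/nvmet"
NQN = "nqn.2014-08.org.example:shared-pool-01"   # placeholder subsystem NQN
BACKING_DEV = "/dev/nvme0n1"                      # placeholder backing device

def write(path: str, value: str) -> None:
    with open(path, "w") as f:
        f.write(value)

# 1. Create the subsystem (the "storage service" described above).
subsys = f"{CFG}/subsystems/{NQN}"
os.makedirs(f"{subsys}/namespaces/1", exist_ok=True)
write(f"{subsys}/attr_allow_any_host", "1")       # demo only; restrict hosts in production

# 2. Back namespace 1 with a local block device and enable it.
write(f"{subsys}/namespaces/1/device_path", BACKING_DEV)
write(f"{subsys}/namespaces/1/enable", "1")

# 3. Create a TCP port and export the subsystem through it.
port = f"{CFG}/ports/1"
os.makedirs(f"{port}/subsystems", exist_ok=True)
write(f"{port}/addr_trtype", "tcp")
write(f"{port}/addr_adrfam", "ipv4")
write(f"{port}/addr_traddr", "192.0.2.10")        # placeholder listen address
write(f"{port}/addr_trsvcid", "4420")
os.symlink(subsys, f"{port}/subsystems/{NQN}")
```

The same logical model (subsystem, namespace, port) applies whether the target is a Linux host, a dedicated array, or a disaggregated storage node.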

Multipathing is critical in enterprise storage because it ensures high availability. If one path fails, traffic seamlessly reroutes to another. It also provides load balancing, which means I/O traffic spreads across multiple connections for better performance. 

In NVMe-oF, multipathing is built into the architecture. A host can establish multiple fabric connections to multiple target ports across multiple network paths. And NVMe automatically handles failover and load distribution. 
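Here is a minimal sketch of what that looks like from the host side, assuming two target ports exporting the same subsystem; the addresses, NQN, and subsystem index are placeholders.

```python
# Minimal sketch: establishing two fabric paths to one subsystem and letting
# native NVMe multipath handle failover and load distribution.
import subprocess

SUBSYS_NQN = "nqn.2014-08.org.example:shared-pool-01"      # placeholder NQN
PATHS = [("192.0.2.10", "4420"), ("192.0.2.11", "4420")]   # two target ports

# Connect once per path; the kernel aggregates them under a single namespace.
for addr, port in PATHS:
    subprocess.run(
        ["nvme", "connect", "-t", "tcp", "-n", SUBSYS_NQN, "-a", addr, "-s", port],
        check=True,
    )

# Show the subsystem and its live paths.
subprocess.run(["nvme", "list-subsys"], check=True)

# Optionally switch the I/O policy from the default (numa) to round-robin.
# The nvme-subsys0 index is an assumption; check list-subsys output first.
with open("/sys/class/nvme-subsystem/nvme-subsys0/iopolicy", "w") as f:
    f.write("round-robin")
```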

Network Switching and Platform Management 

Because NVMe-oF is highly sensitive to latency and congestion, network switching matters. Low-latency Ethernet switches reduce jitter. QoS ensures storage traffic maintains priority. Flow control and congestion visibility maintain throughput. And platform management tools offer insights into I/O paths, latency, and hotspots. These capabilities help teams design predictable, high-performance fabrics.  

 

NVMe-oF vs. Legacy Protocols: A Performance Comparison 

Let’s compare NVMe-oF with iSCSI and Fibre Channel in real terms. 

Latency 

Publicly available industry benchmarks have found that NVMe-TCP can reach latencies between 25 and 40 microseconds (μs), compared to 100 to 200 μs for iSCSI. NVMe-RDMA achieves even lower latency, at 10 to 20 μs, approaching local NVMe performance.  

Storage vendor Simplyblock reported that NVMe over TCP delivered an average access latency reduction of about 25% compared to iSCSI, without changing any other parameters.  

These aren’t small improvements. They represent a foundational shift for latency-sensitive workloads. 

Throughput 

Modern NVMe SSDs can exceed 7 GB/s and hundreds of thousands of IOPS. NVMe-oF fabrics are capable of: 

  • Saturating 25/40/100Gb Ethernet links with ease 
  • Supporting multi-link or multi-path configurations 
  • Handling millions of IOPS at scale 

Simplyblock found that NVMe over TCP also increased IOPS over iSCSI by up to 35% and throughput by up to 20%, all without any hardware changes. 

CPU Overhead 

iSCSI is notorious for consuming CPU cycles because of its deep software stack. NVMe-oF significantly reduces this due to its streamlined command set and, in RDMA, kernel bypass capabilities. 

Real-World Impacts 

Lower latency and higher throughput yield faster transaction commits in databases, higher GPU utilization in AI workflows, faster VM and container startups, and more predictable performance during peak loads. Ultimately, that means applications run more efficiently, improving customer experiences and employee productivity. 

 

Where NVMe-oF Is Being Used Today 

High-Performance Databases 

Databases place heavy demands on storage I/O performance and consistency. NVMe-oF improves transaction time, read/write concurrency, row-level locking efficiency, and checkpoint and log flush times. This makes it ideal for distributed SQL engines, OLTP workloads, and real-time analytics. 

AI and ML Training 

AI workloads constantly read from massive datasets. NVMe-oF ensures higher throughput to feed GPUs, less idle GPU time, faster epochs and training cycles, and better scaling for multi-node clusters. 

Cloud and Virtualization Environments 

Shared NVMe pools mean hypervisors can access extremely fast storage without equipping each host with local NVMe. This supports multi-tenant clouds, VM density increases, lower costs per virtualized workload, and simpler scaling of storage independent of compute. 

Edge Computing 

Edge environments need fast, local-feeling storage, but can’t always physically colocate SSDs with compute. NVMe-oF provides low-latency remote storage, scalability across distributed nodes, and high availability for remote sites. 

Kubernetes and Cloud-Native Ecosystems 

Stateful containers benefit from high-speed persistent volumes. NVMe-oF integrates cleanly through container storage interface (CSI) drivers, which are plugins that let Kubernetes talk to external storage systems. They're the translation layer that allows containerized applications to request, mount, expand, and detach storage volumes, no matter what storage platform sits underneath. Combined with dynamic provisioning and composable storage backends, NVMe-oF unlocks high-performance stateful microservices and real-time data pipelines.  
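As a rough illustration, the sketch below uses the official Kubernetes Python client to request a persistent volume from a hypothetical NVMe-oF-backed StorageClass. The class name, claim name, and size are assumptions; the real names come from whichever CSI driver your storage vendor provides.

```python
# Hedged sketch: requesting an NVMe-oF-backed persistent volume through a CSI
# driver using the official Kubernetes Python client. "nvme-of-fast" is a
# hypothetical StorageClass name.
from kubernetes import client, config

config.load_kube_config()   # or config.load_incluster_config() inside a pod
core = client.CoreV1Api()

pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="analytics-scratch"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteOnce"],
        storage_class_name="nvme-of-fast",   # hypothetical NVMe-oF StorageClass
        resources=client.V1ResourceRequirements(requests={"storage": "500Gi"}),
    ),
)

# The CSI driver dynamically provisions a namespace on the NVMe-oF backend and
# binds it to this claim; pods then mount it like any other volume.
core.create_namespaced_persistent_volume_claim(namespace="default", body=pvc)
```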

 

NVMe-oF and the Future of the Datacenter 

As datacenters evolve toward disaggregated, software-defined models, the question is no longer whether enterprises will adopt NVMe-oF, but when. The demands of modern application platforms, from AI and cloud to containers, real-time analytics, and the edge, are simply outpacing the capabilities of iSCSI and Fibre Channel. 

NVMe-oF brings local NVMe performance to shared storage environments and sets the stage for the next decade of datacenter growth. The future datacenter will be faster, more flexible, and more distributed, with NVMe-oF as the connectivity layer supporting it all. 

Organizations that want to move in this direction need a clear modernization strategy. BlueAlly helps teams plan, design, and deploy storage fabrics that take full advantage of NVMe-oF without introducing operational complexity. 

Connect with BlueAlly to explore your next steps.