About a decade ago, new companies formed around building online applications in SaaS, social media, and other verticals that needed to scale effortlessly in multiple dimensions to support growth and peaks in demand. These companies and technologists built a new kind of infrastructure to serve a rapidly growing customer base that required real-time information. They relied on low-latency storage installed directly in servers as direct-attached storage (DAS) in order to put the data as close to the CPU as possible. The scale-out database technology that underpinned these applications could manage data across the cluster, and avoided the need to deploy traditional shared storage resources. Examples are shown below:
This is the second half of the Pavilion blog series on three important design areas of storage products: Bandwidth, Latency, and Density. This entry focuses on Latency and Density.
The access latency from the host to the media is composed of four parts: host storage stack latency, network stack latency, IO controller latency, and media access latency. The first three components are fairly standard, and we minimize the number of memory copies and limit data touches to keep their latency to a minimum. The media access latency is largely governed by the type of media (NVMe NAND in our case) and the associated drive controller overhead.
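To make the breakdown above concrete, here is a minimal sketch that treats host-to-media latency as the sum of the four components and reports how much of the total comes from the media. The class name, field names, and microsecond values are all hypothetical placeholders chosen for illustration; they are not measurements of any Pavilion product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyBudget:
    """Hypothetical per-IO latency budget, in microseconds."""
    host_stack_us: float      # host storage stack
    network_stack_us: float   # network stack
    io_controller_us: float   # IO controller
    media_access_us: float    # media plus drive controller overhead

    def total_us(self) -> float:
        # End-to-end latency modeled as a simple sum of the four components.
        return (self.host_stack_us + self.network_stack_us
                + self.io_controller_us + self.media_access_us)

    def media_share(self) -> float:
        # Fraction of the total that is attributable to media access.
        return self.media_access_us / self.total_us()


# Placeholder numbers only (assumed for illustration, not measured).
budget = LatencyBudget(host_stack_us=10.0, network_stack_us=10.0,
                       io_controller_us=10.0, media_access_us=90.0)
print(budget.total_us())     # 120.0
print(budget.media_share())  # 0.75
```

With numbers shaped like these, trimming copies in the first three components helps, but the media term still sets the floor, which is why the choice of media matters so much.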
Topics: NVMe Technology
Applications are becoming increasingly parallel. What used to be done on a single application server is now spread across a cluster of servers operating in parallel. This allows for scaling in multiple dimensions. Need more bandwidth or compute power for your clustered application? Just add more servers to the cluster. Hyperscale customers have been doing this for nearly a decade now, and the number of customers that embrace this architecture is growing constantly.
Storage system vendors have chosen to integrate flash in one of two ways: incorporate standard off-the-shelf SSDs, or design their own flash modules and controllers. Many of the early all-flash array pioneers, like Violin and TMS, designed their own custom flash modules for what were very sound reasons at the time. The choice between the two revolves around several criteria, most notably Performance, Time-to-Market, and Cost. I explore all three as they relate to this subject below:
Storage system designers constantly make engineering tradeoffs to maximize three key attributes – Throughput, Latency and Density. This blog series will delve into each of these separately and cover the thought process we went through when we designed the Pavilion Storage Platform as it relates to these attributes. This first blog of the series focuses on Throughput.
We live in a connected world. Millions of apps, billions of users and trillions of things. Consequently, the amount of data that is generated is exploding. Yet at the same time, we are seeing massive fragmentation of where it originates from, its key structure, where it resides, how it gets accessed and processed and how it is ultimately consumed. New-fangled clustered databases, in-memory caches, exotic massively parallel filesystems, structured to semi-structured to unstructured, modern and disparate query languages, are just a few of the technologies that are causing this significant fragmentation.
Over the course of two decades I have been deeply involved in developing products that spanned various domains such as data networking, storage area networking, PCIe interconnects, IO virtualization, flash controllers, and all-flash array systems. My journey took me through a variety of companies such as Cisco/Andiamo, Aprius (acquired by FusionIO), and Violin Memory, and I am proud to have been part of the teams that developed and shipped innovative products that served the data center.
Businesses are becoming more digital and are constantly seeking an advantage over their competitors. One important and popular way to get it is to do more with the data they already have. We all know that data continues to grow unabated in volume, velocity, and variety. Maintaining a competitive edge in the digital era requires the ability to take control of this deluge, analyze it, and deliver accurate insights that better serve business needs. This has turned the focus back to storage software, storage media, and storage systems, which were proving to be one of the big bottlenecks.
Several new storage systems have come to market with the goal of delivering shared flash resources as a service to high-scale, distributed applications.
These products take advantage of some of the following technology developments in the storage and networking space: standards-based PCIe-connected SSDs, RDMA-capable Ethernet networking up to 100 GbE, a standard storage protocol designed for PCIe-connected SSDs (NVMe), and a standard protocol for remotely accessing NVMe devices (NVMe-over-Fabrics, or NVMeOF). Note that Red Hat 7.4 and Ubuntu 16 both now include NVMeOF support inbox.