This is the second in a series of posts covering the evolution of Service Provider provided storage services and the underlying technology that supports them. The first post was about Object Storage Services and their place in an overall storage architecture. In this post I’m going to cover Block Storage Services and their typical use cases and deployment models within Service Providers.
Let’s start with a quick introduction to block storage. In its most simple form, block storage is the traditional spinning hard disk that is installed into your PC or laptop. As it’s the foundational unit that the operating system accesses, it underpins the other forms of storage presentation, file and object, that I discussed in the last post. We’re concerned with the block presentation this time though, so let’s focus on that for now.
If you’re using a Mac, you can open a terminal window and run the df command to list the block storage devices connected to your system. If you do this, you should see output that starts with a header line like the one below:
Filesystem 512-blocks Used Available Capacity Mounted on
The clue that you’re using block storage devices is right there in the second column heading, and in a small system like a laptop or desktop, block storage devices are very easy to manage. The performance requirements are very modest and the contiguous chunks of storage that you need are small in relative terms, especially when you set this against the fact that drive sizes are constantly increasing. Right now, 1TB SATA drives are standard for 2.5″ laptop drives, and 2TB and even 3TB SATA are the standard sizes for 3.5″ desktop drives.
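That “512-blocks” column is the block abstraction in miniature: the operating system addresses the device in fixed-size blocks, and capacity is just a block count multiplied by the block size. Here’s a minimal sketch of that arithmetic (the block count used below is an illustrative figure of my own, not output from any particular machine):

```python
# Convert df's 512-byte block counts into human-readable sizes.
# BLOCK_SIZE matches the "512-blocks" column heading from df's output.
BLOCK_SIZE = 512  # bytes per block

def blocks_to_gib(blocks: int) -> float:
    """Convert a count of 512-byte blocks to GiB."""
    return blocks * BLOCK_SIZE / (1024 ** 3)

# A hypothetical count of 976,490,576 blocks works out to roughly
# 465.6 GiB -- about what a "500GB" drive reports after unit conversion.
print(round(blocks_to_gib(976_490_576), 1))  # → 465.6
```

The same arithmetic explains why a drive sold as “500GB” (decimal gigabytes) shows up smaller when tools report in binary GiB.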
Sounds simple, right? Well, like many things in technology, the complexity of block storage comes not from the simple consumer use case, but is introduced when you need to use the technology to achieve massive scalability, extreme performance, high availability or efficient operations, or a combination of two or more of these attributes.
In the enterprise or service provider context these challenges are addressed by using intelligent storage arrays connected to the hosts that access them via a storage area network (SAN). These intelligent arrays have the capability to aggregate large numbers of individual disk drives and combine them with semiconductor-based cache and software to address the availability, performance and scalability challenges. Management efficiency is improved with software tools that enable large amounts of capacity to be managed by a single administrator. A SAN provides similar features for the connectivity fabric between hosts and the storage. Once connected and configured, an intelligent array can present chunks of block storage (called LUNs) to hosts across the SAN.
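The aggregate-then-carve model can be illustrated with a toy sketch (my own illustration, not any vendor’s actual API): drives are pooled into raw capacity, and LUNs of arbitrary size are carved from the pool until it’s exhausted.

```python
# Toy model of an intelligent array: aggregate drives into a pool,
# then carve LUNs from it. Class and LUN names are hypothetical.
class Array:
    def __init__(self, drive_sizes_gb):
        self.pool_gb = sum(drive_sizes_gb)  # aggregate raw capacity
        self.luns = {}                      # LUN name -> size in GB

    def create_lun(self, name, size_gb):
        allocated = sum(self.luns.values())
        if allocated + size_gb > self.pool_gb:
            raise ValueError("insufficient pool capacity")
        self.luns[name] = size_gb

array = Array([300] * 12)            # twelve 300GB drives = 3600GB pool
array.create_lun("host-a-data", 1000)
array.create_lun("host-b-logs", 500)
print(array.pool_gb - sum(array.luns.values()))  # remaining pool: 2100
```

Real arrays layer RAID protection, cache and SAN zoning on top of this, but the basic shape — many small drives presented as a few logical devices — is the same.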
So, how are these components used by Service Providers to construct service offers? And, given the reference in my last post to it, the other question that might be on your mind is whether Amazon’s Elastic Block Storage (EBS) service uses this type of architecture. The second of these questions is the simplest to provide an answer for, but not to actually answer. What’s the reason for the cryptic response? Well, no one (except for a select team of engineers and support personnel at Amazon Web Services) knows exactly how the EBS service is constructed. Because the storage devices created on the EBS service are connected to hosts via a virtual machine hypervisor, they are abstracted; as a result they might be block devices presented in a manner similar to virtual disks or raw device mappings (RDMs) in VMware, or they might be file devices presented in the same way. It’s literally impossible to tell!
After that little detour into the murky world of AWS architecture, let’s return to the first question of how services can be constructed using a block storage architecture. The first important, and somewhat obvious, point to note is that when you’re dealing with the provision of services to support a portfolio of enterprise applications, as is typical in many Service Providers today, you may actually need a relatively wide variety of storage service offerings with differing characteristics. Historically, this has resulted in a pretty complex architecture, with different products being required to meet differing performance and availability/recoverability requirements. Even in scenarios where the same product might be used to provide a number of different offers, this would often result in different resource pools being required, which would in turn lead to inefficiencies in the allocation of resources, from both a performance and capacity perspective.
If you were building out or refreshing the infrastructure for a similar service portfolio today, there are a couple of relatively new enabling technologies that can provide a lot more flexibility and allow you to make much more efficient use of the resources that you need to deploy. These are automated storage tiering and virtual provisioning, which in EMC’s products are combined into a single software feature bundle called FAST VP, which stands for Fully Automated Storage Tiering with Virtual Provisioning. Let’s quickly cover these and then I’ll wrap up this post with some conclusions.
In simplistic terms, virtual provisioning allows you to allocate capacity that you don’t actually have installed. Sounds like magic? Well, it’s not; it’s just clever software functionality that enables you to create virtual LUNs whose apparent size can exceed the amount of underlying physical capacity that they consume. Physical storage capacity is not allocated until it is used by having data written onto it. Service Providers that opt to use this technology within their block storage infrastructure literally have the ability to sell capacity that they don’t have, or to offer customers an improved price point, if they opt to pass the savings on to customers.
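The mechanics can be sketched in a few lines (a simplified model of thin provisioning in general, not of FAST VP’s actual implementation): LUN creation never checks physical capacity, only writes do, so the sum of apparent LUN sizes can exceed the pool.

```python
# Sketch of virtual (thin) provisioning: virtual LUN sizes may exceed
# physical capacity; physical space is consumed only when data is written.
class ThinPool:
    def __init__(self, physical_gb):
        self.physical_gb = physical_gb  # capacity actually installed
        self.used_gb = 0                # capacity consumed by writes
        self.luns = {}                  # name -> {"virtual": .., "written": ..}

    def create_lun(self, name, virtual_gb):
        # Oversubscription is allowed: no physical check at creation time.
        self.luns[name] = {"virtual": virtual_gb, "written": 0}

    def write(self, name, gb):
        lun = self.luns[name]
        if lun["written"] + gb > lun["virtual"]:
            raise ValueError("write exceeds the LUN's apparent size")
        if self.used_gb + gb > self.physical_gb:
            raise RuntimeError("pool exhausted -- time to install drives")
        lun["written"] += gb
        self.used_gb += gb

pool = ThinPool(physical_gb=1000)
pool.create_lun("tenant-a", 800)   # apparent capacity sold...
pool.create_lun("tenant-b", 800)   # ...now totals 1600GB on a 1000GB pool
pool.write("tenant-a", 200)
print(pool.used_gb)                # only 200GB physically consumed
```

The catch, of course, is the `RuntimeError` branch: an oversubscribed pool must be monitored so physical capacity is installed before the writes catch up with the promises.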
Automated tiering provides a similar set of functionality, but in the allocation of performance resources. When using automated tiering, each LUN can be constructed from a blend of different underlying storage elements. Note that I didn’t say disks there, because in this sort of model it’s very usual to make use of solid-state Enterprise Flash Drives (EFDs) to provide the highest-performance tier in the allocation model. It’s probably best to illustrate this with a theoretical example. Under the old delivery model, a high-performance storage tier might have been constructed using six 300GB Fibre Channel (FC) drives to hold the data, and six more to hold copies protecting the data in the event of a failure, so twelve drives in total. Using FAST VP, a higher theoretical performance level can be delivered with 4% of the capacity on 100GB EFDs, 25% on 450GB FC drives and 71% on 2TB SATA drives. This benefit applies not only where a single performance tier is required, but also where a Service Provider needs to construct a number of offerings with different performance characteristics. The ability to build these from a blend of different types of underlying storage offers fantastic flexibility and allows the SP to maximise usage of both the performance and capacity available on each of its assets.
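The capacity blend from that example is easy to sketch (the percentages are from the example above; this is just the static capacity split, not FAST VP’s actual placement algorithm, which relocates data between tiers based on observed activity):

```python
# Split a LUN's capacity across storage tiers by percentage.
# Tier names and the 4/25/71 split come from the worked example above.
def tier_blend(lun_gb, split):
    """Return GB per tier for a LUN, given a {tier: percent} split."""
    return {tier: lun_gb * pct / 100 for tier, pct in split.items()}

split = {"EFD": 4, "FC": 25, "SATA": 71}
print(tier_blend(1000, split))
# A 1000GB LUN lands as 40GB on EFD, 250GB on FC and 710GB on SATA.
```

The economic point is visible in the numbers: only the busiest 4% of the data occupies the expensive flash tier, while the cold majority sits on cheap, high-capacity SATA.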
So to wrap things up: whilst I did say in the first post in this series that block storage would, over time, come to represent a smaller and smaller proportion of the total volume of storage deployed globally, it’s still the de facto standard for the majority of structured data and transactional applications today, especially where high availability and data recoverability are important factors. As a result, most Service Providers that provide services to larger enterprises and public sector organisations are likely to have a significant amount of block storage deployed within their infrastructure. The maturity and widespread deployment of virtual provisioning and automated tiering technology means there’s now a better way to do block, so it’s a good time to look at how you’re delivering block today and think about optimising it.