Thursday, 10 January 2013

Stop Thinking About Storage as "Tiers" & Start Thinking About Workloads


In the olden days (which for the purposes of this article means "the 1990s"), Gartner introduced - or at least popularised - the concept of "Storage Tiers". The idea was that, with new & differentiated storage technologies becoming available, some decision had to be made about which kind of storage you'd use for a particular kind of data. At the top tier ("Tier 1") you stored mission-critical, frequently accessed, latency-sensitive data like OLTP databases. In the middle (call it "Tier 2") you stored data that was less latency-sensitive but still business-critical and needed backup, DR and/or replication, such as email systems. At the bottom (call it "Tier 3") you stored infrequently accessed or archive data.

Each of these tiers had assumed physical characteristics: Tier 1 was fast, high-performance (and most expensive) disk, usually connected to a fibre channel SAN and featuring high availability, synchronous replication and so on; Tier 2 was lower-speed disk, with some degree of high availability; Tier 3 was the biggest, cheapest, highest-density disks, probably connected via file-sharing networks, and less likely to have high-availability features included. These physical characteristics in turn led to an association with specific disk technologies: Tier 1 became small-capacity 15,000rpm fibre channel drives, Tier 2 became large-capacity fibre channel drives (maybe operating at 10,000rpm), and Tier 3 became large-capacity SATA drives.

This was fine, for a while: storage arrays frequently provided no additional performance considerations beyond spindle type, spindle size and spindle count; and application managers became used to the idea that they really only had three flavours to choose from, and that Tier 3 was “the slowest” and Tier 1 was “the best”, so they chose the latter.

Then someone invented solid state drives.

With a tiering system so firmly tied to particular drive technologies, and with “Tier 1” as “the best” (meaning “the fastest”), the introduction of a new, faster storage device type created a slight problem: what's better than 1? Fortunately, computer scientists all know that you're actually supposed to start counting at zero, so the answer was clear: the new “best” was “Tier 0”, and the top-level descriptions were nuanced to place high-speed transactional data on this new tier.

Problem solved. Until someone invents something faster still (phase-change memory drives, or something). At that point we'd have to call the new technology “Tier -1”, which would finally make clear how ridiculous it is to tie a drive technology to an expected workload.

That's the point of this article – we should be thinking in terms of “workloads”, rather than “tiering”, since tiering is so closely tied to disk technologies, and since physical drive characteristics are no longer the only feature to consider. In a NetApp environment there are several features to take into account when designing a solution for a given workload: deduplication & compression, FlashCache, FlashPools, FlashAccel, and so on.

Once we understand what a workload is going to be, we can design a storage system to provide the best combination of features to handle that workload – which may mean that even high-performance workloads are deployed on lower-speed disks. A classic example of this is a typical Virtual Desktop Infrastructure (VDI) workload: hundreds or even thousands of copies of essentially the same operating system and application binary data, with latency-sensitive access. The many copies of data can be deduplicated down to a few or even one instance of actual physically stored data. The first time this data is accessed by a VDI client, it is placed in the controller's FlashCache. Subsequent requests for the data from any client are then served directly from the cache. What this means is that the actual disk performance is almost irrelevant, so lower-speed (and lower-cost) drives can be used, and fewer of them thanks to the deduplication effect. The solution becomes cheaper, more efficient, and more performant all at the same time.
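
To make the arithmetic behind “disk speed is almost irrelevant” concrete, here's a rough back-of-the-envelope sketch in Python. The latency figures are purely illustrative assumptions of mine, not NetApp-published or measured numbers; the point is just that once the deduplicated working set sits in flash cache, the cache hit ratio dominates the average read latency, not the spindle speed.

```python
# Back-of-the-envelope model: effective read latency for a VDI boot/login storm.
# All figures below are illustrative assumptions, not measured or vendor numbers.

def effective_read_latency_ms(cache_hit_ratio, cache_latency_ms, disk_latency_ms):
    """Weighted average latency: hits served from flash cache, misses from disk."""
    return (cache_hit_ratio * cache_latency_ms
            + (1 - cache_hit_ratio) * disk_latency_ms)

# Assumed service times (purely illustrative):
FLASH_CACHE_MS = 0.5   # read served from the controller's flash cache
FAST_DISK_MS   = 5.0   # 15,000rpm "Tier 1" spindle
SLOW_DISK_MS   = 12.0  # large-capacity SATA spindle

# With heavy deduplication, hundreds of desktops share one copy of the OS image,
# so after the first read almost every request is a cache hit.
for hit_ratio in (0.0, 0.50, 0.95, 0.99):
    fast = effective_read_latency_ms(hit_ratio, FLASH_CACHE_MS, FAST_DISK_MS)
    slow = effective_read_latency_ms(hit_ratio, FLASH_CACHE_MS, SLOW_DISK_MS)
    print(f"hit ratio {hit_ratio:>4.0%}: 15k disk -> {fast:4.1f} ms, SATA -> {slow:4.1f} ms")
```

With those assumed numbers, once the hit ratio climbs past roughly 95% the average latency gap between the fast and slow spindles shrinks to a fraction of a millisecond – which is the sense in which the underlying disk technology stops being the deciding factor for this workload.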

This is just one example, and there are plenty more. The main point is that this combination of technologies (slow disk, deduplication, FlashCache) is suitable for that workload, and gives better performance than a traditional “Tier 1” storage infrastructure. It means that it is no longer appropriate to simply use storage tiering to decide the best infrastructure for a given workload. What solution designers need to do now is understand the characteristics of a workload, and then combine the available storage features to most effectively support it.
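
As a loose illustration of the “characterise the workload first, then combine features” approach, here's a small Python sketch. The workload attributes, rules and feature pairings below are my own hypothetical shorthand for the sake of the example, not a NetApp sizing tool or recommendation engine.

```python
# Hypothetical sketch: suggest candidate storage features from workload traits.
# The trait names and rules are illustrative shorthand, not a real sizing method.

from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    read_heavy: bool        # mostly reads vs. mixed/write-heavy
    highly_duplicate: bool  # many near-identical copies (e.g. VDI images)
    latency_sensitive: bool
    archival: bool          # rarely accessed, retention-driven

def suggest_features(w: Workload) -> list[str]:
    features = []
    if w.highly_duplicate:
        features.append("deduplication")
    if w.read_heavy and w.latency_sensitive:
        features.append("controller flash cache (e.g. FlashCache)")
    if w.latency_sensitive and not w.read_heavy:
        features.append("hybrid aggregate (e.g. FlashPool) for hot write data")
    if w.archival:
        features.append("compression + high-density SATA")
    if not features:
        features.append("capacity-oriented disk, no acceleration needed")
    return features

vdi = Workload("VDI desktops", read_heavy=True, highly_duplicate=True,
               latency_sensitive=True, archival=False)
print(vdi.name, "->", suggest_features(vdi))
```

The exact rules matter far less than the shape of the exercise: start from what the workload actually does, and only then decide which combination of features (and which disks) supports it best.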

So from now on, think about workloads, not tiers. This is especially true when trying to develop Infrastructure-as-a-Service offerings. But more about that later.
