In the olden days (which for the purposes
of this article means "the 1990s"), Gartner introduced -
or at least popularised - the concept of "Storage Tiers".
The idea was that with new & differentiated storage technologies
becoming available, some decision had to be made as to which kind of
storage you'd use for a particular kind of data. At the top tier
("Tier 1"), you stored mission-critical,
frequently-accessed, latency-sensitive data like OLTP databases. In
the middle (call it "Tier 2"), you stored less
latency-sensitive data that was still business-critical and needed
backup, DR and/or replication, such as email systems. At the
bottom (call it "Tier 3"), you stored infrequently accessed
or archive data.
Each of these tiers had assumed
physical characteristics: Tier 1 was fast, high-performance (and most
expensive) disk, usually connected to a fibre channel SAN and
featuring high-availability, synchronous replication and so on; Tier
2 was lower-speed disk, with some degree of high availability; Tier 3
consisted of the biggest, cheapest, highest-density disks, probably connected
via file-sharing networks, and less likely to have high-availability
features included. These physical characteristics in turn led to an
association with specific disk technologies: Tier 1 became
small-capacity 15,000rpm fibre channel drives, Tier 2 became large
capacity fibre channel drives (maybe operating at 10,000rpm), and
Tier 3 became large-capacity SATA drives.
This was fine, for a while: storage
arrays frequently provided no additional performance considerations
beyond spindle type, spindle size and spindle count; and application
managers became used to the idea that they really only had three
flavours to choose from, and that Tier 3 was “the slowest”, and
Tier 1 was “the best”, so they chose the latter.
Then someone invented solid
state drives.
With a tiering system so firmly tied to
particular drive technology and “Tier 1” as “the best”
(meaning “the fastest”), the introduction of a new, faster storage
device type created a slight problem: what's better than 1?
Fortunately, computer scientists all know that you're actually
supposed to start
counting at zero, so the answer was clear: the new “best” was
“Tier 0”, and the top-level descriptions were nuanced to place
high-speed transactional data on this new tier.
Problem solved. Until someone invents
something faster (like phase-change memory drives, or something). At that point we'd have to call the
new technology “Tier -1”, which would finally make it clear how
ridiculous it is to tie a drive technology to an expected workload.
That's the point of this article – we
should be thinking in terms of “workloads”, rather than
“tiering”, since tiering is so closely tied to disk technologies,
and since the physical drive characteristics are no longer the sole
factor to consider. In a NetApp environment there are several
features to take into account when designing the solution for a given
workload: deduplication & compression, FlashCache, FlashPools,
FlashAccel, and so on.
Once we understand what a workload is
going to be, we can design a storage system to provide the best
combination of features to handle that workload – which may mean
that even high-performance workloads are deployed on lower-speed
disks. A classic example of this is a typical Virtual Desktop
Infrastructure (VDI) workload: many copies (hundreds or even
thousands) of essentially the same operating system and application
binary data, with latency-sensitive access. These copies
can be deduplicated down to a few or even one instance of actual
physically stored data. The first time this data is accessed by a VDI
client, it is placed in the controller's FlashCache. Subsequent
requests for the data from any client are then served directly from
the cache. What this means is that the actual disk performance is
almost irrelevant, so lower-speed (and lower-cost) drives can be
used, and fewer of them thanks to the deduplication effect. The
solution becomes cheaper, more efficient, and more performant all at
the same time.
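To make the arithmetic concrete, here is a minimal, purely illustrative Python sketch of that effect: a set of identical simulated desktop images is deduplicated by content hash, and a simple read cache sits in front of the "disk". The block names, the cache structure and the read_block helper are all invented for illustration; this is not how NetApp's deduplication or FlashCache are actually implemented, but it shows why repeat reads rarely touch the spindles.

```python
# Toy model of the VDI effect described above: many identical desktop
# images deduplicate down to one set of unique blocks, and a read cache
# absorbs repeat access, so very few reads ever reach the (slow) disk.
# Purely illustrative -- not NetApp's actual dedup or FlashCache code.

import hashlib

def block_fingerprint(block: bytes) -> str:
    """Identify duplicate blocks by content hash (illustrative only)."""
    return hashlib.sha256(block).hexdigest()

# 1,000 "desktop images", each made of the same 100 OS/application blocks.
golden_image = [f"os-block-{i}".encode() for i in range(100)]
desktops = [list(golden_image) for _ in range(1000)]

# Deduplication: store each unique block exactly once.
dedup_store = {}
for image in desktops:
    for block in image:
        dedup_store.setdefault(block_fingerprint(block), block)

print(f"Logical blocks:  {sum(len(img) for img in desktops)}")  # 100,000
print(f"Physical blocks: {len(dedup_store)}")                   # 100

# Read path: a simple cache in front of the "disk" (the dedup store).
cache = {}
disk_reads = 0

def read_block(fingerprint: str) -> bytes:
    """Serve from cache if possible; otherwise read from disk and cache it."""
    global disk_reads
    if fingerprint not in cache:
        disk_reads += 1                  # only a cache miss touches the disk
        cache[fingerprint] = dedup_store[fingerprint]
    return cache[fingerprint]

# Boot storm: every desktop reads every block of its image.
for image in desktops:
    for block in image:
        read_block(block_fingerprint(block))

print(f"Total reads: {1000 * 100}, reads that hit disk: {disk_reads}")  # 100
```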
This is just one example, and there are
plenty more. The main point is that this combination of technologies
(slow disk, deduplication, FlashCache) is suitable for that
workload, and gives better
performance than a traditional “Tier 1” storage infrastructure.
It means that it is no longer appropriate to simply use storage
tiering to decide the best infrastructure for a given workload. What
solution designers need to do now is understand the characteristics
of a workload, and then combine the available storage features to
most effectively support it.
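As a thumbnail of what that design exercise might look like, here is a small, purely hypothetical sketch in Python: each workload profile is described by its characteristics and mapped to a combination of the features named above, rather than to a tier number. The profiles and feature selections are invented for illustration only, not NetApp sizing guidance.

```python
# Purely illustrative sketch of "think in workloads, not tiers": a few
# hypothetical workload profiles mapped to combinations of the features
# named in this article. Invented examples, not NetApp guidance.

workload_designs = {
    "VDI": {
        "duplicate_data": "very high",   # many near-identical desktop images
        "latency_sensitive": True,
        "features": ["deduplication", "FlashCache", "lower-speed SATA disk"],
    },
    "OLTP database": {
        "duplicate_data": "low",
        "latency_sensitive": True,
        "features": ["FlashPool", "high-speed disk", "synchronous replication"],
    },
    "Archive": {
        "duplicate_data": "high",
        "latency_sensitive": False,
        "features": ["deduplication", "compression", "high-density SATA disk"],
    },
}

# The point: the design follows from workload characteristics,
# not from a one-dimensional "tier" number.
for name, design in workload_designs.items():
    print(f"{name}: {', '.join(design['features'])}")
```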
So
from now on, think about workloads,
not tiers. This is especially true when trying to develop
Infrastructure-as-a-Service offerings. But more about that later.