Friday, 18 January 2013

Going For Gold Isn't Enough


For many years, IT services within organisations have been advertised to business projects using a familiar set of names: usually “gold”, “silver” and “bronze”. There may be some noun associated with the service as well, such as “gold processing tier” or “silver storage tier”. The premise behind this kind of naming is that “everyone understands what this means”, and to some extent this is true from a relative point of view – gold is understood to be “better” than silver, which in turn is “better” than bronze. But what does “better” mean in the context of the services on offer? How can someone differentiate between the different services & decide which one to use? What happens when a new service is introduced that fits between the existing ones?

Most of the time the answer to these questions is “no-one knows”, which is less than ideal. The big problem with this precious-metal naming scheme (or any other arbitrary scheme, such as colours or gemstones) is that although it gives a relative indication of “goodness”, it doesn't actually tell anyone what the service offers, and there is no consistency in what a name means from one organisation to the next (or even within a single organisation – they are arbitrary terms, after all). This usually results in a business project simply choosing what sounds like the “best” service (i.e. “gold”), even if the project's needs don't align with what the service is offering. Alternatively, a cash-strapped project may choose the “least” option (i.e. “bronze”) simply to save on costs, without understanding, for example, that running a database on bottom-tier infrastructure just won't work. This problem is compounded as IT organisations become more like service providers, and especially as automated orchestration is used to provision and advertise services to business users through self-service portals and the like. It's this context that leads us to how we should be designing and describing our IT services: as specifically as possible.

The concept behind being a service provider (either internally to an organisation or externally) is to consolidate resources and then apportion them to different projects or customers on a per-use basis. The idea is that by offering a standard set of well-defined services, the provider can reduce waste, streamline processes, take advantage of infrastructure efficiencies like data deduplication and single-instance cloning, and save their organisation money (and/or make a margin). The key phrase here is “standard set of well-defined services”: in the service-provider context, customers (or projects) must choose from a menu, rather than being allowed to pick and choose how the ingredients will be combined. In turn, the service provider must be very clear on what is included in the services being offered, and also on what the price will be per unit.

In a NetApp storage context, this means developing some understanding of the potential workloads, so that an appropriate service can be defined for each workload the provider wants to cater to, including all of the efficiencies and capabilities relevant to that workload. The goal is then to normalise the per-gigabyte pricing so that simple comparisons can be made on the effective usable space, rather than on the raw physical capacity. For example, likely deduplication rates for Virtual Server Infrastructure (VSI) workloads in VMware are around 50%, compared with around 20% for simple file sharing, so a provider can define different services with per-gigabyte rates for these two workloads that take these savings into account: the “NFS datastore for VMware VSI” might be offered at $0.50/GB, while the “NFS datastore for general data” might be offered at $0.80/GB. This naming and pricing helps the customer or project understand what they are ordering, and helps the service provider direct the customer to the most appropriate service for a workload.
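As a rough sketch of how that normalisation might work in practice (the provider's raw cost and markup below are illustrative assumptions; only the ~50% and ~20% savings and the resulting $0.50/GB and $0.80/GB prices come from the example above):

```python
# Illustrative sketch: normalising per-gigabyte pricing by expected
# deduplication savings. The raw cost and markup are assumed figures,
# chosen so the output matches the example prices in the text.

RAW_COST_PER_PHYSICAL_GB = 0.50   # assumed provider cost per physical GB
MARKUP = 1.0                      # assumed 100% uplift for operations/margin

# Expected space savings per workload type (from the example above)
DEDUP_SAVINGS = {
    "NFS datastore for VMware VSI": 0.50,    # ~50% savings
    "NFS datastore for general data": 0.20,  # ~20% savings
}

def price_per_usable_gb(service: str) -> float:
    """Price charged per GB of usable (logical) capacity.

    A usable GB in a workload that deduplicates well consumes less
    physical capacity, so it can be priced lower.
    """
    physical_gb = 1.0 - DEDUP_SAVINGS[service]
    return RAW_COST_PER_PHYSICAL_GB * physical_gb * (1.0 + MARKUP)

for svc in DEDUP_SAVINGS:
    print(f"{svc}: ${price_per_usable_gb(svc):.2f}/GB")
# NFS datastore for VMware VSI: $0.50/GB
# NFS datastore for general data: $0.80/GB
```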

Once you start naming these services in a descriptive way, you can extend them to add additional capabilities. For example, the “NFS datastore for VMware VSI” service might be extended to a new service called “NFS datastore for VMware VSI with local backup”, which introduces NetApp SnapShots on a defined schedule, and includes an additional per-gigabyte charge to cover the cost of the capacity used for those backups (perhaps a 20% premium). Another example might be “NFS datastore for general data with Disaster Recovery”, which would include a SnapMirror copy of the data at a remote site, and would have an associated premium to cover both copies of the data, networking costs, and so on.
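One way to picture such a catalogue is as base services plus a priced premium for each added capability. A minimal sketch, assuming the prices from the earlier example; the Disaster Recovery uplift is a purely hypothetical figure:

```python
# Sketch of a descriptive service catalogue: extended services are derived
# from a base service plus a fractional premium for the added capability.
# Service names follow the article; the DR premium is an assumed figure.

from dataclasses import dataclass

@dataclass
class StorageService:
    name: str
    base_price_per_gb: float   # $/GB of the underlying base service
    premium: float = 0.0       # fractional uplift for the added capability

    @property
    def price_per_gb(self) -> float:
        return self.base_price_per_gb * (1.0 + self.premium)

catalogue = [
    StorageService("NFS datastore for VMware VSI", 0.50),
    StorageService("NFS datastore for VMware VSI with local backup",
                   0.50, premium=0.20),   # SnapShot schedule + backup capacity
    StorageService("NFS datastore for general data", 0.80),
    StorageService("NFS datastore for general data with Disaster Recovery",
                   0.80, premium=1.00),   # assumed: second copy + replication network
]

for svc in catalogue:
    print(f"{svc.name}: ${svc.price_per_gb:.2f}/GB")
```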

It's easy to see how infrastructure policies can be applied to basic services to build up a complete catalogue, and how the capabilities and efficiencies of the infrastructure can be used to set pricing. Customers or projects can then select a service based on both functional requirements and budget, and service providers can direct customers to the most appropriate infrastructure. With a descriptive naming system, providers can easily introduce new or extended services as technologies become available, without worrying about having to squeeze between or around arbitrary names (is aluminium better or worse than bronze? What sits between silver & gold?). Furthermore, if a service is named according to the functionality it provides, then the underlying technology can be swapped out without necessarily having to re-name the service: for example, if a file sharing service moves from fibre-channel disk to SATA disk plus FlashCache, the customer's view of the functionality remains the same, even though the underlying technology has moved from what might once have been called “gold” storage to what might once have been called “silver”.
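To make that last point concrete, the customer-facing name can be treated as a stable key, with the backing policy free to change underneath it. A minimal sketch; the policy fields and values here are assumptions for illustration, not a real configuration:

```python
# Sketch: the descriptive service name stays constant while the backing
# technology is swapped out. Field names and values are illustrative only.

catalogue_backends = {
    "NFS datastore for general data": {
        "media": "fibre-channel disk",   # original backing technology
        "cache": None,
        "protection": "daily SnapShot",  # assumed protection policy
    },
}

# Infrastructure refresh: same service name, same functional promise,
# different technology underneath.
catalogue_backends["NFS datastore for general data"].update(
    media="SATA disk",
    cache="FlashCache",
)

print(catalogue_backends["NFS datastore for general data"])
```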

So consider making bland old “gold”, “silver” and “bronze” a thing of the past, and well-defined, descriptive services that incorporate infrastructure capabilities the way of the future. It will certainly make life less confusing for your projects & customers.

Thursday, 10 January 2013

Stop Thinking About Storage as "Tiers" & Start Thinking About Workloads


In the olden days (which for the purposes of this article means "the 1990s"), Gartner introduced - or at least popularised - the concept of "Storage Tiers". The idea was that with new & differentiated storage technologies becoming available, some decision had to be made as to which kind of storage you'd use for a particular kind of data. At the top tier ("Tier 1"), you stored mission-critical, frequently-accessed, latency-sensitive data like OLTP databases. In the middle (call it "Tier 2"), you stored data that was less latency-sensitive but still business-critical and needed backup, DR and/or replication, such as email systems. At the bottom (call it "Tier 3"), you stored infrequently accessed or archive data.

Each of these tiers had assumed physical characteristics: Tier 1 was fast, high-performance (and most expensive) disk, usually connected to a fibre channel SAN and featuring high-availability, synchronous replication and so on; Tier 2 was lower-speed disk, with some degree of high availability; Tier 3 was the biggest, cheapest, highest-density disks, probably connected via file sharing networks, and less likely to have high availability features included. These physical characteristics in turn led to an association with specific disk technologies: Tier 1 became small-capacity 15,000rpm fibre channel drives, Tier 2 became large capacity fibre channel drives (maybe operating at 10,000rpm), and Tier 3 became large-capacity SATA drives.

This was fine, for a while: storage arrays frequently provided no performance considerations beyond spindle type, spindle size and spindle count; and application managers became used to the idea that they really only had three flavours to choose from, and that Tier 3 was “the slowest” and Tier 1 was “the best”, so they chose the latter.

Then someone invented solid state drives.

With a tiering system so firmly tied to particular drive technology, and with “Tier 1” as “the best” (meaning “the fastest”), the introduction of a new, faster storage device type created a slight problem: what's better than 1? Fortunately, computer scientists all know that you're actually supposed to start counting at zero, so the answer was clear: the new “best” was “Tier 0”, and the top-level descriptions were nuanced to place high-speed transactional data on this new tier.

Problem solved. Until someone invents something faster (like phase-change memory drives, or something). At that point we'd have to call the new technology “Tier -1”, which finally makes it clear how ridiculous it is to tie a drive technology to an expected workload.

That's the point of this article – we should be thinking in terms of “workloads”, rather than “tiering”, since tiering is so closely tied to disk technologies, and since the physical drive characteristics are no longer the sole feature to consider. In a NetApp environment there are several features to take into account when designing the solution for a given workload: deduplication & compression, FlashCache, FlashPools, FlashAccel, and so on.

Once we understand what a workload is going to be, we can design a storage system to provide the best combination of features to handle that workload – which may mean that even high-performance workloads are deployed on lower-speed disks. A classic example of this is a typical Virtual Desktop Infrastructure (VDI) workload: many copies (hundreds or even thousands) of essentially the same operating system and application binary data, with latency-sensitive access. The many copies of data can be deduplicated down to a few, or even one, instance of actual physically stored data. The first time this data is accessed by a VDI client, it is placed in the controller FlashCache. Subsequent requests for the data from any client are then served directly from the cache. What this means is that the actual disk performance is almost irrelevant, so lower-speed (and lower-cost) drives can be used, and fewer of them thanks to the deduplication effect. The solution becomes cheaper, more efficient, and more performant all at the same time.
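A back-of-the-envelope sketch of the effect; every figure here (desktop count, image size, deduplication and cache-hit rates) is an illustrative assumption, not a measured result:

```python
# Illustrative arithmetic for the VDI example: near-identical desktop images
# deduplicate down to a small physical footprint, and FlashCache absorbs most
# of the read workload, so disk speed matters far less. Assumed figures only.

desktops = 1000
image_gb = 20                          # OS + application binaries per desktop
logical_gb = desktops * image_gb       # 20,000 GB as the clients see it

dedup_savings = 0.95                   # assumed: near-identical images
physical_gb = logical_gb * (1 - dedup_savings)   # ~1,000 GB actually stored

cache_hit_rate = 0.90                  # assumed: hot blocks stay in FlashCache
disk_read_fraction = 1 - cache_hit_rate          # ~10% of reads reach disk

print(f"Logical capacity:  {logical_gb:,} GB")
print(f"Physical capacity: {physical_gb:,.0f} GB after deduplication")
print(f"Reads served from disk: ~{disk_read_fraction:.0%}")
```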

This is just one example, and there are plenty more. The main point is that this combination of technologies (slow disk, deduplication, FlashCache) suits that workload, and gives better performance than a traditional “Tier 1” storage infrastructure would. It is no longer appropriate to simply use storage tiering to decide the best infrastructure for a given workload: what solution designers need to do now is understand the characteristics of a workload, and then combine the available storage features to support it most effectively.

So from now on, think about workloads, not tiers. This is especially true when trying to develop Infrastructure-as-a-Service offerings. But more about that later.