Starboard Storage Blog

How to choose SSD - How much flash is enough?

Solid-state storage (SSD) is expensive, so it is worth taking the time to make sure you are getting the most out of the SSD resources you have.

Research by Enterprise Strategy Group (ESG)1 has shown that, among companies that are considering SSD but have not yet taken the plunge, the top two barriers to adoption are cost and capacity. I guess you can sum that up as: people feel SSD is too expensive and too small for their workloads.

Research Report: Solid-state Storage Market Trends. Source: Enterprise Strategy Group, 2011.

This is probably why the No. 1 question on most people’s lips with regard to SSDs is “How much do I need?” The typical answer, of course, is “What are your workloads and what are their requirements? You need to do some analysis to determine where best to place SSD.”

But what if there was a better way? What if you did not need to do in-depth analysis? What if you could use a simple formula? Can we solve the problem for you with analysis, mathematics and statistics? No two workloads behave the same, but in aggregate, workloads do exhibit more predictable patterns.

A few years ago the University of California, Santa Cruz, in conjunction with NetApp, HP, Seagate and others, ran a study entitled “Measurement and analysis of large-scale network file system workloads”2. In it they concluded: “We found that read-write file access patterns and random file access are far more common than previously thought…” The data set in the study totaled 22TB.

As recently as June 20th this year, Ambrose McNevin wrote an article in Datacenter Dynamics entitled “How Goldman Sachs uses 1% of Flash to optimise data center storage set up”3. The article discusses Goldman Sachs’ use of EMC storage and notes that EMC recommends 5% SSD for a 32TB EMC VNX 5300 system. It goes on to quote Matthew Liste, managing director of Core Infrastructure Platform at Goldman Sachs, who said: “In our case a tiny bit of flash goes a long way. We put 1% into our arrays and out of that we saw close to 50% of IOPs being handled by that 1% of flash which was way beyond the results we actually thought we’d get.”
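
A result like 1% of flash handling close to 50% of IOPs is consistent with I/O being heavily skewed toward a small set of hot blocks. The toy model below is only an illustration: it assumes block popularity follows a Zipf-like power law with a hypothetical `skew` exponent (not something the article measured), and `hottest_share` is an illustrative name. It computes what share of all I/Os would land on the hottest 1% of blocks.

```python
def hottest_share(num_blocks: int, hot_fraction: float, skew: float) -> float:
    """Share of all I/Os landing on the hottest `hot_fraction` of blocks.

    Toy model (an assumption): block i, ranked 1 = hottest, receives I/Os
    in proportion to 1 / i**skew. skew=0 means perfectly uniform access.
    """
    weights = [1.0 / (i ** skew) for i in range(1, num_blocks + 1)]
    hot = max(1, int(num_blocks * hot_fraction))  # number of "hot" blocks
    return sum(weights[:hot]) / sum(weights)


if __name__ == "__main__":
    # Same 1% of capacity, increasingly skewed access patterns.
    for skew in (0.0, 0.8, 1.0):
        share = hottest_share(100_000, 0.01, skew)
        print(f"skew={skew}: hottest 1% of blocks serve {share:.0%} of I/Os")
```

With `skew=0.0` every block is equally popular and the hottest 1% serves exactly 1% of I/Os; larger exponents concentrate I/O onto fewer blocks, which is what makes a small cache effective. The exponent for any real workload is unknown here, so treat the printed figures as illustrative only.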

So what do these examples tell us?

First, they show that there is no need to go overboard with SSD resources. This is especially true if you are consolidating multiple workloads. A figure of 5% is a good rule of thumb for the upper end of how much of your data will be active at any one time on average, and that figure is a reasonable proxy for how much SSD will be optimal for your workloads. You do not need to overprovision performance to remove bottlenecks.
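
As a back-of-envelope sketch of that rule of thumb, the hypothetical helper below multiplies capacity by an assumed active fraction: 5% by default, or 1% to match the Goldman Sachs figure. The function name, the use of decimal units (1TB = 1000GB), and treating the result as usable (not raw) SSD capacity are all illustrative assumptions.

```python
def ssd_cache_gb(capacity_tb: float, active_fraction: float = 0.05) -> float:
    """Hypothetical helper: usable SSD cache in GB = capacity x active fraction.

    Uses decimal units (1TB = 1000GB). The 5% default is this post's rule of
    thumb for the upper end of active data, not a guarantee for any workload.
    """
    if capacity_tb < 0:
        raise ValueError("capacity_tb must be non-negative")
    if not 0 < active_fraction <= 1:
        raise ValueError("active_fraction must be in (0, 1]")
    return capacity_tb * 1000 * active_fraction


if __name__ == "__main__":
    # Compare the 5% rule of thumb with the 1% Goldman Sachs figure for 32TB.
    for fraction in (0.05, 0.01):
        print(f"32TB at {fraction:.0%} active -> {ssd_cache_gb(32, fraction):,.0f}GB of SSD")
```

For a 32TB system this gives roughly 1.6TB of cache at 5% and 320GB at 1%, before any protection overhead the array itself might add.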

Second, they show that even Goldman Sachs did not really know how its workloads would behave or which data would be active at any given time. So how can you?

So we know we can use the 5% rule, but which 5% is active at any one time? Your workloads are unpredictable, so how do you know where to apply the solid-state drives? The good news is that with a modern storage platform you do not need to know. That is the job of a smart caching algorithm like the one in the Starboard AC series. A little SSD goes a long way because our algorithms determine which data is hot at any given time and automatically promote it to solid state. The SSD resources essentially act as a performance utility for your data: capacity is drawn upon when needed and returned to the utility when it is not.
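
The Starboard algorithm itself is not described here, so the following is only a generic sketch of the idea of automatically promoting hot data. The hypothetical `HotDataCache` class admits a block to an SSD tier on its `promote_after`-th read from disk and evicts the least recently used block when the tier is full (a plain LRU policy with an admission threshold). Real systems also age their access counters and handle writes; this toy does neither.

```python
from collections import OrderedDict, defaultdict


class HotDataCache:
    """Toy SSD read cache (hypothetical; not Starboard's algorithm).

    A block is promoted to the SSD tier once it has missed `promote_after`
    times; when the tier is full, the least recently used block is evicted.
    """

    def __init__(self, capacity_blocks: int, promote_after: int = 2) -> None:
        if capacity_blocks < 1 or promote_after < 1:
            raise ValueError("capacity_blocks and promote_after must be >= 1")
        self.capacity = capacity_blocks
        self.promote_after = promote_after
        self.ssd = OrderedDict()             # block_id -> None, least recent first
        self.miss_counts = defaultdict(int)  # disk reads per block (never aged here)
        self.hits = 0
        self.misses = 0

    def read(self, block_id: int) -> bool:
        """Serve one read; return True if it was served from the SSD tier."""
        if block_id in self.ssd:
            self.ssd.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        self.miss_counts[block_id] += 1
        if self.miss_counts[block_id] >= self.promote_after:
            # The block has proven "hot": promote it, evicting the coldest if full.
            del self.miss_counts[block_id]
            self.ssd[block_id] = None
            if len(self.ssd) > self.capacity:
                self.ssd.popitem(last=False)
        return False

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


if __name__ == "__main__":
    cache = HotDataCache(capacity_blocks=2)
    for block in [1, 1, 1, 2, 2, 1, 3]:
        cache.read(block)
    print(f"hits={cache.hits} misses={cache.misses} ratio={cache.hit_ratio():.2f}")
```

In the short trace above, blocks 1 and 2 are promoted on their second read, so the third and fourth reads of block 1 are served from SSD while block 3, read only once, stays on disk.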

There is one more thing that the data in Ambrose McNevin’s article tells us, though: either EMC is really bad at math or it has very inefficient SSD utilization. The article states that to support the 5% model, an EMC VNX 5300 with 32TB would have 20 x 200GB SSDs. I make that 4TB of SSD for 32TB of raw storage, which is 12.5% even when calculated against raw capacity (against usable capacity the ratio would be higher still). Further investigation shows that because EMC mirrors the SSDs used for read cache with FAST Cache, it has to provision more than 5% physically to deliver 5% logically to your applications.
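
The arithmetic behind those percentages can be checked in a few lines. This sketch assumes decimal units and, following the point about FAST Cache mirroring, that mirroring halves the usable SSD capacity (SSDs used in mirrored pairs); `ssd_ratios` is a hypothetical helper name.

```python
def ssd_ratios(ssd_count: int, ssd_size_gb: float, raw_capacity_tb: float,
               mirrored: bool = True) -> tuple[float, float, float]:
    """Return (raw SSD in TB, raw SSD / raw capacity, usable SSD / raw capacity).

    Assumes decimal units (1TB = 1000GB) and that mirroring halves usable SSD.
    """
    raw_ssd_tb = ssd_count * ssd_size_gb / 1000
    usable_ssd_tb = raw_ssd_tb / 2 if mirrored else raw_ssd_tb
    return raw_ssd_tb, raw_ssd_tb / raw_capacity_tb, usable_ssd_tb / raw_capacity_tb


if __name__ == "__main__":
    # Configuration quoted in the article: 20 x 200GB SSDs in a 32TB VNX 5300.
    raw_tb, raw_pct, usable_pct = ssd_ratios(20, 200, 32)
    print(f"{raw_tb:.1f}TB raw SSD = {raw_pct:.1%} of raw capacity, "
          f"{usable_pct:.2%} usable after mirroring")
```

Under these assumptions the 4TB of raw SSD yields about 2TB of usable read cache, 6.25% of the 32TB, so roughly half of the SSD spend goes to protecting the cache rather than accelerating data.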

So if you want to make the most of your SSD purchases, you should:

Use SSD as a cache in a multiprotocol storage system
Consolidate as many applications and workload types as you can behind your solid-state cache (use Unified Hybrid Storage)
Let the storage do the hard work of automatically accelerating your hot data
Make sure you are not paying a high data protection penalty on your read cache, which limits the efficiency of your SSD purchase

Starboard Storage is laser-focused on ensuring that customers do not overprovision storage. We call it Thin Provisioned Performance, and just like thin provisioning of capacity, it saves you both time and money.

As ever, if you have comments, we would love to hear from you.

References:

1 Enterprise Strategy Group, Research Report: Solid-state Storage Market Trends, 2011.

2 University of California, Santa Cruz, “Measurement and analysis of large-scale network file system workloads” (USENIX 2008). http://www.ssrc.ucsc.edu/Papers/leung-usenix08.pdf

3 Ambrose McNevin, “How Goldman Sachs uses 1% of Flash to optimise data center storage set up”, Datacenter Dynamics, June 2012. http://www.datacenterdynamics.com/focus/archive/2012/06/how-goldman-sachs-uses-1-flash-optimise-data-center-storage-set

Comments

With ONTAP 8.1.1, you can start using hybrid aggregates with SSD storage. The system will take whatever SSD you give it inside an existing aggregate and use it to cache random access, which is typical of SQL workloads.

Your performance on that aggregate is stepped up dramatically. Add 6 SSDs to an aggregate with, say, 24 SAS drives, and watch your filer breeze through the workload....
Posted @ Wednesday, October 17, 2012 5:25 PM by Bill Blomgren
Hi Bill. Thanks for commenting. I know NetApp has finally released SSD caching, and I am pleased that they are now admitting this is the right path for customers. Only a few weeks ago they were saying that a hybrid system like the Starboard system was not the right model. However, it is still prone to the same question and issue discussed above: how much do you need? You suggest 6 SSDs for 24 SAS drives. This is because you have to stripe RAID across all those drives and are really still treating them like spinning media, which is inefficient. With Starboard Storage you get more efficient use of SSD because it is treated purely as a cache. It was designed in rather than added on. Every system comes with SSD built in, and you do not need to add more unless the workload demands it; most of our customers find the standard SSD is effective for them. It is also shared across all of the workloads rather than tied to RAID aggregates (there are no RAID groups in our modern architecture), and it is not limited to a small effective capacity as with NetApp, where, for instance, a 2240 is limited to just 300GB of Flash Pool and a 3240 to just 1TB. You also, of course, have to buy far more than the usable SSD capacity as you scale read cache capacity, because of the overheads dictated by the Data ONTAP architecture.
Posted @ Saturday, October 20, 2012 12:20 PM by Lee Johns