How to choose an SSD: how much flash is enough?
Solid-state storage (SSD) is expensive, so it is worth spending some time making sure you are making the most of the SSD resources you have.
Enterprise Strategy Group (ESG)1 research has shown that the top two barriers to SSD adoption among companies that are considering it but have not yet taken the plunge are cost and capacity. You could sum that up as: people feel SSD is too expensive and too small for their workloads.
This is probably why the No. 1 question on most people's lips with regard to SSDs is "How much do I need?" The typical answer, of course, is "What are your workloads, and what are their requirements? You need to do analysis to determine where best to place SSD."
But what if there was a better way? What if you did not need to do in-depth analysis? What if you could use a simple formula? Can we solve the problem for you with analysis, mathematics, and statistics? No individual workload behaves the same as another, but in aggregate, workloads do exhibit more deterministic patterns.
A few years ago, the University of California, Santa Cruz, in conjunction with NetApp, HP, Seagate, and others, ran a study entitled "Measurement and analysis of large-scale network file system workloads"2. In it they concluded: "We found that read-write file access patterns and random file access are far more common than previously thought…". The size of the data in the study was 22TB.
As recently as June 20th this year, Ambrose McNevin wrote an article in Datacenter Dynamics entitled "How Goldman Sachs uses 1% of Flash to optimise data center storage set up"3. The article discusses Goldman Sachs' use of EMC storage and how EMC recommends 5% of SSD for a 32TB EMC VNX 5300 system. It goes on to quote Matthew Liste, managing director of Core Infrastructure Platform at Goldman Sachs, who said: "In our case a tiny bit of flash goes a long way. We put 1% into our arrays and out of that we saw close to 50% of IOPS being handled by that 1% of flash, which was way beyond the results we actually thought we'd get."
So what do these examples tell us?
First, they show that there is no need to go overboard with SSD resources. This is especially true if you are consolidating multiple workloads. Five percent is a good rule of thumb for the upper end of how much of your data is active on average, and that can serve as a proxy for how much SSD will be optimal for your workloads. You do not need to overprovision performance to remove bottlenecks.
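The 5% rule of thumb above can be expressed as a one-line calculation. This is a minimal illustrative sketch; the function name and parameters are our own, not from any vendor sizing tool:

```python
# Illustrative sketch of the 5% rule of thumb: size the SSD cache tier
# as a fraction of raw capacity. Names here are hypothetical.

def ssd_cache_size_tb(raw_capacity_tb, active_fraction=0.05):
    """Estimate usable SSD cache for a given raw capacity, assuming
    roughly 5% of the data is active at any one time."""
    return raw_capacity_tb * active_fraction

# For a 32TB system like the VNX 5300 discussed below,
# the 5% rule suggests about 1.6TB of usable SSD cache.
cache_tb = ssd_cache_size_tb(32)
```

As the Goldman Sachs example shows, even 1% can absorb a large share of IOPS, so 5% is an upper bound, not a target.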
Second, they show that even Goldman Sachs did not really understand their workloads and which data was active at any given time. So how can you?
So we know we can use the 5% rule, but which 5% is active at any one time? Your workloads are unpredictable, so how do you know where to apply the solid-state drives? The good news is that with a modern storage platform you do not need to know. That is the job of a smart caching algorithm like the one in the Starboard AC series. A little bit of SSD goes a long way because our algorithms determine which data is hot at any given time and automatically promote it to solid state. The SSD resources essentially act as a performance utility for your data, drawn upon when needed and returned to the utility when they are not.
There is one more thing the data in Ambrose McNevin's article tells us, though: either EMC is really bad at math, or it has very inefficient SSD utilization. The article states that to support the 5% model, an EMC VNX 5300 with 32TB would have 20 x 200GB SSDs. I make that 4TB of SSD for 32TB of raw storage. Even calculating the percentage against the raw storage number, that looks like 12.5% to me. Further investigation, though, shows that because EMC mirrors the SSD used for read cache with FAST Cache, it has to have more than 5% physically to deliver 5% logically to your applications.
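The arithmetic above can be checked directly. One simplifying assumption in this sketch is that mirroring halves the usable SSD capacity; even under that assumption, the physical-to-logical gap is large:

```python
# Reproducing the arithmetic from the article's configuration:
# 20 x 200GB SSDs in a 32TB raw-capacity system.
ssd_count = 20
ssd_size_gb = 200
raw_capacity_tb = 32

physical_ssd_tb = ssd_count * ssd_size_gb / 1000        # 4.0 TB physical SSD
physical_pct = physical_ssd_tb / raw_capacity_tb * 100  # 12.5% of raw capacity

# Assumption: mirroring the read cache roughly halves usable capacity.
logical_ssd_tb = physical_ssd_tb / 2                    # 2.0 TB usable
logical_pct = logical_ssd_tb / raw_capacity_tb * 100    # 6.25% of raw capacity
```

So the configuration carries 12.5% of raw capacity in physical SSD to deliver a logical cache in the neighborhood of the 5% target, which is the efficiency penalty the article hints at.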
So if you want to make the most of your SSD purchases, you should:
Use SSD as a cache in a multiprotocol storage system
Consolidate as many applications and workload types as you can behind your solid state cache (Use Unified Hybrid Storage)
Let the Storage do the hard work of automating the acceleration of your hot data
Make sure you are not paying a high penalty for data protection on your read cache, which limits the efficiency of your SSD purchase.
Starboard Storage is laser-focused on ensuring that customers do not overprovision storage. We call it Thin Provisioned Performance, and just like thin provisioning capacity, it saves you both time and money.
As ever if you have comments we would love to hear from you.
1 Research Report: Solid-state Storage Market Trends. Source: Enterprise Strategy Group, 2011.
2 Measurement and analysis of large-scale network file system workloads. University of California, Santa Cruz. http://www.ssrc.ucsc.edu/Papers/leung-usenix08.pdf
3 “How Goldman Sachs uses 1% of Flash to optimise data center storage set up” http://www.datacenterdynamics.com/focus/archive/2012/06/how-goldman-sachs-uses-1-flash-optimise-data-center-storage-set