The Death of RAID?
Nearly 18 years ago I moved from the UK to Houston, Texas. One of the first things you learn when moving to Texas is that “RAID® Kills Bugs”! I find it ironic that in the next few years we might be saying “Bugs Killed RAID”. In fairness it will not just be bugs; larger disks are playing a part too. But it is clear that a change is sweeping through the industry.
Let me explain.
For the last 25 years RAID (Redundant Array of Inexpensive Disks) has been the predominant model for protecting data across a collection of disk drives. Back in 1989 I was product manager for the Compaq SystemPro in the UK when we introduced what was arguably the first purpose-built, industry-standard server. It also had one of the first RAID controllers. It was innovative and it spawned a new class of server. Since then RAID controllers have given rise to the shared storage industry and have saved countless customers across the world from disruptive and damaging data loss.
So why am I saying we might be at the start of the death of the array controller? Three main reasons:
- Rebuild Times. When the Compaq SystemPro was launched the largest drive was 210MB and the maximum capacity for a single RAID controller was 1.2GB using RAID 5. That means that in 23 years (using today's 3TB drives) the capacity of an individual disk has grown roughly 14,000 times.
Because of this, disk drives have outgrown our ability to rebuild them quickly enough to maintain acceptable redundancy. A 3TB drive can take 36 hours to rebuild in a system that is not doing anything else. It can take over a week in an active system. Drives have also become smarter, yet in a RAID controller architecture a sector error still typically results in the whole drive being failed.
- Performance. Server and storage platforms have become far more powerful. Most servers, and indeed most storage systems, are now built on an x86 processor architecture, and Moore’s law has continued to deliver more and more performance. Array controllers are designed to offload work from the main processors and provide cache, but with the massive memory and multicore processing of today’s servers there is less and less reason to build specialized hardware.
- Digitization. Marc Andreessen wrote an article for the Wall Street Journal last year entitled “Why Software Is Eating the World”. In it he talked about the myriad examples of software business models disrupting more physical ones. It is a great article and I encourage you to read it if you have not.
But what does it have to do with array controllers? Well, many new storage architectures (like our MAST architecture, which delivers Unified Hybrid Storage) are adopting a dynamic disk pooling architecture. They still deliver the parity-based data protection we see in RAID controllers, but they do so without a physical controller. It is all software. It provides faster rebuild times, maps out failing sectors without failing a drive and simplifies management. It enables you to track I/O all the way to the disk with no array controller in between, which makes it easier to optimize for unpredictable workloads. Maybe most importantly, it leaves you less exposed to a failure and simplifies the administration of disk drives.
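To make the idea of parity protection in software a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not a description of MAST or of any shipping product. The function names `xor_blocks` and `rebuild_hours` are made up for this example, the 25 MB/s rebuild rate and the figure of 10 drives sharing the rebuild are hypothetical, and the timing model is idealized (capacity divided by aggregate write rate, with no competing workload).

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must be the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_hours(capacity_bytes, rate_bytes_per_sec, parallel_writers=1):
    """Idealized rebuild time: capacity divided by the aggregate write rate."""
    return capacity_bytes / (rate_bytes_per_sec * parallel_writers) / 3600


# Three data blocks in one stripe, plus a parity block computed in software.
data = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_blocks(data)

# Lose the second block, then recover it from the survivors and the parity.
recovered = xor_blocks([data[0], data[2], parity])
assert recovered == data[1]

# Hypothetical figures: a 3 TB drive rebuilt at a sustained 25 MB/s.
single_spare = rebuild_hours(3e12, 25e6)                 # one spare drive absorbs every write
pooled = rebuild_hours(3e12, 25e6, parallel_writers=10)  # writes spread across 10 drives
```

With those assumed numbers the single-spare rebuild comes out at a little over 33 hours, in the same range as the 36 hours quoted earlier, while spreading the same work across ten drives cuts the idealized figure by a factor of ten. Real systems will not scale this cleanly, but it shows why pooling the rebuild across many drives helps.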
In the three reasons above I did not mention why I started this piece saying “Bugs Killed RAID”. Well, think about it. If something goes wrong with a RAID controller maybe you can fix it in firmware, but you might just have to respin hardware. Firmware programmers are scarce and costly, and respinning hardware is time consuming and costly. Being able to fix potential bugs in software is much more expedient for both the vendor and the customer. A bug in an array controller wastes time and money for everyone. It only takes one!
As always I welcome your thoughts and comments.
Addendum: On 10/19/2012 Steve Duplessie of The Enterprise Strategy Group (ESG) posted a similar evaluation of the future of the RAID controller. You can read his post "time to kill clustered RAID Controllers" here: http://www.esg-global.com/blogs/time-to-kill-clustered-raid-controllers/