What’s the Best Way to Configure a RAID?

Posted by Larry

[ Updated July 29, 2014, to reflect additional comments from Promise Technology. ]

A RAID (Redundant Array of Independent Disks) collects a group of hard disks into a single enclosure to provide greater speed, storage capacity, and/or security than is available with a single hard drive alone.

Storage is an essential element of media editing – both audio and video – and I enjoy testing and reviewing the latest storage hardware as it comes out.

Recently, after posting a review of a new RAID from OWC, Robin Harris (www.storagemojo.com) wrote a very intriguing comment:

“On the issue of RAID 0 vs RAID 5 on low-cost arrays: I prefer RAID 0. Based on the best available research evidence, all hard drives have an MTBF of ≈300,000 hours or an annual failure rate of 3-4%. This means that if you use RAID 0 you have less than a 1 in 5 chance of a drive failure in any given year with a 4 drive array. However, low-cost arrays like the ThunderBay only have one power supply and, in my experience, a higher failure rate than any one drive, or even any four drives.

“Therefore, in my experience – which includes working in large storage companies, so I have a large sample – a small array is as likely to have a power supply failure as a disk failure. And when that happens you’ll have to stop work until you get your backup loaded on a new array. So the extra safety of RAID 5 is more apparent than real, and it’s slower than RAID 0.”
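The link between an MTBF rating and an annual failure rate in Robin's comment can be sketched as follows. This is only a rough illustration, assuming a constant failure rate (an exponential model) and a drive that is powered on all year; the function name and the constant are introduced here for illustration and do not come from any of the contributors.

```python
import math

# Assumption: the drive runs 24 hours a day, 365 days a year.
HOURS_PER_YEAR = 24 * 365


def annual_failure_rate(mtbf_hours: float) -> float:
    """Approximate chance that one drive fails within a year.

    Assumes a constant failure rate, which ignores the higher failure
    rates often seen in very new and very old drives.
    """
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


if __name__ == "__main__":
    rate = annual_failure_rate(300_000)
    print(f"300,000-hour MTBF -> about {rate:.1%} per year")
```

Under these assumptions a 300,000-hour MTBF works out to just under 3% per year, which sits at the low end of the 3-4% range quoted above.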

Everything I thought I understood about RAIDs taught me that RAID 5 is better than RAID 0. On the other hand, I didn’t have Robin’s experience. So, I sent his comments to storage experts I knew in the industry and asked them to respond.

I found the discussion fascinating, which is why I’m sharing it with you.

MY QUESTIONS

Based on Robin’s comments, I asked everyone the following two questions:

  1. Which RAID should media producers use – RAID 0 or RAID 5… and why?
  2. What criteria should we use to choose RAID 0 or RAID 5 — assuming the RAID supports both?

COUGHLIN & ASSOCIATES www.tomcoughlin.com

Tom Coughlin heads a consulting company, Coughlin & Associates, that covers the entire storage industry. Tom himself is an engineer with over 30 years of experience in media and storage technology.

Thanks, interesting observations. Here are some comments:

First of all, to use RAID 5 you need at least three HDDs. Thus we need to compare 3+ HDD RAID 0 vs. 3+ HDD RAID 5. RAID 0 stripes the data across all the drives, and the failure of any one drive will cause significant data loss. The actual MTBF depends upon the drive type. HDD MTBF ratings can vary between 300,000 and 1.2 million hours. A higher rating may improve the odds that you will not have a drive failure, but in a RAID 0 configuration, if you do have a drive failure, you will lose data. In RAID 5 you can lose one drive and, assuming that you can rebuild the data from the failed drive on a new drive before another drive fails, you will not actually lose any data. A low-cost array may have a power supply that would fail more often than a drive, but if the power supply fails in the wrong way at the wrong time it could corrupt one of the drives. If that failed drive was in a RAID 0 configuration you would have data loss even if you fixed or replaced the power supply. In the case of RAID 5, if you replaced the bad power supply you could recover the data from that failed drive.

So the data security on RAID 5 is real but there are other ways to make your data more secure. If you want the speed of a RAID 0 system you can have two of these that mirror each other–at double the cost of course. That way if one RAID 0 system goes down you have another copy and can rebuild it on a repaired RAID 0 system to retain the redundancy. Also there are systems with double drive parity vs. single drive parity (RAID 6), so two drives have to fail at one time to lose data, but this probably only makes sense for larger storage arrays. Also if you really want to avoid system downtime (vs. data loss) you may want to have power supply and other critical component redundancy or to have backups or mirrors of your storage system.

So it depends upon what sort of odds you are betting on… In general I suggest having data in more than one place and accessible quickly in case your primary storage goes down.

OWC www.macsales.com

OWC was founded in 1988 and has long been a supplier of products for the Mac, including manufacturing their own storage products. Larry O’Connor is the CEO and founder of OWC.

Well – I’d hit that a couple ways….

One bit – we already have a lot of customers RAIDing across two ThunderBays with up to 8 drives. This is very stable and easy to do too. Some doing 6 drive RAID 5 and 2 drive RAID 1 mirror even for important archive. In any event – with 6 drives you pretty much max out Thunderbolt 10Gb/s under RAID 5… and with 8 Drives you pretty much get there with Thunderbolt 2 20Gb/s as well… so – performance is at max while still having redundancy where a little additional performance is sought.

As for the whole drive failure bit and power supply mention. The probability of our power supplies failing on any shipping unit today is a tiny fraction of the probability of a drive failure of a drive less than 1 year old and that is the case for years with the power supplies we utilize and applies 3 years from now adding a new drive to an enclosure as well (tiny fraction probability to power supply failing vs. a brand new drive failing within first year)… and of course, the probability of a drive failure goes up substantially over time as well. We go overboard with highly rated, robust power supplies with all of our solutions because of the importance thereof. I’ve seen competitor multi-Drive 3.5″ based RAID solutions that have a weaker/lower output than our SINGLE drive 3.5″ solutions… and for sure, that is a factor in those crap solutions failing earlier than they would (and other component factors too..) with a better power supply.

With any SoftRaid created array – you can move the drives to nearly ANY JBOD enclosure or even direct SATA bus bays – or a mix – and access / mount / use the RAID set. So – in the event you did have an enclosure failure – as by design we’re open vs. a closed/proprietary – there is great flexibility in accessing the RAID set via other available hardware and of course a replacement enclosure as well.

If I have been working on a project all day and I lose it because a drive fails – I’d argue there is a high cost for the time and creativity vested that is lost in such an event and that the loss thereof is preventable with a RAID 5 instead. Many applications can’t use all the speed that a RAID 0 would give vs. a RAID 5 in the first place too… if you’re working real-time, higher disk I/O isn’t making you get it done faster as real-time is real-time.

The tradeoff of a little performance to have that redundancy seems a very favorable one in my opinion. It’s no different than why driving without insurance is rather unsettling vs. having that safety net. Probabilities don’t matter when you lose something that you put time into… and, while a RAID 5 is not a replacement for a good backup strategy and rather should be a part thereof, a lot of pros aren’t the greatest with timely backups and that’s just another place where the RAID 5 can save someone’s butt.

Fixed hardware raids, by comparison, offer less performance and far less recoverability in the event the hardware itself fails vs. a drive.

Regardless – I’d always go RAID 5 vs. RAID 0 for anything that had work output. A RAID 0 as a scratch disk with the output going to another redundant volume, ok… but with a RAID 0 – it’s just one badly timed drive failure that could cost someone exponentially more than the cost of the actual hard drive that failed. Oh – and of course one other serious SoftRAID feature is the intelligent drive health monitoring and notification. If you’re going to go RAID 0 – it’s nice to have the possibility of a decent warning before the worst happens… and in general with RAID 1, 4, 5 too – nice to have the ability to be proactive there as well. Most drive monitoring we’ve seen in other hardware and software is typically telling you the drive has a problem at a point where a failing drive may already be causing issues / slow access / etc – if you have such warning at all. SoftRAID has well developed predictive health monitoring that is far better, in what I’ve seen, for giving a real opportunity to avoid disaster before there are even signs of such occurring.

That’s my opinion. 🙂

PROMISE TECHNOLOGY www.promise.com

Promise Technology is a global leader in the storage industry, and well-known amongst creative media professionals for its Pegasus2 desktop RAID storage and VTrak RAID storage subsystem – both of which are sold through Apple. The company makes RAID 0, 1, 5, 6, 10, 50, 60 systems.

Elaine Kwok, Product Marketing Manager for Promise Technology, writes:

For prosumer or professional backup, RAID 5 redundancy is much preferred for HDDs, or spinning disks. Without redundancy, any time one disk goes down, which can be quite likely, all your data is lost. With a Pegasus2 RAID 5, 6, or 10 solution, what you have is a robust portable solution with the versatility to ingest from a camera and the capability of directly migrating this data to a shared storage solution for multi-user collaboration, via Sanlink2 8g FC and the VTrak a3800 storage appliance.

RAID 0 solutions tend to be for single users who are extremely cost sensitive and just require basic storage. While acceptable for SSD solutions, there is a very high risk of data loss with HDDs, or spinning disks. There is minimal protection, and this level of protection is usually fine for the average consumer. Many typical desktop basic storage products only utilize RAID 0 to keep costs down. However, any consumer who wants peace of mind without having to make an additional duplicate copy may find the additional investment for a RAID 5, 6, or 10 solution worth it for what might be at risk of loss.

The trend is that while video editing teams are no longer in the 20-30s, smaller workgroups are emerging that require shared storage at a subsystem level, with the same kind of availability. Most consumer grade products cannot cover this space. But Promise’s expertise is in the subsystem space with the VTrak x30 – and the VTrak A3800 based on this subsystem addresses this shared storage space. The Pegasus2 with RAID 5 is the DAS solution that can make this linkage.

Victor Pacheco, Director of Field Applications and Support at Promise, adds:

[Back in 2007, I wrote a white paper on using RAID 50 or RAID 60 in high-capacity RAID systems. It’s about the high probability of disk drive failure. Back then we limited the number of spindles per RAID 5 and RAID 6 to 16 drives. Now we support up to 24 in our VTrak-series products. Since then our error handling has improved and drives are less and less prone to failure but these do happen.]

[The answer is that it’s] about what the customer is willing to afford.

1. Which RAID should media producers use – RAID 0 or RAID 5… and why?
Promise today only ships with RAID 5 for spinning disk; it also offers advanced configurations with RAID 6 (in all Promise products, including the Pegasus series).

RAID 0 is suited for solid state media.

2. What criteria should we use to choose RAID 0 or RAID 5 – assuming the RAID supports both?

For spinning disk I would say RAID 5 at a minimum. Specifically if the customer does not require the additional bandwidth and can afford to simply reinitialize the array in the event of a drive failure with RAID 0.

G-TECHNOLOGY www.g-technology.com

G-Technology delivers premium storage solutions for audio/video production, photography and the professional content creation market. The company makes both RAID 0 and RAID 5 systems. Mark Anderson is a solutions engineer at G-Technology.

1. Which RAID should media producers use – RAID 0 or RAID 5… and why?

Whether or not the chance of a power supply failure were as great as that of a drive failure (which I do not believe to be true, though I don’t have data to confirm that), there is still a big advantage to RAID 5 in terms of data protection:

If you have a power supply failure, whether running RAID 0 or RAID 5, you can, for the most part, swap the drives into another like enclosure (or replace the power supply) and continue working without data loss – it will only cost you the time it takes to fix or replace the enclosure.

If, however, you have a drive failure, it’s a completely different story – in RAID 5, your data is still safe, and you can continue working (plus, once you swap the failed drive out, the array will rebuild in the background); in RAID 0, however, you’re sunk. You lose everything since you last backed up. Plus, depending on the form of back-up you’re using (tape, a slower drive or drives, etc…), you also lose the time that it takes to copy all of that data onto a new array before you can get back to work.

2. What criteria should we use to choose RAID 0 or RAID 5 — assuming the RAID supports both?

I choose RAID 5 whenever possible. I would only choose RAID 0 if the requirements of the project I was working on dictated the highest level of performance possible (and exceeded the performance of my system in RAID 5). And if I did that, I’d make sure I also had a relatively fast backup system constantly connected and synchronizing frequently via Chronosync or a similar application.

DROBO www.drobo.com

Drobo makes a variety of storage products, most of which are RAIDs using their proprietary RAID technology, which most closely emulates RAID 5. Mark Fuccio is their business development manager with long-time experience in the storage industry.

You always ask good questions. I’m happy to share some facts, deductions, and opinions.

Fact: With RAID 0 a drive failure will trigger loss of all data stored on the array.

Fact: Drive failure rates are not constant over time. They have a “bath tub” curve. Young drives fail at a high rate, which decreases as the drive is used; seasoned drives that don’t die due to infant mortality have a low failure rate that lasts for years; older drives have an increasing failure rate.

Fact: Until last year, information on drive reliability had been published in hard-to-find conference proceedings or obscure journals. In November 2013, cloud backup vendor Backblaze published information about drive failure rates they experienced in their data center: http://blog.backblaze.com/2013/11/12/how-long-do-disk-drives-last/ The article clearly shows the “bath tub” reliability curve.

Deduction: Using a RAID 0 puts data at a significantly higher risk of total loss than using RAID 5. Especially in newer arrays.

Fact: In an N drive RAID 0 array, the chance of failure is N times larger than the failure rate of [a single] drive.

Opinion: Unless RAID 0 is needed for performance reasons, I recommend all editors use a RAID 5 configuration for editing. For longer term storage, I recommend RAID 6, or equivalents, that will protect against two drive failures.

Opinion: Editors using RAID 0 must have a robust backup system to protect their work from the inevitable drive failure.

Opinion: Comparing hard drive failure rates to power supply failure rates is a specious argument; it’s the proverbial comparing apples to oranges. I think Robin is too focused on items that cause service interruption. But there is a world of difference between not having access to a system because of a failed supply and not having access because a drive failure triggered full data loss.

Summary: Use RAID 5 to ensure data is protected. Only use RAID 0 if necessary for performance.

SOFTRAID http://www.softraid.com

SoftRAID is a development company, founded in 1996, that creates software and drivers for RAID systems that are used by a variety of vendors; OWC is one company that uses them. Tim Standing is the president, founder and lead developer for SoftRAID and has been developing software since 1986.

I strongly disagree [with Robin’s statement]. Out of the 25 disks we used for testing SoftRAID version 5, we had 6 fail, all with less than 1,000 hours on them (which is 6 months of 8 hours/day). While 4 of these were the Seagate disks which are now known to be unreliable (http://blog.backblaze.com/2014/01/21/what-hard-drive-should-i-buy/), I didn’t know that when I started using them. In the future, a video editor won’t know that a particular model of disk is unreliable when they purchase them either.

When I do the math, with 4 disks which each have a 4% chance of failing in a given year, one has a 15% chance of losing at least one of them during a given year. The calculation is the percentage chance of not having a problem (96% or 0.96) for each disk multiplied together, or 0.96 * 0.96 * 0.96 * 0.96 = 0.849 for 4 disks, which leaves a 1 - 0.849 = 0.151 chance of at least one failure. With a 2% failure rate and 4 disks, we’re still at an 8% chance of having at least one fail in a year.
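The calculation above can be sketched in a few lines of Python. This is a minimal illustration that assumes each disk fails independently of the others and with the same yearly probability; the function name is made up for this example. It also prints the simpler "number of drives times failure rate" rule of thumb for comparison.

```python
def chance_of_any_failure(per_drive_rate: float, drives: int) -> float:
    """Chance that at least one of `drives` disks fails in a year.

    Assumes failures are independent and every disk has the same
    yearly failure probability `per_drive_rate` (e.g. 0.04 for 4%).
    """
    chance_all_survive = (1.0 - per_drive_rate) ** drives
    return 1.0 - chance_all_survive


if __name__ == "__main__":
    for rate in (0.04, 0.02):
        exact = chance_of_any_failure(rate, 4)
        rule_of_thumb = rate * 4  # slightly overstates the exact figure
        print(f"{rate:.0%} per disk, 4 disks: "
              f"exact {exact:.1%}, rule of thumb {rule_of_thumb:.0%}")
```

For four disks the exact figures come out at about 15.1% and 7.8%, against 16% and 8% from the rule of thumb, so the shortcut is close when per-disk failure rates are small.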

I have been using RAID volumes for my primary work Mac since SoftRAID started supporting booting in 2004. Twice during that time, I have had a disk fail. On each occasion, I was under deadline pressure, once to get a new version of SoftRAID out and the second time on a deadline for a magazine article. It was so reassuring to be able to just keep working, order the replacement disk and know that I could repair the degraded RAID volume when I had time after the deadline rather than having to go reinstall a new OS, reinstall all my apps and restore my work from the backup on the server.

Robin has been unusually lucky with disks.

One more comment: SoftRAID has the ability to detect when disks are much more likely to fail. This feature, which we call predicted disk failure, is based on a study by Google of over 100,000 disk drives and uses the SMART parameters they identified as being predictive of disk failure (http://research.google.com/archive/disk_failures.pdf). Of the six disks which failed during testing, 4 of them were predicted to fail by SoftRAID before they failed completely. When they were predicted to fail, they continued to function normally for a week or two before they just stopped working completely. The SoftRAID Monitor indicator in the menu bar turns red when a disk is predicted to fail so you know that there’s something wrong.

[Finally,] in the two years of testing and development of SoftRAID version 5, none of the 6 JBOD boxes we have used has experienced a power supply failure.

If you want to take this one step further, I would advocate using RAID 1+0 storage over RAID 5 for truly critical work. If you have the money for two enclosures, say two ThunderBay IVs, you could put each one on a separate Thunderbolt port. This would give you the performance of a 4 disk RAID 0 array and you would be completely protected from disk, power supply and interface failures.

Yesterday, I did a quick benchmark and, yes, an 8 disk RAID 1+0 array does give the exact same read and write speeds as a 4 disk RAID 0 using SoftRAID. I know it’s another $1,000 but if the project is really important, it might be worth it.

We have been listening to customers’ requests for the past 4 years for additional RAID levels. While a lot of people request RAID 5, about 20% of them have been asking for RAID 1+0. When I asked those requesting RAID 1+0 why it was so important, they typically said that they had used a RAID 5 hardware solution and lost data on it, either from a power failure or hardware RAID controller failure. They then gravitate towards a RAID 1+0 solution because of its inherent reliability and simplicity. I know this is a whole new topic of discussion but it might be interesting for another posting to your blog.

SUMMARY

For maximum speed and given the same number of drives, RAID 0 will be the fastest. However, as you just read, for media creators, the best option is to purchase a system that supports RAID 5.

I enjoyed putting this article together and want to thank everyone that contributed – and, especially, Robin, for getting this entire conversation started.

As always, let me know what you think.



5 Responses to What’s the Best Way to Configure a RAID?

  1. Richard Oettinger says:

    Hi Larry,

    There is one key point to this conversation – how much is your data worth?

    I recommend to all of our customers that they use RAID6 for any arrays that will contain data that is valuable because it may be the only copy of otherwise un-replaceable material, such as that captured during a shoot, or a long-term archive.
    The read performance of RAID6 is the same as RAID5, while the write performance is only a small percentage slower, and the cost is only the capacity of one of the array’s drives.
    Editing of a copy of the material can be done on a RAID5 if you desire to squeeze that extra drive’s worth of capacity out, but using RAID6 to edit should be practical for all but the most demanding applications.

    But the main concerns should be how the array is protected in the event of a drive failure, and the array recovery process.
    Keep in mind that for an array using 3TB or 4TB drives a rebuild could take up to a week for ‘consumer’ type desktop subsystems, and during that time other events could occur that can jeopardize the data.
    To illustrate the advantages of RAID6 over RAID5, let’s assume you have a RAID5 array where a drive has failed, has been replaced, and the rebuild process started; during the period that the array is in a “critical” state where it can’t tolerate the loss of another array member, a second drive could fail to respond, leading to the loss of the data or the start of an expensive recovery process.
    Another aspect to consider is that during the RAID5 rebuild, one or more of the other member drives could encounter uncorrectable read errors for one or many sectors, which will result in the loss of a chunk of your data. That could mean the loss of a fraction of an image or thousands of database records or emails.
    With a RAID6 array – and its extra layer of parity data – your data is protected from both situations. This is why we say that a RAID6 array with one drive member missing is only “degraded” and not “critical”.

  2. LarryJ says:

    Richard:

    Thanks for your comments, they are very helpful.

    My experience is that a RAID 6 configuration lowers performance in both read and write operations by the speed of one drive. However, I agree with you that for most editing (excluding multicam and high-resolution RAW or uncompressed files) the loss in performance is not significant.

    However, there is also the loss of storage capacity that RAID 6 requires. Assuming a 4 drive RAID, RAID 0 stores data to all four drives. RAID 5 stores data to 3 of the 4 drives (the 4th is reserved, as you know, for parity data). A RAID 6 stores data to 2 drives with the other 2 reserved for parity data.

    It becomes a trade off of budget vs. performance vs. security. A RAID 5 is far superior to a RAID 0 or single hard disk, but for many editors, out of reach financially. When editors need the maximum of storage with the maximum of speed with the minimum of cost, decisions get difficult.

    I’m grateful for your time in explaining the advantages of RAID 6.

    Larry
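The capacity trade-off described in the reply above can be sketched as a small calculation. This is only an illustration, assuming identical drives and the standard single-parity (RAID 5) and double-parity (RAID 6) layouts; in practice parity is spread across all the drives rather than kept on dedicated ones, but the usable capacity is the same. The function names and the 4 TB drive size are assumptions for the example.

```python
# Drives' worth of capacity given up to parity, per RAID level.
PARITY_DRIVES = {"RAID 0": 0, "RAID 5": 1, "RAID 6": 2}

# Smallest array each level can be built on.
MINIMUM_DRIVES = {"RAID 0": 2, "RAID 5": 3, "RAID 6": 4}


def data_drives(level: str, drives: int) -> int:
    """Number of drives' worth of space left for data."""
    if level not in PARITY_DRIVES:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < MINIMUM_DRIVES[level]:
        raise ValueError(f"{level} needs at least {MINIMUM_DRIVES[level]} drives")
    return drives - PARITY_DRIVES[level]


def usable_tb(level: str, drives: int, drive_tb: float) -> float:
    """Usable capacity in TB, assuming every drive is the same size."""
    return data_drives(level, drives) * drive_tb


if __name__ == "__main__":
    for level in ("RAID 0", "RAID 5", "RAID 6"):
        print(f"{level}: {data_drives(level, 4)} of 4 drives hold data, "
              f"{usable_tb(level, 4, 4.0):.0f} TB usable with 4 TB drives")
```

With four drives this gives 4, 3, and 2 drives' worth of data space for RAID 0, 5, and 6 respectively, matching the figures in the reply.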

  3. Ken Jansen says:

    Say someone was using an 8bay system and Raid 6, would the read write speeds be somewhere along the lines of a 6 bay drive running RAID 0 – less some percentage of performance for the extra reading and writing? So if the 8 Bay system running RAID 6 the read write speeds might be in the ballpark of 1GBps less some performance for the extra reading and writing – Hypothetically 800GBps or so? I am asking to make sure I understand the article and what would be a good choice for me.

    Also, why would it take a week to have a replacement drive ‘catch up’ or am I not understanding that part correctly?

    Thanks.

    • LarryJ says:

      Ken:

      Hmmm… I hadn’t thought of a RAID in this way… You are essentially correct, an 8-drive RAID 6 would be roughly the same speed as a 6-drive RAID 0. And, for a ballpark speed, I would expect something around 720 MB/s.

      The reason it takes so long to rebuild a new disk to fit into the RAID is that, in order to reconstruct the parity data, every bit in every sector on the drive needs to be read and adjusted. For multiple terabyte drives, this takes a LONG time.

      Larry

  4. Ken Jansen says:

    Thanks Larry,

    I meant 800MBps not GBps 🙂

    so the rebuild speed of a new disk would be bottlenecked by the read/write speed of the new individual disk. So 2TB of data would take (2,000,000 MB / 5 MBps) about 4.6 days to rebuild. Wow that is a long time. I think I understand it now though, assuming the above is correct.

    Thanks!
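Ken's rebuild estimate can be checked with a short calculation. This is a rough sketch that assumes the rebuild runs at one steady rate from start to finish; the 5 MB/s figure is the one from his comment, while the 50 MB/s figure is purely hypothetical and included only to show how sensitive the result is to the rebuild rate. Actual rebuild speeds depend on the controller, the drives, and how busy the array is.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def rebuild_days(capacity_mb: float, rate_mb_per_s: float) -> float:
    """Days needed to rewrite `capacity_mb` at a steady `rate_mb_per_s`."""
    if rate_mb_per_s <= 0:
        raise ValueError("rebuild rate must be positive")
    seconds = capacity_mb / rate_mb_per_s
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    two_tb_in_mb = 2_000_000
    # 5 MB/s is the rate from the comment; 50 MB/s is a hypothetical rate.
    for rate in (5, 50):
        print(f"2 TB at {rate} MB/s: {rebuild_days(two_tb_in_mb, rate):.2f} days")
```

At 5 MB/s the result is about 4.6 days, which agrees with the estimate in the comment.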
