[ Updated Feb. 10, 2019, to bring this current with media today.]
A common question in my email is the one Jerry Thompson asks:
While I am interested in performance and speed between [Thunderbolt and USB 3], I find I am not completely understanding all I need to regarding RAID technology.
Or, as Craig McKenna writes:
[I recently bought] a 120 GB external SSD with Thunderbolt, I’m wondering how you would go about organizing my media.
I’ve spent a lot of time reviewing specific storage products. In this article, I want to take a step back and discuss storage performance in general.
A RAID (Redundant Array of Inexpensive Disks) is a collection of storage devices (hard drives or SSDs) that create a pool of storage that is both very large and very fast. To the computer, and on your desktop, it looks like a single very big, very fast hard drive. Generally, a RAID stores all the drives in a single box with a single connection to the computer.
NOTE: A traditional hard drive is often called “spinning media” to distinguish it from SSDs, which don’t spin.
RAIDs can be controlled using software on your computer or hardware built into the RAID chassis. There are advantages to each. In the past, hardware controllers were the fastest option. Today, they are essentially tied for speed.
To get the best performance from a RAID, it should be attached to your Mac via Thunderbolt. Thunderbolt 2 RAIDs transfer data up to 1,400 MB/sec, while Thunderbolt 3 RAIDs transfer data up to 2,800 MB/sec. RAID performance depends upon the number of drives in the RAID. For the highest speed, use SSDs; for the lowest cost and greatest storage capacity, use spinning media.
RAIDs which are contained in servers are limited by the speed of the Ethernet connection. 1 Gb Ethernet transfers data up to 120 MB/sec. 10 Gb Ethernet transfers data up to 1,200 MB/sec, depending upon the number of drives in the server, the speed of your data switch and cabling, and how many other users are accessing the same server at the same time. In other words, server speed varies.
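To see where Ethernet figures like these come from, here is a minimal sketch that converts a link's raw line rate from gigabits to megabytes per second. The function name is made up for illustration, and the math ignores protocol overhead, which is why real-world numbers (like the 120 and 1,200 MB/sec quoted above) come in a bit lower than the raw result.

```python
def line_rate_mb_per_sec(gigabits_per_sec: float) -> float:
    """Convert a raw link speed in gigabits/sec to megabytes/sec.

    Hypothetical helper for illustration only. It uses decimal units
    (1 Gb = 1,000 Mb) and ignores all protocol overhead.
    """
    megabits_per_sec = gigabits_per_sec * 1000
    return megabits_per_sec / 8  # 8 bits in every byte


for link in (1, 10):
    raw = line_rate_mb_per_sec(link)
    print(f"{link} Gb Ethernet: about {raw:.0f} MB/sec before overhead")
```

The takeaway is simply that network speeds are quoted in bits while storage speeds are quoted in bytes, so divide by eight before comparing them.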
RAIDs are categorized into “levels,” which describe a combination of speed, redundancy, and price.
NOTE: “Redundancy” is defined as the ability to recover data in the event one, or more, hard drives die. This won’t protect you if you erase a file.
For the purposes of this example, let’s assume each of the RAIDs below uses 4 TB spinning media drives which transfer data at 150 MB/second. (In general, a single spinning media drive transfers data between 125 – 175 MB/sec, an SSD transfers data around 400 MB/sec, and the new NVMe solid state drives transfer data around 2,800 MB/sec. Faster performance costs more.)
RAID 0 – Fast, inexpensive, but no data redundancy. Requires a minimum of two drives inside the RAID enclosure. The more drives you add, the faster the performance, as performance and storage capacity are the sum of all drives in the RAID. However, if you lose one drive, you’ve lost ALL your data. Most often used when speed combined with low cost are paramount. In our example, a 2-drive RAID 0 would have 8 TB of storage and transfer data around 300 MB/sec.
RAID 1 – Complete data redundancy. Generally only uses two hard drives inside the RAID enclosure. Often called “mirroring,” each drive is a complete copy of the other. Most often used for backing up servers or when on-set for DIT media work. Has the speed and capacity of the slowest single drive in the system. In our example, a 2-drive RAID 1 would have 4 TB of storage and transfer data around 150 MB/sec.
RAID 3 – Medium-fast, data redundancy. Requires a minimum of three drives, as one drive is reserved solely for parity data. Should one drive die, your data is safe. This technology is no longer in common use, replaced by the faster performance of RAID 4 or 5 systems.
RAID 4 – Very fast, data redundancy. Similar to RAID 3, requires a minimum of three drives, as one drive is reserved solely for parity data. Should one drive die, your data is safe. This is the preferred RAID format for SSD drives because of how the data is stored on the drives. When compared to a RAID 5, RAID 4 with SSDs is about 25% faster on reads. Performance is based on the number of drives in the system.
RAID 5 – Very fast, data redundancy. Requires a minimum of three drives and shares parity data across all drives. Most often found with four or more drives inside. If one drive dies, your data is safe. This is the preferred choice for RAIDs containing spinning media. These are used for both locally-attached storage and servers. Performance is based on the number of drives in the system.
RAID 6 – Fast, extra data redundancy. Requires a minimum of four drives. This version protects your data in the event two hard drives die at the same time. More expensive than RAID 5, but, generally, the same physical size. Like the RAID 5 this is most often used connected to just one computer. Not as fast as a RAID 5. Performance is based on the number of drives in the system.
RAID 10 (or 1+0) – VERY fast, totally redundant. Requires a minimum of four drives, and is most often created by striping data (RAID 0) across two or more mirrored pairs of drives (RAID 1). This provides the speed equivalent of a RAID 0, with the data redundancy of RAID 1. As RAIDs continue to drop in price, this can be a less-expensive way to create systems that rival the performance of a RAID 50. Performance is based on the number of drives in the system.
RAID 50 – VERY fast, data redundancy. Generally the domain of very large RAIDs, this format combines the speed of RAID 0 with the redundancy of RAID 5 by dividing the RAID into sections, where you can lose a drive in each section without losing data. These systems generally cost more than $10,000 and contain at least twelve drives. Generally used in network and server situations where multiple users need to access the same data.
RAID 60 – VERY fast, extra data redundancy. Generally the domain of very large RAIDs, this format combines the speed of RAID 0 with the redundancy of RAID 6 by dividing the RAID into sections, where you can lose two drives in each section without losing data. These systems generally cost more than $10,000 and contain at least twelve drives. Generally used in network and server situations where multiple users need to access the same data. Performance is based on the number of drives in the system.
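To make the capacity trade-offs between levels concrete, here is a rough sketch that estimates usable storage for the simpler RAID levels, using the 4 TB drives from the example. The function name and its parameters are invented for illustration. It assumes identical drives, uses the standard rule that RAID 3, 4 and 5 give up one drive's worth of space to parity, RAID 6 gives up two, and RAID 10 gives up half, and it ignores formatting overhead.

```python
def usable_tb(level: str, drives: int, drive_tb: float = 4.0) -> float:
    """Estimate usable capacity in TB for a RAID of identical drives.

    Hypothetical helper for illustration; real products reserve some
    additional space for formatting and housekeeping.
    """
    if level == "0":
        minimum, data_drives = 2, drives       # no redundancy at all
    elif level == "1":
        minimum, data_drives = 2, 1            # every drive is a copy
    elif level in ("3", "4", "5"):
        minimum, data_drives = 3, drives - 1   # one drive's worth of parity
    elif level == "6":
        minimum, data_drives = 4, drives - 2   # two drives' worth of parity
    elif level == "10":
        minimum, data_drives = 4, drives // 2  # half the drives are mirrors
    else:
        raise ValueError(f"RAID level {level!r} is not covered by this sketch")
    if drives < minimum:
        raise ValueError(f"RAID {level} needs at least {minimum} drives")
    return data_drives * drive_tb


for level, drives in [("0", 2), ("1", 2), ("5", 4), ("6", 4), ("10", 4)]:
    print(f"RAID {level:>2} with {drives} drives: {usable_tb(level, drives):.0f} TB usable")
```

With four 4 TB drives, this gives 12 TB for RAID 5 but only 8 TB for RAID 6 or RAID 10, which is the price of the extra protection.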
NOTE: Drobo is a special case. In general, all RAIDs must use drives of the same size and speed. As well, all drives need to be installed at the time you first power up the system. Drobo, on the other hand, has invented a technology which allows you to add drives, or mix and match drives of different sizes, even after you’ve put the RAID into operation. While Drobo does not provide the fastest RAIDs, this flexibility can be a significant benefit.
SIDEBAR: HOW DATA REDUNDANCY WORKS
This is so cool… This works because all digital data is stored as either a 1 or a 0.
Imagine a 3D checkerboard — let’s make it 5 stories high. Look down on the top left square and count the number of checkers on that square for each of the top four layers.
If they total an odd number, put a checker on the same square on the bottom layer. If they total an even number, don’t put a checker on the same square on the bottom layer.
Now, remove the second layer with all its checkers, and put in a new, empty checkerboard to take its place. By counting the number of checkers on the remaining top three layers and comparing the total to the indicator on the bottom layer, you can exactly rebuild all the missing checkers on the second layer. For example, if the total of the other three layers is even, and there’s a checker on the bottom layer, add a checker to the new layer. If the total of the other three layers is odd, and there’s a checker on the bottom layer, don’t add a checker to the new layer.
This is exactly how RAID redundancy works, except each checkerboard represents a hard drive in the RAID. The bottom layer, which provides data redundancy, doesn’t need to know which drive failed; it only needs to compare the totals on all the different hard disks with the total stored on the redundancy disk in the RAID. This technique works whether you have three drives – the minimum – or twenty drives. The only difference is that more drives take longer to count.
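For readers who like to see it in code, the checkerboard trick is the "exclusive or" (XOR) operation: counting whether a stack of checkers is odd or even is the same as XOR-ing the bits together. Here is a minimal sketch with four made-up "data drives" of two bytes each. The byte values and names are invented for illustration, and a real RAID controller works on much larger blocks.

```python
from functools import reduce
from operator import xor


def parity(blocks):
    """XOR equal-length byte blocks together, one column at a time."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


# Four data "drives" (the top four checkerboard layers), invented values.
data = [b"\x0f\xa0", b"\x33\x01", b"\xf0\x10", b"\x55\xff"]

# The parity "drive" (the bottom layer).
parity_block = parity(data)

# Lose the second drive, then rebuild it from the survivors plus parity.
lost_index = 1
survivors = [block for i, block in enumerate(data) if i != lost_index]
rebuilt = parity(survivors + [parity_block])

print("Rebuilt drive matches original:", rebuilt == data[lost_index])
```

The same `parity` function both creates the redundancy data and rebuilds a missing drive, which is why the technique scales to any number of drives.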
An SSD (Solid State Drive) is essentially RAM that has been configured to act like a regular hard disk. You copy and move files around in it the same as a hard disk. Unlike RAM, an SSD remembers your data when the power is turned off. Depending upon which version of the operating system you are using, SSD performance ranges from “so-so” to blinding. Later versions of the Mac operating system do a much better job supporting SSD drives. In fact, the new APFS file system from Apple is specifically designed for SSD storage. (Here’s a link to learn more about APFS.)
The big benefit an SSD provides is speed. Its two big limitations are cost and limited storage size.
While you can put an SSD drive anywhere you can put a “normal” hard disk – which we often call “spinning media,” the best place to put an SSD drive is inside your computer as a replacement for your boot drive. SSDs in RAIDs are very fast, but also much more expensive than spinning media and they don’t hold as much.
If performance is critical, create a RAID using all SSDs. If storage capacity is most important, create a RAID using spinning media.
NOTE: There is a limitation of SSD, however, in that it only allows a certain number of writes before the unit starts to fail. While the overall longevity of SSD is still being determined, for now, assume that you will need to replace an SSD drive sooner than a spinning media drive – probably after 3-4 years of normal use.
iCloud, and other Internet services like Dropbox and YouSendIt, are essentially file servers that store your files outside of your computer.
If we ignore issues like file security, these services are excellent for backing up data, sharing files between devices, and moving files between computer systems. However, they are not good for storing source media files for editing. It isn’t because they don’t store enough. Just the opposite, these services can store a vast amount of data. The problem is that the connection speed – called the “data transfer rate” – between your computer and iCloud is too slow. Video editing requires data transfer rates far beyond anything supplied by even the fastest DSL or cable modem.
Use the Cloud for sharing, but not for storing or editing source media files.
New Cloud-based services are appearing which allow media editing in the Cloud by using proxy files. These can be helpful to remote editors, but the challenge remains in how long it takes to upload media to the Cloud.
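To get a feel for the upload problem, here is a small sketch that estimates how long it takes to push media to the Cloud. The function name, the 450 GB project size, and the 10 megabit/sec upload speed are all made-up numbers for illustration. Plug in your own figures, and remember that real uploads rarely sustain the advertised speed.

```python
def upload_hours(media_gb: float, upload_megabits_per_sec: float) -> float:
    """Estimate hours needed to upload media at a steady upload speed.

    Hypothetical helper for illustration. It uses decimal units
    (1 GB = 1,000 MB) and assumes the connection never slows down.
    """
    megabits = media_gb * 1000 * 8            # GB -> MB -> megabits
    seconds = megabits / upload_megabits_per_sec
    return seconds / 3600


# Assumed example: 450 GB of source media on a 10 Mb/sec upload link.
hours = upload_hours(450, 10)
print(f"About {hours:.0f} hours, or roughly {hours / 24:.1f} days")
```

Proxy files help because they are a small fraction of the size of the camera originals, which shrinks the upload time by the same fraction.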
WHAT IS THUNDERBOLT?
Thunderbolt is a method for connecting monitors and hard disks to your system. In this regard it is just like FireWire or USB – it’s a cable and communication protocol that moves data to and from your computer and storage.
The big benefit to Thunderbolt is that it is REALLY fast! More than 2 GB/sec of data transfer speed! However, in order for that speed to be realized, you need a REALLY fast RAID. A two-drive RAID 0 won’t begin to fill a Thunderbolt “pipe.”
Thunderbolt is how you connect your drive to your computer. The speed you get will depend upon the speed of the RAID you have attached. Here are some very general expectations for data transfer:
NOTE: A single drive connected via Thunderbolt will be only marginally faster than the same drive connected via FireWire. In order to see significant performance improvement, you’ll need to use a RAID that contains at least four hard disks.
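As a rough illustration of why a small RAID can't fill the Thunderbolt "pipe," this sketch estimates how many drives it would take to reach the Thunderbolt 2 and Thunderbolt 3 RAID speeds quoted earlier in this article (1,400 and 2,800 MB/sec). The function name is invented, and it assumes the ideal case where speed is simply the sum of all drives, as in a RAID 0. Real RAIDs lose some speed to parity and controller overhead.

```python
import math


def drives_to_fill(link_mb_per_sec: float, drive_mb_per_sec: float) -> int:
    """Estimate the drives needed to match a connection's top speed.

    Hypothetical helper for illustration. Assumes ideal scaling,
    where total speed is the sum of every drive's speed.
    """
    return math.ceil(link_mb_per_sec / drive_mb_per_sec)


links = {"Thunderbolt 2": 1400, "Thunderbolt 3": 2800}   # MB/sec, from this article
media = {"spinning media": 150, "SSD": 400}              # MB/sec, from this article

for link_name, link_speed in links.items():
    for media_name, media_speed in media.items():
        count = drives_to_fill(link_speed, media_speed)
        print(f"{link_name}: about {count} {media_name} drives")
```

Even under these generous assumptions, spinning media needs ten or more drives to keep up with Thunderbolt, which is why the connection is rarely the bottleneck.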
GETTING THE BEST PERFORMANCE
For best performance, I recommend purchasing your computer with an SSD as the boot drive. (Fusion Drives are a good alternative, where an SSD is combined with a spinning hard drive. This yields excellent performance with extended storage capacity.)
In general, media should not be stored on your boot drive. This means that only applications and the operating system are stored on the boot drive – along with other files that tend to be small, like email or word processing documents. If you have a large iTunes collection, or large iPhoto library, moving them to an external drive may allow better performance.
If I were setting up a new system, I would get a Mac with an SSD drive as the boot drive, and a Thunderbolt 3 RAID 5 drive for media and project files.
My current boot drive uses 148 GB to store all applications and operating system files. I have hundreds of apps which don’t take a lot of storage. So, you don’t need to get a gigantic SSD drive – 250 – 500 GB is more than sufficient. I recommend 500 GB, currently.
My media RAID, though, can’t be big enough. I currently have about 150 TB of storage spread across five RAIDs. I’ve learned that hard drives have two states: empty or full. Any new RAID will be as big as I can afford at the time.
This configuration provides a huge speed boost for the operating system and applications, while providing extremely fast access to huge amounts of media, with full redundancy in case of drive failure. This setup also offers a good balance between price and performance.
FOR MORE INFORMATION
Here is an article that explains hard disk and RAID performance and video formats in more detail. I highly recommend you read this article to understand the speeds you can expect from a storage device, how much space it takes to store media, and the data transfer rates of popular video codecs.
14 Responses to RAIDs, SSDs, iCLOUD & Storage Performance
Some further options/perspectives:
1. Network-Attached Storage (NAS). Too much latency? Further issues? Specialised video ones exist, that avoid these?
2. eSATA connected drives e.g. GRaid-Mini or Lacie 4Big
Regarding Thunderbolt, I have read that its speed advantage is much reduced when data is being read-out non-sequentially, as I guess would happen for a project built on a set of video streams (multi-cam, PIP, ..). But if I’ve mis-construed anything there then I would certainly like to know! I have documented my grubby research on such topics at http://blog.davidesp.com/, under title: “USB3, eSATA, Thunderbolt: Comparison: I like the look of eSATA”.
Thunderbolt is just an interface (a form of PCI-e). Even within an external Thunderbolt enclosure with one drive, the drive is connected to the Thunderbolt chip via SATA. I believe in your case, it’s not Thunderbolt that’s having difficulty reading out non-sequentially, it’s the drive within the Thunderbolt enclosure. An SSD will be much better at this than a spinning hard drive. Same thing with USB3 enclosures – they connect to their internal drives via SATA. As Larry mentions above, you’ll only see the speed gains of USB3 and Thunderbolt with RAIDs. It should also be mentioned that RAIDs are terrible at reading and writing small files, especially non-sequentially. They are designed for large sequential files like video.
Larry, loved your explanation of data redundancy. I’ve contemplated it for a long time without really understanding it, your explanation really “turned the light on!” Have you used RAID boxes from Sonnet? We bought a bunch and like ’em a lot. http://www.sonnettech.com.
Thanks for the kind words. No, I have not yet had the chance to work with Sonnet gear.
Larry … I will follow your recommendation as given above! Does the new Fusion Drive announced yesterday reflect your thinking on separating the boot files from the FCPX footage? Are there any product recommendations for a good RAID product on the market or will we see new ones short term? WD and LaCie have RAID 0 and 1, Promise seems to have issues on product reliability based on users’ comments in the Apple Store …
Thanks for advice as always!
G-Technology and Drobo have both announced RAID 5 Thunderbolt products. More will be coming shortly. Apparently, the technology is very hard to implement.
And, while the Fusion drive is very fast, using two separate drives for video editing still makes a great deal of sense.
Thanks Larry for this excellent article! It answered all of my questions without me having to go through other forums with hundreds of threads! Best, Steve
I don’t earn my living as an editor, and can’t afford to buy a Thunderbolt RAID. I’m looking at a single Toshiba USB3 drive that runs at 5700RPM for $120 at Amazon. Is that going to be a problem? Thanks as always, Bob.
Why is it worth it to spend the money on a RAID-5 (for the redundancy) rather than have a RAID-0 with backup, like with Time Machine?
If a drive fails in your RAID-0, you can’t switch to your Time Machine backup and keep working. You have to restore the Time Machine backup to something before you can use it.
In my home system, I didn’t want to take up a card slot with a RAID controller. So I have 4 internal drives in a Mac Pro in a RAID-0 for speed, and an external drive backing it up. Still not ideal, but if I lose a RAID drive, I can still work off my backup if I need. If I could spare the slot, I would have gone with a RAID controller and RAID-5. Much more reliable.
Everything is very open with a precise explanation of the issues.
It was definitely informative. Your website is extremely helpful.
Thank you for sharing!
I would appreciate your advice. I’m working on a no budget doc. I have 5 years of content backed up onto two GRAID 0. Just learned it’s not for long term storage so looking to buy new drives. I need about 12 TB of storage for all the content. Leaning on buying 2 and shipping one elsewhere vs. buying one more “dependable” one. Your article is from 2012 but very helpful. Wondering what your advice would be today in my case? Assuming still rec, RAID vs. GDrives? Leaning towards 2x Raid 1 vs. 1 Raid 5 or Raid 6. For the extra cost and research it seems people rec. Raid 6 over 5? Then there’s raid 10 but unclear about the benefit of it. Safety of content is my greatest priority. Please advise. Thank you!
12 TB is far too much media to fit on a single drive. So, a RAID becomes necessary. RAID 1 provides built-in data duplication – but reduces the total storage available. Also, you’d need to purchase multiple RAID 1’s to store all your data. As you’ve discovered, avoid RAID 0. And RAID 10 will be too expensive and well beyond what you need.
RAID 5 is a good choice – especially because you want to buy two. This is a decision I support.
12 TB RAID 5 systems are affordable and use very solid technology. 4 TB hard drives – which would be in that system – have been in the market a long time and are as reliable as any hard disk today.
Look at drives from Promise Technology and OWC; though other brands are also good. Don’t buy a no-name brand just because it is cheaper. You want a company that will be around for a while.
Nice update, Larry. However, in my opinion, in any discussion of media management shouldn’t the word “backup” appear at least once? For editing I have a 4-drive RAID 5. All Camera Original material is immediately backed up onto a RAID 1 at the same time media is ingested onto the RAID 5. (I use ShotPut Pro to do that.) As the drives in the RAID 1 fill up I swap them out for new ones. I remove “old” media from the RAID 5 when I need more space. Theoretically, I could re-use some of the oldest archived RAID 1 drives, but I haven’t done that yet. My weak link is that I don’t backup other project elements that are on the RAID 5. My bad.
I do have a question for you. Since I do have the RAID 1 backup, I’m thinking of converting my 4-drive RAID 5 into a RAID 0. What sort of performance increase do you think I would get? In other words, is it worth it?