Storage is like your heart. You take it for granted until something goes wrong. At which point, nothing else matters.
I’ve been thinking a lot about storage recently. Two days ago, I presented a webinar on multicam editing in Final Cut Pro X. Nothing demands more from your storage system than multicam editing. Then, yesterday, I attended the Creative Storage Conference in Los Angeles. Finally, today, our podcast – Digital Production Buzz – spent the entire show talking about backups and archiving.
It is impossible, in a blog, to summarize the entire state of storage today. (I’m not sure I could even do it in a book – the subject is so vast and changes daily.) So, think of this as a snapshot of where we are today, along with trends that media professionals need to pay attention to in the near future.
WHAT WE NEED TO CARE ABOUT
In general, we will spend far more on storage than we will spend on a computer. Because storage is so essential, make sure to pick storage for its performance, not just its ability to meet your budget. Storage that is too slow or too small is just wasted money.
When it comes to storage, there are two types of data that are stored to a hard disk: structured and unstructured. Structured data means databases, where the files are organized in precise ways. Everything else is unstructured data. Within the unstructured group are large files, like media, and smaller files, like word processing documents.
When it comes to handling large, unstructured media files, three specs are most important:
- Capacity: how much data the storage holds
- Bandwidth: how quickly data can be read from or written to the storage
- Seek time: how quickly the drive can find and start playing a file
If you are doing standard media editing, the first two – capacity and bandwidth – are the most important. When we start doing multicam editing, seek times become relevant. The more cameras you edit at the same time, the faster the seek times need to be to keep up with playback.
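As a back-of-the-envelope sketch, here is the kind of arithmetic involved. The per-stream figure is an assumption (roughly ProRes 422 at 1080p30), and the 1.5x headroom factor is a rule of thumb, not a spec:

```python
# Back-of-the-envelope storage bandwidth for multicam playback.
# 18 MB/s per stream approximates ProRes 422 at 1080p30 (an assumption).
MB_PER_SEC_PER_STREAM = 18

def required_bandwidth(camera_angles, per_stream=MB_PER_SEC_PER_STREAM, headroom=1.5):
    """MB/s the storage must sustain to play every angle at once."""
    return camera_angles * per_stream * headroom

for angles in (2, 4, 8):
    print(f"{angles} angles: ~{required_bandwidth(angles):.0f} MB/s sustained")
```

An 8-angle multicam edit in this sketch needs well over 200 MB/s sustained, which is why a single spinning drive falls behind so quickly.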
There are four broad categories of storage technology that we can use:
- Spinning media (traditional hard disks)
- SSDs (solid-state drives)
- Tape (such as LTO)
- Cloud storage
There are also RAIDs, which are collections of either spinning media or SSD drives, and hybrid systems, which are a combination of two or more of these technologies.
Though some were projecting that spinning media was hitting a wall in capacity, traditional hard disks continue to hold staggeringly large amounts of data and are projected to grow for the next several years.
However, that doesn’t mean that drives last forever. They don’t. And drives from different manufacturers die at different times. Backblaze provides Cloud-based backup services – more on that in a minute. Recently, they published a blog detailing failure rates for various hard drives in capacities from 3 TB to 8 TB, based on the 82,000 drives they use in their server farms. The results are fascinating and worth reading. (Click here.)
NOTE: You can also download their statistics to analyze further, if you want.
In general, for video editing, I recommend at least a two-drive RAID, configured as RAID 0 for best performance. Ideally, if budgets permit, a 4-drive RAID 5 (or a RAID 5 with more drives) will deliver high-capacity, high-bandwidth performance along with data redundancy in case one drive dies.
NOTE: Here’s an article explaining the differences in RAID levels.
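To make the capacity/redundancy trade-off concrete, here is a small sketch of the usable space for the two RAID levels mentioned above. The function name and the 4 TB drive size are purely illustrative:

```python
def usable_capacity(num_drives, drive_tb, raid_level):
    """Approximate usable capacity (TB) for RAID 0 and RAID 5 arrays."""
    if raid_level == 0:
        return num_drives * drive_tb        # striped: all space, zero redundancy
    if raid_level == 5:
        if num_drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (num_drives - 1) * drive_tb  # one drive's worth of space holds parity
    raise ValueError("only RAID 0 and RAID 5 are modeled here")

print(usable_capacity(2, 4, 0))  # 2-drive RAID 0: 8 TB, fails if either drive dies
print(usable_capacity(4, 4, 5))  # 4-drive RAID 5: 12 TB, survives one drive failure
```

The trade is plain in the numbers: RAID 0 gives you every terabyte you paid for but doubles your exposure to a drive failure, while RAID 5 spends one drive's worth of space to buy you time to swap a dead disk.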
Tim Standing is the Chief Technology Officer for SoftRAID. This illustration, from his presentation yesterday, shows that standard spinning media fails either when first turned on or after several years of use. This is why RAIDs are so popular.
Losing data is a terrible feeling.
SSD prices continue to fall, though they have not yet reached parity with spinning media. However, if you are doing multicam work, I strongly recommend you use an SSD RAID to store your multicam source files. SSDs are roughly three times faster than a spinning media system and have effectively no seek time – which makes them ideal for the simultaneous playback needs of multicam editing. They don’t hold as much, though, so think about storage capacity carefully before purchasing an SSD RAID.
PREVENTING DATA LOSS
SoftRAID recently announced a new application – currently in beta – called “SMART Alec.”
“SMART Alec works in the background, constantly monitoring & checking your disks, and warning you if any disks are faulty, or about to fail. With SMART Alec’s advanced warning, you’ll have plenty of time to replace a bad disk and keep your data safe.” (SMART Alec website)
The good news is that this is developed by SoftRAID, a company that’s been deeply involved in creating storage software for years. The even better news is that a version of SMART Alec will be available free from the Apple App Store after launch in the next few months.
Sign up for the beta version here – I already have.
One of the buzzwords at the Conference was “object-oriented storage.” This began at the enterprise level (meaning it cost BIG bucks!) but is slowly migrating into The Cloud and local storage.
Traditionally, when we write data to a hard disk, we are storing a file into sectors on the hard drive. This works fine, but doesn’t scale easily. If we have hundreds of thousands or millions of files, file-based storage breaks down. And finding the right file out of millions becomes really challenging.
Lest you think that only studios need to worry about this many files, I was stunned to learn recently that I’m storing over 1.7 million assets across my three computer systems. Almost two million files on about 90 TB of storage!
Wow…. I had no idea.
Object-oriented storage was invented to solve the problem of constantly needing more space while still being able to track millions of files. The best analogy for how it works is valet parking: you drive your car up to a valet, he hands you a ticket, then parks your car. You don’t know and don’t care where your car is parked, as long as you get it back safely when you hand the valet your ticket at the end of your meeting.
Behind the scenes, there could be one parking garage or ten. They could even be building new parking garages and you wouldn’t care. Whenever you needed your car, you handed over your ticket and, like magic, your car would be delivered.
That’s the idea behind object-oriented storage. We stop caring where our files are stored. Instead, whenever we need them, we hand the operating system a “ticket” and our file shows up (or is saved).
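The “ticket” model can be sketched in a few lines of code. This toy class is entirely hypothetical – it is not any vendor’s API – and it uses a content hash as the ticket, a common design in real object stores:

```python
import hashlib

class ValetStore:
    """Toy object store: put() hands back a "ticket"; get() redeems it.
    Where the bytes actually live is the store's problem, not yours."""

    def __init__(self):
        self._objects = {}  # could be one "garage" or ten; callers never know

    def put(self, data: bytes) -> str:
        ticket = hashlib.sha256(data).hexdigest()  # ticket derived from content
        self._objects[ticket] = data
        return ticket

    def get(self, ticket: str) -> bytes:
        return self._objects[ticket]

store = ValetStore()
ticket = store.put(b"interview_cam_a footage")
assert store.get(ticket) == b"interview_cam_a footage"
```

Because the caller only ever holds tickets, the store is free to add “parking garages” – more disks, more servers, more sites – without anything changing for the user. That is what makes object storage scale so easily.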
What I learned yesterday in talking with the team at Caringo is that object-oriented storage is infinitely scalable, with built-in support for access via the web. They were demonstrating it at their booth at the Conference. But, for the moment, it isn’t cheap.
Short-term, it is too expensive to implement on the desktop, though Symply – gosymply.com – is working hard to do exactly that. I am also seeing access to object-oriented storage migrate to the desktop with the announced-but-not-yet-shipping Cloud File Gateway from XenData.
TRACKING FILES, MACHINE LEARNING AND METADATA
We are generating too much stuff for us to keep track of everything in our heads. As the number of files we need to track explodes into the millions, the only way to manage them is via metadata and asset management systems. (And, as one speaker said at the conference yesterday, metadata does NOT mean making the file name longer.)
Philip Hodgetts reminded me tonight on The Buzz that the reason asset management software exists is to help us associate metadata with the media it describes. The problem is that I do NOT want to personally enter metadata for a million clips. Instead, Philip said, what we will see this year – and even more next year – is the use of “machine learning” (also called “artificial intelligence”) to create metadata by recognizing the content of an image or the speech in an audio file.
For example, companies like Digital Heaven – SpeedScriber – and Digital Anarchy – Transcriptive – have released software that automatically converts audio files to text. Now, producers can search media clips by typing text into a search box. This information can then be linked to the original media file so that we can search for those clips that contain a particular piece of text or person in the image.
Sam Bogotch, CEO of Axle Video, told me yesterday at the Conference that when Axle was released in 2012 they were delighted that it tracked up to 30,000 assets. In contrast, their newest version, released at NAB this year, now supports over 2 million! In fact, one of their most popular new features is Axle AI.
“Based on axle Video’s radically simple browser user interface and a visual analysis and search engine licensed from Visual Atoms, a UK developer of deep learning software, axle ai helps video postproduction teams bypass the laborious task of entering detailed metadata about every scene. Instead, users simply select a frame from a video, or grab an image from any onscreen source including web pages. axle ai is then able to rapidly identify a ranked list of media clips, as well as the exact segments in those clips, whose contents most closely match the image.” (Axle press release announcing the product)
For the last two years, I’ve been working with the team at Axle Video to implement a working media asset management system. I’ve had all kinds of problems, but remain impressed with their willingness to give me a hand and improve their software. I take personal responsibility for getting them to support 2 million assets. That’s because I crashed their system trying to catalog the 1.7 million assets on my system.
If you are thinking of implementing a digital asset management solution (DAM), here are a few tips, based on my experience:
I’m still trying to figure out the best way to manage the assets I’ve got, because what I’m doing now isn’t working.
BACKUPS AND ARCHIVING – LOCAL
The most important thing to remember is that all hardware dies – generally, at the worst possible time. If you don’t have your data backed up, it’s your own darn fault.
RAIDs protect against data loss in the event a hard drive dies. But even a RAID won’t help you if you delete the wrong file by accident. That’s why you need backups.
LTO tape is the current leader for long-term backups and archiving. This technology was developed by a consortium of three companies – HPE, IBM and Quantum – and is marketed as LTO Ultrium. The current version of LTO technology is LTO-7, which holds 6 TB of data on one tape.
While the LTO drives are expensive – an LTO-7 drive costs from $3,000 – $5,000 – a tape cartridge (~$100) is much cheaper than constantly purchasing new hard drives. Essentially, once you have the drive, you create a virtually unlimited storage capacity simply by adding another tape at a cost of about 2 cents per gigabyte.
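That 2-cents-per-gigabyte figure is easy to check with the prices quoted above. The $4,000 drive cost in this sketch is an assumed midpoint of the $3,000–$5,000 range:

```python
DRIVE_COST = 4000        # one-time LTO-7 drive cost, assumed midpoint of range
TAPE_COST = 100          # per LTO-7 cartridge
TAPE_CAPACITY_GB = 6000  # 6 TB native per tape

def cost_per_gb(num_tapes, include_drive=True):
    """Blended archive cost per gigabyte across a tape library."""
    total = num_tapes * TAPE_COST + (DRIVE_COST if include_drive else 0)
    return total / (num_tapes * TAPE_CAPACITY_GB)

print(f"Tape only: ${cost_per_gb(1, include_drive=False):.3f}/GB")  # about 1.7 cents
print(f"Drive + 10 tapes: ${cost_per_gb(10):.3f}/GB")
```

Note how the drive cost dominates at first and then amortizes away: the more tapes you fill, the closer your blended cost gets to that 2-cent marginal price.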
LTO also added a feature called LTFS (Linear Tape File System). This allows an LTO tape to mount on the desktop, where you can drag and drop files to it. While this is convenient, Macintosh, Windows and the various flavors of Linux all have problems with it. For instance, whenever you open a folder in the Mac Finder, the Finder writes a small, invisible file into that folder detailing where icons and other screen elements are located. With a hard disk, writing a small file is no problem. With tape, the drive needs to shuttle to the end of all recorded material, record the file, then shuttle back to the location you are viewing. This process is called “shoe-shining” because of the whooshing sound the tape makes rushing back and forth as you open different folders on the tape.
As long as you don’t try to view the contents of a tape on your desktop, you’ll be fine. But if you treat a tape drive like a hard disk, you’ll get very frustrated, very quickly.
The other problem I have with LTO systems – aside from the cost of the drives – is that new drive technology is released every two years. This is clearly detailed on the LTO roadmap, but it means that storing files on LTO requires me, every ten years or so, to purchase the latest LTO drive and migrate (copy) all my data from the old format to the new one. At 6-8 hours per tape, this will take a while.
This means that we need to budget for long-term archiving hardware and staff time in order to actively manage our library of archived media.
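To see why staff time matters, here is a rough migration estimate using the 6 TB capacity and the 6-8 hours per tape quoted above (7 hours is my assumed midpoint), applied to a library the size of mine:

```python
import math

TAPE_TB = 6          # LTO-7 native capacity
HOURS_PER_TAPE = 7   # assumed midpoint of the 6-8 hours per tape quoted above

def migration_plan(library_tb):
    """Tapes and drive-hours needed to copy a library to a new LTO generation."""
    tapes = math.ceil(library_tb / TAPE_TB)
    return tapes, tapes * HOURS_PER_TAPE

tapes, hours = migration_plan(90)  # e.g. a ~90 TB library
print(f"{tapes} tapes, roughly {hours} hours of copying")
```

That is weeks of drive time for even a modest library, which is exactly the kind of recurring cost that needs to be in the archive budget from day one.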
THE HIDDEN TRAP OF TECHNOLOGICAL OBSOLESCENCE
As if trying to figure out what’s the “best” hardware to use for longer-term storage wasn’t bad enough, we also have to deal with the fact that programs, file formats, even codecs go out of style and are no longer supported.
The most recent example is Apple’s decision to end support for QuickTime on Windows. This is devastating to everyone creating media – regardless of platform – because one of the most ubiquitous media containers on the planet just died.
But you only need to look at all the files stored on your hard disk that you can no longer open to realize that this is an ongoing, critical problem. What’s the sense of storing your amazing project for 50 years if you can’t open it when you restore it?
The folks at Digital Bedrock have developed a system to track the file formats and codecs used by files that they archive. Then, when one of these is announced as “end-of-life,” Digital Bedrock will notify you so that you can respond while there’s time to work out a Plan B. (I’ll have more on them in a minute.)
BACKUPS AND ARCHIVING – CLOUD
As I was chatting with the folks in the Backblaze booth yesterday, they pitched me on the idea of backing up and archiving my data to The Cloud. The benefits are that it is infinitely scalable, I never need to buy more storage for backups, and I still have access to all my files.
The price is based on the amount of data stored on their servers and whether you are using them for backup or archiving. Backups can be scheduled automatically, along with a variety of other services, for $50 per year with their Business Backup service.
Or, if you need longer-term storage, their B2 Cloud Storage costs $5 per month per terabyte, which is about 25% of the cost of Amazon storage.
NOTE: The four biggest public cloud storage vendors are Amazon, Microsoft, Google and IBM.
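Using the B2 pricing above, the annual arithmetic is straightforward. The 90 TB figure is the library size I mentioned earlier:

```python
B2_PER_TB_MONTH = 5.0  # Backblaze B2 price quoted above, dollars per TB per month

def annual_cost(library_tb, per_tb_month=B2_PER_TB_MONTH):
    """Yearly cloud storage bill for a library of a given size."""
    return library_tb * per_tb_month * 12

print(f"90 TB for a year: ${annual_cost(90):,.0f}")
```

A few thousand dollars a year for off-site storage of an entire library compares favorably with buying, powering and replacing the equivalent local drives.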
The big problem with Cloud storage is the speed of your Internet connection. For any of these storage services to make sense, you need a minimum upload speed of 10 Mbps. For many, that speed is easily achievable. For others, like me, it isn’t: even though I live in Los Angeles, my upload speed tops out at 1.3 Mbps – FAR too slow for any serious Cloud-based backup or archiving!
NOTE: A great way to test your Internet connection is using SpeedTest.net.
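Upload time is easy to underestimate. This sketch assumes the connection runs at full speed with zero protocol overhead, which is optimistic:

```python
SECONDS_PER_DAY = 86400

def upload_days(data_tb, upload_mbps):
    """Days to push data_tb terabytes through an upload_mbps link,
    assuming it runs flat out with no overhead (optimistic)."""
    bits = data_tb * 1e12 * 8             # terabytes -> bits
    seconds = bits / (upload_mbps * 1e6)  # megabits/sec -> bits/sec
    return seconds / SECONDS_PER_DAY

print(f"1 TB at 10 Mbps:  {upload_days(1, 10):.1f} days")
print(f"1 TB at 1.3 Mbps: {upload_days(1, 1.3):.1f} days")
```

Even at the recommended 10 Mbps, a single terabyte takes more than a week of continuous uploading; at my 1.3 Mbps, it takes more than two months. This is why shipping drives is still a serious option.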
AN ARCHIVING HYBRID
Digital Bedrock has developed a hybrid approach combining Cloud access with LTO storage to create something very useful to smaller shops.
When you are ready to archive footage, they will send you a 60 TB hard disk. You process your files through their file logger/asset manager, copy them to the hard disk, then ship the hard disk back to them. This solves the problem of Internet access speeds.
Digital Bedrock then transfers the files to three copies of LTO-7 tape. One set is stored on the West Coast, one on the East Coast and one is sent back to you. Even in the event of a natural disaster, there is a high likelihood that at least one set of data will survive.
Another excellent protection is that at no time do your files go online, so there is no chance for them to be corrupted, hacked or stolen.
At the same time they create the LTO-7 tapes, they load your list of files – along with metadata tracking formats and codecs – into their web-based database. This allows you to see, search and manage the files you have in archive storage. But, because this is only metadata and not the files themselves, your source files remain secure.
I took a tour of their facility recently and interviewed their CEO, and I’m impressed with the system they’ve developed. I’m especially impressed with how well they’ve thought out the issue of security while keeping the entire process affordable for smaller shops and individuals.
With almost daily reports of hacks, malware, stolen user names and ransomware, these are dangerous times.
NOTE: If you want to get depressed, take a look at this list of companies that have been hacked recently. It numbers in the thousands…!
The best way to protect yourself is to avoid connecting critical systems to the Internet. But, these days, that’s simply not possible for most of us.
I am still leery of storing sensitive files on a remote server in The Cloud. I like the accessibility, but I don’t like placing blind trust in the quality of an unknown Cloud vendor’s IT staff. (This probably comes from watching Jurassic Park at an impressionable age.) I use the Cloud daily for training files that I distribute. But, I don’t use the Cloud for anything that hasn’t been released.
As I talk with vendors and industry experts, I get a strong impression of a media industry in both crisis and transition. Our need for storage is growing far faster than technology is able to supply products to meet the need. In fact, like the weather, if you wait a minute, things will change.
At the non-enterprise level, there isn’t a single unified solution with consistent standards, smooth interoperability, strong security and a reasonable price. Instead, we are forced to create our own custom storage, backup, and archiving solutions from a rapidly evolving collection of standards, vendors and products.
There are many different solutions out there, however, no one solution will work for everyone. The problem is that we can not risk our data by waiting. So, here’s my current thinking:
But I don’t. So I’m still trying to figure out what needs to be stored where. As always, I’m interested in your comments.