[Updated Aug. 8, 2016, to more clearly explain the differences in size between ProRes and H.264.]
I get a lot of questions asking why ProRes files are so big compared to, say, camera-native H.264 files. While it is true that there is a difference in bit depth between the files (camera-native H.264 is typically 8-bit, while ProRes is 10-bit or higher), this accounts for only a small portion of the difference. A much bigger reason for the difference in size is that the two files are compressed differently. And in this compression difference lies the explanation of why I-frame files are “more efficient” to edit than GOP files.
Each codec determines whether it will use I-frame compression or some form of GOP compression. In practice, this choice is built into the codec, or fixed by the camera’s recording format, and can’t be changed by the user. So an understanding of the difference can help us make wiser choices about the codecs we use.
GO BACK INTO HISTORY
Go back into the dim reaches of history, back to a time before cell phones, and you may remember a visual recording technology called “film.” (Those of you too young to remember a time before cell phones, ask your grandparents to describe it to you.)
As the illustration above indicates, film recorded each image complete and intact. In fact, if you held a piece of film up to the light, you could see a series of images extending along the film.
Film is the perfect example of recording I-frames: each image is complete. And, if we were to poke a hole in one image, that hole would not affect any of the frames around it.
Now, move forward in time to video tape. Again, as a video tape recorder was capturing images, it was doing so one complete image at a time, though we could no longer see them by holding a piece of video tape in front of a light, because the images were recorded magnetically, not optically.
NOTE: I remember, when I first worked in broadcast television, that video tape recorders did not have the ability to edit. Instead, we had to cut the video tape with a razor blade and tape the two pieces together. To determine where to cut the tape, we painted the back side of the tape with a purple solution using a fingernail brush. That purple solution would be attracted to the magnetic lines of force on the tape and, by reading the lines with a microscope, we could determine the start of a frame and, hence, the best place to make the cut.
For both film and video tape, the entire image was recorded intact, and no frame was affected by the frames around it.
SHIFT INTO DIGITAL
The good news about film and video tape was that each image was complete and at as high a quality as the recording medium could support. The problem, however, was that these recordings were huge.
NOTE: A one-hour broadcast television program required 4,800 linear feet of video tape when recorded on either 1″ helical or 2″ quad tape. The 2″ tapes alone weighed close to 30 pounds.
Until fairly recently, most digital devices didn’t have the bandwidth to record even standard-definition media at the data rates that I-frame recording requires. Only when hard disks that could attach directly to the camera became available did we have the bandwidth necessary to record uncompressed source images.
NOTE: Uncompressed SD video requires about 35 MB/second for record or playback. While “uncompressed” means many different things in HD, a round number for uncompressed 1080p HD is about 210 MB/second.
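Round numbers like these come from simple arithmetic: pixels per frame × bits per pixel × frames per second. The sketch below shows that calculation. The frame sizes, bit depths, and chroma sampling in the example calls are assumptions chosen for illustration, and the sketch ignores blanking and bit-packing overhead, so its results will not match the round numbers above exactly. Those choices are precisely why “uncompressed” means many different things in HD.

```python
def uncompressed_mb_per_second(width: int, height: int, bits_per_pixel: int, fps: float) -> float:
    """Data rate of uncompressed video in megabytes (10**6 bytes) per second.

    bits_per_pixel already includes chroma sampling, for example:
    8-bit 4:2:2 = 16, 10-bit 4:2:2 = 20, 10-bit 4:4:4 = 30.
    """
    bytes_per_frame = width * height * bits_per_pixel / 8
    return bytes_per_frame * fps / 1_000_000


if __name__ == "__main__":
    # Assumed example: NTSC SD, 720x486, 8-bit 4:2:2, 29.97 fps
    print(f"SD 8-bit 4:2:2:     {uncompressed_mb_per_second(720, 486, 16, 29.97):.1f} MB/s")
    # Assumed example: 1080p, 10-bit 4:2:2, 30 fps
    print(f"1080p 10-bit 4:2:2: {uncompressed_mb_per_second(1920, 1080, 20, 30):.1f} MB/s")
    # Assumed example: 1080p, 10-bit 4:4:4, 30 fps
    print(f"1080p 10-bit 4:4:4: {uncompressed_mb_per_second(1920, 1080, 30, 30):.1f} MB/s")
```

Changing only the bit depth or chroma sampling moves the 1080p figure by tens of megabytes per second.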
To solve the problem of limited bandwidth, which exists both when recording in the camera and when distributing the finished video on the web, a new compression scheme was needed: GOP compression.
THINK ABOUT A CHESS MATCH
To understand GOP (Group of Pictures) compression, it helps to think about a chess match. A chess board always has 64 squares (8 columns and 8 rows). It always has the same pieces, which are always set up the same way on the chess board at the start.
If I wanted to show you a chess match between two grandmasters, I could take a picture of the board after every move. This is what I-frame recording does. It works perfectly, but it takes a lot of space to show all those images.
Or, I could take a picture of the board at the very start, so you can see how the pieces are set up, then simply describe each individual move. This is what GOP compression does: it only tracks the changes from one frame to the next.
Because we know the starting position of every piece, we can follow the match precisely by tracking the changes, provided we follow it from the very beginning.
The problem with only indicating the changes is that if we join the match in the middle, we are totally lost because reading the changes does not provide the context we need to understand the entire picture. Unless we know the starting image, just reading changes is not sufficient to create the entire image.
To see the entire picture of the game, we need to go back to the beginning, then apply each of the changes until we are caught up.
GOP compression creates very small files, but if we join a file in the middle of a GOP, we can’t see the entire picture.
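The chess analogy translates directly into code. The sketch below is a toy model, not how any real codec stores data: a “frame” is a short list of pixel values, the first frame is stored complete, and every later frame is stored only as a dictionary of the pixels that changed. To show frame n, we must start from the complete frame and replay every change up to n, just like replaying a chess game from the opening position. The function and type names are hypothetical.

```python
from typing import Dict, List, Tuple

Frame = List[int]        # toy frame: a flat list of pixel values
Delta = Dict[int, int]   # toy change list: {pixel index: new value}


def encode(frames: List[Frame]) -> Tuple[Frame, List[Delta]]:
    """Store the first frame complete, then only the pixels that change."""
    start = list(frames[0])
    deltas: List[Delta] = []
    for prev, cur in zip(frames, frames[1:]):
        deltas.append({i: v for i, (p, v) in enumerate(zip(prev, cur)) if p != v})
    return start, deltas


def reconstruct(start: Frame, deltas: List[Delta], n: int) -> Frame:
    """Rebuild frame n by replaying every change since the starting frame."""
    frame = list(start)
    for delta in deltas[:n]:
        for index, value in delta.items():
            frame[index] = value
    return frame


if __name__ == "__main__":
    frames = [
        [0, 0, 0, 0],
        [9, 0, 0, 0],   # one pixel changes
        [9, 9, 0, 0],   # another pixel changes
        [9, 9, 9, 0],
    ]
    start, deltas = encode(frames)
    print(deltas)                          # [{0: 9}, {1: 9}, {2: 9}]
    print(reconstruct(start, deltas, 3))   # [9, 9, 9, 0]
```

Joining “in the middle” means having the change list for frame 3 without the starting frame; on its own, `{2: 9}` tells you almost nothing about the full picture.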
GOP COMPRESSION
In simple terms, GOP compression divides a video clip into groups of frames. A common length is 15 frames for NTSC and 12 for PAL, though GOP length varies by codec and camera.
NOTE: In this illustration, I’m using a 12-frame GOP to make the illustration easier to read. GOPs come in several different lengths, but the structure is the same.
Each group starts with an I-frame, which, you recall, is a complete picture.
Next is a “B” frame, a bi-directionally predicted frame. Instead of a complete picture, it stores only the changes needed to build this image from the reference frames on either side of it: the one before and the one after.
NOTE: You could think of a B-frame as a set of instructions rather than a picture: it describes how blocks of pixels have moved and what small corrections are needed. A B-frame, on its own, is not an image; it is a highly compressible list of changes.
The next frame is also a “B” frame, again storing only the changes relative to its neighboring reference frames.
Next comes a “P” frame, a “predicted” frame. A P-frame looks back only, to the previous I- or P-frame, and lists the changes that have occurred since then. P-frames serve as stepping stones: they give the B-frames around them a nearby reference, so the chain of changes doesn’t drift too far from the last complete image.
This alternation of “B” and “P” frames repeats until the end of the group is reached. Then, a new group starts with a new I-frame and the entire process repeats.
One complete image, the “I” frame, is followed by many “B” and “P” frames that simply list changes.
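The I/B/P layout described above can be generated with a few lines of code. This sketch assumes a simple, regular GOP like the 12-frame illustration. `gop_length` is the number of frames in the group, and `anchor_spacing` is the distance between the I- or P-frames that other frames predict from. Both are hypothetical parameter names; real encoders often choose these adaptively and may insert extra I-frames at scene changes.

```python
def gop_pattern(gop_length: int, anchor_spacing: int) -> str:
    """Return the display-order frame types for one regular GOP, e.g. 'IBBPBBP...'."""
    if gop_length < 1 or anchor_spacing < 1:
        raise ValueError("gop_length and anchor_spacing must be positive")
    types = []
    for position in range(gop_length):
        if position == 0:
            types.append("I")   # every group starts with a complete picture
        elif position % anchor_spacing == 0:
            types.append("P")   # predicted from the previous I- or P-frame
        else:
            types.append("B")   # predicted from the reference frames on both sides
    return "".join(types)


if __name__ == "__main__":
    pattern = gop_pattern(12, 3)
    print(pattern)                                  # IBBPBBPBBPBB
    print({t: pattern.count(t) for t in "IPB"})     # {'I': 1, 'P': 3, 'B': 8}
```

In this 12-frame example, only one frame in twelve is a complete picture; the other eleven are change frames.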
THE DIFFERENCE IN FILE SIZE
All versions of ProRes use I-frame compression. The H.264 files recorded by most cameras use GOP compression. It is this difference that explains why camera H.264 files are so much smaller than ProRes.
Granted, other factors also play a role. For example, even though ProRes compresses each frame individually, different versions of ProRes apply different amounts of compression, which is why ProRes Proxy is much smaller than ProRes 4444. My goal here is not to explain all the different levels of compression that can be applied to a video clip but, rather, the differences between I-frame and GOP compression.
The file size differences can be dramatic. For example, in ProRes, a 5-second tripod shot at 30 fps where nothing is moving in the frame generates 150 complete images. That same shot in H.264 stores a complete image only at the start of each GOP; every other frame essentially says “nothing has changed.” With a 15-frame GOP, that is 10 complete images instead of 150, plus a series of tiny change frames, so the H.264 version can be many times smaller than the ProRes version.
Or, in another example, take a 5-second shot where the camera is locked on a tripod and one person walks through the frame. ProRes again creates 150 complete images, but in H.264 only the part of each frame that changes as the person walks through, say 20%, needs to be recorded in the “B” and “P” frames. So while the ProRes version stores 150 frames, the H.264 version stores roughly the equivalent of 30 to 40 complete frames: 20% of 150 frames is 30, and the complete I-frame that starts each GOP adds a few more. Again, much smaller files using H.264.
This is why it is hard to predict how big an H.264 file will be before compression, unless the encoder is set to a fixed bit rate. The file size depends on the amount of movement between frames. If there is no movement (compressing a PowerPoint file filled with stationary slides, for example), the compressed file will be microscopic. If there is lots of movement (a dance recital shot with a hand-held camera), the file will be much, much bigger because the “B” and “P” frames need to account for so much more movement between frames.
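The two examples can be checked with a back-of-the-envelope model. This is a deliberately crude sketch, not how H.264 actually budgets bits. It assumes every complete frame costs “1 frame” of data, each change frame costs only the fraction of the picture that changed, and a new I-frame starts every 15 frames (an assumed GOP length). Real encoders also spend bits on motion vectors and other overhead, and they compress their I-frames differently than ProRes does, so treat the results as rough ratios. The function names are hypothetical.

```python
import math


def intra_frame_equivalents(n_frames: int) -> float:
    """I-frame codec (e.g. ProRes): every frame is a complete picture."""
    return float(n_frames)


def gop_frame_equivalents(n_frames: int, gop_length: int, changed_fraction: float) -> float:
    """Crude GOP model: full cost for each I-frame, partial cost for every other frame."""
    i_frames = math.ceil(n_frames / gop_length)   # one complete picture per GOP
    change_frames = n_frames - i_frames            # the B- and P-frames
    return i_frames + change_frames * changed_fraction


if __name__ == "__main__":
    frames = 5 * 30  # a 5-second shot at 30 fps = 150 frames
    for label, changed in [("locked-off, nothing moves", 0.0), ("one person walks through", 0.2)]:
        intra = intra_frame_equivalents(frames)
        gop = gop_frame_equivalents(frames, gop_length=15, changed_fraction=changed)
        print(f"{label}: I-frame {intra:.0f}, GOP {gop:.0f}, ratio {intra / gop:.1f}x")
```

Under these assumptions, the static shot comes out 15 times smaller in the GOP version (10 frame-equivalents versus 150). The walking shot comes out about 4 times smaller (38 versus 150).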
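One way to see why size tracks motion is to measure how much of each frame differs from the one before it. The sketch below is a toy model with hypothetical names. Frames are small grids of brightness values, and a pixel counts as “changed” if it differs from the previous frame by more than an assumed threshold. A stationary slide scores zero. A hand-held shot changes nearly every pixel, so the change frames, and therefore the file, grow.

```python
from typing import List

Frame = List[List[int]]  # toy frame: rows of brightness values (0-255)


def changed_fraction(prev: Frame, cur: Frame, threshold: int = 4) -> float:
    """Fraction of pixels whose brightness changed by more than `threshold`."""
    total = changed = 0
    for prev_row, cur_row in zip(prev, cur):
        for a, b in zip(prev_row, cur_row):
            total += 1
            if abs(a - b) > threshold:
                changed += 1
    return changed / total if total else 0.0


def average_motion(frames: List[Frame], threshold: int = 4) -> float:
    """Average changed fraction across consecutive frame pairs."""
    pairs = list(zip(frames, frames[1:]))
    if not pairs:
        return 0.0
    return sum(changed_fraction(a, b, threshold) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    slide = [[200, 200], [200, 200]]
    static_clip = [slide] * 5  # a stationary slide held for five frames
    # Every pixel brightens by 10 each frame, like a constantly moving hand-held shot
    shaky_clip = [[[10 * i + r + c for c in range(2)] for r in range(2)] for i in range(5)]
    print(average_motion(static_clip))   # 0.0
    print(average_motion(shaky_clip))    # 1.0
```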
THE PLAYBACK CHALLENGE
H.264’s use of “change documents” for compression presents a significant challenge for playback. Like our chess match, if we start our video at the beginning, GOP compression is very efficient. The files are small, the changes are applied incrementally and in order, and we can clearly see the moving image as it evolves over time.
However, again like our chess match, if we join a GOP in the middle, say by moving our playhead into the middle of a clip, we can’t see any image until, behind the scenes, our editing software goes back to the nearest “I” frame and reconstructs all the changes that occurred until we reach the position of the playhead.
With an I-frame video, no reconstruction is necessary, as each I-frame image is complete. Wherever the playhead stops, it can instantly display the complete image.
Matters get much more complicated when we place GOP-compressed video on higher layers or tracks, cut clips based on content rather than on I-frame boundaries, or stack multicam clips whose I-frames don’t occur at the same time.
Behind the scenes, every time we move the playhead, the NLE needs to find the nearest preceding I-frame for each GOP-compressed clip and apply the changes until it reaches the position of the playhead.
Worse, when we cut in the middle of a GOP, that GOP needs to be rebuilt when the edit is rendered or exported, because a GOP MUST start with an I-frame; otherwise, simply listing the changes won’t make any sense.
This is the reason older computers, such as older Mac Pros, have a very hard time playing H.264 video: they can’t decode the GOP compression fast enough to make editing feel smooth. There’s just too much math and not enough CPU horsepower.
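The extra work can be counted. This sketch assumes closed GOPs of a fixed length and a playhead that sits inside every clip. It also ignores B-frame reordering; in reality, B-frames also need the following reference frame, so the true cost can be higher. The function names and the clip offsets in the example are hypothetical. For a single clip, it counts how many frames must be decoded to show the frame under the playhead. For a multicam stack, it adds that cost up across angles whose I-frames fall at different times.

```python
from typing import List


def frames_to_decode(position: int, gop_length: int) -> int:
    """Frames decoded to display `position`: from its GOP's I-frame up to the playhead.

    With gop_length == 1 (every frame is an I-frame, as in ProRes) this is always 1.
    """
    last_i_frame = (position // gop_length) * gop_length
    return position - last_i_frame + 1


def stack_cost(playhead: int, clip_starts: List[int], gop_length: int) -> int:
    """Total frames decoded for several stacked clips at the same playhead position."""
    return sum(frames_to_decode(playhead - start, gop_length) for start in clip_starts)


if __name__ == "__main__":
    print(frames_to_decode(14, gop_length=15))   # 15: last frame of a GOP
    print(frames_to_decode(15, gop_length=15))   # 1: lands on an I-frame
    print(frames_to_decode(14, gop_length=1))    # 1: I-frame codec
    # Three multicam angles whose clips start at different timeline frames
    print(stack_cost(29, clip_starts=[0, 5, 10], gop_length=15))   # 30
    print(stack_cost(29, clip_starts=[0, 5, 10], gop_length=1))    # 3
```

In this model, a single playhead position over three GOP-compressed angles costs 30 decoded frames, versus 3 for the same angles in an I-frame codec. The cost also changes every time the playhead moves.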
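Here is a quick way to see the cost of cutting mid-GOP. If a clip now begins at frame n of the source, the frames from n up to the next I-frame have lost the I-frame they were built from. They must be re-encoded into a new group that starts with its own I-frame. This sketch assumes fixed-length closed GOPs and uses a hypothetical function name. Real NLEs and encoders differ in how, and when, they rebuild GOPs.

```python
def frames_to_rebuild(cut_frame: int, gop_length: int) -> int:
    """Frames after an In point that lost their I-frame and must be re-encoded."""
    offset = cut_frame % gop_length
    if offset == 0:
        return 0                    # the cut lands on an I-frame: nothing to rebuild
    return gop_length - offset      # the remainder of the broken GOP


if __name__ == "__main__":
    print(frames_to_rebuild(30, 15))   # 0
    print(frames_to_rebuild(20, 15))   # 10
```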
SUMMARY
GOP compression isn’t “bad,” any more than I-frame compression is “good.” As with most things in video, good or bad depends on what you are trying to do.
Without GOP compression, we couldn’t record video using a DSLR camera. Without GOP compression, YouTube videos wouldn’t exist.
But I-frame video will generally hold up better in image quality, edit more smoothly, export faster, and play back more easily on slower computers.
12 Responses to A Primer: I-frame vs. GOP Video Compression [u]
Hi, Larry–
Thank you for the I-frame v. GOP compression piece. Perhaps I missed it, but I’m not sure you clearly addressed the problem you began with: the size difference between H.264 and ProRes files. I’d love to see that issue directly addressed by you, as well as one or two related matters, like how I-frame/GOP relates to H.264/ProRes.
On one hand, this is just a matter of curiosity for me, but it’s also one of frustration and no little cost. I use FCPX, and I can’t believe that–50 interviews and hours of B-roll into a documentary project–my 3TB archive drive still has tons of room, yet I’ve had to create a 10 TB RAID 0 to handle my optimized media…and I’ve imported less than 50 percent of the material for editing purposes. It cost me $500+ and many hours to investigate RAIDs, buy the drives, and copy over the material to the RAID. The ultimate cause must have something to do with those I-frames and GOPs.
I look forward to your newsletter every Monday and very much appreciate your generosity of time and expertise. Here’s hoping your new business arrangements are working well!
Sincerely,
Rik
Rik:
This is a great question – so I went back into the article and added a new section called “The Difference in File Size” to explain this point more clearly.
And you are correct, the reason you need so much more storage is directly due to the differences between I-frames and GOP compression.
Larry
“This is the reason older computers, such as Mac Pros, have a very hard time playing H-264 video because they can’t solve the GOP compression fast enough to make editing feel smooth. There’s just too much math and not enough CPU horsepower.”
Larry, what are you recommending as new computers? PCs, Mac Pros newer than ?, trash can Mac Pros? I know a lot of folks, including myself, who are still using Mac Pro towers (mid 2010), and the new Mac Pros are starting to turn old. My FCPX edits aren’t that complex, but I haven’t had any real issues with my edits. Maybe you could give more details about “Old”.
Dick:
“Old” is a very relative term.
If you are able to do good work with your existing gear, there’s no real reason to upgrade. If your system starts to bog down, drop frames, perform so slowly that you are unable to meet deadlines, then upgrading your hardware makes sense.
I agree, the new Mac Pro is getting old and in need of an update. I’m hoping it happens soon; but I have no inside knowledge of Apple’s plans for its hardware.
As for new computers, I am a big fan of the 27″ iMac for video editing. Fast and affordable. For video compression, the iMac is faster than the new Mac Pro because it includes hardware acceleration for H.264 compression.
Larry
So glad I read your reply. I use a 27″ Mac (mid 2013) and often get the rainbow wheel, so I thought about purchasing a MacBook Pro and installing a 1TB flash drive. I will stick with the Mac and look at ways of using the flash drive.
Is this the reason I have to import entire clips from 4K cameras into FCP, rather than picking just a segment of a camera shot from, say, a GoPro, a DSLR or the new Panasonic 4K HPX200? Does FCP not let me choose only the segment I want because I’m probably choosing In and Out points in the middle of a Group of Pictures?
Jim:
This is a good guess, but no.
The only way you can import a portion of a clip is to make a copy of the source clip. Why? Because FCP X most often is simply pointing to the file. Since it is pointing to a file, not transcoding it, you can only import the entire file.
Larry
Larry,
I’m currently trying to determine whether an iPhone 7 video file (720HD setting) has been trimmed or is an original. I noticed that “Media Info” would display the GOP line for the original but not for the TRIM version. I’m finding this is not always the case, as sometimes both ORIGINAL and TRIM files can have the GOP line as well. In short, what is the marker or signature an iPhone video file leaves in the data that would let me determine whether it is an ORIGINAL or TRIM file? Lastly, I am aware that the file name would change with an emailed TRIM file. So, I’m looking for something a little deeper in the file… Thanks
Frank:
Good question and I don’t know the answer. It would be great if Apple flagged a file as original or copied, but I don’t think they do. This would be a question best asked of a company that develops video plug-ins for the Mac.
Larry
Hi Larry,
I am currently editing some clips shot on a DJI drone and have noticed a flicker or pulse in the footage, occurring probably about every 15 frames, which I imagine is the effect of GOP?
I haven’t noticed this before but, looking back over older footage, the flicker is evident in some of these shots too. Is there any way to counter this flicker during recording or in post or is it unavoidable?
Thanks very much; your article is the best and clearest explanation of GOP compression I have found after much searching.
Mark P.
Mark:
I don’t shoot with DJI, so I haven’t seen this problem. The flicker would NOT be caused by GOP compression. However, that doesn’t mean it isn’t there.
Take a look at Flicker Free from DigitalAnarchy.com and see if that helps.
Larry