The Moving Picture: Everything You Always Wanted to Know About H.264 but Were Afraid to Ask
I recently taught an H.264-specific class at Streaming Media East in New York. Though the class was 3 hours long, I can distill a bunch of useful knowledge into this column if you don't mind missing the sample videos and example screens, not to mention the insightful and surprisingly compelling lecture. Let's begin.
H.264 is today's "it" codec because it was jointly created by the ISO and ITU, two standards bodies that together control the cell phone, television, and computer industries. Throw in the fact that Apple, Adobe, and Microsoft (for Silverlight) have all adopted H.264, and you've got a technology with lots of mojo.
One unique characteristic of H.264 is that it isn't controlled by a single vendor, unlike WMV and VP6 (Microsoft and On2, respectively). Many different vendors have produced H.264 codecs, and their quality varies significantly at lower bitrates, when you're pushing the ragged edge of data rate efficiency. At the risk of increasing my hate mail by 400%, I'll state that at aggressive bitrates, Apple's codec in Compressor is clearly the least efficient H.264 codec, especially for HD output. On the other hand, the MainConcept codec in Sorenson Squeeze, Adobe Media Encoder, and Rhozet Carbon Coder and the Dicas codec in Telestream's Episode line of products are roughly equivalent for both SD and HD output.
When producing H.264 video, you'll have a number of unique encoding options. First is the choice of profile. Briefly, profiles define the types of encoding techniques that can be used to produce the H.264 bitstream. The simplest Baseline profile uses the most basic techniques, which produces a stream that can be played back on low-power devices such as iPods. The Main and High profiles use more advanced encoding techniques, resulting in better quality but a stream that requires more CPU power to decode. The rule here is to use the Baseline profile for devices and the High profile for computer playback.
The next choice you'll face in the typical encoder is the number of B-frames and "reference" frames (B-frames and P-frames being the incomplete frames in a group of pictures used in a compression format such as H.264 or MPEG-2). As you may know, B-frames are the most efficient frames in the group of pictures (GOP), since they can benefit from interframe redundancies found both before and after the B-frame in the stream. P-frames can only look backward for interframe redundancies, so they are less efficient, while I-frames (also called key frames) are wholly self-referential, making them the least efficient frame type. The magic number I recommend for B-frames is three, which creates a sequence such as this: IBBBPBBBPBBBPBBBP, which then repeats. By the way, as with all streaming files, I recommend one I-frame every 10 seconds, or one every 300 frames in a 30 fps file, and always enable the option to add I-frames at scene changes.
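To make the frame pattern and key frame arithmetic above concrete, here is a minimal Python sketch. The function names gop_pattern and keyframe_interval are made up for illustration and don't correspond to any encoder's API. The sketch lists frames in display order and ignores scene-change I-frames and the way a real encoder closes out a GOP.

```python
def gop_pattern(gop_length: int, b_frames: int) -> str:
    """Return the frame types of one GOP in display order.

    Hypothetical helper: frame 0 is the I-frame, every (b_frames + 1)th
    frame after it is a P-frame, and everything in between is a B-frame.
    """
    if gop_length < 1 or b_frames < 0:
        raise ValueError("gop_length must be >= 1 and b_frames >= 0")
    frames = []
    for i in range(gop_length):
        if i == 0:
            frames.append("I")          # key frame opens the GOP
        elif i % (b_frames + 1) == 0:
            frames.append("P")          # anchor frame after each run of Bs
        else:
            frames.append("B")
    return "".join(frames)


def keyframe_interval(fps: float, seconds: float = 10.0) -> int:
    """Frames between I-frames for a key frame every `seconds` seconds."""
    return round(fps * seconds)


if __name__ == "__main__":
    print(gop_pattern(17, 3))      # IBBBPBBBPBBBPBBBP
    print(keyframe_interval(30))   # 300
```

With three B-frames and a 300-frame key frame interval, this simplified model yields one I-frame, 74 P-frames, and 225 B-frames per GOP.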
The reference frames setting defines the number of frames within which a B-frame can find redundancies. As you would guess, B-frames find most redundancies in the nearest frames, so any value beyond five seems to inflate search times (and the resultant encoding time) without improving quality. Use five for reference frames when given this option.
The final H.264 encoding option that I'll address is entropy encoding, with two techniques available: CABAC and CAVLC. CABAC creates a higher-quality, harder-to-decode stream, while CAVLC offers less quality but lower decode requirements. If you're producing for devices, CABAC isn't an option, since it's not available for the Baseline profile. If you're producing for computers, I recommend enabling CABAC and will point out that YouTube uses both the High profile and CABAC in its 720p H.264 encoded videos (check out the example here: www.youtube.com/watch?v=VJ5KJVCc5C4).
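The profile, B-frame, reference frame, and entropy recommendations in this column can be collected into one small lookup. The Python sketch below is only an illustration: H264Settings, recommend_settings, and validate are hypothetical names, not part of any encoding tool. One detail comes from the H.264 specification rather than from this column: the Baseline profile doesn't allow B-frames, so the device preset sets them to zero. The reference frame value of five is the general recommendation above; individual devices may impose lower limits that this sketch doesn't model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class H264Settings:
    """Hypothetical container for the options discussed in the column."""
    profile: str            # "Baseline", "Main", or "High"
    entropy_coding: str     # "CAVLC" or "CABAC"
    b_frames: int
    reference_frames: int


def recommend_settings(target: str) -> H264Settings:
    """Map a playback target ("device" or "computer") to settings."""
    if target == "device":
        # Baseline: no CABAC (per the column) and no B-frames (per the spec).
        return H264Settings("Baseline", "CAVLC", b_frames=0, reference_frames=5)
    if target == "computer":
        return H264Settings("High", "CABAC", b_frames=3, reference_frames=5)
    raise ValueError(f"unknown target: {target!r}")


def validate(settings: H264Settings) -> None:
    """Reject combinations the Baseline profile does not permit."""
    if settings.profile == "Baseline":
        if settings.entropy_coding == "CABAC":
            raise ValueError("CABAC is not available in the Baseline profile")
        if settings.b_frames > 0:
            raise ValueError("B-frames are not available in the Baseline profile")


if __name__ == "__main__":
    for target in ("device", "computer"):
        chosen = recommend_settings(target)
        validate(chosen)
        print(target, chosen)
```

Keeping validate separate from the presets means that settings entered by hand can be checked the same way as the recommended ones.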
As mentioned, H.264 is now compatible with QuickTime and Flash and will be compatible with Silverlight when Microsoft releases Silverlight 3.0 sometime during the second half of 2009. If producing for QuickTime playback, you can create either MOV or MP4 files. The Flash player can play these formats plus files encoded into an F4V stream, and Johnny-come-lately Silverlight promises to play files created with all three extensions. Note that all three players can play files produced using the High profile with all advanced encoding options enabled, so no worries there.
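The player and file extension pairings just described fit in a simple lookup table. This Python sketch restates only what the paragraph says; the names PLAYER_EXTENSIONS and players_for are made up for illustration, and the Silverlight entry reflects what was promised for version 3.0 rather than tested behavior.

```python
# Extensions each player accepts, as described in the column.
PLAYER_EXTENSIONS = {
    "QuickTime": {"mov", "mp4"},
    "Flash": {"mov", "mp4", "f4v"},
    "Silverlight 3": {"mov", "mp4", "f4v"},  # promised, not yet shipped
}


def players_for(extension: str) -> list:
    """Return the players (sorted by name) that accept a file extension."""
    ext = extension.lower().lstrip(".")
    return sorted(
        player for player, exts in PLAYER_EXTENSIONS.items() if ext in exts
    )


if __name__ == "__main__":
    print(players_for("f4v"))   # ['Flash', 'Silverlight 3']
    print(players_for(".MP4"))  # ['Flash', 'QuickTime', 'Silverlight 3']
```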
How about general configuration options such as resolution and data rate? Largely due to the historical work of a big company that only recently adopted H.264, there's a general impression among streaming producers that H.264 video files don't play back smoothly on low-power computers. In my comparison tests, I found that H.264 required less CPU horsepower to decode than either VP6 or VC-1, which you can read about here: www.streamingmedia.com/article.asp?id=10776. Basically, if you're converting from the VP6 or VC-1 codecs, you should be able to use the same resolution and data rate for your H.264 files and produce better-looking video that requires less CPU horsepower to play back.
All of this sounds great, but H.264 does have one significant negative, which is an as-yet-undefined royalty obligation starting in January 2011 (www.mpegla.com/avc/avc-faq.cfm). No one knows what the charge will be, and it's at least likely that there will be some de minimis exceptions for small websites. However, if you recommend that your major corporate clients adopt H.264, whether for producing podcasts or general web use, you need to advise them that they'll probably have to send checks to the MPEG LA for the privilege.
Jan Ozer (jan at doceo.com) is currently working on a book on using video on social networking and content aggregation sites to grow your business.