I just finished a review of streaming video encoders, and one of the tested criteria was encoding speed. Fortunately, I had some pretty peppy hardware around, specifically two dual-processor, quad-core systems. One was a 2.8GHz HP xw6600 with 3GB of RAM running Windows XP, and the other a 3.2GHz Apple Mac Pro with 8GB of RAM running Leopard. To get a complete picture of encoding performance, I decided to test both one- and five-file encodes for VC-1, H.264, and VP6. One surprising early result was how inefficiently most encoding tools utilized the eight cores available to them when used in their default configuration. For example, only one tool, Canopus ProCoder, encoded multiple files simultaneously. The rest encoded them serially.
With some codecs, such as H.264, this wasn’t too much of a problem, because a glance at Windows Task Manager or Apple Activity Monitor revealed decent processor utilization—say, in the 30–50% range, depending upon the program. For VP6 encoding, however, no tool other than ProCoder ever utilized more than one of the eight available processor cores, leading to glacial multiple-file encoding times.
For example, on the HP workstation, it took ProCoder 11:56 (min:sec) to render five 1-minute test files into VP6 format, while On2 Flix Pro took 48:10 and Sorenson Squeeze took 50:40. On the slightly faster Mac Pro, it took Compressor 32:20 to complete the same task via Flix Exporter, while Squeeze took 43:30.
Naturally, I started wondering whether these tools could work more efficiently on a multiple-core system. So, I started checking user forums, blogs, and other articles, and here’s what I found, neatly divided into Windows and Mac results.
First, let me tell you the bad news. If you’re encoding with Adobe Media Encoder (CS3 or CS4, Mac or Windows) or Telestream Episode Pro, you’ll take no pleasure in the results reported here. I couldn’t find any way to accelerate the performance of these apps. Second, for the most part, the techniques that I discuss in this article accelerate multiple-file encoding, not single-file encoding. The exception is the technique I used to improve performance with Apple Compressor, which accelerates multiple-file encodes and some single-file encodes.
Let’s start with Windows, where there’s one simple technique that works with On2 Flix Pro, Sorenson Squeeze, and Microsoft’s Expression Encoder 2. Specifically, all three applications let you open “multiple instances” of the program and run separate encoding jobs in each instance. To open multiple instances, you just click the icon (or Start > application) multiple times, and the program opens separately each time.
Figure 1 (below)—perhaps the world’s ugliest screen shot—shows what happens when you run encodes in multiple instances of Sorenson. In this case I have Squeeze running in five separate instances.
The only beautiful part of the image is on the lower right, where you see CPU utilization at 100%. You can see the decrease in encoding time that this technique produced in Table 1 (below). Note that the first three columns (Squeeze, ProCoder, On2) reflect times for VP6 encoding, which is the best possible case, since VP6 encoding is notoriously single-threaded, while the Expression Encoder results shown in column four were for producing WMV files.
To explain the table, the top line is the encoding time for one file; the second is for five files encoding in a single instance of the program. Again, since ProCoder was the only program that encoded multiple files simultaneously, it was far and away the performance leader. The third line is the encoding time for five instances of an encoding application running simultaneously, each encoding a single file. As you can see, producing five files while running multiple instances of Flix Pro and Squeeze took only seconds longer than encoding a single file.
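The idea behind the multiple-instance trick is simply to run one single-threaded encode per file at the same time, so that each lands on its own core. The tools tested here are GUI applications, so the following Python is only a minimal sketch of that idea, not a way to drive Squeeze or Flix Pro. The function names are hypothetical, and the "encoder" is a stand-in command that just sleeps; you would substitute a real command line only if your encoder offers one.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def encode_one(command):
    """Run one encoder process to completion and return its exit code."""
    return subprocess.run(command).returncode


def encode_in_parallel(commands, max_instances):
    """Run up to max_instances encoder processes at once, one per job.

    Threads are used only to wait on the child processes; the actual
    work happens in the separate processes, one per file.
    """
    with ThreadPoolExecutor(max_workers=max_instances) as pool:
        return list(pool.map(encode_one, commands))


def demo_jobs(count):
    """Stand-in jobs (assumption): each 'encode' is a Python process that sleeps."""
    return [
        [sys.executable, "-c", "import time; time.sleep(0.2)"]
        for _ in range(count)
    ]


if __name__ == "__main__":
    # Five files, five simultaneous instances, as in the third line of the table.
    exit_codes = encode_in_parallel(demo_jobs(5), max_instances=5)
    print(exit_codes)
```

Setting max_instances to 1 reproduces the serial behavior most of the tools showed by default; raising it toward the number of cores is the scripted equivalent of opening more instances.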
Obviously, this technique pays the most dividends if you’re encoding multiple long files; if your average file is only 1 or 2 minutes long, it may not be worth the administration time. With Squeeze, if you’re producing one file for multiple formats, it’s also worth a try.
If you’re a Flix Pro user, you may be frustrated by the program’s inability to batch encode a single file to different encoding parameters; it can only render multiple files to the same encoding parameters. Note that you can use this technique to open multiple instances and run unique encoding parameters on each.
With Expression Encoder 2, the time savings weren’t quite that significant, because the WMV codec uses multiple cores during encode. Still, a 50% reduction in encoding time is nothing to sneeze at if you’re on a deadline or simply want to make it home in time to play with the kiddies or spend quality time with the spouse.
On the Mac
On the Mac, you can also run multiple instances of a program, though you have to approach it in a different way. Rather than clicking the same icon multiple times to run multiple instances, you copy and paste the application icon in the Applications folder multiple times. Each time you paste the application, OS X will politely rename it Squeeze “copy,” and at that point you can rename it as desired. You can see this in Figure 2 (below), where I have my multiple instances of Sorenson Squeeze. This also worked for On2 Flix Pro.
Click each individual icon to run each instance separately, then load your source file, choose a target preset, and click the magic Go button. You can see the result in Figure 3 (below), where five instances of Squeeze are pushing overall processor utilization to 97%. As you’ll see later on in Table 2, this technique reduced five-file VP6 encoding time from 43:30 to 7:56 for Squeeze and from 31:30 to 6:22 for Flix Pro. Hold your applause, please; there’s more to come.
Note that you need a fully licensed version of Squeeze to make this work; if you try it with the trial, you’ll get some kind of licensing error. You can make a trial version of Flix Pro work in this manner, however.
With these easy cases handled, let’s turn our attention to Compressor.
Qmaster is an application Apple created to allow Compressor and other rendering-intensive applications to utilize other Macs on a network “cluster” to share encoding tasks. In September 2006, Todd Gillespie wrote a wonderful tutorial on how to set up and accomplish this goal, which you can read at http://www.tinyurl.com/Qmaster.
However, if you have a multiple-core system, you can also use Qmaster to treat the multiple cores on your system as a cluster when rendering in Compressor. It’s a two-step process: First you have to configure Qmaster, and then you have to choose the new cluster for encoding in Compressor.
Start in the Mac Preferences window, and click Apple Qmaster to open the Qmaster configuration window (Figure 4, below). In this window, choose the QuickCluster with services radio button. Note that this is critical, as neither of the other options will allow you to access the cluster from within Compressor.
Fortunately, it’s easy to tell when you’ve made a mistake, since the “Identify this QuickCluster as” text entry box remains grayed out. This is the next step, and if you can’t insert a name here, you’ve clicked the wrong sharing option. When you get it right, name your QuickCluster something memorable, and click the Options for selected service button (Figure 5, below).
This is where you choose the number of cores available for Qmaster. If you want your Mac totally devoted to rendering, choose all available cores. On the other hand, if you need to work while rendering, select a lesser number of cores to retain some residual cycles for other activities. Then click OK to close this window. When you’re back in the Qmaster configuration screen, select Start Sharing, and close the Qmaster configuration window.
Next time you run Compressor, when you submit the encoding task, be sure to send it to the cluster you just created, not to This Computer (Figure 6, below).
This will run the encoding job on the cluster, and if you check the Activity Monitor while you’re rendering, you should see something like what you see in Figure 7. Note that using Qmaster will not only encode multiple jobs simultaneously, it may also run some single encoding jobs more efficiently, depending upon the codec. For example, single H.264 encoding jobs ran faster since Qmaster could split the work over multiple processors, but VP6 didn’t show any improvement because the On2 codec is relentlessly single-threaded.
Table 2 (below) presents the Mac results in all their glory. In addition to the Squeeze and Flix Pro results reported previously, you can see how Qmaster reduced the multiple-file VP6 encoding time from 32:20 to 6:48.
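To put the Mac numbers quoted in the text into perspective, here is a small Python sketch that converts the min:sec times to seconds and works out how many times faster each five-file VP6 encode became. The times come straight from this article; the function names are just illustrative.

```python
def to_seconds(min_sec):
    """Convert a 'min:sec' string such as '43:30' to a number of seconds."""
    minutes, seconds = min_sec.split(":")
    return int(minutes) * 60 + int(seconds)


def speedup(before, after):
    """How many times faster the 'after' time is than the 'before' time."""
    return to_seconds(before) / to_seconds(after)


# Five-file VP6 encoding times on the Mac Pro, as reported in the text:
# (default configuration, multiple instances or Qmaster)
mac_results = {
    "Squeeze": ("43:30", "7:56"),
    "Flix Pro": ("31:30", "6:22"),
    "Compressor with Qmaster": ("32:20", "6:48"),
}

for tool, (before, after) in mac_results.items():
    print(f"{tool}: {before} -> {after}, about {speedup(before, after):.1f}x faster")
```

In all three cases the result works out to roughly five times faster, which is about what you would hope for when five single-threaded encodes run side by side instead of one after another.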
Qmaster In the Field
A couple of notes from the field, where I recently helped a large corporation that will remain nameless fine-tune its streaming encoding settings. One of the bright producers I worked with noted that while you couldn’t run multiple instances of Episode Pro simultaneously, Compressor could call Episode Pro as a plug-in. He then wondered if this workflow, combined with enabling Qmaster, would effectively run multiple instances of Episode Pro. Alas, I just tested this, and it didn’t work. Multiple instances of the Episode encoding task loaded, but only the initial instance actually encoded, and the others sat waiting for their respective turns. Great idea, but again, no joy.
Second, while I was enabling Qmaster on one of the company’s editing stations, a Mac technician wandering by offered some disparaging descriptions of Qmaster that I can’t repeat in this family-friendly magazine. He wouldn’t get specific, and I had been running Qmaster for several months without any problems, but I figured the most likely issue would be instability. I Googled “Compressor Qmaster unstable” and got 210 hits.
One of them was this tech note from Apple: “However, with large, complicated jobs, such as HD H.264 transcodes, your computer may not have enough memory to support multiple services running at the same time. In this case, your computer may become unstable. Be sure your computer has an adequate amount of RAM before configuring it to support running multiple services. A minimum of 1 GB of RAM per service is recommended.” You should definitely take this into consideration when selecting the number of instances to run simultaneously via Qmaster.
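As a rough way to apply that guidance, the following Python sketch picks an instance count that respects both the number of cores you are willing to give up and the 1 GB of RAM per service quoted above. This is my own back-of-the-envelope helper, not anything Apple or Qmaster provides; the function name and the reserved_cores parameter are assumptions for illustration.

```python
def max_qmaster_instances(total_cores, ram_gb, reserved_cores=0, gb_per_instance=1.0):
    """Suggest how many encoding instances to run at once.

    total_cores     -- cores in the machine
    ram_gb          -- RAM you are willing to devote to encoding, in GB
    reserved_cores  -- cores held back for other work (assumption, default 0)
    gb_per_instance -- RAM per service; 1 GB is the minimum in the quoted tech note
    """
    by_cores = max(total_cores - reserved_cores, 1)
    by_ram = max(int(ram_gb // gb_per_instance), 1)
    # The tighter of the two limits wins; never suggest fewer than one instance.
    return min(by_cores, by_ram)


# Eight cores and plenty of RAM: cores are the limit.
print(max_qmaster_instances(total_cores=8, ram_gb=16))
# Eight cores but only 4 GB available: RAM is the limit.
print(max_qmaster_instances(total_cores=8, ram_gb=4))
# Hold two cores back so you can keep working while rendering.
print(max_qmaster_instances(total_cores=8, ram_gb=16, reserved_cores=2))
```

Treat the result as an upper bound to start testing from. For large HD H.264 jobs you may want to raise gb_per_instance, since the 1 GB figure is described as a minimum.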
I also found a post on the Apple discussion boards called “Why is Qmaster so unreliable,” and several other negative posts on the CreativeCOW boards. With the number of licensed Final Cut Pro users now more than 1 million, it’s not surprising that these types of problems have cropped up. But I’m guessing they’re peculiar to specific scenarios, rather than systemic.
As I’ve said, Qmaster has worked well for me, but check it out on test projects before you take it into production. If you experience unstable operation, return to the Qmaster configuration screen and simply click Stop sharing.
Jan Ozer (jan at doceo.com) is a contributing editor to EventDV and Streaming Media.