DV on HT Time
Posted Apr 1, 2003

Intel's new hyper-threading processors promise enhanced performance for software-only video, but taking full advantage of the technology depends on vendors' commitment to optimization.

If there's one thing constant in the field of digital video, it's that you can never have too much processing power. "When was the last time you heard someone say, 'The thing I don't like about this application is that it's too fast, too powerful?'" asks Ioannis Katsavounidis, software manager for InterVideo, makers of WinDVD and WinDVD Creator. "It doesn't happen." Whether it's capture, transcoding, effects preview, rendering, or formatting a DVD disc image, there's always a need to do it faster and better, and hopefully at lower cost.

For years, the standard approach to satisfying this appetite for power, at least for the professional video market, was to handle processor-intensive tasks with specialized hardware, using the host CPU for less-demanding tasks such as user interface and control. Meanwhile, for those who couldn't afford hardware-based systems, software-based processing meant doing without some processes and waiting a long time for the results of others. As CPU throughput has accelerated, however, software-only systems have become increasingly capable, particularly when run on dual-processor machines.

The latest ammunition for the software-only insurgency is Intel's introduction of "hyper-threading" processors for desktop PCs, which potentially boost performance by up to 25%. The keyword, of course, is "potentially." The promise of hyper-threading is real, but the extent to which end-users will see the difference in their daily video routines depends not only on what tasks they are trying to accomplish, but also on how each task is implemented in their toolset.

Virtually Distinct
While hyper-threading (dubbed "HT" for short) isn't exactly new, it was previously available only on Intel's Xeon processors, used primarily in high-end servers. Late last year, Intel started enabling the technology in Pentium 4 CPUs, specifically 3.06GHz chips aimed at top-of-the-line desktop machines. Requirements for using the P4's HT capabilities include an HT-enabled chipset and BIOS, and an operating system that includes HT optimizations: either Microsoft Windows XP Professional or certain versions of Linux. A new "Intel Pentium 4 Processor with HT Technology" logo is being used to identify HT-ready systems.

As described by Intel, the impetus behind HT is the realization that "clock speed is only half the story. Faster clock speeds are an important way to deliver more computing power...[but] the other route to higher performance is to accomplish more work on each clock cycle." To do this, HT takes advantage of multi-threading, which is the ability of software to subdivide a process into "threads," meaning tasks that can be scheduled and run independently. "A thread," Intel explains, "shares code and data with the parent process but has its own unique stack and architectural state." The resources needed to execute a thread are allocated by the operating system when the thread is scheduled, then freed when the thread is completed.
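For readers who want to see the thread model in code, here is a minimal Python sketch of what Intel describes. Two threads run the same function and append to one shared list (the shared data), while each builds its own produced list in a local variable on its own call stack. The names worker and shared_frames are hypothetical, chosen only for illustration.

```python
import threading

# Data owned by the "parent process": every thread sees this same list.
shared_frames = []
lock = threading.Lock()  # guards the shared list against simultaneous updates

def worker(name, count):
    # 'produced' is local to this call, so it lives on this thread's own stack;
    # the other thread never sees it.
    produced = [f"{name}-{i}" for i in range(count)]
    with lock:
        shared_frames.extend(produced)

threads = [threading.Thread(target=worker, args=(f"t{n}", 3)) for n in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait until each thread has run to completion

print(sorted(shared_frames))
```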

Multi-threading really comes into its own on a multiprocessor system, where the threads can be executed on different physical processors. Without multiprocessing, a PC is like a one-lane road. If one thread is waiting for I/O to complete or for another task to provide information it needs to move forward, other threads can get backed up behind it. When threads can run in parallel, on the other hand, traffic in one lane can keep moving even when there's a slowdown in the other. Thus, multiprocessing can maximize the overall throughput of multi-threaded processes by minimizing the impact of bottlenecks.
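The "backed-up lane" scenario can be sketched with Python's standard threading module. In this hypothetical example, one thread stalls waiting for information from another task (a threading.Event stands in for that dependency), while the other thread keeps doing useful work and then releases it. Because the waiting thread cannot log anything until the event is set, the order of the log is deterministic.

```python
import threading

log = []
log_lock = threading.Lock()
data_ready = threading.Event()

def record(msg):
    with log_lock:
        log.append(msg)

def consumer():
    # This thread is stalled until another task supplies what it needs.
    data_ready.wait()
    record("consumer resumed")

def producer():
    # Meanwhile this thread keeps moving in its own "lane".
    total = sum(range(1_000))
    record(f"producer computed {total}")
    data_ready.set()  # unblock the waiting thread

c = threading.Thread(target=consumer)
p = threading.Thread(target=producer)
c.start()
p.start()
c.join()
p.join()
print(log)  # ['producer computed 499500', 'consumer resumed']
```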

Intel has integrated HT into the Pentium 4 to provide the benefits of multi-threading even on machines with only a single physical processor. "HT technology allows a single Pentium 4 processor to function as two virtual or logical processors," Intel says. "The processor can execute two threads simultaneously, use resources that otherwise would sit idle, and get more work done in the same amount of time."

Intel doesn't achieve HT's "two-in-one" trick by squeezing two physical processors into a single chip. Instead, the chip has the ability to simultaneously handle two "architectural states," each of which constitutes a logical processor. Each state has its own set of general-purpose registers, control registers, and other elements needed to track the flow of a program or thread. Other than architectural state, however, all processor execution resources—the units on the processor that perform work such as addition and multiplication—are either shared or partitioned between the logical processors. The performance boost comes because HT can keep those execution resources busier than in a standard processor.
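Because each architectural state appears to the operating system as a separate processor, software simply sees more processors. One quick way to check what the scheduler sees is Python's os.cpu_count(), which returns the number of logical CPUs; on an HT-enabled single Pentium 4 it would be expected to report two rather than one. This is an illustrative sketch, and the number printed depends on the machine it runs on.

```python
import os

# os.cpu_count() returns the number of logical CPUs the OS exposes.
# With HT enabled, each physical processor counts as two logical ones,
# so a single HT Pentium 4 would be expected to appear as 2.
logical = os.cpu_count() or 1  # cpu_count() can return None if unknown
print(f"The scheduler sees {logical} logical processor(s)")
```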

Despite the greater efficiency, the resource sharing inherent in HT limits its effectiveness compared with true multiprocessor systems. "Because hyper-thread pipelines within a single CPU are not independent and must share certain critical logic resources, hyper-threading is by no means the performance equivalent of multiple CPUs," says Steve Chazin, product manager for Xpress DV at Avid Technology. "However, it is a cost-effective way of increasing the number of processing units per host CPU." A dual-processor HT machine, for instance, potentially offers four-lane performance at only incremental cost over a standard dual-processor system.

Performance Boost for Video?
There's little debate that HT can theoretically provide a nice boost to processor-intensive computing tasks. And some observers with a video orientation, such as InterVideo's Katsavounidis, are unreserved in their enthusiasm for the new technology. "It's like having a built-in turbo-charger," he says. "You can encode at higher bit-rates, author a DVD in less time, or do more things at the same time without a performance hit."

Richard Townhill, group product manager for Adobe Premiere, agrees that HT will "greatly improve overall speed and performance." And Travis White, product manager at Ulead Systems, adds that consumers will now be able to obtain PCs with enough processing power to perform real-time tasks within video editing applications, "tasks that were previously only possible with dedicated chips on video hardware solutions."

As White points out, however, for a video application to be able to take advantage of HT it is "necessary for the software to have a multi-threaded code base, so that tasks can be split between the two processing units." And there is a range of opinion about how well various video processes lend themselves to this approach.

One process that is critical to most digital video workflows is encoding and decoding, which Townhill describes as "the primary bottleneck in a non-linear environment." He believes that HT will "allow the codec speed to be increased dramatically."

Chazin, on the other hand, offers a more guarded assessment. "In general terms," he says, "any computing function that can be broken up into separable parts that can potentially be performed in parallel could benefit from multiple processors or, to a lesser degree, hyper-threading. But while there are exceptions, codec algorithms typically do not lend themselves to parallelization within a video frame, although you can often process separate video frames in parallel—again, there are exceptions. Still, effects can often be performed on multiple sub-regions of a video frame in parallel. And multiple video streams, which are typically of interest for NLEs, can be decoded in parallel as well."
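Chazin's point about performing effects on sub-regions of a frame can be sketched as follows. This hypothetical Python example splits a frame (a list of rows of 8-bit luma values) into horizontal bands, applies a brightness gain to each band in its own worker thread, and then reassembles the bands in order. The function names and frame format are assumptions for illustration. Note that in CPython the global interpreter lock keeps pure-Python loops like these from truly running at the same time; the sketch shows how the work is divided, not the speedup a native, multi-threaded effects engine might achieve.

```python
from concurrent.futures import ThreadPoolExecutor

def brighten_rows(frame, start, stop, gain):
    """Apply a brightness gain to rows start..stop-1, clamping at 255 (8-bit)."""
    return [[min(255, int(px * gain)) for px in row] for row in frame[start:stop]]

def brighten_parallel(frame, gain, workers=2):
    """Split the frame into horizontal bands and brighten each band in its own thread."""
    if not frame:
        return []
    n = len(frame)
    step = -(-n // workers)  # ceiling division so no rows are left over
    bands = [(a, min(a + step, n)) for a in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(brighten_rows, frame, a, b, gain) for a, b in bands]
        result = []
        for fut in futures:  # reassemble bands in their original top-to-bottom order
            result.extend(fut.result())
    return result

# Hypothetical 8x6 test frame of luma values
frame = [[(x * 37 + y * 11) % 256 for x in range(8)] for y in range(6)]
result = brighten_parallel(frame, 1.5)
```

Because each band touches only its own rows, the threads never need to coordinate until the final reassembly, which is what makes this kind of effect a good candidate for parallel processing.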

Jim Taylor, general manager of the Advanced Technology Group at Sonic Solutions, speaks with a similar emphasis on judging HT's effectiveness on a process-by-process basis rather than assuming that HT will automatically boost performance across the board. "Video operations are by nature rather monolithically sequential processes," he says, "with dependence on specific functions such as discrete cosine transforms (DCT/IDCT), variable length coding/decoding (VLC/VLD), and motion estimation/compensation. Some of these may be limited simply by the huge amounts of data involved."

Benefits Stage by Stage
Addressing the variations in HT benefits depending on the task, an Intel white paper entitled "Accelerating Digital Multimedia Production with Hyper-Threading Technology" breaks desktop video workflow into four stages: acquire, build/edit, render, and output. In acquisition from an external video source, for instance, Intel says that once a system is fast enough to keep up with video at a given resolution and frame rate without dropping frames, the speed at which real-time capture can be performed is defined by the duration of the program. So the company argues that the main value of HT during this stage is actually in making available enough extra processing headroom that the system can be used for other tasks without adversely affecting capture.

In the build/edit stage, Intel says no single task is particularly challenging, but handling multiple tasks at once, as is common in previewing, can push the limits of a standard processor. "Individually, decoders for MP3, MPEG-2, AVI, and other formats do not demand high CPU utilization. Playback is very smooth—until you throw in additional audio tracks, transitions, and special effects where multiple codecs and filters must run simultaneously. The more complex transitions can be very CPU-intensive and usually involve decoding two or more media streams at the same time." In these situations, multiprocessor and HT systems can greatly improve playback performance if the program code supports multi-threading.
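The transition case Intel describes, two media streams decoded at once and then combined, might look like this in outline. In this hypothetical sketch, decode_stream() is a stand-in for a real codec (it merely scales stored 8-bit values to the 0.0 to 1.0 range) and crossfade() is a simple linear dissolve. Both streams for a transition frame are decoded concurrently in a two-worker thread pool before being mixed. As with any pure-Python threading example, CPython's global interpreter lock limits real overlap here; the structure, not the timing, is the point.

```python
from concurrent.futures import ThreadPoolExecutor

def decode_stream(samples):
    """Hypothetical stand-in for a codec: 'decodes' 8-bit values to 0.0-1.0."""
    return [s / 255 for s in samples]

def crossfade(a, b, t):
    """Linear dissolve: t=0.0 shows only clip A, t=1.0 shows only clip B."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

def render_transition_frame(clip_a, clip_b, t):
    # Every transition frame needs both sources, so decode them concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        fa = pool.submit(decode_stream, clip_a)
        fb = pool.submit(decode_stream, clip_b)
        return crossfade(fa.result(), fb.result(), t)

# Midpoint of a dissolve between two tiny three-pixel "frames"
frame = render_transition_frame([0, 255, 51], [255, 0, 51], 0.5)
```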

As for rendering—building a video file from an edit decision list (EDL)—Intel describes it as "very CPU-intensive—it can use all the performance capability you can throw at it. Audio and video encoders run simultaneously during rendering so this step is well-suited to threading and parallelism." The company also says that the extra headroom provided by HT means that the user interface is more responsive during background rendering. "New tasks on HT-enabled systems launch right away, and the cursor is rarely in an hourglass."
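Background rendering with a responsive interface is commonly structured as a worker thread that reports progress back to the main thread. The following Python sketch assumes a hypothetical background_render() that "encodes" each clip of an EDL (here just a list of clip names) and posts the fraction completed to a queue.Queue; the main thread, which in a real editor would be the user-interface thread, only collects those updates.

```python
import queue
import threading

def background_render(edl, progress, output):
    """Hypothetical renderer: 'encodes' each clip in an edit decision list (EDL)."""
    for i, clip in enumerate(edl, start=1):
        output.append(f"encoded:{clip}")  # stand-in for real audio/video encoding
        progress.put(i / len(edl))        # report fraction complete
    progress.put(None)                    # sentinel: rendering finished

edl = ["intro", "interview", "b-roll", "credits"]
progress, output = queue.Queue(), []
worker = threading.Thread(target=background_render, args=(edl, progress, output))
worker.start()

# The main (user-interface) thread stays free; here it simply drains updates.
updates = []
while (p := progress.get()) is not None:
    updates.append(p)
worker.join()
print(updates)  # [0.25, 0.5, 0.75, 1.0]
```

A real interface would poll with progress.get_nowait() between screen redraws instead of blocking on get(), so the cursor never has to turn into an hourglass.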

In the output-to-media stage, Intel claims a dramatic shortening of the time required for an authoring system to convert (transcode) source assets to formats compatible with the target delivery medium (e.g., MPEG-2 video and Dolby Digital audio for DVD-Video). On the other hand, Intel says, the process which follows—creation of the file structure and writing it to disc—is "not CPU-intensive since the rate-limiting step is the CD or DVD burner." So there is less potential for HT to provide any advantage during writing.
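A little arithmetic shows why the write step is bound by the burner rather than the CPU. The sketch below assumes a full single-layer disc of 4.7 billion bytes and a 2x burner (illustrative assumptions, not figures from Intel's paper), and uses the standard DVD "1x" rate of 1,385,000 bytes per second.

```python
DVD_1X_BYTES_PER_SEC = 1_385_000  # nominal DVD "1x" transfer rate

def burn_minutes(disc_bytes, burner_speed):
    """Write time in minutes, set only by the burner's rated speed (not the CPU)."""
    return disc_bytes / (DVD_1X_BYTES_PER_SEC * burner_speed) / 60

# Assumptions: full single-layer disc, 2x burner, constant write speed,
# no lead-in/lead-out overhead.
print(round(burn_minutes(4.7e9, 2), 1))  # 28.3
```

Doubling the burner speed halves that figure; doubling CPU throughput leaves it unchanged.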

The bottom line is that HT's impact depends greatly on the setting in which it's being used. "The speedup in rendering, encoding, and effects will vary depending on the type of processing the application is doing at any particular time," says William Chien, director of product management for the Business & Consumer division at Pinnacle Systems. "In certain situations, Pinnacle has measured a speed-up of nearly 20 percent as a result of hyper-threading."
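Percentage gains are easier to judge as time saved. Assuming that a figure like "a speed-up of 20 percent" means 20 percent more work per unit of time (the quoted figures don't say which definition is used), a task's new duration is its old duration divided by 1.2. This sketch applies that assumption to a hypothetical 60-minute render across a 15 to 25 percent range.

```python
def time_after_speedup(minutes, speedup_pct):
    """New duration if throughput rises by speedup_pct percent (assumed definition)."""
    return minutes / (1 + speedup_pct / 100)

# A hypothetical 60-minute render at several quoted gains
for pct in (15, 20, 25):
    print(f"{pct}% faster: {time_after_speedup(60, pct):.1f} min")
# 15% faster: 52.2 min
# 20% faster: 50.0 min
# 25% faster: 48.0 min
```

If a vendor instead means the task takes 20 percent less time, the same render would drop to 48 minutes, so the definition matters when comparing numbers.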

Work in Progress
In addition to the type of process involved, the degree to which the new HT processors will deliver noticeable benefits to any given set of end-users depends on the progress made by tools vendors in optimizing their code. To illustrate this point, Taylor points to a set of test results posted last November by Tom's Hardware (www17.tomshardware.com/cpu/20021114/p4_306 ht-15.html) comparing the speed of two encode operations (DV to MPEG-4 and DV to MPEG-2) using a range of different processors. For both encodes, the speed of the HT-enabled 3.06GHz P4 was only marginally better than that of the non-HT version of the same processor.

So what are vendors of video-related software doing to take better advantage of HT? To some extent, it's a continuation of existing efforts to implement multi-threading, with some additional optimization for HT. For many companies, it seems to be less a matter of arriving at an end-point than a process of continual upgrading that should yield ever-improving performance.

Manal Ma, assistant vice president of marketing at CyberLink, says the company has been working with Intel for some time on the optimization of applications, including PowerDirector Pro, PowerProducer, PowerVCRII, and PowerDVD. "The major change is the ability to balance the load of each thread to maximize CPU utilization. We have tested on an HT processor and the performance was noticeably faster."
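What "balancing the load of each thread" buys can be shown with a small simulation rather than real threads. This hypothetical Python sketch assigns six work units of uneven cost (think of complex versus simple frames) to two logical processors, first as a fixed half-and-half split and then greedily, with each unit going to whichever processor has the least work so far. The job finishes when the busier processor does, so the larger load is what counts. It illustrates the general idea only; it is not CyberLink's method.

```python
def assign_static(costs, workers=2):
    """Fixed split: contiguous, equal-count chunks of tasks, one chunk per worker."""
    if not costs:
        return [0] * workers
    chunk = -(-len(costs) // workers)  # ceiling division
    return [sum(costs[i:i + chunk]) for i in range(0, len(costs), chunk)]

def assign_dynamic(costs, workers=2):
    """Greedy balancing: each task goes to the currently least-loaded worker."""
    loads = [0] * workers
    for cost in costs:
        idx = loads.index(min(loads))  # ties go to the lowest-numbered worker
        loads[idx] += cost
    return loads

# Hypothetical uneven work units (e.g. complex vs. simple frames)
costs = [8, 7, 1, 1, 1, 1]
print(max(assign_static(costs)), max(assign_dynamic(costs)))  # 16 10
```

With 19 units of work in total, the balanced assignment finishes in 10 units of time, close to the ideal of 9.5, while the fixed split leaves one logical processor idle most of the time.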

Ma says the benefits of HT can be seen in areas such as increased video quality from real-time MPEG-2 encoding and real-time DVD-Video recording to writable DVDs. "Of course," she adds, "HT has its performance increase limitations because multiple threads will all fight for computing resources—the core logic part of the CPU."

Sonic has also been working closely with Intel on optimizing for HT. "Because our applications already utilize a highly multi-threaded design," Taylor says, "we're well-positioned to take advantage of hyper-threading. But some architectural redesign work remains. We've found that some code runs faster, but some code actually runs slower, so we're analyzing things with the help of Intel's engineers to isolate the areas that can benefit most from tweaking. The work involves making the code more threaded and balancing CPU resource usage across threads. For example, putting floating-point operations in one thread and integer operations in another helps keep from stalling one of the threads."

Hard Times for Hardware?
Even as code is optimized for HT, Taylor says it's important to understand that hyper-threading "doesn't magically replace hardware. For tasks that today are right on the edge of being doable in software, hyper-threading has the potential to make them possible without dedicated hardware. But some high-end, real-time video tasks, such as high-definition video encoding, will require hardware support until CPU speed increases significantly."

Chien concurs with Taylor that it's premature to diminish the value of hardware encoding, which he says is particularly useful for external video capture products such as Pinnacle's new MovieBox USB and MovieBox DV devices. He also notes that both Pinnacle Edition and Pinnacle Studio are multi-threaded applications, which means that "users will experience an immediate benefit when they are run on a hyper-threaded processor system. And our products will continue to be optimized for hyper-threading. Particular areas where hyper-threading will help are MPEG encoding in Studio and Edition, real-time preview in Studio, and background rendering in Edition."

Katsavounidis, on the other hand, believes that "even for the highest-quality encoding—meaning broadcast—a hyper-threading machine is better than hardware. When combined with the proper software, PCs are already powerful enough. Right now InterVideo is replacing all the dedicated hardware encoding equipment for a large satellite TV company in Japan with PCs running our software. Customized PCs can already handle this high-level encoding and produce superior video. So it's never a good idea to use a hardware encoder."

While Katsavounidis reports a 15% performance increase for the company's current software versions run on hyper-threading machines, he says that InterVideo is "already in the middle of multi-threading all components and applications and optimizing performance for hyper-threading. To get the full advantage you need to partition each task—audio encoding, video encoding, rendering—into complementary tasks. So even when you author a DVD, for example, and only video encoding is performing, you can do motion estimation on one virtual CPU while you do DCT on the other. We plan to do this for all of our software as quickly as possible."
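Katsavounidis's example of doing motion estimation on one virtual CPU and DCT on the other amounts to a two-stage pipeline: while the second stage transforms frame n, the first stage is already working on frame n+1. The Python sketch below is heavily simplified and entirely hypothetical. Its "motion" stage just subtracts the previous frame (what motion compensation reduces to when every motion vector is zero; a real estimator first searches for best-matching blocks), the transform stage applies an unnormalized 1-D DCT-II to each residual, and a bounded queue.Queue passes work between the two threads. CPython's global interpreter lock limits true overlap for pure-Python math, so treat this as a picture of the structure.

```python
import math
import queue
import threading

def dct_1d(block):
    """Unnormalized 1-D DCT-II, the transform family used in MPEG-style coding."""
    n = len(block)
    return [sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(block))
            for k in range(n)]

def motion_stage(frames, out_q):
    """Stage 1 (simplified): residual against the previous frame (zero motion vectors)."""
    prev = [0.0] * len(frames[0])
    for f in frames:
        out_q.put([a - b for a, b in zip(f, prev)])
        prev = f
    out_q.put(None)  # sentinel: no more frames

def transform_stage(in_q, results):
    """Stage 2: transform each residual while stage 1 works on the next frame."""
    while (residual := in_q.get()) is not None:
        results.append(dct_1d(residual))

# Hypothetical 4-sample "frames"
frames = [[10, 10, 10, 10], [12, 12, 12, 12], [12, 14, 12, 14]]
q, coeffs = queue.Queue(maxsize=2), []
stages = [threading.Thread(target=motion_stage, args=(frames, q)),
          threading.Thread(target=transform_stage, args=(q, coeffs))]
for t in stages:
    t.start()
for t in stages:
    t.join()
```

The bounded queue keeps the first stage from running too far ahead of the second, so the two logical processors stay roughly in step.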

White, meanwhile, says that many of Ulead's products are already optimized for hyper-threading. "The latest versions of both our professional and consumer video products have multi-threaded code bases. The work that had to be done included revamping the code base in our real-time video editors such as MediaStudio Pro 7 and VideoStudio 7 to produce multiple tracks and effects composited in real-time preview and output. For other products, it was important to optimize the performance of our own MPEG codec, Ulead MPEG—now, so that encodes can take place in real time and transcodes can take place near real time."

White adds that hyper-threading yields an extra payoff for the work already done to support multi-threading. "Ulead performance benchmarks show that our upcoming version of MediaStudio Pro has the real-time ability to play and output three DV and two graphics streams simultaneously on a single-processor Pentium 4 2.0GHz system with 512MB RAM. With HT technology, the number of streams will increase. Our tests show that HT technology improves the processing speed of our software by approximately 23 percent."

Chazin agrees that HT, combined with faster CPUs, will increase the possible real-time stream count, as well as the count, complexity, and quality of effects. "Since compressed video encoding generally requires more CPU cycles than decompression, real-time encoding/transcoding will be possible in the near future. The NLEs that have the most optimized multi-threaded implementation will be the first to accomplish that goal."

As for the currently shipping version (3.5) of Avid Xpress DV, Chazin says it benefits from HT right out of the box. "There's roughly a 20 percent performance increase in real-time effect playback performance, which is approximately the maximum benefit that we've observed with other well-known multi-threaded applications. This theoretical maximum benefit reflects the current state of hyper-thread technology, and may well increase in the future. For now, hyper-threading represents a very cost-effective way to significantly boost performance. And future versions of Avid's software-only NLEs will become even more effective at tapping the potential of multiple HT-enabled processors."

Chazin's positive assessment of HT's continuing value is echoed by Katsavounidis. "For the software makers that properly modify their applications," he says, "it means significant speed gains in processor-intensive tasks. Faster is better, and we will continue to find uses for more power. Incredibly complex videos like Shrek and Monsters, Inc. will still rely on 'rendering farms,' but desktop movie-makers with hyper-threading machines now have more power at their fingertips than professional videographers with a room full of hardware had 10 or 15 years ago."

COMPANIES MENTIONED IN THIS ARTICLE
Adobe Systems, Inc. www.adobe.com
Avid Technology, Inc. www.avid.com
CyberLink Corporation www.gocyberlink.com
Intel Corporation www.intel.com
InterVideo, Inc. www.intervideo.com
Pinnacle Systems, Inc. www.pinnaclesys.com
Sonic Solutions www.sonic.com
Ulead Systems, Inc. www.ulead.com