What’s CS4 going to deliver? The best indicator is the technology demonstration given at NAB by Hart Shafer, product manager for Adobe Production Premium, which you can watch at http://tv.adobe.com. Search for NAB and then select Adobe at NAB 2008—Adobe Technology Preview, which should be on the third search page.
One of the first programs that Shafer demonstrated was a Mac version of OnLocation. OnLocation provides waveform, histogram, and other scopes to help fine-tune your exposure settings, and it can function as a direct-to-disk video recorder. OnLocation was still a new acquisition at the time CS3 was released, but Adobe did manage to wedge it into the bundle. The compromise was that it was a Windows-only tool; although it did ship in the Mac suite, you needed Boot Camp or Parallels Desktop to run the software on your Intel Mac. Now it runs natively.
Adobe has also upgraded OnLocation’s graphical interface, changing from the CS3 version’s clunky faux-hardware look to that of a traditional software program, which looks great. You can also now enter a shot list and incorporate shot-related metadata. For example, you can mark a shot as good or bad and add descriptive information, to which OnLocation will add all discernible camera-related information. When you open the clips in Premiere Pro, you can search by the entered data and use a new metadata panel to edit or expand this information.
OnLocation for the Mac and expanded metadata are nice, but the most significant addition to Premiere Pro is a new speech-to-text function that will revolutionize how video is edited and watched within 6 months of CS4’s ship date. The news and broadcast markets will jump on the bandwagon first, followed quickly by corporate; in these markets, speech-to-text is a sufficiently "disruptive" feature that it will change editing and streaming preferences.
If Apple and Avid don’t counter with comparable features of their own, they’ll be in big trouble. Here’s why. Video, by nature, is a big lump of largely indiscernible data. Most video with intellectual content, such as news, interviews, lectures, or seminars, is impossible to scan for information, whether for editing, archiving, playback, or for online search engines such as Google. For example, suppose you were editing an interview and wanted to find where the speaker predicted that the Celtics "didn’t have a snowball’s chance …" of beating the Lakers in the NBA finals. Even if you knew that the less-than-prescient interviewee made this claim somewhere between 10 and 15 minutes into the interview, it could easily take you 2–3 minutes to find the exact quote.
As an editor, you’d take the time to find the quote because it’s your job. But if you were watching the streaming video and wanted to find the quote, you’d have to page through the file using the fast-forward controls or slider, which is slow and frustrating. If you were searching Google for the webpage that contained the video, you’d never find the quote, unless, and until, Google inserts speech-to-text capabilities in its web spiders.
Enter Premiere Pro’s speech-to-text function, which is run as a post process after capture. Once the speech has been converted, the text is linked to the originating video, just like the audio file that generated it. Want to find the "snowball" quote? Type that into the metadata window and Premiere instantly jumps to the quote. You can even set the video’s in and out points using the text.
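To make the idea concrete, here is a minimal sketch of how a time-coded transcript can turn a typed phrase into in and out points. This is not Adobe’s implementation, which was not shown in that detail; the `TimedWord` structure, the `find_phrase` function, and the sample timings are all hypothetical, invented purely for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedWord:
    """One recognized word plus where it sits in the video (hypothetical structure)."""
    text: str
    start: float  # seconds from the start of the clip
    end: float    # seconds from the start of the clip


def normalize(word):
    """Lowercase a word and drop punctuation other than apostrophes."""
    return re.sub(r"[^a-z0-9']", "", word.lower())


def find_phrase(transcript, phrase):
    """Return an (in_point, out_point) pair for every occurrence of the phrase."""
    target = [w for w in (normalize(p) for p in phrase.split()) if w]
    if not target:
        return []
    words = [normalize(w.text) for w in transcript]
    n = len(target)
    hits = []
    for i in range(len(words) - n + 1):
        if words[i:i + n] == target:
            # In point is the start of the first word, out point the end of the last.
            hits.append((transcript[i].start, transcript[i + n - 1].end))
    return hits


# Made-up timings, a little over 12 minutes into a hypothetical interview.
SAMPLE = [
    TimedWord("They", 741.2, 741.5),
    TimedWord("didn't", 741.5, 741.9),
    TimedWord("have", 741.9, 742.1),
    TimedWord("a", 742.1, 742.2),
    TimedWord("snowball's", 742.2, 742.9),
    TimedWord("chance.", 742.9, 743.4),
]

print(find_phrase(SAMPLE, "snowball's chance"))  # [(742.2, 743.4)]
```

The point of the sketch is simply that once every word carries a timestamp, finding a quote becomes a text search rather than a scrub through the timeline, and the matching timestamps can double as in and out points.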
As I’m sure you’ve guessed by now, after rendering, the text stays with the video, so long as it’s played in the Adobe Flash Player, of course. You could configure the text to accompany the video during playback, as subtitles, for example, or allow viewers to scan or search for the desired passages. A formerly featureless lump of video is transformed into a scannable, searchable object.
Think of the impact this feature will have on web video, say within the context of our next president’s first State of the Union (SOTU) address. Say CNN presents the SOTU address the way it would be today, a 45-minute lump of video, while longtime Adobe partner BBC presents it with the accompanying text. I can scan the text for topics that are important to me, then highlight the passage I want to watch, click a button, and play the video. Which feed would you rather watch? How long will it be before this is a must-have feature for all web-broadcast video? How about for instructional videos or other videos with spoken content that viewers would like to find quickly?
Of course, there are many markets in which this feature will have little impact. But if you produce any video in which you need to scan and quickly locate speech, the ability to leverage speech-to-text from production to delivery will quickly become a commercial necessity.
Jan Ozer (jan at doceo.com) is a frequent contributor to industry magazines and websites on digital video-related topics and the author of Critical Skills for Streaming Producers, a mixed media tutorial on DVD published by StreamingMedia.com.