CC Writer
Posted Nov 19, 2004

If you're an in-house or contract video producer doing government or academic work, it's your job to meet the accessibility requirements for closed captioning. Here we'll look at the letter of closed-caption law, as well as tools and techniques for following it and producing accessible and eye-pleasing closed-captioned video for streaming and DVD.

If you work in a government facility, in academia, or for a contractor supplying video to the government, your work must meet the accessibility standards described in regulations concerning Section 508 of the Rehabilitation Act of 1973. You may also need to meet these standards in work done for private clients, corporations, or non-profits, depending on the nature of the target audience. As one regulation states, "All training and informational video and multimedia productions which support the agency's mission, regardless of format, that contain speech or other audio information necessary for the comprehension of the content, shall be open or closed captioned." That's a big net that catches a lot of government and academic videos.

There are also requirements specific to captioning for broadcast material, but meeting those requirements doesn't typically fall under the job descriptions of corporate, institutional, or contract videographers. Nor do the rigors of live event captioning, which uses a different set of rules and obviously different transcription requirements. Here we'll look at closed captioning issues that impact video producers whose work is intended for streaming or DVD distribution; specifically, what to caption, and how closed captions differ from DVD subtitles.

Once you decide to caption, the next step is to set captioning standards to ensure consistency within and among your video productions. We'll also look at captioning workflow, starting with how to format your files for import into a captioning program. Using a remarkable program called MAGpie, we'll synchronize a set of sample captions to a video file and integrate the results with Windows Media for streaming applications.

If you're producing both streaming videos and DVDs, you'll need to produce one text stream that works with both formats. So as a final step, I'll identify some tools and suggest a workflow for doing just that.

Preliminary Issues
Closed captions serve a broader purpose than the subtitles you may have seen in (or produced for) foreign movies or DVDs. That's because most subtitles assume that the viewer can hear, but doesn't understand the language. For this reason, noises like gunshots, screams, music welling up, dogs barking, and cars beeping may not be noted in the subtitle text.

Closed captions assume that the viewer cannot hear. To comply with Section 508, closed captions must "contain speech or other audio information necessary for the comprehension of the content." If you're producing a DVD for government use, you'll need to add all the audio cues necessary to satisfy this requirement. In addition, note that captioning for broadcast and captioning for streaming video and DVD involve completely different techniques and technologies.

Unlike captioning for broadcast, which requires expensive hardware and software, captioning for streaming and DVD involves creating text files in inexpensive (or free) shareware programs and then linking the text file with the video file for streaming, or importing the text file into the DVD authoring program to create subtitles. While some broadcast captioning systems can export text strings for DVD and streaming, inexpensive programs will serve just as well.

Creating Your Captioning Standard
There are few absolute rights and wrongs in captioning. The most important thing is to be consistent in how your captions are created and applied. So before creating your first closed captioned text, you should define the conventions you'll use to produce your closed captions. In formulating the standards outlined here, I relied heavily on the practices used by the Media Access Group at WGBH in Boston, the world's first captioning agency and creators of MAGpie [see sidebar, "Web Resources"]. They've been delivering accessible media to disabled adults and children for over 30 years. Certain sections of this article are paraphrased from materials from the Media Access Group, with their gracious permission.

Step 1: How Many Lines
The first decision you'll make in defining your captioning standard is how many lines each caption will use. Television captions tend to be three or four lines, while the majority of Hollywood DVD titles tend to use two lines.

Most streaming producers also use two lines. Some companies, like the Media Access Group, add a third line at the top of the caption to identify the speaker (Figure 1, above). Unless you have a strong reason to choose otherwise, two lines of text is probably a good place to start.

Step 2: Which Captioning Technique
There are two styles used to make captions appear and disappear from the screen: roll-up and pop-on.

Step 3: Text Segmentation
What's actually happening when you convert a script or transcribed speech into captions? Most conversations involve a give and take between two or more individuals. Sometimes a comment from one might be a single word like "yes" or "no." Sometimes a response can be three or four sentences long, involving dozens of words. 

Irrespective of the length of the speaker's comments, during captioning we break them down into their most comprehensible chunks. If you choose two-line captions, this means that all dialogue must be subdivided into chunks of two lines each, averaging 30-35 characters per line, including spaces. 

When dividing up your text, understand that people don't read letter by letter or even word by word; they read in chunks of words, or phrases. For this reason, captions are most readable when divided into logical phrases, both within the two lines in a caption and from caption to caption. Both points are shown in Table 1 (below).

The phrasing on the right in Table 1 violates both rules, with poor phrasing within Caption 1 (breaking up a name) and between Captions 1 and 2 (breaking up a title). If you read both versions out loud, you'll instantly see that the first column reads much more naturally. Column 2 also violates another rule of segmentation: a period should always end a caption (though not all captions have to end with periods). Specifically, in Caption 2, where the first sentence ends with "Magazine," the next sentence should start a new caption, as it does on the left. 
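As a starting point for segmentation, here is a minimal Python sketch that applies only the mechanical rules above: lines of at most 35 characters, two lines per caption, and a new caption after each sentence. The function name and defaults are illustrative assumptions, not part of any captioning tool. It wraps purely by width, so it cannot judge logical phrasing; expect to adjust its breaks by hand so names and titles stay together. The simple sentence splitter will also break after abbreviations such as "Dr."

```python
import re
import textwrap


def segment(text, max_chars=35, lines_per_caption=2):
    """Split text into captions (lists of lines) using width-based wrapping.

    A new caption starts after every sentence, so sentence-ending
    punctuation always closes a caption.
    """
    captions = []
    # Split after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    for sentence in sentences:
        if not sentence:
            continue
        lines = textwrap.wrap(sentence, width=max_chars)
        # Group the wrapped lines into captions of lines_per_caption lines.
        for i in range(0, len(lines), lines_per_caption):
            captions.append(lines[i:i + lines_per_caption])
    return captions


sample = ("We're streaming today live with both the Real encoder "
          "and the Microsoft Windows encoder, side by side. "
          "Very politically correct.")

for caption in segment(sample):
    print("\n".join(caption))
    print()  # blank line between captions
```

Treat the output as a first draft: the character and line limits are enforced automatically, but the phrase-level rules still need a human eye.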

Step 4: Choose Your Font and Case
Typically, when it comes to print or static, on-screen text, fonts with serifs, like Times New Roman, are more readable than sans serif fonts, and words are more recognizable; most books and magazines use fonts with serifs. The Media Access Group recommends using the Roman font, and Times New Roman is the most similar font typically installed on most computers.

Text with mixed capitals and lower-case lettering is easier to read than all uppercase text, so that's the recommended practice for streaming media and DVDs. (TV captions don't mix cases because they can't—TV caption decoders can't display below-the-line segments of letters like j, g, q, and y.) 

Step 5: Choose Your Font Size
Font sizes vary by captioning program. In general, larger fonts are more readable, but if your font is too large, your caption will wrap to the next line, or extend outside the viewing area.   

My recommendation is to prioritize readability over elegance, and be sure to preview the font size within your target player. While MAGpie is a fantastic tool, in testing, the appearance of the font size within its preview window didn't always accurately represent font size within the ultimate player application.

Step 6: Define Text Placement and Speaker Identification
Placement of text, in certain situations, can provide strong clues as to who is speaking. If there are multiple on-screen speakers in a fast-paced discussion, consider identifying the speaker in all captions. Alternatively, since most speakers talk for longer than one or two captions, consider identifying the speaker only when the speaker changes.

Here are the accepted rules of caption placement and speaker identification:

• If there are two consistently positioned speakers: Place captions under the appropriate speaker, justifying to their respective side of the screen.
• If there is only one speaker: Place captions in the center of the screen and center-justify them.
• If the speaker is off-screen: Place captions in the center of the screen and center-justify them. Some producers identify off-screen speakers with italics.
• Clearly identify new speakers whenever speaker identification is not obvious to the viewer: This can occur with off-screen narration, during J-cuts, or when there are many speakers onscreen. Format your speaker identification to distinguish it from spoken text.

There's one significant real-world caveat to these rules: not all players and/or closed captioning tools can create or implement left-justified, right-justified, and centered captions. For example, because of alignment problems encountered when playing closed captioned streams with Windows Media Player, the Media Access Group modified MAGpie to produce only left-placed captions. In addition, RealPlayer can only display left and center-aligned captions (though, of course, you could right-justify the text with space or tab commands). In fact, the only streaming player that properly implemented our speaker-placement strategy was QuickTime.

Positioning within a DVD stream proved a bit more straightforward, and should be feasible in most authoring programs.

Step 7: Define Rules for Noises and Other Points of Emphasis
Closed captions must describe a broad range of audio events to enhance the viewer's comprehension of the video. Like speaker identifications, these audio events need to be visually different from the spoken information.

The Media Access Group recommends showing sound-effect captions parenthetically, in lowercase italics, typically presented as a standalone caption. In the context of our interview footage, which was shot during the hustle and bustle of a trade show, a "(crowd noises)" caption displays as the video is fading in from black at the start. This lets the viewer know that we're shooting in a crowd. Note that you should identify both the source of the noise and the noise itself.

You can also use these indicators to describe the intonations that flavor the speech. In the interview, Ken and I were swapping stories, and he recalled a joint presentation where the equipment setup went less than smoothly. I laughed, and commented, "What a mess that was." Following MAGpie style, this would be captioned as follows:

                   (Laughing)
                   What a mess that was!

In addition to noises and sound effects, consider identifying other information that's apparent in the audio but not in the text description. This would include accents (French, German), audience reaction (laughing, loud boos) and the pace of speech (slow drawl).

It's appropriate to caption emotion (e.g., angry frown, deep in thought, daydreaming) even if there is no accompanying speech. You can also use onomatopoeia, or text strings that sound like the noise being described, though Gallaudet University found that most consumers preferred both a text description and onomatopoeia.

These would appear as:

            (dog growling)
            Grrrrrrrr,

Step 8: Choose Your Music Treatment
Music often sets the mood of the video, so when background music is present, it should be indicated. Television sets use a musical note sign to identify music playing, or when someone is singing, but the character is not universally recognized by all streaming media players. If not available, use the word music in italics surrounded by either parentheses or brackets. If the music has no lyrics, be as descriptive as possible ("soothing music," "disco music") and identify the name and the composer if known. Caption the lyrics if they are being sung, starting and ending with the special music character.

Step 9: Edit the Text
The goal with captions is to present them with the actual spoken word, but some people talk faster than others can read. In these instances, it's accepted practice to edit the text to achieve a certain reading speed.

The Captioned Media Program style guide, produced by the National Association of the Deaf, states that most elementary or secondary students can read at 120 words per minute (wpm), with adults reading up to 160wpm. For Captioned Media Program videos, the guide requires that "no caption should remain onscreen less than 2 seconds or exceed 235wpm."
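To see how those limits play out for a single caption, the arithmetic can be sketched in a few lines of Python. The function names are hypothetical, and counting whitespace-separated words is an assumption; the style guide may count words differently.

```python
def words_per_minute(text, seconds):
    """Reading rate needed to finish `text` while it is onscreen."""
    return len(text.split()) * 60.0 / seconds


def check_caption(text, seconds, min_seconds=2.0, max_wpm=235.0):
    """Return a list of problems measured against the quoted limits."""
    problems = []
    if seconds < min_seconds:
        problems.append(f"onscreen less than {min_seconds:g} seconds")
    if words_per_minute(text, seconds) > max_wpm:
        problems.append(f"exceeds {max_wpm:.0f} wpm")
    return problems


caption = "We're streaming today live with both the Real encoder"
print(words_per_minute(caption, 2.0))  # 9 words in 2 seconds
print(check_caption(caption, 2.0))
print(check_caption(caption, 3.0))
```

Nine words shown for two seconds works out to 270wpm, which is over the limit; holding the same caption for three seconds brings it down to 180wpm.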

When editing the text, the Media Access Group has these recommendations: "Avoid editing only a single word from a sentence as this does not really extend reading time. Similarly, avoid substituting one longer word for two shorter words (or a shorter word for a longer word) or simply making a contraction from two words (e.g., contracting 'should not' to 'shouldn't')." Virtually all style guides recommend against modifying for correct English (substituting "isn't" for "ain't," or "you all" or "you" for "y'all").

Step 10: Other Issues
The first nine steps cover the main issues, but many additional standards should also be addressed.

Two common ones include:
• Treatment of numbers: generally spell out one through ten, and use numerals for higher numbers except when they start a sentence (Media Access Group).
• Acronyms: display as normal (IEEE rather than eye-triple-e).

For others—like fractions, dates, dollar amounts, and more—consult the Captioned Media Program style guide.

Creating Your Closed-captioned Text
There's your style guide; now you're ready to begin formatting your text. The first step is to convert the audio into a text file. There are a number of ways to tackle this problem.

Manual Conversion
With manual conversion, a transcriber listens to the audio and enters all speech and other audio information into a word processing file. Then the transcriber breaks the file into individual captions according to standards used by the software program to create the closed captioned text streams. MAGpie has very specific requirements for text input.

Some sources estimate that manual conversion of television programs and movies that are rich in non-speech audio content (sound effects, background music, and drama) can take up to 20 hours for each hour of audio. Corporate or academic training materials should take far less time, since most of the text is simply transcribed speech. If you have a script that was largely followed, this provides a good starting point.

Service bureaus typically charge about $6/minute for manual conversions, or under $200 for a 30-minute production. While certainly not cheap, this is a fraction of most production costs, especially if you had to rent equipment or a soundstage or pay actors or other related personnel. When obtaining a quote, be sure to ask the following questions:
• Which digital and analog captioning formats does the bureau support (Windows Media, QuickTime, Real, Line 21, DVD)?
• What level of accuracy will the service bureau guarantee?
• Which style of captions will the service bureau produce (roll-up or pop-on)?
• How will the service bureau segment the text (characters per line and lines per caption)?
• Will charges include complete audio transcription (sound effects, intonations) or just speech?
• Which style guide or other direction will the service bureau use to segment the text and caption information like dates, numbers, etc.?

Speech Recognition
A second option for converting speech into closed captioned text is automatic speech recognition, which generally works best when tuned to the voice and speech pattern of one user. For this reason, plugging in the audio feed from a training video or lecture involving random individuals will almost always produce poor results.

Many universities (including Gallaudet's Television and Media Production Service) have adopted "shadowing" or "voice writing" where a person trained on the software repeats every word spoken in the video into the voice recognition system. Typically, these individuals work in a quiet environment or use a mask to minimize outside interference. Though these systems are not 100% accurate, they provide a first draft that can accelerate the transcriber's work. Computer Prompting and Captioning Company (CPC), a prominent closed captioning vendor, sells several systems that include IBM ViaVoice speech recognition software.

Converting Your Text to Closed Captions
Once you have a transcription of your audio, your work has just begun. Typically, your transcription will list each speaker and their comments in paragraph style, and may or may not note background noises, speech inflections, or other additions necessary to provide for complete comprehension of the event.

As a starting point, your transcription may look like this:

Ken: We're streaming today live with both the Real encoder and the Microsoft Windows encoder, side by side.
Jan: Very politically correct. 

Now the task is to add the required sound effects and format the text as specified in your style sheet, and according to the requirements of your captioning software. For example, MAGpie assumes that a single carriage return separates two lines within a single caption, while a double carriage return means a new caption.

If I input the text as shown above, MAGpie would produce three two-line captions, but the first and third would contain far too many characters. To avoid this, and remove the names (I'll show who's speaking by positioning the captions), I would pre-format the file as follows:

We're streaming today live
with both the Real encoder

and the Microsoft Windows encoder,
side by side.

Very politically correct.

(both laugh)

I also added the "(both laugh)" caption to reflect that both Ken and I were laughing after my terribly clever "politically correct" quip. Pre-formatting the file in this manner makes the import into MAGpie a snap. If you decide to use another captioning program, check the manual for pre-formatting tips to use for that product. In addition, note that most captioning programs won't accept Word for Windows .DOC files, so save the file as a plain text file with a .TXT extension.
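Before importing, it can help to check a pre-formatted file against your own style sheet. The sketch below reads text using the convention described above (a single line break separates lines within a caption, a blank line starts a new caption) and flags captions that exceed the limits. The function names, the limits, and the line breaks in the sample are assumptions chosen for illustration; nothing here is part of MAGpie.

```python
def parse_captions(text):
    """Return a list of captions, each a list of lines.

    A single line break continues the current caption;
    one or more blank lines start a new caption.
    """
    captions = []
    current = []
    for raw in text.splitlines():
        line = raw.strip()
        if line:
            current.append(line)
        elif current:
            captions.append(current)
            current = []
    if current:
        captions.append(current)
    return captions


def find_problems(captions, max_chars=35, max_lines=2):
    """Return (caption_number, message) pairs for style-limit violations."""
    problems = []
    for number, lines in enumerate(captions, start=1):
        if len(lines) > max_lines:
            problems.append((number, f"has {len(lines)} lines"))
        for line in lines:
            if len(line) > max_chars:
                problems.append((number, f"line is {len(line)} characters"))
    return problems


preformatted = """We're streaming today live
with both the Real encoder

and the Microsoft Windows encoder,
side by side.

Very politically correct.

(both laugh)
"""

captions = parse_captions(preformatted)
print(len(captions), "captions")
print(find_problems(captions))
```

An empty problem list means every caption fits the two-line, 35-character limits assumed here.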

Creating Closed Captions with MAGpie
Here, we'll do a walk-through with MAGpie, a free download from http://ncam.wgbh.org/webaccess/magpie. As you move to the download page, you'll see that installing MAGpie involves several elements, including the Java Virtual Machine, which actually runs the program. Print and follow these installation instructions carefully, or the program won't run.

MAGpie works most efficiently when applying captions to the actual compressed file you'll be distributing, so if you haven't encoded your file, do so before starting. Make sure your captions are properly formatted as well.

MAGpie's interface has two windows, one to play the video and the other to format the captions and synchronize them to the video stream. Run MAGpie, then click File>New Project in MAGpie's file menu. MAGpie's Project Properties screen appears (see Figure 2, p. 58). Note that you can return to this screen at any time by clicking File>Properties. Click the Browse button on the top right to load your media file. Then click the QuickTime radio button if you're captioning a QuickTime file, or the Oratrix GRiNS Player radio button if you're captioning a Windows Media or Real Video file.

MAGpie allows you to set separate text options for the Caption and Speaker identification styles. The default is Arial, with a white font against a black background. Click each style and boost the font size to 14 to set a readable size for your final captions. Accept all other font defaults.

The Segment Annotation style is an advanced feature that applies Karaoke-style labeling to the captions. Leave this at the default setting (Style segments manually). If your video file is not 320x240, adjust the video parameters to those of your file, and adjust caption width and height accordingly. For example, if your file is 640x480, enter that into the video width and height fields, and make the caption width 640, and the caption height 80.

Next, MAGpie displays the Create New Project Track screen where you choose the type and name of the track. Choose a caption track rather than an audio description track; audio descriptions are audio files containing narrated descriptions of the video for those with impaired vision, which is a completely different operation. Accept the track name (as I've done) or enter a new one.

After you finish with this screen, the main MAGpie interface should appear with your video in a separate player. You can now insert the text file containing the captions. Right-click on Track One, and choose Insert Captions from File, as shown in Figure 3 (p. 58).

MAGpie opens a standard File Open dialog which you should use to select and load your file. Note that MAGpie loads your file starting on Row 2 of the captions. If you'd like to delete Row 1 so your captions start on Row 1, just select the blank line, right-click and choose Delete Selected Row. When you're done, MAGpie should look like it does in Figure 4 (above).

Each caption row has a column for Start and End time. You don't have to insert an End time; if that column is blank, MAGpie simply replaces the caption with the subsequent caption at its designated start time. The only reason to insert an End time is if you'd like the closed captioned screen area to go blank.

To start synchronizing captions and audio, click Row 1 and make sure that you're at the absolute start of the video file. Press F9, and MAGpie will insert 0:00:00.00 to synchronize the video starting point with the first row.

MAGpie then automatically advances to the next caption row. Use the player controls to advance the video to where the next caption should appear and press F9 again. With a little practice (try focusing on the penultimate word in each caption), you should be able to play the video in real time, and press the F9 key to synchronize each row with the associated audio. Finally, make sure the final row is blank but contains a start time, which according to product documentation helps ensure that the exported caption file will work with all formats.
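The timing model described here (each caption stays up until the next start time, and a blank final row closes the last caption) can be sketched as follows. This is an illustration only: the helper names are hypothetical, and reading the last field of 0:00:00.00 as hundredths of a second is an assumption based on its two digits.

```python
def to_timecode(seconds):
    """Format seconds as H:MM:SS.cc, matching the 0:00:00.00 pattern."""
    centis = int(round(seconds * 100))
    hours, rem = divmod(centis, 360000)
    minutes, rem = divmod(rem, 6000)
    secs, cs = divmod(rem, 100)
    return "%d:%02d:%02d.%02d" % (hours, minutes, secs, cs)


def display_spans(start_times):
    """Pair each start time with the next one.

    The last start time belongs to the blank final row, so it only
    marks where the preceding caption ends.
    """
    return list(zip(start_times[:-1], start_times[1:]))


starts = [0.0, 3.2, 5.9, 8.0]  # last entry is the blank closing row
for begin, end in display_spans(starts):
    print(to_timecode(begin), "to", to_timecode(end))
```

Combined with a word count per caption, these spans give you the onscreen time you need to check reading speed against your style guide.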

Once the file is complete, you can use the player controls to test your file and make sure your synchronization is accurate. You can make any changes directly in the timecode of each starting point; just click it and enter the new start time. Or you can rewind the video until it's in front of that caption, start playback, and then press F9 when appropriate.

Then it's time to apply the necessary justification, as shown in Figure 5 (above). In this case, since I'm sitting on the left, I'll keep all my comments on the left, right-justify Ken's questions, and center-justify all other captions. Run MAGpie's spell-check function if you didn't spell-check in your word processor.

Because Ken and I were sitting still on our respective sides of the screen, I didn't insert any speaker names during the interview. By default, MAGpie isolates the speaker name on its own line at the top of a caption, as shown in Figure 1. You can alternatively place the name at the start of the caption's first line of text, so long as it's clearly distinguished from the spoken word.

In MAGpie's preview window, 14-point closed-caption text over 320x240 video proved reasonably readable and proportionate. If you find your font size inappropriate, change it in the MAGpie Project Properties screen.

In terms of workflow, you can enter captions directly into MAGpie. I find it less efficient than creating a separate narration file in Word and then formatting that document for MAGpie. If you decide to enter the text directly into MAGpie, simply click the caption box to make the field active, and type the desired text. Use one carriage return to create another line within the caption, and a double carriage return to switch from caption to caption.

To export your file, click Export and choose the desired format. Note the Plain Text export, which is useful if you need a transcript of the event. There are efforts underway to standardize how QuickTime, Real, and Windows Media files synchronize with text, but until these standards are set and adopted by each company, you'll have to create a separate text file for each technology. Here we'll do a walk-through of adding captions created in MAGpie to WMV files.

Closed Captioning and Windows Media Files
Using MAGpie's export function, I produced interview.smi, a file compatible with Microsoft's Synchronized Accessible Media Interchange format (SAMI). This file contains the caption text and synchronization information just created in MAGpie. Next I created a text metafile with an .ASX extension to link this file with interview.wmv, the Windows Media file containing the streaming audio and video file.
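For orientation, a SAMI file is an HTML-like text document in which each caption sits inside a SYNC element whose Start value is a time in milliseconds. The Python sketch below builds a minimal one. It is not MAGpie's exporter, and its output will differ from what MAGpie writes; the function name, the ENUSCC class name, and the style values are assumptions chosen for illustration.

```python
from html import escape


def build_sami(captions, title="interview"):
    """Build a minimal SAMI document.

    captions: list of (start_seconds, [line, ...]) tuples. An empty
    list of lines produces a blank caption that clears the display.
    """
    parts = [
        "<SAMI>",
        "<HEAD>",
        "<TITLE>%s</TITLE>" % escape(title, quote=False),
        '<STYLE TYPE="text/css"><!--',
        "P { font-family: Arial; font-size: 14pt; "
        "color: white; background-color: black; }",
        ".ENUSCC { Name: English; lang: en-US; SAMIType: CC; }",
        "--></STYLE>",
        "</HEAD>",
        "<BODY>",
    ]
    for start, lines in captions:
        # Join the lines of one caption with <br>; blank captions use &nbsp;.
        text = "<br>".join(escape(line, quote=False) for line in lines) or "&nbsp;"
        parts.append(
            "<SYNC Start=%d><P Class=ENUSCC>%s</P></SYNC>"
            % (round(start * 1000), text)
        )
    parts.append("</BODY>")
    parts.append("</SAMI>")
    return "\n".join(parts)


sami = build_sami([
    (0.0, ["(crowd noises)"]),
    (2.5, ["We're streaming today live", "with both the Real encoder"]),
    (6.0, []),  # blank closing caption
])
print(sami)
```

Opening an exported .smi file in a text editor and comparing it with this skeleton is a quick way to spot a missing start time or a malformed caption.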

To play the video concurrently with the closed caption file, viewers will load the ASX file into Windows Media Player (WMP). To post your video and captions to a Web site, you would upload all three files to your Web site and have your Web page link to the ASX file. In Figure 6 (below left), WMP displays both the video file and closed captions.

WMP won't display captions when it's running with a "skin" rather than in Full mode; you must also enable captions within the player. If your viewers can't see your captions, chances are it's one or both of these configuration issues causing the problem. If you post the files to a Web site, you must upload the ASX, WMV, and SMI files to your server and update the ASX file to reflect the new locations (also called paths) of the content files.
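As a rough illustration of what the ASX metafile contains, the sketch below writes one that points Windows Media Player at the video and names the caption file through a sami parameter on the media URL, which is the linking convention I understand WMP to use. The example.com addresses and the function name are hypothetical; substitute the real paths of the files on your server, and check the Microsoft ASX white paper listed in the sidebar for the authoritative syntax.

```python
from xml.sax.saxutils import escape, quoteattr


def build_asx(video_url, sami_url, title="Interview"):
    """Return ASX text that links a Windows Media file to a SAMI file."""
    href = "%s?sami=%s" % (video_url, sami_url)
    return "\n".join([
        '<ASX version="3.0">',
        "  <TITLE>%s</TITLE>" % escape(title),
        "  <ENTRY>",
        "    <REF HREF=%s />" % quoteattr(href),
        "  </ENTRY>",
        "</ASX>",
    ])


# Hypothetical server locations for the two content files.
asx = build_asx(
    "http://example.com/media/interview.wmv",
    "http://example.com/media/interview.smi",
)
print(asx)
```

Because the locations live only in this one small file, moving the content to a new server means editing the two URLs and nothing else.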

Converting Closed Captions To DVD Subtitles
If you're producing streaming media with closed captions, chances are you're also producing DVDs with the same content. Obviously, once you've produced and synchronized the captions, you'd like to use the same files in your DVD production. If you own a broadcast captioning program, start by asking your vendor if you can export DVD-compatible subtitles from their systems.

Working with MAGpie, this option wasn't open to me, so I tracked down a (free) downloadable tool called URUSoft Subtitle Workshop that could input closed captioned files, and output subtitle files compatible with the three authoring programs I primarily use: Adobe Encore 1.5, Apple DVD Studio Pro 3, and Ulead DVD Workshop 2.

You can create subtitles in Subtitle Workshop, though it lacks features like right- and left-justifying text and synchronizing captions to the video in real time. In terms of pure captioning usability, I prefer MAGpie. However, where MAGpie can only output in the three streaming video formats, Subtitle Workshop supports over 50 DVD subtitle output formats and offers a wealth of import options.

I started by importing the SAMI file created by MAGpie, which formatted perfectly, as shown in Figure 7 (p. 60).

Using output templates available within Subtitle Workshop, I then exported caption files for DVD Workshop and Adobe Encore. DVD Workshop imported the file without problem, though I had to perform minor text cleanup, primarily in two-line captions that DVD Workshop attempted to display in one line (see Figure 8 above, upper left). In addition, all captions were center-justified and placed in the middle of the screen.

When I attempted to import the captions into Adobe Encore, I received an error message indicating that Encore can only import Unicode files for text. I loaded the file into Notepad to see if I could spot any obvious errors, then checked Notepad's export options to see if either option recommended by the Encore error message was available. I saved the file into UTF-8 format, and Encore loaded it without difficulty (as shown in Figure 9 above, upper right) and all captions displayed correctly, though they were all left-justified and placed on the left of the screen.
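If you have many files to convert, the Notepad step can be scripted. This is a minimal sketch under two assumptions: that the exported subtitle file uses the Windows-1252 ("ANSI") encoding, and that writing UTF-8 with a byte-order mark, as Notepad did when saving UTF-8, is what the importing program expects. Adjust either encoding if your files differ.

```python
def convert_to_utf8(src_path, dst_path,
                    src_encoding="cp1252", dst_encoding="utf-8-sig"):
    """Re-encode a text file, leaving its line endings untouched."""
    # newline="" stops Python from translating \r\n on read or write.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        dst.write(text)
```

Calling convert_to_utf8("captions.txt", "captions_utf8.txt") (both file names are placeholders) writes a converted copy and leaves the original untouched.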

Importing into DVD Studio Pro required a bit more trial and error, but was ultimately successful. I first tried the DVD Studio Pro output preset, which wouldn't import. This wasn't surprising given that Apple had changed the subtitle architecture in Version 3. Fortunately, the Apple manual outlines several other formats that should import, including the Spruce Technologies STL format (again, no surprise—DVD Studio Pro is partly based on the now-defunct program Spruce DVD Maestro).

DVD Studio Pro happily accepted the STL file, with a couple of minor problems, primarily the need to replace a stray line-break marker with a carriage return to produce a two-line caption (as shown in Figure 10 above, lower left). As with DVD Workshop, all captions were center-justified and placed in the bottom center of the screen.

While not totally problem-free, Subtitle Workshop certainly proved much more efficient than starting from scratch, and you have to like the price. Note that Subtitle Workshop is not the only fish in the caption-converting sea: I also found a product called Lemony that can import MAGpie files and output captions in a variety of formats, and costs 135 Euros.

Sidebar 1: Closed Captioning Resources
WGBH Media Access Group: http://main.wgbh.org/wgbh/pages/mag/services/captioning/
National Center for Accessible Media: http://ncam.wgbh.org/
MAGpie Free Download: http://ncam.wgbh.org/webaccess/magpie
Complete list of captioning tools: www.captions.org/softlinks.cfm
Media Access Group Captioning Style Guide: http://main.wgbh.org/wgbh/pages/mag/services/captioning/faq/sugg-styles-conv-faq.html#copy
Captioned Media Program (CMP), National Association of the Deaf: www.cfv.org/caai/nadh7.pdf
Consumer research study, Gallaudet University Technology Access Program: http://tap.gallaudet.edu/nsi_recom.htm
Gary Robson's Closed Captioning FAQ: www.robson.org/capfaq
Joe Clark, "Typography and TV Captioning," Print magazine, 1989: www.joeclark.org/design/print/print1989.html
Other Joe Clark material: www.joeclark.org/access/captioning
Font choices: www.joeclark.org/access/captioning/bpoc/typography.html
Font size examples: http://ncam.wgbh.org/richmedia/examples/92.html
IBM ViaVoice-based speech recognition software systems (Computer Prompting and Captioning Company): www.cpcweb.com/Captioning/cap_via_voice.htm
Shadowing and speech recognition commentary: http://tap.gallaudet.edu/SpeechRecog.htm
Info on pre-formatting documents for MAGpie: http://ncam.wgbh.org/richmedia/tutorials/transcriptpreformat.html
Info on text synchronization standards for QuickTime, Real, and Windows Media: www.w3.org/AudioVideo/timetext.html
White paper on ASX files and server support: www.microsoft.com/netshow/howto/asx.htm
Info on SAMI: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnwmt/html/wmp7_sami.asp
Embedding Windows Media files directly into a Web site: www.webaim.org/techniques/captions/windows/3?templatetype=3
SMIL Web site: www.w3.org/AudioVideo/
RealPlayer caption content linking: www.webaim.org/techniques/captions/real/3?templatetype=3
Lemony video subtitler: www.jorgemorones.com/lemony/index.htm