Captions and Transcripts

Captions and transcripts are designed primarily for people with hearing impairments, but they also help non-native speakers and anyone who is in a noisy environment or needs to keep the volume low or off. Prerecorded audio-only and prerecorded video-only media each require a text or audio alternative:

  • Prerecorded audio-only content must be accompanied by a text transcript that includes all essential dialogue, identifies the speakers, and describes all essential sound effects. 
  • Prerecorded video-only content must be accompanied by a text transcript or audio description that describes all important visual information, such as scenes, important actions, on-screen text, and facial expressions.


Captions convert the audio in a video into text, displaying it simultaneously with the spoken words. To ensure an inclusive design, captions should not only transcribe the dialogue but also identify the speakers, providing an experience equivalent to listening for all viewers. Captions should also include relevant sounds in brackets:

  • wordless vocalizations, such as laughter, coughing, and sighing; example: [laughing]
  • important sound effects, such as environmental sounds; example: [wind blowing]
  • details about how someone is speaking, such as whispering; example: [whispering]
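
The conventions above can be sketched in a few lines of Python that write caption cues in WebVTT, a common caption file format. This is a minimal illustration, not a complete captioning tool: the `Cue` class and its `speaker` and `sound` fields are hypothetical names chosen for this example, and the bracketed speaker label follows the style described on this page rather than any requirement of the format.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One caption cue. All field names are assumptions made for this sketch."""
    start: float       # seconds from the start of the video
    end: float         # seconds from the start of the video
    text: str = ""     # spoken words; may be empty for a sound-only cue
    speaker: str = ""  # who is speaking, if known
    sound: str = ""    # non-speech sound or manner of speaking


def timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    millis = round(seconds * 1000)
    hours, rest = divmod(millis, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def caption_text(cue: Cue) -> str:
    """Build the caption line: [speaker] [sound] spoken words."""
    parts = []
    if cue.speaker:
        parts.append(f"[{cue.speaker}]")
    if cue.sound:
        parts.append(f"[{cue.sound}]")
    if cue.text:
        parts.append(cue.text)
    return " ".join(parts)


def to_webvtt(cues) -> str:
    """Join cues into the text of a WebVTT file."""
    lines = ["WEBVTT", ""]
    for cue in cues:
        lines.append(f"{timestamp(cue.start)} --> {timestamp(cue.end)}")
        lines.append(caption_text(cue))
        lines.append("")  # a blank line separates cues
    return "\n".join(lines)
```

For example, `to_webvtt([Cue(1.0, 4.0, "Are you there?", "Maria", "whispering"), Cue(4.0, 6.0, sound="wind blowing")])` yields one cue reading `[Maria] [whispering] Are you there?` followed by a sound-only cue reading `[wind blowing]`.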

Automatically generated captions, such as those on YouTube, are not yet accurate enough to serve people with disabilities. Proofread and correct any automatically generated captions before using them.


Transcripts also provide a text description of the audio, but they are not synchronized with it. Transcripts exist in a separate file outside the video player. Transcripts are best for audio-only recordings without any other sounds or video.
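
Because a transcript has no timing, it can be as simple as a plain-text file with one line per utterance or sound. The following Python sketch shows one possible layout, assuming each entry is a `(speaker, sound, text)` tuple; the function name and the tuple shape are hypothetical choices for this example, not a standard.

```python
def build_transcript(entries) -> str:
    """Turn (speaker, sound, text) tuples into unsynchronized transcript lines."""
    lines = []
    for speaker, sound, text in entries:
        # Sounds go in brackets, ahead of any spoken words.
        parts = [f"[{sound}]" if sound else "", text]
        body = " ".join(part for part in parts if part)
        # Prefix the speaker's name when there is one.
        lines.append(f"{speaker}: {body}" if speaker else body)
    return "\n".join(lines)


entries = [
    ("Host", "", "Welcome to the show."),
    ("", "door slams", ""),
    ("Guest", "laughing", "Thanks for having me."),
]
print(build_transcript(entries))
```

The resulting text can be saved to a separate file and linked near the recording.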

Who Benefits

For those who are deaf or deafblind, accessing audio-only content like podcasts requires a text transcript. This transcript should encompass all dialogue, speaker identifications, and essential sound effects. People who are deaf can read the transcript, while those who are deafblind can use a screen reader with a refreshable braille display.

Similarly, individuals who are blind require a text transcript or audio description to access video-only content such as silent movies or narration-free how-to videos. This transcript or description should detail crucial visual elements like scenes, actions, on-screen text, and facial expressions. An audio description serves people who are blind, but it excludes those who are deafblind. A text transcript serves both groups, because deafblind individuals can read it through a screen reader with a refreshable braille display.

Best Practices for Captions and Transcripts

  • Proofread and edit automatically generated captions.
  • Along with the words spoken, capture the speakers and relevant sounds. Show names and sounds in brackets, for example [melancholy music]. Adjust the timing if needed.
  • When including a transcript, add a link to the transcript file as close as possible to the relevant source.

Learn More