Recorded audio and video should have captions and transcripts to give all users equal access to the information. Captions and transcripts are especially helpful for non-native speakers, and they're useful for all users when sound must be kept low or muted.


Captions

Captions render a video's audio as text in the video player, displayed in sync with the audio they describe. To meet the inclusive design goal of providing an experience equal to listening, captions indicate the words spoken and identify who is speaking. Captions also include relevant sounds:

  • wordless vocalizations, such as laughter, coughing, and sighing
  • important sound effects, such as environmental sounds
  • details about how someone is speaking, such as whispering
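As an illustration, a short WebVTT caption file can identify speakers and bracket relevant sounds. The cue timings, names, and text below are hypothetical:

```vtt
WEBVTT

00:00:01.000 --> 00:00:04.000
[melancholy music]

00:00:04.500 --> 00:00:07.000
<v Maria>We need to talk about the schedule.

00:00:07.200 --> 00:00:09.000
<v Sam>(whispering) Not here.
```

A file like this is typically attached to a video with HTML's `<track>` element, e.g. `<track kind="captions" src="captions.vtt" srclang="en" label="English">`.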

Automatically generated captions, such as those on YouTube, are not yet accurate enough to serve people with disabilities. Proofread and correct any automatically generated captions before using them.


Transcripts

Transcripts also provide a text version of the audio, but they are not synchronized with it. A transcript lives in a separate file outside the video player. Transcripts are best suited to audio-only recordings without other sounds or video.
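One way to present a transcript is to link the file directly beside the player, as in this sketch (the file names are hypothetical):

```html
<!-- Audio player with an adjacent link to its transcript -->
<audio controls src="interview.mp3"></audio>
<p>
  <a href="interview-transcript.html">Read the transcript of this interview</a>
</p>
```

Keeping the link next to the player makes the transcript easy to discover for everyone, including screen reader users navigating by link.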

Best Practices

  • Proofread and edit automatically generated captions.
  • Along with the words spoken, capture the speakers and relevant sounds. Show speaker names and sounds in brackets, for example [melancholy music]. Adjust the timing if needed.
  • For a transcript, add a link to the file as close as possible to the relevant source.