The elements of online communication 2: audio


We continue our tour of the elements that make up all our online communications with audio. What contribution can it make? Where is it less effective? How is it best delivered online?

What audio is good for

To state the obvious, audio is useful when we want to know what something sounds like – a human voice, a piece of music, a fire alarm, a bird song. In these situations, textual descriptions will always be second best.

More commonly, we use sound as an alternative verbal channel to text. In fact, it’s a very rich alternative because it conveys tone of voice as well as the words.

In an online context, audio is useful because it takes up no space on the screen. When you’re presenting a sequence of images, an animation, a software demonstration or a movie, the verbal content of your message can be delivered in sound without taking attention away from the visual elements.

Although not so often a key factor in online communication, music has the capacity to alter mood more successfully than any other medium.

When audio is not so suitable

Unlike text, audio is not self-paced. Although the user may be able to rewind and fast forward recorded audio, they cannot readily control the speed at which the sound is delivered; and with a live audio stream, even this capability is lost. The spoken word is delivered much more slowly than a person can read, which might, in some circumstances, prevent a user from achieving their goal as quickly as they would like.

Audio also requires more in terms of technical paraphernalia. The listener needs a sound card and either speakers or headphones; and in a live online conversation, the contributors also need microphones. In an open office environment, there is the additional risk of causing disturbance to those working nearby.

It goes without saying that sound will be inadequate when the subject matter is highly visual or is better understood with visual aids. In these situations audio can be combined with, or replaced entirely by, photos, illustrations, video, diagrams, screen grabs or animations. Audio-only media, such as podcasts, will clearly struggle when a visual element is needed to convey the meaning.

Optimising audio for online delivery

If your audio is pre-recorded and intended for download by the user (that’s when they wait while the file is saved in full to their computer, but can then play it repeatedly offline), then it pays to keep the duration of the audio short and therefore minimise the file size. A good example is with podcasts – better to distribute your programme in three 5MB sections rather than one of 15MB. This constraint does not apply when the audio is streamed (that’s when the audio plays almost immediately but is not stored on the user’s computer), as will be the case with live audio or when the audio forms part of a multimedia presentation.
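As a rough guide to planning download sizes, the size of a compressed audio file can be estimated from its duration and bit rate. The sketch below assumes constant-bit-rate encoding; the function names are hypothetical, and the 64 kbps bit rate and 30-minute duration are illustrative assumptions rather than recommendations.

```python
import math


def file_size_mb(duration_min, bit_rate_kbps):
    """Approximate size of constant-bit-rate audio in megabytes (10^6 bytes)."""
    bits = duration_min * 60 * bit_rate_kbps * 1000
    return bits / 8 / 1_000_000


def parts_needed(duration_min, bit_rate_kbps, max_part_mb):
    """Number of sections needed to keep each download under max_part_mb."""
    return math.ceil(file_size_mb(duration_min, bit_rate_kbps) / max_part_mb)


# Assumed example: a 30-minute programme encoded at 64 kbps.
print(f"Total size: {file_size_mb(30, 64):.1f} MB")
print(f"Sections of at most 5 MB: {parts_needed(30, 64, 5)}")
```

Under these assumptions the programme comes to a little under 15MB, so it would be split into three sections.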

Generally speaking, it pays to limit the user’s exposure to a single voice. However interesting the speaker and however expressive the voice, any listener will begin to tune out after 10 minutes or so. On radio, you will rarely hear a single voice continuously for more than a minute or two. For this reason, interviews, discussions, question and answer sessions and drama work much better than monologues.

In The Media Equation, Stanford University researchers Byron Reeves and Clifford Nass reported on the impact that audio quality had on a user’s overall impression of their media experience. Their conclusion was that audio quality matters a great deal, which is an argument for taking care when recording and editing, and then sampling at the best rate possible given the bandwidth constraints. When recording, it pays to use quality microphones (ideally fitted with pop shields, which reduce the explosive peaks that occur when speakers say the letter ‘p’). Ideally the room will be free of reflections (so few natural reverberations or echoes) and the speaker should be a comfortable distance (say four or five inches) from the microphone. The recording level should be high enough to avoid background hiss, while avoiding the high peaks that cause ‘clipping’. When editing, bad takes and gaps can be removed and the overall volume level evened out using a process called ‘compression’ (dynamic range compression, which is distinct from the file compression used to shrink audio for delivery).
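To make the idea of recording level and clipping more concrete, here is a minimal Python sketch that measures the peak level of a recording and counts samples that have hit the maximum. It assumes the samples are already available as numbers between -1.0 and 1.0 (full scale); the function names and the synthetic test tone are hypothetical, and real audio editors report these figures for you.

```python
import math


def peak_level_dbfs(samples):
    """Peak level in decibels relative to full scale (0 dBFS is the maximum)."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")  # pure silence
    return 20 * math.log10(peak)


def count_clipped(samples, full_scale=1.0):
    """Count samples at or beyond full scale, a tell-tale sign of clipping."""
    return sum(1 for s in samples if abs(s) >= full_scale)


# Assumed example: a one-second 440 Hz tone recorded with too much gain.
sample_rate = 8000
too_loud = [1.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
            for n in range(sample_rate)]
# The recorder cannot store values beyond full scale, so the peaks are flattened.
recorded = [max(-1.0, min(1.0, s)) for s in too_loud]

print(f"Peak level: {peak_level_dbfs(recorded):.1f} dBFS")
print(f"Clipped samples: {count_clipped(recorded)}")
```

A recording whose peaks sit a few decibels below 0 dBFS, with no clipped samples, leaves headroom while keeping the signal well above any background hiss.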

If you are recording a narration to accompany a multimedia presentation or an e-learning programme, it pays to employ a professional voiceover artist. While this may appear to be extravagant, the cost rarely exceeds a few hundred dollars and can make a big difference to the professionalism of the end result.

Audio can be captured on a portable recording device (a digital recorder, a phone or camera) or directly into a computer. In the case of the latter, it pays to work with dedicated audio software if you can, as this will provide you with much more flexibility when it comes to editing (although professional audio editors are expensive, free software such as Audacity is good enough for most purposes). On the other hand, many authoring tools allow you to record directly into the tool and, with a little care, the results can be more than adequate.

To accommodate those users who have a hearing impairment, you need to provide a transcript of any important audio components within recorded media.

Combining audio with other elements

As a verbal element, speech combines well with visual elements but clashes badly with a second verbal element such as text. So, audio over a sequence of images works well, whereas if the words are also replicated on the screen as text, the user stands to be confused and frustrated. The brain cannot process two verbal inputs simultaneously, so the most likely consequence is that the user will reach for the volume control to block out the slower of the two verbal sources, the speech. Of course, if the content of the audio is music or sound effects, this will not clash with the text and can work well.

How audio is represented online

Digital audio is represented as a stream of ‘samples’. The quality of the audio is determined by the frequency with which the samples are taken (the more often the better) and the resolution of each sample (the more bits used to describe it the better). As an example, CD audio is sampled 44,100 times per second (44.1kHz) with a 16-bit resolution. Typically, much lower sample qualities are used online in order to reduce the strain on bandwidth (the speed with which data can be transmitted across the network). Similarly, most music is recorded as two channels of samples (stereo), whereas a single channel (mono) is acceptable in many circumstances and certainly when the content is simple speech.
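The amount of data in uncompressed audio follows directly from these three factors: samples per second, bits per sample and number of channels. The Python sketch below works through the arithmetic for CD audio (44,100 samples per second, 16-bit, stereo) and for a lower-quality speech setting. The function names are hypothetical, and the speech settings (22,050 samples per second, 16-bit, mono) are an assumed example rather than a recommendation.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Bits per second of uncompressed audio."""
    return sample_rate_hz * bits_per_sample * channels


def pcm_size_bytes(seconds, sample_rate_hz, bits_per_sample, channels):
    """Bytes needed to store uncompressed audio of the given duration."""
    return pcm_bit_rate(sample_rate_hz, bits_per_sample, channels) * seconds // 8


# CD audio: 44,100 samples per second, 16 bits per sample, two channels.
print(pcm_bit_rate(44_100, 16, 2))        # 1411200 bits per second
print(pcm_size_bytes(60, 44_100, 16, 2))  # 10584000 bytes per minute

# Assumed speech setting: half the sample rate and a single channel.
print(pcm_bit_rate(22_050, 16, 1))        # 352800 bits per second
print(pcm_size_bytes(60, 22_050, 16, 1))  # 2646000 bytes per minute
```

Halving the sample rate and dropping to mono cuts the data to a quarter, but a minute of speech still occupies more than 2.5MB before compression.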

Even when the audio is encoded in mono and at a lower sample quality, it will still be far too bulky to download or stream without extensive compression. The most common compression formats are:

  • MP3
  • AAC (Advanced Audio Coding, designed as the successor to MP3 and popularised by Apple)
  • WMA (Windows Media Audio)

Most audio editing software will be able to export in a wide variety of compression formats.
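To give a feel for how much these formats reduce the data, the short Python sketch below compares the bit rate of uncompressed CD-quality audio with a compressed bit rate. The function name is hypothetical and the 128 kbps target is an assumed, illustrative setting; the bit rates your editing software offers on export may differ.

```python
def compression_ratio(sample_rate_hz, bits_per_sample, channels, compressed_kbps):
    """Ratio of uncompressed bit rate to compressed bit rate."""
    uncompressed_bps = sample_rate_hz * bits_per_sample * channels
    return uncompressed_bps / (compressed_kbps * 1000)


# CD-quality stereo compared with an assumed 128 kbps compressed file.
ratio = compression_ratio(44_100, 16, 2, 128)
print(f"Roughly {ratio:.0f} to 1")
```

In other words, under these assumptions the compressed file is around one eleventh of the size of the original.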

See also: text
Coming next: images

About Clive Shepherd

Clive Shepherd has written 244 posts in this blog.

Clive is a consultant specialising in the application of technology to learning and business communications. He was previously Director of Training and Creative Services for a multinational corporation and co-founder of a major multimedia development company. For four years he was chair of the eLearning Network.


  1. says

    A very “sound” post on this topic, if you’ll pardon the pun! I did hold my breath until the end when you pointed out the conflict between narration and identical on-screen text. So many people just do this in their online learning without understanding how it can have a negative impact on understanding – the opposite effect of what they intend! Thanks for making sure to spread the word on this issue.
