Tuesday, 30 March 2010

HD Voice - how much bandwidth do you need?

In this article, the first of a series covering HD Voice, we discuss audio bandwidth for VoIP systems and, in particular, how much audio bandwidth you actually need for voice.

Voice communications systems in use today are based on traditional telephony standards that have changed little since the 1950s. These standards limit the information bandwidth for voice communications to 300-3400Hz (200-3200Hz in the US and Japan). However, normal conversational speech typically covers the frequency range 0-8000Hz. In fact, only 20 percent of the frequencies utilised by the human voice are transmitted in the 300Hz to 3.4kHz range. Furthermore, the human ear is capable of hearing frequencies up to 18 or 20kHz. When these standards were set, a voice channel limited to 3.4kHz was felt to be good enough, and ever since then we have all accepted that telephone conversations have a slightly muffled tone.

The public switched telephone network (PSTN) still utilises these mature standards, but IP-based voice communications (VoIP) have now evolved to a state where there is technically no reason, on a VoIP call, to limit the clarity of the call by restricting its audio bandwidth. However, a look at the wideband voice market shows many codecs and standards, with a wide range of possible audio bandwidths to choose from. Some codecs are royalty-free and available from standards bodies such as the ITU-T; others come with usage fees, but offer other benefits and have vendors actively promoting them as the best possible choice.

The ITU-T standards body has standardised the terms narrowband, wideband, super-wideband and fullband in relation to voice codecs. A wideband codec, as defined by the ITU-T, has an audio bandwidth of 50Hz to 7kHz.

The choice of which type of codec to use should come from the end application; some examples for music applications are shown below:
•    CD audio is usually defined as 20Hz - 20kHz
•    MP3 - limited to approximately 18kHz using 16-bit samples
•    AM radio typically 40Hz - 5kHz
•    FM radio is 30Hz - 15kHz
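
To relate these audio bandwidths to digital sampling, here is a minimal Python sketch of the Nyquist criterion, which says a signal must be sampled at no less than twice its highest frequency. The upper band edges are the figures quoted in this article; the dictionary labels and the function name min_sample_rate_hz are only illustrative assumptions.

```python
# Upper band edges (Hz) as quoted in the article; labels are illustrative only.
UPPER_EDGE_HZ = {
    "narrowband telephony": 3400,
    "AM radio": 5000,
    "wideband voice": 7000,
    "FM radio": 15000,
    "CD audio": 20000,
}


def min_sample_rate_hz(upper_edge_hz: int) -> int:
    """Nyquist criterion: sample at no less than twice the highest frequency."""
    return 2 * upper_edge_hz


for name, edge in UPPER_EDGE_HZ.items():
    rate = min_sample_rate_hz(edge)
    print(f"{name}: up to {edge} Hz -> at least {rate} samples per second")
```

Real systems sample somewhat above this theoretical minimum to leave room for filtering, which is why narrowband telephony conventionally uses 8kHz sampling, wideband voice 16kHz and CD audio 44.1kHz.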

So where does voice fit, and why not just adopt a fullband codec for voice so the full audio range is transmitted?

In recent years, the MPEG-1 Layer 3 (MP3) codec has proven that high quality audio/music reproduction is possible using a digitised stream of bits to represent the sound wave. However, there is a good reason not to use the already widely adopted MP3 or similar music codecs for voice, and that reason is latency. Whilst MP3 and other such codecs are good at streaming music tracks, they do not have the low latency required for real-time two-way voice communications. Dedicated voice codecs have therefore been developed with this in mind.
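
As a rough illustration of the latency point, the sketch below computes how much audio a codec must buffer before it can encode a single frame. It assumes an MPEG-1 Layer 3 frame of 1152 samples at 44.1kHz and, for comparison, a 20ms frame at 16kHz, which is a common choice for wideband voice codecs. The function name frame_duration_ms is illustrative, and the results are only a lower bound: real encoders add look-ahead and buffering delay on top.

```python
def frame_duration_ms(samples_per_frame: int, sample_rate_hz: int) -> float:
    """Time span of one codec frame in milliseconds."""
    return 1000.0 * samples_per_frame / sample_rate_hz


# Assumed figures: 1152 samples per MP3 frame at 44.1 kHz,
# 320 samples per wideband voice frame at 16 kHz.
mp3_frame_ms = frame_duration_ms(1152, 44100)
voice_frame_ms = frame_duration_ms(320, 16000)

print(f"MP3 frame:   {mp3_frame_ms:.1f} ms")
print(f"Voice frame: {voice_frame_ms:.1f} ms")
```

Frame duration is only one part of end-to-end delay, but it sets a floor that network and playout buffering then add to.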

Fullband codecs
The sensitivity of the human ear to high frequency sounds varies from person to person and typically degrades with age. Some people can hear sounds up to 20kHz, but 15-17kHz is more typical. Covering this full range of hearing is the motivation for fullband codecs such as ITU-T G.719. These are new developments, however: fullband codecs are technically complex and have not yet seen major deployment. In any case, large gains in audio quality can be made by extending the bandwidth to just 7kHz or 14kHz.

Super-wideband codecs
Super-wideband (14kHz) codecs are less commonplace, the main one being Annex C of the G.722.1 standard, an extension from Polycom also known as Siren14. Super-wideband codecs have found use in platforms such as high-end conferencing systems, where high quality audio is required to complement the HD video streams.

Wideband codecs
The benefits of wideband speech come from the added information being carried. The additional low frequency range (50Hz to 200/300Hz) contributes to increased presence and comfort and a more natural conversation. The addition of the higher speech frequencies (3.4kHz to 7kHz) makes it easier to distinguish between similar sounds, for example 'p' and 't', 'm' and 'n', 's' and 'f'.

It has been found that for voice applications, most of the benefit comes from extending the bandwidth down to 50Hz at the low end and out to 7kHz at the high end. 7kHz bandwidth codecs also offer a good balance of added quality versus increased complexity, and so allow high channel counts for a given system.

This is why the standards bodies and developers of voice platforms have chosen, on the whole, to develop applications supporting 'wideband' as opposed to 'super-wideband' or 'fullband' codecs. Currently the most widely deployed voice codecs with extended audio bandwidth are all 'wideband' codecs, with G.722 and G.722.2 (AMR-WB) being two such examples.

A 7kHz bandwidth is therefore the best overall choice for most voice applications. Now all that needs to be done is to choose a wideband codec from the huge range available. More on that in a later post...

For further information on HD voice and the reasons why the time is right for HD Voice, download my whitepaper.

Andrew Nicholson

