Modern video and audio codecs

Traditional digital broadcasting systems use fixed standards, so switching to new video or audio codecs means making millions of existing receivers obsolete. On the Internet, however, users frequently upgrade the media players on their computers, so the most up-to-date video and audio codecs become widely available within a short period of time. Broadcasters can take advantage of this to provide their streams at higher picture and/or audio quality.

An example of this took place recently when the BBC launched new higher-quality versions of the iPlayer TV streams, using H.264 for the video and AAC+ for the audio. Anthony Rose, the person in charge of the BBC iPlayer, explained on the BBC Internet blog:

 

"Back in December of last year, relatively few people had installed the Flash player needed to play H.264 content; now almost 80% of BBC iPlayer users have it."

 

H.264 video codec

H.264 is currently the best-performing video codec available. It's about twice as efficient as older codecs such as MPEG-2, which is used for standard-definition TV (SDTV) channels on satellite, cable and Freeview, and it's also more efficient than more modern video formats such as MPEG-4 Simple Profile (SP) and DivX. H.264 being "twice as efficient" as MPEG-2 means that it can deliver the same level of picture quality at half the bit rate that MPEG-2 would require, so broadcasters can improve on that picture quality by using H.264 at any bit rate higher than that.
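As a rough illustration of the arithmetic, the sketch below converts an MPEG-2 bit rate into the H.264 bit rate that should give about the same picture quality. The function name and the 4,000 kbps input are made up for the example, and the factor of two is the approximate figure quoted above rather than a measured value.

```python
def h264_equivalent_kbps(mpeg2_kbps, efficiency_factor=2.0):
    """Bit rate at which H.264 should roughly match MPEG-2 picture quality.

    efficiency_factor=2.0 is the approximate "twice as efficient" figure
    from the text, not a measured constant.
    """
    return mpeg2_kbps / efficiency_factor


# Hypothetical example: an MPEG-2 channel running at 4,000 kbps.
print(h264_equivalent_kbps(4000))  # 2000.0
```

Any H.264 bit rate above the returned value would, by the same reasoning, be expected to look better than the MPEG-2 original.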

H.264, which is part of the MPEG-4 standard (MPEG-4 Part 10, also known as AVC), was designed to perform well from very high right down to very low bit rates. That is why it's being used for the HDTV channels that are broadcasting on satellite, and it'll be used for the BBC, ITV and Channel 4 HD channels when they launch on Freeview in the next couple of years. It's also the codec of choice for mobile TV systems, and a number of the latest mobile phones and MP3 players support it, such as Apple's iPhone, iPod touch and the latest version of the iPod.

The bit rates at which H.264 is, or typically will be, used are as follows:

 

Type                      Range of bit rates
HDTV                      8 - 20 Mbps (see note 1)
SDTV                      1 - 3 Mbps
BBC iPlayer TV streams    700 kbps
Mobile TV / video         250 - 500 kbps

1 - The 1080i HDTV format requires a higher bit rate than the lower-resolution 720p HD format. The bit rates of HDTV channels will also fall over time, because H.264 is a new format and the performance of a new codec improves quite quickly at the beginning of its lifecycle as developers learn how best to tune the encoder.

 

The bit rates in the above table vary so much because of the wide range of video resolutions used, which in turn is due to the differences in the screen sizes that the video is typically viewed on: the more pixels there are in the video format, the higher the bit rate needs to be. The following table shows some of the common video formats and their resolutions:

 

Format        Horizontal x vertical pixels    Total number of pixels    Typical use
HDTV 1080i    1920 x 1080                     2,073,600                 HDTV
HDTV 720p     1280 x 720                      921,600                   HDTV
SDTV          720 x 576                       414,720                   SDTV
VGA           640 x 480                       307,200                   BBC iPlayer?
CIF           352 x 288                       101,376                   Mobile phone / MP3 player
QVGA          320 x 240                       76,800                    Mobile phone / MP3 player
QCIF          176 x 144                       25,344                    Mobile phone / MP3 player
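The pixel totals in the table are simply width multiplied by height, and comparing them against SDTV gives a feel for why the bit rates differ so much. The short sketch below does that calculation. The dictionary and function names are just for illustration, and the ratio only compares pixel counts; it says nothing about frame rate or how hard the content is to encode.

```python
# Resolutions taken from the table above: (width, height) in pixels.
FORMATS = {
    "HDTV 1080i": (1920, 1080),
    "HDTV 720p": (1280, 720),
    "SDTV": (720, 576),
    "VGA": (640, 480),
    "CIF": (352, 288),
    "QVGA": (320, 240),
    "QCIF": (176, 144),
}


def total_pixels(fmt):
    """Total number of pixels in one frame of the given format."""
    width, height = FORMATS[fmt]
    return width * height


def pixel_ratio(fmt, reference="SDTV"):
    """How many times more (or fewer) pixels fmt has than the reference."""
    return total_pixels(fmt) / total_pixels(reference)


for name in FORMATS:
    print(f"{name:<11} {total_pixels(name):>9,} pixels  "
          f"{pixel_ratio(name):.2f} x SDTV")
```

On these figures, 1080i HDTV has exactly five times as many pixels per frame as SDTV, while QCIF has only about 6% as many.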

 

AAC/AAC+ audio codec

As with H.264 for video, AAC/AAC+ is the best-performing audio codec available today in terms of efficiency. However, there seems to be a lot of confusion surrounding AAC and AAC+. Based on comments I've received via email, and on reading other people's comments on the Internet, a lot of people seem to be under the misconception that AAC+ always provides higher audio quality than AAC. This is not the case: which of the two sounds better depends on the bit rate being used.

The best-performing AAC/AAC+ encoder according to listening tests is Nero's implementation, which chooses the AAC profile according to the bit rate as follows:

 

Bit rate              AAC profile    Common name
Less than 40 kbps     HE-AACv2       AAC+
40 kbps to 84 kbps    HE-AACv1       AAC+
85 kbps and over      LC-AAC         AAC

 

As you can see in the table, it's better to use AAC when the bit rate is 85 kbps or higher, especially when the audio is music.
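The thresholds in the table can be written as a small lookup. This is only a sketch of the table, not Nero's actual code or API: the function name is hypothetical, and treating any bit rate below 85 kbps (including fractional values between 84 and 85) as HE-AACv1 is an assumption, since the table only lists whole numbers.

```python
def aac_profile_for_bitrate(bitrate_kbps):
    """Return (AAC profile, common name) for a bit rate in kbps.

    Thresholds follow the table above. Assumption: anything from 40 kbps
    up to, but not including, 85 kbps is treated as HE-AACv1.
    """
    if bitrate_kbps < 40:
        return ("HE-AACv2", "AAC+")
    if bitrate_kbps < 85:
        return ("HE-AACv1", "AAC+")
    return ("LC-AAC", "AAC")


for rate in (32, 64, 96, 128):
    profile, name = aac_profile_for_bitrate(rate)
    print(f"{rate} kbps -> {profile} ({name})")
```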

 

How AAC+ works

AAC+ uses AAC to encode the bottom half of the audio spectrum, and it uses SBR (spectral band replication) to encode the top half. SBR only uses a bit rate of 1 to 3 kbps per audio channel, so about 2 to 6 kbps for stereo audio. It's because SBR uses such an extremely low bit rate to encode the top half of the audio spectrum that AAC+ is the most efficient codec available today for encoding audio at very low bit rates.

However, whereas using SBR helps the audio quality a lot at very low bit rates, such as 64 kbps (AAC on its own performs very poorly at 64 kbps), at higher bit rates SBR actually hinders the audio quality, and that's why it's better to use AAC at bit rates of 85 kbps and above. The reason is that although SBR is very efficient at encoding the top half of the audio spectrum, it isn't able to produce what most people would call a good quality top end, whereas AAC will produce a good quality top end so long as the overall bit rate is high enough.

For example, at a bit rate of, say, 112 kbps, and assuming that the SBR bit rate is 6 kbps (at the top end of the bit rate levels SBR uses), the bit rates used by AAC+ to encode the bottom and top halves of the audio spectrum would be as follows:

 

Audio band                       Frequency range (kHz)    Bit rate
Bottom half of audio spectrum    0 - 11                   106 kbps
Top half of audio spectrum       11 - 20                  6 kbps

 

In comparison, AAC would probably allocate around 85% of the bit rate to encode the bottom half of the audio spectrum, and 15% to encode the top end, so the bit rates would be as follows:

 

Audio band                       Frequency range (kHz)    Bit rate
Bottom half of audio spectrum    0 - 11                   95 kbps
Top half of audio spectrum       11 - 20                  17 kbps

 

The audio quality of the bottom half of the audio spectrum isn't likely to be much different between the 106 kbps used for AAC+ and the 95 kbps used for AAC, because at 112 kbps AAC already performs well, so there isn't scope for a huge improvement. The quality of the top end, however, is likely to be significantly better with the 17 kbps that AAC uses than with the 6 kbps SBR on AAC+. So overall, it's better to use AAC than AAC+ at 112 kbps. I've used 112 kbps as a more extreme example just to demonstrate why it's better to use AAC at higher bit rates, but the same argument applies for bit rates down to the 85 kbps transition point, where Nero has chosen to use AAC at and above it and AAC+ below it. The closer you get to 85 kbps, though, the smaller the difference in quality between AAC and AAC+ will be.
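The two splits in this example can be reproduced with a few lines of arithmetic. The sketch below is only an illustration of the figures used here: the function names are made up, the 3 kbps per channel SBR figure is the top of the range quoted earlier, and the 85% / 15% split for AAC is the rough estimate given above, not something a real encoder guarantees.

```python
SBR_KBPS_PER_CHANNEL = 3  # top of the 1 to 3 kbps per channel range in the text


def aac_plus_split(total_kbps, channels=2):
    """AAC+ split: SBR takes a small fixed bit rate for the top half of the
    spectrum and AAC encodes the bottom half with whatever is left."""
    sbr_kbps = SBR_KBPS_PER_CHANNEL * channels
    return {"bottom": total_kbps - sbr_kbps, "top": sbr_kbps}


def aac_split(total_kbps, bottom_share=0.85):
    """Plain AAC split, assuming roughly 85% of the bits go to the bottom
    half of the spectrum (an estimate, rounded to whole kbps)."""
    bottom = round(total_kbps * bottom_share)
    return {"bottom": bottom, "top": total_kbps - bottom}


print(aac_plus_split(112))  # {'bottom': 106, 'top': 6}
print(aac_split(112))       # {'bottom': 95, 'top': 17}
```

At 112 kbps, AAC gives up about 11 kbps in the bottom half compared with AAC+, but spends nearly three times as much on the top half.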

 

Music vs speech

There's an argument that the transition bit rate at which it becomes better to use AAC should be higher for speech than for music. Speech consists predominantly of low frequencies. For example, the highest frequency in a person's voice that you hear on the telephone is only about 3.4 kHz. Music, on the other hand, has a lot more energy at higher frequencies. So whereas it definitely makes more sense to use AAC at bit rates higher than 85 kbps for music, it might be better to use AAC+ at bit rates higher than 85 kbps for speech, because there's little energy in a speech signal at the frequencies (11 kHz and over) that would be encoded by the poorer-performing SBR. That would then allow the bottom end of the audio spectrum to be encoded as accurately as possible to make the speech sound more realistic. This could be the reason why the BBC chose to use AAC+ instead of AAC for the 96 kbps audio being used on the new iPlayer TV streams.