mp3 and more

A short history of audio coding

By Fraunhofer IIS

Digital music is everywhere by Fraunhofer IIS

Digital music: A constant companion

Digital music is everywhere: videos, music, podcasts – they all need good audio quality at low bit rates so they can be accessed via an Internet connection.

What is music? by Getty Images/Hybrid Images, Fraunhofer IIS

What is music?

Music can be defined as organized sound waves.

What is sound? by Fraunhofer IIS

What is sound?

Sound waves are created when something (e.g. an instrument or a loudspeaker) causes the surrounding air to vibrate. The amplitude of the wave (its deviation from the mean value) determines the sound’s volume, while the frequency (the number of vibrations per second) determines its pitch. The human ear can usually perceive frequencies between 20 Hz and 20,000 Hz.

How does music become digital? by Fraunhofer IIS

How to make music digital?

For storage, transmission and many other applications, we need music in digital form. A digital signal consists of values (samples) that fall on fixed grids in time and amplitude. The number of samples per second is the sampling rate. CD audio, for example, has a sampling rate of 44,100 Hz.

CD sampling rate by Fraunhofer IIS

How to make music digital?

The sampling rate indicates the number of samples per second. The rate required is given by the Nyquist–Shannon sampling theorem, which states that a continuous-time signal can be represented without loss by a discrete-time signal if the sampling frequency is more than twice the highest frequency in the continuous signal. The choice of sampling rate therefore influences the sound quality of the digitized signal.
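The effect of the sampling theorem can be illustrated with a small sketch. The function names below are hypothetical and the code is only an illustration, not part of any codec: it computes the lower bound on the sampling rate and the frequency at which a pure tone appears if it is sampled too slowly (aliasing).

```python
import math


def min_sampling_rate(highest_freq_hz: float) -> float:
    """Lower bound from the sampling theorem; the actual rate must exceed it."""
    return 2.0 * highest_freq_hz


def alias_frequency(tone_hz: float, sample_rate_hz: float) -> float:
    """Apparent frequency of a pure tone after sampling (folded into 0..fs/2)."""
    folded = tone_hz % sample_rate_hz
    return float(min(folded, sample_rate_hz - folded))


def sampled_cosine(tone_hz: float, sample_rate_hz: float, count: int) -> list:
    """First `count` samples of a cosine at tone_hz, taken at sample_rate_hz."""
    return [math.cos(2.0 * math.pi * tone_hz * n / sample_rate_hz)
            for n in range(count)]


# Hearing range up to 20 kHz: the rate must exceed 40 kHz (CD uses 44.1 kHz).
print(min_sampling_rate(20_000))

# A 5 kHz tone is preserved at the CD rate but folds down at the phone rate.
print(alias_frequency(5_000, 44_100))
print(alias_frequency(5_000, 8_000))

# At 8 kHz, the samples of a 5 kHz tone and a 3 kHz tone are indistinguishable.
same = all(math.isclose(a, b, abs_tol=1e-9)
           for a, b in zip(sampled_cosine(5_000, 8_000, 16),
                           sampled_cosine(3_000, 8_000, 16)))
print(same)
```

In practice, a low-pass filter removes content above half the sampling rate before sampling, which is why the 8 kHz phone sample sounds dull rather than distorted.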

CD sample with 44.1 kHz sampling rate

Phone sampling rate by Fraunhofer IIS

Phone sample with 8 kHz sampling rate

Quantization by Fraunhofer IIS

Quantization: bit per sample (bit/sample)

Quantization describes the process of mapping inputs from a continuous set of values to a discrete set of outputs. The number of bits used to describe each value or sample is called the bit depth. The fewer values that are available (lower bit depth), the less accurate the digitized signal becomes, and the rounding error is heard as louder noise.
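The relationship between bit depth and noise can be sketched numerically. The example below is a simplified illustration under stated assumptions: a plain uniform quantizer, a full-scale 997 Hz test tone and a 48 kHz sampling rate, all chosen here for convenience. It compares the measured signal-to-noise ratio with the common rule of thumb of about 6.02 dB per bit plus 1.76 dB.

```python
import math


def quantize(x: float, bits: int) -> float:
    """Map x in [-1.0, 1.0] to the centre of one of 2**bits equal cells."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = min(int((x + 1.0) / step), levels - 1)  # clamp the x == 1.0 case
    return -1.0 + (index + 0.5) * step


def measured_snr_db(bits: int, n: int = 48_000, tone_hz: float = 997.0,
                    sample_rate_hz: float = 48_000.0) -> float:
    """Signal-to-noise ratio of a full-scale sine after quantization."""
    signal_power = 0.0
    noise_power = 0.0
    for i in range(n):
        x = math.sin(2.0 * math.pi * tone_hz * i / sample_rate_hz)
        error = quantize(x, bits) - x
        signal_power += x * x
        noise_power += error * error
    return 10.0 * math.log10(signal_power / noise_power)


for bits in (16, 8, 4):
    rule_of_thumb = 6.02 * bits + 1.76
    print(f"{bits:2d} bit: measured {measured_snr_db(bits):5.1f} dB, "
          f"rule of thumb {rule_of_thumb:5.1f} dB")
```

Each bit removed costs roughly 6 dB of signal-to-noise ratio, which matches the increasingly noisy 16, 8 and 4 bit/sample audio examples.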

Audio example with 16 bit/sample

Audio example with 8 bit/sample

Audio example with 4 bit/sample

Audio quality by iStock/Steex, Fraunhofer IIS

Audio quality

Digitally stored music should sound as good when played back as it did when recorded. The choice of adequate sampling rate and bit depth is important for the quality of digital music. However, the sound quality also depends on other factors (for example recording quality, storage medium and dynamic range processing).

Threshold in quiet by Fraunhofer IIS

Psychoacoustics: What do we actually hear?

Psychoacoustic models describe the human ear’s capacity for sound perception. Sound events above the so-called threshold in quiet are audible to humans. Anything below that threshold is inaudible in a quiet environment and therefore irrelevant to humans.
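As a rough illustration, the threshold in quiet can be approximated with a closed-form curve. The sketch below uses Terhardt's widely cited approximation, which is an assumption brought in here and not necessarily the model used by Fraunhofer IIS; the function names are hypothetical.

```python
import math


def threshold_in_quiet_db_spl(freq_hz: float) -> float:
    """Approximate threshold in quiet (dB SPL), after Terhardt's formula."""
    f = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def is_audible_in_quiet(freq_hz: float, level_db_spl: float) -> bool:
    """True if a pure tone at this level lies above the approximate threshold."""
    return level_db_spl > threshold_in_quiet_db_spl(freq_hz)


for freq in (100, 1_000, 3_300, 16_000):
    print(f"{freq:6d} Hz: {threshold_in_quiet_db_spl(freq):6.1f} dB SPL")

# A 10 dB SPL tone is above the threshold at 1 kHz but below it at 100 Hz.
print(is_audible_in_quiet(1_000, 10.0), is_audible_in_quiet(100, 10.0))
```

The curve is lowest around 3 to 4 kHz, where hearing is most sensitive, and rises steeply toward very low and very high frequencies.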

Masking threshold by Fraunhofer IIS

Masking in the frequency domain

A sound event raises the hearing threshold around it: in the frequency ranges adjacent to the sound event, human ears are less able to perceive other, weaker sound events. The threshold in quiet becomes the masking threshold. Parts of the audio signal that fall below this threshold are no longer audible and can therefore be coded with less data.

Masking effect by Fraunhofer IIS

Masking in the frequency domain: An example

In the following video, you will hear a narrow-band noise at a fixed volume and a constant frequency bandwidth (160 Hz). This noise acts as a masker for a simultaneous series of pure tones (sine waves). These tones start at a low volume and become successively louder in seven steps. The masker raises the threshold in quiet so that the first tones are below the hearing threshold and are therefore inaudible. For most people, the tone is perceptible only from step 5 onward. Try it out and see for yourself!

Test the masking effect, Fraunhofer IIS

Audio coding by Adobe Stock/blackzheep, Fraunhofer IIS

Technical consequences of psychoacoustics on audio coding

Audio coding makes use of the human ear’s limited range of perception. It reduces the data rate, and psychoacoustic effects keep the resulting coding errors inaudible. With a sufficient data rate, the process is completely inaudible (transparent). In addition, the digital signal is transformed into a more efficient mathematical representation, which also lowers the data rate.

Psychoacoustics by Fraunhofer IIS

Audio coding: redundancy and irrelevance

In audio coding, we are looking for redundant and irrelevant signal components.

Redundant signal components can be reduced without any loss of information; this is what lossless audio coding does. Examples are FLAC and Apple Lossless (ALAC), or zip in the non-audio domain. In this way it is possible to achieve an average reduction factor of about 2 (at CD sampling rate and bit depth).
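To put the reduction factor into numbers, the following sketch computes the data rate of uncompressed CD audio and the rate after a factor-of-2 reduction. It assumes the standard CD parameters of 16 bits per sample and two channels, which this text does not state explicitly; the function name is hypothetical.

```python
SAMPLE_RATE_HZ = 44_100   # CD sampling rate
BIT_DEPTH = 16            # assumed: standard CD bit depth
CHANNELS = 2              # assumed: stereo


def pcm_bit_rate(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Data rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bit_depth * channels


cd_rate = pcm_bit_rate(SAMPLE_RATE_HZ, BIT_DEPTH, CHANNELS)
lossless_rate = cd_rate / 2          # average reduction factor of about 2
bytes_per_minute = cd_rate * 60 // 8

print(f"CD audio:         {cd_rate / 1000:.1f} kbit/s")
print(f"Lossless (x 1/2): {lossless_rate / 1000:.1f} kbit/s")
print(f"One minute of CD audio: {bytes_per_minute / 1e6:.1f} MB")
```

The actual factor achieved by a lossless coder varies with the material; 2 is only the average quoted above.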

Irrelevant signal parts can be reduced only with a loss of information. An inaudible loss of information, however, is of no importance for the listener or receiver. Examples of this are mp3, AAC, etc.

Data transmission by Fraunhofer IIS

The history of mp3
Initial idea: Music over the phone

In the late 1970s, as ISDN and fiber-optic cables are introduced in telecommunications, Prof. Seitzer at Friedrich-Alexander-Universität Erlangen-Nürnberg researches how to transmit music over telephone lines. He works with a group of students who are studying audio coding for their master’s and doctoral theses. In 1979, Prof. Seitzer’s team develops the first digital signal processor for audio coding. During subsequent development, Karlheinz Brandenburg, one of the students on that team, starts applying psychoacoustic principles to the audio coding processes.

mp3 Team 1987 by Fraunhofer IIS/Kurt Fuchs

The audio team in 1987

In 1987, the first functional real-time stereo audio codecs (predecessors of mp3) are realized (the hardware device on the table). This is possibly the world's first real-time realization of an audio codec using psychoacoustic principles. Up to this point, such algorithms had existed solely as computer simulations, which required enormous amounts of computing time to test the processes on only a very limited amount of audio material. The real-time codecs enabled testing under real-world conditions and led to significant improvements to the algorithms.

The photo shows the audio team in 1987: (from left) Harald Popp, Stefan Krägeloh, Hartmut Schott, Bernhard Grill, Heinz Gerhäuser, Ernst Eberlein, Karlheinz Brandenburg and Thomas Sporer. Missing here is Jürgen Herre, who joined a bit later.

mp3 Team 2007 by Fraunhofer IIS/Kurt Fuchs

The extended core team of mp3 development

This picture, taken in 2007, shows the extended core team of mp3 development at Fraunhofer IIS in Erlangen: (from left) Harald Popp, Stefan Krägeloh, Hartmut Schott, Bernhard Grill, Ernst Eberlein, Heinz Gerhäuser, Karlheinz Brandenburg, Thomas Sporer and Jürgen Herre. Many other people and research institutions supported the team as they developed the mp3 technology.
The good team spirit is one reason why most of those team members are still working in various vital roles at Fraunhofer.

Real-time implementation of the OCF algorithm by Fraunhofer IIS/Kurt Fuchs

1988

The pre-predecessor of mp3, the OCF algorithm (Optimum Coding in the Frequency Domain), runs in real time together with the world's first real-time measurement device, which shows the working principles of a psychoacoustic algorithm on a screen.

OCF real-time implementation by Fraunhofer IIS/Kurt Fuchs

1989

Extensions to the basic OCF technology create a practical process that is the first in the world to enable the coding of audio signals at just 64 kbit/s in good quality. As a result, it is possible to transmit music in real time over telephone lines.

In 1989, OCF is proposed as the audio standard for the Moving Picture Experts Group, or MPEG. The working group was established the year before as an offshoot of the International Organization for Standardization (ISO). MPEG is in charge of developing standards for digital audio and video compression.

ASPEC 19” studio equipment for transmission via ISDN by Fraunhofer IIS/Kurt Fuchs

1991: From the lab to a practical codec

A new high-performance audio codec called ASPEC (Adaptive Spectral Perceptual Entropy Coding), the immediate predecessor of mp3, is presented to the public. The codec is the result of further improvements to OCF and contributions by the University of Hannover, AT&T and Thomson.
ASPEC is submitted as the joint candidate of Fraunhofer and the above-mentioned partners to the ISO MPEG standardization process, which results in the ISO/IEC 11172 MPEG-1 Audio Layer 3 standard in 1992.

Presentation of the ASPEC 19” studio equipment by Fraunhofer IIS/Kurt Fuchs

1991

Presentation of ASPEC 19” studio equipment for reliable transmission of speech and music between broadcasting studios via ISDN (from left: Jürgen Herre, Martin Dietz, Harald Popp, Ernst Eberlein, Karlheinz Brandenburg, Heinz Gerhäuser).
Fraunhofer IIS manufactures the studio equipment and sells it to professional users, such as radio stations. The reliable transmission of speech and music between broadcasting studios via ISDN is the first real application of Fraunhofer IIS audio coding methods.

mp3 field test during the Olympic Games in Albertville by Fraunhofer IIS

1992

All German private radio stations broadcast the 1992 Winter Olympics in Albertville using ASPEC over conventional ISDN phone lines.

First prototype of an mp3 player without moving parts by Fraunhofer IIS

1994

Fraunhofer IIS develops an ISO MPEG Audio Layer-3 player prototype, implementing the standard that had just been finalized.
As the first portable music player without moving parts, it can save about one minute of a music track and proves the suitability of the concept for everyday use.
Note: At that time, the format was not yet called mp3.

E-mail voting for .mp3 by Fraunhofer IIS

1995

The name “mp3” is coined in 1995. In an internal poll, Fraunhofer researchers unanimously vote for .mp3 as the filename extension for MPEG-1 Audio Layer 3. That same year, Fraunhofer provides the first PC-based Layer 3 codec as shareware.

Three minutes of music by Fraunhofer IIS/Kurt Fuchs

1996

Fraunhofer IIS starts to sell the mp3 software over the Internet. Shortly after, an Australian student buys the software using a stolen credit card number and makes it publicly available. The stolen software quickly spreads worldwide.

Saehan MPman, the first available mobile mp3 player, by Fraunhofer IIS/Kurt Fuchs

1998

The era of portable mp3 listening begins with the introduction of Diamond Multimedia’s Rio 100 in the U.S. and Saehan Information Systems’ MPMAN in Korea. They are the first portable players using solid-state flash memory to store and play compressed mp3 music files, either downloaded from the Internet or encoded from a music CD.

Harald Popp, Karlheinz Brandenburg and Bernhard Grill (from left to right) are awarded the »Deutscher Zukunftspreis« as representatives of the entire development team. The award was presented by Federal President Johannes Rau (second from the right). By Henning Scheffen, Fraunhofer IIS

2000

Harald Popp, Karlheinz Brandenburg, German Federal President Johannes Rau and Bernhard Grill (from left). On behalf of the entire development team, Brandenburg, Grill and Popp receive the German Future Prize.

The soundlab "Mozart" at Fraunhofer IIS, by Fraunhofer IIS/Kurt Fuchs

Fraunhofer IIS today

Today, an interdisciplinary team of over 200 researchers, engineers and tonmeisters from more than 14 countries is working on the audio technologies of tomorrow.
With our state-of-the-art equipment and laboratories that are unparalleled worldwide (such as the Mozart sound laboratory pictured here), we create technologies that can be found in almost all consumer electronic devices, computers and mobile phones.

Four generations of audio codecs by Fraunhofer IIS

Even while mp3 was still in development, work was already being done on AAC, its successor format. The Apple iPod and Apple's iTunes service brought the AAC format to the mass market.

Today, Fraunhofer IIS is working on the 4th generation of audio codecs. These audio technologies from Erlangen improve the sound of UHD TV, provide uninterrupted streaming and connect people via mobile phone calls in hi-fi quality.

MPEG-H Audio by Fraunhofer IIS

MPEG-H Audio offers interactive, immersive 3D sound for TV, music streaming and VR.

xHE-AAC by Fraunhofer IIS

xHE-AAC: The audio codec of choice for uninterrupted audio/video streaming and digital radio.

EVS Interview by Fraunhofer IIS

EVS (Enhanced Voice Services): The audio codec offers crystal clear audio quality for mobile phone calls in 5G networks, VoLTE and VoWiFi.
