Golden Ears Are Expensive

Massimo Ferronato [epidemiC]

The idea of preserving sounds and music
to be able to listen to them again
is as old as the human race.


Sound is a vibration transmitted through a material and carried in waves, like those of the sea. Our ear picks up these vibrations, normally transmitted by the air, and turns them into impulses which the brain interprets. A vibration consists of waves traveling through the material: the number of waves passing each second is called the frequency, and their width, or amplitude, expresses the energy of the sound, i.e. its volume. A frequency with a high number of waves is interpreted by the ear as a treble sound, like a violin, and a frequency with fewer waves as a lower sound, like a double bass.

Edison’s phonograph

Sound recording was patented in 1877 with the advent of Edison’s phonograph. A wax cylinder rotating at constant speed was engraved by a needle driven by the energy of the sound waves of the surrounding environment, which were picked up by a horn. The groove thus created reproduced both the frequency and the amplitude of the sound waves. Once the cylinder had cooled and hardened, the needle could trace the groove, vibrating with the frequency and intensity dictated by the shape of the groove and thus recreating a sound wave. The sound produced was amplified by the horn of the phonograph, which enabled it to be heard.

Analogue sound recording

Edison’s phonograph was an analogue sound recorder: it took a vibration of the air and transferred it to a needle, which vibrated to cut a groove whose form is an analogue of the sound, i.e. has the same frequency and the same amplitude. The copy is the equivalent of the original sound; this is the characteristic of analogue recording. In theory that makes it perfect, but of course the real world is far from perfect. The materials used to make the copy are limited by their physical characteristics and alter the original vibration. These errors, or distortions, prevent exact recording and reproduction of the original sound.


The advent of telephony at the end of the 1800s revealed the limits of recording, reproducing and transmitting sound with analogue technology, and these limits were the same as those encountered with radio transmission.
The voice is picked up by a microphone and transformed into an analogue electric signal, which then travels through copper cables, processing centers and countless junctions. Each of these elements introduces its own distortion, which increases with distance and with the number of items the signal passes through. As a result, the sound heard by the receiver of the telephone call differs from the original.


The telegraph was introduced in the mid 1800s and transmitted not the voice but a text transformed into electric signals in accordance with a table known as Morse code. Each character is represented by a sequence built from two kinds of impulse (dot and dash), which can be easily transmitted over an electric network or via radio, unlike an analogue signal which, with its infinite subtle differences, is subject to all forms of distortion. Morse code enables faithful transmission of the original text with optimized use of resources.
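
The idea of a table turning characters into dots and dashes can be sketched in a few lines. This is only an illustration: the table holds just four letters of the international Morse alphabet, and the function name `to_morse` and the use of a space between letters are assumptions made for the example.

```python
# A fragment of the international Morse table: only the letters used below.
MORSE = {
    "E": ".",
    "O": "---",
    "S": "...",
    "T": "-",
}

def to_morse(text):
    """Encode text letter by letter, separating the letters with a space."""
    return " ".join(MORSE[ch] for ch in text.upper())

print(to_morse("sos"))
```

Whatever happens to the shape of the electric pulses on the line, the receiver only has to tell a dot from a dash to recover the text.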

Sound sampling

In the first half of the 20th century, Morse and analogue systems continued to be used in parallel. The immediacy of the voice favored the telephone and radio, but the reliability and low cost of Morse kept it popular for long-distance transmissions. People began to wonder how an analogue signal could be transformed into a sequence of impulses which, like Morse code, would be easy to transmit and less subject to distortion. Pioneers like E. T. Whittaker, P. M. Rainey, Harry Nyquist and Claude Shannon established the foundations for sampling a signal.


A sound can be converted into an analogue electric signal, with frequency and amplitude equivalent to the original. Sampling measures the amplitude of this signal and records the value in numerical form, repeating the operation at regular intervals for the entire duration of the signal and thus creating a table of numbers (a matrix) which represents the shape of the electric signal. The matrix enables the inverse procedure and hence the reconstruction of the original signal, with a fidelity that depends on the precision and frequency of the measurements. The matrix is the digital (numerical) representation of the signal: digital sound.
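
A minimal sketch of this measuring process, assuming the signal is a pure sine tone. The function name `sample_sine` and its parameters are invented for the illustration; a real converter measures a voltage rather than evaluating a formula.

```python
import math

def sample_sine(freq_hz, rate_hz, duration_s, amplitude=1.0):
    """Measure a sine wave at regular intervals and return the list of values."""
    count = int(round(rate_hz * duration_s))
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n / rate_hz)
        for n in range(count)
    ]

# One millisecond of a 1,000 hertz tone measured 8,000 times a second.
samples = sample_sine(freq_hz=1000, rate_hz=8000, duration_s=0.001)
print(len(samples))
print([round(s, 3) for s in samples])
```

The resulting list is the "matrix" of the text: eight numbers that trace one full cycle of the wave.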

Advantages of sampling

Transforming a sound into numbers has several advantages. The information is stable: no matter how many copies are made or how far it is sent, it always remains the same. The medium carrying it may be distorted, but error-correction systems can ensure that the information is preserved because, unlike an analogue signal, the medium is not the information. An easy analogy is an old picture book, in which the images have probably been ruined by time and the information they contained is irremediably distorted. Yet even though the characters have faded, the text perfectly preserves the message with no distortion of its meaning.
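
How a correction system can protect numbers from a damaged medium may be illustrated with the simplest scheme of all, a repetition code with majority vote. This scheme is chosen here only for brevity and is an assumption of the example; real digital media use far more efficient codes. The names `encode` and `decode` are invented for the sketch.

```python
def encode(bits, copies=3):
    """Repeat every bit `copies` times."""
    return [b for b in bits for _ in range(copies)]

def decode(received, copies=3):
    """Recover each bit by majority vote over its group of copies."""
    groups = [received[i:i + copies] for i in range(0, len(received), copies)]
    return [1 if sum(g) * 2 > copies else 0 for g in groups]

message = [1, 0, 1, 1]
sent = encode(message)

# Simulate a damaged medium: flip one bit in the first and fourth groups.
damaged = sent[:]
damaged[1] ^= 1
damaged[9] ^= 1

print(decode(damaged) == message)
```

As long as no more than one copy in each group of three is damaged, the original numbers come back intact.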

Problems of sampling

Transforming an analogue signal into a sequence of numbers has two limits. The first is related to the sampling frequency, which must be at least double the maximum frequency of the original signal (Nyquist theorem). A signal with a frequency of 10,000 hertz (vibrations per second) must be measured at least 20,000 times a second. The second is the error introduced by the precision used in saving the value of the amplitude of the signal. With a finite number of possible values, it is impossible to represent the infinite values of the original analogue signal. The maximum possible difference between the original and the stored value is called quantization error. It is inseparable from the sampling process but diminishes as the number of levels used to represent the signal increases.
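
The second limit can be observed directly. The sketch below rounds values to a fixed number of evenly spaced levels and searches for the largest rounding error; the function names, the input range of -1 to 1 and the number of test points are assumptions made for the illustration.

```python
def quantize(value, bits):
    """Round a value in [-1.0, 1.0] to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)
    index = round((value + 1.0) / step)
    return index * step - 1.0

def max_error(bits, points=10001):
    """Largest rounding error found over a fine sweep of the input range."""
    worst = 0.0
    for i in range(points):
        x = -1.0 + 2.0 * i / (points - 1)
        worst = max(worst, abs(x - quantize(x, bits)))
    return worst

for bits in (4, 8, 16):
    print(bits, max_error(bits))
```

The error never exceeds half the distance between two levels, so every extra bit roughly halves it.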

Sound sampling

Humans can hear frequencies between 20 and 20,000 hertz, so a sound simply needs to be sampled at a minimum frequency of 40,000 hertz for no information to be lost. Our ear also has difficulty recognizing minimal differences in the volume of sound, and accepts as the absolute equivalent of the original a signal sampled with a range of about a million values (20 bits). Most people notice no difference using a range of about 65,000 values (two bytes or 16 bits), which is the size used by ordinary compact discs.
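
The rounded figures above follow from simple arithmetic, sketched here. The helper name `levels` is invented for the example.

```python
def levels(bits):
    """Number of distinct values that can be written with the given number of bits."""
    return 2 ** bits

# Twice the highest audible frequency, as required by the Nyquist theorem.
min_rate_hz = 2 * 20_000

for bits in (8, 16, 20):
    print(bits, "bits:", levels(bits), "values")
print("minimum sampling frequency:", min_rate_hz, "hertz")
```

Sixteen bits give exactly 65,536 values and twenty bits give 1,048,576, the "million" of the text.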

Digital telephony

With the arrival of computers and digital electronics in the 1950s, the telephone companies studied the possibility of using digital voice transmission to improve transmission quality and use existing telephone lines more efficiently. In 1954, EPSCO produced the first valve circuit able to convert an analogue signal to digital (ADC, analogue-to-digital converter). In 1962, the first commercial digital voice transmission system, known as T1, was introduced. Transmission used a technology called PCM (pulse code modulation), taking 8,000 measurements of a conversation per second with a precision of one byte, hence a maximum of 256 recognized levels and a resulting traffic of 64,000 bits per second (64 kbps). Each T1 could carry 24 voice channels.
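
The traffic figures can be checked with a short calculation. The function name `pcm_bit_rate` is an assumption of this sketch. The result for 24 channels counts the voice data only; the T1 line rate is usually quoted as 1,544,000 bits per second because framing bits are added on top.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Bits per second produced by uncompressed PCM."""
    return sample_rate_hz * bits_per_sample * channels

voice_channel = pcm_bit_rate(8000, 8)            # one telephone conversation
t1_payload = pcm_bit_rate(8000, 8, channels=24)  # 24 conversations, voice data only

print(voice_channel)
print(t1_payload)
```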

Sound compression

Every written document has a certain degree of redundancy which increases its size. Redundancy makes reading and interpretation easier. It can be reduced by using acronyms or abbreviations, but these make reading harder and work only if the rules behind them are known to those reading the document. Documents in digital format may also be redundant. Over the past 50 years, mathematicians have produced methods of compression able to reduce their size. There are two categories: lossless methods preserve the original information of the document exactly; lossy methods create an approximation of it and are used to compress images and sounds, which are better able to tolerate a slight distortion as a tradeoff for greater compactness.
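
The first category can be tried out with Python's standard `zlib` module, which implements one lossless method among many (the text above does not name a particular one). The sample sentence is deliberately repetitive, an assumption that makes the redundancy easy to see.

```python
import zlib

# A highly redundant document: the same sentence repeated twenty times.
text = b"the quick brown fox jumps over the lazy dog. " * 20

packed = zlib.compress(text)
restored = zlib.decompress(packed)

print(len(text), "bytes before,", len(packed), "bytes after")
print("identical after decompression:", restored == text)
```

The restored document is identical to the original byte for byte, which is exactly what a lossy method gives up in exchange for a smaller size.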

Lossy compression

Humans recognize a limited range of sounds, both in frequency and amplitude, and possess selection mechanisms which exclude some sounds when others are present. For example, a sound at low volume produced at the same time as a louder sound of similar frequency is not perceived by the brain. This is known as masking. These imperfections are the basis of theories on the compression of sound. Unperceived frequencies have no influence on hearing and are eliminated from the original digital signal. The reduced signal can then be subjected to the inverse procedure to create a sound which will be perceived as a good approximation of the original.

Compression in telephony

The spread of digital telephony from the 1970s persuaded telephone companies to make enormous investments in sound compression technologies, which promised more efficient use of existing lines and hence a considerable reduction in the cost of a single conversation.
The PCM protocol already enabled good compression of the voice via a sampling frequency of 8,000 hertz and quantization of one byte. This was acceptable for the human voice, although not for music reproduction. PCM produced ten times less traffic than 16-bit sampling at 40,000 hertz. Mathematicians began to study the behavior of the human ear and to understand how to compress PCM information further. In the 1960s and 1970s, compression techniques such as ADM (Adaptive Delta Modulation), ADPCM (Adaptive Differential Pulse-Code Modulation) and others were developed. Quantization techniques were introduced which took account of the characteristics of the human ear, more sensitive to variations in amplitude at low volumes, in order to reduce quantization error (A-law in Europe and µ-law in America).
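
The effect of µ-law can be sketched with its continuous textbook formula. This is an illustration only: telephone equipment uses a segmented approximation of this curve, and the function names, the crude 8-bit rounding and the test value 0.01 are assumptions of the example.

```python
import math

MU = 255  # parameter of the 8-bit µ-law curve

def mu_compress(x):
    """Map a sample in [-1, 1] onto [-1, 1], stretching the region near zero."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)

def quantize8(v):
    """Round a value in [-1, 1] to one of 255 evenly spaced steps."""
    return round(v * 127) / 127

quiet = 0.01  # a low-volume sample
plain_error = abs(quiet - quantize8(quiet))
companded_error = abs(quiet - mu_expand(quantize8(mu_compress(quiet))))

print(plain_error, companded_error)
```

With the same single byte, the quiet sample is stored far more accurately after the curve has been applied, at the price of coarser steps for loud samples, where the ear is less demanding.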

Standardization of compression

Compression technologies were patented and international organizations began to produce documentation ensuring that these technologies were applied uniformly. The ITU-T, the standardization sector of the International Telecommunication Union (ITU), is the organization that coordinates standards for telecommunication technologies.

Sampled music

The arrival of low-cost technologies at the end of the 1970s enabled the manufacture of instruments able to sample and reproduce music (samplers) and digital tape recorders of sufficient quality for use in a professional recording studio.
Thomas Greenway Stockham was a trailblazer. In 1962 at MIT (Massachusetts Institute of Technology), he produced a prototype of a digital recorder, but only in 1976 did he manage to manufacture and sell it with his company Soundstream Inc. In the mid 1970s, he also introduced the technology of audio editing by computer and storage of sound on hard disk.
In 1965, James T. Russell patented a system to read a sequence of sampled music recorded on a disc via a laser. The system remained on the drawing board until the 1980s.

Compact disc

The first analogue records made from digital recordings came out in 1978, but it was evident that vinyl was limiting the original recording. Russell’s ideas and patents were licensed to Sony and Philips in 1982, who thus came up with the first product for home reproduction of digital music, the compact disc. Licenses for this product were freely distributed on the market with strict rules for their application, avoiding any problem of compatibility between discs and players. The sound was sampled at 44,100 hertz and 16 bits, theoretically offering musical perfection, although in actual fact this was not the case due to the limits of the digital-to-analogue converters (DAC) in the first players on the market.


The ISO (International Organization for Standardization) is the worldwide federation of 140 national institutes with the task of ratifying and applying standards. Documents issued by the ISO enable uniformity in products, information and collaboration between people and companies all over the world.
In 1987, the ISO created a working group leading to the ratification of a standard for the compression of images, known as JPEG. Its success persuaded the ISO in January 1988 to set up another working group to develop a standard to store and reproduce moving images, sound and the combination of the two. The Moving Picture Experts Group (MPEG) began a series of conferences with the involvement of many national research laboratories, universities and several companies. The leading player was and is the research laboratory of Telecom Italia (CSELT, today Telecom Lab) in Turin, under Leonardo Chiariglione.
MPEG issues documents referring to the various annual meetings and puts them together in macro collections. The first of these was called MPEG-1.

MPEG-1 (ISO/IEC 11172)

In July 1989, the document MPEG-1 was issued, a set of techniques to compress and synchronize sound and video. The initial aim was to store and read video with sound on the most commonly available digital medium, i.e. the compact disc.
MPEG-1 is divided into five parts (levels). The first describes a solution to combine one or more video and sound streams to create a single, easily manipulated data stream for transmission and storage. The second describes the compression of a video stream to about 1.5 million bits per second (the standard compact disc playing speed). The best techniques available are used, such as prediction of the variations in the following images and elimination of the non-significant sections of an image. One of the most famous commercial uses of this was Video CD. The third describes the compression of an audio sequence. The fourth specifies the tests that can be created to verify whether MPEG documents and converters conform to the specifications described in the first three parts. It is used by anyone developing a solution based on the MPEG-1 standard. The fifth describes a program able to convert documents based on the MPEG-1 standard.


Fraunhofer-Gesellschaft is a German organization bringing together 56 research institutes and 11,000 researchers from all over the country in projects financed by private companies and government bodies. Research grants account for two thirds of current expenditure, while the remainder is paid by the German government and the federal states. Fraunhofer began work on the problems of the perception of sound in 1987 as part of the European Community project Eureka EU147, in collaboration with the University of Erlangen. The result was presented at the MPEG conference and accepted as a standard within the MPEG-1 project.
ISO-MPEG Audio Layer 3 (IS 11172-3 and IS 13818-3), otherwise known as MP3, uses the most sophisticated knowledge of the perception of sounds to simplify them without compromising their quality when listened to. The result is that a piece of music can be compressed by a factor of 11 without most people being able to detect it. MP3 removed the need to lower audio quality where lack of space or connection speed had once made it necessary. Radio stations were the first to benefit from this new technique: they could now create high-quality links with no need for expensive radio bridges, simply by using ISDN lines. A music CD needs 1.4 million bits a second, a speed that can be reached only by very expensive data lines; compressed, it needs 128,000 bits a second, which can be carried by a digital telephone line (ISDN), already available throughout Europe.
The MPEG standard describes the format of the compressed file but says nothing about the procedure to follow to create it. This has enabled many companies, Fraunhofer being the most important, to patent many of the techniques used by the main MP3 conversion programs, whose makers have to pay the patent holders for licenses.
In the mid 1990s, a free converter spread via the Internet which turned audio files extracted from CDs into MP3 format and then played them through the sound card of a computer. This program gave rise to the popular phenomenon of MP3.
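
The bit rates quoted in this section for CD audio and MP3 can be reproduced with a short calculation, assuming the standard compact disc parameters of 44,100 samples per second, 16 bits per sample and two stereo channels. The variable names are invented for the sketch.

```python
# Uncompressed stereo CD audio.
cd_rate = 44_100 * 16 * 2

# The MP3 bit rate quoted for a single ISDN connection.
mp3_rate = 128_000

ratio = cd_rate / mp3_rate

print(cd_rate, "bits per second on a CD")
print("compression factor:", ratio)
```

The CD stream comes to about 1.4 million bits per second, and the ratio to 128,000 bits per second is just over 11.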

MPEG-2 (ISO/IEC 13818)

The ISO entrusted the MPEG working group with drawing up a complete standard for compression of images and sounds able to handle the most sophisticated technologies. The old standard did not give the compressed document much scope: images and sounds could only be compressed into a few preset formats. The new MPEG-2 would leave the choice of image size and level of compression up to the user. Over the years, ten parts (levels) were developed, each specializing in one aspect of the standard. Sound was described in part three, which improved and expanded the MP3 standard while maintaining compatibility. The multi-channel concept was introduced, which could be used to carry cinematographic sound, e.g. Dolby Surround. In 1997, part seven was introduced (AAC, Advanced Audio Coding), redefining sound compression without the restrictions imposed by compatibility with the old MPEG-1. New compression techniques improved final quality by as much as 30%. The evident advantage of these new techniques was immediately welcomed by media such as digital television and the new DVD, but they did not have the anticipated success with MP3 users, who cared more about compatibility with the enormous archive of songs already present on the Internet.

MPEG-4 (ISO/IEC 14496)

MPEG-4 was completed in 1998 but became a standard only in 2000. Independence from the carrier, easy re-use of the compressed content, integration with the Internet and greater control over document distribution were the thrust behind the new standard. It set no limits on the format and quality of the compressed document, which could now adapt to the reception and display capabilities of the receiver.
MPEG-4 has become successful in several implementations, such as DivX, a video compression format very popular on the Internet, and Microsoft's Windows Media Player, a program included in the latest versions of Microsoft Windows which plays both video and sound files.

MPEG-7 (ISO/IEC 15938)

The huge amount of multimedia information present on the Internet has raised the problem of cataloguing and searching it in the same way as written documents.
MPEG-7 (Multimedia Content Description Interface) describes a method to document the content of a multimedia file, created with MPEG-2 or MPEG-4 methods, and make it accessible to search programs.

MPEG-21 (ISO/IEC 18034)

In recent years, the spread of new technologies for distributing multimedia contents has created challenges which the ISO has met with a new standard. The distinction between text, music, video, images and other contents is increasingly blurred and the problems of their representation depending on the instrument and means of transmission employed by the user are unmanageable with the tools currently available.
MPEG-21 defines an open platform for distributing and using multimedia documents. It introduces the concepts of the Digital Item and the User. Rules are established enabling the exchange, access, consumption, trade and manipulation of digital content. The standard does not differentiate between the distributor and the user: both are Users.
MPEG-21 will perhaps define the way in which digital information will be used in the near future.


Turin, 25 November 2002
©2002 Massimo Ferronato, [epidemiC]