Golden Ears Are Expensive
Massimo Ferronato [epidemiC]
The idea of preserving sounds and music
to be able to listen to them again
is as old as the human race.
Sound
Sound is a vibration transmitted by a material and carried in waves, like the sea. Our ear 
can pick up these vibrations, normally transmitted by the air, and turn them into impulses which 
the brain interprets. A vibration consists of waves traveling through the material; the number of 
waves present each second is called the frequency, and their width (amplitude) expresses the energy 
of the sound, i.e. its volume. A sound with a high frequency is interpreted by the ear as a treble 
sound, like a violin, and one with a lower frequency as a deeper sound, like a double bass.
Edison’s phonograph
Sound recording was patented in 1877 with the advent of Edison’s phonograph. A wax cylinder 
rotating at constant speed was engraved by a needle driven by the energy of the sound waves of the 
surrounding environment, which were picked up by a horn. The groove thus created reproduced 
both the frequency and the width of the sound waves. Once the cylinder had cooled and hardened, 
the needle could retrace the groove, vibrating with the frequency and intensity recorded in the 
shape of the groove, and thus recreating the sound wave. The sound so produced was amplified by 
the horn of the phonograph, which enabled it to be heard.
Analogue sound recording
Edison’s phonograph was an analogue sound recorder: it took a vibration of the air, transferred 
it to a needle and made the needle vibrate to cut a groove whose form is an analogue of the 
sound, i.e. it has the same frequency and the same width. The copy is the equivalent of the 
original sound; this is the defining characteristic of analogue recording. In theory it is 
perfect for that very reason, but of course the real world is far from perfect. The materials 
used to make the copy are limited by their physical characteristics and alter the original 
vibration. These errors, or distortions, prevent exact recording and reproduction of the 
original sound.
Telephony
The advent of telephony at the end of the 1800s revealed the limits imposed by recording, 
reproducing and transmitting sound with analogue technology; the same limits were encountered 
with radio transmission.
The voice is picked up by a microphone and transformed into an analogue electric signal, which 
then travels through copper cables, switching centers and countless joints. Each of these 
elements introduces its own distortion, which increases with distance and with the number of 
items the signal passes through, so the sound heard by the receiver of the telephone call 
differs from the original.
Morse
The telegraph was introduced in the mid 1800s. It transmitted not the voice but a text 
transformed into electric signals in accordance with a table known as Morse code. Each 
character is represented by a sequence of two impulses (dot/dash) which can be easily 
transmitted over an electric network or via radio, unlike an analogue signal which, 
with its infinite subtle differences, is subject to every form of distortion. Morse code 
enables faithful transmission of the original text with optimized use of resources.
Sound sampling
In the first half of the 20th century, the Morse and analogue systems continued to be used 
in parallel. The immediacy of the voice favored the telephone and radio, but the 
reliability and low cost of Morse kept it popular for long-distance transmissions. 
People began to wonder how an analogue signal could be transformed into a sequence of impulses 
which, like Morse code, would be easy to transmit and less subject to distortion. Pioneers like 
E.T. Whittaker, P.M. Rainey, Harry Nyquist and Claude Shannon laid the foundations of 
signal sampling.
Sampling
A sound can be converted into an analogue electric signal with frequency and width equivalent 
to the original. Sampling measures the width of this signal and records the value in numerical 
form, repeating the measurement at regular intervals for the entire duration of the signal and thus 
creating a matrix which numerically represents the shape of the electric signal. The matrix enables 
the inverse procedure, and hence the reconstruction of the original signal, whose fidelity depends 
on the precision and frequency of the measurements. The matrix is the digital (numerical) 
representation of the signal: digital sound.
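The procedure described above can be sketched in a few lines of code. All the values below (a 440 hertz test tone, an 8,000 hertz sampling rate) are illustrative choices, not figures from the text:

```python
import math

SAMPLE_RATE = 8000          # measurements per second (illustrative)
FREQ = 440.0                # frequency of the test tone, in hertz
DURATION = 0.01             # length of signal to sample, in seconds

def analogue_signal(t):
    """The continuous signal: a pure tone with amplitude in [-1.0, 1.0]."""
    return math.sin(2 * math.pi * FREQ * t)

# The "matrix" of the text: one amplitude measurement per interval.
samples = [analogue_signal(n / SAMPLE_RATE)
           for n in range(int(SAMPLE_RATE * DURATION))]

print(len(samples))         # 80 measurements for 10 ms of sound
```

Playing the list back through a digital-to-analogue converter at the same rate would perform the inverse procedure and reconstruct the tone.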
Advantages of sampling
Transforming a sound into numbers brings several advantages. The information is stable: no 
matter how many copies are made or how far it is sent, it always remains the same. The medium 
carrying it may introduce distortion, but error-correction systems ensure that the information 
is preserved because, unlike an analogue signal, the support is not the information. An easy 
analogy is an old picture book in which the images have probably been ruined by time, so the 
information they contain is irremediably distorted; yet despite the fading of the characters, 
the text perfectly preserves the message with no distortion of its meaning.
Problems of sampling
Transforming an analogue signal into a sequence of numbers has two limits. The first concerns 
the sampling frequency, which must be at least double the maximum frequency of the original 
signal (the Nyquist theorem). A signal with a frequency of 10,000 hertz (vibrations per 
second) must be measured at least 20,000 times a second. The second is the error introduced 
by the precision with which the value of the width of the signal is saved. With a finite number 
of possible values, it is impossible to represent the infinite values of the original analogue 
signal. The maximum possible difference between the original and the stored value is called the 
quantization error. It is inseparable from the sampling process, but it shrinks as the number 
of levels used to represent the signal increases.
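A small sketch makes the quantization error tangible. The level counts below are illustrative (4, 8 and 16 bits); the error never exceeds half a quantization step and shrinks as levels are added:

```python
def quantize(value, levels):
    """Round an amplitude in [-1.0, 1.0] to the nearest of `levels` steps."""
    step = 2.0 / (levels - 1)
    return round((value + 1.0) / step) * step - 1.0

# The same amplitude stored with increasing precision: the difference
# from the original (the quantization error) gets smaller each time.
for levels in (16, 256, 65536):        # 4, 8 and 16 bits
    stored = quantize(0.3, levels)
    print(levels, abs(0.3 - stored))
```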
Sound sampling
Humans can hear frequencies between 20 and 20,000 hertz, so a sound simply needs to be sampled 
at a minimum frequency of 40,000 hertz for no audible information to be lost. Our ear also has 
difficulty recognizing minimal differences in the volume of a sound, and accepts as the absolute 
equivalent of the original a signal sampled with a range of about a million values (20 bits). 
Most people notice no difference with a range of 65,536 values (two bytes, or 16 bits), which 
is the size used by ordinary compact discs.
Digital telephony
With the arrival of computers and digital electronics in the 1950s, the telephone companies 
studied the possibility of using digital voice transmission to improve transmission quality 
and use existing telephone lines more efficiently. In 1954, EPSCO produced the first valve 
circuit able to convert an analogue signal to digital (an ADC, analogue-to-digital converter). 
In 1962, the first commercial digital voice transmission system, known as T1, was introduced. 
Transmission used a technology called PCM (pulse code modulation), with 8,000 measurements of a 
conversation per second at a precision of one byte, hence a maximum of 256 recognized levels 
and a resulting traffic of 64,000 bits per second (64 kbps). Each T1 could carry 24 voice channels.
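The T1 arithmetic can be checked directly; the figures are the ones given in the text:

```python
SAMPLES_PER_SECOND = 8000   # PCM measurements per second
BITS_PER_SAMPLE = 8         # one byte: 256 possible levels
CHANNELS_PER_T1 = 24        # voice channels on one T1 line

channel_bps = SAMPLES_PER_SECOND * BITS_PER_SAMPLE
print(channel_bps)                      # 64000 bits per second per voice
print(channel_bps * CHANNELS_PER_T1)    # 1536000 bps of voice payload
```

(The full T1 line rate is 1.544 Mbps; the extra 8,000 bits per second carry framing information.)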
Sound compression
Every written document has a certain degree of redundancy which increases its size. Redundancy 
makes reading and interpretation easier; it can be reduced by using acronyms or abbreviations, 
but these make reading harder, and such rules only work if they are known by those reading the 
document. Documents in digital format may also be redundant. Over the past 50 years, mathematicians 
have produced methods of compression able to reduce their size. There are two categories: the first 
preserves the original information of the document; the second creates an approximation of it and 
is used to compress images and sounds, which better tolerate a slight distortion as a tradeoff 
for greater compactness (lossless and lossy compression).
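As an illustration of the first (lossless) category, here is a minimal run-length encoder, a scheme chosen for brevity rather than one named in the text. The original document is rebuilt exactly, which is what preserving the original information means:

```python
def rle_encode(text):
    """Replace each run of repeated characters with (character, count)."""
    out = []
    i = 0
    while i < len(text):
        j = i
        while j < len(text) and text[j] == text[i]:
            j += 1
        out.append((text[i], j - i))
        i = j
    return out

def rle_decode(pairs):
    """Invert rle_encode: expand each (character, count) pair."""
    return "".join(ch * n for ch, n in pairs)

data = "aaaabbbcca"
packed = rle_encode(data)
assert rle_decode(packed) == data   # lossless: the copy is exact
print(packed)
```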
Lossy compression
Humans recognize a limited range of sounds, both in frequency and in width, and possess selection 
mechanisms which exclude some sounds when others are present. For example, a low-volume sound 
produced when there is also a sound of greater width at a similar frequency is not perceived 
by the brain. This is known as masking. These imperfections are the basis of the theories of 
sound compression: frequencies that are not perceived have no influence on hearing and can be 
eliminated from the original digital signal, which can then undergo the inverse procedure to 
create a sound similar to the original, perceived as a good approximation of it.
Compression in telephony
The spread of digital telephony from the 1970s onwards persuaded telephone companies to make 
enormous investments in sound compression technologies, which promised more efficient use of 
existing lines and hence a considerable reduction in the cost of a single conversation.
The PCM protocol already enabled good compression of the voice via a sampling frequency of 
8,000 hertz and quantization of one byte. This was acceptable for the human voice, although 
not for music reproduction, and PCM produced about ten times less traffic than 16-bit sampling 
at 40,000 hertz. Mathematicians began to study the behavior of the human ear and to understand 
how to compress PCM information further. In the 1960s and 1970s, compression techniques such as 
ADM (Adaptive Delta Modulation), ADPCM (Adaptive Differential Pulse-Code Modulation) and others 
were developed. Sampling techniques were also introduced which took account of the characteristics 
of the human ear, which is more sensitive to variations in width at low volumes, in order to 
reduce quantization error ("a-law" in Europe and "µ-law" in America).
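The µ-law idea can be sketched with the standard continuous companding curve (µ = 255, the value used in the North American system); the framing around it is illustrative, not taken from the text:

```python
import math

MU = 255.0   # companding constant of the North American µ-law

def mu_law_compress(x):
    """Map an amplitude in [-1, 1] onto the logarithmic µ-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

# A quiet amplitude is stretched toward larger code values, so a
# 256-level quantizer resolves low volumes with finer steps, exactly
# where the ear is most sensitive to variations in width.
print(round(mu_law_compress(0.01), 3))
print(round(mu_law_compress(0.5), 3))
```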
Standardization of compression
Compression technologies were patented, and international organizations began to produce 
documentation ensuring uniform application of these technologies. The ITU-T (the 
Telecommunication Standardization Sector of the International Telecommunication Union) is 
the organization controlling the introduction and application of all telecommunication 
technologies.
Sampled music
The arrival of low-cost technologies at the end of the 1970s enabled the manufacture of 
instruments able to sample and reproduce music (samplers) and of digital tape recorders of 
sufficient quality for use in a professional recording studio.
Thomas Greenway Stockham was a trailblazer. In 1962, at MIT (the Massachusetts Institute 
of Technology), he produced a prototype of a digital recorder, but only in 1976 did he manage 
to manufacture and sell it with his company Soundstream Inc. In the mid 1970s, he also introduced 
the technology of computer-based audio editing and of sound storage on hard disk.
In 1965, James T. Russell patented a system to read a sequence of sampled music recorded on a disc 
via a laser. The system remained on the drawing board until the 1980s.
Compact disc
The first analogue records made from digital recordings came out in 1978, but it was evident that 
vinyl was limiting the original recording. Russell’s ideas and patents were licensed to Sony 
and Philips, who in 1982 came up with the first product for home reproduction of digital 
music, the compact disc. Licenses for this product were freely distributed on the market with 
strict rules for their application, avoiding any problem of compatibility between digital 
supports and players. The sound was sampled at 44,100 hertz and 16 bits, theoretically 
offering musical perfection, although in actual fact this was not the case due to the limits 
of the digital-to-analogue converters (DACs) in the first players on the market.
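The compact disc figures imply the bit rate quoted later for transmitting a music CD:

```python
SAMPLE_RATE = 44100   # measurements per second
BITS = 16             # two bytes per measurement
CHANNELS = 2          # stereo

cd_bps = SAMPLE_RATE * BITS * CHANNELS
print(cd_bps)         # 1411200 bits per second, the "1.4 million" figure
```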
MPEG
The ISO (International Organization for Standardization) is the worldwide federation of 140 
national institutes with the task of ratifying and applying standards. Documents issued by 
the ISO enable uniformity in products, information and collaboration between people and companies 
all over the world.
In 1987, the ISO created a working group whose work led to the ratification of a standard for the 
compression of images, known as JPEG. Its success persuaded the ISO in January 1988 to set up 
another working group to develop a standard for storing and reproducing moving images, sound and 
the combination of the two. The Moving Picture Experts Group began a series of conferences 
involving many national research laboratories, universities and several companies. The leading 
player was
and is the research laboratory of Telecom Italia (CSELT, today Telecom Lab) in Turin, under 
Leonardo Chiariglione.
MPEG issues documents referring to the various annual meetings and puts them together in macro 
collections. The first of these was called MPEG-1.
MPEG-1 (ISO/IEC 11172)
In July 1989, the document MPEG-1 was issued, a set of techniques to compress and synchronize 
sounds and video. The initial aim was to store and read video with sound on the most commonly 
available digital support, i.e. the compact disc.
MPEG-1 is divided into five parts. The first describes a solution for combining one or more 
video and sound streams into a single, easily manipulated data stream for transmission 
and storage. The second describes the compression of a video stream at 1.5 million bits 
per second (standard compact disc playing speed), using the best techniques available, such 
as prediction of the variations in the next images and elimination of the non-significant 
sections of an image; one of the most famous commercial uses of this was Video CD. The third 
describes the compression of an audio sequence. The fourth specifies the tests that verify 
whether MPEG documents and converters are compatible with the specifications described in the 
first three parts; it is used by anyone developing a solution based on the MPEG-1 standard. 
The fifth describes reference software able to convert documents based on the MPEG-1 standard.
MP3 (MPEG-1 LAYER 3)
Fraunhofer-Gesellschaft is a German organization involving 56 research institutes and 
11,000 researchers from all over the country in projects financed by private and government 
bodies. Research grants account for two thirds of current expenditure, while the remainder 
is paid by the German government and the federal states. Fraunhofer began work on the problems 
of the perception of sound in 1987, as part of the European Community project Eureka EU147, in 
collaboration with the University of Erlangen. The result was presented at the MPEG conference 
and accepted as a standard within the MPEG-1 project. ISO-MPEG Audio Layer 3 
(IS 11172-3 and IS 13818-3), otherwise known as MP3, uses the most sophisticated 
knowledge on the perception of sounds to simplify them without compromising their 
quality when listened to. The result is that a piece of music can be compressed 11 times 
without most people being able to detect it. MP3 avoided the reduction in audio quality 
that had once been necessary due to lack of space or slow connections. Radio stations 
were the first to benefit from the new technique: they could now create high-quality 
links with no need for expensive radio bridges, simply by using ISDN lines. 
Transmitting a music CD requires 1.4 million bits a second, a speed reached only by very 
expensive data lines; compressed, the same music needs 128,000 bits a second, within reach 
of a digital telephone line (ISDN), already available throughout Europe. 
The MPEG standard describes the format of the compressed file without commenting on the 
procedure for creating it. This has enabled many companies, Fraunhofer being the most 
important, to patent many of the techniques used by the main MP3 conversion programs, whose 
manufacturers have to pay the patent holders for licenses. In the mid 1990s, a free converter 
spread via the Internet for turning audio files extracted from CDs into MP3 format and 
playing them through the sound card of a computer. This program gave rise to the popular 
phenomenon of MP3.
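The "compressed 11 times" figure follows directly from the two bit rates just mentioned:

```python
cd_bps = 1411200      # 44,100 Hz x 16 bits x 2 channels
mp3_bps = 128000      # the ISDN-friendly MP3 rate given in the text

print(round(cd_bps / mp3_bps, 1))   # about 11: the ratio the text cites
```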
MPEG-2 (ISO/IEC 13818)
The ISO entrusted the MPEG working group with drawing up a complete standard for the compression 
of images and sounds able to handle the most sophisticated technologies. The old standard gave 
the compressed document little scope: images and sounds could only be compressed into 
a few preset formats. The new MPEG-2 would leave the choice of image size and level of 
compression up to the user. Over the years, ten parts were developed, each specializing in one 
aspect of the standard. Sound is described in part three, a standard that improved and expanded 
MP3 while maintaining compatibility. The multi-channel concept was introduced, which 
could be used to carry cinematographic sound, i.e. Dolby Surround. In 1997, part seven was 
introduced (AAC, Advanced Audio Coding), redefining sound compression without the restrictions 
imposed by compatibility with the old MPEG-1. The new compression techniques improved final 
quality by as much as 30%. Their evident advantage was immediately welcomed by media such as 
digital television and the new DVD, but they did not have the anticipated success with MP3 
users, who found compatibility with the enormous archive of songs already on the Internet 
more interesting.
MPEG-4 (ISO/IEC 14496)
MPEG-4 was completed in 1998 but became a standard only in 2000. Independence from the carrier, 
easy re-use of the compressed content, integration with the Internet and greater control over 
document distribution were the thrust behind the new standard, which sets no limits on the 
format and quality of the compressed document, now able to adapt to the reception and display 
capabilities of the receiver.
MPEG-4 has become successful in several implementations, such as DivX, a video compression 
standard very popular on the Internet, and Windows Media Player, software included in the latest 
versions of Microsoft Windows able to provide effective playback of both video and sound files.
MPEG-7 (ISO/IEC 15938)
The huge amount of multimedia information present on the Internet has raised the problem of 
cataloguing and searching it in a manner similar to written documents.
MPEG-7 (Multimedia Content Description Interface) describes a method to document the content 
of a multimedia file, created with the MPEG-2 or MPEG-4 methods, and make it accessible to 
search programs.
MPEG-21 (ISO/IEC 21000)
In recent years, the spread of new technologies for distributing multimedia content has created 
challenges which the ISO has met with a new standard. The distinction between text, music, video, 
images and other content is increasingly blurred, and the problems of representing them depending 
on the instrument and means of transmission employed by the user are unmanageable with the tools 
currently available.
MPEG-21 defines an open platform for distributing and using multimedia documents. It introduces 
the concepts of the Digital Item and of the User, and establishes rules enabling the exchange, 
access, consumption, trade and manipulation of digital content. The standard does not 
differentiate between the distributor and the user: both are Users.
MPEG-21 will perhaps define the way in which digital information will be used in the near 
future.
Links: 
http://mpeg.telecomitalialab.com/standards
Torino, 25, 11, 2002
©2002 Massimo Ferronato, [epidemiC]
Email: massimo@ferrona.to