Golden Ears Are Expensive
Massimo Ferronato [epidemiC]
The idea of preserving sounds and music
to be able to listen to them again
is as old as the human race.
Sound is a vibration transmitted through a material and travels in waves, like the sea. Our ear
can pick up these vibrations, normally transmitted by the air, and turn them into impulses which
the brain interprets. The number of waves arriving each second is called the frequency, and their
amplitude expresses the energy of the sound, i.e. its volume. A vibration with a high frequency is
heard by the ear as a treble sound, like a violin, and one with a lower frequency as a deeper
sound, like a double-bass.
Sound recording was patented in 1877 with the advent of Edison’s phonograph. A wax cylinder
rotating at constant speed was engraved by a needle driven by the energy of the sound waves of the
surrounding environment, which were picked up by a horn. The groove thus created reproduced
both the frequency and the amplitude of the sound waves. Once the cylinder had cooled and hardened,
the needle could retrace the groove, vibrating with the same frequency and intensity recorded in
its shape and so recreating the sound wave. The sound thus produced was amplified by the
phonograph’s horn, which made it audible.
Analogue sound recording
Edison’s phonograph was an analogue sound recorder: it took a vibration of the air, transferred
it to a needle and made it vibrate to cut a groove whose shape is an analogue of the sound,
i.e. has the same frequency and amplitude. The copy is the equivalent of the original sound;
this is the defining characteristic of analogue recording. In theory it is perfect for that very
reason, but of course the real world is far from perfect. The materials used to make the copy are
limited by their physical characteristics and alter the original vibration. These errors, or
distortions, prevent perfectly faithful recording and reproduction of the original sound.
The advent of telephony at the end of the 1800s revealed the limits imposed by recording,
reproducing and transmitting sound with analogue technology, and the same limits were
encountered with radio transmission.
The voice is picked up by a microphone and transformed into an analogue electric signal which
then travels through copper cables, switching centers and countless joints. Each of these
elements introduces its own distortion, which increases with distance and with the number of
items the signal passes through, so the sound heard by the receiver of the telephone call
differs from the original.
The telegraph was introduced in the mid 1800s and transmitted not the voice but a text
transformed into electric signals in accordance with a table known as Morse code. Each
character is represented by a sequence of two types of impulse (dot/dash) which can easily be
transmitted over an electric network or via radio, unlike an analogue signal which,
with its infinite subtle differences, is subject to all forms of distortion. Morse code enables
faithful transmission of the original text with optimized use of resources.
In the first half of the 20th century, Morse and analogue systems continued to be used
in parallel. The immediacy of the voice favored the telephone and radio, but the
reliability and low cost of Morse code kept it popular for long-distance transmissions.
People began to wonder how an analogue signal could be transformed into a sequence of impulses
which, like Morse code, would be easy to transmit and less subject to distortion. Pioneers like
E.T. Whittaker, P.M. Rainey, Harry Nyquist and Claude Shannon established the foundations for
sampling a signal.
A sound can be converted into an analogue electric signal with frequency and amplitude equivalent
to the original. Sampling measures the amplitude of this signal and records the value in numerical
form, repeating the operation at regular intervals for the entire duration of the signal and thus
creating a matrix of numbers which represents the shape of the electric signal. The matrix enables
the inverse procedure and hence the reconstruction of the original signal; its fidelity depends on
the precision and frequency of the measurements. The matrix is the digital (numerical)
representation of the signal: digital sound.
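As a rough sketch of the procedure just described (the tone frequency and sampling rate here are arbitrary illustrations, not figures from the article):

```python
import math

def sample(signal, duration_s, rate_hz):
    """Measure the signal's amplitude at regular intervals,
    producing the numerical matrix described above."""
    n = int(duration_s * rate_hz)
    return [signal(i / rate_hz) for i in range(n)]

# A 440 hertz tone standing in for the analogue electric signal.
tone = lambda t: math.sin(2 * math.pi * 440 * t)

# One tenth of a second sampled at 8,000 hertz gives 800 numbers.
samples = sample(tone, 0.1, 8000)
print(len(samples))  # 800
```

Feeding the list of numbers back through the inverse procedure (a digital-to-analogue converter) reconstructs the signal, with the fidelity limits discussed below.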
Advantages of sampling
Transforming a sound into numbers brings several advantages. The information is stable: no
matter how many copies are made or how far it is sent, it always remains the same. The medium
carrying it may introduce distortion, but error-correction systems can ensure that the
information is preserved because, unlike an analogue signal, the support is not the information.
An easy analogy is an old picture book in which the images have probably been ruined by time, so
the information they contain is irremediably distorted. Yet despite the characters fading, the
text perfectly preserves the message with no distortion of its meaning.
Problems of sampling
Transforming an analogue signal into a sequence of numbers has two limits. The first is related
to the sampling frequency, which must be at least double the maximum frequency of the original
signal (the Nyquist theorem). A signal with a frequency of 10,000 hertz (vibrations per
second) must be measured at least 20,000 times a second. The second is the error introduced
by the precision used in saving the value of the amplitude of the signal. With a finite number
of possible stored values, it is impossible to represent the infinite values of the
original analogue signal. The maximum possible difference between the original and the memorized
value is called quantization error. It is inseparable from the sampling process but shrinks
as the number of levels used to represent the signal grows.
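The relationship between bit depth and quantization error can be shown with a minimal uniform quantizer (a sketch; real converters differ in detail):

```python
def quantize(x, bits):
    """Snap x in [-1, 1] to the nearest of 2**bits levels and back."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)
    return round((x + 1.0) / step) * step - 1.0

x = 0.333
err8 = abs(x - quantize(x, 8))    # worst case: half a step, i.e. 1/255
err16 = abs(x - quantize(x, 16))  # worst case: 1/65535
print(err8 > err16)  # True: more levels, smaller quantization error
```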
Humans can hear frequencies between 20 and 20,000 hertz, so a sound simply needs to be sampled
at a minimum frequency of 40,000 hertz for no audible information to be lost. Our ear also has
difficulty recognizing minimal differences in the volume of a sound and accepts as the absolute
equivalent of the original a signal sampled with a range of about a million values (20 bits).
Most people notice no difference with a range of 65,536 values (two bytes, or 16 bits), which is
the size used by ordinary compact discs.
With the arrival of computers and digital electronics in the 1950s, the telephone companies
studied the possibility of using digital voice transmission to improve transmission quality
and use existing telephone lines more efficiently. In 1954, EPSCO produced the first valve
circuit able to convert an analogue signal to digital (an ADC, analogue-to-digital converter).
In 1962, the first commercial digital voice transmission system was introduced, known as T1.
Transmission used a technology called PCM (pulse code modulation), with 8,000 measurements of a
conversation per second at a precision of one byte, hence a maximum of 256 recognized levels
and resulting traffic of 64,000 bits per second (64 kbps). Each T1 could carry 24 voice channels.
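The figures quoted for T1 follow from simple arithmetic:

```python
sample_rate = 8000       # measurements of the conversation per second
bits_per_sample = 8      # one byte per measurement
levels = 2 ** bits_per_sample
channel_bps = sample_rate * bits_per_sample

print(levels)            # 256 recognizable levels
print(channel_bps)       # 64000 bits per second for one voice channel
print(24 * channel_bps)  # 1536000 bits per second carried by one T1
```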
Every written document has a certain degree of redundancy, which increases its size. Redundancy
makes reading and interpretation easier. It can be reduced by using acronyms or abbreviations,
but these make reading harder, and such rules only work if they are known to those reading the
document. Documents in digital format may also be redundant. Over the past 50 years,
mathematicians have produced compression methods able to reduce their size. There are two
categories: the first preserves the original information of the document; the second creates an
approximation of it and is used to compress images and sounds, which better tolerate a slight
distortion as a tradeoff for greater compactness (lossless and lossy compression, respectively).
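The distinction can be illustrated in a few lines of Python: a lossless roundtrip (here via the standard zlib library) returns the exact original, while a lossy scheme (here a crude truncation of sample values, invented purely for illustration) returns only an approximation:

```python
import zlib

text = b"redundant " * 20  # 200 bytes of highly redundant text

# Lossless: the decompressed copy is bit-for-bit identical.
packed = zlib.compress(text)
print(len(packed) < len(text))          # True: redundancy removed
print(zlib.decompress(packed) == text)  # True: nothing was lost

# Lossy (toy): keep only the top four bits of each sample value.
samples = [17, 130, 255, 64]
coarse = [s & 0xF0 for s in samples]
print(coarse)  # [16, 128, 240, 64] -- close, but the detail is gone
```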
Humans recognize a limited range of sounds in both frequency and amplitude, and possess
selection mechanisms which exclude some sounds when others are present. For example, a
low-volume sound produced when there is also a sound of greater amplitude at a similar frequency
is not perceived by the brain. This is known as masking. These imperfections are the basis of
theories on the compression of sound. Frequencies that would not be perceived have no influence
on hearing and can be eliminated from the original digital signal, which can then be subjected
to the inverse procedure to create a sound similar to the original, perceived as a good
approximation of it.
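A deliberately crude sketch of the masking idea (the threshold and bandwidth figures below are invented for illustration and bear no relation to any real psychoacoustic model):

```python
def mask(components, threshold_db=20, bandwidth_hz=100):
    """Drop any component more than threshold_db quieter than a
    louder neighbour within bandwidth_hz of its frequency."""
    return [
        (freq, level)
        for freq, level in components
        if not any(
            abs(freq - f2) < bandwidth_hz and level < l2 - threshold_db
            for f2, l2 in components
        )
    ]

# (frequency in hertz, level in dB): the quiet 1,050 Hz tone sits
# beside a loud 1,000 Hz tone and is discarded; the rest survive.
tones = [(1000, 80), (1050, 40), (5000, 40)]
print(mask(tones))  # [(1000, 80), (5000, 40)]
```

Discarded components cost no bits at all, which is where lossy audio codecs find most of their savings.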
Compression in telephony
The spread of digital telephony from the 1970s persuaded telephone companies to make enormous
investments in sound compression technologies, which promised more efficient use of existing
lines and hence a considerable reduction in the cost of a single conversation.
The PCM protocol already enabled good compression of the voice via a sampling frequency of
8,000 hertz and quantization of one byte. This was acceptable for the human voice, although
not for music reproduction; PCM produced ten times less traffic than 16-bit sampling at 40,000
hertz. Mathematicians began to study the behavior of the human ear to understand how to
compress PCM information further. In the 1960s and 1970s, compression techniques such as ADM
(Adaptive Delta Modulation), ADPCM (Adaptive Differential Pulse-Code Modulation) and others were
developed. Sampling techniques were introduced which took account of the characteristics
of the human ear, more sensitive to variations in amplitude at low volumes, in order to reduce
the quantization error ("a-law" in Europe and "µ-law" in America).
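The µ-law curve itself is well defined; the sketch below uses the continuous formula with µ = 255 (the actual G.711 codec applies a piecewise-linear approximation of this curve):

```python
import math

MU = 255  # the parameter of the American µ-law curve

def mu_compress(x):
    """Continuous µ-law companding of an amplitude x in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_expand(y):
    """Inverse curve: recover the linear amplitude."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

# Quiet signals get far more of the output range than loud ones,
# which is where the ear is most sensitive to quantization error.
print(round(mu_compress(0.01), 3))  # 0.228: a 1% input uses ~23% of the range
assert abs(mu_expand(mu_compress(0.5)) - 0.5) < 1e-12
```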
Standardization of compression
Compression technologies were patented, and international organizations began to produce
documentation ensuring that these technologies were applied uniformly. The ITU-T, the
telecommunication standardization sector of the International Telecommunication Union, is the
organization controlling the introduction and application of all telecommunication technologies.
The arrival of low-cost technologies at the end of the 1970s enabled the manufacture of
instruments able to sample and reproduce music (samplers) and digital tape recorders of
sufficient quality for use in a professional recording studio.
Thomas Greenway Stockham was a trailblazer. In 1962, at MIT (the Massachusetts Institute
of Technology), he produced a prototype of a digital recorder, but only in 1976 did he manage
to manufacture and sell one with his company, Soundstream Inc. In the mid 1970s he also
introduced computer-based audio editing and the storage of sound on hard disk.
In 1965, James T. Russell patented a system to read a sequence of sampled music recorded on a disc
via a laser. The system remained on the drawing board until the 1980s.
The first analogue records made from digital recordings came out in 1978, but it was evident
that vinyl was limiting the original recording. Russell’s ideas and patents were licensed to
Sony and Philips, who in 1982 came up with the first product for home reproduction of digital
music: the compact disc. Licenses for this product were freely distributed on the market with
strict rules for their application, avoiding any problem of compatibility between discs
and players. The sound was sampled at 44,100 hertz and 16 bits, theoretically
offering musical perfection, although in actual fact this was not the case due to the limits
of the digital-to-analogue converters (DACs) in the first players on the market.
The ISO (International Organization for Standardization) is the worldwide federation of 140
national institutes with the task of ratifying and applying standards. Documents issued by
the ISO enable uniformity in products, information and collaboration between people and companies
all over the world.
In 1987, the ISO created a working group leading to the ratification of a standard for the compression
of images, known as JPEG. The success of this persuaded the ISO in January 1988 to set up another
working group to develop a standard to memorize and reproduce moving images, sound and the combination
of the two. The Moving Picture Experts Group began a series of conferences with the involvement of
many national research laboratories, universities and several companies. The leading player was
and is the research laboratory of Telecom Italia (CSELT, today Telecom Lab) in Turin, under
the direction of Leonardo Chiariglione. MPEG issues documents referring to the various annual
meetings and gathers them into macro collections, the first of which was called MPEG-1.
MPEG-1 (ISO/IEC 11172)
In July 1989, the first MPEG-1 document was issued: a set of techniques to compress and
synchronize sound and video. The initial aim was to store and play video with sound on the most
commonly available digital support, i.e. the compact disc.
MPEG-1 is divided into five parts. The first describes a solution for combining one or more
video and sound streams into a single, easily manipulated data stream for transmission
and storage. The second part describes the compression of a video stream at 1.5 million bits
per second (the standard compact disc playing speed). The best techniques available are used,
such as prediction of the variations in the next images and elimination of the non-significant
sections of an image. One of the most famous commercial uses of this was the Video CD. The third
part describes the compression of an audio sequence. The fourth part specifies the tests
that can be run to verify whether MPEG documents and converters comply with the
specifications described in the first three parts; it is used by anyone developing a solution
based on the MPEG-1 standard.
The fifth part describes a reference program able to convert documents based on the MPEG-1
standard.
MP3 (MPEG-1 LAYER 3)
Fraunhofer-Gesellschaft is a German organization involving 56 research institutes and
11,000 researchers from all over the country in projects financed by private and government
clients. Research grants account for two thirds of current expenditure, while the remainder
is paid by the German government and the federal states. Fraunhofer began work on the problems
of the perception of sound in 1987 as part of the European Community project Eureka EU147, in
collaboration with the University of Erlangen. The result was presented at the MPEG conference
and accepted as a standard as part of the MPEG-1 project. ISO-MPEG Audio Layer 3
(IS 11172-3 and IS 13818-3), otherwise known as MP3, uses the most sophisticated
knowledge on the perception of sounds to simplify them without compromising their
quality when listened to. The result is that a piece of music can be compressed 11 times
without most people being able to hear the difference. MP3 removed the loss of audio quality
that had once been necessary for lack of space or connection speed. Radio stations
were the first to benefit from the new technique. They could now create high-quality
connections with no need for expensive radio bridges, simply using ISDN lines.
A music CD, needing 1.4 million bits a second, a speed reachable only by very
expensive data lines, can now be compressed to 128,000 bits a second,
within reach of a digital telephone line (ISDN), already available throughout Europe.
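The eleven-fold figure follows from the arithmetic (using the CD's exact rate of 44,100 hertz, 16 bits, two stereo channels):

```python
cd_bps = 44100 * 16 * 2   # stereo CD audio: about 1.4 million bits/second
mp3_bps = 128_000         # a typical MP3 stream, fitting an ISDN line

print(cd_bps)                   # 1411200
print(round(cd_bps / mp3_bps))  # 11: the compression factor cited above
```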
The MPEG standard describes the format of the compressed file without commenting on the
procedure to follow for its creation. This has enabled many companies, Fraunhofer being the most
important, to patent many of the techniques used by the main MP3 conversion programs, whose
manufacturers have to pay the patent holders for licenses. In the mid 1990s, a free converter
spread via the Internet for turning audio files extracted from CDs into MP3 format and then
playing them through the sound card of a computer. This program gave rise to the popular
phenomenon of MP3.
MPEG-2 (ISO/IEC 13818)
The ISO entrusted the MPEG working group with drawing up a complete standard for the compression
of images and sounds able to handle the most sophisticated technologies. The old standard gave
the compressed document little scope: images and sounds could only be compressed into
a few preset formats. The new MPEG-2 would leave the choice of image size and level of
compression up to the user. Over the years, ten parts were developed, each specializing in one
aspect of the standard. Sound was described in part three, a standard that improved and expanded
MP3 while maintaining compatibility. The multi-channel concept was introduced, which
could be used to carry cinematographic sound, i.e. Dolby Surround. In 1997, part seven was
introduced (AAC, Advanced Audio Coding), redefining sound compression without the restrictions
imposed by compatibility with the old MPEG-1. New compression techniques improved final quality
by as much as 30%. The evident advantage of these new techniques was immediately welcomed by
media such as digital television and the new DVD, but they did not have the anticipated success
among MP3 users, who found compatibility with the enormous archive of songs already present on
the Internet more interesting.
MPEG-4 (ISO/IEC 14496)
MPEG-4 was completed in 1998 but became a standard only in 2000. Independence from the carrier,
easy re-use of compressed content, integration with the Internet and greater control over
document distribution were the thrust behind the new standard. It sets no limits on the
format and quality of the compressed document, which can now adapt to the reception and display
capabilities of the receiver.
MPEG-4 has become successful in several implementations, such as DivX, a video compression
format very popular on the Internet, and Windows Media Player, software included in the latest
versions of Microsoft Windows able to provide effective playback of both video and sound files.
MPEG-7 (ISO/IEC 15938)
The huge amount of multimedia information present on the Internet has raised the problem of
cataloguing and searching it in a manner similar to written documents.
MPEG-7 (Multimedia Content Description Interface) describes a method to document the content
of a multimedia file, created with the methods of MPEG-2 or MPEG-4, and make it accessible to
search and cataloguing tools.
MPEG-21 (ISO/IEC 21000)
In recent years, the spread of new technologies for distributing multimedia content has created
challenges which the ISO has met with a new standard. The distinction between text, music,
video, images and other content is increasingly blurred, and the problems of representing them
depending on the instrument and means of transmission employed by the user are unmanageable with
existing tools.
MPEG-21 defines an open platform for distributing and using multimedia documents. The concepts
of Digital Item and User are introduced, and rules are established enabling the exchange,
access, consumption, trade and manipulation of digital content. The standard does not
differentiate between the distributor and the user: both are Users.
MPEG-21 will perhaps define the way in which digital information will be used in the near
future.
Turin, 25 November 2002
©2002 Massimo Ferronato, [epidemiC]