STEREO: A MISUNDERSTANDING
THE THEORY, SOUND-SYSTEMS, AND PROBLEMS OF HEARING
©1982 The Anstendig Institute
Revised 1984
There is a common misconception
that the addition of stereophonic sound-reproduction was the necessary, correct
step in perfecting monophonic recording. It is believed that, because we hear
with two ears, sound should be recorded with two microphones if it is to sound
natural. It is also believed that stereophony exists as a natural, scientific
phenomenon. Neither belief is correct. The attempt to reproduce the way sound
is heard by means of stereophonic sound reproduction is a misunderstanding that
is the result of a fault in logic. Since recording is a duplication of sounds, only
the sounds can be duplicated, not the manner in which they are heard. The
introduction of stereophony and its universal acceptance have had the
unfortunate effect of slowing progress in the improvement of recorded sound
quality and keeping the general level of musical experience substantially below
that which is truly possible, both through recordings and in live performance.
Hearing is classically accepted
as the most important of the senses. Of all five senses, the effects of hearing
are the most powerful. It is humanity's chief means of becoming familiar with
and communicating emotions. Today, recordings are the means through which the
greater part of society is introduced to the vast scale of human experiences
that can be had through sound. It is important, therefore, that society take a
careful look at the universal use of stereophony in sound reproduction.
I. THE FLAWED LOGIC
The word "stereo" is
currently used as a blanket designation for all sound reproduction. This is a
misrepresentation. Stereo is only a means of achieving an effect of
directionality. In fact, it is only one of many ways directionality can be
sonically produced, and a very limited one at that. Stereo is limited to
producing only a frontal, horizontal plane, with no means of reproducing sounds
that come from above, below or behind, nor can it accurately reproduce depth.
(Impressions of depth are a combination of the arbitrary disposition of the
loudspeakers and the listener in relation to the listening room, which is
different in each situation. It is a form of auditory illusion, not an accurate
duplication of the depth of the recorded event.)
Most people have the impression
that the stereo signal is a complete entity that is made up of two incomplete
halves of a complete signal, each of which essentially contains only half of
the sounds. That is not true. When two microphones are used, each channel is a
single, complete monophonic signal documenting every bit of the particular
sound event, but each from a slightly different position (in theory, only about
as far apart as our two ears, i.e., the width of a human head).
It is important to understand
that there are no stereo sound sources. From
any given position in space, all sound sources are monophonic.1 In
live sound as well as sound reproduction, the effects that produce the
impression of dimension and direction take place within the listener and not in
the sound source(s). The stereophonic signal does not, in itself, include the
spatial, stereophonic effect. It only includes two mono signals, which would
produce no effect of spatial dimension if they were played by themselves,
played through two separate speakers standing next to each other, or electrically
combined and fed through one speaker (played back monophonically). The spatial
effect only occurs through separation of the two signals in space in relation
to the listener, and that effect changes in relation to any change in position
of the two speakers and the listener. Live sounds may occur at various
distances and in various directions in relation to the hearer, but each one is
always a separate monophonic sound whether the source is stationary or moving.
Sounds are only directional in relation to the hearer. They are given
directionality during the act of hearing, which occurs after the sounds
are produced or reproduced and therefore has nothing to do with the manner in
which the sounds are produced.
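As a minimal sketch of the monophonic playback described above, assume each channel is held as a plain list of sample values. The function name `downmix_to_mono` and the sample values are illustrative only and do not come from any particular system. The two channels are combined by averaging them sample by sample, and the result is a single signal that carries no information about where either microphone stood:

```python
def downmix_to_mono(left, right):
    """Combine two channels into one by averaging corresponding samples.

    Assumption: both channels have the same length and sample rate.
    """
    if len(left) != len(right):
        raise ValueError("channels must have the same number of samples")
    return [(l + r) / 2.0 for l, r in zip(left, right)]


# Hypothetical sample values, for illustration only.
left_channel = [0.0, 0.5, 1.0, 0.5]
right_channel = [0.0, 0.25, 0.5, 0.75]
mono = downmix_to_mono(left_channel, right_channel)
```

Each input list is already a complete signal in its own right; the combined list is simply a third one.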
Stereo is based on the premise
that, because sound is heard with two ears, the correct way to reproduce sound
is to simulate the way it is heard, i.e., by recording two separate signals,
using two microphones separated by a distance equivalent to the width of a
human head. That is a misunderstanding of the realities. Stereophony as thus
defined is an attempt to reproduce the way sound is heard. This is illogical
and impossible. Human hearing could never be duplicated in the recording
process because hearing consists of more than just two ears. The shape of the
ears plays a role in distinguishing the direction of sounds, and the rest of
the body also plays a role in the hearing and experiencing of sound. None of
these aspects of hearing can be duplicated by microphones.
The hearing experience takes place
only in the hearer and only after the sound has been reproduced by the
sound system. This phenomenon is incidental to and completely separate from
both the production of the sounds and the characteristics of the sounds. What
comes out of the speaker is a duplication (more accurately, an approximation)
of the sound as it was produced by the source and colored by the space in which
it was produced. It is not, nor can it ever be, a duplication of the hearing
process. In fact, the shape of the sound source and the materials of which it
is made determine the characteristics of a sound. Any recording, whether in
stereo, quad, or any other mode, can only duplicate the sound as
produced by the sound source, not as heard by a listener. The characteristics
of the sound source and of the sound itself are what must determine the
technical means used to record it. How a sound happens to be heard is
completely incidental to and has no bearing on the production or accurate
reproduction of that sound.
Stereophony should play no role
in considerations regarding sound quality in the construction or evaluation of
components, even those meant for stereo reproduction. All auditioning and
evaluating of the accuracy of components, especially loudspeakers, whether by
the manufacturer or the buyer, should be done with a mono signal, with no
attempt to reproduce spatial effects. All aspects of the rest of the sound
system, except the depiction of space, should also be auditioned monophonically.
The only function of the electrical components of a sound-reproduction system
and the loudspeakers is to produce an acoustic signal that as closely as
possible resembles the electrical signal fed into it by the source. Nothing
more! In fact, it is impossible for a sound system to do anything more than
that. The signal itself does not, and cannot, include any effects, such as the
depiction of space. Those effects take place in the listener after the sounds
have already been reproduced monophonically. Technically the aim is to
reproduce two entirely monophonic signals as accurately as possible. The two
channels should be kept completely separate from each other, all the way
through the sound-system until they have been reproduced monophonically in
space by the loudspeakers.
The fact that the two signals
of stereo are cut in the same record groove and that most components have two
channels that share the same power supply and therefore have some interaction
is merely an economic compromise. Ideally, each channel would be
absolutely independent from beginning to end. But that is generally impossible,
because, to perfectly synchronize the channels, the two signals have to be
combined somewhere: either in the record groove
or as parallel tracks on the same tape. On any system that would be practical
for the end-user, some interaction of the signals is unavoidable either in the
signals on the record groove itself, in the needle as it traces the signals, in
cross-talk between the channels of the recorder, or in the sound system.
The problem of designing a
sound-system, including building a loudspeaker, is to arrive at the most
accurate possible reproduction of each separate signal that is fed through it,
whether that signal is a monophonic signal or one channel of a stereo
recording. Even if two or more signals are ultimately to be combined to produce
spatial effects, the only way to assure that each signal will be reproduced as
accurately as possible is to reproduce each signal as separately as possible. As
will be shown in Section V, the effects of combining two signals distract the
listener from the more important qualities of sound. Therefore, all system
development, testing, and evaluation of sound-quality should be done
monophonically, even if stereo is desired. Especially with loudspeakers, any
technical decisions of design, such as the size and shape of the speaker or how
the drivers are mounted in the speaker, should be arrived at only with the need
for accurate rendition of a single signal in mind. Practices such as mounting
the drivers asymmetrically in stereo-pairs within the cabinet have nothing to
do with how accurately that speaker will reproduce sound and can, in fact,
compromise the sound-quality if the preferred position of the drivers for stereo
listening is not the ideal position for accurate reproduction of a single
signal.
III. IT IS IMPOSSIBLE TO KNOW THE REAL SPATIAL RELATIONS OF A STEREO RECORDING
In order to know if a system's
reproduction of spatial relationships is accurate, one would have to know if
the reproduction matches those relationships exactly as they were at the
microphones during the recording. Since heads are differently shaped and no one
can be in exactly the same place as the microphones, the spatial effects of
direction, depth, etc., will be different for each person in the room. Even the
engineer, listening with speakers or earphones, who decides on the microphone
placement and mixes the signals to his liking, is only deciding subjectively
how he wants the impressions of space. The monitoring equipment has already
changed the spatial relationships and made them different from the spatial
relationships at the microphones. And those relationships in the monitoring
booth will be different from every other listening room.
An attempt to achieve precise
reproduction of the spatial dimensions of a sound event by means of stereo is
therefore doomed from the beginning. All that can be achieved is a particular
spatial effect that may be preferred by the particular listener but cannot lay
claim to being a reproduction of the original. Therefore, the prevalent
procedure of evaluating sound-quality on the basis of the reproduction of such
spatial effects as "soundstage", "imaging", and
"dimensionality" (terms currently used in professional circles) or on
the basis of impressions of height, width, or depth is futile, since it can
never be known whether the reproduction matches the original. All that is
possible is to prefer a certain sound-system's reproduction of spatial
dimensions over that of another system, but it is not possible to know when the
reproduction corresponds to the original, even if the listener had been in the
room in which the sound originated.
The characteristics of sound
are so bound up with the size, shape, dimensions and materials of the source
that they can only be reproduced exactly by duplicating the entire original
physical situation. That would mean the same musicians in the same hall (or an
exact duplication of the hall), sitting in exactly the same positions, etc.,
which is an impossibility. Therefore, absolutely exact reproduction of the
spatial characteristics of a sound by another sound medium is impossible. It
certainly cannot be achieved by differently shaped objects of different
reflectivity, i.e., speakers, in a differently-shaped space of a different
reflectivity, i.e., the listening room. Thus, and definitively, any attempt at
reconstruction of the spatial characteristics of a sound source can only be a
flawed approximation, which the listener can never be sure is the way the
original sounded.
Furthermore, in stereo
reproduction, there is only one very small area, equidistant from the speakers,
within which the volume of the two separate channels is balanced. The
equalization, i.e., the loudness of the different frequencies (highs, lows,
middle, etc.) in relation to each other, can also be different in various parts
of a room. But a room's equalization can be compensated for during playback and
is a variable that has to be adjusted anyway for differing volume levels in
relation to an individual listener's hearing at the time of the playback.2
The one perfect area for the listener relative to the two stereo speakers is a
small area in the exact center between and in front of the speakers, which
extends only a small distance front to back. In any other positions, not only
are the stereo balances wrong, but part of the content is missing. Obviously,
for larger numbers of people (theater productions, movies, etc.), mono sound
reproduction is more accurate for the bulk of the audience; it is, in fact, the
only non-flawed possibility of reproducing the entire musical content.
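The narrowness of the balanced area can be illustrated with a rough calculation. The sketch below assumes two identical speakers behaving as point sources in a free field, so that sound pressure falls in inverse proportion to distance. Room reflections, which matter a great deal in practice, are ignored, and the speaker coordinates and function name are hypothetical.

```python
import math


def level_difference_db(listener, left_speaker, right_speaker):
    """Return the left-minus-right level at the listener, in decibels.

    Assumes identical point-source speakers in a free field, where sound
    pressure is inversely proportional to distance.
    """
    d_left = math.dist(listener, left_speaker)
    d_right = math.dist(listener, right_speaker)
    # A positive result means the left speaker is louder at this position.
    return 20.0 * math.log10(d_right / d_left)


# Hypothetical layout in metres: speakers 2 m apart along the x-axis.
LEFT = (-1.0, 0.0)
RIGHT = (1.0, 0.0)

centered = level_difference_db((0.0, 2.0), LEFT, RIGHT)
off_center = level_difference_db((-0.75, 2.0), LEFT, RIGHT)
```

Under these assumptions, a listener on the center line receives both channels at equal level, while one seated 0.75 m to the left receives the left channel roughly 2.4 dB louder.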
IV. THE POINT OF ALL MUSIC IS TO EXPRESS SOMETHING
The expressive content of
sounds is contained in the dynamic variations of the sounds. In fact, it is the
dynamic content of the sound. The presentation of the dynamic subtleties is,
therefore, the most important problem of sound reproduction.3
Problems of instability in the sound, which can plague the stereo spatial
effect relative to the listener's position in the room, do not occur in the
dynamic content of the sounds, which remains the same throughout the room. No
matter how the balance of frequencies or stereo imaging may be changed, the
sounds retain their dynamic-expressive character relative to each other as they
flow in time.
Until the advent of stereo,
spatial relationships were unimportant, even undesirable in the bulk of the
world's music. In most classical music, the introduction of directional effects
in the sound-reproduction distracts the listener from the important factors
that actually contain the musical experience. The most important aspects of
sound, especially those of classical music, have nothing to do with spatial
effects and can be reproduced satisfactorily in mono.
A stereo signal introduces
extraneous "effects" that distract from the more important dynamic aspects
of music. Except for the pickup cartridge, stereo effects have nothing to do
with the quality of the system components. The reason is that, in the sonic
arts, spatial relationships are a very insignificant component of sound and are
particularly insignificant in music. In most classical music, they can be eliminated
without at all degrading the quality of the artistic experience.
The reason spatial effects
distract from the expressive qualities of music lies in the limitations of
human consciousness. Most people can only concentrate on one thing at a time, which,
in music, is usually the melodic line. Few can concentrate on two things at a
time. Since our consciousnesses are too limited to be simultaneously aware of
all the components of music, concentrating on spatial effects distracts from
the important aspects of music.
To understand why the
stereo-spatial aspects of music-reproduction have been accorded such
predominance, to the point of obscuring the truly important aspects of music,
one must know that the easiest-to-hear aspects of sound are the directions the
sounds are coming from. The most difficult-to-hear aspects are the subtle
expressive nuances.4 Many people cannot hear subtle expressive
nuances. Few are oriented towards listening for those nuances and practically
no one takes pains to be sure they are hearing them correctly. Furthermore,
long-playing record-playing equipment has, without exception, not as yet been
able to reproduce the finest nuances of records. The record-listening public
has not, therefore, experienced nuances as fine as they can be. Yet it is taken for
granted that listeners are hearing exactly the same nuances as in the original.
In controlled situations, our
institute has found that, although they do experience something, many people
are incapable of accurately hearing expressive nuances either live or
reproduced. They experience either a coarser form of the actual emotion of the
performances or a completely different emotion.5 Even those capable
of hearing fine nuances cannot hear them the moment they sit down to listen,
especially with recordings. It takes quite a while for most people to settle
down enough physically to begin to register the subtleties of the music and to
experience the emotional content. To understand why, one must realize that what
is heard is not the sound vibrations coming from the sound source; what is
heard is the vibrations of the hearer's own body when it is caused to vibrate
by the sound-waves striking it. Therefore, any nuances finer than the
vibrational state of the body itself are not heard. Essentially, unless the
body is in a physical state that is as fine as the music being listened to, the
music is filtered through, and degraded by, the coarseness in the way the body
is vibrating. This point is crucial to understanding why spatial effects figure
prominently in most people's considerations of sound reproduction. Besides
being easy to hear, spatial effects do not demand a particularly great
refinement of body. Being able to notice and make out spatial dimensions and
directional effects impresses listeners who are not hearing the full content of
the music, and gives them the impression that they are getting something out of
the recording, when they are actually missing the point of the music.
If, from the beginning of a
listening session, one carefully observes which aspects of the music one
becomes progressively aware of, one will notice that, besides notes and words,
the first things one is able to hear are the simple spatial relationships
(right, left, center, etc.). The last thing one is able to hear is the
expressive, i.e., the emotional, content. The notes and spatial relationships
can be called the "informational" aspects of sound, while the expressive
content can be called the "experiential" aspect.6 The point to be
made is that, without the experiential aspects, there really is no music, and
that a distortion or change in the expressive content of a recorded performance
is tantamount to changing the words in a sentence so that they mean something totally
different from what the writer expressed. In other words, a complete
falsification. On the other hand, it makes no difference to the quality or
intensity of the way one experiences the expressive content of the music if the
so-called "sound stage" is changed to give one or another impression
of height, depth, and width, nor does it matter if the orchestra seems to be
spread out in front of the listener (unless the music was specifically written
for stereo, or has some of its expressive content contained in the directions
of the sounds; the Beatles' album Sgt. Pepper's Lonely
Hearts Club Band has excellent examples of both).
The spreading out of the sound
in space is totally unimportant to and contrary to the aims of most music
written before stereo became popular. In their orchestration, composers took
great pains to create particular sound colorings by blending together
the sounds of different instruments. Halls were designed so the sounds would
thoroughly blend together before reaching the listener. When a conductor has
balanced his orchestra, there is no need for separation of the instruments by
spreading them out in differing directions in order to hear the different
voices; whatever is supposed to be heard can be differentiated even from so far
away that all the sounds of the orchestra essentially come from the same
direction. Similarly, if a recording of such a well-balanced performance is
correctly equalized to match the original, the balance that the conductor has
achieved can be heard in mono, without the supposed help of stereo “separation”.
This is an important point for the music-loving public because it means that
older recordings of such excellent performances can, to a great degree, be
restored since it is mainly their imbalances in the frequencies that obscure
their detail and not a lack of stereo effects.
One must assume that composers
know what their music should sound like, but, originally, composers were
singularly unimpressed by stereo. Virgil Thomson went so far as to call it a
“technological pretext” giving the recording companies “another excuse for
recording the standard works all over again” (A Virgil Thomson
Reader, p. 144). Another composer has mentioned that stereo is an excuse
to sell new, more expensive equipment. No composer whom I asked or with whom I
listened to music was the slightest bit interested in the depiction of spatial
effects.
V. CONSCIOUSNESS IS LIMITED
Few people can concentrate on
more than one thing at a time; but music consists of many things happening all
at once. In fact, music is the ultimate consciousness-expander because, if you
are not a Mozart, there is almost always more to be aware of than is humanly
possible. Even with a single melodic line there are both the notes and the
expression to be conscious of. For all but a very few particularly
"gifted" individuals, consciously registering the expressive content of
music demands every bit of concentration, awareness, and poise that can be
mustered, especially when the expression is as fine and delicate as it should
be in most classical music. In the finest ear-training and conducting classes,
which even included seasoned professionals, there were enormous differences in
sensibility to nuance and expressive content. Particularly interesting is that
neither the ability to recognize tones (perfect pitch) nor extraordinary
memories that allowed students to write down, from memory, anything the teacher
dictated, was of help in hearing the expressive content. For example, many
conductors (and other musicians too) with amazing ears for recognizing notes
and hearing mistakes were and are strikingly deficient in expressive
interpretive qualities. In most of these cases, the orientation towards the
informational (mental) aspects of music takes up all of their powers of
concentration and keeps them from registering the nuances of expression.
Therefore, the addition of artificial informational material, such as the
spatial effects of stereo, will distract most people from the more important
experiential aspects of music.
VI. THE BODY IS SENSITIVE TO
DIRECTIONAL IMBALANCES IN SOUND PRESSURE LEVEL
While the depiction of
directional effects has little effect on the experience of most fine music,
monophonic reproduction with only one loudspeaker is not the solution. That is
because the body is sensitive to unequal sound-pressure levels, i.e., whether
or not the sounds around it are of equal strength (volume). The body itself,
which is highly sensitive to physical imbalances, has to recreate the
vibrations produced by the sound-source, and this happens most effectively when
the whole body is equally subjected to those vibrations. Music coming
predominantly from one side creates an uncomfortable feeling of imbalance that
is especially disturbing and distracting when the body is in the requisite
relaxed, sensitive state necessary to hear fine musical nuances. Our tests have
shown that music from four equidistant speakers, arranged as in quadraphonic
listening, is the best arrangement, whether they play mono, stereo, or quad.
The sound-pressure level is then most evenly distributed around the body.
The body's sensitivity to the
lateral balance of sound is one reason why stereo seems to many to be superior
to mono with one speaker. In stereo, when the listener is located exactly
between the two speakers that are balanced for volume level, the sounds at
least come from both sides. But in this respect, mono is still preferable,
because the sound from both sides has the same volume level, while it varies in
stereo. Because the musical experience is predominantly physical, mono with at
least four speakers surrounding the listener is the most effective way to
experience recorded music.
VII. EPILOGUE
Originally, stereo was thought
to be the next necessary step in perfecting sound-reproduction. But it was not.
Monophonic sound-reproduction was still gravely flawed when stereo was
introduced. The first step should have been to perfect monophonic
sound-reproduction. Some companies were well on their way towards doing so. The
last monophonic Mercury recordings were very close. It remained mainly for
playback techniques and equipment to be perfected in order to retrieve the
information that was in the grooves.
The introduction of stereo
halted progress by introducing a whole new set of problems, namely the
preservation and reproduction of two signals simultaneously. The state of the
art at that time was not able to combine two signals and still preserve
the quality already achieved in monophonic recordings, especially not in
phonograph pick-up cartridges. Sound quality, particularly in the playback,
deteriorated markedly.
It is an individual's
prerogative to want sound-reproduction that includes some sort of depiction of
the placement of sounds in space. But to call stereophony accurate sound reproduction
is a falsification. Stereo is an extraneous effect added to sound, a special
phenomenon similar to 3-D in photography and cinema: both stereo and 3-D are
effects that may be interesting, even "kicky", but they are only effects
and have little to do with the way we really hear or see.
Since hearing the expressive
content demands all of most listeners' concentration, the addition of other
effects, such as those of stereophony, keeps listeners from experiencing the
real content of the music if they pay attention to those effects. Such
distracting sound-reproduction has been the rule for over three decades among
laymen and even among professionals, most of whom use recordings to help study
scores (with the prevalence of recorded sound in our society, even those who do
not use recordings outright for study cannot avoid listening to and being
influenced by recordings). The legacy of stereophonic sound-reproduction is a
loss of sensitivity to and awareness of delicate, fine interpretative nuance in
music. A full understanding of this fact must be cause for considerable alarm,
because music is the flagship of a society. It leads, serves as an example for,
and sets the tone of every other civilized pursuit within that society. It is
the best civilization has and must be preserved at the highest possible levels.
1 Even sounds consisting of many combined
sound sources, as in recording techniques using many microphones, are
monophonic. With multi-miking, each microphone documents the complete
monophonic event from that microphone's position. Each channel of the stereo-signal
plays a monophonic signal consisting of the combined signals from its
microphones. But the use of many microphones is not even stereo. It is a whole
new technique that has nothing to do with either the natural spatial
relationships of the original sounds or the principles of stereo.
2 See our papers on sound equalization.
3 For most recordings, even digital, it is
necessary to compress the overall dynamic range of the performance. That should
be done section by section, i.e., the louder sections should all be reduced the
same amount and the softer sections all raised in volume by the same amount. In
that way, the dynamic subtleties within each section of the music will be
preserved. Automatic dynamic range expanders are not desirable because they
will expand and compress the dynamic range of everything, even the dynamics
within a single melody, thus changing the whole expressive content of the
performance.
4 Various papers of The Anstendig
Institute deal with the problems of hearing fine nuances, particularly those
due to the fact that the body must be vibrating as finely as the nuances or
they will be changed and degraded by the vibrations of the body itself.
5 The author's insights into the hearing
of expressive nuance come from many years in the ear-training classes of some
of the finest music schools and long testing with volunteers at The Anstendig
Institute.
6 This is explained in other papers of The
Anstendig Institute, particularly “Hearing:
The Informational and the Experiential”.
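The section-by-section compression described in note 3 can be sketched as follows. This is only an illustration under stated assumptions: each section is a list of sample amplitudes already labelled "loud" or "soft", and the gain values and names are hypothetical. Because a single constant gain is applied to every sample in a section, the ratios between samples within that section, and therefore its internal dynamic subtleties, are left unchanged.

```python
# Hypothetical gains: loud sections are reduced, soft sections raised.
LOUD_GAIN = 0.5
SOFT_GAIN = 2.0


def compress_by_section(sections, loud_gain=LOUD_GAIN, soft_gain=SOFT_GAIN):
    """Apply one constant gain per section.

    `sections` is a list of (label, samples) pairs, where label is
    "loud" or "soft". All loud sections share one gain and all soft
    sections share another.
    """
    gains = {"loud": loud_gain, "soft": soft_gain}
    result = []
    for label, samples in sections:
        gain = gains[label]
        result.append((label, [s * gain for s in samples]))
    return result


# Hypothetical amplitudes, for illustration only.
performance = [
    ("soft", [0.05, 0.10, 0.075]),
    ("loud", [0.8, 0.4, 0.6]),
]
compressed = compress_by_section(performance)
```

The overall range between the soft and loud sections is narrowed, while the relation of each sample to its neighbours inside a section stays the same. An automatic device that adjusts gain continuously would not have this property.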
Papers on related subjects are
available free of charge on request.
The Anstendig Institute is a
non-profit, tax-exempt, research institute that was founded to investigate the
vibrational influences in our lives and to pursue research in the fields of
sight and sound; to provide material designed to help the public become aware
of and understand vibrational influences; to instruct the public in how to
improve the quality of those influences in their lives; and to provide the
research and explanations that are necessary for an understanding of how we see
and hear.