send if we wish, and control the distance or depth of
each sound source by controlling the amplitude of this
source in the echo send.
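The idea is simple enough to sketch. Below is an illustrative Python fragment (not the 960L's actual processing; every name and number in it is an assumption) in which each source's echo send is held constant while its direct level rolls off with distance. The direct-to-reverberant ratio therefore falls with distance, and the source recedes in the mix.

    # Illustrative sketch only -- not the 960L's algorithm. Perceived
    # distance is set by the direct-to-reverberant ratio: hold each
    # source's echo send constant and roll off its direct level with
    # distance.

    def mix_gains(distance_m, ref_m=1.0):
        """Return (direct_gain, echo_send_gain) for a source at distance_m.
        The direct sound follows an inverse-distance law; the echo send is
        held constant, so farther sources sound more reverberant."""
        direct = ref_m / max(distance_m, ref_m)  # 1.0 at the reference distance
        echo_send = 1.0                          # constant send into the reverb
        return direct, echo_send

    for d in (1.0, 2.0, 4.0, 8.0):
        direct, send = mix_gains(d)
        print(f"{d:4.1f} m: direct gain {direct:.2f}, echo send {send:.2f}")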
But distance is not the only perception we need. We
need the envelopment that makes notes come alive.
How can we produce envelopment with a 5.1 system?
Once again the key is the way reflections affect
horizontal localization. Our brains have an exquisitely
sensitive detector for differences in sound arrival times
between our two ears. These time differences are
converted into perceived horizontal angles, or azimuth.
In the presence of reflected energy – particularly
reflections not in the direction of the source – the time
differences are not constant. As reflections come and
go the time differences (and level differences) fluctuate,
with the amount of fluctuation depending on the
direction and strength of the reflections.
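To make the mechanism concrete, here is a minimal sketch of the mapping from interaural time difference (ITD) to azimuth, using the simple far-field model ITD = (d / c) * sin(azimuth). The ear spacing and the sample values are illustrative assumptions, not measurements.

    # Minimal sketch: why interaural time differences map to azimuth.
    import math

    EAR_SPACING_M = 0.18    # approximate distance between the ears (assumption)
    SPEED_OF_SOUND = 343.0  # m/s

    def itd_to_azimuth(itd_s):
        """Convert an interaural time difference (seconds) to azimuth (degrees)."""
        s = max(-1.0, min(1.0, itd_s * SPEED_OF_SOUND / EAR_SPACING_M))
        return math.degrees(math.asin(s))

    # A reflection that shifts the ITD by a fraction of a millisecond shifts
    # the perceived angle; fluctuating reflections make the angle fluctuate.
    for itd_us in (0, 100, 250, 500):
        print(f"ITD {itd_us:3d} us -> azimuth {itd_to_azimuth(itd_us * 1e-6):5.1f} deg")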
When the sound source is continuous – like legato
strings, or pink noise – we perceive these fluctuations
as an enveloping room impression. The time delay of
the reflections does not matter very much, as long as
it is longer than about 10ms. (Below 10ms there
are severe combing effects we will try to avoid in this
discussion.) But most musical sounds (and all speech
sounds) are not continuous.
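The combing effect mentioned above is easy to quantify. A direct sound plus a single reflection delayed by tau has the magnitude response |1 + g*exp(-j*2*pi*f*tau)|, with notches at f = (2k+1)/(2*tau). The short sketch below (illustrative only) shows why short delays are troublesome: at 2ms the notches are spread widely across the audible band, while at 30ms they crowd together below 100Hz, where the ear tends to average them out.

    # Comb-filter notch positions for a direct sound plus one reflection
    # delayed by tau: notches fall at f = (2k + 1) / (2 * tau).

    def notch_frequencies(delay_s, count=5):
        """First few comb-filter notch frequencies (Hz) for a given delay."""
        return [(2 * k + 1) / (2 * delay_s) for k in range(count)]

    for delay_ms in (2, 5, 10, 30):
        notches = notch_frequencies(delay_ms / 1000.0)
        print(f"{delay_ms:2d}ms delay -> notches at", [round(f) for f in notches], "Hz")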
To understand what happens with speech or music we
must learn how the brain separates sounds into
streams. Streams are the perceptual equivalent of
musical lines. Sentences from a single talker form a
stream. In general, a stream has a specific source and
a single, continuous semantic content. However, the
streams themselves are not continuous at all – in music,
streams are composed of notes; in speech, of phones –
little bursts of sound roughly
equivalent to syllables. When we hear a string of
phones, our speech apparatus goes into high gear.
First we must separate the phones one from another,
then we must use frequency and time information to
assign an identity to each phone – at which point the
phone becomes a phoneme, the basic building block of
speech. From phonemes to words, from words to
sentences, from sentences to meaning – all seemingly
effortless and automatic – our brains decode the
spoken word.
The perception of envelopment is a useful by-product of
stream formation. To form a foreground stream the
brain must separate the sound events related to a
single source from the total sonic input. To do this we
must be able to detect when a single phone starts, and
when it stops. Detecting the start of sound events is
easy – we just look for a rapid increase in level. How do
we know when one phone stops and another starts?
There are only two possible ways – we can detect the
stop of a phone, or we can assume it has stopped when
we detect the start of another. Naturally, we do both.
But if we are to hear background sounds at all, we must
detect the stop of phones before a new phone starts.
How do we know if a phone has stopped? We can do
an experiment – a drop of about 2dB in level within a
20ms period seems sufficient. What if the level drops
more slowly? Experiment shows that even with a slow
drop a 6dB change is sufficient. What if the sound
drops in level by 2dB, and then within 30ms comes back
up again? (This drop could be caused by a low-level
reflection.) It turns out that the level rise – if it occurs within
50ms of the first drop in level – tends to cancel the
effect of the first level drop. The brain assumes the
phone is continuing.
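These rules are concrete enough to sketch in code. The following is a hedged illustration, not a model the text specifies, of an end-of-phone detector running over a level envelope in dB: the 2dB/20ms fast-drop threshold, the 6dB slow-drop threshold, and the 50ms cancellation window come from the experiments above, while the detector structure, the 1dB recovery tolerance, and the 200ms recent-peak window are invented for the example.

    # Hedged illustration of the end-of-phone rules, run over a level
    # envelope in dB sampled once per millisecond.

    def find_phone_ends(level_db, dt_ms=1.0):
        """Return times (ms) at which a phone is judged to have ended:
        the level drops and stays down for 50ms without recovering."""
        ends = []
        n = len(level_db)
        w20 = int(20 / dt_ms)      # fast-drop comparison window
        w50 = int(50 / dt_ms)      # confirmation / cancellation window
        w_peak = int(200 / dt_ms)  # window for the recent peak (assumption)
        in_phone = True
        i = w20
        while i < n:
            if not in_phone:
                # A rapid rise in level marks the start of the next phone.
                if level_db[i] - level_db[i - w20] >= 2.0:
                    in_phone = True
                i += 1
                continue
            recent_peak = max(level_db[max(0, i - w_peak):i])
            fast_drop = level_db[i - w20] - level_db[i] >= 2.0  # 2dB within 20ms
            slow_drop = recent_peak - level_db[i] >= 6.0        # slow drops need 6dB
            if fast_drop or slow_drop:
                window = level_db[i:min(i + w50, n)]
                # A rise within 50ms of the drop cancels it: the brain
                # assumes the phone is continuing.
                if all(x <= level_db[i] + 1.0 for x in window):
                    ends.append(i * dt_ms + 50.0)  # end confirmed 50ms after the drop
                    in_phone = False
            i += 1
        return ends

    # Example: a 100ms burst at 0dB followed by silence at -30dB.
    envelope = [0.0] * 100 + [-30.0] * 100
    print(find_phone_ends(envelope))  # -> [150.0]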
In general, to find the ends of phones the brain looks for
a level drop, and waits for 50ms to be sure the level
stays down. If it does, the sound event – the phone – is
assumed to have ended. Now imagine another simple
experiment. You are listening to someone talk in a noisy
room. You can easily understand the person, but you
are aware of the noise in the room – which is perceived
as continuous. How can this be? It is clear that during
the phones of the person who is talking you are unable
to hear the room – the phones are masking the
background. Yet you perceive the background as
continuous.
The brain is clearly separating the sound of the room
into a separate stream – the background stream. The
neurology that detects the background stream works in
the spaces between phones. Thus it cannot work
without the participation of the mechanism that
determines the ends of phones. Again we can
experiment. It turns out that the background detection
is inhibited during phones, as we would expect, and is
still inhibited for the first 50ms after the end of each
phone. After this time the inhibition is gradually
released, so the background detector has full sensitivity
within 150ms after the end of each phone. The
loudness of the background is then perceived through a
standard loudness integration, taking about 200ms for
full loudness to develop.
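We can summarize this timing in a small sketch. The 50ms inhibition, the release to full sensitivity at 150ms, and the roughly 200ms loudness integration come from the experiments described above; the linear ramp shapes are guesses, since the text only says the inhibition is released gradually.

    # Hedged sketch of background-stream sensitivity after a phone ends:
    # fully inhibited for the first 50ms, then ramping (shape assumed
    # linear) to full sensitivity at 150ms. Perceived loudness of the
    # background then builds over roughly 200ms.

    def background_sensitivity(t_after_end_ms):
        """Sensitivity of the background detector, 0..1, versus time (ms)
        since the last phone ended. Negative time = still in a phone."""
        if t_after_end_ms < 50.0:
            return 0.0                              # inhibited during and just after a phone
        if t_after_end_ms < 150.0:
            return (t_after_end_ms - 50.0) / 100.0  # gradual release of inhibition
        return 1.0                                  # full sensitivity after 150ms

    def background_loudness(t_after_end_ms):
        """Fraction of full background loudness, folding in a ~200ms
        loudness integration (modelled here as linear -- an assumption)."""
        integration = min(1.0, max(0.0, t_after_end_ms) / 200.0)
        return background_sensitivity(t_after_end_ms) * integration

    for t in (0, 50, 100, 150, 200, 300):
        print(f"{t:3d}ms after phone end: sensitivity {background_sensitivity(t):.2f}, "
              f"loudness {background_loudness(t):.2f}")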
It is the background perception of reverberation that
gives us the sense of envelopment. Clearly it is the
reflection level 150ms and more after the end of sound
events that matters. Note that the relevant time is after
the END of sound events. We are conditioned by years
of looking at impulse responses to think about
reflections as always coming from hand-claps or pistol
shots. In speech and music it is the behavior of
reflected energy at the ends of sound bursts of finite