Sample FX FAQ

Mike Currington curring@ferndown.ate.slb.com

V1.0 4 March 1995

About this FAQ

This document is intended to help anyone who has questions about audio effects. It was written after seeing the same questions being asked time and time again on Usenet newsgroups. By audio effects I mean effects that are applied to audio signals, usually to change the sound in some way. If this sounds (pun intended) even slightly interesting then read on...

As this is the first version of the FAQ, I need comments, suggestions, revisions, and (most importantly) additions for the next version of the FAQ (of course it may stink so bad that there will be no next version :-) ).
Please read the sections at the end of the FAQ on contributions and how to contact me, thanks.

CONTENTS

Legal Stuff (deny everything)
Sampling Related Questions
What is the Nyquist frequency/rate?
Is the Nyquist freq really enough?
How Do I Change the Sampling Rate of a signal?
Please could someone explain sampling theory?
Effects using Delays
How Do I create echo effects?
What is Reverb, how is it different to echo?
How do I get my echo/reverb to sound more realistic?
What is chorus?
I would like to know about flanging. Help
What's a Spring Delay Line?
Volume Changing Effects
Distortion
Noise Gating
How do I do Compression?
What is the best way to mix signals together?
Filtering (coming soon)
Frequency Changing Effects
How do I change the pitch of a sound?
How do I "time stretch" a signal
What is Vocoding? How do I do it?
What is Ring Modulation?
Surround Sound (coming soon)
Miscellaneous Effects
What do the Aphex Aural Enhancer and similar units do?
References
Internet Resources
Other FAQ's
News groups
Mailing Lists
FTP sites
World Wide Web (WWW) pages
Software Packages
Shareware / Public Domain Software
Commercial Software
Recommended Books and Papers
General FAQ Questions and Comments
How do I get this document
How do I contribute to the FAQ?
Help Desperately wanted...
About the Author
Credits

At this point I suppose I ought to do a little legal stuff :

LEGAL STUFF

This document is copyright (c) 1995, Michael John Currington. All rights are reserved by the author.

Some of the material in this FAQ has been contributed by others. I accept no responsibility for what they have written, and neither do they ! I have tried to give credit to contributors, but if a credit is wrong or missing, sorry but tough, EMail me and I will try to correct it.

All trademarks are acknowledged.

I accept no responsibility for what I have written or included in this FAQ, so there !

Sorry about that, I hope I've excluded myself from any possible legal problems and responsibility :-). And now on with the show...

SAMPLING RELATED QUESTIONS

This section will deal with questions relating to the sampling of digital audio data. Since the bulk of this FAQ is aimed at creating effects digitally using Digital Signal Processors, the text in this section will be of use to many starting on the road to effect production, but does not describe any effects as such.

What is the Nyquist frequency/rate?

One of the most important contributions to sampling theory (for any signal) is the idea of the Nyquist rate. Sampling theory says :

If the highest frequency component in a signal is f (Hz) then the signal should be sampled at the rate of at least 2f (Hz) for the original signal to be completely represented by the discrete samples.

And that's all folks, easy huh? Well.....

Is the Nyquist freq really enough?

[by Paul G Russell - thanks]

Answer: Definitely NOT!
If the tone is 1KHz, and we use Fs=2KHz, then we are in BIG trouble if our sampling points line up with the zero crossings in the tone - our output is NIL: 0,0,0,0!

Even if we are off a bit we will get a reduced amplitude version of the signal.

If we sample at slightly more than the Nyquist rate, we can get strange effects, i.e. 2010Hz sampling of a 1000Hz tone will generate samples that, when sent back to the D/A, will give a varying amplitude 1KHz output, as the sampling point catches the 1000Hz tone at different phase points - approx 10Hz amplitude modulation I believe?

What the limiting factor seems to be is how long your data gathering is: i.e. for a 100bps modem you sample for 10msec per bit. If you are misaligned on the phase of a tone near the Nyquist, you won't see much of it for the bit period.

In our voice band application we use 7200, 8000, or 9600Hz to sample audio in our radio band of 300-2700Hz. Many telephone line modems use 9600Hz to sample the phone line (~3000 to 3500 Hz bandwidth depending upon the modem characteristics and phone line location), of course this is partially chosen to be proportional to the baud rates.

And of course there are harmonics that may have to be gathered to reconstruct your signal.

I've seen recommendations of Fs = 2.5 x highest tone. This seems like a reasonable value. But I would recommend factors as high as your CPU will allow if you are taking short audio blocks for processing. If you are sampling over a long time, you may reduce the factor, with the knowledge that signals near the Nyquist rate may suffer amplitude distortion - less a problem for data than voice/music ??

Paul G Russell
St. John's, Newfoundland, Canada
paulr@neweast.ca

Note : The most important lesson to be learnt from this is that signals should be filtered to limit frequencies, before sampling. If the signal is analog then an analog filter should be used (think about it...). Even if you are sure the signal will not exceed the "safe frequency" watch out for noise on the input which could possibly be of higher frequency.
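
Paul's two examples are easy to check numerically. What follows is only a rough Python sketch (the function name sample_tone and the block size of 20 samples are illustrative choices, not anything from Paul's post). It samples a 1KHz tone at exactly 2KHz on the zero crossings, and then at 2010Hz to show the slow amplitude wobble:

```python
import math

def sample_tone(tone_hz, fs_hz, count, phase=0.0):
    """Return `count` samples of a unit-amplitude sine tone sampled at fs_hz."""
    return [math.sin(2.0 * math.pi * tone_hz * n / fs_hz + phase)
            for n in range(count)]

# Fs = 2KHz with the sample points on the zero crossings:
# every sample is zero apart from floating point rounding.
on_crossings = sample_tone(1000.0, 2000.0, 2000)
print(max(abs(s) for s in on_crossings))

# Fs = 2010Hz: the sample points drift through the phase of the tone, so
# the largest sample seen in each short block rises and falls over time.
drifting = sample_tone(1000.0, 2010.0, 2010)
block = 20
peaks = [max(abs(s) for s in drifting[i:i + block])
         for i in range(0, len(drifting), block)]
print(min(peaks), max(peaks))
```

The list of block peaks swells and shrinks roughly ten times over the one second of data, which is the amplitude modulation Paul describes.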

How Do I Change the Sampling Rate of a signal?

The purpose of changing the sampling rate is to keep the same signal whilst changing the number of samples needed to store it. The most popular uses of sample rate changing are to convert CD audio data (44100Hz sample rate) to another format (eg DAT at 48000Hz), or to change from one computer sample rate to another (eg 11000Hz Amiga rate to 8000Hz Sun format).

Various methods of changing the sample rate were discussed on the csound mailing list, I have included the method that was thought to be best:

[by Tom Ahola]
If the sample rate is to be dropped to 1/2 you first have to filter out the top half of the bandwidth (spectrum over Sr/4 !) and after that you can throw away every second sample (resampling). If the lowpass filter is ideal (brick wall) no extra noise is added, you only lose half the bandwidth.

In interpolation to double sample rate you first add a zero between each sample (resampling). This gives no additional white noise or any distortion, you just end up with a spectrum that is doubled (imaged). To remove the undesired spectral image you have to lowpass filter the resampled signal with a filter with a bandwidth of Sr/4, where Sr is now the new sample rate. If the filter is ideal, the signal is identical to the original signal (no noise!).

You can interpolate/decimate by any integer, rational or decimal number. Rational interp./decim. can be done efficiently by the use of polyphase filters.

If somebody does interpolation by fitting a line or parabola between two samples he or she has not read enough dsp books. These methods will end up in aliasing and/or distortion.

Tom Ahola Metrology Research Institute, Finland
E-mail: tom.ahola@hut.fi
WWW: http://www.hut.fi/~tahola/
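
Tom's recipe for halving and doubling the rate can be sketched in a few lines of Python. This is a minimal sketch, not Tom's own code: the brick wall filter he describes is replaced by a 63 tap Hamming-windowed sinc (so the result is close to, but not exactly, the ideal), and the function names are made up for the example.

```python
import math

def lowpass_taps(num_taps, cutoff):
    """Hamming-windowed sinc low-pass FIR. `cutoff` is a fraction of the
    sample rate (0.25 means Sr/4). num_taps should be odd. The taps are
    scaled so that they sum to 1 (unity gain at DC)."""
    mid = (num_taps - 1) / 2.0
    taps = []
    for i in range(num_taps):
        x = i - mid
        if x == 0:
            ideal = 2.0 * cutoff
        else:
            ideal = math.sin(2.0 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * i / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def fir(signal, taps):
    """Filter `signal` with `taps`, shifted so the filter adds no delay."""
    half = len(taps) // 2
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, tap in enumerate(taps):
            j = n + half - k
            if 0 <= j < len(signal):
                acc += tap * signal[j]
        out.append(acc)
    return out

def halve_rate(signal, num_taps=63):
    """Low-pass at Sr/4, then throw away every second sample."""
    return fir(signal, lowpass_taps(num_taps, 0.25))[::2]

def double_rate(signal, num_taps=63):
    """Insert a zero after each sample, then low-pass at (new Sr)/4."""
    stuffed = []
    for s in signal:
        stuffed.extend([s, 0.0])
    # The gain of 2 makes up for the level lost to the inserted zeros.
    return [2.0 * v for v in fir(stuffed, lowpass_taps(num_taps, 0.25))]
```

The first and last few dozen output samples are affected by the ends of the data, so compare results away from the edges.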


The method of doing sample rate conversion taught in my DSP course was to perform a fourier transform (ie FFT or DFT) on the signal and then do a reverse transform, with extra zero padded values (increasing rate) or with fewer high frequency components (decreasing rate). Ie the reverse transform is done on more or fewer frequency components than the original. I don't know whether this system is better than Tom's method, or if the extra complexity has no actual benefit. My gut feeling is that this is better, especially for non integer rate changes, but for down conversion I can see that some windowing may be needed.

Please could someone explain sampling theory?

[by Robert Bristow-Johnson]

Here is the mathematical expression of the sampling theorem:


           x(t)*q(t) = T*SUM{x(kT)d(t - kT)}  .------.
x(t)-->(*)----------------------------------->| H(f) |--> x(t)
        ^                                     '------'
        |
        '------ q(t) = T*SUM{ d(t - kT) }     (SUMming over all k)

where d(t) = 'dirac' impulse function
and T = 1/fs = sampling period
fs = sampling frequency

q(t) = T*SUM{ d(t - kT) } is a periodic function with period, T and can be expressed as a fourier series. It turns out that ALL the fourier coefficients are equal to 1.

q(t) = SUM{ exp(j*2*n*(pi)*(fs)*t) } (SUMming over all n)

Using the frequency shifting property of the fourier transform,

F{x(t)*q(t)} = SUM{ X(f - n(fs)) } (SUMming over all n)

where X(f) = F{ x(t) } and F{ ... } is the fourier transform.

This says, what we all know, that the spectrum of our signal being sampled is shifted and repeated forever at multiples of the sampling frequency. If x(t) or X(f) is bandlimited to B (i.e. X(f) = 0 for all |f| > B) AND if there is no overlap of the tails of adjacent images X(f), that is B < fs - B, then we ought to be able to reconstruct X(f) (and also x(t)) by low pass filtering out all of the images of X(f). To do that, fs > 2B and H(f) must be:

       { 1   for |f| < fs/2 = 1/(2T)
H(f) = {
       { 0   for |f| > fs/2 = 1/(2T)

The impulse response of the reconstruction LPF, H(f), is the inverse fourier transform of H(f), called h(t).

h(t) = inv F{ H(f) } = sin[(pi/T) * t] / [(pi) * t] = (1/T) * sinc[ t/T ]

where sinc(w) = sin[pi*w]/[pi*w]

The input to the LPF is x(t)*q(t) = T*SUM{x(kT)d(t - kT)} . Each d(t - kT) impulse generates its own impulse response
and since the LPF is linear, all we have to do is add up the impulse responses weighted by their coefficients, x(kT).
The T and 1/T kill each other off.

The output of the LPF is

x(t) = SUM{x(kT)*sinc[(t - kT)/T]} = SUM{x(kT)*sinc[t/T - k]} (SUMming over all k).

This equation tells us explicitly how to reconstruct our sampled, bandlimited input signal from the samples. When
doing sample rate conversion, you must evaluate this equation for times that are integer multiples of your NEW
sampling period, TN. i.e. t = n*TN.

The sinc(t/T) function is one for t = 0 and zero for t = k*T for k = nonzero integer. This means that if your new
sample time, t, happens to land exactly on an old sample time, m*T, only that sample (and none of the neighbors)
contributes to the output sample and the output sample is equal to the input sample. Only in the case where the
output sample is in between input samples, do the neighbors contribute to the calculation.

Since the sinc() function goes on forever to + and - inf, it must be truncated somewhere to be of practical use. Truncating is actually applying the rectangular window (the worst kind) so it is advantageous to window down the sinc() function gradually using something like a Hamming or Kaiser window. In my experience, you'll need to keep the domain of the sinc() function from -16 to +16 and sample it 65536 times in that region. This requires a 32 point FIR computation to calculate one output sample. Since it is symmetrical, that means 32768 numbers stored somewhere in memory. When interpolating, the integer part of t/T determines which 32 adjacent samples to use, the fractional part of t/T determines the 32 sinc() coefficients to be used to combine the 32 samples. There are other ways of determining the LPF impulse response, some are published (D. Rossum's paper at Mohonk, R. Adams at some AES convention, and some other Julius Smith paper), and some are tightly guarded trade secrets.
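
As a rough illustration of the reconstruction sum, here is a small Python sketch that evaluates x(t) = SUM{x(kT)*sinc[t/T - k]} at the new sample times using the 32 nearest input samples. Several things here are assumptions made for brevity: the sinc() values are computed on the fly instead of being read from a stored table, a Hann window is used (not the Hamming or Kaiser window mentioned above), and the function names are invented. As written it only suits raising the rate, or lowering it when the signal is already bandlimited below half the new rate.

```python
import math

def sinc(w):
    """sinc(w) = sin(pi*w) / (pi*w), with sinc(0) = 1."""
    return 1.0 if w == 0.0 else math.sin(math.pi * w) / (math.pi * w)

def resample(samples, old_rate, new_rate, half_width=16):
    """Windowed-sinc interpolation onto the new sample times t = n*TN."""
    count = int(len(samples) * new_rate / old_rate)
    out = []
    for n in range(count):
        pos = n * old_rate / new_rate     # t/T, measured in old samples
        base = math.floor(pos)            # integer part picks the neighbours
        acc = 0.0
        for k in range(base - half_width + 1, base + half_width + 1):
            if 0 <= k < len(samples):
                w = pos - k               # distance from input sample k
                # Hann window: 1 at w = 0, falling to 0 at w = +/-half_width
                window = 0.5 + 0.5 * math.cos(math.pi * w / half_width)
                acc += samples[k] * sinc(w) * window
        out.append(acc)
    return out
```

For example, resample(data, 8000, 11025) converts 8000Hz data to 11025Hz. When the new sample time lands exactly on an old one, only that sample contributes (to within rounding), as described above.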

Effects using Delays

The following questions relate to effects that delay the signal to produce the required audio effect. The main exclusions from this section are digital filters (they use a form of delay), surround sound (it has its own section), and pitch shift (several methods exist for this).
For the most part delay effects are ideally done in the digital domain, because memory chips make the perfect, noise free, store for data that is being delayed. In the analog domain bucket brigade delay lines (semi digital) and spring delay lines can be used.


How Do I create echo effects?

A basic echo effect is obtained by taking the input signal and mixing the
input with a delayed version of the signal. The proportion of delayed signal to
"clean" (straight through) signal determines how obvious the echo is, and the
size of the delay changes the sound of the echo.

A quick picture of how a delay is implemented follows :
                                                 ___
                                         *a     |   |
INPUT -------+--------------------------------->|   |
             |                                  | + |----------> OUTPUT
             |       ________________      +--->|   |
             |      |                |     | *b |___|
             +----->|  Delay d secs  |-----+
                    |________________|

Algorithmically this can be written :

out(t) = a * in(t) + b * in(t-d) (where t is current time, d is delay)

And for you pseudo code freaks :-)

/* this is for non real time (ie data in arrays) */
for Sample := 0 to NumberOfSamples - 1
{
    if ( Sample - DelayLength < 0 )
    {
        /* nothing has come out of the delay yet */
        Output[Sample] := a * Input[Sample];
    }
    else
    {
        Output[Sample] := a * Input[Sample]
                        + b * Input[Sample - DelayLength];
    }
}

/* this is for real time (DelayMemory starts off full of zeros) */
repeat( until we don't want to )
{
    Input := Read_Input();
    /* DelayMemory[DelayCount] is the sample read DelayLength reads ago */
    Output := a * Input
            + b * DelayMemory[DelayCount];
    /* overwrite the oldest sample with the newest one */
    DelayMemory[DelayCount] := Input;
    DelayCount := (DelayCount + 1) MOD DelayLength;
}

I don't want to say too much about the code or diagram: a and b control how loud the input and delayed signals sound at the output. In the non real time code it is assumed that the input and output are data arrays; in the real time code DelayMemory is a block of memory DelayLength samples (bytes, words, floats, or whatever) long. NB the MOD operator gives the modulus (remainder) of the value (equivalent to % in C) and "wraps around" the delay count.
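
For anyone who would rather have something that runs, here is the real time version of the echo written as a small Python function. It is only a sketch: the name echo and the default values of a and b are choices made for this example.

```python
def echo(samples, delay_length, a=1.0, b=0.5):
    """out[n] = a * in[n] + b * in[n - delay_length], via a circular buffer."""
    delay_memory = [0.0] * delay_length   # the last delay_length inputs
    delay_count = 0
    out = []
    for x in samples:
        # delay_memory[delay_count] is the input from delay_length samples ago
        out.append(a * x + b * delay_memory[delay_count])
        delay_memory[delay_count] = x     # replace it with the newest input
        delay_count = (delay_count + 1) % delay_length
    return out

# A single click followed by silence comes back once, three samples later.
print(echo([1.0, 0.0, 0.0, 0.0, 0.0, 0.0], 3))
```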

Extra echoes can be added by extending the length of the delay and by "tapping off" values at various points. Ie 2 stages :

                                        * a          ___
INPUT -------+------------------------------------->|   |
             |                          * b         |   |
             |              +---------------------->| + |----------> OUTPUT
             |   _________  |   _________    * c    |   |
             |  |         | |  |         |  +------>|___|
             +->| Delay 1 |-+->| Delay 2 |--+
                |_________|    |_________|
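
A tapped delay like this can be sketched in Python as one long circular buffer that is read at several distances behind the write position. The function name and the (delay, gain) pair format are assumptions made for this example:

```python
def multitap_echo(samples, taps, dry_gain=1.0):
    """`taps` is a list of (delay_in_samples, gain) pairs, all read from
    one shared delay line. `dry_gain` plays the part of a in the diagram."""
    longest = max(delay for delay, _ in taps)
    line = [0.0] * (longest + 1)          # circular buffer of recent inputs
    write = 0
    out = []
    for x in samples:
        line[write] = x
        y = dry_gain * x
        for delay, gain in taps:
            # Step back `delay` samples from the write position, wrapping.
            y += gain * line[(write - delay) % len(line)]
        out.append(y)
        write = (write + 1) % len(line)
    return out
```

In the two stage diagram, Delay 1 alone gives the first tap and Delay 1 plus Delay 2 gives the second, so a call such as multitap_echo(data, [(d1, b), (d1 + d2, c)]) matches it.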

To create gradually fading echoes we can use lots of individual delays as above, or put the delayed signal back into the system - this is the reverb effect:


What is Reverb, how is it different to echo?

In its simplest form, reverb is echo, but with the mixed signal fed back into the input to the delay :

                  ___
           *a    |   |
INPUT ---------->|   |
                 | + |-------------->-------+----------------> OUTPUT
            +--->|   |                      |
            | *b |___|   ________________   |
            |           |                |  |
            +------<----|  Delay d secs  |<-+
                        |________________|

The effect of the feedback is to make it sound like there are multiple echoes. For example if a=1 and b=0.5, and we apply a signal, initially the output will be the same as the input. After the delay time has elapsed, a delayed version of the signal will be mixed with the input, giving a single echo at 0.5 times its original amplitude, and the current input. A time d seconds later, the single echo signal will have been delayed, giving the original signal (delayed twice) at 0.25 times its original amplitude, the signal from d seconds previously at 0.5 times its original amplitude, and the current input. This process continues: the older a signal, the more times it has passed around the loop, and the lower its amplitude, so multiple fading echoes are produced.

Algorithmically :
out(t) = a * in(t) + b * out(t-d)

Care must be taken with reverb when choosing the value for b. If b is greater than one then a signal will in theory circulate the delay loop and be amplified each time it goes round the loop (with b exactly equal to one it circulates forever without fading). In practice, the signal overloads the output value, resulting in a nasty feedback noise (similar to the feedback screech heard at gigs when singers decide that the best place to wave a microphone is next to their monitor speaker) and possible destruction of speakers!

High values of b (getting close to 1) can produce cool effects for transient sounds (like drums) and muddle other sounds. More subtle reverb adds ambience to sound (especially voice). Be careful not to get that bathroom sound unless that's what you want :-)

Very small values of d (less than a few milliseconds) make it impossible to hear the individual echoes, and instead the reverb becomes a filter.
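
The feedback loop is only a small change from the plain echo: the delay memory stores past outputs instead of past inputs. A minimal Python sketch (the function name and the check on b are additions for this example):

```python
def feedback_echo(samples, delay_length, a=1.0, b=0.5):
    """out[n] = a * in[n] + b * out[n - delay_length]."""
    if abs(b) >= 1.0:
        raise ValueError("b must be smaller than 1 or the echoes never die away")
    delay_memory = [0.0] * delay_length   # holds past OUTPUT samples
    pos = 0
    out = []
    for x in samples:
        y = a * x + b * delay_memory[pos]
        delay_memory[pos] = y             # feed the mixed signal back in
        pos = (pos + 1) % delay_length
        out.append(y)
    return out

# One click in, a train of echoes out, each half the size of the last.
print(feedback_echo([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], 2))
```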

How do I get my echo/reverb to sound more realistic?


The answer to this question is provided by Arun Chandra, taken (with permission) from a document he sent me which was intended for composers who were
interested in implementing algorithms without using too much maths. Other parts of the document may be included in the section on filtering (when I
write it!). Thanks Arun.

M.A. Schroeder, in the 1960s and 1970s, had suggested two models for "realistic reverberation". The first model was to use five
allpass filters cascaded together, i.e., the output of the first was the input to the second, the output of the second was the input
to the third, etc.

The algorithm for an allpass filter is:

c(n) = x(n-D) + g * c(n-D)
y(n) = -g*x(n) + (1 - g*g) * c(n)

[where: n     the current sample number
        y(n)  the n-th output sample
        x(n)  the n-th input sample
        c(n)  the n-th output sample of the comb filter inside the allpass
        D     delay in samples
        g     gain (determines reverb time) ]

As you can see, the latter part of an allpass filter is a comb filter.

[Mike's Note : the comb filter is a type of reverb, its algorithm is out(t) = in(t-d) + b * out(t-d) ]

The second model he suggested was to use four comb filters in parallel, sum their outputs, and then pass the sum through two allpass filters in sequence. These two filters designed by Schroeder are the most commonly used ones in digital synthesis.

James A. Moorer built on the work of Schroeder. He noticed that Schroeder's filters tended to keep the high partials of a sound ringing, and so suggested that the comb filter be used with a built-in low-pass filter:

y(n) = x(n-D) + g2 * ( y(n-D) + ( g1 * y(n-(D+1)) ) )

He then suggested a model for reverberation that involved six combs with built-in low-pass filters run in parallel, their outputs are summed, and then sent to an allpass filter.

A problem shown by all three of these reverberation units (first noticed by Schroeder) is their "lack of early echoes". Remember that the samples are delayed by the magnitude of D. This means that echoes arriving sooner than D samples after the direct sound are never produced by the reverberators. Schroeder suggested a solution, which was to add to the source sound some initial delays in the form of an FIR filter:

y(n) = a1 * x(n) + a2 * x(n-D1) + a3 * x(n-D2) + ... + aN * x(n-D(N-1))

These delays would range from 0 to 80 milliseconds. Moorer found some appropriate coefficients for either 7 or 19 "early echoes".
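
Schroeder's second model (four parallel combs feeding two allpass filters in series) can be sketched in Python as below. Treat it as a sketch under assumptions: the delay lengths and gains are arbitrary illustrative numbers, not values from Schroeder, Moorer or Arun's text, and the function names are invented. The allpass is built as -g*x(n) plus (1 - g*g) times the output of a comb filter, which is algebraically the same as y(n) = -g*x(n) + x(n-D) + g*y(n-D).

```python
def comb(samples, delay, g):
    """Comb filter: y[n] = x[n - delay] + g * y[n - delay]."""
    buf = [0.0] * delay
    pos = 0
    out = []
    for x in samples:
        y = buf[pos]              # equals x[n-delay] + g * y[n-delay]
        buf[pos] = x + g * y      # stored now, read again `delay` samples on
        pos = (pos + 1) % delay
        out.append(y)
    return out

def allpass(samples, delay, g):
    """Allpass made from a comb: y[n] = -g * x[n] + (1 - g*g) * comb[n]."""
    combed = comb(samples, delay, g)
    return [-g * x + (1.0 - g * g) * c for x, c in zip(samples, combed)]

def schroeder_reverb(samples,
                     comb_delays=(1051, 1123, 1201, 1289),  # illustrative only
                     comb_gain=0.8,
                     allpass_delays=(241, 83),              # illustrative only
                     allpass_gain=0.7):
    """Four combs in parallel, summed, then two allpass filters in series."""
    mixed = [0.0] * len(samples)
    for d in comb_delays:
        for i, v in enumerate(comb(samples, d, comb_gain)):
            mixed[i] += v / len(comb_delays)
    for d in allpass_delays:
        mixed = allpass(mixed, d, allpass_gain)
    return mixed
```

Feeding a single click through schroeder_reverb gives silence until the shortest comb delay has passed, which is the "lack of early echoes" described above.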

What is chorus?

Chorus is so named because it makes it sound as if several instruments, or whatever, are playing at the same time but at slightly different pitches. With vocals it sounds as if a chorus of people are singing.

In implementation chorus is very similar to echo. The block diagram is the same as the one for echoing. The input is mixed with a delayed version of the input to produce the output, but in chorusing the sample rate is being changed continuously.

Algorithmically we can describe this as :

out(t) = in(t) + in(t - d * f(t) )

(where d is the length of the delay at the "normal" sampling frequency, and f(t) is the variation of the sampling frequency).

The sample rate is changed quite slowly: the rate of the chorus effect is typically in the region of 0.1Hz to 5Hz, and the sample rate is typically changed in a sine or triangular wave pattern. The depth of the effect determines how much the sample rate changes by; a typical effect would go from twice the normal sampling frequency to half the normal frequency, ie f(t) ranges from 2 to 0.5 and back, between 5 and 0.1 times a second.

The deviation of the pitch at any time (t) is proportional to log[1 - d * f'(t)], where f'(t) is the derivative of f(t).

Chorus in real time is easy: if we have control over when we sample the inputs then this is easily changed to give the chorus effect. If the data is already sampled at a fixed rate then some sort of interpolation/averaging may need to be done to produce the output samples (see the section on resampling).

The chorus effect can be enhanced by adding different sized delays (as can be done in a basic echo effect), this increases the number of "voices" singing
with the original and makes for a richer effect - ie. a 3 stage chorus :

out(t) = in(t) + in(t - d*f1(t) ) + in(t - d*f2(t) ) + in(t - d*f3(t) )
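
For data that is already sampled at a fixed rate, one common way to get the same result is to leave the sample rate alone and sweep the length of the delay instead, reading between stored samples. The Python sketch below does that for a single voice. Note what has been swapped in: the swept delay plays the part of d * f(t), the sweep is a sine wave, and plain linear interpolation is used to read between samples purely to keep the example short (the resampling section explains why a proper band-limited interpolator is better). The function name and default values are illustrative.

```python
import math

def chorus(samples, sample_rate, base_delay_s=0.020, depth_s=0.005, rate_hz=0.5):
    """out(t) = in(t) + in(t - delay(t)), with delay(t) swept by a sine wave.
    Keep depth_s smaller than base_delay_s so the delay stays positive."""
    out = []
    for n, x in enumerate(samples):
        lfo = math.sin(2.0 * math.pi * rate_hz * n / sample_rate)
        delay = (base_delay_s + depth_s * lfo) * sample_rate   # in samples
        pos = n - delay                   # fractional read position
        i = math.floor(pos)
        frac = pos - i
        # Linear interpolation between the two nearest stored samples.
        s0 = samples[i] if 0 <= i < len(samples) else 0.0
        s1 = samples[i + 1] if 0 <= i + 1 < len(samples) else 0.0
        out.append(x + (1.0 - frac) * s0 + frac * s1)
    return out
```

With depth_s set to zero this collapses to the basic echo with a = b = 1. A multi stage chorus is just several of these delayed reads, each with its own sweep, added to the input.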


I would like to know about flanging. Help

Flanging is exactly the same as chorus, except the block diagram is the same as reverb rather than echo. The sample rate is changed in the same way as
chorus. The flange effect gives a more discordant effect than the chorus and has a metallic sound.

What's a Spring Delay Line?

The spring delay line is one of the oldest "electronic" effects, although not exactly digital technology (and barely electronic). A line driver (often a speaker) is hooked to a cylindrical spring with its other end connected to a receiver (often a speaker!), the whole construction is kept flat. When the driver is given a sound signal the spring is moved and a little time later the movement reaches the receiver where it is converted back into an electrical signal. This method of delaying sounds can still be found on some guitar amps; if a clunking noise is heard every time the amp gets knocked then chances are it's the spring delay line (useless fact #18327).

Look on "Leper's Guitar Effects Schematics" www pages (at http://www.wwu.edu/~n9343176/schems.html) for a further explanation (and lots of cool analogue effect circuits).

Volume Changing Effects

This section will discuss effects that change the sound of a signal by altering its volume (only).

Distortion

The best place to read about distortion is in the guitar effects faq, which has a whole section on this effect and analog implementations. Many distortion methods can be done in the digital domain by a look-up and possibly a filter. More complex forms of distortion such as valve emulation are not very well understood and so are difficult to reproduce using anything but valve "technology".

As you may have guessed I need more material for this section, if you have implemented good distortion effects (especially in the digital domain) then get in touch!!!

(Noise) Gating

The idea of simple gating is simplicity itself. If the input signal falls below a set threshold level, then turn the output off. So that the effect does not distort inputs that should not be gated, a hold time needs to be added. If a signal falls below a set threshold and stays below this value for time 'th' (the hold time) then turn the output off until the input goes above the threshold level again. This is the basic noise gate. Used with electrically noisy equipment (such as old analogue synths and guitars) the noise gate will remove background hiss when a note is not being played.

The above noise gate has some problems: if a gradually fading note is played then the noise gate may cut off the note abruptly when the volume falls below the threshold. Because of this a release time 'rt' is introduced. Instead of cutting off the output abruptly when the sound goes below the threshold and the hold time is exceeded, the sound is now faded during the release period.

Hysteresis can also be introduced so that the volume needed to turn on the effect is higher than the volume that the signal must fall below before the gate turns off.

The noise gate can be used as an effect by making the threshold level quite large, so that only loud parts of a sound can be heard. The final addition to a good gate is an attack time. Instead of turning the gate full on as soon as the input exceeds the threshold, the output level is gradually increased over the attack time.

By tweaking these parameters drums can be made to sound a lot tighter, guitars can be made to sound more funky, or you can just use it to make noisy instruments quiet when they should be.
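
Putting the threshold, hysteresis, hold, attack and release together gives something like the Python sketch below. It is one possible arrangement, not a standard design: the parameter names are invented, the times are given in samples, the fades are straight lines, and the level is taken from the raw sample value (so the hold time should be longer than one cycle of the lowest note, otherwise the gate chatters at every zero crossing).

```python
def noise_gate(samples, open_threshold, close_threshold, hold, attack, release):
    """Gate `samples`. The thresholds are absolute sample levels, with
    close_threshold <= open_threshold giving the hysteresis. hold, attack
    and release are lengths in samples."""
    gain = 0.0        # current gate gain: 0 = shut, 1 = fully open
    is_open = False
    quiet_for = 0     # consecutive samples spent below close_threshold
    out = []
    for x in samples:
        level = abs(x)
        if level >= open_threshold:
            is_open = True
            quiet_for = 0
        elif level < close_threshold:
            quiet_for += 1
            if quiet_for > hold:          # quiet for longer than the hold time
                is_open = False
        else:
            quiet_for = 0                 # between thresholds: keep the state
        if is_open:
            gain = min(1.0, gain + 1.0 / max(attack, 1))    # fade in
        else:
            gain = max(0.0, gain - 1.0 / max(release, 1))   # fade out
        out.append(x * gain)
    return out
```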

How do I do Compression?

Ahhh, the question that started the faq...
A compressor reduces the volume of loud sounds and increases the volume of quiet sounds. This results in a much more even volume level, although care must be taken or else everything will be at the same volume. Like gating there is an element of time delay so that a sound with a rapid "attack" (like a snare) will still sound dynamic.

Compression is applied to almost all radio stations (to cover hiss??) and most pop music (to a lesser extent) so that your stereo is not damaged by transient signals. Despite these frowned upon practices, compression is very useful when recording vocals so that the singer does not have to
sing at the same volume all the time :-)

For an explanation of the effect I include a post which was sent to comp.music after I asked how to do compression:

[by Frederick Umminger, thanks]
Take the absolute value of your signal and low-pass filter it with a cutoff frequency of 15-30Hz, and probably a pretty steep slope. The result will track the volume of your signal. Now compute

out(t) = in(t)*low(abs(in(t)))^A

For A = -1, all dynamics should be removed. A = 0 is no effect, and A > 0 heightens the dynamics. You'll need gating or the compression will amplify all of the background noise during moments of silence.
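
Frederick's recipe translates almost directly into code. The Python sketch below takes two shortcuts that are not in his post: the low-pass filter is a simple one-pole smoother (a gentle slope, not the steep one he suggests), and a small `floor` value stands in for proper gating so that silence does not produce an enormous gain. The function names are invented for the example.

```python
import math

def envelope(samples, sample_rate, cutoff_hz=20.0):
    """low(abs(in(t))): rectify, then smooth with a one-pole low-pass."""
    coeff = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    env = 0.0
    out = []
    for x in samples:
        env = coeff * env + (1.0 - coeff) * abs(x)
        out.append(env)
    return out

def compress(samples, sample_rate, A=-0.5, floor=1e-4):
    """out(t) = in(t) * low(abs(in(t)))^A.
    `floor` limits how small the tracked volume may get, which stops the
    gain from blowing up during silence when A is negative."""
    env = envelope(samples, sample_rate)
    return [x * max(e, floor) ** A for x, e in zip(samples, env)]
```

Values of A between -1 and 0 give partial compression, which is usually what is wanted.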

What is the best way to mix signals together?

Can someone help out here with references or text?

Filtering (coming soon)

As soon as I get time, I'll be writing some stuff about filtering. I would like to cover the following areas:
Simple filters (high pass, low pass, band pass)
Equalisation
DC removal
Parametric Equalisation
Phaser effect
Wah-wah effects

Frequency Changing Effects

The effects in this section all noticeably change the pitch of the affected signal.

How do I change the pitch of a sound?

There are several different ways of changing a sound's pitch. The best method to use depends on what the sound is, what quality is required, and how much
processor power you have. Here are the main methods of pitch changing :

- If the signal you want to pitch shift is speech then you can split the signal into small blocks and then add or remove blocks to make the speech last longer/shorter (more/less samples). Then to speed up, play at a faster rate (or convert the sample rates [2-3]). The sections must be split at zero crossings or cross faded so that clicks are not produced between blocks. Detecting the pitch sections (pitch pulses) requires knowledge of how speech is produced and is the main obstacle in using this method. Auto-correlation is probably the most popular way of detecting these sections although it requires lots of computing power.

- For any other type of signal a version of the chorus can be used. The chorus effect [3-4] changes the pitch of a signal by changing the sampling frequency, causing the time to "wobble" around its normal value. If a sawtooth wave is used to modulate the sampling frequency then the signal is pitch shifted up (or down). When the edge of the sawtooth occurs there will be a click on the output. To get around this some form of filter could be used, or for better results two flangers can be used. If their modulation signals are out of step with each other then we can switch the output between the two flangers so that we avoid the click. In order to prevent clicks when the change of output source is made some form of filtering may be needed.

- The final method of pitch shifting is the most complex (oh no!) but gives the best quality results. We can transform the input signal into the frequency domain (using a fourier transform) and stretch the frequency information, so that the frequencies of the signals are changed. Reverse transforming this new frequency domain representation will give a pitch shifted signal.
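
The first method in the list (adding or removing cross faded blocks) can be sketched very simply if the pitch detection step is left out. The Python below is that simplified version and nothing more: blocks are taken at fixed spacing with no attempt to line them up with pitch pulses, so expect a warbling or roughness on steady tones. The function name, the block size and the raised-cosine cross fade are all choices made for this example.

```python
import math

def stretch_blocks(samples, factor, block=512):
    """Change the duration by `factor` (2.0 = twice as long) without
    resampling. Overlapping cross-faded blocks are read from the input at
    one spacing and written to the output at another.
    Needs len(samples) >= block."""
    hop_out = block // 2
    hop_in = hop_out / factor
    # Raised cosine fade: two copies half a block apart always add up to 1.
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / block)
              for i in range(block)]
    n_blocks = int((len(samples) - block) / hop_in) + 1
    out = [0.0] * (hop_out * (n_blocks - 1) + block)
    for b in range(n_blocks):
        start_in = int(b * hop_in)
        start_out = b * hop_out
        for i in range(block):
            out[start_out + i] += window[i] * samples[start_in + i]
    return out
```

The result has a different length but is still at the original pitch; playing it back at a different rate (or converting the sample rate) then gives the pitch shift.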

How do I "time stretch" a signal

Time stretching (where a sound is made to last longer, but keeps the same pitch as the original) uses exactly the same processes as pitch shifting. If we need to make a sound twice its current length, pitch shift the sound to twice its original frequency (ie one octave up) but play back the data at half the original rate. The same can be done for shortening lengths of sounds but by halving the frequency and playing at twice the original rate.

When the change in time requires more than a simple doubling or halving of the frequency, interpolation of the pitch shifted samples is needed.

What is Vocoding? How do I do it?

Vocoding is an effect that kind of superimposes your voice (usually) over the top of another instrument. Used effectively it makes it sound like your instrument is talking! Probably the best (and first?) uses of this effect can be heard on Kraftwerk records, their material is also well recommended if you need inspiration for electronic type sounds, "The Man Machine" is sooooo cool. Sorry, I digress, so without much further ado, on with the question...

[The following explanation of vocoding is by Eric Harnden]

From: HARNDEN@AUVM.BITNET (ronin)
Subject: vocoder tutorial
Date: 18 Oct 91 12:47:21 GMT

someone has asked about vocoders here, and there have been various levels of reply, some of which were aimed at providing manufacturer info, and a couple of which gave a little bit of operational info. i thought i'd expand on the operational side, for those who might be confused, or just ignorant, about a vocoder's workings. if you already know, ignore this and go read the rest of your mail.

a vocoder's main equipment consists of two sets of bandpass filters. these are filters that pass only a selected range of frequencies, the center of that range known as the center frequency, and the breadth of that range known as the bandwidth. a single set of these filters constitutes a filter bank whose center frequencies and bandwidths are designed to provide coverage for pretty much the whole range of hearing. one possible configuration for example would use octave bandwidth filters, with their center frequencies set an octave apart. one octave bandwidth filter might have its center frequency set to 1000Hz, for instance. its nearest upper neighbor would be centered at 2000Hz, and its nearest lower neighbor would be centered at 500Hz. the frequency response curves (the map of their pass-bands) would overlap each other at the points a half-octave between each pair, providing filter coverage for the entire range from about 250Hz to 3000Hz. of course, the narrower the filter bandwidth (and therefore the more plentiful the filters in the bank), the more precision the bank has, since each filter will differentiate a smaller range of frequencies.

now... a signal, say from a synthesizer (let's call it the source), is passed into one of these banks. the signal is applied in parallel to all of them at the same time, and the output of each passed through a gain control element (a VCA) before the signal is recombined. so far, not unlike a graphic EQ, except that the VCAs don't actually boost the signal from a filter, just provide controllable attenuation. the VCAs are normally off. the application of a control to any one of the VCAs will cause a certain amount of the frequency selected for by the bandpass filter feeding it to be passed to the output:

source---->filter 1---->vca 1-----|
        |->filter 2---->vca 2-----|---mix---->out (whatever portion of
        |->filter 3---->vca 3-----|                the signal is passed
                         |                         by filter 3)
control------------------|
(applied to vca 3 only for this example)

got it? good. now, we've got another signal, say from a microphone (let's call it the control), that is passed through another filter bank that is matched to the first one... its filter parameters are exactly the same as those of the first bank (the one through which the source is passed). the output of these filters, though, rather than being passed through VCAs which control their final gain, is measured to determine their gain. in an analog vocoder, this is simply done by rectifying the output from each filter. what you get is a DC voltage whose amplitude is proportional to the amount of control signal that got through that filter, which is of course proportional to the amount of that frequency range which was present in the spectrum of the control. the DC control voltage associated with each control filter is applied to the VCA of each matching source filter, so that the amount of energy present in the control signal spectrum that makes it through any given filter in the control filter bank determines the amount of energy from the source signal that passes through the matching filter in the source filter bank that is allowed through to the output. in other words, the two signals' spectra are separated to some degree, and the reconstruction of the source spectrum is made contingent upon the relative weighting of the associated portions of the control spectrum. put yet another way, the formant envelope characteristic of the control is imposed on the spectrum of the source.
here's the final diagram, for one matched filter pair:

synth---->filter------------------------------------------>vca---->out
                                                           /|\
                                                            |
                                                            |
mic---->filter---->amplitude to control level conversion----|


any questions?

-------------< Extremism in the Pursuit of Good Noise is no Vice >-------
Eric Harnden (Ronin)
HARNDEN@AUVM.BITNET
HARNDEN@AUVM.AMERICAN.EDU or
The American University Physics Dept.
Washington, D.C

When vocoding is performed digitally, the filters are usually replaced by Fourier transforms of the signals, and it is the individual frequency components in the transformed signal that are modified.
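For anyone who would rather read code than block diagrams, here is a minimal sketch in Python of the filter bank approach Eric describes (not the Fourier transform method). It is only an illustration under some assumptions of mine: the band-pass filters are ordinary second order "cookbook" style biquads, the amplitude measurement is a rectifier followed by a simple one-pole smoothing filter with a 30Hz cutoff, and the function names and the three octave-spaced test bands are made up for the example.

```python
import math

def bandpass_coeffs(fc, q, fs):
    """Band-pass biquad with unity gain at fc ('cookbook' style formulas)."""
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a

def biquad(x, coeffs):
    """Run one second order filter section over the list x."""
    (b0, b1, b2), (a1, a2) = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for s in x:
        y = b0 * s + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, s
        y2, y1 = y1, y
        out.append(y)
    return out

def envelope(x, fs, cutoff=30.0):
    """Rectify, then smooth with a one-pole low-pass (the 'DC control voltage')."""
    k = 1.0 - math.exp(-2.0 * math.pi * cutoff / fs)
    level = 0.0
    out = []
    for s in x:
        level += k * (abs(s) - level)
        out.append(level)
    return out

def vocode(source, control, fs, centres=(500.0, 1000.0, 2000.0), q=math.sqrt(2.0)):
    """Two matched filter banks; q = sqrt(2) is roughly a one octave bandwidth."""
    out = [0.0] * len(source)
    for fc in centres:
        coeffs = bandpass_coeffs(fc, q, fs)
        band = biquad(source, coeffs)                  # source filter
        gain = envelope(biquad(control, coeffs), fs)   # control filter + rectifier
        for n in range(len(out)):
            out[n] += band[n] * gain[n]                # the VCA, then the mix
    return out

def tone_level(x, f, fs):
    """Amplitude of the component of x at frequency f (single-bin DFT)."""
    re = sum(s * math.cos(2.0 * math.pi * f * n / fs) for n, s in enumerate(x))
    im = sum(s * math.sin(2.0 * math.pi * f * n / fs) for n, s in enumerate(x))
    return 2.0 * math.hypot(re, im) / len(x)

fs = 16000
t = [n / fs for n in range(fs // 2)]
# Source: equal tones in all three bands. Control: energy in the 1000Hz band only.
source = [sum(math.sin(2.0 * math.pi * f * s) for f in (500.0, 1000.0, 2000.0)) for s in t]
control = [math.sin(2.0 * math.pi * 1000.0 * s) for s in t]

out = vocode(source, control, fs)
tail = out[len(out) // 2:]   # skip the start-up transient
levels = {f: tone_level(tail, f, fs) for f in (500.0, 1000.0, 2000.0)}
silent = vocode(source, [0.0] * len(source), fs)   # no control: VCAs stay off
print(levels, max(abs(s) for s in silent))
```

With a control signal that only has energy around 1000Hz, the 1000Hz part of the source should come through most strongly, and with a silent control nothing comes through at all. Octave wide filters overlap a lot, so the neighbouring bands are only partly suppressed; narrower filters (and more of them) give a more precise result, as noted above.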

What is Ring Modulation?

Ring Modulation is the multiplication of two signals together to make a third (output) signal.

so out(t) = in1(t) * in2(t)

Usually one of the signals is from the "outside world" and the other is a sine wave generated "inside" the effect.

ie out(t) = in(t) * sin( 2*PI*t*f )
(if t is time and f is the frequency of the "internal" modulating signal)

In frequency terms the tones produced are the sum and difference of the original frequencies. So if in(t) is a 500Hz tone and f = 300Hz then out(t) will contain 200Hz and 800Hz signals. It should be ensured that the input signal has zero offset (ie is zero when no signal is present), or else some 300Hz tone will be output at all times.
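The sum and difference frequencies follow from the identity sin(a) * sin(b) = 0.5 * ( cos(a-b) - cos(a+b) ). The short Python sketch below checks this numerically for the 500Hz / 300Hz example, and also shows the modulating tone leaking through when the input has a DC offset. The function names, the 8000Hz sampling rate, and the 0.25 offset are just assumptions for the demonstration.

```python
import math

def ring_modulate(signal, f_mod, fs):
    """out[n] = in[n] * sin(2*PI*f_mod*n/fs), with an internal sine modulator."""
    return [s * math.sin(2.0 * math.pi * f_mod * n / fs) for n, s in enumerate(signal)]

def tone_level(x, f, fs):
    """Amplitude of the component of x at frequency f (single-bin DFT)."""
    re = sum(s * math.cos(2.0 * math.pi * f * n / fs) for n, s in enumerate(x))
    im = sum(s * math.sin(2.0 * math.pi * f * n / fs) for n, s in enumerate(x))
    return 2.0 * math.hypot(re, im) / len(x)

fs = 8000
tone = [math.sin(2.0 * math.pi * 500.0 * n / fs) for n in range(fs)]  # 1 second of 500Hz

clean = ring_modulate(tone, 300.0, fs)
# Same input, but sitting on a DC offset of 0.25.
offset = ring_modulate([s + 0.25 for s in tone], 300.0, fs)

for f in (200.0, 300.0, 500.0, 800.0):
    print(f, round(tone_level(clean, f, fs), 3), round(tone_level(offset, f, fs), 3))
```

The clean output has components of amplitude 0.5 at 200Hz and 800Hz and nothing at 300Hz or 500Hz. With the offset input an extra 300Hz component of amplitude 0.25 appears, which is the modulating tone multiplied by the offset.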

Some effects units and analogue synths (where this effect is most widely used) use a single XOR gate to perform the multiplication digitally. To do this in dsp we would need to convert the input and the modulating signal into one of two values (1 or 0) and then XOR them together. Filtering is probably needed on the output signal, because reducing the signals to two levels distorts them badly and gives rise to some unwanted output frequencies.
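A rough Python sketch of the XOR approach follows. It assumes the conversion to 1 or 0 is a simple sign test (a comparator), and that the XOR result is mapped back to +1/-1 so it can be compared with ordinary multiplication; the function names and test frequencies are hypothetical. XOR gives 1 exactly when the two signs differ, which is when the product of the two +/-1 square waves is negative.

```python
import math

def to_bit(s):
    """1-bit 'comparator': 1 for zero or positive samples, 0 for negative ones."""
    return 1 if s >= 0.0 else 0

def xor_ring_modulate(a, b):
    """XOR the two 1-bit signals, then map bit 0 -> +1 and bit 1 -> -1."""
    return [1.0 - 2.0 * (to_bit(x) ^ to_bit(y)) for x, y in zip(a, b)]

def tone_level(x, f, fs):
    """Amplitude of the component of x at frequency f (single-bin DFT)."""
    re = sum(s * math.cos(2.0 * math.pi * f * n / fs) for n, s in enumerate(x))
    im = sum(s * math.sin(2.0 * math.pi * f * n / fs) for n, s in enumerate(x))
    return 2.0 * math.hypot(re, im) / len(x)

fs = 8000
# A small phase offset keeps zero crossings from landing exactly on a sample.
sig = [math.sin(2.0 * math.pi * 500.0 * n / fs + 0.1) for n in range(fs)]
mod = [math.sin(2.0 * math.pi * 300.0 * n / fs + 0.1) for n in range(fs)]
out = xor_ring_modulate(sig, mod)

# The usual sum and difference tones are still there...
wanted = {f: tone_level(out, f, fs) for f in (200.0, 800.0)}
# ...but so are products of the square wave harmonics, eg 3*500 - 300 = 1200Hz.
unwanted = {f: tone_level(out, f, fs) for f in (1200.0, 1800.0)}
print(wanted, unwanted)
```

The extra tones come from the harmonics of the two square waves multiplying together, which is why some filtering on the output (or on the inputs before the comparator) is usually wanted.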

Surround Sound (coming soon)

Likely sections are:
How do I encode/decode Dolby(tm) surround sound?
Is Pro-Logic like normal surround sound?
What other surround sound coding schemes exist?
I would like to include the following information if anyone can provide pointers or help:
Where can I get surround sound coded files?
How does QSound get surround sound from two speakers?

Miscellaneous Effects

What does the Aphex Aural Enhancer and similar units do?

Apparently they make music sound more dynamic and give a professional edge to the sound. As far as I can remember, from an explanation I read a while back, these units distort the high frequencies of the input signal to create more "top end", ie they create new high frequency signals rather than just boosting the existing frequencies as eq does. While this seems to make sense to me, I am sure there is some extra processing being done (??) in these units.
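Purely as an illustration of the idea described above (and definitely not the circuit of the Aphex unit or any other commercial product), here is a Python sketch that high-pass filters the input, distorts only that filtered part, and mixes a little of the result back in with the untouched signal. The one-pole filter, the tanh distortion, and the cutoff, drive and amount values are all assumptions chosen for the example.

```python
import math

def highpass(x, fs, cutoff):
    """One-pole high-pass: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    rc = 1.0 / (2.0 * math.pi * cutoff)
    a = rc / (rc + 1.0 / fs)
    out, prev_x, prev_y = [], 0.0, 0.0
    for s in x:
        prev_y = a * (prev_y + s - prev_x)
        prev_x = s
        out.append(prev_y)
    return out

def excite(x, fs, cutoff=3000.0, drive=4.0, amount=0.3):
    """Distort only the top end, then add it to the dry signal."""
    top = highpass(x, fs, cutoff)
    harmonics = [math.tanh(drive * s) for s in top]   # soft clipping adds harmonics
    return [dry + amount * h for dry, h in zip(x, harmonics)]

def tone_level(x, f, fs):
    """Amplitude of the component of x at frequency f (single-bin DFT)."""
    re = sum(s * math.cos(2.0 * math.pi * f * n / fs) for n, s in enumerate(x))
    im = sum(s * math.sin(2.0 * math.pi * f * n / fs) for n, s in enumerate(x))
    return 2.0 * math.hypot(re, im) / len(x)

fs = 48000
tone = [math.sin(2.0 * math.pi * 5000.0 * n / fs) for n in range(fs // 10)]  # 5kHz test tone
wet = excite(tone, fs)

# The input has nothing at 15kHz; the processed signal gains a third harmonic there.
print(tone_level(tone, 15000.0, fs), tone_level(wet, 15000.0, fs))
```

An eq boost can only make the existing 5kHz tone louder, whereas the distortion creates a new component at 15kHz that was not in the input, which matches the "more top end" description.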

References

In this preliminary version of the FAQ there are few references in the actual text; hopefully this section will provide pointers to information on what you are interested in. Much of the information below is incomplete (or wrong), I am relying on you experts out there to help me get it right :-)

p.s. Remember I will not be responsible for the accuracy of any information in this FAQ. If it's wrong, sorry, but the only thing you can do is help me get it right in the future. (Just in case you forgot)

Internet Resources

Other FAQ's

Other related FAQ's (I hope I did not cover too much common ground) are:

DSP FAQ - The FAQ for the Comp.dsp newsgroup. Covers all types of DSP including some audio. Also gives addresses and details of DSP chip/software vendors/products. Occasionally posted to comp.dsp, but the only version I have found was last updated over a year ago. Available at the rtfm.mit.edu ftp site.

Guitar FX FAQ - Details various effects (mostly as descriptions of sound rather than implementation) for guitars. Good section on various methods of producing distortion. Updated regularly (posted to the rec.music.makers.builders, rec.music.makers.guitar, and alt.guitar news groups), and available by ftp from rtfm.mit.edu.

CSound FAQ - Frequently asked questions for the CSound music synthesis program. Available via www at :
http://coos.dartmouth.edu/~dupras/Csound/Csound.faq.html
(should also be at ftp.maths.bath.ac.uk).

All the above FAQ's should be available by anonymous ftp to rtfm.mit.edu (login : anonymous).


News groups

Almost certainly the best place to get answers to questions or to discuss stuff.

[whinge mode on]
In ALL cases READ THE FAQ before posting questions. If there's one thing that will get you flamed it's posting questions that are blatantly in the FAQ. The other way of getting flamed for sure is to ask a question that was asked and answered only a week before, so please read for a while before posting. Since the amount of noise on some groups is high, interesting posts get lots of good responses (usually). If you get answers to questions relevant to the FAQ then please mail them to me for future FAQ's, thanks.
[whinge mode off]

Here are my favourite groups for Audio effects stuff:

Comp.dsp - the best group to read if you are into dsp (oh really?) or implementations of the effects described here.

Comp.music - Discussions on here tend to cover a lot of stuff. Sometimes taken over by midi questions, sometimes very interesting, sometimes lamer city :-) Questions on when to use effects tend to get good answers (just don't expect definite answers, things like when to use compression will get 5 conflicting opinions).

Rec.audio.tech - Tends to cover fewer digital techniques than the other groups, but if you need to know about analogue, speakers, or surround sound then this is probably the best group.


Mailing Lists

Numerous e-mailing lists exist on the internet where people with similar interests can send messages which are then forwarded to everyone else on the list (like a news group but more personal). Some of these lists are moderated (messages are checked by a single person before forwarding) and of course since you have to subscribe, you can also be kicked off a list. As a result the amount of rubbish posted is less than on most news groups.

Here are a few mailing lists relevant to this FAQ :

CSound List - concerned with the csound sound synthesis program. See the software section for csound. Whilst discussion is centered around csound issues, csound can do effects, and help can be found here with these.

subscribe csound-list-request@maths.exeter.ac.uk

FTP sites

The sites listed below are the ones that have quite a lot of relevant files, not just general sites with a few programs.

ccrma-ftp.stanford.edu - Computer music department at Stanford Uni (US). Site contains some effects related files and sound synthesis stuff. Hunt around for files, some good files on filtering can be found in pub/dsp.

ftp.hyperreal.com - Directory /raves/music/machines/information/effects contains some files about effects, mostly collected from the internet. A few popular commercial effects units are also discussed. Below this directory is a lot of electronic music bits and pieces.

ftp.analog.com - Analog Devices ftp site. Contains software for Analog Devices DSPs and has some example code for popular DSP operations (eg Fourier transforms and filtering).

ftp.maths.bath.ac.uk - CSound ftp site. Should have the major versions of CSound (and source) as well as some example files.


World Wide Web (WWW) pages

http://www.wwu.edu/~n9343176/schems.html
Contains electronics schematics (circuit diagrams) for lots of popular audio effects. Most of the circuits are 'classic' commercial effects, so are pretty simple. It may even be possible to convert some into their digital (DSP) equivalents.

Software Packages

Shareware / Public Domain Software

Filterkit - Filtering and resampling program. Written for Unix machines, but C code is included which should compile with little effort for other platforms.

Cool Edit - Microsoft Windows based shareware sample editor. Implements a number of audio effects which can be applied to samples. I have yet to use this package but it is widely thought to be the best Windows sample editor, although rather slow.

Gold Play - Shareware competitor to Cool Edit. Implements a few simple effects - echo, reverb, panning, filters, resampling. Also provides the user with facilities for programming their own simple effects. Operations are applied to sounds very quickly.

CSound - Music synthesis package available for most popular machines (PC, Unix, Atari, NeXT, Power PC). Powerful but difficult to use. From a text description of a music score, and a description of the instruments you want to use, CSound can synthesise a whole song. Sampled sounds can be used as the basis for sounds, and effects applied to the sound, or sounds can be produced by describing them mathematically. CSound implements its own language in order to be such a powerful package and so has a steep learning curve. Package is public domain and constantly evolving. Users are recommended to join the CSound mailing list (section A-3). The official ftp site is ftp.maths.bath.ac.uk.

Effect - Written for an article in Dr. Dobb's Journal about realtime audio effects. Two versions of the code are included, assembler for ???? DSP chips, and C for IBM PC (uses Microsoft Sound Card). Available by ftp from ftp.mv.com, file /pub/ddj/1994.07/audio.zip.

Commercial Software

The following commercial software packages provide digital recording facilities and implement some audio effects. More details on these packages from users would be much appreciated, as well as details of other pieces of software.

Acoustica (1.0) - Microsoft Windows based sample editor. Concentrates on adding effects to samples, effects implemented include - compressor, echo, reverb, flange, chorus, stereo enhancement, time stretch and equalisation. Effect processing can be very slow (on a 33MHz 486sx). A demo version of this program is available from most large ftp sites (eg oak.oakland.edu).

S.A.W - Software Audio Workshop. - PC based sound recording software. Provides a replacement for traditional 4 track recorders, with the advantages of digital editing. Add on effects "racks" (only one available at time of writing) allow effects to be added to sounds. The currently available version supports stereo recording (with simultaneous playback if the soundcard allows it) and four stereo tracks mixed together on playback. It is rumoured that future versions will support 16 tracks. This program needs a high end PC (fast disk, 8Mb memory, and 486 essential) and MS Windows, and a good quality soundcard is recommended. A demo version is available by ftp from ftp.vortex.com .

Sound Forge - High specification sample editor / processing program for the PC. This package is the business when it comes to effects, with more effects than any other package that I know of. At $495 it seems a little overpriced, but professional users may feel this is justified by the sheer range of effects. A crippled demo version of the program, which is well worth a look, is available via ftp (from oak.oakland.edu).

Logic Audio - Mac based digital audio recording system. Requires dedicated sound card.

Recommended Books and Papers

[ Since I only have one audio related book (Ifeachor and Jervis), I need recommendations and reviews. Thanks. ]

General Books on DSP :

Ifeachor E.W. and Jervis B.W. (1993). "Digital Signal Processing - A Practical Approach". Addison-Wesley
- good basic dsp book with lots of examples. Some audio processing included.

Oppenheim and Schafer "Discrete-Time Signal Processing" Prentice-Hall

Rabiner and Schafer "Digital Processing of Speech Signals"

Audio Related books :

Moore F.R. "Elements of Computer Music"
- well recommended as being a good book on audio synthesis and effects. C source included.
[if you know where I can get this in the UK please mail me!]

Strawn J. (ed) "Digital Audio Signal Processing". A-R Editions Inc. ISBN 0-89579-279-6.
- contains technical reference on phase vocoders.

Pohlmann, Ken "Digital Audio" and "Advanced Digital Audio". Howard Sams.

Important Journal Papers :

The Computer Music Journal (CMJ) and the Audio Engineering Society(???) (AES) journal both carry relevant articles/papers. Please mail me if you have contact addresses.

General FAQ Questions and Comments

How do I get this document?
Err, if you didn't notice, you already have the document. But I suppose you may be after the latest version (if there is one), so here are the best ways of getting it :

(the below should apply with the next version - v1.0 posted to Comp.dsp only)

USENET - I will be posting the released version every 2 weeks or so to:
Comp.dsp - the birthplace of this FAQ
Comp.music
Rec.audio.tech

If you think it should go anywhere else, then please tell me, but I don't want it going to every music group as it has a fairly technical content.

FTP - No site as yet, can anyone help ?????

EMAIL - Please only mail me with corrections, additions, contributions, and comments. I have to use my work email account (for the moment) and have neither the time nor the inclination to mail this to everyone who asks for it...sorry.
It may get sent to a list server, if there are any suitable ???

WWW - An html version may get done if somewhere to put it can be found.

How do I contribute to the FAQ?

Send me an email; my address is curring@ferndown.ate.slb.com.

As this is only the first version of the FAQ, I have not covered some large areas of Audio FX. I want this FAQ to be as comprehensive as possible, and so need input from you the reader.

Comments, corrections, and pointers are gratefully received, and text that can be included in future versions will be rewarded with eternal gratitude (maybe).
Please note that I may well trim, edit, and change any contributions, and I am still the copyright holder of all parts of the text. Contributions will be acknowledged where possible.
Thanks.


Help Desperately wanted...

As well as needing contributions (text/comments - although money is also fine!) I also need help with a few other bits:

If anyone can help get ftp site(s) to hold this faq PLEASE get in touch.

I also need someone who would be prepared to proof read the faq and make changes concerning grammar and spelling - it's a dirty job but someone's gotta do it.

Finally I really need internet access from home, if any internet providers in England (esp Bournemouth area) want to get in touch please do. I need a CHEAP connection with email, usenet news, and possibly WWW capability. If I can avoid long distance BT calls that's even better :-)


About the Author

I'm Michael John Currington, a 22 year old Software Engineer from England. I have a degree in Electronic Engineering with Computing from Sheffield University. Currently I live in the south of England working for Schlumberger Technologies.

In my spare time I listen to all sorts of music (from Portishead to The Wildhearts), compute, drink beer, play football (Soccer), and watch the X-files, although not all simultaneously. This FAQ is also written in my spare time (it is not associated in any way with my employers). Because of this I have to reply to mails during my spare time, which means you may have to wait a few days/weeks if you write to me. It also means I have no time to answer general questions unless they are directly FAQ related.

To contact me use one of the following methods:

Email - curring@ferndown.ate.slb.com

Remember this is a work account so the speed of reply will depend on how much spare time I have - ie be prepared to wait - sorry.

Snail mail -

Michael Currington
16 Leigh Gardens
Leigh Road
Wimbourne Minster
Dorset
England
BH21 2EW

Credits

Thanks to the following people for helping, encouraging, and contributing :

Robert Bristow-Johnson robert@audioheads.win.net
Arun Chandra arunc@ux1.cso.uiuc.edu
Oyvind Hammer oyvind.hammer@notam.uio.no
Eric Harnden harnden@auvm.bitnet
Paul G Russell paulr@neweast.ca
Frederick Umminger umminger@math.berkeley.edu


Thanks for moral support and suggestions :

Ho Chi Bun h9207737@hkuxa.hku.hk
Chip Burwell cburwell@cts.com
Andrew Gaylard gaylard@sixes-n-sevens.ee.wits.ac.za
Stefan Huy huy@tnt.uni-hannover.de
Steve Miller millersg@dmapub.dma.org
Bill Thompson bth@eznet.net

Thanks also to anyone I missed out (sorry), keep up the good work.

