Audio Mixing to Improve the Clarity of Speech

Posted on May 15, 2011 by Larry

[ This article was first published in the June, 2010, issue of
Larry’s Monthly Final Cut Studio Newsletter. ]

This article was suggested several months ago by Philip Fass. Sorry it took me so long to write.

One of the signs of getting older is that our hearing is not as sharp as it once was. (Yes, that includes me, too, and I’m still frustrated about it.) So one of the things I do in my mixes is to be sure that I make things as clear and easy to understand as possible.

But first, a bit of audio theory.

VOLUME AND FREQUENCY

We often assume that loudness is what makes a sound understandable. But that is only part of the solution.

All sound is composed of two elements: volume and frequency. Volume determines how loud a sound is; frequency determines its pitch.

NOTE: Volume is also described as “levels” and “gain.” Your choice.

When someone speaks, their voice contains lots of different volumes and frequencies. You can easily prove this to yourself by trying to make words using just a tone generator, like the one in Bars and Tone. You will get a series of beeps, but no words.
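To make the tone-generator point concrete, here is a minimal Python sketch of one. It is only an illustration: the function name, the 1,000 Hz pitch, the half-scale level and the 48 kHz sample rate are all assumptions chosen for the example, not settings taken from Final Cut. Whatever values you pick, the result has exactly one volume and one frequency, which is why it can only beep.

```python
import math

def sine_tone(frequency_hz, amplitude, duration_s, sample_rate=48000):
    """Return samples of a pure tone: one fixed volume, one fixed frequency."""
    count = round(duration_s * sample_rate)
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * i / sample_rate)
        for i in range(count)
    ]

# 1,000 Hz at half of full scale for 10 milliseconds (all values are arbitrary).
tone = sine_tone(frequency_hz=1000, amplitude=0.5, duration_s=0.01)
peak = max(abs(sample) for sample in tone)
print(len(tone), round(peak, 3))
```

Speech, by contrast, changes both its level and its mix of frequencies from one instant to the next.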

Human speech is “bursty.” This means that each syllable is a short burst of sound. These are the puffs we see when looking at waveforms in Final Cut’s Viewer or Timeline. Singing is more “steady-state.”

Here, for example, is a waveform of me saying: “to learn more about our weekly podcasts, visit DigitalProductionBuzz dot com.”

Each syllable is its own puff. The two puffs at the end are “dot com”.

But looking at a waveform only shows the volume of the sound. We need to look at a frequency chart to see the distribution of frequencies.

Happily, Soundtrack Pro provides this as part of an Audio File project. Here is the spectrum analysis for the words “dot com”. “Dot” is on the left and “com” is on the right.

This chart represents the frequency range and characteristics of human hearing.

Blue represents the frequencies at which there is no sound at that point in time
Pale blue represents frequencies with some level of sound
Green represents frequencies with a moderate level of sound
Dark green represents frequencies with a significant level of sound
Yellow represents frequencies with a high level of sound

Notice how the bulk of the sound is in the lower frequencies — less than 500 cycles? Well, this is partly due to the fact that I’m a guy. Girls have somewhat higher voices, but a surprising amount of audio is in low frequencies for both guys and girls.

Notice, also, that I have almost no frequencies above about 5 kHz. Girls would max out around 8 kHz. Remember that point, we’ll come back to it.
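If you want to see the same idea in numbers rather than colors, the sketch below measures how much of a signal's energy sits below 500 cycles. It is a rough illustration only: it uses a slow, textbook discrete Fourier transform written in plain Python, and the "voice" is a made-up test signal (a strong 200 Hz tone plus a weak 3,000 Hz tone), not a real recording or anything Soundtrack Pro produces.

```python
import cmath
import math

def magnitude_spectrum(samples, sample_rate):
    """Naive DFT: return (frequency_hz, magnitude) pairs from 0 Hz up to Nyquist."""
    n = len(samples)
    pairs = []
    for k in range(n // 2 + 1):
        total = sum(
            s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(samples)
        )
        pairs.append((k * sample_rate / n, abs(total)))
    return pairs

def energy_fraction_below(spectrum, cutoff_hz):
    """Share of the total energy carried by frequencies under cutoff_hz."""
    total = sum(m * m for _, m in spectrum)
    low = sum(m * m for f, m in spectrum if f < cutoff_hz)
    return low / total

# Hypothetical stand-in for a voice: loud low tone, quiet high tone.
sample_rate = 8000
signal = [
    1.0 * math.sin(2 * math.pi * 200 * i / sample_rate)
    + 0.25 * math.sin(2 * math.pi * 3000 * i / sample_rate)
    for i in range(400)
]

spectrum = magnitude_spectrum(signal, sample_rate)
low_share = energy_fraction_below(spectrum, 500)
print(f"{low_share:.0%} of the energy is below 500 cycles")
```

In this test signal the quiet high tone carries only a small share of the energy, yet in a real voice that is the region where the consonants live.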

FREQUENCY MEANS DIFFERENT THINGS

Human hearing ranges from 20 cycles (so deep as to feel more like a vibration than a pitch) to 20,000 cycles (so high as to feel more like the wind than a tone), assuming we are all about 18 years old with average hearing for that age. As we get older, our hearing declines, which I have already grumped about.

NOTE: Two other thoughts about getting older. First, we lose high frequency hearing first. Second, guys start to lose higher frequencies before girls do.

Human speech, however, is more restricted in frequencies. Speech ranges from roughly 150 cycles to 6,000 cycles for men, and 350 cycles to 8,000 cycles for women. Kids are slightly higher yet. And, there is plenty of individual variation.

What adds richness, tone, and “sexiness” to a voice is the lower frequencies. (This is why many radio DJs try to pitch their voices as deep as they can.)

Vowels live in the low frequencies.

Consonants, however, live in the high frequencies. And it is consonants that provide diction to speech.

NOTE: Someone who mumbles has almost no high frequencies in their voice, which is why they are so hard to understand. It isn’t just volume, it’s pitch.

I read somewhere that the sounds that distinguish the letter “F” from the letter “S” sit around 6,100 cycles for a guy and 8,000 cycles for a girl. In other words, clarity is in the high frequencies.

This last statement gives us guidance on how to approach mixing projects for, ah, older folks. We need to boost the higher frequencies to improve intelligibility.

ADDING EQ TO THE MIX

One of the reasons I like mixing in Soundtrack Pro is the high-quality and precise filters that it contains, coupled with a sophisticated interface. Final Cut’s filters can’t begin to compete.

While it is true that every voice is different and that you should never use the same preset for everyone, I’m going to give you a couple of settings you can use as starting points to improve your own mixes.

In Soundtrack, unlike Final Cut, filters are applied to the track, not the clip. So, select a track by clicking on it. Here, I clicked the track Larry Intro.

Then, click the Effects tab to select it; the default location is in the Left Pane.

The left side of the window shows filter categories. Click EQ.

The right side of the window shows filters. Either double-click the filter named Fat EQ or highlight it and click the Plus button.

The filter interface that appears has been known to scare small children. However, it isn’t as bad as all that.

As I said, human hearing extends from 20 cycles, on the left, to 20,000 cycles, on the right. The Fat EQ filter separates this into five bands, going from left to right:

Frequencies below human speech
Low-frequency human speech
Mid-range human speech
High-frequency human speech
Frequencies above human speech

While this filter has a LOT of flexibility, I want to concentrate on two things:

Making a voice sound warmer
Making a voice easier to understand

The warmth of a voice is in the lower frequencies — Band 2. As we increase this setting, we make a voice more inviting. As we decrease this setting, we make a voice more sterile.

The Fat EQ filter allows us to increase or decrease specific ranges of frequencies. This allows us to change portions of the sound without changing all of it. This is very similar to color correction, where we can change the color of the shadows while not changing the color of highlights.

To vary the amount of the change, drag the circular wheel up or down.

NOTE: When we change frequencies, we want to make small changes. We are not creating mountains here, we are creating molehills!

To vary the frequencies we are changing, grab the frequency number and drag it up or down. (You can also double-click it to enter a specific number.)

Notice that we are not changing a specific frequency. We are changing a range of frequencies around a central point. When we work with audio, we are always working with ranges, not specific frequencies.
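For the curious, here is a sketch of what "a range of frequencies around a central point" looks like in numbers. It uses the widely published peaking-filter formulas from Robert Bristow-Johnson's Audio EQ Cookbook. This is an assumption for illustration only: I am not claiming this is how Fat EQ is built internally, and the width setting (q = 1.0) and the 48 kHz sample rate are arbitrary choices.

```python
import cmath
import math

def peaking_eq_coefficients(center_hz, gain_db, q, sample_rate):
    """Audio EQ Cookbook peaking filter: boost or cut a range around center_hz."""
    amp = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)  # smaller q means a wider affected range
    b = (1 + alpha * amp, -2 * math.cos(w0), 1 - alpha * amp)
    a = (1 + alpha / amp, -2 * math.cos(w0), 1 - alpha / amp)
    return b, a

def gain_db_at(frequency_hz, b, a, sample_rate):
    """How many dB the filter adds (or removes) at one frequency."""
    z_inv = cmath.exp(-2j * math.pi * frequency_hz / sample_rate)
    numerator = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    denominator = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return 20 * math.log10(abs(numerator / denominator))

sample_rate = 48000
b, a = peaking_eq_coefficients(center_hz=170, gain_db=3.0, q=1.0, sample_rate=sample_rate)

for hz in (60, 120, 170, 240, 500, 5000):
    print(f"{hz:>5} cycles: {gain_db_at(hz, b, a, sample_rate):+.2f} dB")
```

The full boost lands at the center frequency, the neighbors on either side get part of it, and frequencies far away are left essentially alone. That is the molehill shape we are after.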

Now that you know how to make changes to the filter, let’s look at some specific settings that we can use to improve our sound.

GENERAL SETTINGS FOR A MALE VOICE

NOTE: Before the emails start flying, allow me to state, again, that every voice is different. Use these as guides, then adjust until the voice sounds good to your ears.

I recently changed my opinions on where to set my presets and I’m continuing to refine them. However, to warm up the low end of a voice, I’ll add +3 dB of gain around 170 cycles. Then, to improve clarity, I’ll add +4.5 dB of gain around 3,500 cycles.

Finally, I’ll lower the Master Gain by 1 dB to help prevent distortion caused by boosting frequencies.
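If decibels feel abstract, it can help to convert them to plain multipliers. The sketch below uses the standard amplitude relationship (ratio = 10 raised to dB/20). The function name is mine, made up for the example.

```python
def db_to_amplitude_ratio(db):
    """Convert a gain in decibels to a plain amplitude multiplier."""
    return 10 ** (db / 20)

for label, db in (("warmth boost", 3.0), ("clarity boost", 4.5), ("master gain", -1.0)):
    print(f"{label}: {db:+.1f} dB is x{db_to_amplitude_ratio(db):.3f}")
```

A +4.5 dB boost makes the affected frequencies roughly two-thirds louder in amplitude, which is why it makes sense to pull the overall level down a little afterward.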

GENERAL SETTINGS FOR A FEMALE VOICE

I do much less work with female voices, but when I do, here is where I start.

To warm up the low end of a voice, I’ll add +3 dB of gain around 390 cycles. Then, to improve clarity, I’ll add +4.5 dB of gain around 5,500 cycles.

Finally, I’ll lower the Master Gain by 1 dB to help prevent distortion caused by boosting frequencies.
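If you keep your own notes on starting points, one way to record them is as a small table of data. The sketch below does that in Python using only the numbers from this article. The dictionary layout and names are hypothetical, invented for the example, and the check simply confirms that each boost sits inside the speech range quoted earlier for that kind of voice.

```python
# Rough speech ranges in cycles, as quoted earlier in this article.
SPEECH_RANGE_HZ = {"male": (150, 6000), "female": (350, 8000)}

# Starting points only: (center frequency in cycles, gain in dB).
EQ_STARTING_POINTS = {
    "male": {"warmth": (170, 3.0), "clarity": (3500, 4.5), "master_gain_db": -1.0},
    "female": {"warmth": (390, 3.0), "clarity": (5500, 4.5), "master_gain_db": -1.0},
}

def boosts_within_speech_range(voice):
    """True if both boost centers fall inside that voice's speech range."""
    low, high = SPEECH_RANGE_HZ[voice]
    preset = EQ_STARTING_POINTS[voice]
    centers = (preset["warmth"][0], preset["clarity"][0])
    return all(low <= hz <= high for hz in centers)

for voice in EQ_STARTING_POINTS:
    print(voice, boosts_within_speech_range(voice))
```

Every voice is different, so treat a table like this as a place to begin, then adjust by ear.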

SUMMARY

By boosting the high frequencies a bit, I add sparkle and clarity to the voice to help make sure that what my actors are saying is intelligible to the audience. Even us older folks…!

BIG NOTE: If you, like me, use the Limiter filter to help even out levels, be sure to apply the Limiter filter LAST, so it is at the bottom of the filter stack. Otherwise, the EQ filter is likely to distort your audio.
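Here is a deliberately simplified sketch of why the order matters. Two big assumptions: the "limiter" is just a hard clamp at full scale (a real limiter is much gentler and smarter), and the "EQ" is a plain +4.5 dB boost applied to everything rather than to a range of frequencies. The sample values are invented.

```python
def boost(samples, gain_db):
    """Stand-in for an EQ boost: raise every sample by gain_db."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]

def hard_limit(samples, ceiling=1.0):
    """Crude limiter: clamp every sample to the range -ceiling..+ceiling."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]

def peak(samples):
    """Largest absolute sample value."""
    return max(abs(s) for s in samples)

voice = [0.2, -0.9, 0.7, 0.95]  # hypothetical samples, where 1.0 is full scale

limiter_last = hard_limit(boost(voice, 4.5))   # EQ first, limiter at the bottom
limiter_first = boost(hard_limit(voice), 4.5)  # limiter first, EQ after it

print("limiter last, peak:", round(peak(limiter_last), 3))
print("limiter first, peak:", round(peak(limiter_first), 3))
```

With the limiter last, the peaks stay at or under full scale. With the limiter first, the boost that follows pushes the peaks back over full scale, and nothing is left in the chain to catch them.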


9 Responses to Audio Mixing to Improve the Clarity of Speech

egreenz says:

August 31, 2011 at 2:47 pm

Nice class. I hope it will continue. Please, how can I make my mix heavy? And I really don’t understand the reverb in Cubase 5. Thanks

- Larry Jordan says:
  
  August 31, 2011 at 4:29 pm
  
  Egreenz:
  
  I don’t know what you mean by “heavy” — and you’re on your own for Cubase, it is a program I’ve never used.
  
  Larry
  
  P.S. Thanks for the kind words!
  
Paul says:

October 14, 2012 at 7:59 pm

Great article. I will try your settings tips. This enlightens me enormously.

I’m nowhere near a professional. Just an amateur who, by default, is drafted to fine-tune the mixers at some of the clubs and restaurants where the female jazz singer whom I manage is performing live.

Clarity of speech is always an issue when she is performing because people really want to hear the words she is singing, and I certainly will try your settings.

We want to keep her voice warm, which it is, but clear. I have been using something approximating 2 o’clock for the highs, 1 o’clock for the mid (both the frequency and strength dials) and 12 o’clock or 11 o’clock for the Lows. Based on your recommendations, I’m going to try something like 1 o’clock for the Lows, 12 o’clock for the mid, and 2-3 o’clock for the highs. I will play with it from there, and hopefully this will help clarity of speech and keep her voice warm at the same time. Naturally, most of the soundboards we work with do not allow fine-tuning of the exact frequency range, like the mixer you cite in your article, except sometimes in the mid-range. Also, reducing the master is something I did not know about, but sounds right because we do experience some distortion from time to time.

My question concerns reverb. She likes a fair amount of reverb. We don’t get fancy by changing the reverb program settings, basically because we don’t understand how to do it. We usually use Large Hall or something similar.

Question: am I mistaken, or does using reverb (such as Large Hall) make clarity of speech even more challenging?

Are there any tips you can give as to equalization settings which will offset any deterioration in voice clarity because of the reverb? Or, do you just suggest to use the target settings you have suggested (bearing in mind that every voice is different) and then dealing with any remaining voice clarity issues by just reducing the reverb?

In other words, is there a different rule of thumb re settings for increasing clarity of speech while at the same time giving her quite a bit of the reverb that she wants?

Appreciate your thoughts and suggestions, as well as tips for settings. Thanks.

Håkan says:

July 8, 2013 at 9:37 am

Hi!

How would you set up the EQ in Windows Media Player to make speech in videos as easy to hear as possible?

Colin Trainer says:

September 30, 2013 at 8:48 am

Thank you Larry. I have been mixing/recording for years, more obsessed with the relative volumes of various sounds, which on the Atari 520 with Cubase and what was then expensive outboard gear still sounded muddled for vocals. (There wasn’t the luxury of multitracking back in ’97.) I am rediscovering the joys after a 16-year break by purchasing Cubase 7 for the PC with all these wonderful VST modules, so your page here could not have come at a better time. So very well explained, and 10 out of 10 for not mentioning Q point! It also explained why my partner never understands what I say. I’ve been toying with the idea of ‘suggesting’ she go for a hearing test, but now it may be ‘me’ after all 🙂 I must use more consonants!

Amanda Mayer says:

May 30, 2014 at 6:34 am

Thank you so much for doing this. It’s a very nice article. Question: I don’t “mix”; I work with audio files that I must hear very clearly. Does anyone know of an EQ that I can download that will improve my listening experience, but not alter the digital audio file itself? Thanks so much.

beni says:

January 8, 2017 at 3:17 am

Larry, nice post. I noticed you did not cut or roll off the low frequencies between 40 and 90 Hz.

- Larry says:
  
  January 8, 2017 at 10:56 am
  
  Correct.
  
  I roll off low frequencies in the mic, rather than in post.
  
  Larry
  
Pésimo sonido en TV: Efectos del coronavirus en la comunicación says:

March 29, 2020 at 2:45 pm

[…] If the message is less intelligible, it takes us much more effort to process it. We have to make an extra effort to understand what is being said. We need to pay closer attention. Sustaining this extra effort for a long time produces cognitive fatigue, so that after a while we feel overwhelmed by that kind of sound. In short, the extra cognitive effort caused by poor sound quality creates a bad listening experience that keeps us from enjoying the content itself. On the website of the microphone manufacturer DPA (the microphones used on TV sets) there is an article on the importance of speech intelligibility in the media. […]
