Understanding Audio Sample Rate Conversions

Posted by Larry

Now that the Digital Production Buzz is back into full production, I’ve been thinking a lot about audio recently; specifically, audio sample rates.

When analog audio is digitized, it’s converted from waves into samples. The more samples per second, the higher the accuracy of the digitized sound.

The Nyquist Theorem states that the highest frequency a given sample rate can reproduce is half that sample rate. Thus, 48,000 samples per second / 2 = 24,000 Hz. Since normal human hearing tops out at about 20,000 Hz, a 48K sample rate means that a digital audio clip will exceed the frequency requirements of normal human hearing, all other things being equal.
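Here is that divide-by-two arithmetic as a minimal Python sketch. The function name and the 20,000 Hz constant are just labels chosen for this illustration; they don't come from any audio tool.

```python
def nyquist_limit_hz(sample_rate_hz: float) -> float:
    """Highest frequency a given sample rate can represent: half the rate."""
    return sample_rate_hz / 2


# Upper limit of normal human hearing, as quoted in this article.
HUMAN_HEARING_LIMIT_HZ = 20_000

for rate in (48_000, 44_100):
    limit = nyquist_limit_hz(rate)
    margin = limit - HUMAN_HEARING_LIMIT_HZ
    print(f"{rate} samples/sec -> {limit:.0f} Hz limit ({margin:.0f} Hz above hearing)")
```

Both rates clear the 20,000 Hz mark; 48K simply leaves more room above it.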


The reason for my thinking about sample rates is that when I produce The Buzz, I’m working at two different sample rates: 48,000 (called “48K”) and 44,100 (called “44.1K”). Both exceed the requirements of normal human hearing, but that doesn’t mean they are the same.

For the live show on Thursday nights, I play back a “show bed,” which is a QuickTime movie containing all recorded elements – like the open, close, commercials and music bumpers – along with visual segment timings. This vastly simplifies the audio production for a live show. This QuickTime show bed movie is created in stereo using a 48K sample rate.

The show is recorded on a Marantz PMD-661 digital audio recorder, also at 48K.

Then, the show is imported into Adobe Audition for any necessary clean-up; most often evening out audio levels.

NOTE: Here’s a recent article on how I handle audio levels, including the settings I use.

Then, to save file size, I export the show and all individual interviews in mono at 44.1K. This reduces file size by about 60% before compression.
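As a rough check on that savings figure, here is a small Python sketch of the uncompressed data rate before and after the export. It assumes the bit depth stays the same on both sides (16-bit is used purely as a placeholder; the article doesn't state the export bit depth), so only the channel count and the sample rate change.

```python
def pcm_bytes_per_second(sample_rate_hz: int, channels: int, bit_depth: int) -> int:
    """Uncompressed PCM data rate in bytes per second."""
    return sample_rate_hz * channels * bit_depth // 8


# Assumption: 16-bit on both sides; only channels and sample rate change.
source = pcm_bytes_per_second(48_000, channels=2, bit_depth=16)  # stereo recording
export = pcm_bytes_per_second(44_100, channels=1, bit_depth=16)  # mono export

reduction = 1 - export / source
print(f"source: {source} bytes/sec, export: {export} bytes/sec")
print(f"reduction: {reduction:.1%}")
```

Under these assumptions, going to mono and 44.1K accounts for a reduction of a bit over half, in the same neighborhood as the "about 60%" figure; any remaining difference would come from details not covered here, such as a change in bit depth.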

Which returns me to my original question: what harm am I doing to my audio by converting it from 48K to 44.1K? So, I contacted Durin Gleaves, product manager for audio and Adobe Audition, at Adobe Systems to learn the answer.

– – –

[Durin writes:] Audition’s sample rate conversion algorithms are about the best out there. I recommend checking out http://src.infinitewave.ca/ which compares 50 or 60 DAWs and audio tools, measuring the artifacts when downsampling a 96kHz recording to 44.1kHz. It provides a very visual example of the artifacts certain application algorithms can leave.


[Larry adds: This chart is from Infinite Wave comparing Audition CS6 (top) vs. ProTools HD 10.3.5. Notice the echo artifact falling off from the original sweep in the ProTools measurement (lower graph).]

[Larry continues: Here’s another by Infinite Wave comparing Audition CS6 (top) vs. Sony Vegas 9.0. Look at the audio mess that downsampling using Sony Vegas creates.]

[Durin continues:] In your example, you recorded your audio at 48K, which means the highest frequency that can be captured is 24kHz. (A little bit of math called the Nyquist Theorem posits that digital recordings can reproduce frequencies up to half the sample rate.)

When you perform a sample-rate conversion downwards to 44.1kHz, you’re essentially chopping off any frequencies above 22,050 Hz. Audition offers a “quality” slider, as well as a pre/post filter, both of which can reduce artifacts and false reflections. As you might see from the Infinite Wave screenshot above, the downsampling process can sometimes “bounce” any frequencies that WOULD go above 22.05K back downwards (an artifact known as aliasing), which you certainly don’t want. You’re not changing pitch or timing at all, unless you’re using a REALLY bad audio device. (There are or were certain sound cards which only ever operated at a single sample rate, and while they tried to perform a realtime sample rate conversion for mismatched playback, sometimes the mileage varied.)
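To make the "bounce" concrete, here is a minimal Python sketch of where a pure tone would land if it were sampled at 44.1K with no filtering at all. This is the textbook frequency-folding calculation, not a description of how Audition or any other tool resamples; the function name is made up for this example.

```python
def aliased_frequency_hz(tone_hz: float, sample_rate_hz: float) -> float:
    """Apparent frequency of a pure tone sampled with no anti-alias filter."""
    folded = tone_hz % sample_rate_hz
    # Anything above half the sample rate folds back down below it.
    return min(folded, sample_rate_hz - folded)


for tone in (21_000, 23_000, 24_000):
    heard = aliased_frequency_hz(tone, 44_100)
    print(f"{tone} Hz tone at 44.1K -> appears at {heard:.0f} Hz")
```

A 21,000 Hz tone sits below the 22,050 Hz limit and is left alone, while tones at 23,000 Hz and 24,000 Hz fold back to 21,100 Hz and 20,100 Hz. That is why a good converter filters out those frequencies before dropping the sample rate.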

Apart from the loss of the more or less inaudible-to-most-of-us high frequencies, you’re not damaging your sound much in Audition.

Another step in the process may be reducing the bit depth, which is another point where damage might occur. Audition records and processes at 32-bit, which means it stores a lot of potential dynamic range for every sample. Audio CDs were fixed at 16-bit, which implies a maximum dynamic range of 96dB (the difference between the quietest and loudest possible signals), but through the use of dithering, which adds a really quiet noise, the perceived dynamic range can actually be a little more.
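The 96dB figure for 16-bit audio follows from the number of levels the bit depth can represent. A minimal Python sketch, assuming plain integer samples (the function name is illustrative only):

```python
import math


def dynamic_range_db(bit_depth: int) -> float:
    """Theoretical dynamic range of integer PCM: 20 * log10(number of levels)."""
    return 20 * math.log10(2 ** bit_depth)


print(f"16-bit: {dynamic_range_db(16):.1f} dB")
print(f"per bit: {dynamic_range_db(1):.2f} dB")
```

Each additional bit adds roughly 6 dB, which is where the 96dB figure for 16-bit audio comes from.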

In practice, for general speech and normal audio, you shouldn’t need to do much and the defaults will usually work just fine. If you’ve recorded and mixed a symphony orchestra at, say, a 192K sample rate and 32-bit depth and are prepping it for a standard audio CD, dithering can soften the harsh noise caused by digital quantization, which can sometimes be heard at the tail end of a fade-out.

– – –

Larry adds: Thanks, Durin, for your comments. As with all transcodes, I try to only convert between sample rates, or anything else, once. The fewer times I transcode an audio file, the better it will sound.
