Why 24-bit sound is crazy talk, but we’ll all end up there anyway

I like music — it’s in my blood. My mother, in her maiden days, sang in a local Big Band and my grandfather played Hammond organs wherever he could find the work (which wasn’t often during the Great Depression). I grew up in a house that always had guitars and keyboards around. The first instrument I learned to play was, to the chagrin of my parents, the snare drum. I was in the fourth grade and I sucked. In fact my entire section of snare drummers sucked.

Sucked Bad.

We had so little ability with rhythm that our section had to practice separately from the rest of the school orchestra. The emotional scars are still healing, thanks for asking.

I learned guitar in high school with the help of a school chum, Nick Sevano. The first song I learned to “play” was Pink Floyd’s Set the Controls for the Heart of the Sun, which comprises four different notes. I soon learned the vocal line to Duran Duran’s seminal anthem, Hungry Like the Wolf. Somehow, The Association’s Windy slipped into the mix, but that’s not relevant right now.

I used to record my “playing” one track at a time on a standard stereo tape deck. I discovered early on that I could record on, say, the left channel and then record on the right channel while listening to the playback on the left. The result? Awful beyond words.

When I got to college (the first time), I discovered the unalloyed joy of 4-track cassette recording. My first solo 4-trackin’ experience was a trippy cover of Hendrix’s/Dylan’s All Along the Watchtower in E minor. Somewhere, I still have that festival of digitally delayed feedback mastered onto stereo cassette.

The point is that I’ve been involved with home recording to some degree for over a decade. I’ve got both empirical and academic knowledge of the way sound works, although it’s not as extensive as the lore known by professional sound engineers. However, I do understand how computers interpret analog sound. Here’s a brief primer on the challenges of digitizing audio.

The fundamental problem with sound is that there’s so darn much of it. More precisely, analog sound comes in continuous waves. Computers (I’m referring to PCs here) are digital and can only understand discrete values (like “on” and “off”). It is easy enough to measure the amplitude of the wave at a given point in time. This produces a discrete value, which makes the computer happy. However, converting a continuous analog sound wave into a series of discrete values means that some parts of that wave will not be captured. If you are numerically inclined, this problem is similar to asking what real number comes after 1.0. Is it 1.1? 1.01? 1.0001? There is no correct answer to this question. You simply have to pick a precision that you can handle.

Similarly, the analog-to-digital converters (ADCs) on your PC’s sound card can’t take an infinite number of measurements (or samples) of a sound wave. Even if the hardware could, there isn’t a hard drive in existence that can hold an infinity of samples. Instead, ADCs take a fixed number of samples per second, enough to fool the human ear into hearing a continuous sound. For CD-quality sound, that sample rate is 44,100 samples per second.
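If you want to see that trade-off in code, here’s a minimal Python sketch of the idea. The function name, the 440 Hz tone, and the one-second duration are all made up for illustration; a real sound card’s ADC does this in hardware, not in a scripting language.

```python
import math

def sample_wave(freq_hz, sample_rate, duration_s):
    """Take discrete amplitude readings of a continuous sine wave.

    Purely illustrative -- this is not any real sound card's API.
    """
    n_samples = int(sample_rate * duration_s)
    # One amplitude reading per sample period; everything that happens
    # between two readings is simply thrown away.
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]

# A 440 Hz tone captured at CD quality: 44,100 snapshots per second.
samples = sample_wave(440, 44_100, 1.0)
print(len(samples))   # 44100 discrete values for one second of sound
```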

What’s so magical about 44,100 samples per second? The answer is that it’s better than twice 20,000. That doesn’t clear it up for you? First, understand that the range of human hearing is roughly between 20 and 20,000 hertz (hertz being a measurement of sound frequency). What’s interesting to me about this fact is that it is unrelated to the loudness (decibels) of the sound. Even a 120 dB, 10 Hz sound wave (a level of noise normally encountered on tarmacs and at Jimmy Page concerts) isn’t perceived as sound by humans. Remember that ADCs sample sound at a specific rate. Imagine a very low sample rate of 1 sample per time unit, while the sound’s frequency is a little faster than that, as the horrible ASCII art below suggests.

[ASCII figure: a sine wave plotted against time, with an ‘x’ marking each of the few points where it gets sampled, roughly one sample per cycle.]

If each ‘x’ is the place where the ADC samples the sound, it’s easy to see that a lot of the wave’s dynamics are missing from that data set. When played back, the waveform is said to be aliased. That is, instead of getting that nice sine wave back, the computer “draws” a straight line between the points, as the following badly drawn figure illustrates.

[ASCII figure: the same span of time redrawn from only those sample points, with straight lines connecting them; the curve of the original wave is gone.]

This is the problem of representing continuous data in a discrete format: information gets lost. Computer monitors have exactly the same problem with displaying curved images, like fonts (which has given rise to “anti-aliasing”, the trick of painting “in-between” pixels in a combination of the font and background colors that almost fools the human eye into seeing a curve where none really exists).
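To make the “connect the dots” problem concrete, here’s a toy Python example. The four-samples-per-cycle rate is deliberately absurd, just to exaggerate the effect:

```python
import math

# Sample one cycle of a sine wave at only four points per cycle:
# the readings are 0.0, 1.0, 0.0, -1.0, and nothing in between is kept.
coarse = [math.sin(2 * math.pi * n / 4) for n in range(4)]

# "Drawing a straight line between the points": halfway between the
# first two samples, linear interpolation gives 0.5 ...
interpolated = (coarse[0] + coarse[1]) / 2

# ... but the real wave is already up around 0.707 at that moment.
true_value = math.sin(2 * math.pi * 0.125)
print(round(interpolated, 3), round(true_value, 3))   # 0.5 0.707
```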

So to prevent noticeable aliasing, you want to sample at a rate high enough that any aliasing falls below the threshold of human perception. Since the fastest frequency humans perceive is 20 kHz, you might be tempted to think that sampling at 20 kHz is “good enough”. It’s not quite. A given sample rate can only faithfully capture frequencies up to half that rate, so at 20 kHz it’s still easy to notice aliasing. Foolishly, I recorded Question of Doubt with a sampling rate of 20 kHz and the result sounds “fuzzy” or “cottony” (it’s the song’s arrangement that sounds “crappy”). The sounds lack the clarity and punch of the analog cassette master. If we double the sampling rate used in the figures above, the sampled signal begins to approximate the original much more closely:

[ASCII figures: the same wave sampled twice as often, and its reconstruction; with double the sample points the redrawn signal follows the original curve much more closely.]

By sampling at better than twice the top of human hearing (this is the gist of the Nyquist sampling theorem), aliasing occurs at levels that are noticed by very few human ears, if any (of course, you will run into plenty of “experts” who claim to hear the difference).
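Here’s a small Python illustration of why “better than twice” matters. The 15 kHz and 5 kHz tones are arbitrary examples I picked; the point is that at 20,000 samples per second they become indistinguishable once sampled:

```python
import math

def sampled(freq_hz, sample_rate, n_samples=8):
    """Discrete samples of a cosine tone at the given frequency."""
    return [round(math.cos(2 * math.pi * freq_hz * n / sample_rate), 6)
            for n in range(n_samples)]

# At 20,000 samples per second, a 15 kHz tone is above half the sample
# rate, so its samples are identical to those of a 5 kHz tone: the high
# tone has "aliased" down into the middle of the audible range.
print(sampled(15_000, 20_000) == sampled(5_000, 20_000))   # True

# At 44,100 samples per second the two tones remain distinct, which is
# why "better than twice 20,000" is the magic number.
print(sampled(15_000, 44_100) == sampled(5_000, 44_100))   # False
```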

So the sample rate is important to sound quality, but what about the numeric value of each sample? Recall that samples are measurements of the wave’s amplitude, which indicates the volume of the sound. The actual units aren’t important, but the number of units is. In other words, if I can only represent sound as being on or off, I have to pick some decibel level below which a sound is considered to be ‘off’ and louder sounds are ‘on’. Because I have only one value for ‘on’, every sound will be reproduced at the same decibel level. Eew. So, the more gradations I choose to represent the decibel level, the richer the sound will be at playback. Again, CD-quality bit depth is 16. As all good Comp Sci majors know, 16 bits holds 65,536 values.

Is this good enough for human perception? While the frequency perception of human hearing has a rather broad range, our perception of decibel changes is much more limited, somewhere from 0 to about 120 dB (at which point auditory damage occurs). Keeping in mind that losing detail is as much a concern for representing decibels as it is for frequency, are 16 bits enough? The answer is that it’s way more than enough. Simple math informs the curious that there are about 546 values available to represent loudness between each integer value of human decibel perception. Humdinger, that’s a lot! Of course, it’s only a lot if humans aren’t very sensitive to decibel changes. They aren’t. It usually takes a change of a couple of decibels before most people will notice the difference (again, you can find double-latte-swilling blowhards who claim that the beating of cockroach hearts ruins their experience listening to the latest Tori Amos offering).
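The arithmetic in that paragraph is easy to sanity-check in Python. Treating the 65,536 levels as spread evenly across a 120 dB range is the same simplification the paragraph makes (real 16-bit samples measure amplitude linearly, not in decibels):

```python
# Back-of-the-envelope arithmetic from the paragraph above.
bit_depth = 16
levels = 2 ** bit_depth            # 65,536 possible amplitude values
audible_range_db = 120             # roughly 0 dB up to the threshold of pain

print(levels)                      # 65536
print(levels // audible_range_db)  # 546 values per decibel of perception

# And the numbers this article is grumbling about:
print(2 ** 24)                     # 16,777,216 levels for 24-bit sound
print(2 ** 1)                      # 2 levels: the on-or-off "sound" above
```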

If 16-bit sound sampled 44,100 times a second is overkill for human perception, why bother with 24-bit sound at higher sample rates? For the home audio enthusiast, there is no earthly reason at all to upgrade your working equipment to handle numerically higher values of sound.

But you will anyway.

The first force that will cause you to upgrade will be amazing advertising pressure to do so. Even if this issue can be skirted, you’ll find that almost all consumer hardware is going to support these improved formats anyway for no extra charge. The beefier integrated circuits that handle 24-bit sound will cost the same or less than those 16-bit ICs. That’s why you don’t see 2x CD-ROM drives anymore: the technology to make the slower drives is now more expensive than what’s needed to make a 54x drive. Thank you muchly, Mr. Moore!

So we’ll all be paying for sound quality we can never appreciate. But think of the numbers, man!

[Original use.perl.org post and comments.]