1159: Bit resolutie in audio opname en processing

Dr.Godfried-Willem RAES

Kursus Experimentele Muziek: Boekdeel 1: Algoritmische kompositie - audiotechnologie

Hogeschool Gent : Departement Muziek

Bit-resolutie in digitale audio.

1159

Why (Almost) Everything You Thought You Knew About Bit Depth Is Probably Wrong

August 29, 2013

by Justin Colletti

A lot of competent audio engineers working in the field today have some real misconceptions and gaps in their knowledge around digital audio. Not a month goes by that I don’t encounter an otherwise capable music professional who makes simple errors about all sorts of basic digital audio principles – The very kinds of fundamental concepts that today’s 22 year-olds couldn’t graduate college without understanding. There are a few good reasons for this, and two big ones come to mind immediately: The first is that you don’t really need to know a lot about science in order to make great-sounding records. It just doesn’t hurt. A lot of people have made good careers in audio by focusing on the aesthetic and interpersonal aspects of studio work, which are arguably the most important. (Similarly, a race car driver doesn’t need to know everything about how his engine works. But it can help.) The second is that digital audio is a complex and relatively new field – its roots lie in a theorem set to paper by Harry Nyquist 1928 and further developed by Claude Shannon in 1946 – and quite honestly, we’re still figuring out how to explain it to people properly. In fact, I wouldn’t be surprised if a greater number of people had a decent understanding of Einstein’s theories of relativity, originally published in 1905 and 1916! You’d at least expect to encounter those in a high school science class. If your education was anything like mine, you’ve probably taken college level courses, seminars, or done some comparable reading in which well-meaning professors or authors tried to describe digital audio with all manner of stair-step diagrams and jagged-looking line drawings. It’s only recently that we’ve come to discover that such methods have led to almost as much confusion as understanding. In some respects, they are just plain wrong. What You Probably Misunderstand About Bit Depth I’ve tried to help correct some commonly mistaken notions about ultra-high sampling rates, decibels and loudness, the real fidelity of historical formats, and the sound quality of today’s compressed media files. Meanwhile, Monty Montgomery of xiph.org does an even better job than I ever could of explaining how there are no stair-steps in digital audio, and why “inferior sound quality” is not actually among the problems facing the music industry today. A bad way, and a better way to visualize digital audio. Images courtesy Monty Montgomery's Digital Show and Tell video. (Xiph.org) A bad way, and a better way to visualize digital audio. Images courtesy Monty Montgomery’s Digital Show and Tell video. (Xiph.org) After these, some of the most common misconceptions I encounter center around “bit depth.” Chances are that if you’re reading SonicScoop, you understand that the bit depth of an audio file is what determines its “dynamic range” – the distance between the quietest sound and the loudest sound we can reproduce. But things start to go a little haywire when people start thinking about bit depth in terms of the “resolution” of an audio file. In the context of digital audio, that word is technically correct. It’s only what people think the word “resolution” means that’s the problem. For the purpose of talking about audio casually among peers, we might be even better off abandoning it completely. When people imagine the “resolution” of an audio file, they tend to immediately think of the “resolution” of their computer screen. Turn down the resolution of your screen, and the image gets fuzzier. Things get blockier, hazier, and they start to lose their clarity and detail pretty quickly. Perfect analogy, right? Well, unfortunately, it’s almost exactly wrong. All other things being equal, when your turn down the bit depth of a file, all you’ll get is an increasing amount of low-level noise, kind of like tape hiss. (Except that with any reasonable digital audio file, that virtual “tape hiss” will be far lower than it ever was on tape.) That’s it. The whole enchilada. Keep everything else the same but turn down the bit depth? You’ll get a slightly higher noise floor. Nothing more. And, in all but extreme cases, that noise floor is still going to be – objectively speaking – “better” than analog. On Bits, Bytes and Gameboys This sounds counter-intuitive to some people. A common question at this point is: “But what about all that terrible low-resolution 8-bit sound on video games back in the day? That sounded like a lot more than just tape hiss.” That’s a fair question to ask. Just like with troubleshooting a signal path, the key to untangling the answer is to isolate our variables. Do you know what else was going on with 8-bit audio back in the day? Here’s a partial list: Lack of dither, aliasing, ultra-low sampling rates, harmonic distortion from poor analog circuits, low-quality dither, low-quality DA converters and filters, early digital synthesis, poor quality computer speakers… We could go on like this. I’ll spare you. Nostalgia, being one of humanity’s most easily renewable resources, has made it so that plenty of folks around my age even remember some of these old formats fondly. Today there are electronic musicians who make whole remix albums with Nintendos and Gameboys, which offer only 4 bits of audio as well as a myriad of other, far more significant issues. (If you like weird music and haven’t checked out 8-Bit Operators’ The Music of Kraftwerk, you owe it to yourself. They’ve also made tributes to Devo and The Beatles.) But despite all that comes to mind when we think of the term “8 Bits,” the reality is that if you took all of today’s advances in digital technology and simply turned down the bit depth to 8, all you’d get is a waaaaaaay better version of tape cassette. There’d be no frequency problems, no extra distortion, none of the “wow” and “flutter” of tape, nor the aliasing and other weird artifacts of early digital. You’d just have a higher-than-ideal noise floor. But with at least 48 dB of dynamic range, even the noise floor of modern 8-bit audio would still be better than cassette. (And early 78 RPM records, too.) Don’t take my word for it. Try it yourself! Many young engineers discover this by accident when they first play around with bit-crushers as a creative tool, hoping to emulate old video game-style effects. They’ll often become confused and even disappointed to find that simply lowering the bit count doesn’t accomplish 1/50th of what they were hoping for. It takes a lot more than a tiny touch of low-level white noise to get a “bad” sounding signal. The Noise Floor, and How It Effects Dynamic Range This is where the idea of “dynamic range” kicks in. In years past, any sound quieter than a certain threshold would disappear below the relatively high noise floor of tape or vinyl. Today, the same is true of digital, except that the noise floor is far lower than ever before. It’s so low, in fact, that even at 16 bits, human beings just can’t hear it. An 8-bit audio file gives us a theoretical noise floor 48dB below the loudest signal it can reproduce. But in practice, dithering the audio can give us much more dynamic range than that. 16-bit audio, which is found on CDs, provides a theoretical dynamic range of 96dB. But in practice it too can be even better. Let’s compare that to analog audio: Early 78 RPM records offered us about 30-40 dB of dynamic range, for an effective bit depth of about 5 -6 bits. This is still pretty useable, and it didn’t stop people from buying 78s back in the day. It can even be charming. It’s just nowhere near ideal. Cassette tapes started at around 6 bits worth of “resolution”, with their 40 dB of dynamic range. Many (if not most) mass-produced cassettes were this lousy. Yet still, plenty of people bought them. If you were really careful, and you made your tapes yourself on nice stock and in small batches, you could maybe get as much as 70dB of dynamic range. This is about equivalent to what you might expect out of decent vinyl. Yes, it’s true, it’s true. Our beloved vinyl, with its average dynamic range of around 60-70dB, essentially offers about 11 bits worth of “resolution.” On a good day. Professional-grade magnetic tape was the king of them all. When the first tape players arrived in the U.S. after being captured in Nazi Germany at the end of World War II, jaws dropped in the American music community. Where was the noise? (And you could actually edit and maybe even overdub? Wow.) By the end of the tape era, you could get anywhere from 60dB all the way up to 110dB of dynamic range out of a high-quality reel – provided you were willing to push your tape to about 3% distortion. Those were the tradeoffs. (And even today, some people still like the sound of that distortion in the right context. I know I do.) Digital can give us even more signal-to-noise and dynamic range, but at a certain point, it’s our analog circuits that just can’t keep up. In theory, 16-bit digital gives us 96 dB of dynamic range. But in practice, the dynamic range of a 16-bit audio file can reach well over 100 dB – Even as high as 120 dB or more. This is more than enough range to differentiate between a fly on the wall halfway across your home and a jackhammer right in front of your face. It is a higher “resolution” than any other consumer format that came before it, ever. And, unless human physiology changes over some stretch of evolution, it will be enough “resolution” for any media playback, forever. Audio capture and processing however, are a different story. Both require more bits for ideal performance. But there’s a limit as to how many bits we need. At a certain point, enough is enough. Luckily, we’ve already reached that point. And we’ve been there for some time. All we need to do now is realize it. Why More Bits? Here’s one good reason to switch to 24 bits for recording: You can be lazy about setting levels. 24 bits gives us a noise floor that’s at least 144 dB below our peak signal. This is more than the difference between leaves rustling in the distance and a jet airplane taking off from inside your home. This is helpful for tracking purposes, because you have all that extra room to screw up or get sloppy about your gain staging. But for audio playback? Even super-high-end audiophile playback? It’s completely unnecessary. Compare 24-bit’s 144 dB of dynamic range to the average dynamic range of commercially available music: Even very dynamic popular music rarely exceeds 4 bits (24dB) or so worth of dynamic range once it’s mixed and mastered. (And these days, the averages are probably even lower than that, much to the chagrin of some and the joy of others.) Even wildly dynamic classical music rarely gets much over 60 dB of dynamic range. But it doesn’t stop there: 24-bit consumer playback is such overkill, that if you were able to set your speakers or headphones loud enough so that you could hear the quietest sound possible above the noise floor of the room you were in (let’s say, 30db-50dB) then the 144 dB peak above that level would be enough to send you into a coma, perhaps even killing you instantly. The fact is, that when listening to recorded music at anything near reasonable levels, no one is able to differentiate 16-bit from 24-bit. It just doesn’t happen. Our ears, brains and bodies just can’t process the difference. To just barely hear the noise floor of dithered 16 bit audio in the real world, you’d have to find a near-silent passage of audio and jack your playback level up so high that if you actually played any music, you’d shear through speakers and shatter ear drums. (If you did that same test in an anechoic chamber, you might be able to get away with near-immediate hearing loss instead. Hooray anechoic chambers.) But for some tasks, even 24-bits isn’t enough. If you’re talking about audio processing, you might go higher still. 32 Bits and Beyond Almost all native DAWs use what’s called “32-bit Floating Point” for audio processing. Some of them might even use 64 bits in certain places. But this has absolutely no effect on either the raw sound “quality” of the audio, or the dynamic range that you’re able to play back in the end. What these super-high bit depths do, is allow for additional processing without the risk of clipping plugins and busses, and without adding super-low levels of noise that no one will ever hear. This extra wiggle room lets you do insane amounts of processing and some truly ridiculous things with your levels and gain-staging without really thinking twice about it. (If that happens to be your kind of thing.) To get the benefit of 32-bit processing, you don’t need to do anything. Chances are that your DAW already does it, and that almost all of your plugins do too. (The same goes for “oversampling,” a similar technique in which an insanely high sample rate is used at the processing stage). Some DAWs also allow the option of creating 32-bit float audio files. Once again, these give your files no added sound quality or dynamic range. All this does is take your 24-bit audio and rewrite it in a 32-bit language. In theory, the benefit is that plugins and other processors don’t have to convert your audio back and forth between 24-bit and 32-bit, thereby eliminating any extremely low-level noise from extra dither or quantization errors that no one will ever hear. To date, it’s not clear whether using 32-bit float audio files are of any real practical benefit when it comes to noise or processing power. The big tradeoff is that they do make all of your projects at least 50% larger. But if you have the space and bandwidth to spare, it probably can’t hurt things any. Even if there were a slight noise advantage at the microscopic level, it would likely be smaller than the noise contribution of even one piece of super-quiet analog gear. Still, if you have the disk space and do truly crazy amounts of processing, why not go for it? Maybe you can do some tests of your own. On the other hand, if you mix on an analog desk you stand to gain no advantage from these types of files. Not even a theoretical one. A Word On 48-bit Years ago, Pro Tools, probably the most popular professional-level DAW in America, used a format called “48-Bit Fixed Point” for its TDM line. Like 32-bit floating, this was a processing format, and it had pretty much nothing to do with audio capture, playback, or effective dynamic range. The big difference was in how it handled digital “overs”, or clipping. 32-bit float is a little bit more forgiving when it comes to internal clipping and level-setting. The tradeoff is that it has a potentially higher, and less predictable noise floor. The noise floor of 48-bit fixed processing was likely to be even lower and more consistent than 32-bit float, but the price was that you’d have to be slightly more rigorous about setting your levels in order to avoid internal clipping of plugins and busses. In the end, the differences between the two noise floors is basically inaudible to human beings at all practical levels, so for simplicity’s sake, 32-bit float won the day. Although the differences are negligible, arguing about which one was better took up countless hours for audio forum nerds who probably could have made better use of that time making records or talking to girls. All Signal, No Noise To give a proper explanation of the mechanics of just how the relationship between bit depth and noise floor works (and why the term “resolution” is both technically correct and so endlessly misleading for so many people) would be beyond the scope of this article. It requires equations, charts, and quite possibly, more intelligence than I can muster. The short explanation is that when we sample a continuous real-world waveform with a non-infinite number of digital bits, we have to fudge that waveform slightly in one direction or another to have it land at the nearest possible bit-value. This waveform shifting is called a “quantization error,” and it happens every time we capture a signal. It may sound counter-intuitive, but this doesn’t actually distort the waveform. The difference is merely rendered as noise. From there, we can “dither” the noise, reshaping it in a way that is even less noticeable. That gives us even more dynamic range. At 16 bits and above, this practically unnecessary. The noise floor is so low that you’d have to go far out of your way to try and hear it. Still, it’s wise to dither when working at 16 bits, just to be safe. There are no real major tradeoffs, and only a potential benefit to be had. And so, applying dither to a commercial 16-bit release remains the accepted wisdom. Now You Know If you’re anything like me, you didn’t know all of this stuff, even well into your professional career in audio. And that’s okay. This is a relatively new and somewhat complex field, and there are a lot of people who can profit on misinforming you about basic digital audio concepts. What I can tell you is that the 22-year olds coming out of my college courses in audio do know this stuff. And if you don’t, you’re at a disadvantage. So spread the word. Thankfully, lifelong learning is half the point of getting involved in a field as stimulating, competitive and ever-evolving as audio or music. Keep on keeping up, and just as importantly, keep on making great records on whatever tools work for you – Science be damned.

Justin Colletti is a Brooklyn-based producer/engineer, journalist and educator. He records and mixes all over NYC, masters at JLM, teaches at CUNY, is a regular contributor to SonicScoop, and edits the music blog Trust Me, I’m a Scientist. - See more at: http://www.sonicscoop.com/2013/08/29/why-almost-everything-you-thought-you-knew-about-bit-depth-is-probably-wrong/

Filedate: 980129/ last update: 2013-09-30

Terug naar inhoudstafel kursus: <Index Kursus>

Naar homepage dr.Godfried-Willem RAES

Robots