© 2017 All Rights Reserved
This article, written in 2012 by Chris “Monty” Montgomery, was posted in an FB Group that I frequent. In the world of audio, that makes it a downright old article, but not much is likely to have changed on a technical level since then. In it, Monty talks about sample rates and our hearing, and the main crux of the article is that a sample rate of 192k provides diminishing returns.
Now, anyone who knows me understands that these sorts of technical articles seriously put me to sleep. For whatever reason, I managed to read this whole long-ass article this morning. Overall, many of his statements align with what I’ve observed over the years. But here are two things that stuck out to me.
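For anyone who wants the raw numbers behind these formats, here is a minimal sketch of the standard textbook arithmetic. It is not taken from Monty’s article, and the function names are my own, purely for illustration. It assumes ideal linear PCM: the highest frequency a sample rate can represent is half that rate (the Nyquist frequency), and the theoretical dynamic range for a full-scale sine wave is roughly 6.02 dB per bit plus 1.76 dB.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency a sample rate can represent: half the rate."""
    return sample_rate_hz / 2


def dynamic_range_db(bit_depth):
    """Approximate theoretical dynamic range of ideal linear PCM
    (full-scale sine wave, no dither): 6.02 dB per bit + 1.76 dB."""
    return 6.02 * bit_depth + 1.76


# The sample rates discussed in this post.
for rate in (44_100, 48_000, 96_000, 192_000):
    print(f"{rate} Hz -> Nyquist {nyquist_hz(rate):.0f} Hz")

# The bit depths discussed in this post.
for bits in (16, 24):
    print(f"{bits} bit -> about {dynamic_range_db(bits):.1f} dB")
```

These are on-paper figures only. They say nothing about converters, filters, dither, or what any particular listener actually hears in the room.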
First, there’s this little nugget:
“At low frequencies, the cochlea works like a bass reflex cabinet. The helicotrema is an opening at the apex of the basilar membrane that acts as a port tuned to somewhere between 40Hz to 65Hz depending on the individual. Response rolls off steeply below this frequency.”
That’s a stunning revelation and explains a lot. It may not seem like that big a deal, but that amount of variance in human hearing, particularly in the low end, would make a tremendous difference in how a professional might mix. Someone with a steep roll-off below 65Hz is likely to mix with considerably more low end than someone with a steep roll-off below 40Hz.
Then there’s this statement, which I question, and I’ll explain why.
“Empirical evidence from listening tests backs up the assertion that 44.1kHz/16 bit provides highest-possible fidelity playback.”
This one I don’t agree with, but there’s a very good reason why I come to a different conclusion on this. Technically, from a physics perspective, he might be right about that statement. I’m certainly not in a position to argue. But I can tell you, when I’m in the room, I know instantly whether the ME is monitoring at 24 bit or 16, and much of that has to do with my intimacy with the material. Oddly, in many ways I prefer the 16bit playback once we’re at the mastering stage, as it makes it sound more like a “record” to me. I always figured that was a perception born of familiarity with the sound of a CD. Whereas 16bit/44.1 is the bare minimum for critical listening evaluations, it’s also a sound we’re quite used to by now. I don’t think that can be discounted entirely.
Further, there’s clearly an audible difference between 48k fidelity and 44.1k, which will be readily discernible by most top professionals. There are a ton of semi-pros and hacks in this business too. So, that would muddy the waters significantly, particularly from a scientific perspective. But anyone who has trained their ears over time and who has the physical and mental qualities to be good at evaluating sound can readily hear the difference between 44.1k, 48k, 16bit and 24bit in a proper critical listening environment, on music they are super familiar with, particularly when that professional has a stake in the outcome of that music.
That’s an important distinction. Your average punter generally can’t hear these differences even in a critical listening environment, mostly because he’s not trained in what to listen for. But even for the highly trained professional, these differences can be exceptionally difficult to hear in a casual setting. As professionals, if we don’t like the music, or if we aren’t familiar with the music, or if we don’t have some kind of stake in the music being successful, we really aren’t in a position to evaluate the sound of it, because it doesn’t have the same kind of relevance to us.
When I’m close to finishing a mix I find it difficult to stop singing it. Essentially, once I have the mix in a place where it carries me away with the music, I’ve successfully used sound, in particular how I balanced those sounds, to cause a reaction to the music by the listener. If I don’t cause a reaction to the music in myself, I can’t rightly expect that I’ll cause one in someone else. This is also why I try to do my best to avoid mixing a song that I don’t particularly like. I find myself almost lost when evaluating sound without the benefit of music that moves me in some way. There must be something more redeeming than just the sound.
And this is the thing that I think has been missing in all of the science on this for decades now. Music is nothing more than organized sound which can cause an involuntary response. Goosebumps happen, right? And while we have control over the sound, it’s the sound and how we put it together that directly affects the music, and it’s the music that affects us as listeners.
If I’m mixing towards a musical goal (which takes into account how the music makes me feel), then mixing is easy. If I’m mixing towards a sonic goal, mixing is nearly impossible. It’s kind of like trying to draw a perpendicular line without the benefit of knowing where the base line is, or which way is up for that matter.
This was the basis of the argument between Ethan Winer and me (and my Womb Forums brethren) back in 2010 that he’s still unhinged over to this day. Ethan thinks that the emotional impact of the music can be completely ignored and separated when making scientific evaluations. In fact, he rejects the entire concept of emotional impact, which is staggering, and argues that music can be completely separated from the sound when it comes to critical evaluation of it. Yeah. If we’re comparing farts maybe. Unfortunately, it’s not only impossible to evaluate the sound outside of the emotional impact of the music, it’s counter to our goals as professional producers and mixers. Whereas our ears accept the sonic information physically, it’s our brains that decipher the information. Which would explain why we sometimes aren’t in the mood for a song that we normally love.
When sound is organized into a musical presentation, we aren’t looking for the listener to say that it sounds awesome. We’re looking for a reaction to the music. Preferably a positive reaction, but any reaction is better than none.
So, if the goal of what we do is to get a reaction to the music, and if music is nothing but organized sound, then we would be foolish to discount the effect of the emotional impact, because that’s what lights up our brains. Whether that’s by combining a screeching violin with a low drone to produce a feeling of foreboding. Or by using a diminished 7 chord to create the feeling of mystery. Or whether that’s a skippy beat that causes a nearly unstoppable toe tap, none of the emotional response can be discounted when evaluating sound, since that’s the overarching goal with the music.
As I said, much of what Monty argues in this article aligns with my own anecdotal evidence over the years. I have always felt that 192k was diminishing returns at best. In fact, I often describe reproduction at that sample rate as “weird, weird, weird.” But where Monty (and just about everyone else) gets it wrong is when all of these scientific evaluations are made as the music itself is outright ignored, as if it’s irrelevant to the discussion. You cannot ignore the music. Music garners a reaction out of us that we don’t have a tremendous amount of control over. Once you introduce an emotional reaction, you cannot then evaluate how we decipher the sound without including it. Yes, we listen to music. But ultimately, we feel it. And when we evaluate sound without evaluating how the music is making us feel, we are literally ignoring at least half the equation, perhaps as much as 90% of it.
Talk about lossy!
Until we start to take emotional impact into account, we’re never going to get the format right.
Which is why mixing remains an art. Because the people who are good at mixing understand that all that matters is how the rhythm, melody, harmony, counter-melodies, and responses work together to cause an emotional reaction in the listener. And yes, as a mixer I must concern myself with sound, but the mix isn’t done until I’ve lit up my emotions in just the right way, all in the hopes that I might light up your emotions when you hear the song.
At the moment, there are quite a few professionals that prefer 96k. Many of my good friends insist the plugins work better at that sample rate. That may be true, particularly on Pro Tools, perhaps even exclusively to it. But in my world, 48k still wins the day. And so long as I keep my concentration on the music and the way that it makes me feel, I’ll never really be at a disadvantage against those using higher sample rates. Neither will you.
So, if you’re looking to make music for a living at a high level, the best advice I can offer is to focus on what matters–emotional impact. If that’s right, the sound is right.
Eventually scientists will figure out that you can’t separate our state of mind from what we hear.
If you’d like to discuss these concepts further, join me and my knowledgeable friends at Facebook Mixermania