Dynamix Productions, Inc. © 2003 - 2023

A Sound Education

Chocolate Milk


"All you need is love. But a little chocolate now and then doesn't hurt."
Charles M. Schulz

There's a phrase we use in the audio industry to explain to someone who doesn't understand why, once something's been mixed down, like a song, it can't be unmixed. In other words, once all the elements have been married together, we can't easily pluck out the vocals and replace them. The phrase goes something like, "Here's a glass of milk, and here's chocolate powder. Mix the chocolate into the milk and you have chocolate milk. You can't take the chocolate out and just have milk."

Well, we are all eating a big ol' crow sandwich with chocolate sprinkles on top right about now. It was inevitable that we would reach the point where we could not only isolate the vocals, but the guitar, drums, and even the arena crowd. It's by no means perfect, but the technology is advancing at warp speed. It's all thanks to artificial intelligence, or AI. Before that, the only way to separate tracks was a lot of EG (elbow grease) and a ton of luck.

If a song was mixed in traditional stereo, that is, with lead vocals in the center and various instruments spread across the stereo spectrum, then we could use phasing and equalization to "sort of" remove a vocal. There was always a hint of the vocal left, as if the singer were a ghost lurking in the back of the room. You also couldn't get rid of the vocal reverb if it was in stereo. And good luck isolating individual instruments. If the recording is in mono – no way, no can do.
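The phasing trick described above can be sketched in a few lines of Python. This is a toy illustration with sine waves standing in for a song, not a real mixing tool: anything panned dead center (identical in both channels) cancels when you subtract one channel from the other, while side-panned material survives.

```python
import math

def make_mix(n=1000, rate=8000):
    """Build a toy stereo mix: a 'vocal' panned dead center plus
    'instruments' panned hard left and hard right."""
    vocal = [math.sin(2 * math.pi * 440 * t / rate) for t in range(n)]
    gtr   = [0.5 * math.sin(2 * math.pi * 220 * t / rate) for t in range(n)]
    keys  = [0.5 * math.sin(2 * math.pi * 330 * t / rate) for t in range(n)]
    left  = [v + g for v, g in zip(vocal, gtr)]   # vocal + guitar on the left
    right = [v + k for v, k in zip(vocal, keys)]  # vocal + keys on the right
    return left, right, vocal

def remove_center(left, right):
    """Subtract right from left: anything identical in both channels
    (the centered vocal) cancels; side-panned material survives."""
    return [l - r for l, r in zip(left, right)]

left, right, vocal = make_mix()
karaoke = remove_center(left, right)
# The centered 440 Hz 'vocal' cancels exactly; what's left is the
# difference of the side-panned instruments, collapsed to mono.
```

Note what the sketch also demonstrates about the technique's limits: the result is mono, and anything that differs between the channels, like stereo vocal reverb, survives the subtraction. That's exactly the ghost-in-the-room artifact described above.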

Early attempts at pushing the envelope of isolating parts in a mix started in earnest about twenty years ago. Christopher Kissel, an amateur engineer, was attempting to turn a 1959 mono recording of Miss Toni Fisher's hit "The Big Hurt" into a more pleasing stereo mix. Using software that visualized the sound spectrum, Kissel was able to isolate the vocals and strings and then pan them a little to the left, with the remaining instruments panned a little to the right. Kissel says it wasn't perfect, but it led to several more years of perfecting his newfound craft of "upmixing" mono recordings. After a laborious 60 hours of work separating elements and remixing in stereo, Kissel and producer Tom Moulton released the first spectrally-edited upmix, the 1951 R&B song "The Glory of Love" by the Five Keys.

Then the floodgates opened in 2005 when several mono-to-stereo versions of songs were released. By 2007, software began appearing that made it easier to upmix mono music into stereo, and even surround. With this newfound ability to isolate, some producers began using services such as Audionamix to remove elements, making stripped-down versions of songs or removing vocals for use in commercials. Audionamix can also remove music from old television shows that is too expensive to license.

Meanwhile at Abbey Road Studios in London, engineer James Clarke had been toying with an old 1970s television technology that could recognize and break video signals down into separate components, like faces and backgrounds. Clarke found that this approach worked on audio spectrograms as well, so he engineered software around that principle to recognize different instruments. Since Clarke worked at the Taj Mahal of recording studios, he had access to the Beatles recordings, including the album Live at the Hollywood Bowl. Clarke says that the crowd's shrill screams had buried the band, making it a difficult listen. The usual approach to reducing crowd noise wasn't working. But by having the software learn the crowd as a separate instrument, he was able to isolate and reduce the screams, making for a more pleasant mix. Clarke also used his expertise and software on the massive Woodstock–Back to the Garden: The Definitive 50th Anniversary Archive 38-CD collection. This included a never-released set by sitarist Ravi Shankar that was in really bad shape. Clarke was able to separate the sitar from the rain, the crowd, the nearby tabla player, and the noise from the old tape and inferior electronics.
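The spectrogram at the heart of Clarke's approach is just a grid of magnitudes: time slices down one axis, frequency bins across the other, so that each instrument leaves a recognizable pattern. Here's a minimal pure-Python sketch of that idea using a naive DFT; this is only an illustration of what a spectrogram is, not Clarke's software, and real tools use FFTs with far finer resolution and overlapping windows.

```python
import cmath
import math

def dft_mag(frame):
    """Magnitude spectrum of one frame via a naive DFT
    (first half only; the rest mirrors it for real signals)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) / n
            for k in range(n // 2)]

def spectrogram(signal, frame_len=64):
    """Slice the signal into frames and take each frame's spectrum.
    Rows are time slices, columns are frequency bins."""
    return [dft_mag(signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, frame_len)]

rate = 8000
sig = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(640)]
spec = spectrogram(sig)
# A steady 1000 Hz tone lights up one column of the grid:
# bin = 1000 * 64 / 8000 = 8 in every time slice.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

Once audio is laid out this way, "recognizing" an instrument becomes a pattern-matching problem on the grid, which is why a technique built for picking faces out of video frames could carry over to sound.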

Clarke, Kissel, and other upmix engineers were reaching the limits of what they could do manually. Enter AI and deep learning. "Deep learning" is accomplished by feeding audio sample after audio sample into software in order to create an algorithm of a certain instrument, voice, noise, or other sound. Deep learning AI is the basis of new software that can more accurately isolate sounds to create an upmix. It's also used to create deep fakes of voices, like this deep fake of President Nixon's address to the nation about Apollo 11's demise. Clarke did a proof-of-concept project where, after 200 hours, he isolated George Harrison's guitar in "She Loves You." For the deep learning part, he hunted down other recordings of Harrison's Gretsch Chet Atkins Country Gentleman guitar.
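What does a deep-learning separator actually compute? A common approach is a mask over the spectrogram: a 0-to-1 weight per time-frequency cell saying how much of that cell belongs to the target instrument. The toy sketch below computes the "oracle" mask from sources we already know, which is the kind of training target a network learns to predict when it's fed sample after sample; the guitar/crowd numbers are made up for illustration, and a real system works on full spectrograms, not four bins.

```python
def soft_mask(target_mag, other_mag, eps=1e-12):
    """Ideal ratio mask: the fraction of each time-frequency cell's
    energy that belongs to the target source. A separation network
    is trained to predict something like this from the mixture alone."""
    return [t / (t + o + eps) for t, o in zip(target_mag, other_mag)]

def apply_mask(mix_mag, mask):
    """Scale each mixture cell by the mask to estimate the target."""
    return [m * w for m, w in zip(mix_mag, mask)]

# Toy magnitudes in four frequency bins: the guitar lives in the low
# bins, the screaming crowd in the high ones.
guitar = [0.9, 0.7, 0.1, 0.0]
crowd  = [0.1, 0.2, 0.8, 0.9]
mix    = [g + c for g, c in zip(guitar, crowd)]

mask = soft_mask(guitar, crowd)
estimate = apply_mask(mix, mask)
# estimate ≈ guitar: guitar-dominated cells pass, crowd cells are ducked.
```

The hard part, of course, is that at separation time only the mixture exists; the 200 hours and the hunt for other Gretsch recordings went into teaching the software to predict a mask like this without ever seeing the isolated guitar.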

In just the last few years, some incredible new software and services have emerged, like the open source software Spleeter. Now anyone can do mashups, demixes, and upmixes of their favorite songs. For professionals, a new service called AudioShake allows producers and artists to upload their music and automatically create stems for media licensing. Although mono recordings with tightly-packed instruments in the same frequency range are still nearly impossible to demix, the solution is probably just around the corner.

Other possible uses for this demix-upmix technology could dramatically affect location audio. I already use a similar "dialog isolate" tool in my iZotope RX software that is amazing. Another application would be to further reduce background noise in cell phone calls and Zoom/Skype meetings. Yet another use could enhance instrument emulation. Want your guitar to sound just like George Harrison's Gretsch Chet Atkins Country Gentleman in Abbey Road's Studio Two? Poof, just apply the sound print to your $50 Walmart special. And still another use for demix software could be applying old school mixing techniques to new recordings that emulate a specific era or music group. This could be very helpful for biopics, documentaries, and tribute bands. The possibilities are endless for deep learning AI; I just hope we use it responsibly. (Although a deep fake of Richard Nixon singing "She Loves You" while standing on the moon with Neil and Buzz would be epic.)