Recently, Adobe released a free AI-powered audio processing tool that can enhance some poor-quality voice recordings by removing background noise and making the voice sound stronger. When it works, the result sounds like a recording made in a professional sound booth with a high-quality microphone.
The new tool, called Enhance Speech, originated as part of an AI research project called Project Shasta, which Adobe recently rebranded as Adobe Podcast.
Using Enhance Speech is free, but it requires creating an Adobe account and works best with a desktop web browser. Once registered, users can upload an MP3 or WAV file up to one hour long or 1GB in size. After several minutes, they can listen to the result in the browser or download the cleaned-up audio.
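For anyone preparing a batch of recordings, the upload limits can be checked locally before a file is sent. The sketch below is our own illustration, not an Adobe tool: the name check_upload is hypothetical, it assumes the one-hour and 1GB limits both apply, and it reads "1GB" as 2^30 bytes. It only measures duration for WAV files, because Python's standard library cannot decode MP3.

```python
import os
import wave

MAX_BYTES = 1024 ** 3        # assumption: "1GB" interpreted as 2^30 bytes
MAX_SECONDS = 60 * 60        # one hour, per the limit described above
ALLOWED_EXTENSIONS = {".mp3", ".wav"}


def check_upload(path):
    """Return a list of problems; an empty list means no limit was exceeded."""
    problems = []

    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")

    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file is larger than 1GB")

    # Duration is only checked for WAV; MP3 would need a third-party decoder.
    if ext == ".wav":
        try:
            with wave.open(path, "rb") as wav_file:
                seconds = wav_file.getnframes() / wav_file.getframerate()
        except wave.Error as err:
            problems.append(f"could not read WAV header: {err}")
        else:
            if seconds > MAX_SECONDS:
                problems.append("audio is longer than one hour")

    return problems
```

Calling check_upload("interview.wav") returns an empty list when the file passes every check, and a list of plain-language problems otherwise.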
In our tests, Enhance Speech worked best with audio that contained a voice without crosstalk or excessive noise. For example, we used an iMac’s built-in microphone to record a person standing 10 feet away with fan noise nearby, and the processed audio sounded like it had been recorded up close in a noise-free studio with a professional microphone.
How does it work? Adobe did not provide any details, but we suspect that the company trained a deep learning model on many (possibly thousands of) hours of clean and noisy audio. The model could then “learn” to pick out the human voice frequencies and synthesize a facsimile that accurately matches the source. This is speculation until Adobe provides more technical details, and we have reached out to the company for comment.
Supporting that theory, some Hacker News commenters have reported hallucinated results (unexpected output, like phantom voices, where the AI misinterprets the input audio) from extremely noisy audio, such as speech recorded beside a waterfall, or from non-English sources. That suggests Enhance Speech is doing more than conventional noise reduction.
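To see why phantom voices point away from conventional processing, consider a simple noise gate, one of the most basic noise reduction techniques. The sketch below is our own illustration and not Adobe's method; the function name noise_gate and its default parameters are hypothetical choices. A gate can only turn existing samples down, so it has no way to add sounds that were not in the recording.

```python
import math


def noise_gate(samples, frame_len=256, threshold=0.05, floor_gain=0.1):
    """Attenuate frames whose RMS level falls below `threshold`.

    samples: sequence of floats, nominally in the range [-1.0, 1.0].
    Quiet frames are scaled by `floor_gain`; louder frames pass unchanged.
    """
    out = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        # Root-mean-square level of this frame.
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        gain = 1.0 if rms >= threshold else floor_gain
        out.extend(x * gain for x in frame)
    return out
```

Because every output sample is an input sample multiplied by a gain of at most 1.0, the result never contains anything louder than the original. A system that produces voices absent from the source must be generating audio, not just filtering it.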
Enhance Speech isn’t the first tool to provide this kind of AI-powered noise reduction capability. An open source package called mayavoz and a commercial service called Audo Studio do something similar, for example.
It’s worth noting that Enhance Speech is part of a larger group of AI-powered podcasting tools from Adobe, including a Mic Check tool (currently available for free as well) and a transcript-based audio editing tool that is still undergoing an invitation-only beta test.