In an effort to shield our conversations from snoopers, Columbia University researchers have developed a method called Neural Voice Camouflage that disrupts automatic speech recognition systems in real time without inconveniencing the people talking. “With the invasion of [smart voice-activated devices] into our lives, the idea of privacy starts to evaporate as these listening devices are always on and monitoring what’s being said,” Charles Everette, Director of Cyber Advocacy, Deep Instinct, told Lifewire via email. “This research is a direct response to the need to hide or camouflage an individual’s voice and conversations from these electronic eavesdroppers, known or unknown in an area.”
Talking Over
The researchers have developed a system that generates whisper-quiet sounds that can be played in any room to block rogue microphones from spying on your conversations. The way this technology counters eavesdropping reminds Everette of noise-canceling headphones. But instead of generating quiet sounds to cancel out background noise, the system broadcasts background sounds that disrupt the artificial intelligence (AI) algorithms that interpret sound waves into intelligible speech.

Mechanisms to camouflage a person’s voice aren’t unique, but what sets Neural Voice Camouflage apart from other methods is that it works in real time on streaming audio. “To operate on live speech, our approach must predict [the correct scrambling audio] into the future so that they may be played in real-time,” the researchers note in their paper. Currently, the method works for the majority of the English language.

Hans Hansen, CEO of Brand3D, told Lifewire that the research is significant because it attacks a major weakness in today’s AI systems. In an email conversation, Hansen explained that current deep learning systems in general, and natural speech recognition in particular, are built by processing millions of speech recordings collected from thousands of speakers. In contrast, Neural Voice Camouflage works after conditioning itself on just two seconds of input speech.
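To make that real-time constraint concrete, here is a minimal Python sketch. It is not the researchers’ model: the `predict_perturbation` function is a hypothetical stand-in that just spectrally shapes random noise. What the sketch does show accurately is the data flow the paper describes, namely that each burst of camouflage must be computed from a sliding two-second window of past speech and played over audio the system has not yet heard.

```python
import numpy as np

SAMPLE_RATE = 16_000      # 16 kHz, a common rate for speech models (assumption)
CONTEXT_SECONDS = 2.0     # the paper conditions on about two seconds of speech
CHUNK = 1_024             # samples per streaming chunk (~64 ms at 16 kHz)

def predict_perturbation(context: np.ndarray, chunk_len: int) -> np.ndarray:
    """Hypothetical stand-in for the learned predictive model.

    The actual system trains a network to output a perturbation that,
    played over the *future* audio chunk, degrades ASR transcription.
    Here we merely shape random noise with the spectral envelope of the
    recent context, purely to illustrate the timing and data flow.
    """
    spectrum = np.abs(np.fft.rfft(context))
    envelope = spectrum / (spectrum.max() + 1e-9)
    # Downsample the envelope to the number of frequency bins in one chunk.
    n_bins = chunk_len // 2 + 1
    env = np.interp(np.linspace(0.0, 1.0, n_bins),
                    np.linspace(0.0, 1.0, envelope.size), envelope)
    # Random phases give noise-like audio with that spectral shape.
    phases = np.exp(2j * np.pi * np.random.rand(n_bins))
    noise = np.fft.irfft(env * phases, n=chunk_len)
    return 0.01 * noise / (np.abs(noise).max() + 1e-9)  # keep it whisper-quiet

def stream_camouflage(mic_chunks):
    """Consume live microphone chunks; yield the perturbation to play NEXT.

    The key real-time constraint: each yielded perturbation covers audio
    the system has not heard yet, so it must be predicted ahead of time
    from a sliding two-second window of past speech.
    """
    context = np.zeros(int(SAMPLE_RATE * CONTEXT_SECONDS))
    for chunk in mic_chunks:
        context = np.concatenate([context[len(chunk):], chunk])  # slide window
        yield predict_perturbation(context, len(chunk))

if __name__ == "__main__":
    # Random audio stands in for a live microphone feed.
    fake_mic = (np.random.randn(CHUNK) for _ in range(5))
    for i, p in enumerate(stream_camouflage(fake_mic)):
        print(f"chunk {i}: perturbation RMS = {np.sqrt(np.mean(p ** 2)):.4f}")
```

Because the perturbation must be ready before the next chunk of speech arrives, the predictor can only ever see past audio, which is why conditioning on a short window of just two seconds matters so much for a streaming system.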
Wrong Tree?
Brian Chappell, chief security strategist at BeyondTrust, believes the research is most beneficial to business users who fear compromised devices in their midst could be listening for keywords that indicate valuable information is being spoken.

“Where this technology would potentially be more interesting is in a more authoritarian surveillance state where AI video and voice print analysis is used against citizens,” James Maude, BeyondTrust’s Lead Cyber Security Researcher, told Lifewire over email. Maude suggested that a better alternative would be to implement privacy controls on how data is captured, stored, and used by these devices.

Moreover, Chappell believes the usefulness of the researchers’ method is limited since it isn’t designed to stop human eavesdropping. “For the home, bear in mind that, at least in theory, using such a tool will cause Siri, Alexa, Google Home, and any other system that’s activated with a spoken trigger word to ignore you,” said Chappell.

Still, experts believe that with the increasing inclusion of AI/ML-specific technology in our smart devices, this technology could well end up inside our phones in the near future.

Maude is concerned because AI technologies can quickly learn to differentiate between noise and real audio. He thinks that while the system might be successful initially, it could quickly turn into a cat-and-mouse game as listening devices learn to filter out the jamming noises.

More worryingly, Maude pointed out that anyone using it could, in fact, draw attention to themselves, since disrupting voice recognition would appear unusual and might indicate they are trying to hide something. “Personally, if I am concerned about devices listening in, my solution would not be to add another listening device that seeks to generate background noise,” shared Maude. “Especially as it just increases the risk of a device or app being hacked and able to listen to me.”