Pindrop, a company focused on detecting voice fraud, is warning that voice cloning technology is becoming a major threat. Cybercriminals are starting to use AI-powered software to clone people's voices and commit scams. There have been only a handful of cases so far, but losses have already reached as much as $17 million. Attackers use machine learning to clone someone's voice, then combine that clone with social engineering techniques to convince victims to move money where it shouldn't go.
Deepfaking someone's voice takes the scheme to the next level. This is especially true if you are the CEO of a company and have a lot of YouTube content online, because fraudsters are starting to use that material to synthesize your voice. Five minutes of someone's audio is enough to produce a fairly realistic clone. With five hours or more, the result can be a simulation that humans cannot tell apart from the real voice. Simply hearing your CEO's voice on the phone can convince you to follow orders and comply with a large money request, even if the request is not legitimate.
The best-known and first reported example of a voice-cloning scam took place in 2019, when the chief executive of a UK energy firm was tricked into sending $240,000 to a Hungarian supplier after receiving a phone call supposedly from a senior executive of his company's parent firm. The money transferred to the Hungarian bank account was subsequently moved to Mexico and then distributed to other locations. Synthesized audio can also be paired with precise lip-syncing to create fake videos, as the following example shows:
The only good news is that the technology is still in its early stages, though it is unclear how long that will last. Cybercriminals always use the most sophisticated tools at their disposal, and once voice cloning becomes more mainstream, it is unsettling to think where it could lead. However, the same advances can also serve positive uses, such as giving healthcare services a familiar voice to comfort patients, or offering users of voice bots more personalized spoken interactions. Whether the technology will ultimately prove a boon or a bane remains to be seen.