AI Flavours


3 reasons why audio deepfakes may be better than we think


Recently I stumbled upon an article describing multiple situations in which thieves managed to use deepfakes (specifically: voice cloning technology) to steal money from companies. The scenario was similar in each case – thieves identified a company’s executives, recorded their voices and then used them to generate a real-time deepfake call to one of the employees authorised to make the transfer.

And the scariest thing is that this isn’t the only bad use of voice deepfakes. Blackmailing people with their deepfaked voice, or deepfaking politicians to generate controversial thoughts – these crimes are just as serious as the first one.

Even the name itself, deepfake, gives us a vibe of something negative, something not real and potentially unnatural to us. We fear it to the extent that Kaggle’s Deepfake Detection Challenge offered a total prize pool of 1 million dollars.

Does this mean deepfake technology is all bad and we should avoid it? Today I am going to write about 3 cases where deepfakes actually look promising.

 

Giving voice to those that can’t have it

 

ALS, or Amyotrophic Lateral Sclerosis, is a disease of the nervous system that causes a progressive loss of muscle control. Over time it takes away the ability to move, to eat, but also to talk, creating a speech impairment that makes it hard to communicate freely.

Tim Shaw, a former NFL player, was diagnosed with ALS shortly after his 30th birthday. For a sportsman, this changed everything – both his present and his future.

This is when Google and DeepMind came into action through Project Euphonia, which aims to improve speech recognition for people with medical conditions and to recreate their voices. They used existing speech synthesis technologies to regenerate Tim’s voice from past interviews and recordings made before the disease caused major speech impairments. Moreover, they created a tool that maps his current speech into text, opening up the possibility of translating his current voice into his old voice in real time.
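The workflow described above boils down to a two-stage pipeline: recognise the impaired speech as text, then re-synthesise that text in the speaker’s original voice. Here is a minimal sketch of that chaining; the function names and the toy string-based “models” are my own assumptions for illustration, not Project Euphonia’s actual API:

```python
# Illustrative pipeline only: real systems would plug in a personalised
# speech recogniser and a TTS model trained on pre-disease recordings.

def recognize_impaired_speech(audio: bytes) -> str:
    """Stand-in for an ASR model fine-tuned to the speaker's current speech."""
    # Toy stand-in: pretend the audio payload is UTF-8 text.
    return audio.decode("utf-8")

def synthesize_original_voice(text: str) -> bytes:
    """Stand-in for a TTS model trained on the speaker's old recordings."""
    return ("[old-voice] " + text).encode("utf-8")

def translate_voice(audio: bytes) -> bytes:
    """Chain recognition and synthesis, as the article describes."""
    return synthesize_original_voice(recognize_impaired_speech(audio))
```

The interesting part is that the two stages are independent: the recogniser adapts to how the person speaks now, while the synthesiser only ever sees clean text.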

Similar technology is being developed by Rolls-Royce in their AI agent called Quips, which uses the idea of voice-banking: building a database of voice samples recorded before the disease progresses, to be used later in voice generation. Rolls-Royce claims that Quips will not only be able to generate speech, but also generate it with the proper accent and the other small features that make the way a person communicates unique.
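At its core, voice-banking is a data-collection problem: store enough labelled recordings while the voice is healthy so a synthesis model can be trained on them later. A minimal sketch of such a store, assuming hypothetical names (`VoiceBank`, `deposit`) that have nothing to do with Quips’s real internals:

```python
from dataclasses import dataclass, field

@dataclass
class VoiceSample:
    text: str                    # transcript of what was said
    audio_path: str              # where the recording lives
    accent_tag: str = "neutral"  # accent/prosody label for later conditioning

@dataclass
class VoiceBank:
    """Toy voice-banking store: samples deposited while speech is healthy,
    tagged so a synthesis model could later learn accent and quirks too."""
    samples: list = field(default_factory=list)

    def deposit(self, sample: VoiceSample) -> None:
        self.samples.append(sample)

    def by_accent(self, tag: str) -> list:
        """Select training samples carrying a given accent label."""
        return [s for s in self.samples if s.accent_tag == tag]
```

The `accent_tag` field is there to echo the claim above: preserving accent means labelling the data, not just hoarding audio.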

 

Having a voice matching the gender identity

 

Over the last few years, so-called “skins” became a widespread phenomenon in the gaming community. Skins allow people to modify their in-game looks to better represent their personality, or to create a totally new character they identify with more.

Unfortunately, another widespread phenomenon is not similarly inclusive and shows the bad side of the gaming community – online harassment. And one group that reports a higher risk of being harassed online is LGBTQ+ community members.

A company called Modulate aims to change that. They proposed creating voice skins that would allow people from the community to use their preferred voice while playing online. But it’s not only that. Many community members asked whether the technology could be used outside of the gaming environment to fight their dysphoria, or even to increase their privacy and security by shielding behind a modulated voice.

Although the technology still seems to have many drawbacks, it definitely shows a major potential for the future.

 

Coping with death

 

Another promising, yet somewhat dangerous, technology resembles a Black Mirror episode. In fact, it is a Black Mirror episode – a young woman faces the loss of her boyfriend and decides to upload all of his texts, recordings and photos to a company which promises to recreate his whole personality. Although in the episode he is fully “brought back to life”, some of the technologies shown there already exist.

But can speech synthesis alone be enough in this case? After all, it’s also about the personality, the inside jokes and the phrases people used that give you the experience of talking with them.

This is the problem Eugenia Kuyda wished to solve after her friend’s death. She, together with his friends and family, gathered his text messages in order to train a neural network to mimic his behaviour in the form of a chatbot. Although she said the bot resembled rather “a shadow of a person” to her, it was still a proof of concept for the technology, and one that could potentially be combined with speech synthesis.
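Kuyda’s bot used a neural network, but the core idea – learn the statistics of someone’s messages and sample replies from them – can be illustrated with a toy word-level Markov chain (a deliberately simple stand-in, not her actual method):

```python
import random
from collections import defaultdict

def train(messages):
    """Build a word-level bigram model from a list of message strings."""
    model = defaultdict(list)
    for msg in messages:
        words = msg.split()
        for current, following in zip(words, words[1:]):
            model[current].append(following)
    return model

def reply(model, seed, max_words=10, rng=None):
    """Generate a short reply by walking the bigram chain from a seed word."""
    rng = rng or random.Random(0)
    word, out = seed, [seed]
    for _ in range(max_words - 1):
        options = model.get(word)
        if not options:
            break  # dead end: the person never followed this word with anything
        word = rng.choice(options)
        out.append(word)
    return " ".join(out)
```

Even this crude model captures pet phrases and word pairings, which hints at why a real neural model trained the same way can feel like “a shadow of a person”.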

So should we try to revive our loved ones like that? I think that, if unsupervised, it may be really dangerous for people’s mental health. On the other hand, used together with therapy, it could help people find closure after a sudden death or even help them cope with traumas caused by others.

 

Conclusions

 

What are your thoughts on deepfake technology? Do you think it is doing more harm than good, or that it has the potential to solve many of our problems? Whatever stand we take, I like to think we should be aware of both sides of the coin. And learn more about audio in deep learning – but this may be my personal bias. 😉

 

Sources

[1] Healed through A.I., episode 2 of The Age of A.I.

[2] K. Wiggers, DeepMind and Google recreate former NFL linebacker Tim Shaw’s voice using AI

[3] Technology breakthrough offers hope for people silenced by disability

[4] T. Simonite, These Deepfake Voices Can Help Trans Gamers

[5] T. McMullan, Artificial Intelligence Will Keep Our Loved Ones Alive
