How AI is helping birth digital humans that look and sound just like us - MIT Technology Review


AI-powered replicas of real people are taking on the jobs of entertainers, law enforcement and more
Digital twins capture the physical look and expressions of real humans. Increasingly, these replicas are showing up in the entertainment industry and beyond, giving rise to interesting opportunities as well as thorny questions.
Greg Cross, CEO and co-founder of Soul Machines
This episode was produced by Anthony Green with help from Emma Cillekens. It was edited by Jennifer Strong and Mat Honan, mixed by Garret Lang, with original music from Jacob Gorski. It’s hosted by Jennifer Strong.
 [TR ID]
[music and applause swells] 
Jennifer: It’s the closing night of the 2012 Coachella music festival… And Dr. Dre and Snoop Dogg are joined on stage by a surprise guest: Tupac… despite the fact that the hip-hop legend died more than 15 years earlier.
Tupac hologram: Yeah! You know what the **** this is!
[crowd cheers]
Tupac hologram: What up Dre!
Dr. Dre: I’m chilling, what’s up Pac?
Tupac hologram: What the **** is up Coachella?!
[fade out] 
[scoring in]
Jennifer: A holographic-like image of the late rapper appeared alongside the real-life Dr. Dre and Snoop Dogg… bantering with them…and addressing the crowd. 
This illusion took over a year to create, and was accomplished by piecing together audio, physical characteristics and movements from performances recorded before the rapper’s death.
To festival-goers and audiences streaming live, the effect was stunning… and a bit unsettling. 
These days, digital humans are taking on the jobs of entertainers in increasingly nuanced ways.
Miquela: I was programmed to believe that I was a 19-year-old, half-Brazilian, half-Spanish girl named Miquela.
Jennifer: This digital influencer and model is a project that began as a CGI Instagram profile… but has gone on to release music and collaborate with luxury brands such as Calvin Klein and Prada… amassing millions of followers along the way.
Miquela: Don’t worry y’all. I am a robot but I’m not here to, like, hack your Venmo or leak your private browser history.
Jennifer: For next-gen systems… AI is the core creation tool. With it comes interactive, human-like experiences… as well as some familiar, thorny questions about ownership.
WFAA News Anchor: I mean, you got all these real people that want to get into this industry, but y’all choose to sign a virtual person?
Hitmaka: If this would’ve went under the radar, they would’ve been making hundreds of millions of dollars from this. And nobody would’ve said nothing. 
Jennifer: I’m Jennifer Strong and this episode, we explore the real task of building these not real digital humans. 
[TITLES]
OC: …You have reached your destination.
[MUSIC IN] 
Greg Cross: You know, the art of creating digital characters and digital personalities, I mean, that’s been finely honed in the movie industry and what makes us fall in love with these avatars and these CGI characters is that they do express emotion in a very human-like way.
Greg Cross: Hi, I’m Greg Cross. I’m CEO and co-founder of Soul Machines. Soul Machines is an artificial intelligence company. We make avatars and we bring them to life using a completely new paradigm in the world of animation, something we call autonomous animation. So autonomous animation is what we are doing in this conversation. So my brain is animating me. It brings me to life. It chooses my words, the way I express them. And that just happens autonomously. And your brain, at the same time as I’m talking, is animating you. You’re hearing my words, you’re deciding what to think of them, how to feel about them. And so if we think about high-quality CGI or avatar-type animation, it’s all human-acted content. So human actors play the role of the avatars. They get captured by these incredibly specialized cameras. The data gets processed, and then the data is used to bring the avatar to life.
Jennifer: It’s the process used to create Gollum in the Lord of the Rings and it transformed the entire cast of the 2009 blockbuster, Avatar.
But the approach at Soul Machines relies on AI.
Greg Cross: Artificial intelligence has become a big part of the way in which we think about autonomous animation and the way it enables us to make machines more like us. We can interact with them in a more human-like way. So our digital people, our avatars are being rendered in the cloud and literally they’re being broadcast as a video stream from the cloud into the device. So it’s just like a zoom call, except you’re talking to a digital person rather than a real person. 
Jennifer: And it’s becoming popular within the entertainment industry. 
Greg Cross: Celebrities are looking for new ways to engage with their fans. So, social media started this trend where celebrities could create a direct connection with fan bases. This takes it to the next level. 
Jennifer: And he says celebrities are choosing to have digital twins for a whole range of reasons.
Greg Cross: We recently started working with Jack Nicklaus. You know, Jack’s 82 in real life. And for him, this is a legacy project. How does he make his legacy relevant to kids taking up golf for the first time today? So, we’ve announced that we are going to be reimagining Marilyn Monroe for the 21st century, working with the folks at Authentic Brands Group, who own the digital rights to Marilyn. So, you know, this is a project where there’s this huge amount of interest in Marilyn today. So this is another way that we can tell her story. K-pop, uh, Mark Tuan, one of the biggest K-pop stars in the world. Mark is just one of these people who’s incredibly time-poor. You know, he never has enough time to interact with his fans in the way that he wants to be able to interact with his fans. So this becomes a way in which he can do that without him having to always be there.
Jennifer: This might sound familiar to fans of the tv show, Black Mirror… where a popstar, portrayed by real-life singer Miley Cyrus, uses AI to create a digital version of her personality.
Announcer: Now you can be best friends with your favorite pop star! 
Young girl: Ashley, wake up. 
Ashley Too: Hey, there I’m Ashley Too!
Announcer: An all new intelligent companion based on Ashley O’s actual personality.
Jennifer: Soul Machines captures the physical look and expressions of someone they’re digitizing… then, the real work begins. 
Greg Cross: In the case of synthetic voices, we work with partners to recreate these voices, and these voices can be trained based on existing audio content. With, uh, Carmelo Anthony, Carmelo recently released a book. He recorded the audiobook version of the book. So we used the audiobook version to create his synthetic voice. But here’s the thing. We don’t just create a synthetic voice in English, you know, in his natural voice. We can create a synthetic voice in Japanese and Mandarin and Korean. You know, Carmelo can now speak any one of 15 languages in his own natural voice.
Jennifer: And despite being called a digital twin… which in every other industry means an exact copy of something… these digital celebrities aren’t necessarily the same as their human counterparts. For example, celebrities might choose to create a less anxious or more chatty version of themselves for fans.
Greg Cross: One of the things that we’re exploring particularly with Carmelo is, I mean, Jack wants digital Jack to be a representation of who he was at that age. Carmelo actually wants his digital twin to have a different personality so that they can play off each other and they can interact with each other and have fun with each other. One of the things you don’t want to do is you don’t want to connect a celebrity to the internet, because, you know, you do that and you’re going to end up with TikTok videos where the content is not appropriate or not consistent with their brand or their image. So content in the digital realm has to be curated. You know, in the same way that celebrities curate their content in social media, they have to do the same with their digital twins as well.
Jennifer: Though, some companies do hand that curation off to an algorithm.
FN Meka: Big sticks like I’m marching band. (Boom, boom!)
Too deep like clowns in minivan (Clown)
50 said it best. It ain’t many men (many men)
When you steppin’ on me, come get me, man (Grrah)
Jennifer: This is a song partly composed by FN Meka, an AI created by the company Factory New… which describes itself as a record label specializing in virtual artists. 
The system analyzes popular songs from specific genres and generates the building blocks of new songs… such as melody, and tempo… with vocals performed by a real human. 
FN Meka was designed and marketed to represent a kind of digital rapper… His TikTok videos—showing him in designer clothes and luxury cars with an edgy haircut and plenty of attitude—they generated… more than a billion views on the platform. 
In August, it was announced that the digital human had been signed to one of the most powerful music labels in the world: Capitol Records—which retains rights to the works of artists like ABBA and Katy Perry. 
Then… this happened.
Billboard News Anchor: From stepping into the virtual future to back on the proverbial streets, the AI rapper everybody has been talking about has been dropped from his label. 
Jennifer: In addition to his virtual jewels and custom Tesla Cybertruck, FN Meka is depicted as a Black man… something his human creators are not.
The system was soon called into question by the group “Industry Blackout”… an organization representing Black people in the music industry.
WFAA News Anchor: In a statement on Twitter the group said the rapper is “an offensive caricature and a direct insult to the Black community and our culture.”
Jennifer: In the hours following the statement, Capitol Records severed ties with the AI and issued an apology to the Black community. FN Meka’s music was quickly removed from streaming services, and as for his viral TikTok content… it’s pretty much vanished.
And Kyle the Hooligan—the Black rapper whose real voice was used for the system—is taking legal action against the company.
Kyle the Hooligan: Basically, my lawyer’s been reaching out to them and their attorneys, but we haven’t heard back as of yet. Well, at the time… like, I was young, you know what I’m saying? I had no representation. So… and they didn’t really have the money behind it as of yet. So they promised me equity. It basically was like a collaboration. So we could do this together and just like build it up instead of like upfront money and stuff like that. 
Jennifer: But he says that didn’t happen. 
Kyle the Hooligan: So I wanna kind of shed light on that and show that it’s not cool just to use the culture and just ghost people and not compensate them. Cause this, I know it’s… this industry that happens a lot. So that’s basically what I would like to happen. Be compensated and shine light on the situation. 
Hitmaka: I think it’s a disservice to the culture. Like it’s, it’s some of the most disrespectful stuff I’ve seen in a long time.
Jennifer: This is Grammy nominated rapper and producer Hitmaka, in an interview with TMZ. 
Hitmaka: You know, how many layers and contracts and things that had to happen to get to this point. So the legal department, the A and R team, the high level execs, everybody agreed with this. If this would’ve went under the radar, they would’ve been making hundreds of millions of dollars from this. And nobody would’ve said nothing. So it’s just ridiculous, man, for real.
[MUSIC]
Jennifer: You can find links to our reporting in the show notes… and you can support our journalism by going to tech review dot com slash subscribe.
We’ll be back… right after this.
[MIDROLL]
[scoring in]
Jennifer: It’s not just celebrities looking to use digital replicas… 
This technology is being trialed in everything from customer service to law enforcement… 
Mao Lin Liao: So, this project is about a virtual girl that has been used to attract pedophiles online.
Jennifer: This is Mao Lin Liao, speaking at a conference. He’s the CEO of REBLIKA—a designer of digital humans.
Mao Lin Liao: And this whole story was very, uh, important for us because it was helping the world become a safer place. 
Jennifer: The project was codenamed Sweetie. It’s a computer model created to look and move like a real girl. Sweetie was deployed across a number of online chat rooms, where she appeared to be sitting in front of a webcam in the Philippines. In reality, a team of sleuths was operating the system from a warehouse in Amsterdam.
Sweetie AI: My name is Sweetie. I’m 10 years old. I live in the Philippines. Every day I have to sit in front of the webcam and talk to men, just like tens of thousands of other kids. The men ask me to take off my clothes. But what they don’t know: I’m not real. I’m a computer model, made piece by piece, to track down the men who do this.
Jennifer: In just 10 weeks, the team identified a thousand predators from 71 different countries… thanks in no small part to the system’s ability to replicate the subtle, physical nuances that come with talking to a real person.
It’s those same nuances, like a shifting gaze or a returned smile, that underpin the realism of digital humans created by Soul Machines.
Digital Jack: Hi, how are you?
Greg Cross: I’m good thanks, Jack. 
Digital Jack: I’m glad to hear it. I’m Jack Nicklaus, coming to you circa 1978. I’m here to tell you about my life as a professional golfer. Where should we begin?
Greg Cross: So I’ve just muted Jack at this point. So you can see Jack’s on the screen. He’s looking at me. You know, if you look up close, you can see my hand reflection going across.  
Jennifer: That’s CEO Greg Cross giving a demo of golfer Jack Nicklaus’ digital twin… The system uses the computer’s microphone and webcam to seemingly hear and see the real human on the other end. And it has the contextual awareness to know it’s supposed to be chatting as if on a video call.  
Jennifer Strong: When you say his name, he smiles a little bit. 
Greg Cross: Yeah, of course and if I smile at him.
Greg Cross: He’ll smile back. I mean, that’s a natural… that’s the natural reaction. So how this works is, if I smile at you, your eyes are seeing the smile, and your brain, or the limbic system in your brain, which is our emotional connection system that secretes dopamine and serotonin, the happy drugs, sends electrical signals from your brain to your facial muscles, through your nervous system, and you smile back. And that’s the natural human reaction to a smile. And so Jack’s digital brain is actually working in exactly the same way. So if, you know, you get concerned or worried.
Jennifer Strong: I see. 
Greg Cross: He will respond by looking a little bit more empathetic. You know, so these are some of the ways in which we are, you know, using Jack’s digital brain to bring him to life in, in real time. 
Greg Cross: Can you tell us about the 1978 British Open?
Digital Jack: In 1978, the British Open returned to its most traditional location: the Old Course at St. Andrews. I fell in love with St. Andrews on my first visit in 1964, and from there, my attachment only grew. Have you ever been to Scotland?
Greg Cross: Yes, I have.
Digital Jack: It’s truly something. The birthplace of golf and so much more. Needless to say, playing St. Andrews again in 1978 was a dream, but the months prior to the tournament were anything but ideal. Have you ever tried to deal with a problem, only to watch it grow the more you try to fix it?
Greg Cross: No, I don’t have any problems, Jack. 
Digital Jack: Really, you’re pretty darn lucky then.  
Greg Cross: In time, we might end up building a number of different versions of digital Jack. I mean, Jack selected that age. He wanted to be 38. He considered himself to be playing the finest golf of his career at that age, so that’s why he chose it. But we might do, you know, 1962. I mean, Jack was a, you know, a very handsome young golfer with movie star looks. So we might do that version of Jack, or, you know, we might even do a current, you know, um, Jack as he was when he was in his seventies as well. So these are all different ways in which we can develop the concept as we move forward.
Jennifer: And the team has also been exploring how these digital twins can be useful beyond the 2D world of a video conference. 
Greg Cross: I guess the… the big, you know, shift that’s coming right at the moment is the move from the 2D world of the internet into the 3D world of the metaverse. And that’s something we’ve always thought about and we’ve always been preparing for. I mean, Jack exists in full 3D, um, you know, Jack exists as a full body. So today we’re building augmented reality prototypes of Jack walking around on a golf course, and, you know, we can go and ask Jack, how should we play this hole? Um, so these are some of the things that we are starting to imagine in terms of the way in which digital people, the way in which digital celebrities, interact with us as we move into the 3D world.
Jennifer: And he thinks this technology can go a lot further.
Greg Cross: Healthcare and education are two amazing applications of this type of technology. And it’s amazing because we don’t have enough real people to deliver healthcare and education in the real world. So, I mean, you can imagine how you can use a digital workforce to augment and extend, not replace, the skills and capabilities of real people.
Jennifer: This episode was produced by Anthony Green with help from Emma Cillekens. It was edited by me and Mat Honan, mixed by Garret Lang… with original music from Jacob Gorski.   
If you have an idea for a story or something you’d like to hear, please drop a note to podcasts at technology review dot com.
Thanks for listening… I’m Jennifer Strong.
© 2022 MIT Technology Review
