By Supasorn Suwajanakorn, who works on ways to reconstruct, preserve and reanimate anyone — just from their existing photos and videos.
Do you think you’re good at spotting fake videos, where famous people say things they’ve never said in real life? See how they’re made in this astonishing talk and tech demo. Computer scientist Supasorn Suwajanakorn shows how, as a grad student, he used AI and 3D modeling to create photorealistic fake videos of people synced to audio. Learn more about both the ethical implications and the creative possibilities of this tech — and the steps being taken to fight against its misuse.
None of these is actually real. So let me tell you how we got here. My inspiration for this work was a project meant to preserve our last chance for learning about the Holocaust from the survivors. It’s called New Dimensions in Testimony, and it allows you to have interactive conversations with a hologram of a real Holocaust survivor.
SS: Turns out these answers were prerecorded in a studio. Yet the effect is astounding. You feel so connected to his story and to him as a person. I think there’s something special about human interaction that makes it much more profound and personal than what books or lectures or movies could ever teach us.
So I saw this and began to wonder, can we create a model like this for anyone? A model that looks, talks and acts just like them? So I set out to see if this could be done and eventually came up with a new solution that can build a model of a person using nothing but these: existing photos and videos of a person. If you can leverage this kind of passive information, just photos and video that are out there,that’s the key to scaling to anyone.
By the way, here’s Richard Feynman, who in addition to being a Nobel Prize winner in physics was also known as a legendary teacher. Wouldn’t it be great if we could bring him back to give his lectures and inspire millions of kids, perhaps not just in English but in any language? Or if you could ask our grandparents for advice and hear those comforting words even if they’re no longer with us? Or maybe using this tool, book authors, alive or not, could read aloud all of their books for anyone interested.
First, we introduce a new technique that can reconstruct a high-detailed 3D face model from any imagewithout ever 3D-scanning the person. And here’s the same output model from different views. This also works on videos, by running the same algorithm on each video frame and generating a moving 3D model. And here’s the same output model from different angles.
It turns out this problem is very challenging, but the key trick is that we are going to analyze a large photo collection of the person beforehand. For George W. Bush, we can just search on Google, and from that, we are able to build an average model, an iterative, refined model to recover the expressionin fine details, like creases and wrinkles. What’s fascinating about this is that the photo collection can come from your typical photos. It doesn’t really matter what expression you’re making or where you took those photos. What matters is that there are a lot of them. And we are still missing color here, so next, we develop a new blending technique that improves upon a single averaging method and produces sharp facial textures and colors. And this can be done for any expression.
SS: So coming back a little bit, our ultimate goal, rather, is to capture their mannerisms or the unique way each of these people talks and smiles. So to do that, can we actually teach the computer to imitate the way someone talks by only showing it video footage of the person? And what I did exactly was, I let a computer watch 14 hours of pure Barack Obama giving addresses. And here’s what we can produce given only his audio.
SS: I think these results seem very realistic and intriguing, but at the same time frightening, even to me.Our goal was to build an accurate model of a person, not to misrepresent them. But one thing that concerns me is its potential for misuse. People have been thinking about this problem for a long time,since the days when Photoshop first hit the market. As a researcher, I’m also working on countermeasure technology, and I’m part of an ongoing effort at AI Foundation, which uses a combination of machine learning and human moderators to detect fake images and videos, fighting against my own work. And one of the tools we plan to release is called Reality Defender, which is a web-browser plug-in that can flag potentially fake content automatically, right in the browser.
There’s still a long way to go before we can fully model individual people and before we can ensure the safety of this technology. But I’m excited and hopeful, because if we use it right and carefully, this tool can allow any individual’s positive impact on the world to be massively scaled and really help shape our future the way we want it to be.