A new viral video has been making the rounds this week. Perhaps you’ve seen it. Ostensibly a formal address by former president Barack Obama, the video depicts Obama lobbing lighthearted, profanity-laced insults at the likes of Donald Trump and Ben Carson. It is soon revealed, however, that the words are being spoken not by the former president, but by actor and filmmaker Jordan Peele, whose facial movements have been superimposed (quite convincingly) onto Barack Obama’s features. Of course, specialists have long been able to create videos like this using computer-generated imagery (CGI). But CGI has typically been prohibitively expensive for anyone but big-budget film or video game producers. What is notable about the BuzzFeed video is how easily and cheaply it was produced. The message of this PSA for the Internet Age is clear: “This is a dangerous time. Moving forward, we need to be more vigilant with what we trust from the internet.”
About a year ago, Simon Adler and the team at Radiolab released this podcast about the advent of two new technologies with potentially troubling implications. The first is VoCo, an audio-editing program that allows users to transcribe recorded speech to text, and then change the voice recording by editing the text—think “Photoshop for voice.” The second is a facial-reenactment video program produced by the GRAIL Lab at the University of Washington, which, in Adler’s words, is essentially “a form of puppetry where your face is the puppeteer.” These technologies hold much promise for the communications and entertainment industries. Developers point to the value of VoCo for easy dialogue editing in film and television, and Ira Kemelmacher-Shlizerman of the GRAIL Lab hopes their facial puppetry technology will be used to develop virtual reality-augmented communication. Perhaps more ambitiously, Steve Seitz, also of the GRAIL Lab, wonders whether their invention might serve as “a building block to virtually bring someone back from the dead.”
Unfortunately, though perhaps unsurprisingly, the most widespread current use of this technology is a more sordid affair. In a recent article for Rolling Stone, Kristen Dold tells the story of the rise of “deepfakes,” videos in which one person’s face is superimposed onto another’s body. Once the pet project of an obscure group of Reddit users, the practice of pasting celebrity faces onto pornographic videos has given rise to an explosion of deepfakes, pornographic and otherwise. According to Dold,
They're invasive, mostly legal, and a terrifying snapshot of what machine-learning technology can accomplish in the wrong hands. Or, it seems, in almost any hands. Easy-to-use desktop apps have enabled the spread of deepfakes, which can pull images from places like Google search (making celebs an easy target) but also Facebook and Instagram, meaning anyone could be at risk of unwittingly starring in their own adult film.
FakeApp was created by a Reddit user whose stated goal is to democratize the ability to create deepfake videos, and the program uses the same machine-learning principles as other widely available technologies like Snapchat’s Lenses and Apple’s Animoji. Anybody with a little time and motivation can make a deepfake using Adobe After Effects and FakeApp (the two programs used to create the BuzzFeed video). In other words, it seems only a matter of time before one can create a convincing fake video on one’s iPhone in the time it takes to wait for the bus.
Clearly, the stakes are high. Deepfakes could be used to embarrass, incriminate, blackmail, or coerce. Fake videos involving public officials could be used to incite panic (announcing a missile strike, for example), or sway an election. What’s more, law professors Robert Chesney and Danielle Citron describe a “long-term systemic dimension” to the problem:
The spread of [deepfakes] will threaten to erode the trust necessary for democracy to function effectively, for two reasons. First, and most obviously, the marketplace of ideas will be injected with a particularly dangerous form of falsehood. Second, and more subtly, the public may become more willing to disbelieve true but uncomfortable facts. Cognitive biases already encourage resistance to such facts, but awareness of ubiquitous [deepfakes] may enhance that tendency, providing a ready excuse to disregard unwelcome evidence. At a minimum, as fake videos become widespread, the public may have difficulty believing what their eyes (or ears) are telling them—even when the information is quite real.
Franklin Foer, writing in The Atlantic, rightly emphasizes the role of social institutions in sustaining a shared sense of reality and truth. Media, government, and academia, he writes, have long “helped people coalesce around a consensus—rooted in a faith in reason and empiricism—about how to describe the world, albeit a fragile consensus that has been unraveling in recent years.” This unraveling of trust has given rise to a “reality hunger,” a strong desire for the unvarnished and unedited. In this context, video seems to be the last bastion of truth. (Think, for example, of the role of video in the NFL’s response to Ray Rice, or the rise of cell-phone camera footage in documenting police brutality). For Foer, “…in a world where our eyes routinely deceive us…we’re not so far from the collapse of reality” itself. In fact, according to Vox’s Brian Resnick, research suggests “it’s not just our present and future reality that could collapse; it’s also our past. Fake media could manipulate what we remember, effectively altering the past by seeding the population with false memories.” Unsettlingly, Foer argues that the “collapse of reality” is not an unintended consequence, but “an objective—or at least a dalliance—of some of technology’s most storied architects” in the anti-institutional, open-source world of Silicon Valley.
What can be done about deepfakes? Some suggest self-regulation on the part of developers—but if Foer is right, we shouldn’t hold our breath. In a poignant moment, Radiolab’s Simon Adler presses the GRAIL Lab’s Kemelmacher-Shlizerman about the social implications of her technology. Her response (“I’m a technologist”) reveals a troubling moral inarticulacy. Others suggest a legal response (deepfakes are not currently illegal); but fake videos do not clearly constitute a violation of privacy, and deepfake “speech” may even be protected under the First Amendment. Still others believe a technological problem requires a technological fix: proposals include blockchain tracking of digital video data, AI-generated forensic detection, and something called “immutable authentication trails” (IATs). These are useful but limited responses. As the programs that create deepfakes become more sophisticated, detection technology will likely be playing perpetual catch-up. And paying a company to track and authenticate your every movement (as IATs promise to do) would only exacerbate problems we outlined in last week's briefing.
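The core idea behind proposals like blockchain video tracking can be sketched simply: record a cryptographic fingerprint of a clip at capture time, publish it somewhere tamper-resistant, and verify copies against it later. The sketch below uses an ordinary Python dictionary as a stand-in for the ledger; the clip ID and byte strings are hypothetical, and a real system would face much harder problems (trusting the capture device, re-encoding, and so on).

```python
# A minimal sketch of hash-based tamper detection, the idea underlying
# proposals such as blockchain tracking of video data. The "ledger" here
# is just a dict; in practice it would be an append-only public record.
import hashlib

def fingerprint(data: bytes) -> str:
    """Return a SHA-256 digest of the raw video bytes."""
    return hashlib.sha256(data).hexdigest()

# At capture time, the camera (or a trusted service) publishes the digest.
original = b"...raw video bytes as recorded..."
ledger = {"clip_001": fingerprint(original)}

# Later, anyone can check a copy against the published digest.
def is_authentic(clip_id: str, data: bytes) -> bool:
    return ledger.get(clip_id) == fingerprint(data)

print(is_authentic("clip_001", original))                       # True
print(is_authentic("clip_001", b"...doctored video bytes..."))  # False
```

Even a single flipped byte changes the digest entirely, which is why detection is trivial once a trusted fingerprint exists; the hard part, as the IAT debate shows, is establishing that trust without pervasive surveillance.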
With no foolproof solution in view, deepfakes should at least provoke us to take stock of our habits of news consumption. Peele-as-Obama suggests sticking to “trusted news sources,” which seems a good place to start. But when major news outlets are vying for the most coveted resource in a digital economy—attention—the pressure to release videos without extensive verification can cause even the best sources to fall prey to forgery. Farhad Manjoo’s recent New York Times op-ed offers a more radical suggestion: disconnecting from social networks and getting our news the old-fashioned way. After two months of reading print news only, Manjoo reports being blissfully unaware of a host of false and misleading claims about current events. For example, in the hours following the Parkland shooting, Manjoo missed reports (all false, but widely circulated online) that the shooter was a leftist, an anarchist, a member of ISIS, a Syrian refugee, and that he had not acted alone.
This was the surprise blessing of the newspaper. I was getting news a day old, but in the delay between when the news happened and when it showed up on my front door, hundreds of experienced professionals had done the hard work for me. Now I was left with the simple, disconnected and ritualistic experience of reading the news, mostly free from the cognitive load of wondering whether the thing I was reading was possibly a blatant lie.
When information is disconnected from institutional expertise, it becomes unhinged from institutional wisdom. Manjoo is not suggesting that newspapers never make mistakes or print misinformation, but he is pointing to the unique role “slow news” plays in maintaining social trust. A Slow News Movement won’t stop deepfakes—but it may mitigate some of the damage currently being done to our shared sense of reality.