How Close Are We to Getting Virtual Assistants Like Samantha in Her?

"Yes, hi, Samantha, how did you solve the symbol grounding problem?"
Photo: Courtesy of Warner Bros. Pictures/(c) MMXIII Untitled Rick Howard Company LLC

Joaquin Phoenix and Scarlett Johansson may have received top billing in Her. But, to my eye, the film's real star was Samantha — the futuristic operating system with whom protagonist Theodore Twombly falls in love. Samantha is a kind of robot we haven't seen before — a lyrical and expressive virtual assistant, who composes music and digests books by Alan Watts in addition to taking e-mail dictation and conversing in natural language, and whose humanity seems, at times, to transcend the humanity of her human owner. If there's a more alluring vision of technology in recent cinema, I haven't seen it.

After seeing Her last weekend, my first question, naturally, was when virtual assistants like Samantha would actually exist and become available to the masses. So I phoned up two leading experts in artificial intelligence, and asked them when I should expect to have my own fully capable Samantha.

My first call was to Tim Tuttle, the CEO of San Francisco–based start-up Expect Labs. Tuttle, who got his PhD from the MIT artificial-intelligence lab, led the team that created a virtual-assistant program called MindMeld, which has been billed as a "supercharged Siri." He hadn't seen Her yet, but he'd heard enough about it to know that it was several steps removed from what's currently possible, given present-day AI technology.

"I don’t see Scarlett Johansson being encapsulated in computer form any time within the next 30 years," Tuttle told me. "However, I do see an abundance of more mundane and practical intelligent assistants that we'll use every day."

Right now, Tuttle said, machines are good at imitating behaviors that are predictable, and common to large numbers of people. When you start typing into Google, the search engine is able to auto-complete your sentences because millions of other people have previously searched for the same thing. When Amazon recommends new items to us based on our past purchases, it's because thousands of other people have bought the same combination of items.

"The reality is, most of us do the same things as other people 90 percent of the time," Tuttle said. "90 percent of the ideas we have, somebody else has already had them. 90 percent of the clever remarks we make to our friends, someone has already made those remarks."
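The kind of prediction Tuttle describes rests on simple aggregation: if enough people have typed the same thing before, the system only has to count. Here is a toy sketch of frequency-based autocomplete; the query log is invented, and a real search engine aggregates billions of queries with far more sophisticated ranking.

```python
from collections import Counter

# Hypothetical query log; a real search engine aggregates billions of these.
query_log = [
    "weather today",
    "weather tomorrow",
    "weather today",
    "weather radar",
    "weather today",
]

def autocomplete(prefix, log, n=3):
    """Suggest the most frequent past queries that start with `prefix`."""
    matches = Counter(q for q in log if q.startswith(prefix))
    return [query for query, _ in matches.most_common(n)]

print(autocomplete("weather", query_log))
# ['weather today', 'weather tomorrow', 'weather radar']
```

Nothing here models what the user means; the system just assumes you will do what most people did before you, which is exactly the 90 percent Tuttle is talking about.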

The other 10 percent — the unpredictable and original part of human behavior — is the difference between Siri and Samantha. Right now, computers can recognize our words, match them up against a database, and help us get what they think we want. But if we want virtual assistants who can surprise us, who can teach us new things and pick up on nonverbal cues — in other words, if we want robots worth falling in love with — we'll have to wait a few years.

As an example of the kinds of problems scientists will have to solve in the meantime, Tuttle asked me to consider the sentence, "I went to see the Giants play on Sunday." A human hearing me say that phrase would instantly be able to guess whether I was talking about the New York Giants or the San Francisco Giants, based on bits of information such as whether I live in New York or San Francisco, and whether it's football season or baseball season. But a machine has a hard time making that kind of context-based inference.
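The inference a human makes instantly can be caricatured as a couple of if-statements over context features. This is a deliberately crude sketch, not how any real assistant works; the season boundaries are rough assumptions, and a production system would weigh hundreds of signals probabilistically rather than two rules.

```python
from datetime import date

def disambiguate_giants(home_city, when):
    """Toy context-based guess at which 'Giants' the speaker meant.

    Assumes the NFL season runs roughly September-February and the
    MLB season roughly April-October.
    """
    football_season = when.month >= 9 or when.month <= 2
    if home_city == "New York" and football_season:
        return "New York Giants (NFL)"
    if home_city == "San Francisco" and not football_season:
        return "San Francisco Giants (MLB)"
    return "ambiguous"  # a real system would weigh many more signals

print(disambiguate_giants("San Francisco", date(2014, 6, 15)))
# San Francisco Giants (MLB)
```

The hard part, as Tuttle notes, isn't writing the rules for one sentence; it's that human conversation requires this kind of inference for nearly every sentence, over an open-ended set of contexts.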

"There’s a sequence of steps that has to happen on the back end," he said. "If you want the computation and storage capacity needed to reproduce human intelligence, you’d need to have a system that’s capable of storing tens of billions of different pieces of information."

For a second opinion, I called D. Scott Phoenix (no relation to Joaquin), the co-founder of machine-learning company Vicarious. Phoenix's company, which he started in 2010 with Dileep George, has already built algorithms capable of solving CAPTCHAs (the tests websites use to filter out spam bots), and is now working on what's called a "recursive cortical network" — a piece of software that mimics the human brain, and can interpret photos and videos as a human would.

Phoenix enjoyed Her, which he said was "one of the more accurate renditions of AI that I've seen" in a movie. But he thinks that it will be "longer than a decade" before we have Samantha-like virtual assistants.

Right now, Phoenix said, we have computers that can assist us by matching what we say to a stored list of commands. But this isn't the same as understanding language, or being able to pass the Turing Test. "We understand the world and language through our experience in a sensory universe," he said. "So I can tell you, 'I’m thinking heavy thoughts,' or 'I’m thinking light thoughts.' And you know what that means, because you've felt both types before. Siri doesn't."
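The command-matching Phoenix describes can be sketched in a few lines. The command names and phrases below are invented for illustration; the point is that the program returns an action when the words match a stored pattern, and nothing at all when they don't, because no meaning is attached to the words themselves.

```python
# A minimal sketch of command matching: the assistant maps utterances
# to a stored list of actions, with no grounding in sensory experience.
# Phrases and action names here are hypothetical.
COMMANDS = {
    "set an alarm": "alarm.create",
    "send an email": "email.compose",
    "play music": "media.play",
}

def match_command(utterance):
    """Return the stored action whose trigger phrase appears in the utterance."""
    for phrase, action in COMMANDS.items():
        if phrase in utterance.lower():
            return action
    return None  # no stored pattern matched

print(match_command("Please set an alarm for 7"))   # alarm.create
print(match_command("I'm thinking heavy thoughts"))  # None
```

"I'm thinking heavy thoughts" falls straight through the lookup: the system has no experience of weight to connect the word "heavy" to, which is the gap Phoenix is pointing at.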

Computer scientists call this the symbol grounding problem. It's one of the stickier concepts in the field of machine learning, and it's one of the reasons some experts say we're still far from true artificial intelligence. The symbol grounding problem means that, theoretically, you could load a robot's database with every symbol in the known universe — the entire contents of the Internet, all the words in all the books ever printed, all the words ever spoken by humans — and the robot still wouldn't be able to act fully human, because it would have no way of connecting those symbols to objects and concepts in the real world. 

Belgian scientist Luc Steels, one of the world's foremost AI experts, believes that the symbol grounding problem has already been solved, in a certain sense. In a 2008 paper, Steels makes a distinction between "groundable" symbols that refer to real-world objects (like "ball") and symbols that refer to abstract concepts ("serendipity") or symbols whose meaning is cultural (like "holy water"). Steels thinks it's easy to teach robots to interpret groundable symbols, and extremely hard (perhaps impossible) to teach them to interpret the ones that aren't.

That doesn't mean that we'll never have virtual assistants we can fall for. (After all, some humans already form relationships with non-human objects, like dolls.) But it does mean that the assistants we'll have in the near future may not be fully Samantha-esque in their ability to converse naturally, understand complex concepts, and express a full range of human emotions.

Phoenix cautioned that even if scientists did come up with the technical framework needed for fully lifelike artificial intelligence, it probably wouldn't be used to build sultry virtual assistants right away. ("If we actually do that, we’re going to use it to solve cancer, build fusion power, and get us to space," he said.) But he and Tuttle are both optimistic that the day will come, possibly in our lifetimes, when advanced artificial intelligence will be widely commercially available.

"We’ll have intelligent computers like Samantha," Phoenix said. "It’s a question of when."