Algorithms Are Black Boxes — Even to the Tech Companies That Make Them

Photo: Melina Mara/The Washington Post/Getty Images

After ducking behind Facebook and using that company as a not-quite-human shield for most of the year, Google finally had its turn in the hot seat. Yesterday, CEO Sundar Pichai appeared before Congress to answer questions about … everything? Pichai was grilled on potential bias in Google’s search rankings, employee politics, location tracking, YouTube, its possible entry into China, and many other murky topics. As was the case with most of the marquee congressional tech hearings this year, it was meandering and unfocused and didn’t accomplish much. Both sides are to blame: congressmen and -women, inexact and inarticulate in their questioning, indicated that they did not entirely comprehend the issues at hand, and Pichai took advantage of this by being vague or elusive. It was similar to the Zuckerberg hearings, in which, given any opportunity to evade a question by citing a technicality, Zuckerberg took it. “We don’t sell user data” is technically true, even though Facebook obviously heavily monetizes user data.

One thing the hearings have demonstrated over the course of this past year is how closely guarded the oft-cited “algorithms” are. An algorithm, in this tech context, refers to a complex bundle of software powering a service. Google’s search-ranking algorithm, Facebook’s News Feed algorithm, and the like are not simple formulas. They crunch the numbers on dozens, if not hundreds, of signals — things like what you click on, or type, or hover your mouse over, or how long you spend staring at a photo, and who your friends are, and your physical location, and sites you’ve visited, and so on — and make a determination.
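To make “crunch the numbers on signals and make a determination” concrete, here’s a deliberately toy sketch of that idea. None of this comes from Google — the signal names, weights, and formula are invented for illustration; a real ranking system uses hundreds of signals and machine-learned weights, which is exactly why it resists simple explanation.

```python
# Toy illustration of signal-based ranking. Every name here is hypothetical;
# real systems combine hundreds of signals with learned (not hand-picked) weights.

def score(page, query):
    """Combine a few made-up signals into a single ranking score."""
    signals = {
        "relevance": page["keyword_matches"].get(query, 0),  # crude match count
        "freshness": 1.0 / (1 + page["age_days"]),           # newer pages score higher
        "popularity": page["inbound_links"],                 # link count as a proxy
    }
    weights = {"relevance": 3.0, "freshness": 2.0, "popularity": 0.5}
    return sum(weights[name] * value for name, value in signals.items())

def rank(pages, query):
    """Order pages by descending score for the given query."""
    return sorted(pages, key=lambda p: score(p, query), reverse=True)

pages = [
    {"url": "a.example", "keyword_matches": {"idiot": 2}, "age_days": 30,  "inbound_links": 10},
    {"url": "b.example", "keyword_matches": {"idiot": 5}, "age_days": 400, "inbound_links": 3},
]
results = rank(pages, "idiot")  # b.example wins: heavy keyword match outweighs age
```

Even in this three-signal toy, explaining *why* one page beat another means unpacking the interplay of every weight and value; scale that to hundreds of learned signals and the “black box” problem comes into focus.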

Here’s how Pichai put it, explaining why Donald Trump’s picture appears when a user searches “idiot”:

Any time you type in a keyword, as Google we have gone out and crawled and stored copies of billions of pages in our index. And we take the keyword and match it against their pages and rank them based on over 200 signals — things like relevance, freshness, popularity, how other people are using it. And based on that, at any given time, we try to rank and find the best search results for that query. And then we evaluate them with external raters, and they evaluate it to objective guidelines. And that’s how we make sure the process is working.

That’s about as much as we know, because algorithms are black boxes. It’s unclear how, precisely, an algorithm analyzes the data that users feed it. Queries go in and results come out. And the results are good enough that nobody feels compelled to ask how they were produced.

Data-powered companies like Google and Facebook are reluctant, to put it lightly, to explain how these black boxes work, for a number of reasons. First, they view them as trade secrets, like the KFC herbs and spices or the formula for Coke: if Pichai explained the intricacies of Google’s program, someone could copy it. Second, if the intricacies were laid bare, the algorithm would become easier to game.

But here’s another reason why Pichai didn’t explain how Google’s search algorithm works in detail: he can’t. I’m not saying Pichai is stupid; I’m saying that nobody can fully explain how Google gets from Query A to Result B. Consider this excerpt from Eli Pariser’s The Filter Bubble, which was published all the way back in 2011 and has been embedded in my brain ever since:

Even to its engineers, the workings of the algorithm are somewhat mysterious. “If they opened up the mechanics,” says search expert Danny Sullivan, “you still wouldn’t know what to do with them.” The core software engine of Google search is hundreds of thousands of lines of code. According to one Google employee I talked to who had spoken to the search team, “The team tweaks and tunes, they don’t really know what works or why it works, they just know the result.”

Right off the bat, let’s acknowledge that this is secondhand info from almost eight years ago and should be taken with a grain of salt (also, fun fact: Sullivan now works for Google). Okay. But in the intervening years, has Google gotten any better at explaining why its search results are the way they are? Not really. If anything, the problem has grown even larger. A recent study (conducted by the privacy-focused search engine DuckDuckGo) found that even users who were logged out and using a private browsing mode received dozens of variations in results for identical queries.

The Google search algorithm is enormously complex, and even logging every step the program takes for each of the millions of queries Google sees every day would be a huge undertaking. But those same logs would go a long way toward illuminating how the automated software arrives at its results.

Let’s bring back the fast food analogy: Imagine McDonald’s made a machine that took a bunch of random ingredients and turned those ingredients into a burger. Maybe it doesn’t even use all of the ingredients, only the relevant burger ones. After years of tinkering with the machine, McDonald’s figured out how to get the machine to produce a burger customers love. Then, what if people started getting sick from those burgers? You’d probably want to know how the mysterious burger machine worked. Can you imagine if McDonald’s answered that with, “Well, the burger machine takes a bunch of ingredients and processes them and does it differently every time and we dunno precisely how it gets to the end-result burger?” That would be infuriating, and it is currently how algorithm-powered companies are responding to Congress.

The point is: Google is the burger machine. A decent starting point for Congress moving forward might be to ask not why the search engine behaves a certain way, but whether Google can even explain why it does so. That’s a pretty simple yes-or-no question that a CEO should be able to answer, and the answer would be very illuminating. If Google and similar companies like Facebook (or self-driving-car companies like Uber) are deploying automation they cannot fully explain, then they are not acting responsibly in the public interest.
