Unable to Avoid Bias, Gmail Stops Using Gender in Its Automated Replies

Gmail, now 20 percent more woke. Photo: S3studio/Getty Images

Opinions vary about Gmail’s Smart Compose feature, which offers predictive suggestions about what a user will type next. Some worry that it might flatten emails into “Google speak,” removing some of their spark and personality. I’m more in the camp that appreciates the utility: nobody says you have to use the options Google suggests, and in my experience it mainly just speeds up bits of typing I’d be doing by rote anyway.

But Gmail’s Smart Compose does rely on natural language generation, a type of machine learning in which a program learns to write by scanning millions of documents and working out the relationships between certain words and phrases. The phrase “How will this weekend,” for instance, is often followed by “work for you?,” so Smart Compose offers that suggestion as you type.
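To make the idea concrete, here is a minimal sketch of prediction by counting. It is not how Smart Compose is built; Google's system is far more sophisticated, and the tiny corpus, the two-word context, and the names `train` and `suggest` are all assumptions made up for illustration. The principle is the same, though: the suggestion is whatever most often came next in the training text.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; a real system learns from vastly more text.
CORPUS = [
    "how will this weekend work for you",
    "how will this weekend work for you",
    "how will this weekend look for them",
    "does this weekend work for you",
]


def train(sentences, context_size=2):
    """Count which word follows each run of `context_size` words."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(len(words) - context_size):
            context = tuple(words[i:i + context_size])
            counts[context][words[i + context_size]] += 1
    return counts


def suggest(counts, prompt, max_words=3, context_size=2):
    """Greedily extend the prompt with the most frequent next word."""
    words = prompt.lower().split()
    completion = []
    for _ in range(max_words):
        context = tuple(words[-context_size:])
        followers = counts.get(context)
        if not followers:
            break  # nothing in the corpus ever followed this context
        next_word = followers.most_common(1)[0][0]
        completion.append(next_word)
        words.append(next_word)
    return " ".join(completion)


MODEL = train(CORPUS)
print(suggest(MODEL, "How will this weekend"))  # prints: work for you
```

Note that the program has no opinion about weekends. It repeats the majority of what it was fed, which is exactly why skewed training text produces skewed suggestions.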

The problem the Smart Compose team ran into is that it could not stop the feature’s suggestions from using gendered pronouns in ways that reinforce the assumption that people working in certain fields, such as finance or engineering, are men. As Reuters reports:

Gmail product manager Paul Lambert said a company research scientist discovered the problem in January when he typed “I am meeting an investor next week,” and Smart Compose suggested a possible follow-up question: “Do you want to meet him?” instead of “her.”

The team tried various workarounds, but nothing worked. In the end, the only solution was to prevent Smart Compose from suggesting any gendered pronoun (e.g., “he/him/his” or “she/her/hers”) at all. With Smart Compose currently being used in about 11 percent of all sent Gmail messages, Google saw a hard ban as the best option. “The only reliable technique we have is to be conservative,” said Google’s vice president of engineering Prabhakar Raghavan, speaking to Reuters.
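A hard ban of this kind can be sketched in a few lines. This is an assumption-laden illustration, not Google's code: the pronoun list comes from the examples above, while the function names and the whole-word matching rule are hypothetical. Any candidate suggestion containing a listed pronoun is simply dropped.

```python
import re

# Pronouns named in the article; everything else here is hypothetical.
GENDERED_PRONOUNS = {"he", "him", "his", "she", "her", "hers"}

# Split on anything that is not a letter, so "he's" yields "he" and "s".
WORD = re.compile(r"[a-z]+")


def contains_gendered_pronoun(text):
    """True if any whole word in `text` is a listed pronoun."""
    return any(word in GENDERED_PRONOUNS for word in WORD.findall(text.lower()))


def filter_suggestions(candidates):
    """Keep only the candidates that contain no gendered pronoun."""
    return [c for c in candidates if not contains_gendered_pronoun(c)]


candidates = [
    "Do you want to meet him?",
    "Do you want to meet them?",
    "The meeting is there.",
]
print(filter_suggestions(candidates))
```

The filter matches whole words, so “there” and “the” survive even though they contain “he.” It is also blunt by design: it cannot tell a biased guess from a correct one, so it throws out both.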

The problem with Smart Compose is one that crops up with disturbing regularity as companies attempt to offload tasks to machine learning. Amazon had to kill off a secret recruiting tool meant to screen résumés because it had a strong bias against women: the tool had been trained on ten years’ worth of Amazon’s hiring records, and was therefore heavily biased toward suggesting male applicants. A computer program used by U.S. courts to assess the risk of future criminal behavior was found to be biased against black people. Twitter users managed to turn an experimental Microsoft AI chatbot racist in less than 24 hours.

It’s a problem with no easy solution: machines need material to study in order to learn how to roughly emulate human decision-making processes, and that material has to come from humans. But the output of humans, in aggregate, has the tendency to be problematic across nearly every axis you can imagine.

And the guts of machine-learning programs are black boxes even to their creators — engineers can try to control the inputs, carefully study the outputs, and try to train the programs towards certain results, but it can often be hard to explain why a machine-learning tool made a decision. (A problem, again, that humans suffer from as well.)

Feed a machine-learning program the corpus of Kant and ask it why machine-learning programs tend to become such dicks, and it might suggest that out of the crooked timber of humanity, no straight thing was ever made. Train it on TV from the ’80s and it might explain that it learned it by watching you. Train it in the terse language of programmers, and it might simply say: garbage in, garbage out.

It’s good that Google caught that Smart Compose was assuming all bankers are men and took steps to fix it. But as we continue to try to automate our way around tasks both simple and complex, more and more of these built-in biases will crop up, and it’s unrealistic to expect human oversight to catch them all. As machines study us, they keep learning humanity’s worst tendencies. Gmail attempted to automate away the drudgery of typing out the same bits of emails ad nauseam, and it automated decades of gender discrimination in the workplace in the process.
