Psych Researchers Might Have an Amazon Mechanical-Turk Problem on Their Hands


If you’re a psychologist who likes to run experiments, access to subjects for those experiments is a perpetual concern. The more subjects you have in a given experiment, the more oomph that experiment will pack once it’s published; and the more diverse and representative of the broader population your subjects are, the more meaningful your results will seem to other researchers. But large, diverse subject pools cost a lot of money to bring in, which can be a major logistical hurdle.

The first, easiest option for psychologists is college students. If you’re an academic psychologist, after all, students are constantly buzzing around you, like insects, and since many of them are cash-strapped, they are more than willing to sit in a lab for a couple hours in exchange for enough cash to buy their friends a couple rounds at the local campus pub — or, in other cases, for course credit (participation in an experiment or two is part of many psychology courses).

College students aren’t perfect, though: For one thing, while on an individual level they might be willing to work for a bit of pocket change, on many campuses they do tend to be richer and less diverse, in the aggregate, than the broader U.S. population. And they’re certainly younger. So if you identify some interesting psychological mechanism, but your reference point is a group of college students, an easy, valid critique is “Okay, that’s kinda interesting, but are we sure this applies to people who aren’t 20 years old?”

In recent years, an unexpected alternative to college students has emerged: Amazon Mechanical Turk, a service where human workers complete an endless series of tasks for tiny chunks of change. One of the main purposes of MTurk is to have humans help make computers smarter at tasks that have typically come easily to humans but less so to machines (by training them to identify the contents of photos and so forth), but researchers have also found it offers a huge pool of potential participants for certain types of experiments.

While MTurk can’t replace all college students — sometimes, for whatever reason, experiments require in-person interaction — psychologists have enthusiastically assigned countless thousands of Turkers the “job” of acting as subjects in various psych experiments. This approach has exploded: As an article in Science points out, the number of studies using Turk workers jumped from 61 in 2011 to more than 1,200, according to Google Scholar searches.

But there’s some uneasiness in Turk-land. As that article, written by John Bohannon, points out, researchers have realized two things about MTurk: They’ve grown crazily dependent on a private commercial service that wasn’t even originally built for psych research; and they might not be as sure as they thought they were about who the Turk workers (Turkers) are, which could raise questions about the validity of some insights gleaned from Turk-based experiments.

“At this point, MTurk has become so important for social science that the National Science Foundation should be negotiating directly with Amazon,” Todd Gureckis, a New York University psych researcher, told Bohannon. “We’re subsidizing this service with millions of dollars in federal grant money.” But the reality is that Amazon runs Turk as a money-making operation — it takes a cut of the transactions between job assigners and Turk workers — and never claimed to be in it for the science. That can account for some of the ethical problems that have popped up with MTurk experiments: Subjects who drop out don’t get paid for the time they’ve already put in, as they would in a real-world lab, and in some cases it is apparently possible to determine the identities of individual participants in an experiment, which is a major no-no from a research-ethics standpoint.

“And looming over” all of this, writes Bohannon, “are questions about who these anonymous volunteers actually are, and concerns that they are less numerous and diverse than researchers hope.” While researchers who enjoy the flexibility of MTurk, and what appears to be a nicely sized pool of Turkers, have argued that it’s a sufficiently diverse group to make for sturdy experimental results, Bohannon notes that there have been some recent red flags:

Neil Stewart, a psychologist at the University of Warwick in Coventry, U.K., led the first effort to estimate the effective MTurk research population with this method—and the results sent shock waves through the community last year. Seven psychology labs in the United States, Europe, and Australia ran 114,000 experimental sessions over a 3-year period. The number of unique people among the subjects came to only 30,000. Rather than a pool of half-a-million subjects always on tap, Stewart estimated that the true number of Turkers that are willing to take part in an experiment at any one time is only about 7300.

“What seemed like a virtually infinite subject pool was in fact more like a very large state university psychology pool,” Gureckis says. Stewart’s data show that the population churns rapidly: Half the Turker population that participates in research is replaced by fresh people every 7 months.

Those Turkers are also far less diverse than was thought. Though Amazon has long noted the global nature of the community, surveys of those completing experimental tasks reveal that the vast majority are based in the United States. And compared with the average American, Litman says, Turkers “skew young, they are more liberal, more urban, and more likely to be single.” Knowing such traits, he notes, is crucial for researchers as they try to interpret their data.

Young, liberal, likely to be single … that sounds a lot like another group commonly used for experimental research. It would be funny if it turned out that Turkers are mostly just college kids who want to participate in experiments but have trouble rolling out of bed before noon to get to the lab — though I’m not sure psych researchers would appreciate the humor.
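
For readers who want to see what the figures Bohannon quotes imply, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers reported above (114,000 sessions, 30,000 unique people, a 7-month half-life). The function names are made up for illustration, and the assumption that the pool turns over at a constant rate is mine, not something the article states. It does not reproduce the estimation method Stewart's team actually used.

```python
# Figures quoted from Bohannon's article above.
SESSIONS = 114_000            # experimental sessions run by the seven labs
UNIQUE_PARTICIPANTS = 30_000  # distinct people found among those sessions
HALF_LIFE_MONTHS = 7          # time for half the research pool to be replaced


def sessions_per_person(total_sessions, unique_people):
    """Average number of sessions each distinct participant accounts for."""
    return total_sessions / unique_people


def fraction_remaining(months, half_life):
    """Share of the original pool still active after `months`.

    Assumes steady, exponential turnover (an illustrative assumption,
    not a claim made in the source).
    """
    return 0.5 ** (months / half_life)


if __name__ == "__main__":
    print(sessions_per_person(SESSIONS, UNIQUE_PARTICIPANTS))  # 3.8
    print(fraction_remaining(14, HALF_LIFE_MONTHS))            # 0.25
```

In other words, the average participant showed up in nearly four sessions, and under the constant-turnover assumption only about a quarter of today's research-active Turkers would still be around 14 months from now.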