replication crisis

Inside Psychology’s ‘Methodological Terrorism’ Debate

It isn’t every day that an academic researcher publicly compares some of her colleagues to terrorists, so it’s probably no surprise that what happened last month sparked a heated debate. That’s when a draft version of an upcoming column in the Association for Psychological Science’s Observer magazine was published online. Written by Susan Fiske, a highly regarded social psychologist at Princeton, a former head of the APS, and a longtime editor at the journal Proceedings of the National Academy of Sciences, or PNAS, the column decries the current tone of academic debate within the field of psychology. Fiske portrays a landscape in which the long-standing scientific tradition of thoughtful, collaborative critique has given way to a Wild West of anonymous social-media sniping, personal attacks, and all sorts of other unsavory, uncivil tactics.

Here are the first two paragraphs:

Our field has always encouraged – required, really – peer critiques. But the new media (e.g., blogs, twitter, Facebook posts) are encouraging uncurated, unfiltered trash-talk. In the most extreme examples, online vigilantes are attacking individuals, their research programs, and their careers. Self-appointed data police are volunteering critiques of such personal ferocity and relentless frequency that they resemble a denial-of-service attack that crashes a website by sheer volume of traffic.

Only what’s crashing are people. These unmoderated attacks create collateral damage to targets’ careers and well being, with no accountability for the bullies. Our colleagues at all career stages are leaving the field because of the sheer adversarial viciousness. I have heard from graduate students opting out of academia, assistant professors afraid to come up for tenure, mid-career people wondering how to protect their labs, and senior faculty retiring early, all because of methodological terrorism. I am not naming names because ad hominem smear tactics are already damaging our field. Instead, I am describing a dangerous minority trend that has an outsized impact and a chilling effect on scientific discourse. I am not a primary target, but my goal is to give voice to others too sensible to object publicly.

Fiske concludes the column, “Psychological science has achieved much through collaboration, but also through responding to constructive adversaries who make their critiques respectfully. The key word here is constructive.”

As soon as the draft went up, the column elicited a fair amount of anger and sarcasm — particularly at Fiske’s use of the phrase “methodological terrorism” to describe the activities of social-media critics (much of this anger and sarcasm was expressed on social media, of course). For a lot of social-science researchers, journalists, and interested bystanders, it became a grab-the-popcorn moment.

But it’s also a lot more than that. If you examine Fiske’s argument and the rebuttals to it closely — and talk to the researchers who have the most at stake when it comes to the question of how social science goes about correcting errors — there’s a strong case to be made that the particular brand of civility touted by Fiske could be bad for psychology, that it will only solidify practices which haven’t done a very good job keeping sloppy findings out of the literature, leading to what is currently a growing crisis of confidence in the field’s ability to publish sturdy results. There’s a strong case to be made that the only real way forward is for psychology to open itself up yet further to informal, online forms of criticism and debate — even if that means that things get unruly, uncomfortable, and sometimes rude.

***

Part of the problem is that Fiske doesn’t offer any specifics other than that unidentified researchers are being harmed by unidentified parties, whose vicious activities are stifling their victims’ research efforts, if not chasing them out of the field entirely. This vagueness led some skeptical observers to argue that Fiske was overreacting, in an opportunistic way, to criticisms recently launched at certain researchers in her own circle.

After all, psychology is in the midst of a replication crisis — lots of findings that have, for a long time, been largely accepted as true seem to be teetering in the wake of newer, often more methodologically sophisticated experiments. Fiske mentored Amy Cuddy, for instance, the “power-posing” guru whose work has recently come under heavy fire, partly as a result of failed replications. Other popular ideas in the field, like grit and ego depletion, have faced serious replication issues as well. And as an editor at PNAS (one of many editors, it should be pointed out), Fiske has edited a bunch of published papers which garnered media headlines for their splashy findings, but which later faced scrutiny after statistically inclined researchers claimed that those findings were based on quantitatively iffy work.

“Scrutiny” can mean a lot of things these days. In the most extreme cases, shoddy papers are corrected or retracted altogether through established processes of editorial comment and critique. In 2016, though, if you’re uninterested in a potentially months-long back-and-forth with a paper’s authors and editors debating whether and to what extent a key statistic was miscalculated and whether this error warrants a correction, you can take your qualms to a blog or Twitter instead, where the rest of the academic community can judge for themselves whether those qualms have merit, and can offer suggestions for further lines of inquiry. And perhaps no one has used this parallel, informal critique process more prolifically, and to greater effect, than Columbia University statistician and political scientist Andrew Gelman. Fiske doesn’t mention him by name in her column (or anyone else, for that matter), but he’s a big part of this story.

Gelman is a highly respected numbers guy whose peppery blog exists in large part to call out what he sees as an endless torrent of statistically shoddy science being churned out by what are, in his view, often low-quality academic journals — even some of the big-name ones. If you work anywhere in the vicinity of quantitative social science, whether as a major macher or a lowly grad student, you likely know who Gelman is and check his blog, at least once in a while. It helps drive the conversation scientists are constantly having among themselves. Frequently, methodological disputes are hashed out in Gelman’s very active comments section, with Gelman himself often chiming in or elevating useful insights into the posts themselves. (Gelman is also a frequent contributor at Slate, where he regularly covers many of the same issues.)

Gelman has been a frequent critic of Proceedings of the National Academy of Sciences — here he is referring to the publication as a “tabloid top journal” in a headline — and of Fiske’s performance as a gatekeeper there, in particular: He has mentioned her a lot by name, according to Google, and usually not in a flattering manner (though some of those hits point to Gelman’s commenters, who don’t seem to be fans, either). It isn’t as though he only focuses on Fiske and PNAS; he regularly calls out what he sees as flashy, statistically unserious clickbait science wherever he sees it. It’s just that PNAS and Fiske seem to come up a lot: Take, for example, Gelman’s blog posts about studies she edited which claimed that people don’t take hurricanes with female names as seriously as ones with male names; that the perception of inequality on airplanes might lead to air-rage incidents; and that when people approach an age with a zero at the end, it makes them more likely to embark on certain big new life journeys. All got media attention, and all elicited complaints and methodological critiques from Gelman. (We probably need a parenthetical’s worth of disclosure here, since Science of Us covered all three studies: Melissa Dahl wrote a skeptical piece on hurricanes/himmicanes, in which she quoted an email from Gelman, and a piece about the round-numbers thing that generally accepted that finding, but which Gelman complimented in one of his posts for its detail, for what that’s worth; and I edited a generally credulous article about the air-rage study that earned me a gentle rebuke via email from Gelman, who sent along links to his skeptical posts on the subject. “This was a story that people wanted to hear, so they took it at face value,” he wrote. “I hope New York Magazine can do better next time!”)

Unsurprisingly, Gelman didn’t agree with Fiske’s diagnosis that a tone problem is seriously affecting psychological research. First, he published a very short post with the headline “Methodological terrorism.” “Methodological terrorism is when you publish a paper in a peer-reviewed journal, its claim is supported by a statistically significant t statistic of 5.03, and someone looks at your numbers, figures out that the correct value is 1.8, and then posts that correction on social media,” he writes in that post. “Terrorism is when somebody blows shit up and tries to kill you.” Accompanying the post is a photo of a fragment of the dumpster which was recently blown up in Chelsea.

The very next day, Gelman returned to the issue and went all in. That post is headlined “What has happened down here is the winds have changed,” which is the first line of Randy Newman’s song about the devastatingly biblical Mississippi River flood of 1927, “Louisiana 1927.”

Over the course of more than 5,000 words, with each subhed a subsequent line from the Newman song — “What has happened down here is the winds have changed/Clouds roll in from the north and it started to rain/Rained real hard and it rained for a real long time” — Gelman lays out the recent history of the replication crisis and how it has manifested itself in a variety of fields, not just psychology.

In Gelman’s view, it’s only been very recently that researchers, himself included, have even begun to fully grasp the dire problems with how statistically based science has often been conducted and disseminated to the public. Decades ago, some researchers had identified the potential for certain common statistical practices to yield false-positive results and muck up scientific findings in other ways, but it took far longer for this lesson to fully set in. “As of five years ago—2011—the replication crisis was barely a cloud on the horizon,” Gelman writes. At that time, “there’s a sense that something’s wrong, but it’s not so clear to people how wrong things are, and observers (myself included) remain unaware of the ubiquity, indeed the obviousness, of fatal multiple comparisons problems in so much published research.”
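To make the multiple-comparisons point concrete, here is a minimal, illustrative simulation. It is not drawn from Gelman’s post or Fiske’s column; the sample sizes and the choice of Python are my own assumptions. It shows how testing many hypotheses on pure noise makes at least one “statistically significant” result far more likely than the nominal 5 percent.

```python
# Illustrative sketch only (not from the article): how multiple comparisons
# inflate false positives. Each simulated study tests 20 hypotheses on pure
# noise; we count how often at least one comes out "significant" at p < .05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_studies = 2000       # simulated studies
n_comparisons = 20     # hypotheses tested per study (arbitrary)
n_per_group = 30       # participants per group (arbitrary)

studies_with_false_positive = 0
for _ in range(n_studies):
    significant = False
    for _ in range(n_comparisons):
        # Both groups are drawn from the same distribution, so any
        # "effect" detected here is noise by construction.
        a = rng.normal(size=n_per_group)
        b = rng.normal(size=n_per_group)
        _, p_value = stats.ttest_ind(a, b)
        if p_value < 0.05:
            significant = True
    if significant:
        studies_with_false_positive += 1

print(f"Share of studies with at least one p < .05: "
      f"{studies_with_false_positive / n_studies:.0%}")
# Prints roughly 64% (about 1 - 0.95**20), versus the nominal 5%.
```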

Since then, Gelman argues, attempts to fix social science’s statistical issues have finally achieved full liftoff, thanks to a number of factors: a series of high-profile retractions and instances of fraud; a larger and larger cohort of researchers dedicated to enacting more robust norms of transparency and data-sharing; and the introduction of sophisticated new statistical tools to root out fishy results.

It all happened so quickly, Gelman argues, that old-school researchers might be feeling a bit like, well, they’ve just been hit by some sort of raging flood. “If you’d been deeply invested in the old system, it must be pretty upsetting to think about change,” Gelman writes. “Fiske is in the position of someone who owns stock in a failing enterprise, so no wonder she wants to talk it up.” He continues: “[S]he’s living in 2016 but she’s stuck in 2006-era thinking. Back 10 years ago, maybe I would’ve fallen for the himmicanes and air rage papers too. I’d like to think not, but who knows? Following [Uri Simonsohn, a noted reformer of social science’s methodological practices] and others, I’ve become much more skeptical about published research than I used to be.”

To Gelman, then, this is all about adapting to the times, to the realization that there’s a lot of bad practice in science, that open online collaboration and criticism can play a role in fixing things, and that the “official” gatekeepers have, in many respects, failed. So who are those gatekeepers to then complain when Facebook posts — or, say, blog posts by Andrew Gelman — are a bit heated or angry? Who cares, as long as those critiques are correct? That’s the argument, at least.

***

Another noteworthy response to Fiske’s column was written by Tal Yarkoni, an assistant professor of psychology at the University of Texas at Austin. In a lengthy blog post entitled “There is no ‘tone’ problem in psychology,” he strongly critiques Fiske’s premise that psychological researchers should be particularly worried about civility issues.

Yarkoni argues that to cower at the “methodological terrorists” lobbing their mean tweets and Facebook posts at the authors of published findings is to misunderstand what science is. “We can perhaps distinguish between two ways of thinking about what it means to do science,” he writes. First, he lays out what he calls a negotiation model of science in which “when two people disagree over some substantive scientific issue, what they’re doing is trying to find a compromise position that’s palatable to both parties.” Think of a bustling bazaar where haggling is encouraged: The ultimate goal of the place is to foster mutually satisfactory exchanges, so there’s good reason for everyone to be as polite as possible.

But science isn’t really like that, Yarkoni argues. If a finding is statistically faulty, it’s statistically faulty. And if criticism of a given error makes the author of that error unhappy or uncomfortable, or they think the criticism is unfair or stated too bluntly, or they are convinced that the critic is really only going after them because of a spat they had at a conference five years ago — well, tough. If, in fact, the original author truly erred, that error needs to be corrected. That’s what science is.

That’s why Yarkoni rejects the idea of a negotiation model of science, opting for a different and more colorful idea:

A better way to think about science is in terms of what we might call, with great nuance and sophistication, the “engine-on-fire” model. This model can be understood as follows. Suppose you get hungry while driving a long distance, and pull into a convenience store to buy some snacks. Just as you’re opening the door to the store, some guy yells out behind you, “hey, asshole, your engine’s on fire!” He then continues to stand around and berate you while you call for emergency services and frantically run around looking for a fire extinguisher–all without ever lifting a finger to help you.

Two points about this story should be obvious. First, the guy who alerted you to your burning engine is very likely a raging asshole. And second, the fact that he’s a raging asshole doesn’t absolve you in any way from taking steps to put out your flaming engine. It may absolve you from saying thank you to him after the fact, but his unpleasant demeanor unfortunately doesn’t mean you can just choose to look the other way out of spite, and calmly head inside to buy your teriyaki beef jerky as the flames outside engulf your vehicle.

In the Yarkonian view, then, the field of psychology is, at the moment, a giant parking lot where cars (findings) keep spontaneously combusting, and where passersby (those who replicate and critique those findings) keep notifying the owners of the cars (the authors of the findings) that their cars are on fire. Some of the bystanders are rude — they scream “Hey, asshole — your car’s on fire!” — while others practice a deeply ingrained sense of decorum, gently tapping the owners of the smoldering cars on the shoulders and saying things like, “My good sir, I’m frightfully sorry to alarm you, but I do believe your automobile has caught fire.” To Yarkoni, the key takeaway here is Holy shit, why are all these cars blowing up? And in his view, Fiske is more concerned with Why are people being so rude when they point out these flaming cars?

“Much of her op-ed, and subsequent argument in interviews, seems driven by a prioritization of the people doing science over the science itself,” Yarkoni told me in an email. “I’m not saying that that’s wrong, necessarily, but I think it would be good for her to recognize and ideally admit that her argument is, at bottom, about values, not reasoned argument: she’s essentially placing a higher value on protecting one subset of people currently doing science than on doing everything we can to get the science right. That’s a perfectly consistent position to hold, but I don’t think she recognizes that a large (and growing) subset of psychologists don’t share her value system, and so they’re not really interested in, for example, expressing criticism privately rather than publicly, because that’s not nearly as scientifically useful — even if it’s more likely to hurt feelings.”

That question of which “subset of people” in psychology might benefit from Fiske’s call for a renewed focus on civility is absolutely vital to understanding this debate, and it animated many of the skeptical responses to Fiske’s column. Social science has never been a particularly remunerative field, of course, but it can be safely said that in recent years it has gotten more cutthroat and more winner-take-all: Opportunities for research funding and for tenured positions are contracting, partly because of a relentless series of state budget cuts affecting big state universities and partly for other reasons.

Meanwhile, when success does strike it’s increasingly visible, partly because, despite the field’s problems, the masses seem to be hungrier than ever before for social-science-tinged findings: If you’re Amy Cuddy and you study power posing, or you’re Angela Duckworth and you study grit, or you’re one of the other handful of researchers whose work has been embraced by a mainstream audience, the spoils are considerable. There are a host of outlets eager to write up your findings; there are book deals from general-audience, or trade, publishers (which can often deliver far more money and bigger audiences than academic-press ones); there are speaking gigs at corporate retreats and at airy ideas conferences in pretty resort towns.

It’s much harder, of course, to be a researcher who, because of your esoteric interests or your bad luck with your adviser when you were 24 or whatever else, likely isn’t going to get a hefty book deal anytime soon. It’s much harder to be a researcher who is barely clinging on to a bottom-rung adjunct position, and who then sees that Susan Fiske, who runs a lab at Princeton and who has basically everything you never will, is using column space in the Observer to decry … rude Facebook posts. Members of this group — the have-nots, basically — far outnumber social science’s superstars. They’ve watched as a number of the superstars’ findings have imploded, and now they’re being told they need to practice a bit more decorum with regard to the tone and venue of their critiques. How could resentment not be a factor here?

***

In an interview, Fiske said that her column really is just about a small group of abusive personalities. She didn’t want to get into the broader conflicts currently roiling the field, nor did she want to mention specific names.

When I mentioned Gelman and how I felt like his blog served as an important flashpoint in this conversation, for example, Fiske responded, “I don’t want to talk about him in particular, because I think it’s not about him.” Rather, “I think there are several people who habitually engage in focused attacks. And I use the word attack because they often impugn people’s motives, and, secondly, some of the commentators have such an intense frequency of focusing on one person that it doesn’t seem entirely scientifically justified. It seems more like they’re going after the person than looking for particular things to comment on. I don’t want to speculate on their motives either, but the pattern has been called — not just by me — has been called cyberbullying, and I think cyberbullying is people who are powerful by dint of either outranking the person they’re criticizing, who is a vulnerable person, or by dint of sheer frequency of critique by a focal person and then also by people who pile on to critiquing an individual. Once a point has been made that there might be a statistical error, it doesn’t require everybody piling on and saying, Isn’t it awful?”

Fiske kept going back to the importance of maintaining rules of decorum. “I’m just asking for reasonable discourse — part of it is just, would you say the same thing to the person’s face,” she said. The challenge, of course, is that different people define reasonable in different ways, but Fiske did offer one suggestion: “If there were a neutral third party judging the exchange, would they judge it to be a fair critique?”

She emphasized the role of peer review, even in initial critiques of someone else’s work. “I personally think that editorial and peer review of letters of concern is a good idea,” she said. Were this the norm, a lot of what Gelman does would be considered untoward. To him and a lot of other tech-savvy critics of how science is currently conducted, we’re well past the age of letters of concern. But Fiske thinks it’s very important that the process of critiquing people’s work be a fairly controlled one, because otherwise things get chaotic and unfair. “Just because somebody thinks they have found an error in somebody else’s work, it doesn’t make them above the norm of getting other people’s input as to whether you’re making a reasonable public statement or not,” she said.

So what’s the problem with people posting their critique online and seeing whether other people agree that it holds up? “Because it’s potentially damaging to the authors,” said Fiske, “both to their reputation and to the standing of the finding if the critique is wrong or trivial.” I asked if she was saying that, if one researcher finds a flaw in another’s paper, they shouldn’t post that flaw online until they go through some sort of peer-review process. “Yeah, that would be my preference,” said Fiske. “I mean, I know not everybody agrees with that, but I think that would be a more reasonable route. And I think the authors ought to have some chance to respond in writing as well, preferably posted at the same time as the critique, but certainly they should have an equal chance to respond.”

I was also curious about Fiske’s claim that psychology has a serious problem with powerful people cyberbullying less powerful ones. I’d had in mind a slightly more populist conception of the impact of technology on social science — whether or not you think there’s a rudeness problem, it certainly feels like blogs and social media have leveled the playing field and made it easier and safer for small fish to nibble at bigger ones in often important ways. I brought up Fiske’s former mentee Amy Cuddy as an example: Yes, there’s been a lot of vitriol directed her way, but isn’t this also a case in which newer, less formal structures of online critique helped bring scrutiny to some very famous findings that might not have been ready for prime time? “Well, I don’t really want to talk about Amy, because she wasn’t the reason I wrote it,” said Fiske. “She was one of many reasons. But I’m not sure I would characterize her as powerful — you know, she’s untenured.”

***

Fiske, in part by dint of her proximity to Cuddy and to others who have, in her eyes, borne the brunt of severe, frequently unhinged criticism, does seem to really believe psychology has a pressing tone problem, that things are skidding off the road a bit. But it’s important to think seriously about what would happen if the field enacted her recommendations, because it would entail a serious clawing-back to an earlier time in which the scientific discourse was much more tightly controlled.

The researchers who have the most at stake here are debunkers — those who have discovered errors, fraudulent or otherwise, in other people’s work, and have attempted to get them fixed. Generally, this is not an easy thing to do; social science, and science in general, frequently swats angrily at attempts at correction. But what’s striking is that when debunkers tell stories about how frustrating and bruising it was to get even straightforward errors fixed, they blame exactly the norms Fiske seems intent on defending, such as the presumption that published findings should be accorded a certain gravitas and critiqued only in a very specific, sanctioned manner, and the idea that it’s untoward to bring too much attention to other researchers’ possible errors.

The political scientist David Broockman, for example, was delayed in discovering and publicizing Michael LaCour’s fraud by prevailing social norms, which caused friends and colleagues to tell him to stay quiet, to keep potentially explosive early hints that something was amiss to himself, lest it harm his career since he’d be seen as attacking a senior researcher (Don Green, the now-retracted paper’s co-author) and a blockbuster finding. He even posted some of his statistical findings to PoliSciRumors.com, an anonymous academic message board consisting largely of the sort of vitriol Fiske decries in her column, hoping someone else would pick up and run with the weirdness he’d uncovered, since he was so frightened to do so.

Then there’s Steven Ludeke. While a graduate student at the University of Minnesota, he noticed a couple of very simple published errors involving the psychological construct of “psychoticism,” and in a careful, by-the-book fashion, had his adviser reach out to one of the authors who had erred. This led to a years-long ordeal, the details of which you can read here, in which those authors gave him the runaround, refused to share their data, and tried to slag Ludeke — to others in the field and, once I began reporting on the story, to me — as an obsessed zealot, when the extensive email record Ludeke shared with me showed he’d been polite throughout his attempts to alert the authors to their error and gain access to their data.

In light of all of this, Ludeke doesn’t have much faith in the traditional approach to getting errors rectified. “The amount of time that this project took for me, the amount of time and the amount of stress, makes this clearly a bad choice to have pursued, unequivocally,” he said back when we first spoke over the summer. “If you have a chance to say something to that effect in the article — ‘I do not find this to be a recommendable experience’ — I would like that.” And when I emailed him to ask about Fiske’s column, he told me that, while “[t]here are points worth discussing” on the civility front, “The idea that the playing field is somehow currently balanced against the authors of a given piece in favor of those critiquing a particular piece is entirely inconsistent with my own experience.” He’s also worried about the potential effects of big-name researchers like Fiske criticizing the recent trend toward online openness. “It might be that the defensive responses from senior people have convinced some [young researchers] that there’s no need to update their approach to research,” he said, despite the steady stream of errors highlighted by Gelman and others which hint strongly that social-science researchers must get more methodologically rigorous. “That would be sad, if true.”

Or, finally, take a story I haven’t written about — one involving Marcus Crede, a psychologist at Iowa State who has enthusiastically poked and prodded at what he sees as fragile-seeming findings (he recently reported findings which suggest that the purported connection between grit and performance is overstated, for example). He wasn’t rewarded for his muckraking, either. Starting around 2011 or 2012, Crede said, he uncovered what he views as serious irregularities in the data underpinning the work of Fred Walumbwa, a management professor at Florida International University, eventually leading journals to retract seven of Walumbwa’s papers. As Crede started to investigate and report his findings, contacting the editors who had published Walumbwa’s work and some of Walumbwa’s co-authors, he says he was hit with a wave of backlash from Walumbwa. According to Crede, Walumbwa himself called Iowa State’s provost and dean to accuse Crede of racism (Walumbwa is black), and made the same allegation to one of Crede’s former deans and a journal editor as well. Crede said he also received a “cease and desist letter from Walumbwa’s lawyer threatening me with a defamation lawsuit for talking to journal editors and some of his co-authors.”

So like Ludeke, Crede did not come away inspired by social science’s established practices for correcting errors. “I’d like to have a sense of how people should respond,” Crede said of Fiske’s column. “My only experiences with jumping through the hoops, as she kind of recommends, to submit commentaries and to contact editors, and all these kinds of things, ended up with me being threatened legally and having me being accused of being a racist to my dean and provost.”

In these three cases, Broockman, Ludeke, and Crede had a miserable time taking the official, respectable approach to getting bad findings fixed. In all three cases, part of the problem was the sense that they were on their own, up against researchers who were more powerful or esteemed than they were, highly motivated to sweep their errors or alleged errors under the rug, or both. Do unofficial critiques on blogs and Twitter solve these problems? Of course not. But they do solidly nudge scientific norms and standards in a more open direction, where, to take one example, junior researchers don’t need to feel like they could be making a career-risking move by pointing out an obvious error or a suspicious snippet of code. They destigmatize and demystify the process of saying, “These numbers look weird to me. Guys, do you think these numbers are weird?”

That’s what seems to be missing from Fiske’s call for civility — an acknowledgement of the wider context of the replication crisis and the social and professional norms that can stymie science’s gradual, zigzagging journey toward the truth. For now Fiske is showing little desire to budge from her position. She does, though, seem to have acknowledged that her word choice may have been off; Fiske said that the final version of her column, which should be online November 1, will not have the phrase “methodological terrorism” in it. “I agree that it was distracting people,” she said. “But at the same time, it had elements of truth to it, so that’s why I used it.”
