The Food and Brand Lab at Cornell University publishes a huge amount of research about how people perceive, consume, and think about food. The lab covers subjects ranging from seasonal trends in weight gain to how happy music influences employees, and its director, the marketing and consumer behavior expert Brian Wansink, regularly touts his lab’s research during his frequent media appearances, focusing particularly on the behavioral science underlying people’s consumption habits.
At first glance, the Food and Brand Lab is an exemplar of a wildly successful, publicly facing research institution dedicated to improving the real world through applied behavioral-science findings. Or it felt that way, at least, before Wansink published a strange blog post last month, which led to the subsequent discovery of 150 errors in just four of his lab’s papers, strong signs of major problems in the lab’s other research, and a spate of questions about the quality of the work that goes on there. Wansink, meanwhile, has refused to share data that could help clear the whole thing up.
The blog post that kicked off this episode was published back on November 21 with the headline “The Grad Student Who Never Said ‘No.’” In it, Wansink told the story of a Turkish Ph.D. student who came to work in his lab for free. “When she arrived,” he wrote, “I gave her a data set of a self-funded, failed study which had null results (it was a one month study in an all-you-can-eat Italian restaurant buffet where we had charged some people ½ as much as others). I said, ‘This cost us a lot of time and our own money to collect. There’s got to be something here we can salvage because it’s a cool (rich & unique) data set.’ I had three ideas for potential Plan B, C, & D directions (since Plan A had failed).”
Wansink went on to explain that the Turkish student, revealed later in the post to be Ozge Sigirci, gamely took on the assignment, and compared her favorably to a paid postdoc working in his lab at the time who turned down both that data set and another one he offered them, saying they just didn’t have time to do the work. “Six months after arriving, the Turkish woman had one paper accepted, two papers with revision requests, and two others that were submitted (and were eventually accepted — see below),” Wansink wrote. “In comparison, the post-doc left after a year (and also left academia) with 1/4 as much published (per month) as the Turkish woman. I think the person was also resentful of the Turkish woman.” The lesson here, he explained, was to “Make hay while the sun shines” — if a senior researcher or mentor gives you a research opportunity, take it, even if it means less time to devote to “Facebook, Twitter, Game of Thrones, Starbucks, [or] spinning class.” While “most of us will never remember what we read or posted on Twitter or Facebook yesterday,” Wansink concluded his post, “this Turkish woman’s resume will always have the five papers below.” Wansink then listed five papers with Sigirci’s name on them, all of which had Wansink as a co-author as well.
A few weeks later, news of the blog post began circulating among a group of researchers online, and some of them were disturbed by it. “Brian - Is this a tongue-in-cheek satire of the academic process or are you serious?” asked the first commenter in mid-December. “I hope it’s the former.” The critics were concerned on two fronts. First, some viewed the labor practices Wansink was describing as at least a little bit exploitative — in a sense, he was rewarding Sigirci, an unpaid young Ph.D. student, for taking on an assignment while seeming to criticize a paid postdoc who couldn’t fit Wansink’s desired data analysis in around other projects.
Second, and more important for our purposes, Wansink was acknowledging, with surprising openness, taking a “failed study which had null results,” slicing and dicing the data until something interesting came out, and then publishing not one but four papers based on said slicing and dicing. Researchers have broad latitude to approach their data however they want, of course — just because an initial hypothesis fails doesn’t mean an entire data set is suddenly radioactive and a researcher shouldn’t use it further. But the sort of fishing expedition Wansink described is very likely to lead to false-positive results that mean a lot less than they appear to. One of the truisms of statistics, after all, is that if you analyze enough data from enough angles, you will discover relationships that are “significant,” in the statistical sense of the term, but that don’t actually mean anything. (There’s an “xkcd” comic strip that totally nails the abuse of the concept of “significance.”)
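That truism is easy to demonstrate. Here’s a minimal sketch — not Wansink’s actual analysis, just an illustration — that exploits the fact that, when the data are pure noise, a p-value from a standard test is uniformly distributed between 0 and 1. Run enough tests at the conventional p < 0.05 threshold and “significant” findings appear on their own:

```python
# Illustration of the "fishing expedition" problem: test enough
# hypotheses on pure noise and some will come out "significant."
# (Sketch only; assumes the conventional p < 0.05 cutoff.)
import random

random.seed(42)  # fixed seed so the demo is reproducible

def fake_p_value():
    # Under the null hypothesis (i.e., with pure-noise data), a p-value
    # from a continuous test statistic is uniform on [0, 1] — which is
    # all this illustration needs.
    return random.random()

tests = 100
false_positives = sum(1 for _ in range(tests) if fake_p_value() < 0.05)
print(f"{false_positives} 'significant' findings out of {tests} tests of pure noise")
```

On average, about five of every hundred such tests will clear the significance bar despite there being nothing real to find — which is exactly why carving one data set into many analyses, and reporting only the hits, is so treacherous.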
And Wansink described these questionable practices at a very sensitive time for psychology. The field’s ongoing replication crisis is littered with instances in which researchers reported cool, media-friendly results that, it was later revealed, were little more than artifacts of data butchery — when other teams subsequently tried to replicate those results in a more rigorous way, they failed. Many of psychology’s most exciting “This One Simple Trick Can X”–style findings have turned out to be little more than statistical noise shaped sloppily into something that, in the right light and if you don’t look too hard, looks meaningful. So to those worried about what they see as a current lack of rigor in psychological research, Wansink’s blog post was like a million red flags shot out of a thousand cannons while deafening klaxons screamed.
Eventually Wansink’s blog post, and the concerns it had generated, came to the attention of Jordan Anaya. Anaya is a computational biologist and independent researcher who created PrePubMed, a search engine for preprints, or draft versions of research that hasn’t yet been peer-reviewed. Anaya has also created tools to help detect statistical anomalies in published research, and a friend of his asked him to apply one of those tools to the Wansink papers. “The entire post was so unbelievable I just wanted to see what the papers looked like, and how carefully they were done,” he explained in an email. “At least in my field, submitting five papers over a period of a couple months [as Wansink did] is basically unheard of, and combined with the description of how the analyses were performed, I suspected the papers would be of very low quality. In addition, I was looking for a good opportunity to take my tool out for a spin, and it’s really easy to do, so I went ahead and ran the papers through my web application[.]” When he found some problems, he notified Nick Brown, a Ph.D. student at University Medical Center Groningen and “Self-appointed data police cadet,” as per his Twitter bio, whose data-sleuthing techniques power Anaya’s software. They teamed up with Tim van der Zee, a researcher at Leiden University in the Netherlands, and looked at four of the five papers mentioned by Wansink that appeared to be based on the same buffet data set (of the four, the most attention-getting one reported that “Men eat more in the company of women” — it got a lot of coverage). Their work culminated in a preprint called “Statistical heartburn: An attempt to digest four pizza publications from the Cornell Food and Brand Lab,” which they published January 25.
The trio reported a shocking number of errors — about 150 “inconsistencies” in the data reported in the four papers. And in many cases, they wrote, the data tables in the papers reported results that, to a statistically sophisticated reader, should have immediately jumped out as impossible on their face. To take an oversimplified example of the types of mistakes Wansink and his co-authors made, imagine if someone measured two people’s heights in inches, rounding to the nearest inch, and then reported that their average height was 71.3. On its face, that’s an impossible result — when you divide a whole number (here, the sum of the two heights) by 2, the decimal part of the result must be either .0 or .5. The four papers were littered with these sorts of errors — situations in which, in light of A and B, C was simply an impossible figure. Van der Zee and his colleagues report, for example, that dozens of flagged values in the data tables of the men-eat-more-around-women paper must be wrong.
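The logic of the heights example can be captured in a few lines of code. This is a sketch in the spirit of that kind of granularity check — it is not the actual tool Anaya and Brown used — testing whether a reported mean could possibly arise from a given number of whole-number measurements:

```python
# A mean computed from n whole-number values can only take the values
# (integer sum) / n. This checks whether a reported, rounded mean is
# consistent with any such sum. (Illustrative sketch only — not the
# actual software used to audit the papers.)

def mean_is_possible(reported_mean, n, decimals=1):
    """Return True if `reported_mean` could be the mean of `n` integers,
    when rounded to `decimals` decimal places."""
    target = round(reported_mean, decimals)
    # The true integer sum must be close to reported_mean * n, so only
    # a handful of candidate sums need to be checked.
    approx_sum = reported_mean * n
    for s in range(int(approx_sum) - 1, int(approx_sum) + 2):
        if round(s / n, decimals) == target:
            return True
    return False

# Two people's heights in whole inches: the mean, to one decimal place,
# must end in .0 or .5 — so 71.3 is impossible, while 71.5 is fine.
print(mean_is_possible(71.3, 2))   # False
print(mean_is_possible(71.5, 2))   # True
```

Run against every mean, sample size, and rounding convention reported in a paper, a check like this can flag impossible figures without ever seeing the raw data — which is precisely how so many inconsistencies could be identified from the published tables alone.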
The preprint quickly came to the attention of Andrew Gelman, a Columbia University statistician known as one of the smartest and most widely read skeptics of sexy but shoddy research findings — not to mention one of the most acerbic. Gelman, who had already blogged on the subject of Wansink’s blog post and pizza studies in December, did so again in late January, running down both the van der Zee team’s findings and “other inconsistencies even beyond” what they had uncovered. (In both blog posts, Gelman, who is not one to mince words when he comes across research he views as unprofessional, wrote that he disagreed with Wansink’s claim that watching Game of Thrones would be a less fruitful use of a young researcher’s time than generating these sorts of results.)
Van der Zee and his colleagues would like to look at the raw data underpinning Wansink’s studies, but have had no luck in acquiring it. Brown told me that the team started by emailing individual authors on the paper asking for the data. When they didn’t reply, they emailed what appeared to be a generic address for the lab, and there was some back-and-forth. But the lab stopped responding on January 10, when the research team made it explicitly clear that they had found errors in the research and wanted to look into them by reanalyzing the data. Later in the month, one of the two authors of one of the flawed papers, either Wansink or Sigirci (Brown thinks it’s much more likely to be Wansink), claimed in a comment on that article’s preprint that they can’t share the data because doing so would jeopardize the anonymity of participants. “It’s sensitive to the diners because it has the names of everyone they were dining with, how much alcohol they had ordered, how much food they ate, and other identifying data (a 46 year old woman 5’9 and 136 lbs dining with a 49 year old man who is 6’3 and … ).” Wansink sounded a similar note just a couple days ago in a reply to van der Zee on the original blog post that sparked this whole episode: “This was collected in a town of less than 1000, and knowing gender and age and BMI (or height and weight) might not be so anonymous at the extreme.” (Wansink has made these arguments in one of the addenda to his original post, where he also said that he had assured the restaurant in question he wouldn’t release the data since it contains sales stats.)
Wansink seems to be saying that if the data became public, even in anonymized form, a dedicated snooper could trawl through it, trying to connect who was eating dinner with whom at an all-you-can-eat pizza buffet on the basis of their heights and weights or other information. In a Skype conversation, Brown responded that even if he’d taken this claim seriously — which he didn’t — that might be a reason not to post the data publicly, but isn’t a reason to refuse to share it with researchers who have been made aware of the anonymity concerns. As he and others have pointed out, Wansink hasn’t been consistent in his claims about the need to protect participants’ identities. In one of his addenda, he wrote that “All of the editors were contacted when we learned of some of the inconsistencies, and a non-coauthor Stats Pro is redoing the analyses.” But if Wansink can share the data with his “Stats Pro,” who wasn’t an original author and therefore doesn’t have initial permission to access the data, why can’t he share it with the researchers who discovered the 150 errors?
Last week, van der Zee, Brown, and Anaya escalated the issue by sending a letter to Amita Verma, director of Cornell’s Office of Research Integrity and Assurance, CC’ing Carol Devine, chair of the university’s Institutional Review Board, in which they requested access to the data in a more formal manner. Yesterday, they heard back. “Cornell University supports open inquiry and vigorous scientific debate,” Verma wrote. “In the absence of sponsor or publisher data sharing requirements, however, Cornell allows its investigators to determine if and when it is appropriate to release raw data, subject to any IRB imposed limitations.” In other words, Wansink gets to decide whether he’s going to share the data. (Wansink didn’t respond to an emailed request for comment. Cornell’s head of media relations, John J. Carberry, wrote in an email that “Recent questions have arisen regarding the statistical methods utilized by Professor Brian Wansink. Cornell is committed to the scientific method, fully supporting open inquiry and vigorous scientific debate. While Cornell encourages transparent responses to scientific critique, we respect our faculty’s role as independent investigators to determine the most appropriate response to such requests, absent claims of misconduct or data sharing agreements.”)
So that’s where we are now. At the moment, the situation is uncertain in part because, while Wansink did say in one of his addenda that his lab will be enacting stronger standards, other than his promise that his “Stats Pro” will reanalyze the data, he hasn’t addressed head-on the question of how so many errors could have found their way into his papers.
Here’s what is clear at the moment, though:
1. Until Wansink can explain exactly what happened, no one should trust anything that comes out of his lab, and everyone should be skeptical of anything published in the journals in question. This might sound harsh or like an overreaction, but it very much isn’t. It isn’t the job of researchers and journals to publish perfect research — that would be impossible. It is very much their job, however, to publish research that can sustain a bare minimum of poking and prodding without falling apart like a sloppy joe. The four papers Anaya’s team analyzed are shockingly unprofessional, and they’re just the start of it — in a blog post Anaya published on Medium Monday, he identified errors in six more papers from the Food and Brand Lab. “I wouldn’t say the issues were as major as the pizza papers, but I saw the exact same problems as I found in the pizza papers, so there seems to be a clear pattern,” he said in an email. “What is causing these same errors to keep showing up is a mystery.” And if those ten papers were riddled with problems, what reason is there to think the Food and Brand Lab’s other research isn’t? None, until Wansink can fully explain what makes those ten papers special.
Much of the above applies to the journals where this work was published — the Journal of Sensory Studies, the Journal of Product & Brand Management, Evolutionary Psychological Science, and BMC Nutrition. Until they can explain how Wansink’s work got past their quality-control mechanisms and either retract or make massive corrections to the papers themselves, it just isn’t clear why anyone should view them as trustworthy outlets for scientific research.
2. Whatever happened here was likely exacerbated by the insatiable appetites of the academic-research ecosystem. The Cornell Food and Brand Lab produces a large quantity of studies, and many of them get written up because they offer such interesting, practical-seeming findings (both Science of Us and the Cut have covered Wansink’s research in the past, and Wansink himself has written for SOU). Undoubtedly, the lab and its occupants benefit in myriad ways from just how prolific they are. But when you churn out a lot of studies, it’s hard to maintain high quality-control standards. As van der Zee and his colleagues put it in their paper, at the moment the replication crisis is being fueled by “an incentive structure that rewards large numbers of publications reporting sensational findings with little penalty for being wrong.”
We can’t know that’s what happened here, but it’s a reasonable bet that it’s part of the story.
Along those same lines, lower-level researchers working at Cornell’s lab, and at others like it, face some brutal incentives of their own. Wansink himself made that clear in one of his addenda. “For Post-docs, publishing is make-or-break — it determines whether they stay in academia or they struggle in academia,” he wrote. “Metaphorically, if they can’t publish enough to push past the academic gravitational pull as a post-doc, they’ll have to unfairly fight gravity until they find the right fit. Some post-docs are willin[g] to make huge sacrifices for productivity because they think it’s probably their last chance. For many others, these sacrifices aren’t worth it.” Of course, the sacrifices also aren’t worth it if they lead to a bunch of mangled papers.
3. The data-sharing revolution can’t come to psychology fast enough. In some fields, data sharing and transparency are already established norms. While psychology is making progress on this front, it has a ways to go, and that fact is doing real damage to the field’s credibility. Now, it’s conceivable that in certain isolated cases, researchers might have good reason not to share data. There can be legitimate anonymization concerns, for one thing, and there can also be free-rider problems given that rich data can be expensive to collect. But setting aside a small handful of exceptional cases, it doesn’t make sense that in 2017, researchers can still get away with refusing or slow-walking data requests when their work comes under justifiable fire. Van der Zee and his team shouldn’t have to jump through hoops like this — especially not after they have done the public service of revealing that the Food and Brand Lab, a generally celebrated and frequently cited institution, has been churning out some extremely sloppy work.
Unless and until the data are shared, this episode will do real damage to the Food and Brand Lab’s, well, brand. “If he doesn’t want to share the data, there’s no rule that he has to, right?” argues Gelman. “It seems pretty simple to me: Wansink has no obligation whatsoever to share his data, and we have no obligation to believe anything in his papers. No data, no problem, right?” Yes, exactly.