Not so long ago, an astronomer who wanted an answer to a celestial question would book time on a telescope at a university and study only the galaxies or quasars or other astral bodies pertaining to that question. If he wanted to learn about black holes, for example, he would search the skies for individual black holes. Cosmologists mostly worked in isolation, drawing from smallish data sets to test single hypotheses at a time. As a consequence, the field was considered a “fluffy science”; the universe is, after all, so vast and varied — impossible to observe in any meaningful, comprehensive way.
Enter the Sloan Digital Sky Survey in 2000. The telescope survey system* allowed for the first big data approach to the study of deep space, by collecting information concerning more than 500 million stars and galaxies and other heavenly bodies, and making that data openly available online. “Before the Sloan, cosmology was fractured into many fields whose relation to each other wasn’t obvious and wasn’t being studied,” Ann Finkbeiner, the author of a 2010 book about the impact of the survey, once told Popular Science.
“Sloan found all kinds of things in all areas of astronomy: asteroids in whole families, stars that had only been theories, star streams around the Milky Way, the era when quasars were born, the evolution of galaxies, the structure of the universe on the large scale, and compelling evidence for dark energy,” she continued. “So after the Sloan, cosmologists began seeing the universe as a whole, as a single system with parts that interact and evolve.”
This is like that, only for human beings.
In 2017, researchers at New York University will begin to assemble a similarly massive database with the Kavli HUMAN Project, an audaciously ambitious study that will track the biology and behavior of 10,000 New Yorkers for the next 20 years. Their humble goal: to gather enough data over time to learn “everything there is to know about a group of people,” said Paul Glimcher, a neuroscientist and economist, who is the director of the project. The project will begin by recruiting 2,500 New York volunteers, from across all five boroughs, whose entire households will also need to agree to the terms of the research project. For the next two decades, practically everything that happens in their lives will turn to data, which will be made accessible to researchers in a wide variety of fields, such as medicine, psychology, sociology, economics, and public policy.
The project bears some striking similarities to the Sloan Digital Sky Survey. While there are a number of quality large-scale, longitudinal studies, these data sets often focus narrowly on only one, or perhaps a few, aspects of human health and behavior. More often than not in these current studies, questions concerning the biology of a population do not factor in behavior, and vice versa. And yet behavior is a key factor in some of the most pressing problems Americans are facing today: obesity, education, smoking cessation, saving for retirement. The latest research, not to mention common sense, suggests that a person’s health, for example, is a result of a combination of their genetics and their environment; the same goes for decision-making.
Upon recruitment, the participants will spend half a day undergoing medical tests; they’ll give samples of their blood, urine, saliva, hair, and feces so that the researchers can keep records of their genetics, blood chemistry, and microbiome, among other things. They’ll repeat this process every three years, and, in addition, the Kavli scientists will also gain access to every participant’s medical records. They’ll also test the participants for various psychological traits, including personality traits, IQ, mental health, and memory; they’ll note each person’s educational and employment history, and track their financial transactions, too. But the data-collection tool the scientists consider most crucial to the project is one that most people carry around with them all day, anyway: their smartphones, of course. Through an app on the participants’ phones, the researchers will be able to record their physical activity, their sleep behavior, their location, and their socialization (i.e., who they text, email, and speak to over the phone).
In other words: This is a lot of data that these 10,000 New Yorkers will be graciously giving to science. For 20 years, practically everything they do will be permanently recorded, to be later scrutinized by scientists from all over the globe. In a 284-page overview of the project, the researchers acknowledge that this may prove to be a recruitment issue:
Many stakeholders express great concern over the highly detailed nature of the data we propose to gather and many scholars express concern that it will be difficult to identify participants willing to undergo this level of scrutiny.
And yet, as Glimcher points out, any person with a smartphone is already shedding this data. “I have my phone in my pocket right now,” Glimcher said in an interview with Science of Us. “That means AT&T knows my location within a meter, and I did not give them consent for that.” The Kavli HUMAN Project, on the other hand, is going to politely ask permission first. The researchers promise “the highest possible degree of privacy and security,” with an encrypted security framework similar to that currently required by HIPAA and FERPA. The data cannot be sold; it cannot be subpoenaed.
And yet data breaches can and do happen. Judging from their report, the researchers seem to believe that if they can clearly communicate the depth and breadth of this project, people will sign up.
Perhaps they will. Consider, for example, obesity, a massive public-health nightmare in the United States. The latest figures from the U.S. Centers for Disease Control and Prevention show that at least one-third of the adult population is obese, and yet researchers have argued for years about the causes and, as a consequence, the best bets for interventions. As of yet, no study has been able to examine all of the food choices a person makes while measuring those against their genetics and environment, write Adam Drewnowski of the University of Washington and Ichiro Kawachi of Harvard in a recent paper published in the journal Big Data. Beyond the biomarkers collected at the start of the Kavli study, researchers will also be able to track the food people purchase, along with their physical activity and sleep behaviors, plus the structure of the neighborhood where they live. They’ll also be able to compare a person’s health, weight, or BMI against their personality traits, or their education level.
Or consider cognitive decline — including dementia and Alzheimer’s disease — which affects 16 million adult Americans, according to the CDC. Some intriguing research has recently suggested that cognitive decline is a “whole life issue,” not just something that happens in old age, Andrew Caplin, the “economic data engineer” for the project, told Science of Us. Perhaps people who eventually develop Alzheimer’s disease show signs of it early in life, with very small changes in behavior. Quality studies have already shown that people who are more socially active and have a higher level of education are less likely to develop cognitive decline. But what exactly is it about these individuals who stay healthy, and what separates them from those who do not? Right now, there isn’t a great answer for that. Maybe with the help of a massive database like the one the Kavli HUMAN Project scientists hope to assemble, there will be.
It’s also worth noting that the social sciences, and particularly psychology, are currently grappling with a replication crisis: Too many of the neat little experiments researchers perform in their labs don’t turn out the same way twice. As a result, the social sciences are often seen as “soft” sciences, especially when compared to the hard data that the natural sciences can provide. How can science possibly presume, many are now asking, to pin down the nuances of something as vast and varied and difficult to observe as human nature?
Then again, remember that this very same question was once posed about the stars and that “fluffy” science of astronomy. Maybe a big data approach to the study of human beings is exactly what this field has been waiting for.
*An earlier version of this post stated that the Sloan Digital Sky Survey is an automated telescope system. It is not automated.