
SC Connecticut News

Wednesday, October 16, 2024

Yale study highlights importance of larger datasets in brain-behavior research


Peter Salovey, President | Yale University

Datasets that are too small may lead researchers to overlook relationships between the brain and behavior, a new study finds.

When designing machine learning models, researchers first train the models to recognize data patterns and then test their effectiveness. But if the datasets used to train and test aren’t sufficiently large, models may appear to be less capable than they actually are, a new Yale study reports.

When it comes to models that identify patterns between the brain and behavior, this could have implications for future research, contribute to the replication crisis affecting psychological research, and hamper understanding of the human brain, researchers say.

The findings were published July 31 in the journal Nature Human Behaviour.

Researchers increasingly use machine learning models to uncover patterns that link brain structure or function to cognitive attributes like attention or symptoms of depression. Making these links allows researchers to better understand how the brain contributes to these attributes (and vice versa) and potentially enables them to predict who might be at risk for certain cognitive challenges based on brain imaging alone.

But models are only useful if they’re accurate across the general population, not just among the people included in the training data.

Because collecting two separate sets of data requires greater resources, researchers often split a single dataset into a larger portion on which they train the model and a smaller portion used to test it. A growing number of studies, however, have subjected machine learning models to a more rigorous evaluation of their generalizability by testing them on an entirely different dataset made available by other researchers.

“And that’s good,” said Matthew Rosenblatt, lead author of the study and a graduate student in the lab of Dustin Scheinost, associate professor of radiology and biomedical imaging at Yale School of Medicine. “If you can show something works in a totally different dataset, then it’s probably a robust brain-behavior relationship.”
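
In code, the difference between the two approaches is small but meaningful. The sketch below is purely illustrative and is not the study's code; it uses synthetic stand-ins for imaging features and behavioral scores from two hypothetical studies, and scikit-learn's ridge regression as a placeholder for a brain-behavior model.

# Illustrative sketch only, not the study's code: an internal train/test split
# versus external validation, using synthetic stand-ins for imaging features
# and behavioral scores from two independent studies.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
features_a = rng.standard_normal((200, 50))              # "study A" imaging features
scores_a = features_a[:, 0] + rng.standard_normal(200)   # behavior weakly tied to features
features_b = rng.standard_normal((150, 50))              # "study B", collected separately
scores_b = features_b[:, 0] + rng.standard_normal(150)

# Internal evaluation: split study A into a training portion and a testing portion.
X_train, X_test, y_train, y_test = train_test_split(
    features_a, scores_a, test_size=0.25, random_state=0)
model = Ridge().fit(X_train, y_train)
r_internal, _ = pearsonr(model.predict(X_test), y_test)

# External evaluation: test the same trained model on study B, which it has never seen.
r_external, _ = pearsonr(model.predict(features_b), scores_b)
print(f"internal r = {r_internal:.2f}, external r = {r_external:.2f}")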

Adding another dataset into the mix comes with its own complications — namely regarding a study’s “power.” Statistical power is the probability that a research study will detect an effect if one exists. For example, a child’s height is closely related to their age. If a study is adequately powered, then that relationship will be observed. If the study is “low-powered,” there’s a higher risk of overlooking the link between age and height.

There are two important aspects of statistical power: the size of the dataset (also known as the sample size) and the effect size. The smaller one is, the larger the other needs to be. The link between age and height is strong, meaning the effect size is large, so the relationship can be observed even in a small dataset. But when the relationship between two factors is more subtle, like age and how well one can sense through touch, researchers need data from more people to uncover the connection.
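
A small simulation makes that tradeoff concrete. The sketch below is only an illustration with assumed effect sizes (a correlation of 0.6 for a strong effect and 0.2 for a subtle one, neither taken from the study); it estimates how often a true correlation would be detected at p < 0.05 at different sample sizes.

# Illustration of statistical power: how often a true correlation of a given
# strength is detected (p < 0.05) at different sample sizes. The effect sizes
# here are assumed for illustration and are not figures from the study.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

def estimate_power(true_r, n, n_sims=2000):
    hits = 0
    for _ in range(n_sims):
        x = rng.standard_normal(n)
        # y is constructed to share a correlation of roughly true_r with x
        y = true_r * x + np.sqrt(1 - true_r ** 2) * rng.standard_normal(n)
        hits += pearsonr(x, y)[1] < 0.05
    return hits / n_sims

for true_r in (0.6, 0.2):        # strong vs. subtle effect (assumed values)
    for n in (25, 100, 400):     # number of participants
        print(f"r = {true_r}, n = {n}: power ~ {estimate_power(true_r, n):.2f}")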

While there are equations that can calculate how big a single dataset should be for adequate power, there aren't any easy calculations for determining how large two datasets, one for training and one for testing, should be.

To understand how training and testing dataset sizes affect power, the researchers used data from six neuroimaging studies and resampled it repeatedly at different dataset sizes to see how statistical power changed.
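
In outline, that kind of resampling analysis repeatedly draws smaller training and testing sets from the full data, refits the model, and records how often a significant brain-behavior association is recovered. The sketch below illustrates the general idea with synthetic data and a simple ridge model; it is not the published analysis code, and the sizes and effect strength are assumed.

# Simplified outline of a resampling power analysis (illustrative only):
# vary the sizes of the training set and the external testing set, refit the
# model each time, and record how often its predictions correlate
# significantly with the true behavioral scores.
import numpy as np
from sklearn.linear_model import Ridge
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

def make_dataset(n, n_features=50, effect=0.3):
    """Synthetic imaging features and a behavioral score with a modest effect."""
    X = rng.standard_normal((n, n_features))
    y = effect * X[:, 0] + rng.standard_normal(n)
    return X, y

X_pool_train, y_pool_train = make_dataset(2000)   # stand-in for dataset 1 (training pool)
X_pool_test, y_pool_test = make_dataset(2000)     # stand-in for dataset 2 (external pool)

def estimated_power(n_train, n_test, n_resamples=200):
    hits = 0
    for _ in range(n_resamples):
        tr = rng.choice(len(y_pool_train), size=n_train, replace=False)
        te = rng.choice(len(y_pool_test), size=n_test, replace=False)
        model = Ridge().fit(X_pool_train[tr], y_pool_train[tr])
        preds = model.predict(X_pool_test[te])
        hits += pearsonr(preds, y_pool_test[te])[1] < 0.05
    return hits / n_resamples

for n_train, n_test in [(100, 100), (300, 300), (1000, 500)]:
    print(f"train n = {n_train}, test n = {n_test}: "
          f"power ~ {estimated_power(n_train, n_test):.2f}")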

“We showed that statistical power requires relatively large sample sizes for both training and external testing datasets,” said Rosenblatt. “When we looked at published studies in this field using this approach, testing models on a second dataset, we found most datasets were too small, underpowering their studies.”

Among already published studies, the researchers found that the median sizes of training and testing datasets were 129 and 108 participants, respectively. For measures with large effect sizes, such as age, those sizes achieved adequate power. But for measures with medium effect sizes, such as working memory, they left a 51% chance of failing to detect a brain-behavior relationship, and for measures with small effect sizes, such as attention problems, that chance rose to 91%.

“For these measures with smaller effect sizes, researchers may need datasets ranging from the hundreds to the thousands,” said Rosenblatt.

As more neuroimaging datasets become available, Rosenblatt and his colleagues expect more researchers to opt to test their models on separate datasets.

“That’s a move in the right direction,” said Scheinost, “especially with reproducibility being a problem. Validating a model in a second, external dataset is a solution, but we want people to think about their dataset sizes. Researchers must do what they can with the data available, but as more data become available, they should aim to test externally and ensure those test datasets are large enough.”
