The Curious Case of Meta-Science: Why One Study is Not Enough

Senior Lecturer Jan Feld's research explores meta-science, which is the study of how the scientific process works.

Meta-science, also known as "science on science," has become my current intellectual obsession. It’s the study of how the scientific process works, and when it doesn’t, how it can be improved. This meta-perspective has transformed how I approach research, forcing me to confront uncomfortable truths about the limitations of individual studies.

Let me take you back to the beginning. My collaborator, Ulf Zölitz, and I began our academic careers by investigating peer effects at a Dutch business school. We found that students who were randomly assigned to higher-achieving peers performed better academically. Simple enough, right? But what happened next was more revealing than our original findings. Other researchers began citing our work as evidence of a universal truth about human behaviour: that being around high achievers boosts everyone's performance. This interpretation was flattering but misleading. We had shown this effect in a very specific context—one Dutch business school—not across the entire spectrum of human experience.

This misinterpretation isn’t unique to our work. It’s a common story in the social sciences, where findings from specific studies are often generalised far beyond their original context. I realised that I, too, was guilty of this. I would read a study showing, say, that women are more risk-averse than men, and I would assume this applied universally, rather than considering the particular sample or methodology used.

Then I stumbled into the world of meta-science, which made me question my assumptions. Four major revelations stood out:

1. Diverse Findings Across Studies: Meta-analyses, which aggregate data from multiple studies on the same topic, often reveal wildly different results. These discrepancies show up in funnel plots, which chart each study's estimate against its precision. If every study were measuring the same underlying effect, the estimates would fall within a narrow, symmetric funnel; instead, they often scatter widely across the chart, sometimes even pointing in opposite directions. This variability suggests that findings can be highly context-dependent.

2. The Replication Crisis: Perhaps the most startling revelation was that many studies fail to replicate. If a finding is robust, repeating the study with a different sample should yield similar results. However, large-scale replication projects have shown that this is often not the case. In fields like psychology and economics, the replication rate hovers around 50%.

3. Publication Bias and P-Hacking: The scientific community has a preference for statistically significant results, which are more likely to get published. This creates a perverse incentive for researchers to "p-hack": to tweak their analyses until they get the desired p-value. The consequence is a literature filled with false positives, where findings appear more reliable than they actually are; the short simulation sketched after this list shows how quickly this can happen.

4. The WEIRD Problem: Most social science research is conducted on WEIRD (Western, Educated, Industrialised, Rich, and Democratic) populations, often university students. This narrow focus can lead to misleading generalisations. For instance, Joseph Henrich and his colleagues have shown that WEIRD populations are often outliers, not representative of humanity at large. They point out that phenomena like the Müller-Lyer illusion, a staple in psychology textbooks, vary greatly across cultures—being strong in WEIRD societies and almost absent in others.
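To make the p-hacking problem concrete, here is a minimal simulation sketch in Python (using numpy and scipy; the 20 candidate outcomes, 50 participants per group, and all other numbers are illustrative assumptions, not taken from any real study). It shows that a researcher who tests many outcomes and reports whichever one comes out "significant" will find something far more often than the nominal 5% of the time, and that those findings rarely survive an independent replication.

```python
# A minimal sketch of p-hacking, with purely illustrative numbers: there is
# no true effect anywhere, yet testing many outcomes and reporting only the
# one that "works" produces far more false positives than the nominal 5%,
# and those findings rarely replicate in fresh data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_per_group = 50     # participants per group (assumed)
n_outcomes = 20      # outcomes the hypothetical researcher can choose from
n_studies = 2000     # simulated original studies

false_positives = 0
replications = 0

for _ in range(n_studies):
    # Original study: treatment and control differ by nothing but noise.
    treat = rng.normal(0.0, 1.0, (n_outcomes, n_per_group))
    control = rng.normal(0.0, 1.0, (n_outcomes, n_per_group))
    pvals = stats.ttest_ind(treat, control, axis=1).pvalue

    if pvals.min() < 0.05:           # report whichever outcome "worked"
        false_positives += 1
        # Independent replication of that single reported outcome.
        rep_treat = rng.normal(0.0, 1.0, n_per_group)
        rep_control = rng.normal(0.0, 1.0, n_per_group)
        if stats.ttest_ind(rep_treat, rep_control).pvalue < 0.05:
            replications += 1

share_published = false_positives / n_studies              # roughly 64%, not 5%
share_replicated = replications / max(false_positives, 1)  # roughly 5%
print(f"Studies reporting a 'significant' result: {share_published:.0%}")
print(f"Share of those that replicate: {share_replicated:.0%}")
```

Pre-registration targets exactly this selection step: the outcome and analysis have to be committed to before the data are seen.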

These insights have made me more sceptical of individual studies. However, there's a silver lining: the scientific community is increasingly aware of these issues. Organisations like the Institute for Replication aim to test the robustness of published findings. Top journals are beginning to require that researchers share their datasets and analysis scripts, and pre-registering studies to curb p-hacking is becoming more common.

There’s also a cultural shift. Many researchers, myself included, are now committed to improving transparency and replicability in science. In my own work, I’ve started to focus on examining the same research question across multiple contexts. This approach helps clarify how generalisable a phenomenon is.

For example, I've been investigating whether women are more risk-averse than men using data from 120 countries and 64 measures of risk attitudes. Preliminary findings suggest that the answer varies by context, challenging studies that treat gender differences in risk-taking as universal.
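To show what that multi-context approach looks like in practice, here is a small sketch with simulated data (the 120 countries, sample sizes, country-level gaps, and variable names are all invented for illustration; this is not the study's data or code). It estimates the same gender gap in a risk-taking score separately in each country and then summarises how much the estimates vary, which is the question a single-context study cannot answer.

```python
# Illustration only: simulated data and invented variable names, not the
# actual study. The point is the workflow: estimate the same gender gap in
# every context, then look at the spread of the estimates.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

frames = []
for i in range(120):                      # 120 hypothetical countries
    true_gap = rng.normal(0.1, 0.15)      # country-specific gap in SD units (assumed)
    female = rng.integers(0, 2, 500)      # 500 respondents per country (assumed)
    risk = rng.normal(0.0, 1.0, 500) - true_gap * female
    frames.append(pd.DataFrame({"country": i, "female": female, "risk": risk}))
df = pd.concat(frames, ignore_index=True)

# Per-country gap: mean male risk score minus mean female risk score.
means = df.groupby(["country", "female"])["risk"].mean().unstack("female")
gaps = means[0] - means[1]

print(gaps.describe())                    # the spread of the 120 estimates
print(f"Countries where men score higher: {(gaps > 0).mean():.0%}")
```

With real survey data the estimates would carry sampling uncertainty and would usually be modelled jointly, but the logic is the same: it is the distribution of estimates across contexts, not any single one, that tells you how general a finding is.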

While the replication crisis and other issues may seem like grim news, they also highlight the self-correcting nature of science. The road to more reliable science involves embracing meta-science, promoting transparency, and broadening our research focus beyond WEIRD samples. We may not have all the answers yet, but we’re asking better questions.

Senior Lecturer

School of Economics and Finance