How many studies in the reproducibility project were successfully replicated, upholding the original findings?

Project Overview

The Reproducibility Project: Cancer Biology was an 8-year effort to replicate experiments from high-impact cancer biology papers published between 2010 and 2012. The project was a collaboration between the Center for Open Science and Science Exchange. All papers published as part of the project are available in a collection at eLife, and all replication data, code, and digital materials are available in a collection on OSF.

While preparing replications of 193 experiments from 53 papers, the project team encountered a number of challenges:

  • 2% of experiments had open data available
  • 70% of experiments required asking the original authors for key reagents
  • 69% of the experiments needing a key reagent had original authors willing to share it
  • 0% of protocols were completely described
  • 32% of experiments had original authors who were not helpful (or unresponsive)
  • 41% of experiments had original authors who were very helpful

Fully designed protocols were submitted to eLife for peer review before the replications were conducted - a publishing format called Registered Reports - to ensure that the proposed experiments met appropriate standards of rigor and quality. Accepted replication protocols received an advance commitment to publish the findings regardless of outcome.

Additional challenges were encountered for the experiments that were conducted.

  • 67% of experiments required modifications to complete
  • 41% of the needed modifications could be completely implemented

Ultimately, 50 replication experiments from 23 of the original papers were completed, generating data about the replicability of 158 effects in total. There are many ways to evaluate and characterize replication outcomes; some simplified summaries of the findings include:

  • Replication effect sizes were, on average, 85% smaller than the original effect sizes
  • 46% of effects replicated successfully on more criteria than they failed
  • Original positive results were half as likely to replicate successfully (40%) as original null results (80%)
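
These simplified summaries can be computed mechanically once each effect is paired with its replication. Below is a minimal sketch in Python; the effect sizes and replication criteria are hypothetical placeholders for illustration, not RP:CB data:

```python
# A minimal sketch (not the project's analysis code) of two of the
# summary metrics above, computed from hypothetical paired effect sizes.
import numpy as np

# Hypothetical (original, replication) standardized effect-size pairs.
original = np.array([1.20, 0.80, 2.50, 0.40, 1.60])
replication = np.array([0.15, 0.30, 0.10, 0.35, 0.20])

# Average percent reduction in effect size; a value of 0.85 would
# correspond to "replication effect sizes were 85% smaller on average".
reduction = 1 - replication / original
print(f"mean effect-size reduction: {reduction.mean():.0%}")

# Illustrative per-effect replication criteria (True = criterion met),
# e.g. same direction, statistical significance, effect size inside the
# original confidence interval.
criteria_met = np.array([
    [True, False, False],
    [True, True, False],
    [True, False, False],
    [True, True, True],
    [True, False, True],
])

# An effect counts as replicating under the simplified summary if it
# succeeds on more criteria than it fails.
successes = criteria_met.sum(axis=1)
replicated = successes > criteria_met.shape[1] - successes
print(f"share replicating on a majority of criteria: {replicated.mean():.0%}")
```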

Collectively, this evidence suggests opportunities to improve the transparency, sharing, and rigor of preclinical research to advance the pace of discovery.

Get Involved

Supporters

Consider supporting more projects like RP:CB and other efforts to increase openness, integrity, and reproducibility of science.

Researchers

Improve workflow efficiency with integrated tools, keep track of your research in one place, learn new tools, preregister your work, and more.

Institutions

Learn about tools and services for your communities to increase transparency and visibility of research outputs, accelerating discovery and reuse.

Journals & Societies

Increase the transparency and openness of research, learn how to signal open practices, increase your conference’s reach, and more.

Gray Matter

May 27, 2016

Last year, a colleague asked me if I would send her the materials needed to try to replicate one of my published papers — that is, to rerun the study to see if its findings held up. “I’m not trying to attack you or anything,” she added apologetically.

I laughed. To a scientist, replication is like breathing. Successful replications strengthen findings. Failed replications root out false claims and help refine imprecise ones. Testing and retesting make science what it is.

But I understood why my colleague was being delicate. Around that time, the largest replication project in the history of psychology was underway. This initiative, called the Reproducibility Project, reran 100 studies published in prominent psychology journals.

In theory, this was an opportunity for celebration — psychology was leading the way on one of the most important issues in science. But in practice, people were nervous: Having one’s work replicated is an intense form of scrutiny. Failing to replicate an important finding can tarnish the research, as well as the reputation of the scientist who originally conducted it, even when there is no evidence of fraud.

In the end, people’s skittishness was warranted. The Reproducibility Project reported that only 39 percent of the studies were successfully replicated. These results were highly contentious and led many people to declare that the field of psychology was in crisis.

But this reaction overlooked a fundamental aspect of scientific inquiry: the importance of context. In a paper published on Monday in the Proceedings of the National Academy of Sciences, my collaborators and I shed new light on this issue. Our results suggest that many of the studies failed to replicate because it was difficult to recreate, in another time and place, the exact same conditions as those of the original study.

Our hypothesis was that certain topics would be more sensitive to context than others. Imagine a study that examined whether an advertisement for a “colorblind work environment” was reassuring or threatening to African-Americans. We assumed it would make a difference if the study was conducted in, say, Birmingham, Ala., in the 1960s or Atlanta in the 2000s. On the other hand, a study that examined how people encoded the shapes and colors of abstract objects in their visual field would be less likely to vary if it were rerun at another place and time.

To test this hypothesis, we had a team of psychologists read the abstracts of the 100 papers in the Reproducibility Project, unaware of the results of the replication attempt. They rated each paper on a scale from 1 to 5, according to how contextually sensitive the topic was. Studies were deemed contextually sensitive if they were likely to vary over time (e.g., pre- versus post-recession), culture (e.g., Eastern versus Western), location (e.g., rural versus urban setting) or population (e.g., a racially diverse population versus a predominantly white population).

As we predicted, there was a correlation between these context ratings and the studies’ replication success: The findings from topics that were rated higher on contextual sensitivity were less likely to be reproduced. This held true even after we statistically adjusted for methodological factors like sample size of the study and the similarity of the replication attempt. The effects of some studies could not be reproduced, it seems, because the replication studies were not actually studying the same thing.
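
An analysis of this shape can be sketched in a few lines. The following is a minimal illustration on synthetic data; the variable names, the logistic model, and the simulated relationships are assumptions for exposition, not the published analysis:

```python
# A minimal sketch, on synthetic data, of relating replication success to
# context-sensitivity ratings while adjusting for methodological
# covariates. All names and effects here are illustrative assumptions.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100  # one row per original study

context = rng.integers(1, 6, size=n)            # 1-5 context-sensitivity rating
log_sample_size = rng.normal(4.5, 0.8, size=n)  # methodological covariate
similarity = rng.uniform(0, 1, size=n)          # similarity of the replication

# Synthetic outcome: higher context sensitivity lowers replication odds.
logits = 1.5 - 0.6 * context + 0.3 * similarity
replicated = rng.binomial(1, 1 / (1 + np.exp(-logits)))

# Logistic regression of replication success on the rating plus covariates.
X = sm.add_constant(np.column_stack([context, log_sample_size, similarity]))
model = sm.Logit(replicated, X).fit(disp=False)
print(model.summary(xname=["const", "context", "log_n", "similarity"]))
```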

There is little question that scientists like us can improve our practices. We should aim to use larger samples and share our materials more readily. (Our own research would have been impossible without access to data from the Reproducibility Project.) However, those sorts of improvements will never eliminate the fact that human behavior varies across contexts.

So what, if anything, can psychologists do? It may be that more conversations like the one I had with my colleague are part of the solution.

When I was approached to help with a replication, my collaborators and I not only sent our research materials to the researchers on the replication team; we also shared an important insight: They would have to completely change the research stimuli to run the replication.

Our original study measured emotional responses in the brain to famous people. The catch was that it was run in Canada in 2006 and the replication would be run at the University of Denver almost a decade later. If the names Jean Chrétien, Don Cherry and Karla Homolka don’t get a rise out of you, then you would have a hard time completing our study. The fact is that Canadian politicians, hockey icons and serial killers have little impact on the brains of most American undergraduates.

On our advice, the replication team in Denver generated and pilot-tested a new list of famous figures to use in its replication study. This extra effort paid dividends: The team was able to successfully replicate the results with a much larger sample. Now we have more confidence in the conclusions from our original paper.

The original researchers and the replicators both have a stake in cooperation. Even if a replication attempt fails, the field will find the failure far more informative because both parties agreed on the process in the first place. Then they can set their sights on understanding why the replication results differed from the original study. The lesson here is not that context is too hard to study, but rather that context is too important to ignore.

What percent of studies can be replicated?

A 2019 study in Scientific Data estimated with 95% confidence that, of 1,989 articles on water resources and management published in 2017, the results could be reproduced for only 0.6% to 6.8% of articles, even if every article provided sufficient information for replication.
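
Estimates like this rest on a standard confidence interval for a proportion. A minimal sketch of the technique, using hypothetical counts rather than the study's actual sample:

```python
# A minimal sketch of a 95% confidence interval for a proportion, using
# hypothetical counts (not the Scientific Data study's numbers).
from statsmodels.stats.proportion import proportion_confint

reproducible = 2  # hypothetical: articles judged reproducible in the sample
sampled = 80      # hypothetical: articles examined

low, high = proportion_confint(reproducible, sampled, alpha=0.05, method="wilson")
print(f"95% CI for the reproducible share: {low:.1%} to {high:.1%}")
```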

How many studies are replicated?

In psychology, only 39 percent of the 100 experiments were successfully replicated. In economics, 61 percent of the 18 studies replicated, as did 62 percent of the 21 studies published in Nature/Science.

How many studies Cannot be replicated?

A 2016 Nature survey, for example, revealed that in the field of biology alone, over 70% of researchers were unable to reproduce the findings of other scientists, and approximately 60% of researchers could not reproduce their own findings.

What percent of psychology studies are replicable?

The average replicability of results in social psychology journals is less than 50%. The reason is that original studies have low statistical power. To improve replicability, social psychologists need to conduct realistic a priori power calculations and honestly report non-significant results when they occur.
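
The a priori power calculation recommended here is straightforward to run. A minimal sketch using statsmodels, in which the effect size, significance level, and power target are illustrative assumptions:

```python
# A minimal sketch of an a priori power calculation for a two-sample
# t-test. Effect size, alpha, and power are illustrative assumptions,
# not values prescribed by the quoted text.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.4,          # assumed standardized effect (Cohen's d)
    alpha=0.05,               # two-sided significance level
    power=0.80,               # desired probability of detecting the effect
    alternative="two-sided",
)
print(f"required sample size per group: {n_per_group:.0f}")
```

Under these assumptions the calculation calls for roughly 100 participants per group, which underscores the point that small original samples leave studies underpowered.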