Thursday, September 01, 2005

Half of science papers wrong?

World Net Daily, in a bow to the "Wing Nut Daily" demographic, links to this piece with the headline, "Most Scientific Papers Probably Wrong".

What's going on?

The linked piece, in New Scientist, says, in part:

Assuming that the new paper is itself correct, problems with experimental and statistical methods mean that there is less than a 50% chance that the results of any randomly chosen scientific paper are true.


The "experimental and statistical methods" referred to include:

John Ioannidis, an epidemiologist at the University of Ioannina School of Medicine in Greece, says that small sample sizes, poor study design, researcher bias, and selective reporting and other problems combine to make most research findings false. But even large, well-designed studies are not always right, meaning that scientists and the public have to be wary of reported findings.
Traditionally a study is said to be "statistically significant" if the odds are only 1 in 20 that the result could be pure chance. But in a complicated field where there are many potential hypotheses to sift through - such as whether a particular gene influences a particular disease - it is easy to reach false conclusions using this standard. If you test 20 false hypotheses, one of them is likely to show up as true, on average.

Correlation matrices are good at this. A 5×5 correlation matrix has 20 off-diagonal entries, but the matrix is symmetric, so only 10 of them are distinct correlations you can test for significance. There is a 40% chance that at least one of those 10 correlations will test out as significant at the 0.05 level, based on pure random chance (1 − 0.95^10 ≈ 0.40). If you test the 45 distinct correlations among ten items, there's a 90% chance of finding a "publishable" result by pure chance (1 − 0.95^45 ≈ 0.90). If you're doing anything like this in your research, you need to correct for it.
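The arithmetic behind these percentages is easy to sketch (the function below is mine, not from either article): with k independent tests at significance level α, the chance of at least one false positive is 1 − (1 − α)^k.

```python
def chance_of_false_positive(n_items, alpha=0.05):
    """Probability that at least one of the distinct pairwise correlations
    among n_items variables comes out 'significant' by pure chance,
    assuming the tests are independent."""
    k = n_items * (n_items - 1) // 2  # distinct off-diagonal correlations
    return 1 - (1 - alpha) ** k

print(f"5 items:  {chance_of_false_positive(5):.0%}")   # 10 tests
print(f"10 items: {chance_of_false_positive(10):.0%}")  # 45 tests
```

In practice the tests aren't perfectly independent, so treat this as an upper-bound sketch rather than an exact figure.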

Odds get even worse for studies that are too small, studies that find small effects (for example, a drug that works for only 10% of patients), or studies where the protocol and endpoints are poorly defined, allowing researchers to massage their conclusions after the fact.

It's a fact – statistics get "noisier" as sample sizes shrink. The other factor – poorly defined protocol and endpoints – is a big problem, especially in new fields where we're not sure what's happening. In a new field, we're still learning what questions to ask.
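The "noisier" claim has a simple quantitative form: the standard error of a sample mean is σ/√n, so cutting the sample shrinks precision only as the square root. A minimal sketch (the values are illustrative):

```python
import math

def standard_error(sigma, n):
    """Standard error of a sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# Quadrupling the sample size only halves the noise.
for n in (25, 100, 400):
    print(n, standard_error(10.0, n))
```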

However, on a brief scan of the original article, I'm inclined to examine just what the author means by "true". The author uses a figure, the PPV or "positive predictive value", which is the probability that a reported finding is not a false positive. Essentially, it's the probability that attempts to replicate the finding will succeed.
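As I read the paper, the basic model behind the PPV is PPV = (1 − β)R / (R − βR + α), where R is the pre-study odds that a tested relationship is real, α the false-positive rate, and β the false-negative rate. A sketch, with illustrative parameter values that are mine, not the paper's:

```python
def ppv(R, alpha=0.05, beta=0.2):
    """Positive predictive value under the paper's basic model:
    R     = pre-study odds that a tested relationship is real
    alpha = Type I error rate (false positive)
    beta  = Type II error rate (power = 1 - beta)"""
    return (1 - beta) * R / (R - beta * R + alpha)

# With only 1 real relationship per 20 tested, most "positive" findings
# are false even at the standard alpha and decent power:
print(round(ppv(0.05), 2))  # under 0.5
```

This is why hot new fields, where R is small because many speculative hypotheses get tested, fare worst in the paper's analysis.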

What this paper says is that research results can't be considered written in stone. This is especially true in cases where a field is new, or is hot, or where it's hard to get a large number of observations.

WND likes to group together headlines that relate to broad topics. The link to this article is in a section titled "Evolution Watch". Obviously, this paper is being cited to make the case that evolution is wrong.

The problem is, this paper applies to very specific, new results in the literature. It doesn't apply to findings that have been replicated, as those will have been subjected to better designed tests. It also doesn't apply to theories that are based on large numbers of confirmed findings – the likelihood that all, or even most, of the findings that get incorporated into a theory are wrong is pretty low. It certainly doesn't apply to broad principles that have been worked out over years, decades, or centuries. The likelihood that the laws of thermodynamics are wrong, for example, is incredibly small. The likelihood that evolution as a broad scientific principle is wrong is also vanishingly small.

(I'd be interested to see a study on the odds that a media report on a scientific finding gets it right.)

(Update: the original version of these calculations left out a factor of two. A correlation matrix is symmetric, so its 20 off-diagonal entries represent only 10 distinct correlations. The corrected figures are a 40% chance of a publishable result for the five-variable case, and 90% for the ten-variable case.)
