Tuesday, October 28, 2008


Most research papers include a lot of statistical analysis of data, and many of the conclusions or assertions they make rest on that analysis.

The use of statistics is a major advance in the science of analyzing and interpreting data.

Yet, there are a few complaints I have about the way statistical analyses are reported:

The application of statistics is meant to give the reader a very clear, objective summary of what data show, or what data mean. The spirit is neutral objectivity, without the biases of arbitrary subjective opinion or judgment, of people "eyeballing" the data and concluding there is something meaningful there, when in fact there is not.

Yet, in most statistical summaries of research data, the words "significant" and "not significant" are frequently used. The criterion for "significance", however, is arbitrarily determined. It is part of research and statistical culture to consider that a "significant" difference means that the data show a difference that could be due to random chance only 5% of the time or less. If the data show a difference which could be due to randomness with a probability of 6%, then the difference would be reported as "non-significant".
This is an intrusion of human-generated arbitrariness into what is supposed to be an objective, clear analysis of data.
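
To make the arbitrariness concrete, here is a minimal sketch in Python. The coin-flip scenario, the function names and the numbers are my own assumptions for illustration, not taken from any particular study. It computes an exact two-sided P value for the hypothesis that a coin is fair, and then applies the conventional 5% cutoff.

```python
from math import comb


def two_sided_binomial_p(heads: int, flips: int) -> float:
    """Exact two-sided P value under the hypothesis that the coin is fair."""
    # How far the observed count is from the count expected by chance.
    distance = abs(heads - flips / 2)
    # Count every possible outcome at least that far from the expected count.
    extreme = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - flips / 2) >= distance
    )
    return extreme / 2 ** flips


def conventional_label(p: float, cutoff: float = 0.05) -> str:
    """Apply the customary (and arbitrary) 5% cutoff."""
    return "significant" if p <= cutoff else "non-significant"


# Hypothetical example: two experiments that differ by a single coin flip.
for heads in (60, 61):
    p = two_sided_binomial_p(heads, 100)
    print(f"{heads} heads in 100 flips: P = {p:.4f} -> {conventional_label(p)}")
```

With 60 heads the P value comes out at about 0.057, and with 61 heads at about 0.035. One extra head changes the label from "non-significant" to "significant", even though the two results are nearly identical.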

What I feel is a much more accurate way to report on a statistical analysis in a research paper is the following:

The probability ("P value") of a difference being due to chance, rather than to a real difference, should always be given prominently in the paper, and in the abstract, in place of the words "significant" or "non-significant". The reader can then decide whether the finding is significant or not.

As far as I'm concerned, any P value less than 0.5 (50%) carries some degree of significance to it, and the reader of a paper or abstract deserves to see this value prominently given. And it seems absurd to me that results showing a P value of 0.06 would be deemed "non-significant" while results with a P value of 0.05 would be "significant".

**Note: there are more rigorous and precise definitions for the statistical terms above. I use a somewhat simplified definition to make my general point clearer and more accessible, and I encourage the interested reader to research the exact definitions.

Another thought I've had is that, when it comes to clinical decision-making, "eyeballing" the data can often lead to more intuitively accurate interpretations than some kind of numerical statistical summary, provided the data are fairly represented (for example, on a clear graph which includes the point (0,0)). There is more information represented visually in a graph than in a single number which summarizes the graph, in the same way that there is more information in a photograph than in a number which summarizes some quality about the photograph.
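
A well-known illustration of this point is Anscombe's quartet, a set of small datasets constructed to share nearly identical summary statistics while looking completely different on a graph. The sketch below uses two of those datasets. The helper names (`pearson_r`, `text_plot`) and the crude text "graph" are my own assumptions for illustration.

```python
from math import sqrt

# Two of the four datasets from Anscombe's quartet (they share the same x values).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y_a = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y_b = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def mean(values):
    return sum(values) / len(values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient, a typical single-number summary."""
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sqrt(sum((a - mx) ** 2 for a in xs))
    sy = sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)


def text_plot(xs, ys):
    """Crude text graph: one row per point, sorted by x, star placed by y."""
    for a, b in sorted(zip(xs, ys)):
        print(f"{a:>2} |" + " " * round(b * 3) + "*")


for name, ys in (("A", y_a), ("B", y_b)):
    print(f"Dataset {name}: mean y = {mean(ys):.2f}, r = {pearson_r(x, ys):.3f}")
    text_plot(x, ys)
    print()
```

The two summaries are almost indistinguishable, yet the plots show a noisy straight-line trend in one case and a smooth curve in the other. The number alone would not tell you that.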

The biggest advantage of sophisticated statistical summaries lies in optimizing research resources, such that we can re-direct our attention away from treatments that work less well, and focus instead on treatments that work better, particularly if there are limited resources, and if a given treatment could determine survival (or not). Also, if there are abundant data, but little way of understanding them well, then a good statistical analysis can guide treatment decisions. It may help to choose the best chemotherapy drug for cancer, or the best regimen to manage a heart attack. For depression, though, and perhaps other mental illnesses, the statistical analyses can often add more "fuzziness" and distortion to clinical judgment, unless the reader has a sharp eye to recognize the many sources of bias.
