Behavior and Law - Blog by Michael Novakhov, M.D.

Monday, May 7, 2012

Dr. Dilip V. Jeste fell in love with psychiatry while growing up in a remote village in India. - General Psychiatry News

APA President-Elect Stresses Unity - Clinical Psychiatry News Digital Network

via international psychiatry - Google News on 5/7/12

APA President-Elect Stresses Unity
Clinical Psychiatry News Digital Network
PHILADELPHIA – Dr. Dilip V. Jeste fell in love with psychiatry while growing up in a remote village in India. "Becoming an APA member seemed like a dream," Dr. Jeste told a packed hotel ballroom May 6 during the opening session of the annual meeting of ...


APA President-Elect Stresses Unity : Clinical Psychiatry News

NaturalNews— Are your imperfect relationships a disease? Psychiatry - General Psychiatry News


NaturalNews— Are your imperfect relationships a disease? Psychiatry

via international psychiatry journals - Google Blog Search by admin on 5/7/12

The ever-expanding list of so-called psychiatric conditions included in the American Psychiatric Association's (APA) Diagnostic and Statistical Manual (DSM) may soon include relational disorders, or mental illnesses supposedly attributed to two or more people involved in a relationship together. According to the official definition, relational disorders are persistent and painful patterns of ... Citizens Commission on Human Rights International | Mental Health Watchdog ...

Abbott to Pay $1.6 Billion Over Illegal Marketing - General Psychiatry News


Abbott to Pay $1.6 Billion Over Illegal Marketing

via NYT > Health by MICHAEL S. SCHMIDT and KATIE THOMAS on 5/7/12

Abbott Laboratories said it reached an agreement with the federal and nearly all state governments to pay $1.6 billion in connection with its marketing of the anti-seizure drug Depakote.

DSM-5 Panel Tiptoes on Grief, Depression - MedPage Today - General Psychiatry News


DSM-5 Panel Tiptoes on Grief, Depression - MedPage Today

via psychiatric diagnosis - Google News on 5/7/12


DSM-5 Panel Tiptoes on Grief, Depression
MedPage Today
By John Gever, Senior Editor, MedPage Today PHILADELPHIA -- Psychiatrists writing new diagnostic criteria for major depression hope an explanatory note will mollify critics of a proposal to let patients grieving a dead loved one receive a formal ...

Study finds psychopaths have distinct brain structure - Reuters - General Psychiatry News


Study finds psychopaths have distinct brain structure - Reuters

via psychiatric diagnosis - Google News on 5/7/12

Study finds psychopaths have distinct brain structure
Reuters
The researchers, based at King's College London's Institute of Psychiatry, said the differences in psychopaths' brains mark them out even from other violent criminals with anti-social personality disorders (ASPD), and from healthy non-offenders.


Galen is able to demonstrate that living arteries contain blood. His error, which will become the established medical orthodoxy for centuries, is to assume that the blood goes back and forth from the heart in an ebb-and-flow motion. This theory holds sway in medical circles until the time of Harvey. - HISTORY OF MEDICINE

HISTORY OF MEDICINE

The influential errors of Galen: 2nd century AD

The newly appointed chief physician to the gladiators in Pergamum, in AD 158, is a native of the city. He is a Greek doctor by the name of Galen. The appointment gives him the opportunity to study wounds of all kinds. His knowledge of muscles enables him to warn his patients of the likely outcome of certain operations - a wise precaution recommended in Galen's Advice to doctors.

But it is Galen's dissection of apes and pigs which gives him the detailed information for his medical tracts on the organs of the body. Nearly 100 of these tracts survive. They become the basis of Galen's great reputation in medieval medicine, unchallenged until the anatomical work of Vesalius.

Through his experiments Galen is able to overturn many long-held beliefs, such as the theory (first proposed by the Hippocratic school in about 400 BC, and maintained even by the physicians of Alexandria) that the arteries contain air - carrying it to all parts of the body from the heart and the lungs. This belief is based originally on the arteries of dead animals, which appear to be empty.

Galen is able to demonstrate that living arteries contain blood. His error, which will become the established medical orthodoxy for centuries, is to assume that the blood goes back and forth from the heart in an ebb-and-flow motion. This theory holds sway in medical circles until the time of Harvey.
Read more: http://www.historyworld.net/wrldhis/PlainTextHistories.asp?groupid=474&HistoryID=aa52&gtrack=pthc#ixzz1uDtUF4Ls

Science-Based Medicine » Lessons from History of Medical Delusions

Lessons from History of Medical Delusions

Published by Brennen McKenzie under Book & movie reviews,General,History,Science and Medicine

A brief reference on the web site The Quackometer recently drew my attention to a very short book (really more of a pamphlet, in the historical sense) by Dr. Worthington Hooker, Lessons from the History of Medical Delusions, which I thought might be of interest to readers of this blog. Though published in 1850, the book contains many eloquent observations that are just as relevant to understanding how pseudoscience and quackery persist and even flourish in what we otherwise assume to be an age of scientific medicine. The book is available online as a Google eBook, and relatively cheap printed facsimiles are available as well.
Dr. Hooker was a physician, a professor at Yale, and an outspoken critic of homeopathy in its early days. His critique of homeopathy still resonates today, and has long drawn the ire of Hahnemann loyalists, such as this one, who makes reference to Dr. Hooker’s “periodical fulminations for the destruction of Homoeopathy that have appeared like locusts or cholera at certain dates.” Though Dr. Hooker wrote an entire book discussing homeopathy, Homeopathy: An Examination of its Doctrines and Evidences, he does spare a few words here for this less-than-venerated practice:

The error I have been illustrating is carried to an extreme by the Homeopathist. He attributes palpable results to doses of medicine which are so small that they cannot produce any perceptible effect except by miracle.

He also includes a lengthy and preposterous example of a homeopathic proving, taken from a homeopathic text of the time, illustrating the absurdity of simply listing every imaginable (and imagined) experience following the taking of a substance and then attributing the entire list to that substance in order to guide the selection and use of homeopathic remedies. However, the focus of this booklet is to illustrate more generally the sorts of errors in thinking that lead even otherwise intelligent and reasonable people to believe such nonsense.
And Hooker makes a specific point of reminding us that belief in medical absurdities is not by any means a characteristic only of the unintelligent, the uneducated or the past.

The history of medical delusions most copiously illustrates the truth, that folly is very far from being confined to fools.
The present generation laugh at the follies of the past but have quite as great follies of their own, and follies too of a similar character, and products of the same fundamental errors.
The majority [of believers in quackery] is made up of those who are more or less intelligent and rational on most subjects, but who…are especially deluded on the subject of medicine…The exposition I make is not a partial one. It is not a one-sided argument-a plea for the doctors against the people. But it is an attempt to show how both doctors and people have ever been liable to error, and how they have been alike in the common elements, if not in the forms and modes and fashions of their delusions.
The medical profession, like the community at large, is made up of fallible men, and the elements of delusion are the same in the one class as in the other [though] the error of the physician would be refined, and would have the pomp and circumstance of erudition.

Error gilded with the pomp and circumstance of erudition….That certainly brings a few names to mind, eh?
Some of the specific examples he uses are fine tidbits of historical minutiae. Apparently, one of the founding fathers of chemistry, Francis Bacon, that luminary of critical thinking and scientific philosophy, advocated for applying healing salves to the weapon that made a wound rather than the wound itself (though given the loathsome nature of many therapeutic unguents of the time, this may not have been a bad idea, since applying them to wounds doesn’t sound wise).
So what are the common “elements of delusion” that Hooker wishes to warn us of? He begins with the post hoc ergo propter hoc fallacy.

The first [element] which I shall notice is the too ready disposition to consider whatever follows a cause as being the result of that cause.

He then points out the most obvious reason why this sort of reasoning so often misleads us in medicine:

The most important of the confounding causes is vis medicatrix naturae, or the tendency there is in the system to remove disease and cure itself….there is in the system a tendency to spontaneous restoration in case of injury and disease…This tendency is the chief agency in most cases in curing disease. Sometimes it is the only one; and very often it effects a cure in spite of the mistaken and officious interference of art.
And yet quacks, and even physicians, and the public generally, are very prone to leave this agency out of view, and to attribute cures, as a matter of course, entirely to some favorite remedy which has been used. This disposition is the chief source of medical errors of all classes of men.

Hooker also touches on several other key sources of erroneous conclusions in evaluating medical theories, including confirmation bias, availability bias, anchoring, premature closure, sloppy use of analogous reasoning, passionate commitment to theories without empirical evidence, and medical fads, though all are described in language rather more poetic than we would ordinarily use today.
He then goes on to discuss the commercial and political success of medical nonsense, issues that are certainly still relevant and often discussed here.

So extensive is the popular delusion in regard to quack medicines, that the nostrum system has become an organized system, with an enormous machinery of certificates and advertisements. It has become a monstrous business interest, and is linked in with a thousand ties with other business interests. So powerful is it in this respect, that it has almost entirely subsidized the press, forcing it to be silent except when it speaks in its favor. The same may be substantially said when speaking of the action of legislatures on this subject.

Similarly, Hooker touches on the unfortunate aura of legitimacy that attaches to quack therapies when they are embraced by what he calls “medical men in good standing,” which could certainly be applied to the quackademic medicine phenomenon and the endorsement of medical nonsense by the likes of Dr. Oz and others.
Despite the eloquent expression of many issues associated with medical nonsense that are as relevant today as they were in 1850, not all of Dr. Hooker’s book will resonate with a modern audience. Apart from the florid prose style of the time, and the unabashedly sexist language, he scoffs a bit at “the skeptic,” whom he describes as sitting in “his ‘doubting castle’ well-fortified against all the shafts of truth.” He also was a fan of bloodletting as a remedy, and sneered at the research of Pierre Charles Alexandre Louis and others who demonstrated its lack of effect. In general, he was no fan of the “numerical” methods which have since developed into epidemiology, and he was overly respectful of the experience and judgment of individual doctors. Citing the same sloppy reasoning as is often used by modern proponents of alternative therapies, he argues that such “numerical observations…can be of no practical use to the physician in deciding in regard to any individual case…”
However, as a whole this little historical gem is strikingly applicable to the issues this blog deals with today. And it ends with a nice description of the gradual and imperfect process of vetting ideas through scientific inquiry, from initial unjustified enthusiasm to a gradual withering of bad ideas and a fitting of good ones into their appropriate but limited places.

While many remedies, once potent to cure in the public estimation, have….been wholly discarded, others, which have more real merit, while they have lost the extravagant reputation of their nascent state, have, under the watchful eye of experience, gradually obtained very nearly their right valuation, and the circumstances which should regulate their use have been ascertained with considerable accuracy. Others, in great numbers, are now going through this searching process; and others still are just now wearing the brilliant honors of an enthusiastic reception.

He also suggests, mistakenly I hope, that direct attacks on medical nonsense rarely have a salutary impact on the popularity of such practices. However, he also describes with some hopefulness the goal of his book, which I think to some extent describes the purpose of this blog as well.

No delusion however fiercely it may have been attacked was ever killed. Each after having withstood all assaults, has laid itself down to die in the most quiet manner, benumbed into the sleep of death by the chill of popular neglect, while the warm breezes of popular favor which it once enjoyed are now bestowed upon some other delusion…
And such exposition as this essay presents, of the common causes of medical delusion, both in the profession and in the community, will, I believe, commend itself to the reason and common sense of such persons, and will therefore have some influence, in connection with other kindred efforts, in deterring them from giving their patronage to quackery in any form…

Why Most Published Research Findings Are False


PLoS Med. 2005 August; 2(8): e124.

Published online 2005 August 30. doi: 10.1371/journal.pmed.0020124

PMCID: PMC1182327

Copyright : © 2005 John P. A. Ioannidis. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Why Most Published Research Findings Are False

John P. A. Ioannidis

John P. A. Ioannidis is in the Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece, and Institute for Clinical Research and Health Policy Studies, Department of Medicine, Tufts-New England Medical Center, Tufts University School of Medicine, Boston, Massachusetts, United States of America. E-mail: jioannid@cc.uoi.gr

Competing Interests: The author has declared that no competing interests exist.

See "Minimizing Mistakes and Embracing Uncertainty", e272.

See "Truth, Probability, and Frameworks", e361.

See "Power, Reliability, and Heterogeneous Results", e386.

See "The Clinical Interpretation of Research", e395.

See "Author's Reply", e398.

See "Why Most Published Research Findings Are False: Problems in the Analysis" in volume 4, e168.

See "Why Most Published Research Findings Are False: Author's Reply to Goodman and Greenland" in volume 4, e215.

See "Why Current Publication Practices May Distort Science" in volume 5, e201.


Abstract

Summary

There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. In this essay, I discuss the implications of these problems for the conduct and interpretation of research.

Published research findings are sometimes refuted by subsequent evidence, with ensuing confusion and disappointment. Refutation and controversy is seen across the range of research designs, from clinical trials and traditional epidemiological studies to the most modern molecular research. There is increasing concern that in modern research, false findings may be the majority or even the vast majority of published research claims. However, this should not be surprising. It can be proven that most claimed research findings are false. Here I will examine the key factors that influence this problem and some corollaries thereof.

Modeling the Framework for False Positive Findings

Several methodologists have pointed out that the high rate of nonreplication (lack of confirmation) of research discoveries is a consequence of the convenient, yet ill-founded strategy of claiming conclusive research findings solely on the basis of a single study assessed by formal statistical significance, typically for a p-value less than 0.05. Research is not most appropriately represented and summarized by p-values, but, unfortunately, there is a widespread notion that medical research articles should be interpreted based only on p-values. Research findings are defined here as any relationship reaching formal statistical significance, e.g., effective interventions, informative predictors, risk factors, or associations. “Negative” research is also very useful. “Negative” is actually a misnomer, and the misinterpretation is widespread. However, here we will target relationships that investigators claim exist, rather than null findings.


As has been shown previously, the probability that a research finding is indeed true depends on the prior probability of it being true (before doing the study), the statistical power of the study, and the level of statistical significance. Consider a 2 × 2 table in which research findings are compared against the gold standard of true relationships in a scientific field. In a research field both true and false hypotheses can be made about the presence of relationships. Let R be the ratio of the number of “true relationships” to “no relationships” among those tested in the field. R is characteristic of the field and can vary a lot depending on whether the field targets highly likely relationships or searches for only one or a few true relationships among thousands and millions of hypotheses that may be postulated. Let us also consider, for computational simplicity, circumscribed fields where either there is only one true relationship (among many that can be hypothesized) or the power is similar to find any of the several existing true relationships. The pre-study probability of a relationship being true is R/(R + 1). The probability of a study finding a true relationship reflects the power 1 - β (one minus the Type II error rate). The probability of claiming a relationship when none truly exists reflects the Type I error rate, α. Assuming that c relationships are being probed in the field, the expected values of the 2 × 2 table are given in Table 1. After a research finding has been claimed based on achieving formal statistical significance, the post-study probability that it is true is the positive predictive value, PPV. The PPV is also the complementary probability of what Wacholder et al. have called the false positive report probability. According to the 2 × 2 table, one gets PPV = (1 - β)R/(R - βR + α). A research finding is thus more likely true than false if (1 - β)R > α. Since usually the vast majority of investigators depend on α = 0.05, this means that a research finding is more likely true than false if (1 - β)R > 0.05.
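The construction above can be checked numerically. The following is a minimal sketch, not taken from the article: it builds the expected cell counts of the 2 × 2 table directly from the definitions in the text (c probed relationships, pre-study probability R/(R + 1), power 1 - β, Type I error rate α) and confirms that the ratio of true positives to all claimed findings matches the stated PPV formula. The function names and the illustrative values c = 1000, R = 0.25, β = 0.2 are assumptions chosen for the example.

```python
def expected_table(c, R, alpha, beta):
    """Expected cell counts of the 2 x 2 table for c probed relationships."""
    true_rel = c * R / (R + 1)   # relationships that truly exist
    no_rel = c / (R + 1)         # relationships that do not exist
    return {
        ("finding", "true"): (1 - beta) * true_rel,     # true positives
        ("finding", "none"): alpha * no_rel,            # false positives
        ("no finding", "true"): beta * true_rel,        # false negatives
        ("no finding", "none"): (1 - alpha) * no_rel,   # true negatives
    }


def ppv(R, alpha, beta):
    """PPV = (1 - beta)R / (R - beta*R + alpha), as stated in the text."""
    return (1 - beta) * R / (R - beta * R + alpha)


def more_likely_true_than_false(R, alpha, beta):
    """The text's condition for PPV > 0.5: (1 - beta)R > alpha."""
    return (1 - beta) * R > alpha


# Hypothetical field: 1000 probed relationships, 1 true per 4 null, 80% power.
table = expected_table(c=1000, R=0.25, alpha=0.05, beta=0.2)
claimed = table[("finding", "true")] + table[("finding", "none")]
ppv_from_table = table[("finding", "true")] / claimed

print(ppv_from_table, ppv(0.25, 0.05, 0.2))
print(more_likely_true_than_false(0.25, 0.05, 0.2))
```

Under these assumed values, 200 of the 1000 relationships are true and 800 are not, so about 160 true positives and 40 false positives are expected, and both routes give a PPV of 0.8.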

Table 1

Research Findings and True Relationships

What is less well appreciated is that bias and the extent of repeated independent testing by different teams of investigators around the globe may further distort this picture and may lead to even smaller probabilities of the research findings being indeed true. We will try to model these two factors in the context of similar 2 × 2 tables.

Bias

First, let us define bias as the combination of various design, data, analysis, and presentation factors that tend to produce research findings when they should not be produced. Let u be the proportion of probed analyses that would not have been “research findings,” but nevertheless end up presented and reported as such, because of bias. Bias should not be confused with chance variability that causes some findings to be false by chance even though the study design, data, analysis, and presentation are perfect. Bias can entail manipulation in the analysis or reporting of findings. Selective or distorted reporting is a typical form of such bias. We may assume that u does not depend on whether a true relationship exists or not. This is not an unreasonable assumption, since typically it is impossible to know which relationships are indeed true. In the presence of bias (Table 2), one gets PPV = ([1 − β]R + uβR)/(R + α − βR + u − uα + uβR), and PPV decreases with increasing u, unless 1 − β ≤ α, i.e., 1 − β ≤ 0.05 for most situations. Thus, with increasing bias, the chances that a research finding is true diminish considerably. This is shown for different levels of power and for different pre-study odds in Figure 1. Conversely, true research findings may occasionally be annulled because of reverse bias. For example, with large measurement errors relationships are lost in noise [12], or investigators use data inefficiently or fail to notice statistically significant relationships, or there may be conflicts of interest that tend to “bury” significant findings. There is no good large-scale empirical evidence on how frequently such reverse bias may occur across diverse research fields. However, it is probably fair to say that reverse bias is not as common. Moreover, measurement errors and inefficient use of data are probably becoming less frequent problems, since measurement error has decreased with technological advances in the molecular era and investigators are becoming increasingly sophisticated about their data. Regardless, reverse bias may be modeled in the same way as bias above. Also, reverse bias should not be confused with chance variability that may lead to missing a true relationship because of chance.
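As a rough illustration of how bias erodes the post-study probability, the sketch below evaluates the bias formula quoted in the paragraph above for a few values of u. The function name and the parameter values (R = 0.25, α = 0.05, β = 0.2) are assumptions made for the example, not figures from the article.

```python
def ppv_with_bias(R, alpha, beta, u):
    """PPV in the presence of bias u, per the formula quoted in the text:
    ([1 - beta]R + u*beta*R) / (R + alpha - beta*R + u - u*alpha + u*beta*R)
    """
    numerator = (1 - beta) * R + u * beta * R
    denominator = R + alpha - beta * R + u - u * alpha + u * beta * R
    return numerator / denominator


# Hypothetical field with pre-study odds 1:4 and 80% power.
R, alpha, beta = 0.25, 0.05, 0.2
bias_levels = [0.0, 0.1, 0.3, 0.5]
ppv_by_bias = [ppv_with_bias(R, alpha, beta, u) for u in bias_levels]

for u, value in zip(bias_levels, ppv_by_bias):
    print(f"u = {u:.1f}  PPV = {value:.3f}")
```

With u = 0 the expression reduces to the unbiased formula, and for these assumed inputs the PPV falls steadily as u grows, in line with the text's claim for studies whose power exceeds α.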

Table 2

Research Findings and True Relationships in the Presence of Bias

Figure 1

PPV (Probability That a Research Finding Is True) as a Function of the Pre-Study Odds for Various Levels of Bias, u

Testing by Several Independent Teams

Several independent teams may be addressing the same sets of research questions. As research efforts are globalized, it is practically the rule that several research teams, often dozens of them, may probe the same or similar questions. Unfortunately, in some areas, the prevailing mentality until now has been to focus on isolated discoveries by single teams and interpret research experiments in isolation. An increasing number of questions have at least one study claiming a research finding, and this receives unilateral attention. The probability that at least one study, among several done on the same question, claims a statistically significant research finding is easy to estimate. For n independent studies of equal power, the 2 × 2 table is shown in Table 3: PPV = R(1 − βⁿ)/(R + 1 − [1 − α]ⁿ − Rβⁿ) (not considering bias). With increasing number of independent studies, PPV tends to decrease, unless 1 − β < α, i.e., typically 1 − β < 0.05. This is shown for different levels of power and for different pre-study odds in Figure 2. For n studies of different power, the term βⁿ is replaced by the product of the terms β_i for i = 1 to n, but inferences are similar.
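The multiple-studies formula can be explored the same way. This is a minimal sketch under assumed inputs (R = 0.25, α = 0.05, β = 0.2); the function name is hypothetical. It evaluates the expression quoted above for an increasing number of equally powered independent studies.

```python
def ppv_multiple_studies(R, alpha, beta, n):
    """PPV when at least one of n independent, equally powered studies
    claims a finding: R(1 - beta^n) / (R + 1 - (1 - alpha)^n - R*beta^n)."""
    numerator = R * (1 - beta ** n)
    denominator = R + 1 - (1 - alpha) ** n - R * beta ** n
    return numerator / denominator


# Hypothetical field with pre-study odds 1:4 and 80% power per study.
R, alpha, beta = 0.25, 0.05, 0.2
study_counts = [1, 5, 10, 50]
ppv_by_n = [ppv_multiple_studies(R, alpha, beta, n) for n in study_counts]

for n, value in zip(study_counts, ppv_by_n):
    print(f"n = {n:2d}  PPV = {value:.3f}")
```

For n = 1 the expression collapses to the single-study PPV, and for these assumed inputs the value declines as n increases, consistent with the statement that PPV tends to decrease unless power is below α.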

Table 3

Research Findings and True Relationships in the Presence of Multiple Studies

Figure 2

PPV (Probability That a Research Finding Is True) as a Function of the Pre-Study Odds for Various Numbers of Conducted Studies, n

Corollaries

A practical example is shown in Box 1. Based on the above considerations, one may deduce several interesting corollaries about the probability that a research finding is indeed true.

Box 1. An Example: Science at Low Pre-Study Odds

Let us assume that a team of investigators performs a whole genome association study to test whether any of 100,000 gene polymorphisms are associated with susceptibility to schizophrenia. Based on what we know about the extent of heritability of the disease, it is reasonable to expect that probably around ten gene polymorphisms among those tested would be truly associated with schizophrenia, with relatively similar odds ratios around 1.3 for the ten or so polymorphisms and with a fairly similar power to identify any of them. Then R = 10/100,000 = 10⁻⁴, and the pre-study probability for any polymorphism to be associated with schizophrenia is also R/(R + 1) = 10⁻⁴. Let us also suppose that the study has 60% power to find an association with an odds ratio of 1.3 at α = 0.05. Then it can be estimated that if a statistically significant association is found with the p-value barely crossing the 0.05 threshold, the post-study probability that this is true increases about 12-fold compared with the pre-study probability, but it is still only 12 × 10⁻⁴.

Now let us suppose that the investigators manipulate their design, analyses, and reporting so as to make more relationships cross the p = 0.05 threshold even though this would not have been crossed with a perfectly adhered to design and analysis and with perfect comprehensive reporting of the results, strictly according to the original study plan. Such manipulation could be done, for example, with serendipitous inclusion or exclusion of certain patients or controls, post hoc subgroup analyses, investigation of genetic contrasts that were not originally specified, changes in the disease or control definitions, and various combinations of selective or distorted reporting of the results. Commercially available “data mining” packages actually are proud of their ability to yield statistically significant results through data dredging. In the presence of bias with u = 0.10, the post-study probability that a research finding is true is only 4.4 × 10⁻⁴. Furthermore, even in the absence of any bias, when ten independent research teams perform similar experiments around the world, if one of them finds a formally statistically significant association, the probability that the research finding is true is only 1.5 × 10⁻⁴, hardly any higher than the probability we had before any of this extensive research was undertaken!
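The first two figures in Box 1 can be reproduced by plugging the stated values (R = 10⁻⁴, 60% power, α = 0.05, u = 0.10) into the single-study and bias formulas given earlier in the essay. The sketch below does only that; the function and variable names are assumptions introduced for the example.

```python
def ppv(R, alpha, beta):
    """Single-study PPV: (1 - beta)R / (R - beta*R + alpha)."""
    return (1 - beta) * R / (R - beta * R + alpha)


def ppv_with_bias(R, alpha, beta, u):
    """PPV in the presence of bias u, per the essay's bias formula."""
    numerator = (1 - beta) * R + u * beta * R
    denominator = R + alpha - beta * R + u - u * alpha + u * beta * R
    return numerator / denominator


R = 1e-4       # about ten true polymorphisms among 100,000 tested
alpha = 0.05   # significance threshold
beta = 0.40    # 60% power, so the Type II error rate is 0.40

prior = R / (R + 1)                    # pre-study probability
unbiased = ppv(R, alpha, beta)         # post-study probability, no bias
fold_increase = unbiased / prior       # gain over the pre-study probability
biased = ppv_with_bias(R, alpha, beta, u=0.10)

print(f"pre-study probability:  {prior:.2e}")
print(f"PPV without bias:       {unbiased:.2e}")
print(f"fold increase:          {fold_increase:.1f}")
print(f"PPV with u = 0.10:      {biased:.2e}")
```

Rounded to two significant figures, the unbiased PPV comes out at about 12 × 10⁻⁴, roughly a 12-fold gain over the pre-study probability, and the biased PPV at about 4.4 × 10⁻⁴, matching the values quoted in the box.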

Corollary 1: The smaller the studies conducted in a scientific field, the less likely the research findings are to be true. Small sample size means smaller power and, for all functions above, the PPV for a true research finding decreases as power decreases towards 1 − β = 0.05. Thus, other factors being equal, research findings are more likely true in scientific fields that undertake large studies, such as randomized controlled trials in cardiology (several thousand subjects randomized) than in scientific fields with small studies, such as most research of molecular predictors (sample sizes 100-fold smaller).
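Corollary 1 can be illustrated with a short sweep over power. The sketch below is an assumption-laden example rather than anything from the article: it fixes hypothetical pre-study odds R = 0.25 and α = 0.05, rewrites the single-study formula in terms of power (the denominator R − βR + α equals power × R + α), and shows that PPV shrinks as power drops, reaching the pre-study probability R/(R + 1) when power equals α.

```python
def ppv_from_power(R, alpha, power):
    """Single-study PPV written in terms of power = 1 - beta.
    Algebraically the same as (1 - beta)R / (R - beta*R + alpha)."""
    return power * R / (power * R + alpha)


R, alpha = 0.25, 0.05              # hypothetical pre-study odds and threshold
powers = [0.80, 0.50, 0.20, 0.05]  # from a large study down to power = alpha
ppv_by_power = [ppv_from_power(R, alpha, p) for p in powers]
prior = R / (R + 1)                # pre-study probability

for p, value in zip(powers, ppv_by_power):
    print(f"power = {p:.2f}  PPV = {value:.3f}")
print(f"pre-study probability = {prior:.3f}")
```

At power equal to α a significant result carries no information under this model, which is why the last PPV in the sweep coincides with the pre-study probability.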

Corollary 2: The smaller the effect sizes in a scientific field, the less likely the research findings are to be true. Power is also related to the effect size. Thus research findings are more likely true in scientific fields with large effects, such as the impact of smoking on cancer or cardiovascular disease (relative risks 3–20), than in scientific fields where postulated effects are small, such as genetic risk factors for multigenetic diseases (relative risks 1.1–1.5). Modern epidemiology is increasingly obliged to target smaller effect sizes. Consequently, the proportion of true research findings is expected to decrease. In the same line of thinking, if the true effect sizes are very small in a scientific field, this field is likely to be plagued by almost ubiquitous false positive claims. For example, if the majority of true genetic or nutritional determinants of complex diseases confer relative risks less than 1.05, genetic or nutritional epidemiology would be largely utopian endeavors.

Corollary 3: The greater the number and the lesser the selection of tested relationships in a scientific field, the less likely the research findings are to be true. As shown above, the post-study probability that a finding is true (PPV) depends a lot on the pre-study odds (R). Thus, research findings are more likely true in confirmatory designs, such as large phase III randomized controlled trials, or meta-analyses thereof, than in hypothesis-generating experiments. Fields considered highly informative and creative given the wealth of the assembled and tested information, such as microarrays and other high-throughput discovery-oriented research, should have extremely low PPV.

Corollary 4: The greater the flexibility in designs, definitions, outcomes, and analytical modes in a scientific field, the less likely the research findings are to be true. Flexibility increases the potential for transforming what would be “negative” results into “positive” results, i.e., bias, u. For several research designs, e.g., randomized controlled trials or meta-analyses, there have been efforts to standardize their conduct and reporting. Adherence to common standards is likely to increase the proportion of true findings. The same applies to outcomes. True findings may be more common when outcomes are unequivocal and universally agreed (e.g., death) rather than when multifarious outcomes are devised (e.g., scales for schizophrenia outcomes). Similarly, fields that use commonly agreed, stereotyped analytical methods (e.g., Kaplan-Meier plots and the log-rank test) may yield a larger proportion of true findings than fields where analytical methods are still under experimentation (e.g., artificial intelligence methods) and only “best” results are reported. Regardless, even in the most stringent research designs, bias seems to be a major problem. For example, there is strong evidence that selective outcome reporting, with manipulation of the outcomes and analyses reported, is a common problem even for randomized trials. Simply abolishing selective publication would not make this problem go away.

Corollary 5: The greater the financial and other interests and prejudices in a scientific field, the less likely the research findings are to be true. Conflicts of interest and prejudice may increase bias, u. Conflicts of interest are very common in biomedical research, and typically they are inadequately and sparsely reported. Prejudice may not necessarily have financial roots. Scientists in a given field may be prejudiced purely because of their belief in a scientific theory or commitment to their own findings. Many otherwise seemingly independent, university-based studies may be conducted for no other reason than to give physicians and researchers qualifications for promotion or tenure. Such nonfinancial conflicts may also lead to distorted reported results and interpretations. Prestigious investigators may suppress via the peer review process the appearance and dissemination of findings that refute their findings, thus condemning their field to perpetuate false dogma. Empirical evidence on expert opinion shows that it is extremely unreliable.

Corollary 6: The hotter a scientific field (with more scientific teams involved), the less likely the research findings are to be true. This seemingly paradoxical corollary follows because, as stated above, the PPV of isolated findings decreases when many teams of investigators are involved in the same field. This may explain why we occasionally see major excitement followed rapidly by severe disappointments in fields that draw wide attention. With many teams working in the same field and with massive experimental data being produced, timing is of the essence in beating competition. Thus, each team may prioritize pursuing and disseminating its most impressive “positive” results. “Negative” results may become attractive for dissemination only if some other team has found a “positive” association on the same question. In that case, it may be attractive to refute a claim made in some prestigious journal. The term Proteus phenomenon has been coined to describe this pattern of rapidly alternating extreme research claims and extremely opposite refutations []. Empirical evidence suggests that this sequence of extreme opposites is very common in molecular genetics [].
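The drop in PPV with the number of competing teams can be illustrated numerically. The sketch below assumes that, for n independent studies of equal power and no bias, the framework's expression has the form PPV = R(1 − β^n) / (R + 1 − (1 − α)^n − Rβ^n). The function name, the choice of α = 0.05, and the example inputs are illustrative assumptions, not values taken from the text.

```python
def ppv_many_teams(power, R, n, alpha=0.05):
    """PPV of a claimed finding when n independent teams probe the same question.

    Assumed form (no bias, equal power for every team):
        PPV = R(1 - beta^n) / (R + 1 - (1 - alpha)^n - R * beta^n)
    """
    beta = 1.0 - power
    numerator = R * (1.0 - beta ** n)
    denominator = R + 1.0 - (1.0 - alpha) ** n - R * beta ** n
    return numerator / denominator


# Hypothetical setting: pre-study odds R = 1:1 and power 0.80 for each team.
for n in (1, 2, 5, 10):
    print(f"n = {n:2d} teams: PPV = {ppv_many_teams(0.80, 1.0, n):.2f}")
```

Under these assumed inputs, PPV falls from about 0.94 for a single team to about 0.71 for ten teams, because the chance that at least one team reports a false positive grows with n.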

These corollaries consider each factor separately, but these factors often influence each other. For example, investigators working in fields where true effect sizes are perceived to be small may be more likely to perform large studies than investigators working in fields where true effect sizes are perceived to be large. Or prejudice may prevail in a hot scientific field, further undermining the predictive value of its research findings. Highly prejudiced stakeholders may even create a barrier that aborts efforts at obtaining and disseminating opposing results. Conversely, the fact that a field is hot or has strong invested interests may sometimes promote larger studies and improved standards of research, enhancing the predictive value of its research findings. Or massive discovery-oriented testing may result in such a large yield of significant relationships that investigators have enough to report and search further and thus refrain from data dredging and manipulation.

Most Research Findings Are False for Most Research Designs and for Most Fields

In the described framework, a PPV exceeding 50% is quite difficult to obtain. Table 4 provides the results of simulations using the formulas developed for the influence of power, ratio of true to non-true relationships, and bias, for various types of situations that may be characteristic of specific study designs and settings. A finding from a well-conducted, adequately powered randomized controlled trial starting with a 50% pre-study chance that the intervention is effective is eventually true about 85% of the time. A fairly similar performance is expected of a confirmatory meta-analysis of good-quality randomized trials: potential bias probably increases, but power and pre-test chances are higher compared to a single randomized trial. Conversely, a meta-analytic finding from inconclusive studies where pooling is used to “correct” the low power of single studies is probably false if R ≤ 1:3. Research findings from underpowered, early-phase clinical trials would be true about one in four times, or even less frequently if bias is present. Epidemiological studies of an exploratory nature perform even worse, especially when underpowered, but even well-powered epidemiological studies may have only a one in five chance of being true, if R = 1:10. Finally, in discovery-oriented research with massive testing, where tested relationships exceed true ones 1,000-fold (e.g., 30,000 genes tested, of which 30 may be the true culprits) [,], PPV for each claimed relationship is extremely low, even with considerable standardization of laboratory and statistical methods, outcomes, and reporting thereof to minimize bias.
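The scenario figures quoted above can be approximated with a few lines of code. The sketch below assumes the bias-adjusted formula has the form PPV = ((1 − β)R + uβR) / (R + α − βR + u − uα + uβR) with α = 0.05. The power and bias values attached to each scenario are illustrative assumptions chosen to fit the verbal descriptions; only the R values of 1:1 (a 50% pre-study chance) and 1:10 are stated in the text.

```python
def ppv_with_bias(power, R, u, alpha=0.05):
    """Post-study probability that a claimed finding is true.

    power: 1 - beta, R: pre-study odds of a true relationship,
    u: proportion of analyses turned "positive" by bias alone.
    """
    beta = 1.0 - power
    claimed_and_true = power * R + u * beta * R
    all_claimed = R + alpha - beta * R + u - u * alpha + u * beta * R
    return claimed_and_true / all_claimed


# (power, R, u) per scenario; power and u are assumed, not quoted from the text.
scenarios = {
    "adequately powered RCT": (0.80, 1.0, 0.10),
    "confirmatory meta-analysis": (0.95, 2.0, 0.30),
    "underpowered early-phase trial": (0.20, 1.0 / 5.0, 0.20),
    "well-powered epidemiological study": (0.80, 1.0 / 10.0, 0.30),
}

for name, (power, R, u) in scenarios.items():
    print(f"{name}: PPV = {ppv_with_bias(power, R, u):.2f}")
```

With these assumed inputs the four scenarios come out at roughly 0.85, 0.85, 0.23, and 0.20, in line with the "about 85%", "one in four", and "one in five" figures in the paragraph above.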

Table 4

PPV of Research Findings for Various Combinations of Power (1 - β), Ratio of True to Not-True Relationships (R), and Bias (u)

Claimed Research Findings May Often Be Simply Accurate Measures of the Prevailing Bias

As shown, the majority of modern biomedical research is operating in areas with very low pre- and post-study probability for true findings. Let us suppose that in a research field there are no true findings at all to be discovered. The history of science teaches us that scientific endeavor has often in the past wasted effort in fields with absolutely no yield of true scientific information, at least based on our current understanding. In such a “null field,” one would ideally expect all observed effect sizes to vary by chance around the null in the absence of bias. The extent to which observed findings deviate from what is expected by chance alone would be simply a pure measure of the prevailing bias.

For example, let us suppose that no nutrients or dietary patterns are actually important determinants for the risk of developing a specific tumor. Let us also suppose that the scientific literature has examined 60 nutrients and claims all of them to be related to the risk of developing this tumor with relative risks in the range of 1.2 to 1.4 for the comparison of the upper to lower intake tertiles. Then the claimed effect sizes are simply measuring nothing else but the net bias that has been involved in the generation of this scientific literature. Claimed effect sizes are in fact the most accurate estimates of the net bias. It even follows that between “null fields,” the fields that claim stronger effects (often with accompanying claims of medical or public health importance) are simply those that have sustained the worst biases.
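A small simulation may make the nutrient example concrete. The sketch below is a toy model, not anything specified in the text: it assumes each claimed log relative risk is a normal draw centered on the net bias, with a hypothetical standard error of 0.10, and it picks a relative risk of 1.3 as the middle of the 1.2 to 1.4 range. Every true effect is set to zero, so whatever the average claimed effect turns out to be, it reflects bias plus chance.

```python
import math
import random
import statistics


def simulate_null_field(n_exposures, net_bias_log_rr, se_log_rr, seed=0):
    """Claimed log relative risks in a field where every true effect is null.

    Each estimate is noise around the net bias (toy assumption: normal noise).
    """
    rng = random.Random(seed)
    return [rng.gauss(net_bias_log_rr, se_log_rr) for _ in range(n_exposures)]


# 60 nutrients, none truly related to the tumor (true log RR = 0 for all).
unbiased = simulate_null_field(60, 0.0, 0.10)
biased = simulate_null_field(60, math.log(1.3), 0.10)

for label, log_rrs in (("no bias", unbiased), ("net bias present", biased)):
    average_rr = math.exp(statistics.fmean(log_rrs))
    print(f"{label}: average claimed relative risk = {average_rr:.2f}")
```

In the unbiased run the average claimed relative risk sits close to 1, while in the biased run it sits close to the bias that was injected, which is the sense in which claimed effect sizes in a null field estimate the net bias.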

For fields with very low PPV, the few true relationships would not distort this overall picture much. Even if a few relationships are true, the shape of the distribution of the observed effects would still yield a clear measure of the biases involved in the field. This concept totally reverses the way we view scientific results. Traditionally, investigators have viewed large and highly significant effects with excitement, as signs of important discoveries. Too large and too highly significant effects may actually be more likely to be signs of large bias in most fields of modern research. They should lead investigators to careful critical thinking about what might have gone wrong with their data, analyses, and results.

Of course, investigators working in any field are likely to resist accepting that the whole field in which they have spent their careers is a “null field.” However, other lines of evidence, or advances in technology and experimentation, may lead eventually to the dismantling of a scientific field. Obtaining measures of the net bias in one field may also be useful for obtaining insight into what might be the range of bias operating in other fields where similar analytical methods, technologies, and conflicts may be operating.

How Can We Improve the Situation?

Is it unavoidable that most research findings are false, or can we improve the situation? A major problem is that it is impossible to know with 100% certainty what the truth is in any research question. In this regard, the pure “gold” standard is unattainable. However, there are several approaches to improve the post-study probability.

First, better powered evidence, e.g., large studies or low-bias meta-analyses, may help, as it comes closer to the unknown “gold” standard. However, large studies may still have biases and these should be acknowledged and avoided. Moreover, large-scale evidence is impossible to obtain for all of the millions and trillions of research questions posed in current research. Large-scale evidence should be targeted for research questions where the pre-study probability is already considerably high, so that a significant research finding will lead to a post-test probability that would be considered quite definitive. Large-scale evidence is also particularly indicated when it can test major concepts rather than narrow, specific questions. A negative finding can then refute not only a specific proposed claim, but a whole field or considerable portion thereof. Selecting the performance of large-scale studies based on narrow-minded criteria, such as the marketing promotion of a specific drug, is largely wasted research. Moreover, one should be cautious that extremely large studies may be more likely to find a formally statistically significant difference for a trivial effect that is not really meaningfully different from the null [32–34].
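The caution about extremely large studies can be checked with simple arithmetic. The sketch below assumes a two-arm comparison of means with equal arm sizes, a known standard deviation, and a normal (z-test) approximation. The effect size of 0.01 standard deviations and the two sample sizes are hypothetical values chosen only for illustration.

```python
import math


def two_sided_p_value(mean_difference, sd, n_per_arm):
    """Two-sided p-value from a z-test comparing two equal-sized arms."""
    standard_error = sd * math.sqrt(2.0 / n_per_arm)
    z = mean_difference / standard_error
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


# The same trivial effect (0.01 SD) tested at two hypothetical study sizes.
for n_per_arm in (1_000, 1_000_000):
    p = two_sided_p_value(0.01, 1.0, n_per_arm)
    print(f"n per arm = {n_per_arm:>9,}: p = {p:.3g}")
```

With 1,000 participants per arm the p-value is well above 0.05, while with 1,000,000 per arm the same negligible difference is far below any conventional threshold, although its practical size has not changed.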

Second, most research questions are addressed by many teams, and it is misleading to emphasize the statistically significant findings of any single team. What matters is the totality of the evidence. Diminishing bias through enhanced research standards and curtailing of prejudices may also help. However, this may require a change in scientific mentality that might be difficult to achieve. In some research designs, efforts may also be more successful with upfront registration of studies, e.g., randomized trials []. Registration would pose a challenge for hypothesis-generating research. Some kind of registration or networking of data collections or investigators within fields may be more feasible than registration of each and every hypothesis-generating experiment. Regardless, even if we do not see a great deal of progress with registration of studies in other fields, the principles of developing and adhering to a protocol could be more widely borrowed from randomized controlled trials.

Finally, instead of chasing statistical significance, we should improve our understanding of the range of R values—the pre-study odds—where research efforts operate []. Before running an experiment, investigators should consider what they believe the chances are that they are testing a true rather than a non-true relationship. Speculated high R values may sometimes then be ascertained. As described above, whenever ethically acceptable, large studies with minimal bias should be performed on research findings that are considered relatively established, to see how often they are indeed confirmed. I suspect several established “classics” will fail the test [].

Nevertheless, most new discoveries will continue to stem from hypothesis-generating research with low or very low pre-study odds. We should then acknowledge that statistical significance testing in the report of a single study gives only a partial picture, without knowing how much testing has been done outside the report and in the relevant field at large. Despite a large statistical literature for multiple testing corrections [], usually it is impossible to decipher how much data dredging by the reporting authors or other research teams has preceded a reported research finding. Even if determining this were feasible, this would not inform us about the pre-study odds. Thus, it is unavoidable that one should make approximate assumptions on how many relationships are expected to be true among those probed across the relevant research fields and research designs. The wider field may yield some guidance for estimating this probability for the isolated research project. Experiences from biases detected in other neighboring fields would also be useful to draw upon. Even though these assumptions would be considerably subjective, they would still be very useful in interpreting research claims and putting them in context.

Abbreviation


PPV	positive predictive value

Footnotes

Citation: Ioannidis JPA (2005) Why most published research findings are false. PLoS Med 2(8): e124.

References

Ioannidis JP, Haidich AB, Lau J. Any casualties in the clash of randomised and observational evidence? BMJ. 2001;322:879–880.
Lawlor DA, Davey Smith G, Kundu D, Bruckdorfer KR, Ebrahim S. Those confounded vitamins: What can we learn from the differences between observational versus randomised trial evidence? Lancet. 2004;363:1724–1727.
Vandenbroucke JP. When are observational studies as credible as randomised trials? Lancet. 2004;363:1728–1731.
Michiels S, Koscielny S, Hill C. Prediction of cancer outcome with microarrays: A multiple random validation strategy. Lancet. 2005;365:488–492.
Ioannidis JPA, Ntzani EE, Trikalinos TA, Contopoulos-Ioannidis DG. Replication validity of genetic association studies. Nat Genet. 2001;29:306–309.
Colhoun HM, McKeigue PM, Davey Smith G. Problems of reporting genetic associations with complex outcomes. Lancet. 2003;361:865–872.
Ioannidis JP. Genetic associations: False or true? Trends Mol Med. 2003;9:135–138.
Ioannidis JPA. Microarrays and molecular research: Noise discovery? Lancet. 2005;365:454–455.
Sterne JA, Davey Smith G. Sifting the evidence—What's wrong with significance tests. BMJ. 2001;322:226–231.
Wacholder S, Chanock S, Garcia-Closas M, Elghormli L, Rothman N. Assessing the probability that a positive report is false: An approach for molecular epidemiology studies. J Natl Cancer Inst. 2004;96:434–442.
Risch NJ. Searching for genetic determinants in the new millennium. Nature. 2000;405:847–856.
Kelsey JL, Whittemore AS, Evans AS, Thompson WD. Methods in observational epidemiology, 2nd ed. New York: Oxford U Press; 1996. 432 pp.
Topol EJ. Failing the public health—Rofecoxib, Merck, and the FDA. N Engl J Med. 2004;351:1707–1709.
Yusuf S, Collins R, Peto R. Why do we need some large, simple randomized trials? Stat Med. 1984;3:409–422.
Altman DG, Royston P. What do we mean by validating a prognostic model? Stat Med. 2000;19:453–473.
Taubes G. Epidemiology faces its limits. Science. 1995;269:164–169.
Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, et al. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science. 1999;286:531–537.
Moher D, Schulz KF, Altman DG. The CONSORT statement: Revised recommendations for improving the quality of reports of parallel-group randomised trials. Lancet. 2001;357:1191–1194.
Ioannidis JP, Evans SJ, Gotzsche PC, O'Neill RT, Altman DG, et al. Better reporting of harms in randomized trials: An extension of the CONSORT statement. Ann Intern Med. 2004;141:781–788.
International Conference on Harmonisation E9 Expert Working Group. ICH Harmonised Tripartite Guideline. Statistical principles for clinical trials. Stat Med. 1999;18:1905–1942.
Moher D, Cook DJ, Eastwood S, Olkin I, Rennie D, et al. Improving the quality of reports of meta-analyses of randomised controlled trials: The QUOROM statement. Quality of Reporting of Meta-analyses. Lancet. 1999;354:1896–1900.
Stroup DF, Berlin JA, Morton SC, Olkin I, Williamson GD, et al. Meta-analysis of observational studies in epidemiology: A proposal for reporting. Meta-analysis of Observational Studies in Epidemiology (MOOSE) group. JAMA. 2000;283:2008–2012.
Marshall M, Lockwood A, Bradley C, Adams C, Joy C, et al. Unpublished rating scales: A major source of bias in randomised controlled trials of treatments for schizophrenia. Br J Psychiatry. 2000;176:249–252.
Altman DG, Goodman SN. Transfer of technology from statistical journals to the biomedical literature. Past trends and future predictions. JAMA. 1994;272:129–132.
Chan AW, Hrobjartsson A, Haahr MT, Gotzsche PC, Altman DG. Empirical evidence for selective reporting of outcomes in randomized trials: Comparison of protocols to published articles. JAMA. 2004;291:2457–2465.
Krimsky S, Rothenberg LS, Stott P, Kyle G. Scientific journals and their authors' financial interests: A pilot study. Psychother Psychosom. 1998;67:194–201.
Papanikolaou GN, Baltogianni MS, Contopoulos-Ioannidis DG, Haidich AB, Giannakakis IA, et al. Reporting of conflicts of interest in guidelines of preventive and therapeutic interventions. BMC Med Res Methodol. 2001;1:3.
Antman EM, Lau J, Kupelnick B, Mosteller F, Chalmers TC. A comparison of results of meta-analyses of randomized control trials and recommendations of clinical experts. Treatments for myocardial infarction. JAMA. 1992;268:240–248.
Ioannidis JP, Trikalinos TA. Early extreme contradictory estimates may appear in published research: The Proteus phenomenon in molecular genetics research and randomized trials. J Clin Epidemiol. 2005;58:543–549.
Ntzani EE, Ioannidis JP. Predictive ability of DNA microarrays for cancer outcomes and correlates: An empirical assessment. Lancet. 2003;362:1439–1444.
Ransohoff DF. Rules of evidence for cancer molecular-marker discovery and validation. Nat Rev Cancer. 2004;4:309–314.
Lindley DV. A statistical paradox. Biometrika. 1957;44:187–192.
Bartlett MS. A comment on D.V. Lindley's statistical paradox. Biometrika. 1957;44:533–534.
Senn SJ. Two cheers for P-values. J Epidemiol Biostat. 2001;6:193–204.
De Angelis C, Drazen JM, Frizelle FA, Haug C, Hoey J, et al. Clinical trial registration: A statement from the International Committee of Medical Journal Editors. N Engl J Med. 2004;351:1250–1251.
Ioannidis JPA. Contradicted and initially stronger effects in highly cited clinical research. JAMA. 2005;294:218–228.
Hsueh HM, Chen JJ, Kodell RL. Comparison of methods for estimating the number of true null hypotheses in multiplicity testing. J Biopharm Stat. 2003;13:675–689.