Of Mice and Men Again: New Genomic Study Helps Explain why Mouse Models of Acute Inflammation do not Work in Men

25 02 2013


This post is update after a discussion at Twitter with @animalevidence who pointed me at a great blog post at Speaking of Research ([19], a repost of [20], highlighting the shortcomings of the current study using just one single inbred strain of mice (C57Bl6)  [2013-02-26]. Main changes are in blue

A recent paper published in PNAS [1] caused quite a stir both inside and outside the scientific community. The study challenges the validity of using mouse models to test what works as a treatment in humans. At least this is what many online news sources seem to conclude: “drug testing may be a waste of time”[2], “we are not mice” [3, 4], or a bit more to the point: mouse models of inflammation are worthless [5, 6, 7].

But basically the current study looks only at one specific area, the area of inflammatory responses that occur in critically ill patients after severe trauma and burns (SIRS, Systemic Inflammatory Response Syndrome). In these patients a storm of events may eventually lead to organ failure and death. It is similar to what may occur after sepsis (but here the cause is a systemic infection).

Furthermore the study only uses one single approach: it compares the gene response patterns in serious human injuries (burns, trauma) and a human model partially mimicking these inflammatory diseases (human healthy volunteers receiving  a low dose endotoxin) with the corresponding three animal models (burns, trauma, endotoxin).

And, as highlighted by Bill Barrington of “Understand Nutrition” [8], the researchers have only tested the gene profiles in one single strain of mice: C57Bl6 (B6 for short). If B6 was the only model used in practice this would be less of a problem. But according to Mark Wanner of the Jackson Laboratory [19, 20]:

 It is now well known that some inbred mouse strains, such as the C57BL/6J (B6 for short) strain used, are resistant to septic shock. Other strains, such as BALB and A/J, are much more susceptible, however. So use of a single strain will not provide representative results.

The results in itself are very clear. The figures show at a glance that there is no correlation whatsoever between the human and B6 mouse expression data.

Seok and 36 other researchers from across the USA  looked at approximately 5500 human genes and their mouse analogs. In humans, burns and traumatic injuries (and to a certain extent the human endotoxin model) triggered the activation of a vast number of genes, that were not triggered in the present C57Bl6 mouse models. In addition the genomic response is longer lasting in human injuries. Furthermore, the top 5 most activated and most suppressed pathways in human burns and trauma had no correlates in mice. Finally, analysis of existing data in the Gene Expression (GEO) Database showed that the lack of correlation between mouse and human studies was also true for other acute inflammatory responses, like sepsis and acute infection.

This is a high quality study with interesting results. However, the results are not as groundbreaking as some media suggest.

As discussed by the authors [1], mice are known to be far more resilient to inflammatory challenge than humans*: a million fold higher dose of endotoxin than the dose causing shock in humans is lethal to mice.* This, and the fact that “none of the 150  candidate agents that progressed to human trials has proved successful in critically ill patients” already indicates that the current approach fails.

[This is not entirely correct the endotoxin/LPS dose in mice is 1000–10,000 times the dose required to induce severe disease with shock in humans [20] and mice that are resilient to endotoxin may still be susceptible to infection. It may well be that the endotoxin response is not a good model for the late effects of  sepsis]

The disappointing trial results have forced many researchers to question not only the usefulness of the current mouse models for acute inflammation [9,10; refs from 11], but also to rethink the key aspects of the human response itself and the way these clinical trials are performed [12, 13, 14]. For instance, emphasis has always been on the exuberant inflammatory reaction, but the subsequent immunosuppression may also be a major contributor to the disease. There is also substantial heterogeneity among patients [13-14] that may explain why some patients have a good prognosis and others haven’t. And some of the initially positive results in human trials have not been reproduced in later studies either (benefit of intense glucose control and corticosteroid treatment) [12]. Thus is it fair to blame only the mouse studies?

dick mouse

dick mouse (Photo credit: Wikipedia)

The coverage by some media is grist to the mill of people who think animal studies are worthless anyway. But one cannot extrapolate these findings to other diseases. Furthermore, as referred to above, the researchers have only tested the gene profiles in one single strain of mice: C57Bl6, meaning that “The findings of Seok et al. are solely applicable to the B6 strain of mice in the three models of inflammation they tested. They unduly generalize these findings to mouse models of inflammation in general. [8]“

It is true that animal studies, including rodent studies, have their limitations. But what are the alternatives? In vitro studies are often even more artificial, and direct clinical testing of new compounds in humans is not ethical.

Obviously, the final proof of effectiveness and safety of new treatments can only be established in human trials. No one will question that.

A lot can be said about why animal studies often fail to directly translate to the clinic [15]. Clinical disparities between the animal models and the clinical trials testing the treatment (like in sepsis) are one reason. Other important reasons may be methodological flaws in animal studies (i.e. no randomization, wrong statistics) and publication bias: non-publication of “negative” results appears to be prevalent in laboratory animal research.[15-16]. Despite their shortcomings, animal studies and in vitro studies offer a way to examine certain aspects of a process, disease or treatment.

In summary, this study confirms that the existing (C57Bl6) mouse model doesn’t resemble the human situation in the systemic response following acute traumatic injury or sepsis: the genomic response is entirely different, in magnitude, duration and types of changes in expression.

The findings are not new: the shortcomings of the mouse model(s) were long known. It remains enigmatic why the researchers chose only one inbred strain of mice, and of all mice only the B6-strain, which is less sensitive to endotoxin, and only develop acute kidney injury (part of organ failure) at old age (young mice were used) [21]. In this paper from 2009 (!) various reasons are given why the animal models didn’t properly mimic the human disease and how this can be improved. The authors stress that:

the genetically heterogeneous human population should be more accurately represented by outbred mice, reducing the bias found in inbred strains that might contain or lack recessive disease susceptibility loci, depending on selective pressures.” 

Both Bill Barrington [8] and Mark Wanner [18,19] propose the use of “diversity outbred cross or collaborative cross mice that  provide additional diversity.” Indeed, “replicating genetic heterogeneity and critical clinical risk factors such as advanced age and comorbid conditions (..) led to improved models of sepsis and sepsis-induced AKI (acute kidney injury). 

The authors of the PNAS paper suggest that genomic analysis can aid further in revealing which genes play a role in the perturbed immune response in acute inflammation, but it remains to be seen whether this will ultimately lead to effective treatments of sepsis and other forms of acute inflammation.

It also remains to be seen whether comprehensive genomic characterization will be useful in other disease models. The authors suggest for instance,  that genetic profiling may serve as a guide to develop animal models. A shotgun analyses of gene expression of thousands of genes was useful in the present situation, because “the severe inflammatory stress produced a genomic storm affecting all major cellular functions and pathways in humans which led to sufficient perturbations to allow comparisons between the genes in the human conditions and their analogs in the murine models”. But rough analysis of overall expression profiles may give little insight in the usefulness of other animal models, where genetic responses are more subtle.

And predicting what will happen is far less easy that to confirm what is already known….

NOTE: as said the coverage in news and blogs is again quite biased. The conclusion of a generally good Dutch science  news site (the headline and lead suggested that animal models of immune diseases are crap [6]) was adapted after a critical discussion at Twitter (see here and here), and a link was added to this blog post). I wished this occurred more often….
In my opinion the most balanced summaries can be found at the science-based blogs: ScienceBased Medicine [11] and NIH’s Director’s Blog [17], whereas “Understand Nutrition” [8] has an original point of view, which is further elaborated by Mark Wanner at Speaking of Research [19] and Genetics and your health Blog [20]


  1. Seok, J., Warren, H., Cuenca, A., Mindrinos, M., Baker, H., Xu, W., Richards, D., McDonald-Smith, G., Gao, H., Hennessy, L., Finnerty, C., Lopez, C., Honari, S., Moore, E., Minei, J., Cuschieri, J., Bankey, P., Johnson, J., Sperry, J., Nathens, A., Billiar, T., West, M., Jeschke, M., Klein, M., Gamelli, R., Gibran, N., Brownstein, B., Miller-Graziano, C., Calvano, S., Mason, P., Cobb, J., Rahme, L., Lowry, S., Maier, R., Moldawer, L., Herndon, D., Davis, R., Xiao, W., Tompkins, R., , ., Abouhamze, A., Balis, U., Camp, D., De, A., Harbrecht, B., Hayden, D., Kaushal, A., O’Keefe, G., Kotz, K., Qian, W., Schoenfeld, D., Shapiro, M., Silver, G., Smith, R., Storey, J., Tibshirani, R., Toner, M., Wilhelmy, J., Wispelwey, B., & Wong, W. (2013). Genomic responses in mouse models poorly mimic human inflammatory diseases Proceedings of the National Academy of Sciences DOI: 10.1073/pnas.1222878110
  2. Drug Testing In Mice May Be a Waste of Time, Researchers Warn 2013-02-12 (science.slashdot.org)
  3. Susan M Love We are not mice 2013-02-14 (Huffingtonpost.com)
  4. Elbert Chu  This Is Why It’s A Mistake To Cure Mice Instead Of Humans 2012-12-20(richarddawkins.net)
  5. Derek Low. Mouse Models of Inflammation Are Basically Worthless. Now We Know. 2013-02-12 (pipeline.corante.com)
  6. Elmar Veerman. Waardeloos onderzoek. Proeven met muizen zeggen vrijwel niets over ontstekingen bij mensen. 2013-02-12 (wetenschap24.nl)
  7. Gina Kolata. Mice Fall Short as Test Subjects for Humans’ Deadly Ills. 2013-02-12 (nytimes.com)

  8. Bill Barrington. Are Mice Reliable Models for Human Disease Studies? 2013-02-14 (understandnutrition.com)
  9. Raven, K. (2012). Rodent models of sepsis found shockingly lacking Nature Medicine, 18 (7), 998-998 DOI: 10.1038/nm0712-998a
  10. Nemzek JA, Hugunin KM, & Opp MR (2008). Modeling sepsis in the laboratory: merging sound science with animal well-being. Comparative medicine, 58 (2), 120-8 PMID: 18524169
  11. Steven Novella. Mouse Model of Sepsis Challenged 2013-02-13 (http://www.sciencebasedmedicine.org/index.php/mouse-model-of-sepsis-challenged/)
  12. Wiersinga WJ (2011). Current insights in sepsis: from pathogenesis to new treatment targets. Current opinion in critical care, 17 (5), 480-6 PMID: 21900767
  13. Khamsi R (2012). Execution of sepsis trials needs an overhaul, experts say. Nature medicine, 18 (7), 998-9 PMID: 22772540
  14. Hotchkiss RS, Coopersmith CM, McDunn JE, & Ferguson TA (2009). The sepsis seesaw: tilting toward immunosuppression. Nature medicine, 15 (5), 496-7 PMID: 19424209
  15. van der Worp, H., Howells, D., Sena, E., Porritt, M., Rewell, S., O’Collins, V., & Macleod, M. (2010). Can Animal Models of Disease Reliably Inform Human Studies? PLoS Medicine, 7 (3) DOI: 10.1371/journal.pmed.1000245
  16. ter Riet, G., Korevaar, D., Leenaars, M., Sterk, P., Van Noorden, C., Bouter, L., Lutter, R., Elferink, R., & Hooft, L. (2012). Publication Bias in Laboratory Animal Research: A Survey on Magnitude, Drivers, Consequences and Potential Solutions PLoS ONE, 7 (9) DOI: 10.1371/journal.pone.0043404
  17. Dr. Francis Collins. Of Mice, Men and Medicine 2013-02-19 (directorsblog.nih.gov)
  18. Tom/ Mark Wanner Why mice may succeed in research when a single mouse falls short (2013-02-15) (speakingofresearch.com) [repost, with introduction]
  19. Mark Wanner Why mice may succeed in research when a single mouse falls short (2013-02-13/) (http://community.jax.org) %5Boriginal post]
  20. Warren, H. (2009). Editorial: Mouse models to study sepsis syndrome in humans Journal of Leukocyte Biology, 86 (2), 199-201 DOI: 10.1189/jlb.0309210
  21. Doi, K., Leelahavanichkul, A., Yuen, P., & Star, R. (2009). Animal models of sepsis and sepsis-induced kidney injury Journal of Clinical Investigation, 119 (10), 2868-2878 DOI: 10.1172/JCI39421

BAD Science or BAD Science Journalism? – A Response to Daniel Lakens

10 02 2013

ResearchBlogging.orgTwo weeks ago  there was a hot debate among Dutch Tweeps on “bad science, bad science journalism and bad science communication“. This debate was started and fueled by different Dutch blog posts on this topic.[1,4-6]

A controversial post, with both fierce proponents and fierce opposition was the post by Daniel Lakens [1], an assistant professor in Applied Cognitive Psychology.

I was among the opponents. Not because I don’t like a new fresh point of view, but because of a wrong reasoning and because Daniel continuously compares apples and oranges.

Since Twitter debates can’t go in-depth and lack structure and since I cannot comment to his Google sites blog, I pursue my discussion here.

The title of Daniels post is (freely translated, like the rest of his post):

Is this what one calls good science?” 

In his post he criticizes a Dutch science journalist, Hans van Maanen, and specifically his recent column [2], where Hans discusses a paper published in Pediatrics [3].

This longitudinal study tested the Music Marker theory among 309 Dutch kids. The researchers gathered information about the kids’ favorite types of music and tracked incidents of “minor delinquency”, such as shoplifting or vandalism, from the time they were 12 until they reached age 16 [4]. The researchers conclude that liking music that goes against the mainstream (rock, heavy metal, gothic, punk, African American music, and electronic dance music) at age 12 is a strong predictor of future minor delinquency at 16, in contrast to chart pop, classic music, jazz.

The University press office send out a press release [5 ], which was picked up by news media [4,6] and one of the Dutch authors of this study,  Loes Keijsers,  tweeted enthusiastically: “Want to know whether a 16 year old adult will suffer from delinquency, than look at his music taste at age 12!”

According to Hans, Loes could have easily broadcasted (more) balanced tweets, likeMusic preference doesn’t predict shoplifting” or “12 year olds who like Bach keep quiet about shoplifting when 16.” But even then, Hans argues, the tweets wouldn’t have been scientifically underpinned either.

In column style Hans explains why he thinks that the study isn’t methodologically strong: no absolute numbers are given; 7 out of 11 (!) music styles are positively associated with delinquency, but these correlations are not impressive: the strongest predictor (Gothic music preference) can explain no more than 9%  of the variance in delinquent behaviour, which can include anything from shoplifting, vandalism, fighting, graffiti spraying, switching price tags.  Furthermore the risks of later “delinquent” behavior are small:  on a scale 1 (never) to 4 (4 times or more) the mean risk was 1,12. Hans also wonders whether it is a good idea to monitor kids with a certain music taste.

Thus Hans concludesthis study isn’t good science”. Daniel, however, concludes that Hans’ writing is not good science journalism.

First Daniel recalls he and other PhD’s took a course on how to peer review scientific papers. On basis of their peer review of a (published) article 90% of the students decided to reject it. The two main lessons learned by Daniel were:

  • It is easy to critize a scientific paper and grind it down. No single contribution to science (no single article) is perfect.
  • New scientific insights, although imperfect, are worth sharing, because they help to evolve science. *¹

According to Daniel science jounalists often make the same mistakes as the peer reviewing PhD-students: critisizing the individuel studies without a “meta-view” on science.

Peer review and journalism however are different things (apples and oranges if you like).

Peer review (with all its imperfections) serves to filter, check and to improve the quality of individual scientific papers (usually) before they are published  [10]. My papers that passed peer review, were generally accepted. Of course there were the negative reviewers, often  the ignorant ones, and the naggers, but many reviewers had critique that helped to improve my paper, sometimes substantially. As a peer reviewer myself I only try to separate the wheat from the chaff and to enhance the quality of the papers that pass.

Science journalism also has a filter function: it filters already peer reviewed scientific papers* for its readership, “the public” by selecting novel relevant science and translating the scientific, jargon-laded language, into language readers can understand and appreciate. Of course science journalists should put the publication into perspective (call it “meta”).

Surely the PhD-students finger exercise resembles the normal peer review process as much as peer review resembles science journalism.

I understand that pure nitpicking seldom serves a goal, but this rarely occurs in science journalism. The opposite, however, is commonplace.

Daniel disapproves Hans van Maanen’s criticism, because Hans isn’t “meta” enough. Daniel: “Arguing whether an effect size is small or mediocre is nonsense, because no individual study gives a good estimate of the effect size. You need to do more research and combine the results in a meta-analysis”.

Apples and oranges again.

Being “meta” has little to do with meta-analysis. Being meta is … uh … pretty meta. You could think of it as seeing beyond (meta) the findings of one single study*.

A meta-analysis, however, is a statistical technique for combining the findings from independent, but comparable (homogeneous) studies in order to more powerfully estimate the true effect size (pretty exact). This is an important, but difficult methodological task for a scientist, not a journalist. If a meta-analysis on the topic exist, journalists should take this into account, of course (and so should the researchers). If not, they should put the single study in broader perspective (what does the study add to existing knowledge?) and show why this single study is or is not well done?

Daniel takes this further by stating that “One study is no study” and that journalists who simply echo the press releases of a study ànd journalists who just amply criticizes only single publication (like Hans) are clueless about science.

Apples and oranges! How can one lump science communicators (“media releases”), echoing journalists (“the media”) and critical journalists together?

I see more value in a critical analysis than a blind rejoicing of hot air. As long as the criticism guides the reader to appreciate the study.

And if there is just one single novel study, that seems important enough to get media attention, shouldn’t we judge the research on its own merits?

Then Daniel asks himself: “If I do criticize those journalists, shouldn’t I criticize those scientists who published just a single study and wrote a press release about it? “

His conclusion? “No”.

Daniel explains: science never provides absolute certainty, at the most the evidence is strong enough to state what is likely true. This can only be achieved by a lot of research by different investigators. 

Therefore you should believe in your ideas and encourage other scientists to pursue your findings. It doesn’t help when you say that music preference doesn’t predict shoplifting. It does help when you use the media to draw attention to your research. Many researchers are now aware of the “Music Marker Theory”. Thus the press release had its desired effect. By expressing a firm belief in their conclusions, they encourage other scientists to spend their sparse time on this topic. These scientists will try to repeat and falsify the study, an essential step in Cumulative Science. At a time when science is under pressure, scientists shouldn’t stop writing enthusiastic press releases or tweets. 

The latter paragraph is sheer nonsense!

Critical analysis of one study by a journalist isn’t what undermines the  public confidence in science. Rather it’s the media circus, that blows the implications of scientific findings out of proportion.

As exemplified by the hilarious PhD Comic below research results are propagated by PR (science communication), picked up by media, broadcasted, spread via the internet. At the end of the cycle conclusions are reached, that are not backed up by (sufficient) evidence.

PhD Comics – The news Cycle

Daniel is right about some things. First one study is indeed no study, in the sense that concepts are continuously tested and corrected: falsification is a central property of science (Popper). He is also right that science doesn’t offer absolute certainty (an aspect that is often not understood by the public). And yes, researchers should believe in their findings and encourage other scientists to check and repeat their experiments.

Though not primarily via the media. But via the normal scientific route. Good scientists will keep track of new findings in their field anyway. Suppose that only findings that are trumpeted in the media would be pursued by other scientists?

7-2-2013 23-26-31 media & science

And authors shouldn’t make overstatements. They shouldn’t raise expectations to a level which cannot be met. The Dutch study only shows weak associations. It simply isn’t true that the Dutch study allows us to “predict” at an individual level if a 12 year old will “act out” at 16.

This doesn’t help lay-people to understand the findings and to appreciate science.

The idea that media should just serve to spotlight a paper, seems objectionable to me.

Going back to the meta-level: what about the role of science communicators, media, science journalists and researchers?

According to Maarten Keulemans, journalist, we should just get rid of all science communicators as a layer between scientists and journalists [7]. But Michel van Baal [9] and Roy Meijer[8] have a point when they say that  journalists do a lot PR-ing too and they should do better than to rehash news releases.*²

Now what about Daniel criticism of van Maanen? In my opinion, van Maanen is one of those rare critical journalists who serve as an antidote against uncritical media diarrhea (see Fig above). Comparable to another lone voice in the media: Ben Goldacre. It didn’t surprise me that Daniel didn’t approve of him (and his book Bad Science) either [11]. 

Does this mean that I find Hans van Maanen a terrific science journalist? No, not really. I often agree with him (i.e. see this post [12]). He is one of those rare journalists who has real expertise in research methodology . However, his columns don’t seem to be written for a large audience: they seem too complex for most lay people. One thing I learned during a scientific journalism course, is that one should explain all jargon to one’s audience.

Personally I find this critical Dutch blog post[13] about the Music Marker Theory far more balanced. After a clear description of the study, Linda Duits concludes that the results of the study are pretty obvious, but that the the mini-hype surrounding this research is caused by the positive tone of the press release. She stresses that prediction is not predetermination and that the musical genres are not important: hiphop doesn’t lead to criminal activity and metal not to vandalism.

And this critical piece in Jezebel [14],  reaches far more people by talking in plain, colourful language, hilarious at times.

It also a swell title: “Delinquents Have the Best Taste in Music”. Now that is an apt conclusion!


*¹ Since Daniel doesn’t refer to  open (trial) data access nor the fact that peer review may , I ignore these aspects for the sake of the discussion.

*² Coincidence? Keulemans has covered  the music marker study quite uncritically (positive).

Photo Credits



  1. Daniel Lakens: Is dit nou goede Wetenschap? – Jan 24, 2013 (sites.google.com/site/lakens2/blog)
  2. Hans van Maanen: De smaak van boefjes in de dop,De Volkskrant, Jan 12, 2013 (vanmaanen.org/hans/columns/)
  3. ter Bogt, T., Keijsers, L., & Meeus, W. (2013). Early Adolescent Music Preferences and Minor Delinquency PEDIATRICS DOI: 10.1542/peds.2012-0708
  4. Lindsay Abrams: Kids Who Like ‘Unconventional Music’ More Likely to Become Delinquent, the Atlantic, Jan 18, 2013
  5. Muziekvoorkeur belangrijke voorspeller voor kleine criminaliteit. Jan 8, 2013 (pers.uu.nl)
  6. Maarten Keulemans: Muziek is goede graadmeter voor puberaal wangedrag – De Volkskrant, 12 januari 2013  (volkskrant.nl)
  7. Maarten Keulemans: Als we nou eens alle wetenschapscommunicatie afschaffen? – Jan 23, 2013 (denieuwereporter.nl)
  8. Roy Meijer: Wetenschapscommunicatie afschaffen, en dan? – Jan 24, 2013 (denieuwereporter.nl)
  9. Michel van Baal. Wetenschapsjournalisten doen ook aan PR – Jan 25, 2013 ((denieuwereporter.nl)
  10. What peer review means for science (guardian.co.uk)
  11. Daniel Lakens. Waarom raadde Maarten Keulemans me Bad Science van Goldacre aan? Oct 25, 2012
  12. Why Publishing in the NEJM is not the Best Guarantee that Something is True: a Response to Katan – Sept 27, 2012 (laikaspoetnik.wordpress.com)
  13. Linda Duits: Debunk: worden pubers crimineel van muziek? (dieponderzoek.nl)
  14. Lindy west: Science: “Delinquents Have the Best Taste in Music” (jezebel.com)

Why Publishing in the NEJM is not the Best Guarantee that Something is True: a Response to Katan

27 10 2012

ResearchBlogging.orgIn a previous post [1] I reviewed a recent  Dutch study published in the New England Journal of Medicine (NEJM [2] about the effects of sugary drinks on the body mass index of school children.

The study got widely covered by the media. The NRC, for which the main author Martijn Katan works as a science columnist,  columnist, spent  two full (!) pages on the topic -with no single critical comment-[3].
As if this wasn’t enough, the latest column of Katan again dealt with his article (text freely available at mkatan.nl)[4].

I found Katan’s column “Col hors Catégorie” [4] quite arrogant, especially because he tried to belittle a (as he called it) “know-it-all” journalist who criticized his work  in a rivaling newspaper. This wasn’t fair, because the journalist had raised important points [5, 1] about the work.

The piece focussed on the long road of getting papers published in a top journal like the NEJM.
Katan considers the NEJM as the “Tour de France” among  medical journals: it is a top achievement to publish in this paper.

Katan also states that “publishing in the NEJM is the best guarantee something is true”.

I think the latter statement is wrong for a number of reasons.*

  1. First, most published findings are false [6]. Thus journals can never “guarantee”  that published research is true.
    Factors that  make it less likely that research findings are true include a small effect size,  a greater number and lesser preselection of tested relationships, selective outcome reporting, the “hotness” of the field (all applying more or less to Katan’s study, he also changed the primary outcomes during the trial[7]), a small study, a great financial interest and a low pre-study probability (not applicable) .
  2. It is true that NEJM has a very high impact factor. This is  a measure for how often a paper in that journal is cited by others. Of course researchers want to get their paper published in a high impact journal. But journals with high impact factors often go for trendy topics and positive results. In other words it is far more difficult to publish a good quality study with negative results, and certainly in an English high impact journal. This is called publication bias (and language bias) [8]. Positive studies will also be more frequently cited (citation bias) and will more likely be published more than once (multiple publication bias) (indeed, Katan et al already published about the trial [9], and have not presented all their data yet [1,7]). All forms of bias are a distortion of the “truth”.
    (This is the reason why the search for a (Cochrane) systematic review must be very sensitive [8] and not restricted to core clinical journals, but even include non-published studies: for these studies might be “true”, but have failed to get published).
  3. Indeed, the group of Ioannidis  just published a large-scale statistical analysis[10] showing that medical studies revealing “very large effects” seldom stand up when other researchers try to replicate them. Often studies with large effects measure laboratory and/or surrogate markers (like BMI) instead of really clinically relevant outcomes (diabetes, cardiovascular complications, death)
  4. More specifically, the NEJM does regularly publish studies about pseudoscience or bogus treatments. See for instance this blog post [11] of ScienceBased Medicine on Acupuncture Pseudoscience in the New England Journal of Medicine (which by the way is just a review). A publication in the NEJM doesn’t guarantee it isn’t rubbish.
  5. Importantly, the NEJM has the highest proportion of trials (RCTs) with sole industry support (35% compared to 7% in the BMJ) [12] . On several occasions I have discussed these conflicts of interests and their impact on the outcome of studies ([13, 14; see also [15,16] In their study, Gøtzsche and his colleagues from the Nordic Cochrane Centre [12] also showed that industry-supported trials were more frequently cited than trials with other types of support, and that omitting them from the impact factor calculation decreased journal impact factors. The impact factor decrease was even 15% for NEJM (versus 1% for BMJ in 2007)! For the journals who provided data, income from the sales of reprints contributed to 3% and 41% of the total income for BMJ and The Lancet.
    A recent study, co-authored by Ben Goldacre (MD & science writer) [17] confirms that  funding by the pharmaceutical industry is associated with high numbers of reprint ordersAgain only the BMJ and the Lancet provided all necessary data.
  6. Finally and most relevant to the topic is a study [18], also discussed at Retractionwatch[19], showing that  articles in journals with higher impact factors are more likely to be retracted and surprise surprise, the NEJM clearly stands on top. Although other reasons like higher readership and scrutiny may also play a role [20], it conflicts with Katan’s idea that  “publishing in the NEJM is the best guarantee something is true”.

I wasn’t aware of the latter study and would like to thank drVes and Ivan Oranski for responding to my crowdsourcing at Twitter.


  1. Sugary Drinks as the Culprit in Childhood Obesity? a RCT among Primary School Children (laikaspoetnik.wordpress.com)
  2. de Ruyter JC, Olthof MR, Seidell JC, & Katan MB (2012). A trial of sugar-free or sugar-sweetened beverages and body weight in children. The New England journal of medicine, 367 (15), 1397-406 PMID: 22998340
  3. NRC Wim Köhler Eén kilo lichter.NRC | Zaterdag 22-09-2012 (http://archief.nrc.nl/)
  4. Martijn Katan. Col hors Catégorie [Dutch], (published in de NRC,  (20 oktober)(www.mkatan.nl)
  5. Hans van Maanen. Suiker uit fris, De Volkskrant, 29 september 2012 (freely accessible at http://www.vanmaanen.org/)
  6. Ioannidis, J. (2005). Why Most Published Research Findings Are False PLoS Medicine, 2 (8) DOI: 10.1371/journal.pmed.0020124
  7. Changes to the protocol http://clinicaltrials.gov/archive/NCT00893529/2011_02_24/changes
  8. Publication Bias. The Cochrane Collaboration open learning material (www.cochrane-net.org)
  9. de Ruyter JC, Olthof MR, Kuijper LD, & Katan MB (2012). Effect of sugar-sweetened beverages on body weight in children: design and baseline characteristics of the Double-blind, Randomized INtervention study in Kids. Contemporary clinical trials, 33 (1), 247-57 PMID: 22056980
  10. Pereira, T., Horwitz, R.I., & Ioannidis, J.P.A. (2012). Empirical Evaluation of Very Large Treatment Effects of Medical InterventionsEvaluation of Very Large Treatment Effects JAMA: The Journal of the American Medical Association, 308 (16) DOI: 10.1001/jama.2012.13444
  11. Acupuncture Pseudoscience in the New England Journal of Medicine (sciencebasedmedicine.org)
  12. Lundh, A., Barbateskovic, M., Hróbjartsson, A., & Gøtzsche, P. (2010). Conflicts of Interest at Medical Journals: The Influence of Industry-Supported Randomised Trials on Journal Impact Factors and Revenue – Cohort Study PLoS Medicine, 7 (10) DOI: 10.1371/journal.pmed.1000354
  13. One Third of the Clinical Cancer Studies Report Conflict of Interest (laikaspoetnik.wordpress.com)
  14. Merck’s Ghostwriters, Haunted Papers and Fake Elsevier Journals (laikaspoetnik.wordpress.com)
  15. Lexchin, J. (2003). Pharmaceutical industry sponsorship and research outcome and quality: systematic review BMJ, 326 (7400), 1167-1170 DOI: 10.1136/bmj.326.7400.1167
  16. Smith R (2005). Medical journals are an extension of the marketing arm of pharmaceutical companies. PLoS medicine, 2 (5) PMID: 15916457 (free full text at PLOS)
  17. Handel, A., Patel, S., Pakpoor, J., Ebers, G., Goldacre, B., & Ramagopalan, S. (2012). High reprint orders in medical journals and pharmaceutical industry funding: case-control study BMJ, 344 (jun28 1) DOI: 10.1136/bmj.e4212
  18. Fang, F., & Casadevall, A. (2011). Retracted Science and the Retraction Index Infection and Immunity, 79 (10), 3855-3859 DOI: 10.1128/IAI.05661-11
  19. Is it time for a Retraction Index? (retractionwatch.wordpress.com)
  20. Agrawal A, & Sharma A (2012). Likelihood of false-positive results in high-impact journals publishing groundbreaking research. Infection and immunity, 80 (3) PMID: 22338040


* Addendum: my (unpublished) letter to the NRC

Tour de France.
Nadat het NRC eerder 2 pagina’ s de loftrompet over Katan’s nieuwe studie had afgestoken, vond Katan het nodig om dit in zijn eigen column dunnetjes over te doen. Verwijzen naar je eigen werk mag, ook in een column, maar dan moeten wij daar als lezer wel wijzer van worden. Wat is nu de boodschap van dit stuk “Col hors Catégorie“? Het beschrijft vooral de lange weg om een wetenschappelijke studie gepubliceerd te krijgen in een toptijdschrift, in dit geval de New England Journal of Medicine (NEJM), “de Tour de France onder de medische tijdschriften”. Het stuk eindigt met een tackle naar een journalist “die dacht dat hij het beter wist”. Maar ach, wat geeft dat als de hele wereld staat te jubelen? Erg onsportief, omdat die journalist (van Maanen, Volkskrant) wel degelijk op een aantal punten scoorde. Ook op Katan’s kernpunt dat een NEJM-publicatie “de beste garantie is dat iets waar is” valt veel af te dingen. De NEJM heeft inderdaad een hoge impactfactor, een maat voor hoe vaak artikelen geciteerd worden. De NEJM heeft echter ook de hoogste ‘artikelterugtrekkings’ index. Tevens heeft de NEJM het hoogste percentage door de industrie gesponsorde klinische trials, die de totale impactfactor opkrikken. Daarnaast gaan toptijdschriften vooral voor “positieve resultaten” en “trendy onderwerpen”, wat publicatiebias in de hand werkt. Als we de vergelijking met de Tour de France doortrekken: het volbrengen van deze prestigieuze wedstrijd garandeert nog niet dat deelnemers geen verboden middelen gebruikt hebben. Ondanks de strenge dopingcontroles.

#EAHIL2012 CEC 2: Visibility & Impact – Library’s New Role to Enhance Visibility of Researchers

4 07 2012

This week I’m blogging at (and mostly about) the 13th EAHIL conference in Brussels. EAHIL stands for European Association for Health Information and Libraries.

The second Continuing Education Course (CEC) I followed was given by Tiina Heino and Katri Larmo of the Terkko Meilahti Campus Library at the University of Helsinki in Finland.

The full title of the course was Visibility and impact – library’s new role: How the library can support the researcher to get visibility and generate impact to researcher’s work. You can read the abstract here.

The hands-on workshop mainly concentrated on the social bookmarking sites ConnoteaMendeley and Altmetric.

Furthermore we got information on CiteULike, ORCID,  Faculty of 1000 Posters and Pinterest. Also services developed in Terkko, such as ScholarChart and TopCited Articles, were shortly demonstrated.

What I especially liked in the hands on session is that the tutors had prepared a wikispace with all the information and links on the main page ( https://visibility2012.wikispaces.com) and a separate page for each participant to edit (here is my page). You could add links to your created accounts and embed widgets for Mendeley.

There was sufficient time to practice and try the tools. And despite the great number of participants there was ample room for questions (& even for making a blog draft ;)).

The main message of the tutors is that the process of publishing scientific research doesn’t end at publishing the article: it is equally important what happens after the research has been published. Visibility and impact in the scientific community and in the society are  crucial  for making the research go forward as well as for getting research funding and promoting the researcher’s career. The Fig below (taken from the presentation) visualizes this process.

The tutors discussed ORCID, Open Researcher and contributor ID, that will be introduced later this year. It is meant to solve the author name ambiguity problem in scholarly communication by central registry of unique identifiers for each author (because author names can’t be used to reliably identify all scholarly author). It will be possible for authors to create, manage and share their ORCID record without membership fee. For further information see several publications and presentations by Martin Fenner. I found this one during the course while browsing Mendeley.

Once published the author’s work can be promoted using bookmarking tools, like CiteULike, Connotea and Mendeley. You can easily registrate for Connotea and Mendeley using your Facebook account. These social bookmarking tools are also useful for networking, i.e. to discover individuals and groups with the same field of interest. It is easy to synchronize your Mendeley with your CiteULike account.

Mendeley is available in a desktop and a web version. The web version offers a public profile for researchers, a catalog of documents, and collaborative groups (the cloud of Mendeley). The desktop version of Mendeley is specially suited for reference management and organizing your PDF’s. That said Mendeley seems most suitable for serendipitous use (clicking and importing a reference you happen to see and like) and less useful for managing and deduplicating large numbers of records, i.e. for a systematic review.
Also (during the course) it was not possible to import several PubMed records at once in either CiteULike or Mendeley.

What stroke me when I tried Mendeley is that there were many small or dead groups. A search for “cochrane”  for instance yielded one large group Cochrane QES Register, owned by Andrew Booth, and 3 groups with one member (thus not really a group), with 0 (!) to 6 papers each! It looks like people are trying Mendeley and other tools just for a short while. Indeed, most papers I looked up in PubMed were not bookmarked at all. It makes you wonder how widespread the use of these bookmarking tools is. It probably doesn’t help that there are so many tools with different purposes and possibilities.

Another tool that we tried was Altmetric. This is a free bookmarklet on scholarly articles which allows you to track the conversations around scientific articles online. It shows the tweets, blogposts, Google+ and Facebook mentions, and the numbers of bookmarks on Mendeley, CiteULike and Connotea.

I tried the tool on a paper I blogged about , ie. Seventy-Five Trials and Eleven Systematic Reviews a Day: How Will We Ever Keep Up?

The bookmarklet showed the tweets and the blogposts mentioning the paper.

Indeed altmetrics did correctly refer to my blog (even to 2 posts).

I liked altmetrics*, but saying that it is suitable for scientific metrics is a step too far. For people interested in this topic I would like to refer -again- to a post of Martin Fenner on altmetrics (in general).  He stresses that “usage metrics”  has its limitations because of its proness  to “gaming” (cheating).

But the current workshop didn’t address the shortcomings of the tools, for it was meant as a first practical acquaintance with the web 2.0 tools.

For the other tools (Faculty of 1000 Posters, Pinterest) and the services developed in Terkko, such as ScholarChart and TopCited Articles,  see the wikipage and the presentation:

*Coincidentally I’m preparing a post on handy chrome extensions to look for tweets about a webpage. Altmetric is another tool which seems very suitable for this purpose

Related articles

Jeffrey Beall’s List of Predatory, Open-Access Publishers, 2012 Edition

19 12 2011

Perhaps you remember that I previously wrote [1] about  non-existing and/or low quality scammy open access journals. I specifically wrote about Medical Science Journals of  the http://www.sciencejournals.cc/ series, which comprises 45 titles, none of which having published any article yet.

Another blogger, David M [2] also had negative experiences with fake peer review invitations from sciencejournals. He even noticed plagiarism.

Later I occasionally found other posts about open access spam, like the post of Per Ola Kristensson [3] (specifically about Bentham, Hindawi and InTech OA publishers), of Peter Murray-Rust [4] ,a chemist interested in OA (about spam journals and conferences, specifically about Scientific Research Publishing) and of Alan Dove PhD [5] (specifically about The Journal of Computational Biology and Bioinformatics Research (JCBBR) published by Academic Journals).

But now it appears that there is an entire list of “Predatory, Open-Access Publishers”. This list was created by Jeffrey Beall, academic librarian at the University of Colorado Denver. He just updated the list for 2012 here (PDF-format).

According to Jeffrey predatory, open-access publishers

are those that unprofessionally exploit the author-pays model of open-access publishing (Gold OA) for their own profit. Typically, these publishers spam professional email lists, broadly soliciting article submissions for the clear purpose of gaining additional income. Operating essentially as vanity presses, these publishers typically have a low article acceptance threshold, with a false-front or non-existent peer review process. Unlike professional publishing operations, whether subscription-based or ethically-sound open access, these predatory publishers add little value to scholarship, pay little attention to digital preservation, and operate using fly-by-night, unsustainable business models.

Jeffrey recommends not to do business with the following (illegitimate) publishers, including submitting article manuscripts, serving on editorial boards, buying advertising, etc. According to Jeffrey, “there are numerous traditional, legitimate journals that will publish your quality work for free, including many legitimate, open-access publishers”.

(For sake of conciseness, I only describe the main characteristics, not always using the same wording; please see the entire list for the full descriptions.)

Watchlist: Publishers, that may show some characteristics of  predatory, open-access publisher
  • Hindawi Way too many journals than can be properly handled by one publisher
  • MedKnow Publications vague business model. It charges for the PDF version
  • PAGEPress many dead links, a prominent link to PayPal
  • Versita Open paid subscription for print form. ..unclear business model

An asterisk (*) indicates that the publisher is appearing on this list for the first time.

How complete and reliable is this list?

Clearly, this list is quite exhaustive. Jeffrey did a great job listing  many dodgy OA journals. We should watch (many) of these OA publishers with caution. Another good thing is that the list is updated annually.

(http://www.sciencejournals.cc/ described in my previous post is not (yet) on the list ;)  but I will inform Jeffrey).

Personally, I would have preferred a distinction between real bogus or spammy journals and journals that seem to have “too many journals to properly handle” or that ask (too much ) money for subscription/from the author. The scientific content may still be good (enough).

Furthermore, I would rather see a neutral description of what is exactly wrong about a journal. Especially because “Beall’s list” is a list and not a blog post (or is it?). Sometimes the description doesn’t convince me that the journal is really bogus or predatory.

Examples of subjective portrayals:

  • Dove Press:  This New Zealand-based medical publisher boasts high-quality appearing journals and articles, yet it demands a very high author fee for publishing articles. Its fleet of journals is large, bringing into question how it can properly fulfill its promise to quickly deliver an acceptance decision on submitted articles.
  • Libertas Academia “The tag line under the name on this publisher’s page is “Freedom to research.” It might better say “Freedom to be ripped off.” 
  • Hindawi  .. This publisher has way too many journals than can be properly handled by one publisher, I think (…)

I do like funny posts, but only if it is clear that the post is intended to be funny. Like the one by Alan Dove PhD about JCBBR.

JCBBR is dedicated to increasing the depth of research across all areas of this subject.

Translation: we’re launching a new journal for research that can’t get published anyplace else.

The journal welcomes the submission of manuscripts that meet the general criteria of significance and scientific excellence in this subject area.

We’ll take pretty much any crap you excrete.

Hattip: Catherine Arnott Smith, PhD at the MedLib-L list.

  1. I Got the Wrong Request from the Wrong Journal to Review the Wrong Piece. The Wrong kind of Open Access Apparently, Something Wrong with this Inherently… (laikaspoetnik.wordpress.com)
  2. A peer-review phishing scam (blog.pita.si)
  3. Academic Spam and Open Access Publishing (blog.pokristensson.com)
  4. What’s wrong with Scholarly Publishing? New Journal Spam and “Open Access” (blogs.ch.cam.ac.uk)
  5. From the Inbox: Journal Spam (alandove.com)
  6. Beall’s List of Predatory, Open-Access Publishers. 2012 Edition (http://metadata.posterous.com)
  7. Silly Sunday #42 Open Access Week around the Globe (laikaspoetnik.wordpress.com)

FUTON Bias. Or Why Limiting to Free Full Text Might not Always be a Good Idea.

8 09 2011

ResearchBlogging.orgA few weeks ago I was discussing possible relevant papers for the Twitter Journal Club  (Hashtag #TwitJC), a succesful initiative on Twitter, that I have discussed previously here and here [7,8].

I proposed an article, that appeared behind a paywall. Annemarie Cunningham (@amcunningham) immediately ran the idea down, stressing that open-access (OA) is a pre-requisite for the TwitJC journal club.

One of the TwitJC organizers, Fi Douglas (@fidouglas on Twitter), argued that using paid-for journals would defeat the objective that  #TwitJC is open to everyone. I can imagine that fee-based articles could set a too high threshold for many doctors. In addition, I sympathize with promoting OA.

However, I disagree with Annemarie that an OA (or rather free) paper is a prerequisite if you really want to talk about what might impact on practice. On the contrary, limiting to free full text (FFT) papers in PubMed might lead to bias: picking “low hanging fruit of convenience” might mean that the paper isn’t representative and/or doesn’t reflect the current best evidence.

But is there evidence for my theory that selecting FFT papers might lead to bias?

Lets first look at the extent of the problem. Which percentage of papers do we miss by limiting for free-access papers?

survey in PLOS by Björk et al [1] found that one in five peer reviewed research papers published in 2008 were freely available on the internet. Overall 8,5% of the articles published in 2008 (and 13,9 % in Medicine) were freely available at the publishers’ sites (gold OA).  For an additional 11,9% free manuscript versions could be found via the green route:  i.e. copies in repositories and web sites (7,8% in Medicine).
As a commenter rightly stated, the lag time is also important, as we would like to have immediate access to recently published research, yet some publishers (37%) impose an access-embargo of 6-12 months or more. (these papers were largely missed as the 2008 OA status was assessed late 2009).

PLOS 2009

The strength of the paper is that it measures  OA prevalence on an article basis, not on calculating the share of journals which are OA: an OA journal generally contains a lower number of articles.
The authors randomly sampled from 1.2 million articles using the advanced search facility of Scopus. They measured what share of OA copies the average researcher would find using Google.

Another paper published in  J Med Libr Assoc (2009) [2], using similar methods as the PLOS survey examined the state of open access (OA) specifically in the biomedical field. Because of its broad coverage and popularity in the biomedical field, PubMed was chosen to collect their target sample of 4,667 articles. Matsubayashi et al used four different databases and search engines to identify full text copies. The authors reported an OA percentage of 26,3 for peer reviewed articles (70% of all articles), which is comparable to the results of Björk et al. More than 70% of the OA articles were provided through journal websites. The percentages of green OA articles from the websites of authors or in institutional repositories was quite low (5.9% and 4.8%, respectively).

In their discussion of the findings of Matsubayashi et al, Björk et al. [1] quickly assessed the OA status in PubMed by using the new “link to Free Full Text” search facility. First they searched for all “journal articles” published in 2005 and then repeated this with the further restrictions of “link to FFT”. The PubMed OA percentages obtained this way were 23,1 for 2005 and 23,3 for 2008.

This proportion of biomedical OA papers is gradually increasing. A chart in Nature’s News Blog [9] shows that the proportion of papers indexed on the PubMed repository each year has increased from 23% in 2005 to above 28% in 2009.
(Methods are not shown, though. The 2008 data are higher than those of Björk et al, who noticed little difference with 2005. The Data for this chart, however, are from David Lipman, NCBI director and driving force behind the digital OA archive PubMed Central).
Again, because of the embargo periods, not all literature is immediately available at the time that it is published.

In summary, we would miss about 70% of biomedical papers by limiting for FFT papers. However, we would miss an even larger proportion of papers if we limit ourselves to recently published ones.

Of course, the key question is whether ignoring relevant studies not available in full text really matters.

Reinhard Wentz of the Imperial College Library and Information Service already argued in a visionary 2002 Lancet letter[3] that the availability of full-text articles on the internet might have created a new form of bias: FUTON bias (Full Text On the Net bias).

Wentz reasoned that FUTON bias will not affect researchers who are used to comprehensive searches of published medical studies, but that it will affect staff and students with limited experience in doing searches and that it might have the same effect in daily clinical practice as publication bias or language bias when doing systematic reviews of published studies.

Wentz also hypothesized that FUTON bias (together with no abstract available (NAA) bias) will affect the visibility and the impact factor of OA journals. He makes a reasonable cause that the NAA-bias will affect publications on new, peripheral, and under-discussion subjects more than established topics covered in substantive reports.

The study of Murali et al [4] published in Mayo Proceedings 2004 confirms that the availability of journals on MEDLINE as FUTON or NAA affects their impact factor.

Of the 324 journals screened by Murali et al. 38.3% were FUTON, 19.1%  NAA and 42.6% had abstracts only. The mean impact factor was 3.24 (±0.32), 1.64 (±0.30), and 0.14 (±0.45), respectively! The authors confirmed this finding by showing a difference in impact factors for journals available in both the pre and the post-Internet era (n=159).

Murali et al informally questioned many physicians and residents at multiple national and international meetings in 2003. These doctors uniformly admitted relying on FUTON articles on the Web to answer a sizable proportion of their questions. A study by Carney et al (2004) [5] showed  that 98% of the US primary care physicians used the Internet as a resource for clinical information at least once a week and mostly used FUTON articles to aid decisions about patient care or patient education and medical student or resident instruction.

Murali et al therefore conclude that failure to consider FUTON bias may not only affect a journal’s impact factor, but could also limit consideration of medical literature by ignoring relevant for-fee articles and thereby influence medical education akin to publication or language bias.

This proposed effect of the FFT limit on citation retrieval for clinical questions, was examined in a  more recent study (2008), published in J Med Libr Assoc [6].

Across all 4 questions based on a research agenda for physical therapy, the FFT limit reduced the number of citations to 11.1% of the total number of citations retrieved without the FFT limit in PubMed.

Even more important, high-quality evidence such as systematic reviews and randomized controlled trials were missed when the FFT limit was used.

For example, when searching without the FFT limit, 10 systematic reviews of RCTs were retrieved against one when the FFT limit was used. Likewise when searching without the FFT limit, 28 RCTs were retrieved and only one was retrieved when the FFT limit was used.

The proportion of missed studies (appr. 90%) is higher than in the studies mentioned above. Possibly this is because real searches have been tested and that only relevant clinical studies  have been considered.

The authors rightly conclude that consistently missing high-quality evidence when searching clinical questions is problematic because it undermines the process of Evicence Based Practice. Krieger et al finally conclude:

“Librarians can educate health care consumers, scientists, and clinicians about the effects that the FFT limit may have on their information retrieval and the ways it ultimately may affect their health care and clinical decision making.”

It is the hope of this librarian that she did a little education in this respect and clarified the point that limiting to free full text might not always be a good idea. Especially if the aim is to critically appraise a topic, to educate or to discuss current best medical practice.


  1. Björk, B., Welling, P., Laakso, M., Majlender, P., Hedlund, T., & Guðnason, G. (2010). Open Access to the Scientific Journal Literature: Situation 2009 PLoS ONE, 5 (6) DOI: 10.1371/journal.pone.0011273
  2. Matsubayashi, M., Kurata, K., Sakai, Y., Morioka, T., Kato, S., Mine, S., & Ueda, S. (2009). Status of open access in the biomedical field in 2005 Journal of the Medical Library Association : JMLA, 97 (1), 4-11 DOI: 10.3163/1536-5050.97.1.002
  3. WENTZ, R. (2002). Visibility of research: FUTON bias The Lancet, 360 (9341), 1256-1256 DOI: 10.1016/S0140-6736(02)11264-5
  4. Murali NS, Murali HR, Auethavekiat P, Erwin PJ, Mandrekar JN, Manek NJ, & Ghosh AK (2004). Impact of FUTON and NAA bias on visibility of research. Mayo Clinic proceedings. Mayo Clinic, 79 (8), 1001-6 PMID: 15301326
  5. Carney PA, Poor DA, Schifferdecker KE, Gephart DS, Brooks WB, & Nierenberg DW (2004). Computer use among community-based primary care physician preceptors. Academic medicine : journal of the Association of American Medical Colleges, 79 (6), 580-90 PMID: 15165980
  6. Krieger, M., Richter, R., & Austin, T. (2008). An exploratory analysis of PubMed’s free full-text limit on citation retrieval for clinical questions Journal of the Medical Library Association : JMLA, 96 (4), 351-355 DOI: 10.3163/1536-5050.96.4.010
  7. The #TwitJC Twitter Journal Club, a new Initiative on Twitter. Some Initial Thoughts. (laikaspoetnik.wordpress.com)
  8. The Second #TwitJC Twitter Journal Club (laikaspoetnik.wordpress.com)
  9. How many research papers are freely available? (blogs.nature.com)

To Retract or Not to Retract… That’s the Question

7 06 2011

In the previous post I discussed [1] that editors of Science asked for the retraction of a paper linking XMRV retrovirus to ME/CFS.

The decision of the editors was based on the failure of at least 10 other studies to confirm these findings and on growing support that the results were caused by contamination. When the authors refused to retract their paper, Science issued an Expression of Concern [2].

In my opinion retraction is premature. Science should at least await the results of two multi-center studies, that were designed to confirm or disprove the results. These studies will continue anyway… The budget is already allocated.

Furthermore, I can’t suppress the idea that Science asked for a retraction to exonerate themselves for the bad peer review (the paper had serious flaws) and their eagerness to swiftly publish the possibly groundbreaking study.

And what about the other studies linking the XMRV to ME/CFS or other diseases: will these also be retracted?
And what happens in the improbable case that the multi-center studies confirm the 2009 paper? Would Science republish the retracted paper?

Thus in my opinion, it is up to other scientists to confirm or disprove findings published. Remember that falsifiability was Karl Popper’s basic scientific principle. My conclusion was that “fraud is a reason to retract a paper and doubt is not”. 

This is my opinion, but is this opinion shared by others?

When should editors retract a paper? Is fraud the only reason? When should editors issue a letter of concern? Are there guidelines?

Let first say that even editors don’t agree. Schekman, the editor-in chief of PNAS, has no direct plans to retract another paper reporting XMRV-like viruses in CFS [3].

Schekman considers it “an unusual situation to retract a paper even if the original findings in a paper don’t hold up: it’s part of the scientific process for different groups to publish findings, for other groups to try to replicate them, and for researchers to debate conflicting results.”

Back at the Virology Blog [4] there was also a vivid discussion about the matter. Prof. Vincent Ranciello gave the following answer in response to a question of a reader:

I don’t have any hard numbers on how often journals ask scientists to retract a paper, only my sense that it is very rare. Author retractions are more frequent, but I’m only aware of a handful of those in a year. I can recall a few other cases in which the authors were asked to retract a paper, but in those cases scientific fraud was involved. That’s not the case here. I don’t believe there is a standard policy that enumerates how such decisions are made; if they exist they are not public.

However, there is a Guideline for editors, the Guidance from the Committee on Publication Ethics (COPE) (PDF) [5]

Ivanoranski, of the great blog Retraction Watch, linked to it when we discussed reasons for retraction.

With regard to retraction the COPE-guidelines state that journal editors should consider retracting a publication if:

  1. they have clear evidence that the findings are unreliable, either as a result of misconduct (e.g. data fabrication) or honest error (e.g. miscalculation or experimental error)
  2. the findings have previously been published elsewhere without proper crossreferencing, permission or justification (i.e. cases of redundant publication)
  3. it constitutes plagiarism
  4. it reports unethical research

According to the same guidelines journal editors should consider issuing an expression of concern if:

  1. they receive inconclusive evidence of research or publication misconduct by the authors 
  2. there is evidence that the findings are unreliable but the authors’ institution will not investigate the case 
  3. they believe that an investigation into alleged misconduct related to the publication either has not been, or would not be, fair and impartial or conclusive 
  4. an investigation is underway but a judgement will not be available for a considerable time

Thus in the case of the Science XMRV/CSF paper an expression of concern certainly applies (all 4 points) and one might even consider a retraction, because the results seem unreliable (point 1). But it is not 100%  established that the findings are false. There is only serious doubt……

The guidelines seem to leave room for separate decisions. To retract a paper in case of plain fraud is not under discussion. But when is an error sufficiently established ànd important to warrant retraction?

Apparently retractions are on the rise. Although still rare (0.02% of all publications by the late 2000s) there has been a tenfold increase in retractions compared to the early 1980s (see review at Scholarly Kitchen [6] about two papers: [7] and [8]). However it is unclear whether increasing rates of retraction reflect more fraudulent or erroneous papers or a better diligence. The  first paper [7] also highlights that, out of fear of litigation, editors are generally hesitant to retract an article without the author’s permission.

At the blog Nerd Alert they give a nice overview [9] (based on Retraction Watch, but then summarized in one post ;) ) . They clarify that papers are retracted for “less dastardly reasons then those cases that hit the national headlines and involve purposeful falsification of data”, such as the fraudulent papers of Andrew Wakefield (autism caused by vaccination). Besides the mistaken publication of the same paper twice, data over-interpretation, plagiarism and the like, the reason can also be more trivial: ordering the wrong mice or using an incorrectly labeled bottle.

Still, scientist don’t unanimously agree that such errors should lead to retraction.

Drug Monkey blogs about his discussion [10] with @ivanoransky over a recent post at Retraction Watch, which asks whether a failure to replicate a result justifies a retraction [11]“. Ivanoransky presents a case, where a researcher (B) couldn’t reproduce the findings of another lab (A) and demonstrated mutations in the published protein sequence that excluded the mechanism proposed in A’s paper. This wasn’t retracted, possibly because B didn’t follow the published experimental protocols of A in all details. (reminds me of the XMRV controversy). 

Drugmonkey says (quote):  (cross-posted at Scientopia here — hmmpf isn’t that an example of redundant publication?)

“I don’t give a fig what any journals might wish to enact as a policy to overcompensate for their failures of the past.
In my view, a correction suffices” (provided that search engines like Google and PubMed make clear that the paper was in fact corrected).

Drug Monkey has a point there. A clear watermark should suffice.

However, we should note that most papers are retracted by authors, not the editors/journals, and that the majority of “retracted papers” remain available. Just 13.2% are deleted from the journal’s website. And 31% are not clearly labelled as such.

Summary of how the naïve reader is alerted to paper retraction (from Table 2 in [7], see: Scholarly Kitchen [6])

  • Watermark on PDF (41.1%)
  • Journal website (33.4%)
  • Not noted anywhere (31.8%)
  • Note appended to PDF (17.3%)
  • PDF deleted from website (13.2%)

My conclusion?

Of course fraudulent papers should be retracted. Also papers with obvious errors that invalidate the conclusions.

However, we should be extremely hesitant to retract papers that can’t be reproduced, if there is no undisputed evidence of error.

Otherwise we should retract almost all published papers at one point or another. Because if Professor Ioannides is right (and he probably is) “Much of what medical researchers conclude in their studies is misleading, exaggerated, or flat-out wrong”. ( see previous post [12],  “Lies, Damned Lies, and Medical Science” [13])  and Ioannides’ crushing article “Why most published research findings are false [14]”)

All retracted papers (and papers with major deficiencies and shortcomings) should be clearly labeled as such (as Drugmonkey proposed, not only at the PDF and at the Journal website, but also by search engines and biomedical databases).

Or lets hope, with Biochembelle [15], that the future of scientific publishing will make retractions for technical issues obsolete (whether in the form of nano-publications [16] or otherwise):

One day the scientific community will trade the static print-type approach of publishing for a dynamic, adaptive model of communication. Imagine a manuscript as a living document, one perhaps where all raw data would be available, others could post their attempts to reproduce data, authors could integrate corrections or addenda….

NOTE: Retraction Watch (@ivanoransky) and @laikas have voted in @drugmonkeyblog‘s poll about what a retracted paper means [here]. Have you?


  1. Science Asks to Retract the XMRV-CFS Paper, it Should Never Have Accepted in the First Place. (laikaspoetnik.wordpress.com 2011-06-02)
  2. Alberts B. Editorial Expression of Concern. Science. 2011-05-31.
  3. Given Doubt Cast on CFS-XMRV Link, What About Related Research? (blogs.wsj.com)
  4. XMRV is a recombinant virus from mice  (Virology Blog : 2011/05/31)
  5. Retractions: Guidance from the Committee on Publication Ethics (COPE) Elizabeth Wager, Virginia Barbour, Steven Yentis, Sabine Kleinert on behalf of COPE Council:
  6. Retract This Paper! Trends in Retractions Don’t Reveal Clear Causes for Retractions (scholarlykitchen.sspnet.org)
  7. Wager E, Williams P. Why and how do journals retract articles? An analysis of Medline retractions 1988-2008. J Med Ethics. 2011 Apr 12. [Epub ahead of print] 
  8. Steen RG. Retractions in the scientific literature: is the incidence of research fraud increasing? J Med Ethics. 2011 Apr;37(4):249-53. Epub 2010 Dec 24.
  9. Don’t touch that blot. (nerd-alert.net/blog/weeklies/ : 2011/02/25)
  10. What_does_a_retracted_paper_mean? (scienceblogs.com/drugmonkey: 2011/06/03)
  11. So when is a retraction warranted? The long and winding road to publishing a failure to replicate (retractionwatch.wordpress.com : 2011/06/03/)
  12. Much Ado About ADHD-Research: Is there a Misrepresentation of ADHD in Scientific Journals? (laikaspoetnik.wordpress.com 2011-06-02)
  13. “Lies, Damned Lies, and Medical Science” (theatlantic.com :2010/11/)
  14. Ioannidis, J. (2005). Why Most Published Research Findings Are False. PLoS Medicine, 2 (8) DOI: 10.1371/journal.pmed.0020124
  15. Retractions: What are they good for? (biochembelle.wordpress.com : 2011/06/04/)
  16. Will Nano-Publications & Triplets Replace The Classic Journal Articles? (laikaspoetnik.wordpress.com 2011-06-02)

NEW* (Added 2011-06-08):



Get every new post delivered to your Inbox.

Join 610 other followers