Of Mice and Men Again: New Genomic Study Helps Explain Why Mouse Models of Acute Inflammation Do Not Work in Men

25 02 2013

ResearchBlogging.org

This post was updated after a discussion on Twitter with @animalevidence, who pointed me to a great blog post at Speaking of Research ([18], a repost of [19]) highlighting the shortcomings of the current study, which used just one single inbred strain of mice (C57Bl6) [2013-02-26]. Main changes are in blue.

A recent paper published in PNAS [1] caused quite a stir both inside and outside the scientific community. The study challenges the validity of using mouse models to test what works as a treatment in humans. At least this is what many online news sources seem to conclude: “drug testing may be a waste of time”[2], “we are not mice” [3, 4], or a bit more to the point: mouse models of inflammation are worthless [5, 6, 7].

But basically the current study looks only at one specific area: the inflammatory responses that occur in critically ill patients after severe trauma and burns (SIRS, Systemic Inflammatory Response Syndrome). In these patients a storm of events may eventually lead to organ failure and death. It is similar to what may occur in sepsis (where the cause is a systemic infection).

Furthermore the study only uses one single approach: it compares the gene response patterns in serious human injuries (burns, trauma) and a human model partially mimicking these inflammatory diseases (healthy human volunteers receiving a low dose of endotoxin) with the corresponding three animal models (burns, trauma, endotoxin).

And, as highlighted by Bill Barrington of “Understand Nutrition” [8], the researchers have only tested the gene profiles in one single strain of mice: C57Bl6 (B6 for short). If B6 were the only model used in practice this would be less of a problem. But according to Mark Wanner of the Jackson Laboratory [18, 19]:

 It is now well known that some inbred mouse strains, such as the C57BL/6J (B6 for short) strain used, are resistant to septic shock. Other strains, such as BALB and A/J, are much more susceptible, however. So use of a single strain will not provide representative results.

The results themselves are very clear. The figures show at a glance that there is no correlation whatsoever between the human and B6 mouse expression data.

Seok and 36 other researchers from across the USA looked at approximately 5500 human genes and their mouse analogs. In humans, burns and traumatic injuries (and to a certain extent the human endotoxin model) triggered the activation of a vast number of genes that were not triggered in the present C57Bl6 mouse models. In addition, the genomic response is longer lasting in human injuries. Furthermore, the top 5 most activated and most suppressed pathways in human burns and trauma had no correlates in mice. Finally, analysis of existing data in the Gene Expression Omnibus (GEO) database showed that the lack of correlation between mouse and human studies also held for other acute inflammatory responses, like sepsis and acute infection.
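To picture what “no correlation” means here: the comparison essentially boils down to correlating the change in expression (log2 fold change) of each human gene with that of its mouse ortholog, across thousands of gene pairs. A minimal sketch of that idea, with invented toy numbers rather than the actual Seok et al. data:

```python
# Minimal sketch (toy data, NOT the Seok et al. dataset): correlate human and
# mouse log2 fold changes for orthologous genes, the kind of comparison behind
# the "no correlation" statement.
from math import sqrt

def pearson_r(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical log2 fold changes (injury vs. control) for five orthologous genes
human_log2fc = [3.1, -2.4, 1.8, 0.9, -1.5]   # strong genomic response in patients
mouse_log2fc = [0.1, 0.2, -0.3, 0.5, 0.0]    # barely responding, unrelated pattern

r = pearson_r(human_log2fc, mouse_log2fc)
print(f"Pearson r = {r:.2f}, R^2 = {r*r:.2f}")   # low: essentially no correlation
```

In the real analysis this is of course done genome-wide and per model; the toy numbers above only illustrate the calculation.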

This is a high quality study with interesting results. However, the results are not as groundbreaking as some media suggest.

As discussed by the authors [1], mice are known to be far more resilient to inflammatory challenge than humans*: a million-fold higher dose of endotoxin than the dose causing shock in humans is lethal to mice.* This, and the fact that “none of the 150 candidate agents that progressed to human trials has proved successful in critically ill patients”, already indicates that the current approach fails.

[This is not entirely correct: the endotoxin/LPS dose in mice is 1,000–10,000 times the dose required to induce severe disease with shock in humans [20], and mice that are resilient to endotoxin may still be susceptible to infection. It may well be that the endotoxin response is not a good model for the late effects of sepsis.]

The disappointing trial results have forced many researchers to question not only the usefulness of the current mouse models for acute inflammation [9, 10; refs from 11], but also to rethink key aspects of the human response itself and the way these clinical trials are performed [12, 13, 14]. For instance, emphasis has always been on the exuberant inflammatory reaction, but the subsequent immunosuppression may also be a major contributor to the disease. There is also substantial heterogeneity among patients [13, 14] that may explain why some patients have a good prognosis and others don’t. And some of the initially positive results in human trials have not been reproduced in later studies either (the benefit of intensive glucose control and corticosteroid treatment) [12]. Thus, is it fair to blame only the mouse studies?

dick mouse (Photo credit: Wikipedia)

The coverage by some media is grist to the mill of people who think animal studies are worthless anyway. But one cannot extrapolate these findings to other diseases. Furthermore, as referred to above, the researchers have only tested the gene profiles in one single strain of mice: C57Bl6, meaning that “The findings of Seok et al. are solely applicable to the B6 strain of mice in the three models of inflammation they tested. They unduly generalize these findings to mouse models of inflammation in general. [8]”

It is true that animal studies, including rodent studies, have their limitations. But what are the alternatives? In vitro studies are often even more artificial, and direct clinical testing of new compounds in humans is not ethical.

Obviously, the final proof of effectiveness and safety of new treatments can only be established in human trials. No one will question that.

A lot can be said about why animal studies often fail to translate directly to the clinic [15]. Clinical disparities between the animal models and the clinical trials testing the treatment (as in sepsis) are one reason. Other important reasons may be methodological flaws in animal studies (e.g. no randomization, wrong statistics) and publication bias: non-publication of “negative” results appears to be prevalent in laboratory animal research [15, 16]. Despite their shortcomings, animal studies and in vitro studies offer a way to examine certain aspects of a process, disease or treatment.

In summary, this study confirms that the existing (C57Bl6) mouse model doesn’t resemble the human situation in the systemic response following acute traumatic injury or sepsis: the genomic response is entirely different, in magnitude, duration and types of changes in expression.

The findings are not new: the shortcomings of the mouse model(s) have long been known. It remains enigmatic why the researchers chose only one inbred strain of mice, and of all mice the B6 strain, which is less sensitive to endotoxin and only develops acute kidney injury (part of organ failure) at old age (young mice were used) [21]. In that paper from 2009 (!) various reasons are given why the animal models didn’t properly mimic the human disease and how this can be improved. The authors stress that:

“the genetically heterogeneous human population should be more accurately represented by outbred mice, reducing the bias found in inbred strains that might contain or lack recessive disease susceptibility loci, depending on selective pressures.”

Both Bill Barrington [8] and Mark Wanner [18, 19] propose the use of “diversity outbred cross or collaborative cross mice that provide additional diversity.” Indeed, “replicating genetic heterogeneity and critical clinical risk factors such as advanced age and comorbid conditions (…) led to improved models of sepsis and sepsis-induced AKI (acute kidney injury).”

The authors of the PNAS paper suggest that genomic analysis can aid further in revealing which genes play a role in the perturbed immune response in acute inflammation, but it remains to be seen whether this will ultimately lead to effective treatments of sepsis and other forms of acute inflammation.

It also remains to be seen whether comprehensive genomic characterization will be useful in other disease models. The authors suggest, for instance, that genetic profiling may serve as a guide to develop animal models. A shotgun analysis of the expression of thousands of genes was useful in the present situation, because “the severe inflammatory stress produced a genomic storm affecting all major cellular functions and pathways in humans which led to sufficient perturbations to allow comparisons between the genes in the human conditions and their analogs in the murine models”. But a rough analysis of overall expression profiles may give little insight into the usefulness of other animal models, where genetic responses are more subtle.

And predicting what will happen is far less easy than confirming what is already known…

NOTE: as said, the coverage in news and blogs is again quite biased. The conclusion of a generally good Dutch science news site (the headline and lead suggested that animal models of immune diseases are crap [6]) was adapted after a critical discussion on Twitter (see here and here), and a link was added to this blog post. I wish this occurred more often…
In my opinion the most balanced summaries can be found at the science-based blogs ScienceBased Medicine [11] and the NIH Director’s Blog [17], whereas “Understand Nutrition” [8] has an original point of view, which is further elaborated by Mark Wanner at Speaking of Research [18] and the Genetics and Your Health Blog [19].

References

  1. Seok, J., Warren, H., Cuenca, A., Mindrinos, M., Baker, H., Xu, W., Richards, D., McDonald-Smith, G., Gao, H., Hennessy, L., Finnerty, C., Lopez, C., Honari, S., Moore, E., Minei, J., Cuschieri, J., Bankey, P., Johnson, J., Sperry, J., Nathens, A., Billiar, T., West, M., Jeschke, M., Klein, M., Gamelli, R., Gibran, N., Brownstein, B., Miller-Graziano, C., Calvano, S., Mason, P., Cobb, J., Rahme, L., Lowry, S., Maier, R., Moldawer, L., Herndon, D., Davis, R., Xiao, W., Tompkins, R., , ., Abouhamze, A., Balis, U., Camp, D., De, A., Harbrecht, B., Hayden, D., Kaushal, A., O’Keefe, G., Kotz, K., Qian, W., Schoenfeld, D., Shapiro, M., Silver, G., Smith, R., Storey, J., Tibshirani, R., Toner, M., Wilhelmy, J., Wispelwey, B., & Wong, W. (2013). Genomic responses in mouse models poorly mimic human inflammatory diseases Proceedings of the National Academy of Sciences DOI: 10.1073/pnas.1222878110
  2. Drug Testing In Mice May Be a Waste of Time, Researchers Warn 2013-02-12 (science.slashdot.org)
  3. Susan M Love We are not mice 2013-02-14 (Huffingtonpost.com)
  4. Elbert Chu  This Is Why It’s A Mistake To Cure Mice Instead Of Humans 2012-12-20(richarddawkins.net)
  5. Derek Lowe. Mouse Models of Inflammation Are Basically Worthless. Now We Know. 2013-02-12 (pipeline.corante.com)
  6. Elmar Veerman. Waardeloos onderzoek. Proeven met muizen zeggen vrijwel niets over ontstekingen bij mensen. 2013-02-12 (wetenschap24.nl)
  7. Gina Kolata. Mice Fall Short as Test Subjects for Humans’ Deadly Ills. 2013-02-12 (nytimes.com)

  8. Bill Barrington. Are Mice Reliable Models for Human Disease Studies? 2013-02-14 (understandnutrition.com)
  9. Raven, K. (2012). Rodent models of sepsis found shockingly lacking Nature Medicine, 18 (7), 998-998 DOI: 10.1038/nm0712-998a
  10. Nemzek JA, Hugunin KM, & Opp MR (2008). Modeling sepsis in the laboratory: merging sound science with animal well-being. Comparative medicine, 58 (2), 120-8 PMID: 18524169
  11. Steven Novella. Mouse Model of Sepsis Challenged 2013-02-13 (http://www.sciencebasedmedicine.org/index.php/mouse-model-of-sepsis-challenged/)
  12. Wiersinga WJ (2011). Current insights in sepsis: from pathogenesis to new treatment targets. Current opinion in critical care, 17 (5), 480-6 PMID: 21900767
  13. Khamsi R (2012). Execution of sepsis trials needs an overhaul, experts say. Nature medicine, 18 (7), 998-9 PMID: 22772540
  14. Hotchkiss RS, Coopersmith CM, McDunn JE, & Ferguson TA (2009). The sepsis seesaw: tilting toward immunosuppression. Nature medicine, 15 (5), 496-7 PMID: 19424209
  15. van der Worp, H., Howells, D., Sena, E., Porritt, M., Rewell, S., O’Collins, V., & Macleod, M. (2010). Can Animal Models of Disease Reliably Inform Human Studies? PLoS Medicine, 7 (3) DOI: 10.1371/journal.pmed.1000245
  16. ter Riet, G., Korevaar, D., Leenaars, M., Sterk, P., Van Noorden, C., Bouter, L., Lutter, R., Elferink, R., & Hooft, L. (2012). Publication Bias in Laboratory Animal Research: A Survey on Magnitude, Drivers, Consequences and Potential Solutions PLoS ONE, 7 (9) DOI: 10.1371/journal.pone.0043404
  17. Dr. Francis Collins. Of Mice, Men and Medicine 2013-02-19 (directorsblog.nih.gov)
  18. Tom/ Mark Wanner Why mice may succeed in research when a single mouse falls short (2013-02-15) (speakingofresearch.com) [repost, with introduction]
  19. Mark Wanner Why mice may succeed in research when a single mouse falls short (2013-02-13) (http://community.jax.org) [original post]
  20. Warren, H. (2009). Editorial: Mouse models to study sepsis syndrome in humans Journal of Leukocyte Biology, 86 (2), 199-201 DOI: 10.1189/jlb.0309210
  21. Doi, K., Leelahavanichkul, A., Yuen, P., & Star, R. (2009). Animal models of sepsis and sepsis-induced kidney injury Journal of Clinical Investigation, 119 (10), 2868-2878 DOI: 10.1172/JCI39421




BAD Science or BAD Science Journalism? – A Response to Daniel Lakens

10 02 2013

ResearchBlogging.org

Two weeks ago there was a hot debate among Dutch tweeps on “bad science, bad science journalism and bad science communication”. This debate was started and fueled by several Dutch blog posts on this topic [1, 4-6].

A controversial post, with both fierce proponents and fierce opposition was the post by Daniel Lakens [1], an assistant professor in Applied Cognitive Psychology.

I was among the opponents. Not because I don’t like a fresh new point of view, but because of the flawed reasoning and because Daniel continuously compares apples and oranges.

Since Twitter debates can’t go in-depth and lack structure and since I cannot comment to his Google sites blog, I pursue my discussion here.

The title of Daniel’s post is (freely translated, like the rest of his post):

“Is this what one calls good science?”

In his post he criticizes a Dutch science journalist, Hans van Maanen, and specifically his recent column [2], where Hans discusses a paper published in Pediatrics [3].

This longitudinal study tested the Music Marker theory among 309 Dutch kids. The researchers gathered information about the kids’ favorite types of music and tracked incidents of “minor delinquency”, such as shoplifting or vandalism, from the time they were 12 until they reached age 16 [4]. The researchers conclude that liking music that goes against the mainstream (rock, heavy metal, gothic, punk, African American music, and electronic dance music) at age 12 is a strong predictor of future minor delinquency at 16, in contrast to liking chart pop, classical music or jazz.

The university press office sent out a press release [5], which was picked up by news media [4, 6], and one of the Dutch authors of the study, Loes Keijsers, tweeted enthusiastically: “Want to know whether a 16-year-old will suffer from delinquency? Then look at his music taste at age 12!”

According to Hans, Loes could easily have broadcast (more) balanced tweets, like “Music preference doesn’t predict shoplifting” or “12-year-olds who like Bach keep quiet about shoplifting when 16.” But even then, Hans argues, the tweets wouldn’t have been scientifically underpinned either.

In column style Hans explains why he thinks the study isn’t methodologically strong: no absolute numbers are given; 7 out of 11 (!) music styles are positively associated with delinquency, but these correlations are not impressive: the strongest predictor (a preference for gothic music) explains no more than 9% of the variance in delinquent behaviour, which can include anything from shoplifting, vandalism, fighting and graffiti spraying to switching price tags. Furthermore, the risks of later “delinquent” behavior are small: on a scale from 1 (never) to 4 (4 times or more) the mean score was 1.12. Hans also wonders whether it is a good idea to monitor kids with a certain music taste.
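To put that 9% in perspective: the proportion of variance explained is simply the square of the correlation coefficient, so the strongest predictor corresponds to a correlation of roughly

$$r = \sqrt{R^2} \approx \sqrt{0.09} = 0.3,$$

which leaves about 91% of the variance in “delinquent” behaviour unaccounted for.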

Thus Hans concludes “this study isn’t good science”. Daniel, however, concludes that Hans’ writing is not good science journalism.

First Daniel recalls that he and other PhD students took a course on how to peer review scientific papers. On the basis of their peer review of a (published) article, 90% of the students decided to reject it. The two main lessons Daniel learned were:

  • It is easy to criticize a scientific paper and grind it down. No single contribution to science (no single article) is perfect.
  • New scientific insights, although imperfect, are worth sharing, because they help to evolve science. *¹

According to Daniel, science journalists often make the same mistakes as the peer-reviewing PhD students: criticizing individual studies without a “meta-view” of science.

Peer review and journalism however are different things (apples and oranges if you like).

Peer review (with all its imperfections) serves to filter, check and improve the quality of individual scientific papers, (usually) before they are published [10]. My papers that passed peer review were generally accepted. Of course there were the negative reviewers, often the ignorant ones, and the naggers, but many reviewers offered critique that helped to improve my paper, sometimes substantially. As a peer reviewer myself I only try to separate the wheat from the chaff and to enhance the quality of the papers that pass.

Science journalism also has a filter function: it filters already peer-reviewed scientific papers* for its readership, “the public”, by selecting novel, relevant science and translating the scientific, jargon-laden language into language readers can understand and appreciate. Of course science journalists should also put the publication into perspective (call it “meta”).

Surely the PhD students’ finger exercise resembles the normal peer review process about as much as peer review resembles science journalism.

I understand that pure nitpicking seldom serves a goal, but this rarely occurs in science journalism. The opposite, however, is commonplace.

Daniel disapproves of Hans van Maanen’s criticism, because Hans isn’t “meta” enough. Daniel: “Arguing whether an effect size is small or mediocre is nonsense, because no individual study gives a good estimate of the effect size. You need to do more research and combine the results in a meta-analysis.”

Apples and oranges again.

Being “meta” has little to do with meta-analysis. Being meta is … uh … pretty meta. You could think of it as seeing beyond (meta) the findings of one single study*.

A meta-analysis, however, is a statistical technique for combining the findings of independent but comparable (homogeneous) studies in order to estimate the true effect size more precisely. This is an important but difficult methodological task for a scientist, not a journalist. If a meta-analysis on the topic exists, journalists should take it into account, of course (and so should the researchers). If not, they should put the single study in a broader perspective (what does the study add to existing knowledge?) and show why this single study is or is not well done.
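To make the distinction concrete: a meta-analysis is a calculation, not an attitude. A minimal sketch of classic fixed-effect (inverse-variance) pooling, with made-up effect sizes and standard errors rather than real data:

```python
# Minimal sketch (invented numbers): fixed-effect, inverse-variance meta-analysis.
# Each study contributes an effect estimate and its standard error; studies with
# smaller standard errors (usually the larger studies) get more weight.
from math import sqrt

studies = [   # (effect size, standard error) - hypothetical values
    (0.30, 0.15),
    (0.10, 0.10),
    (0.45, 0.25),
]

weights = [1 / se ** 2 for _, se in studies]              # inverse-variance weights
pooled = sum(w * es for (es, _), w in zip(studies, weights)) / sum(weights)
pooled_se = sqrt(1 / sum(weights))

print(f"Pooled effect = {pooled:.2f} "
      f"(95% CI {pooled - 1.96 * pooled_se:.2f} to {pooled + 1.96 * pooled_se:.2f})")
```

Note that this only makes sense when there are several comparable studies to pool; it is not something a journalist can do with a single paper, which is exactly the point above.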

Daniel takes this further by stating that “one study is no study” and that journalists who simply echo the press release of a study and journalists who amply criticize a single publication (like Hans) are both clueless about science.

Apples and oranges! How can one lump science communicators (“media releases”), echoing journalists (“the media”) and critical journalists together?

I see more value in a critical analysis than a blind rejoicing of hot air. As long as the criticism guides the reader to appreciate the study.

And if there is just one single novel study, that seems important enough to get media attention, shouldn’t we judge the research on its own merits?

Then Daniel asks himself: “If I do criticize those journalists, shouldn’t I criticize those scientists who published just a single study and wrote a press release about it? “

His conclusion? “No”.

Daniel explains: science never provides absolute certainty, at the most the evidence is strong enough to state what is likely true. This can only be achieved by a lot of research by different investigators. 

Therefore you should believe in your ideas and encourage other scientists to pursue your findings. It doesn’t help when you say that music preference doesn’t predict shoplifting. It does help when you use the media to draw attention to your research. Many researchers are now aware of the “Music Marker Theory”. Thus the press release had its desired effect. By expressing a firm belief in their conclusions, they encourage other scientists to spend their sparse time on this topic. These scientists will try to repeat and falsify the study, an essential step in Cumulative Science. At a time when science is under pressure, scientists shouldn’t stop writing enthusiastic press releases or tweets. 

The latter paragraph is sheer nonsense!

Critical analysis of one study by a journalist isn’t what undermines public confidence in science. Rather, it’s the media circus that blows the implications of scientific findings out of proportion.

As exemplified by the hilarious PhD Comic below, research results are propagated by PR (science communication), picked up by the media, broadcast and spread via the internet. At the end of the cycle, conclusions are reached that are not backed up by (sufficient) evidence.

PhD Comics – The news Cycle

Daniel is right about some things. First one study is indeed no study, in the sense that concepts are continuously tested and corrected: falsification is a central property of science (Popper). He is also right that science doesn’t offer absolute certainty (an aspect that is often not understood by the public). And yes, researchers should believe in their findings and encourage other scientists to check and repeat their experiments.

Though not primarily via the media, but via the normal scientific route. Good scientists will keep track of new findings in their field anyway. And suppose only findings trumpeted in the media were pursued by other scientists?

media &amp; science (screenshot)

And authors shouldn’t make overstatements. They shouldn’t raise expectations to a level which cannot be met. The Dutch study only shows weak associations. It simply isn’t true that the Dutch study allows us to “predict” at an individual level whether a 12-year-old will “act out” at 16.

This doesn’t help lay-people to understand the findings and to appreciate science.

The idea that media should just serve to spotlight a paper, seems objectionable to me.

Going back to the meta-level: what about the role of science communicators, media, science journalists and researchers?

According to the journalist Maarten Keulemans, we should just get rid of all science communicators as a layer between scientists and journalists [7]. But Michel van Baal [9] and Roy Meijer [8] have a point when they say that journalists do a lot of PR too, and that they should do better than rehash news releases.*²

Now what about Daniel’s criticism of van Maanen? In my opinion, van Maanen is one of those rare critical journalists who serve as an antidote to uncritical media diarrhea (see the Fig. above). Comparable to another lone voice in the media: Ben Goldacre. It didn’t surprise me that Daniel didn’t approve of him (and his book Bad Science) either [11].

Does this mean that I find Hans van Maanen a terrific science journalist? No, not really. I often agree with him (see e.g. this post [12]). He is one of those rare journalists with real expertise in research methodology. However, his columns don’t seem to be written for a large audience: they seem too complex for most lay people. One thing I learned during a scientific journalism course is that one should explain all jargon to one’s audience.

Personally I find this critical Dutch blog post [13] about the Music Marker theory far more balanced. After a clear description of the study, Linda Duits concludes that the results of the study are pretty obvious, but that the mini-hype surrounding this research was caused by the positive tone of the press release. She stresses that prediction is not predetermination and that the musical genres themselves are not important: hip hop doesn’t lead to criminal activity and metal doesn’t lead to vandalism.

And this critical piece in Jezebel [14] reaches far more people by talking in plain, colourful language, hilarious at times.

It also has a swell title: “Delinquents Have the Best Taste in Music”. Now that is an apt conclusion!

———————-

*¹ Since Daniel doesn’t refer to open (trial) data access, nor to the fact that peer review may …, I ignore these aspects for the sake of the discussion.

*² Coincidence? Keulemans covered the music marker study quite uncritically (positively).

Photo Credits

http://www.phdcomics.com/comics/archive.php?comicid=1174

References

  1. Daniel Lakens: Is dit nou goede Wetenschap? – Jan 24, 2013 (sites.google.com/site/lakens2/blog)
  2. Hans van Maanen: De smaak van boefjes in de dop, De Volkskrant, Jan 12, 2013 (vanmaanen.org/hans/columns/)
  3. ter Bogt, T., Keijsers, L., & Meeus, W. (2013). Early Adolescent Music Preferences and Minor Delinquency PEDIATRICS DOI: 10.1542/peds.2012-0708
  4. Lindsay Abrams: Kids Who Like ‘Unconventional Music’ More Likely to Become Delinquent, the Atlantic, Jan 18, 2013
  5. Muziekvoorkeur belangrijke voorspeller voor kleine criminaliteit. Jan 8, 2013 (pers.uu.nl)
  6. Maarten Keulemans: Muziek is goede graadmeter voor puberaal wangedrag – De Volkskrant, 12 januari 2013  (volkskrant.nl)
  7. Maarten Keulemans: Als we nou eens alle wetenschapscommunicatie afschaffen? – Jan 23, 2013 (denieuwereporter.nl)
  8. Roy Meijer: Wetenschapscommunicatie afschaffen, en dan? – Jan 24, 2013 (denieuwereporter.nl)
  9. Michel van Baal. Wetenschapsjournalisten doen ook aan PR – Jan 25, 2013 ((denieuwereporter.nl)
  10. What peer review means for science (guardian.co.uk)
  11. Daniel Lakens. Waarom raadde Maarten Keulemans me Bad Science van Goldacre aan? Oct 25, 2012
  12. Why Publishing in the NEJM is not the Best Guarantee that Something is True: a Response to Katan – Sept 27, 2012 (laikaspoetnik.wordpress.com)
  13. Linda Duits: Debunk: worden pubers crimineel van muziek? (dieponderzoek.nl)
  14. Lindy West: Science: “Delinquents Have the Best Taste in Music” (jezebel.com)




Why Publishing in the NEJM is not the Best Guarantee that Something is True: a Response to Katan

27 10 2012

ResearchBlogging.org

In a previous post [1] I reviewed a recent Dutch study, published in the New England Journal of Medicine (NEJM) [2], about the effects of sugary drinks on the body mass index of school children.

The study was widely covered by the media. The NRC, the newspaper for which the main author Martijn Katan works as a science columnist, spent two full (!) pages on the topic, without a single critical comment [3].
As if this wasn’t enough, Katan’s latest column again dealt with his article (text freely available at mkatan.nl) [4].

I found Katan’s column “Col hors Catégorie” [4] quite arrogant, especially because he tried to belittle a (as he called it) “know-it-all” journalist who had criticized his work in a rival newspaper. This wasn’t fair, because the journalist had raised important points about the work [5, 1].

The piece focussed on the long road of getting papers published in a top journal like the NEJM.
Katan considers the NEJM the “Tour de France” among medical journals: it is a top achievement to publish in this journal.

Katan also states that “publishing in the NEJM is the best guarantee something is true”.

I think the latter statement is wrong for a number of reasons.*

  1. First, most published research findings are false [6]. Thus journals can never “guarantee” that published research is true.
    Factors that make it less likely that research findings are true include a small effect size, a greater number and lesser preselection of tested relationships, selective outcome reporting, and the “hotness” of the field (all applying more or less to Katan’s study; he also changed the primary outcomes during the trial [7]), as well as a small study size, a great financial interest and a low pre-study probability (not applicable here). (See the formula below this list.)
  2. It is true that the NEJM has a very high impact factor. This is a measure of how often papers in that journal are cited by others. Of course researchers want to get their paper published in a high-impact journal. But journals with high impact factors often go for trendy topics and positive results. In other words, it is far more difficult to publish a good-quality study with negative results, certainly in a high-impact English-language journal. This is called publication bias (and language bias) [8]. Positive studies will also be cited more frequently (citation bias) and are more likely to be published more than once (multiple publication bias) (indeed, Katan et al. already published about the trial [9], and have not presented all their data yet [1, 7]). All forms of bias are a distortion of the “truth”.
    (This is the reason why the search for a (Cochrane) systematic review must be very sensitive [8] and not restricted to core clinical journals, but even include non-published studies: for these studies might be “true”, but have failed to get published).
  3. Indeed, the group of Ioannidis just published a large-scale statistical analysis [10] showing that medical studies reporting “very large effects” seldom stand up when other researchers try to replicate them. Studies with large effects often measure laboratory and/or surrogate markers (like BMI) instead of truly clinically relevant outcomes (diabetes, cardiovascular complications, death).
  4. More specifically, the NEJM does regularly publish studies about pseudoscience or bogus treatments. See for instance this blog post [11] of ScienceBased Medicine on Acupuncture Pseudoscience in the New England Journal of Medicine (which by the way is just a review). A publication in the NEJM doesn’t guarantee it isn’t rubbish.
  5. Importantly, the NEJM has the highest proportion of trials (RCTs) with sole industry support (35%, compared to 7% in the BMJ) [12]. On several occasions I have discussed these conflicts of interest and their impact on the outcome of studies ([13, 14]; see also [15, 16]). In their study, Gøtzsche and his colleagues from the Nordic Cochrane Centre [12] also showed that industry-supported trials were more frequently cited than trials with other types of support, and that omitting them from the impact factor calculation decreased journal impact factors. The decrease was as much as 15% for the NEJM (versus 1% for the BMJ in 2007)! For the journals that provided data, income from the sales of reprints contributed 3% and 41% of total income for the BMJ and The Lancet, respectively.
    A recent study, co-authored by Ben Goldacre (MD & science writer) [17], confirms that funding by the pharmaceutical industry is associated with high numbers of reprint orders. Again, only the BMJ and The Lancet provided all the necessary data.
  6. Finally, and most relevant to the topic, a study [18], also discussed at Retraction Watch [19], shows that articles in journals with higher impact factors are more likely to be retracted, and, surprise surprise, the NEJM clearly stands on top. Although other factors like higher readership and scrutiny may also play a role [20], this conflicts with Katan’s idea that “publishing in the NEJM is the best guarantee something is true”.
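As a reminder of the reasoning behind point 1: in its simplest form (leaving out the bias term), Ioannidis [6] expresses the probability that a claimed finding is actually true (the positive predictive value, PPV) in terms of the pre-study odds R that the tested relationship is real, the significance level α and the power 1 − β:

$$\mathrm{PPV} = \frac{(1-\beta)\,R}{R - \beta R + \alpha}$$

With low pre-study odds, small (underpowered) studies and a “hot”, biased field, this value easily drops below 0.5, i.e. most claimed findings are then false, no matter which journal publishes them.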

I wasn’t aware of the latter study and would like to thank drVes and Ivan Oransky for responding to my crowdsourcing on Twitter.

References

  1. Sugary Drinks as the Culprit in Childhood Obesity? a RCT among Primary School Children (laikaspoetnik.wordpress.com)
  2. de Ruyter JC, Olthof MR, Seidell JC, & Katan MB (2012). A trial of sugar-free or sugar-sweetened beverages and body weight in children. The New England journal of medicine, 367 (15), 1397-406 PMID: 22998340
  3. Wim Köhler. Eén kilo lichter. NRC | Zaterdag 22-09-2012 (http://archief.nrc.nl/)
  4. Martijn Katan. Col hors Catégorie [Dutch], published in NRC, 20 October 2012 (www.mkatan.nl)
  5. Hans van Maanen. Suiker uit fris, De Volkskrant, 29 september 2012 (freely accessible at http://www.vanmaanen.org/)
  6. Ioannidis, J. (2005). Why Most Published Research Findings Are False PLoS Medicine, 2 (8) DOI: 10.1371/journal.pmed.0020124
  7. Changes to the protocol http://clinicaltrials.gov/archive/NCT00893529/2011_02_24/changes
  8. Publication Bias. The Cochrane Collaboration open learning material (www.cochrane-net.org)
  9. de Ruyter JC, Olthof MR, Kuijper LD, & Katan MB (2012). Effect of sugar-sweetened beverages on body weight in children: design and baseline characteristics of the Double-blind, Randomized INtervention study in Kids. Contemporary clinical trials, 33 (1), 247-57 PMID: 22056980
  10. Pereira, T., Horwitz, R.I., & Ioannidis, J.P.A. (2012). Empirical Evaluation of Very Large Treatment Effects of Medical Interventions. JAMA: The Journal of the American Medical Association, 308 (16) DOI: 10.1001/jama.2012.13444
  11. Acupuncture Pseudoscience in the New England Journal of Medicine (sciencebasedmedicine.org)
  12. Lundh, A., Barbateskovic, M., Hróbjartsson, A., & Gøtzsche, P. (2010). Conflicts of Interest at Medical Journals: The Influence of Industry-Supported Randomised Trials on Journal Impact Factors and Revenue – Cohort Study PLoS Medicine, 7 (10) DOI: 10.1371/journal.pmed.1000354
  13. One Third of the Clinical Cancer Studies Report Conflict of Interest (laikaspoetnik.wordpress.com)
  14. Merck’s Ghostwriters, Haunted Papers and Fake Elsevier Journals (laikaspoetnik.wordpress.com)
  15. Lexchin, J. (2003). Pharmaceutical industry sponsorship and research outcome and quality: systematic review BMJ, 326 (7400), 1167-1170 DOI: 10.1136/bmj.326.7400.1167
  16. Smith R (2005). Medical journals are an extension of the marketing arm of pharmaceutical companies. PLoS medicine, 2 (5) PMID: 15916457 (free full text at PLOS)
  17. Handel, A., Patel, S., Pakpoor, J., Ebers, G., Goldacre, B., & Ramagopalan, S. (2012). High reprint orders in medical journals and pharmaceutical industry funding: case-control study BMJ, 344 (jun28 1) DOI: 10.1136/bmj.e4212
  18. Fang, F., & Casadevall, A. (2011). Retracted Science and the Retraction Index Infection and Immunity, 79 (10), 3855-3859 DOI: 10.1128/IAI.05661-11
  19. Is it time for a Retraction Index? (retractionwatch.wordpress.com)
  20. Agrawal A, & Sharma A (2012). Likelihood of false-positive results in high-impact journals publishing groundbreaking research. Infection and immunity, 80 (3) PMID: 22338040

——————————————–

* Addendum: my (unpublished) letter to the NRC

Tour de France.
After the NRC had earlier devoted two full pages of praise to Katan’s new study, Katan felt the need to do it all over again in his own column. Referring to your own work is allowed, even in a column, but then the reader should learn something from it. What exactly is the message of this piece, “Col hors Catégorie”? It mainly describes the long road to getting a scientific study published in a top journal, in this case the New England Journal of Medicine (NEJM), “the Tour de France among medical journals”. The piece ends with a tackle on a journalist “who thought he knew better”. But who cares, as long as the whole world is cheering? Quite unsportsmanlike, because that journalist (van Maanen, de Volkskrant) did in fact score on several points. Katan’s central claim, that an NEJM publication “is the best guarantee that something is true”, can also be seriously disputed. The NEJM does indeed have a high impact factor, a measure of how often articles are cited. But the NEJM also has the highest “article retraction” index. It also has the highest percentage of industry-sponsored clinical trials, which push up the overall impact factor. Moreover, top journals go mainly for “positive results” and “trendy topics”, which fosters publication bias. To extend the comparison with the Tour de France: completing this prestigious race does not guarantee that the participants did not use banned substances. Despite the strict doping controls.




#EAHIL2012 CEC 2: Visibility & Impact – Library’s New Role to Enhance Visibility of Researchers

4 07 2012

This week I’m blogging at (and mostly about) the 13th EAHIL conference in Brussels. EAHIL stands for European Association for Health Information and Libraries.

The second Continuing Education Course (CEC) I followed was given by Tiina Heino and Katri Larmo of the Terkko Meilahti Campus Library at the University of Helsinki in Finland.

The full title of the course was Visibility and impact – library’s new role: How the library can support the researcher to get visibility and generate impact to researcher’s work. You can read the abstract here.

The hands-on workshop mainly concentrated on the social bookmarking sites Connotea and Mendeley, and on Altmetric.

Furthermore we got information on CiteULike, ORCID, Faculty of 1000 Posters and Pinterest. Services developed at Terkko, such as ScholarChart and TopCited Articles, were also briefly demonstrated.

What I especially liked about the hands-on session is that the tutors had prepared a wikispace with all the information and links on the main page (https://visibility2012.wikispaces.com) and a separate page for each participant to edit (here is my page). You could add links to your created accounts and embed widgets for Mendeley.

There was sufficient time to practice and try the tools. And despite the great number of participants there was ample room for questions (& even for making a blog draft ;)).

The main message of the tutors is that the process of publishing scientific research doesn’t end with publishing the article: it is equally important what happens after the research has been published. Visibility and impact in the scientific community and in society are crucial for moving the research forward, as well as for getting research funding and promoting the researcher’s career. The figure below (taken from the presentation) visualizes this process.

The tutors discussed ORCID (Open Researcher and Contributor ID), which will be introduced later this year. It is meant to solve the author name ambiguity problem in scholarly communication by providing a central registry of unique identifiers for each author (because author names can’t be used to reliably identify all scholarly authors). It will be possible for authors to create, manage and share their ORCID record without a membership fee. For further information see several publications and presentations by Martin Fenner. I found this one during the course while browsing Mendeley.
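As an aside, ORCID identifiers are 16-character strings (e.g. 0000-0002-1825-0097, the sample identifier used in ORCID’s own documentation) whose last character is a checksum. A minimal sketch of the check-digit calculation ORCID describes (ISO 7064 MOD 11-2), just to illustrate that these are structured identifiers rather than arbitrary numbers:

```python
# Minimal sketch: ORCID check digit (ISO 7064 MOD 11-2) computed from the first 15 digits.
def orcid_check_digit(base_digits: str) -> str:
    """Return the check character for a 15-digit ORCID base string."""
    total = 0
    for d in base_digits:
        total = (total + int(d)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

# Documentation example 0000-0002-1825-0097: the base digits should yield check digit "7"
print(orcid_check_digit("000000021825009"))   # -> 7
```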

Once published, an author’s work can be promoted using bookmarking tools like CiteULike, Connotea and Mendeley. You can easily register for Connotea and Mendeley using your Facebook account. These social bookmarking tools are also useful for networking, i.e. for discovering individuals and groups with the same field of interest. It is easy to synchronize your Mendeley and CiteULike accounts.

Mendeley is available in a desktop and a web version. The web version offers a public profile for researchers, a catalog of documents, and collaborative groups (the cloud side of Mendeley). The desktop version of Mendeley is especially suited to reference management and organizing your PDFs. That said, Mendeley seems most suitable for serendipitous use (clicking and importing a reference you happen to see and like) and less useful for managing and deduplicating large numbers of records, e.g. for a systematic review.
Also, during the course it was not possible to import several PubMed records at once into either CiteULike or Mendeley.

What struck me when I tried Mendeley is that there were many small or dead groups. A search for “cochrane”, for instance, yielded one large group, Cochrane QES Register, owned by Andrew Booth, and 3 groups with one member each (thus not really groups), containing 0 (!) to 6 papers each! It looks like people try Mendeley and other tools just for a short while. Indeed, most papers I looked up in PubMed were not bookmarked at all. It makes you wonder how widespread the use of these bookmarking tools is. It probably doesn’t help that there are so many tools with different purposes and possibilities.

Another tool we tried was Altmetric. This is a free bookmarklet for scholarly articles which allows you to track the online conversations around them. It shows the tweets, blog posts, Google+ and Facebook mentions, and the numbers of bookmarks on Mendeley, CiteULike and Connotea.

I tried the tool on a paper I blogged about, i.e. Seventy-Five Trials and Eleven Systematic Reviews a Day: How Will We Ever Keep Up?

The bookmarklet showed the tweets and the blogposts mentioning the paper.

Indeed, Altmetric correctly referred to my blog (even to 2 posts).

I liked Altmetric*, but saying that it is suitable for scientific metrics is a step too far. For people interested in this topic I would like to refer -again- to a post by Martin Fenner on altmetrics (in general). He stresses that “usage metrics” have their limitations because of their proneness to “gaming” (cheating).

But the current workshop didn’t address the shortcomings of the tools, for it was meant as a first practical acquaintance with the web 2.0 tools.

For the other tools (Faculty of 1000 Posters, Pinterest) and the services developed in Terkko, such as ScholarChart and TopCited Articles,  see the wikipage and the presentation:

*Coincidentally I’m preparing a post on handy Chrome extensions for finding tweets about a webpage. Altmetric is another tool that seems very suitable for this purpose.






What One Short Night’s Sleep does to your Glucose Metabolism

11 05 2010

ResearchBlogging.org

As a blogger I regularly sleep 3-5 hours just to finish a post. I know that this has its effects on how I feel the next day. I know short nights don’t promote my clear-headedness, and I recognize the short-term effects on memory, cognitive function, reaction time and mood (irritability), as depicted in the picture below. But I had no idea of any effect on heart disease, obesity and the risk of type 2 diabetes.

Indeed, short sleep duration is consistently associated with the development of obesity and diabetes in observational studies (see several recent systematic reviews, 3-5). However, as explained before, an observational design cannot establish causality. For instance, diabetes type 2 may be the consequence of other lifestyle aspects of people who spend little time sleeping, or sleep problems might be a consequence rather than a cause of diabetogenic changes.

Diabetes is basically a condition characterized by difficulties processing carbohydrates (sugars, glucose). Type 2 diabetes has a slow onset. First there is a gradual defect in the body’s ability to use insulin. This is called insulin resistance. Insulin is a pancreatic hormone that increases glucose utilization in skeletal muscle and fat tissue and suppresses glucose production by the liver, thereby lowering blood glucose levels.  Over time, damage may occur to the insulin-producing cells in the pancreas (type 2 diabetes),  which may ultimately progress to the point where the pancreas doesn’t make enough insulin and injections are needed. (source: about.com).

Since it is such a slow process one would not expect insulin resistance to change overnight. And certainly not by just partial sleep deprivation of 4-5 hrs of sleep.

Still, this is the outcome of a study performed by the PhD student Esther Donga. Esther belongs to the study group of Romijn, which also studied the previously summarized effects of previous cortisol excess on cognitive functions in Cushing’s disease.

Donga et al. have studied the effects of one night of sleep restriction on insulin sensitivity in 9 healthy lean individuals [1] and in 7 patients with type 1 diabetes [2]. The outcomes were practically the same, but since the results in healthy individuals (having no problems with glucose metabolism, weight or sleep) are most remarkable, I will confine myself to the study in healthy people.

The study design is relatively simple. Five men and four healthy women (mean age 45 years) with a lean body weight and normal  sleep pattern participated in the study. They were not using medication affecting sleep or glucose metabolism and were asked to adhere to their normal lifestyle pattern during the study.

There were 3 study days, separated by intervals of at least 3 weeks. The volunteers were admitted to the clinical research center the night before each study day to become accustomed to sleeping there. They fasted throughout these nights and spent 8.5 h in bed. The subjects were randomly assigned to sleep deprivation on either the second or the third occasion. On that night they were only allowed to sleep from 1 am to 5 am, to secure an equal compression of both non-REM and REM sleep stages.

(skip blue paragraphs if you are not interested in the details)

Effects on insulin sensitivity were determined on the day after the second and third nights (one normal and one short night of sleep) by the gold standard for quantifying insulin resistance: the hyperinsulinemic euglycemic clamp method. This method uses catheters to infuse insulin and glucose into the bloodstream. Insulin is infused to reach a steady-state insulin level in the blood, and insulin sensitivity is determined by measuring the amount of glucose necessary to compensate for the increased insulin level without causing hypoglycemia (low blood sugar). (See the figure below, and a more elaborate description at Diabetesmanager (pbworks).)
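To make the clamp readout concrete, here is a minimal sketch with made-up numbers (not the study data): insulin sensitivity is commonly expressed as the glucose infusion rate during the steady-state part of the clamp, and the value after the short night can then be compared with the value after the normal night.

```python
# Minimal sketch (hypothetical numbers, not the Donga et al. data): insulin
# sensitivity read out as the steady-state glucose infusion rate (GIR) during a
# hyperinsulinemic euglycemic clamp, compared between a normal and a short night.

def steady_state_gir(times_min, gir_mg_per_kg_min, window=(90, 120)):
    """Average glucose infusion rate (mg/kg/min) over the steady-state window."""
    values = [g for t, g in zip(times_min, gir_mg_per_kg_min)
              if window[0] <= t <= window[1]]
    return sum(values) / len(values)

# Hypothetical clamp readings every 10 minutes (mg glucose per kg body weight per minute)
times = list(range(0, 130, 10))
gir_normal_sleep = [0, 1.5, 3.0, 4.5, 5.5, 6.2, 6.6, 6.8, 6.9, 7.0, 7.0, 7.1, 7.0]
gir_short_sleep  = [0, 1.2, 2.4, 3.6, 4.4, 4.9, 5.2, 5.3, 5.3, 5.4, 5.3, 5.3, 5.4]

m_normal = steady_state_gir(times, gir_normal_sleep)
m_short = steady_state_gir(times, gir_short_sleep)
change = 100 * (m_short - m_normal) / m_normal
print(f"GIR normal night: {m_normal:.1f} mg/kg/min, short night: {m_short:.1f} mg/kg/min "
      f"({change:+.0f}%)")   # the invented numbers give roughly the ~25% drop reported
```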

Prior to beginning the hyperinsulinemic period, basal blood samples were taken and labeled [6,6-2H2]glucose was infused  for assessment of glucose kinetics in the basal state. At different time-points concentrations of glucose, insulin, and plasma nonesterified fatty acids (NEFA) were measured.

The sleep stages were affected differently by the curtailed sleep duration: the proportion of stage III sleep was greater (P &lt; 0.007) and the proportion of stage II sleep smaller (P &lt; 0.006) in the sleep-deprived night.

Partial sleep deprivation did not alter basal levels of glucose, nonesterified fatty acids (NEFA), insulin, glucagon, or cortisol measured the following morning, nor did it affect basal endogenous glucose production.

However, during the clamp procedure there were significant alterations in the following parameters:

  • Endogenous glucose production – increased by approximately 22% (p &lt; 0.017), indicating hepatic insulin resistance.
  • Rate of glucose disposal – decreased by approximately 20% (p &lt; 0.009), indicating decreased peripheral insulin sensitivity.
  • Glucose infusion rate – approximately 25% lower after the night of reduced sleep duration (p &lt; 0.001). This is in agreement with the above findings: less extra glucose was needed to maintain plasma glucose levels.
  • NEFA – increased by 19% (p &lt; 0.005), indicating decreased insulin sensitivity of lipolysis (the breakdown of triglycerides into free fatty acids).

The main novelty of the present study is the finding that one single night of shortened sleep is sufficient to reduce insulin sensitivity (of different metabolic pathways) in healthy men and women.

This is in agreement with the evidence from observational studies showing an association between sleep deprivation and obesity/insulin resistance/diabetes (3-5). It also extends the results of previous experimental studies (summarized in the paper) that documented effects on glucose tolerance after multiple nights of sleep reduction (to 4 h) or total sleep deprivation.

The authors speculate that the negative effects of multiple nights of partial sleep restriction on glucose tolerance can be reproduced, at least in part, by only a single night of sleep deprivation.

And the media conclude:

  • just one night of short sleep duration can induce insulin resistance, a component of type 2 diabetes (Science Daily)
  • healthy people who had just one night of short sleep can show signs of insulin resistance, a condition that often precedes Type 2 diabetes. (Medical News Today)
  • even a single night of sleep deprivation can cause the body to show signs of insulin resistance, a warning sign of diabetes (CBS News)
  • And this was of course the message that caught my eye in the first place: “Gee, one night of bad sleep can already disturb your glucose metabolism in such a way that you arrive at the first stage of diabetes: insulin resistance!… Help!”

    First, “insulin resistance” calls up a different association than “partial insulin resistance” or a “somewhat lower insulin sensitivity” (as demonstrated in this study). We interpret insulin resistance as a disorder that will eventually lead to diabetes, but perhaps adaptations in insulin sensitivity are just a normal phenomenon, a way to cope with normal fluctuations in exercise, diet and sleep. Or a consequence of other adaptive processes, like changes in the activity of the autonomic nervous system in response to a short sleep duration.

    Just as blood lipids will be high after a lavish dinner, or even after a piece of chocolate. And just as blood cortisol will rise in case of exercise, inflammation or stress. That is normal homeostasis. In this way the body adapts to changing conditions.

    Similarly – and it is a mere coincidence that I saw Neuroskeptic’s post about this study today – an increase in blood cortisol levels in children when ‘dropped’ at daycare doesn’t mean that this small increase in cortisol is bad for them. And it certainly doesn’t mean that you should avoid putting toddlers in daycare, as Oliver James concludes, because “high cortisol has been shown many times to be a correlate of all manner of problems”. As Neuroskeptic explains:

    Our bodies release cortisol to mobilize us for pretty much any kind of action. Physical exercise, which of course is good for you in pretty much every possible way, cause cortisol release. This is why cortisol spikes every day when you wake up: it helps give you the energy to get out of bed and brush your teeth. Maybe the kids in daycare were just more likely to be doing stuff than before they enrolled.

    Extremely high levels of cortisol over a long period certainly do cause plenty of symptoms including memory and mood problems, probably linked to changes in the hippocampus. And moderately elevated levels are correlated with depression etc, although it’s not clear that they cause it. But a rise from 0.3 to 0.4 is much lower than the kind of values we’re talking about there.

    So the same may be true for a small temporary decrease in insulin sensitivity. Of course insulin resistance can be a bad thing if blood sugars stay elevated. And it is conceivable that bad sleep habits contribute to this (certainly when combined with heavy alcohol use and junk food).

    What is remarkable (and not discussed by the authors) is that the changes in sensitivity were only “obvious” (by eyeballing) in 3-4 volunteers in all 4 tests. Was the insulin resistance unaffected in the same persons in all 4 tests or was the variation just randomly distributed? This could mean that not all persons are equally sensitive.

    It should be noted that the authors themselves remain rather reserved about the consequences of their findings for normal individuals. They conclude “This physiological observation may be of relevance for variations in glucoregulation in patients with type 1 and type 2 diabetes” and suggest that “interventions aimed at optimization of sleep duration may be beneficial in stabilizing glucose levels in patients with diabetes.”
    Of course, their second article, in diabetic persons [2], rather warrants this conclusion. Their specific advice is not directly relevant to healthy individuals.

    Credits

    References

    1. Donga E, van Dijk M, van Dijk JG, Biermasz NR, Lammers GJ, van Kralingen KW, Corssmit EP, & Romijn JA (2010). A Single Night of Partial Sleep Deprivation Induces Insulin Resistance in Multiple Metabolic Pathways in Healthy Subjects. The Journal of clinical endocrinology and metabolism PMID: 20371664
    2. Donga E, van Dijk M, van Dijk JG, Biermasz NR, Lammers GJ, van Kralingen K, Hoogma RP, Corssmit EP, & Romijn JA (2010). Partial sleep restriction decreases insulin sensitivity in type 1 diabetes. Diabetes care PMID: 2035738
    3. Nielsen LS, Danielsen KV, & Sørensen TI (2010). Short sleep duration as a possible cause of obesity: critical analysis of the epidemiological evidence. Obesity reviews : an official journal of the International Association for the Study of Obesity PMID: 20345429
    4. Monasta L, Batty GD, Cattaneo A, Lutje V, Ronfani L, van Lenthe FJ, & Brug J (2010). Early-life determinants of overweight and obesity: a review of systematic reviews. Obesity reviews : an official journal of the International Association for the Study of Obesity PMID: 20331509
    5. Cappuccio FP, D’Elia L, Strazzullo P, & Miller MA (2010). Quantity and quality of sleep and incidence of type 2 diabetes: a systematic review and meta-analysis. Diabetes care, 33 (2), 414-20 PMID: 19910503
    Study design as described in the paper: “The subjects were studied on 3 d, separated by intervals of at least 3 wk. Subjects kept a detailed diary of their diet and physical activity for 3 d before each study day and were asked to maintain a standardized schedule of bedtimes and mealtimes in accordance with their usual habits. They were admitted to our clinical research center the night before each study day, and spent 8.5 h in bed from 2300 to 0730 h on all three occasions. Subjects fasted throughout these nights from 2200 h. The first study day was included to let the subjects become accustomed to sleeping in our clinical research center. Subjects were randomly assigned to sleep deprivation on either the second (n = 4) or third (n = 5) occasion. During the night of sleep restriction, subjects spent 8.5 h in bed but were only allowed to sleep from 0100 to 0500 h. They were allowed to read or watch movies in an upward position during the awake hours, and their wakefulness was monitored and assured if necessary.
    The rationale for essentially broken sleep deprivation from 2300 to 0100 h and from 0500 to 0730 h, as opposed to sleep deprivation from 2300 to 0300 h or from 0300 to 0730 h, was that in both conditions the time in bed was centered at the same time, i.e. approximately 0300 h. Slow-wave sleep (i.e. stage III of non-REM sleep) is thought to play the most important role in metabolic, hormonal, and neurophysiological changes during sleep. Slow-wave sleep mainly occurs during the first part of the night, whereas REM sleep predominantly occurs during the latter part of the night (12). We used broken sleep deprivation to achieve a more equal compression of both non-REM and REM sleep stages. Moreover, we used the same experimental conditions for partial sleep deprivation as previously used in other studies (7, 13) to enable comparison of the results.”




    Friday Foolery #20 What is in an element’s name?

    19 03 2010

    You probably know the periodic table of elements. The  table contains 118 confirmed elements, from 1 (H, hydrogen) to 118 (Uuo, Ununoctium).

    In Wikipedia you can find a nice large periodic table with chemical symbols that link to the Wikipedia pages on the individual elements (left).

    As a chemist, David Bradley of Sciencebase must have been bored with it, because he designed an unusual version of the periodic table, in which the chemical symbols take you to his various online accounts rather than to information about a given chemical. Quite a few elements remained, and he invited other research bloggers to claim an element if their or their blog’s name fit in terms of initial letters. David started this morning, and in just a few hours almost the entire table was filled.

    I claimed Li (my surname), but that was already taken by David’s LinkedIn account, and he suggested that I take La for Laikas instead. La is Lanthanum.

    Of course this can be hilarious. I tweeted to Andrew Spong that As (arsenic) – poisonous, as you may know – would surely fit him, and he replied he would rather choose absinthe, which unfortunately isn’t an element.

There are still a few elements left. So if you would like your site highlighted as an element, let David know via Twitter and give him the link to your blog and an appropriate element.

    This is how the table looks. You can go to the table here (with real links).
The original post is here.

    And if you don’t particularly care about this table, perhaps the following adaptation suits you better. It is still available via Amazon (click on the Figure).

This table was also found on David’s blog (see here).





    #NotSoFunny #16 – Ridiculing RCTs & EBM

    1 02 2010

I remember it well. As a young researcher I presented my findings in one of my first talks, and at the end the chair killed my work with a remark that made the whole room of scientists laugh but was really beside the point. My supervisor, a truly original and very wise scientist, suppressed his anger. Afterwards he said: “It is very easy to ridicule something that isn’t mainstream thinking. It’s the argument that counts. We will prove that we are right.” …And we did.

This was not my only encounter with scientists who try to win a debate by making fun of a theory, a finding or… people. But it is not only the witty scientist who is to *blame*; it is also the uncritical audience that just swallows it.

I have similar feelings about some journal articles or blog posts that try to ridicule EBM (or any other theory or approach). Funny, perhaps, but often misunderstood and misused by “the audience”.

Take, for instance, the well-known spoof article in the BMJ:

    “Parachute use to prevent death and major trauma related to gravitational challenge: systematic review of randomised controlled trials”

It is one of those Christmas spoof articles in the BMJ, meant to inject some medical humor into the normally serious scientific literature. The spoof parachute article pretends to be a systematic review of RCT’s investigating whether parachutes can prevent death and major trauma. Of course, no such trial has been done or ever will be done: nobody is going to drop people out of a plane at random, with and without a parachute, just to prove that you are better off jumping with one.

I found the article only mildly amusing. It is so unrealistic that it becomes absurd. Not that I don’t enjoy absurdities at times, but absurdities should not take on a life of their own. In this way the article doesn’t evoke a true discussion; it only reinforces the prejudice some people already have.

    People keep referring to this 2003 article. Last Friday, Dr. Val (with whom I mostly agree) devoted a Friday Funny post to it at Get Better Health: “The Friday Funny: Why Evidence-Based Medicine Is Not The Whole Story”.* In 2008 the paper was also discussed by Not Totally Rad [3]. That EBM is not the whole story seems pretty obvious to me. It was never meant to be…

But let’s get specific. Which assumptions about RCT’s and SR’s are wrong, twisted or taken out of context? Please read the excellent comments below the article; they often hit the nail on the head.

    1. EBM is cookbook medicine.
Many define EBM as “make clinical decisions based on a synthesis of the best available evidence about a treatment” (e.g. [3]). However, EBM is not cookbook medicine.

The accepted definition of EBM is “the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients” [4]. Sackett already emphasized back in 1996:

    Good doctors use both individual clinical expertise and the best available external evidence, and neither alone is enough. Without clinical expertise, practice risks becoming tyrannised by evidence, for even excellent external evidence may be inapplicable to or inappropriate for an individual patient. Without current best evidence, practice risks becoming rapidly out of date, to the detriment of patients.


    2. RCT’s are required for evidence.

Although a well-performed RCT provides the “best” evidence, RCT’s are often not appropriate or not indicated. That is especially true for domains other than therapy. For prognostic questions the most appropriate study design is usually an inception cohort. An RCT, for instance, can’t tell whether female age is a prognostic factor for clinical pregnancy rates following IVF: there is no way to randomize for “age” or for “BMI”. ;)

The same is true for etiologic or harm questions. In theory, the “best” answer is obtained by an RCT. However, RCT’s are often unethical or unnecessary. An RCT is out of the question to address whether substance X causes cancer; observational studies will do. Sometimes case reports provide sufficient evidence: if a woman gets hepatic veno-occlusive disease after drinking large amounts of a herbal tea, the finding of similar cases in the literature may be sufficient to conclude that the herbal tea probably caused the disease.

Diagnostic accuracy studies also require a different study design (a cross-sectional or cohort study).

But even in the case of interventions, we can sometimes settle for less than an RCT. Evidence is not simply present or absent; it exists on a hierarchy. Well-performed RCT’s provide the most robust evidence, but if they are not available we have to rely on “lower” forms of evidence.

BMJ Clinical Evidence even made a list of clinical questions that are unlikely to be answered by RCT’s. In these cases Clinical Evidence searches for and includes the most appropriate form of evidence:

    1. where there are good reasons to think the intervention is not likely to be beneficial or is likely to be harmful;
2. where the outcome is very rare (e.g. a 1/10,000 fatal adverse reaction; see the worked example below);
    3. where the condition is very rare;
    4. where very long follow up is required (e.g. does drinking milk in adolescence prevent fractures in old age?);
    5. where the evidence of benefit from observational studies is overwhelming (e.g. oxygen for acute asthma attacks);
    6. when applying the evidence to real clinical situations (external validity);
    7. where current practice is very resistant to change and/or patients would not be willing to take the control or active treatment;
    8. where the unit of randomisation would have to be too large (e.g. a nationwide public health campaign); and
    9. where the condition is acute and requires immediate treatment.
Of these, only the first case is categorical. For the rest, the cut-off point at which an RCT is no longer appropriate is not precisely defined.
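To get a feel for point 2 in the list above, here is a rough back-of-the-envelope calculation. The Python sketch below is my own hypothetical illustration, not part of the Clinical Evidence list: by the “rule of three” you need roughly 3/p participants just to have a ~95% chance of observing a single event, and far more per arm to detect a difference between treatment and control.

```python
# Hypothetical illustration: sample sizes when the outcome is a 1/10,000
# fatal adverse reaction (all numbers are assumptions, not from the post).
import math

p = 1 / 10_000  # assumed baseline risk of the rare adverse event

# "Rule of three": n giving a ~95% chance of observing at least one event.
n_single_event = math.ceil(math.log(0.05) / math.log(1 - p))

# Standard two-proportion approximation: subjects per arm needed to detect a
# doubling of the risk with alpha = 0.05 (two-sided) and 80% power.
p1, p2 = p, 2 * p
z_alpha, z_beta = 1.96, 0.84
p_bar = (p1 + p2) / 2
n_per_arm = math.ceil(
    (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
     + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    / (p2 - p1) ** 2
)

print(f"~{n_single_event:,} subjects for a 95% chance of seeing one event")
print(f"~{n_per_arm:,} subjects per arm to detect a doubling of that risk")
```

At numbers like these a randomized trial is rarely feasible, which is why such questions are usually left to large observational studies and post-marketing surveillance.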

3. Informed health decisions should be based on good science rather than EBM (alone).

    Dr Val [2]: “EBM has been an over-reliance on “methodolatry” – resulting in conclusions made without consideration of prior probability, laws of physics, or plain common sense. (….) Which is why Steve Novella and the Science Based Medicine team have proposed that our quest for reliable information (upon which to make informed health decisions) should be based on good science rather than EBM alone.

Methodolatry is the profane worship of the randomized clinical trial as the only valid method of investigation. That EBM amounts to such methodolatry has already been countered in the previous sections.

    The name “Science Based Medicine” suggests that it is opposed to “Evidence Based Medicine”. At their blog David Gorski explains: “We at SBM believe that medicine based on science is the best medicine and tirelessly promote science-based medicine through discussion of the role of science and medicine.”

While this may apply to a certain extent to quackery or homeopathy (the focus of SBM), there are many examples of the opposite, where science or common sense led to interventions that turned out to be ineffective or even damaging.

As a matter of fact, many side effects are not foreseen, and few in vitro or animal experiments have led to successful new treatments.

In the end, what matters most to the patient is that “it works” (and that the benefits outweigh the harms).

Furthermore, EBM is not (or should not be) practiced without consideration of prior probability, the laws of physics, or plain common sense. To me, SBM and EBM are not mutually exclusive.

Why the example is unfair and unrealistic

I’ll leave it to the following comments (and yes, the selection is biased) [1]:

Nibu A George, Scientist:

    First of all generalizing such reports of some selected cases and making it a universal truth is unhealthy and challenging the entire scientific community. Secondly, the comparing the parachute scenario with a pure medical situation is unacceptable since the parachute jump is rather a physical situation and it become a medical situation only if the jump caused any physical harm to the person involved.

Richard A. Davidson, MD, MPH:

    This weak attempt at humor unfortunately reinforces one of the major negative stereotypes about EBM….that RCT’s are required for evidence, and that observational studies are worthless. If only 10% of the therapies that are paraded in front of us by journals were as effective as parachutes, we would have much less need for EBM. The efficacy of most of our current therapies are only mildly successful. In fact, many therapies can provide only a 25% or less therapeutic improvement. If parachutes were that effective, nobody would use them.
    While it’s easy enough to just chalk this one up to the cliche of the cantankerous British clinician, it shows a tremendous lack of insight about what EBM is and does. Even worse, it’s just not funny.

Aviel Roy-Shapira, Senior Staff Surgeon:

    Smith and Pell succeeded in amusing me, but I think their spoof reflects a common misconception about evidence based medicine. All too many practitioners equate EBM with randomized controlled trials, and metaanalyses.
    EBM is about what is accepted as evidence, not about how the evidence is obtained. For example, an RCT which shows that a given drug lowers blood pressure in patients with mild hypertension, however well designed and executed, is not acceptable as a basis for treatment decisions. One has to show that the drug actually lowers the incidence of strokes and heart attacks.
    RCT’s are needed only when the outcome is not obvious. If most people who fall from airplanes without a parachute die, this is good enough. There is plenty of evidence for that.

    EBM is about using outcome data for making therapeutic decisions. That data can come from RCTs but also from observation

Lee A. Green, Associate Professor:

EBM is not RCTs. That’s probably worth repeating several times, because so often both EBM’s detractors and some of its advocates just don’t get it. Evidence is not binary, present or not, but exists on a hierarchy (Guyatt & Rennie, 2001). (….)
The methods and rigor of EBM are nothing more or less than ways of correcting for our imperfect perceptions of our experiences. We prefer, cognitively, to perceive causal connections. We even perceive such connections where they do not exist, and we do so reliably and reproducibly under well-known sets of circumstances. RCTs aren’t holy writ, they’re simply a tool for filtering out our natural human biases in judgment and causal attribution. Whether it’s necessary to use that tool depends upon the likelihood of such bias occurring.

Scott D Ramsey, Associate Professor:

    Parachutes may be a no-brainer, but this article is brainless.

    Unfortunately, there are few if any parallels to parachutes in health care. The danger with this type of article is that it can lead to labeling certain medical technologies as “parachutes” when in fact they are not. I’ve already seen this exact analogy used for a recent medical technology (lung volume reduction surgery for severe emphysema). In uncontrolled studies, it quite literally looked like everyone who didn’t die got better. When a high quality randomized controlled trial was done, the treatment turned out to have significant morbidity and mortality and a much more modest benefit than was originally hypothesized.

Timothy R. Church, Professor:

    On one level, this is a funny article. I chuckled when I first read it. On reflection, however, I thought “Well, maybe not,” because a lot of people have died based on physicians’ arrogance about their ability to judge the efficacy of a treatment based on theory and uncontrolled observation.

    Several high profile medical procedures that were “obviously” effective have been shown by randomized trials to be (oops) killing people when compared to placebo. For starters to a long list of such failed therapies, look at antiarrhythmics for post-MI arrhythmias, prophylaxis for T. gondii in HIV infection, and endarterectomy for carotid stenosis; all were proven to be harmful rather than helpful in randomized trials, and in the face of widespread opposition to even testing them against no treatment. In theory they “had to work.” But didn’t.

    But what the heck, let’s play along. Suppose we had never seen a parachute before. Someone proposes one and we agree it’s a good idea, but how to test it out? Human trials sound good. But what’s the question? It is not, as the author would have you believe, whether to jump out of the plane without a parachute or with one, but rather stay in the plane or jump with a parachute. No one was voluntarily jumping out of planes prior to the invention of the parachute, so it wasn’t to prevent a health threat, but rather to facilitate a rapid exit from a nonviable plane.

    Another weakness in this straw-man argument is that the physics of the parachute are clear and experimentally verifiable without involving humans, but I don’t think the authors would ever suggest that human physiology and pathology in the face of medication, radiation, or surgical intervention is ever quite as clear and predictable, or that non-human experience (whether observational or experimental) would ever suffice.

    The author offers as an alternative to evidence-based methods the “common sense” method, which is really the “trust me, I’m a doctor” method. That’s not worked out so well in many high profile cases (see above, plus note the recent finding that expensive, profitable angioplasty and coronary artery by-pass grafts are no better than simple medical treatment of arteriosclerosis). And these are just the ones for which careful scientists have been able to do randomized trials. Most of our accepted therapies never have been subjected to such scrutiny, but it is breathtaking how frequently such scrutiny reveals problems.

    Thanks, but I’ll stick with scientifically proven remedies.

    parachute experiments without humans

* On the same day I posted Friday Foolery #15: The Man who pioneered the RCT. What a coincidence.

    ** Don’t forget to read the comments to the article. They are often excellent.

    Photo Credits

References
ResearchBlogging.org

1. Smith, G., & Pell, J. (2003). Parachute use to prevent death and major trauma related to gravitational challenge: systematic review of randomised controlled trials. BMJ, 327 (7429), 1459-1461 DOI: 10.1136/bmj.327.7429.1459
2. “The Friday Funny: Why Evidence-Based Medicine Is Not The Whole Story” (getbetterhealth.com) [2010.01.29]
    3. Call for randomized clinical trials of Parachutes (nottotallyrad.blogspot.com) [08-2008]
    4. Sackett DL, Rosenberg WM, Gray JA, Haynes RB, & Richardson WS (1996). Evidence based medicine: what it is and what it isn’t. BMJ (Clinical research ed.), 312 (7023), 71-2 PMID: 8555924






