Of Mice and Men Again: New Genomic Study Helps Explain why Mouse Models of Acute Inflammation do not Work in Men

25 02 2013

This post was updated after a discussion on Twitter with @animalevidence, who pointed me to a great blog post at Speaking of Research ([18], a repost of [19]) highlighting the shortcomings of the current study, which used just one inbred strain of mice (C57Bl6) [2013-02-26]. Main changes are in blue.

A recent paper published in PNAS [1] caused quite a stir both inside and outside the scientific community. The study challenges the validity of using mouse models to test what works as a treatment in humans. At least this is what many online news sources seem to conclude: “drug testing may be a waste of time”[2], “we are not mice” [3, 4], or a bit more to the point: mouse models of inflammation are worthless [5, 6, 7].

But basically the current study looks only at one specific area, the area of inflammatory responses that occur in critically ill patients after severe trauma and burns (SIRS, Systemic Inflammatory Response Syndrome). In these patients a storm of events may eventually lead to organ failure and death. It is similar to what may occur after sepsis (but here the cause is a systemic infection).

Furthermore, the study uses only one approach: it compares the gene response patterns in serious human injuries (burns, trauma) and in a human model partially mimicking these inflammatory diseases (healthy human volunteers receiving a low dose of endotoxin) with those in the corresponding three animal models (burns, trauma, endotoxin).

And, as highlighted by Bill Barrington of “Understand Nutrition” [8], the researchers have only tested the gene profiles in one single strain of mice: C57Bl6 (B6 for short). If B6 were the only model used in practice this would be less of a problem. But according to Mark Wanner of the Jackson Laboratory [18, 19]:

 It is now well known that some inbred mouse strains, such as the C57BL/6J (B6 for short) strain used, are resistant to septic shock. Other strains, such as BALB and A/J, are much more susceptible, however. So use of a single strain will not provide representative results.

The results in themselves are very clear. The figures show at a glance that there is no correlation whatsoever between the human and B6 mouse expression data.

Seok and 36 other researchers from across the USA looked at approximately 5500 human genes and their mouse analogs. In humans, burns and traumatic injuries (and to a certain extent the human endotoxin model) triggered the activation of a vast number of genes that were not triggered in the present C57Bl6 mouse models. In addition, the genomic response is longer lasting in human injuries. Furthermore, the top 5 most activated and most suppressed pathways in human burns and trauma had no correlates in mice. Finally, analysis of existing data in the Gene Expression Omnibus (GEO) database showed that the lack of correlation between mouse and human studies also held for other acute inflammatory responses, like sepsis and acute infection.

This is a high quality study with interesting results. However, the results are not as groundbreaking as some media suggest.

As discussed by the authors [1], mice are known to be far more resilient to inflammatory challenge than humans*: a million-fold higher dose of endotoxin than the dose causing shock in humans is lethal to mice.* This, and the fact that “none of the 150 candidate agents that progressed to human trials has proved successful in critically ill patients”, already indicates that the current approach fails.

[This is not entirely correct: the endotoxin/LPS dose in mice is 1000–10,000 times the dose required to induce severe disease with shock in humans [20], and mice that are resilient to endotoxin may still be susceptible to infection. It may well be that the endotoxin response is not a good model for the late effects of sepsis.]

The disappointing trial results have forced many researchers to question not only the usefulness of the current mouse models for acute inflammation [9,10; refs from 11], but also to rethink the key aspects of the human response itself and the way these clinical trials are performed [12, 13, 14]. For instance, emphasis has always been on the exuberant inflammatory reaction, but the subsequent immunosuppression may also be a major contributor to the disease. There is also substantial heterogeneity among patients [13-14] that may explain why some patients have a good prognosis and others do not. And some of the initially positive results in human trials have not been reproduced in later studies either (benefit of intense glucose control and corticosteroid treatment) [12]. Thus, is it fair to blame only the mouse studies?

dick mouse (Photo credit: Wikipedia)

The coverage by some media is grist to the mill of people who think animal studies are worthless anyway. But one cannot extrapolate these findings to other diseases. Furthermore, as referred to above, the researchers have only tested the gene profiles in one single strain of mice: C57Bl6, meaning that “The findings of Seok et al. are solely applicable to the B6 strain of mice in the three models of inflammation they tested. They unduly generalize these findings to mouse models of inflammation in general. [8]”

It is true that animal studies, including rodent studies, have their limitations. But what are the alternatives? In vitro studies are often even more artificial, and direct clinical testing of new compounds in humans is not ethical.

Obviously, the final proof of effectiveness and safety of new treatments can only be established in human trials. No one will question that.

A lot can be said about why animal studies often fail to directly translate to the clinic [15]. Clinical disparities between the animal models and the clinical trials testing the treatment (as in sepsis) are one reason. Other important reasons may be methodological flaws in animal studies (e.g. no randomization, wrong statistics) and publication bias: non-publication of “negative” results appears to be prevalent in laboratory animal research [15-16]. Despite their shortcomings, animal studies and in vitro studies offer a way to examine certain aspects of a process, disease or treatment.

In summary, this study confirms that the existing (C57Bl6) mouse model doesn’t resemble the human situation in the systemic response following acute traumatic injury or sepsis: the genomic response is entirely different, in magnitude, duration and types of changes in expression.

The findings are not new: the shortcomings of the mouse model(s) have long been known. It remains enigmatic why the researchers chose only one inbred strain of mice, and of all strains the B6 strain, which is less sensitive to endotoxin and only develops acute kidney injury (part of organ failure) at old age (young mice were used) [21]. This paper from 2009 (!) gives various reasons why the animal models didn’t properly mimic the human disease and how this can be improved. The authors stress that:

“the genetically heterogeneous human population should be more accurately represented by outbred mice, reducing the bias found in inbred strains that might contain or lack recessive disease susceptibility loci, depending on selective pressures.”

Both Bill Barrington [8] and Mark Wanner [18,19] propose the use of “diversity outbred cross or collaborative cross mice that provide additional diversity.” Indeed, “replicating genetic heterogeneity and critical clinical risk factors such as advanced age and comorbid conditions (..) led to improved models of sepsis and sepsis-induced AKI (acute kidney injury).”

The authors of the PNAS paper suggest that genomic analysis can aid further in revealing which genes play a role in the perturbed immune response in acute inflammation, but it remains to be seen whether this will ultimately lead to effective treatments of sepsis and other forms of acute inflammation.

It also remains to be seen whether comprehensive genomic characterization will be useful in other disease models. The authors suggest, for instance, that genetic profiling may serve as a guide to develop animal models. A shotgun analysis of the expression of thousands of genes was useful in the present situation, because “the severe inflammatory stress produced a genomic storm affecting all major cellular functions and pathways in humans which led to sufficient perturbations to allow comparisons between the genes in the human conditions and their analogs in the murine models”. But rough analysis of overall expression profiles may give little insight into the usefulness of other animal models, where genetic responses are more subtle.

And predicting what will happen is far less easy than confirming what is already known….

NOTE: as said, the coverage in news and blogs is again quite biased. The conclusion of a generally good Dutch science news site (the headline and lead suggested that animal models of immune diseases are crap [6]) was adapted after a critical discussion at Twitter (see here and here), and a link was added to this blog post. I wish this happened more often….
In my opinion the most balanced summaries can be found at the science-based blogs: ScienceBased Medicine [11] and NIH’s Director’s Blog [17], whereas “Understand Nutrition” [8] has an original point of view, which is further elaborated by Mark Wanner at Speaking of Research [18] and the Genetics and Your Health blog [19].

References

  1. Seok, J., Warren, H., Cuenca, A., Mindrinos, M., Baker, H., Xu, W., Richards, D., McDonald-Smith, G., Gao, H., Hennessy, L., Finnerty, C., Lopez, C., Honari, S., Moore, E., Minei, J., Cuschieri, J., Bankey, P., Johnson, J., Sperry, J., Nathens, A., Billiar, T., West, M., Jeschke, M., Klein, M., Gamelli, R., Gibran, N., Brownstein, B., Miller-Graziano, C., Calvano, S., Mason, P., Cobb, J., Rahme, L., Lowry, S., Maier, R., Moldawer, L., Herndon, D., Davis, R., Xiao, W., Tompkins, R., Abouhamze, A., Balis, U., Camp, D., De, A., Harbrecht, B., Hayden, D., Kaushal, A., O’Keefe, G., Kotz, K., Qian, W., Schoenfeld, D., Shapiro, M., Silver, G., Smith, R., Storey, J., Tibshirani, R., Toner, M., Wilhelmy, J., Wispelwey, B., & Wong, W. (2013). Genomic responses in mouse models poorly mimic human inflammatory diseases Proceedings of the National Academy of Sciences DOI: 10.1073/pnas.1222878110
  2. Drug Testing In Mice May Be a Waste of Time, Researchers Warn 2013-02-12 (science.slashdot.org)
  3. Susan M Love We are not mice 2013-02-14 (Huffingtonpost.com)
  4. Elbert Chu. This Is Why It’s A Mistake To Cure Mice Instead Of Humans 2012-12-20 (richarddawkins.net)
  5. Derek Lowe. Mouse Models of Inflammation Are Basically Worthless. Now We Know. 2013-02-12 (pipeline.corante.com)
  6. Elmar Veerman. Waardeloos onderzoek. Proeven met muizen zeggen vrijwel niets over ontstekingen bij mensen. 2013-02-12 (wetenschap24.nl)
  7. Gina Kolata. Mice Fall Short as Test Subjects for Humans’ Deadly Ills. 2013-02-12 (nytimes.com)

  8. Bill Barrington. Are Mice Reliable Models for Human Disease Studies? 2013-02-14 (understandnutrition.com)
  9. Raven, K. (2012). Rodent models of sepsis found shockingly lacking Nature Medicine, 18 (7), 998-998 DOI: 10.1038/nm0712-998a
  10. Nemzek JA, Hugunin KM, & Opp MR (2008). Modeling sepsis in the laboratory: merging sound science with animal well-being. Comparative medicine, 58 (2), 120-8 PMID: 18524169
  11. Steven Novella. Mouse Model of Sepsis Challenged 2013-02-13 (http://www.sciencebasedmedicine.org/index.php/mouse-model-of-sepsis-challenged/)
  12. Wiersinga WJ (2011). Current insights in sepsis: from pathogenesis to new treatment targets. Current opinion in critical care, 17 (5), 480-6 PMID: 21900767
  13. Khamsi R (2012). Execution of sepsis trials needs an overhaul, experts say. Nature medicine, 18 (7), 998-9 PMID: 22772540
  14. Hotchkiss RS, Coopersmith CM, McDunn JE, & Ferguson TA (2009). The sepsis seesaw: tilting toward immunosuppression. Nature medicine, 15 (5), 496-7 PMID: 19424209
  15. van der Worp, H., Howells, D., Sena, E., Porritt, M., Rewell, S., O’Collins, V., & Macleod, M. (2010). Can Animal Models of Disease Reliably Inform Human Studies? PLoS Medicine, 7 (3) DOI: 10.1371/journal.pmed.1000245
  16. ter Riet, G., Korevaar, D., Leenaars, M., Sterk, P., Van Noorden, C., Bouter, L., Lutter, R., Elferink, R., & Hooft, L. (2012). Publication Bias in Laboratory Animal Research: A Survey on Magnitude, Drivers, Consequences and Potential Solutions PLoS ONE, 7 (9) DOI: 10.1371/journal.pone.0043404
  17. Dr. Francis Collins. Of Mice, Men and Medicine 2013-02-19 (directorsblog.nih.gov)
  18. Tom/ Mark Wanner Why mice may succeed in research when a single mouse falls short (2013-02-15) (speakingofresearch.com) [repost, with introduction]
  19. Mark Wanner Why mice may succeed in research when a single mouse falls short (2013-02-13) (http://community.jax.org) [original post]
  20. Warren, H. (2009). Editorial: Mouse models to study sepsis syndrome in humans Journal of Leukocyte Biology, 86 (2), 199-201 DOI: 10.1189/jlb.0309210
  21. Doi, K., Leelahavanichkul, A., Yuen, P., & Star, R. (2009). Animal models of sepsis and sepsis-induced kidney injury Journal of Clinical Investigation, 119 (10), 2868-2878 DOI: 10.1172/JCI39421




The Scatter of Medical Research and What to do About it.

18 05 2012

Paul Glasziou, GP and professor in Evidence Based Medicine, co-authored a new article in the BMJ [1]. Similar to another paper [2] I discussed before [3], this paper deals with the difficulty for clinicians of staying up-to-date with the literature. But where the previous paper [2,3] highlighted the mere increase in number of research articles over time, the current paper looks at the scatter of randomized clinical trials (RCTs) and systematic reviews (SRs) across different journals cited in one year (2009) in PubMed.

Hoffmann et al analyzed 7 specialties and 9 sub-specialties that are considered the leading contributors to the burden of disease in high income countries.

They followed a relatively straightforward method for identifying the publications. Each search string consisted of a MeSH term (controlled term) to identify the selected disease or disorders, a publication type [pt] to identify the type of study, and the year of publication. For example, the search strategy for randomized trials in cardiology was: “heart diseases”[MeSH] AND randomized controlled trial[pt] AND 2009[dp]. (When searching “heart diseases” as a MeSH, narrower terms are also searched.) Meta-analysis[pt] was used to identify systematic reviews.
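
As a minimal sketch of how such search strings are put together (in Python; the function name `build_query` is hypothetical and not taken from the paper), the three parts can be combined like this:

```python
def build_query(mesh_term: str, publication_type: str, year: int) -> str:
    """Combine a MeSH term, a publication type and a publication year.

    Follows the pattern quoted above:
    "<term>"[MeSH] AND <type>[pt] AND <year>[dp]
    """
    return f'"{mesh_term}"[MeSH] AND {publication_type}[pt] AND {year}[dp]'


# Randomized trials in cardiology, as in the example above.
trials = build_query("heart diseases", "randomized controlled trial", 2009)

# Meta-analysis[pt] served as the stand-in for systematic reviews.
reviews = build_query("heart diseases", "meta-analysis", 2009)

print(trials)
print(reviews)
```

The resulting strings can be pasted into the PubMed search box. Swapping the MeSH term gives the equivalent search for another specialty.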

Using this approach Hoffmann et al found 14 343 RCTs and 3214 SRs published in 2009 in the field of the selected (sub)specialties. There was a clear scatter across journals, but this scatter varied considerably among specialties:

“Otolaryngology had the least scatter (363 trials across 167 journals) and neurology the most (2770 trials across 896 journals). In only three subspecialties (lung cancer, chronic obstructive pulmonary disease, hearing loss) were 10 or fewer journals needed to locate 50% of trials. The scatter was less for systematic reviews: hearing loss had the least scatter (10 reviews across nine journals) and cancer the most (670 reviews across 279 journals). For some specialties and subspecialties the papers were concentrated in specialty journals; whereas for others, few of the top 10 journals were a specialty journal for that area.
Generally, little overlap occurred between the top 10 journals publishing trials and those publishing systematic reviews. The number of journals required to find all trials or reviews was highly correlated (r=0.97) with the number of papers for each specialty/ subspecialty.”

Previous work already suggested that this scatter of research has a long tail. Half of the publications are concentrated in a minority of journals, whereas the remaining articles are scattered among many journals (see Fig below).

Figure: see legends at BMJ 2012;344:e3223 [CC]
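
To make "the number of journals needed to locate 50% of trials" concrete, here is a small Python sketch. The per-journal counts are made up for illustration and are not data from the paper; `journals_needed` is a hypothetical helper.

```python
def journals_needed(counts, fraction=0.5):
    """Return how many journals, taken from most to least productive,
    are needed to cover the given fraction of all papers."""
    target = fraction * sum(counts)
    covered = 0
    for n, count in enumerate(sorted(counts, reverse=True), start=1):
        covered += count
        if covered >= target:
            return n
    return 0  # no journals at all


# Hypothetical long-tailed distribution: papers per journal (100 in total).
papers_per_journal = [40, 25, 15, 5, 5, 3, 2, 2, 1, 1, 1]

half = journals_needed(papers_per_journal, 0.5)
everything = journals_needed(papers_per_journal, 1.0)
print(f"{half} journals cover half of the papers, {everything} cover all of them")
```

In a long-tailed distribution like this one, a couple of journals cover half of the papers, while finding the rest means going through every remaining journal.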

The good news is that SRs are less scattered and that general journals appear more often in the top 10 journals publishing SRs. Indeed for 6 of the 7 specialties and 4 of the 9 subspecialties, the Cochrane Database of Systematic Reviews had published the highest number of systematic reviews, publishing between 6% and 18% of all the systematic reviews published in each area in 2009. The bad news is that even keeping up to date with SRs seems a huge, if not impossible, challenge.

In other words, it is not sufficient for clinicians to rely on personal subscriptions to a few journals in their specialty (which is common practice). Hoffmann et al suggest several solutions to help clinicians cope with the increasing volume and scatter of research publications.

  • a central library of systematic reviews (but apparently the Cochrane Library fails to fulfill such a role according to the authors, because many reviews are out of date and are perceived as less clinically relevant)
  • a registry of planned and completed systematic reviews, such as PROSPERO (this makes it easier to locate SRs and reduces bias)
  • syntheses of evidence and synopses, like the ACP Journal Club, which summarizes the best evidence in internal medicine
  • Specialised databases that collate and critically appraise randomized trials and systematic reviews, like www.pedro.org.au for physical therapy. In my personal experience, however, this database is often out of date and not comprehensive
  • Journal scanning services like EvidenceUpdates (from mcmaster.ca), which scans over 120 journals, filters articles on the basis of quality, has practising clinicians rate them for relevance and newsworthiness, and makes them available as email alerts and in a searchable database. I use this service too, but apart from the fact that not all specialties are covered, the rating of evidence may not always be objective (see previous post [4])
  • The use of social media tools to alert clinicians to important new research.

Most of these solutions are long-standing ones that do not, or only partly, help to solve the information overload.

I was surprised that the authors didn’t propose the use of personalized alerts. PubMed’s My NCBI feature allows you to create automatic email alerts on a topic and to subscribe to electronic tables of contents (which could include ACP Journal Club). Suppose that a physician browses 10 journals roughly covering 25% of the trials. He/she does not need to read all the other journals from cover to cover to avoid missing one potentially relevant trial. Instead it is far more efficient to perform a topic search to filter relevant studies from journals that seldom publish trials on the topic of interest. One could even use the search of Hoffmann et al to achieve this.* In reality, though, most clinical researchers will have narrower fields of interest than all studies about endocrinology and neurology.

At our library we are working on creating deduplicated, easy-to-read alerts that collate tables of contents of certain journals with topic (and author) searches in PubMed, EMBASE and other databases. There are existing tools that do the same.

Another way to reduce the individual work (reading) load is to organize journal clubs or, even better, regular CATs (critically appraised topics). In the Netherlands, CATs are a compulsory item for residents. A few doctors do the work for many. Usually they choose topics that are clinically relevant (or for which the evidence is unclear).

The authors briefly mention that their search strategy might have missed some eligible papers and included some that are not truly RCTs or SRs, because they relied on PubMed’s publication type to retrieve RCTs and SRs. For systematic reviews this may be a greater problem than recognized, for the authors have used meta-analysis[pt] to identify systematic reviews. Unfortunately PubMed has no publication type for systematic reviews, but it may be clear that there are many more systematic reviews than meta-analyses. Possibly systematic reviews might even have a different scatter pattern than meta-analyses (i.e. the latter might be preferentially included in core journals).

Furthermore not all meta-analyses and systematic reviews are reviews of RCTs (thus it is not completely fair to compare MAs with RCTs only). On the other hand it is a (not discussed) omission of this study, that only interventions are considered. Nowadays physicians have many other questions than those related to therapy, like questions about prognosis, harm and diagnosis.

I did a little imperfect search just to see whether the use of search terms other than meta-analysis[pt] would have any influence on the outcome. I searched for (1) meta-analysis[pt] and (2) systematic review[tiab] (title and abstract) in papers about endocrine diseases. Then I subtracted 1 from 2 (to analyse the systematic reviews not indexed as meta-analysis[pt]).

Thus:

(ENDOCRINE DISEASES[MESH] AND SYSTEMATIC REVIEW[TIAB] AND 2009[DP]) NOT META-ANALYSIS[PT]
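
The NOT in this search is simply a set difference. A minimal Python sketch with made-up record numbers (these are not real PMIDs or real result counts) shows what "subtracting 1 from 2" means:

```python
# Hypothetical result sets; the numbers stand in for PubMed record IDs.
meta_analysis_pt = {1001, 1002, 1003, 1004}        # search 1
systematic_review_tiab = {1003, 1004, 1005, 1006}  # search 2

# Search 2 NOT search 1: "systematic review" in title or abstract,
# but not indexed with the publication type meta-analysis.
not_indexed_as_ma = systematic_review_tiab - meta_analysis_pt

# How much would search 1 alone have missed, relative to its own size?
extra = len(not_indexed_as_ma) / len(meta_analysis_pt)

print(sorted(not_indexed_as_ma))
print(f"{extra:.0%} additional reviews compared to meta-analysis[pt] alone")
```

Records that appear in both sets drop out, so what remains are the reviews that a meta-analysis[pt] search alone would never retrieve.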

I analyzed the top 10/11 journals publishing these study types.

This little experiment suggests that:

  1. the precise scatter might differ per search: apparently the systematic review[tiab] search yielded different top 10/11 journals (for this sample) than the meta-analysis[pt] search. (partially because Cochrane systematic reviews apparently don’t mention systematic reviews in title and abstract?).
  2. the authors underestimate the number of systematic reviews: simply searching for systematic review[tiab] already found approximately 50% additional systematic reviews compared to meta-analysis[pt] alone
  3. As expected (by me at least), many of the SRs and MAs were NOT dealing with interventions; see, for example, the first 5 hits (out of 108 and 236 respectively).
  4. Together these findings indicate that the true information overload is far greater than shown by Hoffmann et al (not all systematic reviews are found, and of all available study designs only RCTs are searched).
  5. On the other hand this indirectly shows that SRs are a better way to keep up-to-date than suggested: SRs  also summarize non-interventional research (the ratio SRs of RCTs: individual RCTs is much lower than suggested)
  6. It also means that the role of the Cochrane Systematic reviews to aggregate RCTs is underestimated by the published graphs (the MA[pt] section is diluted with non-RCT- systematic reviews, thus the proportion of the Cochrane SRs in the interventional MAs becomes larger)

Well anyway, these imperfections do not contradict the main point of this paper: that trials are scattered across hundreds of general and specialty journals and that “systematic reviews” (or meta-analyses really) do reduce the extent of scatter, but are still widely scattered and mostly in different journals to those of randomized trials.

Indeed, personal subscriptions to journals seem insufficient for keeping up to date.
Besides supplementing subscriptions with methods such as journal scanning services, I would recommend the use of personalized alerts from PubMed and several prefiltered sources, including an EBM search engine like TRIP (www.tripdatabase.com/).

*but I would broaden it to find all aggregate evidence, including ACP, Clinical Evidence, syntheses and synopses, not only meta-analyses.

**I do appreciate that one of the co-authors is a medical librarian: Sarah Thorning.

References

  1. Hoffmann, Tammy, Erueti, Chrissy, Thorning, Sarah, & Glasziou, Paul (2012). The scatter of research: cross sectional comparison of randomised trials and systematic reviews across specialties BMJ, 344 DOI: 10.1136/bmj.e3223
  2. Bastian, H., Glasziou, P., & Chalmers, I. (2010). Seventy-Five Trials and Eleven Systematic Reviews a Day: How Will We Ever Keep Up? PLoS Medicine, 7 (9) DOI: 10.1371/journal.pmed.1000326
  3. How will we ever keep up with 75 trials and 11 systematic reviews a day (laikaspoetnik.wordpress.com)
  4. Experience versus Evidence [1]. Opioid Therapy for Rheumatoid Arthritis Pain. (laikaspoetnik.wordpress.com)




PubMed’s Higher Sensitivity than OVID MEDLINE… & other Published Clichés.

21 08 2011

Is it just me, or are biomedical papers about searching for a systematic review often of low quality or just too damn obvious? I’m seldom excited about papers dealing with optimal search strategies or peculiarities of PubMed, even though it is my specialty.
It is my impression that many of the lower quality and/or less relevant papers are written by clinicians/researchers instead of information specialists (or at least have no medical librarian as the first author).

I can’t help thinking that many of those authors just happen to see an odd feature in PubMed or encounter an unexpected phenomenon in the process of searching for a systematic review.
They think: “Hey, that’s interesting” or “That’s odd. Let’s write a paper about it.” An easy way to boost our scientific output!
What they don’t realize is that the published findings are often common knowledge to the experienced MEDLINE searchers.

Let me give two recent examples of what I think are redundant papers.

The first example is a letter under the heading “Clinical Observation” in Annals of Internal Medicine, entitled:

“Limitations of the MEDLINE Database in Constructing Meta-analyses”.[1]

As the authors rightly state “a thorough literature search is of utmost importance in constructing a meta-analysis. Since the PubMed interface from the National Library of Medicine is a cornerstone of many meta-analysis,  the authors (two MD’s) focused on the freely available PubMed” (with MEDLINE as its largest part).

The objective was:

“To assess the accuracy of MEDLINE’s “human” and “clinical trial” search limits, which are used by authors to focus literature searches on relevant articles.” (emphasis mine)

O.k…. Stop! I know enough. This paper should have been titled: “Limitation of Limits in MEDLINE”.

Limits are NOT DONE when searching for a systematic review, for the simple reason that most limits (except language and dates) are MeSH terms.
It takes a while before the indexers have assigned a MESH to the papers and not all papers are correctly (or consistently) indexed. Thus, by using limits you will automatically miss recent, not yet, or not correctly indexed papers. Whereas it is your goal (or it should be) to find as many relevant papers as possible for your systematic review. And wouldn’t it be sad if you missed that one important RCT that was published just the other day?

On the other hand, one doesn’t want to drown in irrelevant papers. How can one reduce “noise” while minimizing the risk of losing relevant papers?

  1. Use both MeSH and textwords to “limit” your search, e.g. also search “trial” as a textword in title and abstract: trial[tiab]
  2. Use more synonyms and truncation (random*[tiab] OR placebo[tiab])
  3. Don’t actively limit but use double negation. Thus to get rid of animal studies, don’t limit to humans (this is the same as combining with the MeSH humans[mh]) but safely exclude animals as follows: NOT (animals[mh] NOT humans[mh]) (= exclude papers indexed with “animals” except when these papers are also indexed with “humans”).
  4. Use existing Methodological Filters (ready-made search strategies) designed to help focusing on study types. These filters are based on one or more of the above-mentioned principles (see earlier posts here and here).
    Simple Methodological Filters can be found at the PubMed Clinical Queries. For instance, the narrow filter for Therapy not only searches for the Publication Type “Randomized controlled trial” (a limit), but also for randomized, controlled and trial as textwords.
    Usually broader (more sensitive) filters are used for systematic reviews. The Cochrane handbook proposes to use the following filter maximizing precision and sensitivity to identify randomized trials in PubMed (see http://www.cochrane-handbook.org/):
    (randomized controlled trial [pt] OR controlled clinical trial [pt] OR randomized [tiab] OR placebo [tiab] OR clinical trials as topic [mesh: noexp] OR randomly [tiab] OR trial [ti]) NOT (animals [mh] NOT humans [mh]).
    When few hits are obtained, one can either use a broader filter or no filter at all.
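
Why double negation is safer than a limit can be shown with a toy example in Python. The four records and their indexing are invented for illustration; the empty set stands for a recent paper that has not been indexed yet.

```python
# Hypothetical records with the MeSH terms assigned to them.
records = {
    "indexed human trial": {"humans"},
    "indexed animal study": {"animals"},
    "indexed study in humans and animals": {"humans", "animals"},
    "recent trial, not yet indexed": set(),
}


def limit_to_humans(mesh):
    """The 'humans' limit: keep only records indexed with humans[mh]."""
    return "humans" in mesh


def double_negation(mesh):
    """NOT (animals[mh] NOT humans[mh]): drop only animal-only records."""
    animal_only = "animals" in mesh and "humans" not in mesh
    return not animal_only


kept_by_limit = sorted(t for t, m in records.items() if limit_to_humans(m))
kept_by_negation = sorted(t for t, m in records.items() if double_negation(m))

print("limit to humans:", kept_by_limit)
print("double negation:", kept_by_negation)
```

Both approaches get rid of the animal-only study, but only the double negation keeps the record that has no MeSH terms yet. That is exactly the recent RCT you don't want to miss.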

In other words, it is a beginner’s mistake to use limits when searching for a systematic review.
Besides publishing what should be common knowledge (even our medical students learn it), the authors make many other (little) mistakes, and their precise search is difficult to reproduce and far from complete. This has already been addressed by Dutch colleagues in a comment [2].

The second paper is:

PubMed had a higher sensitivity than Ovid-MEDLINE in the search for systematic reviews [3], by Katchamart et al.

Again this paper focuses on the usefulness of PubMed to identify RCTs for a systematic review, but it concentrates on the differences between PubMed and OVID in this respect. The paper starts by explaining that PubMed:

provides access to bibliographic information in addition to MEDLINE, such as in-process citations (..), some OLDMEDLINE citations (….) citations that precede the date that a journal was selected for MEDLINE indexing, and some additional life science journals that submit full texts to PubMed Central and receive a qualitative review by NLM.

Given these “facts”, am I exaggerating when I am saying that the authors are pushing at an open door when their main conclusion is that PubMed retrieved more citations overall than Ovid-MEDLINE? The one (!) relevant article missed in OVID was a 2005 study published in a Japanese journal that MEDLINE started indexing in 2007. It was therefore in PubMed, but not in OVID MEDLINE.

An important aspect to keep in mind when searching OVID/MEDLINE (as I have discussed earlier here and here). But worth a paper?

Recently, after finishing an exhaustive search in OVID/MEDLINE, we noticed that we had missed an RCT in PubMed that was not yet available in OVID/MEDLINE. I just added one sentence to the search methods:

Additionally, PubMed was searched for randomized controlled trials ahead of print, not yet included in OVID MEDLINE. 

Of course, I could have devoted a separate article to this finding. But it is so self-evident, that I don’t think it would be worth it.

The authors have expressed their findings in sensitivity (85% for Ovid-MEDLINE vs. 90% for PubMed; the 5% difference is that ONE missing paper), precision and number needed to read (comparable for OVID-MEDLINE and PubMed).
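
The arithmetic behind these measures is simple, as this Python sketch shows. The 17 and 18 relevant hits are the "17-18" relevant papers mentioned in this post. The total of 20 relevant papers is an assumption (it is the number that makes 17 and 18 hits come out at 85% and 90%), and the number of retrieved records is purely hypothetical.

```python
def sensitivity(relevant_retrieved: int, relevant_total: int) -> float:
    """Share of all relevant papers that the search found."""
    return relevant_retrieved / relevant_total


def precision(relevant_retrieved: int, retrieved_total: int) -> float:
    """Share of the retrieved records that are relevant."""
    return relevant_retrieved / retrieved_total


def number_needed_to_read(prec: float) -> float:
    """Records to screen per relevant paper found (1 / precision)."""
    return 1 / prec


RELEVANT_TOTAL = 20    # assumption, inferred from the reported percentages
RETRIEVED_TOTAL = 400  # hypothetical, only there to make precision computable

for name, hits in (("Ovid-MEDLINE", 17), ("PubMed", 18)):
    sens = sensitivity(hits, RELEVANT_TOTAL)
    prec = precision(hits, RETRIEVED_TOTAL)
    nnr = number_needed_to_read(prec)
    print(f"{name}: sensitivity {sens:.0%}, precision {prec:.1%}, NNR {nnr:.1f}")
```

With so few relevant papers, a single extra hit moves sensitivity by a full 5 percentage points, which is why the difference says little beyond this one search.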

If I might venture another opinion: it looks like editors of medical and epidemiology journals quickly fall for “diagnostic parameters” on a topic that they don’t understand very well: library science.

The sensitivity/precision data found have little general value, because:

  • it concerns a single search on a single topic
  • there are few relevant papers (17-18)
  • useful features of OVID MEDLINE that are not available in PubMed are not used. For example, adjacency searching could enhance the retrieval of relevant papers in OVID MEDLINE (adjacency = words searched within a specified maximal distance of each other)
  • the searches are not comparable, nor are the search field commands.

The latter is very important, if one doesn’t wish to compare apples and oranges.

Let’s take a look at the first part of the search (which is in itself well structured and covers many synonyms).
Figure: first part of the search
This part of the search deals with the P: patients with rheumatoid arthritis (RA). The authors first search for relevant MeSH (set 1-5) and then for a few textwords. The MeSH are fine. The authors have chosen to use Arthritis, rheumatoid and a few narrower terms (MeSH-tree shown at the right). The authors have taken care to use the MeSH:noexp command in PubMed to prevent the automatic explosion of narrower terms in PubMed (although this is superfluous for MeSH terms having no narrower terms, like Caplan syndrome etc.).

But the fields chosen for the free text search (sets 6-9) are not comparable at all.

In OVID the .mp. field is used, whereas all fields or even no fields are used in PubMed.

I am not even fond of the uncontrolled use of .mp. (I would rather search in title and abstract; remember that we already have the proper MeSH terms), but all fields is even broader than .mp.

In general a .mp. search looks in the Title, Original Title, Abstract, Subject Heading, Name of Substance, and Registry Word fields. All fields would be .af in OVID not .mp.

Searching for rheumatism in OVID using the .mp field yields 7879 hits against 31390 hits when one searches in the .af field.

Thus 4 times as many. Extra fields searched are for instance the journal and the address field. One finds all articles in the journal Arthritis & Rheumatism for instance [line 6], or papers co-authored by someone of the dept. of rheumatoid surgery [line 9].

Worse, in PubMed the “all fields” command doesn’t prevent the automatic mapping.

In PubMed, Rheumatism[All Fields] is translated as follows:

“rheumatic diseases”[MeSH Terms] OR (“rheumatic”[All Fields] AND “diseases”[All Fields]) OR “rheumatic diseases”[All Fields] OR “rheumatism”[All Fields]

Oops, Rheumatism[All Fields] is searched as the (exploded!) MeSH rheumatic diseases. Thus rheumatic diseases (not included in the MeSH-search) plus all its narrower terms! This makes the entire first part of the PubMed search obsolete (where the authors searched for non-exploded specific terms). It explains the large difference in hits with rheumatism between PubMed and OVID/MEDLINE: 11910 vs 6945.
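You can check this kind of translation yourself, either in the search details of PubMed or by script. Below is a minimal sketch in Python that assumes NCBI's E-utilities `esearch` endpoint and the `esearchresult` / `querytranslation` keys of its JSON output. The `canned` response is hand-written from the translation quoted above; it is not a live result, and PubMed may translate the term differently today. The function names are my own.

```python
import json
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term):
    """Build an esearch URL that asks for the count and translation only."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": 0}
    return ESEARCH + "?" + urlencode(params)


def was_mapped_to_mesh(response_text):
    """True if the reported query translation contains a MeSH term."""
    result = json.loads(response_text)["esearchresult"]
    return "[MeSH Terms]" in result.get("querytranslation", "")


# Hand-written stand-in for a response (assumption, not fetched from NCBI).
canned = json.dumps({
    "esearchresult": {
        "count": "11910",
        "querytranslation": (
            '"rheumatic diseases"[MeSH Terms] OR ("rheumatic"[All Fields] '
            'AND "diseases"[All Fields]) OR "rheumatic diseases"[All Fields] '
            'OR "rheumatism"[All Fields]'
        ),
    }
})

print(esearch_url("Rheumatism[All Fields]"))
print(was_mapped_to_mesh(canned))
```

Fetching the URL (for instance with `urllib.request.urlopen`) is left out on purpose, so that the sketch runs without a network connection.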

Not only do the authors use this .mp and [all fields] command instead of the preferred [tiab] field, they also apply this broader field to the existing (optimized) Cochrane filter, that uses [tiab]. Finally they use limits!

Well anyway, I hope that I made my point that a useful comparison between strategies can only be made if optimal and comparable strategies are used. Sensitivity doesn’t mean anything here.
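To illustrate what “comparable” could look like: a minimal sketch in Python that writes the same free-text terms for both interfaces, restricted to title and abstract in each. The terms and function names are made up for the example (they are not the authors' sets 6-9), and I assume the usual `[tiab]` tag in PubMed and the `.ti,ab.` suffix in OVID.

```python
def pubmed_freetext(terms):
    """OR the terms together, each limited to title/abstract in PubMed."""
    tagged = []
    for term in terms:
        # Phrases are quoted so that PubMed searches them as a phrase.
        text = f'"{term}"' if " " in term else term
        tagged.append(f"{text}[tiab]")
    return " OR ".join(tagged)


def ovid_freetext(terms):
    """OR the terms together, limited to title/abstract in OVID."""
    return "(" + " or ".join(terms) + ").ti,ab."


# Hypothetical terms, only to show the two notations side by side.
terms = ["rheumatism", "rheumatoid arthritis"]
print(pubmed_freetext(terms))  # rheumatism[tiab] OR "rheumatoid arthritis"[tiab]
print(ovid_freetext(terms))    # (rheumatism or rheumatoid arthritis).ti,ab.
```

Because both searches now look in the same fields, and the field tag suppresses PubMed's automatic mapping, any remaining difference in hits says something about the databases rather than about the search syntax.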

Coming back to my original point. I do think that some conclusions of these papers are “good to know”. As a matter of fact it should be basic knowledge for those planning an exhaustive search for a systematic review. We do not need bad studies to show this.

Perhaps an expert paper (or a series) on this topic, understandable for clinicians, would be of more value.

Or the recognition that such search papers should be designed and written by librarians with ample experience in searching for systematic reviews.

NOTE:
* = truncation=search for different word endings; [tiab] = title and abstract; [ti]=title; mh=mesh; pt=publication type

Photo credit

The image is taken from the Dragonfly-blog; here the Flickr-image Brain Vocab Sketch by labguest was adapted by adding the Pubmed logo.

References

  1. Winchester DE, & Bavry AA (2010). Limitations of the MEDLINE database in constructing meta-analyses. Annals of internal medicine, 153 (5), 347-8 PMID: 20820050
  2. Leclercq E, Kramer B, & Schats W (2011). Limitations of the MEDLINE database in constructing meta-analyses. Annals of internal medicine, 154 (5) PMID: 21357916
  3. Katchamart W, Faulkner A, Feldman B, Tomlinson G, & Bombardier C (2011). PubMed had a higher sensitivity than Ovid-MEDLINE in the search for systematic reviews. Journal of clinical epidemiology, 64 (7), 805-7 PMID: 20926257
  4. Search OVID EMBASE and Get MEDLINE for Free…. without knowing it (laikaspoetnik.wordpress.com 2010/10/19/)
  5. 10 + 1 PubMed Tips for Residents (and their Instructors) (laikaspoetnik.wordpress.com 2009/06/30)
  6. Adding Methodological filters to myncbi (laikaspoetnik.wordpress.com 2009/11/26/)
  7. Search filters 1. An Introduction (laikaspoetnik.wordpress.com 2009/01/22/)




RIP Statistician Paul Meier. Proponent not Father of the RCT.

14 08 2011

This headline in Boing Boing caught my eye today:  RIP Paul Meier, father of the randomized trial

Not surprisingly, I knew that Paul Meier (with Kaplan) introduced the Kaplan-Meier estimator (1958), a very important tool for measuring how many patients survive a medical treatment. But I didn’t know he was “father of the randomized trial”….

But is he really the father of the randomized trial, and “probably best known for the introduction of randomized trials into the evaluation of medical treatments”, as Boing Boing states?

Boing Boing’s very short article is based on the New York Times article: Paul Meier, Statistician Who Revolutionized Medical Trials, Dies at 87. According to the NY Times, “Dr. Meier was one of the first and most vocal proponents of what is called ‘randomization.’”

Randomization, the NY-Times explains, is:

Under the protocol, researchers randomly assign one group of patients to receive an experimental treatment and another to receive the standard treatment. In that way, the researchers try to avoid unintentionally skewing the results by choosing, for example, the healthier or younger patients to receive the new treatment.

(for a more detailed explanation see my previous posts The best study designs…. for dummies and #NotSoFunny #16 – Ridiculing RCTs & EBM)

Meier was a very successful proponent, that is for sure. According to Sir Richard Peto, (Dr. Meier) “perhaps more than any other U.S. statistician, was the one who influenced U.S. drug regulatory agencies, and hence clinical researchers throughout the U.S. and other countries, to insist on the central importance of randomized evidence.”

But an advocate need not be a father, for advocates are seldom the inventors/creators. A proponent is more of a nurse, a mentor or a … foster-parent.

Is Meier the true father/inventor of the RCT? And if not, who is?

Googling “Father of the randomized trial” won’t help, because all 1,610 hits point to Dr. Meier… thanks to careless copying of Boing Boing.

What I have read so far doesn’t point at one single creator. And the RCT wasn’t just suddenly there. It started with comparison of treatments under controlled conditions. Back in 1753, the British naval surgeon James Lind published his famous account of 12 scurvy patients, “their cases as similar as I could get them”, noting that “the most sudden and visible good effects were perceived from the uses of the oranges and lemons” and that citrus fruit cured scurvy [3]. The French physician Pierre Louis and Harvard anatomist Oliver Wendell Holmes (19th century) were also fierce proponents of supporting conclusions about the effectiveness of treatments with statistics, not subjective impressions [4].

But what was the first real RCT?

Perhaps the first real RCT was the Nuremberg salt test (1835) [6]. This was possibly not only the first RCT, but also the first scientific demonstration of the lack of effect of a homeopathic dilution. More than 50 visitors of a local tavern participated in the experiment. Half of them received a vial filled with distilled snow water, the other half a vial with ordinary salt in a homeopathic C30-dilution in distilled snow water. None of the participants knew whether he got the “actual medicine or not” (blinding). The numbered vials were coded and the code was broken after the experiment (allocation concealment).

The first publications of RCT’s were in the field of psychology and agriculture. As a matter of fact one other famous statistician, Ronald A. Fisher (of Fisher’s exact test), seems to have played a more important role in the genesis and popularization of RCT’s than Meier, albeit in agricultural research [5,7]. The book “The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century” describes how Fisher devised a randomized trial on the spot to test the contention of a lady that she could taste the difference between tea into which milk had been poured and tea that had been poured into milk (almost according to homeopathic principles) [7].

According to Wikipedia [5] the first published (medical) RCT appeared in the 1948 paper entitled “Streptomycin treatment of pulmonary tuberculosis”. One of the authors, Austin Bradford Hill, is (also) credited as having conceived the modern RCT.

Thus the road to the modern RCT is long, starting with the notions that experiments should be done under controlled conditions and that it doesn’t make sense to base treatment on intuition. Later, experiments were designed in which treatments were compared to placebo (or other treatments) in a randomized and blinded fashion, with concealment of allocation.
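For those who like to see the three ingredients (randomization, blinding, concealed allocation) side by side, here is a minimal sketch in Python. It is my own illustration, not a reconstruction of any historical trial: the function name, the arm labels and the four-digit codes are all made up.

```python
import random


def allocate(participants, seed=None):
    """Randomly split participants over two arms, hidden behind codes."""
    rng = random.Random(seed)
    arms = ["treatment", "placebo"] * (len(participants) // 2)
    if len(participants) % 2:
        # Odd number of participants: the last arm is drawn at random.
        arms.append(rng.choice(["treatment", "placebo"]))
    rng.shuffle(arms)  # randomization: chance decides who gets what

    # Each participant gets a unique code; only the code is visible.
    codes = rng.sample(range(1000, 10000), len(participants))
    labels = dict(zip(participants, codes))  # seen by participants and assessors
    key = dict(zip(codes, arms))             # kept sealed until the trial ends
    return labels, key


labels, key = allocate([f"participant {i}" for i in range(1, 51)], seed=1835)
print(len(labels), "coded vials handed out")
print(sum(arm == "treatment" for arm in key.values()), "of them contain the treatment")
```

Nobody handing out or receiving a vial can tell the arm from the code (blinding), and the key is only opened afterwards (allocation concealment).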

Paul Meier was not the inventor of the RCT, but a successful vocal proponent of the RCT. That in itself is commendable enough.

And although the Boing Boing article was incorrect, and many people googling for “father of the RCT” will find the wrong answer from now on, it did raise my interest in the history of the RCT and the role of statisticians in the development of science and clinical trials.
I plan to read a few of the articles and books mentioned below. Like the relatively lighthearted “The Lady Tasting Tea” [7]. You can envision a book review once I have finished reading it.

Note added 15-05 13.45 pm:

Today a more accurate article appeared in the Boston Globe (“Paul Meier; revolutionized medical studies using math”), which does justice to the important role of Dr Meier in the espousal of randomization as an essential element in clinical trials. For that is what he did.

Quote:

Dr. Meier published a scathing paper in the journal Science, “Safety Testing of Poliomyelitis Vaccine,’’ in which he described deficiencies in the production of vaccines by several companies. His paper was seen as a forthright indictment of federal authorities, pharmaceutical manufacturers, and the National Foundation for Infantile Paralysis, which funded the research for a polio vaccine.

  1. RIP Paul Meier, father of the randomized trial (boingboing.net)
  2. Paul Meier, Statistician Who Revolutionized Medical Trials, Dies at 87 (nytimes.com)
  3. M L Meldrum A brief history of the randomized controlled trial. From oranges and lemons to the gold standard. Hematology/ Oncology Clinics of North America (2000) Volume: 14, Issue: 4, Pages: 745-760, vii PubMed: 10949771  or see http://www.mendeley.com
  4. Fye WB. The power of clinical trials and guidelines,and the challenge of conflicts of interest. J Am Coll Cardiol. 2003 Apr 16;41(8):1237-42. PubMed PMID: 12706915. Full text
  5. http://en.wikipedia.org/wiki/Randomized_controlled_trial
  6. Stolberg M (2006). Inventing the randomized double-blind trial: The Nuremberg salt test of 1835. JLL Bulletin: Commentaries on the history of treatment evaluation (www.jameslindlibrary.org).
  7. The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century Peter Cummings, MD, MPH, Jama 2001;286(10):1238-1239. doi:10.1001/jama.286.10.1238  Book Review.
    Book by David Salsburg, 340 pp, with illus, $23.95, ISBN 0-7167-41006-7, New York, NY, WH Freeman, 2001.
  8. Kaptchuk TJ. Intentional ignorance: a history of blind assessment and placebo controls in medicine. Bull Hist Med. 1998 Fall;72(3):389-433. PubMed PMID: 9780448. abstract
  9. The best study design for dummies/ (http://laikaspoetnik.wordpress.com: 2008/08/25/)
  10. #Notsofunny: Ridiculing RCT’s and EBM (http://laikaspoetnik.wordpress.com: 2010/02/01/)
  11. RIP Paul Meier : Research Randomization Advocate (mystrongmedicine.com)
  12. If randomized clinical trials don’t show that your woo works, try anthropology! (scienceblogs.com)
  13. The revenge of “microfascism”: PoMo strikes medicine again (scienceblogs.com)




How will we ever keep up with 75 Trials and 11 Systematic Reviews a Day?

6 10 2010

An interesting paper was published in PLOS Medicine [1]. As I am an information specialist and work part time for the Cochrane Collaboration* (see below), this topic is close to my heart.

The paper, published in PLOS Medicine, is written by Hilda Bastian and two of my favorite EBM devotees ànd critics, Paul Glasziou and Iain Chalmers.

Their article gives a good overview of the rise in the number of trials, systematic reviews (SR’s) of interventions and of medical papers in general. The paper (under the head: Policy Forum) raises some important issues, but the message is not as sharp and clear as usual.

Take the title for instance.

Seventy-Five Trials and Eleven Systematic Reviews a Day:
How Will We Ever Keep Up?

What do you consider its most important message?

  1. That doctors suffer from an information overload that is only going to get worse, as I did and probably also in part @kevinclauson who tweeted about it to medical librarians
  2. that the solution to this information overload consists of Cochrane systematic reviews (because they aggregate the evidence from individual trials) as @doctorblogs twittered
  3. that it is just about “too many systematic reviews (SR’s) ?”, the title of the PLOS-press release (so the other way around),
  4. That it is about too much of everything and the not always good quality of SR’s: @kevinclauson and @pfanderson discussed that they both use the same “#Cochrane Disaster” (see Kevin’s Blog) in their teaching.
  5. that Archie Cochrane’s* dream is unachievable and ought perhaps be replaced by something less Utopian (comment by Richard Smith, former editor of the BMJ: 1, 3, 4, 5 together plus a new aspect: SR’s should not only include randomized controlled trials (RCT’s))

The paper reads easily, but matters of importance are often only touched upon.  Even after reading it twice, I wondered: a lot is being said, but what is really their main point and what are their answers/suggestions?

But let’s look at their arguments and pieces of evidence. (Black is from their paper, blue my remarks)

The landscape

I often start my presentations on “searching for evidence” by showing the Figure to the right, which is from an older PLOS article. It illustrates the information overload. Sometimes I also show another slide, with data that are 5-10 years older, saying that there are 55 trials a day, 1400 new records added per day to MEDLINE and 5000 biomedical articles a day. I also add that specialists have to read 17-22 articles a day to keep up to date with the literature. GP’s have to read even more, because they are generalists. So those 75 trials and the subsequent information overload are not really a shock to me.

Indeed the authors start with saying that “Keeping up with information in health care has never been easy.” The authors give an interesting overview of the driving forces for the increase in trials and the initiation of SR’s and critical appraisals to synthesize the evidence from all individual trials to overcome the information overload (SR’s and other forms of aggregate evidence decrease the number needed to read).

In box 1 they give an overview of the earliest systematic reviews. These SR’s often had a great impact on medical practice (see for instance an earlier discussion on the role of the Crash trial and of the first Cochrane review).
They also touch upon the institution of the Cochrane Collaboration. The Cochrane Collaboration is named after Archie Cochrane, who “reproached the medical profession for not having managed to organise a ‘critical summary, by speciality or subspecialty, adapted periodically, of all relevant randomised controlled trials’”. He inspired the establishment of the international Oxford Database of Perinatal Trials and he encouraged the use of systematic reviews of randomized controlled trials (RCT’s).

A timeline with some of the key events is shown in Figure 1.

Where are we now?

The second paragraph shows many interesting graphs (figs 2-4).

Annoyingly, PLOS only allows one-sentence legends. The details are in the (WORD) supplement without proper referral to the actual figure numbers. Grrrr..! This is completely unnecessary in reviews/editorials/policy forums. And, as said, annoying, because you have to read a Word file to understand where the data actually come from.

Bastian et al. have used MEDLINE’s publication types (i.e. case reports [pt], reviews[pt], Controlled Clinical Trial[pt]) and search filters (the Montori SR filter and the Haynes narrow therapy filter, which is built into PubMed’s Clinical Queries) to estimate the yearly rise in the number of each study type. The total numbers of clinical trials in CENTRAL (the largest database of controlled clinical trials, abbreviated as CCTR in the article) and of reviews in the Cochrane Database of Systematic Reviews (CDSR) are easy to retrieve, because the numbers are published quarterly (now monthly) by the Cochrane Library. By definition, CDSR only contains SR’s and CENTRAL (as I prefer to call it) contains almost invariably controlled clinical trials.
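To give an idea of how such yearly counts can be obtained from publication types: a minimal sketch in Python that writes one PubMed query per publication year. This is my own illustration, not the authors' method (theirs is described in their supplement). The function name is made up, and I assume the standard PubMed field tags `[pt]` (publication type) and `[dp]` (date of publication).

```python
def yearly_queries(publication_type, first_year, last_year):
    """Return {year: PubMed query} for one publication type per year."""
    return {
        year: f'"{publication_type}"[pt] AND {year}[dp]'
        for year in range(first_year, last_year + 1)
    }


# Each query can be pasted into PubMed; the number of hits is the yearly count.
for year, query in yearly_queries("Controlled Clinical Trial", 2005, 2007).items():
    print(year, query)
```

Note that such counts depend on how well records are indexed with the publication type, a point that comes back below.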

In short, these are the conclusions from their three figures:

  • Fig 2: The number of published trials has risen sharply from 1950 till 2007
  • Fig 3: The number of systematic reviews and meta-analyses has risen tremendously as well
  • Fig 4: But systematic reviews and clinical trials are still far outnumbered by narrative reviews and case reports.

O.k. that’s clear & they raise a good point: an “astonishing growth has occurred in the number of reports of clinical trials since the middle of the 20th century, and in reports of systematic reviews since the 1980s—and a plateau in growth has not yet been reached.”
Plus, indirectly: the increase in systematic reviews didn’t lower the number of trials and narrative reviews. Thus the information overload is still increasing.
But instead of discussing these findings they go into an endless discussion on the actual data and the fact that we “still do not know exactly how many trials have been done”, only to end the discussion by saying that “Even though these figures must be seen as more illustrative than precise…” And then you think: so what? I don’t really get the point of this part of their article.

 

Fig. 2: The number of published trials, 1950 to 2007.

 

 

With regard to Figure 2 they say for instance:

The differences between the numbers of trial records in MEDLINE and CCTR (CENTRAL) (see Figure 2) have multiple causes. Both CCTR and MEDLINE often contain more than one record from a single study, and there are lags in adding new records to both databases. The NLM filters are probably not as efficient at excluding non-trials as are the methods used to compile CCTR. Furthermore, MEDLINE has more language restrictions than CCTR. In brief, there is still no single repository reliably showing the true number of randomised trials. Similar difficulties apply to trying to estimate the number of systematic reviews and health technology assessments (HTAs).

Sorry, although some of these points may be true, Bastian et al. don’t go into the main reason for the difference between both graphs, that is the higher number of trial records in CCTR (CENTRAL) than in MEDLINE: the difference can be simply explained by the fact that CENTRAL contains records from MEDLINE as well as from many other electronic databases and from hand-searched materials (see this post).
With respect to other details: I don’t know which NLM filter they refer to, but if they mean the narrow therapy filter: this filter is specifically meant to find randomized controlled trials, and is far more specific and less sensitive than the Cochrane methodological filters for retrieving controlled clinical trials. In addition, MEDLINE does not have more language restrictions per se: it just contains an (extensive) selection of journals. (Plus people more easily use language limits in MEDLINE, but that is beside the point.)

Elsewhere the authors say:

In Figures 2 and 3 we use a variety of data sources to estimate the numbers of trials and systematic reviews published from 1950 to the end of 2007 (see Text S1). The number of trials continues to rise: although the data from CCTR suggest some fluctuation in trial numbers in recent years, this may be misleading because the Cochrane Collaboration virtually halted additions to CCTR as it undertook a review and internal restructuring that lasted a couple of years.

As I recall it, the situation is like this: until 2005 the Cochrane Collaboration ran the so-called “retag project”, in which they searched for controlled clinical trials in MEDLINE and EMBASE (with a very broad methodological filter). All controlled trial articles were loaded into CENTRAL, and the NLM retagged the controlled clinical trials that weren’t tagged with the appropriate publication type in MEDLINE. The Cochrane Collaboration stopped the laborious retag project in 2005, but still continues the (now) monthly electronic search updates performed by the various Cochrane groups (for their topics only). They also still continue handsearching. So they didn’t (virtually?!) halt additions to CENTRAL, although it seems likely that stopping the retag project caused the plateau. Again the authors’ main points are dwarfed by not very accurate details.

Some interesting points in this paragraph:

  • We still do not know exactly how many trials have been done.
  • For a variety of reasons, a large proportion of trials have remained unpublished (negative publication bias!) (note: Cochrane Reviews try to lower this kind of bias by applying no language limits and including unpublished data, i.e. conference proceedings, too)
  • Many trials have been published in journals without being electronically indexed as trials, which makes them difficult to find. (note: this has been tremendously improved since the Consort-statement, which is an evidence-based, minimum set of recommendations for reporting RCTs, and by the Cochrane retag-project, discussed above)
  • Astonishing growth has occurred in the number of reports of clinical trials since the middle of the 20th century, and in reports of systematic reviews since the 1980s—and a plateau in growth has not yet been reached.
  • Trials are now registered in prospective trial registers at inception, theoretically enabling an overview of all published and unpublished trials (note: this will also make it easier to find out the reasons for not publishing data, or for altering primary outcomes)
  • Once the International Committee of Medical Journal Editors announced that their journals would no longer publish trials that had not been prospectively registered, far more ongoing trials were being registered per week (200 instead of 30). In 2007, the US Congress made detailed prospective trial registration legally mandatory.

The authors do not discuss that better reporting of trials and the retag project might have facilitated the indexing and retrieval of trials.

How Close Are We to Archie Cochrane’s Goal?

According to the authors there are various reasons why Archie Cochrane’s goal will not be achieved without some serious changes in course:

  • The increase in systematic reviews didn’t displace other less reliable forms of information (Figs 3 and 4)
  • Only a minority of trials have been assessed in systematic reviews
  • The workload involved in producing reviews is increasing
  • The bulk of systematic reviews are now many years out of date.

Where to Now?

In this paragraph the authors discuss what should be changed:

  • Prioritize trials
  • Wider adoption of the concept that trials will not be supported unless a SR has shown the trial to be necessary.
  • Prioritizing SR’s: reviews should address questions that are relevant to patients, clinicians and policymakers.
  • Choose between elaborate reviews that answer a part of the relevant questions or “leaner” reviews of most of what we want to know. Apparently the authors have already chosen the latter; they prefer:
    • shorter and less elaborate reviews
    • faster production ànd update of SR’s
    • no unnecessary inclusion of study types other than randomized trials (unless it is about less common adverse effects)
  • More international collaboration and thereby a better use  of resources for SR’s and HTAs. As an example of a good initiative they mention “KEEP Up,” which will aim to harmonise updating standards and aggregate updating results, initiated and coordinated by the German Institute for Quality and Efficiency in Health Care (IQWiG) and involving key systematic reviewing and guidelines organisations such as the Cochrane Collaboration, Duodecim, the Scottish Intercollegiate Guidelines Network (SIGN), and the National Institute for Health and Clinical Excellence (NICE).

Summary and comments

The main aim of this paper is to discuss to what extent the medical profession has managed to make “critical summaries, by speciality or subspeciality, adapted periodically, of all relevant randomized controlled trials”, as proposed 30 years ago by Archie Cochrane.

Emphasis of the paper is mostly on the number of trials and systematic reviews, not on qualitative aspects. Furthermore there is too much emphasis on the methods determining the number of trials and reviews.

The main conclusion of the authors is that an astonishing growth has occurred in the number of reports of clinical trials as well as in the number of SR’s, but that these systematic pieces of evidence shrink into insignificance compared to the a-systematic narrative reviews or case reports published. That is an important, but not an unexpected conclusion.

Bastian et al don’t address whether systematic reviews have made the growing number of trials easier to access or digest. Neither do they go into developments that have facilitated the retrieval of clinical trials and aggregate evidence from databases like PubMed: the Cochrane retag-project, the Consort-statement, the existence of publication types and search filters (which they use themselves to filter out trials and systematic reviews). They also skip sources other than systematic reviews that make it easier to find the evidence: databases with evidence-based guidelines, the TRIP database, Clinical Evidence.
As Clay Shirky said: “It’s Not Information Overload. It’s Filter Failure.”

It is also good to note that case reports and narrative reviews serve other aims. For medical practitioners rare case reports can be very useful for their clinical practice and good narrative reviews can be valuable for getting an overview in the field or for keeping up-to-date. You just have to know when to look for what.

Bastian et al have several suggestions for improvement, but these suggestions are not always underpinned. For instance, they propose access to all systematic reviews and trials. Perfect. But how can this be attained? We could stimulate authors to publish their trials in open access journals. For Cochrane reviews this would be desirable but difficult, as we cannot demand that authors who work for months, for free, to write a SR pay for the publication themselves. The Cochrane Collab is an international organization that does not receive subsidies for this. So how could this be achieved?

In my opinion, we can expect the most important benefits from prioritizing trials ànd SR’s, faster production ànd update of SR’s, more international collaboration and less duplication. It is a pity the authors do not mention projects other than “Keep up”. As discussed in previous posts, the Cochrane Collaboration also recognizes the many issues raised in this paper, and aims to speed up the updates and to produce evidence on priority topics (see here and here). Evidence Aid is an example of a successful effort. But this is only the Cochrane Collaboration. There are many more non-Cochrane systematic reviews produced.

And then we arrive at the next issue: not all systematic reviews are created equal. There are a lot of so-called “systematic reviews” that aren’t the conscientiously, explicitly and judiciously created syntheses of evidence they ought to be.

Therefore, I do not think that the proposal that each single trial should be preceded by a systematic review, is a very good idea.
In the Netherlands writing a SR is already required for NWO grants. In practice, people just approach me, as a searcher, the days before Christmas, with the idea to submit the grant proposal (including the SR) early in January. This evidently is a fast procedure, but doesn’t result in a high standard SR, upon which others can rely.

Another point is that this simple and fast production of SR’s will only lead to a larger increase in number of SR’s, an effect that the authors wanted to prevent.

Of course it is necessary to get a (reliable) picture of what has already been done and to prevent unnecessary duplication of trials and systematic reviews. The best solution would be a triplet (nano-publications)-like repository of the trials and systematic reviews that have been done.

Ideally, researchers and doctors should first check such a database for existing systematic reviews. Only if no recent SR is present they could continue writing a SR themselves. Perhaps it sometimes suffices to search for trials and write a short synthesis.

There is another point I do not agree with. I do not think that SR’s of interventions should only include RCT’s. We should include those study types that are relevant. If RCT’s furnish a clear proof, then RCT’s are all we need. But sometimes, or in some topics/specialties, RCT’s are not available. Inclusion of other study designs and rating them with GRADE (proposed by Guyatt) gives a better overall picture (also see the post: #notsofunny: ridiculing RCT’s and EBM).

The authors strive for simplicity. However, the real world isn’t that simple. In this paper they have limited themselves to evidence of the effects of health care interventions. Finding and assessing prognostic, etiological and diagnostic studies is methodologically even more difficult. Still many clinicians have these kinds of questions. Therefore systematic reviews of other study designs (diagnostic accuracy or observational studies) are also of great importance.

In conclusion, whereas I do not agree with all points raised, this paper touches upon a lot of important issues and achieves what can be expected from a discussion paper:  a thorough shake-up and a lot of discussion.

References

  1. Bastian, H., Glasziou, P., & Chalmers, I. (2010). Seventy-Five Trials and Eleven Systematic Reviews a Day: How Will We Ever Keep Up? PLoS Medicine, 7 (9) DOI: 10.1371/journal.pmed.1000326






#NotSoFunny #16 – Ridiculing RCTs & EBM

1 02 2010

I remember it well. As a young researcher I presented my findings in one of my first talks, at the end of which the chair killed my work with a remark that made the whole room of scientists laugh, but was really beside the point. My supervisor, a truly original and very wise scientist, suppressed his anger. Afterwards, he said: “it is very easy to ridicule something that isn’t a mainstream thought. It’s the argument that counts. We will prove that we are right.” …And we did.

This was not my only encounter with scientists who try to win the debate by making fun of a theory, a finding or …people. But it is not only the witty scientist who is to *blame*, it is also the uncritical audience that just swallows it.

I have similar feelings with some journal articles or blog posts that try to ridicule EBM – or any other theory or approach. Funny, perhaps, but often misunderstood and misused by “the audience”.

Take for instance the well known spoof article in the BMJ:

“Parachute use to prevent death and major trauma related to gravitational challenge: systematic review of randomised controlled trials”

It is one of those Christmas spoof articles in the BMJ, meant to inject some medical humor into the normally serious scientific literature. The spoof parachute article pretends to be a Systematic Review of RCT’s investigating if parachutes can prevent death and major trauma. Of course, no such trial has been done or will be done: dropping people at random with and without a parachute to prove that you had better jump out of a plane with a parachute.

I found the article only mildly amusing. It is so unrealistic that it becomes absurd. Not that I don’t enjoy absurdities at times, but absurdities should not assume a life of their own. In this way it doesn’t evoke a true discussion, but only worsens the prejudice some people already have.

People keep referring to this 2003 article. Last Friday, Dr. Val (with whom I mostly agree) devoted a Friday Funny post to it at Get Better Health: “The Friday Funny: Why Evidence-Based Medicine Is Not The Whole Story”.* In 2008 the paper was also discussed by Not Totally Rad [3]. That EBM is not the whole story seems pretty obvious to me. It was never meant to be…

But let’s get specific. Which assumptions about RCT’s and SR’s are wrong, twisted or taken out of context? Please read the excellent comments below the article. These often put their finger on the sore spot.

1. EBM is cookbook medicine.
Many define EBM as “make clinical decisions based on a synthesis of the best available evidence about a treatment” (e.g. [3]). However, EBM is not cookbook medicine.

The accepted definition of EBM is “the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients” [4]. Sackett already emphasized back in 1996:

Good doctors use both individual clinical expertise and the best available external evidence, and neither alone is enough. Without clinical expertise, practice risks becoming tyrannised by evidence, for even excellent external evidence may be inapplicable to or inappropriate for an individual patient. Without current best evidence, practice risks becoming rapidly out of date, to the detriment of patients.


2. RCT’s are required for evidence.

Although a well performed RCT provides the “best” evidence, RCT’s are often not appropriate or indicated. That is especially true for domains other than therapy. In case of prognostic questions the most appropriate study design is usually an inception cohort. An RCT for instance can’t tell whether female age is a prognostic factor for clinical pregnancy rates following IVF: there is no way to randomize for “age”, or for “BMI”. ;)

The same is true for etiologic or harm questions. In theory, the “best” answer is obtained by RCT. However, RCT’s are often unethical or unnecessary. RCT’s are out of the question to address whether substance X causes cancer. Observational studies will do. Sometimes case reports provide sufficient evidence. If a woman gets hepatic veno-occlusive disease after drinking loads of a herbal tea, the finding of similar cases in the literature may be sufficient to conclude that the herbal tea probably caused the disease.

Diagnostic accuracy studies also require another study design (cross-sectional study, or cohort).

But even in the case of interventions, we can settle for less than an RCT. Evidence is not simply present or absent, but exists on a hierarchy. RCT’s (if well performed) are the most robust, but if not available we have to rely on “lower” evidence.

BMJ Clinical Evidence even made a list of clinical questions unlikely to be answered by RCT’s. In these cases Clinical Evidence searches for and includes the best appropriate form of evidence.

  1. where there are good reasons to think the intervention is not likely to be beneficial or is likely to be harmful;
  2. where the outcome is very rare (e.g. a 1/10000 fatal adverse reaction);
  3. where the condition is very rare;
  4. where very long follow up is required (e.g. does drinking milk in adolescence prevent fractures in old age?);
  5. where the evidence of benefit from observational studies is overwhelming (e.g. oxygen for acute asthma attacks);
  6. when applying the evidence to real clinical situations (external validity);
  7. where current practice is very resistant to change and/or patients would not be willing to take the control or active treatment;
  8. where the unit of randomisation would have to be too large (e.g. a nationwide public health campaign); and
  9. where the condition is acute and requires immediate treatment.
    Of these, only the first case is categorical. For the rest the cut off point when an RCT is not appropriate is not precisely defined.
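
To get a feel for item 2, a back-of-the-envelope calculation helps. The sketch below is mine, not from Clinical Evidence. It assumes that every treated patient independently has the same 1/10000 chance of the adverse reaction, and asks how many treated patients a trial would need to have a 95% chance of seeing even one case. The function names and the 95% threshold are illustrative assumptions.

```python
import math


def prob_at_least_one(rate: float, n_patients: int) -> float:
    """Chance of seeing one or more events among n_patients, assuming
    independent patients who all share the same event rate."""
    return 1.0 - (1.0 - rate) ** n_patients


def patients_needed(rate: float, assurance: float = 0.95) -> int:
    """Smallest number of treated patients giving at least `assurance`
    probability of observing one or more events (illustrative helper)."""
    if not 0.0 < rate < 1.0 or not 0.0 < assurance < 1.0:
        raise ValueError("rate and assurance must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - assurance) / math.log(1.0 - rate))


if __name__ == "__main__":
    rate = 1 / 10000  # the example rate from item 2 above
    n = patients_needed(rate)
    print(f"patients needed in the treated arm: {n}")
    print(f"chance of at least one event: {prob_at_least_one(rate, n):.3f}")
```

Under these assumptions the answer comes out at roughly thirty thousand patients in the treated arm alone, and that is only to observe a single case, not to show a difference with the control arm. It illustrates why such rare harms are usually picked up by observational data rather than by RCT’s.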

3. Informed health decisions should be based on good science rather than EBM (alone).

Dr Val [2]: “EBM has been an over-reliance on “methodolatry” – resulting in conclusions made without consideration of prior probability, laws of physics, or plain common sense. (….) Which is why Steve Novella and the Science Based Medicine team have proposed that our quest for reliable information (upon which to make informed health decisions) should be based on good science rather than EBM alone.”

Methodolatry is the profane worship of the randomized clinical trial as the only valid method of investigation. The previous sections show that EBM does not entail this.

The name “Science Based Medicine” suggests that it is opposed to “Evidence Based Medicine”. At their blog David Gorski explains: “We at SBM believe that medicine based on science is the best medicine and tirelessly promote science-based medicine through discussion of the role of science and medicine.”

While this may apply to a certain extent to quackery or homeopathy (the focus of SBM), there are many examples of the opposite: that science or common sense led to interventions that were ineffective or even damaging.

As a matter of fact many side-effects are not foreseen and few in vitro or animal experiments have led to successful new treatments.

In the end it is most relevant to the patient that “it works” (and that the benefits outweigh the harms).

Furthermore EBM is not -or should not be- without consideration of prior probability, laws of physics, or plain common sense. To me SBM and EBM are not mutually exclusive.

Why the example is unfair and unrealistic

I’ll leave it to the following comments (and yes the choice is biased) [1]

Nibu A George, Scientist:

First of all generalizing such reports of some selected cases and making it a universal truth is unhealthy and challenging the entire scientific community. Secondly, the comparing the parachute scenario with a pure medical situation is unacceptable since the parachute jump is rather a physical situation and it become a medical situation only if the jump caused any physical harm to the person involved.

Richard A. Davidson, MD, MPH:

This weak attempt at humor unfortunately reinforces one of the major negative stereotypes about EBM….that RCT’s are required for evidence, and that observational studies are worthless. If only 10% of the therapies that are paraded in front of us by journals were as effective as parachutes, we would have much less need for EBM. The efficacy of most of our current therapies are only mildly successful. In fact, many therapies can provide only a 25% or less therapeutic improvement. If parachutes were that effective, nobody would use them.
While it’s easy enough to just chalk this one up to the cliche of the cantankerous British clinician, it shows a tremendous lack of insight about what EBM is and does. Even worse, it’s just not funny.

Aviel Roy-Shapira, Senior Staff Surgeon

Smith and Pell succeeded in amusing me, but I think their spoof reflects a common misconception about evidence based medicine. All too many practitioners equate EBM with randomized controlled trials, and metaanalyses.
EBM is about what is accepted as evidence, not about how the evidence is obtained. For example, an RCT which shows that a given drug lowers blood pressure in patients with mild hypertension, however well designed and executed, is not acceptable as a basis for treatment decisions. One has to show that the drug actually lowers the incidence of strokes and heart attacks.
RCT’s are needed only when the outcome is not obvious. If most people who fall from airplanes without a parachute die, this is good enough. There is plenty of evidence for that.

EBM is about using outcome data for making therapeutic decisions. That data can come from RCTs but also from observation.

Lee A. Green, Associate Professor

EBM is not RCTs. That’s probably worth repeating several times, because so often both EBM’s detractors and some of its advocates just don’t get it. Evidence is not binary, present or not, but exists on a hierarchy (Guyatt & Rennie, 2001). (….)
The methods and rigor of EBM are nothing more or less than ways of correcting for our imperfect perceptions of our experiences. We prefer, cognitively, to perceive causal connections. We even perceive such connections where they do not exist, and we do so reliably and reproducibly under well-known sets of circumstances. RCTs aren’t holy writ, they’re simply a tool for filtering out our natural human biases in judgment and causal attribution. Whether it’s necessary to use that tool depends upon the likelihood of such bias occurring.

Scott D Ramsey, Associate Professor

Parachutes may be a no-brainer, but this article is brainless.

Unfortunately, there are few if any parallels to parachutes in health care. The danger with this type of article is that it can lead to labeling certain medical technologies as “parachutes” when in fact they are not. I’ve already seen this exact analogy used for a recent medical technology (lung volume reduction surgery for severe emphysema). In uncontrolled studies, it quite literally looked like everyone who didn’t die got better. When a high quality randomized controlled trial was done, the treatment turned out to have significant morbidity and mortality and a much more modest benefit than was originally hypothesized.

Timothy R. Church, Professor

On one level, this is a funny article. I chuckled when I first read it. On reflection, however, I thought “Well, maybe not,” because a lot of people have died based on physicians’ arrogance about their ability to judge the efficacy of a treatment based on theory and uncontrolled observation.

Several high profile medical procedures that were “obviously” effective have been shown by randomized trials to be (oops) killing people when compared to placebo. For starters to a long list of such failed therapies, look at antiarrhythmics for post-MI arrhythmias, prophylaxis for T. gondii in HIV infection, and endarterectomy for carotid stenosis; all were proven to be harmful rather than helpful in randomized trials, and in the face of widespread opposition to even testing them against no treatment. In theory they “had to work.” But didn’t.

But what the heck, let’s play along. Suppose we had never seen a parachute before. Someone proposes one and we agree it’s a good idea, but how to test it out? Human trials sound good. But what’s the question? It is not, as the author would have you believe, whether to jump out of the plane without a parachute or with one, but rather stay in the plane or jump with a parachute. No one was voluntarily jumping out of planes prior to the invention of the parachute, so it wasn’t to prevent a health threat, but rather to facilitate a rapid exit from a nonviable plane.

Another weakness in this straw-man argument is that the physics of the parachute are clear and experimentally verifiable without involving humans, but I don’t think the authors would ever suggest that human physiology and pathology in the face of medication, radiation, or surgical intervention is ever quite as clear and predictable, or that non-human experience (whether observational or experimental) would ever suffice.

The author offers as an alternative to evidence-based methods the “common sense” method, which is really the “trust me, I’m a doctor” method. That’s not worked out so well in many high profile cases (see above, plus note the recent finding that expensive, profitable angioplasty and coronary artery by-pass grafts are no better than simple medical treatment of arteriosclerosis). And these are just the ones for which careful scientists have been able to do randomized trials. Most of our accepted therapies never have been subjected to such scrutiny, but it is breathtaking how frequently such scrutiny reveals problems.

Thanks, but I’ll stick with scientifically proven remedies.

parachute experiments without humans

* On the same day I posted Friday Foolery #15: The Man who pioneered the RCT. What a coincidence.

** Don’t forget to read the comments to the article. They are often excellent.

References

  1. Smith, G., & Pell, J. (2003). Parachute use to prevent death and major trauma related to gravitational challenge: systematic review of randomised controlled trials. BMJ, 327 (7429), 1459-1461 DOI: 10.1136/bmj.327.7429.1459
  2. “The Friday Funny: Why Evidence-Based Medicine Is Not The Whole Story” (getbetterhealth.com) [2010.01.29]
  3. Call for randomized clinical trials of Parachutes (nottotallyrad.blogspot.com) [08-2008]
  4. Sackett DL, Rosenberg WM, Gray JA, Haynes RB, & Richardson WS (1996). Evidence based medicine: what it is and what it isn’t. BMJ (Clinical research ed.), 312 (7023), 71-2 PMID: 8555924




Complementary Medicine & Pharmacists

30 11 2009

I don’t know if the situation is the same in other countries, but in the Netherlands we can get prescription medicines only in pharmacies. Drugstores are only allowed to sell over-the-counter (OTC) medicines.

Most pharmacies have a small shop of 5 square meters (besides a large storage room). What surprises me is that the counter is not only full of non-allergenic creams, and the shelves are not only filled with liquorice and plasters: the counter and shelves predominantly display naturopathic and herbal “medicines”. In this flu season there are even leaflets on how to prevent flu with all kinds of naturopathic medicine. Dr Vogel’s Echinaforce “helps to augment your natural resistance, lowers the risk of flu and shortens the duration or decreases the severity of symptoms once you have the flu” (..”vermindert u de kans op griep en herstelt u sneller als u toch ziek wordt“). Apparently A Vogel.nl (via Biohorma) started a campaign in the Netherlands. On their website there is even an advertisement for an offer by an insurance company (OHRA), because it generously reimburses homeopathic medicine. Biohorma also made a YouTube video.
In contrast, in the US there is a disclaimer at the Echinaforce site: “These statements have not been evaluated by the Food and Drug Administration (FDA). This product is not intended to diagnose, treat, cure or prevent any disease.”

There is no evidence that Echinacea prevents flu (see Cochrane Review and de Volkskrant [Dutch newspaper referring to clinical trials]), although it cannot be ruled out that it helps in the early treatment of colds in adults.

Isn’t such promotion of ineffective stuff bad advice, considering we have a real flu epidemic, and given the inverse relationship between pediatric vaccination and CAM usage (see Respectful Insolence)?

It is quite confusing, however, because Echinacea is advertised as a homeopathic medicine, whereas it seems to be a herbal medicine (not diluted ad infinitum). To date there is no evidence that homeopathy ‘works’. All 6 published Cochrane systematic reviews with ‘homeopathy’ or ‘homeopathic’ in the title conclude that there is little or no evidence that it works beyond the placebo effect.

During the recent House of Commons Science and Technology Committee meeting, which called in homeopaths and scientists to discuss the evidence for the alternative therapy, Prof. Dr Ernst (with experience as a homeopath) said: “I have supplied a list of systematic reviews of homeopathy. There are two dozen. None in that list were positive.” (see this excellent summary of the meeting by Ian Sample). For the entire memorandum of Dr Ernst see here.

Besides the fact that clinical trials have not shown it to be effective, the whole theory is incompatible with the laws of physics and chemistry.

Nevertheless:

  • There is a lot of homeopathic research going on, e.g. funded by the NHS (National Health Service) in the UK and the NCCAM (National Center for Complementary and Alternative Medicine, NIH) in the US.
  • In the UK homeopathic medicine is endorsed by the MHRA (Medicines and Healthcare products Regulatory Agency)
  • CAM is a booming business (£1.5bn industry in the UK)
  • CAM is covered by insurance companies.
  • CAM is sold and sometimes advocated by pharmacists.

Thus all over the world people are buying these ineffective homeopathic medicines while believing they ‘work’, or at least cause no harm. However, while homeopathic medicines may not be harmful in themselves, they may cause harm if they are used in place of proven treatment for a life-threatening illness. Indeed the WHO has warned people with conditions such as HIV, TB and malaria not to rely on homeopathic treatments (BBC NEWS, 20 August 2009).

For me it is incomprehensible that pharmacists, who are trained in pharmacology and chemistry at university level, just sell those ineffective, costly water dilutions and advocate them directly or indirectly by putting them on the shelves, providing ample leaflets and brochures and giving positive “advice”. What could be the reason for doing that other than ignorance or MONEY?



Photo Credits

  1. Pharmacists mortar and pestle http://commons.wikimedia.org/wiki/File:PharmacistsMortar.svg
  2. Homeopathic Medicine on the shelves http://www.flickr.com/photos/caseywest/ / CC BY-SA 2.0
    (this photo has nothing to do with the subject)






