Of Mice and Men Again: New Genomic Study Helps Explain why Mouse Models of Acute Inflammation do not Work in Men

25 02 2013

ResearchBlogging.org

This post was updated after a discussion on Twitter with @animalevidence, who pointed me to a great blog post at Speaking of Research ([18], a repost of [19]) highlighting the shortcomings of the current study, which used just one single inbred strain of mice (C57Bl6) [2013-02-26]. Main changes are in blue.

A recent paper published in PNAS [1] caused quite a stir both inside and outside the scientific community. The study challenges the validity of using mouse models to test what works as a treatment in humans. At least this is what many online news sources seem to conclude: “drug testing may be a waste of time”[2], “we are not mice” [3, 4], or a bit more to the point: mouse models of inflammation are worthless [5, 6, 7].

But basically the current study looks only at one specific area: the inflammatory responses that occur in critically ill patients after severe trauma and burns (SIRS, Systemic Inflammatory Response Syndrome). In these patients a storm of events may eventually lead to organ failure and death. It is similar to what may occur in sepsis (where the cause is a systemic infection).

Furthermore, the study uses only a single approach: it compares the gene response patterns in serious human injuries (burns, trauma) and in a human model partially mimicking these inflammatory conditions (healthy volunteers receiving a low dose of endotoxin) with those in the corresponding three mouse models (burns, trauma, endotoxin).

And, as highlighted by Bill Barrington of “Understand Nutrition” [8], the researchers tested the gene profiles in only one single strain of mice: C57Bl6 (B6 for short). If B6 were the only model used in practice this would be less of a problem. But according to Mark Wanner of the Jackson Laboratory [18, 19]:

 It is now well known that some inbred mouse strains, such as the C57BL/6J (B6 for short) strain used, are resistant to septic shock. Other strains, such as BALB and A/J, are much more susceptible, however. So use of a single strain will not provide representative results.

The results themselves are very clear. The figures show at a glance that there is no correlation whatsoever between the human and B6 mouse expression data.

Seok and 36 other researchers from across the USA looked at approximately 5500 human genes and their mouse analogs. In humans, burns and traumatic injuries (and to a certain extent the human endotoxin model) triggered the activation of a vast number of genes that were not triggered in the present C57Bl6 mouse models. In addition, the genomic response is longer lasting in human injuries. Furthermore, the top 5 most activated and most suppressed pathways in human burns and trauma had no correlates in mice. Finally, analysis of existing data in the Gene Expression Omnibus (GEO) database showed that the lack of correlation between mouse and human studies also held for other acute inflammatory responses, like sepsis and acute infection.

This is a high quality study with interesting results. However, the results are not as groundbreaking as some media suggest.

As discussed by the authors [1], mice are known to be far more resilient to inflammatory challenge than humans: a million-fold higher dose of endotoxin than the dose causing shock in humans is lethal to mice. This, and the fact that “none of the 150 candidate agents that progressed to human trials has proved successful in critically ill patients”, already indicates that the current approach fails.

[This is not entirely correct: the endotoxin/LPS dose in mice is 1000–10,000 times the dose required to induce severe disease with shock in humans [20], and mice that are resilient to endotoxin may still be susceptible to infection. It may well be that the endotoxin response is not a good model for the late effects of sepsis.]

The disappointing trial results have forced many researchers to question not only the usefulness of the current mouse models for acute inflammation [9,10; refs from 11], but also to rethink key aspects of the human response itself and the way these clinical trials are performed [12, 13, 14]. For instance, the emphasis has always been on the exuberant inflammatory reaction, but the subsequent immunosuppression may also be a major contributor to the disease. There is also substantial heterogeneity among patients [13-14] that may explain why some patients have a good prognosis and others don’t. And some of the initially positive results in human trials have not been reproduced in later studies either (the benefit of intense glucose control and corticosteroid treatment) [12]. Thus, is it fair to blame only the mouse studies?


The coverage by some media is grist to the mill of people who think animal studies are worthless anyway. But one cannot extrapolate these findings to other diseases. Furthermore, as referred to above, the researchers tested the gene profiles in only one single strain of mice, C57Bl6, meaning that “The findings of Seok et al. are solely applicable to the B6 strain of mice in the three models of inflammation they tested. They unduly generalize these findings to mouse models of inflammation in general.” [8]

It is true that animal studies, including rodent studies, have their limitations. But what are the alternatives? In vitro studies are often even more artificial, and direct clinical testing of new compounds in humans is not ethical.

Obviously, the final proof of effectiveness and safety of new treatments can only be established in human trials. No one will question that.

A lot can be said about why animal studies often fail to translate directly to the clinic [15]. Clinical disparities between the animal models and the clinical trials testing the treatment (as in sepsis) are one reason. Other important reasons may be methodological flaws in animal studies (e.g. no randomization, wrong statistics) and publication bias: non-publication of “negative” results appears to be prevalent in laboratory animal research [15-16]. Despite their shortcomings, animal studies and in vitro studies offer a way to examine certain aspects of a process, disease or treatment.

In summary, this study confirms that the existing (C57Bl6) mouse model doesn’t resemble the human situation in the systemic response following acute traumatic injury or sepsis: the genomic response is entirely different, in magnitude, duration and types of changes in expression.

The findings are not new: the shortcomings of the mouse model(s) have long been known. It remains enigmatic why the researchers chose only one inbred strain of mice, and of all mice only the B6 strain, which is less sensitive to endotoxin and only develops acute kidney injury (part of organ failure) at old age (young mice were used) [21]. In this paper from 2009 (!) various reasons are given why the animal models didn’t properly mimic the human disease and how this can be improved. The authors stress that:

“the genetically heterogeneous human population should be more accurately represented by outbred mice, reducing the bias found in inbred strains that might contain or lack recessive disease susceptibility loci, depending on selective pressures.”

Both Bill Barrington [8] and Mark Wanner [18,19] propose the use of “diversity outbred cross or collaborative cross mice that provide additional diversity.” Indeed, “replicating genetic heterogeneity and critical clinical risk factors such as advanced age and comorbid conditions (..) led to improved models of sepsis and sepsis-induced AKI (acute kidney injury).”

The authors of the PNAS paper suggest that genomic analysis can aid further in revealing which genes play a role in the perturbed immune response in acute inflammation, but it remains to be seen whether this will ultimately lead to effective treatments of sepsis and other forms of acute inflammation.

It also remains to be seen whether comprehensive genomic characterization will be useful in other disease models. The authors suggest, for instance, that genetic profiling may serve as a guide to develop animal models. A shotgun analysis of the expression of thousands of genes was useful in the present situation, because “the severe inflammatory stress produced a genomic storm affecting all major cellular functions and pathways in humans which led to sufficient perturbations to allow comparisons between the genes in the human conditions and their analogs in the murine models”. But a rough analysis of overall expression profiles may give little insight into the usefulness of other animal models, where genetic responses are more subtle.

And predicting what will happen is far less easy than confirming what is already known…

NOTE: as said, the coverage in news and blogs is again quite biased. The conclusion of a generally good Dutch science news site (the headline and lead suggested that animal models of immune diseases are crap [6]) was adapted after a critical discussion on Twitter (see here and here), and a link was added to this blog post. I wish this occurred more often…
In my opinion the most balanced summaries can be found at the science-based blogs ScienceBased Medicine [11] and the NIH Director’s Blog [17], whereas “Understand Nutrition” [8] has an original point of view, which is further elaborated by Mark Wanner at Speaking of Research [18] and the Genetics and Your Health Blog [19].

References

  1. Seok, J., Warren, H., Cuenca, A., Mindrinos, M., Baker, H., Xu, W., Richards, D., McDonald-Smith, G., Gao, H., Hennessy, L., Finnerty, C., Lopez, C., Honari, S., Moore, E., Minei, J., Cuschieri, J., Bankey, P., Johnson, J., Sperry, J., Nathens, A., Billiar, T., West, M., Jeschke, M., Klein, M., Gamelli, R., Gibran, N., Brownstein, B., Miller-Graziano, C., Calvano, S., Mason, P., Cobb, J., Rahme, L., Lowry, S., Maier, R., Moldawer, L., Herndon, D., Davis, R., Xiao, W., Tompkins, R., Abouhamze, A., Balis, U., Camp, D., De, A., Harbrecht, B., Hayden, D., Kaushal, A., O’Keefe, G., Kotz, K., Qian, W., Schoenfeld, D., Shapiro, M., Silver, G., Smith, R., Storey, J., Tibshirani, R., Toner, M., Wilhelmy, J., Wispelwey, B., & Wong, W. (2013). Genomic responses in mouse models poorly mimic human inflammatory diseases. Proceedings of the National Academy of Sciences. DOI: 10.1073/pnas.1222878110
  2. Drug Testing In Mice May Be a Waste of Time, Researchers Warn 2013-02-12 (science.slashdot.org)
  3. Susan M Love We are not mice 2013-02-14 (Huffingtonpost.com)
  4. Elbert Chu  This Is Why It’s A Mistake To Cure Mice Instead Of Humans 2012-12-20(richarddawkins.net)
  5. Derek Lowe. Mouse Models of Inflammation Are Basically Worthless. Now We Know. 2013-02-12 (pipeline.corante.com)
  6. Elmar Veerman. Waardeloos onderzoek. Proeven met muizen zeggen vrijwel niets over ontstekingen bij mensen. 2013-02-12 (wetenschap24.nl)
  7. Gina Kolata. Mice Fall Short as Test Subjects for Humans’ Deadly Ills. 2013-02-12 (nytimes.com)

  8. Bill Barrington. Are Mice Reliable Models for Human Disease Studies? 2013-02-14 (understandnutrition.com)
  9. Raven, K. (2012). Rodent models of sepsis found shockingly lacking Nature Medicine, 18 (7), 998-998 DOI: 10.1038/nm0712-998a
  10. Nemzek JA, Hugunin KM, & Opp MR (2008). Modeling sepsis in the laboratory: merging sound science with animal well-being. Comparative medicine, 58 (2), 120-8 PMID: 18524169
  11. Steven Novella. Mouse Model of Sepsis Challenged 2013-02-13 (http://www.sciencebasedmedicine.org/index.php/mouse-model-of-sepsis-challenged/)
  12. Wiersinga WJ (2011). Current insights in sepsis: from pathogenesis to new treatment targets. Current opinion in critical care, 17 (5), 480-6 PMID: 21900767
  13. Khamsi R (2012). Execution of sepsis trials needs an overhaul, experts say. Nature medicine, 18 (7), 998-9 PMID: 22772540
  14. Hotchkiss RS, Coopersmith CM, McDunn JE, & Ferguson TA (2009). The sepsis seesaw: tilting toward immunosuppression. Nature medicine, 15 (5), 496-7 PMID: 19424209
  15. van der Worp, H., Howells, D., Sena, E., Porritt, M., Rewell, S., O’Collins, V., & Macleod, M. (2010). Can Animal Models of Disease Reliably Inform Human Studies? PLoS Medicine, 7 (3) DOI: 10.1371/journal.pmed.1000245
  16. ter Riet, G., Korevaar, D., Leenaars, M., Sterk, P., Van Noorden, C., Bouter, L., Lutter, R., Elferink, R., & Hooft, L. (2012). Publication Bias in Laboratory Animal Research: A Survey on Magnitude, Drivers, Consequences and Potential Solutions PLoS ONE, 7 (9) DOI: 10.1371/journal.pone.0043404
  17. Dr. Francis Collins. Of Mice, Men and Medicine 2013-02-19 (directorsblog.nih.gov)
  18. Tom/ Mark Wanner Why mice may succeed in research when a single mouse falls short (2013-02-15) (speakingofresearch.com) [repost, with introduction]
  19. Mark Wanner. Why mice may succeed in research when a single mouse falls short (2013-02-13) (http://community.jax.org) [original post]
  20. Warren, H. (2009). Editorial: Mouse models to study sepsis syndrome in humans Journal of Leukocyte Biology, 86 (2), 199-201 DOI: 10.1189/jlb.0309210
  21. Doi, K., Leelahavanichkul, A., Yuen, P., & Star, R. (2009). Animal models of sepsis and sepsis-induced kidney injury Journal of Clinical Investigation, 119 (10), 2868-2878 DOI: 10.1172/JCI39421




The Scatter of Medical Research and What to do About it.

18 05 2012

ResearchBlogging.org

Paul Glasziou, GP and professor in Evidence Based Medicine, co-authored a new article in the BMJ [1]. Similar to another paper [2] I discussed before [3], this paper deals with the difficulty for clinicians of staying up to date with the literature. But where the previous paper [2,3] highlighted the mere increase in the number of research articles over time, the current paper looks at the scatter of randomized clinical trials (RCTs) and systematic reviews (SRs) across different journals cited in one year (2009) in PubMed.

Hoffmann et al analyzed 7 specialties and 9 subspecialties that are considered the leading contributors to the burden of disease in high-income countries.

They followed a relatively straightforward method for identifying the publications. Each search string consisted of a MeSH term (controlled term) to identify the selected disease or disorders, a publication type [pt] to identify the type of study, and the year of publication. For example, the search strategy for randomized trials in cardiology was: “heart diseases”[MeSH] AND randomized controlled trial[pt] AND 2009[dp]. (When searching “heart diseases” as a MeSH term, narrower terms are also searched.) Meta-analysis[pt] was used to identify systematic reviews.
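
Just to make this concrete: below is a minimal sketch (my own, not the authors’ code) of how such counts can be reproduced against PubMed’s E-utilities with Biopython’s Bio.Entrez module. It assumes Biopython is installed, the e-mail address is a placeholder, and today’s counts will differ somewhat from the authors’ 2009 snapshot because indexing has changed since then.

    # Rough sketch of the counting method described above (not the authors' code).
    from Bio import Entrez

    Entrez.email = "you@example.org"  # placeholder; NCBI asks for a real contact address

    # Example searches from the paper: trials and systematic reviews in cardiology, 2009
    searches = {
        "cardiology RCTs 2009": '"heart diseases"[MeSH] AND randomized controlled trial[pt] AND 2009[dp]',
        "cardiology SRs 2009": '"heart diseases"[MeSH] AND meta-analysis[pt] AND 2009[dp]',
    }

    for label, query in searches.items():
        handle = Entrez.esearch(db="pubmed", term=query, retmax=0)  # we only need the count
        record = Entrez.read(handle)
        handle.close()
        print(label, record["Count"])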

Using this approach Hoffmann et al found 14 343 RCTs and 3214 SRs published in 2009 in the selected (sub)specialties. There was a clear scatter across journals, but this scatter varied considerably among specialties:

“Otolaryngology had the least scatter (363 trials across 167 journals) and neurology the most (2770 trials across 896 journals). In only three subspecialties (lung cancer, chronic obstructive pulmonary disease, hearing loss) were 10 or fewer journals needed to locate 50% of trials. The scatter was less for systematic reviews: hearing loss had the least scatter (10 reviews across nine journals) and cancer the most (670 reviews across 279 journals). For some specialties and subspecialties the papers were concentrated in specialty journals; whereas for others, few of the top 10 journals were a specialty journal for that area.
Generally, little overlap occurred between the top 10 journals publishing trials and those publishing systematic reviews. The number of journals required to find all trials or reviews was highly correlated (r=0.97) with the number of papers for each specialty/ subspecialty.”

Previous work already suggested that this scatter of research has a long tail: half of the publications appear in a minority of journals, whereas the remaining articles are scattered among many journals (see Fig. below).

Click to enlarge and see legends at BMJ 2012;344:e3223 [CC]

The good news is that SRs are less scattered and that general journals appear more often in the top 10 journals publishing SRs. Indeed for 6 of the 7 specialties and 4 of the 9 subspecialties, the Cochrane Database of Systematic Reviews had published the highest number of systematic reviews, publishing between 6% and 18% of all the systematic reviews published in each area in 2009. The bad news is that even keeping up to date with SRs seems a huge, if not impossible, challenge.

In other words, it is not sufficient for clinicians to rely on personal subscriptions to a few journals in their specialty (which is common practice). Hoffmann et al suggest several solutions to help clinicians cope with the increasing volume and scatter of research publications.

  • a central library of systematic reviews (but apparently the Cochrane Library fails to fulfill such a role according to the authors, because many reviews are out of date and are perceived as less clinically relevant)
  • A registry of planned and completed systematic reviews, such as PROSPERO (this makes it easier to locate SRs and reduces bias)
  • Syntheses of evidence and synopses, like ACP Journal Club, which summarizes the best evidence in internal medicine
  • Specialised databases that collate and critically appraise randomized trials and systematic reviews, like www.pedro.org.au for physical therapy. In my personal experience, however, this database is often out of date and not comprehensive
  • Journal scanning services like EvidenceUpdates from mcmaster.ca, which scans over 120 journals, filters articles on the basis of quality, has practising clinicians rate them for relevance and newsworthiness, and makes them available as email alerts and in a searchable database. I use this service too, but besides the fact that not all specialties are covered, the rating of evidence may not always be objective (see previous post [4])
  • The use of social media tools to alert clinicians to important new research.

Most of these solutions are (long) existing solutions that do not or only partly help to solve the information overload.

I was surprised that the authors didn’t propose the use of personalized alerts. PubMed’s My NCBI feature allows you to create automatic email alerts on a topic and to subscribe to electronic tables of contents (which could include ACP Journal Club). Suppose that a physician browses 10 journals roughly covering 25% of the trials. He/she does not need to read all the other journals from cover to cover to avoid missing one potentially relevant trial. Instead it is far more efficient to perform a topic search to filter relevant studies from journals that seldom publish trials on the topic of interest. One could even use the search of Hoffmann et al to achieve this.* Although in reality, most clinical researchers will have narrower fields of interest than all studies about endocrinology and neurology.

At our library we are working on creating deduplicated, easy-to-read alerts that combine the tables of contents of certain journals with topic (and author) searches in PubMed, EMBASE and other databases. There are existing tools that do the same.

Another way to reduce the individual work (reading) load is to organize journal clubs or, even better, regular CATs (critically appraised topics). In the Netherlands, CATs are a compulsory item for residents. A few doctors do the work for many. Usually they choose topics that are clinically relevant (or for which the evidence is unclear).

The authors briefly mention that their search strategy might have missed some eligible papers and included some that are not truly RCTs or SRs, because they relied on PubMed’s publication type to retrieve RCTs and SRs. For systematic reviews this may be a greater problem than recognized, for the authors have used meta-analysis[pt] to identify systematic reviews. Unfortunately PubMed has no publication type for systematic reviews, but it may be clear that there are many more systematic reviews than meta-analyses. Possibly systematic reviews even have a different scatter pattern than meta-analyses (i.e. the latter might be preferentially included in core journals).

Furthermore, not all meta-analyses and systematic reviews are reviews of RCTs (thus it is not completely fair to compare MAs with RCTs only). On the other hand, it is a (not discussed) omission of this study that only interventions are considered. Nowadays physicians have many other questions than those related to therapy, like questions about prognosis, harm and diagnosis.

I did a little, imperfect search just to see whether the use of other search terms than meta-analysis[pt] would have any influence on the outcome. I searched for (1) meta-analysis[pt] and (2) systematic review[tiab] (title and abstract) for papers about endocrine diseases. Then I subtracted 1 from 2 (to analyse the systematic reviews not indexed as meta-analysis[pt]).

Thus:

(ENDOCRINE DISEASES[MESH] AND SYSTEMATIC REVIEW[TIAB] AND 2009[DP]) NOT META-ANALYSIS[PT]

I analyzed the top 10/11 journals publishing these study types.
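
For those who want to repeat this little experiment, here is a rough sketch (my own code, assuming Biopython is installed; the e-mail address is a placeholder). It runs the two queries above and tallies the journals of the retrieved records using the “Source” field (journal title abbreviation) of the ESummary records; counts and rankings will of course have shifted since 2012.

    # Rough sketch of the comparison described above (my own code, not from the paper).
    from collections import Counter
    from Bio import Entrez

    Entrez.email = "you@example.org"  # placeholder; replace with your own address

    queries = {
        "meta-analysis[pt]":
            "endocrine diseases[mesh] AND meta-analysis[pt] AND 2009[dp]",
        "systematic review[tiab] NOT meta-analysis[pt]":
            "(endocrine diseases[mesh] AND systematic review[tiab] AND 2009[dp])"
            " NOT meta-analysis[pt]",
    }

    def top_journals(query, n=11, retmax=500):
        """Return the n journals with the most hits for a PubMed query."""
        handle = Entrez.esearch(db="pubmed", term=query, retmax=retmax)
        ids = Entrez.read(handle)["IdList"]
        handle.close()
        handle = Entrez.esummary(db="pubmed", id=",".join(ids))
        summaries = Entrez.read(handle)
        handle.close()
        return Counter(doc["Source"] for doc in summaries).most_common(n)

    for label, query in queries.items():
        print(label)
        for journal, hits in top_journals(query):
            print(" ", journal, hits)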

This little experiment suggests that:

  1. The precise scatter might differ per search: apparently the systematic review[tiab] search yielded different top 10/11 journals (for this sample) than the meta-analysis[pt] search (partially because Cochrane systematic reviews apparently don’t mention “systematic review” in the title and abstract?).
  2. The authors underestimate the number of systematic reviews: simply searching for systematic review[tiab] already found approximately 50% additional systematic reviews compared to meta-analysis[pt] alone.
  3. As expected (by me at least), many of the SRs and MAs were NOT dealing with interventions, e.g. see the first 5 hits (out of 108 and 236 respectively).
  4. Together these findings indicate that the true information overload is far greater than shown by Hoffmann et al (not all systematic reviews are found, and of all available study designs only RCTs are searched).
  5. On the other hand this indirectly shows that SRs are a better way to keep up to date than suggested: SRs also summarize non-interventional research (the ratio of SRs of RCTs to individual RCTs is much lower than suggested).
  6. It also means that the published graphs underestimate the role of the Cochrane systematic reviews in aggregating RCTs (the MA[pt] set is diluted with non-RCT systematic reviews, so the proportion of Cochrane SRs among the interventional MAs becomes larger).

Well anyway, these imperfections do not contradict the main point of this paper: that trials are scattered across hundreds of general and specialty journals and that “systematic reviews” (or really meta-analyses) do reduce the extent of scatter, but are still widely scattered and mostly in different journals from those publishing the randomized trials.

Indeed, personal subscriptions to journals seem insufficient for keeping up to date.
Besides supplementing subscriptions with methods such as journal scanning services, I would recommend the use of personalized alerts from PubMed and several prefiltered sources, including an EBM search engine like TRIP (www.tripdatabase.com).

*but I would broaden it to find all aggregate evidence, including ACP, Clinical Evidence, syntheses and synopses, not only meta-analyses.

**I do appreciate that one of the co-authors is a medical librarian: Sarah Thorning.

References

  1. Hoffmann, Tammy, Erueti, Chrissy, Thorning, Sarah, & Glasziou, Paul (2012). The scatter of research: cross sectional comparison of randomised trials and systematic reviews across specialties BMJ, 344 : 10.1136/bmj.e3223
  2. Bastian, H., Glasziou, P., & Chalmers, I. (2010). Seventy-Five Trials and Eleven Systematic Reviews a Day: How Will We Ever Keep Up? PLoS Medicine, 7 (9) DOI: 10.1371/journal.pmed.1000326
  3. How will we ever keep up with 75 trials and 11 systematic reviews a day (laikaspoetnik.wordpress.com)
  4. Experience versus Evidence [1]. Opioid Therapy for Rheumatoid Arthritis Pain. (laikaspoetnik.wordpress.com)




PubMed’s Higher Sensitivity than OVID MEDLINE… & other Published Clichés.

21 08 2011

ResearchBlogging.org

Is it just me, or are biomedical papers about searching for a systematic review often of low quality or just too damn obvious? I’m seldom excited about papers dealing with optimal search strategies or peculiarities of PubMed, even though it is my specialty.
It is my impression that many of the lower-quality and/or less relevant papers are written by clinicians/researchers instead of information specialists (or at least without a medical librarian as the first author).

I can’t help thinking that many of those authors just happen to see an odd feature in PubMed or encounter an unexpected phenomenon in the process of searching for a systematic review.
They think: “Hey, that’s interesting” or “that’s odd. Let’s write a paper about it. An easy way to boost our scientific output!”
What they don’t realize is that the published findings are often common knowledge to experienced MEDLINE searchers.

Let’s give two recent examples of what I think are redundant papers.

The first example is a letter under the heading “Clinical Observation” in Annals of Internal Medicine, entitled:

“Limitations of the MEDLINE Database in Constructing Meta-analyses”.[1]

As the authors rightly state, “a thorough literature search is of utmost importance in constructing a meta-analysis.” Since the PubMed interface from the National Library of Medicine is a cornerstone of many meta-analyses, the authors (two MDs) focused on the freely available PubMed (with MEDLINE as its largest part).

The objective was:

“To assess the accuracy of MEDLINE’s “human” and “clinical trial” search limits, which are used by authors to focus literature searches on relevant articles.” (emphasis mine)

O.k…. Stop! I know enough. This paper should have been titled: “Limitation of Limits in MEDLINE”.

Limits are NOT DONE when searching for a systematic review, for the simple reason that most limits (except language and dates) rely on indexing (MeSH terms and publication types).
It takes a while before the indexers have assigned these terms to the papers, and not all papers are correctly (or consistently) indexed. Thus, by using limits you will automatically miss recent, not yet indexed, or incorrectly indexed papers. Whereas it is your goal (or it should be) to find as many relevant papers as possible for your systematic review. And wouldn’t it be sad if you missed that one important RCT that was published just the other day?

On the other hand, one doesn’t want to drown in irrelevant papers. How can one reduce “noise” while minimizing the risk of losing relevant papers?

  1. Use both MeSH terms and textwords to “limit” your search, i.e. also search “trial” as a textword, i.e. in title and abstract: trial[tiab]
  2. Use more synonyms and truncation (random*[tiab] OR placebo[tiab])
  3. Don’t actively limit, but use double negation. Thus, to get rid of animal studies, don’t limit to humans (this is the same as combining your search with the MeSH term humans[mh]), but safely exclude animal-only studies as follows: NOT (animals[mh] NOT humans[mh]) (= exclude papers indexed with “animals”, except when these papers are also indexed with “humans”).
  4. Use existing methodological filters (ready-made search strategies) designed to help focus on study types. These filters are based on one or more of the above-mentioned principles (see earlier posts here and here).
    Simple methodological filters can be found at the PubMed Clinical Queries. For instance the narrow filter for Therapy not only searches for the publication type “Randomized Controlled Trial” (a limit), but also for randomized, controlled ànd trial as textwords.
    Usually broader (more sensitive) filters are used for systematic reviews. The Cochrane handbook proposes to use the following filter maximizing precision and sensitivity to identify randomized trials in PubMed (see http://www.cochrane-handbook.org/):
    (randomized controlled trial [pt] OR controlled clinical trial [pt] OR randomized [tiab] OR placebo [tiab] OR clinical trials as topic [mesh: noexp] OR randomly [tiab] OR trial [ti]) NOT (animals [mh] NOT humans [mh]).
    When few hits are obtained, one can either use a broader filter or no filter at all. (A rough sketch of how to combine this filter with a subject search follows below.)
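
For what it’s worth, here is a minimal sketch (my own, assuming Biopython is installed; the subject query and e-mail address are just placeholders) of how a subject search can be combined with the Cochrane filter quoted above via the E-utilities:

    # Rough sketch: subject search AND the Cochrane sensitivity/precision-maximizing RCT filter.
    from Bio import Entrez

    Entrez.email = "you@example.org"  # placeholder; replace with your own address

    COCHRANE_RCT_FILTER = (
        "(randomized controlled trial[pt] OR controlled clinical trial[pt]"
        " OR randomized[tiab] OR placebo[tiab] OR clinical trials as topic[mesh:noexp]"
        " OR randomly[tiab] OR trial[ti]) NOT (animals[mh] NOT humans[mh])"
    )

    subject = "arthritis, rheumatoid[mesh] OR rheumatoid arthritis[tiab]"  # example topic
    query = "(" + subject + ") AND " + COCHRANE_RCT_FILTER

    handle = Entrez.esearch(db="pubmed", term=query, retmax=20)
    record = Entrez.read(handle)
    handle.close()
    print(record["Count"], "hits; first PMIDs:", ", ".join(record["IdList"]))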

In other words, it is a beginner’s mistake to use limits when searching for a systematic review.
Besides publishing what should be common knowledge (even our medical students learn it), the authors make many other (little) mistakes; their precise search is difficult to reproduce and far from complete. This has already been addressed by Dutch colleagues in a comment [2].

The second paper is:

PubMed had a higher sensitivity than Ovid-MEDLINE in the search for systematic reviews [3], by Katchamart et al.

Again this paper focuses on the usefulness of PubMed to identify RCTs for a systematic review, but it concentrates on the differences between PubMed and OVID in this respect. The paper starts by explaining that PubMed:

provides access to bibliographic information in addition to MEDLINE, such as in-process citations (..), some OLDMEDLINE citations (….) citations that precede the date that a journal was selected for MEDLINE indexing, and some additional life science journals that submit full texts to PubMed Central and receive a qualitative review by NLM.

Given these “facts”, am I exaggerating when I say that the authors are pushing at an open door when their main conclusion is that PubMed retrieved more citations overall than Ovid-MEDLINE? The one (!) relevant article missed in OVID was a 2005 study published in a Japanese journal that MEDLINE only started indexing in 2007. It was therefore in PubMed, but not in OVID MEDLINE.

This is an important aspect to keep in mind when searching OVID/MEDLINE (I have discussed it earlier here and here). But worth a paper?

Recently, after finishing an exhaustive search in OVID/MEDLINE, we noticed that we had missed an RCT in PubMed that was not yet available in OVID/MEDLINE. I just added one sentence to the search methods:

Additionally, PubMed was searched for randomized controlled trials ahead of print, not yet included in OVID MEDLINE. 

Of course, I could have devoted a separate article to this finding. But it is so self-evident, that I don’t think it would be worth it.

The authors have expressed their findings in terms of sensitivity (85% for Ovid-MEDLINE vs. 90% for PubMed, the 5% difference being that ONE missed paper), precision and number needed to read (comparable for OVID-MEDLINE and PubMed).

If I might venture another opinion: it looks like editors of medical and epidemiology journals quickly fall for “diagnostic parameters” on a topic that they don’t understand very well: library science.

The sensitivity/precision data found have little general value, because:

  • it concerns a single search on a single topic
  • there are few relevant papers (17–18)
  • useful features of OVID MEDLINE that are not available in PubMed are not used. E.g. adjacency searching could enhance the retrieval of relevant papers in OVID MEDLINE (adjacency = words searched within a specified maximal distance of each other)
  • the searches are not comparable, nor are the search field commands.

The latter is very important, if one doesn’t wish to compare apples and oranges.

Let’s take a look at the first part of the search (which is in itself well structured and covers many synonyms).
First part of the search (click to enlarge)
This part of the search deals with the P: patients with rheumatoid arthritis (RA). The authors first search for relevant MeSH terms (sets 1–5) and then for a few textwords. The MeSH terms are fine. The authors have chosen to use Arthritis, Rheumatoid and a few narrower terms (MeSH tree shown at the right). The authors have taken care to use the [mesh:noexp] command in PubMed to prevent the automatic explosion of narrower terms (although this is superfluous for MeSH terms having no narrower terms, like Caplan syndrome etc.).

But the fields chosen for the free text search (sets 6-9) are not comparable at all.

In OVID the mp. field is used, whereas all fields or even no fields are used in PubMed.

I am not even fond of the uncontrolled use of .mp (I’d rather search in title and abstract; remember, we already have the proper MeSH terms), but all fields is even broader than .mp.

In general a .mp. search looks in the Title, Original Title, Abstract, Subject Heading, Name of Substance, and Registry Word fields. All fields would be .af in OVID not .mp.

Searching for rheumatism in OVID using the .mp field yields 7879 hits against 31390 hits when one searches in the .af field.

Thus 4 times as many. Extra fields searched are, for instance, the journal and the address fields. One finds all articles in the journal Arthritis & Rheumatism, for instance [line 6], or papers co-authored by someone from the dept. of rheumatoid surgery [line 9].

Worse, in PubMed the “all fields” command doesn’t prevent the automatic mapping.

In PubMed, Rheumatism[All Fields] is translated as follows:

“rheumatic diseases”[MeSH Terms] OR (“rheumatic”[All Fields] AND “diseases”[All Fields]) OR “rheumatic diseases”[All Fields] OR “rheumatism”[All Fields]

Oops, Rheumatism[All Fields] is searched as the (exploded!) MeSH rheumatic diseases. Thus rheumatic diseases (not included in the MeSH-search) plus all its narrower terms! This makes the entire first part of the PubMed search obsolete (where the authors searched for non-exploded specific terms). It explains the large difference in hits with rheumatism between PubMed and OVID/MEDLINE: 11910 vs 6945.

Not only do the authors use the .mp and [all fields] commands instead of the preferred [tiab] field, they also apply this broader field to the existing (optimized) Cochrane filter, which uses [tiab]. Finally they use limits!

Well anyway, I hope that I have made my point that a useful comparison between strategies can only be made if optimal and comparable strategies are used. Sensitivity doesn’t mean anything here.

Coming back to my original point: I do think that some conclusions of these papers are “good to know”. As a matter of fact, they should be basic knowledge for those planning an exhaustive search for a systematic review. We do not need bad studies to show this.

Perhaps an expert paper (or a series) on this topic, understandable for clinicians, would be of more value.

Or the recognition that such search papers should be designed and written by librarians with ample experience in searching for systematic reviews.

NOTE:
* = truncation (search for different word endings); [tiab] = title and abstract; [ti] = title; [mh] = MeSH; [pt] = publication type

Photo credit

The image is taken from the Dragonfly-blog; here the Flickr-image Brain Vocab Sketch by labguest was adapted by adding the Pubmed logo.

References

  1. Winchester DE, & Bavry AA (2010). Limitations of the MEDLINE database in constructing meta-analyses. Annals of internal medicine, 153 (5), 347-8 PMID: 20820050
  2. Leclercq E, Kramer B, & Schats W (2011). Limitations of the MEDLINE database in constructing meta-analyses. Annals of internal medicine, 154 (5) PMID: 21357916
  3. Katchamart W, Faulkner A, Feldman B, Tomlinson G, & Bombardier C (2011). PubMed had a higher sensitivity than Ovid-MEDLINE in the search for systematic reviews. Journal of clinical epidemiology, 64 (7), 805-7 PMID: 20926257
  4. Search OVID EMBASE and Get MEDLINE for Free…. without knowing it (laikaspoetnik.wordpress.com 2010/10/19/)
  5. 10 + 1 PubMed Tips for Residents (and their Instructors) (laikaspoetnik.wordpress.com 2009/06/30)
  6. Adding Methodological filters to myncbi (laikaspoetnik.wordpress.com 2009/11/26/)
  7. Search filters 1. An Introduction (laikaspoetnik.wordpress.com 2009/01/22/)




RIP Statistician Paul Meier. Proponent not Father of the RCT.

14 08 2011

This headline in Boing Boing caught my eye today:  RIP Paul Meier, father of the randomized trial

Not surprisingly, I knew that Paul Meier (with Kaplan) introduced the Kaplan-Meier estimator (1958), a very important tool for measuring how many patients survive a medical treatment. But I didn’t know he was “father of the randomized trial”….

But is he really? “Father of the randomized trial”, and “probably best known for the introduction of randomized trials into the evaluation of medical treatments”, as Boing Boing states?

Boing Boing’s very short article is based on the New York Times article: Paul Meier, Statistician Who Revolutionized Medical Trials, Dies at 87. According to the NY Times “Dr. Meier was one of the first and most vocal proponents of what is called “randomization.” 

Randomization, the NY-Times explains, is:

Under the protocol, researchers randomly assign one group of patients to receive an experimental treatment and another to receive the standard treatment. In that way, the researchers try to avoid unintentionally skewing the results by choosing, for example, the healthier or younger patients to receive the new treatment.

(for a more detailed explanation see my previous posts The best study designs…. for dummies and #NotSoFunny #16 – Ridiculing RCTs & EBM)

Meier was a very successful proponent, that is for sure. According to Sir Richard Peto, Dr. Meier, “perhaps more than any other U.S. statistician, was the one who influenced U.S. drug regulatory agencies, and hence clinical researchers throughout the U.S. and other countries, to insist on the central importance of randomized evidence.”

But an advocate need not be a father, for advocates are seldom the inventors/creators. A proponent is more of a nurse, a mentor or a … foster-parent.

Is Meier the true father/inventor of the RCT? And if not, who is?

Googling “Father of the randomized trial” won’t help, because all 1,610 hits point to Dr. Meier… thanks to Boing Boing’s careless copying.

What I have read so far doesn’t point to one single creator. And the RCT wasn’t just suddenly there. It started with the comparison of treatments under controlled conditions. Back in 1753, the British naval surgeon James Lind published his famous account of 12 scurvy patients, “their cases as similar as I could get them”, noting that “the most sudden and visible good effects were perceived from the uses of the oranges and lemons”, and that citrus fruit cured scurvy [3]. The French physician Pierre Louis and Harvard anatomist Oliver Wendell Holmes (19th century) were also fierce proponents of supporting conclusions about the effectiveness of treatments with statistics, not subjective impressions [4].

But what was the first real RCT?

Perhaps the first real RCT was the Nuremberg salt test (1835) [6]. This was possibly not only the first RCT, but also the first scientific demonstration of the lack of effect of a homeopathic dilution. More than 50 visitors to a local tavern participated in the experiment. Half of them received a vial filled with distilled snow water, the other half a vial with ordinary salt in a homeopathic C30 dilution of distilled snow water. None of the participants knew whether he had got the actual medicine or not (blinding). The numbered vials were coded and the code was broken after the experiment (allocation concealment).

The first publications of RCTs were in the fields of psychology and agriculture. As a matter of fact, another famous statistician, Ronald A. Fisher (of Fisher’s exact test), seems to have played a more important role in the genesis and popularization of RCTs than Meier, albeit in agricultural research [5,7]. The book “The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century” describes how Fisher devised a randomized trial on the spot to test the contention of a lady that she could taste the difference between tea into which milk had been poured and tea that had been poured into milk (almost according to homeopathic principles) [7].

According to Wikipedia [5] the first published (medical) RCT appeared in the 1948 paper entitled “Streptomycin treatment of pulmonary tuberculosis”. One of the authors, Austin Bradford Hill, is (also) credited as having conceived the modern RCT.

Thus the road to the modern RCT is long, starting with the notions that experiments should be done under controlled conditions and that it doesn’t make sense to base treatment on intuition. Later, experiments were designed in which treatments were compared to placebo (or other treatments) in a randomized and blinded fashion, with concealment of allocation.

Paul Meier was not the inventor of the RCT, but a successful vocal proponent of the RCT. That in itself is commendable enough.

And although the Boing Boing article was incorrect, and many people googling for “father of the RCT” will find the wrong answer from now on, it did raise my interest in the history of the RCT and the role of statisticians in the development of science and clinical trials.
I plan to read a few of the articles and books mentioned below. Like the relatively lighthearted “The Lady Tasting Tea” [7]. You can envision a book review once I have finished reading it.

Note added 15-08-2011, 13:45:

Today a more accurate article appeared in the Boston Globe (“Paul Meier; revolutionized medical studies using math”), which does justice to the important role of Dr Meier in the espousal of randomization as an essential element in clinical trials. For that is what he did.

Quote:

Dr. Meier published a scathing paper in the journal Science, “Safety Testing of Poliomyelitis Vaccine,’’ in which he described deficiencies in the production of vaccines by several companies. His paper was seen as a forthright indictment of federal authorities, pharmaceutical manufacturers, and the National Foundation for Infantile Paralysis, which funded the research for a polio vaccine.

  1. RIP Paul Meier, father of the randomized trial (boingboing.net)
  2. Paul Meier, Statistician Who Revolutionized Medical Trials, Dies at 87 (nytimes.com)
  3. M L Meldrum A brief history of the randomized controlled trial. From oranges and lemons to the gold standard. Hematology/ Oncology Clinics of North America (2000) Volume: 14, Issue: 4, Pages: 745-760, vii PubMed: 10949771  or see http://www.mendeley.com
  4. Fye WB. The power of clinical trials and guidelines,and the challenge of conflicts of interest. J Am Coll Cardiol. 2003 Apr 16;41(8):1237-42. PubMed PMID: 12706915. Full text
  5. http://en.wikipedia.org/wiki/Randomized_controlled_trial
  6. Stolberg M (2006). Inventing the randomized double-blind trial: The Nuremberg salt test of 1835. JLL Bulletin: Commentaries on the history of treatment evaluation (www.jameslindlibrary.org).
  7. The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century Peter Cummings, MD, MPH, Jama 2001;286(10):1238-1239. doi:10.1001/jama.286.10.1238  Book Review.
    Book by David Salsburg, 340 pp, with illus, $23.95, ISBN 0-7167-41006-7, New York, NY, WH Freeman, 2001.
  8. Kaptchuk TJ. Intentional ignorance: a history of blind assessment and placebo controls in medicine. Bull Hist Med. 1998 Fall;72(3):389-433. PubMed PMID: 9780448. abstract
  9. The best study design for dummies/ (http://laikaspoetnik.wordpress.com: 2008/08/25/)
  10. #Notsofunny: Ridiculing RCT’s and EBM (http://laikaspoetnik.wordpress.com: 2010/02/01/)
  11. RIP Paul Meier : Research Randomization Advocate (mystrongmedicine.com)
  12. If randomized clinical trials don’t show that your woo works, try anthropology! (scienceblogs.com)
  13. The revenge of “microfascism”: PoMo strikes medicine again (scienceblogs.com)




How will we ever keep up with 75 Trials and 11 Systematic Reviews a Day?

6 10 2010

ResearchBlogging.org

An interesting paper was published in PLOS Medicine [1]. As an information specialist working part-time for the Cochrane Collaboration* (see below), I find this topic close to my heart.

The paper, published in PLOS Medicine, is written by Hilda Bastian and two of my favorite EBM devotees ànd critics, Paul Glasziou and Iain Chalmers.

Their article gives a good overview of the rise in the number of trials, systematic reviews (SRs) of interventions, and medical papers in general. The paper (under the heading Policy Forum) raises some important issues, but the message is not as sharp and clear as usual.

Take the title for instance.

Seventy-Five Trials and Eleven Systematic Reviews a Day:
How Will We Ever Keep Up?

What do you consider its most important message?

  1. That doctors suffer from an information overload that is only going to get worse, as I concluded, and probably also, in part, @kevinclauson, who tweeted about it to medical librarians
  2. That the solution to this information overload consists of Cochrane systematic reviews (because they aggregate the evidence from individual trials), as @doctorblogs tweeted
  3. That it is just about “too many systematic reviews (SRs)?”, the title of the PLOS press release (so the other way around)
  4. That it is about too much of everything, and the not always good quality of SRs: @kevinclauson and @pfanderson discussed that they both use the same “#Cochrane Disaster” (see Kevin’s blog) in their teaching
  5. That Archie Cochrane’s* dream is unachievable and ought perhaps to be replaced by something less Utopian (a comment by Richard Smith, former editor of the BMJ, combining 1, 3, 4 and 5, plus a new aspect: SRs should not only include randomized controlled trials (RCTs))

The paper reads easily, but matters of importance are often only touched upon.  Even after reading it twice, I wondered: a lot is being said, but what is really their main point and what are their answers/suggestions?

But let’s look at their arguments and pieces of evidence. (Black is from their paper, blue my remarks.)

The landscape

I often start my presentations on “searching for evidence” by showing the figure to the right, which is from an older PLOS article. It illustrates the information overload. Sometimes I also show another slide, with 5–10-year-older data, saying that there are 55 trials a day, 1400 new records added per day to MEDLINE and 5000 biomedical articles a day. I also add that specialists have to read 17–22 articles a day to keep up to date with the literature. GPs have to read even more, because they are generalists. So those 75 trials and the subsequent information overload are not really a shock to me.

Indeed, the authors start by saying that “Keeping up with information in health care has never been easy.” The authors give an interesting overview of the driving forces for the increase in trials and the initiation of SRs and critical appraisals to synthesize the evidence from all individual trials and so overcome the information overload (SRs and other forms of aggregate evidence decrease the number needed to read).

In Box 1 they give an overview of the earliest systematic reviews. These SRs often had a great impact on medical practice (see for instance an earlier discussion on the role of the CRASH trial and of the first Cochrane review).
They also touch upon the institution of the Cochrane Collaboration. The Cochrane Collaboration is named after Archie Cochrane, who “reproached the medical profession for not having managed to organise a ‘critical summary, by speciality or subspecialty, adapted periodically, of all relevant randomised controlled trials’”. He inspired the establishment of the international Oxford Database of Perinatal Trials and he encouraged the use of systematic reviews of randomized controlled trials (RCTs).

A timeline with some of the key events is shown in Figure 1.

Where are we now?

The second paragraph shows many interesting graphs (Figs 2–4).

Annoyingly, PLOS only allows one-sentence legends. The details are in the (Word) supplement, without proper reference to the actual figure numbers. Grrrr..! This is completely unnecessary in reviews/editorials/policy forums. And, as said, annoying, because you have to read a Word file to understand where the data actually come from.

Bastian et al. have used MEDLINE’s publication types (e.g. case reports[pt], review[pt], controlled clinical trial[pt]) and search filters (the Montori SR filter and the Haynes narrow therapy filter, which is built into PubMed’s Clinical Queries) to estimate the yearly rise in the number of each study type. The total numbers of clinical trials in CENTRAL (the largest database of controlled clinical trials, abbreviated as CCTR in the article) and of reviews in the Cochrane Database of Systematic Reviews (CDSR) are easy to retrieve, because the numbers are published quarterly (now monthly) by the Cochrane Library. By definition, CDSR only contains SRs, and CENTRAL (as I prefer to call it) contains almost exclusively controlled clinical trials.
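
To make this a bit more tangible, here is a rough sketch (my own, certainly not the authors’ actual scripts; Biopython assumed, e-mail address a placeholder) of how one could count trial reports per publication year in PubMed from the publication type. The numbers will not exactly match Figure 2, since indexing and retagging have continued since the paper’s data were extracted.

    # Rough sketch of a year-by-year count of trial reports in PubMed.
    import time
    from Bio import Entrez

    Entrez.email = "you@example.org"  # placeholder; replace with your own address

    for year in range(1950, 2008):
        # controlled clinical trial[pt] is one of the publication types mentioned above
        query = f"controlled clinical trial[pt] AND {year}[dp]"
        handle = Entrez.esearch(db="pubmed", term=query, retmax=0)  # count only
        count = Entrez.read(handle)["Count"]
        handle.close()
        print(year, count)
        time.sleep(0.4)  # stay under NCBI's limit of ~3 requests per second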

In short, these are the conclusions from their three figures:

  • Fig 2: The number of published trials has risen sharply from 1950 to 2010
  • Fig 3: The number of systematic reviews and meta-analyses has risen tremendously as well
  • Fig 4: But systematic reviews and clinical trials are still far outnumbered by narrative reviews and case reports.

O.k., that’s clear, and they raise a good point: an “astonishing growth has occurred in the number of reports of clinical trials since the middle of the 20th century, and in reports of systematic reviews since the 1980s—and a plateau in growth has not yet been reached.”
Plus, indirectly: the increase in systematic reviews didn’t lead to a lower number of trials and narrative reviews. Thus the information overload is still increasing.
But instead of discussing these findings they go into an endless discussion of the actual data and the fact that we “still do not know exactly how many trials have been done”, only to end the discussion by saying that “Even though these figures must be seen as more illustrative than precise…” And then you think: so what? Furthermore, I don’t really get the point of this part of their article.

 

Fig. 2: The number of published trials, 1950 to 2007.

With regard to Figure 2 they say for instance:

The differences between the numbers of trial records in MEDLINE and CCTR (CENTRAL) (see Figure 2) have multiple causes. Both CCTR and MEDLINE often contain more than one record from a single study, and there are lags in adding new records to both databases. The NLM filters are probably not as efficient at excluding non-trials as are the methods used to compile CCTR. Furthermore, MEDLINE has more language restrictions than CCTR. In brief, there is still no single repository reliably showing the true number of randomised trials. Similar difficulties apply to trying to estimate the number of systematic reviews and health technology assessments (HTAs).

Sorry, but although some of these points may be true, Bastian et al. don’t go into the main reason for the difference between both graphs, that is, the higher number of trial records in CCTR (CENTRAL) than in MEDLINE: the difference can simply be explained by the fact that CENTRAL contains records from MEDLINE as well as from many other electronic databases and from hand-searched materials (see this post).
With respect to other details: I don’t know which NLM filter they refer to, but if they mean the narrow therapy filter: this filter is specifically meant to find randomized controlled trials, and is far more specific and less sensitive than the Cochrane methodological filters for retrieving controlled clinical trials. In addition, MEDLINE does not have more language restrictions per se: it just contains an (extensive) selection of journals. (Plus, people more easily use language limits in MEDLINE, but that is beside the point.)

Elsewhere the authors say:

In Figures 2 and 3 we use a variety of data sources to estimate the numbers of trials and systematic reviews published from 1950 to the end of 2007 (see Text S1). The number of trials continues to rise: although the data from CCTR suggest some fluctuation in trial numbers in recent years, this may be misleading because the Cochrane Collaboration virtually halted additions to CCTR as it undertook a review and internal restructuring that lasted a couple of years.

As I recall it, the situation is like this: till 2005 the Cochrane Collaboration ran the so-called “retag project”, in which they searched for controlled clinical trials in MEDLINE and EMBASE (with a very broad methodological filter). All controlled trial articles were loaded into CENTRAL, and the NLM retagged the controlled clinical trials that weren’t tagged with the appropriate publication type in MEDLINE. The Cochrane Collaboration stopped the laborious retag project in 2005, but still continues the (now) monthly electronic search updates performed by the various Cochrane groups (for their topics only). They also still continue handsearching. So they didn’t (virtually?!) halt additions to CENTRAL, although it seems likely that stopping the retag project caused the plateau. Again the authors’ main points are dwarfed by not very accurate details.

Some interesting points in this paragraph:

  • We still do not know exactly how many trials have been done.
  • For a variety of reasons, a large proportion of trials have remained unpublished (negative publication bias!) (note: Cochrane reviews try to lower this kind of bias by applying no language limits and including unpublished data, e.g. conference proceedings, too)
  • Many trials have been published in journals without being electronically indexed as trials, which makes them difficult to find. (note: this has tremendously improved since the CONSORT statement, an evidence-based, minimum set of recommendations for reporting RCTs, and by the Cochrane retag project, discussed above)
  • Astonishing growth has occurred in the number of reports of clinical trials since the middle of the 20th century, and in reports of systematic reviews since the 1980s—and a plateau in growth has not yet been reached.
  • Trials are now registered in prospective trial registers at inception, theoretically enabling an overview of all published and unpublished trials (note: this will also make it easier to find out the reasons for not publishing data, or to detect alteration of primary outcomes)
  • Once the International Committee of Medical Journal Editors announced that their journals would no longer publish trials that had not been prospectively registered, far more ongoing trials were being registered per week (200 instead of 30). In 2007, the US Congress made detailed prospective trial registration legally mandatory.

The authors do not discuss that better reporting of trials and the retag project might have facilitated the indexing and retrieval of trials.

How Close Are We to Archie Cochrane’s Goal?

According to the authors there are various reasons why Archie Cochrane’s goal will not be achieved without some serious changes in course:

  • The increase in systematic reviews didn’t displace other less reliable forms of information (Figs 3 and 4)
  • Only a minority of trials have been assessed in systematic reviews
  • The workload involved in producing reviews is increasing
  • The bulk of systematic reviews are now many years out of date.

Where to Now?

In this paragraph the authors discuss what should be changed:

  • Prioritize trials
  • Wider adoption of the concept that trials will not be supported unless an SR has shown the trial to be necessary.
  • Prioritizing SRs: reviews should address questions that are relevant to patients, clinicians and policymakers.
  • Choosing between elaborate reviews that answer part of the relevant questions and “leaner” reviews of most of what we want to know. Apparently the authors have already chosen the latter: they prefer:
    • shorter and less elaborate reviews
    • faster production ànd updating of SRs
    • no unnecessary inclusion of study types other than randomized trials (unless it concerns less common adverse effects)
  • More international collaboration and thereby a better use of resources for SRs and HTAs. As an example of a good initiative they mention “KEEP Up”, which will aim to harmonise updating standards and aggregate updating results, initiated and coordinated by the German Institute for Quality and Efficiency in Health Care (IQWiG) and involving key systematic reviewing and guidelines organisations such as the Cochrane Collaboration, Duodecim, the Scottish Intercollegiate Guidelines Network (SIGN), and the National Institute for Health and Clinical Excellence (NICE).

Summary and comments

The main aim of this paper is to discuss to what extent the medical profession has managed to make “critical summaries, by speciality or subspeciality, adapted periodically, of all relevant randomized controlled trials”, as proposed 30 years ago by Archie Cochrane.

The emphasis of the paper is mostly on the number of trials and systematic reviews, not on qualitative aspects. Furthermore, there is too much emphasis on the methods used to determine the number of trials and reviews.

The authors' main conclusion is that an astonishing growth has occurred in the number of reports of clinical trials as well as in the number of SR's, but that these systematic pieces of evidence shrink into insignificance compared with the unsystematic narrative reviews and case reports being published. That is an important, but not an unexpected, conclusion.

Bastian et al don't address whether systematic reviews have made the growing number of trials easier to access or digest. Neither do they go into developments that have facilitated the retrieval of clinical trials and aggregate evidence from databases like PubMed: the Cochrane retag project, the CONSORT statement, and the existence of publication types and search filters (which they themselves use to filter out trials and systematic reviews). They also skip sources other than systematic reviews that make it easier to find the evidence: databases with evidence-based guidelines, the TRIP database, and Clinical Evidence.
As Clay Shirky said: “It’s Not Information Overload. It’s Filter Failure.”

It is also good to note that case reports and narrative reviews serve other aims. For medical practitioners rare case reports can be very useful for their clinical practice and good narrative reviews can be valuable for getting an overview in the field or for keeping up-to-date. You just have to know when to look for what.

Bastian et al have several suggestions for improvement, but these suggestions are not always substantiated. For instance, they propose access to all systematic reviews and trials. Perfect. But how can this be attained? We could encourage authors to publish their trials in open access journals. For Cochrane reviews this would be desirable but difficult: we cannot expect authors, who already spend months writing a SR for free, to pay the publication fees themselves, and the Cochrane Collaboration is an international organization that receives no subsidies for this. So how could this be achieved?

In my opinion, the most important benefits can be expected from prioritizing trials and SR's, faster production and updating of SR's, more international collaboration and less duplication. It is a pity the authors do not mention projects other than "KEEP Up". As discussed in previous posts, the Cochrane Collaboration also recognizes the many issues raised in this paper, and aims to speed up the updates and to produce evidence on priority topics (see here and here). Evidence Aid is an example of a successful effort. But this is only the Cochrane Collaboration; many more non-Cochrane systematic reviews are produced.

And then we arrive at the next issue: not all systematic reviews are created equal. There are a lot of so-called "systematic reviews" that are not the conscientious, explicit and judiciously created syntheses of evidence they ought to be.

Therefore, I do not think that the proposal that each single trial should be preceded by a systematic review is a very good idea.
In the Netherlands writing a SR is already required for NWO grants. In practice, people just approach me, as a searcher, in the days before Christmas, with the idea of submitting the grant proposal (including the SR) early in January. This is certainly a fast procedure, but it does not result in a high-standard SR on which others can rely.

Another point is that such quick-and-simple production of SR's will only lead to a further increase in the number of SR's, exactly the effect the authors wanted to prevent.

Of course it is necessary to get a (reliable) picture of what has already been done and to prevent unnecessary duplication of trials and systematic reviews. The best solution would be a triplet (nanopublication)-like repository of all trials and systematic reviews performed.

Ideally, researchers and doctors should first check such a database for existing systematic reviews. Only if no recent SR is available should they go on to write one themselves. Perhaps it sometimes suffices to search for trials and write a short synthesis.

There is another point I do not agree with. I do not think that SR's of interventions should only include RCT's. We should include those study types that are relevant. If RCT's furnish clear proof, then RCT's are all we need. But sometimes – or in some topics or specialties – RCT's are not available. Inclusion of other study designs, rated with GRADE (proposed by Guyatt), gives a better overall picture (also see the post #NotSoFunny: ridiculing RCT's and EBM).

The authors strive for simplicity. However, the real world isn’t that simple. In this paper they have limited themselves to evidence of the effects of health care interventions. Finding and assessing prognostic, etiological and diagnostic studies is methodologically even more difficult. Still many clinicians have these kinds of questions. Therefore systematic reviews of other study designs (diagnostic accuracy or observational studies) are also of great importance.

In conclusion, although I do not agree with all points raised, this paper touches upon a lot of important issues and achieves what can be expected from a discussion paper: a thorough shake-up and a lot of discussion.

References

  1. Bastian, H., Glasziou, P., & Chalmers, I. (2010). Seventy-Five Trials and Eleven Systematic Reviews a Day: How Will We Ever Keep Up? PLoS Medicine, 7 (9) DOI: 10.1371/journal.pmed.1000326

#NotSoFunny #16 – Ridiculing RCTs & EBM

1 02 2010

I remember it well. As a young researcher I presented my findings in one of my first talks, at the end of which the chair killed my work with a remark that made the whole room of scientists laugh, but that was really beside the point. My supervisor, a truly original and very wise scientist, suppressed his anger. Afterwards, he said: "It is very easy to ridicule something that isn't a mainstream thought. It's the argument that counts. We will prove that we are right." …And we did.

This was not my only encounter with scientists who try to win the debate by making fun of a theory, a finding or …people. But it is not only the witty scientist who is to *blame*, it is also the uncritical audience that just swallows it.

I have similar feelings with some journal articles or blog posts that try to ridicule EBM – or any other theory or approach. Funny, perhaps, but often misunderstood and misused by “the audience”.

Take for instance the well known spoof article in the BMJ:

“Parachute use to prevent death and major trauma related to gravitational challenge: systematic review of randomised controlled trials”

It is one of those Christmas spoof articles in the BMJ, meant to inject some medical humor into the normally serious scientific literature. The spoof parachute article pretends to be a systematic review of RCT's investigating whether parachutes can prevent death and major trauma. Of course, no such trial has been done or ever will be done: dropping people at random with and without a parachute to prove that you had better jump out of a plane with a parachute.

I found the article only mildly amusing. It is so unrealistic that it becomes absurd. Not that I don't enjoy absurdities at times, but absurdities should not take on a life of their own. This way the article doesn't evoke a true discussion, but only reinforces the prejudice some people already have.

People keep referring to this 2003 article. Last Friday, Dr. Val (with whom I mostly agree) devoted a Friday Funny post to it at Get Better Health: “The Friday Funny: Why Evidence-Based Medicine Is Not The Whole Story”.* In 2008 the paper was also discussed by Not Totally Rad [3]. That EBM is not the whole story seems pretty obvious to me. It was never meant to be…

But let's get specific. Which assumptions about RCT's and SR's are wrong, twisted or taken out of context? Please read the excellent comments below the article. These often hit the nail on the head.

1. EBM is cookbook medicine.
Many define EBM as "making clinical decisions based on a synthesis of the best available evidence about a treatment" (e.g. [3]). However, EBM is not cookbook medicine.

The accepted definition of EBM is "the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients" [4]. Sackett already emphasized back in 1996:

Good doctors use both individual clinical expertise and the best available external evidence, and neither alone is enough. Without clinical expertise, practice risks becoming tyrannised by evidence, for even excellent external evidence may be inapplicable to or inappropriate for an individual patient. Without current best evidence, practice risks becoming rapidly out of date, to the detriment of patients.


2. RCT’s are required for evidence.

Although a well-performed RCT provides the "best" evidence, RCT's are often not appropriate or indicated. That is especially true for domains other than therapy. For prognostic questions the most appropriate study design is usually an inception cohort. An RCT, for instance, can't tell whether female age is a prognostic factor for clinical pregnancy rates following IVF: there is no way to randomize for "age", or for "BMI". ;)

The same is true for etiologic or harm questions. In theory, the "best" answer is obtained by an RCT. However, RCT's are often unethical or unnecessary. RCT's are out of the question to address whether substance X causes cancer; observational studies will do. Sometimes case reports provide sufficient evidence. If a woman gets hepatic veno-occlusive disease after drinking loads of a herbal tea, the finding of similar cases in the literature may be sufficient to conclude that the herbal tea probably caused the disease.

Diagnostic accuracy studies also require another study design (cross-sectional study, or cohort).

But even in the case of interventions, we can settle for less than an RCT. Evidence is not simply present or absent, but exists on a hierarchy. RCT's (if well performed) are the most robust, but if they are not available we have to rely on "lower" evidence.

BMJ Clinical Evidence even made a list of clinical questions that are unlikely to be answered by RCT's. In these cases Clinical Evidence searches for and includes the most appropriate form of evidence:

  1. where there are good reasons to think the intervention is not likely to be beneficial or is likely to be harmful;
  2. where the outcome is very rare (e.g. a 1/10000 fatal adverse reaction);
  3. where the condition is very rare;
  4. where very long follow up is required (e.g. does drinking milk in adolescence prevent fractures in old age?);
  5. where the evidence of benefit from observational studies is overwhelming (e.g. oxygen for acute asthma attacks);
  6. when applying the evidence to real clinical situations (external validity);
  7. where current practice is very resistant to change and/or patients would not be willing to take the control or active treatment;
  8. where the unit of randomisation would have to be too large (e.g. a nationwide public health campaign); and
  9. where the condition is acute and requires immediate treatment.
    Of these, only the first case is categorical. For the rest the cut off point when an RCT is not appropriate is not precisely defined.

3. Informed health decisions should be based on good science rather than EBM (alone).

Dr Val [2]: “EBM has been an over-reliance on “methodolatry” - resulting in conclusions made without consideration of prior probability, laws of physics, or plain common sense. (….) Which is why Steve Novella and the Science Based Medicine team have proposed that our quest for reliable information (upon which to make informed health decisions) should be based on good science rather than EBM alone.

Methodolatry is the profane worship of the randomized clinical trial as the only valid method of investigation. That EBM amounts to such worship has already been refuted in the previous sections.

The name “Science Based Medicine” suggests that it is opposed to “Evidence Based Medicine”. At their blog David Gorski explains: “We at SBM believe that medicine based on science is the best medicine and tirelessly promote science-based medicine through discussion of the role of science and medicine.”

While this may apply to a certain extent to quackery or homeopathy (the focus of SBM), there are many examples of the opposite: science or common sense leading to interventions that turned out to be ineffective or even damaging.

As a matter of fact, many side effects are not foreseen, and few in vitro or animal experiments have led to successful new treatments.

In the end, what matters most to the patient is that "it works" (and that the benefits outweigh the harms).

Furthermore, EBM is not – or should not be – practiced without consideration of prior probability, the laws of physics, or plain common sense. To me, SBM and EBM are not mutually exclusive.

Why the example is unfair and unrealistic

I’ll leave it to the following comments (and yes the choice is biased) [1]

Nibu A George, Scientist:

First of all generalizing such reports of some selected cases and making it a universal truth is unhealthy and challenging the entire scientific community. Secondly, the comparing the parachute scenario with a pure medical situation is unacceptable since the parachute jump is rather a physical situation and it become a medical situation only if the jump caused any physical harm to the person involved.

Richard A. Davidson, MD, MPH:

This weak attempt at humor unfortunately reinforces one of the major negative stereotypes about EBM….that RCT’s are required for evidence, and that observational studies are worthless. If only 10% of the therapies that are paraded in front of us by journals were as effective as parachutes, we would have much less need for EBM. The efficacy of most of our current therapies are only mildly successful. In fact, many therapies can provide only a 25% or less therapeutic improvement. If parachutes were that effective, nobody would use them.
While it’s easy enough to just chalk this one up to the cliche of the cantankerous British clinician, it shows a tremendous lack of insight about what EBM is and does. Even worse, it’s just not funny.

Aviel Roy-Shapira, Senior Staff Surgeon

Smith and Pell succeeded in amusing me, but I think their spoof reflects a common misconception about evidence based medicine. All too many practitioners equate EBM with randomized controlled trials, and metaanalyses.
EBM is about what is accepted as evidence, not about how the evidence is obtained. For example, an RCT which shows that a given drug lowers blood pressure in patients with mild hypertension, however well designed and executed, is not acceptable as a basis for treatment decisions. One has to show that the drug actually lowers the incidence of strokes and heart attacks.
RCT’s are needed only when the outcome is not obvious. If most people who fall from airplanes without a parachute die, this is good enough. There is plenty of evidence for that.

EBM is about using outcome data for making therapeutic decisions. That data can come from RCTs but also from observation

Lee A. Green, Associate Professor

EBM is not RCTs. That’s probably worth repeating several times, because so often both EBM’s detractors and some of its advocates just don’t get it. Evidence is not binary, present or not, but exists on a heirarchy (Guyatt & Rennie, 2001). (….)
The methods and rigor of EBM are nothing more or less than ways of correcting for our
imperfect perceptions of our experiences. We prefer, cognitively, to perceive causal connections. We even perceive such connections where they do not exist, and we do so reliably and reproducibly under well-known sets of circumstances. RCTs aren’t holy writ, they’re simply a tool for filtering out our natural human biases in judgment and causal attribution. Whether it’s necessary to use that tool depends upon the likelihood of such bias occurring.

Scott D Ramsey, Associate Professor

Parachutes may be a no-brainer, but this article is brainless.

Unfortunately, there are few if any parallels to parachutes in health care. The danger with this type of article is that it can lead to labeling certain medical technologies as “parachutes” when in fact they are not. I’ve already seen this exact analogy used for a recent medical technology (lung volume reduction surgery for severe emphysema). In uncontrolled studies, it quite literally looked like everyone who didn’t die got better. When a high quality randomized controlled trial was done, the treatment turned out to have significant morbidity and mortality and a much more modest benefit than was originally hypothesized.

Timothy R. Church, Professor

On one level, this is a funny article. I chuckled when I first read it. On reflection, however, I thought “Well, maybe not,” because a lot of people have died based on physicians’ arrogance about their ability to judge the efficacy of a treatment based on theory and uncontrolled observation.

Several high profile medical procedures that were “obviously” effective have been shown by randomized trials to be (oops) killing people when compared to placebo. For starters to a long list of such failed therapies, look at antiarrhythmics for post-MI arrhythmias, prophylaxis for T. gondii in HIV infection, and endarterectomy for carotid stenosis; all were proven to be harmful rather than helpful in randomized trials, and in the face of widespread opposition to even testing them against no treatment. In theory they “had to work.” But didn’t.

But what the heck, let’s play along. Suppose we had never seen a parachute before. Someone proposes one and we agree it’s a good idea, but how to test it out? Human trials sound good. But what’s the question? It is not, as the author would have you believe, whether to jump out of the plane without a parachute or with one, but rather stay in the plane or jump with a parachute. No one was voluntarily jumping out of planes prior to the invention of the parachute, so it wasn’t to prevent a health threat, but rather to facilitate a rapid exit from a nonviable plane.

Another weakness in this straw-man argument is that the physics of the parachute are clear and experimentally verifiable without involving humans, but I don’t think the authors would ever suggest that human physiology and pathology in the face of medication, radiation, or surgical intervention is ever quite as clear and predictable, or that non-human experience (whether observational or experimental) would ever suffice.

The author offers as an alternative to evidence-based methods the “common sense” method, which is really the “trust me, I’m a doctor” method. That’s not worked out so well in many high profile cases (see above, plus note the recent finding that expensive, profitable angioplasty and coronary artery by-pass grafts are no better than simple medical treatment of arteriosclerosis). And these are just the ones for which careful scientists have been able to do randomized trials. Most of our accepted therapies never have been subjected to such scrutiny, but it is breathtaking how frequently such scrutiny reveals problems.

Thanks, but I’ll stick with scientifically proven remedies.

(Image: parachute experiments without humans)

* on the same day as I posted Friday Foolery #15: The Man who pioneered the RCT. What a coincidence.

** Don’t forget to read the comments to the article. They are often excellent.


References

  1. Smith, G. (2003). Parachute use to prevent death and major trauma related to gravitational challenge: systematic review of randomised controlled trials BMJ, 327 (7429), 1459-1461 DOI: 10.1136/bmj.327.7429.1459
  2. The Friday Funny: Why Evidence-Based Medicine Is Not The Whole Story”. (getbetterhealth.com) [2010.01.29]
  3. Call for randomized clinical trials of Parachutes (nottotallyrad.blogspot.com) [08-2008]
  4. Sackett DL, Rosenberg WM, Gray JA, Haynes RB, & Richardson WS (1996). Evidence based medicine: what it is and what it isn’t. BMJ (Clinical research ed.), 312 (7023), 71-2 PMID: 8555924




Complementary Medicine & Pharmacists

30 11 2009

I don't know if the situation is the same in other countries, but in the Netherlands we can only get prescribed medications at pharmacies. Drugstores are only allowed to sell over-the-counter (OTC) medicines.

Most pharmacies have a small shop of 5 square meters (besides a large storage room). What surprises me is that the counter is not only stacked with hypoallergenic creams, and the shelves not only filled with liquorice and plasters, but that counter and shelves predominantly display naturopathic and herbal "medicines". In this flu season there are even leaflets on how to prevent flu with all kinds of naturopathic medicine. Dr Vogel's Echinaforce "helps to augment your natural resistance, lowers the risk of flu and shortens the duration or decreases the severity of symptoms once you have the flu" ("vermindert u de kans op griep en herstelt u sneller als u toch ziek wordt"). Apparently A Vogel.nl (via Biohorma) started a campaign in the Netherlands. Their website even advertises an offer from an insurance company – OHRA – because it generously reimburses homeopathic medicine. Biohorma also made a YouTube video.
In contrast, in the US there is a disclaimer at the Echinaforce site:” These statements have not been evaluated by the Food and Drug Administration (FDA). This product is not intended to diagnose, treat, cure or prevent any disease.”

There is no evidence that Echinacea prevents flu (see the Cochrane Review and de Volkskrant [a Dutch newspaper referring to clinical trials]), although it cannot be excluded that it helps in the early treatment of colds in adults.

Isn't promoting such ineffective stuff bad advice, considering we have a real flu epidemic, and given the inverse relationship between pediatric vaccination and CAM usage (see Respectful Insolence)?

It is quite confusing, however, because Echinacea is advertised as a homeopathic medicine, whereas it appears to be a herbal medicine (not diluted ad infinitum). To date there is no evidence that homeopathy 'works'. All 6 published Cochrane systematic reviews with 'homeopathy' or 'homeopathic' in the title conclude that there is little or no evidence that it works beyond the placebo effect.

During the recent House of Commons Science and Technology Committee meeting, which called in homeopaths and scientists to discuss the evidence for the alternative therapy, Prof. Dr Ernst (who has experience as a homeopath) said: "I have supplied a list of systematic reviews of homeopathy. There are two dozen. None in that list were positive." (see this excellent summary of the meeting by Ian Sample). For the entire memorandum of Dr Ernst see here.

Besides the lack of effect in clinical trials, the whole theory is incompatible with the laws of physics and chemistry.

Nevertheless:

  • There is a lot of homeopathic research going on, e.g. funded by the NHS (National Health Service) in the UK and the NCCAM (National Center for Complementary and Alternative Medicine, NIH) in the US.
  • In the UK homeopathic medicine is endorsed by the MHRA (Medicines and Healthcare products Regulatory Agency)
  • CAM is booming business (£1.5bn industry in the UK)
  • CAM is covered by insurance companies.
  • CAM is sold and sometimes advocated by pharmacists.

Thus all over the world people are buying these ineffective homeopathic medicines believing they 'work', or at least cause no harm. However, while homeopathic medicines may not be harmful in themselves, they may cause harm if they are used in place of proven treatment for a life-threatening illness. Indeed, the WHO has warned people with conditions such as HIV, TB and malaria not to rely on homeopathic treatments (BBC News, 20 August 2009).

For me it is incomprehensible that pharmacists, who are trained in pharmacology and chemistry at university level, just sell those ineffective, costly water dilutions and advocate them directly or indirectly by putting them on the shelves, providing ample leaflets and brochures and giving positive "advice". What could be the reason for doing that, other than ignorance or MONEY?



Photo Credits

  1. Pharmacists mortar and pestle http://commons.wikimedia.org/wiki/File:PharmacistsMortar.svg
  2. Homeopathic Medicine on the shelves http://www.flickr.com/photos/caseywest/ / CC BY-SA 2.0
    (this photo has nothing to do with the specific subject, but shows all kinds of complementary medicine (CAM))




Still Confusion about the Usefulness of PSA-screening.

13 04 2009

Prostate cancer is the most commonly diagnosed cancer affecting older men and the second-biggest cancer killer.

Prostate Specific Antigen (PSA), a protein mainly produced by the prostate gland, is often elevated in prostate cancer, often in proportion to the prostate cancer volume. Since PSA screening detects more prostate cancers, middle-aged men have been advised to undergo this simple blood test to determine their blood PSA levels.

Indeed in the 20 years that the PSA test has been used there has been a significant drop in prostate cancer deaths.

However, this may have also resulted from better treatment modalities.

Furthermore, PSA tests are prone to false negative results (prostate cancer present despite a normal PSA level) and to false positive results (elevated PSA in non-cancerous prostate diseases, like prostate infection and benign prostatic hyperplasia (BPH)). Some detected prostate cancers may also be indolent, never causing any trouble in the long term. Since the subsequent diagnostic procedures (biopsy) and treatments (irradiation, surgery, hormonal treatment) often have serious side effects (erectile dysfunction, urinary incontinence and bowel problems), there is a clear need to demonstrate whether PSA screening is worth the high risks of overdiagnosis and overtreatment:

Thus, does PSA screening really save lives?
And what is the trade off between benefits and harms?
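Part of the answer to the second question is simple arithmetic: how misleading an "elevated PSA" is depends on how common the cancer actually is among the men being screened. A minimal sketch, with assumed round numbers for sensitivity, specificity and prevalence (illustrative values, not figures from the trials discussed below), shows why most positive tests are false alarms:

```python
# Illustrative only: why an "abnormal" PSA is often a false alarm.
# Sensitivity, specificity and prevalence are assumed round numbers for the
# sake of the example, not figures taken from the trials discussed below.

def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Probability that a man with an elevated PSA actually has prostate cancer."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

ppv = positive_predictive_value(sensitivity=0.70, specificity=0.85, prevalence=0.03)
print(f"PPV with these assumed values: {ppv:.0%}")   # about 13%: most positives are false alarms
```

With these (assumed) numbers roughly 7 out of 8 men with an elevated PSA would not have prostate cancer, which is why the downstream biopsies and treatments weigh so heavily in the trade-off.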

A Cochrane Systematic Review from 2006 [5] (also reviewed in EBM-online) concluded that there was no proof of benefit of PSA-screening. Yet absence of proof is not proof of absence. Moreover, both trials on which the review was based had methodological weaknesses.
Therefore, the main conclusion was to wait for the results from two large scale ongoing randomized controlled trials (RCTs).

The first results of these two large RCT's, which many observers hoped would settle the controversy, have appeared in the March issue of the New England Journal of Medicine (NEJM) [1,2]. The results are discussed in an accompanying editorial [3] and in a Perspective Roundtable [4] (with a video).

It should be stressed, however, that these are just interim results.

One of these two studies [1], the prostate component of the U.S. National Cancer Institute's Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO), showed no prostate-specific mortality reduction over 11 years of follow-up among 76,705 men with annual PSA screening and DRE (digital rectal exam). However:

  • The cut-off is relatively high (4.0 ng per milliliter), which means that some prostate cancers could have been missed (on the other hand, lowering the threshold would also have produced more false positive results)
  • The control group is "highly contaminated", meaning that many men in the so-called nonscreened arm had a PSA test anyway (52% in the nonscreened versus 85% in the screened arm).
  • The 11-year follow-up may be too short to show any significant effect. "Only" 0.1% of the men died of prostate cancer. In the longer term the differences might become larger.
  • Since there were only 122 prostate cancer deaths in the screening group versus 135 in the control group, the power of the study to find any difference in mortality seems to be rather low (see the rough sketch below).
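To get a feel for how little these death counts can tell us, here is a back-of-the-envelope sketch (my own illustration, not the trial's analysis) of the uncertainty around a comparison of 122 versus 135 deaths, assuming roughly equal person-time in the two arms (the post gives only the overall number of 76,705 men):

```python
# Rough sketch (not the trial's own analysis): how precise is a comparison of
# 122 vs 135 prostate cancer deaths? Assumes roughly equal person-time in the
# screened and control arms, which is an assumption, not a reported figure.
import math

deaths_screened, deaths_control = 122, 135

rate_ratio = deaths_screened / deaths_control
# standard large-sample approximation for the log rate ratio of two Poisson counts
se_log_rr = math.sqrt(1 / deaths_screened + 1 / deaths_control)
lower = math.exp(math.log(rate_ratio) - 1.96 * se_log_rr)
upper = math.exp(math.log(rate_ratio) + 1.96 * se_log_rr)

print(f"rate ratio ≈ {rate_ratio:.2f}, 95% CI ≈ {lower:.2f} to {upper:.2f}")
# ≈ 0.90 (0.71 to 1.15): anything from a ~29% reduction to a ~15% increase
# in prostate cancer deaths is compatible with these counts.
```

In other words, a modest true benefit could easily be missed with so few events.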

The European ERSPC study [2] is larger than the PLCO trial (190,000 men), the cut-off was lower (3.0 µg/L), and there was less contamination of the nonscreened arm. A shortcoming of the trial is that the diagnostic methods varied widely among the participating centers. The follow-up time is 9 years.

The ERSPC trial did find a difference in mortality between the screened and non-screened arms. Surprisingly, the same outcome led to widely different conclusions, especially in the media (see Ben Goldacre on his blog Bad Science [6]).

English newspapers concluded that the ERSPC study showed a clear advantage: "Prostate cancer screening could cut deaths by 20%" said the Guardian. "Better cancer screening is every man's right" was the editorial in the Scotsman (see [6]). These newspapers didn't mention the lack of effect in the US study.

But most US newspapers, and scientists, concluded that the benefits didn’t outweigh the risks.

Why this different interpretation?

It is because 20% is the relative risk reduction: the risk of dying from prostate cancer is reduced by 20%. This sounds more impressive than it is, because it depends on your baseline risk. It is the absolute reduction that counts.
Suppose you had a baseline risk of 10% of dying from prostate cancer. Reducing this risk by 20% means that the risk is reduced from 10% to 8%. This sounds a lot less impressive.
But in reality your risk of dying from prostate cancer over such a follow-up period is closer to 0.1%. Then a risk reduction of 20% becomes even less significant: it means your risk has decreased to 0.08%.

Absolute numbers are more meaningful. In the ERSPC trial [2], the estimated absolute reduction in prostate-cancer mortality was about 7 deaths per 10,000 men after 9 years of follow-up. This is not a tremendous effect. Moreover, the costs are high: to prevent one death from prostate cancer, 1410 men would need to be screened and 48 additional cases of prostate cancer would need to be treated.
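The arithmetic behind this relative-versus-absolute distinction is simple enough to spell out. The sketch below uses the illustrative baseline risks from the paragraphs above (10% and 0.1%), not the actual ERSPC figures:

```python
# Worked version of the relative-vs-absolute risk argument above, using the
# post's own illustrative baseline risks (10% and 0.1%) rather than the
# actual ERSPC figures.

def absolute_effect(baseline_risk: float, relative_risk_reduction: float):
    """Return (absolute risk reduction, number needed to screen)."""
    arr = baseline_risk * relative_risk_reduction
    return arr, 1 / arr

for baseline in (0.10, 0.001):                   # 10% and 0.1% baseline risk
    arr, nns = absolute_effect(baseline, 0.20)   # the headline "20% reduction"
    print(f"baseline {baseline:.1%}: ARR = {arr:.3%}, men to screen per death prevented ≈ {nns:.0f}")
```

Applying the same arithmetic to the trial's absolute reduction of roughly 7 per 10,000 gives 1/0.0007 ≈ 1400 men screened per death prevented, in line with the 1410 quoted above.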

Overdiagnosis and overtreatment are probably the most important adverse effects of prostate-cancer screening and are vastly more common than in screening for breast, colorectal, or cervical cancer.

It is difficult to grasp the impact of such overdiagnosis. People tend to think that saving any life is worth any cost. But that isn't the case.

This quote (from Ray Sahelian) says a lot:

A few years ago my dad was found to have a high PSA test. He was 74 at the time. He underwent multiple visits to the doctor over the next few months with repeated PSA tests and exams, and eventually a biopsy indicated he had a small prostate cancer. I remember my dad calling me several times a month during that period constantly asking my thoughts on how he should proceed with radiation or other treatments for his cancer. My dad had a preexisting heart condition known as atrial fibrillation. I suggested he not undergo any treatment for the small cancer but just to follow the PSA levels. His doctor had agreed with my opinion. His PSA test stayed relatively the same over the next few years and the prostate cancer did not grow larger. My dad died at 78 from a heart rhythm problem. Ever since the discovery of the high PSA level, he was constantly worried about this prostate gland. What good did it do to have this PSA test at his age? It only led to more doctor visits, a painful prostate gland biopsy, and constant worry. Maybe the constant worry even made his heart weaker.

Indeed, more men die with prostate cancer than of it. It's estimated that approximately 30% of American men over age 60 have small, harmless prostate cancers.

Although still hypothetical, non-invasive tests that could discriminate between low- and high-risk prostate cancer would be a real solution to the problem. One such candidate might be the recently discovered urine test for sarcosine [7].

In conclusion
PSA screening is associated with an earlier diagnosis of prostate cancer, but the present evidence shows at most a slight reduction in prostate-related mortality. Since screening and the subsequent testing and treatment have serious side effects, there is a trade-off between uncertain benefits and known harms. Definite conclusions can only be drawn after complete follow-up and analysis of these and other studies [1,2,3].

REFERENCES

  1. Andriole, G., Grubb, R., Buys, S., Chia, D., Church, T., Fouad, M., Gelmann, E., Kvale, P., Reding, D., Weissfeld, J., Yokochi, L., Crawford, E., O'Brien, B., Clapp, J., Rathmell, J., Riley, T., Hayes, R., Kramer, B., Izmirlian, G., Miller, A., Pinsky, P., Prorok, P., Gohagan, J., Berg, C., et al. (2009). Mortality Results from a Randomized Prostate-Cancer Screening Trial. New England Journal of Medicine DOI: 10.1056/NEJMoa0810696
  2. Schroder, F., Hugosson, J., Roobol, M., Tammela, T., Ciatto, S., Nelen, V., Kwiatkowski, M., Lujan, M., Lilja, H., Zappa, M., Denis, L., Recker, F., Berenguer, A., Maattanen, L., Bangma, C., Aus, G., Villers, A., Rebillard, X., van der Kwast, T., Blijenberg, B., Moss, S., de Koning, H., Auvinen, A., et al. (2009). Screening and Prostate-Cancer Mortality in a Randomized European Study. New England Journal of Medicine DOI: 10.1056/NEJMoa0810084
  3. Barry, M. (2009). Screening for Prostate Cancer — The Controversy That Refuses to Die New England Journal of Medicine, 360 (13), 1351-1354 DOI: 10.1056/NEJMe0901166
  4. Lee, T., Kantoff, P., & McNaughton-Collins, M. (2009). Screening for Prostate Cancer New England Journal of Medicine, 360 (13) DOI: 10.1056/NEJMp0901825
  5. Ilic D, O’Connor D, Green S, Wilt T. Screening for prostate cancer. Cochrane Database Syst Rev. 2006;3:CD004720.[Medline]
  6. Goldacre, Ben (2009) Bad Science: Venal-misleading-pathetic-dangerous-stupid-and-now-busted.net. (2009/03/), also Published in The Guardian, 21 March 2009
  7. Sreekumar A et al. (2009) Metabolomic profiles delineate potential role for sarcosine in prostate cancer progression Nature 457 (7231): 910-914 DOI: 10.1038/nature07762




An Antibiotic Paste May Save Lives at the ICU.

16 03 2009


Respiratory tract infections acquired in the intensive care unit (ICU) are important causes of morbidity and mortality, the most significant risk factor being mechanical ventilation. Hospital pneumonia is thought to originate largely from flora colonizing the patient's oropharynx (the area of the throat at the back of the mouth). Reduction of respiratory tract infections has therefore been sought by nursing patients in a semirecumbent instead of a supine position. Another approach is selective decontamination, of which there are two variants, SDD and SOD.

  1. SDD, Selective Decontamination of the Digestive tract consists of the administration of topical nonabsorbable antibiotics in the oropharynx and gastrointestinal tract, often concomitant with systemic antibiotics. It aims to reduce the incidence of pneumonia in critically ill patients by diminishing colonization of the upper respiratory tract with aerobic gram-negative bacilli and yeasts, without disrupting the anaerobic flora.
  2. SOD, Selective Oropharyngeal Decontamination, is the application of topical antibiotics in the oropharynx only.

Both approaches were first introduced in the Netherlands. Most trials suggested that SDD lowered infection rates, but lacked statistical power to demonstrate an effect on mortality. However, meta-analyses and three single-center randomized studies did show a survival benefit of SDD in critically ill patients. Several studies had suggested that the local variant, SOD, was also effective, but SOD was never directly compared with SDD in the same study. Because of methodological issues and concern about increasing antibiotic resistance, the use of both SDD and SOD has remained controversial. Even in the Netherlands, where guidelines recommended the use of SDD after a Dutch publication in the Lancet (de Jonge et al, 2003) had shown mortality to drop by 30% in the Academic Medical Center in Amsterdam, only 25% of the emergency doctors followed the guidelines.

The present Dutch study, published in the NEJM (2009), was undertaken to determine the effects on mortality in a head-to-head comparison of SDD and SOD. The effectiveness of SDD and SOD was determined in a crossover study using cluster randomization in 13 Dutch ICU's differing in size and teaching status. Cluster randomization means that ICU's rather than individual patients were randomized, to avoid one treatment regimen influencing the outcome of another. Crossover implies that all three regimens (SDD, SOD, standard care) were administered in random order in each ICU.
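For illustration, a minimal sketch of what such a cluster-randomized crossover looks like in practice: each ICU is the unit of randomization and runs all three regimens, one per study period, in its own randomly shuffled order. The regimen names and the number of ICUs come from the study; everything else (the seed, the helper function) is made up for the example:

```python
# Minimal sketch of the cluster-randomized crossover design described above:
# every ICU (the cluster) receives all three regimens, each for one study
# period, in an independently randomized order. Purely illustrative.
import random

REGIMENS = ["SDD", "SOD", "standard care"]

def randomize_icus(n_icus: int, seed: int = 42) -> dict[int, list[str]]:
    rng = random.Random(seed)
    schedule = {}
    for icu in range(1, n_icus + 1):
        order = REGIMENS[:]          # each ICU gets its own shuffled order
        rng.shuffle(order)
        schedule[icu] = order
    return schedule

for icu, order in randomize_icus(13).items():   # 13 Dutch ICUs, as in the study
    print(f"ICU {icu:2d}: " + " -> ".join(order))
```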

A total of 5939 patients were enrolled in this large study. Patients were eligible if they were expected to be intubated for more than 48 hours or to stay in the ICU for more than 72 hours. The SDD regimen involved four days of intravenous cefotaxime along with topical application of tobramycin, colistin and amphotericin B; the SOD regimen used only the topical antibiotics. Both regimens were compared with standard care. The duration of the study was six months, and the primary end point was 28-day mortality.

Of the 5,939 patients, 1,990 received standard care, 1,904 received SOD and 2,405 received SDD. Crude mortality rates in the three groups were 27.5%, 26.6% and 26.9%, respectively. These differences are small, and a benefit was only discernible after adjustment for covariates (age, sex, APACHE II score, intubation status, medical specialty, study site, and study period): adjusted odds ratios for 28-day mortality were 0.86 (95% CI, 0.74 to 0.99) in the SOD group and 0.83 (95% CI, 0.72 to 0.97) in the SDD group, compared with standard care. This corresponds to numbers needed to treat (NNTs) of 29 and 34 to prevent one death at day 28 for SDD and SOD, respectively.
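How does an adjusted odds ratio translate into an NNT? A hedged sketch, using the generic odds-ratio-to-risk conversion and taking the 27.5% crude mortality under standard care as the baseline risk (an approximation of my own, not the authors' exact calculation):

```python
# Sketch: converting an adjusted odds ratio into an NNT, taking the crude
# 28-day mortality under standard care (27.5%) as the baseline risk. This is
# the generic OR-to-risk conversion, not the authors' exact calculation.

def nnt_from_odds_ratio(odds_ratio: float, baseline_risk: float) -> float:
    baseline_odds = baseline_risk / (1 - baseline_risk)
    treated_odds = odds_ratio * baseline_odds
    treated_risk = treated_odds / (1 + treated_odds)
    return 1 / (baseline_risk - treated_risk)        # 1 / absolute risk reduction

baseline = 0.275                                     # standard-care crude mortality
print(f"SDD (OR 0.83): NNT ≈ {nnt_from_odds_ratio(0.83, baseline):.0f}")
print(f"SOD (OR 0.86): NNT ≈ {nnt_from_odds_ratio(0.86, baseline):.0f}")
# prints ≈ 28 and ≈ 34, close to the 29 and 34 reported above.
```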

The limitations of the study (acknowledged by the authors) were the absence of concealment of allocation (due to the study design it was impossible to conceal the allocation from doctors on the wards), differences at baseline between the standard-care and treatment groups, and a mismatch between the original analysis plan and the study design (the originally specified primary end point was in-hospital death, but this did not take cluster effects into account).

Selective decontamination also improved microbiological outcomes, such as carriage of gram-negative bacteria in the respiratory and intestinal tracts and ICU-acquired bacteremia. During the study the prevalence rates of antibiotic-resistant gram-negative bacteria were lower in the SOD and SDD periods than during the standard-care periods.

The authors concluded that both SDD and SOD were effective compared with standard care. Given the similar effects on survival in the two treatment groups, the SOD regimen seems preferable to the SDD regimen, because it minimizes the risk of antibiotic resistance, which poses a major threat to patients admitted to ICU's. It should be noted that MRSA infections are very rare in the Netherlands and in Scandinavia. The outcome of the study might therefore be different after long-term treatment and/or in regions with a high prevalence of MRSA.

References

ResearchBlogging.orgde Smet, A., Kluytmans, J., Cooper, B., Mascini, E., Benus, R., van der Werf, T., van der Hoeven, J., Pickkers, P., Bogaers-Hofman, D., van der Meer, N., Bernards, A., Kuijper, E., Joore, J., Leverstein-van Hall, M., Bindels, A., Jansz, A., Wesselink, R., de Jongh, B., Dennesen, P., van Asselt, G., te Velde, L., Frenay, I., Kaasjager, K., Bosch, F., van Iterson, M., Thijsen, S., Kluge, G., Pauw, W., de Vries, J., Kaan, J., Arends, J., Aarts, L., Sturm, P., Harinck, H., Voss, A., Uijtendaal, E., Blok, H., Thieme Groen, E., Pouw, M., Kalkman, C., & Bonten, M. (2009). Decontamination of the Digestive Tract and Oropharynx in ICU Patients New England Journal of Medicine, 360 (1), 20-31 DOI: 10.1056/NEJMoa0800394

de Jonge E, Schultz M, Spanjaard L, et al. Effects of selective decontamination of the digestive tract on mortality and acquisition of resistant bacteria in intensive care: a randomised controlled trial. Lancet 2003;362:1011-1016 (PubMed citation)

Wim Köhler (2009) Smeren tegen infectie, NRC Handelsblad, Wetenschapsbijlage 3,4 januari (Dutch, online)

Barclay, L & Vega, C (2009) Selective Digestive, Oropharyngeal Decontamination May Reduce Intensive Care Mortality, Medscape

File, T.M., Bartlett, J.G., & Thorner, A.R. Risk factors and prevention of hospital-acquired (nosocomial), ventilator-associated, and healthcare-associated pneumonia in adults (www.uptodate.com)

Photo Credit (CC): http://www.flickr.com/photos/30688696@N00/3241003338/ (JomCleay)





Yet Another Negative Trial with Vitamins in Prostate Cancer: Vitamins C and E.

15 12 2008

Within a week after the large SELECT (Selenium and Vitamin E Cancer Prevention) Trial was halted due to disappointing results (see previous posts: [1] and [2]), the negative results of yet another large vitamin trial were announced [7].
Again, no benefit was found from either vitamin C or E in preventing prostate or other cancers.
Both trials have now been published online ahead of print in JAMA. The full-text articles and the accompanying editorial are freely available [3, 4, 5].

In The Physicians’ Health Study II Randomized Controlled Trial (PHS II), researchers tested the impact of regular vitamin E and C supplements on cancer rates among 14,641 male physicians over 50: 7641 men from the PHS I study and 7000 new physicians.

The men were randomly assigned to receive vitamin E, vitamin C, or placebo. Besides vitamin C and E, beta carotene and/or multivitamins were also tested, but the beta carotene arm was terminated on schedule in 2003 and the multivitamin component is continuing at the recommendation of the data and safety monitoring committee.

Similar to the SELECT trial this RCT had a factorial (2×2) design with respect to the vitamins E and C [1]: randomization yielded 4 nearly equal-sized groups receiving:

  • 400 IU synthetic α-tocopherol (vitamin E) every other day plus placebo (similar to the SELECT trial)
  • 500 mg synthetic ascorbic acid (vitamin C) daily plus placebo
  • both active agents
  • both placebos.
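A small sketch of how such a 2×2 factorial randomization produces four nearly equal-sized groups (illustrative only; not the actual PHS II randomization procedure):

```python
# Illustrative 2x2 factorial assignment (not the actual PHS II randomization
# procedure): each man is independently assigned vitamin E or its placebo and
# vitamin C or its placebo, yielding four roughly equal-sized groups.
import random
from collections import Counter

def assign(n_participants: int, seed: int = 1) -> Counter:
    rng = random.Random(seed)
    groups = Counter()
    for _ in range(n_participants):
        vit_e = rng.choice(["vitamin E", "placebo E"])
        vit_c = rng.choice(["vitamin C", "placebo C"])
        groups[(vit_e, vit_c)] += 1
    return groups

for group, n in sorted(assign(14641).items()):   # 14,641 physicians, as in PHS II
    print(f"{group[0]:10s} + {group[1]:10s}: {n}")
```

The point of the factorial design is that one trial answers two questions at once: the vitamin E comparison pools the two vitamin E groups against the two placebo-E groups, and likewise for vitamin C.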

Over 8 years, taking vitamin E had no impact at all on the rates of either prostate cancer (the primary outcome for vitamin E) or cancer in general. Vitamin C had no significant effect on total cancer (the primary outcome for vitamin C) or on prostate cancer. Neither was there an effect of vitamin E and/or C on other site-specific cancers.

How can the negative results be explained in the light of the positive results of earlier trials?

  • The conditions may differ from the positive trials:
    • The earlier positive trials had less methodological rigor: they were either observational studies, or prostate cancer was not their primary outcome (so positive findings may have been due to chance). (See the previous post The best study design for dummies.)
    • Clinical data suggest that the positive effect of vitamin E observed in earlier trials was limited to smokers and/or people with low baseline levels of vitamin E, whereas animal models suggest that vitamin E is efficacious against high-fat-promoted prostate cancer growth but lacks chemopreventive effects (e.g. see [1,4] and references in [5], a preclinical study we published in 2006).
      Indeed, there was very little smoking in the PHS II study, and the effect of the vitamins was mainly assessed on induction, not progression, of prostate cancer.
    • Eight times higher vitamin E doses (400 IU) were used than in the ATBC study, which showed a benefit of vitamin E in decreasing prostate cancer risk! [1,4]
  • Other forms of vitamin E and selenium have been proposed to be more effective.
  • As Gann noted in the JAMA editorial, the men in both recent studies were highly motivated and had good access to care. In SELECT, the majority of men were tested for PSA each year. Probably because of this intense surveillance, the mean PSA at diagnosis was low and prostate cancers were detected at an early, curable stage. Strikingly, there was only 1 death from prostate cancer in SELECT, whereas approximately 75-100 deaths were expected. There were also indications of a deficit in advanced prostate cancer in PHS II, although a much smaller one.
    In other words (Gann):
    “how can an agent be shown to prevent serious, clinically significant prostate cancers when PSA testing may be rapidly removing those cancers from the population at risk before they progress?”
  • Similarly, in the SELECT trial there was no constraint on the use of other multivitamins and both studies put no restriction on the diet. Indeed the group of physicians who participated in the PHS II trial were healthier overall and ate a more nutritious diet. Therefore Dr Shao wondered
    “Do we really have a placebo group – people with zero exposure? None of these physicians had zero vitamin C and E” [7]. In the Netherlands we were not even able to perform a small phase II trial with certain nutrients for the simple reason that most people already took them.

What can we learn from these negative trials (the SELECT trial and this PHS II-trial)?

  • Previous positive results were probably due to chance. In the future a better preselection of compounds and doses in Phase 2 trials should determine which few interventions make it through the pipeline (Gann, Schroder).
  • Many other trials have failed to show health benefits of high-dose vitamins, and some single vitamins may even increase the risk of specific cancers, heart disease or mortality [9]. In addition, vitamin C has recently been shown to interfere with cancer treatment [10].
  • The trials make it highly unlikely that vitamins prevent the development of prostate cancer (or other cancers) when given as a single nutrient intervention. Instead, as Dr Sasso puts it: "At the end of the day this serves as a reminder that we should get back to basics: keeping your body weight in check, being physically active, not smoking and following a good diet."
  • Single vitamins or high-dose vitamins/antioxidants should not be advised to prevent prostate cancer (or any other cancer). Still, it is very difficult to convince people not to take supplements.
  • Another issue is that all kinds of pharmaceutical companies keep pushing the sales of these "natural products", selectively referring to positive results only. It is about time to regulate this.

(Photo: vitamins on a plate)

Sources & other reading

  1. Huge disappointment: Selenium and Vitamin E fail to Prevent Prostate Cancer.(post on this blog about the SELECT trial)
  2. Podcasts: Cochrane Library and MedlinePlus: (post on this blog)
  3. Vitamins E and C in the Prevention of Prostate and Total Cancer in Men: The Physicians’ Health Study II Randomized Controlled Trial. J. Michael Gaziano et al JAMA. 2008;0(2008):2008862-11.[free full text]
  4. Effect of Selenium and Vitamin E on Risk of Prostate Cancer and Other Cancers: The Selenium and Vitamin E Cancer Prevention Trial. Scott M. Lippman, Eric A. Klein et al (SELECT)JAMA. 2008;0(2008):2008864-13 [free full text].
  5. Randomized Trials of Antioxidant Supplementation for Cancer Prevention: First Bias, Now Chance-Next, Cause. Peter H. Gann JAMA. 2008;0(2008):2008863-2 [free full text].
  6. Combined lycopene and vitamin E treatment suppresses the growth of PC-346C human prostate cancer cells in nude mice. Limpens J, Schröder FH, et al. J Nutr. 2006 May;136(5):1287-93 [free full text].

    News
  7. The New York Times (2008/11/19) Study: Vitamins E and C Fail to Prevent Cancer in Men.
  8. BBC news: (2008/12/10) Vitamins ‘do not cut cancer risk’.
  9. The New York Times (2008/11/20) News keeps getting worse for vitamins.
  10. The New York Times (2008/10/01) Vitamin C may interfere with cancer treatment.








Huge disappointment: Selenium and Vitamin E fail to Prevent Prostate Cancer.

16 11 2008


On October 27th the news was released (see here for the entire announcement from nih.gov) that:

“an initial, independent review of study data from the Selenium and Vitamin E Cancer Prevention Trial (SELECT), funded by the National Cancer Institute (NCI) and other institutes that comprise the National Institutes of Health shows that selenium and vitamin E supplements, taken either alone or together, did not prevent prostate cancer. The data also showed two concerning trends: a small but not statistically significant increase in the number of prostate cancer cases among the over 35,000 men age 50 and older in the trial taking only vitamin E and a small, but not statistically significant increase in the number of cases of adult onset diabetes in men taking only selenium. Because this is an early analysis of the data from the study, neither of these findings proves an increased risk from the supplements and both may be due to chance.”

SELECT is the second large-scale study of chemoprevention for prostate cancer. Chemoprevention or chemoprophylaxis refers to the administration of a medication to prevent disease. The SELECT trial aimed to determine whether dietary supplementation with selenium and/or vitamin E could reduce the risk of prostate cancer among healthy men. It is a randomized, prospective, double-blind study with a 2×2 factorial design, which means that the volunteering men received either one of the supplements, both supplements, or no supplements (but placebo instead), without knowing which treatment they would receive.
The trial volunteers were randomly assigned to one of the following treatments:

  1. 200 µg of selenium and 400 IU of vitamin E per day. (both supplements)
  2. 200 µg of selenium per day and placebo
  3. 400 IU of vitamin E per day and placebo
  4. two different placebos (neither supplement)
    (µg = micrograms, IU = International Units)

Enrollment for the trial began in 2001 and ended in 2004. Supplements were to be taken for a minimum of 7 years and a maximum of 12 years, so the final results were anticipated in 2013. However, due to the negative preliminary results, SELECT participants still in the trial are now being told to stop taking the pills. The participants will continue to have their health monitored by study staff for about three more years, will continue to respond to the study questionnaires, and will provide a blood sample at the five-year anniversary of joining the trial, to ensure their health and to allow a complete analysis of the study (see SELECT Q & A).

In an interview with CBS, one of the investigators, Dr Katz, said he was highly disappointed and concerned, because he had had high hopes for the trial: "I'm disappointed with the study. I'm very concerned about the results of the trial."

More about "Vitamin E A Flop In Prostate Cancer T…" (with a 15-second advertisement first), posted with Vodpod. The video is derived from CBS News.

Dr. Klein, one of the principal investigators, has published as many as 14 publications on the SELECT trial (see PubMed). He has always been a strong advocate of this huge trial.

The question now is:
Was there enough evidence to support such a large trial? Could this result have been foreseen? Would the trial have had different outcomes if other conditions had been chosen?

The SELECT trial seems to add to the ever-growing list of disappointing "preventive" vitamin trials. See for instance this blog post by sandnsurf on "a systematic review of all the published randomized controlled trials (RCTs) on multivitamins and antioxidant supplements in various diseases, and their effect on overall mortality", concluding:

“Taking the antioxidant vitamins A (and its precursor beta-carotene) and E singly or in multivitamins is dangerous and should be avoided by people eating a healthy diet. On a diet like that recommended here, the intake of these and other important vitamins should be high, with no need for supplementation.”

Quite coincidentally, I commented on sandnsurf's blog post referring to the SELECT trial, one week before the bad outcome was announced:

Indeed, in many RCT’s vitamin supplements didn’t have the beneficial effects that they were supposed to have. Already in the early nineties, adverse effects of beta-carotene (higher mortality in smokers) have been shown in several RCT’s. Still, because vitamin E had an expected positive effect on prostate cancer in one such trial, vitamin E is now being tested together with selenium (2X2) in a very large prostate cancer trial. Quite disturbingly, 8 times higher doses vitamin E are being used (400IE) compared to the original study. If the Lawson study is right, the outcome might be harmful. Worrying.

It might be argued that it is easy to criticize a study once the outcome is known. However, this critique is not new.

Already in 2002 a very good critique was written by MA Moyad in Urology entitled: Selenium and vitamin E supplements for prostate cancer: evidence or embellishment?

Here I will summarize the most important arguments against this particular trial (largely based on the Moyad paper):

  • SELECT was based on numerous laboratory and observational studies supporting the use of these supplements. As discussed previously, such study designs don't provide the best evidence.
  • The incidence, or rate of occurrence, of prostate cancer was not the primary focus or endpoint of the few randomized controlled trials on which the SELECT study was based.
  • A 2×2 design is inadequate for dose-response evaluations; in other words, before you start the trial you have to be pretty sure about the optimal dose of each supplement and about the interaction between vitamin E and selenium at the particular doses used. The interaction between two agents might be synergistic or additive, also with respect to any negative (i.e. pro-oxidant) effect.
  • Eight times higher vitamin E doses (400 IU) were used than in the ATBC study, which showed a benefit of vitamin E in decreasing prostate cancer risk! This is remarkable, given the fact that high doses of antioxidants can be harmful. Indeed, a prospective study has shown that vitamin E supplements in higher doses (≥100 IU) are associated with a higher risk of aggressive or fatal prostate cancer in nonsmokers.
  • Other forms of vitamin E and selenium have been proposed to be more effective. For instance dietary vitamin E (gamma tocopherol and/or gamma tocotrienols) might be more effective in lowering prostate cancer risk than the chemically-derived vitamin E (dl-alpha tocopherol acetate) used in SELECT. Also the used selenomethionine might be less effective than organically-bound selenium.
  • Selenium and vitamin E supplements seem to provide a benefit only for those individuals who have lower baseline plasma levels of selenium or vitamin E.
  • There may be other compounds that may be more effective, like finasteride, lycopene, statins (or with respect to food: a healthy lifestyle)

Katz said: "I would have hoped this would have been the way to prevent cancer in this country."

Isn't it a little naive to expect such huge effects (25% fewer prostate cancers) just from taking two supplements, given the considerations summarized above?

In the CBS interview, LaPook concludes: "This is a major disappointment, but it is also progress. Because it's also important to know what does not prevent cancer."

Well, I wonder whether it is ethical and scientifically valid to do such a costly experiment with 35,000 healthy volunteers, based on so little evidence. Do we have to test each single possibly effective food ingredient as a single intervention?

SOURCES:
Official publications and information

- EA Klein: http://www.ncbi.nlm.nih.gov/pubmed/12756490
- Lippman SM, J Natl Cancer Inst. 2005 Jan 19;97(2):94-102. Designing the Selenium and Vitamin E Cancer Prevention Trial (SELECT). (PubMed record)
- The results of the SELECT trial have been published in JAMA: Effect of Selenium and Vitamin E on Risk of Prostate Cancer and Other Cancers: The Selenium and Vitamin E Cancer Prevention Trial. Scott M. Lippman, Eric A. Klein et al. (SELECT) JAMA. 2008;0(2008):2008864-13, published online December 9th 2008.

- SELECT Q&A: www.cancer.gov/newscenter/pressreleases/SELECTQandA
- General information on SELECT http://www.crab.org/select/
- Information on the study design (from Cancer.gov clinical trials, SWOG S0000) and from clinicaltrials.gov

- More information on study designs and the ATBC trial (on which this study was based) in a previous post: the best study design for dummies

NEWS
- CBS Evening News Exclusive: Vitamin E And Selenium Fail To Prevent The Disease In Large Clinical Trial, NEW YORK, Oct. 27, 2008
- Los Angeles Times: Vitamin E, selenium fail to prevent prostate cancer
- Emaxhealth: NCI stops prostate cancer prevention trial. With many good links to further information







