Can Guidelines Harm Patients?

2 05 2012

Recently I saw an intriguing “personal view” in the BMJ by Grant Hutchison, entitled “Can guidelines harm patients too?” Hutchison is a consultant anesthetist with, as he calls it, chronic guideline fatigue syndrome. He underwent an acute exacerbation of his “condition” when yet another set of guidelines arrived in his email inbox. Hutchison:

On reviewing the level of evidence provided for the various recommendations being offered, I was struck by the fact that no relevant clinical trials had been carried out in the population of interest. Eleven out of 25 of the recommendations made were supported only by the lowest levels of published evidence (case reports and case series, or inference from studies not directly applicable to the relevant population). A further seven out of 25 were derived only from the expert opinion of members of the guidelines committee, in the absence of any guidance to be gleaned from the published literature.

Hutchison’s personal experience is supported by evidence from two articles [2,3].

One paper, published in JAMA in 2009 [2], concludes that ACC/AHA (American College of Cardiology/American Heart Association) clinical practice guidelines are largely developed from lower levels of evidence or expert opinion, and that the proportion of recommendations for which there is no conclusive evidence is growing. Only 314 of 2711 recommendations (median, 11%) are classified as level of evidence A, i.e. based on evidence from multiple randomized trials or meta-analyses. The majority of recommendations (1246/2711; median, 48%) are level of evidence C, i.e. based on expert opinion, case studies, or standards of care. Strikingly, only 245 of 1305 class I recommendations rest on the highest level of evidence, A (median, 19%).

Another paper, published in Arch Intern Med in 2011 [3], reaches similar conclusions for the Infectious Diseases Society of America (IDSA) practice guidelines. Of the 4218 individual recommendations, only 14% were supported by the strongest (level I) quality of evidence; more than half rested on level III evidence alone. As with the ACC/AHA guidelines, only a small proportion (23%) of the strongest IDSA recommendations were based on level I evidence (here, ≥1 randomized controlled trial; see below). And here too, new recommendations were mostly based on level II and III evidence.
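As a quick sanity check on the figures quoted above, the pooled fractions can be recomputed from the raw counts (a minimal sketch; note that the papers report medians across individual guidelines, so the pooled percentages differ slightly from the quoted medians):

```python
# Recompute pooled evidence-level percentages from the counts quoted above.
# The papers report medians across guidelines, so these pooled figures
# are close to, but not identical with, the quoted medians.

def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)

# ACC/AHA guidelines (Tricoci et al., JAMA 2009)
print(pct(314, 2711))   # level A recommendations -> 11.6 (quoted median: 11%)
print(pct(1246, 2711))  # level C recommendations -> 46.0 (quoted median: 48%)
print(pct(245, 1305))   # class I on level A evidence -> 18.8 (quoted median: 19%)
```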

Although there is little to argue with in Hutchison’s observations, I do not agree with his conclusions.

In his view guidelines are equivalent to a bullet-pointed list or flow diagram, allowing busy practitioners to move on from practice based on mere anecdote and opinion. It therefore seems contradictory that half of the EBM guidelines are based on little more than anecdote (case series, extrapolation from other populations) and opinion. He then argues that guidelines, like other therapeutic interventions, should be judged on the balance between benefit and risk, and that the risk associated with disseminating poorly founded guidelines must also be considered. One such risk is that doctors will simply adhere to the guidelines, and may even change their own (adequate) practice in the absence of any scientific evidence against it. If a patient is harmed despite punctilious adherence to the guideline rules, “it is easy to be seduced into assuming that the bad outcome was therefore unavoidable”. But perhaps the harm was done by following the guideline…

First of all, the overall evidence shows that adherence to guidelines can improve patient outcomes and provide more cost-effective care (Naveed Mustfa refers to [4] in a comment).

Hutchison’s piece is opinion-based, driven largely by (understandable) gut feelings and by implicit assumptions that also surround EBM in general.

  1. First there is the assumption that guidelines are a fixed set of rules, like a protocol, leaving no room for preferences (of either doctor or patient), interpretation, or experience. Just as EBM is often dismissed as “cookbook medicine”, EBM guidelines are reduced to mere bullet-pointed lists made by a bunch of experts who just want to impose their opinions as truth.
  2. The second assumption (shared by many) is that evidence-based medicine is synonymous with “randomized controlled trials”. By extension, only those EBM guideline recommendations that are based on RCTs or meta-analyses “count”.

Before I continue, I would strongly advise all readers (and certainly all EBM and guideline skeptics) to read the excellent and clearly written BMJ editorial by David Sackett et al. that deals with the misconceptions, myths, and prejudices surrounding EBM: “Evidence based medicine: what it is and what it isn’t” [5].

Sackett et al. define EBM as “the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients” [5]. Sackett emphasizes that “Good doctors use both individual clinical expertise and the best available external evidence, and neither alone is enough. Without clinical expertise, practice risks becoming tyrannised by evidence, for even excellent external evidence may be inapplicable to or inappropriate for an individual patient. Without current best evidence, practice risks becoming rapidly out of date, to the detriment of patients.”

Guidelines are meant to give recommendations based on the best available evidence. They should not be a set of rules set in stone. Ideally, a guideline gathers the evidence in a transparent way and makes it easier for clinicians to grasp the evidence for a certain procedure in a certain situation… and to see the gaps.

Contrary to what many people think, EBM is not restricted to randomized trials and meta-analyses. It involves tracking down the best external evidence there is. As I explained in #NotSoFunny #16 – Ridiculing RCTs & EBM, evidence is not an all-or-nothing thing: RCTs (if well performed) are the most robust, but when they are not available we have to rely on “lower” evidence (from cohort studies to case-control studies to case series, or even expert opinion).
On the other hand, RCTs are often not even suitable for answering questions in domains other than therapy (etiology/harm, prognosis, diagnosis): by definition, the level of evidence for these kinds of questions will inevitably be low.* Also, for some interventions RCTs are not appropriate or feasible, or are too costly to perform (cesarean vs vaginal birth, experimental therapies, rare diseases; see also [3]).

It is also good to realize that guidance based on numerous randomized controlled trials is probably not, or only to a limited extent, applicable to groups of patients who are seldom included in RCTs: the cognitively impaired, patients with multiple comorbidities [6], the elderly [6], children, and (often) women.

Finally, not all RCTs are created equal (various forms of bias, surrogate outcomes, small sample sizes, short follow-up), so they should not all represent the same high level of evidence.*

Thus, in my opinion, low levels of evidence are not by definition problematic, even when they form the basis for strong recommendations, as long as it is clear how the recommendations were reached and as long as they are well underpinned (by whatever evidence or reasoning). One could even see the exposed gaps in evidence as a positive thing, as they highlight the need for clinical research in certain fields.

There is one BIG BUT: my assumption is that guidelines are “just” recommendations based on exhaustive and objective reviews of the existing evidence. No more, no less. This means that the clinician must have the freedom to deviate from the recommendations on the basis of his own expertise, the situation, and/or the patient’s preferences. All the more so when the evidence behind strong recommendations is scant. Sackett already warned of the possible hijacking of EBM by purchasers and managers (and, may I add, health insurers and governmental agencies) to cut the costs of health care and to impose “rules”.

I therefore think it is odd that the ACC/AHA guidelines prescribe that class I recommendations SHOULD be performed/administered even when they are based only on level C evidence (see Figure).

I also find it odd that different guidelines use different nomenclature. The ACC/AHA have class I, IIa, IIb, and III recommendations and level A, B, and C evidence, where level A represents sufficient evidence from multiple randomized trials and meta-analyses. The IDSA guidelines, in contrast, grade the strength of recommendations from A through C (with D/E for recommendations against use) and the quality of evidence from level I through III, where level I indicates evidence from (just) one properly randomized controlled trial. As explained in [3], this system was originally introduced to evaluate the effectiveness of preventive health care interventions in Canada (for which RCTs are well suited).

Finally, guidelines and guideline makers should probably be more open to input and feedback from the people who apply these guidelines.


* The new GRADE (Grading of Recommendations Assessment, Development, and Evaluation) system, which also takes good-quality observational studies into account, may offer a solution here.

Another possibly relevant post at this blog: The Best Study Design for … Dummies

Figure: taken from a summary of an ACC/AHA guideline.


  1. Hutchison, G. (2012). Guidelines can harm patients too. BMJ, 344, e2685. DOI: 10.1136/bmj.e2685
  2. Tricoci, P., Allen, J.M., Kramer, J.M., Califf, R.M., & Smith, S.C. Jr (2009). Scientific evidence underlying the ACC/AHA clinical practice guidelines. JAMA, 301(8), 831-41. PMID: 19244190
  3. Lee, D., & Vielemeyer, O. (2011). Analysis of overall level of evidence behind Infectious Diseases Society of America practice guidelines. Arch Intern Med, 171(1), 18-22. DOI: 10.1001/archinternmed.2010.482
  4. Menéndez, R., Reyes, S., Martínez, R., de la Cuadra, P., Vallés, J.M., & Vallterra, J. (2007). Economic evaluation of adherence to treatment guidelines in nonintensive care pneumonia. Eur Respir J, 29(4), 751-6. PMID: 17005580
  5. Sackett, D., Rosenberg, W., Gray, J., Haynes, R., & Richardson, W. (1996). Evidence based medicine: what it is and what it isn’t. BMJ, 312(7023), 71-72. DOI: 10.1136/bmj.312.7023.71
  6. Aylett, V. (2010). Do geriatricians need guidelines? BMJ, 341, c5340. DOI: 10.1136/bmj.c5340

To Retract or Not to Retract… That’s the Question

7 06 2011

In the previous post [1] I discussed how the editors of Science asked for the retraction of a paper linking the XMRV retrovirus to ME/CFS.

The editors’ decision was based on the failure of at least 10 other studies to confirm the findings, and on growing support for the idea that the results were caused by contamination. When the authors refused to retract their paper, Science issued an Expression of Concern [2].

In my opinion retraction is premature. Science should at least await the results of the two multi-center studies that were designed to confirm or disprove the results. These studies will continue anyway… the budget is already allocated.

Furthermore, I can’t suppress the idea that Science asked for a retraction to exonerate itself for the poor peer review (the paper had serious flaws) and for its eagerness to swiftly publish a possibly groundbreaking study.

And what about the other studies linking XMRV to ME/CFS or other diseases: will these also be retracted?
And what happens in the improbable case that the multi-center studies confirm the 2009 paper? Would Science republish the retracted paper?

Thus, in my opinion, it is up to other scientists to confirm or disprove published findings. Remember that falsifiability was Karl Popper’s basic scientific principle. My conclusion was that “fraud is a reason to retract a paper and doubt is not”.

This is my opinion, but is this opinion shared by others?

When should editors retract a paper? Is fraud the only reason? When should editors issue a letter of concern? Are there guidelines?

Let me first say that even editors don’t agree. Schekman, the editor-in-chief of PNAS, has no direct plans to retract another paper reporting XMRV-like viruses in CFS [3].

Schekman considers it “an unusual situation to retract a paper even if the original findings in a paper don’t hold up: it’s part of the scientific process for different groups to publish findings, for other groups to try to replicate them, and for researchers to debate conflicting results.”

Back at the Virology Blog [4] there was also a vivid discussion about the matter. Prof. Vincent Racaniello gave the following answer in response to a reader’s question:

I don’t have any hard numbers on how often journals ask scientists to retract a paper, only my sense that it is very rare. Author retractions are more frequent, but I’m only aware of a handful of those in a year. I can recall a few other cases in which the authors were asked to retract a paper, but in those cases scientific fraud was involved. That’s not the case here. I don’t believe there is a standard policy that enumerates how such decisions are made; if they exist they are not public.

However, there is a guideline for editors: the retraction guidance from the Committee on Publication Ethics (COPE) (PDF) [5].

Ivan Oransky, of the great blog Retraction Watch, linked to it when we discussed reasons for retraction.

With regard to retraction the COPE-guidelines state that journal editors should consider retracting a publication if:

  1. they have clear evidence that the findings are unreliable, either as a result of misconduct (e.g. data fabrication) or honest error (e.g. miscalculation or experimental error)
  2. the findings have previously been published elsewhere without proper cross-referencing, permission or justification (i.e. cases of redundant publication)
  3. it constitutes plagiarism
  4. it reports unethical research

According to the same guidelines journal editors should consider issuing an expression of concern if:

  1. they receive inconclusive evidence of research or publication misconduct by the authors 
  2. there is evidence that the findings are unreliable but the authors’ institution will not investigate the case 
  3. they believe that an investigation into alleged misconduct related to the publication either has not been, or would not be, fair and impartial or conclusive 
  4. an investigation is underway but a judgement will not be available for a considerable time

Thus, in the case of the Science XMRV/CFS paper, an expression of concern certainly applies (all four points do), and one might even consider a retraction because the results seem unreliable (point 1). But it is not 100% established that the findings are false. There is only serious doubt…

The guidelines seem to leave room for case-by-case decisions. Retracting a paper in the case of plain fraud is not under discussion. But when is an error sufficiently established and sufficiently important to warrant retraction?

Apparently retractions are on the rise. Although still rare (0.02% of all publications by the late 2000s), retractions have increased tenfold compared to the early 1980s (see the review at Scholarly Kitchen [6] of two papers: [7] and [8]). However, it is unclear whether the increasing rate of retraction reflects more fraudulent or erroneous papers, or greater diligence. The first paper [7] also highlights that, out of fear of litigation, editors are generally hesitant to retract an article without the authors’ permission.

At the blog Nerd Alert they give a nice overview [9] (based on Retraction Watch, but summarized in one post 😉). They clarify that papers are retracted for “less dastardly reasons than those cases that hit the national headlines and involve purposeful falsification of data”, such as the fraudulent papers of Andrew Wakefield (autism caused by vaccination). Besides mistakenly publishing the same paper twice, data over-interpretation, plagiarism and the like, the reason can also be more trivial: ordering the wrong mice or using an incorrectly labeled bottle.

Still, scientists don’t unanimously agree that such errors should lead to retraction.

Drug Monkey blogs about his discussion [10] with @ivanoransky over a recent post at Retraction Watch asking whether a failure to replicate a result justifies a retraction [11]. Oransky presents a case in which a researcher (B) couldn’t reproduce the findings of another lab (A) and demonstrated mutations in the published protein sequence that ruled out the mechanism proposed in A’s paper. A’s paper wasn’t retracted, possibly because B didn’t follow A’s published experimental protocols in every detail (which reminds me of the XMRV controversy).

Drugmonkey says (cross-posted at Scientopia here; hmmpf, isn’t that an example of redundant publication?):

“I don’t give a fig what any journals might wish to enact as a policy to overcompensate for their failures of the past.
In my view, a correction suffices” (provided that search engines like Google and PubMed make clear that the paper was in fact corrected).

Drug Monkey has a point there. A clear watermark should suffice.

However, we should note that most papers are retracted by their authors, not by the editors/journals, and that the majority of “retracted papers” remain available. Just 13.2% are deleted from the journal’s website, and 31.8% are not clearly labelled as such.

Summary of how the naïve reader is alerted to paper retraction (from Table 2 in [7], see: Scholarly Kitchen [6])

  • Watermark on PDF (41.1%)
  • Journal website (33.4%)
  • Not noted anywhere (31.8%)
  • Note appended to PDF (17.3%)
  • PDF deleted from website (13.2%)
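Note that these categories overlap: a single retraction can be flagged in more than one place (or nowhere at all), which is why the percentages add up to well over 100%. A minimal check, using the figures above:

```python
# Ways readers are alerted to a retraction (Table 2 in [7]), in percent.
alerts = {
    "watermark on PDF": 41.1,
    "journal website": 33.4,
    "not noted anywhere": 31.8,
    "note appended to PDF": 17.3,
    "PDF deleted from website": 13.2,
}

# The categories are not mutually exclusive, so the total exceeds 100%.
print(round(sum(alerts.values()), 1))  # 136.8
```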

My conclusion?

Of course fraudulent papers should be retracted, and so should papers with obvious errors that invalidate their conclusions.

However, we should be extremely hesitant to retract papers that can’t be reproduced, if there is no undisputed evidence of error.

Otherwise we would have to retract almost all published papers at one point or another. Because if Professor Ioannidis is right (and he probably is), “much of what medical researchers conclude in their studies is misleading, exaggerated, or flat-out wrong” (see the previous post [12], “Lies, Damned Lies, and Medical Science” [13], and Ioannidis’ crushing article “Why most published research findings are false” [14]).

All retracted papers (and papers with major deficiencies and shortcomings) should be clearly labeled as such (as Drugmonkey proposed: not only in the PDF and on the journal’s website, but also by search engines and biomedical databases).

Or let’s hope, with Biochembelle [15], that the future of scientific publishing will make retractions for technical issues obsolete (whether in the form of nano-publications [16] or otherwise):

One day the scientific community will trade the static print-type approach of publishing for a dynamic, adaptive model of communication. Imagine a manuscript as a living document, one perhaps where all raw data would be available, others could post their attempts to reproduce data, authors could integrate corrections or addenda….

NOTE: Retraction Watch (@ivanoransky) and @laikas have voted in @drugmonkeyblog‘s poll about what a retracted paper means [here]. Have you?


  1. Science Asks to Retract the XMRV-CFS Paper, it Should Never Have Accepted in the First Place. ( 2011-06-02)
  2. Alberts B. Editorial Expression of Concern. Science. 2011-05-31.
  3. Given Doubt Cast on CFS-XMRV Link, What About Related Research? (
  4. XMRV is a recombinant virus from mice  (Virology Blog : 2011/05/31)
  5. Retractions: Guidance from the Committee on Publication Ethics (COPE) Elizabeth Wager, Virginia Barbour, Steven Yentis, Sabine Kleinert on behalf of COPE Council:
  6. Retract This Paper! Trends in Retractions Don’t Reveal Clear Causes for Retractions (
  7. Wager E, Williams P. Why and how do journals retract articles? An analysis of Medline retractions 1988-2008. J Med Ethics. 2011 Apr 12. [Epub ahead of print] 
  8. Steen RG. Retractions in the scientific literature: is the incidence of research fraud increasing? J Med Ethics. 2011 Apr;37(4):249-53. Epub 2010 Dec 24.
  9. Don’t touch that blot. ( : 2011/02/25)
  10. What_does_a_retracted_paper_mean? ( 2011/06/03)
  11. So when is a retraction warranted? The long and winding road to publishing a failure to replicate ( : 2011/06/03/)
  12. Much Ado About ADHD-Research: Is there a Misrepresentation of ADHD in Scientific Journals? ( 2011-06-02)
  13. “Lies, Damned Lies, and Medical Science” ( :2010/11/)
  14. Ioannidis, J. (2005). Why Most Published Research Findings Are False. PLoS Medicine, 2 (8) DOI: 10.1371/journal.pmed.0020124
  15. Retractions: What are they good for? ( : 2011/06/04/)
  16. Will Nano-Publications & Triplets Replace The Classic Journal Articles? ( 2011-06-02)



Kaleidoscope #3: 2011 Wk 12

23 03 2011

It has been a while since I posted a Kaleidoscope edition, with a “kaleidoscope” of facts, findings, views, and news gathered over the past one to two weeks. There have been only two editions so far: Kaleidoscope 1 (2009, wk 47) and 2 (2010, wk 31).

Here is some recommended reading from the past two weeks.

Benlysta (belimumab) approved by FDA for treatment of lupus.

Belimumab is the first new lupus drug approved in 56 years! Thus, potentially good news for patients suffering from the serious autoimmune disease SLE (systemic lupus erythematosus). Belimumab has to be administered once monthly via the intravenous route. It is a fully human monoclonal antibody specifically designed to inhibit B-lymphocyte stimulator (BLyS™), thereby reducing the number of circulating B cells and the anti-dsDNA antibodies they produce (which are characteristic of lupus).
Two clinical trials showed that more patients experienced reduced disease activity when treated with belimumab than with placebo. The data suggested that some patients had less severe flares and that some could reduce their steroid doses (not an impressive difference by “eyeballing”). Patients were selected for signs of B-cell hyperactivity and fairly stable, but active, disease. Belimumab was ineffective in Black patients, who are hit hardest by the disease. The most serious side effects were infections: three deaths in the belimumab groups were due to infections.
Thus, overall, the efficacy seems limited: belimumab benefits only 35% of patients, and only those without the worst form of the disease. But for these patients it is a step forward.

  1. Press Announcement (
  2. Navarra SV, Guzmán RM, Gallacher AE, Hall S, Levy RA, Jimenez RE, Li EK,Thomas M, Kim HY, León MG, Tanasescu C, Nasonov E, Lan JL, Pineda L, Zhong ZJ, Freimuth W, Petri MA; BLISS-52 Study Group. Efficacy and safety of belimumab in patients with active systemic lupus erythematosus: a randomised, placebo-controlled, phase 3 trial. Lancet. 2011 Feb 26;377(9767):721-31. Epub 2011 Feb 4. PubMed PMID: 21296403.
  3. Belimumab: Anti-BLyS Monoclonal Antibody; Benlysta(TM); BmAb; LymphoStat-B. Drugs in R & D (Open Access): 28 May 2010 – Volume 10 – Issue 1 – pp 55-65 doi: 10.2165/11538300-000000000-00000 Adis R&D Profiles (

Sleep-deprived subjects make risky gambling decisions.

Recent research has shown that a single night of sleep deprivation alters decision making independently of a shift in attention: most volunteers moved from seeking to minimize the effect of the worst loss to seeking increased reward. This shift towards risky decision making was correlated with increased activity in brain regions that assess positive outcomes (ventromedial prefrontal cortex) and simultaneously decreased activation in areas that process negative outcomes (anterior insula), as assessed by functional MRI.

One co-author (Chee) noted that “casinos often take steps to encourage risk-seeking behavior — providing free alcohol, flashy lights and sounds, and converting money into abstractions like chips or electronic credits”.

Interestingly, Chee also linked their findings to empirical evidence that long work hours for medical residents increased the number of accidents. Is a similar mechanism involved?

  1. Venkatraman V, Huettel SA, Chuah LY, Payne JW, Chee MW. Sleep deprivation biases the neural mechanisms underlying economic preferences.  J Neurosci. 2011 Mar 9;31(10):3712-8 (free full text)
  2. Sleep deprived people make risky decisions based on too much optimism (Duke Healthpress release)

Grand Rounds

Grand Rounds is up at Better Health. Volume 7, Number 26 is an “Emotional Edition” where posts are organized into emotion categories. My post about the hysteria and misinformation surrounding the recent Japanese earthquake is placed under Outrage.

There are many terrific posts included. A few I want to mention briefly.

First, a post by a woman who diagnosed her own and her sons’ disease after numerous tests. Her sons’ pediatrician, so it seems, only tried to reassure her (“don’t worry…”).

I was also moved by the South African surgeon, Bongi, who tells the tragic story of a missed diagnosis that still haunts him. “For every surgeon has a graveyard hidden away somewhere in the dark recesses of his mind…”

Bongi’s blog Other Things Amanzi is one of my favorites. Another blog that has become a favorite is 33 Charts by Dr. Bryan Vartabedian. Included in this Grand Rounds is “And a child will lead them“, a beautiful post about the loss of a young patient:

….”And facing Cooper’s parents for the first time after his passing was strangely difficult for me.  When he was alive I always had a plan.  Every sign, symptom, and problem had a systematic approach.  But when faced with the most inconceivable process, I found myself awkwardly at odds with how to handle the dialog”….

Other Medical Blogs

Another of my recent favorite blogs is that of the cardiologist Dr. Wes. I would especially like to recommend two recent posts.

The first asks a seemingly simple question: “So which set of guidelines should doctors use?” The answer, however, may surprise you.

In another post Dr. Wes describes the retraction of an online-before-print case report entitled “Spontaneous explosion of implantable cardioverter-defibrillator”, with dramatic pictures of an “exploded” ICD (here is the PDF of the cache). This retraction took place after Dr. Wes reported the case on his blog. Strangely enough, the article was republished this February under another title, “Case report of out-of-hospital heat dissipation of an implantable cardioverter-defibrillator” (no explosion anymore), and without the shocking photos. Food for thought… Dr. Wes’s main conclusion? Independent scientific peer-reviewed journals might not be so independent after all.

Library Matters

Sorry, but I had to laugh about David Rothman’s Common Sense Librarianship: An Ordered List Manifesto. As Kathryn Greenhill put it so well at her blog Librarians Matter: “It is a hot knife of reason through the butterpat of weighty bullshit that often presses down as soon as we open our mouths to talk about our profession.”

Oh, and a big congrats to Aaron Tay on his Library Journal Movers & Shakers award. Please read why he deserves this award. What impresses me most is the way he involves users and converts unhappy users “into strong supporters of the library”. I would recommend that all librarians follow him on Twitter (@aarontay) and regularly read his blog, Musings about Librarianship.

Web 2.0

The iPad 2 is here. A very positive review can be found at Techcrunch. The iPad 2 has a camera, is thinner and lighter, and has a much more powerful dual-core chip. Still, many people on Twitter complain about the reflective screen. Furthermore, the cover is stylish but not very protective, as this blogger noticed two days after purchase.
Want to read further? You might like “iPad 2: Thoughts from a first time tablet user” (via @drVes).

It has been five years since Twitter was launched, when one of its founders, Jack Dorsey, tweeted “just setting up my twttr”. Twitter now has nearly 200 million users, who post more than a billion tweets every week (see the Twitter Blog).

Just the other week Twitter told developers to stop building apps. It is not exactly clear what this will mean. According to The Next Web, it is meant to prevent consumer confusion over third-party Twitter clients, and there are privacy issues. According to i-programmer, the decision is mainly driven by Twitter’s desire to control its API and the data its users create (so as to maximize its future revenue). I hope it won’t affect Twitter clients like TweetDeck and Seesmic, which (in my view) perform much better than Twitter’s own client.

#Cochrane Colloquium 2009: Better Working Relationship between Cochrane and Guideline Developers

19 10 2009

Last week I attended the annual Cochrane Colloquium in Singapore. I will summarize some of the meetings.

Here is a summary of an interesting (parallel) special session: Creating a closer working relationship between Cochrane and Guideline Developers. This session was brought together as a partnership between the Guidelines International Network (G-I-N) and The Cochrane Collaboration to look at the current experience of guideline developers and their use of Cochrane reviews (see abstract).

Emma Tavender, of the EPOC Australian Satellite, Australia, reported on the survey carried out by the UK Cochrane Centre to identify the use of Cochrane reviews in guidelines produced in the UK (I did not attend this presentation).

Pwee Keng Ho, Ministry of Health, Singapore, leads the Health Technology Assessment (HTA) and guideline development program of the Singapore Ministry of Health. He spoke about the issues faced by a guideline developer using Cochrane reviews, or, in his own words, his task was “to summarize whether guideline developers like Cochrane systematic reviews or not”.

Keng Ho presented the results of three surveys of different guideline developers. Most surveys had very few respondents: 12 to 29, if I remember correctly.

Each survey had approximately the same questions, but in a different order. On the face of it, the 3 surveys gave the same picture.

Main points:

  • some guideline developers are not familiar with Cochrane systematic reviews;
  • others have no access to them;
  • of those who are familiar with Cochrane reviews and do have access, most found the reviews useful and reliable (in one survey, half of the respondents were neutral);
  • most importantly, they actually did use Cochrane reviews for most of their guidelines;
  • these guideline developers also used Cochrane methodology to make their guidelines (whereas most physicians are not inclined to use the exhaustive search strategies and systematic approach of the Cochrane Collaboration);
  • an often-heard critique from guideline developers concerned the non-comprehensive coverage of topics by Cochrane reviews. Unlike developers in Western countries, however, the Singapore Ministry of Health mentioned acupuncture and herbs as missing topics (for certain diseases).

This incomplete coverage, caused by a choice of subjects that is not demand-driven, was a recurrent topic at this meeting and a main issue recognized by the entire Cochrane community. Priority setting for Cochrane systematic reviews is therefore one of the main topics addressed at this Colloquium and in the Cochrane strategic review.

Kay Dickersin of the US Cochrane Center, USA, reported on the issues raised at the stakeholders meeting held in June 2009 in the US (see here for agenda) on whether systematic reviews can effectively inform guideline development, with a particular focus on areas of controversy and debate.

The stakeholder summit concentrated on using quality SRs for guidelines. This is different from effectiveness research, for which the Institute of Medicine (IOM) sets the standards: local and specialist guidelines require a different expertise and approach.

All kinds of people are involved in the development of guidelines, e.g. nurses, consumers, and physicians.
Important issues to address, point by point:

  • Some may not understand the need to be systematic
  • How to get physicians on board: they are not very comfortable with extensive searching and systematic work
  • Ongoing education, like how-to workshops, is essential
  • What to do if there is no evidence?
  • More transparency is needed, including in the handling of conflicts of interest
  • Guidelines differ, including in how they rate the evidence. Almost everyone at the stakeholders meeting used GRADE to grade the evidence, but not as originally described: there were numerous variations on the same theme. One question is whether there should be a single system.
  • Another recurrent issue was that guidelines should be made actionable

Here are the podcasts covering the meeting.

Gordon Guyatt, McMaster University, Canada, gave an outline of the GRADE approach and the purpose of ‘Summary of Findings’ tables, and described how both are perceived by Cochrane review authors and guideline developers.

Gordon Guyatt, whose magnificent book “Users’ Guides to the Medical Literature” (JAMA-Evidence) lies on my desk, was clearly in favor of adherence to the original GRADE guidelines, which forty organizations have adopted.

GRADE stands for the “Grading of Recommendations Assessment, Development and Evaluation” system. It is used for grading evidence when developing clinical guidelines. Six articles in the BMJ are specifically devoted to GRADE (see here for one (full text) and 2 (PubMed)). GRADE takes into account not only the rigor of the methods, but also the balance between the benefits and the risks, burdens, and costs.

Suppose a guideline recommends thrombolysis to treat disease X because a small, good-quality RCT shows thrombolysis to be slightly but significantly more effective than heparin in this disease. By relying only on direct evidence from the RCT, it is overlooked that observational studies have long shown that thrombolysis increases the risk of massive bleeding in diseases Y and Z. Clearly the risk of harm is the same in disease X: both benefits and harms should be weighed.
Guyatt gave several other examples illustrating the importance of grading the evidence, and showed the understandable overview presented in the Summary of Findings table.

Another issue is that guideline makers are distressingly ready to embrace surrogate endpoints instead of outcomes that matter more to the patient. For instance, it means little if angiographic outcomes improve while mortality and the recurrence of cardiovascular disease do not.
GRADE takes into account whether indirect evidence is used and downgrades the evidence rating accordingly. Downgrading also occurs for low-quality RCTs, or when the benefits do not clearly outweigh the harms.
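To make this downgrading logic concrete, here is a toy sketch (my own illustration, not the official GRADE algorithm, which is more nuanced): evidence starts at a level determined by study design and drops one level per serious concern, such as indirectness.

```python
LEVELS = ["very low", "low", "moderate", "high"]
# The five downgrading domains GRADE names (simplified here to one level each)
DOWNGRADE_DOMAINS = {"risk of bias", "inconsistency", "indirectness",
                     "imprecision", "publication bias"}

def rate_evidence(design, concerns):
    """Toy GRADE-style rating: start from the study design,
    downgrade one level per serious concern."""
    level = 3 if design == "rct" else 1   # RCTs start 'high', observational 'low'
    for concern in concerns:
        if concern in DOWNGRADE_DOMAINS:
            level -= 1
    return LEVELS[max(level, 0)]          # never drop below 'very low'
```

So in this sketch an RCT with serious indirectness, like the thrombolysis example above, ends up rated “moderate” rather than “high”.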

Guyatt pleaded for uniform use of GRADE, and advised everybody to get comfortable with it.

Although I must say that it can feel somewhat uncomfortable to attach absolute ratings to differences that are not absolute: these are man-made formulas that people have agreed upon. On the other hand, it is a good thing that it is not only the benefit outcomes of the RCTs (sometimes on surrogate markers) that count.

A final remark from Guyatt: “Everybody claims to follow an evidence-based approach, but you have to teach them what that really means.”
Indeed, many people describe their findings and/or recommendations as evidence based, because “EBM sells well”, but on closer examination many reports are hardly worth the name.


CC (2) Duodecim: Connecting patients (and doctors) to the best-evidence

5 10 2008

This is the second post in the series Cochrane Colloquium (CC) 2008.

In the previous post, I mentioned a very interesting opening session.

Here I will summarize one of the presentations in that opening session, i.e. the presentation by Pekka Mustonen, called:

Connecting patients to the best-evidence through technology: An effective solution or “the great seduction”?

Pekka essentially showed us what the Finnish have achieved with their Duodecim database.

Duodecim started as a health portal for professionals only. It is a database (a decision support system) made by doctors for doctors. It contains Evidence-Based Medicine (EBM) Guidelines with:

  • regularly updated recommendations
  • links to evidence, including guidelines and Cochrane Systematic Reviews
  • commentaries

Busy clinicians don’t have the time to perform an extensive search for the best available evidence every time they have a clinical question. Ideally, they would carry out just one search, taking no more than a minute, to find the right information.

This demand seems to be reasonably met by Duodecim.

Notably, Duodecim is not only very popular as a source among clinicians and nurses; its guidelines are also actually read and followed by them. Those familiar with healthcare know that this is the main obstacle: getting doctors and nurses to actually use guidelines.

According to Pekka, patients are even more important than doctors for implementing guidelines: half of all patients don’t seem to follow their doctor’s advice. If the advice is to stay on inhaled steroids for the long-term management of asthma, for instance, many patients won’t follow it. “When you reach patients, small changes can have large benefits,” he said.

However, although many patients rely on the internet to find health information, formal health information sites face fierce competition there, and it is difficult for consumers to separate the wheat from the chaff.

Still, Duodecim has managed to make a website for the general public that is now as popular as the original physicians’ database is among doctors, the only difference being that doctors use the database continuously, whereas the general public consults it only when confronted with a health problem.
The database contains 1000 key EBM articles, and its content is integrated with personal health records. The site intentionally looks rather plain, neither glitzy nor flashy, in order to come across as a serious and trustworthy professional healthcare site.

A survey revealed that Duodecim performs much better than Google at answering healthcare questions, and that it leads more people either to decide NOT to consult a physician (because they are reassured) or to decide to consult one (because the symptoms might be more serious than thought). Thus it can make a difference!

The results are communicated differently to patients than to doctors. For instance, whether it is useful to wear stockings on long-haul flights to prevent deep venous thrombosis (DVT) in patients at either low or high risk of thrombosis is explained to the physician in terms of RR, ARR, RRR, and NNT.
Patients see a table with red columns (high-risk patients) and green columns (low-risk patients). The conclusions are translated as follows:

If 1000 patients at low risk for DVT wear stockings on long-haul flights:

  • 9 will avoid DVT
  • 1 will still get it (1 out of 1000)
  • 990 wear stockings in vain

If 1000 patients at high risk for DVT wear stockings on long-haul flights:

  • 27 will avoid DVT
  • 3 will still get it (1 out of 333)
  • 970 wear stockings in vain
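The arithmetic behind these “out of 1000” counts is easy to reproduce. In the sketch below, the baseline risks (1% for low-risk and 3% for high-risk travellers) and the relative risk of 0.1 with stockings are my own assumptions, back-calculated from the counts above; they are not figures taken from Duodecim.

```python
def per_thousand(baseline_risk, relative_risk, n=1000):
    """Translate a relative risk into natural-frequency counts,
    the way Duodecim presents results to patients."""
    events_without = round(baseline_risk * n)                  # DVT cases without stockings
    events_with = round(baseline_risk * relative_risk * n)     # DVT cases despite stockings
    return {
        "avoid DVT": events_without - events_with,             # cases prevented
        "still get DVT": events_with,
        "wear stockings in vain": n - events_without,          # never at risk anyway
    }

low = per_thousand(0.01, 0.1)    # {'avoid DVT': 9, 'still get DVT': 1, 'wear stockings in vain': 990}
high = per_thousand(0.03, 0.1)   # {'avoid DVT': 27, 'still get DVT': 3, 'wear stockings in vain': 970}
```

Presenting the same relative risk reduction against two different baseline risks is exactly what makes the red and green columns so much easier to grasp than an NNT.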

This database will be integrated with permanent health records and virtual health checks. It is also linked to a TV program that aims to change people’s way of living. Online you can take a life-expectancy test to see what age you would reach if you continued your current lifestyle (compare the Dutch “je echte leeftijd”, “your real age”).

“What young people don’t realize,” Pekka said, is that most older people find that the best part of life starts at the age of 60 (?!). So life doesn’t end at 30, as most youngsters think. But young people will only notice this when they reach old age in good health, and for that they must change their habits while still young.

The Finnish database is free for Finnish people.

Quite coincidentally (while asking for a free USB stick at the Wiley stand 😉), I found out that Wiley’s database EBM Guidelines links to the Duodecim platform (see below). It would be quite interesting to take a trial, I think.

(Although this is presumably only the professional part of Duodecim, not the patient-oriented database.)