Between the Lines. Finding the Truth in Medical Literature [Book Review]

19 07 2013

In the 1970s a study was conducted among 60 physicians and physicians-in-training. They had to solve a simple problem:

“If a test to detect a disease whose prevalence is 1/1000 has a false positive rate of 5 %, what is the chance that a person found to have a positive result actually has the disease, assuming that you know nothing about the person’s symptoms or signs?” 

Half of the “medical experts” thought the answer was 95%.
Only a small proportion, 18%, of the doctors arrived at the right answer of 2%.
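
The 2% follows directly from Bayes' rule. Below is a minimal sketch in Python, assuming (as the question implicitly does) that the test detects every true case, i.e. a sensitivity of 100%. The function name is only illustrative.

```python
def positive_predictive_value(prevalence, false_positive_rate, sensitivity=1.0):
    """Chance that a person with a positive test actually has the disease."""
    # Fraction of the whole population that is diseased AND tests positive
    true_positives = prevalence * sensitivity
    # Fraction of the whole population that is healthy BUT tests positive
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Prevalence 1/1000, false positive rate 5%, sensitivity assumed to be 100%
ppv = positive_predictive_value(prevalence=1 / 1000, false_positive_rate=0.05)
print(f"{ppv:.1%}")  # prints 2.0%
```

In words: out of 1000 people tested, about 1 is a true positive and about 50 are false positives, so only roughly 1 in 51 positive results is real.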

If you are a medical expert who comes to the same faulty conclusion, or who needs a refresher on how to arrive at the right answer, you might benefit from the book written by Marya Zilberberg: “Between the Lines. Finding the Truth in Medical Literature”.

The same is true for a patient whose doctor thinks he/she is among the 95% to benefit from such a test…
Or for journalists who translate medical news to the public…
Or for peer reviewers or editors who have to assess biomedical papers…

In other words, this book is useful for everyone who wants to be able to read “between the lines”. For everyone who needs to examine medical literature critically from time to time and doesn’t want to rely solely on the interpretation of others.

I hope that I didn’t scare you off with the above-mentioned example. Between the Lines surely is NOT a complicated epidemiology textbook, nor a dull study book where you have to struggle through a lot of definitions, difficult tables and statistical formulas and where each chapter is followed by a set of review questions that test what you learned.

This example is presented halfway through the book, at the end of Part I. By then you have enough tools to solve the question yourself. But even if you don’t feel like doing the exact calculation at that moment, you have a solid basis to understand the bottom line: the (enormous) gap of 93 percentage points (the assumed 95% versus the actual 2% of people with a positive test who truly have the disease) serves as the pool for overdiagnosis and overtreatment.

In the previous chapters of Part I (“Context”), you have learned about the scientific methods in clinical research, uncertainty as the only certain feature of science, the importance of denominators, outcomes that matter and outcomes that don’t, Bayesian probability, evidence hierarchies, heterogeneous treatment effects (does the evidence apply to this particular patient?) and all kinds of biases.

Most reviewers prefer Part I of the book. Personally I find Part II (“Evaluation”) just as interesting.

Part II deals with the study question, and study design, pros and cons of observational and interventional studies, validity, hypothesis testing and statistics.

Perhaps Part II is somewhat less narrative. Furthermore, it deals with tougher topics like statistics. But I find it very valuable for being able to critically appraise a study. I have never seen a better description of “odds”: somehow odds are easier to grasp if you think of “treatment A” and “treatment B” as “horse A” and “horse B”, and of “death” as “loss of a race”.
I knew the basic differences between cohort studies, case-control studies and so on, but I never quite realized before that the odds ratio is the only measure of association available in a case-control study and that case-control studies cannot estimate incidence or prevalence (as shown in a nice overview in table 4).
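
To make the odds ratio concrete, here is a small sketch in Python. The 2x2 counts are invented purely for illustration and do not come from the book.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_in_cases = exposed_cases / unexposed_cases
    odds_in_controls = exposed_controls / unexposed_controls
    return odds_in_cases / odds_in_controls


# Hypothetical case-control study: 100 cases (40 exposed) and 100 controls (20 exposed)
example_or = odds_ratio(exposed_cases=40, unexposed_cases=60,
                        exposed_controls=20, unexposed_controls=80)
print(f"{example_or:.2f}")  # prints 2.67
```

Note that the researcher chose to enroll 100 cases and 100 controls. The proportion of diseased people in the sample (50%) is therefore set by the design, which is why incidence and prevalence cannot be read off a case-control study, while the odds ratio can.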

Unlike many other books about “the art of reading medical articles”, “study designs” or “Evidence Based Medicine”, Marya’s book is easy to read. It is written in a conversational tone and statements are illustrated by means of current, appealing examples, like the overestimation of the risk of death from the H1N1 virus, breast cancer screening and hormone replacement therapy.

Although I had printed this book in the wrong order (page 136 next to 13 etc.), I was able to read (and understand) 1/3 of the book (the more difficult Part II) during a 2-hour car trip….

Because this book is comprehensive, yet accessible, I recommend it highly to everyone, including fellow librarians.

Marya even mentions medical librarians as a separate target audience:

Medical librarians may find this book particularly helpful: Being at the forefront of evidence dissemination, they can lead the charge of separating credible science from rubbish.

(thanks Marya!)

In addition, this book may be indirectly useful to librarians as it may help to choose appropriate methodological filters and search terms for certain EBM-questions. In case of etiology questions words like “cohort”, “case-control”, “odds”, “risk” and “regression” might help to find the “right” studies.

By the way, Marya Zilberberg is @murzee on Twitter and she writes at her blog Healthcare etc.

p.s. 1 I want to apologize to Marya for writing this review more than a year after the book was published. For personal reasons I found little time to read and blog. Luckily the book lost none of its topicality.

p.s. 2 patients who are not very familiar with critical reading of medical papers might benefit from reading “your medical mind” first [1]. 






No, Google Scholar Shouldn’t be Used Alone for Systematic Review Searching

9 07 2013

Several papers have addressed the usefulness of Google Scholar as a source for systematic review searching. Unfortunately the quality of those papers is often well below the mark.

Back in 2010 I (in the words of Isla Kuhn [2]) “robustly rebutted” the paper by Anders and Evans [3] at this blog, in the post “PubMed versus Google Scholar for Retrieving Evidence” [1].

But earlier this year another controversial paper was published [4]:

“Is the coverage of Google Scholar enough to be used alone for systematic reviews?”

It is one of the highly accessed papers of BMC Medical Informatics and Decision Making and has been welcomed in (for instance) the Twittosphere.

Researchers seem  to blindly accept the conclusions of the paper:

https://twitter.com/jeffvallance/status/340562086524510208

But don’t rush and assume you can now forget about PubMed, MEDLINE, Cochrane and EMBASE for your systematic review search and just do a simple Google Scholar (GS) search instead.

You might throw the baby out with the bath water….

… As has been immediately recognized by many librarians, either at their blogs (see blogs of Dean Giustini [5], Patricia Anderson [6] and Isla Kuhn [2]) or as direct comments to the paper (by Tuulevi Ovaska, Michelle Fiander and Alison Weightman [7]).

In their paper, Jean-François Gehanno et al examined whether GS was able to retrieve all the 738 original studies included in 29 Cochrane and JAMA systematic reviews.

And YES! GS had a coverage of 100%!

WOW!

All those fools at the Cochrane who do exhaustive searches in multiple databases using controlled vocabulary and a lot of synonyms when a simple search in GS could have sufficed…

But it is a logical fallacy to conclude from their findings that GS alone will suffice for SR-searching.

Firstly, as Tuulevi [7] rightly points out :

“Of course GS will find what you already know exists”

Or in the words of one of the official reviewers [8]:

What the authors show is only that if one knows what studies should be identified, then one can go to GS, search for them one by one, and find out that they are indexed. But, if a researcher already knows the studies that should be included in a systematic review, why bother to also check whether those studies are indexed in GS?

Right!

Secondly, it is also the precision that counts.

As Dean explains at his blog, a 100% recall with a precision of 0.1% (and it can be worse!) means that in order to find 36 relevant papers you have to go through ~36,700 items.
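
The arithmetic behind this is simple. A rough sketch in Python (the function name is just for illustration): precision is the share of retrieved items that are relevant, so the number of items to screen is the number of relevant papers divided by the precision.

```python
def items_to_screen(relevant_papers, precision):
    """Number of retrieved items you must sift through to find the relevant ones."""
    return relevant_papers / precision


# With a precision of exactly 0.1%, 36 relevant papers sit among about 36,000 hits
print(round(items_to_screen(36, 0.001)))  # prints 36000

# Conversely, 36 relevant papers among 36,700 hits is a precision of about 0.1%
precision = 36 / 36_700
print(f"{precision:.2%}")  # prints 0.10%
```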

Dean:

Are the authors suggesting that researchers consider a precision level of 0.1% acceptable for the SR? Who has time to sift through that amount of information?

It is like searching for needles in a haystack. Correction: it is like searching for particular hay stalks in a haystack. It is very difficult to find them if they are hidden among other hay stalks. Suppose the hay stalks were all labeled (title) and I had a powerful hay stalk magnet (“title search”): it would be a piece of cake to retrieve them. This is what we call “known item search”. But would you even consider going through the haystack and checking the stalks one by one? Because that is what we have to do if we use Google Scholar as a one-stop search tool for systematic reviews.

Another main point of criticism is that the authors have a grave and worrisome lack of understanding of the systematic review methodology [6] and don’t grasp the importance of the search interface and knowledge of indexing, which are both integral to searching for systematic reviews [7].

One wonders why the paper even passed the peer review, as one of the two reviewers (Miguel Garcia-Perez [8]) already smashed the paper to pieces.

The authors’ method is inadequate and their conclusion is not logically connected to their results. No revision (major, minor, or discretionary) will save this work. (…)

Miguel’s well-founded criticism was not well addressed by the authors [9]. Apparently the editors didn’t see through this and relied on the second peer reviewer [10], who merely said it was a “great job” etcetera, but that recall should not be written with a capital R.
(and that was about the only revision the authors made)

Perhaps it needs another paper to convince Gehanno et al and the uncritical readers of their manuscript.

Such a paper might just have been published [11]. It is written by Dean Giustini and Maged Kamel Boulos and is entitled:

Google Scholar is not enough to be used alone for systematic reviews

It is a simple and straightforward paper, but it makes its points clearly.

Giustini and Kamel Boulos looked for a recent SR in their own area of expertise (Chou et al [12]) that included a number of references comparable to that of Gehanno et al. Next they tested GS’ ability to locate these references.

Although most papers cited by Chou et al. (n=476/506; ~95%) were ultimately found in GS, numerous iterative searches were required to find the references and each citation had to be managed one at a time. Thus GS was not able to locate all references found by Chou et al. and the whole exercise was rather cumbersome.

As expected, trying to find the papers by a “real-life” GS search was almost impossible. Due to its rudimentary structure, GS did not understand the expert search strings and was unable to translate them. Thus using Chou et al.’s original search strategy and keywords yielded an unmanageable result of more than 750,000 items.

Giustini and Kamel Boulos note that GS’ ability to search the full text of papers, combined with its PageRank algorithm, can be useful.

On the other hand GS’ changing content, unknown updating practices and poor reliability make it an inappropriate sole choice for systematic reviewers:

As searchers, we were often uncertain that results found one day in GS had not changed a day later and trying to replicate searches with date delimiters in GS did not help. Papers found today in GS did not mean they would be there tomorrow.

But most importantly, not all known items could be found and the search process and selection are too cumbersome.

So shall we now conclude, once and for all, that GS is NOT sufficient to be used alone for SR searching?

We don’t need another bad paper addressing this.

But I would really welcome a well performed paper looking at the additional value of GS in SR-searching. For I am sure that GS may be valuable for some questions and some topics in some respects. We have to find out which.

References

  1. PubMed versus Google Scholar for Retrieving Evidence 2010/06 (laikaspoetnik.wordpress.com)
  2. Google scholar for systematic reviews…. hmmmm  2013/01 (ilk21.wordpress.com)
  3. Anders M.E. & Evans D.P. (2010). Comparison of PubMed and Google Scholar literature searches, Respiratory Care, 55 (5) 578-83
  4. Gehanno J.F., Rollin L. & Darmoni S. (2013). Is the coverage of Google Scholar enough to be used alone for systematic reviews, BMC Medical Informatics and Decision Making, 13:7 (open access)
  5. Is Google scholar enough for SR searching? No. 2013/01 (blogs.ubc.ca/dean)
  6. What’s Wrong With Google Scholar for “Systematic” Review 2013/01 (etechlib.wordpress.com)
  7. Comments at Gehanno’s paper (www.biomedcentral.com)
  8. Official Reviewer’s report of Gehanno’s paper [1]: Miguel Garcia-Perez, 2012/09
  9. Authors response to comments  (www.biomedcentral.com)
  10. Official Reviewer’s report of Gehanno’s paper [2]: Henrik von Wehrden, 2012/10
  11. Giustini D. & Kamel Boulos M.N. (2013). Google Scholar is not enough to be used alone for systematic reviews, Online Journal of Public Health Informatics, 5 (2)
  12. Chou W.Y.S., Prestin A., Lyons C. & Wen K.Y. (2013). Web 2.0 for Health Promotion: Reviewing the Current Evidence, American Journal of Public Health, 103 (1) e9-e18





Friday Foolery #51 Statistically Funny

1 06 2012

Epidemiologists, people working in the EBM field and, above all, statisticians are said to have no sense of humor.*

Hilda Bastian is a clear exception to this rule.

I met Hilda a few years ago at a Cochrane colloquium. At that time she was working as a consumer advocate in Australia. Nowadays she is editor and curator of PubMed Health. According to her Twitter bio (she tweets as @hildabast) she is (still) “Interested in effective communication as well as effective health care”. She also writes important articles, like “Seventy-Five Trials and Eleven Systematic Reviews a Day: How Will We Ever Keep Up?” (PLOS 2010), reviewed at this blog.

Today I learned she also has a great creative talent in cartoon drawing, in the field of …  yeah… EBM, epidemiology & statistics.

Below is one of her cartoons, which fits in well with a recent post in the BMJ by Ray Moynihan, retweeted by Hilda: Preventing overdiagnosis: how to stop harming the healthy. In her post she refers to another article: Overdiagnosis in cancer (JNCI 2010), saying:

“Finding and aggressively treating non-symptomatic disease that would never have made people sick, inventing new conditions and re-defining the thresholds for old ones: will there be anyone healthy left at all?”

I invite you to go and visit Hilda’s blog Statistically funny (Commenting on the science of unbiased health research with cartoons) and to enjoy her cartoons, which are often inspired by recent publications in the field.

* My post #NotSoFunny #16: ridiculing RCTs and EBM even led David Rind to sigh that “EBM folks are not necessarily known for their great senses of humor” (so I’m no exception to the rule 😉).





The Scatter of Medical Research and What to do About it.

18 05 2012

Paul Glasziou, GP and professor in Evidence Based Medicine, co-authored a new article in the BMJ [1]. Similar to another paper [2] I discussed before [3], this paper deals with the difficulty for clinicians of staying up-to-date with the literature. But where the previous paper [2,3] highlighted the mere increase in number of research articles over time, the current paper looks at the scatter of randomized clinical trials (RCTs) and systematic reviews (SRs) across different journals cited in one year (2009) in PubMed.

Hoffmann et al analyzed 7 specialties and 9 sub-specialties that are considered the leading contributors to the burden of disease in high income countries.

They followed a relatively straightforward method for identifying the publications. Each search string consisted of a MeSH term (controlled term) to identify the selected disease or disorders, a publication type [pt] to identify the type of study, and the year of publication. For example, the search strategy for randomized trials in cardiology was: “heart diseases”[MeSH] AND randomized controlled trial[pt] AND 2009[dp]. (When searching “heart diseases” as a MeSH term, narrower terms are also searched.) Meta-analysis[pt] was used to identify systematic reviews.
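
The pattern of these search strings is regular enough to generate them. A minimal sketch in Python, where the helper name `build_pubmed_query` is my own invention and not something from the paper; it only assembles the string and does not send anything to PubMed.

```python
def build_pubmed_query(mesh_term, publication_type, year):
    """Combine a MeSH term, a publication type and a publication year."""
    return f'"{mesh_term}"[MeSH] AND {publication_type}[pt] AND {year}[dp]'


# The cardiology example from the paper: randomized trials published in 2009
trials_query = build_pubmed_query("heart diseases", "randomized controlled trial", 2009)
print(trials_query)
# "heart diseases"[MeSH] AND randomized controlled trial[pt] AND 2009[dp]

# The same topic, but with the publication type used for systematic reviews
reviews_query = build_pubmed_query("heart diseases", "meta-analysis", 2009)
print(reviews_query)
# "heart diseases"[MeSH] AND meta-analysis[pt] AND 2009[dp]
```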

Using this approach Hoffmann et al found 14 343 RCTs and 3214 SRs published in 2009 in the field of the selected (sub)specialties. There was a clear scatter across journals, but this scatter varied considerably among specialties:

“Otolaryngology had the least scatter (363 trials across 167 journals) and neurology the most (2770 trials across 896 journals). In only three subspecialties (lung cancer, chronic obstructive pulmonary disease, hearing loss) were 10 or fewer journals needed to locate 50% of trials. The scatter was less for systematic reviews: hearing loss had the least scatter (10 reviews across nine journals) and cancer the most (670 reviews across 279 journals). For some specialties and subspecialties the papers were concentrated in specialty journals; whereas for others, few of the top 10 journals were a specialty journal for that area.
Generally, little overlap occurred between the top 10 journals publishing trials and those publishing systematic reviews. The number of journals required to find all trials or reviews was highly correlated (r=0.97) with the number of papers for each specialty/ subspecialty.”

Previous work already suggested that this scatter of research has a long tail. Half of the publications are in a minority of journals, whereas the remaining articles are scattered among many journals (see Fig below).

Figure: see legends at BMJ 2012;344:e3223 [CC]
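
The “long tail” can be made tangible by counting how many journals you need, starting with the most productive one, to cover a given share of the papers. The sketch below is in Python and uses made-up journal counts, not the data of Hoffmann et al.

```python
def journals_needed(papers_per_journal, coverage=0.5):
    """Smallest number of journals (largest first) that holds `coverage` of all papers."""
    counts = sorted(papers_per_journal, reverse=True)
    target = coverage * sum(counts)
    running_total = 0
    for number_of_journals, count in enumerate(counts, start=1):
        running_total += count
        if running_total >= target:
            return number_of_journals
    return len(counts)


# Hypothetical specialty: 4 core journals plus 90 journals with a single trial each
hypothetical_counts = [40, 25, 15, 10] + [1] * 90

print(journals_needed(hypothetical_counts, coverage=0.5))  # prints 4
print(journals_needed(hypothetical_counts, coverage=1.0))  # prints 94
```

In this invented example 4 journals hold half of the trials, but you need all 94 to find every one of them.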

The good news is that SRs are less scattered and that general journals appear more often in the top 10 journals publishing SRs. Indeed for 6 of the 7 specialties and 4 of the 9 subspecialties, the Cochrane Database of Systematic Reviews had published the highest number of systematic reviews, publishing between 6% and 18% of all the systematic reviews published in each area in 2009. The bad news is that even keeping up to date with SRs seems a huge, if not impossible, challenge.

In other words, it is not sufficient for clinicians to rely on personal subscriptions to a few journals in their specialty (which is common practice). Hoffmann et al suggest several solutions to help clinicians cope with the increasing volume and scatter of research publications.

  • a central library of systematic reviews (but apparently the Cochrane Library fails to fulfill such a role according to the authors, because many reviews are out of date and are perceived as less clinically relevant)
  • a registry of planned and completed systematic reviews, such as PROSPERO (this makes it easier to locate SRs and reduces bias)
  • syntheses of evidence and synopses, like the ACP Journal Club, which summarizes the best evidence in internal medicine
  • specialised databases that collate and critically appraise randomized trials and systematic reviews, like www.pedro.org.au for physical therapy. In my personal experience, however, this database is often out of date and not comprehensive
  • journal scanning services like EvidenceUpdates (from mcmaster.ca), which scans over 120 journals, filters articles on the basis of quality, has practising clinicians rate them for relevance and newsworthiness, and makes them available as email alerts and in a searchable database. I use this service too, but apart from the fact that not all specialties are covered, the rating of evidence may not always be objective (see previous post [4])
  • The use of social media tools to alert clinicians to important new research.

Most of these solutions are (long) existing solutions that do not or only partly help to solve the information overload.

I was surprised that the authors didn’t propose the use of personalized alerts. PubMed’s My NCBI feature allows you to create automatic email alerts on a topic and to subscribe to electronic tables of contents (which could include ACP Journal Club). Suppose that a physician browses 10 journals roughly covering 25% of the trials. He/she does not need to read all the other journals from cover to cover to avoid missing one potentially relevant trial. Instead it is far more efficient to perform a topic search to filter relevant studies from journals that seldom publish trials on the topic of interest. One could even use the search of Hoffmann et al to achieve this.* In reality, though, most clinical researchers will have narrower fields of interest than all studies about endocrinology and neurology.

At our library we are working on creating deduplicated, easy-to-read alerts that collate tables of contents of certain journals with topic (and author) searches in PubMed, EMBASE and other databases. There are existing tools that do the same.

Another way to reduce the individual work (reading) load is to organize journal clubs or, even better, to organize regular CATs (critically appraised topics). In the Netherlands, CATs are a compulsory item for residents. A few doctors do the work for many. Usually they choose topics that are clinically relevant (or for which the evidence is unclear).

The authors briefly mention that their search strategy might have missed some eligible papers and included some that are not truly RCTs or SRs, because they relied on PubMed’s publication type to retrieve RCTs and SRs. For systematic reviews this may be a greater problem than recognized, for the authors have used meta-analysis[pt] to identify systematic reviews. Unfortunately PubMed has no publication type for systematic reviews, but it may be clear that there are many more systematic reviews than meta-analyses. Possibly systematic reviews might even have a different scatter pattern than meta-analyses (i.e. the latter might be preferentially included in core journals).

Furthermore not all meta-analyses and systematic reviews are reviews of RCTs (thus it is not completely fair to compare MAs with RCTs only). On the other hand it is a (not discussed) omission of this study, that only interventions are considered. Nowadays physicians have many other questions than those related to therapy, like questions about prognosis, harm and diagnosis.

I did a little imperfect search just to see whether the use of search terms other than meta-analysis[pt] would have any influence on the outcome. I searched for (1) meta-analysis[pt] and (2) systematic review[tiab] (title and abstract) in papers about endocrine diseases. Then I subtracted 1 from 2 (to analyse the systematic reviews not indexed as meta-analysis[pt]).

Thus:

(ENDOCRINE DISEASES[MESH] AND SYSTEMATIC REVIEW[TIAB] AND 2009[DP]) NOT META-ANALYSIS[PT]

I analyzed the top 10/11 journals publishing these study types.

This little experiment suggests that:

  1. the precise scatter might differ per search: apparently the systematic review[tiab] search yielded different top 10/11 journals (for this sample) than the meta-analysis[pt] search. (partially because Cochrane systematic reviews apparently don’t mention systematic reviews in title and abstract?).
  2. the authors underestimate the numbers of Systematic Reviews: simply searching for systematic review[tiab] already found appr. 50% additional systematic reviews compared to meta-analysis[pt] alone
  3. As expected (by me at least), many of the SRs and MAs were NOT dealing with interventions, i.e. see the first 5 hits (out of 108 and 236 respectively).
  4. Together these findings indicate that the true information overload is far greater than shown by Hoffmann et al (not all systematic reviews are found, and of all available study designs only RCTs are searched).
  5. On the other hand this indirectly shows that SRs are a better way to keep up-to-date than suggested: SRs  also summarize non-interventional research (the ratio SRs of RCTs: individual RCTs is much lower than suggested)
  6. It also means that the role of the Cochrane Systematic reviews to aggregate RCTs is underestimated by the published graphs (the MA[pt] section is diluted with non-RCT- systematic reviews, thus the proportion of the Cochrane SRs in the interventional MAs becomes larger)

Well anyway, these imperfections do not contradict the main point of this paper: that trials are scattered across hundreds of general and specialty journals and that “systematic reviews” (or meta-analyses really) do reduce the extent of scatter, but are still widely scattered and mostly in different journals to those of randomized trials.

Indeed, personal subscriptions to journals seem insufficient for keeping up to date.
Besides supplementing subscriptions with methods such as journal scanning services, I would recommend the use of personalized alerts from PubMed and several prefiltered sources, including an EBM search machine like TRIP (www.tripdatabase.com/).

*but I would broaden it to find all aggregate evidence, including ACP, Clinical Evidence, syntheses and synopses, not only meta-analyses.

**I do appreciate that one of the co-authors is a medical librarian: Sarah Thorning.

References

  1. Hoffmann, Tammy, Erueti, Chrissy, Thorning, Sarah, & Glasziou, Paul (2012). The scatter of research: cross sectional comparison of randomised trials and systematic reviews across specialties BMJ, 344 : 10.1136/bmj.e3223
  2. Bastian, H., Glasziou, P., & Chalmers, I. (2010). Seventy-Five Trials and Eleven Systematic Reviews a Day: How Will We Ever Keep Up? PLoS Medicine, 7 (9) DOI: 10.1371/journal.pmed.1000326
  3. How will we ever keep up with 75 trials and 11 systematic reviews a day (laikaspoetnik.wordpress.com)
  4. Experience versus Evidence [1]. Opioid Therapy for Rheumatoid Arthritis Pain. (laikaspoetnik.wordpress.com)




Can Guidelines Harm Patients?

2 05 2012

Recently I saw an intriguing “personal view” in the BMJ written by Grant Hutchison entitled: “Can Guidelines Harm Patients Too?” Hutchison is a consultant anesthetist with, as he calls it, chronic guideline fatigue syndrome. Hutchison underwent an acute exacerbation of his “condition” with the arrival of another set of guidelines in his email inbox. Hutchison:

On reviewing the level of evidence provided for the various recommendations being offered, I was struck by the fact that no relevant clinical trials had been carried out in the population of interest. Eleven out of 25 of the recommendations made were supported only by the lowest levels of published evidence (case reports and case series, or inference from studies not directly applicable to the relevant population). A further seven out of 25 were derived only from the expert opinion of members of the guidelines committee, in the absence of any guidance to be gleaned from the published literature.

Hutchison’s personal experience is supported by evidence from two articles [2,3].

One paper, published in JAMA in 2009 [2], concludes that ACC/AHA (American College of Cardiology and American Heart Association) clinical practice guidelines are largely developed from lower levels of evidence or expert opinion and that the proportion of recommendations for which there is no conclusive evidence is growing. Only 314 of 2711 recommendations (median, 11%) are classified as level of evidence A, i.e. based on evidence from multiple randomized trials or meta-analyses. The largest group of recommendations (1246/2711; median, 48%) is level of evidence C, i.e. based on expert opinion, case studies, or standards of care. Strikingly, only 245 of 1305 class I recommendations are based on the highest level A evidence (median, 19%).

Another paper, published in Ann Intern Med in 2011 [3], reaches similar conclusions analyzing the Infectious Diseases Society of America (IDSA) Practice Guidelines. Of the 4218 individual recommendations found, only 14% were supported by the strongest (level I) quality of evidence; more than half were based on level III evidence only. As with the ACC/AHA guidelines, only a small part (23%) of the strongest IDSA recommendations were based on level I evidence (in this case ≥1 randomized controlled trial, see below). And, here too, the new recommendations were mostly based on level II and III evidence.

Although there is little to argue about Hutchison’s observations, I do not agree with his conclusions.

In his view guidelines are equivalent to a bullet pointed list or flow diagram, allowing busy practitioners to move on from practice based on mere anecdote and opinion. It therefore seems contradictory that half of the EBM guidelines are based on little more than anecdote (case series, extrapolation from other populations) and opinion. He then argues that guidelines, like other therapeutic interventions, should be considered in terms of the balance between benefit and risk, and that the risk associated with the dissemination of poorly founded guidelines must also be considered. One of those risks is that doctors will just tend to adhere to the guidelines, and may even change their own (adequate) practice in the absence of any scientific evidence against it. If a patient is harmed despite punctilious adherence to the guideline rules, “it is easy to be seduced into assuming that the bad outcome was therefore unavoidable”. But perhaps harm was done by following the guideline….

First of all, overall evidence shows that adherence to guidelines can improve patient outcome and provide more cost effective care (Naveed Mustfa in a comment refers to [4]).

Hutchison’s piece is opinion-based and rather driven by (understandable) gut feelings and implicit assumptions that also surround EBM in general.

  1. First there is the assumption that guidelines are a fixed set of rules, like a protocol, and that there is no room for preferences (both of the doctor and the patient), interpretations and experience. In the same way as EBM is often degraded to “cookbook medicine”, EBM guidelines are turned into mere bullet pointed lists made by a bunch of experts that just want to impose their opinions as truth.
  2. The second assumption (shared by many) is that evidence based medicine is synonymous with “randomized controlled trials”. In analogy, only those EBM guideline recommendations “count” that are based on RCT’s or meta-analyses.

Before I continue, I would strongly advise all readers (and certainly all EBM and guideline skeptics) to read this excellent and clearly written BMJ editorial by David Sackett et al. that deals with misconceptions, myths and prejudices surrounding EBM: Evidence based medicine: what it is and what it isn’t [5].

Sackett et al define EBM as “the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients” [5]. Sackett emphasizes that “Good doctors use both individual clinical expertise and the best available external evidence, and neither alone is enough. Without clinical expertise, practice risks becoming tyrannised by evidence, for even excellent external evidence may be inapplicable to or inappropriate for an individual patient. Without current best evidence, practice risks becoming rapidly out of date, to the detriment of patients.”

Guidelines are meant to give recommendations based on the best available evidence. Guidelines should not be a set of rules, set in stone. Ideally, guidelines have gathered evidence in a transparent way and make it easier for the clinicians to grasp the evidence for a certain procedure in a certain situation … and to see the gaps.

Contrary to what many people think, EBM is not restricted to randomized trials and meta-analyses. It involves tracking down the best external evidence there is. As I explained in #NotSoFunny #16 – Ridiculing RCTs & EBM, evidence is not an all-or-nothing thing: RCT’s (if well performed) are the most robust, but if not available we have to rely on “lower” evidence (from cohort to case-control to case series or expert opinion even).
On the other hand RCT’s are often not even suitable to answer questions in other domains than therapy (etiology/harm, prognosis, diagnosis): per definition the level of evidence for these kind of questions inevitably will be low*. Also, for some interventions RCT’s are not appropriate, feasible or too costly to perform (cesarean vs vaginal birth; experimental therapies, rare diseases, see also [3]).

It is also good to realize that guidance based on numerous randomized controlled trials is probably not applicable, or only to a limited extent, to groups of patients who are seldom included in an RCT: the cognitively impaired, the patient with multiple comorbidities [6], the old patient [6], children and (often) women.

Finally not all RCTs are created equal (various forms of bias; surrogate outcomes; small sample sizes, short follow-up), and thus should not all represent the same high level of evidence.*

Thus in my opinion, low levels of evidence are not by definition problematic, even if they are the basis for strong recommendations, as long as it is clear how the recommendations were reached and as long as these are well underpinned (by whatever evidence or motivation). One could see the exposed gaps in evidence as a positive thing, as they may highlight the need for clinical research in certain fields.

There is one BIG BUT: my assumption is that guidelines are “just” recommendations based on exhaustive and objective reviews of existing evidence. No more, no less. This means that the clinician must have the freedom to deviate from the recommendations, based on his or her own expertise and/or the situation and/or the patient’s preferences, all the more so when the evidence on which these strong recommendations are based is ‘scant’. Sackett already warned about the possible hijacking of EBM by purchasers and managers (and may I add health insurers and governmental agencies) to cut the costs of health care and to impose “rules”.

I therefore think it is odd that the ACC/AHA guidelines prescribe that Class I recommendations SHOULD be performed/administered even if they are based on level C evidence (see Figure).

I also find it odd that different guidelines have a different nomenclature. The ACC/AHA have Class I, IIa, IIb and III recommendations and level A, B, C evidence where level A evidence represents sufficient evidence from multiple randomized trials and meta-analyses, whereas the strength of recommendations in the IDSA guidelines includes levels A through C (OR D/E recommendations against use) and quality of evidence ranges from level I through III , where I indicates evidence from (just) 1 properly randomized controlled trial. As explained in [3] this system was introduced to evaluate the effectiveness of preventive health care interventions in Canada (for which RCTs are apt).

Finally, guidelines and guideline makers should probably be more open to input and feedback from the people who apply these guidelines.

————————————————

*the new GRADE (Grading of Recommendations Assessment, Development, and Evaluation) scoring system taking into account good quality observational studies as well may offer a potential solution.

Another possibly relevant post at this blog: The Best Study Design for … Dummies

Taken from a summary of an ACC/AHA guideline at http://guideline.gov/

References

  1. Hutchison, G. (2012). Guidelines can harm patients too BMJ, 344 (apr18 1) DOI: 10.1136/bmj.e2685
  2. Tricoci P, Allen JM, Kramer JM, Califf RM, & Smith SC Jr (2009). Scientific evidence underlying the ACC/AHA clinical practice guidelines. JAMA : the journal of the American Medical Association, 301 (8), 831-41 PMID: 19244190
  3. Lee, D., & Vielemeyer, O. (2011). Analysis of Overall Level of Evidence Behind Infectious Diseases Society of America Practice Guidelines Archives of Internal Medicine, 171 (1), 18-22 DOI: 10.1001/archinternmed.2010.482
  4. Menéndez R, Reyes S, Martínez R, de la Cuadra P, Manuel Vallés J, & Vallterra J (2007). Economic evaluation of adherence to treatment guidelines in nonintensive care pneumonia. The European respiratory journal : official journal of the European Society for Clinical Respiratory Physiology, 29 (4), 751-6 PMID: 17005580
  5. Sackett, D., Rosenberg, W., Gray, J., Haynes, R., & Richardson, W. (1996). Evidence based medicine: what it is and what it isn’t BMJ, 312 (7023), 71-72 DOI: 10.1136/bmj.312.7023.71
  6. Aylett, V. (2010). Do geriatricians need guidelines? BMJ, 341 (sep29 3) DOI: 10.1136/bmj.c5340




Experience versus Evidence [1]. Opioid Therapy for Rheumatoid Arthritis Pain.

5 12 2011

Rheumatoid arthritis (RA) is a chronic auto-immune disease, which causes inflammation of the joints that eventually leads to progressive joint destruction and deformity. Patients have swollen, stiff and painful joints. The main aim of treatment is to reduce swelling and inflammation, to alleviate pain and stiffness and to maintain normal joint function. While there is no cure, it is important to properly manage pain.

The mainstays of therapy in RA are disease-modifying anti-rheumatic drugs (DMARDs) and non-steroidal anti-inflammatory drugs (NSAIDs). These drugs primarily target inflammation. However, since inflammation is not the only factor that causes pain in RA, patients may not be (fully) responsive to treatment with these medications.
Opioids are another class of pain-relieving substances (analgesics). They are frequently used in RA, but their role in chronic non-cancer pain, including RA, is not firmly established.

A recent Cochrane Systematic Review [1] assessed the beneficial and harmful effects of opioids in RA.

Eleven studies (672 participants) were included in the review.

Four studies only assessed the efficacy of  single doses of different analgesics, often given on consecutive days. In each study opioids reduced pain (a bit) more than placebo. There were no differences in effectiveness between the opioids.

Seven studies between 1-6 weeks in duration assessed 6 different oral opioids either alone or combined with non-opioid analgesics.
The only strong opioid investigated was controlled-release morphine sulphate, in a single study with 20 participants.
Six studies compared an opioid (often combined with a non-opioid analgesic) to placebo. Opioids were slightly better than placebo in improving patient-reported global impression of clinical change (PGIC) (3 studies, 324 participants: relative risk (RR) 1.44, 95% CI 1.03 to 2.03), but did not lower the number of withdrawals due to inadequate analgesia in 4 studies.
Notably, none of the 11 studies reported the primary and probably more clinically relevant outcome “proportion of participants reporting ≥ 30% pain relief”.

On the other hand adverse events (most commonly nausea, vomiting, dizziness and constipation) were more frequent in patients receiving opioids compared to placebo (4 studies, 371 participants: odds ratio 3.90, 95% CI 2.31 to 6.56). Withdrawal due to adverse events was  non-significantly higher in the opioid-treated group.

Comparing opioids to other analgesics instead of placebos seems more relevant. Among the 11 studies, only 1 study compared an opioid (codeine with paracetamol) to an NSAID (diclofenac). This study found no difference in efficacy or safety between the two treatments.

The 11 included studies were very heterogeneous (i.e. different opioid studied, with or without concurrent use of non-opioid analgesics, different outcomes measured) and the risk of bias was generally high. Furthermore, most studies were published before 2000 (less optimal treatment of RA).

The authors therefore conclude:

In light of this, the quantitative findings of this review must be interpreted with great caution. At best, there is weak evidence in favour of the efficacy of opioids for the treatment of pain in patients with RA but, as no study was longer than six weeks in duration, no reliable conclusions can be drawn regarding the efficacy or safety of opioids in the longer term.

This was the evidence, now the opinion.

I found this Cochrane Review via an EvidenceUpdates email alert from the BMJ Group and McMaster PLUS.

EvidenceUpdate alerts are meant to “provide you with access to current best evidence from research, tailored to your own health care interests, to support evidence-based clinical decisions. (…) All citations are pre-rated for quality by research staff, then rated for clinical relevance and interest by at least 3 members of a worldwide panel of practicing physicians”

I usually don’t care about the rating, because it is mostly 5-6 on a scale of 7. This was also true for the current SR.

There is a more detailed rating available (when clicking the link, free registration required). Usually, the newsworthiness of SRs scores relatively low (because they summarize ‘old’ studies?). Personally I would think that the relevance and newsworthiness would be higher for the special interest group, pain.

But the comment of the first of the 3 clinical raters was most revealing:

He/she comments:

As a Palliative care physician and general internist, I have had excellent results using low potency opiates for RA and OA pain. The palliative care literature is significantly more supportive of this approach vs. the Cochrane review.

Thus personal experience wins over evidence?* How did this palliative care physician assess effectiveness? Just give a single dose of an opiate? How did he/she rate the effectiveness of the opioids? Did he/she compare it to placebo or an NSAID (did he/she compare it at all?), and did he/she measure adverse effects?

And what is “The palliative care literature” the commenter is referring to? Apparently not this Cochrane Review. Apparently not the 11 controlled trials included in the Cochrane review. Apparently not the several other Cochrane reviews on the use of opioids for chronic non-cancer pain, and not the guidelines, syntheses and synopses I found via the TRIP database. All conclude that using opioids to treat chronic non-cancer pain is supported by very limited evidence, that adverse effects are common and that long-term use may lead to opioid addiction.

I’m sorry to note that although the alerting service is great as an alert, such personal ratings are not very helpful for interpreting and *true* rating of the evidence.

I would much prefer a truly objective, structured critical appraisal like this one on a similar topic by DARE (“Opioids for chronic noncancer pain: a meta-analysis of effectiveness and side effects”) and/or an objective piece that puts the new data into clinical perspective.

*Just to be clear, the own expertise and opinions of experts are also important in decision making. Rightly, Sackett [2] emphasized that good doctors use both individual clinical expertise and the best available external evidence. However, that doesn’t mean that one personal opinion and/or preference replaces all the existing evidence.

References 

  1. Whittle SL, Richards BL, Husni E, & Buchbinder R (2011). Opioid therapy for treating rheumatoid arthritis pain. Cochrane database of systematic reviews (Online), 11 PMID: 22071805
  2. Sackett DL, Rosenberg WM, Gray JA, Haynes RB, & Richardson WS (1996). Evidence based medicine: what it is and what it isn’t. BMJ (Clinical research ed.), 312 (7023), 71-2 PMID: 8555924




Things to Keep in Mind when Searching OVID MEDLINE instead of PubMed

25 11 2011

When I search extensively for systematic reviews I prefer OVID MEDLINE to PubMed for several reasons. Among them: it is easier to build a systematic search in OVID, the search history has a more structured format that is easy to edit, the search features are more advanced, giving you more control over the search, and translating a search to OVID EMBASE, PsycINFO and the Cochrane Library is “peanuts”, relatively speaking.

However, there are at least two things to keep in mind when searching OVID MEDLINE instead of PubMed.

1. You may miss publications, most notably recent papers.

PubMed doesn’t only provide access to MEDLINE, but also contains some other citations, including in-process citations which provide a record for an article before it is indexed with MeSH and added to MEDLINE.

As previously mentioned, I once missed a crucial RCT that was available in PubMed, but not yet available in OVID/MEDLINE.

A few weeks ago one of my clients said that she found 3 important papers with a simple PubMed search that were not retrieved by my exhaustive OVID MEDLINE search (Doh!).
All articles were recent ones [Epub ahead of print, PubMed – as supplied by publisher]. I checked that these articles were indeed not yet included in OVID MEDLINE, and they weren’t.

As said, PubMed doesn’t have all the search features of OVID MEDLINE and I felt a certain reluctance to make a completely new exhaustive search in PubMed. I would probably retrieve many irrelevant papers, which I had tried to avoid by searching OVID*. I therefore decided to roughly translate the OVID search using textwords only (the missed articles had no MeSH attached). It was a matter of copy-pasting the single textwords from the OVID MEDLINE search (omitting the adjacency operators) and adding the tag [tiab], which means that terms are searched as textwords (in title and abstract) in PubMed (#2, only part of the long search string is shown).

To see whether all articles missed in OVID were in the non-MEDLINE set, I added the command: NOT MEDLINE[sb] (#3). Of the 332 records (#2), 28 belonged to the non-MEDLINE subset. All 3 relevant articles, not found in OVID MEDLINE, were in this set.
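The translation described above can be sketched in a few lines of Python. This is only a rough illustration: the function name and the example textwords are made up, and joining all terms with OR is an assumption (a real search would combine several concept blocks with AND). Only the [tiab] tag and the NOT MEDLINE[sb] limit come from the search itself.

```python
def to_pubmed_tiab(textwords, non_medline_only=False):
    """Build a PubMed query that searches each textword in title/abstract.

    `textwords` are plain terms copied from an OVID strategy
    (adjacency operators already removed by hand).
    """
    parts = []
    for word in textwords:
        # Quote multi-word terms so PubMed searches them as a phrase.
        term = f'"{word}"' if " " in word else word
        parts.append(f"{term}[tiab]")
    # Assumption: all terms belong to one concept and are OR-ed together.
    query = "(" + " OR ".join(parts) + ")"
    if non_medline_only:
        # Limit to records that are not (yet) indexed for MEDLINE.
        query += " NOT MEDLINE[sb]"
    return query


# Hypothetical textwords, not the ones from the actual search.
terms = ["opioid*", "morphine", "rheumatoid arthritis"]
print(to_pubmed_tiab(terms))
print(to_pubmed_tiab(terms, non_medline_only=True))
```

The second query is the one to paste into PubMed when you only want the records that an OVID MEDLINE search cannot contain yet.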

In total, there were 15 unique records not present in the OVID MEDLINE and EMBASE search. This additional search in PubMed was certainly worth the effort as it yielded more than 3 new relevant papers. (Apparently there was a boom in relevant papers on the topic, recently)

In conclusion, when doing an exhaustive search in OVID MEDLINE it is worth doing an additional search in PubMed to find the non-MEDLINE papers. Often these are very relevant papers that you wouldn’t like to have missed. Depending on your aim, a simpler, broader search for textwords only, limited with NOT MEDLINE[sb], may suffice.**

From now on, I will always include this PubMed step in my exhaustive searches. 

2. OVID MEDLINE contains duplicate records

I use Reference Manager to deduplicate the records retrieved from all databases and I share the final database with my client. I keep track of the number of hits in each database and of the number of duplicates to facilitate the reporting of the search procedure later on (using the PRISMA flowchart, see above). During this procedure, I noticed that I always got FEWER records in Reference Manager when I imported records from OVID MEDLINE, but not when I imported records from the other databases. Thus it appears that OVID MEDLINE contains duplicate records.

For me it was just a fact that there were duplicate records in OVID MEDLINE. But others were surprised to hear this.

Where everyone else just wrote down the total number of hits in OVID MEDLINE, I always used the number of hits after deduplication in Reference Manager. But this is quite a detour and not easy to explain in the PRISMA flowchart.

I wondered whether this deduplication could be done in OVID MEDLINE directly. I knew you could deduplicate a multifile search, but would it also be possible to deduplicate a set from one database only? According to OVID help there should be a button somewhere, but I couldn’t find it (curious if you can).

Googling I found another OVID manual saying :

..dedup n = Removes duplicate records from multifile search results. For example, ..dedup 5 removes duplicate records from the multifile results set numbered 5.

Although the manual only talked about “multifile searches”, I tried the command (..dedup 34) on the final search set (34) in OVID MEDLINE, and voilà, 21 duplicates were found (exactly the same number as removed by Reference Manager).

The duplicates had the same PubMed ID (PMID, the .an. command in OVID), and were identical or almost identical.

Differences that I noticed were minimal changes in the MeSH (i.e. one or more MeSH  and/or subheadings changed) and changes in journal format (abbreviation used instead of full title).
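If you export the records and want to check the number of duplicates yourself, deduplicating on PMID is enough, since the duplicates share the same PMID. Below is a minimal Python sketch. The record layout (dicts with "pmid" and "revised" keys) is an assumption for illustration and not an OVID export format, and keeping the most recently revised version is my own choice.

```python
def deduplicate_by_pmid(records):
    """Keep one record per PMID, preferring the most recently revised one."""
    unique = {}
    for record in records:
        pmid = record["pmid"]
        kept = unique.get(pmid)
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings;
        # an empty string stands for "never revised".
        if kept is None or record.get("revised", "") > kept.get("revised", ""):
            unique[pmid] = record
    return list(unique.values())


# Illustrative records: one PMID appears twice, once revised and once not.
records = [
    {"pmid": "20846254", "revised": ""},
    {"pmid": "20846254", "revised": "2011-10-13"},
    {"pmid": "00000001", "revised": ""},  # hypothetical PMID
]

unique_records = deduplicate_by_pmid(records)
n_duplicates = len(records) - len(unique_records)
print(len(unique_records), "unique records,", n_duplicates, "duplicate(s) removed")
```

The count of removed duplicates is the figure you would report alongside the raw number of OVID MEDLINE hits.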

Why are these duplicates present in OVID MEDLINE and not in PubMed?

These are the details of the PMID 20846254 in OVID (2 records) and in PubMed (1 record)

The Electronic Date of Publication (PHST) was September 16th 2010. Two days later the record was included in PubMed, but MeSH were added 3 months later (MHDA: 2011/02/12). Around this date records are also entered in OVID MEDLINE. The only difference between the 2 records in OVID MEDLINE is that one record appears to have been revised on 2011-10-13, whereas the other was not.

The duplicate records of PMID 18231698 again have the same creation date (20080527) and entry date (20081203), but one was revised 2010-09-20 and updated 2010-12-14, while the other was revised 2011-08-18 and updated 2011-08-19 (thus almost one year later).

Possibly PubMed changes some records, instantaneously replacing the old ones, but OVID only includes the new PubMed records during MEDLINE-updates and doesn’t delete the old version.

Anyway, wouldn’t it be a good thing if OVID deduplicated its MEDLINE records on a daily basis, or replaced the old ones when loading new records from MEDLINE?

In the meantime, I would recommend applying the deduplicate command yourself to get the exact number of unique records retrieved by your search in OVID MEDLINE.

*mostly because PubMed doesn’t have an adjacency-operator.
** Of course, only if you have already an extensive OVID MEDLINE search.





Evidence Based Point of Care Summaries [2] More Uptodate with Dynamed.

18 10 2011

This post is part of a short series about Evidence Based Point of Care Summaries or POCs. In this series I will review 3 recent papers that objectively compare a selection of POCs.

In the previous post I reviewed a paper from Rita Banzi and colleagues from the Italian Cochrane Centre [1]. They analyzed 18 POCs with respect to their “volume”, content development and editorial policy. There were large differences among POCs, especially with regard to evidence-based methodology scores, but no product appeared the best according to the criteria used.

In this post I will review another paper by Banzi et al, published in the BMJ a few weeks ago [2].

This article examined the speed with which EBP-point of care summaries were updated using a prospective cohort design.

First the authors selected all the systematic reviews signaled by the American College of Physicians (ACP) Journal Club and Evidence-Based Medicine Primary Care and Internal Medicine from April to December 2009. In the same period the authors selected all the Cochrane systematic reviews labelled as “conclusion changed” in the Cochrane Library. In total 128 systematic reviews were retrieved, 68 from the literature surveillance journals (53%) and 60 (47%) from the Cochrane Library. Two months after the collection started (June 2009) the authors did a monthly screen for a year to look for potential citation of the identified 128 systematic reviews in the POCs.

Only those 5 POCs were studied that were ranked in the top quarter for at least 2 (out of 3) desirable dimensions, namely: Clinical Evidence, Dynamed, EBM Guidelines, UpToDate and eMedicine. Surprisingly eMedicine was among the selected POCs, having a rating of “1” on a scale of 1 to 15 for EBM methodology. One would think that Evidence-based-ness is a fundamental prerequisite  for EBM-POCs…..?!

Results were represented as a (rather odd, but clear) “survival analysis” ( “death” = a citation in a summary).

Fig.1 : Updating curves for relevant evidence by POCs (from [2])

I will be brief about the results.

Dynamed clearly beat all the other products in its updating speed.

Expressed in figures, the updating speed of Dynamed was 78% and 97% greater than those of EBM Guidelines and Clinical Evidence, respectively. Dynamed had a median citation rate of around two months and EBM Guidelines around 10 months, quite close to the limit of the follow-up, but the citation rates of the other three point of care summaries (UpToDate, eMedicine, Clinical Evidence) were so slow that they exceeded the follow-up period and the authors could not compute the median.
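Why a median sometimes cannot be computed is easy to see with a small Kaplan-Meier sketch in Python. Here the “event” is a citation in the POC and reviews that are still uncited at the end of follow-up are censored. The numbers are invented for illustration and are not the data of Banzi et al; the function name is hypothetical as well.

```python
def km_median(times, cited):
    """Kaplan-Meier median time to citation.

    times: months until citation, or until end of follow-up if never cited
    cited: True if the review was cited (event), False if censored
    Returns None when the "uncited" curve never drops to 50% or below.
    """
    at_risk = len(times)
    uncited = 1.0  # estimated proportion of reviews still uncited
    for t in sorted(set(times)):
        events = sum(1 for ti, c in zip(times, cited) if ti == t and c)
        leaving = sum(1 for ti in times if ti == t)
        if events:
            uncited *= 1 - events / at_risk
            if uncited <= 0.5:
                return t
        # Cited and censored reviews both leave the risk set after time t.
        at_risk -= leaving
    return None


# Invented example: a fast product and a slow product, 12 months of follow-up.
fast = km_median([1, 1, 2, 2, 3, 12], [True, True, True, True, True, False])
slow = km_median([4, 9, 12, 12, 12, 12], [True, True, False, False, False, False])
print(fast)  # median reached within follow-up
print(slow)  # None: fewer than half of the reviews were cited
```

For the slow product fewer than half of the reviews are cited before follow-up ends, so the curve never crosses 50% and there is simply no median to report.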

Dynamed outperformed the other POC’s in updating of systematic reviews independent of the route. EBM Guidelines and UpToDate had similar overall updating rates, but Cochrane systematic reviews were more likely to be cited by EBM Guidelines than by UpToDate (odds ratio 0.02, P<0.001). Perhaps not surprising, as EBM Guidelines has a formal agreement with the Cochrane Collaboration to use Cochrane contents and label its summaries as “Cochrane inside.” On the other hand, UpToDate was faster than EBM Guidelines in updating systematic reviews signaled by literature surveillance journals.

Dynamed‘s higher updating ability was not due to a difference in identifying important new evidence, but to the speed with which this new information was incorporated in their summaries. Possibly the central updating of Dynamed by the editorial team might account for the more prompt inclusion of evidence.

As the authors rightly point out, “slowness in updating could mean that new relevant information is ignored and could thus affect the validity of point of care information services”.

A slower updating rate may be considered more important for POCs that “promise” to “continuously update their evidence summaries” (EBM-Guidelines) or to “perform a continuous comprehensive review and to revise chapters whenever important new information is published, not according to any specific time schedule” (UpToDate). (see table with description of updating mechanisms )

In contrast, eMedicine doesn’t provide any detailed information on its updating policy, another reason that it doesn’t belong in this list of best POCs.
Clinical Evidence, however, clearly states: “We aim to update Clinical Evidence reviews annually. In addition to this cycle, details of clinically important studies are added to the relevant reviews throughout the year using the BMJ Updates service.” But BMJ Updates is not considered in the current analysis. Furthermore, patience is rewarded with excellent and complete summaries of evidence (in my opinion).

Indeed a major limitation of the current (and the previous) study by Banzi et al [1,2] is that they have looked at quantitative aspects and items that are relatively “easy to score”, like “volume” and “editorial quality”, not at the real quality of the evidence (previous post).

Although the findings were new to me, others have recently published similar results (studies were performed in the same time-span):

Shurtz and Foster [3] of the Texas A&M University Medical Sciences Library (MSL) also sought to establish a rubric for evaluating evidence-based medicine (EBM) point-of-care tools in a health sciences library.

They, too, looked at editorial quality and speed of updating plus reviewing content, search options, quality control, and grading.

Their main conclusion is that “differences between EBM tools’ options, content coverage, and usability were minimal, but that the products’ methods for locating and grading evidence varied widely in transparency and process”.

Thus this is in line with what Banzi et al reported in their first paper. They also share Banzi’s conclusion about differences in speed of updating:

“DynaMed had the most up-to-date summaries (updated on average within 19 days), while First Consult had the least up to date (updated on average within 449 days). Six tools claimed to update summaries within 6 months or less. For the 10 topics searched, however, only DynaMed met this claim.”

Table 3 from Shurtz and Foster [3] 

Ketchum et al [4] also conclude that DynaMed has the largest proportion of current (2007-2009) references (170/1131, 15%). In addition they found that DynaMed had the largest total number of references (1131/2330, 48.5%).

Yes, and you might have guessed it. The paper of Andrea Ketchum is the 3rd paper I’m going to review.

I also recommend reading the paper of the librarians Shurtz and Foster [3], which I found along the way. It has too much overlap with the Banzi papers to devote a separate post to it. Still, it provides better background information than the Banzi papers, it focuses on POCs that claim to be EBM and it doesn’t try to weigh one element over another.

References

  1. Banzi, R., Liberati, A., Moschetti, I., Tagliabue, L., & Moja, L. (2010). A Review of Online Evidence-based Practice Point-of-Care Information Summary Providers Journal of Medical Internet Research, 12 (3) DOI: 10.2196/jmir.1288
  2. Banzi, R., Cinquini, M., Liberati, A., Moschetti, I., Pecoraro, V., Tagliabue, L., & Moja, L. (2011). Speed of updating online evidence based point of care summaries: prospective cohort analysis BMJ, 343 (sep22 2) DOI: 10.1136/bmj.d5856
  3. Shurtz, S., & Foster, M. (2011). Developing and using a rubric for evaluating evidence-based medicine point-of-care tools Journal of the Medical Library Association : JMLA, 99 (3), 247-254 DOI: 10.3163/1536-5050.99.3.012
  4. Ketchum, A., Saleh, A., & Jeong, K. (2011). Type of Evidence Behind Point-of-Care Clinical Information Products: A Bibliometric Analysis Journal of Medical Internet Research, 13 (1) DOI: 10.2196/jmir.1539
  5. Evidence Based Point of Care Summaries [1] No “Best” Among the Bests? (laikaspoetnik.wordpress.com)
  6. How will we ever keep up with 75 Trials and 11 Systematic Reviews a Day? (laikaspoetnik.wordpress.com)
  7. UpToDate or Dynamed? (Shamsha Damani at laikaspoetnik.wordpress.com)
  8. How Evidence Based is UpToDate really? (laikaspoetnik.wordpress.com)






Evidence Based Point of Care Summaries [1] No “Best” Among the Bests?

13 10 2011

For many of today’s busy practicing clinicians, keeping up with the enormous and ever-growing amount of medical information poses substantial challenges [6]. It’s impractical to do a PubMed search to answer each clinical question and then synthesize and appraise the evidence, simply because busy health care providers have limited time and many questions per day.

As repeatedly mentioned on this blog ([6,7]), it is far more efficient to try to find aggregate (or pre-filtered or pre-appraised) evidence first.

Haynes ‘‘5S’’ levels of evidence (adapted by [1])

There are several forms of aggregate evidence, often represented as the higher layers of an evidence pyramid (because they aggregate individual studies, represented by the lowest layer). There are confusingly many pyramids, however [8] with different kinds of hierarchies and based on different principles.

According to the “5S” paradigm [9] (now evolved to 6S [10]) the peak of the pyramid is formed by the ideal but not yet realized computer decision support systems, which link the individual patient characteristics to the current best evidence. According to the 5S model the next best sources are Evidence Based Textbooks.
(Note: EBM and textbooks almost seem a contradiction in terms to me, personally I would not put many of the POCs somewhere at the top. Also see my post: How Evidence Based is UpToDate really?)

Whatever their exact place in the EBM-pyramid, these POCs are helpful to many clinicians. There are many different POCs (see The HLWIKI Canada for a comprehensive overview [11]) with a wide range of costs, varying from free with ads (e-Medicine) to very expensive site licenses (UpToDate). Because of the costs, hospital libraries have to choose among them.

Choices are often based on user preferences and satisfaction and balanced against costs, scope of coverage etc. Choices are often subjective and people tend to stick to the databases they know.

Initial literature about POCs concentrated on user preferences and satisfaction. A New Zealand study [3] among 84 GPs showed no significant difference in preference for, or usage levels of DynaMed, MD Consult (including FirstConsult) and UpToDate. The proportion of questions adequately answered by POCs differed per study (see introduction of [4] for an overview) varying from 20% to 70%.
McKibbon and Fridsma ([5] cited in [4]) found that the information resources chosen by primary care physicians were seldom helpful in providing the correct answers, leading them to conclude that:

“…the evidence base of the resources must be strong and current…We need to evaluate them well to determine how best to harness the resources to support good clinical decision making.”

Recent studies have tried to objectively compare online point-of-care summaries with respect to their breadth, content development, editorial policy, the speed of updating and the type of evidence cited. I will discuss 3 of these recent papers, but will review each paper separately. (My posts tend to be pretty long and in-depth. So in an effort to keep them readable I try to cut down where possible.)

Two of the three papers are published by Rita Banzi and colleagues from the Italian Cochrane Centre.

In the first paper, reviewed here, Banzi et al [1] first identified English Web-based POCs using Medline, Google, librarian association websites, and information conference proceedings from January to December 2008. In order to be eligible, a product had to be an online-delivered summary that is regularly updated, claims to provide evidence-based information and is to be used at the bedside.

They found 30 eligible POCs, of which the following 18 databases met the criteria: 5-Minute Clinical Consult, ACP-Pier, BestBETs, CKS (NHS), Clinical Evidence, DynaMed, eMedicine,  eTG complete, EBM Guidelines, First Consult, GP Notebook, Harrison’s Practice, Health Gate, Map Of Medicine, Micromedex, Pepid, UpToDate, ZynxEvidence.

They assessed and ranked these 18 point-of-care products according to: (1) coverage (volume) of medical conditions, (2) editorial quality, and (3) evidence-based methodology. (For operational definitions see appendix 1)

From a quantitative perspective DynaMed, eMedicine, and First Consult were the most comprehensive (88%) and eTG complete the least (45%).

The best editorial quality of EBP was delivered by Clinical Evidence (15), UpToDate (15), eMedicine (13), Dynamed (11) and eTG complete (10). (Scores are shown in brackets)

Finally, BestBETs, Clinical Evidence, EBM Guidelines and UpToDate obtained the maximal score (15 points each) for best evidence-based methodology, followed by DynaMed and Map Of Medicine (12 points each).
As expected eMedicine, eTG complete, First Consult, GP Notebook and Harrison’s Practice had a very low EBM score (1 point each). Personally I would not have even considered these online sources as “evidence based”.

The calculations seem very “exact”, but assumptions upon which these figures are based are open to question in my view. Furthermore all items have the same weight. Isn’t the evidence-based methodology far more important than “comprehensiveness” and editorial quality?

Certainly because “volume” is “just” estimated by analyzing the extent to which 4 random chapters of the ICD-10 classification are covered by the POCs. Some sources, like Clinical Evidence and BestBETs (scoring low for this item), don’t aim to be comprehensive but only “answer” a limited number of questions: they are not textbooks.

Editorial quality is determined by scoring of the specific indicators of transparency: authorship, peer reviewing procedure, updating, disclosure of authors’ conflicts of interest, and commercial support of content development.

For the EB methodology, Banzi et al scored the following indicators:

  1. Is a systematic literature search or surveillance the basis of content development?
  2. Is the critical appraisal method fully described?
  3. Are systematic reviews preferred over other types of publication?
  4. Is there a system for grading the quality of evidence?
  5. When expert opinion is included is it easily recognizable over studies’ data and results ?

The  score for each of these indicators is 3 for “yes”, 1 for “unclear”, and 0 for “no” ( if judged “not adequate” or “not reported.”)
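To make the arithmetic concrete, here is a small Python sketch of this scoring rule as I read it: five indicators, 3/1/0 points each, so a maximum of 15. The function name and the example answer patterns are mine, not taken from the paper.

```python
# Points per answer, as described for the five EB methodology indicators.
POINTS = {"yes": 3, "unclear": 1, "no": 0}


def ebm_methodology_score(answers):
    """Sum the points for the five indicator answers (maximum 15)."""
    if len(answers) != 5:
        raise ValueError("expected one answer per indicator (5 in total)")
    return sum(POINTS[answer] for answer in answers)


all_yes = ["yes"] * 5
one_no = ["yes", "yes", "no", "yes", "yes"]   # a single indicator judged "no"
all_unclear = ["unclear"] * 5

print(ebm_methodology_score(all_yes))      # 15
print(ebm_methodology_score(one_no))       # 12
print(ebm_methodology_score(all_unclear))  # 5
```

A single “no”, even one that only reflects incomplete reporting, costs a fifth of the total, while a product that is “unclear” on every indicator still collects 5 points.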

This leaves little room for qualitative differences and mainly relies upon adequate reporting. As discussed earlier in a post where I questioned the evidence-based-ness of UpToDate, there is a difference between tailored searches and checking a limited list of sources (indicator 1.). It also matters whether the search is mentioned or not (transparency), whether it is qualitatively ok and whether it is extensive or not. For lists, it matters how many sources are “surveyed”. It also matters whether one or both methods are used… These important differences are not reflected by the scores.

Furthermore some points may be more important than others. Personally I find step 1 the most important. For what good is appraising and grading if it isn’t applied to the most relevant evidence? It is “easy” to do a grading or to copy it from other sources (yes, I wouldn’t be surprised if some POCs are doing this).

On the other hand, a zero for one indicator can have too much weight on the score.

Dynamed got 12 instead of the maximum 15 points, because their editorial policy page didn’t explicitly describe their absolute prioritization of systematic reviews, although they really adhere to that in practice (see the comment by editor-in-chief Brian Alper [2]). Had Dynamed received the deserved 3 points for this indicator, and thus the full 15 points, they would have had the highest score overall.
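To make the weight of a single zero concrete, here is a minimal sketch of the scoring arithmetic described above. The short indicator names and the two judgement sheets are my own illustration (assumptions), not data copied from Banzi et al; only the 3/1/0 points and the 12 versus 15 totals come from the text.

```python
# Points per judgement for the EB-methodology dimension (3 / 1 / 0).
POINTS = {"yes": 3, "unclear": 1, "no": 0}

# Shorthand labels for the five indicators listed above (labels are illustrative).
INDICATORS = [
    "systematic search or surveillance",
    "critical appraisal method described",
    "systematic reviews preferred",
    "evidence grading system",
    "expert opinion recognizable",
]

def ebm_score(judgements):
    """Sum the points of one product; judgements maps indicator -> yes/unclear/no."""
    return sum(POINTS[judgements[name]] for name in INDICATORS)

# Hypothetical sheet: everything "yes" except the systematic-review indicator.
as_reported = {name: "yes" for name in INDICATORS}
as_reported["systematic reviews preferred"] = "no"

# The same sheet if that one indicator had been judged "yes".
as_practised = {name: "yes" for name in INDICATORS}

print(ebm_score(as_reported), ebm_score(as_practised))
```

One "no" out of five indicators costs a fifth of the total, regardless of how important that indicator is relative to the others.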

The authors further conclude that none of the dimensions turned out to be significantly associated with the other dimensions. For example, BestBETs scored among the worst on volume (comprehensiveness), with an intermediate score for editorial quality, and the highest score for evidence-based methodology. Overall, DynaMed, EBM Guidelines, and UpToDate scored in the top quartile for 2 out of 3 variables and in the 2nd quartile for the 3rd of these variables (but, as explained above, Dynamed really scored in the top quartile for all 3 variables).

On the basis of their findings, Banzi et al conclude that only a few POCs satisfied the criteria, with none excelling in all.

The finding that Pepid, eMedicine, eTG complete, First Consult, GP Notebook, Harrison’s Practice and 5-Minute Clinical Consult only obtained 1 or 2 of the maximum 15 points for EBM methodology confirms my “intuitive grasp” that these sources really don’t deserve the label “evidence based”. Perhaps we should make a stricter distinction between “point of care” databases as a point where patients and practitioners interact, particularly referring to the context of the provider-patient dyad (definition by Banzi et al) and truly evidence based summaries. Only a few of the tested databases would fit the latter definition.

In summary, Banzi et al reviewed 18 Online Evidence-based Practice Point-of-Care Information Summary Providers. They comprehensively evaluated and summarized these resources with respect to (1) coverage (volume) of medical conditions, (2) editorial quality, and (3) evidence-based methodology.

Limitations of the study, also according to the authors, were the lack of a clear definition of these products, the arbitrariness of the scoring system and the emphasis on the quality of reporting. Furthermore, the study didn’t really assess the products qualitatively (i.e. with respect to performance). Nor did it take into account that products might have a different aim. Clinical Evidence only summarizes evidence on the effectiveness of treatments of a limited number of diseases, for instance. Therefore it scores badly on volume while excelling on the other items.

Nevertheless, it is helpful that POCs are objectively compared, and it may help as a starting point for decisions about acquisition.

References (not in chronological order)

  1. Banzi, R., Liberati, A., Moschetti, I., Tagliabue, L., & Moja, L. (2010). A Review of Online Evidence-based Practice Point-of-Care Information Summary Providers Journal of Medical Internet Research, 12 (3) DOI: 10.2196/jmir.1288
  2. Alper, B. (2010). Review of Online Evidence-based Practice Point-of-Care Information Summary Providers: Response by the Publisher of DynaMed Journal of Medical Internet Research, 12 (3) DOI: 10.2196/jmir.1622
  3. Goodyear-Smith F, Kerse N, Warren J, & Arroll B (2008). Evaluation of e-textbooks. DynaMed, MD Consult and UpToDate. Australian family physician, 37 (10), 878-82 PMID: 19002313
  4. Ketchum, A., Saleh, A., & Jeong, K. (2011). Type of Evidence Behind Point-of-Care Clinical Information Products: A Bibliometric Analysis Journal of Medical Internet Research, 13 (1) DOI: 10.2196/jmir.1539
  5. McKibbon, K., & Fridsma, D. (2006). Effectiveness of Clinician-selected Electronic Information Resources for Answering Primary Care Physicians’ Information Needs Journal of the American Medical Informatics Association, 13 (6), 653-659 DOI: 10.1197/jamia.M2087
  6. How will we ever keep up with 75 Trials and 11 Systematic Reviews a Day? (laikaspoetnik.wordpress.com)
  7. 10 + 1 PubMed Tips for Residents (and their Instructors) (laikaspoetnik.wordpress.com)
  8. Time to weed the (EBM-)pyramids?! (laikaspoetnik.wordpress.com)
  9. Haynes RB. Of studies, syntheses, synopses, summaries, and systems: the “5S” evolution of information services for evidence-based healthcare decisions. Evid Based Med 2006 Dec;11(6):162-164. [PubMed]
  10. DiCenso A, Bayley L, Haynes RB. ACP Journal Club. Editorial: Accessing preappraised evidence: fine-tuning the 5S model into a 6S model. Ann Intern Med. 2009 Sep 15;151(6):JC3-2, JC3-3. PubMed PMID: 19755349 [free full text].
  11. How Evidence Based is UpToDate really? (laikaspoetnik.wordpress.com)
  12. Point_of_care_decision-making_tools_-_Overview (hlwiki.slais.ubc.ca)
  13. UpToDate or Dynamed? (Shamsha Damani at laikaspoetnik.wordpress.com)






PubMed’s Higher Sensitivity than OVID MEDLINE… & other Published Clichés.

21 08 2011

Is it just me, or are biomedical papers about searching for a systematic review often of low quality or just too damn obvious? I’m seldom excited about papers dealing with optimal search strategies or peculiarities of PubMed, even though it is my specialty.
It is my impression that many of the lower quality and/or less relevant papers are written by clinicians/researchers instead of information specialists (or at least have no medical librarian as the first author).

I can’t help thinking that many of those authors just happen to see an odd feature in PubMed or encounter an unexpected  phenomenon in the process of searching for a systematic review.
They think: “Hey, that’s interesting” or “That’s odd. Let’s write a paper about it.” An easy way to boost our scientific output!
What they don’t realize is that the published findings are often common knowledge to the experienced MEDLINE searchers.

Let’s give two recent examples of what I think are redundant papers.

The first example is a letter under the heading “Clinical Observation” in Annals of Internal Medicine, entitled:

“Limitations of the MEDLINE Database in Constructing Meta-analyses”.[1]

As the authors rightly state “a thorough literature search is of utmost importance in constructing a meta-analysis. Since the PubMed interface from the National Library of Medicine is a cornerstone of many meta-analysis,  the authors (two MD’s) focused on the freely available PubMed” (with MEDLINE as its largest part).

The objective was:

“To assess the accuracy of MEDLINE’s “human” and “clinical trial” search limits, which are used by authors to focus literature searches on relevant articles.” (emphasis mine)

O.k…. Stop! I know enough. This paper should have been titled: “Limitation of Limits in MEDLINE”.

Limits are NOT DONE when searching for a systematic review, for the simple reason that most limits (except language and dates) are MeSH terms.
It takes a while before the indexers have assigned MeSH terms to the papers, and not all papers are correctly (or consistently) indexed. Thus, by using limits you will automatically miss recent, not yet indexed, or incorrectly indexed papers, whereas it is your goal (or it should be) to find as many relevant papers as possible for your systematic review. And wouldn’t it be sad if you missed that one important RCT that was published just the other day?

On the other hand, one doesn’t want to drown in irrelevant papers. How can one reduce “noise” while minimizing the risk of losing relevant papers?

  1. Use both MeSH and textwords to “limit” your search, e.g. also search “trial” as a textword in title and abstract: trial[tiab]
  2. Use more synonyms and truncation (random*[tiab] OR  placebo[tiab])
  3. Don’t actively limit but use double negation. Thus to get rid of animal studies, don’t limit to humans (this is the same as combining with the MeSH humans[mh]) but safely exclude animals as follows: NOT (animals[mh] NOT humans[mh]) (= exclude papers indexed with “animals” except when these papers are also indexed with “humans”).
  4. Use existing Methodological Filters (ready-made search strategies) designed to help focusing on study types. These filters are based on one or more of the above-mentioned principles (see earlier posts here and here).
    Simple Methodological Filters can be found at the PubMed Clinical Queries. For instance the narrow filter for Therapy not only searches for the Publication Type “Randomized controlled trial” (a limit), but also for randomized, controlled and trial as textwords.
    Usually broader (more sensitive) filters are used for systematic reviews. The Cochrane handbook proposes to use the following filter maximizing precision and sensitivity to identify randomized trials in PubMed (see http://www.cochrane-handbook.org/):
    (randomized controlled trial [pt] OR controlled clinical trial [pt] OR randomized [tiab] OR placebo [tiab] OR clinical trials as topic [mesh: noexp] OR randomly [tiab] OR trial [ti]) NOT (animals [mh] NOT humans [mh]).
    When few hits are obtained, one can either use a broader filter or no filter at all.
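Why the double negation of point 3 is safer than a limit can be sketched with a toy set model. The four records and their tags below are hypothetical, and the third function assumes that unparenthesized Boolean operators are applied from left to right; it is an illustration of the logic, not a simulation of PubMed.

```python
# Hypothetical mini-database: record -> set of assigned MeSH check tags.
# An empty set stands for a recent record that has not been indexed yet.
records = {
    "indexed human RCT": {"humans"},
    "indexed rat study": {"animals"},
    "human + animal study": {"humans", "animals"},
    "not yet indexed RCT": set(),
}

def limit_to_humans(recs):
    """Like AND humans[mh]: keeps only records already indexed with 'humans'."""
    return {rid for rid, mesh in recs.items() if "humans" in mesh}

def double_negation(recs):
    """Like NOT (animals[mh] NOT humans[mh]): drops animal-only records."""
    animal_only = {rid for rid, mesh in recs.items()
                   if "animals" in mesh and "humans" not in mesh}
    return set(recs) - animal_only

def chained_not(recs):
    """Like NOT animals[mh] NOT humans[mh] without parentheses (left to right)."""
    return {rid for rid, mesh in recs.items()
            if "animals" not in mesh and "humans" not in mesh}

print(sorted(limit_to_humans(records)))   # loses the unindexed RCT
print(sorted(double_negation(records)))   # loses only the rat study
print(sorted(chained_not(records)))       # loses every indexed human study
```

The parentheses matter: without them the second NOT throws away exactly the human studies one wants to keep.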

In other words, it is a beginner’s mistake to use limits when searching for a systematic review.
Besides publishing what should be common knowledge (even our medical students learn it), the authors make many other (small) mistakes; their precise search is difficult to reproduce and far from complete. This has already been addressed by Dutch colleagues in a comment [2].

The second paper is:

PubMed had a higher sensitivity than Ovid-MEDLINE in the search for systematic reviews [3], by Katchamart et al.

Again this paper focuses on the usefulness of PubMed to identify RCT’s for a systematic review, but it concentrates on the differences between PubMed and OVID in this respect. The paper starts by explaining that PubMed:

provides access to bibliographic information in addition to MEDLINE, such as in-process citations (..), some OLDMEDLINE citations (….) citations that precede the date that a journal was selected for MEDLINE indexing, and some additional life science journals that submit full texts to PubMed Central and receive a qualitative review by NLM.

Given these “facts”, am I exaggerating when I am saying that the authors are pushing at an open door when their main conclusion is that PubMed retrieved more citations overall than Ovid-MEDLINE? The one (!) relevant article missed in OVID was a 2005 study published in a Japanese journal that MEDLINE started indexing in 2007. It was therefore in PubMed, but not in OVID MEDLINE.

An important aspect to keep in mind when searching OVID/MEDLINE (as I have discussed earlier here and here). But worth a paper?

Recently, after finishing an exhaustive search in OVID/MEDLINE, we noticed that we had missed an RCT that was in PubMed but not yet available in OVID/MEDLINE. I just added one sentence to the search methods:

Additionally, PubMed was searched for randomized controlled trials ahead of print, not yet included in OVID MEDLINE. 

Of course, I could have devoted a separate article to this finding. But it is so self-evident, that I don’t think it would be worth it.

The authors have expressed their findings in sensitivity (85% for Ovid-MEDLINE vs. 90% for PubMed; the 5% is that ONE paper missing), precision and number needed to read (comparable for OVID-MEDLINE and PubMed).
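For readers less familiar with these “diagnostic parameters”, here is a minimal sketch of how they are computed. The reference set of 20 relevant papers is my own assumption, chosen because 17/20 and 18/20 reproduce the reported 85% and 90%; the retrieval total of 500 hits is purely hypothetical.

```python
def sensitivity(relevant_retrieved, relevant_total):
    """Share of all relevant papers that the search found (recall)."""
    return relevant_retrieved / relevant_total

def precision(relevant_retrieved, retrieved_total):
    """Share of the retrieved records that are relevant."""
    return relevant_retrieved / retrieved_total

def number_needed_to_read(relevant_retrieved, retrieved_total):
    """Records to screen per relevant paper found (1 / precision)."""
    return retrieved_total / relevant_retrieved

RELEVANT_TOTAL = 20  # assumed size of the reference set, not taken from the paper

ovid = sensitivity(17, RELEVANT_TOTAL)
pubmed = sensitivity(18, RELEVANT_TOTAL)
print(f"Ovid-MEDLINE {ovid:.0%} vs PubMed {pubmed:.0%}")

# With so few relevant papers, one extra hit shifts sensitivity by 1/20.
print(f"one paper = {1 / RELEVANT_TOTAL:.0%}")

# Hypothetical retrieval size, only to show the other two measures.
print(precision(18, 500), number_needed_to_read(18, 500))
```

With a denominator this small, a difference of a single paper looks like a substantial percentage.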

If I might venture another opinion: it looks like editors of medical and epidemiology journals quickly fall for “diagnostic parameters” on a topic that they don’t understand very well: library science.

The sensitivity/precision data found have little general value, because:

  • it concerns a single search on a single topic
  • there are few relevant papers (17- 18)
  • useful features of OVID MEDLINE that are not available in PubMed are not used. For example, adjacency searching could enhance the retrieval of relevant papers in OVID MEDLINE (adjacency = words searched within a specified maximal distance of each other)
  • the searches are not comparable, nor are the search field commands.

The latter is very important, if one doesn’t wish to compare apples and oranges.

Let’s take a look at the first part of the search (which is in itself well structured and covers many synonyms).
(Figure: first part of the search.)
This part of the search deals with the P: patients with rheumatoid arthritis (RA). The authors first search for relevant MeSH (set 1-5) and then for a few textwords. The MeSH are fine. The authors have chosen to use Arthritis, rheumatoid and a few narrower terms (MeSH-tree shown at the right). The authors have taken care to use the MeSH:noexp command in PubMed to prevent the automatic explosion of narrower terms in PubMed (although this is superfluous for MeSH terms having no narrower terms, like Caplan syndrome etc.).

But the fields chosen for the free text search (sets 6-9) are not comparable at all.

In OVID the mp. field is used, whereas all fields or even no fields are used in PubMed.

I am not even fond of the uncontrolled use of .mp (I would rather search in title and abstract; remember we already have the proper MeSH terms), but all fields is even broader than .mp.

In general a .mp. search looks in the Title, Original Title, Abstract, Subject Heading, Name of Substance, and Registry Word fields. All fields would be .af in OVID not .mp.

Searching for rheumatism in OVID using the .mp field yields 7879 hits against 31390 hits when one searches in the .af field.

Thus 4 times as many. Extra fields searched are for instance the journal and the address field. One finds all articles in the journal Arthritis & Rheumatism for instance [line 6], or papers co-authored by someone of the dept. of rheumatoid surgery [line 9].

Worse, in PubMed the “all fields” command doesn’t prevent the automatic mapping.

In PubMed, Rheumatism[All Fields] is translated as follows:

“rheumatic diseases”[MeSH Terms] OR (“rheumatic”[All Fields] AND “diseases”[All Fields]) OR “rheumatic diseases”[All Fields] OR “rheumatism”[All Fields]

Oops, Rheumatism[All Fields] is searched as the (exploded!) MeSH rheumatic diseases. Thus rheumatic diseases (not included in the MeSH-search) plus all its narrower terms! This makes the entire first part of the PubMed search obsolete (where the authors searched for non-exploded specific terms). It explains the large difference in hits with rheumatism between PubMed and OVID/MEDLINE: 11910 vs 6945.
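Anyone who wants to check such a translation outside the PubMed web page could ask the NCBI E-utilities ESearch service, which to my knowledge reports the translated query in its JSON output. The sketch below only builds the request URLs; the helper names are mine, and the `querytranslation` key is what I expect in the response, so treat it as an assumption to verify.

```python
import json
import urllib.parse
import urllib.request

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def esearch_url(term):
    """Build an ESearch request that returns only the count and query details."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": 0}
    return ESEARCH + "?" + urllib.parse.urlencode(params)

def query_translation(term):
    """Fetch how PubMed translated the term, plus the hit count (needs network)."""
    with urllib.request.urlopen(esearch_url(term)) as response:
        result = json.load(response)["esearchresult"]
    # .get() because the presence of these keys is an assumption of this sketch.
    return result.get("querytranslation"), result.get("count")

# Compare the broad field with the title/abstract field used by the Cochrane filter.
for term in ("rheumatism[All Fields]", "rheumatism[tiab]"):
    print(esearch_url(term))
```

Calling query_translation() on both terms should show whether the automatic mapping to the exploded MeSH term happens only for the [All Fields] variant.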

Not only do the authors use this .mp and [all fields] command instead of the preferred [tiab] field, they also apply this broader field to the existing (optimized) Cochrane filter, that uses [tiab]. Finally they use limits!

Well anyway, I hope that I made my point that a useful comparison between strategies can only be made if optimal and comparable strategies are used. Sensitivity doesn’t mean anything here.

Coming back to my original point: I do think that some conclusions of these papers are “good to know”. As a matter of fact, they should be basic knowledge for those planning an exhaustive search for a systematic review. We do not need bad studies to show this.

Perhaps an expert paper (or a series) on this topic, understandable for clinicians, would be of more value.

Or the recognition that such search papers should be designed and written by librarians with ample experience in searching for systematic reviews.

NOTE:
* = truncation=search for different word endings; [tiab] = title and abstract; [ti]=title; mh=mesh; pt=publication type

Photo credit

The image is taken from the Dragonfly-blog; here the Flickr-image Brain Vocab Sketch by labguest was adapted by adding the Pubmed logo.

References

  1. Winchester DE, & Bavry AA (2010). Limitations of the MEDLINE database in constructing meta-analyses. Annals of internal medicine, 153 (5), 347-8 PMID: 20820050
  2. Leclercq E, Kramer B, & Schats W (2011). Limitations of the MEDLINE database in constructing meta-analyses. Annals of internal medicine, 154 (5) PMID: 21357916
  3. Katchamart W, Faulkner A, Feldman B, Tomlinson G, & Bombardier C (2011). PubMed had a higher sensitivity than Ovid-MEDLINE in the search for systematic reviews. Journal of clinical epidemiology, 64 (7), 805-7 PMID: 20926257
  4. Search OVID EMBASE and Get MEDLINE for Free…. without knowing it (laikaspoetnik.wordpress.com 2010/10/19/)
  5. 10 + 1 PubMed Tips for Residents (and their Instructors) (laikaspoetnik.wordpress.com 2009/06/30)
  6. Adding Methodological filters to myncbi (laikaspoetnik.wordpress.com 2009/11/26/)
  7. Search filters 1. An Introduction (laikaspoetnik.wordpress.com 2009/01/22/)




#FollowFriday #FF @DrJenGunter: EBM Sex Health Expert Wielding the Lasso of Truth

19 08 2011

If you’re on Twitter, you have probably seen the #FF or #FollowFriday phenomenon. FollowFriday is a way to recommend people on Twitter to others. For at least 2 reasons: to acknowledge your favorite tweople and to make it easier for your followers to find new interesting people.

However, some #FollowFriday tweet-series are more like weekly spam. Almost 2 years ago I blogged about the misuse of FF-recommendations and I gave some suggestions to do #FollowFriday the right way: not by merely mentioning many people in numerous tweets, but by recommending one or a few people at a time, and explaining why this person is so awesome to follow.

Twitter Lists are also useful tools for recommending people (see post). You could construct lists of your favorite Twitter people for others to follow. I have created a general FollowFridays list, where I list all the people I have recommended in a #FF-tweet and/or post.

In this post I would like to take up the tradition of highlighting the #FF favs at my blog.

This FollowFriday I recommend:  

Jennifer Gunter

Jennifer Gunter (@DrJenGunter at Twitter), is a beautiful lady, but she shouldn’t be tackled without gloves, for she is a true defender of evidence-based medicine and wields the lasso of truth.

Her specialty is OB/GYN. She is a sex health expert. No surprise, many tweets are related to this topic, some very serious, some with a humorous undertone. And there can be just fun (re)tweets, like:

LOL -> “@BackpackingDad: New Word: Fungry. Full-hungry. “I just ate a ton of nachos, but hot damn am I fungry for those Buffalo wings!””

Dr Jen Gunter has a blog: Dr. Jen Gunter (wielding the lasso of truth).

Again we find the same spectrum of posts, mostly in the field of ob/gyn. You need not be an ob/gyn nor an EBM expert to enjoy them. Jen’s posts are written in plain language, suitable for anyone to understand (including patients).

Some titles:

In addition, there are also hilarious posts like “Cosmo’s sex position of the day proves they know nothing about good sex or women”, where she criticizes Cosmo for tweeting impossible sex positions (“If you’re over 40, I dare you to even GET into that position!”), which she thinks were created by one of the following:

A) a computer who has never had sex and is not programmed to understand how the female body bends.
B) a computer programmer who has never had sex and has no understanding of how the female body bends.
C) a Yogi master/Olympic athlete.

Sometimes the topic is blogging. Jen is a fierce proponent of medical blogging. She sees it as a way to “promote” yourself as a doctor, to learn from your readers and to “contribute credible content drowns out garbage medical information” (true) and as an ideal platform to deliver content to your patients and like-minded medical professionals. (great idea)

Read more at:

You can follow Jen at her Twitter-account (http://twitter.com/#!/DrJenGunter) and/or you can follow my lists. She is on:  ebm-cochrane-sceptics and the followfridays list.

Of course you can also take a subscription to her blog http://drjengunter.wordpress.com/






RIP Statistician Paul Meier. Proponent not Father of the RCT.

14 08 2011

This headline in Boing Boing caught my eye today:  RIP Paul Meier, father of the randomized trial

Not surprisingly, I knew that Paul Meier (with Kaplan) introduced the Kaplan-Meier estimator (1958), a very important tool for measuring how many patients survive a medical treatment. But I didn’t know he was “father of the randomized trial”….

But is he really the father of the randomized trial and “probably best known for the introduction of randomized trials into the evaluation of medical treatments”, as Boing Boing states?

Boing Boing’s very short article is based on the New York Times article: Paul Meier, Statistician Who Revolutionized Medical Trials, Dies at 87. According to the NY Times, “Dr. Meier was one of the first and most vocal proponents of what is called ‘randomization.’”

Randomization, the NY-Times explains, is:

Under the protocol, researchers randomly assign one group of patients to receive an experimental treatment and another to receive the standard treatment. In that way, the researchers try to avoid unintentionally skewing the results by choosing, for example, the healthier or younger patients to receive the new treatment.

(for a more detailed explanation see my previous posts The best study designs…. for dummies and #NotSoFunny #16 – Ridiculing RCTs & EBM)

Meier was a very successful proponent, that is for sure. According to Sir Richard Peto, (Dr. Meier) “perhaps more than any other U.S. statistician, was the one who influenced U.S. drug regulatory agencies, and hence clinical researchers throughout the U.S. and other countries, to insist on the central importance of randomized evidence.”

But an advocate need not be a father, for advocates are seldom the inventors/creators. A proponent is more of a nurse, a mentor or a … foster-parent.

Is Meier the true father/inventor of the RCT? And if not, who is?

Googling “Father of the randomized trial” won’t help, because all 1,610 hits point to Dr. Meier…. thanks to careless copying of the Boing Boing article.

What I have read so far doesn’t point at one single creator. And the RCT wasn’t just suddenly there. It started with comparison of treatments under controlled conditions. Back in 1753, the British naval surgeon James Lind published his famous account of 12 scurvy patients, “their cases as similar as I could get them”, noting that “the most sudden and visible good effects were perceived from the uses of the oranges and lemons” and that citrus fruit cured scurvy [3]. The French physician Pierre Louis and Harvard anatomist Oliver Wendell Holmes (19th century) were also fierce proponents of supporting conclusions about the effectiveness of treatments with statistics, not subjective impressions [4].

But what was the first real RCT?

Perhaps the first real RCT was The Nuremberg salt test (1835) [6]. This was possibly not only the first RCT, but also the first scientific demonstration of the lack of effect of a homeopathic dilution. More than 50 visitors of a local tavern participated in the experiment. Half of them received a vial  filled with distilled snow water, the other half a vial with ordinary salt in a homeopathic C30-dilution of distilled snow water. None of the participants knew whether he got the “actual medicine or not” (blinding). The numbered vials were coded and the code was broken after the experiment (allocation concealment).
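The combination of random allocation, blinding and a code that is only broken afterwards can be sketched in a few lines. This is a modern illustration under my own assumptions (50 vials, a seeded shuffle, two equal groups), not a reconstruction of how the 1835 organizers actually did it.

```python
import random

def allocate_vials(n_vials, seed=None):
    """Randomly assign half of the numbered vials to 'dilution', half to 'water'.

    Returns the key (vial number -> content), which is meant to stay sealed
    until all observations have been recorded.
    """
    if n_vials % 2:
        raise ValueError("need an even number of vials for two equal groups")
    contents = ["dilution"] * (n_vials // 2) + ["water"] * (n_vials // 2)
    rng = random.Random(seed)   # seeded only to make the sketch reproducible
    rng.shuffle(contents)       # the random allocation
    return {number: content for number, content in enumerate(contents, start=1)}

sealed_key = allocate_vials(50, seed=1835)

# Participants and observers only ever see the vial numbers (blinding) ...
visible_labels = sorted(sealed_key)

# ... and the key is opened after the experiment (breaking the code).
counts = {group: list(sealed_key.values()).count(group)
          for group in ("dilution", "water")}
print(counts)
```

Because chance decides which number hides which content, nobody can steer the “real” vials towards particular participants.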

The first publications of RCT’s were in the field of psychology and agriculture. As a matter of fact one other famous statistician, Ronald A. Fisher (of the Fisher’s exact test), seems to have played a more important role in the genesis and popularization of RCT’s than Meier, albeit in agricultural research [5,7]. The book “The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century” describes how Fisher devised a randomized trial on the spot to test the contention of a lady that she could taste the difference between tea into which milk had been poured and tea that had been poured into milk (almost according to homeopathic principles) [7].

According to Wikipedia [5] the first published (medical) RCT appeared in the 1948 paper entitled “Streptomycin treatment of pulmonary tuberculosis”. One of the authors, Austin Bradford Hill, is (also) credited as having conceived the modern RCT.

Thus the road to the modern RCT is long, starting with the notions that experiments should be done under controlled conditions and that it doesn’t make sense to base treatment on intuition. Later, experiments were designed in which treatments were compared to placebo (or other treatments) in a randomized and blinded fashion, with concealment of allocation.

Paul Meier was not the inventor of the RCT, but a successful vocal proponent of the RCT. That in itself is commendable enough.

And although the Boing Boing article was incorrect, and many people googling for “father of the RCT” will find the wrong answer from now on, it did raise my interest in the history of the RCT and the role of statisticians in the development of science and clinical trials.
I plan to read a few of the articles and books mentioned below. Like the relatively lighthearted “The Lady Tasting Tea” [7]. You can envision a book review once I have finished reading it.

Note added 15-08, 13.45:

Today a more accurate article appeared in the Boston Globe (“Paul Meier; revolutionized medical studies using math”), which does justice to the important role of Dr Meier in the espousal of randomization as an essential element in clinical trials. For that is what he did.

Quote:

Dr. Meier published a scathing paper in the journal Science, “Safety Testing of Poliomyelitis Vaccine,” in which he described deficiencies in the production of vaccines by several companies. His paper was seen as a forthright indictment of federal authorities, pharmaceutical manufacturers, and the National Foundation for Infantile Paralysis, which funded the research for a polio vaccine.

  1. RIP Paul Meier, father of the randomized trial (boingboing.net)
  2. Paul Meier, Statistician Who Revolutionized Medical Trials, Dies at 87 (nytimes.com)
  3. M L Meldrum A brief history of the randomized controlled trial. From oranges and lemons to the gold standard. Hematology/ Oncology Clinics of North America (2000) Volume: 14, Issue: 4, Pages: 745-760, vii PubMed: 10949771  or see http://www.mendeley.com
  4. Fye WB. The power of clinical trials and guidelines,and the challenge of conflicts of interest. J Am Coll Cardiol. 2003 Apr 16;41(8):1237-42. PubMed PMID: 12706915. Full text
  5. http://en.wikipedia.org/wiki/Randomized_controlled_trial
  6. Stolberg M (2006). Inventing the randomized double-blind trial: The Nuremberg salt test of 1835. JLL Bulletin: Commentaries on the history of treatment evaluation (www.jameslindlibrary.org).
  7. The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century Peter Cummings, MD, MPH, Jama 2001;286(10):1238-1239. doi:10.1001/jama.286.10.1238  Book Review.
    Book by David Salsburg, 340 pp, with illus, $23.95, ISBN 0-7167-41006-7, New York, NY, WH Freeman, 2001.
  8. Kaptchuk TJ. Intentional ignorance: a history of blind assessment and placebo controls in medicine. Bull Hist Med. 1998 Fall;72(3):389-433. PubMed PMID: 9780448. abstract
  9. The best study design for dummies/ (https://laikaspoetnik.wordpress.com: 2008/08/25/)
  10. #Notsofunny: Ridiculing RCT’s and EBM (https://laikaspoetnik.wordpress.com: 2010/02/01/)
  11. RIP Paul Meier : Research Randomization Advocate (mystrongmedicine.com)
  12. If randomized clinical trials don’t show that your woo works, try anthropology! (scienceblogs.com)
  13. The revenge of “microfascism”: PoMo strikes medicine again (scienceblogs.com)




HOT TOPIC: Does Soy Relieve Hot Flashes?

20 06 2011

The theme of the upcoming Grand Rounds, held on June 21st (the 1st day of summer) at Shrink Rap, is “hot”. A bit far-fetched, but aah you know…. shrinks. Of course they hope (assume) that we will express Weiner-like exhibitionism at our blogs. Or go into spicy details of hot sexpectations or other Penis Friday NCBI-ROFL posts. But no, not me, scientist and librarian to my bone marrow. I will stick to boring, solid science and will do a thorough search to find the evidence. Here I will discuss whether soy really helps to relieve hot flashes (also called hot flushes).

…..As illustrated by this HOT picture, I should post as well…..

(CC from Katy Tresedder, Flickr):

Yes, many menopausal women plagued by hot flashes seek relief in soy or other phytoestrogens (estrogen-like chemicals derived from plants). I know, because I happen to have many menopausal women in my circle of friends who prefer taking soy over estrogen. They would rather not take normal hormone replacement therapy, because this can have adverse effects if taken for a longer time. Soy on the other hand is considered a “natural remedy”, and harmless. Probably physiological doses of soy (food) are harmless and therefore a better choice than the similarly “natural” black cohosh, which is suspected of causing liver injury and other adverse effects.

But is soy effective?

I did a quick search in PubMed and found a Cochrane Systematic Review from 2007 that was recently edited with no change to the conclusions.

This review looked at several phytoestrogens that were offered in several ways, as: dietary soy (9x) (powder, cereals, drinks, muffins), soy extracts (9x), red clover extracts (7x, including Promensil (5x)), genistein extract, flaxseed, hop extract and a Chinese medicinal herb.

Thirty randomized controlled trials with a total of 2730 participants met the inclusion criteria: the participants were women in or just before their menopause complaining of vasomotor symptoms (thus having hot flashes) for at least 12 weeks. The intervention was a food or supplement with high levels of phytoestrogens (not any other herbal treatment) and this was compared with placebo, no treatment or hormone replacement therapy.

Only 5 trials using the red clover extract Promensil were homogeneous enough to combine in a meta-analysis. The effect on one outcome (incidence of hot flashes) is shown below. As can be seen at a glance, Promensil had no significant effect, whether given in a low (40 mg/day) or a higher (80 mg/day) dose. This was also true for the other outcomes.

The other phytoestrogen interventions were very heterogeneous with respect to dose, composition and type. This was especially true for the dietary soy treatment. Although some of the trials showed a positive effect of phytoestrogens on hot flashes and night sweats, overall, phytoestrogens were no better than the comparisons.

Most trials were small, of short duration and/or of poor quality. Fewer than half of the studies (n=12) indicated that allocation had been concealed from the trial investigators.

One striking finding was that there was a strong placebo effect in most trials, with a reduction in frequency of hot flashes ranging from 1% to 59%.

I also found another systematic review in PubMed, by Bolaños R et al, that limited itself to soy. Other differences with the Cochrane Systematic Review (besides the much simpler search 😉 ) were: inclusion of more recently published clinical trials, no inclusion of unpublished studies and less strict exclusion on the basis of low methodological quality. Furthermore, genistein was (rightly) considered as a soy product.

The group of studies that used soy dietary supplement showed the highest heterogeneity. Overall, the results “showed a significant tendency” (?) in favor of soy. Nevertheless the authors conclude (similar to the Cochrane authors) that it is still difficult to establish conclusive results given the high heterogeneity found in the studies (but apparently the data could still be pooled?).

References

  • Lethaby A, Marjoribanks J, Kronenberg F, Roberts H, Eden J, & Brown J. (2007). Phytoestrogens for vasomotor menopausal symptoms. Cochrane Database of Systematic Reviews (4) DOI: 10.1002/14651858.CD001395.pub3
  • Bolaños R, Del Castillo A, & Francia J (2010). Soy isoflavones versus placebo in the treatment of climacteric vasomotor symptoms: systematic review and meta-analysis. Menopause (New York, N.Y.), 17 (3), 660-6 PMID: 20464785