RIP Statistician Paul Meier. Proponent not Father of the RCT.

14 08 2011

This headline in Boing Boing caught my eye today:  RIP Paul Meier, father of the randomized trial

Not surprisingly, I knew that Paul Meier (with Kaplan) introduced the Kaplan-Meier estimator (1958), a very important tool for measuring how many patients survive a medical treatment. But I didn’t know he was “father of the randomized trial”….

But is he really? Is he the "father of the randomized trial", "probably best known for the introduction of randomized trials into the evaluation of medical treatments", as Boing Boing states?

Boing Boing’s very short article is based on the New York Times article: Paul Meier, Statistician Who Revolutionized Medical Trials, Dies at 87. According to the NY Times “Dr. Meier was one of the first and most vocal proponents of what is called “randomization.” 

Randomization, the NY Times explains, works as follows:

Under the protocol, researchers randomly assign one group of patients to receive an experimental treatment and another to receive the standard treatment. In that way, the researchers try to avoid unintentionally skewing the results by choosing, for example, the healthier or younger patients to receive the new treatment.

(for a more detailed explanation see my previous posts The best study designs…. for dummies and #NotSoFunny #16 – Ridiculing RCTs & EBM)

Meier was a very successful proponent, that is for sure. According to Sir Richard Peto, Dr. Meier, "perhaps more than any other U.S. statistician, was the one who influenced U.S. drug regulatory agencies, and hence clinical researchers throughout the U.S. and other countries, to insist on the central importance of randomized evidence."

But an advocate need not be a father, for advocates are seldom the inventors/creators. A proponent is more of a nurse, a mentor or a … foster-parent.

Is Meier the true father/inventor of the RCT? And if not, who is?

Googling "Father of the randomized trial" won't help, because all 1,610 hits point to Dr. Meier…. thanks to Boing Boing's careless copying.

What I have read so far doesn't point to one single creator. And the RCT wasn't just suddenly there: it started with comparisons of treatments under controlled conditions. Back in 1753, the British naval surgeon James Lind published his famous account of 12 scurvy patients, "their cases as similar as I could get them", noting that "the most sudden and visible good effects were perceived from the use of the oranges and lemons" and that citrus fruit cured scurvy [3]. The French physician Pierre Louis and the Harvard anatomist Oliver Wendell Holmes (19th century) were also fierce proponents of supporting conclusions about the effectiveness of treatments with statistics, not subjective impressions [4].

But what was the first real RCT?

Perhaps the first real RCT was the Nuremberg salt test (1835) [6]. This was possibly not only the first RCT, but also the first scientific demonstration of the lack of effect of a homeopathic dilution. More than 50 visitors to a local tavern participated in the experiment. Half of them received a vial filled with distilled snow water, the other half a vial with ordinary salt in a homeopathic C30 dilution of distilled snow water. None of the participants knew whether he had received the "actual medicine" or not (blinding). The numbered vials were coded and the code was broken only after the experiment (allocation concealment).

The first publications of RCTs were in the fields of psychology and agriculture. As a matter of fact, another famous statistician, Ronald A. Fisher (of Fisher's exact test), seems to have played a more important role in the genesis and popularization of RCTs than Meier, albeit in agricultural research [5,7]. The book "The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century" describes how Fisher devised a randomized trial on the spot to test a lady's contention that she could taste the difference between tea into which milk had been poured and tea that had been poured into milk (almost according to homeopathic principles) [7].

According to Wikipedia [5] the first published (medical) RCT appeared in the 1948 paper entitled "Streptomycin treatment of pulmonary tuberculosis". One of its authors, Austin Bradford Hill, is (also) credited with having conceived the modern RCT.

Thus the road to the modern RCT is long, starting with the notions that experiments should be done under controlled conditions and that it doesn’t make sense to base treatment on intuition. Later, experiments were designed in which treatments were compared to placebo (or other treatments) in a randomized and blinded fashion, with concealment of allocation.

Paul Meier was not the inventor of the RCT, but a successful vocal proponent of the RCT. That in itself is commendable enough.

And although the Boing Boing article was incorrect, and many people googling for "father of the RCT" will find the wrong answer from now on, it did raise my interest in the history of the RCT and the role of statisticians in the development of science and clinical trials.
I plan to read a few of the articles and books mentioned below, like the relatively lighthearted "The Lady Tasting Tea" [7]. Expect a book review once I have finished reading it.

Note added 15-05, 13:45:

Today a more accurate article appeared in the Boston Globe (“Paul Meier; revolutionized medical studies using math”), which does justice to the important role of Dr Meier in the espousal of randomization as an essential element in clinical trials. For that is what he did.

Quote:

Dr. Meier published a scathing paper in the journal Science, “Safety Testing of Poliomyelitis Vaccine,’’ in which he described deficiencies in the production of vaccines by several companies. His paper was seen as a forthright indictment of federal authorities, pharmaceutical manufacturers, and the National Foundation for Infantile Paralysis, which funded the research for a polio vaccine.

  1. RIP Paul Meier, father of the randomized trial (boingboing.net)
  2. Paul Meier, Statistician Who Revolutionized Medical Trials, Dies at 87 (nytimes.com)
  3. M L Meldrum A brief history of the randomized controlled trial. From oranges and lemons to the gold standard. Hematology/ Oncology Clinics of North America (2000) Volume: 14, Issue: 4, Pages: 745-760, vii PubMed: 10949771  or see http://www.mendeley.com
  4. Fye WB. The power of clinical trials and guidelines,and the challenge of conflicts of interest. J Am Coll Cardiol. 2003 Apr 16;41(8):1237-42. PubMed PMID: 12706915. Full text
  5. http://en.wikipedia.org/wiki/Randomized_controlled_trial
  6. Stolberg M (2006). Inventing the randomized double-blind trial: The Nuremberg salt test of 1835. JLL Bulletin: Commentaries on the history of treatment evaluation (www.jameslindlibrary.org).
  7. The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century Peter Cummings, MD, MPH, Jama 2001;286(10):1238-1239. doi:10.1001/jama.286.10.1238  Book Review.
    Book by David Salsburg, 340 pp, with illus, $23.95, ISBN 0-7167-41006-7, New York, NY, WH Freeman, 2001.
  8. Kaptchuk TJ. Intentional ignorance: a history of blind assessment and placebo controls in medicine. Bull Hist Med. 1998 Fall;72(3):389-433. PubMed PMID: 9780448. abstract
  9. The best study design for dummies/ (https://laikaspoetnik.wordpress.com: 2008/08/25/)
  10. #Notsofunny: Ridiculing RCT’s and EBM (https://laikaspoetnik.wordpress.com: 2010/02/01/)
  11. RIP Paul Meier : Research Randomization Advocate (mystrongmedicine.com)
  12. If randomized clinical trials don’t show that your woo works, try anthropology! (scienceblogs.com)
  13. The revenge of “microfascism”: PoMo strikes medicine again (scienceblogs.com)




Kaleidoscope 2: 2010 wk 31

8 08 2010

Almost a year ago I started a new series, Kaleidoscope, with a "kaleidoscope" of facts, findings, views and news gathered over the previous 1-2 weeks.
It never got beyond the first edition. Perhaps the introduction of this Kaleidoscope was too overwhelming & dazzling: let's say it was very rich in content. Or as
Andrew Spong tweeted: "Part cornucopia, part cabinet of wonders, it's @laikas Kaleidoscope 2009 wk 47".

This is a reprise in a (somewhat) "shorter" format. Let's see how it turns out.

This edition will concentrate on Social Media (blogging, Twitter, Google Wave). I fear that I won't keep that promise if I deal with more topics.

Medical Grand Rounds and News from the Blogosphere

Life in the Fast Lane is the host of this week's Grand Rounds. This edition is truly terrific, if not terrifying. Not only does it contain "killer posts", each medblogger has also been paired with his or her preferred deadly Aussie critter.
Want to know how a full-time ER doctor/educator/textbook author/blogger/editor/health search engine director manages to complete work-related tasks …when the kids are either at school or asleep(!)? Then read this recent interview with Mike Cadogan, the founder of Life in the Fast Lane.

Don't forget to submit your medical blog post to next week's Grand Rounds over at Dispatch From Second Base. Instructions and theme details can be found in the post "You are invited to Grand Rounds!" (update here).

And certainly don't forget to submit your post related to medical information to the MedLibs Round here. More details can be found at Laika's MedLibLog and at Highlight Health, the host of the upcoming edition.
(sorry, writing this post took longer than I thought: you have one day left for submission)

Dr Shock of the blog with the same name advises us to submit good quality, easy-to-understand posts dealing with science, environment or medicine to Scientia Pro Publica via the blog carnival submission form.

There is a new online science blogging community – Scientopia, so far mostly consisting of bloggers who left ScienceBlogs after (but not because of) Pepsigate. New members can only be added to the collective by invitation (?). Obviously, Pepsi researchers will not be invited, but it remains to be seen who will… Hopefully it doesn't become an elitist club.
Virginia Heffernan (NY Times) has an outspoken opinion about the (ex-)sciencebloggers, illustrated by this one-liner:

“ScienceBlogs has become Fox News for the religion-baiting, peak-oil crowd.”

Although I don't appreciate the ranting style of some of the blogs myself (the sub-"South Park" blasphemy style of PZ Myers, as Virginia puts it), I don't think most ScienceBlogs deserve to be labelled as "preoccupied with trivia, name-calling and saber rattling".
See balanced responses at Neurodojo, Neuron Culture & Neuroanthropology (anything with neuro– makes sense, I guess).
Want to understand more about ScienceBlogs and why it was such a terrific community? Then read Bora Z's (rather long) ScienceBlogs farewell post.

Oh.. and there is yet another new science blogging platform: http://www.labspaces.net/, which has evolved from a science news aggregator. It looks slick.

Social Media

Speaking of Twitter: did you know that Twitter reached its 20 billionth tweet over the weekend, a milestone that came just a few months after hitting the 10 billion tweet mark!? (read more in the Guardian)

Well and if you have no idea WHAT THE FUCK IS MY SOCIAL MEDIA “STRATEGY”? you might click the link to get some (new) ideas. You probably need to refresh the site a couple of times to find the right answer.

First-year medical school and master's of medicine students at Stanford University will receive an iPad at the start of the year. The extremely tech-savvy students do appreciate the gift:

“Especially in medicine, we’re using so many different resources, including all the syllabuses and slides. I’m able to pull them up and search them whenever I need to. It’s a fantastic idea.”

Good news for Facebook friends: VoIP giant Vonage has just introduced a new iPhone, iPod touch and Android app that allows users to call their Facebook friends for free (Mashable).

It was a shock – or wasn't it? – that Google pulled the plug on Google Wave (RRW), after it had been available to the general public for only 78 days. It was the unparalleled tool that "could change the web", but it was too complex to be understood. Here are some thoughts on why Google Wave failed. Since much of the code is open source, ambitious developers may pick up where Google left off.

Votes down for the social media site Digg.com: an undercover investigation has exposed that a group of influential conservative members were involved in censorship, deliberately trying to ban progressives by "burying" them (voting them down), which effectively means these progressives don't get enough "diggs" to reach the front page, where most users spend their time.

Votes up for Healthcare Social Media Europe (#HCSMEU), which just celebrated its first birthday.

Miscellaneous

A very strange move: a journal has changed the stated conclusion of a previously published paper after a Reuters Health story about serious shortcomings in the report. Read more about it at Gary Schwitzer's HealthNewsReview Blog.

Finally, for the EBM addicts among us: the Centre for Evidence-Based Medicine has released a new (downloadable) Levels of Evidence table. At the CEBM blog they stress that hierarchies of evidence have been somewhat inflexibly used, but are essentially a heuristic, or short-cut, to finding the likely best evidence. At first sight the new table looks simpler and easier to use.






PubMed versus Google Scholar for Retrieving Evidence

8 06 2010

A while ago a resident in dermatology told me she got many hits out of PubMed, but zero results out of TRIP. It appeared she had used the same search for both databases: alopecia areata and diphencyprone (a drug with a lot of synonyms). Searching TRIP for alopecia (in the title) only, we found a Cochrane Review and a relevant NICE guideline.

Each search engine has its own search and index features. When comparing databases one should compare "optimal" searches and keep in mind for what purpose the search engines were designed. TRIP is most suited to searching aggregate evidence, whereas PubMed is most suited to searching individual biomedical articles.

Michael Anders and Dennis Evans ignore this rule of thumb in their recent paper "Comparison of PubMed and Google Scholar Literature Searches". And this is not the only shortcoming of the paper.

The authors performed searches on 3 different topics to compare PubMed and Google Scholar search results. Their main aim was to see which database was the most useful to find clinical evidence in respiratory care.

Well quick guess: PubMed wins…

The 3 respiratory care topics were selected from a list of systematic reviews on the Website of the Cochrane Collaboration and represented in-patient care, out-patient care, and pediatrics.

The references in the three chosen Cochrane systematic reviews served as a "reference" (or gold) standard. However, abstracts, conference proceedings, and responses to letters were excluded.

So far so good. But note that the outcome of the study only allows us to draw conclusions about intervention questions, which are typically answered by controlled clinical trials. Other principles may apply to other domains (diagnosis, etiology/harm, prognosis) or to other types of studies. And it certainly doesn't apply to non-EBM topics.

The authors designed ONE search for each topic, taking 2 common clinical terms from the title of each Cochrane review connected by the Boolean operator AND (see below; the quotation marks were not part of the searches). No synonyms were used, and the translation of the searches in PubMed wasn't checked (luckily the mapping was rather good).

“Mmmmm…”

The topics and search terms were:

  • Noninvasive positive-pressure ventilation for cardiogenic pulmonary edema: "noninvasive positive-pressure ventilation" AND "pulmonary edema"
  • Self-management education and regular practitioner review for adults with asthma: "asthma" AND "education"
  • Ribavirin for respiratory syncytial virus: "ribavirin" AND "respiratory syncytial virus"

In PubMed they applied the narrow methodological filter, or Clinical Query, for the domain therapy.
This prefab search strategy (randomized controlled trial[Publication Type] OR (randomized[Title/Abstract] AND controlled[Title/Abstract] AND trial[Title/Abstract])), developed by Haynes, is suitable for quickly detecting the available evidence (provided one is looking for RCTs and isn't doing an exhaustive search). (see previous posts 2, 3, 4)
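As an aside, such a PubMed search is easy to reproduce programmatically. The snippet below is my own illustrative sketch (not code from the paper): it combines the topic terms with the narrow therapy filter and asks NCBI's public E-utilities esearch endpoint for the hit count. The function and variable names are mine, and current counts will of course differ from the 2010 figures.

```python
import requests

# Haynes' narrow therapy filter, as quoted above.
NARROW_THERAPY = (
    '(randomized controlled trial[Publication Type] OR '
    '(randomized[Title/Abstract] AND controlled[Title/Abstract] '
    'AND trial[Title/Abstract]))'
)

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_hit_count(topic_terms):
    """Return the number of PubMed records matching topic_terms AND the narrow therapy filter."""
    params = {
        "db": "pubmed",
        "term": f"({topic_terms}) AND {NARROW_THERAPY}",
        "retmode": "json",
        "retmax": 0,  # we only need the count, not the record IDs
    }
    response = requests.get(ESEARCH, params=params, timeout=30)
    response.raise_for_status()
    return int(response.json()["esearchresult"]["count"])

if __name__ == "__main__":
    # One of the three topic searches used in the paper.
    print(pubmed_hit_count('"ribavirin" AND "respiratory syncytial virus"'))
```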

Google Scholar, as we all probably know, does not have such methodological filters. The authors "limited" their search by using the Advanced option, entering the 2 search terms in the "Find articles….with all of the words" box (so this is a Boolean AND), and restricting the search to the subject area "Medicine, Pharmacology, and Veterinary Science".

They did a separate search for publications that were available at their library, which has limited value for others, since subscriptions differ per library.

Next they determined the sensitivity (the number of relevant records retrieved as a proportion of the total number of records in the gold standard) and the precision, or positive predictive value: the fraction of returned records that are true positives (explained in 3).
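In plain terms (a minimal sketch of my own, not taken from the paper), with the gold standard and the retrieved records represented as sets:

```python
# Illustrative only: recall (sensitivity) and precision computed from two sets of record IDs.
def sensitivity(retrieved, gold_standard):
    """Share of the gold-standard records that the search retrieved."""
    return len(set(retrieved) & set(gold_standard)) / len(gold_standard)

def precision(retrieved, gold_standard):
    """Share of the retrieved records that belong to the gold standard (positive predictive value)."""
    return len(set(retrieved) & set(gold_standard)) / len(retrieved)

# Hypothetical numbers: a gold standard of 12 trials and a search returning 80 hits,
# 7 of which are gold-standard trials, gives sensitivity 7/12 ≈ 0.58 and precision 7/80 ≈ 0.09.
gold = {f"trial-{i}" for i in range(12)}
hits = {f"trial-{i}" for i in range(7)} | {f"noise-{i}" for i in range(73)}
print(round(sensitivity(hits, gold), 2), round(precision(hits, gold), 2))
```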

Let me guess: sensitivity might be equal or somewhat higher, and precision is undoubtedly much lower, in Google Scholar. This is because Google Scholar:

  • often searches the full text instead of just the abstract, title and (added) keywords/MeSH;
  • inflates the results by finding one and the same reference cited in many different papers (which may not directly deal with the subject);
  • offers no limits on methodology, study type or "evidence";
  • has no automatic mapping and explosion (which in PubMed provides a way to find more synonyms and thus more relevant studies);
  • has broader coverage (grey literature, books, more topics);
  • lags behind PubMed in receiving updates from MEDLINE.

Results: PubMed and Google Scholar had pretty much the same recall, except for ribavirin and RSV, where recall was higher in PubMed: PubMed found 100% (12/12) of the included trials, Google Scholar 58% (7/12).

The paper offers no discussion of why. Since Google Scholar should at least find the words in the titles and abstracts of PubMed records, I repeated the search in PubMed restricted to the title/abstract field – ribavirin[tiab] AND respiratory syncytial virus[tiab]* – and limited it with the narrow therapy filter: I found 26 papers instead of 32. The following titles were missing when I searched title and abstract only (between brackets: the relevant MeSH term that explains why the paper was found in the original search, and whether the record lacks an abstract, so that only title and MeSH were searchable, or is a letter):

  1. Evaluation by survival analysis on effect of traditional Chinese medicine in treating children with respiratory syncytial viral pneumonia of phlegm-heat blocking Fei syndrome. [MeSH: Respiratory Syncytial Virus Infections/]
  2. Ribavarin in ventilated respiratory syncytial virus bronchiolitis: a randomized, placebo-controlled trial. [MeSH: Respiratory Syncytial Virus Infections/; NO ABSTRACT, LETTER]
  3. Study of interobserver reliability in clinical assessment of RSV lower respiratory illness. [MeSH: Respiratory Syncytial Virus Infections*]
  4. Ribavirin for severe RSV infection. N Engl J Med. [MeSH: Respiratory Syncytial Viruses; NO ABSTRACT, LETTER]
  5. Stutman HR, Rub B, Janaim HK. New data on clinical efficacy of ribavirin. [MeSH: Respiratory Syncytial Viruses; NO ABSTRACT]
  6. Clinical studies with ribavirin. [MeSH: Respiratory Syncytial Viruses; NO ABSTRACT]

Three of the papers had the additional MeSH term respiratory syncytial viruses and the other three respiratory syncytial virus infections. Although not all of these papers may be relevant (2 are comments/letters), it illustrates why PubMed may yield results that are not retrieved by Google Scholar (if one doesn't use synonyms).

In contrast to Google Scholar, PubMed translates the search ribavirin AND respiratory syncytial virus so that the MeSH terms "ribavirin"[MeSH Terms], "respiratory syncytial viruses"[MeSH Terms] and (indirectly) "respiratory syncytial virus infections"[MeSH] are also searched.

Thus, in Google Scholar, articles with terms like RSV and respiratory syncytial viral pneumonia (or with unspecific wording, like clinical efficacy) could have been missed with the above-mentioned search.

The other result of the study (the results section comprises 3 sentences) is that "For each individual search, PubMed had better precision".

The precision was 59/467 (13%) in PubMed and 57/80,730 (0.07%) in Google Scholar (p<0.001)!!
(note: they had to add author names to the Google Scholar search to find the papers in the haystack 😉 )

Héhéhé, how surprising. Is it any wonder, then, that no clinician or librarian would ever think of using Google Scholar as the primary, let alone the only, source to search for medical evidence?
It should also ring a bell that [QUOTE**]:
"In the Cochrane reviews the researchers retrieved information from multiple databases, including MEDLINE, the Cochrane Airways Group trial register (derived from MEDLINE)***, CENTRAL, EMBASE, CINAHL, DARE, NHSEED, the Acute Respiratory Infections Group's specialized register, and LILACS…"
Note that Google Scholar isn't mentioned as a source! Google Scholar is only recommendable for finding work that cites (already found) relevant articles (so-called forward searching), and then only if one doesn't have access to Web of Science or Scopus. Thus only to catch the last fish.

Perhaps the paper would have been more interesting if the authors had looked at any ADDED VALUE of Google Scholar when exhaustively searching for evidence. Then it would have been crucial to look for grey literature too (instead of excluding it), because this could be a strong point of Google Scholar. Furthermore, one could have investigated whether forward searching yielded extra papers.

The higher precision of PubMed is attributed to the narrow therapy filter used, but the vastly lower precision of Google Scholar is also due to its searching of the full text, including the reference lists.

For instance, searching for ribavirin AND respiratory syncytial virus in PubMed yields 523 hits. This can be reduced to 32 hits by applying the narrow therapy filter: a reduction by a factor of 16.
Yet a similar search in Google Scholar yields 4,080 hits. Thus, even without the filter, Google Scholar returns almost 8 times more hits than PubMed.

That evokes another research idea: what would have happened if randomized (OR randomised) had been added to the Google Scholar search? Would this have increased the precision? In the case of the above search it lowers the yield by a factor of 2, and the first hits look very relevant.

It is really funny, but the authors undermine their own conclusion – "These results are important because efficient retrieval of the best available scientific evidence can inform respiratory care protocols, recommendations for clinical decisions in individual patients, and education, while minimizing information overload." – by saying elsewhere that "It is unlikely that users consider more than the first few hundred search results, so RTs who conduct literature searches with Google Scholar on these topics will be much less likely to find references cited in Cochrane reviews."

Indeed, no one would take it into their head to try to pick the relevant papers out of those 4,080 retrieved hits. So what is this study worth from a practical point of view?

Well anyway, just as you can ask for the sake of asking, you can do research for the sake of researching. Despite being an EBM addict, I prefer a good subjective overview of this topic to a scientifically weak, quasi-evidence-based research paper.

Does this mean Google Scholar is useless? Does it mean that all those PhD’s hooked on Google Scholar are wrong?

No, Google Scholar serves certain purposes.

Just like the example of PubMed and TRIP, you need to know what is in it for you and how to use it.

I used Google Scholar when I was a researcher:

  • to quickly find a known reference
  • to find citing papers
  • to get an idea of how often articles have been cited / to find the most relevant papers in a quick and dirty way (i.e. by browsing)
  • for quick and dirty searches, by putting word strings between quotation marks
  • to search the full text. I used quite extensive searches to find out what methods were used (for instance methods AND (synonym1 OR syn2 OR syn3)). An interesting possibility is to do a second search for only the last few words of a retrieved snippet: this will often reveal the next words in the sentence. Often you can repeat this trick, reading a piece of the paper without needing access.

If you want to know more about the pros and cons of Google Scholar, I recommend the recent overview by the expert librarian Dean Giustini: "Sure Google Scholar is ideal for some things" [7]. He also compiled a "Google Scholar bibliography" with ~115 articles as of May 2010.

Speaking of librarians, why was the study performed by PhD RRT (RN)'s, and why wasn't the university librarian involved?****

* this is a search string, and stricter than respiratory AND syncytial AND virus
** abbreviations used instead of full (database) names
*** this is wrong: a register contains references to controlled clinical trials from EMBASE, CINAHL and all kinds of other databases in addition to MEDLINE.
**** other than to read the manuscript afterwards.

References

  1. Anders ME, & Evans DP (2010). Comparison of PubMed and Google Scholar Literature Searches. Respiratory care, 55 (5), 578-83 PMID: 20420728
  2. This Blog: https://laikaspoetnik.wordpress.com/2009/11/26/adding-methodological-filters-to-myncbi/
  3. This Blog: https://laikaspoetnik.wordpress.com/2009/01/22/search-filters-1-an-introduction/
  4. This Blog: https://laikaspoetnik.wordpress.com/2009/06/30/10-1-pubmed-tips-for-residents-and-their-instructors/
  5. NeuroDojo (2010/05) Pubmed vs Google Scholar? [also gives a nice overview of pros and cons]
  6. GenomeWeb (2010/05/10) Content versus interface at the heart of Pubmed versus Scholar?/ [response to 5]
  7. The Search principle Blog (2010/05) Sure Google Scholar is ideal for some things.




An Evidence Pyramid that Facilitates the Finding of Evidence

20 03 2010

Earlier I described that there are so many search and EBM pyramids that it gets confusing. I distinguished 3 categories of pyramids:

  1. Search Pyramids
  2. Pyramids of EBM-sources
  3. Pyramids of EBM-levels (levels of evidence)

In my courses, where I train doctors and medical students to find evidence quickly, I use a pyramid that is a mixture of 1 and 2. This is a slide from a 2007 course.

This pyramid consists of 4 layers (from top down):

  1. EBM-(evidence based) guidelines.
  2. Synopses & Syntheses*: a synopsis is a summary and critical appraisal of one article, whereas a synthesis is a summary and critical appraisal of a topic (which may answer several questions and may cover many articles).
  3. Systematic Reviews (a systematic summary and critical appraisal of original studies) which may or may not include a meta-analysis.
  4. Original Studies.

The upper 3 layers represent "Aggregate Evidence". This is evidence from secondary sources that search, summarize and critically appraise original studies (the lowest layer of the pyramid).

The layers do not necessarily represent the levels of evidence and should not be confused with Pyramids of EBM-levels (type 3). An Evidence Based guideline can have a lower level of evidence than a good systematic review, for instance.
The present pyramid is only meant to lead the way through the labyrinth of sources, and thus to speed up the process of searching. The relevance and the quality of the evidence should always be checked.

The idea is:

  • The higher the level in the pyramid, the fewer publications it contains (the narrower it becomes).
  • Each level summarizes and critically appraises the underlying levels.

I advise people to try to find aggregate evidence first, thus to drill down (hence the drill in the Figure).

The advantage: faster results and a lower number needed to read (NNR).

During the first courses I gave, I just made a pyramid in Word with the links to the main sources.

Our library ICT department converted it into a HTML document with clickable links.

However, although the pyramid looked quite complex, not all main evidence sources were included. Plus, some sources belong to more than one layer: the TRIP Database, for instance, searches sources from all layers.

Our ICT-department came up with a much better looking and better functioning 3-D pyramid, with databases like TRIP in the sidebar.

Moving the  mouse over a pyramid layer invokes a pop-up with links to the databases belonging to that layer.

Furthermore, the sources included in the pyramid differ per specialty. For the department of Gynecology, for example, we include POPLINE and MIDIRS in the lowest layer, and the RCOG and NVOG (Dutch) guidelines in the EBM-guidelines layer.

Together my colleagues and I decide whether a source is evidence based (we don't include UpToDate, for instance) and where it belongs. Each clinical librarian (we all serve different departments) then decides which databases to include. Clients can give suggestions.
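For those who wonder what sits behind such a pyramid technically: the content boils down to a small per-specialty mapping from layer to sources, which the pop-ups simply read out. The sketch below is purely illustrative; the layer and source names are taken from this post, but the structure and code are my own and not our ICT department's actual implementation.

```python
# Illustrative sketch only: a per-specialty mapping of pyramid layers to evidence sources.
PYRAMID = {
    "gynecology": {
        "EBM guidelines": ["RCOG guidelines", "NVOG guidelines (Dutch)"],
        "Synopses & syntheses": [],   # filled in by the clinical librarian for this specialty
        "Systematic reviews": ["Cochrane Database of Systematic Reviews"],
        "Original studies": ["PubMed", "POPLINE", "MIDIRS"],
    },
}

# Sources such as the TRIP Database search all layers, so they live in the sidebar, not in one layer.
SIDEBAR = ["TRIP Database"]

def popup_links(specialty, layer):
    """Return the links shown in the pop-up when the mouse hovers over one pyramid layer."""
    return PYRAMID.get(specialty, {}).get(layer, [])

print(popup_links("gynecology", "EBM guidelines"))
```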

Below is a short YouTube video showing how this pyramid can be used. Because of the rather poor quality, the video is best viewed in full-screen mode.
There is no audio (yet), so in short this is what you see:

Made with Screenr:  http://screenr.com/8kg

The pyramid is highly appreciated by our clients and students.

But it is just a start. My dream is to visualize the entire pathway from question to PICO, checklists, FAQs and database of results per type of question/reason for searching (fast question, background question, CAT etc.).

I’m just waiting for someone to fulfill the technical part of this dream.

————–

*Note that there may be different definitions as well. The top layers in the 5S pyramid of Brian Haynes are defined as follows: syntheses & synopses (succinct descriptions of selected individual studies or systematic reviews, such as those found in the evidence-based journals), summaries, which integrate the best available evidence from the lower layers to develop practice guidelines based on a full range of evidence (e.g. Clinical Evidence, National Guidelines Clearinghouse), and, at the peak of the model, systems, in which the individual patient's characteristics are automatically linked to the current best evidence that matches the patient's specific circumstances and the clinician is provided with key aspects of management (e.g. computerised decision support systems).

Begin with the richest source of aggregate (pre-filtered) evidence and work your way down, in order to decrease the number needed to read: there are fewer EBM guidelines than there are systematic reviews and (certainly) individual papers.




#NotSoFunny #16 – Ridiculing RCTs & EBM

1 02 2010

I remember it well. As a young researcher I presented my findings in one of my first talks, at the end of which the chair killed my work with a remark that made the whole room of scientists laugh, but was really beside the point. My supervisor, a truly original and very wise scientist, suppressed his anger. Afterwards he said: "it is very easy to ridicule something that isn't a mainstream thought. It's the argument that counts. We will prove that we are right." …And we did.

This was not my only encounter with scientists who try to win the debate by making fun of a theory, a finding or …people. But it is not only the witty scientist who is to *blame*, it is also the uncritical audience that just swallows it.

I have similar feelings with some journal articles or blog posts that try to ridicule EBM – or any other theory or approach. Funny, perhaps, but often misunderstood and misused by “the audience”.

Take for instance the well known spoof article in the BMJ:

“Parachute use to prevent death and major trauma related to gravitational challenge: systematic review of randomised controlled trials”

It is one of those Christmas spoof articles in the BMJ, meant to inject some medical humor into the normally serious scientific literature. The spoof parachute article pretends to be a systematic review of RCTs investigating whether parachutes can prevent death and major trauma. Of course, no such trial has been done or ever will be done: dropping people at random with and without a parachute to prove that you had better jump out of a plane with a parachute.

I found the article only mildly amusing. It is so unrealistic that it becomes absurd. Not that I don't enjoy absurdities at times, but absurdities should not assume a life of their own. In this way it doesn't evoke a true discussion, but only worsens the prejudice some people already have.

People keep referring to this 2003 article. Last Friday, Dr. Val (with whom I mostly agree) devoted a Friday Funny post to it at Get Better Health: “The Friday Funny: Why Evidence-Based Medicine Is Not The Whole Story”.* In 2008 the paper was also discussed by Not Totally Rad [3]. That EBM is not the whole story seems pretty obvious to me. It was never meant to be…

But let's get specific. Which assumptions about RCTs and SRs are wrong, twisted or taken out of context? Please read the excellent comments below the article. These often hit the nail on the head.

1. EBM is cookbook medicine.
Many define EBM as "making clinical decisions based on a synthesis of the best available evidence about a treatment" (e.g. [3]). However, EBM is not cookbook medicine.

The accepted definition of EBM is "the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients" [4]. Sackett already emphasized back in 1996:

Good doctors use both individual clinical expertise and the best available external evidence, and neither alone is enough. Without clinical expertise, practice risks becoming tyrannised by evidence, for even excellent external evidence may be inapplicable to or inappropriate for an individual patient. Without current best evidence, practice risks becoming rapidly out of date, to the detriment of patients.


2. RCT’s are required for evidence.

Although a well-performed RCT provides the "best" evidence, RCTs are often not appropriate or indicated. That is especially true for domains other than therapy. For prognostic questions the most appropriate study design is usually an inception cohort. An RCT, for instance, can't tell whether female age is a prognostic factor for clinical pregnancy rates following IVF: there is no way to randomize for "age" or for "BMI". 😉

The same is true for etiologic or harm questions. In theory, the "best" answer is obtained by an RCT. However, RCTs are often unethical or unnecessary. RCTs are out of the question to address whether substance X causes cancer; observational studies will do. Sometimes cases provide sufficient evidence: if a woman gets hepatic veno-occlusive disease after drinking loads of a herbal tea, the finding of similar cases in the literature may be sufficient to conclude that the herbal tea probably caused the disease.

Diagnostic accuracy studies also require another study design (cross-sectional study, or cohort).

But even in the case of interventions, we can settle for less than an RCT. Evidence is not simply present or absent, but exists on a hierarchy. RCTs (if well performed) are the most robust, but if they are not available we have to rely on "lower" evidence.

BMJ Clinical Evidence even made a list of clinical questions unlikely to be answered by RCTs. In such cases Clinical Evidence searches for and includes the best appropriate form of evidence:

  1. where there are good reasons to think the intervention is not likely to be beneficial or is likely to be harmful;
  2. where the outcome is very rare (e.g. a 1/10000 fatal adverse reaction);
  3. where the condition is very rare;
  4. where very long follow up is required (e.g. does drinking milk in adolescence prevent fractures in old age?);
  5. where the evidence of benefit from observational studies is overwhelming (e.g. oxygen for acute asthma attacks);
  6. when applying the evidence to real clinical situations (external validity);
  7. where current practice is very resistant to change and/or patients would not be willing to take the control or active treatment;
  8. where the unit of randomisation would have to be too large (e.g. a nationwide public health campaign); and
  9. where the condition is acute and requires immediate treatment.

Of these, only the first case is categorical. For the rest, the cut-off point at which an RCT is no longer appropriate is not precisely defined.

3. Informed health decisions should be based on good science rather than EBM (alone).

Dr Val [2]: “EBM has been an over-reliance on “methodolatry” – resulting in conclusions made without consideration of prior probability, laws of physics, or plain common sense. (….) Which is why Steve Novella and the Science Based Medicine team have proposed that our quest for reliable information (upon which to make informed health decisions) should be based on good science rather than EBM alone.

Methodolatry is the profane worship of the randomized clinical trial as the only valid method of investigation. The previous sections have already shown that EBM is not guilty of this.

The name “Science Based Medicine” suggests that it is opposed to “Evidence Based Medicine”. At their blog David Gorski explains: “We at SBM believe that medicine based on science is the best medicine and tirelessly promote science-based medicine through discussion of the role of science and medicine.”

While this may apply to a certain extent to quackery or homeopathy (the focus of SBM), there are many examples of the opposite: science or common sense leading to interventions that were ineffective or even damaging.

As a matter of fact, many side effects are not foreseen, and few in vitro or animal experiments have led to successful new treatments.

In the end, what is most relevant to the patient is that "it works" (and that the benefits outweigh the harms).

Furthermore, EBM is not – or should not be – without consideration of prior probability, the laws of physics, or plain common sense. To me, SBM and EBM are not mutually exclusive.

Why the example is unfair and unrealistic

I'll leave it to the following comments (and yes, the choice is biased) [1]:

Nibu A George, Scientist:

First of all generalizing such reports of some selected cases and making it a universal truth is unhealthy and challenging the entire scientific community. Secondly, the comparing the parachute scenario with a pure medical situation is unacceptable since the parachute jump is rather a physical situation and it become a medical situation only if the jump caused any physical harm to the person involved.

Richard A. Davidson, MD, MPH:

This weak attempt at humor unfortunately reinforces one of the major negative stereotypes about EBM….that RCT’s are required for evidence, and that observational studies are worthless. If only 10% of the therapies that are paraded in front of us by journals were as effective as parachutes, we would have much less need for EBM. The efficacy of most of our current therapies are only mildly successful. In fact, many therapies can provide only a 25% or less therapeutic improvement. If parachutes were that effective, nobody would use them.
While it’s easy enough to just chalk this one up to the cliche of the cantankerous British clinician, it shows a tremendous lack of insight about what EBM is and does. Even worse, it’s just not funny.

Aviel Roy-Shapira, Senior Staff Surgeon

Smith and Pell succeeded in amusing me, but I think their spoof reflects a common misconception about evidence based medicine. All too many practitioners equate EBM with randomized controlled trials, and metaanalyses.
EBM is about what is accepted as evidence, not about how the evidence is obtained. For example, an RCT which shows that a given drug lowers blood pressure in patients with mild hypertension, however well designed and executed, is not acceptable as a basis for treatment decisions. One has to show that the drug actually lowers the incidence of strokes and heart attacks.
RCT’s are needed only when the outcome is not obvious. If most people who fall from airplanes without a parachute die, this is good enough. There is plenty of evidence for that.

EBM is about using outcome data for making therapeutic decisions. That data can come from RCTs but also from observation

Lee A. Green, Associate Professor

EBM is not RCTs. That’s probably worth repeating several times, because so often both EBM’s detractors and some of its advocates just don’t get it. Evidence is not binary, present or not, but exists on a heirarchy (Guyatt & Rennie, 2001). (….)
The methods and rigor of EBM are nothing more or less than ways of correcting for our
imperfect perceptions of our experiences. We prefer, cognitively, to perceive causal connections. We even perceive such connections where they do not exist, and we do so reliably and reproducibly under well-known sets of circumstances. RCTs aren’t holy writ, they’re simply a tool for filtering out our natural human biases in judgment and causal attribution. Whether it’s necessary to use that tool depends upon the likelihood of such bias occurring.

Scott D Ramsey, Associate Professor

Parachutes may be a no-brainer, but this article is brainless.

Unfortunately, there are few if any parallels to parachutes in health care. The danger with this type of article is that it can lead to labeling certain medical technologies as “parachutes” when in fact they are not. I’ve already seen this exact analogy used for a recent medical technology (lung volume reduction surgery for severe emphysema). In uncontrolled studies, it quite literally looked like everyone who didn’t die got better. When a high quality randomized controlled trial was done, the treatment turned out to have significant morbidity and mortality and a much more modest benefit than was originally hypothesized.

Timothy R. Church, Professor

On one level, this is a funny article. I chuckled when I first read it. On reflection, however, I thought “Well, maybe not,” because a lot of people have died based on physicians’ arrogance about their ability to judge the efficacy of a treatment based on theory and uncontrolled observation.

Several high profile medical procedures that were “obviously” effective have been shown by randomized trials to be (oops) killing people when compared to placebo. For starters to a long list of such failed therapies, look at antiarrhythmics for post-MI arrhythmias, prophylaxis for T. gondii in HIV infection, and endarterectomy for carotid stenosis; all were proven to be harmful rather than helpful in randomized trials, and in the face of widespread opposition to even testing them against no treatment. In theory they “had to work.” But didn’t.

But what the heck, let’s play along. Suppose we had never seen a parachute before. Someone proposes one and we agree it’s a good idea, but how to test it out? Human trials sound good. But what’s the question? It is not, as the author would have you believe, whether to jump out of the plane without a parachute or with one, but rather stay in the plane or jump with a parachute. No one was voluntarily jumping out of planes prior to the invention of the parachute, so it wasn’t to prevent a health threat, but rather to facilitate a rapid exit from a nonviable plane.

Another weakness in this straw-man argument is that the physics of the parachute are clear and experimentally verifiable without involving humans, but I don’t think the authors would ever suggest that human physiology and pathology in the face of medication, radiation, or surgical intervention is ever quite as clear and predictable, or that non-human experience (whether observational or experimental) would ever suffice.

The author offers as an alternative to evidence-based methods the “common sense” method, which is really the “trust me, I’m a doctor” method. That’s not worked out so well in many high profile cases (see above, plus note the recent finding that expensive, profitable angioplasty and coronary artery by-pass grafts are no better than simple medical treatment of arteriosclerosis). And these are just the ones for which careful scientists have been able to do randomized trials. Most of our accepted therapies never have been subjected to such scrutiny, but it is breathtaking how frequently such scrutiny reveals problems.

Thanks, but I’ll stick with scientifically proven remedies.

parachute experiments without humans

* on the same day as I posted Friday Foolery #15: The Man who pioneered the RCT. What a coincidence.

** Don’t forget to read the comments to the article. They are often excellent.

Photo Credits

References

  1. Smith, G. (2003). Parachute use to prevent death and major trauma related to gravitational challenge: systematic review of randomised controlled trials BMJ, 327 (7429), 1459-1461 DOI: 10.1136/bmj.327.7429.1459
  2. The Friday Funny: Why Evidence-Based Medicine Is Not The Whole Story”. (getbetterhealth.com) [2010.01.29]
  3. Call for randomized clinical trials of Parachutes (nottotallyrad.blogspot.com) [08-2008]
  4. Sackett DL, Rosenberg WM, Gray JA, Haynes RB, & Richardson WS (1996). Evidence based medicine: what it is and what it isn’t. BMJ (Clinical research ed.), 312 (7023), 71-2 PMID: 8555924




#Cochrane Colloquium 2009: Better Working Relationship between Cochrane and Guideline Developers

19 10 2009

Last week I attended the annual Cochrane Colloquium in Singapore. I will summarize some of the meetings.

Here is a summary of an interesting (parallel) special session: Creating a closer working relationship between Cochrane and Guideline Developers. This session was brought together as a partnership between the Guidelines International Network (G-I-N) and The Cochrane Collaboration to look at the current experience of guideline developers and their use of Cochrane reviews (see abstract).

Emma Tavender of the EPOC Australian Satellite, Australia, reported on the survey carried out by the UK Cochrane Centre to identify the use of Cochrane reviews in guidelines produced in the UK (I did not attend this presentation).

Pwee Keng Ho, Ministry of Health, Singapore, is leading the Health Technology Assessment (HTA) and guideline development program of the Singapore Ministry of Health. He spoke about the issues faced as a guideline developer using Cochrane reviews, or – in his own words – his task was "to summarize whether guideline developers like Cochrane systematic reviews or not".

Keng Ho presented the results of 3 surveys of different guideline developers. Most surveys had very few respondents: 12-29, if I remember correctly.

Each survey had approximately the same questions, but in a different order. On the face of it, the 3 surveys gave the same picture.

Main points:

  • some guideline developers are not familiar with Cochrane Systematic Reviews
  • others have no access to them.
  • of those who are familiar with the Cochrane reviews and do have access to them, most found the reviews useful and reliable (in one survey half of the respondents were neutral)
  • most importantly they actually did use the Cochrane reviews for most of their guidelines.
  • these guideline developers also used the Cochrane methodology to make their guidelines (whereas most physicians are not inclined to use the exhaustive search strategies and systematic approach of the Cochrane Collaboration)
  • An often-heard criticism from guideline developers concerned the non-comprehensive coverage of topics by Cochrane reviews. However, unlike in Western countries, the Singapore Ministry of Health mentioned acupuncture and herbs as missing topics (for certain diseases).

This incomplete coverage, caused by a choice of subjects that is not demand-driven, was a recurrent topic at this meeting and a main issue recognized by the entire Cochrane community. Therefore, priority setting of Cochrane systematic reviews is one of the main topics addressed at this Colloquium and in the Cochrane strategic review.

Kay Dickersin of the US Cochrane Center, USA, reported on the issues raised at the stakeholders meeting held in June 2009 in the US (see here for agenda) on whether systematic reviews can effectively inform guideline development, with a particular focus on areas of controversy and debate.

The stakeholder summit concentrated on using quality systematic reviews for guidelines. This is different from effectiveness research, for which the Institute of Medicine (IOM) sets the standards: local and specialist guidelines require a different expertise and approach.

All kinds of people are involved in the development of guidelines, i.e. nurses, consumers, physicians.
Important issues to address, point by point:

  • Some may not understand the need to be systematic
  • How to get physicians on board: they are not very comfortable with extensive searching and systematic work
  • Ongoing education, like how-to workshops, is essential
  • What to do if there is no evidence?
  • More transparency; handling conflicts of interest
  • Guidelines differ, including the rating of the evidence. Almost everyone in the Stakeholders meeting used GRADE to grade the evidence, but not as it was originally described. There were numerous variations on the same theme. One question is whether there should be one system or not.
  • Another -recurrent- issue was that Guidelines should be made actionable.

Here are podcasts covering the meeting

Gordon Guyatt, McMaster University, Canada, gave  an outline of the GRADE approach and the purpose of ‘Summary of Findings’ tables, and how both are perceived by Cochrane review authors and guideline developers.

Gordon Guyatt, whose magnificent book "Users' Guides to the Medical Literature" (JAMA-Evidence) lies on my desk, was clearly in favor of adherence to the original GRADE guidelines. Forty organizations have adopted these GRADE guidelines.

GRADE stands for "Grading of Recommendations Assessment, Development and Evaluation". It is a system used for grading evidence when submitting a clinical guidelines article. Six articles in the BMJ are specifically devoted to GRADE (see here for one (full text); and 2 (PubMed)). GRADE not only takes the rigor of the methods into account, but also the balance between the benefits and the risks, burdens, and costs.

Suppose a guideline recommends thrombolysis to treat disease X, because good-quality small RCTs show thrombolysis to be slightly but significantly more effective than heparin in this disease. By relying only on direct evidence from the RCTs, it is not taken into account that observational studies have long shown that thrombolysis increases the risk of massive bleeding in diseases Y and Z. Clearly the risk of harm is the same in disease X: both benefits and harms should be weighed.
Guyatt gave several other examples illustrating the importance of grading the evidence and of the understandable overview presented in the Summary of Findings table.

Another issue is that guideline makers are distressingly ready to embrace surrogate endpoints instead of outcomes that are more relevant to the patient. For instance, it is not very meaningful if angiographic outcomes are improved, but mortality or the recurrence of cardiovascular disease is not.
GRADE takes into account whether indirect evidence is used: it downgrades the evidence rating. Downgrading also occurs in the case of low-quality RCTs or an unfavourable trade-off of benefits versus harms.

Guyatt pleaded for uniform use of GRADE, and advised everybody to get comfortable with it.

Although I must say that it can feel somewhat uncomfortable to assign absolute ratings to non-absolute differences: these are really man-made formulas that people have agreed upon. On the other hand, it is a good thing that it is not only the outcome of the RCTs with respect to benefits (sometimes of surrogate markers) that counts.

A final remark of Guyatt's: "Everybody makes the claim they are following an evidence-based approach, but you have to teach them what that really means."
Indeed, many people talk about their findings and/or recommendations being evidence based, because "EBM sells well", but upon closer examination many reports are hardly worth the name.





UpToDate or Dynamed?

5 07 2009

Guest author: Shamsha Damani (@shamsha)
Submission for the July Medlib’s Round

Doctors and other healthcare providers are busy folks. They often don’t have time to go through all the primary literature, find the best evidence, critique it and apply it to their patients in real-time. This is where point-of-care resources shine and make life a bit easier. There are several such tools out there, but the two that I use on a regular basis are UpToDate and DynaMed. There are others like InfoPoems, ACP’s PIER, MD Consult and BMJ’s Point of Care. I often get asked which ones are the best to use and why. The librarian answer to this question: depends on what you are looking for! Not a fair answer I admit, so I wanted to highlight some pros and cons of UpToDate and DynaMed to help you better determine what route to take the next time you find yourself in need of a quick answer to a clinical question.

UpToDate

Pros:

  • Comprehensive coverage
  • Easy-to-read writing style
  • The introduction of grading the evidence is certainly very welcome!

Cons:

  • Expensive
  • Conflict of interest policy a bit perplexing
  • Search feature could use a makeover
  • Remote access at a high premium
  • Not accessible via smart phones
  • They didn’t come to MLA’09 this year and medical librarians felt snubbed (ok, that is not a con, just an observation!)

DynaMed

Pros:

  • Bulleted format is easy to read
  • Remote access part of subscription
  • No conflict of interest with authors
  • A lot of the evidence is graded
  • Accessible on PDAs (iPhones and Blackberries included!)

Cons:

  • The user interface is a bit 1990s and could use a makeover
  • The coverage is not as extensive yet, though they keep adding more topics

A lot has been written about UpToDate and DynaMed, both in PubMed and on various blogs. Jacqueline also did a fabulous post on the evidence-based-ness of UpToDate not too long ago. I used to think that I should pick one and stick to it, but have recently found myself rethinking this attitude. We need to keep in mind that these are point-of-care tools and should not be used as one's only source of information. Use the tool to get an idea of the current evidence and combine it with your own clinical judgment at the point of care. If suspicious, look up the primary literature the good old way, using MEDLINE or other such databases. A point-of-care database will get you started; however, it is not meant to be a one-stop shop.

I can almost hear people saying: so which one do you prefer anyways? That’s like asking me if I prefer Coke or Pepsi. My honest answer: both! (databases as well as beverages!). So what is a busy clinician to do? If you have access to both (or more), spend some time playing with them and see which one you like. Everyone has a different searching and learning style and it is sometimes a matter of preference. DynaMed’s concise structure may be appealing to newbies, whereas seasoned clinicians may prefer UpToDate’s narrative approach. Based on my very unscientific observation of Twitter conversations, it appears that clinicians in general prefer UpToDate whereas librarians prefer DynaMed. Could this be because UpToDate markets heavily to clinicians and snubs librarians? Or could it be the price? Or could it be the age-old debate on what is evidence? I don’t know the answer, partly because I find it all a bit too political. I’ve seen healthcare providers often use Google or Wikipedia for medical answers, which is quite sad. If you are using either UpToDate or DynaMed (or another similar product), you have already graduated to the big leagues and are a true EBM player! So relax and don’t feel like you have to pick a side. I find myself using both on a regular basis; the degree of success I have with each can be gauged by my daily Twitter feed!

Shamsha Damani