Sugary Drinks as the Culprit in Childhood Obesity? a RCT among Primary School Children

24 09 2012

ResearchBlogging.org Childhood obesity is a growing health problem. Since 1980, the proportion of overweighted children has almost tripled in the USA:  nowadays approximately 17% of children and adolescents are obese.  (Source: cdc.gov [6])

Common sense tells me that obesity is the result of too high calory intake without sufficient physical activity.” - which is just what the CDC states. I’m not surprised that the CDC also mentions the greater availability of high-energy-dense foods and sugary drinks at home and at school as main reasons for the increased intake of calories among children.

In my teens I already realized that sugar in sodas were just “empty calories” and I replaced tonic and cola by low calory  Rivella (and omitted sugar from tea). When my children were young I urged the day care to restrain from routinely giving lemonade (often in vain).

I was therefore a bit surprised to notice all the fuss in the Dutch newspapers [NRC] [7] about a new Dutch study [1] showing that sugary drinks contributed to obesity. My first reaction was “Duhhh?!…. so what?”.

Also, it bothered me that the researchers had performed a RCT (randomized controlled trial) in kids giving one half of them sugar-sweetened drinks and the other half sugar-free drinks. “Is it ethical to perform such a scientific “experiment” in healthy kids?”, I wondered, “giving more than 300 kids 14 kilo sugar over 18 months, without them knowing it?”

But reading the newspaper and the actual paper[1], I found that the study was very well thought out. Also ethically.

It is true that the association between sodas and weight gain has been shown before. But these studies were either observational studies, where one cannot look at the effect of sodas in isolation (kids who drink a lot of sodas often eat more junk food and watch more television: so these other life style aspects may be the real culprit) or inconclusive RCT’s (i.e. because of low sample size). Weak studies and inconclusive evidence will not convince policy makers, organizations and beverage companies (nor schools) to take action.

As explained previously in The Best Study Design… For Dummies [8] the best way to test whether an intervention has a health effect is to do a  double blind RCT, where the intervention (in this case: sugary drinks) is compared to a control (drinks with artificial sweetener instead of sugar) and where the study participants, and direct researchers do not now who receives the  actual intervention and who the phony one.

The study of Katan and his group[1] was a large, double blinded RCT with a long follow-up (18 months). The researchers recruited 641 normal-weight schoolchildren from 8 primary schools.

Importantly, only children were included in the study that normally drank sugared drinks at school (see announcement in Dutch). Thus participation in the trial only meant that half of the children received less sugar during the study-period. The researchers would have preferred drinking water as a control, but to ensure that the sugar-free and sugar-containing drinks tasted and looked essentially the same they used an artificial sweetener as a control.

The children drank 8 ounces (250 ml) of a 104-calorie sugar-sweetened or no-calorie sugar-free fruit-flavoured drink every day during 18 months.  Compliance was good as children who drank the artificially sweetened beverages had the expected level of urinary sucralose (sweetener).

At the end of the study the kids in the sugar-free group gained a kilo less weight than their peers. They also had a significant lower BMI-increase and gained less body fat.

Thus, according to Katan in the Dutch newspaper NRC[7], “it is time to get rid of the beverage vending machines”. (see NRC [6]).

But does this research really support that conclusion and does it, as some headlines state [9]: “powerfully strengthen the case against soda and other sugary drinks as culprits in the obesity epidemic?”

Rereading the paper I wondered as to the reasons why this study was performed.

If the trial was meant to find out whether putting children on artificially sweetened beverages (instead of sugary drinks) would lead to less fat gain, then why didn’t the researchers do an  intention to treat (ITT) analysis? In an ITT analysis trial participants are compared–in terms of their final results–within the groups to which they were initially randomized. This permits the pragmatic evaluation of the benefit of a treatment policy.
Suppose there were more dropouts in the intervention group, that might indicate that people had a reason not to adhere to the treatment. Indeed there were many dropouts overall: 26% of the children had stopped consuming the drinks, 29% from the sugar-free group, and 22% from the sugar group.
Interestingly, the majority of the children who stopped drinking the cans because they no longer liked the drink (68/94 versus 45/70 dropouts in the sugar-free versus the sugar group).
Ànd children who correctly assumed that the sweetened drinks were “artificially sweetened” was 21% higher than expected by chance (correct identification was 3% lower in the sugar group).
Did some children stop using the non-sugary drinks because they found the taste less nice than usual or artificial? Perhaps.

This  might indicate that replacing sugar-drinks by artificially sweetened drinks might not be as effective in “practice”.

Indeed most of the effect on the main outcome, the differences in BMI-Z score (the number of standard deviations by which a child differs from the mean in the Netherland for his or her age or sex) was “strongest” after 6 months and faded after 12 months.

Mind you, the researchers did neatly correct for the missing data by multiple imputation. As long as the children participated in the study, their changes in body weight and fat paralleled those of children who finished the study. However, the positive effect of the earlier use of non-sugary drinks faded in children who went back to drinking sugary drinks. This is not unexpected, but it underlines the point I raised above: the effect may be less drastic in the “real world”.

Another (smaller) RCT, published in the same issue of the NEJM [2](editorial in[4]), aimed to test the effect of an intervention to cut the intake of sugary drinks in obese adolescents. The intervention (home deliveries of bottled water and diet drinks for one year) led to a significant reduction in mean BMI (body mass index), but not in percentage body fat, especially in Hispanic adolescents. However at one year follow up (thus one year after the intervention had stopped) the differences between the groups evaporated again.

But perhaps the trial was “just” meant as a biological-fysiological experiment, as Hans van Maanen suggested in his critical response in de Volkskrant[10].

Indeed, the data actually show that sugar in drinks can lead to a greater increase in obesity-related parameters (and vice versa). [avoiding the endless fructose-glucose debate [11].

In the media, Katan stresses the mechanistic aspects too. He claims that children who drank the sweetened drinks, didn’t compensate for the lower intake of sugars by eating more. In the NY-times he is cited as follows[12]: “When you change the intake of liquid calories, you don’t get the effect that you get when you skip breakfast and then compensate with a larger lunch…”

This seems a logic explanation, but I can’t find any substatation in the article.

Still “food intake of the children at lunch time, shortly after the morning break when the children have consumed the study drinks”, was a secondary outcome in the original protocol!! (see the nice comparison of the two most disparate descriptions of the trial design at clinicaltrials.gov [5], partly shown in the figure below).

“Energy intake during lunchtime” was later replaced by a ”sensory evaluation” (with questions like: “How satiated do you feel?”). The results, however were not reported in their current paper. That is also true for a questionnaire about dental health.

Looking at the two protocol versions I saw other striking differences. At 2009_05_28, the primary outcomes of the study are the children’s body weight (BMI z-score),waist circumference (replaced by waist to height), skin folds and bioelectrical impedance.
The latter three become secondary outcomes in the final draft. Why?

Click to enlarge (source Clinicaltrials.gov [5])

It is funny that although the main outcome is the BMI z score, the authors mainly discuss the effects on body weight and body fat in the media (but perhaps this is better understood by the audience).

Furthermore, the effect on weight is less then expected: 1 kilo instead of 2,3 kilo. And only a part is accounted for by loss in body fat: -0,55 kilo fat as measured by electrical impedance and -0,35 kilo as measured by changes in skinfold thickness. The standard deviations are enormous.

Look for instance at the primary end point (BMI z score) at 0 and 18 months in both groups. The change in this period is what counts. The difference in change between both groups from baseline is -0,13, with a P value of 0.001.

(data are based on the full cohort, with imputed data, taken from Table 2)

Sugar-free group : 0.06±1.00  [0 Mo]  –> 0.08±0.99 [18 Mo] : change = 0.02±0.41  

Sugar-group: 0.01±1.04  [0 Mo]  –> 0.15±1.06 [18 Mo] : change = 0.15±0.42 

Difference in change from baseline: −0.13 (−0.21 to −0.05) P = 0.001

Looking at these data I’m impressed by the standard deviations (replaced by standard errors in the somewhat nicer looking fig 3). What does a value of 0.01 ±1.04 represent? There is a looooot of variation (even though BMI z is corrected for age and sex). Although no statistical differences were found for baseline values between the groups the “eyeball test” tells me the sugar- group has a slight “advantage”. They seem to start with slightly lower baseline values (overall, except for body weight).

Anyway, the changes are significant….. But significance isn’t identical to relevant.

At a second look the data look less impressive than the media reports.

Another important point, raised by van Maanen[10], is that the children’s weight increases more in this study than in the normal Dutch population. 6-7 kilo instead of 3 kilo.

In conclusion, the study by the group of Katan et al is a large, unique, randomized trial, that looked at the effects of replacement of sugar by artificial sweeteners in drinks consumed by healthy school children. An effect was noticed on several “obesity-related parameters”, but the effects were not large and possibly don’t last after discontinuation of the trial.

It is important that a single factor, the sugar component in beverages is tested in isolation. This shows that sugar itself “does matter”. However, the trial does not show that sugary drinks are the main obesity  factor in childhood (as suggested in some media reports).

It is clear that the investigators feel very engaged, they really want to tackle the childhood obesity problem. But they should separate the scientific findings from common sense.

The cans fabricated for this trial were registered under the trade name Blikkie (Dutch for “Little Can”). This was to make sure that the drinks would never be sold by smart business guys using the slogan: “cans which have scientifically been proven to help to keep your child lean and healthy”.[NRC]

Still soft drink stakeholders may well argue that low calory drinks are just fine and that curbing sodas is not the magic bullet.

But it is a good start, I think.

Photo credits Cola & Obesity:  Melliegrunt Flikr [CC]

  1. de Ruyter JC, Olthof MR, Seidell JC, & Katan MB (2012). A Trial of Sugar-free or Sugar-Sweetened Beverages and Body Weight in Children. The New England journal of medicine PMID: 22998340
  2. Ebbeling CB, Feldman HA, Chomitz VR, Antonelli TA, Gortmaker SL, Osganian SK, & Ludwig DS (2012). A Randomized Trial of Sugar-Sweetened Beverages and Adolescent Body Weight. The New England journal of medicine PMID: 22998339
  3. Qi Q, Chu AY, Kang JH, Jensen MK, Curhan GC, Pasquale LR, Ridker PM, Hunter DJ, Willett WC, Rimm EB, Chasman DI, Hu FB, & Qi L (2012). Sugar-Sweetened Beverages and Genetic Risk of Obesity. The New England journal of medicine PMID: 22998338
  4. Caprio S (2012). Calories from Soft Drinks – Do They Matter? The New England journal of medicine PMID: 22998341
  5. Changes to the protocol http://clinicaltrials.gov/archive/NCT00893529/2011_02_24/changes
  6. Overweight and Obesity: Childhood obesity facts  and A growing problem (www.cdc.gov)
  7. NRC Wim Köhler Eén kilo lichter.NRC | Zaterdag 22-09-2012 (http://archief.nrc.nl/)
  8.  The Best Study Design… For Dummies (http://laikaspoetnik.wordpress.com)
  9. Studies point to sugary drinks as culprits in childhood obesity – CTV News (ctvnews.ca)
  10. Hans van Maanen. Suiker uit fris, De Volkskrant, 29 september 2012 (freely accessible at http://www.vanmaanen.org/)
  11. Sugar-Sweetened Beverages, Diet Coke & Health. Part I. (http://laikaspoetnik.wordpress.com)
  12. Roni Caryn Rabina. Avoiding Sugared Drinks Limits Weight Gain in Two Studies. New York Times, September 21, 2012




#EAHIL2012 CEC 1: Drupal for Librarians

5 07 2012

This week I’m blogging at (and mostly about) the 13th EAHIL conference in Brussels. EAHIL stands for European Association for Health Information and Libraries.

I already blogged about the second Continuing Education Course (CEC) I followed, but I followed a continuing education course at Mondays, one day earlier. That session was led by Patrice Chalon, who is a Knowledge Manager at KCE – Belgian Health Care Knowledge Centre.

The first part was theoretical and easy to follow. Unfortunately there were quite a few mishaps with the practical part (some people could not install the program via the USB-stick, parts of the website were deleted and the computers were slow), but the entire session was instructive anyway. Even though I was about the only person (of 6) lacking CMS or HTML knowledge (but rereading the course abstract I now realize that was a prerequisite….)

Drupal is a freely available, easy to use,  modular content management system (CMS), for which you don’t need to have extensive programming (or HTML) experience.

Drupal was created by a Belgium student (Dries Buytaert) in 2000. It evolved from drop.org (small news site with build-in web board to share news among friends)  to Drupal (pronounced as “droo-puhl”, derived from the English pronunciation of the Dutch word “Druppel” which means “drop”). The purpose was to enable others to use and extend the experimentation platform so that more people could develop it further.

Drupal.org is a well established and active community with over 630,000 subscribed members.

This web application makes use of PHP as a programming language and MySQL as a database backend.

In Drupal every “page” is a node. You can define as many nodes as you need (news, page, event etc) and create “child” pages if you like (and move them to another parent page if necessary).

The editing function is easy: you can easily edit the format without needing HTML (looks quite like WordPress) and add files as if were email. Therefore it could easily have a wiki function as well.

Drupal is fitted with a very good taxonomy system. This helps to organize nodes and menus.

Nodes, account registration and maintenance, menu management, and system administration all are basic features of  the standard release of Drupal, known as Drupal core.

But thanks to the large community, Drupal benefits from thousands of third party modules, to tailor Drupal to your needs. When choosing modules it is important to check for longevity (are modules still being adapted for new Drupal releases, how many downloads are there: the more downloads the more popular the module, the more likely the module is going to stay).

There are also different themes.

Drupal is used a lot by libraries and libraries in turn have developed specific modules apt to use for library-purposes.

The view-module enables you to provide a view of the metadata and you can use metadata as a filter to create lists. Patrick was very enthusiastic about the bibliographic function (“the ENDNOTE within the context management system). He showed that it was very easy to import and search for bibliographic records (and metadata) from PubMed, Google Scholar etc (and maintain correct links over time), i.e. just enter the PMID, DOI lookup etc. Keywords like MeSH are loaded correctly.

Forgive me if I don’t remember (and even may be wrong about) the technical details, but it really looked like a great tool with many possible forms of  uses.

If you need more information you can contact Patrick (Twitter: @pchalon) or consult Drupal and especially the Drupal Group “Libraries”  and Drupalib.

And as said, there is a large active community. For Drupal’s motto is “Come for the software, stay for the community.”

Examples of Drupal Websites:
 http://www.cochrane.org/. The new face of the Cochrane was created by its webmaster Chris Mavergames, and it is far more inviting to read and more interactive then it’s boring predecessor. As a matter of fact it was Chris’ enthusiasm about Drupal and the new looks of the Cochrane site which raised my interest into Drupal. Chris has a website about Drupal (& web development, linked data & information architecture in general) and a Twitter list of Drupal folks you can follow.

Another example is http://htai.org/vortal, created by Patrick. Here is a presentation by Patrick that shows more details about this website (and Drupal’s versatility to create library websites).

This blogpost is largely based on the comprehensive course notes of Patrick Chalon’s “Drupal for Librarians” (CC), supplemented with my own notes.





#EAHIL2012 CEC 2: Visibility & Impact – Library’s New Role to Enhance Visibility of Researchers

4 07 2012

This week I’m blogging at (and mostly about) the 13th EAHIL conference in Brussels. EAHIL stands for European Association for Health Information and Libraries.

The second Continuing Education Course (CEC) I followed was given by Tiina Heino and Katri Larmo of the Terkko Meilahti Campus Library at the University of Helsinki in Finland.

The full title of the course was Visibility and impact – library’s new role: How the library can support the researcher to get visibility and generate impact to researcher’s work. You can read the abstract here.

The hands-on workshop mainly concentrated on the social bookmarking sites ConnoteaMendeley and Altmetric.

Furthermore we got information on CiteULike, ORCID,  Faculty of 1000 Posters and Pinterest. Also services developed in Terkko, such as ScholarChart and TopCited Articles, were shortly demonstrated.

What I especially liked in the hands on session is that the tutors had prepared a wikispace with all the information and links on the main page ( https://visibility2012.wikispaces.com) and a separate page for each participant to edit (here is my page). You could add links to your created accounts and embed widgets for Mendeley.

There was sufficient time to practice and try the tools. And despite the great number of participants there was ample room for questions (& even for making a blog draft ;) ).

The main message of the tutors is that the process of publishing scientific research doesn’t end at publishing the article: it is equally important what happens after the research has been published. Visibility and impact in the scientific community and in the society are  crucial  for making the research go forward as well as for getting research funding and promoting the researcher’s career. The Fig below (taken from the presentation) visualizes this process.

The tutors discussed ORCID, Open Researcher and contributor ID, that will be introduced later this year. It is meant to solve the author name ambiguity problem in scholarly communication by central registry of unique identifiers for each author (because author names can’t be used to reliably identify all scholarly author). It will be possible for authors to create, manage and share their ORCID record without membership fee. For further information see several publications and presentations by Martin Fenner. I found this one during the course while browsing Mendeley.

Once published the author’s work can be promoted using bookmarking tools, like CiteULike, Connotea and Mendeley. You can easily registrate for Connotea and Mendeley using your Facebook account. These social bookmarking tools are also useful for networking, i.e. to discover individuals and groups with the same field of interest. It is easy to synchronize your Mendeley with your CiteULike account.

Mendeley is available in a desktop and a web version. The web version offers a public profile for researchers, a catalog of documents, and collaborative groups (the cloud of Mendeley). The desktop version of Mendeley is specially suited for reference management and organizing your PDF’s. That said Mendeley seems most suitable for serendipitous use (clicking and importing a reference you happen to see and like) and less useful for managing and deduplicating large numbers of records, i.e. for a systematic review.
Also (during the course) it was not possible to import several PubMed records at once in either CiteULike or Mendeley.

What stroke me when I tried Mendeley is that there were many small or dead groups. A search for “cochrane”  for instance yielded one large group Cochrane QES Register, owned by Andrew Booth, and 3 groups with one member (thus not really a group), with 0 (!) to 6 papers each! It looks like people are trying Mendeley and other tools just for a short while. Indeed, most papers I looked up in PubMed were not bookmarked at all. It makes you wonder how widespread the use of these bookmarking tools is. It probably doesn’t help that there are so many tools with different purposes and possibilities.

Another tool that we tried was Altmetric. This is a free bookmarklet on scholarly articles which allows you to track the conversations around scientific articles online. It shows the tweets, blogposts, Google+ and Facebook mentions, and the numbers of bookmarks on Mendeley, CiteULike and Connotea.

I tried the tool on a paper I blogged about , ie. Seventy-Five Trials and Eleven Systematic Reviews a Day: How Will We Ever Keep Up?

The bookmarklet showed the tweets and the blogposts mentioning the paper.

Indeed altmetrics did correctly refer to my blog (even to 2 posts).

I liked altmetrics*, but saying that it is suitable for scientific metrics is a step too far. For people interested in this topic I would like to refer -again- to a post of Martin Fenner on altmetrics (in general).  He stresses that “usage metrics”  has its limitations because of its proness  to “gaming” (cheating).

But the current workshop didn’t address the shortcomings of the tools, for it was meant as a first practical acquaintance with the web 2.0 tools.

For the other tools (Faculty of 1000 Posters, Pinterest) and the services developed in Terkko, such as ScholarChart and TopCited Articles,  see the wikipage and the presentation:

*Coincidentally I’m preparing a post on handy chrome extensions to look for tweets about a webpage. Altmetric is another tool which seems very suitable for this purpose

Related articles





Friday Foolery #51 Statistically Funny

1 06 2012

Epidemiologists, people working in the EBM field and, above all, statisticians are said to have no sense of humor.*

Hilda Bastian is a clear exception to this rule.

I met Hilda a few years ago at a Cochrane colloquium. At that time she was working as a consumer advocate in Australia. Nowadays she is editor and curator of PubMed Health. According to her Twitter Bio (she tweets as @hildabast) she is (still) “Interested in effective communication as well as effective health care”. She also writes important articles, like “Seventy-Five Trials and Eleven Systematic Reviews a Day: How Will We Ever Keep Up? (PLOS 2010), reviewed at this blog.

Today I learned she also has a great creative talent in cartoon drawing, in the field of …  yeah… EBM, epidemiology & statistics.

Below is one of her cartoons, which fits in well with a recent post in the BMJ by Ray Moynihan, retweeted by Hilda: Preventing overdiagnosis: how to stop harming the healthy. In her post she refers to another article: Overdiagnosis in cancer (JNCI 2010), saying:

“Finding and aggressively treating non-symptomatic disease that would never have made people sick, inventing new conditions and re-defining the thresholds for old ones: will there be anyone healthy left at all?”

I invite you to go and visit Hilda’s blog Statistically funny (Commenting on the science of unbiased health research with cartoons) and to enjoy her cartoons, that are often inspired by recent publications in the field.

* My post #NotSoFunny #16: ridiculing RCTs and EBM even led David Rind to sigh that “EBM folks are not necessarily known for their great senses of humor”. (so I’m no exception to the rule ;)





Even the Scientific American Blog Links to Spammy Online Education Affiliate Sites…

28 05 2012

On numerous occasions [1,2,3] I have warned against top Twitter and Blog lists spread by education affiliate sites.
Sites like accreditedonlinecolleges.comonlinecolleges.com, onlinecollegesusa.org, onlinedegrees.com, mbaonline.com.

While some of the published Twitter Top 50 lists and Blog top 100 lists may be interesting as such (or may flatter you if you’re on it), the only intention of the makers is to lure you to their site and earn money through click-throughs.

Or as David Bradley from Sciencebase said it much more eloquently than I could:
(in a previous comment) 

“I get endless emails from people with these kinds of sites telling me I am on such and such a list…I even get different messages claiming to be from different people, but actually the same email address.They’re splogs and link bait scams almost always and unfortunately some people get suckered into linking to them, giving them credence and publicity. They’re a pain in the ‘arris.

These education sites do not only produce these “fantabulous” top 50 and 100 lists.
I also receive many requests for guest-authorships, and undoubtedly I’m not the only one.

Recently I also received a request from mbaonlinedegrees to post an infographic:

While searching for resources about the internet, I came across your site and noticed that you had posted the ‘State of the Internet’ video. I wanted to reach out as I have an infographic about the topic that I think would be a great fit for your site.”

But this mba.onlinedegrees infographic was a simple, yes even simplistic, summary of “a day at the internet”:

How many emails are sent, blog posts are made, how many people visit Facebook and how many updates are updated, and so forth and so on. Plus: Internet users spend 14.6 minutes viewing porn online: the average fap session is 12 minutes…
(How would they know?)

Anyway not the kind of information my readers are looking for. So I didn’t write a post with the embedding the code for the infographic.

Thus these online education affiliate sites produce top 50 and 100 lists, blogposts, guestposts and infographics and promote their use by actively approaching bloggers and people on Twitter.

I was surprised to find¹, however that even the high quality Scientific American science blog Observations (Opinion, arguments & analyses from the editors of Scientific American) blindly linked to such a spammy infographic (just adding a short meaningless introduction) [4].

That is an easy way to increase the numbers of blog posts….

And according to an insider commenting to the article the actual information in the infographic is even simply wrong.

“These MBAs have a smaller brain than accountants. They don’t know the difference between asset, revenue and income”.

If such a high authority science blog does not know to separate the wheat from the chaff, does not recognize splogs as such, and does not even (at the very least) filter and track the information offered, …. than who can…. who will….?³

Sometimes I feel like a miniature version of Don Quixote…

————-

NOTES

1.  HATTIP:

Again, @Nutsci brought this to my attention:

2. In response to my post @AdamMerberg tweeted a link to a very interesting article in the Atlantic by Megan McArdle issuing a plea to bloggers to help stop this plague in its track. (i.e. saying:  The reservoir of this disease of erroneous infographics is internet marketers who don’t care whether the information in their graphics is right … just so long as you link it.). She even uses an infographic herself to deliver her message. Highly recommended!

3. This doesn’t mean that Scientific American doesn’t produce good blog posts or good scientific papers. Just the other day, I tweeted:

The referred article Scientific American puts a new meta-analysis of statins and an accompanying editorial in the Lancet in broader perspective. The meta-analysis suggests that healthy people over 50 should take cholesterol-lowering drugs as a preventative measure. Scientific American questions this by also addressing the background risks (low for most 50+ people), possible risks of statin use, cost-effectiveness and the issue of funding by pharmaceutical companies and other types of bias.

References

  1. Health and Science Twitter & Blog Top 50 and 100 Lists. How to Separate the Wheat from the Chaff. (laikaspoetnik.wordpress.com)
  2. Beware of Top 50 “Great Tools to Double Check your Doctor” or whatever Lists. (laikaspoetnik.wordpress.com)
  3. Vanity is the Quicksand of Reasoning: Beware of Top 100 and 50 lists! ((laikaspoetnik.wordpress.com)
  4. What’s Smaller than Mark Zuckerberg? (blogs.scientificamerican.com/observations/)




The Scatter of Medical Research and What to do About it.

18 05 2012

ResearchBlogging.orgPaul Glasziou, GP and professor in Evidence Based Medicine, co-authored a new article in the BMJ [1]. Similar to another paper [2] I discussed before [3] this paper deals with the difficulty for clinicians of staying up-to-date with the literature. But where the previous paper [2,3] highlighted the mere increase in number of research articles over time, the current paper looks at the scatter of randomized clinical trials (RCTs) and systematic reviews (SR’s) accross different journals cited in one year (2009) in PubMed.

Hofmann et al analyzed 7 specialties and 9 sub-specialties, that are considered the leading contributions to the burden of disease in high income countries.

They followed a relative straightforward method for identifying the publications. Each search string consisted of a MeSH term (controlled  term) to identify the selected disease or disorders, a publication type [pt] to identify the type of study, and the year of publication. For example, the search strategy for randomized trials in cardiology was: “heart diseases”[MeSH] AND randomized controlled trial[pt] AND 2009[dp]. (when searching “heart diseases” as a MeSH, narrower terms are also searched.) Meta-analysis[pt] was used to identify systematic reviews.

Using this approach Hofmann et al found 14 343 RCTs and 3214 SR’s published in 2009 in the field of the selected (sub)specialties. There was a clear scatter across journals, but this scatter varied considerably among specialties:

“Otolaryngology had the least scatter (363 trials across 167 journals) and neurology the most (2770 trials across 896 journals). In only three subspecialties (lung cancer, chronic obstructive pulmonary disease, hearing loss) were 10 or fewer journals needed to locate 50% of trials. The scatter was less for systematic reviews: hearing loss had the least scatter (10 reviews across nine journals) and cancer the most (670 reviews across 279 journals). For some specialties and subspecialties the papers were concentrated in specialty journals; whereas for others, few of the top 10 journals were a specialty journal for that area.
Generally, little overlap occurred between the top 10 journals publishing trials and those publishing systematic reviews. The number of journals required to find all trials or reviews was highly correlated (r=0.97) with the number of papers for each specialty/ subspecialty.”

Previous work already suggested that this scatter of research has a long tail. Half of the publications is in a minority of papers, whereas the remaining articles are scattered among many journals (see Fig below).

Click to enlarge en see legends at BMJ 2012;344:e3223 [CC]

The good news is that SRs are less scattered and that general journals appear more often in the top 10 journals publishing SRs. Indeed for 6 of the 7 specialties and 4 of the 9 subspecialties, the Cochrane Database of Systematic Reviews had published the highest number of systematic reviews, publishing between 6% and 18% of all the systematic reviews published in each area in 2009. The bad news is that even keeping up to date with SRs seems a huge, if not impossible, challenge.

In other words, it is not sufficient for clinicians to rely on personal subscriptions to a few journals in their specialty (which is common practice). Hoffmann et al suggest several solutions to help clinicians cope with the increasing volume and scatter of research publications.

  • a central library of systematic reviews (but apparently the Cochrane Library fails to fulfill such a role according to the authors, because many reviews are out of date and are perceived as less clinically relevant)
  • registry of planned and completed systematic reviews, such as prospero. (this makes it easier to locate SRs and reduces bias)
  • Synthesis of Evidence and synopses, like the ACP-Jounal Club which summarizes the best evidence in internal medicine
  • Specialised databases that collate and critically appraise randomized trials and systematic reviews, like www.pedro.org.au for physical therapy. In my personal experience, however, this database is often out of date and not comprehensive
  • Journal scanning services like EvidenceUpdates from mcmaster.ca), which scans over 120 journals, filters articles on the basis of quality, has practising clinicians rate them for relevance and newsworthiness, and makes them available as email alerts and in a searchable database. I use this service too, but besides that not all specialties are covered, the rating of evidence may not always be objective (see previous post [4])
  • The use of social media tools to alert clinicians to important new research.

Most of these solutions are (long) existing solutions that do not or only partly help to solve the information overload.

I was surprised that the authors didn’t propose the use of personalized alerts. PubMed’s My NCBI feature allows to create automatic email alerts on a topic and to subscribe to electronic tables of contents (which could include ACP journal Club). Suppose that a physician browses 10 journals roughly covering 25% of the trials. He/she does not need to read all the other journals from cover to cover to avoid missing one potentially relevant trial. Instead it is far more efficient to perform a topic search to filter relevant studies from journals that seldom publish trials on the topic of interest. One could even use the search of Hoffmann et al to achieve this.* Although in reality, most clinical researchers will have narrower fields of interest than all studies about endocrinology and neurology.

At our library we are working at creating deduplicated, easy to read, alerts that collate table of contents of certain journals with topic (and author) searches in PubMed, EMBASE and other databases. There are existing tools that do the same.

Another way to reduce the individual work (reading) load is to organize journals clubs or even better organize regular CATs (critical appraised topics). In the Netherlands, CATS are a compulsory item for residents. A few doctors do the work for many. Usually they choose topics that are clinically relevant (or for which the evidence is unclear).

The authors shortly mention that their search strategy might have missed  missed some eligible papers and included some that are not truly RCTs or SRs, because they relied on PubMed’s publication type to retrieve RCTs and SRs. For systematic reviews this may be a greater problem than recognized, for the authors have used meta-analyses[pt] to identify systematic reviews. Unfortunately PubMed has no publication type for systematic reviews, but it may be clear that there are many more systematic reviews that meta-analyses. Possibly systematical reviews might even have a different scatter pattern than meta-analyses (i.e. the latter might be preferentially included in core journals).

Furthermore not all meta-analyses and systematic reviews are reviews of RCTs (thus it is not completely fair to compare MAs with RCTs only). On the other hand it is a (not discussed) omission of this study, that only interventions are considered. Nowadays physicians have many other questions than those related to therapy, like questions about prognosis, harm and diagnosis.

I did a little imperfect search just to see whether use of other search terms than meta-analyses[pt] would have any influence on the outcome. I search for (1) meta-analyses [pt] and (2) systematic review [tiab] (title and abstract) of papers about endocrine diseases. Then I subtracted 1 from 2 (to analyse the systematic reviews not indexed as meta-analysis[pt])

Thus:

(ENDOCRINE DISEASES[MESH] AND SYSTEMATIC REVIEW[TIAB] AND 2009[DP]) NOT META-ANALYSIS[PT]

I analyzed the top 10/11 journals publishing these study types.

This little experiment suggests that:

  1. the precise scatter might differ per search: apparently the systematic review[tiab] search yielded different top 10/11 journals (for this sample) than the meta-analysis[pt] search. (partially because Cochrane systematic reviews apparently don’t mention systematic reviews in title and abstract?).
  2. the authors underestimate the numbers of Systematic Reviews: simply searching for systematic review[tiab] already found appr. 50% additional systematic reviews compared to meta-analysis[pt] alone
  3. As expected (by me at last), many of the SR’s en MA’s were NOT dealing with interventions, i.e. see the first 5 hits (out of 108 and 236 respectively).
  4. Together these findings indicate that the true information overload is far greater than shown by Hoffmann et al (not all systematic reviews are found, of all available search designs only RCTs are searched).
  5. On the other hand this indirectly shows that SRs are a better way to keep up-to-date than suggested: SRs  also summarize non-interventional research (the ratio SRs of RCTs: individual RCTs is much lower than suggested)
  6. It also means that the role of the Cochrane Systematic reviews to aggregate RCTs is underestimated by the published graphs (the MA[pt] section is diluted with non-RCT- systematic reviews, thus the proportion of the Cochrane SRs in the interventional MAs becomes larger)

Well anyway, these imperfections do not contradict the main point of this paper: that trials are scattered across hundreds of general and specialty journals and that “systematic reviews” (or meta-analyses really) do reduce the extent of scatter, but are still widely scattered and mostly in different journals to those of randomized trials.

Indeed, personal subscriptions to journals seem insufficient for keeping up to date.
Besides supplementing subscription by  methods such as journal scanning services, I would recommend the use of personalized alerts from PubMed and several prefiltered sources including an EBM search machine like TRIP (www.tripdatabase.com/).

*but I would broaden it to find all aggregate evidence, including ACP, Clinical Evidence, syntheses and synopses, not only meta-analyses.

**I do appreciate that one of the co-authors is a medical librarian: Sarah Thorning.

References

  1. Hoffmann, Tammy, Erueti, Chrissy, Thorning, Sarah, & Glasziou, Paul (2012). The scatter of research: cross sectional comparison of randomised trials and systematic reviews across specialties BMJ, 344 : 10.1136/bmj.e3223
  2. Bastian, H., Glasziou, P., & Chalmers, I. (2010). Seventy-Five Trials and Eleven Systematic Reviews a Day: How Will We Ever Keep Up? PLoS Medicine, 7 (9) DOI: 10.1371/journal.pmed.1000326
  3. How will we ever keep up with 75 trials and 11 systematic reviews a day (laikaspoetnik.wordpress.com)
  4. Experience versus Evidence [1]. Opioid Therapy for Rheumatoid Arthritis Pain. (laikaspoetnik.wordpress.com)




Can Guidelines Harm Patients?

2 05 2012

ResearchBlogging.orgRecently I saw an intriguing “personal view” in the BMJ written by Grant Hutchison entitled: “Can Guidelines Harm Patients Too?” Hutchison is a consultant anesthetist with -as he calls it- chronic guideline fatigue syndrome. Hutchison underwent an acute exacerbation of his “condition” with the arrival of another set of guidelines in his email inbox. Hutchison:

On reviewing the level of evidence provided for the various recommendations being offered, I was struck by the fact that no relevant clinical trials had been carried out in the population of interest. Eleven out of 25 of the recommendations made were supported only by the lowest levels of published evidence (case reports and case series, or inference from studies not directly applicable to the relevant population). A further seven out of 25 were derived only from the expert opinion of members of the guidelines committee, in the absence of any guidance to be gleaned from the published literature.

Hutchison’s personal experience is supported by evidence from two articles [2,3].

One paper published in the JAMA 2009 [2] concludes that ACC/AHA (American College of Cardiology and the American Heart Association) clinical practice guidelines are largely developed from lower levels of evidence or expert opinion and that the proportion of recommendations for which there is no conclusive evidence is growing. Only 314 recommendations of 2711 (median, 11%) are classified as level of evidence A , thus recommendation based on evidence from multiple randomized trials or meta-analyses.  The majority of recommendations (1246/2711; median, 48%) are level of evidence C, thus based  on expert opinion, case studies, or standards of care. Strikingly only 245 of 1305 class I recommendations are based on the highest level A evidence (median, 19%).

Another paper, published in Ann Intern Med 2011 [3], reaches similar conclusions analyzing the Infectious Diseases Society of America (IDSA) Practice Guidelines. Of the 4218 individual recommendations found, only 14% were supported by the strongest (level I) quality of evidence; more than half were based on level III evidence only. Like the ACC/AHH guidelines only a small part (23%) of the strongest IDSA recommendations, were based on level I evidence (in this case ≥1 randomized controlled trial, see below). And, here too, the new recommendations were mostly based on level II and III evidence.

Although there is little to argue about Hutchison’s observations, I do not agree with his conclusions.

In his view guidelines are equivalent to a bullet pointed list or flow diagram, allowing busy practitioners to move on from practice based on mere anecdote and opinion. It therefore seems contradictory that half of the EBM-guidelines are based on little more than anecdote (case series, extrapolation from other populations) and opinion. He then argues that guidelines, like other therapeutic interventions, should be considered in terms of balance between benefit and risk and that the risk  associated with the dissemination of poorly founded guidelines must also be considered. One of those risks is that doctors will just tend to adhere to the guidelines, and may even change their own (adequate) practice  in the absence of any scientific evidence against it. If a patient is harmed despite punctilious adherence to the guideline-rules,  “it is easy to be seduced into assuming that the bad outcome was therefore unavoidable”. But perhaps harm was done by following the guideline….

First of all, overall evidence shows that adherence to guidelines can improve patient outcome and provide more cost effective care (Naveed Mustfa in a comment refers to [4]).

Hutchinson’s piece is opinion-based and rather driven by (understandable) gut feelings and implicit assumptions, that also surround EBM in general.

  1. First there is the assumption that guidelines are a fixed set of rules, like a protocol, and that there is no room for preferences (both of the doctor and the patient), interpretations and experience. In the same way as EBM is often degraded to “cookbook medicine”, EBM guidelines are turned into mere bullet pointed lists made by a bunch of experts that just want to impose their opinions as truth.
  2. The second assumption (shared by many) is that evidence based medicine is synonymous with “randomized controlled trials”. In analogy, only those EBM guideline recommendations “count” that are based on RCT’s or meta-analyses.

Before I continue, I would strongly advice all readers (and certainly all EBM and guideline-skeptics) to read this excellent and clearly written BJM-editorial by David Sackett et al. that deals with misconceptions, myths and prejudices surrounding EBM : Evidence based medicine: what it is and what it isn’t [5].

Sackett et al define EBM as “the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients” [5]. Sackett emphasizes that “Good doctors use both individual clinical expertise and the best available external evidence, and neither alone is enough. Without clinical expertise, practice risks becoming tyrannised by evidence, for even excellent external evidence may be inapplicable to or inappropriate for an individual patient. Without current best evidence, practice risks becoming rapidly out of date, to the detriment of patients.”

Guidelines are meant to give recommendations based on the best available evidence. Guidelines should not be a set of rules, set in stone. Ideally, guidelines have gathered evidence in a transparent way and make it easier for the clinicians to grasp the evidence for a certain procedure in a certain situation … and to see the gaps.

Contrary to what many people think, EBM is not restricted to randomized trials and meta-analyses. It involves tracking down the best external evidence there is. As I explained in #NotSoFunny #16 – Ridiculing RCTs & EBM, evidence is not an all-or-nothing thing: RCT’s (if well performed) are the most robust, but if not available we have to rely on “lower” evidence (from cohort to case-control to case series or expert opinion even).
On the other hand RCT’s are often not even suitable to answer questions in other domains than therapy (etiology/harm, prognosis, diagnosis): per definition the level of evidence for these kind of questions inevitably will be low*. Also, for some interventions RCT’s are not appropriate, feasible or too costly to perform (cesarean vs vaginal birth; experimental therapies, rare diseases, see also [3]).

It is also good to realize that guidance, based on numerous randomized controlled trials is probably not or limited applicable to groups of patients who are seldom included in a RCT: the cognitively impaired, the patient with multiple comorbidities [6], the old patient [6], children and (often) women.

Finally not all RCTs are created equal (various forms of bias; surrogate outcomes; small sample sizes, short follow-up), and thus should not all represent the same high level of evidence.*

Thus in my opinion, low levels of evidence are not per definition problematic. Even if they are the basis for strong recommendations. As long as it is clear how the recommendations were reached and as long as these are well underpinned (by whatever evidence or motivation). One could see the exposed gaps in evidence as a positive thing as it may highlight the need for clinical research in certain fields.

There is one BIG BUT: my assumption is that guidelines are “just” recommendations based on exhaustive and objective reviews of existing evidence. No more, no less. This means that the clinician must have the freedom to deviate from the recommendations, based on his own expertise and/or the situation and/or the patient’s preferences. The more, when the evidence on which these strong recommendations are based is ‘scant’. Sackett already warned for the possible hijacking of EBM by purchasers and managers (and may I add health insurances and governmental agencies) to cut the costs of health care and to impose “rules”.

I therefore think it is odd that the ACC/AHA guidelines prescribe that Class I recommendations SHOULD be performed/administered even if they are based on level C recommendations (see Figure).

I also find it odd that different guidelines have a different nomenclature. The ACC/AHA have Class I, IIa, IIb and III recommendations and level A, B, C evidence where level A evidence represents sufficient evidence from multiple randomized trials and meta-analyses, whereas the strength of recommendations in the IDSA guidelines includes levels A through C (OR D/E recommendations against use) and quality of evidence ranges from level I through III , where I indicates evidence from (just) 1 properly randomized controlled trial. As explained in [3] this system was introduced to evaluate the effectiveness of preventive health care interventions in Canada (for which RCTs are apt).

Finally, guidelines and guideline makers should probably be more open for input/feedback from people who apply these guidelines.

————————————————

*the new GRADE (Grading of Recommendations Assessment, Development, and Evaluation) scoring system taking into account good quality observational studies as well may offer a potential solution.

Another possibly relevant post at this blog: The Best Study Design for … Dummies

Taken from a summary of an ACC/AHA guideline at http://guideline.gov/
Click to enlarge.

References

  1. Hutchison, G. (2012). Guidelines can harm patients too BMJ, 344 (apr18 1) DOI: 10.1136/bmj.e2685
  2. Tricoci P, Allen JM, Kramer JM, Califf RM, & Smith SC Jr (2009). Scientific evidence underlying the ACC/AHA clinical practice guidelines. JAMA : the journal of the American Medical Association, 301 (8), 831-41 PMID: 19244190
  3. Lee, D., & Vielemeyer, O. (2011). Analysis of Overall Level of Evidence Behind Infectious Diseases Society of America Practice Guidelines Archives of Internal Medicine, 171 (1), 18-22 DOI: 10.1001/archinternmed.2010.482
  4. Menéndez R, Reyes S, Martínez R, de la Cuadra P, Manuel Vallés J, & Vallterra J (2007). Economic evaluation of adherence to treatment guidelines in nonintensive care pneumonia. The European respiratory journal : official journal of the European Society for Clinical Respiratory Physiology, 29 (4), 751-6 PMID: 17005580
  5. Sackett, D., Rosenberg, W., Gray, J., Haynes, R., & Richardson, W. (1996). Evidence based medicine: what it is and what it isn’t BMJ, 312 (7023), 71-72 DOI: 10.1136/bmj.312.7023.71
  6. Aylett, V. (2010). Do geriatricians need guidelines? BMJ, 341 (sep29 3) DOI: 10.1136/bmj.c5340







Follow

Get every new post delivered to your Inbox.

Join 107 other followers