No, Google Scholar Shouldn’t be Used Alone for Systematic Review Searching

9 07 2013

Several papers have addressed the usefulness of Google Scholar as a source for systematic review searching. Unfortunately, the quality of those papers is often well below the mark.

In 2010 I already “robustly rebutted” (in the words of Isla Kuhn [2]) the paper by Anders and Evans [3] on this blog, in the post “PubMed versus Google Scholar for Retrieving Evidence” [1].

But earlier this year another controversial paper was published [4]:

“Is the coverage of Google Scholar enough to be used alone for systematic reviews?”

It is one of the most highly accessed papers in BMC Medical Informatics and Decision Making and has been welcomed in, for instance, the Twittosphere.

Researchers seem to accept the conclusions of the paper blindly.

But don’t rush and assume you can now forget about PubMed, MEDLINE, Cochrane and EMBASE for your systematic review search and just do a simple Google Scholar (GS) search instead.

You might throw the baby out with the bath water…

… As has been immediately recognized by many librarians, either on their blogs (see the blogs of Dean Giustini [5], Patricia Anderson [6] and Isla Kuhn [2]) or in direct comments on the paper (by Tuulevi Ovaska, Michelle Fiander and Alison Weightman [7]).

In their paper, Jean-François Gehanno et al. examined whether GS was able to retrieve all 738 original studies included in 29 Cochrane and JAMA systematic reviews.

And YES! GS had a coverage of 100%!


All those fools at the Cochrane Collaboration who do exhaustive searches in multiple databases using controlled vocabulary and lots of synonyms, when a simple search in GS could have sufficed…

But it is a logical fallacy to conclude from their findings that GS alone will suffice for SR-searching.

Firstly, as Tuulevi [7] rightly points out:

“Of course GS will find what you already know exists”

Or in the words of one of the official reviewers [8]:

What the authors show is only that if one knows what studies should be identified, then one can go to GS, search for them one by one, and find out that they are indexed. But, if a researcher already knows the studies that should be included in a systematic review, why bother to also check whether those studies are indexed in GS?


Secondly, it is also the precision that counts.

As Dean explains on his blog, a 100% recall with a precision of 0.1% (and it can be worse!) means that in order to find 36 relevant papers you have to go through ~36,700 items.


Are the authors suggesting that researchers consider a precision level of 0.1% acceptable for the SR? Who has time to sift through that amount of information?
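The arithmetic behind such numbers is simple: precision is the number of relevant records divided by the number of records retrieved, so the number of records you have to screen is the number of relevant records divided by the precision. A minimal Python sketch using the figures above; `records_to_screen` is a hypothetical helper name, for illustration only:

```python
def records_to_screen(relevant_found: int, precision: float) -> float:
    """Number of retrieved records you must screen to end up with `relevant_found` hits.

    precision = relevant retrieved / total retrieved, hence
    total retrieved = relevant retrieved / precision.
    """
    if not 0 < precision <= 1:
        raise ValueError("precision must be in (0, 1]")
    return relevant_found / precision


# Figures from the text: 36 relevant papers at a precision of 0.1%
print(round(records_to_screen(36, 0.001)))  # 36000
# Records read per relevant hit ("number needed to read") at 0.1% precision
print(round(1 / 0.001))  # 1000
```

At exactly 0.1% this gives 36,000 records, the same ballpark as the ~36,700 mentioned above. Put differently: you read about 1,000 records for every relevant one.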

It is like searching for needles in a haystack. Correction: it is like searching for particular hay stalks in a haystack. They are very difficult to find when hidden among other hay stalks. Suppose the hay stalks were all labeled (title) and I had a powerful hay-stalk magnet (“title search”): it would be a piece of cake to retrieve them. This is what we call a “known-item search”. But would you even consider going through the haystack and checking the stalks one by one? Because that is what we have to do if we use Google Scholar as a one-stop search tool for systematic reviews.
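The difference between a known-item search and a topical search can be shown with a toy example in Python. The five-record index and its titles are made up: checking the known titles one by one (the kind of check the reviewer describes above) always yields full coverage, whereas a topical query returns relevant and irrelevant records mixed together, which then still have to be screened:

```python
# Hypothetical toy index: record id -> title (not real papers)
INDEX = {
    1: "Ribavirin for RSV bronchiolitis: a randomized trial",
    2: "Cost of hospital stays for bronchiolitis",
    3: "Aerosolized ribavirin in infants with RSV infection",
    4: "A review of antiviral drugs, including ribavirin",
    5: "Ribavirin and interferon for hepatitis C",
}
# The studies a systematic review actually included (known in advance)
INCLUDED = {1, 3}


def known_item_coverage(index, titles):
    """Fraction of known titles that can be looked up in the index."""
    indexed_titles = set(index.values())
    return sum(t in indexed_titles for t in titles) / len(titles)


def topical_search(index, *terms):
    """Ids of records whose title contains every term (Boolean AND, case-insensitive)."""
    return {rid for rid, title in index.items()
            if all(term.lower() in title.lower() for term in terms)}


coverage = known_item_coverage(INDEX, [INDEX[i] for i in INCLUDED])
hits = topical_search(INDEX, "ribavirin")
print(coverage)                           # 1.0: every known item is "found"
print(sorted(hits))                       # [1, 3, 4, 5]: relevant and irrelevant mixed
print(len(hits & INCLUDED) / len(hits))   # 0.5 precision
```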

Another main point of criticism is that the authors show a grave and worrisome lack of understanding of systematic review methodology [6] and don’t grasp the importance of the search interface and of knowledge of indexing, which are both integral to searching for systematic reviews [7].

One wonders why the paper even passed peer review, as one of the two reviewers (Miguel Garcia-Perez [8]) had already smashed it to pieces:

The authors’ method is inadequate and their conclusion is not logically connected to their results. No revision (major, minor, or discretionary) will save this work. (…)

Miguel’s well-founded criticism was not adequately addressed by the authors [9]. Apparently the editors didn’t see through this and relied on the second peer reviewer [10], who merely said it was a “great job” etcetera, but that recall should not be written with a capital R (and that was about the only revision the authors made).

Perhaps another paper is needed to convince Gehanno et al. and the uncritical readers of their manuscript.

Such a paper might just have been published [11]. It is written by Dean Giustini and Maged Kamel Boulos and is entitled:

Google Scholar is not enough to be used alone for systematic reviews

It is a simple and straightforward paper, but it makes its points clearly.

Giustini and Kamel Boulos looked for a recent SR in their own area of expertise (Chou et al. [12]) that included a number of references comparable to that of Gehanno et al. Next, they tested GS’ ability to locate these references.

Although most papers cited by Chou et al. (n=476/506; ~94%) were ultimately found in GS, numerous iterative searches were required to find the references, and each citation had to be handled one at a time. Thus GS was not able to locate all references found by Chou et al., and the whole exercise was rather cumbersome.

As expected, finding the papers with a “real-life” GS search was almost impossible. Because of its rudimentary structure, GS did not understand the expert search strings and was unable to translate them. Using Chou et al.’s original search strategy and keywords thus yielded an unmanageable result of more than 750,000 items.
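To put those numbers side by side, a small Python sketch using only the figures reported above. The precision is an upper bound, computed under the generous assumption that all 506 included references sit somewhere among the hits:

```python
included = 506            # references in Chou et al.'s review
found_in_gs = 476         # eventually located in GS, searched for one at a time
topical_hits = 750_000    # lower bound on GS hits for the original search strategy

coverage = found_in_gs / included
# Best case: assume every included reference sits somewhere in the hit list.
# With more than 750,000 hits, the real precision can only be lower.
max_precision = included / topical_hits

print(f"coverage: {coverage:.1%}")                # coverage: 94.1%
print(f"precision at best: {max_precision:.3%}")  # precision at best: 0.067%
```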

Giustini and Kamel Boulos note that GS’ ability to search the full text of papers, combined with its PageRank algorithm, can be useful.

On the other hand, GS’ changing content, unknown updating practices and poor reliability make it an inappropriate sole choice for systematic reviewers:

As searchers, we were often uncertain that results found one day in GS had not changed a day later and trying to replicate searches with date delimiters in GS did not help. Papers found today in GS did not mean they would be there tomorrow.

But most importantly, not all known items could be found and the search process and selection are too cumbersome.

So shall we now conclude, once and for all, that GS is NOT sufficient to be used alone for SR searching?

We don’t need another bad paper addressing this.

But I would really welcome a well-performed paper looking at the added value of GS in SR searching. For I am sure that GS may be valuable for some questions and some topics in some respects. We have to find out which.


PubMed versus Google Scholar for Retrieving Evidence

8 06 2010

A while ago a resident in dermatology told me she got many hits in PubMed, but zero results in TRIP. It turned out she had used the same search in both databases: alopecia areata and diphenciprone (a drug with a lot of synonyms). Searching TRIP for alopecia (in the title) only, we found a Cochrane Review and a relevant NICE guideline.

Each search engine usually has its own search and index features. When comparing databases, one should compare “optimal” searches and keep in mind what purpose each search engine was designed for. TRIP is best suited to searching aggregate evidence, whereas PubMed is best suited to searching individual biomedical articles.

Michael Anders and Dennis Evans ignore this “rule of thumb” in their recent paper “Comparison of PubMed and Google Scholar Literature Searches”. And this is not the only shortcoming of the paper.

The authors performed searches on 3 different topics to compare PubMed and Google Scholar search results. Their main aim was to see which database was the most useful to find clinical evidence in respiratory care.

Well, quick guess: PubMed wins…

The 3 respiratory care topics were selected from a list of systematic reviews on the Website of the Cochrane Collaboration and represented in-patient care, out-patient care, and pediatrics.

The references in the three chosen Cochrane Systematic Reviews served as a “reference” (or “gold”) standard. However, abstracts, conference proceedings, and responses to letters were excluded.

So far so good. But note that the outcome of the study only allows us to draw conclusions about intervention questions, which seek controlled clinical trials. Other principles may apply to other domains (diagnosis, etiology/harm, prognosis) or to other types of studies. And it certainly doesn’t apply to non-EBM topics.

The authors designed ONE search for each topic by taking 2 common clinical terms from the title of each Cochrane review and connecting them with the Boolean operator “AND” (see the table below; the quotation marks were not used in the actual searches). No synonyms were used, and the translation of the searches in PubMed wasn’t checked (luckily the mapping was rather good).



Cochrane review | Search terms
Noninvasive positive-pressure ventilation for cardiogenic pulmonary edema | “noninvasive positive-pressure ventilation” AND “pulmonary edema”
Self-management education and regular practitioner review for adults with asthma | “asthma” AND “education”
Ribavirin for respiratory syncytial virus | “ribavirin” AND “respiratory syncytial virus”

In PubMed they applied the narrow methodological filter, or Clinical Query, for the therapy domain.
This prefab search strategy, (randomized controlled trial[Publication Type] OR (randomized[Title/Abstract] AND controlled[Title/Abstract] AND trial[Title/Abstract])), developed by Haynes, is suitable for quickly detecting the available evidence (provided one is looking for RCTs and isn't doing an exhaustive search). (See previous posts 2, 3, 4.)
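To make the filter concrete, here is a minimal Python sketch that ANDs a topic search with the filter string quoted above and builds a request URL for NCBI's E-utilities ESearch service. The helper names are hypothetical; the code only builds the query and URL and does not contact PubMed, and any hit counts you get today will differ from those reported in 2010.

```python
from urllib.parse import urlencode

# Narrow therapy filter, as quoted in the text above
NARROW_THERAPY_FILTER = (
    "(randomized controlled trial[Publication Type] OR "
    "(randomized[Title/Abstract] AND controlled[Title/Abstract] "
    "AND trial[Title/Abstract]))"
)

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def filtered_query(topic: str) -> str:
    """AND a topic search with the narrow therapy filter."""
    return f"({topic}) AND {NARROW_THERAPY_FILTER}"


def esearch_url(term: str) -> str:
    """Build (but do not send) an ESearch request URL for PubMed."""
    params = {"db": "pubmed", "term": term, "retmode": "json"}
    return ESEARCH_URL + "?" + urlencode(params)


query = filtered_query("ribavirin AND respiratory syncytial virus")
print(query)
print(esearch_url(query))
```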

Google Scholar, as we probably all know, does not have such methodological filters, but the authors “limited” their search by using the Advanced option and entering the 2 search terms in the “Find articles… with all of the words” box (so this is a Boolean “AND”), and they limited the search to the subject area “Medicine, Pharmacology, and Veterinary Science”.

They did a separate search for publications available at their library, which is of limited value to others, since subscriptions differ between libraries.

Next, they determined the sensitivity (the number of relevant records retrieved as a proportion of the total number of records in the gold standard) and the precision, or positive predictive value (the fraction of returned positives that are true positives; explained in 3).
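Both measures are easy to compute once you treat the search results and the gold standard as sets of record IDs. A minimal Python sketch; the IDs are invented, and the 7-out-of-12 recall merely mirrors the ribavirin result discussed below:

```python
def recall(retrieved: set, gold: set) -> float:
    """Sensitivity: fraction of gold-standard records that were retrieved."""
    return len(retrieved & gold) / len(gold)


def precision(retrieved: set, gold: set) -> float:
    """Positive predictive value: fraction of retrieved records that are relevant."""
    return len(retrieved & gold) / len(retrieved) if retrieved else 0.0


# Hypothetical record IDs: 12 included trials (the gold standard), and a search
# that returns 7 of them plus 93 irrelevant records (100 hits in total).
gold = set(range(12))
retrieved = set(range(7)) | set(range(100, 193))

print(f"recall: {recall(retrieved, gold):.0%}")        # recall: 58%
print(f"precision: {precision(retrieved, gold):.0%}")  # precision: 7%
```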

Let me guess: sensitivity might be equal or somewhat higher in Google Scholar, and precision is undoubtedly much lower. This is because in Google Scholar:

  • you can often search the full text instead of just the abstract, title and (added) keywords/MeSH
  • the results are inflated by finding one and the same reference cited in many different papers (that might not directly deal with the subject)
  • you can’t limit on methodology, study type or “evidence”
  • there is no automatic mapping and explosion (which may provide a way to find more synonyms and thus more relevant studies)
  • the coverage is broader (grey literature, books, more topics)
  • updates from MEDLINE arrive later than in PubMed
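A toy illustration of the first two points: the same Boolean AND search, run once over titles/abstracts and once over full texts that include reference lists. The records are invented, and real search engines do much more (stemming, ranking), so read this as a sketch of the principle only:

```python
# Hypothetical records: title/abstract text vs. full text (including references)
RECORDS = [
    {"id": "A", "tiab": "Ribavirin in infants with respiratory syncytial virus",
     "fulltext": "Methods: randomized placebo-controlled trial."},
    {"id": "B", "tiab": "Oxygen therapy in bronchiolitis",
     "fulltext": "References: Ribavirin for respiratory syncytial virus."},
    {"id": "C", "tiab": "Costs of antiviral drugs",
     "fulltext": "Drugs such as ribavirin are used for respiratory syncytial virus."},
]


def search(records, terms, full_text=False):
    """Return ids of records containing every term (Boolean AND, case-insensitive)."""
    hits = []
    for rec in records:
        text = rec["tiab"] + " " + rec["fulltext"] if full_text else rec["tiab"]
        text = text.lower()
        if all(term.lower() in text for term in terms):
            hits.append(rec["id"])
    return hits


terms = ["ribavirin", "respiratory syncytial virus"]
print(search(RECORDS, terms))                  # ['A']
print(search(RECORDS, terms, full_text=True))  # ['A', 'B', 'C']
```

Only record A is about the topic; B and C are retrieved because the terms occur somewhere in the body or the reference list.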

Results: PubMed and Google Scholar had pretty much the same recall, but for ribavirin and RSV the recall was higher in PubMed: PubMed found 100% (12/12) of the included trials and Google Scholar 58% (7/12).

The authors don't discuss why. Since Google Scholar should find the words in the titles and abstracts of PubMed records, I repeated the search in PubMed restricted to the title/abstract field, i.e. ribavirin[tiab] AND respiratory syncytial virus[tiab]*, limited with the narrow therapy filter. I found 26 papers instead of 32. The following titles were missing when I searched title and abstract only (between brackets: the relevant MeSH term, i.e. the reason the paper was found, and whether the record lacks an abstract, so that only title and MeSH were searchable, or is a letter):

  1. Evaluation by survival analysis on effect of traditional Chinese medicine in treating children with respiratory syncytial viral pneumonia of phlegm-heat blocking Fei syndrome.
    [MeSH: Respiratory Syncytial Virus Infections]
  2. Ribavarin in ventilated respiratory syncytial virus bronchiolitis: a randomized, placebo-controlled trial.
    [MeSH: Respiratory Syncytial Virus Infections; no abstract; letter]
  3. Study of interobserver reliability in clinical assessment of RSV lower respiratory illness.
    [MeSH: Respiratory Syncytial Virus Infections*]
  4. Ribavirin for severe RSV infection. N Engl J Med.
    [MeSH: Respiratory Syncytial Viruses]
  5. Stutman HR, Rub B, Janaim HK. New data on clinical efficacy of ribavirin.
    [MeSH: Respiratory Syncytial Viruses]
  6. Clinical studies with ribavirin.
    [MeSH: Respiratory Syncytial Viruses]

Three of the papers were indexed with the MeSH term Respiratory Syncytial Viruses and the other three with Respiratory Syncytial Virus Infections. Although not all of these papers may be relevant (2 are comments/letters), this illustrates why PubMed may retrieve results that Google Scholar does not (if one doesn’t use synonyms).

In contrast to Google Scholar, PubMed translates the search ribavirin AND respiratory syncytial virus so that records indexed with the MeSH terms “Ribavirin”, “Respiratory Syncytial Viruses”[MeSH Terms] and (indirectly) “Respiratory Syncytial Virus Infections”[MeSH] are also found.

Thus, in Google Scholar, articles using terms like RSV or respiratory syncytial viral pneumonia (or lacking such specifications, as in “New data on clinical efficacy of ribavirin”) could have been missed with the above-mentioned search.
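Why mapping to MeSH helps can be sketched with a toy example. The mini-thesaurus and the MeSH annotations below are simplified assumptions, loosely modelled on records 4 and 5 in the list above (the "Ribavirin" heading on those records is assumed); real PubMed Automatic Term Mapping is far more elaborate:

```python
# Hypothetical mini-thesaurus: lowercase free-text entry terms -> MeSH heading
ENTRY_TERMS = {
    "respiratory syncytial virus": "Respiratory Syncytial Viruses",
    "rsv": "Respiratory Syncytial Viruses",
    "ribavirin": "Ribavirin",
}

# Hypothetical records: titles from the list above, simplified MeSH annotations
RECORDS = {
    "rsv_letter": {"title": "Ribavirin for severe RSV infection",
                   "mesh": {"Ribavirin", "Respiratory Syncytial Viruses"}},
    "efficacy": {"title": "New data on clinical efficacy of ribavirin",
                 "mesh": {"Ribavirin", "Respiratory Syncytial Viruses"}},
}


def text_search(records, terms):
    """Plain word matching in the title only (no synonyms, no MeSH)."""
    return {rid for rid, r in records.items()
            if all(t in r["title"].lower() for t in terms)}


def mapped_search(records, terms):
    """A term matches if it is in the title OR its mapped MeSH heading is assigned."""
    def term_hits(record, term):
        heading = ENTRY_TERMS.get(term)
        return term in record["title"].lower() or (
            heading is not None and heading in record["mesh"])
    return {rid for rid, r in records.items() if all(term_hits(r, t) for t in terms)}


terms = ["ribavirin", "respiratory syncytial virus"]
print(text_search(RECORDS, terms))            # set(): no title has the full phrase
print(sorted(mapped_search(RECORDS, terms)))  # ['efficacy', 'rsv_letter']
```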

The other result of the study (the results section comprises 3 sentences) is that “For each individual search, PubMed had better precision”.

The precision was 59/467 (13%) in PubMed and 57/80,730 (0.07%) in Google Scholar (p<0.001)!!
(Note: they had to add author names to the Google Scholar search to find the papers in the haystack 😉)

Hehehe, how surprising. Why do you think no clinician or librarian would ever consider using Google Scholar as the primary, let alone the only, source to search for medical evidence?
It should also ring a bell that, as stated in the paper**:
“In the Cochrane reviews the researchers retrieved information from multiple databases, including MEDLINE, the Cochrane Airways Group trial register (derived from MEDLINE)***, CENTRAL, EMBASE, CINAHL, DARE, NHSEED, the Acute Respiratory Infections Group’s specialized register, and LILACS…”
Google Scholar isn’t mentioned as a source! Google Scholar is only recommended for finding work that cites relevant articles you have already found (this is called forward searching), and then only if you don’t have access to Web of Science or Scopus. In other words: only to catch the last fish.

Perhaps the paper could have been more interesting if the authors had looked at any ADDED VALUE of Google Scholar when exhaustively searching for evidence. Then it would have been crucial to look for grey literature too (instead of excluding it), because this could be a strong point for Google Scholar. Furthermore, one could have investigated whether forward searching yielded extra papers.

The higher precision of PubMed is attributed to the narrow therapy filter used, but the vastly lower precision of Google Scholar is also due to searching the full text, including the reference lists.

For instance, searching for ribavirin AND respiratory syncytial virus in PubMed yields 523 hits. This can be reduced to 32 hits when applying the narrow therapy filter. This means a reduction by a factor of 16.
Yet a similar search in Google Scholar yielded 4,080 hits. Thus even without the filter, Google Scholar's yield is almost 8 times higher than PubMed's.
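These ratios are quickly checked. A Python sketch using the hit counts reported above (counts from 2010, which will differ today):

```python
pubmed_unfiltered = 523   # ribavirin AND respiratory syncytial virus in PubMed
pubmed_filtered = 32      # same search + narrow therapy filter
google_scholar = 4_080    # similar search in Google Scholar

print(f"filter reduction: {pubmed_unfiltered / pubmed_filtered:.1f}x")        # 16.3x
print(f"GS vs unfiltered PubMed: {google_scholar / pubmed_unfiltered:.1f}x")  # 7.8x
print(f"GS vs filtered PubMed: {google_scholar / pubmed_filtered:.1f}x")      # 127.5x
```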

That evokes another research idea: what would have happened if randomized (OR randomised) had been added to the Google Scholar search? Would this have increased the precision? In the case of the above search it lowers the yield by a factor of 2, and the first hits look very relevant.

Funnily enough, the authors undermine their own conclusion that “These results are important because efficient retrieval of the best available scientific evidence can inform respiratory care protocols, recommendations for clinical decisions in individual patients, and education, while minimizing information overload.” by stating elsewhere that “It is unlikely that users consider more than the first few hundred search results, so RTs who conduct literature searches with Google Scholar on these topics will be much less likely to find references cited in Cochrane reviews.”

Indeed, no one would take it into their head to sift the relevant papers out of those 4,080 hits. So what is this study worth from a practical point of view?

Well anyway, just as you can ask for the sake of asking, you can research for the sake of researching. Despite being an EBM addict, I prefer a good subjective overview of this topic to a weak, quasi-evidence-based research paper.

Does this mean Google Scholar is useless? Does it mean that all those PhDs hooked on Google Scholar are wrong?

No, Google Scholar serves certain purposes.

As in the example of PubMed and TRIP, you need to know what is in it for you and how to use it.

I used Google Scholar when I was a researcher:

  • to quickly find a known reference
  • to find citing papers
  • to get an idea of how often articles have been cited, or to find the most relevant papers in a quick and dirty way (i.e. by browsing)
  • for quick and dirty searches by putting a string of words between quotation marks
  • to search full text. I used quite extensive searches to find out what methods were used (for instance methods AND (synonym1 OR syn2 OR syn3)). An interesting possibility is to do a second search for only the last few words of a snippet, as a phrase. This will often reveal the next words in the sentence. You can often repeat this trick, reading a piece of the paper without needing access.
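The last trick can be scripted. A minimal Python sketch, assuming the search engine supports exact-phrase search with double quotation marks (Google Scholar does); `next_words_query` is a hypothetical helper that only builds the query string and does not contact Google Scholar:

```python
def next_words_query(snippet: str, n_words: int = 4) -> str:
    """Quote the last `n_words` words of a result snippet as an exact-phrase query."""
    # Drop the ellipsis that search engines append to truncated snippets
    words = snippet.replace("…", " ").replace("...", " ").split()
    if not words:
        raise ValueError("snippet contains no words")
    return '"' + " ".join(words[-n_words:]) + '"'


print(next_words_query("cells were fixed in 4% paraformaldehyde and stained with …"))
# "paraformaldehyde and stained with"
```

Searching for the returned phrase should surface a snippet that continues the sentence; feed that new snippet back in to read on.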

If you want to know more about the pros and cons of Google Scholar, I recommend the recent overview by the expert librarian Dean Giustini: “Sure Google Scholar is ideal for some things” [7]. He also compiled a “Google scholar bibliography” with ~115 articles as of May 2010.

Speaking of librarians, why was the study performed by PhD RRTs (RNs), and why wasn’t the university librarian involved?****

* This is searched as a string, which is stricter than respiratory AND syncytial AND virus.
** Abbreviations are used instead of the full (database) names.
*** This is wrong: the register contains references to controlled clinical trials from EMBASE, CINAHL and all kinds of databases in addition to MEDLINE.
**** Other than to read the manuscript afterwards.


