No, Google Scholar Shouldn’t be Used Alone for Systematic Review Searching

9 07 2013

Several papers have addressed the usefulness of Google Scholar as a source for systematic review searching. Unfortunately the quality of those papers is often well below the mark.

Back in 2010 I (in the words of Isla Kuhn [2]) “robustly rebutted” the paper by Anders and Evans, “Comparison of PubMed and Google Scholar literature searches” [3], in my post “PubMed versus Google Scholar for Retrieving Evidence” [1] on this blog.

But earlier this year another controversial paper was published [4]:

“Is the coverage of google scholar enough to be used alone for systematic reviews?”

It is one of the highly accessed papers of BMC Medical Informatics and Decision Making and has been welcomed in (for instance) the Twittosphere.

Researchers seem to blindly accept the conclusions of the paper:

https://twitter.com/jeffvallance/status/340562086524510208

But don’t rush and assume you can now forget about PubMed, MEDLINE, Cochrane and EMBASE for your systematic review search and just do a simple Google Scholar (GS) search instead.

You might throw the baby out with the bath water….

… As was immediately recognized by many librarians, either on their blogs (see the blogs of Dean Giustini [5], Patricia Anderson [6] and Isla Kuhn [2]) or in direct comments on the paper (by Tuulevi Ovaska, Michelle Fiander and Alison Weightman [7]).

In their paper, Jean-François Gehanno et al examined whether GS was able to retrieve all the 738 original studies included in 29 Cochrane and JAMA systematic reviews.

And YES! GS had a coverage of 100%!

WOW!

All those fools at the Cochrane who do exhaustive searches in multiple databases using controlled vocabulary and a lot of synonyms when a simple search in GS could have sufficed…

But it is a logical fallacy to conclude from their findings that GS alone will suffice for SR-searching.

Firstly, as Tuulevi [7] rightly points out:

“Of course GS will find what you already know exists”

Or in the words of one of the official reviewers [8]:

What the authors show is only that if one knows what studies should be identified, then one can go to GS, search for them one by one, and find out that they are indexed. But, if a researcher already knows the studies that should be included in a systematic review, why bother to also check whether those studies are indexed in GS?

Right!

Secondly, it is also the precision that counts.

As Dean explains on his blog [5], 100% recall with a precision of 0.1% (and it can be worse!) means that, in order to find 36 relevant papers, you have to go through ~36,700 items.

Dean:

Are the authors suggesting that researchers consider a precision level of 0.1% acceptable for the SR? Who has time to sift through that amount of information?
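
The arithmetic behind these figures is easy to check. Below is a minimal Python sketch, assuming the standard definitions of recall (relevant records retrieved divided by all relevant records) and precision (relevant records retrieved divided by all records retrieved). The function names are mine, chosen for illustration only; the numbers are the 36 relevant papers and ~36,700 items quoted above.

```python
def precision(relevant_retrieved: int, total_retrieved: int) -> float:
    """Fraction of the retrieved records that are relevant."""
    if total_retrieved <= 0:
        raise ValueError("total_retrieved must be positive")
    return relevant_retrieved / total_retrieved


def recall(relevant_retrieved: int, total_relevant: int) -> float:
    """Fraction of all relevant records that the search retrieved."""
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")
    return relevant_retrieved / total_relevant


def records_per_relevant_hit(relevant_retrieved: int, total_retrieved: int) -> float:
    """Average number of records to screen for each relevant one found."""
    if relevant_retrieved <= 0:
        raise ValueError("relevant_retrieved must be positive")
    return total_retrieved / relevant_retrieved


# Figures quoted in the post: 36 relevant papers among ~36,700 retrieved
# items, with every relevant paper retrieved (i.e. 100% recall).
relevant, retrieved = 36, 36_700
print(f"recall:    {recall(relevant, relevant):.0%}")
print(f"precision: {precision(relevant, retrieved):.2%}")
print(f"records screened per relevant paper: "
      f"{records_per_relevant_hit(relevant, retrieved):.0f}")
```

In other words, at this precision a reviewer screens roughly a thousand records for every paper that ends up in the review, and perfect recall does nothing to reduce that workload.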

It is like searching for needles in a haystack. Correction: it is like searching for particular hay stalks in a haystack. It is very difficult to find them if they are hidden among other hay stalks. Suppose the hay stalks were all labeled (title) and I had a powerful hay-stalk magnet (“title search”): it would be a piece of cake to retrieve them. This is what we call a “known item search”. But would you even consider going through the haystack and checking the stalks one by one? Because that is what we have to do if we use Google Scholar as a one-stop search tool for systematic reviews.

Another main point of criticism is that the authors have a grave and worrisome lack of understanding of the systematic review methodology [6] and don’t grasp the importance of the search interface and knowledge of indexing which are both integral to searching for systematic reviews.[7]

One wonders why the paper even passed peer review, as one of the two reviewers (Miguel Garcia-Perez [8]) had already smashed the paper to pieces.

The authors’ method is inadequate and their conclusion is not logically connected to their results. No revision (major, minor, or discretionary) will save this work. (…)

Miguel’s well-founded criticism was not adequately addressed by the authors [9]. Apparently the editors didn’t see through this and relied on the second peer reviewer [10], who merely said it was a “great job” etcetera, but that recall should not be written with a capital R (and that was about the only revision the authors made).

Perhaps it takes another paper to convince Gehanno et al and the uncritical readers of their manuscript.

Such a paper might just have been published [11]. It is written by Dean Giustini and Maged Kamel Boulos and is entitled:

“Google Scholar is not enough to be used alone for systematic reviews”

It is a simple and straightforward paper, but it makes its points clearly.

Giustini and Kamel Boulos looked for a recent SR in their own area of expertise (Chou et al [12]) that included a number of references comparable to that of Gehanno et al. Next they tested GS’ ability to locate these references.

Although most papers cited by Chou et al. (n=476/506; ~95%) were ultimately found in GS, numerous iterative searches were required to find the references and each citation had to be managed one at a time. Thus GS was not able to locate all references found by Chou et al. and the whole exercise was rather cumbersome.

As expected, trying to find the papers by a “real-life” GS search was almost impossible: due to its rudimentary structure, GS did not understand the expert search strings and was unable to translate them. Thus using Chou et al.’s original search strategy and keywords yielded an unmanageable result set of more than 750,000 items.

Giustini and Kamel Boulos note that GS’ ability to search the full text of papers, combined with its PageRank algorithm, can be useful.

On the other hand GS’ changing content, unknown updating practices and poor reliability make it an inappropriate sole choice for systematic reviewers:

As searchers, we were often uncertain that results found one day in GS had not changed a day later and trying to replicate searches with date delimiters in GS did not help. Papers found today in GS did not mean they would be there tomorrow.

But most importantly, not all known items could be found and the search process and selection are too cumbersome.

So shall we now conclude, once and for all, that GS is NOT sufficient to be used alone for SR searching?

We don’t need another bad paper addressing this.

But I would really welcome a well-conducted study looking at the added value of GS in SR searching. For I am sure that GS may be valuable for some questions and some topics in some respects. We have to find out which.

References

[1] PubMed versus Google Scholar for Retrieving Evidence 2010/06 (laikaspoetnik.wordpress.com)
[2] Google scholar for systematic reviews…. hmmmm 2013/01 (ilk21.wordpress.com)
[3] Anders M.E. & Evans D.P. (2010). Comparison of PubMed and Google Scholar literature searches, Respiratory Care, 55 (5) 578-83. PMID: 20420728
[4] Gehanno J.F., Rollin L. & Darmoni S. (2013). Is the coverage of Google Scholar enough to be used alone for systematic reviews, BMC Medical Informatics and Decision Making, 13:7. PMID: 23302542 (open access)
[5] Is Google scholar enough for SR searching? No. 2013/01 (blogs.ubc.ca/dean)
[6] What’s Wrong With Google Scholar for “Systematic” Review 2013/01 (etechlib.wordpress.com)
[7] Comments at Gehanno’s paper (www.biomedcentral.com)
[8] Official reviewer’s report no. 1 on Gehanno’s paper: Miguel Garcia-Perez, 2012/09
[9] Authors’ response to comments (www.biomedcentral.com)
[10] Official reviewer’s report no. 2 on Gehanno’s paper: Henrik von Wehrden, 2012/10
[11] Giustini D. & Kamel Boulos M.N. (2013). Google Scholar is not enough to be used alone for systematic reviews, Online Journal of Public Health Informatics, 5 (2). DOI: 10.5210/ojphi.v5i2.4623
[12] Chou W.Y.S., Prestin A., Lyons C. & Wen K.Y. (2013). Web 2.0 for Health Promotion: Reviewing the Current Evidence, American Journal of Public Health, 103 (1) e9-e18. DOI: 10.2105/AJPH.2012.301071

Information

Date : July 9, 2013
Tags: Google Scholar, PubMed, Search Engines, Searching, Systematic Review
Categories : Google Scholar, PubMed/MEDLINE, Researchblogs, Searching, Systematic Review

13 responses

26 07 2013: Henrik von Wehrden (11:34:53) :

Dear author!
Many thanks for your blog entry, on which I would like to comment.
I feel that it is questionable to make such statements without considering all the facts. While opinions can be widely shared through blogs, which I generally welcome, peer review is a different and more formal way of doing research, at least to me. Since I am the reviewer of one of the papers you mentioned, I would like to point out strongly that I stand by my review, and that the paper you seemingly criticize was not “just” accepted, but examined by me in detail. I follow a standard protocol in performing reviews, and to date I have reviewed several dozen papers, with ~20% rejections and about 70% major revisions. I would therefore like to state that I consider myself to be rather critical as a reviewer. The paper you discuss here was, in my opinion and based on my expertise, very good, and I examined it at length.

Thus I agree with your general suggestion that GS should not, overall, be used for systematic reviews (e.g. because of the high false positive rate, as you mention as well), but it can be quite helpful indeed. I recommend another paper from a colleague and me (Beckmann M & von Wehrden H (2012) Where you search is what you get: Literature mining – Google Scholar vs ISI using a dataset from a literature search in vegetation science. Journal of Vegetation Science). There we compared results from thousands of papers we examined manually (!) with both ISI and GS searches. Despite these clear results I would nevertheless not say that we should instantly shift to GS; instead we need to rely on existing databases and procedures, which I also follow in my research (see several reviews at http://www.henrikvonwehrden.de). I feel that as researchers we need to be creative but also conservative, thus I welcome GS as an additional search database, but by no means suggest abandoning “classic” search databases, since much of our existing research is based on e.g. ISI and Scopus.

Still, I would recommend not mixing up the outcome of the paper with the overall message, and this works both ways, for you and for the authors of the original paper, whose Twitter comment is rather over-simplistic in my opinion, yet this is probably how this medium works. I strongly suggest that we need to rely on established procedures and databases. Peer review is an established procedure, and I repeat that I stand by my review and the paper I had the honor to examine. I contributed one review to the editor, yet I feel that it is beyond my role to make a decision on the manuscript, which to me is done by the editor, who of course relies on my review as well.

However, I strongly stand by established databases, and feel that e.g. Cochrane standards are something we would need in other branches of science (e.g. ecology) as well. This might however not work in the social sciences, where grey literature is more important. GS is rather new and not established, and most reviewers and editors in my field would hardly accept large reviews based on this database, since the relation of GS to searches in other databases is not sufficiently understood; thus more data is needed.

Thus I support both your opinion and also the BMC Medical Informatics and Decision Making paper (which I feel is very good, and arrives at objective results in its context), but feel that these represent a necessary diversity we need within modern science. I would condemn neither GS nor PubMed in general. Instead I welcome the variety of methods and databases we have at hand, and suggest that more papers are needed to address the challenges of modern research. What we should however try to avoid is to allow our opinion to shift into general criticism. The merits and problems of peer review are a completely new debate, yet we might run the risk of discrediting our discussion if we criticize everything to reach our goal. Instead I call for an objective discussion, which is data driven, considers all the facts, and avoids overly simple statements.
Many thanks for sparking this interesting discussion!

Yours,

Henrik von Wehrden

26 07 2013: Zbys (13:43:46) :

Stimulating and strongly advocating for post publication peer review. But…. In view of the seemingly widely discordant views of 2 peer referees would it not seem appropriate to seek the opinion of a third or arbitrate for a consensus?

26 07 2013: Henrik von Wehrden (14:00:54) :

Indeed, post publication peer review (for which blogs are a decent start) would be highly welcome to me. Frequent updates of papers, as e.g. with Cochrane reviews, would be an additional option. As a reviewer one is usually not included in editorial decisions, or is only notified of the decisions.
In such a case I feel the best way is to write a response to the article; however, the article in itself I would still recognize as consistent. Questioning the editor would also be difficult, since I generally assume that editors do a tremendous job in managing the current turmoil, yet of course we all know examples where things went off track. Sometimes decisions can be questioned, yet overall I believe in the system. I feel it’s good that the reviews were published, since this gives the reader the option to be actually informed, triggering vital discussions.
Next stop: A more dynamic review system, including evaluations by readers and later updates.

I shared some thoughts on this here:

Peak publications? (Part 1)

Many thanks for this interesting discussion,

HVW

26 07 2013: Zbys (14:17:26) :

As a fairly experienced Cochrane Review author (50) I would not/never advocate using GS to search for potentially eligible studies to include in a systematic review other than as a ‘rough and dirty’ search to help develop a robust and focused search strategy.
As a matter of fact Cochrane reviews/review authors generally acknowledge peer reviewers, including consumers, content experts, methodologists et al, in the review itself, not in an annual summary as is often the case with other journals.
26 07 2013: Henrik von Wehrden (14:31:48) :

Although I believe I have only one review where I ever used GS to actually search, and there it was exactly the buckshot scenario you stated, I would say that disciplines may differ. Medicine is by far the most important and most resource-intensive field, with a long-standing tradition and well-established protocols. Some fields have to rely on grey literature. Interestingly enough, for some of my papers the reviewers suggested that we should have used GS, yet this was e.g. in transdisciplinary science. I would and did opt against it, yet GS is also nice because you get many of the PDFs, which is e.g. important for colleagues from developing countries.

May we move this discussion into e-mail? henrikvonwehrden@web.de?

Greetings!

HVW
26 07 2013: Zbys (14:47:12) :

I think the issue remains as always the ultimate responsibility of the Editor but common sense should have prevailed and a 3rd opinion should have been sought, irrespective of whether you are for or against GS as a valid and reliable methodological tool. QED
26 07 2013: laikaspoetnik (14:48:24) :

Thanks for this very interesting discussion, Henrik and Zbys

Unfortunately I’m not yet able to respond to your replies, as I’m working (till late*). All I can do is approve your comments 🙂
It needs some time to write a thoughtful answer.

I would appreciate it if you didn’t continue privately as long as the discussion remains on topic. There are too many interesting aspects.

I will join in later tonight.

Jacqueline

* one of those exhaustive searches 😉
26 07 2013: Henrik von Wehrden (15:30:55) :

To me it would remain an interesting challenge to filter “gate keepers” from really good editors. A nice approach would be http://www.peerageofscience.org, thus making everything open and transparent. I have no experience as an editor, yet as a reviewer I would welcome additional reviews if the existing ones are controversial.

26 07 2013: Zbys (18:48:38) :

one moment .. Why are you continuing to ‘distract’ attention from the REAL issue? This was that the paper was inadequately peer refereed.
FACT: two referees with divergent opinions;
SOLUTION: an independent/equipoised review and/or discussion between the 2 existing peer reviewers to reach consensus.
QED

26 07 2013: laikaspoetnik (20:59:16) :

@Henrik

First of all let me thank you for commenting directly to my blog post. Blogging is indeed essentially different from peer reviewing – and I appreciate your willingness to discuss my criticism openly at this platform.

My post is not about peer review. It is not about the (possible) additional value of GS either. It is about a paper which in my view doesn’t address a research question very well and is now debunked by another paper. In fact the recent publication of the paper of Giustini et al was the real trigger for this post.

I do not “condemn” GS. As a matter of fact I clearly state at the end of my post that GS might have additional value as a source, not only for a quick and dirty search but for topics with few publications and/or where relevant search terms are not mentioned directly in title, abstract or MeSH. GS might even be more valuable for non-medical topics (in PubMed and EMBASE we can aim for certain levels of evidence/study types, in other fields the alternative sources may be less useful).

You warn “to not mix up the outcome of the paper with the overall message”. However, this is the entire point. The outcome of the paper is that “all (KNOWN) papers included in the Cochrane SRs” could be retrieved in GS. However, the MESSAGE was “GS can be used alone for SR-searching” (and the original research question was: can GS be used ALONE for SR searching). The authors didn’t investigate additional value (they didn’t find extra papers) and they didn’t show what would happen if they did the entire search process from scratch in GS. (The only hypothetical additional value would be that it would save time. But it probably doesn’t.)

By the way, the paper by Giustini doesn’t even confirm that all known papers can be found in GS.

Looking in more detail at the peer reviewing process of the paper by Gehanno et al, it surprised me that the two reviewers had completely opposite opinions and that none of the very relevant points of the first reviewer was addressed by the authors. It is not my intention to debate your quality as a peer reviewer. Without doubt you have a lot of expertise both as a peer reviewer and as an expert in the field. However, reading this particular review, I can only state that it wasn’t extensive (“very good, replace recall by Recall”) and not very critical (at least the final writing wasn’t). I think you missed the point that the research question was not well addressed (or the conclusions were not firmly based on the findings).

But we know peer review is largely subjective, and so is the editorial process. It is a good thing that the pre-publication history of BMC Medical Informatics and Decision Making is available to all and we are able to discuss it openly.

P.S. I have not yet looked at your Literature mining – Google Scholar vs ISI paper, but will do so soon. Seems interesting. I suspect results in agricultural science may be different from medical science.

@zbys

Thanks for joining the discussion!

I agree with all of your points (post publication peer review, need for a 3rd opinion, limited value of GS as a source for SRs on medical interventions of Cochrane quality).

A 3rd reviewer would surely have been more than appropriate.

However, imo the editors should also have analysed the (lack of proper) response of the authors to the first reviewer, who said very sensible things about the shortcomings of the study.

27 07 2013: Zbys (03:30:49) :

Thanks. I have some, albeit limited, experience as a Chief Editor of a journal. This discussion has been about detail when the big picture is that the Editors should have sought a third opinion .. period. The questions can then follow, and my question would be: why didn’t they seek a third reviewer?

1 09 2014: Gehanno (11:56:44) :

Hi,
it is interesting how people may comment on an article without correctly quoting it.
The purpose of our article was to study the coverage of GS, for high quality articles, and that is all (see the title : “Is the coverage of google scholar enough to be used alone for systematic reviews”).
We already mentioned in our article that the precision was low.
Last, we never said that GS should be used alone. Please refer to the conclusion of the article “GS could even be used alone. It just requires some improvement in the advanced search features to improve its precision”
I am not a native English speaker, but I thought that “could” and “should” did not have the same meaning …
Finally, the purpose was to trigger discussion. It seems to have worked quite well …
I fully agree with the conclusion of this post :
“For I am sure that GS may be valuable for some questions and some topics in some respects. We have to find out which.”
JF Gehanno

18 11 2014: What if Google killed Scholar? | Max Kemman (09:24:41) :

[…] and has to assume that not all thousands of search results need to be considered. As such, doing an extensive and systematic literature search within Scholar is already ill-advised; although known-item searches work fairly well, discovery of all related papers should not be […]

Laika's MedLibLog