When more is less: Truncation, Stemming and Pluralization in the Cochrane Library

5 01 2010

I’m on two mailing lists of the Cochrane Collaboration: one is the TSC list (TSC = Trials Search Coordinator) and the other is the IRMG list. IRMG stands for Information Retrieval Methods Group (of the Cochrane Collaboration). Sometimes difficult search problems are posted on the list, and it is a challenge to try to find the solutions. I can’t remember a problem for which no solution was found.

A while ago a member of the list was puzzled why he got the following retrieval result from the Cochrane Library:

ID   Search                   Hits
#1   (breast near tumour*)    254
#2   (breast near tumour)     640
#3   (breast near tumor*)     428
#4   (breast near tumor)      640

where near = adjacent (thus breast should be just before tumour) and the asterisk * is the truncation symbol. An asterisk at the end of a word retrieves all terms that begin with that word root. Thus tumour* should find both tumour and tumours, and so broaden the search.

The results are odd, because #2 (without truncation) gives more hits than #1 (with truncation), and the same is true for #4 versus #3. One would expect truncation to give more results. What could be the reason behind it?

I suspected the problem had to do with the truncation. I searched for breast and for tumour, each with and without truncation (#1 to #4), and only tumour* gave odd results: tumour* gave far fewer results than tumour. (To rule out an effect of the fields being searched, I searched only the fields ti (title), ab (abstract) and kw (keywords).)

Records found with tumour, not with tumour*, contained the word tumor (not shown). Thus tumour automatically searches for tumor (and vice versa). This process is called stemming.

According to the Help-function of the Cochrane Library:

Stemming: The stemming feature within the search allows words with small spelling variants to be matched. The term tumor will also match tumour.

In addition, as I realized later, the Cochrane has pluralization and singularization features.

Pluralization and singularization matches: Pluralized forms of words also match singular versions, and vice versa. The term drugs will find both drug and drugs. To match either just the singular or plural form of a term, use an exact match search and include the word in quotation marks.

Indeed, (tumor* OR tumour*) (or, in short, tumo*r*) retrieves a little more than tumor OR tumour: words like tumoral, tumorous and tumorectomy. That is not particularly useful, although it might not be disadvantageous when used adjacent to breast, as this will filter out most of the noise.
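To see which word forms each truncated pattern can reach, here is a minimal sketch in Python. It assumes the Cochrane asterisk behaves like a shell-style wildcard (zero or more characters), which is my simplification, and the small vocabulary is made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical mini vocabulary; a real index contains many more word forms.
WORDS = ["tumor", "tumors", "tumour", "tumours",
         "tumoral", "tumorous", "tumorectomy", "tumult"]

def expand(pattern, vocabulary=WORDS):
    """Return the vocabulary words matched by a truncation pattern.

    Assumption: '*' stands for zero or more characters.
    """
    return [word for word in vocabulary if fnmatchcase(word, pattern)]

print("tumor*  :", expand("tumor*"))
print("tumour* :", expand("tumour*"))
print("tumo*r* :", expand("tumo*r*"))
```

In this toy vocabulary tumour* reaches only the British spellings, tumor* only the American ones plus derived words, and tumo*r* covers both sets.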

tumor spelling variants searched in the title (ti) only: it doesn't matter how you spell tumor (#8, #9, #10,#11), as long as you don't truncate (while using a single variant)

Thus stemming, pluralization and singularization only work without truncation. If you do truncate, you should add the spelling variants yourself for any word that stemming or pluralization would otherwise have covered. Truncation remains useful if you’re interested in other word variants that are not automatically accounted for.
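The "more is less" effect can be reproduced with a toy model. This is only a sketch of the behaviour described above, not how the Cochrane Library is actually implemented: the variant group, the four records and the function names are all hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical variant group standing in for stemming and pluralization.
VARIANTS = [{"tumor", "tumors", "tumour", "tumours"}]

# Hypothetical records (ID -> title words).
DOCS = {
    1: "breast tumor resection",
    2: "breast tumour biopsy",
    3: "breast tumours in mice",
    4: "breast tumors and imaging",
}

def term_matches(term, word):
    """Match one query term against one word of a record."""
    if "*" in term:
        # Truncated term: plain wildcard match, no stemming applied.
        return fnmatchcase(word, term)
    for group in VARIANTS:
        if term in group:
            # Untruncated term: every spelling and number variant matches.
            return word in group
    return word == term

def search(term):
    """Return the sorted IDs of records containing a word matching the term."""
    return sorted(doc_id for doc_id, text in DOCS.items()
                  if any(term_matches(term, word) for word in text.split()))

print("tumour  ->", search("tumour"))
print("tumour* ->", search("tumour*"))
print("tumor*  ->", search("tumor*"))
print("tumo*r* ->", search("tumo*r*"))
```

In this model the untruncated tumour finds all four records, while tumour* finds only the two with the British spelling. Combining both truncated spellings (or using tumo*r*) restores the full set.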

Put another way: knowing that stemming and pluralization take place, you can simply search for the singular or plural form, in American or British spelling. So breast near tumor (or simply breast tumor) would have been o.k. This is the reason why these features were introduced in the first place. 😉

By the way, truncation and stemming (but not pluralization) are also features in PubMed. And this can give similar and other problems. But this will be dealt with in another blogpost.


Adding Methodological Filters to MyNCBI

26 11 2009

Idea: Arnold Leenders
Text: “Laika”

Methodological Search Filters can help to narrow down a search by enriching for studies with a certain study design or methodology. PubMed has built-in methodological filters, the so-called Clinical Queries, for domains (like therapy and diagnosis) and for evidence-based papers (like the “Systematic Review subset” in PubMed). These searches are often useful to quickly find evidence on a topic or to prepare a CAT (Critically Appraised Topic). More exhaustive searches require broader filters that are not incorporated in PubMed. (See Search Filters. 1. An Introduction.)

The redesign of PubMed has made it more difficult to apply Clinical Queries after a search has been optimized. You can still go directly to the Clinical Queries (on the front page) and fill in some terms, but we would advise building the strategy first, checking the terms and combining your search with filters afterwards.

Suppose you would like to find out whether spironolactone effectively reduces hirsutism in a female with PCOS (see 10+ 1 Pubmed Tips for Residents and their Instructors, Tip 9). You first check that the main concepts hirsutism and spironolactone are o.k. (i.e. that they map automatically to the correct MeSH terms). Applying the Clinical Queries at this stage would require you to scroll down the page each time you use them.

Instead you can use filters in My NCBI for that purpose. My NCBI is your (free) personal space for saving searches, results, PubMed preferences, for creating automatic email alerts and for creating Search Filters.
The My NCBI-option is at the upper right of the PubMed page. You first have to create a free account.

To activate or create filters, go to [1] My NCBI and click on [2] Search Filters.

Since our purpose is to make filters for PubMed, choose [3] PubMed from the list of NCBI-databases.

Under Frequently Requested Filters you find the most popular Limit options. You can choose any of the optional filters for future use. This works faster than searching for the appropriate limit each time. You can, for instance, use the filter for humans to exclude animal studies.

The filters we are going to use are under “Browse Filters”, subcategory Properties, under Clinical Queries (Domains, e.g. therapy) and Subsets (Systematic Review filters).

You can choose any filter you like. I chose the Systematic Review filter (under Subsets) and the Therapy/Narrow filter (under Clinical Queries).

In addition you can add custom filters. For instance you might want to add a sensitive Cochrane RCT filter, if you perform broad searches. Click Custom Filters, give the filter a name and copy/paste the search string you want to use as filter.

Check via “Run Filter” whether the filter works (the number of hits is shown) and save the filter.

Next you have to activate the filters you want to use. Note that there is a limit of 15 filters (including custom filters) that can be selected and listed in My Filters. [edited: July 5th, hattip Tanya Feddern-Bekcan]

Under My Filters you now see the filters you have chosen or created.

From now on I can use these filters to limit my search. So let’s go to my original search in “Advanced Search”. Unfiltered, search #3 (hirsutism AND spironolactone) has 197 hits.

When you click on the number of hits you arrive at the results page.
At the right are the filters with the number of results of your search combined with these filters (between brackets).

When you click on the Systematic Reviews link you see the 11 results, most of them very relevant. Filters (except the Custom Filters) can be appended to the search (and thus saved) by clicking the yellow + button.
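In effect, appending a filter just ANDs the filter's search string to your own search. The sketch below shows this in Python. The filter strings systematic[sb] and Therapy/Narrow[filter] are, to my knowledge, the tags PubMed uses for these two filters, but do check them in My NCBI before relying on them. The helper names are my own, and the URL targets NCBI's E-utilities esearch service as one possible way to run the combined query outside the PubMed website.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (the code only builds a URL, it sends no request).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Assumed PubMed search strings behind the two filters chosen above.
FILTERS = {
    "systematic_reviews": "systematic[sb]",
    "therapy_narrow": "Therapy/Narrow[filter]",
}

def with_filter(query, filter_name):
    """Combine a search with one filter, as the + button does."""
    return f"({query}) AND {FILTERS[filter_name]}"

def esearch_url(term):
    """Build an esearch URL for a PubMed query string."""
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": term, "retmode": "json"})

query = "hirsutism AND spironolactone"
for name in FILTERS:
    filtered = with_filter(query, name)
    print(filtered)
    print(esearch_url(filtered))
```

A custom filter, such as a sensitive Cochrane RCT filter, could be added to the FILTERS dictionary in the same way by pasting its search string as the value.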

Each time you do a search (and you’re logged in to My NCBI), the filtered results are automatically shown at the right.

Clinical Queries are often handy when you are looking for evidence or preparing a CAT (Critically Appraised Topic). In the new version of PubMed, however, the Clinical Queries are harder to find. That is why it is useful to add certain Clinical Queries to My NCBI. These queries can be found under Browse Filters (an option under Search Filters).

It is also possible to create special search filters, such as the Cochrane highly sensitive filter for RCTs. This can be done under Custom Filters.

Do check via “Run Filter” whether the filter works, and then save it.

After that you still have to activate the filter by ticking the checkbox. So you could add all filters of the “Clinical study category” and activate them depending on the domain of the question.

That way you always have all filters at hand. The results are shown automatically (on the right-hand side).
