Jeffrey Beall’s List of Predatory, Open-Access Publishers, 2012 Edition

19 12 2011

Perhaps you remember that I previously wrote [1] about non-existing and/or low-quality scammy open access journals. I specifically wrote about the Medical Science Journals series, which comprises 45 titles, none of which has published a single article yet.

Another blogger, David M [2], also had negative experiences with fake peer-review invitations from sciencejournals. He even noticed plagiarism.

Later I occasionally found other posts about open access spam, like those of Per Ola Kristensson [3] (specifically about the OA publishers Bentham, Hindawi and InTech), of Peter Murray-Rust [4], a chemist interested in OA (about spam journals and conferences, specifically Scientific Research Publishing), and of Alan Dove PhD [5] (specifically about The Journal of Computational Biology and Bioinformatics Research (JCBBR), published by Academic Journals).

But now it appears that there is an entire list of “Predatory, Open-Access Publishers”. This list was created by Jeffrey Beall, academic librarian at the University of Colorado Denver. He just updated the list for 2012 here (PDF-format).

According to Jeffrey, predatory open-access publishers

are those that unprofessionally exploit the author-pays model of open-access publishing (Gold OA) for their own profit. Typically, these publishers spam professional email lists, broadly soliciting article submissions for the clear purpose of gaining additional income. Operating essentially as vanity presses, these publishers typically have a low article acceptance threshold, with a false-front or non-existent peer review process. Unlike professional publishing operations, whether subscription-based or ethically-sound open access, these predatory publishers add little value to scholarship, pay little attention to digital preservation, and operate using fly-by-night, unsustainable business models.

Jeffrey recommends not doing business with the following (illegitimate) publishers, which includes submitting article manuscripts, serving on editorial boards, buying advertising, etc. According to Jeffrey, “there are numerous traditional, legitimate journals that will publish your quality work for free, including many legitimate, open-access publishers”.

(For the sake of conciseness I only describe the main characteristics, not always using the same wording; please see the entire list for the full descriptions.)

Watchlist: publishers that may show some characteristics of predatory, open-access publishers
  • Hindawi: far more journals than one publisher can properly handle
  • MedKnow Publications: vague business model; charges for the PDF version
  • PAGEPress: many dead links; a prominent link to PayPal
  • Versita Open: paid subscription for the print form; unclear business model

An asterisk (*) indicates that the publisher is appearing on this list for the first time.

How complete and reliable is this list?

Clearly, this list is quite extensive. Jeffrey did a great job listing many dodgy OA journals. We should treat (many of) these OA publishers with caution. Another good thing is that the list is updated annually.

(The publisher described in my previous post is not (yet) on the list 😉 but I will inform Jeffrey.)

Personally, I would have preferred a distinction between truly bogus or spammy journals and journals that merely seem to have “too many journals to properly handle” or that ask (too much) money for subscriptions/from the author. The scientific content may still be good (enough).

Furthermore, I would rather see a neutral description of what exactly is wrong with a journal, especially because “Beall’s list” is a list and not a blog post (or is it?). Sometimes the description doesn’t convince me that the journal is really bogus or predatory.

Examples of subjective portrayals:

  • Dove Press:  This New Zealand-based medical publisher boasts high-quality appearing journals and articles, yet it demands a very high author fee for publishing articles. Its fleet of journals is large, bringing into question how it can properly fulfill its promise to quickly deliver an acceptance decision on submitted articles.
  • Libertas Academica: “The tag line under the name on this publisher’s page is ‘Freedom to research.’ It might better say ‘Freedom to be ripped off.’”
  • Hindawi: “This publisher has way too many journals than can be properly handled by one publisher, I think (…)”

I do like funny posts, but only if it is clear that the post is intended to be funny, like the one by Alan Dove PhD about JCBBR.

JCBBR is dedicated to increasing the depth of research across all areas of this subject.

Translation: we’re launching a new journal for research that can’t get published anyplace else.

The journal welcomes the submission of manuscripts that meet the general criteria of significance and scientific excellence in this subject area.

We’ll take pretty much any crap you excrete.

Hattip: Catherine Arnott Smith, PhD at the MedLib-L list.

  1. I Got the Wrong Request from the Wrong Journal to Review the Wrong Piece. The Wrong kind of Open Access Apparently, Something Wrong with this Inherently…
  2. A peer-review phishing scam
  3. Academic Spam and Open Access Publishing
  4. What’s wrong with Scholarly Publishing? New Journal Spam and “Open Access”
  5. From the Inbox: Journal Spam
  6. Beall’s List of Predatory, Open-Access Publishers. 2012 Edition
  7. Silly Sunday #42 Open Access Week around the Globe

Silly Sunday #42 Open Access Week around the Globe

23 10 2011


Open Access Week, a global event now entering its 5th year, “is an opportunity for the academic and research community to continue to learn about the potential benefits of Open Access, to share what they’ve learned with colleagues, and to help inspire wider participation in helping to make Open Access (OA) a new norm in scholarship and research”.

It takes place from October 24 to 30 in many places around the globe.

Benjamin Hennig, whose PhD research built on the work of the Worldmapper project (see earlier post here), created and updated a map of this year’s OA activities together with SPARC (Scholarly Publishing and Academic Resources Coalition), the organiser of the event.
It is a positive trend that India and parts of Africa have many OA activities next week (they are relatively large on the map, especially when compared to the proportion of scientific papers produced in the world, as shown on Worldmapper).

For further information see the blog post of Benjamin Hennig at his blog, Views of the World.

You can follow Worldmapper at Facebook.

FUTON Bias. Or Why Limiting to Free Full Text Might not Always be a Good Idea.

8 09 2011

A few weeks ago I was discussing possible relevant papers for the Twitter Journal Club (hashtag #TwitJC), a successful initiative on Twitter which I have discussed previously here and here [7,8].

I proposed an article that was behind a paywall. Annemarie Cunningham (@amcunningham) immediately turned the idea down, stressing that open access (OA) is a prerequisite for the TwitJC journal club.

One of the TwitJC organizers, Fi Douglas (@fidouglas on Twitter), argued that using paid-for journals would defeat the objective that #TwitJC is open to everyone. I can imagine that fee-based articles could set too high a threshold for many doctors. In addition, I sympathize with promoting OA.

However, I disagree with Annemarie that an OA (or rather free) paper is a prerequisite if you really want to talk about what might impact practice. On the contrary, limiting to free full text (FFT) papers in PubMed might lead to bias: picking the “low-hanging fruit of convenience” might mean that the paper isn’t representative and/or doesn’t reflect the current best evidence.

But is there evidence for my theory that selecting FFT papers might lead to bias?

Let’s first look at the extent of the problem. What percentage of papers do we miss by limiting to free-access papers?

A survey in PLoS ONE by Björk et al [1] found that one in five peer-reviewed research papers published in 2008 was freely available on the internet. Overall, 8.5% of the articles published in 2008 (13.9% in Medicine) were freely available at the publishers’ sites (gold OA). For an additional 11.9%, free manuscript versions could be found via the green route, i.e. copies in repositories and on websites (7.8% in Medicine).
As a commenter rightly stated, the lag time is also important, as we would like immediate access to recently published research, yet some publishers (37%) impose an access embargo of 6-12 months or more. (These papers were largely missed, as the 2008 OA status was assessed in late 2009.)


The strength of the paper is that it measures OA prevalence on an article basis rather than by calculating the share of journals that are OA: an OA journal generally contains a lower number of articles.
The authors randomly sampled from 1.2 million articles using the advanced search facility of Scopus and measured what share of OA copies the average researcher would find using Google.

Another paper, published in J Med Libr Assoc (2009) [2] and using methods similar to those of the PLOS survey, examined the state of open access (OA) specifically in the biomedical field. Because of its broad coverage and popularity in the biomedical field, PubMed was chosen to collect the target sample of 4,667 articles. Matsubayashi et al used four different databases and search engines to identify full-text copies. The authors reported an OA percentage of 26.3 for peer-reviewed articles (70% of all articles), which is comparable to the results of Björk et al. More than 70% of the OA articles were provided through journal websites. The percentages of green OA articles from the websites of authors or in institutional repositories were quite low (5.9% and 4.8%, respectively).

In their discussion of the findings of Matsubayashi et al, Björk et al [1] quickly assessed the OA status in PubMed by using the new “link to Free Full Text” search facility. First they searched for all “journal articles” published in 2005, and then repeated this with the further restriction of “link to FFT”. The PubMed OA percentages obtained this way were 23.1 for 2005 and 23.3 for 2008.
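As an aside for readers who want to repeat such a comparison themselves: the `free full text[sb]` subset is standard PubMed query syntax, and counts can be fetched via NCBI’s E-utilities. The helper functions and example numbers below are mine, purely for illustration, not from the paper:

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def esearch_url(query):
    """Build an E-utilities esearch URL that returns only the hit count."""
    return EUTILS + "?" + urlencode({"db": "pubmed", "term": query, "rettype": "count"})

def with_fft_limit(query):
    """Restrict a PubMed query to the free full text subset."""
    return f"({query}) AND free full text[sb]"

def fft_share(fft_count, total_count):
    """Percentage of records that remain after applying the FFT limit."""
    return round(100.0 * fft_count / total_count, 1)

# The 2005 comparison described above, sketched as two queries:
query = "journal article[pt] AND 2005[dp]"
print(esearch_url(query))                 # count without the FFT limit
print(esearch_url(with_fft_limit(query))) # count with the FFT limit
print(fft_share(231, 1000))               # 23.1, the share reported for 2005
```

Fetching the two URLs and dividing the counts reproduces the percentages Björk et al report.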

This proportion of biomedical OA papers is gradually increasing. A chart in Nature’s News Blog [9] shows that the proportion of freely available papers indexed in PubMed each year has increased from 23% in 2005 to above 28% in 2009.
(Methods are not shown, though. The 2008 data are higher than those of Björk et al, who noticed little difference from 2005. The data for this chart, however, are from David Lipman, NCBI director and driving force behind the digital OA archive PubMed Central.)
Again, because of the embargo periods, not all literature is immediately available at the time that it is published.

In summary, we would miss about 70% of biomedical papers by limiting to FFT papers. However, we would miss an even larger proportion if we limited ourselves to recently published ones.

Of course, the key question is whether ignoring relevant studies not available in full text really matters.

Reinhard Wentz of the Imperial College Library and Information Service already argued in a visionary 2002 Lancet letter[3] that the availability of full-text articles on the internet might have created a new form of bias: FUTON bias (Full Text On the Net bias).

Wentz reasoned that FUTON bias will not affect researchers who are used to comprehensive searches of published medical studies, but that it will affect staff and students with limited experience in doing searches and that it might have the same effect in daily clinical practice as publication bias or language bias when doing systematic reviews of published studies.

Wentz also hypothesized that FUTON bias (together with “no abstract available” (NAA) bias) will affect the visibility and the impact factor of OA journals. He makes a reasonable case that the NAA bias will affect publications on new, peripheral, and under-discussion subjects more than established topics covered in substantive reports.

The study of Murali et al [4], published in Mayo Clinic Proceedings in 2004, confirms that the availability of journals on MEDLINE as FUTON or NAA affects their impact factor.

Of the 324 journals screened by Murali et al, 38.3% were FUTON, 19.1% NAA, and 42.6% had abstracts only. The mean impact factors were 3.24 (±0.32), 1.64 (±0.30), and 0.14 (±0.45), respectively! The authors confirmed this finding by showing a difference in impact factors for journals available in both the pre- and post-Internet eras (n=159).

Murali et al informally questioned many physicians and residents at multiple national and international meetings in 2003. These doctors uniformly admitted relying on FUTON articles on the Web to answer a sizable proportion of their questions. A study by Carney et al (2004) [5] showed that 98% of US primary care physicians used the Internet as a resource for clinical information at least once a week, and mostly used FUTON articles to aid decisions about patient care, patient education, and medical student or resident instruction.

Murali et al therefore conclude that failure to consider FUTON bias may not only affect a journal’s impact factor, but could also limit consideration of the medical literature by ignoring relevant for-fee articles, and thereby influence medical education, akin to publication or language bias.

This proposed effect of the FFT limit on citation retrieval for clinical questions was examined in a more recent study (2008), published in J Med Libr Assoc [6].

Across all 4 questions based on a research agenda for physical therapy, the FFT limit reduced the number of citations to 11.1% of the total number of citations retrieved without the FFT limit in PubMed.

Even more important, high-quality evidence such as systematic reviews and randomized controlled trials were missed when the FFT limit was used.

For example, when searching without the FFT limit, 10 systematic reviews of RCTs were retrieved, versus one when the FFT limit was used. Likewise, 28 RCTs were retrieved without the FFT limit, and only one with it.

The proportion of missed studies (approx. 90%) is higher than in the studies mentioned above, possibly because real searches were tested and only relevant clinical studies were considered.

The authors rightly conclude that consistently missing high-quality evidence when searching clinical questions is problematic because it undermines the process of Evidence-Based Practice. Krieger et al finally conclude:

“Librarians can educate health care consumers, scientists, and clinicians about the effects that the FFT limit may have on their information retrieval and the ways it ultimately may affect their health care and clinical decision making.”

It is the hope of this librarian that she did a little education in this respect and clarified the point that limiting to free full text might not always be a good idea. Especially if the aim is to critically appraise a topic, to educate or to discuss current best medical practice.


  1. Björk, B., Welling, P., Laakso, M., Majlender, P., Hedlund, T., & Guðnason, G. (2010). Open Access to the Scientific Journal Literature: Situation 2009 PLoS ONE, 5 (6) DOI: 10.1371/journal.pone.0011273
  2. Matsubayashi, M., Kurata, K., Sakai, Y., Morioka, T., Kato, S., Mine, S., & Ueda, S. (2009). Status of open access in the biomedical field in 2005 Journal of the Medical Library Association : JMLA, 97 (1), 4-11 DOI: 10.3163/1536-5050.97.1.002
  3. WENTZ, R. (2002). Visibility of research: FUTON bias The Lancet, 360 (9341), 1256-1256 DOI: 10.1016/S0140-6736(02)11264-5
  4. Murali NS, Murali HR, Auethavekiat P, Erwin PJ, Mandrekar JN, Manek NJ, & Ghosh AK (2004). Impact of FUTON and NAA bias on visibility of research. Mayo Clinic proceedings. Mayo Clinic, 79 (8), 1001-6 PMID: 15301326
  5. Carney PA, Poor DA, Schifferdecker KE, Gephart DS, Brooks WB, & Nierenberg DW (2004). Computer use among community-based primary care physician preceptors. Academic medicine : journal of the Association of American Medical Colleges, 79 (6), 580-90 PMID: 15165980
  6. Krieger, M., Richter, R., & Austin, T. (2008). An exploratory analysis of PubMed’s free full-text limit on citation retrieval for clinical questions Journal of the Medical Library Association : JMLA, 96 (4), 351-355 DOI: 10.3163/1536-5050.96.4.010
  7. The #TwitJC Twitter Journal Club, a new Initiative on Twitter. Some Initial Thoughts.
  8. The Second #TwitJC Twitter Journal Club
  9. How many research papers are freely available?

I Got the Wrong Request from the Wrong Journal to Review the Wrong Piece. The Wrong kind of Open Access Apparently, Something Wrong with this Inherently…

27 08 2011

Meanwhile you might want to listen to “Wrong” (Depeche Mode)

Yesterday I screened my spam-folder. Between all male enhancement and lottery winner announcements, and phishing mails for my bank account, there was an invitation to peer review a paper in “SCIENCE JOURNAL OF PATHOLOGY”.

Such an invitation doesn’t belong in the spam folder, does it? So I had a closer look and quickly screened the letter.

I don’t know what alarmed me first. The odd hard returns, the journal using a Gmail address, an invitation on a topic (autism) I knew nothing about, an abstract that didn’t make sense and had nothing to do with pathology, or the odd style of the letter: an informal but impersonal introduction (“How are you? I am sure you are busy with many activities right now”) combined with a turgid style (“the paper addresses issues of value to our broad-based audience, and that it cuts through the thick layers of theory and verbosity for them and makes sense of it all in a clean, cohesive manner”) and some misspellings. And I had never before received an invitation from an editor that started with the impersonal “Colleagues”…

But still it was odd. Why would someone take the trouble of writing such an invitation letter? For what purpose? And apparently the person did know that I was a scientist who does (or is able to) peer review medical scientific papers. Since the mail was sent to my Laika Gmail account, the most likely source for my contact info must have been my pseudonymous blog. I seldom use this mail account for scientific purposes.

What triggered my caution flag the most was the topic: autism. I immediately linked this to the anti-vaccination quackery movement, which tries to give skeptic bloggers a hard time and fights a personal, not a scientific, battle. I also linked it to #epigate, which was exposed at Liz Ditz’s I Speak of Dreams, a blog with autism as a niche topic.

#Epigate is the story of René Najera, aka @EpiRen, a popular epidemiologist blogger who was asked by his employers to stop engaging in social media after a series of complaints by a Mr X, who also threatened other pseudonymous commenters/bloggers criticizing his actions. According to Mr X, no one will be safe, because “all i have to do is file a john doe – or hire a cyber investigator. these courses of action cost less than $10,000 each; which means every person who is afraid of the light can be exposed”. In another comment at Liz Ditz’s he actually says he will go after a specific individual: “Anarchic Teapot”.

OK, I admit that the two issues might be totally coincidental, and they probably are, but I’m hypersensitive to people trying to silence me via my employers (because that has happened to me in the past). Anyway, asking a pseudonymous blogger to peer review might be a way to hack the real identity of such a blogger. Perhaps far-fetched, I know.

But what would the “editor” do if I replied and said “yes”?

I became curious. Does The Science Journal of Pathology even exist?

Not in PubMed!!

But the journal “Science Journal of Pathology” does exist on the Internet…. and John Morrison is the editor. But he is the only one; as a matter of fact, he is the entire staff…. There are “search”, “current” and “archives” tabs, but the latter two are EMPTY.

So I would have the dubious honor of reviewing the first paper for this journal?…. 😉

  1. (First assumption – David) High school kids are looking for someone to peer review (and thus improve) their essays to get better grades.
    (Me: “school kids” could also be replaced by “non-successful or starting scientists”.)
  2. (Second assumption – David) Perhaps they are only looking to fill out their sucker lists. If you’ve done a bad review, they may blackmail you in order to keep it quiet.
  3. (Me) The journal site might be a cover-up for anything (still no clue what).
  4. (Me) The site might get a touch of credibility if the (upcoming) articles are stamped with “peer-reviewed by…”.
  5. (David & me) The scammers target PhDs or people who the “editors” think have little experience in peer reviewing and/or consider it an honor to do so.
  6. (David & me) It is a phishing scam. You have to register on the journal’s website in order to review or submit, so they get your credentials. My intuition was that they might just try to track down the real name, address and department of a pseudonymous blogger, but I think that David’s assumption is more plausible. David thinks that a couple of people in Nigeria are just after your password for your mail, Amazon, PayPal etc., because “the vast majority of people uses the same password for all logins, which is terribly bad practice, but they don’t want to forget it.”

Together with David, I would like to warn you about this “very interesting phishing scheme”, which targets academics and especially PhDs. We have no clue as to their real intentions, but it looks scammy.

Besides the scam possibly affecting you personally, such non-existing and/or low-quality open access journals do a disservice to the existing, high-quality open access journals.

There should be ways to remove such scam websites from the net.


“Academic scams – my wife just received a version of this for an Autism article. PhD/DPhil/Masters students beware” – a tweet mentioning the receipt of a similar autism invitation.

Will Nano-Publications & Triplets Replace The Classic Journal Articles?

23 06 2010

“Libraries and journal articles as we know them will cease to exist,” said Barend Mons at the symposium in honor of our library’s 25th anniversary (June 3rd). “Possibly we will have another kind of party in another 25 years”…. he continued, grinning.

What he had to say the next half hour intrigued me. And although I had no pen with me (it was our party, remember), I thought it was interesting enough to devote a post to it.

I’m basing this post not only on my memory (we had a lot of Italian wine at the buffet), but also on an article Mons referred to [1], a Dutch newspaper article [2], other articles [3-6] and PowerPoints [7-9] on the topic.

This is a field I know little about, so I will try to keep it simple (also for my sake).

Mons started by touching on a problem that is very familiar to doctors, scientists and librarians: information overload from a growing web of linked data. He showed a picture that looked like the one at the right (though I’m sure those are Twitter networks).

As he said elsewhere [3]:

(..) the feeling that we are drowning in information is widespread (..) we often feel that we have no satisfactory mechanisms in place to make sense of the data generated at such a daunting speed. Some pharmaceutical companies are apparently seriously considering refraining from performing any further genome-wide association studies, as the world is likely to produce many more data than these companies will ever be able to analyze with currently available methods.

With the current search engines we have to do a lot of digging to get the answers [8]. Computers are central to this digging, because there is no way people can stay updated, even in their own field.

However, computers can’t deal with the current web and with scientific information as produced in classic articles (even the electronic versions), for the following reasons:

  1. Homonyms. Words that are written or sound the same but have a different meaning. Acronyms are notorious in this respect. Barend gave PSA as an example but, without realizing it, used an even better one: PPI. To me this means Proton Pump Inhibitor; to him, apparently, Protein-Protein Interaction.
  2. Redundancy. To keep journal articles readable, we often use different words to denote the same thing. These do not add to the real new findings in a paper. In fact, the majority of digital information is duplicated repeatedly. For example, “mosquitoes transfer malaria” is a factual statement repeated in many consecutive papers on the subject.
  3. The connection between words is not immediately clear (to a computer). For instance, anti-TNF drugs can be used to treat skin disorders, but the same drugs can also cause them.
  4. Data are not structured beforehand.
  5. Weight: some “facts” are “harder” than others.
  6. Not all data are available or accessible. Many data are either not published (e.g. negative studies), not freely available, or not easy to find. Some portals (GoPubMed, NCBI) provide structural information (fields, including keywords), but do not enable searching the full text.
  7. Data are spread out. Data are kept in “data silos” not meant for sharing [8] (ppt 2). One would like to simultaneously query 1000 databases, but this would require semantic web standards for publishing, sharing and querying knowledge from diverse sources…..

In a nutshell, the problem is as Barend put it: “Why bury data first and then mine it again?” [9]

Homonyms, redundancy and connection can be tackled, at least in the field Barend is working in (bioinformatics).

Different terms denoting the same concept (i.e. synonyms) can be mapped to a single concept identifier (i.e. a list of synonyms), whereas identical terms used to indicate different concepts (i.e. homonyms) can be resolved by a disambiguation algorithm.
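A toy sketch of that mapping step may help. The concept identifiers, context clues, and lookup logic below are invented for illustration; a real disambiguation algorithm is of course far more sophisticated:

```python
# Map synonyms to a single concept identifier (IDs invented for the example).
SYNONYMS = {
    "malaria": "C0024530",
    "paludism": "C0024530",
    "mosquito": "C0026584",
    "mosquitoes": "C0026584",
}

# Homonyms resolved by a (very crude) context-word heuristic.
HOMONYMS = {
    "ppi": {
        "proton": "ProtonPumpInhibitor",
        "protein": "ProteinProteinInteraction",
    }
}

def to_concept(term, context=()):
    """Resolve a term to a concept ID, using surrounding words for homonyms."""
    term = term.lower()
    if term in SYNONYMS:
        return SYNONYMS[term]
    if term in HOMONYMS:
        for clue, concept in HOMONYMS[term].items():
            if any(clue in word.lower() for word in context):
                return concept
    return None

print(to_concept("Paludism"))                         # same ID as "malaria"
print(to_concept("PPI", context=["proton", "pump"]))  # ProtonPumpInhibitor
```

The point of the sketch: once both “malaria” and “paludism” resolve to one ID, co-occurrence counts no longer fragment across spelling variants.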

The shortest meaningful sentence is a triplet: a combination of subject, predicate and object. A triplet indicates the connection and its direction. “Mosquitoes transfer malaria” is such a triplet, where mosquitoes and malaria are concepts. In the field of proteins, “UNIPROT 05067 is a protein” is a triplet (where UNIPROT 05067 and protein are concepts), as are “UNIprotein 0506 is located in the membrane” and “UNIprotein 0506 interacts with UNIprotein 0506” [8]. Since these triplets (statements) derive from different databases, consistent naming and availability of information are crucial to finding them. Barend and colleagues are the people behind Wikiproteins, an open, collaborative wiki focusing on proteins and their role in biology and medicine [4-6].

Concepts and triplets are widely accepted in the world of bioinformatics. To get an idea of what this means for searching, see the search engine Quertle, which allows semantic search of PubMed and full-text biomedical literature with automatic extraction of key concepts. Searching for ESR1 $BiologicalProcess will find abstracts mentioning all kinds of processes in which ESR1 (aka ERα, ERalpha, EStrogen Receptor 1) is involved. The search can be refined by choosing ‘narrower terms’ like “proliferation” or “transcription”.

What is new is that Mons wants to turn those triplets into what he calls nano-publications. Because not every statement is equally ‘hard’, nano-publications are weighted with numbers from 0 (uncertain) to 1 (very certain). The nano-publication “mosquitoes transfer malaria” will get a number approaching 1.
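In data-model terms, a nano-publication as described here is little more than a triplet plus a certainty weight and a citation. The field names and example values in this minimal sketch are my own, not a published schema:

```python
from typing import NamedTuple

class NanoPublication(NamedTuple):
    """A triplet (subject, predicate, object) with a certainty weight."""
    subject: str      # concept, e.g. "mosquito"
    predicate: str    # relation, e.g. "transfers"
    object: str       # concept, e.g. "malaria"
    certainty: float  # 0 (uncertain) .. 1 (very certain)
    source: str       # citation of the original finding

# The canonical example from the text, with an invented citation string:
npub = NanoPublication("mosquito", "transfers", "malaria", 0.99, "Ross 1897")
assert 0.0 <= npub.certainty <= 1.0
print(npub.subject, npub.predicate, npub.object, npub.certainty)
```

Because the statement is a fixed, machine-readable record, thousands of such records can be queried or re-weighted without any text mining.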

Such nano-publications offer little room for nuance, interpretation and discussion. Mons does not propose to entirely replace traditional articles with nano-publications. Quote [3]:

While arguing that research results should be available in the form of nano-publications, we are emphatically not saying that traditional, classical papers should not be published any longer. But their role is now chiefly for the official record, the “minutes of science”, and not so much as the principal medium for the exchange of scientific results. That exchange, which increasingly needs the assistance of computers to be done properly and comprehensively, is best done with machine-readable, semantically consistent nano-publications.

According to Mons, authors and their funders should start requesting and expecting the papers they have written and funded to be semantically coded when published, preferably by the publisher and otherwise by libraries: the technology exists to provide Web browsers with the functionality for users to identify nano-publications and annotate them.

Like the Wikiproteins wiki, nano-publications will be entirely open access. It will suffice to properly cite the original finding/publication.

In addition there is a new kind of “peer review”. An expert network is set up to immediately assess a twittered nano-publication when it comes out, so that the publication is assessed by perhaps 1000 experts instead of 2 or 3 reviewers.

On a small scale this is already happening. Nano-publications are sent as tweets to people like Gert Jan van Ommen (past president of HUGO and co-author of 5 of my publications, or vice versa), who then gives a red (“don’t believe”) or a green light (“believe”) with one click on his BlackBerry.

As Mons put it, it looks like a subjective event, quite similar to “dislike” and “like” on social media platforms like Facebook.

Barend often referred to a PLoS ONE paper by van Haagen et al [1], showing the superiority of the concept-profile-based approach not only in detecting explicitly described PPIs, but also in inferring new PPIs.

[You can skip the part below if you’re not interested in details of this paper]

Van Haagen et al first established a set of 61,807 known human PPIs and of many more probable Non-Interacting Protein Pairs (NIPPs) from online human-curated databases (NIPPs also from the IntAct database).

For the concept-based approach they used the concept-recognition software Peregrine, which includes synonyms and spelling variations of concepts and uses simple heuristics to resolve homonyms.

This concept-profile based approach was compared with several other approaches, all depending on co-occurrence (of words or concepts):

  • Word-based direct relation. This approach uses direct PubMed queries (words) to detect if proteins co-occur in the same abstract (thus the names of two proteins are combined with the boolean ‘AND’). This is the simplest approach and represents how biologists might use PubMed to search for information.
  • Concept-based direct relation (CDR). This approach uses concept-recognition software to find PPIs, taking synonyms into account and resolving homonyms. Two concepts (in this case, two proteins) are detected if they co-occur in the same abstract.
  • STRING. The STRING database contains a text mining score which is based on direct co-occurrences in literature.

The results show that, using concept profiles, 43% of the known PPIs were detected, with a specificity of 99%, and 66% of all known PPIs with a specificity of 95%. In contrast, the direct relations methods and STRING show much lower scores:

                            Word-based   CDR    Concept profiles   STRING
Sensitivity at spec = 99%      28%       37%          43%           39%
Sensitivity at spec = 95%      33%       41%          66%           41%
Area under curve               0.62      0.69         0.90          0.69

These findings suggested that not all proteins with high similarity scores are known to interact; they may be related in another way, e.g. they could be involved in the same pathway or be part of the same protein complex without physically interacting. Indeed, concept-based profiling was superior in predicting relationships between proteins potentially present in the same complex or pathway (thus A-C inferred from co-occurring protein pairs A-B and B-C).
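The inference step (suggesting A-C because A co-occurs with B, and B with C) can be sketched in a few lines. The toy pairs below are made up and merely stand in for the co-occurrence data of the paper:

```python
from itertools import combinations
from collections import defaultdict

def infer_indirect(pairs):
    """Suggest pairs that share a co-occurrence partner but are not paired themselves."""
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)
    known = {frozenset(p) for p in pairs}
    inferred = set()
    for x, y in combinations(sorted(neighbours), 2):
        if frozenset((x, y)) not in known and neighbours[x] & neighbours[y]:
            inferred.add((x, y))
    return inferred

# Toy co-occurrence data: A-B, B-C and C-D are observed pairs.
pairs = {("A", "B"), ("B", "C"), ("C", "D")}
print(sorted(infer_indirect(pairs)))  # [('A', 'C'), ('B', 'D')]
```

The concept-profile approach of the paper is far richer (it weighs how strongly two proteins share context concepts), but the structure of the inference is this transitive one.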

Since there is often a substantial time lag between the first publication of a finding and the time the PPI is entered in a database, a retrospective study was performed to examine how many of the PPIs that would have been predicted by the different methods in 2005 were confirmed by 2007. Indeed, using concept profiles, PPIs could be efficiently predicted before they entered PPI databases and before their interaction was explicitly described in the literature.

The practical value of the method for the discovery of novel PPIs is illustrated by the experimental confirmation of the inferred physical interaction between CAPN3 and PARVB, which was based on the frequent co-occurrence of both proteins with concepts like Z-disc, dysferlin, and alpha-actinin. The relationships predicted are broader than PPIs and include proteins in the same complex or pathway. Depending on the type of relationships deemed useful, the precision of the method can be as high as 90%.

In line with their open access policy, they have made the full set of predicted interactions available in a downloadable matrix and through the webtool Nermal, which lists the most likely interaction partners for a given protein.

According to Mons, this framework will be a very rich source for new discoveries, as it will enable scientists to prioritize potential interaction partners for further testing.

Barend Mons started with the statement that nano-publications will replace classic articles (and the need for libraries). However, things are never as black as they seem. Mons showed that a nano-publication is basically a “peer-reviewed, openly available” triplet. Triplets can be effectively retrieved and inferred from available databases/papers using a concept-based approach. Nevertheless, their effectiveness needs to be enhanced by semantically coding triplets when they are published.
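A nano-publication in this sense can be pictured as a subject-predicate-object triplet plus provenance. The structure below is a minimal sketch with illustrative field names, not a formal nanopub schema:

```python
# Minimal sketch of a nano-publication: one machine-readable assertion
# (a subject-predicate-object triple) plus its provenance.
# Field names here are illustrative, not a formal nanopublication schema.

nanopub = {
    "assertion": ("CAPN3", "interacts_with", "PARVB"),
    "provenance": {"derived_from": "PMID:19924298", "method": "concept profiles"},
}

def matches(triple, subject=None, predicate=None, obj=None):
    """Query a triple by any combination of slots (None = wildcard)."""
    s, p, o = triple
    return all(q is None or q == v for q, v in
               ((subject, s), (predicate, p), (obj, o)))

# Triples can be queried by any slot, e.g. all assertions about CAPN3:
print(matches(nanopub["assertion"], subject="CAPN3"))  # True
```

Because the assertion is already structured, it can be retrieved and combined by machines directly, without the text mining needed to extract the same fact from a classic article.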

What will this mean for clinical medicine? Bioinformatics is quite another discipline, with better structured and more straightforward data (interaction, identity, place). Interestingly, Mons and van Haagen plan further studies, in which they will evaluate whether concept profiles can also be applied to the prediction of other types of relations, for instance between drugs or genes and diseases. The future will tell whether the above-mentioned approach is also useful in clinical medicine.

Implementation of the following (implicit) recommendations would be advisable, independent of the possible success of nano-publications:

  • Less emphasis on “publish or perish” (thus more on the data themselves, whether positive, negative, trendy or not)
  • Better structured data, partly by structuring articles. This has already improved over the years by introducing structured abstracts, availability of extra material (appendices, data) online and by guidelines, such as STARD (The Standards for Reporting of Diagnostic Accuracy)
  • Open Access
  • Availability of full text
  • Availability of raw data

One might argue that disclosing data is unlikely when pharma is involved. It is therefore very hopeful that a group of major pharmaceutical companies has announced that it will share pooled data from failed clinical trials in an attempt to figure out what is going wrong in the studies and what can be done to improve drug development (10).

Unfortunately I don’t have Mons’ presentation at my disposal. Therefore, here are two other presentations about triplets, concepts and the semantic web.



  1. van Haagen HH, ‘t Hoen PA, Botelho Bovo A, de Morrée A, van Mulligen EM, Chichester C, Kors JA, den Dunnen JT, van Ommen GJ, van der Maarel SM, Kern VM, Mons B, & Schuemie MJ (2009). Novel protein-protein interactions inferred from literature context. PloS one, 4 (11) PMID: 19924298
  2. Twitteren voor de wetenschap (Twittering for Science), Maartje Bakker, Volkskrant (2010-06-05)
  3. Barend Mons and Jan Velterop (?) Nano-Publication in the e-science era (Concept Web Alliance, Netherlands BioInformatics Centre, Leiden University Medical Center), accessed June 20th, 2010.
  4. Mons, B., Ashburner, M., Chichester, C., van Mulligen, E., Weeber, M., den Dunnen, J., van Ommen, G., Musen, M., Cockerill, M., Hermjakob, H., Mons, A., Packer, A., Pacheco, R., Lewis, S., Berkeley, A., Melton, W., Barris, N., Wales, J., Meijssen, G., Moeller, E., Roes, P., Borner, K., & Bairoch, A. (2008). Calling on a million minds for community annotation in WikiProteins. Genome Biology, 9 (5) DOI: 10.1186/gb-2008-9-5-r89
  5. Science Daily (2008/05/08) Large-Scale Community Protein Annotation — WikiProteins
  6. Boing Boing: (2008/05/28) WikiProteins: a collaborative space for biologists to annotate proteins
  7. (ppt1) SWAT4LS 2009: Semantic Web Applications and Tools for Life Sciences, Amsterdam, Science Park, Friday, 20th of November 2009
  8. (ppt2) Michel Dumontier: Triples for the People: Scientists Liberating Biological Knowledge with the Semantic Web
  9. (ppt3, only slide shown): Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus – by Duncan Hill (EMBL-EBI)
  10. WSJ (2010/06/11) Drug Makers Will Share Data From Failed Alzheimer’s Trials