Scopus is useless

May 29, 2012

Scopus bills itself as “the largest abstract and citation database of research literature and quality web sources covering nearly 18,000 titles from more than 5,000 publishers.”

Sounds useful. But it’s useless. Literally.

Because it’s a subscription-only resource:

Now I am an associate researcher at the University of Bristol. UoB is part of the UK Access Management Federation, so I select that in the Shibboleth authentication page:

But the list of member universities doesn’t include Bristol, instead skipping straight from “University of Birmingham” to the intriguingly named “University of Bolton – Do Not Use”:

I can’t use it.

So it’s useless to me. Literally.

This is why it’s frustrating to me when I read statements like this from Elsevier’s Alicia Wise:

Commercial publishers are especially able to command resources to … develop new technologies and platforms to access journal content and improve researcher productivity (e.g., ScienceDirect, Scopus, Scirus, CrossRef, CrossCheck. Article of the future, text-mining tools, measurement tools).

I’m sure those things are all very nice (though I doubt they are better than what other people might build given access to the data). But it makes no difference how nice they are if I can’t access them.

Other people who also presumably can’t access Scopus include: Mike Benton, my head of department at Bristol; Greg Paul, who’s not affiliated with a university; Jere Lipps, who recently retired from his post at UCMP; and, as it turns out, Heather Piwowar, data-miner at the University of British Columbia. I’ve picked those four names out of millions of candidates, more or less at random: Benton is probably the UK’s most prolific palaeontologist, Paul is the most influential living palaeoartist, Lipps has had a hugely distinguished career, and Piwowar is in the vanguard of the current efforts to mainstream the text-mining techniques that we can all see are the future.

For all these people, Scopus might just as well not exist. If we’re working with collaborators who do have access, they can’t send us URLs that point into Scopus, so it can’t be a shared resource within such collaborative projects.

An alternative

Google Scholar is a better option for Benton, Paul, Lipps, Piwowar and me: it’s free to use and has a pretty good “cited by” feature. But it’s not flawless. For example, it claims that our 2009 sauropod neck posture paper has been cited 38 times, but as you work your way down the list, you find that some of the “citations” are from SV-POW! articles, or from news reports, rather than from proper published works. And Google Scholar is rather opaque: there’s no published list of what journals its database includes, or how often it’s updated.

Building a better alternative

The obvious solution is for someone to build an open competitor. But for them to do that, they need access to the papers that are to be crawled, analysed and indexed. And of course, they don’t have that access in general, because (all together for the chorus!) publishers put most papers behind paywalls.

If we want something better than Google Scholar, something more available than Scopus, something made by people who care deeply about citation graphs and who want to open them up as objects of research in their own right, then we need entrepreneurial programmers to have access to papers, so they can crawl them and access the references lists.
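To make "citation graph" concrete, here is a minimal sketch in Python of what such a tool would build once it has the reference lists. It assumes the hard part (getting at the papers and extracting their references) is already done; the paper IDs and the `build_citation_graph` function are made up for illustration and don't come from any real service.

```python
from collections import defaultdict


def build_citation_graph(reference_lists):
    """Build forward ("cites") and reverse ("cited by") maps.

    reference_lists maps a paper ID to the list of paper IDs
    that appear in its reference list.
    """
    cites = defaultdict(set)
    cited_by = defaultdict(set)
    for paper, refs in reference_lists.items():
        for ref in refs:
            cites[paper].add(ref)      # paper -> the works it cites
            cited_by[ref].add(paper)   # ref -> the works that cite it
    return cites, cited_by


# Hypothetical toy data: three papers and their reference lists.
reference_lists = {
    "paper_a": ["paper_b", "paper_c"],
    "paper_b": ["paper_c"],
    "paper_c": [],
}

cites, cited_by = build_citation_graph(reference_lists)
print(sorted(cited_by["paper_c"]))
```

The reverse map is the "cited by" feature: it can only be as complete as the set of reference lists the crawler was allowed to read, which is exactly why access to the papers matters.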

If you want this to happen, there is something you can do right now that will accelerate it: go and sign the White House’s public access petition.  Make a difference to opening up the world of research.

18 Responses to “Scopus is useless”

  1. Mickey Mortimer Says:

    And here I thought you were going to show a tiny hamerkop skeleton next to a huge sauropod. ;)

  2. telescoper Says:

    Reblogged this on In the Dark and commented:
    Another illustration of how the Academic Journal Racketeers (in this case Elsevier) have a stranglehold on research. As well as levying huge subscription charges they also supply a service called SCOPUS which the panels in the Research Excellence Framework will use to inform their deliberations. Needless to say, SCOPUS itself is a subscription-only resource…

  3. Mike Taylor Says:

    Over on Twitter, Stephen Curry makes the point that he doesn’t have access to Scopus at Imperial, either.

    These are not two-bit schools. In the most recent Times Higher Education World University Rankings (with appropriate disclaimers about how flawed such rankings are), Imperial is ranked #8 in the world, and Bristol is #66.

    You have to say that a service that’s not available to institutions of this quality is just not a relevant part of the academic landscape.

  4. Matt Hodgkinson Says:

    Might a wiki be a solution to building an open citation database?

  5. Mike Taylor Says:

    If by “solution” you mean “get people to spend time manually duplicating information that already exists in machine-readable form”, then yes.

  6. Matt Hodgkinson Says:

    Knowledge for All is a Canadian project already working on building an open citation database: http://www.k4all.ca/about

  7. John Scanlon, FCD Says:

    Didn’t know about Scopus, but it doesn’t surprise.

    Dave Hone also had a post recently about Google Scholar counting blog citations, and in a comment I mentioned some other odd behaviour (spurious citations of a different paper, by a different author with similar name to first author on one of mine).

    I haven’t figured out a way to fix it, i.e. ‘tell’ GS that they’re not the same paper. Anyone got an idea how to do that?

  8. Mike Taylor Says:

    I missed that Dave Hone post when it came out — I’ve been swamped in the last month! — but now that I’ve read it, it’s really interesting. By looking at the citations listed by Google Scholar and Web Of Science of a couple of his own papers, he concludes that while Google Scholar probably does slightly over-count, Web Of Science massively under-counts — so Google Scholar is actually the better of the two.

    I’d love to run a similar exercise to evaluate Google Scholar against Scopus, but … well, if I could access Scopus, I’d be able to do it.

  9. Dr. Gunn Says:

    Great post, Mike. I like to think Mendeley has gone a pretty good way towards building an open catalog of research, even creating an API for that information so that others can build on it. We don’t have flawless metadata in some cases, because we’ve had to extract it from the PDFs directly where publishers don’t give us access to their canonical metadata records, but it’s improving constantly.

    Thanks for all your tireless work to promote access. The sooner we get access problems behind us, we can move on to tackle other looming problems, like the reproducibility of research findings: http://blog.scienceexchange.com/2012/04/the-need-for-reproducibility-in-academic-research/

  10. Mike Taylor Says:

    I like to think Mendeley has gone a pretty good way towards building an open catalog of research, even creating an API for that information so that others can build on it. We don’t have flawless metadata in some cases, because we’ve had to extract it from the PDFs directly where publishers don’t give us access to their canonical metadata records, but it’s improving constantly.

    Mendeley has a good database of citations, but does it have anything on the citation graph? (I’ve not seen such functionality, but then I have only dabbled with Mendeley — It’d be great if it’s been there all along and I missed it.) To be a replacement for Scopus, it’s not enough for Mendeley to tell me the full citation for Taylor and Naish 2007, it also has to tell me what papers that cited, and what papers have cited it in turn.

    Thanks for all your tireless work to promote access. The sooner we get access problems behind us, we can move on to tackle other looming problems, like the reproducibility of research findings:

    Yes, exactly! I long for the day when we can stop with all this tedious mucking about in hyperspace, and devote our full energies to actually doing science, rather than on preventing people from stopping us from doing science.

  11. Dr. Gunn Says:

    If you’ll look at the references tab of an individual article page (such as http://www.mendeley.com/research/an-unusual-new-neosauropod-dinosaur-from-the-lower-cretaceous-hastings-beds-group-of-east-sussex-england/), you can see that we’re collecting this information, but we don’t have any public products to show at this point.

    I personally feel like the interesting story isn’t so much in the “A cites B” story as it is in the “A extends a finding of B” or “A cites B in the introduction when it talks about findings it couldn’t reproduce”. Jason Priem has done some work to show that the aggregate readership information we’re collecting matches with citations, but the difference is that we get readership information from the day the article comes out, as opposed to a year or two later when the next paper comes out that cites the first one.

    So I agree the citation graph is important, but it’s at best a lagging indicator of impact.

  12. Mike Taylor Says:

    Good to see that Mendeley is moving in the direction of having this information. I agree that much more can be done than merely “A cites B” — we all know of papers that have been cited numerous times, but by people saying that the paper is wrong, so it would be good to avoid giving B credit for all those citations — but starting to build a graph, and making it possible to traverse, is going to be a big and important step.

  13. Matt Hodgkinson Says:

    “If by “solution” you mean “get people to spend time manually duplicating information that already exists in machine-readable form”, then yes.”

    Bots and scripts can be used on wikis.

  14. Mike Taylor Says:

    They can be used on actual papers, too, and then we know they’re getting the original information straight from the horse’s mouth. So once again we come back to the central, stupid choice: primary literature, which is unavailable; or secondary literature, which is unreliable.

    Don’t get me wrong: wikis can be great, and I’d argue that Wikipedia is perhaps the single greatest achievement of the Internet. But really. Why the heck should scientists do science with anything less than scientific papers? It just makes no sense.

  15. David Haden Says:

    Thanks for the tip on Benton’s reputation and his large open cache of full-text PDFs. This means that Sauropods are now roaming rather more freely than before in JURN! JURN is a human-curated search tool for open access academic content, and has recently expanded beyond the arts and humanities.

  16. Siavash Fatemi Says:

    Scopus has NO policy for inclusion procedure into their database. The following is a true example:
    Scopus reported to the chairs of a conference on Image Processing that their proceedings have been selected for inclusion into Scopus (I was on the PC of that conference). We all were quite happy about this news. At the same time, we wondered as to the procedure used by Scopus to come up with such a decision (we were expecting that they would at least ask us about our evaluation procedure, …). We assumed that they checked the citation record of the proceedings (which were quite high) and made the decision based on that. After a few months, Scopus said that they have decided not to include the proceedings! This was due to the fact that Scopus received 8 emails saying that the conference is not good! After some extensive investigation, it was found that all those 8 emails (different email addresses with made-up names) were sent to Scopus by ONE individual who had been excluded from the same conference a year earlier.

    OK – the above means: if you do not like a journal or a conference, then simply create different email addresses under different made-up names and send emails (on different days) to Scopus trashing that journal or conference (better yet, create web sites with lies about the journal) and point to those web sites in your emails to Scopus. Guess what Scopus would do? You guessed right – they would exclude the journal/conference from their database (unilaterally) without even discussing the issue with the editors.

    Scopus is indeed redundant (I used to have a lot of respect for their database – but no more).

  17. Jeroen Bosman Says:

    Hi Mike,

    Why single out Scopus? What you are saying here also holds for Web of Science, Chemical Abstracts, PsycInfo or any other scholarly database that is not free. BTW there are many free databases with citation info apart from Google Scholar: Microsoft Academic Search, RepEc, Citeseer, Aspire and more.
    What also might be of interest is the effort to create an open citation database: http://opencitations.net/.
    Personally I think institutions and libraries should collaboratively fund a project to enhance the Bielefeld Academic Search Engine (BASE) for OA stuff with full text indexing and citation info.

    Jeroen

  18. Mike Taylor Says:

    Why single out Scopus?

    It’s the one I happened to be trying to use.

    What you are saying here also holds for Web of Science, Chemical Abstracts, PsycInfo or any other scholarly database that is not free.

    Yes.

    BTW there are many free databases with citation info apart from Google Scholar: Microsoft Academic Search, RepEc, Citeseer, Aspire and more.

    Good to know!

    What also might be of interest is the effort to create an open citation database: http://opencitations.net/.

    Yes, this is exactly what we need! Thanks for the pointer.

    Personally I think institutions and libraries should collaboratively fund a project to enhance the Bielefeld Academic Search Engine (BASE) for OA stuff with full text indexing and citation info.

    I hadn’t even heard of that one!

