Dear publishers: please get out of the way

September 30, 2013

A few years ago, in my programming day-job, we had a customer who we were providing with software components and a bit of custom development. While this was going on, we had a sequence of meetings with them in which we pitched several possible system designs, explaining how we could help them use our components in various ways.

After this had been going on for a while, our contact at the customer had to take us to one side. He was gentle with us: “Look, you seem to have the idea that we’re looking for some kind of ongoing consultancy from you”, he said. “We’re really not. We like your tools, and we’re happy to pay for them, but that’s all we need from you. We’ll take it from there”.

And that’s what I think about whenever I read anything like this:

“Elsevier is receiving an increasing number of content mining requests and we are developing solutions to meet customer needs. […] We wish to understand our customers’ text mining requirements and as practically every content mining request has a different goal and there is not a common solution to provide this. Consequently we request that customers looking to mine our content should speak to their Elsevier Account Manager.”

Even if we assume generously that this is a genuine attempt to be helpful and not just a land-grab, it’s WRONG WRONG WRONG WRONG WRONG.

No, Elsevier. Your customers’ text mining requirements are very, very simple. Every content mining request has exactly the same goal and there is a common solution to provide this. That solution is: get out of the way.

No-one needs Elsevier’s (or Wiley’s or Springer’s) help with text-mining. No-one wants them as partners. No-one needs their APIs. All anyone wants is to get hold of the papers. That’s all. The only role of the publisher in this process is not to impede it.

Publishers: your job is to publish (“make public”), then step aside and let the world make use of what you’ve published.

6 Responses to “Dear publishers: please get out of the way”


  1. OK, I have to disagree here. APIs are useful for many forms of textmining and data re-use.

    This being said, I doubt that Elsevier’s efforts will really make things better or easier. This is how it should be done, and it’s not very complicated:

    “Using BioMed Central’s open access full-text corpus for text mining research” http://www.biomedcentral.com/about/datamining

    API: http://www.biomedcentral.com/about/api

  2. Mike Taylor Says:

    APIs are useful for access to structured information. The whole point of text-mining is that it takes a big bucket of unstructured information and induces structure. So it makes no difference how you obtain that unstructured information (the papers) in the first place. APIs are fine, but so is HTTP GET.

  3. Bryan Riolo Says:

    Mike, while most of the time I agree with you and Matt, this time I disagree. This comment is about customer/client issues in general, not just about you guys and publishers. Varied customers have varied wants and needs, and those wants can vary even with the same customer: sometimes we want/need help, other times, stay the heck out of our way. :)

  4. Mike Taylor Says:

    There may be areas where we could use some help from publishers. Text-mining is not among them.


  5. We might have different experience with textmining, but the whole point as far as I am concerned is not to “take a big bucket of unstructured information”. The whole point is to extract useful information, and if you can already identify title, authors, keywords, references, etc, you gain power and specificity.

    Also, some form of collaboration with publishers is useful to download frequently large amounts of data/text without getting blacklisted or whatnot. That’s true even for Gold Open access or NCBI/NLM resources. They need at least to specify conditions of use in a clear manner. BMC and PMC are excellent in that respect in my experience.

  6. Nathan Myers Says:

    You don’t need an API to identify title, authors, etc. At most you need a format specification which, while almost certainly wrong and incomplete, you can correct and complete by example. It’s much easier to extract metadata from minimally structured text than to code to an API, even if the API were not ill-designed and -documented.

    “[let us] download frequently large amounts of data/text without getting blacklisted”

    It’s hard for me to understand a practical difference between this and “get out of the way”.
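
The technical point argued in the thread above (fetch the paper with a plain HTTP GET, then pull title, authors and so on out of minimally structured text) can be sketched in a few lines. This is only an illustration under stated assumptions: the `Title:` / `Authors:` / `Keywords:` layout, the `SAMPLE` text and the function names are all hypothetical and do not describe any publisher's actual format.

```python
import re
from urllib.request import urlopen


def fetch(url: str) -> str:
    """Retrieve a document with a plain HTTP GET (no publisher-specific API)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Hypothetical "minimally structured" layout: one "Field: value" line per field.
FIELD = re.compile(r"^(Title|Authors|Keywords):\s*(.+)$", re.MULTILINE)


def extract_metadata(text: str) -> dict:
    """Induce a little structure from loosely formatted text."""
    fields = {}
    for name, value in FIELD.findall(text):
        key = name.lower()
        if key == "title":
            fields[key] = value.strip()
        else:
            # Authors and keywords are assumed to be separated by ';' or ','.
            parts = re.split(r"[;,]", value)
            fields[key] = [p.strip() for p in parts if p.strip()]
    return fields


# Made-up sample text, used only to exercise the extractor.
SAMPLE = """Title: A hypothetical paper
Authors: A. Author; B. Author
Keywords: text-mining, open access

Body text follows here.
"""

if __name__ == "__main__":
    print(extract_metadata(SAMPLE))
```

In real use, `extract_metadata(fetch(url))` would be run over each paper, with the pattern corrected "by example" as Nathan describes whenever a document deviates from the assumed layout.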

