## How much does “typesetting” cost?

### June 11, 2015

We as a community often ask ourselves how much it should cost to publish an open-access paper. (We know how much it does cost, roughly: typically $3000 with a legacy publisher, or an average of $900 with a born-open publisher, or nothing at all for many journals.)

We know that peer review is essentially free to publishers, being donated by scholars. We know that most handling editors also work for free or for peanuts. We know that hosting things on the Web is cheap (“publishing [in this sense] is just a button”).

Publishers have costs associated with rejecting manuscripts — checking that they’re by real people at real institutions, scanning for obvious pseudo-scholarship, etc. But let’s ignore those costs for now, as being primarily for the benefit of the publishers rather than the author. (When I pay a publisher an APC, they’re not serving me directly by running plagiarism checks.)

The tendency of many discussions I’ve been involved with has been that the main technical contribution of publishers is the process that is still, for historical reasons, known as “typesetting”: that is, the transformation of the manuscript from an opaque form like an MS-Word file (or indeed a stack of hand-written sheets) into a semantically rich representation such as JATS XML. From there, actual typesetting into HTML or a pretty PDF can be largely automated.
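To illustrate why the step from JATS to HTML can be mostly mechanical, here is a deliberately tiny Python sketch that walks a JATS-like fragment and emits HTML. It assumes a very small subset of real JATS elements (`article-title`, `sec`, `title`, `p`, `italic` and a few other inline tags); the function names and the tag mapping are hypothetical, and a production pipeline would also have to handle references, figures, tables and maths.

```python
import xml.etree.ElementTree as ET
from html import escape

# Illustrative mapping from a few inline JATS elements to HTML tags.
# Real JATS defines hundreds of elements; this toy covers only a handful.
INLINE_TAGS = {"italic": "i", "bold": "b", "sub": "sub", "sup": "sup"}


def render_inline(elem):
    """Render an element's text and inline children as escaped HTML."""
    parts = [escape(elem.text or "")]
    for child in elem:
        tag = INLINE_TAGS.get(child.tag, "span")  # unknown elements become <span>
        parts.append(f"<{tag}>{render_inline(child)}</{tag}>")
        parts.append(escape(child.tail or ""))
    return "".join(parts)


def render_sec(sec, level):
    """Render a <sec>: its <title> as a heading, then paragraphs and subsections."""
    out = []
    title = sec.find("title")
    if title is not None:
        h = min(level, 6)  # HTML only has h1-h6
        out.append(f"<h{h}>{render_inline(title)}</h{h}>")
    for child in sec:
        if child.tag == "p":
            out.append(f"<p>{render_inline(child)}</p>")
        elif child.tag == "sec":
            out.extend(render_sec(child, level + 1))
    return out


def jats_to_html(jats_text):
    """Turn a minimal JATS-like article into an HTML string."""
    root = ET.fromstring(jats_text)
    lines = []
    title = root.find("front/article-meta/title-group/article-title")
    if title is not None:
        lines.append(f"<h1>{render_inline(title)}</h1>")
    body = root.find("body")
    if body is not None:
        for sec in body.findall("sec"):
            lines.extend(render_sec(sec, 2))
    return "\n".join(lines)


SAMPLE = """<article>
  <front><article-meta><title-group>
    <article-title>Example article</article-title>
  </title-group></article-meta></front>
  <body>
    <sec><title>Introduction</title>
      <p>Measurements were taken <italic>in situ</italic>.</p>
    </sec>
  </body>
</article>"""

print(jats_to_html(SAMPLE))
```

Once the semantic structure exists, rendering is a tree walk; the hard, human part is getting a Word file into that structure in the first place.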

So: what does it cost to typeset a manuscript?

First data point: I have heard that Kaveh Bazargan’s River Valley Technologies (the typesetter that PeerJ and many more mainstream publishers use) charges between £3.50 and £9 per page, including XML, graphics, PDF generation and proof correction.

Second data point: in a Scholarly Kitchen post that Kent Anderson intended as a criticism of PubMed Central but which in fact makes a great case for what good value it provides, he quotes an email from Kent A. Smith, a former Deputy Director of the NLM:

> Under the % basis I am using here $47 per article. John [Mullican, a program analyst at NCBI] and I looked at this yesterday and based the number on a sampling of a few months billings. It consists on the average of about $34-35 per tagged article plus $10-11 for Q/A plus administrative fees of $2-3, where applicable.
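As a quick sanity check (a sketch, not part of the NLM email), the quoted components do add up to roughly the $47 headline figure. Note that the administrative fee applies only "where applicable", so some articles would come in slightly lower.

```python
# Component ranges quoted in the NLM email, in US dollars per article.
components = {
    "tagging": (34, 35),
    "quality assurance": (10, 11),
    "administrative fees": (2, 3),  # only "where applicable"
}

low = sum(lo for lo, hi in components.values())
high = sum(hi for lo, hi in components.values())
print(f"Total per article: ${low}-{high}")
```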

Using the quoted figure of $47 per PMC article and the £6.25 midpoint of River Valley’s range of per-page prices (= $9.68 per page), the PMC figure would be consistent with typical PMC articles being a bit under five pages long. The true average length is probably somewhat higher, maybe twice that or more, but this seems to be at least in the same ballpark.
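The arithmetic behind that estimate can be written out as follows. This sketch uses only figures from the post; the exchange rate is the one implied by the post's own conversion of £6.25 to $9.68, not an official rate.

```python
GBP_MIN, GBP_MAX = 3.50, 9.00      # River Valley's quoted range, £ per page
USD_PER_GBP = 9.68 / 6.25          # exchange rate implied by the post's conversion
PMC_COST_PER_ARTICLE = 47.0        # US dollars per article, from the NLM email

midpoint_gbp = (GBP_MIN + GBP_MAX) / 2
midpoint_usd = midpoint_gbp * USD_PER_GBP
implied_pages = PMC_COST_PER_ARTICLE / midpoint_usd

print(f"Midpoint: £{midpoint_gbp:.2f} = ${midpoint_usd:.2f} per page")
print(f"Implied article length: {implied_pages:.2f} pages")
```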

Third data point: Charles H. E. Ault, in a comment on that Scholarly Kitchen post, wrote:

> As a production director at a small-to-middling university press that publishes no journals, I’m a bit reluctant to jump into this fray. But I must say that I am astonished at how much PMC is paying for XML tagging. Most vendors looking for the small amount of business my press can offer (say, maybe 10,000 pages a year at most) charge considerably less than $0.50 per page for XML tagging. Assuming a journal article is about 30 pages long, it should cost no more than $15 for XML tagging. Add another few bucks for quality assurance, and you might cross the $20 threshold. Does PMC have to pay a federally mandated minimum rate, like bridge construction projects? Where can I submit a bid?

I find the idea of 50-cent-per-page typesetting hard to swallow: it’s more than an order of magnitude cheaper than the River Valley/PMC level, and I’d like to know more about Ault’s operation. Is what they’re doing really comparable with what the others are doing?

Are there other estimates out there?

### 22 Responses to “How much does “typesetting” cost?”

1. Darius Says:

Every time I look at the issue I have to try and wrap my mind around the issue that it’s scientists who have to pay journals to publish their work, and not the reverse. Aren’t scientists the people who do all the work and who by that work ensure that aforementioned journals don’t just contain blank pages? Publishers are making profit from publishing other people’s work and on top of that they are charging them for it. I think that’s just perverse. Publishers should be paying scientists for the right to publish their papers!

2. Marcin Says:

Thanks for this post; a truly sobering one. I found these by googling (XML tagging per page cost): http://www.infomanagementcenter.com/enewsletter/2010/201003/third.htm

“The conversion that costs $300 per page is a rare case indeed—in fact, even the more-expensive ITAR-compliant conversions required for military documentation rarely cost the client more than $10–20 a page. (ITAR-regulated conversions always cost more because the documents and their data must not leave U.S. soil.) For materials that can be handled offshore, prices can drop to as low as $2–8 a page.”

“In almost all cases, a conversion of text documentation to an XML DTD should not cost more than $3–$5 a page; in most cases, it will cost less than that. Whether you are evaluating what kind of conversion to pursue or looking to justify your conversion budget, a thorough understanding of the factors that affect the cost of conversion will allow you to avoid common misconceptions and make better-informed decisions about converting your documentation.”
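For a rough side-by-side view, the sketch below prices one article at each per-page rate quoted so far. It assumes Ault's 30-page article, converts River Valley's sterling prices using the exchange rate implied by the post's own £6.25 = $9.68, and treats Ault's "considerably less than $0.50" as an upper bound only. It deliberately ignores the real differences in what each price covers, which is exactly the open question in the post.

```python
USD_PER_GBP = 9.68 / 6.25  # exchange rate implied by the post's own conversion

# (low, high) US dollars per page, restating figures from the post and comments.
# What each price covers differs: River Valley's includes graphics, PDF and proofs.
RATES = {
    "River Valley (XML, graphics, PDF, proofs)": (3.50 * USD_PER_GBP, 9.00 * USD_PER_GBP),
    "Offshore conversion": (2.0, 8.0),
    "Typical XML DTD conversion": (3.0, 5.0),
    "Ault's vendors (tagging only)": (0.0, 0.50),  # only an upper bound was quoted
}


def article_cost(rate_range, pages):
    """Return the (low, high) cost in US dollars for an article of `pages` pages."""
    low, high = rate_range
    return low * pages, high * pages


PAGES = 30  # Ault's assumed article length

for name, rates in RATES.items():
    low, high = article_cost(rates, PAGES)
    print(f"{name:42s} ${low:7.2f} to ${high:7.2f}")
```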

3. Pandelis Says:

This is not the first time I have suggested that authors demand a price breakdown in the invoices publishers issue for APCs.

4. Michael Eisen Says:

You’re not really breaking down the costs right.

First of all, while peer review is technically free, it actually costs a good deal of money to manage peer review, in part because the systems most journals currently use to manage peer review cost money, and more importantly, because there is a lot of manual labor involved in identifying and chasing down reviewers. In some cases this is done for free by the editors, but many journals have staff who do that (otherwise it’s hard to recruit editors). These costs should go down/away as better software is developed. But for now this is a non-trivial cost.

Second, the costs of XML conversion are only part of the cost of typesetting, which generally also involves staff on the journal end to make sure the right files are submitted by authors, to manage the process with the XML conversion shop, and to deal with the back and forth that often happens to fix things the conversion shop didn’t get right. Again, these costs should go down/away soon as software is developed to handle the conversion directly and to give authors control over fixing typos, layout, etc. But again, for now, these things also cost money.

5. “On the publisher’s side, average first-copy costs of journal papers are estimated to range between 20 and 40 US dollars per page, depending on rejection rates [37]; [17], which neither explains open access publication fees as high as 5,000 $US (e.g., Cell Reports by Elsevier) nor hybrid journals, where publishers charge twice per article, i.e. the subscription and open access fees (e.g., Open Choice by Springer or Online Open by Sage Publications).”

[17] Tenopir C, King DW. Towards Electronic Journals: Realities for Scientists, Librarians, and Publishers. Washington, D.C.: Special Libraries Association; 2000.

[37] Rowland F. The peer-review process. Learned Publishing. 2002;15(4):247–258. doi:10.1087/095315102760319206

Larivière V, Haustein S, Mongeon P (2015) The Oligopoly of Academic Publishers in the Digital Era. PLoS ONE 10(6): e0127502. doi:10.1371/journal.pone.0127502

Given that PeerJ can publish papers for a few hundred dollars, it clearly doesn’t have the legacy costs of large publishers. I don’t know whether the above figure ($20–40 per page) includes print as well, but if it does, then electronic-only publishing should cost less than this. Economies of scale work strangely in a very large organisation. Small-scale publishers don’t need phalanxes of lawyers and vice-presidents, for instance, who each cost many times what a single production-level employee or developer does.

Also, I note that per-page figures are better than per-paper ones. Mathematics journals’ subscription prices are compared by how much a page costs; see https://www.math.uni-bielefeld.de/~rehmann/BIB/AMS/Price_per_Page.html.

Also, it would be useful to see a figure that tells us how much it costs to publish a single paper (or a single page), versus how much it costs once the rejection rate is taken into account.
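One simple way to express that distinction is sketched below. It is a hypothetical model, not a figure from any of the sources above: every submission incurs a handling cost, only accepted papers incur production costs (such as typesetting), so each published paper carries the handling cost of all the rejected manuscripts that came with it. All inputs are made-up illustration values.

```python
def cost_per_published_paper(handling_cost, production_cost, acceptance_rate):
    """Estimate the cost attributable to each published paper.

    handling_cost:   cost spent on every submission, accepted or not
    production_cost: cost spent only on accepted papers (e.g. typesetting)
    acceptance_rate: fraction of submissions accepted, in (0, 1]
    """
    if not 0 < acceptance_rate <= 1:
        raise ValueError("acceptance_rate must be in (0, 1]")
    # Each accepted paper carries the handling cost of 1 / acceptance_rate submissions.
    return handling_cost / acceptance_rate + production_cost


# Hypothetical inputs: $100 handling per submission, $50 production per paper.
for rate in (1.0, 0.5, 0.25):
    cost = cost_per_published_paper(100, 50, rate)
    print(f"acceptance {rate:.0%}: ${cost:.2f} per published paper")
```

In this model the typesetting term stays fixed while the handling term grows as the acceptance rate falls, which is one reason highly selective journals can justify higher per-paper prices.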

6. brembs Says:

BTW, these numbers match the numbers I got from ScienceOpen, F1000 Research (and indirectly from Hindawi). So 200 are enough to get from a Word document to publication. Much less if we submitted in JATS XML.

7. Are there any JATS XML authoring tools out there? Searching around gives me this presentation/pitch aimed at people who use OJS (and so usually aren’t flush with money).

8. Mike Taylor Says:

9. brembs Says:

There is a link to one of them in my post above. In addition, I think Authorea can do that, too.

10. brembs Says:

The one I linked to in the post (i.e., Paper Now from PeerJ, I think).
I also think Authorea is going to provide that option.
And there is one that I’ve come across recently and I’ve been trying to find again for the last few days, unsuccessfully, however.
So if there are none around right now, there should be some soon.
Alternatively, one could write an add-on for GDocs, which now has Paperpile for references.

11. Coming from the world of LaTeX, where (if done properly) a document is well designed and its sections and elements are already tagged, I’d be interested to see what it takes to go from .tex to JATS XML. Or from Markdown plus just enough maths to JATS, via pandoc or similar.

But it would be interesting if there could be a front-end GUI to an underlying document in JATS (Word is apparently now XML under the hood, but just terrible XML). Such innovation is what I would expect from a publishing company that wants to do the ultimate outsourcing of typesetting: to authors. That’s what maths journals have pretty much done, especially those that are shoestring operations, or diamond OA. One would need something very simple, like an RTF-style editor: something with just enough functionality to get articles written, and not a bloated mess like Word, with primitive version control built in but kept separate from the document (unlike ‘track changes’).
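Recent versions of pandoc can write JATS directly (e.g. `pandoc -f latex -t jats`), so in practice one would reach for that. Purely to show what "going from Markdown to JATS" involves, here is a toy Python sketch that handles only `# ` headings, blank-line-separated paragraphs and `*italic*` spans; it produces JATS-like `article`/`body`/`sec`/`title`/`p`/`italic` elements but no front matter or maths, and the function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

ITALIC = re.compile(r"\*([^*]+)\*")


def add_inline(parent, text):
    """Append text to `parent`, turning *word* spans into <italic> children."""
    last, pos = None, 0
    for m in ITALIC.finditer(text):
        chunk = text[pos:m.start()]
        if last is None:
            parent.text = (parent.text or "") + chunk
        else:
            last.tail = (last.tail or "") + chunk
        last = ET.SubElement(parent, "italic")
        last.text = m.group(1)
        pos = m.end()
    rest = text[pos:]
    if last is None:
        parent.text = (parent.text or "") + rest
    else:
        last.tail = (last.tail or "") + rest


def markdown_to_jats(md):
    """Convert '# Heading' lines and paragraphs into a JATS-like XML string."""
    article = ET.Element("article")
    body = ET.SubElement(article, "body")
    sec = None
    for block in re.split(r"\n\s*\n", md.strip()):
        lines = block.strip().splitlines()
        if lines and lines[0].startswith("# "):
            # A heading starts a new <sec>; any lines after it form a paragraph.
            sec = ET.SubElement(body, "sec")
            ET.SubElement(sec, "title").text = lines[0][2:].strip()
            lines = lines[1:]
        if lines:
            parent = sec if sec is not None else body
            add_inline(ET.SubElement(parent, "p"), " ".join(" ".join(lines).split()))
    return ET.tostring(article, encoding="unicode")


print(markdown_to_jats("# Introduction\n\nSauropods were *very* large.\n\n# Methods\n\nWe measured bones."))
```

Even this toy shows why LaTeX or disciplined Markdown is a head start: the structure is already explicit, so conversion is mostly a mapping exercise rather than guesswork about what a Word paragraph "means".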

12. Mike Taylor Says:

“… something with just enough functionality to get articles written, and not a bloated mess like Word …”

Alarms go off in my mind when I read this kind of thing. I think that every generation looks at the previous generation’s tools and thinks “Oh, that’s too big and complicated, we need something simpler”. So they make it, then find it needs to support headers and footers; then footnotes; then tables; then equations, then in-document change-tracking, then semantic styles, then page-styles, and so on.

I’m certainly not saying that Word or LibreOffice is perfect — very far from it — but the reality is that I use all of those features, and on the occasions when I’ve tried to use Markdown to write serious documents, the lack of them has hurt.

I think writing rich documents well (especially collaboratively) is just an intrinsically complex process. Yes, everything should be made as simple as possible; but no simpler.

13. @Mike

Do you use image editing? Clip art? Mail merge? Translation? Word is made to be an all-purpose office tool, not a sensible document preparation system.

At the least, arbitrary changes to fonts/typefaces/colours/styles etc. should only be possible by encoding them entirely via styles, with the understanding that the final production team will do what they need to with the appropriately marked-up content.

That said, what have the Romans ever done for us?

14. Mike Taylor Says:

Image editing, clip art, mail merge — no. As far as I know those things don’t even exist in LibreOffice, which is what I use, and I don’t miss them. They’re really not about writing documents, which is the problem we’re trying to solve here.

While in principle I sort of agree that style changes should be implemented only with semantic styles, in practice there are always reasons why you need to be able to break out of that: for example, the two words that I set in italics in this very paragraph. The last thing we want is an authoring environment that makes people feel they have to fight against it.

15. I guess, as was noted in another thread, you can’t stop stupidity, but at least give people tools that break things only with concerted effort.

A conversion platform is clearly the way forward, since we aren’t going to wean people off Word/LibreOffice. Maybe a step-through system that gets the author to clarify exactly what that italicised word is meant to be: emphasis, technical/foreign word etc. Or flag ‘not sure’.

Interesting to consider, nonetheless. On a related topic, an older mathematician recently noted that MathML is now some sort of standard (more than before) and wanted to know where he could learn it. He was completely unfamiliar with LaTeX but OK with HTML, so he wanted to get cracking! I don’t think I convinced him to learn Markdown/MathJax and convert to MathML…
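The step-through idea in this comment (asking the author what each italicised word is meant to be) can be sketched in a few lines of Python. The category names and the function are hypothetical; the output uses HTML elements commonly used for these meanings (`em` for stress emphasis, `i` for foreign phrases, `dfn` for the defining instance of a term), and anything the author is unsure about falls back to plain `<i>` and is flagged for review.

```python
from html import escape

# Hypothetical mapping from the meaning an author picks for an italicised span
# to an HTML element that expresses that meaning.
MEANINGS = {
    "emphasis": "em",   # stress emphasis
    "foreign": "i",     # foreign phrase (ideally with a lang attribute)
    "term": "dfn",      # defining instance of a technical term
}


def resolve_italics(spans, answers):
    """Turn italicised spans into semantic markup using the author's answers.

    Any answer outside MEANINGS (including "not sure") falls back to plain
    <i> and the span is flagged for a human to check later.
    Returns (markup, flagged).
    """
    if len(spans) != len(answers):
        raise ValueError("need exactly one answer per italicised span")
    markup, flagged = [], []
    for span, answer in zip(spans, answers):
        tag = MEANINGS.get(answer)
        if tag is None:
            tag = "i"
            flagged.append(span)
        markup.append(f"<{tag}>{escape(span)}</{tag}>")
    return markup, flagged


markup, flagged = resolve_italics(
    ["in situ", "really", "Diplodocus"],
    ["foreign", "emphasis", "not sure"],
)
print(markup)
print("Needs review:", flagged)
```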

16. If people are still reading, here’s something I just got an email about, that deploys “…a new backend for its LaTeX input language, teaming up with the ambitious LaTeXML project, which strives to offer a full reimplementation of TeX with targeted generation of web-first manuscripts, supporting HTML5 and ePub.”

https://www.authorea.com/users/5713/articles/28015/_show_article

17. Mark Fretz Says:

I believe what Charles Ault was saying is that he paid $0.50 for XML tagging, not for typesetting. That is, the 50 cents only covered what many refer to as styling or coding, what I call composition or composing the document. Typesetting happens after composition or coding the document. The cost for typesetting, I venture to guess, is far higher than $0.50 per page for Charles or any publisher, regardless of how they pay for it.

18. Mark Fretz Says:

After the author explains that getting to rich XML is one step, he concludes “From there, actual typesetting into HTML or a pretty PDF can be largely automated.”
Yes and no. The conversion of XML into a format that is friendly to typesetting software such as InDesign can be very easy. Scribe Inc. has an online tool that converts files from Word to XML to InDesign (IDTT) in seconds. However, when you flow the InDesign file into a design template, that’s when the clock really starts ticking with respect to the cost of typesetting. That is because the file that is flowed in must be adjusted to match the publisher’s aesthetic, image files need to be placed and manipulated, tables need to be adjusted, hyperlinks need to be checked (if the document is destined for a dynamic delivery environment such as ePub or the Web), corrections need to be entered following proofreading, and those need to be QC-ed, etc. If the book involves complex page layout and design specs, or, as with STEM titles, lots of equations, that takes more time. So, to claim that actual typesetting can be largely automated is misleading at best.

19. Mike Taylor Says:

Point taken, Mark; yet the automatically-formatted versions of articles such as those on PMC are more than adequate for most purposes — in fact they’re probably better for most purposes, being free from the constraints of the 8.5×11″ page.

20. […] The innovations provided by RIO journal do not stop there though. Another of the major features of the journal is its authoring, reviewing and publishing system called ARPHA for short. This system is a revolution in itself: if used fully, it eliminates the need for an outsourced typesetting process and all the associated errors that entails which frustrate authors and delay publication. It also speeds up the publishing process; no delay waiting for typesetters, and it reduces the cost of production as good typesetters charge a non-negligible sum per page. […]

21. […] do nearly enough to earn their very high fees, but one very real contribution they do make is the process that is still, for historical reasons, known as “typesetting” — transforming a human-readable manuscript into a machine-readable one from which […]
