Don’t Trust Your CMS


One of the great promises of digital publishing has always been that it’s easier than old-fashioned printing. No presses needed, no warehouses, no infrastructure, no pesky typographical union: You type into a box, you hit the blue button, and you’ve published.

Of course “easier” usually just means that the work that used to be done by humans and machinery is now being done on and by computers. Publishing to the web is, in many ways, hugely more complicated than publishing on a printing press; it’s just that we trust our programs — the text editors we use to write and the content-management systems, or CMSes, we use to publish — to do most of the complicated work.

But as anyone who spends most of their time head-down in a CMS will tell you, CMSes (and text editors, and email clients) are not to be trusted. Just ask, for example, former New York Daily News editor Jotham Sederstrom, who was fired from his post on Tuesday after it was discovered that two pieces by the writer Shaun King that he’d edited contained entire paragraphs lifted from other sources without correct attribution. Emails furnished by King showed that his original copy (which he filed in the body of his emails; the sound you hear is my teeth grinding just thinking about that) had in fact correctly attributed the quotes.

Today, Sederstrom wrote a long, well-articulated apology on Medium, in which he takes full responsibility for the mistake. He also explains how it happened:

In all honesty, the controversy — a fuck up on my part, to put it bluntly — comes down to two unintentional, albeit inexcusable, instances of sloppy editing on my part and a formatting glitch that until Tuesday I had no idea was systematically stripping out large blocks of indented quotations each time I moved Shaun’s copy from an email to The News’ own Content Management System, or “CMS” as it’s called in media parlance.

This may sound like a paltry excuse: a poor carpenter blaming his tools. And, obviously, Sederstrom and King both should have been checking the final product. But it’s also the kind of story that web editors everywhere will recognize immediately. After all: The only indication that King is directly quoting rather than paraphrasing is the indented block quote line. In other words, the line doesn’t exist in the text as written. It exists in the text as formatted.

It feels like a mystical distinction, and, in the supposedly more complicated earliest eras of print, it more or less was. Italic type and roman (that is, regular, upright) type were printed using physically different letters: the print block for the roman “a” was an entirely different object from the print block for the italic “a.” Even now, well beyond the era of lead type, a printed “a,” roman or italic, is fixed on the page: once printed, it operates only as one thing.

On the web, and in most editors, however, text isn’t fixed in quite the same way. Every time you load a page, you’re loading a set of documents that contain both the site’s contents and instructions to your browser, or client, telling it how to display those contents. An italic “a” isn’t a character in itself; it’s (I’m hand-waving a bit here) a roman “a,” accompanied by code instructing your browser to render it on the screen as an italic “a.” That code is the formatting. If that formatting is misinterpreted, or, worse, stripped out, as it often is when copying and pasting, or moving from one text editor to another, your italic “a” will appear roman.
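To make the hand-waving slightly more concrete, here is a minimal Python sketch of what a formatting-stripping paste does. The sample sentence and the `strip_formatting` helper are invented for illustration, and `<em>` stands in for whatever markup a given editor actually uses; real editors and CMSes each do this in their own way.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects the character data and silently drops every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for the text between tags; the tags themselves are ignored.
        self.parts.append(data)


def strip_formatting(html):
    """Hypothetical stand-in for a paste that keeps text but loses markup."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


# Invented sample: the "a" is the same character either way.
formatted = "She called it <em>a</em> disaster."
print(strip_formatting(formatted))  # prints: She called it a disaster.
```

The letter survives the trip. The instruction to italicize it does not.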

If we all wrote directly inside the content-management systems where we publish our work to the world, this would be less of a problem. But very few of us do (for one thing, they generally don’t save your work). From assignment to publication, text can pass through several different editors, some of them made to handle richly formatted text in one particular way, some made to handle it in another, and some of them … not really made for that at all. A story that begins in Microsoft Word might, for example, be pasted into Gmail and emailed to an editor using Outlook; from Outlook it might be pasted again into Google Docs, before finally being pasted into the text box of a proprietary CMS. In the process, it will likely have picked up (or lost) stray bits of HTML formatting, such as odd <div> and <span> tags used by one text editor but not another, even if the text as written remains intact.

This extra HTML junk, or lost formatting, is extremely annoying, especially for copy editors and web producers, but in most cases it’s not a huge deal. It’s rare that an italicized or bolded word will have a vastly different meaning than its roman equivalent. But formatting can matter a lot when it’s being used to convey actual meaning to a reader, and not just design cues to the browser.

One particular hobbyhorse of mine, as an example: “strikethrough” formatting is meant to communicate to the reader “we originally wrote this, but now no longer stand by it.” It’s sometimes used with corrections in blog posts and news articles. But if struck-through text loses its formatting, it will appear as normal, and readers likely won’t be able to tell that it’s wrong. The headline on this Gawker post (“Here Is a List of All the Assholes Handsome Law-Abiding Citizens Who Own Guns Some People in New York City”) is a good example: In 2013, the word “Assholes” and the phrases “Handsome Law-Abiding Citizens” and “Who Own Guns” were all struck through with HTML formatting, creating a funny visual joke. When Gawker updated its content-management system, its headlines no longer supported strikethrough, and this post now lives on under a totally incoherent headline.
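The Gawker headline makes a tidy test case. The sketch below assumes the strikethrough was marked up with `<s>` tags (the post’s actual markup isn’t shown here, so that is a guess), and the `to_plain_text` helper is hypothetical. It compares a tag-stripper that keeps struck text with one that drops it.

```python
from html.parser import HTMLParser

# Tags that conventionally mean "this text has been struck through".
STRUCK_TAGS = {"s", "strike", "del"}


class HeadlineText(HTMLParser):
    """Flattens HTML to text, optionally discarding struck-through spans."""

    def __init__(self, keep_struck):
        super().__init__()
        self.keep_struck = keep_struck
        self.struck_depth = 0  # how many struck tags we are currently inside
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in STRUCK_TAGS:
            self.struck_depth += 1

    def handle_endtag(self, tag):
        if tag in STRUCK_TAGS and self.struck_depth > 0:
            self.struck_depth -= 1

    def handle_data(self, data):
        if self.keep_struck or self.struck_depth == 0:
            self.parts.append(data)


def to_plain_text(html, keep_struck):
    """Hypothetical helper: strip tags, then collapse runs of whitespace."""
    parser = HeadlineText(keep_struck)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


# Assumed markup for the 2013 headline; only the wording comes from the post.
HEADLINE_HTML = (
    "Here Is a List of All the <s>Assholes</s> "
    "<s>Handsome Law-Abiding Citizens</s> <s>Who Own Guns</s> "
    "Some People in New York City"
)

print(to_plain_text(HEADLINE_HTML, keep_struck=True))
print(to_plain_text(HEADLINE_HTML, keep_struck=False))
```

With `keep_struck=True` you get the run-on headline that lives on the site today; with `keep_struck=False` you get “Here Is a List of All the Some People in New York City,” which loses the joke but at least doesn’t assert the retracted words. Neither is what the writer meant, which is rather the point.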

Strikethrough, where the formatting is meant to communicate that the text actually means the opposite of what you’re reading, is an obvious example. But it applies to block quoting, too. From a visual perspective, block-quote formatting is superior to your standard quotation mark: it’s a highly visible signal that the offset text should be understood differently than the surrounding paragraphs. But given how differently browsers and email clients render formatting, and how easily formatting can be stripped entirely when copying and pasting, it becomes a liability. The humble quotation mark is unlikely to suffer the same indignities as the flashier <blockquote> tag when copied and pasted. Tags are no substitute for punctuation.
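One way to hedge against this, sketched below in Python, is to turn the tag into punctuation before anything gets a chance to strip it. This is an illustration under assumptions, not how any particular CMS works: the `flatten` helper and the sample copy are invented, and the sketch ignores complications such as paragraphs nested inside a block quote.

```python
from html.parser import HTMLParser


class QuoteAwareText(HTMLParser):
    """Flattens HTML to text; can turn <blockquote> into quotation marks."""

    def __init__(self, mark_quotes):
        super().__init__()
        self.mark_quotes = mark_quotes
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "blockquote" and self.mark_quotes:
            self.parts.append("\u201c")  # opening curly quotation mark

    def handle_endtag(self, tag):
        if tag == "blockquote" and self.mark_quotes:
            self.parts.append("\u201d")  # closing curly quotation mark
        if tag in ("p", "blockquote"):
            self.parts.append("\n")  # keep one line per block

    def handle_data(self, data):
        self.parts.append(data)


def flatten(html, mark_quotes):
    """Hypothetical helper: plain text, with or without quote marks added."""
    parser = QuoteAwareText(mark_quotes)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


# Invented sample copy, not taken from any real article.
copy = (
    "<p>The mayor responded:</p>"
    "<blockquote>We will rebuild the bridge.</blockquote>"
    "<p>Work begins in May.</p>"
)

print(flatten(copy, mark_quotes=False))  # the quote reads as the writer's own words
print(flatten(copy, mark_quotes=True))   # the quote keeps its quotation marks
```

In the first output nothing distinguishes the mayor’s sentence from the reporter’s. In the second, the attribution is carried by characters in the text itself, which survive any paste that preserves the text.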

None of which is to blame the formatting, or the CMS, or the terrible tools we’ve developed to make publishing “easier,” for the Daily News’s error: King’s editors had a responsibility to ensure that his accurate sourcing was reproduced when published, no matter the vagaries of their CMS. (King, too, should probably have been reading his articles once they were published.) But a CMS that strips formatting, and a specific and important kind of formatting at that, is setting up overworked editors to fail. To the extent that we rely on programs to do certain kinds of publishing work, and to the extent that we try to communicate meaning using HTML formatting rather than “hard-coded” orthography, we’re asking to be misinterpreted.