A Strong Test for Markup In Titles & Summaries

I’ve been hacking on Benjamin Smedberg’s Atom 1.0 plug-in for WordPress. I’ve added a preference panel for choosing between full text and summary feeds. Now I’ve fixed the double escaping of content in titles and summaries. (Escaped HTML is evil and should never have been allowed into Atom.)

However, I’m not sure how my hack will react when posts contain markup in titles and summaries, so I’m playing with that now. Hence this post. I may delete it once I’m convinced I’ve covered the various special cases well enough.

Things may look a little funny in the feed until I’m done since I’ll be deliberately breaking things to see how WordPress behaves.

So far it seems that WordPress’s titles are guaranteed to be plain text except for numeric character references; that is, it strips tags. That’s good, since it will make it easier to use real XHTML in the Atom titles. I still need to check whether the the_title_rss() function really promises that it will never include any tags in the string it returns.

Also it seems I’ve uncovered a bug in WordPress itself outside the feeds, as this title shows:

A <strong style="color: green">Strong</strong> Test for Markup In Titles & Summaries

The bug is in the default theme. I’m fixing it here and I’ve reported it to the WordPress folks. To reproduce it yourself, create a post with this string as the title:

A <strong style="color: green">Strong</strong> Test for Markup In Titles & Summaries

Publish it and look at what WordPress puts out into the h1 header:

<h1 class="single"><a href="http://www.elharo.com/blog/software-development/web-development/2007/03/17/a-strong-test-for-markup-in-titles-summaries/" rel="bookmark" title="Permanent Link: A <strong style="color: green">Strong</strong> Test for Markup In Titles &amp; Summaries">A <strong style="color: green">Strong</strong> Test for Markup In Titles &amp; Summaries</a></h1>

The the_title_rss() function behaves appropriately. The bad text is probably coming from the_title() and single_post_title(), though I haven’t verified that yet.

WordPress is stuffing the title text (including markup with <, > and ") into a title attribute without sanitizing it first. I suspect I could reproduce this just by using the " and > characters in a title without explicitly putting tags into my title.
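To make the failure concrete, here is a minimal sketch in plain PHP, outside WordPress. The helper escape_for_attribute() and the sample title are made up for illustration; only htmlspecialchars() is a real PHP function. It shows a title with no tags at all breaking a double-quoted attribute, and the same title surviving once it is escaped.

```php
<?php
// Hypothetical helper, not part of WordPress: escapes a string so it is
// safe inside a double-quoted HTML attribute value.
function escape_for_attribute(string $raw): string
{
    // ENT_QUOTES converts both " and ' in addition to <, > and &.
    return htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
}

// A title with no tags at all, only a greater-than sign and double quotes.
$title = 'Is 3 > 2? "Yes"';

// Unescaped: the first " in the title closes the attribute early.
$broken = '<a href="#" title="Permanent Link: ' . $title . '">';

// Escaped: the attribute value stays intact.
$fixed = '<a href="#" title="Permanent Link: ' . escape_for_attribute($title) . '">';

echo $broken, "\n";
echo $fixed, "\n";
```

In the unescaped version the attribute value ends at the quote before Yes, and the rest of the title spills into the tag as junk.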

Arguably this is a theme bug, but it is present in the WordPress default theme. Here’s the relevant code from the theme:

<h2><a href="<?php the_permalink() ?>" rel="bookmark" title="Permanent Link to <?php the_title(); ?>"><?php the_title(); ?></a></h2>

I’ve updated my theme code so it no longer shares this bug. To do this, just change Permanent Link to <?php the_title(); ?> to Permanent Link to <?php the_title_rss(); ?>. You need to do this in three files: archive.php, single.php, and index.php.
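For themes that would rather not depend on a feed function, roughly the same effect can be had with two standard PHP calls. This is only a sketch under my own assumptions: plain_title_for_attribute() is a hypothetical name, not a WordPress function, and it assumes the title arrives with its ampersands already escaped, as in the h1 output shown earlier.

```php
<?php
// Hypothetical helper, not a WordPress function: reduces a title that may
// contain markup to plain text suitable for a title attribute.
function plain_title_for_attribute(string $title): string
{
    // Drop any tags, keeping only their text content.
    $text = strip_tags($title);
    // Escape quotes and angle brackets; the final `false` turns off
    // double encoding so existing entities such as &amp; are left alone.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8', false);
}

$title = 'A <strong style="color: green">Strong</strong> Test for Markup In Titles &amp; Summaries';
echo plain_title_for_attribute($title), "\n";
```

Turning off double encoding matters here because the title already contains &amp;, and escaping it a second time would bring back exactly the double-escaping problem I fixed in the feed.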

4 Responses to “A Strong Test for Markup In Titles & Summaries”

  1. John Cowan Says:

    Escaped HTML isn’t evil as such; it’s just another text format. It’s unmarked escaped HTML that’s evil, as in the RSS variants, where you have to guess whether the person at the other end will interpret what you send as plain text or HTML.

    I have an RSS feed that represents a bunch of plain-text documents: consequently I have to first convert the plain text into HTML, then escape the HTML. In Atom, of course, I just mark the summary as plain.

  2. Elliotte Rusty Harold Says:

    The problem is that escaped HTML isn’t just text. I get bug reports every time I try to treat it as just text. Escaped HTML is only really just text when you really do want to see the markup as text. For example, it’s completely legitimate to include escaped HTML in a web page about HTML. However, escaped HTML in other contexts just makes life difficult. I can’t parse it, and even if I double parse, it’s more than likely malformed.

  3. Ed Davies Says:

    Whatever with respect to the Atom stuff – this page looks bloody awful viewed directly in Firefox. I haven’t followed through what’s happening fully but, as far as I can see having downloaded the page with wget, the problems start with the double quotes in the double-quote delimited title attribute of the h1 element.

  4. Mokka mit Schlag » Is This a Security Issue? Says:

    […] Mokka mit Schlag » Web Development: Blogging: PHP « A Strong Test for Markup In Titles & Summaries […]
