Developing Feeds with RSS and Atom

Introducing Modules

Modules are additional sets of elements, giving the feed a greater range of expression: they allow the specification to be extended without actually being changed, which is a very clever trick. You can design your own module to match any data you might wish to syndicate. Admittedly, most aggregators will ignore it, but your own applications can take advantage of it. And, happily, the most popular modules are increasingly being supported by the latest aggregators as a matter of course.

Modules in RSS, both Versions 2.0 and 1.0, are created with a system known as XML Namespaces. Namespaces are the XML solution to the classic language problem of one word meaning two things in different contexts. Take “Windows,” for example. In the context of houses, “windows” are holes in the wall through which we can look. In the context of computers, “Windows” is a trademark of the Microsoft Corporation and refers to its range of operating systems. The context within which the name has a particular meaning is called its namespace.

In XML, you can distinguish between the two meanings by declaring a namespace for each, binding each namespace to a short prefix, and placing that prefix in front of the element name, separated by a colon, like this:

<computing:windows>This is an operating system</computing:windows>

<building:windows>This is a hole in a wall</building:windows>

Namespaces solve two problems. First, they allow you to distinguish between different meanings for words that are spelled the same way, which means you can use words more than once for different meanings. Second, they allow you to group together words that are related to each other; for example, using a computer to look through an XML document for all elements with a certain namespace is easy.
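To make the grouping point concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The two-line example above leaves out the xmlns declarations that a real document needs, so the namespace URIs below (http://example.com/ns/computing and http://example.com/ns/building) are made up for illustration, as is the helper elements_in_namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs: the example in the text omits the
# xmlns declarations a real document needs, so these are invented.
DOC = """<doc xmlns:computing="http://example.com/ns/computing"
     xmlns:building="http://example.com/ns/building">
  <computing:windows>This is an operating system</computing:windows>
  <building:windows>This is a hole in a wall</building:windows>
</doc>"""

def elements_in_namespace(xml_text, ns_uri):
    """Return (local name, text) for every element in the given namespace."""
    root = ET.fromstring(xml_text)
    prefix = "{" + ns_uri + "}"
    found = []
    for elem in root.iter():
        # ElementTree stores names as "{namespace-uri}localname".
        if elem.tag.startswith(prefix):
            found.append((elem.tag[len(prefix):], elem.text))
    return found

print(elements_in_namespace(DOC, "http://example.com/ns/building"))
# [('windows', 'This is a hole in a wall')]
```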

Both RSS 1.0 and 2.0 use namespaces to allow for modularization. This means that developers can add new features to RSS documents without changing the core specification.

Modularization has great advantages over RSS 0.9x's method for including new elements. For starters, anyone can create a module: there are no standards issues or any need for approval, aside from making sure that the namespace URI you use has not been used before. It also means that both RSS 1.0 and 2.0 are potentially far more powerful than RSS 0.9x ever was.

A module works in the actual RSS document by declaring a namespace within the root element of the feed and by prefixing the elements' names with that namespace's prefix, like so:

<?xml version="1.0"?>
<rss version="2.0" xmlns:blogChannel="http://backend.userland.com/blogChannelModule">
...

  <blogChannel:blink>http://www.benhammersley.com</blogChannel:blink>
...
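The same declaration can be produced programmatically. This Python sketch (standard library only; build_feed is a hypothetical helper) registers the blogChannel prefix with ElementTree, which then writes the xmlns:blogChannel declaration on the root rss element when it serializes the tree.

```python
import xml.etree.ElementTree as ET

BLOGCHANNEL = "http://backend.userland.com/blogChannelModule"

# Tell ElementTree which prefix to use when it writes this namespace.
ET.register_namespace("blogChannel", BLOGCHANNEL)

def build_feed(blink_url):
    """Build a minimal RSS 2.0 skeleton containing one blogChannel element."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Example"
    # A "{uri}localname" tag places the element in the module's namespace.
    ET.SubElement(channel, "{%s}blink" % BLOGCHANNEL).text = blink_url
    return ET.tostring(rss, encoding="unicode")

print(build_feed("http://www.benhammersley.com"))
```

ElementTree collects every namespace used in the tree and declares them all on the root element, which matches the convention described above.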

Note that it is the URI in the namespace declaration, not the namespace prefix, that uniquely identifies the namespace. In other words, from the perspective of a program processing XML, this:

<?xml version="1.0"?>
<rss version="2.0" xmlns:blogChannel="http://backend.userland.com/blogChannelModule">
...

  <blogChannel:blink>http://www.benhammersley.com</blogChannel:blink>
...

is absolutely identical to this:

<?xml version="1.0"?>
<rss version="2.0" xmlns:bingbangbong="http://backend.userland.com/blogChannelModule">
...

  <bingbangbong:blink>http://www.benhammersley.com</bingbangbong:blink>
...
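A quick way to confirm this is to parse both versions and compare the element names the parser reports. In this Python sketch, using xml.etree.ElementTree, the fragments are completed with a channel element and closing tags so that they parse; all_tags is a hypothetical helper. ElementTree replaces each prefix with its namespace URI, in the form {uri}localname, so the two lists come out identical.

```python
import xml.etree.ElementTree as ET

# The two fragments from the text, completed so that they are well-formed.
FEED_A = """<rss version="2.0"
  xmlns:blogChannel="http://backend.userland.com/blogChannelModule">
  <channel>
    <blogChannel:blink>http://www.benhammersley.com</blogChannel:blink>
  </channel>
</rss>"""

FEED_B = """<rss version="2.0"
  xmlns:bingbangbong="http://backend.userland.com/blogChannelModule">
  <channel>
    <bingbangbong:blink>http://www.benhammersley.com</bingbangbong:blink>
  </channel>
</rss>"""

def all_tags(xml_text):
    """Return the expanded name of every element, in document order."""
    return [elem.tag for elem in ET.fromstring(xml_text).iter()]

# The prefix is gone after parsing; only the namespace URI survives.
print(all_tags(FEED_A))
print(all_tags(FEED_B))
print(all_tags(FEED_A) == all_tags(FEED_B))  # True
```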

The role of the namespace URI will become clearer as we study some common modules. It is customary, and also very good manners, to publish documentation for the module at the namespace's URI, but this isn't technically necessary. As Chapter 11 discusses, the different feed standards have different scopes for the form this documentation can take. The presence of anything at all at the namespace URI is entirely optional, both in terms of RSS and within the scope of the XML Namespaces specification itself.

blogChannel Module

Designed by Dave Winer only a week after he formalized RSS 2.0, the blogChannel module allows the inclusion of data used by weblogging applications and, specifically, the newer generation of aggregating and filtering systems.

It consists of three optional elements, all of which are subelements of channel, and it uses the following namespace declaration:

xmlns:blogChannel="http://backend.userland.com/blogChannelModule"

The elements are:

blogChannel:blogRoll: Contains a literal string that is the URL of an OPML file containing the blogroll for the site. A blogroll is the list of blogs the blog author habitually reads.
blogChannel:blink: Contains a literal string that is the URL of a site the blog author recommends the reader visit.
blogChannel:mySubscriptions: Contains a literal string that is the URL of the OPML file containing the URLs of the RSS feeds to which the blog author is subscribed in her desktop reader.

Example 4-4 shows the beginning of an RSS 2.0 feed using the blogChannel module.

Example 4-4. An RSS 2.0 feed with the blogChannel module

<?xml version="1.0"?>
<rss version="2.0" xmlns:blogChannel="http://backend.userland.com/blogChannelModule">
<channel>
  <title>RSS2.0Example</title>
  <link>http://www.exampleurl.com/example/index.html</link>
  <description>This is an example RSS 2.0 feed</description>
  <blogChannel:blogRoll>http://www.exampleurl.com/blogroll.opml</blogChannel:blogRoll>
  <blogChannel:blink>http://www.benhammersley.com</blogChannel:blink>
  <blogChannel:mySubscriptions>http://www.exampleurl.com/mySubscriptions.opml</blogChannel:mySubscriptions>
...
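On the consuming side, a program looks these elements up by namespace URI rather than by prefix. This Python sketch (standard library; blog_channel_info is a hypothetical helper) binds the URI to the short prefix bc purely for its own lookups and treats all three elements as optional. The sample feed deliberately leaves out mySubscriptions.

```python
import xml.etree.ElementTree as ET

# "bc" is this program's own prefix; only the URI has to match the feed.
NS = {"bc": "http://backend.userland.com/blogChannelModule"}

FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:blogChannel="http://backend.userland.com/blogChannelModule">
<channel>
  <title>RSS2.0Example</title>
  <link>http://www.exampleurl.com/example/index.html</link>
  <description>This is an example RSS 2.0 feed</description>
  <blogChannel:blogRoll>http://www.exampleurl.com/blogroll.opml</blogChannel:blogRoll>
  <blogChannel:blink>http://www.benhammersley.com</blogChannel:blink>
</channel>
</rss>"""

def blog_channel_info(xml_text):
    """Return the blogChannel values found in the channel, keyed by local name."""
    channel = ET.fromstring(xml_text).find("channel")
    info = {}
    for name in ("blogRoll", "blink", "mySubscriptions"):
        value = channel.findtext("bc:" + name, namespaces=NS)
        if value is not None:  # all three elements are optional
            info[name] = value.strip()
    return info

print(blog_channel_info(FEED))
```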

Creative Commons Module

Also designed by Dave Winer, the Creative Commons module allows RSS 2.0 feeds to specify which Creative Commons license applies to them. The Creative Commons organization, http://creativecommons.org/, offers a variety of content licenses that allow feed publishers to release content on more flexible terms than traditional copyright allows. Feed consumers can consult the license to see how they can reuse the content for their own work.

It consists of only one element, creativeCommons:license, which contains the URL of the Creative Commons license on the Creative Commons site. The element can apply either to the complete channel or to an individual item. The module has the following namespace declaration:

xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule"

In action, it looks like Example 4-5.

Example 4-5. Part of an RSS 2.0 feed with the Creative Commons module

<rss version="2.0" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule">

<channel>
<title>Creative Commons Example</title>
<link>http://www.example.com/</link>
<creativeCommons:license>http://www.creativecommons.org/licenses/by-nd/1.0</creativeCommons:license>
...
<item>
<description>blah blah blah</description>
<creativeCommons:license>http://www.creativecommons.org/licenses/by-nc/1.0</creativeCommons:license>
</item>
...

Note that a creativeCommons:license element on an item overrides the channel's creativeCommons:license for that item.
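Resolving that override is straightforward. The following Python sketch (standard library; effective_licenses is a hypothetical helper, and the titled items are added for illustration) assumes, as the override rule implies, that an item with no license of its own falls under the channel's license.

```python
import xml.etree.ElementTree as ET

# Expanded ("Clark") name of the module's single element.
CC = "{http://backend.userland.com/creativeCommonsRssModule}license"

FEED = """<rss version="2.0"
  xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule">
<channel>
  <title>Creative Commons Example</title>
  <link>http://www.example.com/</link>
  <creativeCommons:license>http://www.creativecommons.org/licenses/by-nd/1.0</creativeCommons:license>
  <item>
    <title>First</title>
    <creativeCommons:license>http://www.creativecommons.org/licenses/by-nc/1.0</creativeCommons:license>
  </item>
  <item>
    <title>Second</title>
  </item>
</channel>
</rss>"""

def effective_licenses(xml_text):
    """Map each item's title to the license URL that applies to it.

    An item-level license overrides the channel-level one; an item without
    its own license inherits the channel's (or None if there is none).
    """
    channel = ET.fromstring(xml_text).find("channel")
    default = channel.findtext(CC)
    return {item.findtext("title"): item.findtext(CC, default)
            for item in channel.findall("item")}

print(effective_licenses(FEED))
```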

More details can be found at:

http://backend.userland.com/creativeCommonsRssModule

Simple Semantic Resolution Module

One of the never-ending arguments within the RSS world is that between the pro- and anti-RDF camps. The fork between RSS 0.91 and 1.0 was almost entirely caused by this disagreement. The pro-RDF camp stated, quite rightly, that RDF data has a great deal more meaningful utility than plain XML, whilst the anti-RDF camp stated, also quite rightly, that the RDF syntax was horrible, and that no one can understand it without reading the documentation and having a nice lie down.

That may be (you'll form your own opinion in the following chapters), but in the meantime, the Simple Semantic Resolution module was one idea put forward to bridge the divide between the two cultures.

The module was written by Danny Ayers, and its presence in an RSS 2.0 feed simply means “this data should be considered RDF, and to use it with an RDF-compatible application you should apply this transformation to it first.” It then points you to a nice XSLT stylesheet that performs the transformation. The module consists of one single element, a subelement of channel, and has the following namespace declaration:

xmlns:ssr="http://purl.org/stuff/ssr"

The element is:

ssr:rdf

It's an empty element with a single attribute, transform, whose value is the URL of the necessary stylesheet:

<ssr:rdf transform="http://w3future.com/weblog/gems/rss2rdf.xsl" />

Example 4-6 shows the SSR module in use.

Example 4-6. Part of an RSS 2.0 feed with the SSR module

<?xml version="1.0"?>
<rss version="2.0" xmlns:ssr="http://purl.org/stuff/ssr">
<ssr:rdf transform="http://w3future.com/weblog/gems/rss2rdf.xsl" />
...
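A consumer that wants RDF first has to notice the ssr:rdf element and read its transform attribute. This Python sketch (standard library; ssr_transform is a hypothetical helper) does only that: Python's standard library has no XSLT processor, so actually applying the stylesheet would need a third-party library such as lxml. Because it searches the whole tree, the sketch finds ssr:rdf whether it sits directly under rss, as in Example 4-6, or under channel.

```python
import xml.etree.ElementTree as ET

SSR_RDF = "{http://purl.org/stuff/ssr}rdf"

def ssr_transform(xml_text):
    """Return the stylesheet URL from an ssr:rdf element, or None if absent."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so the element's exact position doesn't matter.
    for elem in root.iter(SSR_RDF):
        # transform is an unprefixed attribute, so it has no namespace.
        return elem.get("transform")
    return None

FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:ssr="http://purl.org/stuff/ssr">
<ssr:rdf transform="http://w3future.com/weblog/gems/rss2rdf.xsl" />
<channel><title>Example</title></channel>
</rss>"""

print(ssr_transform(FEED))
# http://w3future.com/weblog/gems/rss2rdf.xsl
```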

More details can be found at http://ideagraph.net/xmlns/ssr/.

Trackback Module

The trackback system for weblog content management systems (see http://www.movabletype.org/docs/mttrackback.html for the technical details) has grown up in the same neighborhood as RSS, so it’s only fair that the one should be represented in the other.

This module, also available in tasty RSS 1.0, comes from Justin Klubnik and allows RSS 2.0 feeds to give both the URL that people should send trackbacks to and the URLs the item itself has sent trackback pings to. The idea is that aggregators can send pings and also follow links to find related pages, because items might ping places they don't explicitly link to.

This module is made up of two elements, subelements of item, and has the following namespace declaration:

xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/"

Here are the elements:

trackback:ping: This contains the item’s trackback URL:

<trackback:ping>http://foo.com/trackback/tb.cgi?tb_id=20020923</trackback:ping>

trackback:about: This contains any trackback URL that was pinged in reference to the item:

<trackback:about>http://foo.com/trackback/tb.cgi?tb_id=20020923</trackback:about>
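An aggregator reading these elements might collect them per item, as in this Python sketch (standard library; trackback_info is a hypothetical helper). The two trackback:about URLs are invented for illustration, and the sketch assumes an item may carry more than one trackback:about element, one per URL pinged.

```python
import xml.etree.ElementTree as ET

TB = {"tb": "http://madskills.com/public/xml/rss/module/trackback/"}

FEED = """<rss version="2.0"
  xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
<channel>
  <item>
    <title>Example item</title>
    <trackback:ping>http://foo.com/trackback/tb.cgi?tb_id=20020923</trackback:ping>
    <trackback:about>http://example.org/tb.cgi?tb_id=1</trackback:about>
    <trackback:about>http://example.net/tb.cgi?tb_id=2</trackback:about>
  </item>
</channel>
</rss>"""

def trackback_info(item):
    """Return an item's own trackback URL and the list of URLs it has pinged."""
    return {
        "ping": item.findtext("tb:ping", namespaces=TB),
        "about": [e.text for e in item.findall("tb:about", namespaces=TB)],
    }

for item in ET.fromstring(FEED).iter("item"):
    print(trackback_info(item))
```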

More details can be found at http://madskills.com/public/xml/rss/module/trackback/.

ICBM Module

This module, written by Matt Croydon and Kenneth Hunt, allows RSS feeds to state the geographical location of the origin of the feed or an individual item within it.

It’s alleged that ICBM does actually stand for intercontinental ballistic missile, and certainly a half-arsed attempt at Googling for it produces only the explanation that describing one’s position as an ICBM address is so that, should anyone wish, your data will allow the baddies to target you directly, presumably for being far too clever with your syndication feeds.

Either way, the namespace declaration is thus:

xmlns:icbm="http://postneo.com/icbm"

It contains two elements, usable in either the channel or the item context. As you might expect, values given in an item override those given in the channel.

icbm:latitude

This contains the latitude value as per the geographic standard WGS84:

<icbm:latitude>43.7628</icbm:latitude>

icbm:longitude

This contains the longitude value as per the geographic standard WGS84:

<icbm:longitude>11.2442</icbm:longitude>

That’s my house, actually.
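Because item values override channel values, a reader can compute each item's location with a simple fallback. In this Python sketch (standard library; coords and item_locations are hypothetical helpers), the channel carries the coordinates above, and the coordinates on the first item are invented for illustration.

```python
import xml.etree.ElementTree as ET

ICBM = {"icbm": "http://postneo.com/icbm"}

FEED = """<rss version="2.0" xmlns:icbm="http://postneo.com/icbm">
<channel>
  <title>Example</title>
  <icbm:latitude>43.7628</icbm:latitude>
  <icbm:longitude>11.2442</icbm:longitude>
  <item>
    <title>Away</title>
    <icbm:latitude>51.5</icbm:latitude>
    <icbm:longitude>-0.12</icbm:longitude>
  </item>
  <item>
    <title>Home</title>
  </item>
</channel>
</rss>"""

def coords(elem):
    """Return (latitude, longitude) as floats if both are direct children, else None."""
    lat = elem.findtext("icbm:latitude", namespaces=ICBM)
    lon = elem.findtext("icbm:longitude", namespaces=ICBM)
    if lat is None or lon is None:
        return None
    return float(lat), float(lon)

def item_locations(xml_text):
    """Map each item's title to its own location, falling back to the channel's."""
    channel = ET.fromstring(xml_text).find("channel")
    default = coords(channel)
    return {item.findtext("title"): coords(item) or default
            for item in channel.findall("item")}

print(item_locations(FEED))
```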

Go to http://www.postneo.com/icbm/ for more verbose details on the thinking behind the specification.

Yahoo!’s Media RSS Module

In December 2004, Yahoo! launched a beta video search engine at http://video.search.yahoo.com/. The original system spidered the Web looking for video files and indexed them with the implied information found in the filename and link text. To make it easier for video content producers to have Yahoo! index their sites, and to give the search engine much better data to play with, Yahoo! is now offering to regularly spider RSS feeds containing details of media files. This additional data is encoded in its new Media RSS Module.

That module consists of one element, <media:content>, and four optional subelements. Its namespace declaration is:

xmlns:media="http://tools.search.yahoo.com/mrss/"

<media:content> is a subelement of item and takes ten optional attributes:

url: url specifies the direct URL to the media object. It is an optional attribute. If a URL isn’t included, a playerURL must be specified.
fileSize: The size, in bytes, of the media object. It is an optional attribute.
type: The standard MIME type of the object. It is an optional attribute.
playerURL: playerURL is the URL of the media player console. It is an optional attribute.
playerHeight: playerHeight is the height of the window the playerURL should be opened in. It is an optional attribute.
playerWidth: playerWidth is the width of the window the playerURL should be opened in. It is an optional attribute.
isDefault: isDefault determines if this is the default object that should be used for this element. It can be true or false. So, if an item contains more than one media:content element, setting this to true makes it the default. It’s an optional attribute but can be used only once within each item.
expression: expression determines if the object is a sample or the full version of the object. It can be either sample or full. It is an optional attribute.
bitrate: The bit rate of the file, in kilobits per second. It is an optional attribute.
duration: The number of seconds the media plays, for audio and video. It is an optional attribute.
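When an item carries several media:content elements, a consumer has to choose one. This Python sketch (standard library; pick_default and the sample URLs are hypothetical) returns the attributes of the element marked isDefault="true" and, as its own assumption rather than anything the module states, falls back to the first media:content when none is marked.

```python
import xml.etree.ElementTree as ET

MEDIA_CONTENT = "{http://tools.search.yahoo.com/mrss/}content"

# Hypothetical item offering two encodings of the same clip.
ITEM = """<item xmlns:media="http://tools.search.yahoo.com/mrss/">
  <title>Two encodings of one clip</title>
  <media:content url="http://www.example.com/clip-low.mov" type="video/quicktime"
      bitrate="64" duration="185"/>
  <media:content url="http://www.example.com/clip-high.mov" type="video/quicktime"
      bitrate="128" duration="185" isDefault="true"/>
</item>"""

def pick_default(item):
    """Return the attributes of the item's default media:content, or None.

    Prefers the element marked isDefault="true"; otherwise falls back to
    the first media:content (an assumption made by this sketch).
    """
    contents = item.findall(MEDIA_CONTENT)
    if not contents:
        return None
    for content in contents:
        if content.get("isDefault") == "true":
            return dict(content.attrib)
    return dict(contents[0].attrib)

print(pick_default(ET.fromstring(ITEM)))
```

Attribute values arrive as strings, so numeric ones such as bitrate and duration need converting before use.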

There are also four optional subelements of <media:content>, which can also be used as subelements of item:

<media:thumbnail>

Allows a particular image to be used as the representative image for the media object:

<media:thumbnail height="50" width="50">http://www.foo.com/keyframe.jpg</media:thumbnail>

It takes two optional attributes. height specifies the height of the thumbnail. width specifies the width of the thumbnail.

<media:category>

Allows a taxonomy to be set that gives an indication of the type of media content and its particular contents:

<media:category>music/artist name/album/song</media:category>
<media:category>television/series/episode/episode number</media:category>

<media:people>

Lists the notable individuals or businesses and their contribution to the creation of the media object.

<media:people role="editor">Simon St Laurent</media:people>

role specifies the role the individual played. Examples include producer, artist, news anchor, and cast member. It is an optional attribute.

<media:text>

Allows the inclusion of a text transcript, closed captioning, or lyrics of the media content:

<media:text>Oh, say, can you see, by the dawn's early light,</media:text>

Once your site has a feed working with the Media RSS Module, like that shown in Example 4-7, you can submit it to Yahoo! at http://tools.search.yahoo.com/mrss/submit.html.

Example 4-7. media:content in action

<media:content url="http://www.example.com/movie.mov" fileSize="12345678"
    type="video/quicktime" playerUrl="http://www.example.com/player?id=1"
    playerHeight="200" playerWidth="400" isDefault="true" expression="full"
    bitrate="128" duration="185">
    <media:thumbnail height="50" width="50">http://www.example.com/thumbnail.jpg</media:thumbnail>
    <media:category>comedy/slapstick/custard</media:category>
    <media:people role="stuntman">Ben Hammersley</media:people>
    <media:text>Take that! And that! And that!</media:text>
</media:content>

The development of your own modules is covered in Chapter 11.

Developing Feeds with RSS and Atom by Ben Hammersley