Sitemaps - Search Engine Optimization for Flash

by Todd Perkins

Sometimes your web pages can get lost in the mix because there aren’t many links to them, because they’re new, or because of the kind of content they contain (e.g., dynamic content, Flash content, AJAX, or other media). Search engines can find out about these pages through a sitemap, a file that lists the pages on your site that you want search engines to index. Sitemap files can be in XML or plain-text format, and they require a special syntax to work properly.

This excerpt is from Search Engine Optimization for Flash. Search Engine Optimization for Flash dispels the myth that Flash-based websites won't show up in a web search by demonstrating exactly what you can do to make your site fully searchable -- no matter how much Flash it contains. You'll learn best practices for using HTML, CSS and JavaScript, as well as SWFObject, for building sites with Flash that will stand tall in search rankings.

Sitemaps

Note

XML, or eXtensible Markup Language, is somewhat similar to HTML. It is used to represent data in a simple, organized, universally accessible way. XML files are simply text files that contain XML code, saved with the .xml extension.

Plain-text files are files created in a text editor that contain only text and have a .txt extension.

Along with using a sitemap to tell search engines about your pages, you can optionally add extra information about each page. The information can include when the page was last modified, how often the page is updated, and how the page’s importance ranks in relation to the other pages on your site.

Note

Many search engines, including Google, Yahoo!, and MSN, follow the sitemap standards at http://www.sitemaps.org. You can find more information on sitemaps and sitemap standards on that site.

Why You Need a Sitemap

Sitemaps are great for pages that are more difficult for search engines to index, such as pages with Flash, AJAX, or dynamic content. If you have pages that use any of these technologies, it’s a good idea to create a sitemap. Search engines aren’t guaranteed to index every page in your sitemap, but creating one won’t hurt your rankings, so there is little downside to doing so.

Creating a Sitemap

All you need to create a sitemap is a plain-text editor, such as TextEdit on the Mac or Notepad on the PC. As I mentioned earlier, there are two types of sitemaps: XML and plain text. The creation process is slightly different depending on which type of sitemap you decide to create.

Note

Some websites, such as http://www.xml-sitemaps.com/, create sitemaps for you by giving you cut-and-paste text to put into an XML file and upload to your site. This is a great option, especially if you want to create a sitemap quickly.

Creating an XML-Based Sitemap

Following is an example of an XML-based sitemap.

File: sitemap.xml

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://www.yoursite.com/page1.html</loc>
      <lastmod>2008-01-01</lastmod>
      <changefreq>hourly</changefreq>
      <priority>1</priority>
   </url>
   <url>
      <loc>http://www.yoursite.com/page2.html</loc>
      <lastmod>2008-02-01</lastmod>
      <changefreq>weekly</changefreq>
      <priority>0.5</priority>
   </url>
   <url>
      <loc>http://www.yoursite.com/page3.html</loc>
      <lastmod>2008-03-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.2</priority>
   </url>
</urlset>

Here’s a walkthrough of the preceding code.

<?xml version="1.0" encoding="UTF-8"?>

This line is standard as the first line in an XML file, and it includes information about the XML version used in the file and how the text data is encoded.

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
...(code not shown here)
</urlset>

This code defines the set of URLs for your sitemap, in the urlset element. The xmlns attribute declares an XML namespace, an identifier (written as a URL) that tells parsers which vocabulary the element names belong to. Here the namespace is http://www.sitemaps.org/schemas/sitemap/0.9, the one defined by the sitemap standard.
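To see why the namespace matters in practice, here is a minimal Python sketch that reads a sitemap back with the standard library’s xml.etree.ElementTree. Python is not used in the original text, and the names SITEMAP_NS, SAMPLE, and list_locations are illustrative assumptions. Because urlset declares a default namespace, each element has to be looked up by its namespace-qualified name.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the urlset element shown above.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical two-page sitemap used only for this illustration.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://www.yoursite.com/page1.html</loc>
   </url>
   <url>
      <loc>http://www.yoursite.com/page2.html</loc>
   </url>
</urlset>
"""


def list_locations(xml_text):
    """Return the text of every loc element in a sitemap document."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    # Map a prefix of our choosing ("sm") to the sitemap namespace.
    namespaces = {"sm": SITEMAP_NS}
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", namespaces)]
```

Searching for plain "url" without the namespace prefix finds nothing in this document, which is a common source of confusion when parsing sitemaps.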

<url>
   ...(code not shown here)
</url>

The url element defines one URL entry in your sitemap. Within it, only the location (loc) element is required; all other child elements are optional.

<loc>http://www.yoursite.com/page1.html</loc>

The loc element contains the location of a URL in your sitemap. In this example, the URL is http://www.yoursite.com/page1.html. Each url element must at least contain the loc element.

<lastmod>2008-01-01</lastmod>

This element, lastmod, contains the date on which the URL from the loc element was last modified. The format for the date is YYYY-MM-DD: a four-digit year, a two-digit month, and a two-digit day, separated by hyphens. This date represents January 1, 2008.
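If you generate lastmod values from a script, the YYYY-MM-DD format is what Python’s datetime.date produces by default. The sketch below is only an illustration; the helper names format_lastmod and parse_lastmod are assumptions, not part of the sitemap standard.

```python
from datetime import date


def format_lastmod(modified):
    """Format a date as YYYY-MM-DD for use in a lastmod element."""
    return modified.isoformat()


def parse_lastmod(text):
    """Parse a YYYY-MM-DD string; raises ValueError if it is malformed."""
    return date.fromisoformat(text)
```

For example, format_lastmod(date(2008, 1, 1)) gives the value used in the listing above, and parse_lastmod rejects a string such as "2008-13-01" because there is no thirteenth month.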

<changefreq>weekly</changefreq>

The changefreq element refers to how often the page is updated, or its change frequency. For example, this code declares that the URL is updated weekly. Valid values for changefreq are as follows:

  • always

  • hourly

  • daily

  • weekly

  • monthly

  • yearly

  • never

It’s important to note that declaring your pages’ change frequency doesn’t necessarily mean search engine spiders will index your pages as often as they’re updated. The value is more of a guideline for them. In fact, spiders will occasionally crawl pages that have a value of never, just in case any changes have been made.
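Because only the seven values listed above are valid, a script that writes sitemaps can check changefreq before emitting it. This is a small sketch, assuming the values must be written in lowercase exactly as listed; the names VALID_CHANGEFREQ and is_valid_changefreq are hypothetical.

```python
# The seven values listed above, in the same order.
VALID_CHANGEFREQ = (
    "always",
    "hourly",
    "daily",
    "weekly",
    "monthly",
    "yearly",
    "never",
)


def is_valid_changefreq(value):
    """Return True if value is one of the listed changefreq values."""
    return value in VALID_CHANGEFREQ
```

A value such as "fortnightly" would fail this check and should be replaced with the closest listed value.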

<priority>0.2</priority>

The priority element indicates a page’s importance relative to the other pages in your sitemap. Valid values range from 0.0 to 1.0, and the default is 0.5. Giving your pages higher priority ratings doesn’t mean they will rank higher than pages from other sites in search engine results. Rather, priority tells search engines which pages on your site you consider more important than your other pages. Search engines may use this hint when choosing among URLs from your site, for example when deciding which pages to crawl first, but it does not guarantee a position in the results pages.
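A generator script can enforce the 0 to 1 range before writing a priority value. The following is a minimal sketch; the function name format_priority and the choice to write one decimal place are assumptions made for illustration.

```python
def format_priority(value):
    """Return a priority as text with one decimal place, e.g. 0.5.

    Raises ValueError when the value falls outside the 0 to 1 range.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("priority must be between 0 and 1, got %r" % (value,))
    return f"{value:.1f}"
```

Rejecting out-of-range values early is a design choice: it surfaces mistakes in your own data rather than leaving them for a search engine to ignore silently.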

Note

All major search engines (Google, Yahoo!, and MSN) use the same sitemap syntax, so you don’t have to create a unique sitemap for each search engine.

In summary, remember to declare the XML version and the encoding, keep url elements in a urlset element, make sure to declare the namespace, and include at least the loc element inside each url.
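The checklist above can be sketched as a short Python script that builds the same structure with the standard library’s xml.etree.ElementTree. This is one possible approach under stated assumptions, not the book’s method: the function name build_sitemap and the dictionary layout for each page are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
OPTIONAL_ELEMENTS = ("lastmod", "changefreq", "priority")


def build_sitemap(pages):
    """Build sitemap XML text from a list of page dictionaries.

    Each dictionary needs a "loc" key; "lastmod", "changefreq", and
    "priority" are optional and written only when present.
    """
    # Setting xmlns as a plain attribute declares the namespace on urlset.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & when serializing text.
        ET.SubElement(url, "loc").text = page["loc"]
        for name in OPTIONAL_ELEMENTS:
            if name in page:
                ET.SubElement(url, name).text = str(page[name])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    example = build_sitemap([
        {"loc": "http://www.yoursite.com/page1.html",
         "lastmod": "2008-01-01", "changefreq": "hourly", "priority": 1},
        {"loc": "http://www.yoursite.com/page2.html"},
    ])
    # Save as UTF-8 so the file matches its XML declaration.
    with open("sitemap.xml", "w", encoding="utf-8") as handle:
        handle.write(example)
```

The output is written on a single line rather than indented like the listing above; XML parsers treat both forms the same way.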

Warning

You can declare a maximum of 50,000 URLs in a sitemap.

If you have a lot of URLs in your sitemap, you may want to consider creating multiple sitemaps and linking them together. You can find instructions for doing that at http://www.sitemaps.org.
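As a rough illustration of linking sitemaps together, the Python sketch below splits a long URL list into groups of at most 50,000 and builds a sitemap index that points at each resulting file. The sitemapindex, sitemap, and loc element names come from the sitemap standard published at sitemaps.org rather than from this excerpt, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50000  # limit stated in the warning above


def split_urls(urls, limit=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive groups of at most limit items."""
    return [urls[start:start + limit] for start in range(0, len(urls), limit)]


def build_sitemap_index(sitemap_urls):
    """Build a sitemap index document listing the given sitemap file URLs."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = sitemap_url
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Each group returned by split_urls would be written to its own sitemap file (for instance sitemap1.xml and sitemap2.xml, names chosen here only as an example), and the index then lists the URLs of those files.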

Warning: special characters

When creating an XML sitemap, or any XML file for that matter, certain characters aren’t allowed as literal text because they’re reserved XML characters. To use one of these characters, such as <, you must escape it by replacing it with a special code. Table 2.1, “Characters that must be escaped” shows which characters must be escaped and how to escape them.

Table 2.1. Characters that must be escaped

Character           Escape code
&                   &amp;
' (single quote)    &apos;
" (double quote)    &quot;
<                   &lt;
>                   &gt;

Most likely, the only character escaping you’ll need to do when creating a sitemap is for a dynamic URL. For example, you may want to include a page that keeps track of a person’s username and ID, so your page URL may look like this:

http://www.yourwebsite.com/index.php?user=someone&id=83736

To separate URL parameters, user and id in this case, you need to use an ampersand (&). To escape an ampersand, use the escape code &amp;. Your sitemap code would then need to look like this:

<loc>http://www.yourwebsite.com/index.php?user=someone&amp;id=83736</loc>
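If you assemble sitemap text by hand in a script, the escaping in Table 2.1 can be applied with Python’s xml.sax.saxutils.escape. This is a minimal sketch; the function name escape_for_sitemap is an assumption. By default escape handles &, <, and >, so the two quote characters are passed in as extra entities.

```python
from xml.sax.saxutils import escape

# escape() covers &, <, and > on its own; quotes are added explicitly.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def escape_for_sitemap(url):
    """Escape the five reserved XML characters in a URL."""
    return escape(url, QUOTE_ENTITIES)
```

Apply the function only to raw URLs. Running it on text that is already escaped would turn &amp; into &amp;amp;.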

Note

Even though problems with escaping characters may not be immediately obvious to you, and may not cause errors when you submit your sitemap, it’s important to double-check your URLs to make sure the proper characters are escaped.

Creating a Plain-Text Sitemap

A plain-text sitemap is much simpler than an XML sitemap, but it offers you less control. Using a plain-text sitemap lets you specify one URL on each line.

Following is an example of a plain-text sitemap.

File: sitemap.txt

http://www.yoursite.com/page1.html
http://www.yoursite.com/page2.html
http://www.yoursite.com/page3.html

Your plain-text sitemap should contain only the URLs for the pages on your site. Don’t include headers, footers, or any other text in it.
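Generating a plain-text sitemap from a list of URLs takes only a few lines. The Python sketch below is an illustration with a hypothetical function name, build_text_sitemap; it writes one URL per line and skips blank entries so that no stray text ends up in the file.

```python
def build_text_sitemap(urls):
    """Return plain-text sitemap content with one URL per line."""
    cleaned = [url.strip() for url in urls if url.strip()]
    return "\n".join(cleaned) + "\n"


if __name__ == "__main__":
    content = build_text_sitemap([
        "http://www.yoursite.com/page1.html",
        "http://www.yoursite.com/page2.html",
        "http://www.yoursite.com/page3.html",
    ])
    # Saving as UTF-8 is an assumption here, matching the XML example.
    with open("sitemap.txt", "w", encoding="utf-8") as handle:
        handle.write(content)
```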

Placing Your Sitemap

Your sitemap should be in the highest directory level that you want to be indexed. For example, if you want your entire site to be indexed, you’d put your sitemap in your root directory. If your domain were yoursite.com and your sitemap file were called sitemap.xml, the URL to your sitemap would be:

http://www.yoursite.com/sitemap.xml

Note

Some search engines may look for a file called sitemap.xml in the root directory of your server, so placing it there can help them find it. Discovery isn’t guaranteed, however, so submitting your sitemap remains the more reliable route.

Sometimes you may only want part of your site to be indexed. For example, you may have several folders on your web server that are password-protected and one folder for the public to view. In this case, you’d put your sitemap in the public folder. Here’s an example of what the URL to your sitemap would be if you were to call the sitemap file sitemap.xml and the public folder public:

http://www.yoursite.com/public/sitemap.xml
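The placement rule can be expressed as a small check: a page belongs in a sitemap only if it sits at or below the directory that holds the sitemap file. The Python sketch below follows that reading, which also matches the location rules in the sitemap standard at sitemaps.org; the function name is_in_sitemap_scope is hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def is_in_sitemap_scope(sitemap_url, page_url):
    """Return True if page_url is at or below the sitemap's directory."""
    sitemap = urlsplit(sitemap_url)
    page = urlsplit(page_url)
    # The page must be on the same scheme and host as the sitemap.
    if (sitemap.scheme, sitemap.netloc) != (page.scheme, page.netloc):
        return False
    directory = posixpath.dirname(sitemap.path)
    if not directory.endswith("/"):
        directory += "/"
    # Treat a bare domain such as http://www.yoursite.com as the path "/".
    return (page.path or "/").startswith(directory)
```

With the sitemap at http://www.yoursite.com/public/sitemap.xml, a page under /public/ passes the check, while a page in a different folder does not.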

Submitting a Sitemap

For Google, you can log in to your Webmaster Tools account (assuming you’ve created an account) at http://www.google.com/webmasters and give Google the URL to your sitemap. The process is the same for MSN. Log in to Webmaster Tools at http://webmaster.live.com, and submit your website and sitemap URL. To submit your sitemap to Yahoo!, go to https://siteexplorer.search.yahoo.com/submit, submit your site, and submit your sitemap in the Submit Site Feed area.

Once you’ve submitted your sitemap, the search engines will do the rest of the work for you. Submitting a sitemap doesn’t guarantee that spiders will crawl all of your pages, but it does give you a direct way to tell search engines which pages exist and which of them you consider most important.

Note

Again, some search engine spiders may look for a file called sitemap.xml at the root level of your website, but submitting your sitemap is the more reliable way to make sure it is found.
