Sitemaps - Search Engine Optimization for Flash
by Todd Perkins
Sometimes your web pages can get lost in the mix because there aren’t many links to them, because they’re new pages, or because of the content contained within the pages (i.e., dynamic content, Flash content, AJAX, or other media). Search engines can find out about these pages through a sitemap, a file containing information about pages on your site that you want search engines to index. Sitemap files can be in XML or plain-text format, and they require a special syntax to work properly.
This excerpt is from Search Engine Optimization for Flash. Search Engine Optimization for Flash dispels the myth that Flash-based websites won't show up in a web search by demonstrating exactly what you can do to make your site fully searchable -- no matter how much Flash it contains. You'll learn best practices for using HTML, CSS and JavaScript, as well as SWFObject, for building sites with Flash that will stand tall in search rankings.
Note
XML, or eXtensible Markup Language, is somewhat similar to HTML and is used to represent data in a simple, organized, universally accessible way. XML files are simply text files that contain XML code, saved with the .xml extension. Plain-text files are files created in a text editor that contain only text and have a .txt extension.
Along with using a sitemap to tell search engines about your pages, you can optionally add extra information about each page. The information can include when the page was last modified, how often the page is updated, and how the page’s importance ranks in relation to the other pages on your site.
Note
Many search engines, including Google, Yahoo!, and MSN, follow the sitemap standards at http://www.sitemaps.org. You can find more information on sitemaps and sitemap standards on that site.
Sitemaps are great for pages that are more difficult for search engines to index, including pages with Flash, AJAX, and dynamic content. If your pages use any of these technologies, it’s a good idea to create a sitemap. Search engines aren’t guaranteed to index every page in your sitemap, but creating one won’t hurt your rankings.
All you need to create a sitemap is a plain-text editor, such as TextEdit on the Mac or Notepad on the PC. As I mentioned earlier, there are two types of sitemaps: XML and plain text. The creation process is slightly different depending on which type of sitemap you decide to create.
Note
Some websites, such as http://www.xml-sitemaps.com/, create sitemaps for you by giving you cut-and-paste text to put into an XML file and upload to your site. This is a great option, especially if you want to create a sitemap quickly.
Following is an example of an XML-based sitemap.
File: sitemap.xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.yoursite.com/page1.html</loc>
    <lastmod>2008-01-01</lastmod>
    <changefreq>hourly</changefreq>
    <priority>1</priority>
  </url>
  <url>
    <loc>http://www.yoursite.com/page2.html</loc>
    <lastmod>2008-02-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>http://www.yoursite.com/page3.html</loc>
    <lastmod>2008-03-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.2</priority>
  </url>
</urlset>
Here’s a walkthrough of the preceding code.
<?xml version="1.0" encoding="UTF-8"?>
This line is standard as the first line in an XML file, and it includes information about the XML version used in the file and how the text data is encoded.
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> ...(code not shown here) </urlset>
This code defines a set of URLs for your sitemap, in the urlset element. The xmlns attribute declares an XML namespace, which identifies the vocabulary of element names used in the file. Here, the namespace is http://www.sitemaps.org/schemas/sitemap/0.9, the one defined by the sitemap standard.
<url> ...(code not shown here) </url>
The url tag
defines a URL in your sitemap. Except for the required URL location
element, all other elements are optional.
<loc>http://www.yoursite.com/page1.html</loc>
The loc element contains
the location of a URL in your sitemap. In this example, the URL is
http://www.yoursite.com/page1.html. Each
url element must at least contain the
loc element.
<lastmod>2008-01-01</lastmod>
This element, lastmod, contains the date that the URL from the loc element was last modified. The format for the date is YYYY-MM-DD: a four-digit year, a two-digit month, and a two-digit day, separated by hyphens. This date represents January 1, 2008.
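As a minimal sketch of producing that date format, the following Python snippet formats and checks lastmod values. The function names format_lastmod and is_valid_lastmod are hypothetical helpers for illustration, not part of any sitemap tool.

```python
from datetime import date, datetime


def format_lastmod(day):
    """Return a date as YYYY-MM-DD, the format used in the lastmod element."""
    return day.strftime("%Y-%m-%d")


def is_valid_lastmod(text):
    """Check that a string is a real calendar date written as YYYY-MM-DD."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True


print(format_lastmod(date(2008, 1, 1)))  # prints 2008-01-01
```

A check like is_valid_lastmod can catch slips such as swapping the month and day before the sitemap is uploaded.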
<changefreq>weekly</changefreq>
The changefreq element
refers to how often the page is updated, or its change frequency.
For example, this code declares that the URL is updated weekly.
Valid values for changefreq are as
follows:
- always
- hourly
- daily
- weekly
- monthly
- yearly
- never
It’s important to note that declaring your page’s change frequency doesn’t necessarily mean search engine spiders will index the page as often as it’s updated. The value is a guideline for them, not a command. In fact, spiders will occasionally crawl pages that have a value of never, just in case any changes have been made.
<priority>0.2</priority>
The priority element indicates a page’s importance relative to other pages in your sitemap. Valid values range from 0.0 to 1.0, and the default is 0.5. Giving your pages higher-priority ratings doesn’t mean the pages will rank higher than other sites in search engine results. Rather, it tells search engines which pages on your site you consider more important than the other pages on your site. Search engines may use this value when choosing among pages from your own site, so a higher priority can make it more likely that your most important pages are the ones that appear in the results pages.
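If you assemble sitemap entries by hand or with a script, it can help to check the changefreq and priority values before publishing. The sketch below assumes the allowed values described above; the function name check_url_hints is a hypothetical helper, not something defined by the sitemap standard.

```python
# The seven changefreq values listed in the text.
VALID_CHANGEFREQ = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}


def check_url_hints(changefreq=None, priority=None):
    """Return a list of problems found; an empty list means the values look valid.

    Both arguments are optional because both elements are optional.
    """
    problems = []
    if changefreq is not None and changefreq not in VALID_CHANGEFREQ:
        problems.append("invalid changefreq: %r" % (changefreq,))
    if priority is not None and not 0.0 <= float(priority) <= 1.0:
        problems.append("priority out of range: %r" % (priority,))
    return problems


print(check_url_hints("weekly", 0.5))  # prints []
```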
Note
All major search engines (Google, Yahoo!, and MSN) use the same sitemap syntax, so you don’t have to create a unique sitemap for each search engine.
In summary, remember to declare the XML version and the
encoding, keep url elements in a
urlset element, make sure to declare
the namespace, and include at least the loc element inside each url.
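The checklist above can also be followed by generating the file with a script instead of typing it. The following is a minimal sketch using Python’s standard xml.etree.ElementTree module; the build_sitemap function and the shape of the pages list are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from a list of dicts.

    Each dict must have a 'loc' key; 'lastmod', 'changefreq', and
    'priority' are optional, mirroring the elements described above.
    """
    # Declare the namespace on the urlset element.
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        for name in ("lastmod", "changefreq", "priority"):
            if name in page:
                ET.SubElement(url, name).text = str(page[name])
    # ElementTree escapes reserved characters in element text for us.
    body = ET.tostring(urlset, encoding="unicode")
    # Declare the XML version and encoding on the first line.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


sitemap_xml = build_sitemap([
    {"loc": "http://www.yoursite.com/page1.html", "lastmod": "2008-01-01",
     "changefreq": "hourly", "priority": 1},
    {"loc": "http://www.yoursite.com/page2.html"},
])
```

The returned string would be saved as sitemap.xml using UTF-8 encoding, so that the file matches its own declaration.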
Warning
You can declare a maximum of 50,000 URLs in a sitemap.
If you have a lot of URLs in your sitemap, you may want to consider creating multiple sitemaps and linking them together. You can find instructions for doing that at http://www.sitemaps.org.
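As a rough sketch of that approach, the Python code below splits a long URL list into groups of at most 50,000 and builds a sitemap index file that points to one sitemap per group. The sitemapindex, sitemap, and loc element names come from the sitemaps.org standard; the function names and the sitemap1.xml and sitemap2.xml file names are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50000  # the limit mentioned in the warning above


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into lists of at most `size` items each."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Build a sitemap index listing the URL of each individual sitemap."""
    index = ET.Element("sitemapindex", {"xmlns": SITEMAP_NS})
    for loc in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


index_xml = build_sitemap_index([
    "http://www.yoursite.com/sitemap1.xml",  # hypothetical file names
    "http://www.yoursite.com/sitemap2.xml",
])
```

Each group returned by chunk_urls would become its own sitemap file, and the index file is what you would then submit to the search engines.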
When creating an XML sitemap, or any XML file for that matter, certain characters can’t appear as literal text because they’re reserved XML characters. To use one of these characters, such as <, you must escape it with special syntax. Table 2.1, “Characters that must be escaped” shows which characters must be escaped and how to escape them.
Table 2.1. Characters that must be escaped
| Character | Escape code |
|---|---|
| & | &amp; |
| ' (single quote) | &apos; |
| " (double quote) | &quot; |
| < | &lt; |
| > | &gt; |
Most likely, the only character escaping you’ll need to do when creating a sitemap is for a dynamic URL. For example, you may want to include a page that keeps track of a person’s username and ID, so your page URL may look like this:
http://www.yourwebsite.com/index.php?user=someone&id=83736
To separate URL parameters, user and id in this case, you need to use an ampersand (&). To escape an ampersand, use the escape code &amp;. Your sitemap code would then need to look like this:
<loc>http://www.yourwebsite.com/index.php?user=someone&amp;id=83736</loc>
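If you build loc values in a script, the escaping can be automated. This is a small sketch using escape from Python’s standard xml.sax.saxutils module, which handles &, <, and > by default; the two quote characters are passed in explicitly. The function name escape_for_sitemap is a hypothetical helper.

```python
from xml.sax.saxutils import escape

# escape() covers &, <, and > on its own; quotes need to be listed explicitly.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def escape_for_sitemap(url):
    """Escape the five reserved XML characters in a URL for use inside loc."""
    return escape(url, QUOTE_ENTITIES)


print(escape_for_sitemap(
    "http://www.yourwebsite.com/index.php?user=someone&id=83736"))
# prints http://www.yourwebsite.com/index.php?user=someone&amp;id=83736
```

Escape each URL only once. Running the function on text that is already escaped would turn the ampersand in an existing escape code into a second escape code.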
A plain-text sitemap is much simpler than an XML sitemap, but it offers you less control. Using a plain-text sitemap lets you specify one URL on each line.
Following is an example of a plain-text sitemap.
File: sitemap.txt
http://www.yoursite.com/page1.html
http://www.yoursite.com/page2.html
http://www.yoursite.com/page3.html
Your plain-text sitemap should contain only the URLs for the pages on your site. Don’t include header, footer, or any other text in your plain-text sitemap.
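A plain-text sitemap is simple enough to generate with a few lines of code. The sketch below writes one URL per line and rejects anything that isn’t a full http or https URL, which guards against stray headings or notes slipping into the file. The function name build_text_sitemap is a hypothetical helper.

```python
from urllib.parse import urlsplit


def build_text_sitemap(urls):
    """Return plain-text sitemap content: one absolute URL per line."""
    lines = []
    for url in urls:
        url = url.strip()
        parts = urlsplit(url)
        # Anything without a scheme and host is treated as stray text.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError("not an absolute URL: %r" % (url,))
        lines.append(url)
    return "\n".join(lines) + "\n"


text_sitemap = build_text_sitemap([
    "http://www.yoursite.com/page1.html",
    "http://www.yoursite.com/page2.html",
    "http://www.yoursite.com/page3.html",
])
```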
Your sitemap should be in the highest directory level that you want to be indexed. For example, if you want your entire site to be indexed, you’d put your sitemap in your root directory. If your domain were yoursite.com and your sitemap file were called sitemap.xml, the URL to your sitemap would be:
http://www.yoursite.com/sitemap.xml
Note
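Some search engines may look for a sitemap called sitemap.xml at the root directory of your server, but the sitemap standard doesn’t guarantee this. A more dependable way to skip the process of submitting your sitemap is to list its URL in your robots.txt file with a line such as Sitemap: http://www.yoursite.com/sitemap.xml, as described at http://www.sitemaps.org.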
Sometimes you may only want part of your site to be indexed. For
example, you may have several folders on your web server that are
password-protected and one folder for the public to view. In this
case, you’d put your sitemap in the public folder.
Here’s an example of what the URL to your sitemap would be if
you were to call the sitemap file sitemap.xml and the public folder public:
http://www.yoursite.com/public/sitemap.xml
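Because a sitemap’s location determines which pages it can cover, a quick check can confirm that every URL you list sits at or below the sitemap’s own folder. This sketch assumes the location rule described at sitemaps.org (same host, same or deeper directory); the function name in_sitemap_scope is a hypothetical helper.

```python
import posixpath
from urllib.parse import urlsplit


def in_sitemap_scope(sitemap_url, page_url):
    """Return True if page_url is on the same host as the sitemap and
    located in the sitemap's directory or one of its subdirectories."""
    sitemap = urlsplit(sitemap_url)
    page = urlsplit(page_url)
    if (sitemap.scheme, sitemap.netloc) != (page.scheme, page.netloc):
        return False
    # "/public/sitemap.xml" -> "/public/", and "/sitemap.xml" -> "/"
    base_dir = posixpath.dirname(sitemap.path)
    if not base_dir.endswith("/"):
        base_dir += "/"
    return (page.path or "/").startswith(base_dir)


print(in_sitemap_scope("http://www.yoursite.com/public/sitemap.xml",
                       "http://www.yoursite.com/public/page1.html"))  # prints True
```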
For Google, you can log in to your Webmaster Tools account (assuming you’ve created an account) at http://www.google.com/webmasters and give Google the URL to your sitemap. The process is the same for MSN. Log in to Webmaster Tools at http://webmaster.live.com, and submit your website and sitemap URL. To submit your sitemap to Yahoo!, go to https://siteexplorer.search.yahoo.com/submit, submit your site, and submit your sitemap in the Submit Site Feed area.
Once you’ve submitted your sitemap, the search engines will do the rest of the work for you. Submitting a sitemap doesn’t guarantee that spiders will crawl all of your pages, but it still gives you more control over which pages are indexed and how they’re prioritized in relation to other pages on your site.
