O'Reilly Hacks


WEBMASTER HACK

Handling Search Engine Referrals to Framed Pages
HTML pages that belong in framesets are sometimes displayed alone if the visitor is referred to your site by a search engine. If you want search engines to index all of your pages but don't want them orphaned, you can use JavaScript to wrap the appropriate frameset around out-of-context requests.

Contributed by:
Bogart Salzberg
[04/18/03]

Problem: If search engines are allowed to index the HTML pages that compose the frames of your frameset, it is likely that some users will attempt to access your site through direct links to these out-of-context pages. The result is usually bad: the page is orphaned without its navigation frame, and faceless without its banner frame. Layout which relies on frame dimensions suddenly overflows. The typical user's thin tolerance for bad web pages is used up before they read a word.

Solution: By using JavaScript to evaluate the context in which your site's HTML pages are displayed, you can conditionally replace lone pages with the appropriate frameset. The first step is to name the frameset page so it can be identified by other pages:

	...
	<SCRIPT language="JavaScript">
	window.name = 'snowmen_frameset';
	</SCRIPT>
	...

The next step is to add a script reference and function call to the HEAD element of all content pages:

	...
	<SCRIPT language="JavaScript" src="navigate.js"></SCRIPT>
	<SCRIPT language="JavaScript">checkFrames(location.pathname)</SCRIPT>
	...

The checkFrames function is included in navigate.js:

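	// Must match the window.name assigned in the frameset page.
	var frameset_name = 'snowmen_frameset';
	function checkFrames(loc) {
		// Any other top-level window name means the page was loaded out of context.
		if (top.name != frameset_name) {
			// Keep only the file name and hand it to the frameset as a query string.
			loc = 'index.html?' + loc.slice(loc.lastIndexOf('/') + 1);
			location.replace(loc);
		}
	}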

If the name of the top-level window is not the name of the appropriate frameset page, then the current page is replaced with the frameset page. But users may rightly feel misled if they land at the front door of a site after clicking on a deep link. Someone referred by a search engine is probably looking for specific information contained on only one page. Therefore, our checkFrames function adds a reference to the current page, which allows that page to be loaded in the frameset in place of the default page.

(The appropriate frameset model in our example has only one "content frame," with supporting navigation and banner frames. With two or more content frames, additional code would be required in order to establish and track which frame the out-of-context page should be loaded into.)

The page reference is created by parsing the location.pathname property. If the page is deeply nested in the domain (e.g.: /home/sites/winter/snowmen/frosty.html), the lastIndexOf and slice methods return only the file name ("frosty.html"). The file name is then appended to the name of the frameset file, after another critical component: a question mark.

(Again, we're making an assumption about your site: that all the content files are contained in the same directory as the frameset file. Our solution depends on relative URLs.)
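To see the string handling in isolation, here is a minimal sketch of the URL that checkFrames builds. The helper name framesetUrlFor and its framesetFile parameter are ours, for illustration only; they are not part of navigate.js. The logic is meant to mirror the lastIndexOf/slice line in checkFrames and, like it, assumes the content files sit in the same directory as the frameset file.

```javascript
// Hypothetical helper (not in navigate.js): builds the relative URL
// that checkFrames would pass to location.replace.
function framesetUrlFor(pathname, framesetFile) {
	// lastIndexOf('/') + 1 is the position just past the final slash;
	// if there is no slash it is 0, so the whole string is kept.
	var fileName = pathname.slice(pathname.lastIndexOf('/') + 1);
	return framesetFile + '?' + fileName;
}

// 'index.html?frosty.html'
var deepLink = framesetUrlFor('/home/sites/winter/snowmen/frosty.html', 'index.html');

// 'index.html?' (a path ending in a slash has no file name to pass along)
var directoryLink = framesetUrlFor('/home/sites/winter/snowmen/', 'index.html');
```

Because the result is a relative URL, the browser resolves it against the directory of the page that called it, which is why the same-directory assumption matters.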

The frameset then loads with a funny-looking URL in the address bar (e.g.: http://foo.com/home/sites/winter/snowmen/index.html?frosty.html). The reference to our original page is now available to the frameset page through the location.search property. In this case, the property returns the string "?frosty.html" (i.e.: the URL's "query string", often used to submit data to server-side scripts). You could attempt to dynamically set the src property of the content frame so that this page is loaded immediately, but I found it easier to leave the frameset alone and simply pass the reference along to the content frame's default page, which then has the special duty of replacing itself with the originally requested page.
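If you want to check what the search and pathname properties hold for a given address without loading a frameset, one option is the standard URL class found in current browsers and in Node.js, which splits a URL into the same parts. This is only a sketch for experimenting; the hack itself reads these values from location and does not need the URL class.

```javascript
// The "funny-looking URL" from the example above.
var framesetUrl = new URL('http://foo.com/home/sites/winter/snowmen/index.html?frosty.html');

var searchPart = framesetUrl.search;   // '?frosty.html' (question mark included)
var pathPart = framesetUrl.pathname;   // '/home/sites/winter/snowmen/index.html'

// A visit through the "front door" has no query string at all.
var frontDoorUrl = new URL('http://foo.com/home/sites/winter/snowmen/index.html');
var emptySearch = frontDoorUrl.search; // ''
```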

Here is the code for parsing the page reference. These lines would be placed in the HEAD element of the content frame's default page (e.g.: intro.html):

	...
	<SCRIPT language="JavaScript" src="navigate.js"></SCRIPT>
	<SCRIPT language="JavaScript">
	checkFrames(location.pathname);
	var target_page = getPageRef(top.location.search);
	if ((target_page.length > 0) && (target_page.indexOf('intro.html') == -1)) {
		location.replace(target_page);
	}
	</SCRIPT>
	...

And here is the function it calls, included in navigate.js:

	function getPageRef(loc) {
		if (loc.indexOf('?') != -1) {
			loc = loc.slice(1);
		}
		else {loc = "";}
		return loc;
	}

Note that the argument to the getPageRef function refers to the frameset's URL (top.location) rather than the page's URL (location). Thus the reference to the original page is not actually "passed" but remains accessible after the frameset has loaded.
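Since the page reference is read straight from the frameset's URL, anyone can type or link to whatever string they like after the question mark. A cautious variant might check the shape of the reference before handing it to location.replace. The sketch below is our own addition, not part of the original navigate.js; the function name isPlainPageName is hypothetical, and it assumes your content pages are plain ".html" file names in the frameset's directory, made up of letters, digits, hyphens, and underscores.

```javascript
// Hypothetical check (an assumption, not part of navigate.js):
// accept only a simple file name such as "frosty.html".
function isPlainPageName(ref) {
	// No slashes, colons, or dots other than the one before "html",
	// so absolute URLs, "javascript:" URLs, and "../" paths are rejected.
	return /^[A-Za-z0-9_-]+\.html$/.test(ref);
}

var okName = isPlainPageName('frosty.html');                  // true
var offSite = isPlainPageName('http://example.com/x.html');   // false
var scriptUrl = isPlainPageName('javascript:alert(1)');       // false
var parentDir = isPlainPageName('../frosty.html');            // false
```

In the default page you could then add isPlainPageName(target_page) to the conditional that guards location.replace. Adjust the pattern if your file names use other characters or extensions.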

Our parsing of the search string and our handling of the result must account for numerous possibilities:

1. If the search string is "?frosty.html", for example, the function uses the slice method to return the whole string EXCEPT for the first character. (Our argument tells the method to return the substring starting at position 1, which is actually the second character since counting starts at 0.)

2. If the argument does not include a "?", then the user is presumed to have entered our site through the "front door". In this case, an empty string is returned which then fails the conditional test for replacing the page. Returning the empty string avoids the error of trying to evaluate an undefined variable in the conditional. Of course, this could also be accomplished by initializing the variable with a default value of "".

3. We have to handle cases in which the content frame's default page (e.g.: intro.html) IS the originally requested page. Our test for the presence of "intro.html" prevents the page from unnecessarily reloading itself.

4. What happens after all the parsing and handling are done? Well, if our content is any good, the user browses from page to page. Each subsequent page checks its context with the checkFrames function and finds that it is contained within the appropriate frameset. The replacement test fails, resulting in seamless browsing. But what if the user originally requested a page like "frosty.html", landed there as intended, and then browses to the content frame's default page (e.g.: intro.html)? Because of the special code in that page, the URL is parsed and the user is unceremoniously returned to the "frosty.html" page, which they have already seen. (The URL of the frameset, which in this example includes the "?frosty.html" page reference, does not change during browsing unless a script changes it.) One solution to this problem is to hack the link(s) to the default page:

	Instead of...
	<A href="intro.html" target="content_frame">Introduction</A>
	
	We have...
	<A href="javascript:top.location.search='?intro.html'" target="content_frame">Introduction</A>

Setting the search property causes the frameset to reload, which is why we can't simply have the default page clear it after parsing it: doing so would reload the frameset with its default pages.
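The cases above can be traced without a browser by pulling the decision into a function that takes the frameset's search string as an argument. This is only a sketch: getPageRef is restated compactly so the block stands alone, and resolveTarget (with its defaultPage parameter) is a hypothetical name for the test that the default page performs inline.

```javascript
// Same behavior as getPageRef in navigate.js, restated compactly.
function getPageRef(loc) {
	return loc.indexOf('?') != -1 ? loc.slice(1) : '';
}

// Hypothetical helper: returns the page the default page should replace
// itself with, or null if it should stay put.
function resolveTarget(search, defaultPage) {
	var target = getPageRef(search);
	if (target.length > 0 && target.indexOf(defaultPage) == -1) {
		return target;
	}
	return null;
}

// Case 1: deep link passed along by checkFrames.
var deep = resolveTarget('?frosty.html', 'intro.html');     // 'frosty.html'
// Case 2: front door, no query string.
var front = resolveTarget('', 'intro.html');                // null
// Case 3: the default page itself was requested.
var self = resolveTarget('?intro.html', 'intro.html');      // null
// Case 4: the frameset URL still says "?frosty.html" when the user later
// opens intro.html through a plain link, so the user is bounced back...
var bounced = resolveTarget('?frosty.html', 'intro.html');  // 'frosty.html'
// ...whereas the hacked link first sets the search string to "?intro.html".
var hacked = resolveTarget('?intro.html', 'intro.html');    // null
```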

While we're on the subject of frameset URLs, we should take a look at how our site will be bookmarked. Without more code, any "funny-looking URL" passed to a frameset by the checkFrames function will persist while the user browses your site. A user who assumes they are bookmarking the frameset and its default pages will therefore actually be bookmarking the frameset AND its customized page reference. Depending on how you look at it, this could be considered a bug OR a feature. However, once the "default page link hack" described above has come into play, the user will be bookmarking the default frameset anyway.

Another thing to consider with framed sites is that even if only one frame is being replaced during browsing, the others (such as a navigation frame) could also be susceptible to "out-of-context" requests. The easy solution is to paste this script into the HEAD element of these navigation or banner pages:

	...
	<SCRIPT language="JavaScript">
	if (top.name != 'snowmen_frameset') {
		location.replace('index.html');
	}
	</SCRIPT>
	...

The code is from our checkFrames function. Since the pages in question are named in the default frameset anyway, there is no need to pass a page reference. Indeed, if one were passed it would result in the navigation frame, for example, being duplicated in the content frame.

See also: See this hack in action at http://inkfist.com/honeymoon/hh.html


© 2007 O'Reilly Media, Inc.

All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.