Google Hacks, Second Edition: Tips & Tools for Smarter Searching
By Tara Calishain, Rael Dornfest
December 2004
Pages: 479



Table of Contents

Chapter 1: Web
Google's front page is deceptively simple: a search form and a couple of buttons. Yet that basic interface—so alluring in its simplicity—belies the power of the Google engine underneath and the wealth of information at its disposal. If you use Google's search syntax to its fullest, the Web is your oyster.
Searching in Google doesn't have to be a case of just entering what you're looking for in the search box and hoping for the best. Google offers you many ways—via special syntax and search options—to refine your search criteria and help Google better understand what you're looking for. We'll dig into Google's powerful, all-but-undocumented special syntax and search options, and show how to use them to their fullest. We'll cover the basics of Google searching, wildcards, word limits, syntax for special cases, mixing syntax elements, advanced search techniques, and using specialized vocabularies, including slang and jargon.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Hacks 1-20
Google Web Search Basics
Whenever you search for more than one keyword at a time, a search engine has a default strategy for handling and combining those keywords. Can those words appear individually anywhere in a page, or do they have to be right next to each other? Will the engine search for both keywords or for either keyword?
Google defaults to searching for occurrences of your specified keywords anywhere in the page, whether side-by-side or scattered throughout. To return only pages containing the words in a specific order, enclose them in quotes, turning your keyword search into a phrase search, to use Google's terminology.
On entering a search for the keywords:
to be or not to be
Google will find matches where the keywords appear anywhere on the page. If you want Google to find you matches where the keywords appear together as a phrase, surround them with quotes, like this:
"to be or not to be"
Google will return matches only where those words appear together (not to mention explicitly including stop words such as "to" and "or"; see the section "Explicit Inclusion" a little later).
Phrase searches are also useful when you want to find a phrase but aren't quite sure of the exact wording. This is accomplished in combination with wildcards, explained later in the chapter in "Full-Word Wildcards."
Whether an engine searches for all keywords or any of them depends on what is called its Boolean default. Search engines can default to Boolean AND (searching for all keywords) or Boolean OR (searching for any keywords). Of course, even if a search engine defaults to searching for all keywords, you can usually give it a special command to instruct it to search for any keyword. Lacking specific instructions, the engine falls back on its default setting.
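To make the difference between the two defaults concrete, here is a minimal sketch of a toy matcher. It is not how any real search engine works; the function name matches and the sample page text are made up for illustration.

```python
def matches(page_text, keywords, mode="AND"):
    """Toy matcher: does the page satisfy the query under the given Boolean mode?"""
    words = set(page_text.lower().split())
    hits = [kw.lower() in words for kw in keywords]
    # AND requires every keyword to be present; OR requires at least one.
    return all(hits) if mode == "AND" else any(hits)

page = "the moon landing was broadcast live"
print(matches(page, ["moon", "landing"]))             # True
print(matches(page, ["moon", "eclipse"]))             # False
print(matches(page, ["moon", "eclipse"], mode="OR"))  # True
```

An engine that defaults to AND behaves like the first two calls; one that defaults to OR behaves like the third.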
Google's Boolean default is AND, which means that, if you enter query words without modifiers, Google will search for all of your query words. For example, if you search for:
Full-Word Wildcards
Some search engines support a technique called stemming. Stemming is adding a wildcard character—usually * (asterisk) but sometimes ? (question mark)—to part of your query, requesting the search engine return variants of that query using the wildcard as a placeholder for the rest of the word at hand. For example, moon* would find moons, moonlight, moonshot, etc.
Google doesn't support explicit stemming. It didn't use to support stemming at all, but now it implicitly stems for you. So, dietary will yield results for diet, diets, and other variations on the theme.
Google does offer a full-word wildcard. While you can't have a wildcard stand in for part of a word, you can insert a wildcard (Google's wildcard character is *) into a phrase and have the wildcard act as a substitute for one full word. Searching for "three * mice", therefore, finds three blind mice, three blue mice, three green mice, etc.
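The "one * equals one full word" rule can be imitated locally with a regular expression. The sketch below only illustrates the matching rule on plain strings; it says nothing about how Google implements the wildcard, and the helper name wildcard_phrase_to_regex is our own.

```python
import re

def wildcard_phrase_to_regex(phrase):
    """Build a regex in which each * stands for exactly one whole word."""
    parts = [r"\w+" if tok == "*" else re.escape(tok) for tok in phrase.split()]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

pattern = wildcard_phrase_to_regex("three * mice")
print(bool(pattern.search("Three blind mice, see how they run")))  # True
print(bool(pattern.search("three very blind mice")))               # False
```

The second call fails because two words sit between "three" and "mice"; matching that text would take two wildcards.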
What good is the full-word wildcard? It's certainly not as useful as stemming, but then again, it's not as confusing to the beginner. One * is a stand-in for one word; two * signifies two words, and so on. The full-word wildcard comes in handy in the following situations:
  • Avoiding the 10-word limit (see "The 10-Word Limit" next) on Google queries. You'll most frequently run into these examples when you're trying to find song lyrics or a quote. Plugging the phrase Fourscore and seven years ago, our fathers brought forth on this continent into Google will search only as far as the word "on"; everything thereafter is summarily ignored by Google.
  • Checking the frequency of certain phrases and derivatives of phrases, such as: intitle:"methinks the * doth protest too much" and intitle:"the * of Seville" (intitle: is described later in "Special Syntax").
The 10-Word Limit
Unless you're fond of long, detailed queries, you might never have noticed that Google has a hard limit of 10 words—that's keywords and special syntaxes combined—ignoring anything beyond. While this has no real effect on casual Google users, search hounds quickly find that this limit rather cramps their style.
By limiting your query to the more obscure of your keywords or phrase fragments, you'll hone results without squandering precious query words. Let's say you're interested in a phrase from Hamlet: "The lady doth protest too much, methinks." At first blush, you might simply paste the entire phrase into the query field. But that's 7 of your 10 allotted words right there, leaving no room for additional query words or search syntax.
The first thing to do is ditch the first couple of words; "The lady" is just too common a phrase. This leaves the 5 words "doth protest too much, methinks." Neither "methinks" nor "doth" is a word that you might hear every day, so either provides a nice Shakespearean anchor for the phrase. That said, one or the other should suffice, leaving the query at an even 4 words with room to grow:
"protest too much methinks"
or:
"doth protest too much"
Either of these will provide, within the first five results, origins of the phrase and pointers to more information.
Unfortunately, this technique won't do you much good in the case of "Do as I say, not as I do," which doesn't provide much in the way of obscurity. Attempt clarification by adding something like quote origin English usage and you're stepping beyond the 10-word limit. One solution is described next.
Help comes in the form of Google's full-word wildcard, described earlier. It turns out that Google doesn't count wildcards toward the limit.
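As a rough way to see the arithmetic, the sketch below counts query terms the way the text describes: every word counts toward the limit except a bare * wildcard. The limit of 10 is taken from the text as a given, the helper names are hypothetical, and the wildcarded query is our own illustration rather than an example from the book.

```python
WORD_LIMIT = 10  # the limit described in the text, assumed here

def counted_terms(query):
    """Return the terms that count toward the limit (bare * wildcards are skipped)."""
    return [tok for tok in query.replace('"', " ").split() if tok != "*"]

def fits_limit(query):
    return len(counted_terms(query)) <= WORD_LIMIT

plain = '"do as I say not as I do" quote origin English usage'
wild = '"do as * say not * * do" quote origin English usage'

print(len(counted_terms(plain)), fits_limit(plain))  # 12 False
print(len(counted_terms(wild)), fits_limit(wild))    # 9 True
```

Swapping three common words for wildcards brings the count from 12 down to 9 while keeping the shape of the phrase.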
So, when you have more than 10 words, substitute a wildcard for common words, like so:
Special Syntax
In addition to the basic AND, OR, and phrase searches, Google offers some rather extensive special syntax for narrowing your searches.
As a full-text search engine, Google indexes entire web pages instead of just titles and descriptions. Additional commands, called special syntax or advanced operators, let Google users search specific parts of web pages for specific types of information. This comes in handy when you're dealing with more than eight billion web pages and need every opportunity to narrow your search results. Specifying that your query words must appear only in the title or URL of a returned web page is a great way to have your results get very specific without making your keywords themselves too specific. Following are descriptions of the special syntax elements, ordered by common usage and function.
Some of these syntax elements work well in combination. Others fare not quite as well. Still others do not work at all. For detailed discussion on what does and does not mix, see "Mixing Syntax," below.
intitle:
intitle: restricts your search to the titles of web pages. The variation allintitle: finds pages wherein all the words specified appear in the title of the web page. Using allintitle: is basically the same as using the intitle: before each keyword.
intitle:"george bush"
allintitle:"money supply" economics
You may wish to avoid the allintitle: variation, because it doesn't mix well with some of the other syntax elements.
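If you do steer clear of allintitle:, you can get the equivalent query by putting intitle: in front of each term. Here is a small sketch of that rewriting; the helper name with_operator is our own, and it simply builds a query string for you to paste into the search box.

```python
def with_operator(operator, terms):
    """Prefix every term with the operator, quoting multi-word phrases."""
    out = []
    for term in terms:
        quoted = f'"{term}"' if " " in term else term
        out.append(f"{operator}:{quoted}")
    return " ".join(out)

print(with_operator("intitle", ["money supply", "economics"]))
# intitle:"money supply" intitle:economics
```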
intext:
intext: searches only body text (i.e., ignores link text, URLs, and titles). While its uses are limited, it's perfect for finding query words that might be too common in URLs or link titles.
intext:"yahoo.com"
intext:html
There's an
Mixing Syntax
There was a time when you couldn't mix Google's special syntax elements; you were limited to one per query. Even as Google released ever more powerful special syntax elements, not being able to combine them for their composite power stunted many a search.
This has since changed. While there remain some syntax elements that you just can't mix, there are plenty to combine in clever and powerful ways. A thoughtful combination can do wonders to narrow a search.
There are some simple rules to follow when mixing syntax elements. These, for the most part, revolve around how not to mix.
  • Don't mix syntax elements that will cancel each other out, such as:
        site:ucla.edu -inurl:ucla
    Here you're saying you want all results to come from ucla.edu, but that those results should not have the string "ucla" in their URLs. Obviously, that's not going to produce many results.
  • Don't overuse single syntax elements, as in:
        site:com site:edu
    While you might think you're asking for results from either .com or .edu sites, what you're actually saying is that site results should come from both simultaneously. Obviously, a single result can come from only one domain. Take the example perl site:edu site:com. This search will get you exactly zero results. Why? Because a result page cannot come from a .edu domain and a .com domain at the same time. If you want results from .edu and .com domains only, rephrase your search like this:
        perl (site:edu | site:com)
    With the pipe character (|), you're specifying that you want results to come either from the .edu or the .com domain.
  • Don't use allinurl: or allintitle: when mixing syntax. It takes a careful hand not to misuse these in a mixed search. Instead, stick to inurl: or intitle:. If you don't put
Advanced Search
Google's default simple search allows you to do quite a bit, but not everything. Google's Advanced Search page (http://www.google.com/advanced_search?hl=en), shown in Figure 1-1, provides more options, such as date search and filtering, with "fill in the blank" searching options for those who don't take naturally to memorizing special syntax.
Figure 1-1: Google's Advanced Search page
Most of the options presented on this page are self-explanatory, but we'll take a quick look at the kinds of searches that would be more difficult using the single-text-field interface of a simple search.
Because Google uses Boolean AND by default, it's sometimes hard to logically build out the nuances of a particular query. Using the text boxes at the top of the Advanced Search page, you can specify words that must appear—exact phrases, lists of words, at least one of which must appear—and words to be excluded.
Using the Language pull-down menu, you can specify what language all returned pages must be in, from Arabic to Turkish.
The File Format option lets you include or exclude several different file formats, including Microsoft Word and Excel. There are a couple of Adobe formats (most notably PDF) and Rich Text Format as options here, too. This is where the Advanced Search is at its most limited; there are literally dozens of file formats that Google can search for, and this set of options represents only a small subset. To get at the others, use the filetype: special syntax described earlier in "Special Syntax."
Date allows you to specify search results updated in the last three months, six months, or year. This date search is much more limited than the daterange: special syntax, which can give you results as narrow as one day, but Google stands behind the results generated using the Date option on the Advanced Search, while not officially sanctioning the use of the
Quick Links
If you're a Google regular, you've no doubt noticed those snippets of linked information proliferating near the top-left of the first results page (see Figure 1-2). Where once there was only a sponsored link or two between you and your results, now there are spelling suggestions, news headlines, stock quotes, and all other manner of bits and bobs of rather useful information.
Figure 1-2: Quick links augment search results with relevant, current, and local information
Google is going beyond Web search results to include relevant finds from its other properties and those of third parties. Here, briefly, is the current catalog of quick links:
Spelling
One nice side effect of Google's listening to the Web is that it picks up a lot of words along the way. Some appear in the dictionary, while others haven't quite made their way into common parlance. Some are made up, while others are simply misspelled. Query Google for something that is commonly spelled another way, and it'll proffer some suggestions. [Hack #9] delves further into the wonders of Google's spell checker.
Definitions
TLAs (that's "three-letter acronyms") and geek speak abound. Rather than smiling knowingly when you've not a clue what someone just said, ask Google if it knows what your friend, boss, or medical professional is talking about. Prepend just about any word, obscure or garden-variety, with define (e.g., define happy) and the first item on your results page will in all probability be a definition pulled from one of any number of Web dictionaries. Use define: with a colon (e.g., define:osteichthyes) and you'll pull up a whole page full of definitions [Hack #10].
Language Tools
In the early days of the Web, it seemed like most web pages were in English. But as more and more countries have come online, materials have become available in a variety of languages—including languages that don't originate with a particular country (such as Esperanto and Klingon).
Google offers several language tools, including one for translation and one for Google's interface. The interface option is much more extensive than the translation option, but the translation has a lot to offer.
The language tools are available by clicking the "Language Tools" link on the front page or by going to http://www.google.com/language_tools?hl=en.
The first tool allows you to search for materials from a certain country and/or in a certain language. This is an excellent way to narrow your searches; searching for French pages from Japan gives you far fewer results than searching for French pages from France. You can narrow the search further by searching for a slang word in another language. For example, search for the English slang word bonce on French pages from Japan.
The second tool on this page allows you to translate either a block of text or an entire web page from one language to another. Most of the translations are to and from English.
Machine translation is not nearly as good as human translation, so don't rely on this translation as either the basis of a search or as a completely accurate translation of the page you're looking at. Use it instead to give you the gist of whatever it translates.
You don't have to come to this page to use the translation tools. When you enter a search, you'll see that some search results that aren't in your language of choice (which you set via Google's preferences) have "[Translate this page]" next to their titles. Click on one of those and you'll be presented with a framed, translated version of the page. The Google frame, at the top, gives you the option of viewing the original version of the page, as well as returning to the results or viewing a copy suitable for printing.
Anatomy of a Search Result
You'd think a list of search results would be pretty straightforward, wouldn't you—just a page title and a link, possibly a summary? Not so with Google. Google encompasses so many search properties and has so much data at its disposal that it fills every results page to the rafters. Within a typical search result you can find sponsored links, ads, links to stock quotes, page sizes, spelling suggestions, and more.
By knowing more of the nitty-gritty details of what's what in a search result, you'll be able to make some guesses ("Wow, this page that links to my page is very large; perhaps it's a link list") and correct roadblocks ("I can't find my search term on this page; I'll check the version Google has cached").
Let's use the word "flowers" to examine this anatomy. Figure 1-3 shows the result page for flowers.
Figure 1-3: Result page for "flowers"
First, you'll note that at the top of the page is a selection of tabs, allowing you to repeat your search across other Google search categories besides web pages, including Google Groups [Hack #1]. Beneath that you'll see a count for the number of results and how long the search took: about 48,000,000 results in 0.61 seconds (this will vary, sometimes by quite a bit).
Sometimes you'll see results/sites called out on colored backgrounds at the top or right of the results page (see Figure 1-3). These are called sponsored links (read: advertisements). Google has a policy of very clearly distinguishing ads and sticking to text-based advertising only rather than throwing flashing banners in your face like other sites do.
Beneath the sponsored links you sometimes see a category list. You'll see a category list only if you're searching for very general terms and your search consists of only one word. For example, if you searched for pinwheel flowers, Google wouldn't present the flowers category.
Other times you'll see news stories [Chapter 4] related to your query.
Setting Preferences
Google's Preferences page, shown in Figure 1-5, provides a nice, easy way to set and save your searching preferences.
Figure 1-5: Google's Preferences page
You can set your Interface Language, affecting the language in which tips and messages are displayed. Language choices range from Afrikaans to Zulu, with plenty of odd options, including Bork, bork, bork! (the Swedish Chef), Elmer Fudd, and Pig Latin, thrown in for fun.
Not to be confused with Interface Language, Search Language restricts what languages should be considered when searching Google's page index. The default is any language, but you could be interested only in web pages written in Chinese and Japanese, or French, German, and Spanish—the combination is up to you.
Google's SafeSearch filtering affords you a method of avoiding search results that may offend your sensibilities. No filtering means you're offered anything in the Google index. Moderate filtering rules out explicit images, but not explicit language. Strict filtering filters both text and images. The default is moderate filtering.
By default, Google displays 10 results per page. For more results, click any of the "Result Page: 1 2 3..." links at the bottom of each result page, or simply click the "Next" link.
You can specify your preferred number of results per page (10, 20, 30, 50, or 100), along with whether you want results to open in the current window or a new browser window.
For the purpose of research, it's best to have as many search results as possible on the page. Because it's all text, it doesn't take that much longer to load 100 results than it does to load 10. If you have a computer with a decent amount of memory, it's also good to have search results open in a new window; it'll keep you from losing your place and leave you a window with all the search results readily available.
Understanding Google URLs
If you're like most people, you usually pay little attention to the URLs in your browser's address bar as you surf from one site to the next. And you might choose to stick with this habit while searching Google. You ought to know, however, that a subtle alteration made to the URL that Google returns after a search can be an efficient method of tweaking your result set. In fact, there's at least one thing you can do by fiddling with (we like to call it hacking) the URL that you can do no other way, and there are quick tricks that might save you a trip back to the Advanced Search page.
Say you want to search for three blind mice. The URL of the page of results will vary depending on the preferences you've set, but it will look something like this:
http://www.google.com/search?num=100&hl=en&q=%22three+blind+mice%22
The query itself—q=%22three+blind+mice%22, %22 being a URL-encoded " (double quote)—is pretty obvious, but let's break down what those extra bits mean.
The num=100 refers to the number of search results to a page: 100 in this case. Google accepts any number from 1 to 100. Altering the value of num is a nice shortcut to altering the preferred size of your result set without having to meander over to the Advanced Search page and rerun your search.
Don't see the num= in your URL? Simply append it by clicking at the end of the URL in your browser's address bar and typing it in. To set the number of results per page to 20, for instance, you'd add &num=20.
You can add or alter any of the modifiers described here by appending them to the URL or changing their values—the part after the = (equals)—to something within the accepted range for the modifier in question. If you're adding a modifier, you'll need to use an & symbol (ampersand) too. Look at how the modifiers are joined together on URLs for other search results to see how it's done.
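If you'd rather not edit the address bar by hand, the same tweak can be scripted. The sketch below uses Python's standard urllib.parse module to add or change one modifier on the example URL above; the helper name set_param is our own, and it does no range checking, so keeping num between 1 and 100 is up to you.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def set_param(url, name, value):
    """Return the URL with the query modifier `name` added or replaced."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))  # decodes %22 and + in the process
    params[name] = str(value)
    return urlunsplit(parts._replace(query=urlencode(params)))

url = "http://www.google.com/search?num=100&hl=en&q=%22three+blind+mice%22"
print(set_param(url, "num", 20))
# http://www.google.com/search?num=20&hl=en&q=%22three+blind+mice%22
```

Because the query string is decoded and then re-encoded, the quotes and spaces in q come back out as %22 and +, and a modifier that wasn't there before is appended with its own & separator.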
The hl=en means the language interface—the language in which you use Google, reflected in the home page, messages, and buttons—is in English. Google's Language Tools ["Language Tools" earlier in this chapter] page provides a list of language choices. Run your mouse over each language choice and notice the change reflected in the URL; the URL for Pig Latin looks like this:
Browse the Google Directory
Google has a searchable subject index in addition to its eight-billion-page web search.
Google's Web Search indexes over eight billion pages, which means that it isn't suitable for all searches. When you've got a search that you can't narrow down, such as when you're looking for information on a person about whom you know nothing, billions of pages will get very frustrating very quickly.
But you don't have to limit your searches to the Web. Google also has a searchable subject index, the Google Directory, at http://directory.google.com. Rather than indexing the entirety of billions of pages, the directory describes sites, indexing about 1.5 million URLs. This makes it a much better search for general topics.
Does Google spend time building a searchable subject index in addition to a full-text index? No, Google bases its directory on the Open Directory Project data at http://dmoz.org/. The collection of URLs at the Open Directory Project is gathered and maintained by a group of volunteers, but Google does add some of its own Googlish magic to it.
As you can see in Figure 1-6, the front of the site is organized into several topics. To find what you're looking for, you can either do a keyword search, or drill down through the hierarchies of subjects.
Figure 1-6: The Google Directory
Beside most listings, a couple of which are shown in Figure 1-7, you'll see a green bar. The green bar is an approximate indicator of the site's PageRank in the Google search engine. (Not every listing in the Google Directory has a corresponding PageRank in the Google web index.) Web sites are listed in the default order of Google PageRank, but you also have the option to list them in alphabetical order.
Figure 1-7: Individual listings under Science > Math > Mathematicians > Nash, John F., Jr.
Glean a Snapshot of Google in Time
Google Zeitgeist provides a weekly, monthly, and yearly overview of what the Web was interested in.
Turning to Google itself for a definition of zeitgeist (define:zeitgeist), there's consensus that it refers to "the spirit of the times." And Google Zeitgeist (http://www.google.com/press/zeitgeist.html) is just that: a mirror that the Web (according to Google) holds up to us, providing a snapshot of the week, month, or year that was.
A typical weekly Google Zeitgeist (Figure 1-8) lists the top 10 gaining and declining queries and some hand-picked statistics (e.g., top Google News queries, popular sequels), fun facts (e.g., Tour de France versus Wimbledon), aggregate information gleaned about Googlers (e.g., operating systems, web browsers, languages), and any other trends that the Zeitgeist crew cares to delve into.
Figure 1-8: The week's top 10 gaining and declining queries
It takes only a few moments of visiting Google Zeitgeist before you're itching to go back a little further in time: the week your second child was born, the month of the Olympics, the year you graduated from high school. Click the "Archived information available here" link to browse the Google Zeitgeist Archive (Figure 1-9) of updates for every week, month, and year since January 2001.
Weekly Zeitgeist updates actually started in June 2001 at the same time the monthlies switched from PDF to HTML format.
Figure 1-9: The Zeitgeist Archive holds weekly, monthly, and yearly updates from January 2001 to today
The monthlies and year-ends provide more detail with trend graphs and also further break down searching by country, from Korea to Canada and points in between.
While Google Zeitgeist's statistics aren't earth-shattering (e.g., "Searches for 'iraq' more than doubled on March 19, the date that Operation Iraqi Freedom began." Imagine that!), it does provide you a snapshot of what the world in aggregate (55 billion searches in 2003) found interesting enough to look up.
Graph Google Results over Time
Use Google as a trend watcher.
As of November 2004, Google's index contains a whopping eight billion pages and growing. And it doesn't just record pages; it's filled with news and events, commentary and discussion, changes and trends. You might think of Google as a mirror that we hold up to the Web that approximates how we define and represent ourselves and our world.
It should come as no surprise, then, that people spend an awful lot of time and energy watching Google results in an attempt to spot emerging topics and track trends. If you've been tapped to do this for your company, product, project, or service, G-Metrics (http://g-metrics.com) might be right up your alley. G-Metrics measures the occurrence of a keyword or set of keywords defined by you across time—complete with graphs.
Register with G-Metrics (registration requires only your name and email address) for a login key. Once logged in, you can set, alter, remove, or review your queries and the results they've captured. Figure 1-10 shows my current watchlist, each query sporting a result count and percentage change over time.
Figure 1-10: G-Metrics watchlist results
Click a query for a trend graph from the time you added the search, counts for the past seven days, and Google's current top 10 results for that query, as shown in Figure 1-11.
Figure 1-11: G-Metrics trend graphing and details for a particular query
We also show you how to track result counts over time [Hack #3], but G-Metrics takes this further, allowing you to monitor trends without a lot of legwork; your queries are "set it and forget it." You can even subscribe to an RSS feed of the results of any one of your queries. Sure, you could set up Google Alerts [Hack #59], feed the numbers into a spreadsheet, and do the graphing yourself, but why?
Visualize Google Results
The TouchGraph Google Browser is the perfect Google complement for those who appreciate visual displays of information.
Some people are born text crawlers. They can retrieve the mostly text resources of the Internet and browse them happily for hours. But others are more visually oriented and find that the flat text results of the Internet leave something to be desired, especially when it comes to search results.
If you're the type who appreciates visual displays of information, you're bound to like the TouchGraph Google Browser (http://www.touchgraph.com/TGGoogleBrowser.html). This Java applet allows you to start with the pages that are similar to one URL, and then expand outward to pages that are similar to the first set of pages, on and on, until you have a giant map of nodes (a.k.a. URLs) on your screen.
The TouchGraph Google Browser was created by Alex Shapiro (http://www.touchgraph.com/).
Note that what you're finding here are URLs that are similar to another URL. You aren't doing a keyword search, and you're not using the link: syntax. You're searching by Google's measure of similarity.
Start your journey by entering a URL on the TouchGraph home page and clicking the "Graph It" link. Your browser will launch the TouchGraph Java applet, covering your window with a large mass of linked nodes, as shown in Figure 1-12.
Figure 1-12: Mass of linked nodes generated by TouchGraph
You'll need a web browser capable of running Java applets. If Java support in your preferred browser comes in the form of a plug-in, your browser should have the smarts to launch a plug-in locator/downloader and walk you through the installation process.
If you're easily entertained like me, you might amuse yourself for a while just by clicking and dragging the nodes around. But there's more to do than that.
Additional content appearing in this section has been removed.
Check Your Spelling
Google sometimes takes the liberty of "correcting" what it perceives as a spelling error in your query.
If you've ever used other Internet search engines, you'll have experienced what I call stupid spellcheck. That's when you enter a proper noun and the search engine suggests a completely ludicrous query ("Elvish Parsley" for "Elvis Presley"). Google's quite a bit smarter than that.
When Google thinks it can spell individual words or complete phrases in your search query better than you can, it'll suggest a "better" search, hyperlinking it directly to a query. For example, if you search for hydrecefallus, Google will ask if you meant hydrocephalus, as shown in Figure 1-16.
Figure 1-16: Google offers spelling suggestions when it thinks it knows better
Suggestions aside, Google will assume that you know of what you speak and return your requested results, provided that your query gleaned results.
If your query found no results for the spellings you provided and Google believes it knows better, it will automatically run a new search of its own suggestions. Thus, a search for hydrecefallus finding (hopefully) no results will spark a Google-initiated search for hydrocephalus.
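The behavior described in the last two paragraphs boils down to a small decision rule: keep the results for the spelling you typed if there are any, and fall back to the suggestion only when there are none. Here is a toy model of that rule in Python. The `search` and `suggest` callables and the tiny in-memory index are hypothetical stand-ins, not Google's implementation or any Google API.

```python
from typing import Callable, Dict, List, Optional, Tuple


def search_with_fallback(
    query: str,
    search: Callable[[str], List[str]],
    suggest: Callable[[str], Optional[str]],
) -> Tuple[str, List[str]]:
    """Return the query that was actually run, along with its results."""
    results = search(query)
    if results:
        # The original spelling found something, so keep those results.
        return query, results
    suggestion = suggest(query)
    if suggestion is None:
        # No results and no better idea: nothing more to do.
        return query, results
    # No results, but there is a suggestion: search for it instead.
    return suggestion, search(suggestion)


# Toy stand-ins for a search index and a spelling suggester.
toy_index: Dict[str, List[str]] = {"hydrocephalus": ["page-a", "page-b"]}
toy_suggestions: Dict[str, str] = {"hydrecefallus": "hydrocephalus"}

ran, hits = search_with_fallback(
    "hydrecefallus",
    lambda q: toy_index.get(q, []),
    toy_suggestions.get,
)
print(ran, hits)
```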
Given the sheer number of pages on the Web and the odds that at least one of the people proffering a page on the subject you're after either can't spell or can't type, I don't see these automatically generated searches based on Google's suggestions coming up that often these days.
For instance, because two web pages cite this hack as it first appeared in the previous edition of this title, the hydrecefallus example is blown. And I couldn't find another misspelling that both came up short on results and for which Google had any suggestions.
On the other hand—at least for now—a search for spodding oil texas turns up only 4 results, while the same search with correct spelling ("spudding"),
Additional content appearing in this section has been removed.
Google Phonebook: Let Google's Fingers Do the Walking
Google makes an excellent phonebook, even to the extent of doing reverse lookups.
Google combines residential and business phone number information and its own excellent interface to offer a phonebook lookup that provides listings for businesses and residences in the United States. However, the search offers three different syntaxes, different levels of information provide different results, the syntaxes are finicky, and Google doesn't provide documentation.
Google offers three ways to search its phonebook:
phonebook
Searches the entire Google phonebook
rphonebook
Searches residential listings only
bphonebook
Searches business listings only
The result page for phonebook: lookups lists only five results, residential and business combined. The more specific rphonebook: and bphonebook: searches provide up to 30 results per page. For a better chance of finding what you're looking for, use the appropriate targeted lookup.
Using a standard phonebook requires knowing quite a bit of information about what you're looking for: first name, last name, city, and state. Google's phonebook requires no more than last name and state to get it started. Casting a wide net for all the Smiths in California is as simple as:
phonebook:smith ca
Try giving 411 a whirl with that request! Figure 1-17 shows the results of the query.
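If you run phonebook lookups often, it may be handy to assemble the query string programmatically. The sketch below, in Python, builds a query from one of the three syntaxes listed earlier and turns it into a search URL. The helper names are hypothetical, and the `http://www.google.com/search?q=` URL form is an assumption about the standard Google search address rather than something taken from this hack.

```python
from urllib.parse import urlencode

# The three phonebook syntaxes described in this hack.
PHONEBOOK_OPERATORS = ("phonebook", "rphonebook", "bphonebook")


def phonebook_query(operator: str, last_name: str, state: str) -> str:
    """Build a query such as 'phonebook:smith ca'."""
    if operator not in PHONEBOOK_OPERATORS:
        raise ValueError(f"unknown phonebook operator: {operator!r}")
    return f"{operator}:{last_name} {state}"


def search_url(query: str) -> str:
    """URL-encode the query for use in a search address (assumed form)."""
    return "http://www.google.com/search?" + urlencode({"q": query})


query = phonebook_query("phonebook", "smith", "ca")
print(query)
print(search_url(query))
```

Rejecting unknown operators up front is a small guard against the finicky syntax: a typo such as `phonebok` fails loudly instead of quietly turning into an ordinary keyword search.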
Additional content appearing in this section has been removed.
Think Global, Google Local
Take web searching to the streets—your street, in fact. Google Local narrows down all those zillions of results to those within range of a particular city, state, or postal code.
While the Web and Google have taught us to think global when it comes to looking for information, web searches often fail in the simple task of finding things in our own backyards. Sure, the island of Celebes is the home to Sulawesi Kalossi, but where can I find the finest cup of Sulawesi coffee within walking distance? And even more importantly: do they have free wireless Internet access?
That's not to say that Google isn't paying attention to any mention of locale in your queries. If you were, let's say, to search for scooters san francisco, you would notice a set of local San Francisco finds ["Quick Links" earlier in this chapter] at the top of the results page. As you can see in Figure 1-18, Google also provides addresses, phone numbers, and mileage (from the center of San Francisco, presumably).
Figure 1-18: Local finds sometimes appear as "magic links" at the top of the results page
Google combines its index with data gleaned from the Yellow Pages to zero in on local results that very often prove interesting and useful.
This data is so interesting, in fact, that Google has taken the service beyond that sprinkling of magic links, launching Google Local (http://local.google.com), a location-aware frontend to the Google search engine. The Google Local home page (Figure 1-19) looks very much like what you're used to from Google, the only real difference being that there are two search boxes instead of just the one: What and Where. In the What box, you type your search query as usual. In the Where box, you can localize your search by providing a city (by itself, if the city is unambiguously well-known—e.g., San Francisco or New York, not Rome or Concord) and a state name or Zip Code.
Additional content appearing in this section has been removed.
Track Stocks
A well-crafted Google query will usually net you company information beyond that provided by traditional stock services.
Among the pantheon of lesser-known Google syntaxes is stocks:. Searching for stocks:symbol, where symbol represents the stock you're looking for, will redirect you to Yahoo! Finance (http://finance.yahoo.com/) for details. The Yahoo! page is actually framed by Google; off to the top-left is the Google logo, along with links to Quicken, Fool.com, MSN MoneyCentral, and other financial sites.
Feed Google a bum stocks: query and you'll still find yourself at Yahoo! Finance, usually staring at a quote for a stock that you've never heard of or a "Stock Not Found" page. Of course, you can use this to your advantage. Enter stocks: followed by the name of a company you're looking for (e.g., stocks:friendly). If the company's name is more than one word, choose the most distinctive word. Run your query and you'll arrive at the Yahoo! Finance stock lookup page shown in Figure 1-22.
Figure 1-22: Yahoo! Finance stock lookup page
Notice the Look Up button; click it and you'll be offered a list of companies that match "friendly" in some way. From there you can get the stock information that you want (assuming the company you wanted is on the list).
Google isn't particularly set up for basic stock research. You'll have to do your initial groundwork elsewhere, returning to Google armed with a better understanding of what you're looking for. I recommend going straight to Yahoo! Finance (http://finance.yahoo.com) to quickly look up stocks by symbol or company name. There you'll find all the basics: quotes, company profiles, charts, and recent news. For more in-depth coverage, I heartily recommend Hoovers (http://www.hoovers.com). Some of the information is free. For more depth, you'll have to pay a subscription fee.
Additional content appearing in this section has been removed.
Consult the Dictionary
Google, in addition to its own spellchecking index, provides hooks into Dictionary.com.
Google's spellchecking [Hack #5] is built on its own word and phrase database, gleaned while indexing web pages. Thus, it provides suggestions for lesser-known proper names, phrases, common sentence constructs, etc. Google also offers a definition service powered by Dictionary.com (http://www.dictionary.com). Such definitions, coming from a credible source and augmented by various specialty indexes, can be more limited.
Run a search. On the results page, you'll notice the phrase "Searched the web for [query words]." If the query words would appear in a dictionary, they will be hyperlinked to a dictionary definition. Identified phrases will be linked as a phrase; for example, the query "jolly roger" will allow you to look up the phrase "jolly roger." On the other hand, the phrase "computer legal" will allow you to look up the separate words "computer" and "legal."
The definition search will sometimes fail on obscure words, very new words, slang, and technical vocabularies (otherwise known as jargon). If you search for a word's meaning and Google can't help you, try enlisting the services of a meta-search dictionary, like OneLook (http://www.onelook.com/), which indexes over six million words from over 1,000 dictionaries. If that doesn't work, try Google again with one of the following tricks, queryword being the word you want to find:
  • If you're searching for several words—you're reading a technical manual, for example—search for them at the same time. Sometimes you'll find a glossary this way. For example, maybe you're reading a book about marketing, and you don't know many of the words. If you search for storyboard stet SAU, you'll get only a few search results, and they'll all be glossaries.
  • Try searching for your word and the word glossary, say, stet glossary. Be sure to use an unusual word; you may not know what a "spread" is in the context of marketing but searching for
Additional content appearing in this section has been removed.
Look Up Definitions
Do you find yourself smiling knowingly when your boss mentions that well-known business principle that you've never heard of? Overwhelmed with "geek speak"? Chances are Google's heard it mentioned—and possibly even defined—somewhere before.
Most specialized vocabularies remain, for the most part, fairly static; words don't suddenly change their meaning all that often. Not so with technical and computer-related jargon. It seems like every 12 seconds someone comes up with a new buzzword or term relating to computers or the Internet, and then 12 minutes later it becomes obsolete or means something completely different—often more than one thing at a time. Maybe it's not that bad. It just feels that way.
Google can help you in two ways, by helping you look up words and by helping you figure out what words you don't know but need to know.
Before you assume you're going to be in for a lot of Googling, try the define search syntax mentioned in the "Quick Links" section earlier in this chapter. Simply prepend the definition you're after with the special syntax keyword define, like so:
define google
define julienne
define 42
Google tells us that these are defined as "most important spidering search engine," "cut a vegetable into long thin matchsticks," and "being two more than forty," thanks to Juice New Media's Search Engine Glossary, The Youth Online Club, and WordNet at Princeton, respectively.
Click the associated "Definition in context" link to visit the page from which the definition was drawn.
Click the "Web definitions for..." link or prefix the word you're defining with define: (note the addition of a colon) in the first place, and you'll net a full page of definitions drawn from all manner of places. For instance, define:TLA turns up oodles of definitions (all about the same, mind you), as shown in Figure 1-24.
Additional content appearing in this section has been removed.
Search Article Archives
Google serves as a handy searchable archive for back issues of online publications.
Not all sites have their own search engines, and even the ones that do are sometimes difficult to use. Complicated or incomplete search engines are more pain than gain when attempting to search through archives of published articles. If you follow a couple of rules, Google is handy for finding back issues of published resources.
The trick is to use a common phrase to find the information you're looking for. Let's use The New York Times as an example.
Your first intuition when searching for previously published articles from NYTimes.com might be to simply use site:nytimes.com in your Google query. For example, if I wanted to find articles on George Bush, why not use:
"george bush" site:nytimes.com
This will indeed find you all articles mentioning George Bush published on NYTimes.com. What it won't find is all the articles produced by The New York Times but republished elsewhere.
While doing research, keep credibility firmly in mind. If you're doing casual research, maybe you don't need to double-check a story to make sure that it actually comes from The New York Times, but if you're researching a term paper, double-check the veracity of every article you find that isn't actually on The New York Times site.
What you actually want is a clear identifier, no matter the site of origin, that an article comes from The New York Times. Copyright disclaimers are perfect for the job. A New York Times copyright notice typically reads:
Copyright 2004 The New York Times Company
Of course, this would only find articles from 2004. A simple workaround is to replace the year with a Google full-word wildcard ["Full-Word Wildcards" earlier in this chapter]:
Copyright * The New York Times Company
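To see why the wildcard version matches a notice from any year, it can help to model the idea locally. The Python sketch below treats each * in a phrase as a stand-in for exactly one word and checks sample notices against it. This is only an approximation for illustration; the function name is hypothetical, and Google's own matching is not documented here.

```python
import re


def phrase_to_regex(phrase: str) -> re.Pattern:
    """Compile a phrase in which each * stands in for exactly one word."""
    parts = []
    for token in phrase.split():
        # A * matches any single run of non-space characters;
        # every other token must match literally.
        parts.append(r"\S+" if token == "*" else re.escape(token))
    return re.compile(r"\s+".join(parts), re.IGNORECASE)


pattern = phrase_to_regex("Copyright * The New York Times Company")

notices = [
    "Copyright 2004 The New York Times Company",
    "Copyright 1999 The New York Times Company",
    "Copyright The New York Times Company",
]
for notice in notices:
    print(notice, "->", bool(pattern.search(notice)))
```

The first two notices match because the year fills the wildcard slot; the third does not, because in this model the wildcard needs a word of its own.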
Let's try that George Bush search again, this time using the snippet of copyright disclaimer instead of the
Additional content appearing in this section has been removed.
Find Directories of Information
Use Google to find directories, link lists, and other collections of information.
Sometimes you're more interested in large information collections than scouring for specific bits and bobs. Using Google, there are a couple of different ways of finding directories, link lists, and other information collections. The first way makes use of Google's full-word wildcards ["Full-Word Wildcards" earlier in this chapter] and the intitle: syntax ["Special Syntax" earlier in this chapter]. The second is judicious use of particular keywords.
Pick something you'd like to find collections of information about. We'll use "trees" as our example. The first thing we'll look for is any page with the words "directory" and "trees" in its title. In fact, we'll build in a little buffering for words that might appear between the two using a couple of full-word wildcards (* characters). The resultant query looks something like this:
intitle:"directory * * trees"
This query will find "directories of evergreen trees," "South African trees," and of course "directories containing simply trees."
What if you wanted to take things up a notch, taxonomically speaking, and find directories of botanical information? You'd use a combination of intitle: and keywords, like so:
botany intitle:"directory of"
And you'd get almost 1,000 results. Changing the tenor of the information might be a matter of restricting results to those coming from academic institutions. Appending an edu site specification brings you to:
botany intitle:"directory of" site:edu
This gets you around 150 results, a mixture of resource directories and, unsurprisingly, directories of university professors.
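The queries so far follow a regular pattern: an optional keyword, an intitle: phrase (sometimes padded with full-word wildcards), and an optional site: restriction. As a rough sketch, assuming you want to generate such queries for many topics at once, the pattern can be captured in two small Python helpers. Both function names are hypothetical.

```python
from typing import Optional


def buffered_title_query(first: str, last: str, gap: int = 2) -> str:
    """Build an intitle: phrase with `gap` full-word wildcards in between."""
    words = [first] + ["*"] * gap + [last]
    return 'intitle:"' + " ".join(words) + '"'


def directory_query(topic: str, title_phrase: str = "directory of",
                    site: Optional[str] = None) -> str:
    """Combine a keyword, an intitle: phrase, and an optional site: filter."""
    parts = [topic, f'intitle:"{title_phrase}"']
    if site:
        parts.append(f"site:{site}")
    return " ".join(parts)


print(buffered_title_query("directory", "trees"))
print(directory_query("botany"))
print(directory_query("botany", site="edu"))
```

These three calls reproduce the three queries shown above, which makes it easy to swap in another topic or title phrase.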
Mixing these syntaxes works rather well when you're searching for something that might also be an offline print resource. For example:
cars intitle:"encyclopedia of"
This query pulls in results from Amazon.com and other sites selling car encyclopedias. Filter out some of the more obvious book finds by tweaking the query slightly:
Additional content appearing in this section has been removed.
Seek Out Weblog Commentary
Build queries to search only recent commentary appearing in weblogs.
There was a time when, if you needed to find current commentary, you didn't turn to a full-text search engine like Google. You searched Usenet, combed mailing lists, or searched through current news sites like CNN.com and hoped for the best.
But as search engines have evolved, they've been able to index pages more quickly than once every few weeks. In fact, Google tunes its engine to more readily index sites with a high information churn rate. At the same time, a phenomenon called the weblog (http://www.oreilly.com/catalog/essblogging/) has arisen: an online site that keeps a running commentary and associated links, updated daily and, in many cases, even more often. Google indexes many of these sites on an accelerated schedule. If you know how to find them, you can build a query that searches just these sites for recent commentary.
When weblogs first appeared on the Internet, they were generally updated manually or by using homemade programs. Thus, there were no standard words you could add to a search engine to find them. Now, however, many weblogs are created using either specialized software packages (such as Movable Type, http://www.movabletype.org, or Radio Userland, http://radio.userland.com) or as web services (such as Blogger, http://www.blogger.com/). These programs and services are more easily found online with some clever use of special syntaxes or magic words.
For hosted weblogs, the site: syntax makes things easy. Blogger weblogs hosted at blog*spot (http://www.blogspot.com) can be found using site:blogspot.com. Even though Radio Userland is a software program able to post its weblogs to any web server, you can find the majority of Radio Userland weblogs at the Radio Userland community server (http://radio.weblogs.com) using site:radio.weblogs.com.
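Because each hosted service lives on its own domain, one set of keywords fans out naturally into one query per host. A minimal Python sketch, assuming just the two hosts named above (the mapping and the function name are hypothetical conveniences):

```python
from typing import Dict

# The two hosted-weblog domains mentioned in this hack.
HOSTED_WEBLOG_SITES: Dict[str, str] = {
    "Blogger (blog*spot)": "blogspot.com",
    "Radio Userland community server": "radio.weblogs.com",
}


def weblog_queries(keywords: str) -> Dict[str, str]:
    """Return one site:-restricted query per hosted weblog service."""
    return {
        host: f"{keywords} site:{domain}"
        for host, domain in HOSTED_WEBLOG_SITES.items()
    }


for host, query in weblog_queries('"google hacks"').items():
    print(f"{host}: {query}")
```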
Finding weblogs powered by weblog software and hosted elsewhere is more problematic; Movable Type weblogs, for example, can be found all over the Internet. However, most of them sport a "powered by movable type" link of some sort; searching for the phrase "
Additional content appearing in this section has been removed.
Cover Your Bases