Assemble Advanced Search Queries

By understanding how Yahoo! Advanced Search URLs are structured, you can create your own Advanced Search queries on the fly.

In addition to the simple search form you’ll find at http://search.yahoo.com, Yahoo! offers an Advanced Web Search form at http://search.yahoo.com/web/advanced. This form lets you refine your search in a number of ways, so you can narrow the results to a more useful list.

For example, if you’d like to find information about a generic topic, such as astronomy, you could go to Yahoo!, type astronomy into the search form, and find hundreds of sites related to the word. But if you want only a segment of those results, you can browse over to the Advanced Web Search form, type astronomy, and limit the results by top-level domain, as shown in Figure 1-9.

Figure 1-9. Yahoo! Advanced Search form

A search for astronomy across .gov sites returns only pages at NASA’s web site. The same search limited to .edu sites returns astronomy programs at various universities, and limiting it to .com puts astronomy magazines at the top of the results.

You can further refine your search by limiting it to a specific file format, such as PDF files, Excel spreadsheets, or XML files. For any given search, you can also override your global preferences settings for language, number of results, and adult content filtering.

Anatomy of an Advanced Search URL

To get started with hacking URLs, type a term into the Advanced Web Search form and click the Yahoo! Search button, which will take you to the results page. Once there, note the insanely long URL in your browser’s address bar. It will look something like this:

	http://search.yahoo.com/search?_adv_prop=web&x=op&ei=UTF-8&va=astronomy&va_vt=any&vp_vt=any&vo_vt=any&ve_vt=any&vd=all&vst=.gov&vs=.gov&vf=all&vm=p&fl=0&n=20

For any given search URL, some of the variables are redundant or unnecessary. The web form acts as a URL-building tool that has assembled this URL for you, and it isn’t picky about which variables it includes. By understanding the pieces of the URL, you can construct your own queries using shorter URLs without the form.
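One way to see the pieces is to split the query string programmatically. The following is a minimal Python sketch that uses only the standard library; the helper name split_search_url is made up for illustration, and the URL is the example shown above joined into one string.

```python
from urllib.parse import urlsplit, parse_qsl

# The example Advanced Search URL from above, joined into a single string.
LONG_URL = (
    "http://search.yahoo.com/search?_adv_prop=web&x=op&ei=UTF-8&va=astronomy"
    "&va_vt=any&vp_vt=any&vo_vt=any&ve_vt=any&vd=all&vst=.gov&vs=.gov"
    "&vf=all&vm=p&fl=0&n=20"
)


def split_search_url(url):
    """Return the URL path and an ordered list of (variable, value) pairs.

    Hypothetical helper for illustration; it only parses text and never
    contacts Yahoo!.
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    return parts.path, pairs


path, pairs = split_search_url(LONG_URL)
print(path)
for name, value in pairs:
    print(f"{name:10} = {value}")
```

Printing the pairs one per line makes it easier to spot which variables carry your search term and which are just defaults the form filled in.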

Note that the domain is followed by /search?, followed by a series of variable/value pairs separated by ampersands. Not all of these variables will affect the search results, but there are some that are useful to play with. The variables are a bit cryptic (to keep the URLs as short as possible), so here’s a list of the relevant variables and what they represent.

The v* variables represent the way you’d like Yahoo! to handle the phrase. You can choose from the following variables:

Table 1-2. 

va

Use this variable when you’re looking for all of the words in a particular query. A query with the value astronomy magazine finds pages that contain both astronomy and magazine.

vp

This variable holds the search query when you want to match a specific phrase, so a query with the value astronomy magazine finds pages that contain the exact phrase astronomy magazine.

vo

This variable indicates a search for any of the words in a particular query. So a query with the value astronomy magazine returns documents that contain either astronomy or magazine.

ve

This variable indicates words that should not appear in any of the pages, and it must be used with one of the other variables. For example, combining one of the above queries with ve=NASA allows you to search for astronomy magazine on pages that don’t include the term NASA.
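The four query variables can be combined in a small URL builder. This is a sketch, not a Yahoo! API: build_search_url and its argument names are hypothetical, the endpoint is copied from the example URL earlier in this hack, and nothing here checks whether the service still honors these variables.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL earlier in this hack (an assumption
# that it still applies; this sketch never sends a request).
BASE = "http://search.yahoo.com/search"


def build_search_url(all_words=None, phrase=None, any_words=None, exclude=None):
    """Assemble a search URL from va, vp, vo, and ve (hypothetical helper)."""
    params = {}
    if all_words:
        params["va"] = all_words   # all of the words
    if phrase:
        params["vp"] = phrase      # the exact phrase
    if any_words:
        params["vo"] = any_words   # any of the words
    if not params:
        # ve is only meaningful next to va, vp, or vo, so require one of them.
        raise ValueError("supply at least one of all_words, phrase, any_words")
    if exclude:
        params["ve"] = exclude     # words that must not appear
    # urlencode turns spaces into '+', matching the URLs in this hack.
    return BASE + "?" + urlencode(params)


print(build_search_url(phrase="astronomy magazine", exclude="NASA"))
# prints http://search.yahoo.com/search?vp=astronomy+magazine&ve=NASA
```

Raising an error when only exclude is given mirrors the rule that ve must be used with one of the other variables.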

Another group of similarly patterned variables lets you limit searching to a specific part of a document, such as the title or URL. The format for these variables is v*_vt, where the asterisk is replaced by the type of primary search query. The possible values are any, title, or url. For example, if you’d like to search for pages that have the exact phrase astronomy magazine in the title, use the vp and vp_vt variables together, like so:

	search?vp=astronomy+magazine&vp_vt=title
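The pairing of a query variable with its matching _vt variable can be generated mechanically. The sketch below assumes only the three primary query variables and the three location values described above; the function name search_in is hypothetical.

```python
from urllib.parse import urlencode

QUERY_VARS = ("va", "vp", "vo")        # primary query variables from the list above
LOCATIONS = ("any", "title", "url")    # documented values for v*_vt


def search_in(query_var, words, location="any"):
    """Pair a query variable with its v*_vt companion (hypothetical helper)."""
    if query_var not in QUERY_VARS:
        raise ValueError(f"unknown query variable: {query_var}")
    if location not in LOCATIONS:
        raise ValueError(f"unknown location: {location}")
    # The companion variable is always the query variable plus "_vt".
    return "search?" + urlencode({query_var: words, f"{query_var}_vt": location})


print(search_in("vp", "astronomy magazine", "title"))
# prints search?vp=astronomy+magazine&vp_vt=title
```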

If you’d like to limit your results to pages that have been updated recently, you can use the vd variable. You can get all results, which is the default, or limit them to pages updated within the last three months, six months, or year. The respective values for these are all, m3, m6, or y. So finding all documents that contain the phrase astronomy magazine that have been updated within the last three months looks like this:

	search?vp=astronomy+magazine&vp_vt=any&vd=m3
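If you think in days rather than in Yahoo!’s fixed windows, a small lookup can pick the narrowest vd value that still covers the period you want. This sketch approximates three months, six months, and a year as 90, 180, and 365 days, which is an assumption made here and not something Yahoo! documents; vd_for_days is a hypothetical name.

```python
from urllib.parse import urlencode


def vd_for_days(days):
    """Pick the narrowest vd window covering the last `days` days.

    The day counts are rough approximations of the three-month, six-month,
    and one-year windows (an assumption of this sketch).
    """
    if days <= 90:
        return "m3"
    if days <= 180:
        return "m6"
    if days <= 365:
        return "y"
    return "all"  # anything older falls back to the default


params = {"vp": "astronomy magazine", "vp_vt": "any", "vd": vd_for_days(60)}
print("search?" + urlencode(params))
# prints search?vp=astronomy+magazine&vp_vt=any&vd=m3
```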

The vs variable is useful for limiting searches to a top-level domain, such as .com. In addition to top-level searches, you can narrow things to a specific web site. If you want to find every mention of astronomy magazine at the specific web site http://www.cnn.com, you could use the variable like this:

	search?vp=astronomy+magazine&vp_vt=any&vs=cnn.com
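When you start from a full address such as http://www.cnn.com, the value for vs has to be reduced to the bare domain. The following sketch does that with the standard library; stripping a leading www. is a convenience assumed here to match the cnn.com example, and vs_value is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlencode


def vs_value(site):
    """Turn a top-level domain, bare host, or full URL into a vs value."""
    if site.startswith("."):
        return site  # already a top-level domain such as ".gov"
    # urlsplit only recognizes a host after "//", so add it for bare hosts.
    host = urlsplit(site if "//" in site else "//" + site).hostname
    if not host:
        raise ValueError(f"cannot find a host name in {site!r}")
    if host.startswith("www."):
        host = host[4:]  # assumption: drop "www." as in the cnn.com example
    return host


params = {"vp": "astronomy magazine", "vp_vt": "any",
          "vs": vs_value("http://www.cnn.com")}
print("search?" + urlencode(params))
# prints search?vp=astronomy+magazine&vp_vt=any&vs=cnn.com
```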

The vf variable limits searches to a specific file type. Yahoo! supports a set number of file types, and here are the current values you can use with this variable:

all

The default value; returns any type of document

html

HTML documents

pdf

Adobe PDF files

xl

Microsoft Excel spreadsheets (note that this value is an abbreviation for the full file extension, .xls)

ppt

Microsoft PowerPoint presentations

msword

Microsoft Word files

rss

Files formatted for syndication across web sites

text

Plain text files, which typically end with .txt

To continue with the example, say you want to find the phrase astronomy magazine in only PowerPoint presentations. Append the vf variable, like so:

	search?vp=astronomy+magazine&vp_vt=any&vf=ppt
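Because the vf values are not always the same as the file extension (xl rather than .xls, for instance), a lookup table can save some guessing. In this sketch the mapping from extensions to vf values is partly inferred: .xls and .txt come from the list above, while .doc for msword and .html for html are assumptions. rss is left out because feeds have no single extension, and vf_value is a hypothetical name.

```python
from urllib.parse import urlencode

# Extension-to-vf lookup. The .doc and .html entries are assumptions;
# the list above only spells out .xls and .txt.
VF_BY_EXTENSION = {
    ".html": "html",
    ".pdf": "pdf",
    ".xls": "xl",
    ".ppt": "ppt",
    ".doc": "msword",
    ".txt": "text",
}


def vf_value(extension):
    """Return the vf value for a file extension, or "all" if unknown."""
    ext = extension.lower()
    if not ext.startswith("."):
        ext = "." + ext
    return VF_BY_EXTENSION.get(ext, "all")  # "all" is the documented default


params = {"vp": "astronomy magazine", "vp_vt": "any", "vf": vf_value("ppt")}
print("search?" + urlencode(params))
# prints search?vp=astronomy+magazine&vp_vt=any&vf=ppt
```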

The number of results is controlled by the n variable, which can be set only to some predetermined values: 10, 15, 20, 30, 40, or 100. To return the first 40 results for the phrase astronomy magazine, add the n variable, like so:

	search?vp=astronomy+magazine&vp_vt=any&n=40
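Since n accepts only a fixed set of values, a script that takes an arbitrary count needs to snap it to one of them. The sketch below rounds up to the next allowed value and caps at 100; rounding up (rather than to the nearest value) is a design choice made here, and snap_n is a hypothetical helper.

```python
from bisect import bisect_left

ALLOWED_N = (10, 15, 20, 30, 40, 100)  # the predetermined values listed above


def snap_n(requested):
    """Return the smallest allowed n that is >= requested, capped at 100."""
    i = bisect_left(ALLOWED_N, requested)
    return ALLOWED_N[min(i, len(ALLOWED_N) - 1)]


for wanted in (1, 25, 40, 500):
    print(wanted, "->", snap_n(wanted))
```

Rounding up means you never get fewer results than you asked for, unless you ask for more than 100.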

There are other variables in advanced search URLs, but these are a few that will affect the content of search results. Now that you know why the initial Advanced Web Search URL was so long, you can use some of the variables to create your own advanced Yahoo! searches on the fly.
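To see how much of the original URL is padding, you could filter it down to the variables covered in this hack and drop any that are set to a default. The sketch below makes two assumptions that are not confirmed here: that variables not discussed above can be omitted safely, and that any is the default for the v*_vt variables (all is stated as the default for vd and vf). The shortened URL has not been checked against live results, and shorten is a hypothetical name.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

# Variables discussed in this hack; everything else is dropped (assumption).
DOCUMENTED = {"va", "vp", "vo", "ve", "va_vt", "vp_vt", "vo_vt", "ve_vt",
              "vd", "vs", "vf", "n"}
# "all" is the stated default for vd and vf; "any" for v*_vt is assumed.
ASSUMED_DEFAULTS = {"va_vt": "any", "vp_vt": "any", "vo_vt": "any",
                    "ve_vt": "any", "vd": "all", "vf": "all"}


def shorten(url):
    """Keep only documented variables that differ from their default."""
    parts = urlsplit(url)
    kept = [(name, value) for name, value in parse_qsl(parts.query)
            if name in DOCUMENTED and ASSUMED_DEFAULTS.get(name) != value]
    return f"{parts.scheme}://{parts.netloc}{parts.path}?{urlencode(kept)}"


long_url = ("http://search.yahoo.com/search?_adv_prop=web&x=op&ei=UTF-8"
            "&va=astronomy&va_vt=any&vp_vt=any&vo_vt=any&ve_vt=any&vd=all"
            "&vst=.gov&vs=.gov&vf=all&vm=p&fl=0&n=20")
print(shorten(long_url))
# prints http://search.yahoo.com/search?va=astronomy&vs=.gov&n=20
```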
