O'Reilly Hacks



HACK
#17
Calculate Google Centuryshare
Determine the year in which Elvis achieved the height of his fame, over what period disco took hold of your nightlife, and when fuel economy actually mattered to anyone

Looking to pin down the year something big happened or watch a trend unfold gradually over time? FindForward (http://www.findforward.com), née the Google Centuryshare Calculator (http://blog.outer-court.com/centuryshare), employs some of the same logic as the Google Mindshare Calculator to determine the weight of a search query across a 50-year period.

Let's say, for example, we search for Chernobyl, site of a terrible nuclear power plant accident in April 1986. Enter the search term (in this case, chernobyl) in FindForward's search box and choose a range of years from the pull-down menu to the right. Given that my choices were 1900-1950 or 1950-2000, and that I know I was alive when it happened, I chose 1950-2000. Click Find and the engine will chew on your query for a bit, its backend feeding a steady stream of queries to Google via the Google API. Figure 1 clearly shows that the Web knows a little something about Chernobyl and the year 1986.

Figure 1. The Centuryshare Calculator clearly shows something important happened at Chernobyl in 1986

So, how does the Centuryshare algorithm work?

Centuryshare tries to find natural peaks for ideas in particular years by searching the Web via Google. For every year, the number of the year is combined with the search query: to find out when Elvis Presley was at the height of his fame, the engine searches for Elvis Presley 1950, Elvis Presley 1951, Elvis Presley 1952, and so on, keeping track of the returned result count along the way.
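The per-year querying described above might be sketched roughly as follows. This is only an illustration, not FindForward's actual code: the function name yearly_counts and the result_count callback are hypothetical, and the callback stands in for whatever call to the Google API returns a result count for a query string. Treating the end year as inclusive is also an assumption.

```python
from typing import Callable, Dict


def yearly_counts(
    query: str,
    start_year: int,
    end_year: int,
    result_count: Callable[[str], int],
) -> Dict[int, int]:
    """Return {year: result count for "<query> <year>"}.

    result_count is a hypothetical stand-in for a search API call
    that returns the number of results for a query string.
    end_year is treated as inclusive (an assumption).
    """
    counts: Dict[int, int] = {}
    for year in range(start_year, end_year + 1):
        # Combine the search query with the number of the year.
        counts[year] = result_count(f"{query} {year}")
    return counts


if __name__ == "__main__":
    issued = []

    def fake_count(q: str) -> int:
        # Placeholder counter: records the query, returns a dummy value.
        issued.append(q)
        return len(q)

    yearly_counts("Elvis Presley", 1950, 1952, fake_count)
    print(issued)
```

Passing the counter in as a function keeps the sketch independent of any particular search backend, so it can be exercised with a fake counter as shown.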

But a simple count of results isn't quite enough. These numbers need an additional transformation before the result is meaningful, because some years are mentioned online far more often than others: 1900, 1910, and 1920 occur more frequently, as do the years in the late part of the twentieth century, when the Web boomed.

So the Centuryshare calculator also gleans the result count for each year by itself, without any additional search query (i.e., Google for 1950, 1951, 1952, etc.).

With those base numbers in hand, the engine then calculates a percentage for each year: the result count for the year combined with the search query, relative to the result count for the year by itself.

These result count percentages are normalized for display purposes and returned to you as a nice bar graph of results by year.
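As a rough sketch of the percentage and normalization steps, assuming the two sets of result counts have already been collected: the function name centuryshare is hypothetical, and scaling so that the peak year equals 100 is an assumed normalization, since the text says only that the percentages are normalized for display.

```python
from typing import Dict


def centuryshare(
    with_query: Dict[int, int],
    year_only: Dict[int, int],
) -> Dict[int, float]:
    """Return {year: score}, scaled so the peak year scores 100.

    with_query: result counts for "<query> <year>".
    year_only:  result counts for "<year>" by itself.
    Scaling to a peak of 100 is an assumption made for display.
    """
    shares: Dict[int, float] = {}
    for year, base in year_only.items():
        hits = with_query.get(year, 0)
        # Share of the year's mentions that also match the query.
        shares[year] = hits / base if base else 0.0

    peak = max(shares.values(), default=0.0)
    if peak == 0.0:
        return {year: 0.0 for year in shares}
    return {year: 100.0 * (share / peak) for year, share in shares.items()}


if __name__ == "__main__":
    # Made-up counts, purely to exercise the function.
    scores = centuryshare(
        {1985: 10, 1986: 90, 1987: 30},
        {1985: 1000, 1986: 1000, 1987: 2000},
    )
    for year, score in sorted(scores.items()):
        print(year, "#" * round(score / 5))
```

Dividing by the year-only count is what keeps a heavily mentioned year from dominating the graph simply because it appears on more pages.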

Compare the Chernobyl results in Figure 1 with those for the gentle rise and fall of disco in Figure 2.

Figure 2. The Centuryshare Calculator on the bell (bottomed) curve of disco's reign

FindForward sports a host of other search features (http://findforward.com/about/), including Amazon.com, IRC logs, weblogs, assorted files people leave lying about on the Web, people (famous and not), and things. For instance, you can ask a question such as "When was Albert Einstein born?" and FindForward will trawl the Web, figure it out (or something close enough for horseshoes), and provide a link to the source, as shown in Figure 3.

Figure 3. Ask a decent question...

Check the source out for yourself by clicking the "Check source" link or find another by clicking "Find new answer."

Philipp Lenssen


© 2007 O'Reilly Media, Inc.