O'Reilly Hacks



Deluxe Google scraper bookmarklet
Scrapes the search links from a Google search results page and produces an HTML table that can be imported to Excel.

Contributed by:
David Crossman
[04/21/05]

This hack builds on a prior contribution, "Google search scraper bookmarklet", and operates in the same way. Installed in your Bookmarks Toolbar, it opens a popup window containing a table of the Google search results, which can be imported into Excel. Each table row has these columns:

<resultNumber> <title> <link> <sequenceNumber>

ResultNumber and SequenceNumber are zero-based.
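
As a rough illustration of that row layout, the sketch below builds the HTML for a single table row from the four fields. The function name `rowHtml` and the sample values are hypothetical; the cell markup mirrors the helper `a` in the popup script shown later in this hack.

```javascript
// Hypothetical helper: builds one table row in the
// <resultNumber> <title> <link> <sequenceNumber> layout.
function rowHtml(resultNumber, title, link, sequenceNumber) {
  var sep = '</td><td nowrap>'; // same cell separator the popup script uses
  return '<tr><td>' + resultNumber + sep + title + sep + link + sep +
         sequenceNumber + '</td></tr>';
}

// Sample values are made up for illustration.
var exampleRow = rowHtml(0, 'Example title', 'http://example.com/', 1);
console.log(exampleRow);
```

Because every row has the same four cells, Excel can lay the pasted table out as four plain columns.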

"Deluxe" features:
1. Each result's description now appears in the title column of that result's first row (with an empty link column).
2. A "Next" link requests the next page of Google search results, then asks for your confirmation that the page of results has loaded before adding them to the table.
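
The "Next" feature depends on locating the link to the next page of results. Here is a minimal sketch of that lookup, assuming (as the popup script does) that the wanted link is the last one on the page whose URL ends in `&sa=N`. The function name `findNextHref` and the sample URLs are hypothetical, and the sketch works on a plain array of strings rather than on `document.links`, so it can be tried outside a browser.

```javascript
// Hypothetical helper: returns the last href ending in "&sa=N",
// or null when no such link exists.
function findNextHref(hrefs) {
  for (var i = hrefs.length - 1; i >= 0; i--) {
    if (/&sa=N$/i.test(hrefs[i])) {
      return hrefs[i];
    }
  }
  return null;
}

// Made-up URLs standing in for the links on a results page.
var sampleHrefs = [
  'http://www.google.com/search?q=hacks&start=0&sa=N',
  'http://www.google.com/search?q=hacks&start=10&sa=N',
  'http://www.google.com/intl/en/about.html'
];
console.log(findNextHref(sampleHrefs));
```

Returning `null` instead of relying on the loop index keeps the "no more results" case separate from a match found at index 0.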

Caveat: because of its size, this bookmarklet will not work in IE.

I have tested it only in Firefox 1.0.3, but it should work in any Mozilla-based browser. It uses DOM Level 1 Core functions.

To paste into MS Excel:
  select the results: click in the results iframe, then press Ctrl-A and Ctrl-C,
  right-click the target Excel worksheet cell,
  select "Paste Special -> HTML".
That's all there is to it.

Here it is: drag the gScrape link to your toolbar and you're in business.

Here's what the bookmarklet creates in the popup window (prettied-up):
<body>
    <script language='javascript'>
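    var s=0; // running result number; keeps counting across pages
    // o(0) scrapes the results page already loaded in the opener window;
    // o(1) first sends the opener to the next results page, then scrapes it.
    function o(q){
        var d=frames[0].document,b=d.getElementsByTagName('tbody')[0],i=1,l,m,
        r=/^.*?<font size="-1">(.*)<br><font color.*$/,u,v,w,y,z; // r captures the description
        if (q>0){
            // Scan the opener's links from last to first for an href ending in "&sa=N".
            l=opener.document.links;
            i=l.length;
            while(i)
                if (/&sa=N$/i.test(l[--i].href)) {
                    opener.location.href = l[i].href;
                    break;
                }
        }
        if (!i)
            alert('No more results');
        else if (!q || confirm('Ready?')) { // after "Next", wait for the user to confirm the page has loaded
            z=opener.document.getElementsByTagName('p');
            for(y=0;y<z.length;y++)
                if(z[y].className=='g'){ // each search result is a <p class="g">
                    u=z[y].innerHTML.replace(/[\r\n]+/g,' ');
                    w=0;
                    if(u.match(r)) // description row: tags stripped, empty link column
                        a(u.replace(r,'$1').replace(/<.*?>/g,' ').replace(/\s+/g,' '),'',0);
                    m=u.match(/<a .*?<\/a>/ig)||[]; // match() returns null when there are no links
                    while(w<m.length)
                        if(!!(v=m[w++].match(/href="?([^ ">]+)[^>]*>(.+?)<\/a>/i)))
                            a(v[2].replace(/<br>/ig,' ').replace(/<[^>]+./g,''),(/^\//.test(v[1])?'http://'+location.host:'')+v[1],w);
                    s++;
                }
        }
        // Append one table row: result number, title, link, sequence number.
        function a(l,u,c){
            var t='</td><td nowrap>',e=d.createElement('tr');
            b.appendChild(e);
            e.innerHTML = '<td>'+s+t+l+t+u+t+c+'</td>';
        }
    }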
    </script>
    <div align=center>
        <a href='' Onclick='o(1);return false'>Next</a>
    </div>
    <iframe height=100% width=100%
        src="javascript:with(document){write('<body onload=top.o(0)><table><tbody></tbody></table></body>');close()}">
    </iframe>
</body>
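
To see what the two anchor regular expressions in `o` do in isolation, here is a small sketch that runs without a browser. The function name `extractLinks`, the `host` parameter and the sample markup are made up for illustration; the patterns themselves are copied from the script above, and Google's real markup may of course differ.

```javascript
// Hypothetical helper: pulls {title, link} pairs out of one result's HTML.
function extractLinks(html, host) {
  // Find every <a ...>...</a>; match() returns null when there are none.
  var anchors = html.match(/<a .*?<\/a>/ig) || [];
  var out = [];
  for (var i = 0; i < anchors.length; i++) {
    // v[1] is the href value, v[2] is the link's inner HTML.
    var v = anchors[i].match(/href="?([^ ">]+)[^>]*>(.+?)<\/a>/i);
    if (!v) continue;
    out.push({
      // Turn <br> into a space and strip any remaining tags.
      title: v[2].replace(/<br>/ig, ' ').replace(/<[^>]+./g, ''),
      // Site-relative links (starting with "/") get the host prepended.
      link: (/^\//.test(v[1]) ? 'http://' + host : '') + v[1]
    });
  }
  return out;
}

// Made-up markup: one absolute link and one site-relative link.
var sampleHtml = '<a href="http://example.com/page">Example <b>page</b></a> ' +
                 '<a href=/search?q=cache:abc>Cached</a>';
var links = extractLinks(sampleHtml, 'www.google.com');
console.log(links);
```

Passing the host in as a parameter, rather than reading `location.host`, is only a convenience for running the sketch outside the popup window.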

