Search Engine Hacking

Search engines, by definition, are used to find and locate information on the World Wide Web. In addition to using search engines to search for information, attackers have ways of using search engines to identify and locate vulnerabilities and confidential data.

Using search engines to find vulnerabilities offers a way for attackers to probe a network without the target’s knowledge since the entire search request and response come from the search engine and not the target. The attacker doesn’t leave a footprint since he is not sending information to the target. Attackers also use a cached page to view the information, instead of accessing the site directly, which creates another layer of protection for them.

Google Hacking

Numerous books and presentations discuss how to gather “sensitive” information from Google. Attackers can use Google to gather basic information such as contact lists, internal documents, and top-level organizational structures, as well as locate potential vulnerabilities in an organization’s web application.

Attackers can use a specific type of search query, called a dork, to locate security issues or confidential data. Attackers can use dorks to obtain firewall logs and customer data, and to find ways to access an organization’s database.

Security professionals have developed public databases of dorks. Dork databases exist for several different search engines; the most common dork database is the Google Hacking Database.

Note

The Google Hacking Database (GHDB) is a great resource for finding dorks that can aid an attacker. The GHDB is located at http://johnny.ihackstuff.com/ghdb/.

Using a dork is relatively simple. An attacker locates a dork of interest, and then uses Google to search for the dork. The following code is a dork that attempts to identify web applications that are susceptible to an SQL injection vulnerability by searching for a MySQL error message that commonly signifies the existence of an SQL injection flaw:

"Unable to jump to row" "on MySQL result index" "on line"

An attacker can limit the dork to a certain domain by adding the site: directive to the query string. For example, here is a Google query that is limited to the example.com domain:

"Unable to jump to row" "on MySQL result index" "on line" site:example.com

Figure 1-4 illustrates the execution of the SQL injection dork. Notice that more than 900,000 results were returned!

Execution of an SQL injection dork

Figure 1-4. Execution of an SQL injection dork

Automating Google Hacking

An attacker can use the Search Engine Assessment Tool (SEAT), developed by Midnight Research Labs, to automate Google hacking. SEAT uses search engines and search caches to search for vulnerabilities for a particular domain.

SEAT supports multiple search engines, including Google, Yahoo!, and MSN. SEAT also has a variety of built-in dorks. The databases that SEAT uses (shown in Figure 1-5) were compiled from multiple sources, including the GHDB and Nikto.

An attacker can select multiple databases and search engines when using SEAT. Along with SEAT’s multithreading, these features aid the attacker greatly when he’s gathering information via search engine hacking. Figure 1-6 shows SEAT during the execution stage running 15 simultaneous queries.

Note

You can obtain the latest version of SEAT from http://midnightresearch.com/projects/search-engine-assessment-tool/.

Extracting Metadata from Online Documents

Metadata is “data about other data.” A good example of metadata is the data that is often inserted into Microsoft Office documents such as Word. For instance, Microsoft Word inserts data such as usernames and folder paths of the author’s machine. Attackers can extract this metadata from documents that corporations have put online.

Using search engines, attackers can use specific directives to limit their results to specific file types that are known to include metadata. For example, the Google directive filetype:doc will return only Microsoft Word files. The following is a query that returns only PowerPoint presentations that contain the phrase “Q4 Expenses”:

filetype:ppt "Q4 Expenses"
SEAT’s different built-in vulnerability databases

Figure 1-5. SEAT’s different built-in vulnerability databases

Attackers query Google using such queries; then they download the documents that are returned and examine them, pulling out any metadata stored within them.

Metagoofil is an automated tool that queries Google to find documents that are known to contain metadata. Metagoofil will query Google using a specific domain, download the files that are returned, and then attempt to extract the contents. Here is a demonstration of Metagoofil being used against example.com:

$ python metagoofil.py -d example.com -f all -l 3 -o example.html -t DL
*************************************
*MetaGooFil Ver. 1.4a               *
*Coded by Christian Martorella      *
*Edge-Security Research             *
*cmartorella@edge-security.com      *
*************************************

[+] Command extract found, proceeding with leeching
[+] Searching in example.com for: pdf
[+] Total results in google: 5300
[+] Limit: 3
        [ 1/3 ] http://www.example.com/english/lic/gl_app1.pdf
        [ 2/3 ] http://www.example.com/english/lic/gl_app2.pdf
        [ 3/3 ] http://www.example.com/english/lic/gl_app3.pdf
[+] Searching in example.com for: doc
[+] Total results in google: 1500
[+] Limit: 3
        [ 1/3 ] http://www.example.com/english/lic/gl_app1.doc
        [ 2/3 ] http://www.example.com/english/lic/gl_app2.doc
        [ 3/3 ] http://www.example.com/english/lic/gl_app3.doc
[+] Searching in example.com for: xls
[+] Total results in google: 20
[+] Limit: 3
        [ 1/3 ] http://www.example.com/english/lic/gl_app1.xls
        [ 2/3 ] http://www.example.com/english/lic/gl_app2.xls
        [ 3/3 ] http://www.example.com/english/lic/gl_app3.xls
[+] Searching in example.com for: ppt
[+] Total results in google: 60
[+] Limit: 3
        [ 1/3 ] http://www.example.com/english/lic/gl_app1.ppt
        [ 2/3 ] http://www.example.com/english/lic/gl_app1.ppt
        [ 3/3 ] http://www.example.com/english/lic/gl_app1.ppt
[+] Searching in example.com for: sdw
[+] Total results in google: 0
[+] Searching in example.com for: mdb
[+] Total results in google: 0
[+] Searching in example.com for: sdc
[+] Total results in google: 0
[+] Searching in example.com for: odp
[+] Total results in google: 0
[+] Searching in example.com for: ods
[+] Total results in google: 0

Usernames found:
================
rmiyazaki
tyamanda
hlee
akarnik
April Jacobs
Rwood
Amatsuda
Dmaha
Dock, Matt

Paths found:
============
C:\WINNT\Profiles\Dmaha\
C:\TEMP\Dmaha\
C:\Program Files\Microsoft Office\Templates|Presentation Designs\example
C:\WINNT\Profiles\Rwood
[+] Process finished
SEAT using 15 threads, searching for vulnerabilities using multiple search engines

Figure 1-6. SEAT using 15 threads, searching for vulnerabilities using multiple search engines

Note

The publicly available Python script metagoofil.py aids in searching, gathering, and extracting metadata from documents. It is available from http://www.edge-security.com/metagoofil.php.

Searching for Source Code

Developers will often post code on public forums when they discover a bug they cannot solve. Too often, these developers will post code without redacting it in any way. It is unsettling how often these forums display code that clearly belongs to a specific organization.

Information such as the developer’s name, internal comments, code descriptions, and organizational ownership are among the items you can find in source code that is posted on public forums on the Internet.

Using Google, it is trivial to find some of this code in a short period of time. Using search terms such as “here is the code” and “here is the exact code” will return many results. Here is a code snippet that we found using Google (the code has been redacted):

<?php
$error = ""; // Set a variable that will be used for errors
$sendTo = ""; // Set a variable that will be used for emailing
// Form is submitted
if(isset($_POST['upload']) && $_POST['upload'] == 'Upload File')
{
$whereto = $_POST['where']; // Gets post value from select menu
// Gets file value from file upload input
$whatfile = $_FILES['uploadedfile']['name'];
// This is the subject that will appear in the email
$subject = "File uploaded to ". $whereto ." directory";
$from = "FTP UPLOAD <noreply@redacted.com>";
// Checks to see if $whereto is empty, if so echo error
if(empty($whereto))
{
$error = "You need to choose a directory.<br />";
}
// Checks to see if file input field is empty, if so throw an error
if($whatfile == NULL) {
$error .= "You need to choose a file.";
}
//if no errors so far then continue uploading

if(!empty($whereto) && $whatfile != NULL) {
$target_path = "$whereto/"; // The directory the file will be placed
...

This code snippet describes upload functionality that is on a web server. An attacker can use this code to reverse-engineer how to get a file into a different directory, or how to bypass the security mechanisms that are in place.

Get Hacking: The Next Generation now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.