Search engines exist to locate information on the World Wide Web. Attackers also use them to identify vulnerabilities and locate confidential data.
Using search engines to find vulnerabilities lets attackers probe a network without the target’s knowledge, because the search request goes to the search engine and the response comes from it, never from the target. The attacker leaves no footprint since he sends nothing to the target. Attackers also view cached copies of pages instead of accessing the site directly, which gives them another layer of protection.
Numerous books and presentations discuss how to gather “sensitive” information from Google. Attackers can use Google to gather basic information such as contact lists, internal documents, and top-level organizational structures, as well as locate potential vulnerabilities in an organization’s web application.
Attackers can use a specific type of search query, called a dork, to locate security issues or confidential data. Attackers can use dorks to obtain firewall logs and customer data, and to find ways to access an organization’s database.
Security professionals have developed public databases of dorks. Dork databases exist for several different search engines; the most common dork database is the Google Hacking Database.
Note
The Google Hacking Database (GHDB) is a great resource for finding dorks that can aid an attacker. The GHDB is located at http://johnny.ihackstuff.com/ghdb/.
Using a dork is relatively simple. An attacker locates a dork of interest and then submits it to Google as a search query. The following dork attempts to identify web applications that are susceptible to SQL injection by searching for a MySQL error message that commonly signifies an SQL injection flaw:
"Unable to jump to row" "on MySQL result index" "on line"
An attacker can limit the dork to a certain domain by adding the site: directive to the query string. For example, here is a Google query that is limited to the example.com domain:
"Unable to jump to row" "on MySQL result index" "on line" site:example.com
Figure 1-4 illustrates the execution of the SQL injection dork. Notice that more than 900,000 results were returned!
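The site-restricted query above can also be composed programmatically. The following is a minimal sketch, not part of any tool discussed here: the function name build_dork_url is hypothetical, and the sketch assumes Google's public search endpoint accepts the query in the q parameter. It only builds the query string and URL; it does not send a request.

```python
from urllib.parse import urlencode


def build_dork_url(phrases, site=None):
    """Compose a dork from exact phrases and an optional site: directive.

    Hypothetical helper for illustration; it returns the raw query and a
    search URL, and performs no network access.
    """
    # Wrap each phrase in double quotes so it is matched exactly.
    terms = ['"{}"'.format(phrase) for phrase in phrases]
    if site:
        # Restrict results to one domain, as in the example.com query above.
        terms.append("site:" + site)
    query = " ".join(terms)
    # Assumption: the search endpoint takes the query in the "q" parameter.
    return query, "https://www.google.com/search?" + urlencode({"q": query})


query, url = build_dork_url(
    ["Unable to jump to row", "on MySQL result index", "on line"],
    site="example.com",
)
print(query)
print(url)
```

Keeping the phrases and the domain as separate arguments makes it easy to reuse one dork across several domains.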
An attacker can use the Search Engine Assessment Tool (SEAT), developed by Midnight Research Labs, to automate Google hacking. SEAT uses search engines and search caches to search for vulnerabilities for a particular domain.
SEAT supports multiple search engines, including Google, Yahoo!, and MSN. SEAT also has a variety of built-in dorks. The databases that SEAT uses (shown in Figure 1-5) were compiled from multiple sources, including the GHDB and Nikto.
An attacker can select multiple databases and search engines when using SEAT. Along with SEAT’s multithreading, these features aid the attacker greatly when he’s gathering information via search engine hacking. Figure 1-6 shows SEAT during the execution stage running 15 simultaneous queries.
Note
You can obtain the latest version of SEAT from http://midnightresearch.com/projects/search-engine-assessment-tool/.
Metadata is “data about other data.” A good example of metadata is the data that is often inserted into Microsoft Office documents such as Word files. For instance, Microsoft Word inserts data such as usernames and folder paths from the author’s machine. Attackers can extract this metadata from documents that corporations have put online.
Attackers can use search engine directives to limit their results to specific file types that are known to include metadata. For example, the Google directive filetype:doc will return only Microsoft Word files. The following is a query that returns only PowerPoint presentations that contain the phrase “Q4 Expenses”:
filetype:ppt "Q4 Expenses"
Attackers query Google using such queries; then they download the documents that are returned and examine them, pulling out any metadata stored within them.
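The manual step of pulling metadata out of a downloaded document can be sketched in Python. This minimal example assumes the newer Office Open XML formats (.docx, .pptx, .xlsx), which are ZIP packages that keep author fields in docProps/core.xml; the legacy .doc and .ppt formats named above use a different binary layout that this sketch does not parse. The names read_core_properties and make_sample_package are hypothetical, and the sample package is built in memory so the code runs without a real file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by docProps/core.xml in Office Open XML packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_core_properties(data):
    """Return author-related fields from the bytes of an OOXML document."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    creator = root.find("dc:creator", NS)
    modifier = root.find("cp:lastModifiedBy", NS)
    return {
        "creator": creator.text if creator is not None else None,
        "last_modified_by": modifier.text if modifier is not None else None,
    }


def make_sample_package(creator, modifier):
    """Build a tiny stand-in package (illustration only, not a valid .docx)."""
    core = (
        '<cp:coreProperties xmlns:cp="{cp}" xmlns:dc="{dc}">'
        "<dc:creator>{creator}</dc:creator>"
        "<cp:lastModifiedBy>{modifier}</cp:lastModifiedBy>"
        "</cp:coreProperties>"
    ).format(creator=creator, modifier=modifier, **NS)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", core)
    return buffer.getvalue()


sample = make_sample_package("Rwood", "Dmaha")
print(read_core_properties(sample))
```

To inspect a real download, pass the file's bytes (for example, the result of open(path, "rb").read()) to read_core_properties.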
Metagoofil is an automated tool that queries Google to find documents of types that are known to contain metadata. Metagoofil will query Google for a specific domain, download the files that are returned, and then attempt to extract their metadata. Here is a demonstration of Metagoofil being used against example.com:
$ python metagoofil.py -d example.com -f all -l 3 -o example.html -t DL
*************************************
*MetaGooFil Ver. 1.4a *
*Coded by Christian Martorella *
*Edge-Security Research *
*cmartorella@edge-security.com *
*************************************
[+] Command extract found, proceeding with leeching
[+] Searching in example.com for: pdf
[+] Total results in google: 5300
[+] Limit: 3
[ 1/3 ] http://www.example.com/english/lic/gl_app1.pdf
[ 2/3 ] http://www.example.com/english/lic/gl_app2.pdf
[ 3/3 ] http://www.example.com/english/lic/gl_app3.pdf
[+] Searching in example.com for: doc
[+] Total results in google: 1500
[+] Limit: 3
[ 1/3 ] http://www.example.com/english/lic/gl_app1.doc
[ 2/3 ] http://www.example.com/english/lic/gl_app2.doc
[ 3/3 ] http://www.example.com/english/lic/gl_app3.doc
[+] Searching in example.com for: xls
[+] Total results in google: 20
[+] Limit: 3
[ 1/3 ] http://www.example.com/english/lic/gl_app1.xls
[ 2/3 ] http://www.example.com/english/lic/gl_app2.xls
[ 3/3 ] http://www.example.com/english/lic/gl_app3.xls
[+] Searching in example.com for: ppt
[+] Total results in google: 60
[+] Limit: 3
[ 1/3 ] http://www.example.com/english/lic/gl_app1.ppt
[ 2/3 ] http://www.example.com/english/lic/gl_app1.ppt
[ 3/3 ] http://www.example.com/english/lic/gl_app1.ppt
[+] Searching in example.com for: sdw
[+] Total results in google: 0
[+] Searching in example.com for: mdb
[+] Total results in google: 0
[+] Searching in example.com for: sdc
[+] Total results in google: 0
[+] Searching in example.com for: odp
[+] Total results in google: 0
[+] Searching in example.com for: ods
[+] Total results in google: 0
Usernames found:
================
rmiyazaki
tyamanda
hlee
akarnik
April Jacobs
Rwood
Amatsuda
Dmaha
Dock, Matt
Paths found:
============
C:\WINNT\Profiles\Dmaha\
C:\TEMP\Dmaha\
C:\Program Files\Microsoft Office\Templates|Presentation Designs\example
C:\WINNT\Profiles\Rwood
[+] Process finished
Note
The publicly available Python script metagoofil.py aids in searching, gathering, and extracting metadata from documents. It is available from http://www.edge-security.com/metagoofil.php.
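The "Paths found" portion of such output is useful because profile directories usually embed account names. As a rough sketch, assuming Windows-style paths and assuming that a folder directly under Profiles, Users, or Documents and Settings is named after the account, the usernames can be recovered as follows. The function name usernames_from_paths is hypothetical and is not part of Metagoofil.

```python
import re

# Assumption: the folder directly beneath one of these directories is named
# after the user account that owns the profile.
PROFILE_RE = re.compile(
    r"\\(?:Profiles|Users|Documents and Settings)\\([^\\]+)",
    re.IGNORECASE,
)


def usernames_from_paths(paths):
    """Return the distinct account names embedded in Windows profile paths."""
    found = []
    seen = set()
    for path in paths:
        match = PROFILE_RE.search(path)
        if match is None:
            continue  # Not a profile path, e.g. C:\TEMP\...
        name = match.group(1)
        if name.lower() not in seen:
            seen.add(name.lower())
            found.append(name)
    return found


paths = [
    "C:\\WINNT\\Profiles\\Dmaha\\",
    "C:\\TEMP\\Dmaha\\",
    "C:\\WINNT\\Profiles\\Rwood",
]
print(usernames_from_paths(paths))
```

Paths outside a profile directory are skipped on purpose, since a folder name such as the one under C:\TEMP is only a weak hint of an account name.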
Developers will often post code on public forums when they discover a bug they cannot solve. Too often, these developers will post code without redacting it in any way. It is unsettling how often these forums display code that clearly belongs to a specific organization.
The developer’s name, internal comments, code descriptions, and organizational ownership are among the items you can find in source code that is posted on public forums on the Internet.
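To make these items concrete, here is a small sketch that scans a snippet for two of them: email addresses and comment text that names an author or owner. The patterns are simplistic assumptions chosen for illustration, the function name find_identifying_details is hypothetical, and the sample snippet is invented rather than taken from a real posting.

```python
import re

# Deliberately simple patterns; real code would need broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
OWNER_RE = re.compile(r"(?i)\b(?:author|copyright|property of)\b.*")


def find_identifying_details(source):
    """Return (line number, kind, text) tuples for identifying details."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        for match in EMAIL_RE.finditer(line):
            findings.append((number, "email", match.group(0)))
        owner = OWNER_RE.search(line)
        if owner:
            findings.append((number, "ownership", owner.group(0).strip()))
    return findings


# Invented sample, for illustration only.
sample_code = """<?php
// Author: J. Doe, Example Corp internal tools
$from = "FTP UPLOAD <jdoe@example.com>";
"""

for finding in find_identifying_details(sample_code):
    print(finding)
```

The same scan works in either direction: an attacker can run it over forum posts, and a developer can run it over a snippet before posting.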
With Google, finding some of this code takes very little time. Search terms such as “here is the code” and “here is the exact code” return many results. Here is a code snippet that we found using Google (the code has been redacted):
<?php
$error = ""; // Set a variable that will be used for errors
$sendTo = ""; // Set a variable that will be used for emailing

// Form is submitted
if(isset($_POST['upload']) && $_POST['upload'] == 'Upload File') {
    $whereto = $_POST['where']; // Gets post value from select menu

    // Gets file value from file upload input
    $whatfile = $_FILES['uploadedfile']['name'];

    // This is the subject that will appear in the email
    $subject = "File uploaded to ". $whereto ." directory";
    $from = "FTP UPLOAD <noreply@redacted.com>";

    // Checks to see if $whereto is empty, if so echo error
    if(empty($whereto)) {
        $error = "You need to choose a directory.<br />";
    }

    // Checks to see if file input field is empty, if so throw an error
    if($whatfile == NULL) {
        $error .= "You need to choose a file.";
    }

    //if no errors so far then continue uploading
    if(!empty($whereto) && $whatfile != NULL) {
        $target_path = "$whereto/"; // The directory the file will be placed
        ...
This code snippet describes upload functionality that is on a web server. An attacker can use this code to reverse-engineer how to get a file into a different directory, or how to bypass the security mechanisms that are in place.