
Removing Your Materials from Google
How to remove your content from
Google's various web properties.

[03/13/03]
Some people are more than thrilled to have Google's
properties index their sites. Other folks don't want
the Google bot anywhere near them. If you fall into the latter
category and the bot's already done its worst, there
are several things you can do to remove your materials from
Google's index. Each of Google's
properties—Web Search, Google Images, and Google
Groups—has its own set of methodologies.
Google's Web Search
Here are several tips to avoid being listed.
Making sure your pages never get there to begin with
While you can take steps to remove your content from the Google index
after the fact, it's always much easier to make sure
the content is never found and indexed in the first place.
Google's crawler obeys the "robots
exclusion protocol," a set of instructions you put
on your web site that tells the crawler how to behave when it comes
to your content. You can implement these instructions in two ways:
via a META tag that you put on each page (handy
when you want to restrict access to only certain pages or certain
types of content) or via a robots.txt file that
you insert in your root directory (handy when you want to block some
spiders completely or want to restrict access to kinds or directories
of content). You can get more information about the
robots exclusion protocol
and how to implement it at http://www.robotstxt.org/.
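As a rough illustration of the robots.txt approach, the sketch below uses Python's standard urllib.robotparser module to test a sample file before you upload it. The rules, the www.example.com URLs, and the ExampleSpider crawler name are all made up for this example, and Python's parser is only one interpretation of the protocol, so treat the results as a sanity check rather than a promise of how any particular crawler will behave.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot out of two directories and
# block a made-up crawler named ExampleSpider from the whole site.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/
Disallow: /drafts/

User-agent: ExampleSpider
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        ("Googlebot", "http://www.example.com/private/notes.html"),
        ("Googlebot", "http://www.example.com/index.html"),
        ("ExampleSpider", "http://www.example.com/index.html"),
    ]
    for agent, url in checks:
        print(agent, url, is_allowed(SAMPLE_ROBOTS_TXT, agent, url))
```

With these sample rules, the first and third checks report False (blocked) and the second reports True (allowed).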
Removing your pages after they're indexed
There are several things you can have removed from
Google's results.
TIP
These instructions are for keeping your site out of
Google's index only. For information on keeping your
site out of all major search engines, you'll have to
work with the robots exclusion protocol.
- Removing the whole site
-
Use the robots exclusion protocol, probably with
robots.txt.
- Removing individual pages
-
Use the following META tag in the
HEAD section of each page you want to remove:
<META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW">
- Removing snippets
-
A "snippet" is the little excerpt
of a page that Google displays in its search results. To remove
snippets, use the following META tag in the
HEAD section of each page for which you want to
prevent snippets:
<META NAME="GOOGLEBOT" CONTENT="NOSNIPPET">
- Removing cached pages
-
To prevent Google from storing cached versions of your pages in its
index, use the following META tag in the
HEAD section of each page for which you want to
prevent caching:
<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">
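Before you ask Google to recrawl a page, it can help to confirm that the tags above really are in the page's markup. The following is a minimal sketch, assuming you already have the page's HTML as a string; it uses Python's standard html.parser module to collect whatever GOOGLEBOT directives the page declares. The class and function names are invented for this example.

```python
from html.parser import HTMLParser


class GooglebotMetaParser(HTMLParser):
    """Collects directives from <META NAME="GOOGLEBOT" CONTENT="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "googlebot":
            return
        for part in attr.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def googlebot_directives(html):
    """Return the set of lowercase GOOGLEBOT directives found in html."""
    parser = GooglebotMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    page = """<html><head>
    <META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW">
    <META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">
    </head><body>Hello</body></html>"""
    print(sorted(googlebot_directives(page)))
```

For the sample page, this prints ['noarchive', 'nofollow', 'noindex'].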
Removing that content now
Once you implement these changes, Google will remove or limit your
content according to your META tags and
robots.txt file the next time your web site is
crawled, usually within a few weeks. But if you want your materials
removed right away, you can use the automatic remover at http://services.google.com:8882/urlconsole/controller.
You'll have to sign in with an account (all an
account requires is an email address and a password). Using the
remover, you can request either that Google crawl your newly created
robots.txt file, or you can enter the URL of a
page that contains exclusionary META tags.
TIP
Make sure you have your exclusion tags all set up before you use this
service. Going to all the trouble of getting Google to pay attention
to a robots.txt file or exclusion rules that
you've not yet set up will simply be a waste of your
time.
Reporting pages with inappropriate content
You may like your content
fine, but you might find that even if you have filtering activated
you're getting search results with explicit content.
Or you might find a site with a misleading title tag and content
completely unrelated to your search.
You have two options for reporting these sites to Google. Bear in
mind that there's no guarantee that Google will
remove the sites from the index, but they will investigate them. At
the bottom of each page of search results, you'll
see a "Help Us Improve" link; follow
it to a form for reporting inappropriate sites. You can also send the
URL of explicit sites that show up in a SafeSearch-filtered search but
probably shouldn't to
safesearch@google.com. If you have more general
complaints about a search result, you can send an email to
search-quality@google.com.
Google Images
Google Images' database of materials is separate
from that of the main search index. To remove items from Google
Images, you should use
robots.txt to specify that Google's image crawler, Googlebot-Image,
should stay away from your site. Add these lines to your
robots.txt file:
User-agent: Googlebot-Image
Disallow: /
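To see what those two lines do and don't cover, here is a small sketch that runs them through Python's standard urllib.robotparser. The www.example.com URLs are placeholders, and the outcome reflects how Python's parser reads the rules, which is assumed (not guaranteed) to match Google's own handling.

```python
from urllib.robotparser import RobotFileParser

# The two lines from the text, exactly as they would appear in robots.txt.
IMAGE_RULES = """\
User-agent: Googlebot-Image
Disallow: /
"""

parser = RobotFileParser()
parser.parse(IMAGE_RULES.splitlines())

# The image crawler is shut out of the entire site...
image_allowed = parser.can_fetch(
    "Googlebot-Image", "http://www.example.com/photos/cat.jpg")

# ...while the regular web crawler is unaffected by this rule.
web_allowed = parser.can_fetch(
    "Googlebot", "http://www.example.com/index.html")

print("Googlebot-Image allowed:", image_allowed)
print("Googlebot allowed:", web_allowed)
```

In other words, this rule removes you from Google Images only; your pages can still be crawled for Web Search unless you add rules for Googlebot as well.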
You can use the automatic remover mentioned in the web search section
to have Google remove the images from its index database quickly.
There may be cases where someone has put images for which you own
the copyright on their own server. You don't
have access to their server to add a robots.txt
file, but you need to stop Google's indexing of your
content there. In this case, you need to contact Google directly.
Google has instructions for situations just like this at http://www.google.com/remove.html; look at
Option 2, "If you do not have any access to the
server that hosts your image."
Removing Material from Google Groups
As with Google's web index,
you have the option both to prevent material from being archived on
Google and to remove it after the fact.
Preventing your material from being archived
To prevent your material from being archived on Google, add the
following line to the headers of your Usenet posts:
X-No-Archive: yes
If you do not have the option to edit the headers of your post, make
that line the first line in your post itself.
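If you compose posts with a script, both variations can be sketched with Python's standard email.message module. This is only an illustration: the build_post function, the addresses, and the newsgroup name are hypothetical, and actually sending the post to a news server is left out.

```python
from email.message import EmailMessage


def build_post(newsgroup, sender, subject, body, can_edit_headers=True):
    """Build a Usenet-style message that asks not to be archived.

    If headers can be edited, add an X-No-Archive header; otherwise put
    the same line at the very top of the message body.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["Newsgroups"] = newsgroup
    msg["Subject"] = subject
    if can_edit_headers:
        msg["X-No-Archive"] = "yes"
    else:
        body = "X-No-Archive: yes\n\n" + body
    msg.set_content(body)
    return msg


if __name__ == "__main__":
    post = build_post(
        "misc.test",                # placeholder newsgroup
        "poster@example.com",       # placeholder address
        "Test post",
        "Please do not archive this message.",
    )
    print(post.as_string())
```

Passing can_edit_headers=False produces the fallback form, with the X-No-Archive line as the first line of the body.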
Removing materials after the fact
If you want materials removed after the fact, you have a couple of
options:
-
If the materials you want removed were posted under an address to
which you still have access, you may use the automatic removal tool
mentioned earlier in this hack.
-
If the materials you want removed were posted under an address to
which you no longer have access, you'll need to send
an email to groups-support@google.com with the
following information:
-
Your full name and contact information, including a verifiable email
address.
-
The complete Google Groups URL or message ID for each message you
want removed.
-
A statement that says "I swear under penalty of
civil or criminal laws that I am the person who posted each of the
foregoing messages or am authorized to request removal by the person
who posted those messages."
-
Your electronic signature.
Removing Your Listing from Google Phonebook
You may not wish to have
your contact information made available via the phonebook searches on
Google. You'll have to follow one of two procedures,
depending on whether the listing you want removed is for a business
or for a residential number.
If you want to remove a business phone number,
you'll need to send a request on your business
letterhead to:
Google PhoneBook Removal
2400 Bayshore Parkway
Mountain View, CA 94043
You'll also have to include a phone number where
Google can reach you to verify your request.
If you want to remove a residential phone number,
it's much simpler. You'll need to
fill out a form at http://www.google.com/help/pbremoval.html.
The form asks for your name, city and state, phone number, email
address, and reason for removal, a multiple choice: incorrect number,
privacy issue, or
"other."

© 2007 O'Reilly Media, Inc.
David Martinez
http://www.seidon.com/
http://www.futurefront.com/