WebmasterLingo
November 18th, 2006, 06:01 PM   #1
ashgilpin
Registered User
 
Join Date: Nov 2006
Location: Virginia Beach, VA
Posts: 6
Using Robots.txt on Your Web Site

If you are a webmaster, you are probably already familiar with robots.txt, or have at least heard of it. This file is used to keep search engine spiders from crawling particular content on your site.

A Brief History

Robots (also called wanderers or spiders) are programs that traverse many pages in the World Wide Web by recursively retrieving linked pages.

In 1993 and 1994 there were occasions where robots visited WWW servers where they weren’t welcome, for various reasons. Sometimes these reasons were robot-specific, e.g. certain robots swamped servers with rapid-fire requests or retrieved the same files repeatedly. In other situations robots traversed parts of WWW servers that weren’t suitable, e.g. very deep virtual trees, duplicated information, temporary information, or CGI scripts with side effects (such as voting).

These incidents indicated the need for an established mechanism by which WWW servers could tell robots which parts of the server should not be accessed. The Robots Exclusion Standard addresses this need with an operational solution.

The method used to exclude robots from a server is to create a file on the server which specifies an access policy for robots. This file must be accessible via HTTP on the local URL “/robots.txt”.
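To illustrate the "/robots.txt" rule, here is a minimal Python sketch (my own illustration, not part of the standard) that works out where the robots.txt file for any page lives. The helper name robots_url_for and the example.com URLs are hypothetical, and it assumes the file sits at the root of the same scheme, host and port as the page.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (hypothetical helper).

    The file lives at the local URL "/robots.txt" on the same scheme and
    host (including any port), so the page's own path, query and fragment
    are simply discarded.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # Prints http://www.example.com/robots.txt
    print(robots_url_for("http://www.example.com/cyberworld/map/index.html"))
```

A robot only ever looks in that one place, so a robots.txt placed in a subdirectory has no effect.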

Example of a robots.txt file:

User-agent: *
Disallow: /cyberworld/map/ # This is an infinite virtual URL space
Disallow: /foo.html
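If you want to check what the example above actually blocks, Python's standard urllib.robotparser module can read it. This is a rough sketch: "MyCrawler" is a made-up user-agent name, and the www.example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt from above, verbatim.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cyberworld/map/ # This is an infinite virtual URL space
Disallow: /foo.html
"""

parser = RobotFileParser()
# parse() takes an iterable of lines; anything after "#" is treated as a comment.
parser.parse(ROBOTS_TXT.splitlines())

# "MyCrawler" is a hypothetical user-agent; the "User-agent: *" record applies to it.
for path in ("/cyberworld/map/index.html", "/foo.html", "/index.html"):
    url = "http://www.example.com" + path
    print(path, parser.can_fetch("MyCrawler", url))
```

The first two paths come back as not fetchable and /index.html as fetchable, because Disallow rules match by URL prefix. To test a live site instead, you can pass its robots.txt URL to set_url() and call read().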

More information can be found at http://www.robotstxt.org, or you can use the Robot Control Code Generation Tool to prepare your own robots.txt.
January 21st, 2007, 12:15 PM   #2
Thomas Schulz
Registered User
 
Join Date: Jul 2006
Posts: 17
I believe I read somewhere that Google supports some robots.txt extensions? I can't remember the specifics, though.
June 12th, 2007, 03:22 PM   #3
Captain Tycoon
Registered User
 
Join Date: Jun 2007
Location: United Kingdom
Posts: 10
Great article; I learned something I didn't know.
June 20th, 2007, 08:50 AM   #4
Thomas Schulz
Registered User
 
Join Date: Jul 2006
Posts: 17
Quote:
Originally Posted by Thomas Schulz
I believe I read somewhere that Google supports some robots.txt extensions? I can't remember the specifics, though.

Well, one extension supported by Google, Yahoo and Ask is that your robots.txt can link to your XML sitemap. Basically, you just add "Sitemap: http://www.example.com/sitemap.xml" to your robots.txt file. That is kind of neat, and likely to help you get indexed and save time in doing so.
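For anyone curious how a crawler picks that line up, here is a small sketch using Python's urllib.robotparser, whose site_maps() method (Python 3.8 or later) collects Sitemap entries. The robots.txt content below is a made-up example built around the sitemap URL quoted above, and "MyCrawler" is a hypothetical user-agent.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that also advertises an XML sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow: /foo.html

Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns a list of Sitemap URLs, or None if the file lists none.
print(parser.site_maps())

# The Sitemap line does not change the normal Disallow rules.
print(parser.can_fetch("MyCrawler", "http://www.example.com/foo.html"))
```

The Sitemap directive is not tied to any User-agent record, so it can go anywhere in the file.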
September 15th, 2007, 04:31 AM   #5
eUKhost.com
Registered User
 
Join Date: Sep 2007
Posts: 2
Placing a robots.txt file may also help you remove old pages from Google's index, as URLs disallowed in robots.txt are not spidered by search engine bots.

All times are GMT -5.