Hi,
Good morning, everyone.
Today I am going to point out a few things regarding the robots.txt file. A robots.txt file tells search engine crawlers which parts of a website may be crawled and which parts may not.
Example: Suppose the root folder of your website contains folders, such as anon_ftp or cgi-bin, that you don't want robots (crawlers) to access. You can restrict these folders simply by adding a file called robots.txt to your root directory. Here is the format:
User-agent: *
Disallow: /anon_ftp/
Disallow: /cgi-bin/
The above example states that these two folders are restricted for all crawlers; they cannot index the contents of these folders. If you wish, you can give different crawlers different restrictions. For example, if you want only Googlebot to skip the two folders mentioned above, the robots.txt file would be:
User-agent: Googlebot
Disallow: /anon_ftp/
Disallow: /cgi-bin/
The same applies to other crawlers as well.
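As a quick sanity check, Python's standard-library urllib.robotparser can evaluate rules like the ones above. This sketch (the example.com URLs are placeholders) shows that the Googlebot-only record restricts Googlebot, while a crawler with no matching record falls back to "allow everything":

```python
from urllib.robotparser import RobotFileParser

# Rules mirroring the per-crawler example above: only Googlebot is restricted.
rules = [
    "User-agent: Googlebot",
    "Disallow: /anon_ftp/",
    "Disallow: /cgi-bin/",
]

rp = RobotFileParser()
rp.parse(rules)

# Googlebot is blocked from the listed folders...
print(rp.can_fetch("Googlebot", "http://www.example.com/anon_ftp/"))  # False
# ...but a crawler with no matching record is allowed everywhere.
print(rp.can_fetch("Bingbot", "http://www.example.com/anon_ftp/"))    # True
```
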
Blocking user-agents
The Disallow line lists the pages you want to block. You can list a specific URL or a pattern. The entry should begin with a forward slash (/).
To block the entire site, use a forward slash.
Disallow: /
To block a directory and everything in it, follow the directory name with a forward slash.
Disallow: /junk-directory/
To block a page, list the page.
Disallow: /private_file.html
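The directory and page forms above can be checked the same way with urllib.robotparser (again, example.com is a placeholder): a directory rule blocks everything under it, while a page rule blocks just that URL.

```python
from urllib.robotparser import RobotFileParser

# Rules combining the directory and single-page examples above.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /junk-directory/",
    "Disallow: /private_file.html",
])

print(rp.can_fetch("Googlebot", "http://www.example.com/junk-directory/a.html"))  # False
print(rp.can_fetch("Googlebot", "http://www.example.com/private_file.html"))      # False
print(rp.can_fetch("Googlebot", "http://www.example.com/public.html"))            # True
```
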
To remove a specific image from Google Images, add the following:
User-agent: Googlebot-Image
Disallow: /images/dogs.jpg
To remove all images on your site from Google Images:
User-agent: Googlebot-Image
Disallow: /
To block files of a specific file type (for example, .gif), use the following:
User-agent: Googlebot
Disallow: /*.gif$
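Note that * and $ are pattern extensions supported by Googlebot, not part of the original robots.txt specification (Python's stdlib urllib.robotparser, for instance, does not understand them). As a simplified sketch of how such a pattern behaves, it can be translated into a regular expression:

```python
import re

def google_pattern_to_regex(pattern: str) -> "re.Pattern":
    """Translate a Google-style robots.txt path pattern to a regex.

    '*' matches any sequence of characters; a trailing '$' anchors the
    match to the end of the URL path. This is a simplified illustration,
    not the exact matching algorithm search engines use.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    return re.compile("^" + body + ("$" if anchored else ""))

rule = google_pattern_to_regex("/*.gif$")
print(bool(rule.match("/images/photo.gif")))  # True  (.gif is blocked)
print(bool(rule.match("/images/photo.png")))  # False (.png is not)
```
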
To prevent pages on your site from being crawled while still displaying AdSense ads on those pages, disallow all bots other than Mediapartners-Google. This keeps the pages from appearing in search results, but allows the Mediapartners-Google robot to analyze them to determine which ads to show. The Mediapartners-Google robot doesn't share pages with the other Google user-agents. For example:
User-agent: *
Disallow: /

User-agent: Mediapartners-Google
Allow: /
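You can verify this two-record setup with urllib.robotparser as well (example.com is a placeholder): every crawler is blocked except Mediapartners-Google.

```python
from urllib.robotparser import RobotFileParser

# The AdSense example above: block everyone, allow only Mediapartners-Google.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",
    "",
    "User-agent: Mediapartners-Google",
    "Allow: /",
])

print(rp.can_fetch("Googlebot", "http://www.example.com/page.html"))             # False
print(rp.can_fetch("Mediapartners-Google", "http://www.example.com/page.html"))  # True
```
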
Note that the paths in directives are case-sensitive. For instance, Disallow: /junk_file.asp
would block http://www.example.com/junk_file.asp, but would allow
http://www.example.com/Junk_file.asp. Googlebot ignores whitespace
(in particular, empty lines) and unknown directives in robots.txt.
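The case-sensitivity of paths is easy to demonstrate with urllib.robotparser (example.com URLs are placeholders):

```python
from urllib.robotparser import RobotFileParser

# Paths in robots.txt rules match case-sensitively.
rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /junk_file.asp"])

print(rp.can_fetch("Googlebot", "http://www.example.com/junk_file.asp"))  # False (blocked)
print(rp.can_fetch("Googlebot", "http://www.example.com/Junk_file.asp"))  # True  (different case)
```
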
Hope this is helpful. For suggestions, please write to me at akhi8601@gmail.com