Types of Robots.txt

on Tuesday, 5 March 2013
Hi,

Good morning, everyone.

Today I am going to point out a few things regarding the robots.txt file. A robots.txt file tells search engine crawlers which parts of a website may be crawled and which parts may not.

Example: Suppose the root folder of your website contains folders that you don't want robots (crawlers) to access, such as anon_ftp and cgi-bin. You can restrict these folders simply by adding a file called robots.txt to your root directory. Here is the format:

User-agent: *
Disallow: /anon_ftp/
Disallow: /cgi-bin/ 

The above example states that these two folders are restricted for all crawlers; they cannot index the contents of these folders. If you wish, you can give different crawlers different restrictions. For example, if you want only Googlebot to skip the two folders mentioned above, the robots.txt file would look like this:

User-Agent: Googlebot
Disallow: /anon_ftp/
 
The same applies to other crawlers.
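A quick way to check how a crawler would interpret rules like these is Python's standard-library urllib.robotparser; the sketch below uses the example folder from this post and a made-up file path.

```python
# Check the per-crawler rule above with Python's built-in robots.txt parser.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot
Disallow: /anon_ftp/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Googlebot is blocked from the restricted folder...
print(rp.can_fetch("Googlebot", "/anon_ftp/data.txt"))  # False
# ...but not from other paths...
print(rp.can_fetch("Googlebot", "/blog/post.html"))     # True
# ...and other crawlers are unaffected, since the rule names Googlebot only.
print(rp.can_fetch("Bingbot", "/anon_ftp/data.txt"))    # True
```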
 

Blocking user-agents

The Disallow line lists the pages you want to block. You can list a specific URL or a pattern. The entry should begin with a forward slash (/).
  • To block the entire site, use a forward slash.
    Disallow: /
     
  • To block a directory and everything in it, follow the directory name with a forward slash.
    Disallow: /junk-directory/
  • To block a page, list the page.
    Disallow: /private_file.html
     
  • To remove a specific image from Google Images, add the following:
    User-agent: Googlebot-Image
    Disallow: /images/dogs.jpg 
     
  • To remove all images on your site from Google Images:
    User-agent: Googlebot-Image
    Disallow: / 
     
  • To block files of a specific file type (for example, .gif), use the following:
    User-agent: Googlebot
    Disallow: /*.gif$
     
  • To prevent pages on your site from being crawled, while still displaying AdSense ads on those pages, disallow all bots other than Mediapartners-Google. This keeps the pages from appearing in search results, but allows the Mediapartners-Google robot to analyze the pages to determine the ads to show. The Mediapartners-Google robot doesn't share pages with the other Google user-agents. For example:
     
    User-agent: *
    Disallow: /

    User-agent: Mediapartners-Google
    Allow: /
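The `/*.gif$` pattern in the list above uses Google's wildcard extensions: `*` matches any run of characters and a trailing `$` anchors the match to the end of the URL. Python's standard-library robotparser does not understand these extensions, so here is a minimal sketch of the matching logic; the function name and example paths are my own.

```python
import re

def robots_pattern_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt Disallow pattern matches a URL path.
    '*' matches any run of characters; a trailing '$' anchors the end;
    otherwise the pattern is a prefix match."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Translate the pattern into a regex, escaping everything except '*'.
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    if not anchored:
        regex += ".*"  # robots.txt rules are prefix matches by default
    return re.fullmatch(regex, path) is not None

print(robots_pattern_matches("/*.gif$", "/images/photo.gif"))       # True
print(robots_pattern_matches("/*.gif$", "/images/photo.gif?x=1"))   # False: '$' anchors the end
print(robots_pattern_matches("/junk-directory/", "/junk-directory/a.html"))  # True: prefix match
```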
Note that the paths in directives are case-sensitive. For instance, Disallow: /junk_file.asp would block http://www.example.com/junk_file.asp, but would allow http://www.example.com/Junk_file.asp. Googlebot ignores whitespace (in particular, empty lines) and unknown directives in robots.txt.
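This case-sensitivity can be observed with Python's standard-library urllib.robotparser, reusing the junk_file.asp example from above:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse("""\
User-agent: *
Disallow: /junk_file.asp
""".splitlines())

# The path comparison is case-sensitive:
print(rp.can_fetch("Googlebot", "/junk_file.asp"))  # False: exact case matches the rule
print(rp.can_fetch("Googlebot", "/Junk_file.asp"))  # True: different capitalisation
```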
 
 
Hope this is helpful. For suggestions, please write to me at akhi8601@gmail.com.


 
 
 
