Hi,
Good morning, everyone.
Today I am going to point out a few things regarding the robots.txt file. A robots.txt file tells search engine crawlers which parts of a website may be crawled and which parts may not.
Example: Suppose the root folder of your website contains folders, such as anon_ftp or cgi-bin, that you do not want robots (crawlers) to access. You can restrict these folders simply by adding a file called robots.txt to your root directory. Here is the format:
User-agent: *
Disallow: /anon_ftp/
Disallow: /cgi-bin/
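As a sketch of how such rules are interpreted, you can check them with Python's standard urllib.robotparser module (the example.com URLs below are hypothetical):

```python
from urllib import robotparser

# Parse the example rules directly rather than fetching a live robots.txt.
rules = """\
User-agent: *
Disallow: /anon_ftp/
Disallow: /cgi-bin/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# The restricted folders are blocked for every crawler...
print(rp.can_fetch("*", "http://www.example.com/anon_ftp/file.txt"))
# ...while the rest of the site stays crawlable.
print(rp.can_fetch("*", "http://www.example.com/index.html"))
```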
The above example states that these two folders are restricted for all crawlers; they cannot index the contents of these folders. If you wish, you can give different crawlers different restrictions.
For example, if you want only Googlebot to stay out of the two folders mentioned above, the robots.txt file would read:
User-agent: Googlebot
Disallow: /anon_ftp/
Disallow: /cgi-bin/
The same applies to other crawlers.
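Continuing the urllib.robotparser sketch (again with a hypothetical example.com), you can verify that a per-agent rule like this restricts only Googlebot:

```python
from urllib import robotparser

# Rules that name only Googlebot; crawlers that don't match get no restrictions.
rules = """\
User-agent: Googlebot
Disallow: /anon_ftp/
Disallow: /cgi-bin/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "http://www.example.com/anon_ftp/x"))
print(rp.can_fetch("SomeOtherBot", "http://www.example.com/anon_ftp/x"))
```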
Blocking user-agents
The Disallow line lists the pages you want to block. You can list a
specific URL or a pattern. The entry should begin with a forward slash
(/).
- To block the entire site, use a forward slash.
Disallow: /
- To block a directory and everything in it, follow the directory name with a forward slash.
Disallow: /junk-directory/
- To block a page, list the page.
Disallow: /private_file.html
- To remove a specific image from Google Images, add the following:
User-agent: Googlebot-Image
Disallow: /images/dogs.jpg
- To remove all images on your site from Google Images:
User-agent: Googlebot-Image
Disallow: /
- To block files of a specific file type (for example, .gif), use the following:
User-agent: Googlebot
Disallow: /*.gif$
- To prevent pages on your site from being crawled, while still displaying AdSense ads on those pages, disallow all bots other than Mediapartners-Google. This keeps the pages from appearing in search results, but allows the Mediapartners-Google robot to analyze the pages to determine the ads to show. The Mediapartners-Google robot doesn't share pages with the other Google user-agents. For example:
User-agent: *
Disallow: /

User-agent: Mediapartners-Google
Allow: /
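The * and $ wildcards used in some of the patterns above are extensions understood by Googlebot; Python's standard urllib.robotparser, for instance, does not interpret them. As an illustrative sketch, a pattern such as /*.gif$ can be translated into a regular expression. The rule_to_regex helper below is hypothetical, not a complete robots.txt matcher:

```python
import re

def rule_to_regex(rule: str) -> "re.Pattern":
    # Translate a Google-style Disallow pattern ('*' wildcard, '$' end anchor)
    # into a regular expression. Illustrative sketch only.
    pattern = re.escape(rule).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"
    return re.compile(pattern)

gif_rule = rule_to_regex("/*.gif$")
print(bool(gif_rule.match("/images/photo.gif")))      # matches: ends in .gif
print(bool(gif_rule.match("/images/photo.gif.bak")))  # no match: '$' anchors the end
```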
Note that these paths are case-sensitive. For instance,
Disallow: /junk_file.asp
would block http://www.example.com/junk_file.asp, but would allow http://www.example.com/Junk_file.asp. Googlebot will ignore whitespace (in particular, empty lines) and unknown directives in the robots.txt.
Hope this is helpful. For suggestions, please write to me at akhi8601@gmail.com.
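As a quick sanity check of the case-sensitivity rule, here is a sketch with Python's urllib.robotparser (example.com is hypothetical):

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /junk_file.asp"])

# Path matching is case-sensitive: only the lowercase URL is blocked.
print(rp.can_fetch("*", "http://www.example.com/junk_file.asp"))
print(rp.can_fetch("*", "http://www.example.com/Junk_file.asp"))
```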