Types of Robots.txt

on Tuesday, 5 March 2013
Hi,

Good morning, everyone.

Today I am going to point out a few things regarding the robots.txt file. A robots.txt file tells search engine crawlers which parts of a website may be crawled and which parts may not.

Example: Suppose the root folder of your website contains folders that you don't want robots (crawlers) to access, such as anon_ftp and cgi-bin. You can restrict these folders simply by adding a file called robots.txt to your root directory. Here is the format:

User-agent: *
Disallow: /anon_ftp/
Disallow: /cgi-bin/ 

The above example states that these two folders are restricted for all crawlers; they cannot index the contents of these folders. If you wish, you can give different crawlers different restrictions. For example, if you want only Googlebot to skip the two folders mentioned above, the robots.txt file would look like this:

User-Agent: Googlebot
Disallow: /anon_ftp/
 
The same applies to other crawlers.
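A quick way to check how a crawler would interpret rules like these is Python's standard-library urllib.robotparser; the sketch below uses the example folder from this post and a made-up file path.

```python
# Check the per-crawler rule above with Python's built-in robots.txt parser.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot
Disallow: /anon_ftp/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Googlebot is blocked from the restricted folder...
print(rp.can_fetch("Googlebot", "/anon_ftp/data.txt"))  # False
# ...but not from other paths...
print(rp.can_fetch("Googlebot", "/blog/post.html"))     # True
# ...and other crawlers are unaffected, since the rule names Googlebot only.
print(rp.can_fetch("Bingbot", "/anon_ftp/data.txt"))    # True
```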
 

Blocking user-agents

The Disallow line lists the pages you want to block. You can list a specific URL or a pattern. The entry should begin with a forward slash (/).
  • To block the entire site, use a forward slash.
    Disallow: /
     
  • To block a directory and everything in it, follow the directory name with a forward slash.
    Disallow: /junk-directory/
  • To block a page, list the page.
    Disallow: /private_file.html
     
  • To remove a specific image from Google Images, add the following:
    User-agent: Googlebot-Image
    Disallow: /images/dogs.jpg 
     
  • To remove all images on your site from Google Images:
    User-agent: Googlebot-Image
    Disallow: / 
     
  • To block files of a specific file type (for example, .gif), use the following:
    User-agent: Googlebot
    Disallow: /*.gif$
     
  • To prevent pages on your site from being crawled, while still displaying AdSense ads on those pages, disallow all bots other than Mediapartners-Google. This keeps the pages from appearing in search results, but allows the Mediapartners-Google robot to analyze the pages to determine the ads to show. The Mediapartners-Google robot doesn't share pages with the other Google user-agents. For example:
     
    User-agent: *
    Disallow: /

    User-agent: Mediapartners-Google
    Allow: /
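The `/*.gif$` pattern in the list above uses Google's wildcard extensions: `*` matches any run of characters and a trailing `$` anchors the match to the end of the URL. Python's standard-library robotparser does not understand these extensions, so here is a minimal sketch of the matching logic; the function name and example paths are my own.

```python
import re

def robots_pattern_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt Disallow pattern matches a URL path.
    '*' matches any run of characters; a trailing '$' anchors the end;
    otherwise the pattern is a prefix match."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Translate the pattern into a regex, escaping everything except '*'.
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    if not anchored:
        regex += ".*"  # robots.txt rules are prefix matches by default
    return re.fullmatch(regex, path) is not None

print(robots_pattern_matches("/*.gif$", "/images/photo.gif"))       # True
print(robots_pattern_matches("/*.gif$", "/images/photo.gif?x=1"))   # False: '$' anchors the end
print(robots_pattern_matches("/junk-directory/", "/junk-directory/a.html"))  # True: prefix match
```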
Note that the paths in directives are case-sensitive. For instance, Disallow: /junk_file.asp would block http://www.example.com/junk_file.asp, but would allow http://www.example.com/Junk_file.asp. Googlebot ignores whitespace (in particular, empty lines) and unknown directives in robots.txt.
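This case-sensitivity can be observed with Python's standard-library urllib.robotparser, reusing the junk_file.asp example from above:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse("""\
User-agent: *
Disallow: /junk_file.asp
""".splitlines())

# The path comparison is case-sensitive:
print(rp.can_fetch("Googlebot", "/junk_file.asp"))  # False: exact case matches the rule
print(rp.can_fetch("Googlebot", "/Junk_file.asp"))  # True: different capitalisation
```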
 
 
Hope this is helpful. For suggestions, please write to me at akhi8601@gmail.com.


 
 
 
