robots.txt - guiding the crawler

robots.txt is a file meant for crawlers and spiders. Using robots.txt, webmasters can guide crawlers toward or away from specific areas of a website. They can disallow a crawler's access to certain pages and folders, and they can also ask a crawler to slow down the rate at which it requests pages from the site. Well-behaved crawlers follow these instructions voluntarily; the file does not enforce anything by itself.

“robots.txt is placed in the root folder of the site”
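
Because the file always sits in the root folder, a crawler can work out its address from any page URL. A minimal sketch in Python, assuming the standard urllib.parse module; the helper name robots_url and the example.com address are made up for illustration:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.example.com/blog/post.html?id=7"))
```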

A webmaster can give crawlers three kinds of instructions, namely

  • Disallow: tells the robot which sections it must not visit.
  • Crawl-delay: the number of seconds the crawler should wait between requests. It is not part of the original standard, and some crawlers ignore it.
  • Sitemap: tells the crawler where to find the sitemap file, which lists the pages of the site and their URLs.

The Sitemap instruction is very useful for search engine optimization, since it helps search engines discover the pages of the site.
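
To see how a crawler might read the Crawl-delay and Sitemap instructions, here is a small sketch using Python's standard urllib.robotparser module. The robots.txt content, the delay of 10 seconds and the example.com address are assumptions chosen for illustration, and site_maps() needs Python 3.8 or later:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the delay value and URL are assumptions.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/

Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

delay = parser.crawl_delay("anyBot")  # seconds to wait between requests
sitemaps = parser.site_maps()         # list of sitemap URLs found in the file
print(delay, sitemaps)
```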

Example of robots.txt:

User-agent: *
Disallow: /javascript/
Disallow: /css/
Disallow: /images/
 
Sitemap: http://www.robotdevpaliwal.com/sitemap.xml

In this example, robots are told not to access the three folders javascript, css and images. The rule applies to every robot, because the User-agent line is *.

The Sitemap line tells the robot the location of the sitemap file. It should be given as a full URL.
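
The effect of the three Disallow rules can be checked with Python's standard urllib.robotparser module. This is only a sketch: the user-agent name anyBot and the file names under each folder are made up for the test.

```python
from urllib.robotparser import RobotFileParser

# The Disallow rules from the example above.
EXAMPLE = """\
User-agent: *
Disallow: /javascript/
Disallow: /css/
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(EXAMPLE.splitlines())

SITE = "http://www.robotdevpaliwal.com"
# Hypothetical paths: one outside the blocked folders, two inside them.
results = {
    path: parser.can_fetch("anyBot", SITE + path)
    for path in ("/index.html", "/css/style.css", "/images/logo.png")
}
print(results)
```

Only /index.html comes back as allowed; the two paths inside the blocked folders are refused.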

Another example

User-agent: *
Disallow: /

This tells every robot not to crawl a single page on the server.

Another example

User-agent: xyzBot
Disallow: /

This disallows xyzBot from crawling the website. Other robots are not affected.
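
A short sketch of how a rule for a single user-agent behaves, again using Python's standard urllib.robotparser. The name otherBot and the example.com address are assumptions used only for comparison:

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: xyzBot", "Disallow: /"])

url = "http://www.example.com/index.html"
xyz_allowed = parser.can_fetch("xyzBot", url)      # the named robot
other_allowed = parser.can_fetch("otherBot", url)  # any other robot
print(xyz_allowed, other_allowed)
```

Since no rule group matches otherBot and there is no User-agent: * group, the parser treats that robot as allowed.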

Another example

User-agent: *
Disallow: /password.php

This will disallow access to password.php in the root folder.
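
Disallow rules are matched against the start of the URL path, so this rule covers the file in the root folder but not a file of the same name in a subfolder. A minimal check with Python's standard urllib.robotparser; the /admin/ folder and the example.com address are hypothetical:

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /password.php"])

SITE = "http://www.example.com"
# The file in the root folder matches the rule and is blocked.
root_allowed = parser.can_fetch("anyBot", SITE + "/password.php")
# A file of the same name in a subfolder does not start with the rule's path.
nested_allowed = parser.can_fetch("anyBot", SITE + "/admin/password.php")
print(root_allowed, nested_allowed)
```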
