Introduction
Your website is regularly crawled by search engines and other bots from around the world. An aggressive crawler can consume significant bandwidth and server resources. To help limit this, set up a robots.txt file in the home directory (document root) of your website.
Procedure
To block the most common search engine crawlers completely, add entries like the following to your robots.txt. Each User-agent/Disallow pair tells the named crawler that it may not access any part of the site:
User-agent: Yandex
Disallow: /

User-agent: Baiduspider
Disallow: /

User-agent: Googlebot
Disallow: /

User-agent: Slurp
Disallow: /
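If you want to sanity-check rules like these before deploying them, Python's standard-library urllib.robotparser can evaluate them locally. A minimal sketch using the Yandex rule from above ("SomeOtherBot" is a made-up user agent, included for contrast):

import urllib.robotparser

# The blanket-block rule for Yandex from the example above.
rules = """\
User-agent: Yandex
Disallow: /
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# Yandex is denied everywhere; bots with no matching entry are unaffected.
print(parser.can_fetch("Yandex", "/any/page.html"))        # False
print(parser.can_fetch("SomeOtherBot", "/any/page.html"))  # True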
If you would rather keep crawlers out of specific folders only, disallow those directories instead:
User-agent: Googlebot
Disallow: /cgi-bin/

User-agent: Yandex
Disallow: /wp-admin
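The same standard-library parser can confirm that scoped rules only affect the listed directories. The paths below are illustrative:

import urllib.robotparser

rules = """\
User-agent: Googlebot
Disallow: /cgi-bin/

User-agent: Yandex
Disallow: /wp-admin
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# Blocked inside the disallowed directory, allowed elsewhere.
print(parser.can_fetch("Googlebot", "/cgi-bin/env.cgi"))   # False
print(parser.can_fetch("Googlebot", "/blog/post.html"))    # True
print(parser.can_fetch("Yandex", "/wp-admin/index.php"))   # False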
You can also add a Crawl-delay directive, which asks crawlers to wait the given number of seconds between requests:
User-agent: *
Crawl-delay: 30
Note that Googlebot does not respect the Crawl-delay directive; Google's crawl rate is managed through Google Search Console rather than robots.txt.
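For crawlers that do honor the directive, you can read the delay back with the same standard-library parser. This is only a local sanity check of the file, not a guarantee of crawler behavior:

import urllib.robotparser

rules = """\
User-agent: *
Crawl-delay: 30
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# Returns the delay (in seconds) that applies to the given user agent;
# Slurp falls under the wildcard entry here.
print(parser.crawl_delay("Slurp"))  # 30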