Question
How can I prevent bots from crawling my website?
Answer
Search engines and other bots from around the world will regularly crawl your website. An aggressive crawler can use a lot of bandwidth and consume too many of your website's server resources. To help avoid this, set up a robots.txt file in the root directory of your website, so that it is served at /robots.txt.
To block the most common search engines, robots.txt needs entries similar to the following examples. These entries block the listed crawlers from the entire site:
CONFIG_TEXT: User-agent: Yandex
Disallow: /
User-agent: Baiduspider
Disallow: /
User-agent: Googlebot
Disallow: /
User-agent: Slurp
Disallow: /
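One way to confirm that entries like these behave as intended is to parse them with Python's standard-library urllib.robotparser before uploading the file. This is only a sketch: the helper name is_allowed and the example.com URLs are made up for illustration, and Python's parser may not match every crawler's own interpretation of the file.

```python
from urllib.robotparser import RobotFileParser

# The same entries as in the example above.
ROBOTS_TXT = """\
User-agent: Yandex
Disallow: /

User-agent: Baiduspider
Disallow: /

User-agent: Googlebot
Disallow: /

User-agent: Slurp
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: True if user_agent may fetch url under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "Yandex", "Bingbot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, "https://example.com/page"))
```

Crawlers that are not named in the file, such as Bingbot in this sketch, are not affected by these entries.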
If you would rather keep search engines out of specific folders only, you can block individual directories instead:
CONFIG_TEXT: User-agent: Googlebot
Disallow: /cgi-bin/
User-agent: Yandex
Disallow: /wp-admin
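Directory rules apply only to the crawler named in the same group, and each Disallow value is matched as a path prefix. The following sketch checks both points with Python's urllib.robotparser. The function name check, the example.com host, and the sample paths are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# The same entries as in the example above.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /cgi-bin/

User-agent: Yandex
Disallow: /wp-admin
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def check(user_agent: str, path: str) -> bool:
    """Hypothetical helper: True if user_agent may fetch path on the site."""
    return parser.can_fetch(user_agent, "https://example.com" + path)


if __name__ == "__main__":
    samples = [
        ("Googlebot", "/cgi-bin/form.cgi"),
        ("Googlebot", "/wp-admin/"),
        ("Yandex", "/wp-admin/index.php"),
        ("Yandex", "/wp-admin-old/"),
    ]
    for agent, path in samples:
        print(f"{agent:10} {path:22} allowed={check(agent, path)}")
```

Because /wp-admin has no trailing slash, it also matches paths such as /wp-admin-old/. Adding the trailing slash (/wp-admin/) limits the rule to that one directory.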
You can also add a Crawl-delay directive to reduce how often crawlers send requests. The value is typically interpreted as the number of seconds to wait between requests:
CONFIG_TEXT: User-agent: *
Crawl-delay: 30
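To see which delay a given crawler would read from the file, the entry can be parsed with Python's urllib.robotparser, whose crawl_delay method is available in Python 3.6 and later. This is a minimal sketch: the helper name crawl_delay_for and the user agent ExampleBot are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The same entry as in the example above.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 30
"""


def crawl_delay_for(robots_txt: str, user_agent: str):
    """Hypothetical helper: the Crawl-delay that applies to user_agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    print(crawl_delay_for(ROBOTS_TXT, "ExampleBot"))
```

The parser only reports what the file requests. Whether a crawler actually honors the delay is up to that crawler.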
Note: Googlebot does not respect Crawl-delay settings. The article below covers how to adjust the crawl rate for Google's crawler.
How to Change Googlebot crawl rate
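Note: Apple's crawler (Applebot) may not respect the general rules if a Googlebot rule is present, because it follows the Googlebot rules when robots.txt has no Applebot-specific entry. The article below mentions this.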