Blocking web crawlers. ModSecurity or in vhost?
Lately, a lot of our customers' websites have been crawled by a lot of bots. Yesterday, a single website was crawled by 4 different bots at the same time. All of them were bad bots.
We want to block these bots, but I'm wondering which method is best performance-wise, or if it really doesn't matter.
So, does anyone have any recommendations for blocking bad bots?
-
The best recommendation would be robots.txt, so you block the bad ones and allow the good ones. Even if you configure it, though, the bot itself decides whether or not to follow the instructions. Blocking robots through Apache is not very convenient; there may be problems with advertising campaigns, site validators, etc.
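Roughly, a robots.txt along those lines could look like this (the bot names are only examples, and a bad bot is free to ignore the file entirely):

User-agent: AhrefsBot
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: *
Disallow:
-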
@Handssler Lopez I'm only talking about bad bots, not bots in general. Blocking access by configuring robots.txt is not a viable solution because a) we would need to do it for every website on all of our servers, and b) it makes no difference if the bot doesn't respect robots.txt. A lot of them don't.
-
We just use mod_security. Example of rules (that we picked up somewhere):
SecRule REQUEST_HEADERS:User-Agent "@rx (?:AhrefsBot)" "msg:'AhrefsBot Spiderbot blocked',phase:1,log,id:7777771,t:none,block,status:403"
SecRule REQUEST_HEADERS:User-Agent "@rx (?:MJ12bot)" "msg:'MJ12bot Spiderbot blocked',phase:1,log,id:7777772,t:none,block,status:403"
SecRule REQUEST_HEADERS:User-Agent "@rx (?:Yandex)" "msg:'Yandex Spiderbot blocked',phase:1,log,id:7777773,t:none,block,status:403"
SecRule REQUEST_HEADERS:User-Agent "@rx (?:SeznamBot)" "msg:'SeznamBot Spiderbot blocked',phase:1,log,id:7777774,t:none,block,status:403"
We grab the User-Agent from the Apache logs and then just plug it in. When you add another rule, you just need to increment the ID so you don't have duplicates.
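If rule count ever becomes a performance concern, the same checks could probably be collapsed into a single rule with one alternation regex, so only one User-Agent check runs per request. A rough sketch along the lines of the rules above; the id 7777770 is just a placeholder:

SecRule REQUEST_HEADERS:User-Agent "@rx (?:AhrefsBot|MJ12bot|Yandex|SeznamBot)" \
    "msg:'Bad bot blocked',phase:1,log,id:7777770,t:none,block,status:403"
-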
@ffeingol thanks for the input! I'm also looking for some feedback on whether one method is worse than the other (performance-wise).
-
There is a pretty good plugin that handles bad bots very well, too. Just google "stopbadbots". The developer makes a WordPress plugin and a stand-alone plugin; I use both effectively.
-
Sorry if it wasn't clear enough :-) We want to block bots server-wide, not for a single vhost.
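In other words, something along these lines in the global Apache configuration rather than in each vhost (a rough sketch, assuming Apache 2.4 with mod_setenvif and mod_authz_core; the bot names are just examples):

BrowserMatchNoCase "(AhrefsBot|MJ12bot|SeznamBot)" bad_bot
<Location "/">
    <RequireAll>
        Require all granted
        Require not env bad_bot
    </RequireAll>
</Location>

Placed in the main server config, that should apply to every vhost that doesn't override it.
-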
Hi, these rules work, but if the customer has ModSecurity disabled, they won't apply to his account, right? Is there a way to do something similar with CSF?
-
If you have a list of IPs that you want to block, you could use CSF.
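For example, from the command line (a rough sketch; the addresses are just placeholders):

csf -d 203.0.113.45 "bad bot"          # permanent deny, written to /etc/csf/csf.deny
csf -d 203.0.113.0/24 "bad bot range"  # CSF also accepts CIDR ranges
-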
Actually, these custom ModSec rules trigger CSF too:

Time:     Wed Mar  9 22:29:40 2022 +0200
IP:       95.108.213.9 (RU/Russia/95-108-213-9.spider.yandex.com)
Failures: 4 (mod_security-custom)
Interval: 3600 seconds
Blocked:  Permanent Block [LF_CUSTOMTRIGGER]