
Blocking web crawlers. ModSecurity or in vhost?

9 comments

  • Handssler Lopez
    The best recommendation would be robots.txt, so you block the bad ones and allow the good ones. Even if you configure it, each robot decides whether or not to follow the instructions. Blocking robots through Apache is not very convenient; there can be problems with advertising campaigns, site validators, etc.
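    A minimal robots.txt along those lines might look like the sketch below; the bot names are only examples, and it only helps against crawlers that actually honor robots.txt:

        # Block specific crawlers by name
        User-agent: AhrefsBot
        Disallow: /

        User-agent: MJ12bot
        Disallow: /

        # Everyone else may crawl the site
        User-agent: *
        Disallow: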
  • DennisMidjord
    @Handssler Lopez I'm only talking about bad bots - not bots in general. Blocking access by configuring robots.txt is not a viable solution because a) we need to do it for every website on all of our servers, and b) it makes no difference if the bot doesn't respect robots.txt. A lot of them don't.
  • ffeingol
    We just use mod_security. Example of rules (that we picked up somewhere):

        SecRule REQUEST_HEADERS:User-Agent "@rx (?:AhrefsBot)" "msg:'AhrefsBot Spiderbot blocked',phase:1,log,id:7777771,t:none,block,status:403"
        SecRule REQUEST_HEADERS:User-Agent "@rx (?:MJ12bot)" "msg:'MJ12bot Spiderbot blocked',phase:1,log,id:7777772,t:none,block,status:403"
        SecRule REQUEST_HEADERS:User-Agent "@rx (?:Yandex)" "msg:'Yandex Spiderbot blocked',phase:1,log,id:7777773,t:none,block,status:403"
        SecRule REQUEST_HEADERS:User-Agent "@rx (?:SeznamBot)" "msg:'SeznamBot Spiderbot blocked',phase:1,log,id:7777774,t:none,block,status:403"

    We grab the User-Agent from the Apache logs and then just plug it in. When you add another rule, you just need to increment the ID so you don't have duplicates.
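    A consolidated sketch of the same idea, matching several crawlers with one rule (the rule ID and the bot list are placeholders to adapt):

        # One rule covering several bad bots; (?i:...) makes the match case-insensitive
        SecRule REQUEST_HEADERS:User-Agent "@rx (?i:AhrefsBot|MJ12bot|Yandex|SeznamBot)" \
            "id:7777780,phase:1,t:none,log,block,status:403,msg:'Bad bot blocked by User-Agent'"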
  • DennisMidjord
    @ffeingol thanks for the input! I'm also looking for some feedback on whether one method is worse than the other (performance-wise).
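    For comparison, a sketch of the "in vhost" alternative with plain Apache 2.4 directives (BrowserMatchNoCase comes from mod_setenvif); both this and the ModSecurity phase-1 rules inspect a single request header early in the request, so the overhead of either approach is usually small:

        # Tag requests whose User-Agent matches a known bad bot ...
        BrowserMatchNoCase "AhrefsBot|MJ12bot|Yandex|SeznamBot" bad_bot
        # ... and refuse those requests with a 403
        <Location "/">
            <RequireAll>
                Require all granted
                Require not env bad_bot
            </RequireAll>
        </Location>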
  • nootkan
    There is also a pretty good plugin that handles bad bots very well. Just google "stopbadbots". The developer makes a plugin for WordPress and a standalone version; I use both effectively.
  • DennisMidjord
    Sorry if it wasn't clear enough :-) We want to block bots server-wide - not for a single vhost.
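    One way to get server-wide coverage is to keep the ModSecurity rules in an Apache-level include rather than in any single vhost; the path below is only an example and differs per distribution (on cPanel the ModSecurity "user" include file is the usual place):

        # e.g. /etc/apache2/conf.d/block-bad-bots.conf  (example path)
        <IfModule security2_module>
            SecRule REQUEST_HEADERS:User-Agent "@rx (?i:AhrefsBot|MJ12bot)" \
                "id:7777790,phase:1,t:none,log,block,status:403,msg:'Bad bot blocked'"
        </IfModule>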
  • masterross
    (quoting ffeingol's ModSecurity rules above)

    Hi, these rules work, but if a customer disables ModSecurity, they won't apply to his account, right? Is there a way to do something similar with CSF?
  • cPRex Jurassic Moderator
    If you had a list of IPs that you wanted to block, you could use CSF.
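    If you go the IP-list route, CSF's deny list is a plain text file; a sketch (the address range here is only an example):

        # /etc/csf/csf.deny -- one IP or CIDR per line, optional comment after '#'
        95.108.213.0/24 # example entry: crawler range to block

    Entries can also be added from the command line with "csf -d <ip> <comment>".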
  • masterross
    Actually, these custom ModSec rules trigger CSF too:

        Time:     Wed Mar 9 22:29:40 2022 +0200
        IP:       95.108.213.9 (RU/Russia/95-108-213-9.spider.yandex.com)
        Failures: 4 (mod_security-custom)
        Interval: 3600 seconds
        Blocked:  Permanent Block [LF_CUSTOMTRIGGER]
