
Blocking bad bots

Comments

12 comments

  • fuzzylogic
    Your ModSec rule would not work because it has no id, which is mandatory. Below is a copy of OWASP CRS rule 913102 (a Paranoia Level 2 rule), edited to block all the bots listed in your examples. I am not recommending that all of these bots be blocked, just offering working syntax for the bots you decide to block. The id has had the number 1 appended so that it never causes a duplicate id error.

        SecRule REQUEST_HEADERS:User-Agent "@rx ^(?:MJ12bot|AhrefsBot|SemrushBot|Baiduspider|HTTrack|Yandex|CareerBot)$" \
            "msg:'Found User-Agent associated with web crawler/bot',\
            severity:'CRITICAL',\
            id:9131021,\
            rev:'1',\
            phase:request,\
            block,\
            t:none,\
            ver:'OWASP_CRS/3.0.0',\
            maturity:'9',\
            accuracy:'9',\
            capture,\
            logdata:'Matched Data: %{TX.0} found within %{MATCHED_VAR_NAME}: %{MATCHED_VAR}',\
            tag:'application-multi',\
            tag:'language-multi',\
            tag:'platform-multi',\
            tag:'attack-reputation-crawler',\
            tag:'OWASP_CRS/AUTOMATION/CRAWLER',\
            tag:'WASCTC/WASC-21',\
            tag:'OWASP_TOP_10/A7',\
            tag:'PCI/6.5.10',\
            tag:'paranoia-level/2',\
            setvar:'tx.msg=%{rule.msg}',\
            setvar:tx.anomaly_score=+%{tx.critical_anomaly_score},\
            setvar:tx.%{rule.id}-OWASP_CRS/AUTOMATION/CRAWLER-%{matched_var_name}=%{matched_var},\
            setvar:ip.reput_block_flag=1,\
            expirevar:ip.reput_block_flag=%{tx.reput_block_duration},\
            setvar:'ip.reput_block_reason=%{rule.msg}'"
  • nunoleite
    Hi! The bots listed in my examples are not necessarily the ones I need to block, as there are only 3 or 4 that I see most often and that have a big impact on the server load. So with this code I can use just the ModSecurity Tools and add this custom rule, and changing the bot list on the first line would block all the bots I need, right? With this approach I can easily add and remove the bots I need to block on the whole server, right? Thanks
  • cPanelMichael
    So with this code I can use just the ModSecurity Tools and add this custom rule, and changing the bot list on the first line would block all the bots I need, right? With this approach I can easily add and remove the bots I need to block on the whole server, right?

    Hello, you can use the rule as an example, but note that some of the entries are designed for use with the OWASP rule set. See "OWASP ModSecurity CRS" in the cPanel Knowledge Base (cPanel Documentation). Thank you.
  • fuzzylogic
    cPanelMichael is correct: the rule I posted relies on other OWASP CRS rules to do the blocking. It was wrong of me to assume you would have these in your environment. Here is another example, based on the faulty rule you posted. It will work as a standalone rule or alongside any rule set I know of.

        SecRule REQUEST_HEADERS:User-Agent "@rx ^(?:MJ12bot|AhrefsBot)$" "msg:'Spiderbot blocked',phase:1,log,id:777777,t:none,block,status:403"

    I tested this rule and it returned a 403 status for requests with either of the following headers:

        User-Agent: MJ12bot
        User-Agent: AhrefsBot

    So with this code I can use just the ModSecurity Tools and add this custom rule, and changing the bot list on the first line would block all the bots I need, right?

    That is correct. Add bots so that the regex has this form: ^(?:bot1|bot2|bot3)$
    With this approach I can easily add and remove the bots I need to block on the whole server, right?

    ModSecurity rules added through Home » Security Center » ModSecurity Tools » Rules List » Add Rule are applied to all HTTP requests to the cPanel server.
  • nunoleite
    Thanks fuzzylogic. That's what I was looking for: a simple rule that could block these bad bots. I don't have the OWASP rules installed because some time ago I tried them and they caused lots of problems with some CMSs I have on the server, and I didn't investigate further which rules to enable or disable for compatibility. I will try this new SecRule, thanks. What about the other option, using the Apache configuration? Is it valid? Or is using ModSecurity better? Thanks
  • cPanelMichael
    What about the other option, using the Apache configuration? Is it valid? Or is using ModSecurity better?

    Mod_Security rules are a better option in my opinion. They will make it easier for you to exclude rules for specific accounts if necessary. Thank you.
  • nunoleite
    Hi! I have published this rule:

        SecRule REQUEST_HEADERS:User-Agent "@rx ^(?:AhrefsBot|MJ12bot|Yandex)$" "msg:'Spiderbot blocked',phase:1,log,id:777777,t:none,block,status:403"

    But I still see visitors with this user agent:

        Mozilla/5.0 (compatible; MJ12bot/v1.4.8; MJ12Bot | Home | from Majestic)

    This isn't being blocked.
  • fuzzylogic
    If you want to match a fragment of the User-Agent, you need a looser regular expression, one without the ^ and $ anchors:

        SecRule REQUEST_HEADERS:User-Agent "@rx (?:MJ12bot|AhrefsBot)" "msg:'Spiderbot blocked',phase:1,log,id:777777,t:none,block,status:403"

    You may also want to cover uppercase and lowercase variations. To do that, have ModSecurity transform all User-Agent values to lowercase, then enter the bot names in the regex in all lowercase:

        SecRule REQUEST_HEADERS:User-Agent "@rx (?:ahrefsbot|mj12bot|yandex)" "msg:'Spiderbot blocked',phase:1,log,id:777777,t:lowercase,block,status:403"
  • nunoleite
    Hi! With this rule:

        SecRule REQUEST_HEADERS:User-Agent "@rx (?:AhrefsBot|MJ12bot|Yandex)" "msg:'Spiderbot blocked',phase:1,log,id:777777,t:none,block,status:403"

    I think MJ12bot is being blocked, but I still see:

        user-agent: Mozilla/5.0 (compatible; AhrefsBot/5.2; +ahrefs.com/robot/)

    Hmm, strange.
  • nunoleite
    Hi! I have made some changes. I have added these 3 rules:

        SecRule REQUEST_HEADERS:User-Agent "@rx (?:AhrefsBot)" "msg:'AhrefsBot Spiderbot blocked',phase:1,log,id:7777771,t:none,block,status:403"
        SecRule REQUEST_HEADERS:User-Agent "@rx (?:MJ12bot)" "msg:'MJ12bot Spiderbot blocked',phase:1,log,id:7777772,t:none,block,status:403"
        SecRule REQUEST_HEADERS:User-Agent "@rx (?:Yandex)" "msg:'Yandex Spiderbot blocked',phase:1,log,id:7777773,t:none,block,status:403"

    With these 3 rules I can see in the "hits list" what is going on with each bot and what is hitting each rule, because they are logged separately. If I understand correctly, the bots can still reach the sites, but they receive 0 bytes and a 403 HTTP error. After some hits the CSF firewall permanently blocks the IP. Is this right? And is this the right behavior?
  • cPanelMichael
    After some hits the CSF firewall permanently blocks the IP. Is this right? And is this the right behavior?

    Hello @nunoleite, Yes, this is the case as long as you have enabled the LF_MODSEC feature in CSF. Thank you.
  • nunoleite
    Hi! Thanks. Now I think this is working fine, as I can see lots of hits on the rules and some IPs being blocked. But analyzing the visitors, it seems that AhrefsBot is still being served. Is this possible?
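The difference between the anchored and unanchored patterns discussed in this thread can be checked outside the server. The sketch below uses Python's `re` module as a stand-in for the regex engine behind `@rx` (an assumption: the two engines agree on simple patterns like these, but they are not the same library). The helper names `would_match` and `build_pattern` are hypothetical and exist only for this illustration. The User-Agent strings are the ones quoted in the comments above.

```python
import re

# User-Agent values quoted in the thread.
USER_AGENTS = {
    "bare_mj12": "MJ12bot",
    "full_mj12": "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; MJ12Bot | Home | from Majestic)",
    "full_ahrefs": "Mozilla/5.0 (compatible; AhrefsBot/5.2; +ahrefs.com/robot/)",
}

# The three pattern styles posted in the thread.
ANCHORED = re.compile(r"^(?:MJ12bot|AhrefsBot)$")          # whole header must equal a bot name
UNANCHORED = re.compile(r"(?:MJ12bot|AhrefsBot)")          # bot name may appear anywhere
LOWERCASE = re.compile(r"(?:ahrefsbot|mj12bot|yandex)")    # meant to be paired with t:lowercase


def would_match(pattern, user_agent, lowercase=False):
    """Return True if the pattern is found in the header value.

    lowercase=True imitates the t:lowercase transformation by lowercasing
    the value before matching (hypothetical helper, not a ModSecurity API).
    """
    value = user_agent.lower() if lowercase else user_agent
    return pattern.search(value) is not None


def build_pattern(bot_names):
    """Join bot names into the unanchored (?:a|b|c) form, escaping regex metacharacters."""
    return "(?:" + "|".join(re.escape(name) for name in bot_names) + ")"


if __name__ == "__main__":
    for label, ua in USER_AGENTS.items():
        print(label,
              "anchored:", would_match(ANCHORED, ua),
              "unanchored:", would_match(UNANCHORED, ua),
              "lowercase:", would_match(LOWERCASE, ua, lowercase=True))
```

In this sketch the anchored pattern matches only the bare `MJ12bot` value, which is consistent with the test headers passing while the full browser-style string slipped through. The unanchored pattern matches both full strings, including the AhrefsBot one, so the regex alone does not appear to explain AhrefsBot requests that still show up in the visitor statistics.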
