Blocking bad bots

nunoleite

June 07, 2018 18:29

Hi! I have seen lots of bots accessing the websites on my VPS. For now I just block their IPs temporarily using CSF, but I would like a better, global solution. I'm considering two options. First: Apache Configuration -> Include Editor -> "Pre Main Include"


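   SetEnvIfNoCase User-Agent "MJ12bot" bad_bots
   SetEnvIfNoCase User-Agent "AhrefsBot" bad_bots
   SetEnvIfNoCase User-Agent "SemrushBot" bad_bots
   SetEnvIfNoCase User-Agent "Baiduspider" bad_bots
   ...
   # "Require not" cannot be used on its own; it must sit inside <RequireAll>,
   # and Require directives need a directory context such as <Directory> or <Location>
   <RequireAll>
     Require all granted
     Require not env bad_bots
   </RequireAll>


  # Apache 2.2-style access control (needs mod_access_compat on Apache 2.4)
  BrowserMatchNoCase "Baiduspider" bots
  BrowserMatchNoCase "HTTrack" bots
  BrowserMatchNoCase "Yandex" bots
  ...
  Order Allow,Deny
  Allow from ALL
  Deny from env=bots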

Second: using ModSecurity rules:

   SecRule REQUEST_HEADERS:User-Agent "CareerBot" "deny,log,noauditlog,severity:2,msg:'Spiderbot blocked',status:403"

I don't know whether this code is 100% correct, as I found it on the internet and have not tested it. Can I get some advice on these two options (Apache or ModSecurity) and on whether this code would work? Thanks, Nuno Leite
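To make the Apache option concrete, here is a rough Python model of what the first snippet does (a sketch, not Apache itself; the function names are hypothetical). SetEnvIfNoCase treats each pattern as a case-insensitive regular expression that may match anywhere in the User-Agent header, and the <RequireAll> block then grants access only when bad_bots is not set.

```python
import re

# Hypothetical model of the SetEnvIfNoCase + <RequireAll> snippet above.
# Each pattern is searched case-insensitively anywhere in the User-Agent.
BAD_BOT_PATTERNS = ["MJ12bot", "AhrefsBot", "SemrushBot", "Baiduspider"]


def sets_bad_bots(user_agent: str) -> bool:
    """True if any pattern would set the bad_bots environment variable."""
    return any(re.search(p, user_agent, re.IGNORECASE) for p in BAD_BOT_PATTERNS)


def is_allowed(user_agent: str) -> bool:
    """'Require all granted' AND 'Require not env bad_bots'."""
    return not sets_bad_bots(user_agent)


for ua in [
    "Mozilla/5.0 (compatible; MJ12bot/v1.4.8)",
    "mozilla/5.0 (compatible; ahrefsbot/5.2)",
    "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0",
]:
    print(is_allowed(ua), ua)  # False, False, True
```

Because the match is unanchored and case-insensitive, a bot name embedded in a longer browser-style string is still caught.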

Comments

12 comments

fuzzylogic

June 08, 2018 07:53
Your ModSecurity rule would not work: it has no id, which is mandatory. Below is a copy of OWASP CRS rule 913102 (a Paranoia Level 2 rule), edited to block all the bots you listed in your examples. I am not recommending that all of these bots be blocked; I am just offering working syntax for whichever bots you decide to block. I appended the digit 1 to the id so that it can never cause a duplicate id error.

   SecRule REQUEST_HEADERS:User-Agent "@rx ^(?:MJ12bot|AhrefsBot|SemrushBot|Baiduspider|HTTrack|Yandex|CareerBot)$" \
     "msg:'Found User-Agent associated with web crawler/bot',\
     severity:'CRITICAL',\
     id:9131021,\
     rev:'1',\
     phase:request,\
     block,\
     t:none,\
     ver:'OWASP_CRS/3.0.0',\
     maturity:'9',\
     accuracy:'9',\
     capture,\
     logdata:'Matched Data: %{TX.0} found within %{MATCHED_VAR_NAME}: %{MATCHED_VAR}',\
     tag:'application-multi',\
     tag:'language-multi',\
     tag:'platform-multi',\
     tag:'attack-reputation-crawler',\
     tag:'OWASP_CRS/AUTOMATION/CRAWLER',\
     tag:'WASCTC/WASC-21',\
     tag:'OWASP_TOP_10/A7',\
     tag:'PCI/6.5.10',\
     tag:'paranoia-level/2',\
     setvar:'tx.msg=%{rule.msg}',\
     setvar:tx.anomaly_score=+%{tx.critical_anomaly_score},\
     setvar:tx.%{rule.id}-OWASP_CRS/AUTOMATION/CRAWLER-%{matched_var_name}=%{matched_var},\
     setvar:ip.reput_block_flag=1,\
     expirevar:ip.reput_block_flag=%{tx.reput_block_duration},\
     setvar:'ip.reput_block_reason=%{rule.msg}'"
nunoleite

June 08, 2018 09:31
Hi! The bots in my examples are not necessarily the ones I need to block, as there are only 3 or 4 that I see most often and that have a big impact on server load. So with this code I can just use ModSecurity Tools, add this custom rule, and change the bot list on the first line to block all the bots I need, right? With this approach I can easily add and remove the bots I need to block across the whole server, right? Thanks
cPanelMichael

June 08, 2018 17:21
So with this code I can just use ModSecurity Tools, add this custom rule, and change the bot list on the first line to block all the bots I need, right? With this approach I can easily add and remove the bots I need to block across the whole server, right?

Hello, you can use the rule as an example, but note that some of its entries are designed for use with the OWASP rule set (see "OWASP ModSecurity CRS" in the cPanel Knowledge Base / cPanel Documentation). Thank you.
fuzzylogic

June 08, 2018 23:06
cPanelMichael is correct: the rule I posted relies on other OWASP CRS rules to do the blocking. It was wrong of me to assume you had the CRS in your environment. Here is another example, based on the faulty rule you posted. It will work as a standalone rule or alongside any rule set I know of:

   SecRule REQUEST_HEADERS:User-Agent "@rx ^(?:MJ12bot|AhrefsBot)$" "msg:'Spiderbot blocked',phase:1,log,id:777777,t:none,block,status:403"

I tested this rule and it returned a 403 status for requests with either of the following headers:

   User-Agent: MJ12bot
   User-Agent: AhrefsBot

So with this code I can just use ModSecurity Tools, add this custom rule, and change the bot list on the first line to block all the bots I need, right?

That is correct. Add bots so that the regex has this form ^(?:bot1|bot2|bot3)$
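As a small illustration of that form, here is a hypothetical Python helper (build_bot_regex is not part of ModSecurity or cPanel) that joins a list of bot names into the ^(?:bot1|bot2|bot3)$ pattern and checks it. re.escape guards against names containing regex metacharacters. The last check shows that this anchored form only matches a header consisting of exactly the bot name.

```python
import re


def build_bot_regex(bots):
    """Hypothetical helper: join bot names into ^(?:bot1|bot2|bot3)$."""
    return "^(?:" + "|".join(re.escape(b) for b in bots) + ")$"


pattern = build_bot_regex(["MJ12bot", "AhrefsBot"])
print(pattern)  # ^(?:MJ12bot|AhrefsBot)$

rx = re.compile(pattern)
print(bool(rx.search("MJ12bot")))    # True
print(bool(rx.search("AhrefsBot")))  # True
print(bool(rx.search("Mozilla/5.0 (compatible; MJ12bot/v1.4.8)")))  # False
```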
With this approach I can easily add and remove the bots I need to block across the whole server, right?

ModSecurity rules added through Home » Security Center » ModSecurity Tools » Rules List » Add Rule are applied to all HTTP requests to the cPanel server.
nunoleite

June 11, 2018 11:51
Thanks fuzzylogic. That's what I was looking for: a simple rule that can block these bad bots. I don't have the OWASP rules installed because some time ago I tried them and they caused lots of problems with some CMSs on the server, and I didn't investigate which rules to enable or disable to make them compatible. I will try this new SecRule, thanks. What about the other option, using Apache configuration? Is it valid, or is ModSecurity better? Thanks
cPanelMichael

June 11, 2018 14:23
What about the other option, using Apache configuration? Is it valid, or is ModSecurity better?

ModSecurity rules are the better option in my opinion. They make it easier for you to exclude rules for specific accounts if necessary. Thank you.
nunoleite

June 12, 2018 11:30
Hi! I have published this rule:

   SecRule REQUEST_HEADERS:User-Agent "@rx ^(?:AhrefsBot|MJ12bot|Yandex)$" "msg:'Spiderbot blocked',phase:1,log,id:777777,t:none,block,status:403"

But I still see visitors with the following user agent, and they aren't being blocked:

   Mozilla/5.0 (compatible; MJ12bot/v1.4.8; MJ12Bot | Home | from Majestic)
fuzzylogic

June 13, 2018 11:38
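The ^ and $ anchors make the regex match only when the entire User-Agent value is one of the bot names, so a full browser-style string like the one above never matches. If you want to match a fragment of the User-Agent, you need a looser, unanchored regular expression:

   SecRule REQUEST_HEADERS:User-Agent "@rx (?:MJ12bot|AhrefsBot)" "msg:'Spiderbot blocked',phase:1,log,id:777777,t:none,block,status:403"
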
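To see why the anchored rule missed the reported user agent, here is a small Python check. Python's re is only a stand-in for the PCRE matching that @rx performs; for simple literal alternations like these the two behave the same.

```python
import re

# The user agent reported in the thread (its URL part was not preserved).
ua = "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; MJ12Bot | Home | from Majestic)"

anchored = re.compile(r"^(?:MJ12bot|AhrefsBot)$")  # whole value must be a bot name
loose = re.compile(r"(?:MJ12bot|AhrefsBot)")       # bot name may appear anywhere

print(bool(anchored.search(ua)))  # False
print(bool(loose.search(ua)))     # True
```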
You may also want to cover variations in upper and lower case. You can do this by having ModSecurity transform every User-Agent value to lowercase (t:lowercase) and then entering the bot names in the regex in all lowercase:

   SecRule REQUEST_HEADERS:User-Agent "@rx (?:ahrefsbot|mj12bot|yandex)" "msg:'Spiderbot blocked',phase:1,log,id:777777,t:lowercase,block,status:403"
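A minimal Python model of the t:lowercase approach (a sketch of the idea, not ModSecurity's code): the value is lowercased before the regex runs, so any pattern written in lowercase matches every case variant, while a mixed-case pattern can never match.

```python
import re

# Model of t:lowercase: lowercase the value first, then apply the @rx pattern.
rule = re.compile(r"(?:ahrefsbot|mj12bot|yandex)")


def matches_after_lowercase(user_agent: str) -> bool:
    """Apply a lowercase transform, then search for any bot name."""
    return rule.search(user_agent.lower()) is not None


print(matches_after_lowercase("Mozilla/5.0 (compatible; MJ12Bot/v1.4.8)"))  # True
print(matches_after_lowercase("Mozilla/5.0 (compatible; AhrefsBot/5.2)"))   # True
print(matches_after_lowercase("Mozilla/5.0 Firefox/60.0"))                   # False

# A mixed-case pattern can never match a lowercased value:
print(re.search(r"MJ12bot", "MJ12Bot".lower()) is not None)  # False
```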
nunoleite

June 13, 2018 14:26
Hi! With this rule:

   SecRule REQUEST_HEADERS:User-Agent "@rx (?:AhrefsBot|MJ12bot|Yandex)" "msg:'Spiderbot blocked',phase:1,log,id:777777,t:none,block,status:403"

I think MJ12bot is being blocked, but I still see this user agent:

   Mozilla/5.0 (compatible; AhrefsBot/5.2; +ahrefs.com/robot/)

Hmm... strange.
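One quick way to narrow this down is to check the rule's regex against the exact user agent. The Python check below (again using re as a stand-in for @rx) shows that the pattern does match this AhrefsBot string, which suggests the cause lies outside the regex itself rather than in the pattern.

```python
import re

# Regex from the rule above and the user agent reported as still being seen.
rule = re.compile(r"(?:AhrefsBot|MJ12bot|Yandex)")
ua = "Mozilla/5.0 (compatible; AhrefsBot/5.2; +ahrefs.com/robot/)"

match = rule.search(ua)
print(match.group(0) if match else None)  # AhrefsBot
```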
nunoleite

June 13, 2018 14:49
Hi! I have made some changes. I have added these 3 rules:

   SecRule REQUEST_HEADERS:User-Agent "@rx (?:AhrefsBot)" "msg:'AhrefsBot Spiderbot blocked',phase:1,log,id:7777771,t:none,block,status:403"
   SecRule REQUEST_HEADERS:User-Agent "@rx (?:MJ12bot)" "msg:'MJ12bot Spiderbot blocked',phase:1,log,id:7777772,t:none,block,status:403"
   SecRule REQUEST_HEADERS:User-Agent "@rx (?:Yandex)" "msg:'Yandex Spiderbot blocked',phase:1,log,id:7777773,t:none,block,status:403"

With these 3 rules I can see in the Hits List what is going on with each bot and which rule it is hitting, because they are logged separately. If I understand correctly, the bots can still reach the sites, but they receive 0 bytes and an HTTP 403 error. After some hits, the CSF firewall permanently blocks the IP. Is this right? And is this the right behavior?
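A hypothetical Python model of why one rule per bot makes the hits easier to read: each pattern carries its own rule id, so a match can be attributed to a specific bot. This only models the pattern matching, not how ModSecurity orders or stops rule processing.

```python
import re
from typing import List

# Hypothetical model of the three single-bot rules above, keyed by rule id.
RULES = {
    7777771: re.compile(r"(?:AhrefsBot)"),
    7777772: re.compile(r"(?:MJ12bot)"),
    7777773: re.compile(r"(?:Yandex)"),
}


def matching_rule_ids(user_agent: str) -> List[int]:
    """Return the ids of every rule whose pattern occurs in the user agent."""
    return [rule_id for rule_id, rx in RULES.items() if rx.search(user_agent)]


print(matching_rule_ids("Mozilla/5.0 (compatible; AhrefsBot/5.2; +ahrefs.com/robot/)"))  # [7777771]
print(matching_rule_ids("Mozilla/5.0 (compatible; YandexBot/3.0)"))  # [7777773]
print(matching_rule_ids("Mozilla/5.0 Firefox/60.0"))  # []
```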
cPanelMichael

June 13, 2018 14:53
After some hits, the CSF firewall permanently blocks the IP. Is this right? And is this the right behavior?

Hello @nunoleite, yes, this is the case as long as you have enabled the LF_MODSEC feature in CSF. Thank you.
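CSF's internals are not shown in this thread, so the following is only a rough, hypothetical Python model of threshold-based blocking of the kind described here, not CSF's implementation. Each ModSecurity hit from an IP is counted, and once an IP collects THRESHOLD hits within WINDOW seconds it stays blocked. HitTracker, THRESHOLD, and WINDOW are illustrative names, not CSF settings.

```python
from collections import defaultdict, deque

# Illustrative values only; the real thresholds come from your CSF configuration.
THRESHOLD = 5   # hits from one IP before it is blocked
WINDOW = 3600   # seconds within which those hits must occur


class HitTracker:
    """Hypothetical model: block an IP after enough hits inside a time window."""

    def __init__(self, threshold=THRESHOLD, window=WINDOW):
        self.threshold = threshold
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of recent hits
        self.blocked = set()            # IPs blocked for good in this model

    def record_hit(self, ip: str, now: float) -> bool:
        """Record one hit at time `now`; return True if the IP is now blocked."""
        if ip in self.blocked:
            return True
        recent = self.hits[ip]
        recent.append(now)
        # Forget hits that fell out of the time window.
        while recent and now - recent[0] > self.window:
            recent.popleft()
        if len(recent) >= self.threshold:
            self.blocked.add(ip)
        return ip in self.blocked


tracker = HitTracker()
print([tracker.record_hit("203.0.113.7", t) for t in range(5)])
# [False, False, False, False, True]
```

In this model the bot still receives 403 responses for its first few requests, and only after crossing the threshold is the IP blocked entirely.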
nunoleite

June 13, 2018 15:31
Hi! Thanks. Now I think this is working fine, as I can see lots of hits on the rules and some being blocked. But when I analyze visitors, it seems that AhrefsBot is still being served. Is this possible?
