Huge increase in Apache processes

GoWilkes

June 01, 2019 14:00

I'm having a problem that I can't figure out, and I'm wondering if it's cPanel related. If not, maybe you guys will have an idea of how to narrow it down.
Yesterday from around 5:30am until 7am, I had a huge increase in Apache processes that caused my server to freeze up. I normally don't have more than 50 or so processes during my peak time, but during this period it was hitting the Server Limit of 100 that I had set in the Apache configuration. By the time I saw it, though, it had ended. Then at around 4pm, it started again. This time I was there to see it, but couldn't find any reason for it. I checked the number of connections using:
netstat -plan | grep :80 | awk '{print $5}' | cut -d : -f 1 | sort | uniq -c | sort -nr | head
but didn't see anything unexpected. I restarted Apache, then MySQL, then rebooted the entire server, but none of that had any impact. I was able to stop the server from freezing up by increasing Server Limit in the Apache configuration to 256, but that's just a Band-aid. My number of Apache processes has stayed between 100 and 150 all night and all day, even when netstat showed that I only had 4 or 5 connections.
It's also notable that "Individual Interrupts" and "Disk Latency" in Munin went crazy at the same time. I'm not sure what "Individual Interrupts" means, but an orange graph that's usually near 1e+02 dropped down below 1e-04. And under "Disk Latency", /dev/xvdb/ has a green graph that's usually at around 1e-02 that dropped down to 1e-04. That made me suspect hardware failure, but I messaged Softlayer (who has the worst service now) and they said that since it's a virtual server, I wouldn't see hardware errors like that. So I'm not sure if the change in Interrupts and Latency is relevant, or just a symptom of another problem.
I'm running CentOS 6.10 xen hvm, and WHM is v 76.0.20. I'm still running EasyApache 3, so WHM/cPanel hasn't updated to 78.
Any suggestions you guys can give would be greatly appreciated!! Thanks in advance!
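A side note on the netstat one-liner in this post: `grep :80` matches any line containing ":80" (including ports such as 8080), it misses 443 entirely, and cutting on the first colon mangles entries like ::ffff:203.0.113.7. A minimal sketch of a stricter per-IP count could look like the following. The function name count_web_conns_by_ip is made up for illustration, and it assumes the usual Linux `netstat -ant` column order (local address in column 4, remote address in column 5, state in column 6).

```shell
# Count connections per remote IP for local ports 80 and 443.
# Reads `netstat -ant` style output on stdin (hypothetical helper).
count_web_conns_by_ip() {
    awk '
        # Keep TCP rows whose local address ends in :80 or :443,
        # and skip the listening sockets themselves.
        $1 ~ /^tcp/ && $4 ~ /:(80|443)$/ && $6 != "LISTEN" {
            ip = $5
            sub(/:[0-9]+$/, "", ip)   # strip only the trailing :port
            sub(/^::ffff:/, "", ip)   # unwrap IPv4-mapped IPv6 addresses
            count[ip]++
        }
        END { for (ip in count) print count[ip], ip }
    ' | sort -k1,1nr -k2,2
}

# Example:
#   netstat -ant | count_web_conns_by_ip | head
```

Matching on the local port rather than anywhere in the line keeps outbound connections and unrelated ports out of the count.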

Comments

16 comments

GOT

June 01, 2019 20:27
Well, it sounds like you are getting some kind of DoS attack. This command: /usr/bin/lynx -dump -width 500
0
GoWilkes

June 02, 2019 01:30
Thanks for the commands, those are very helpful! I didn't think about changing it from :80 after I moved everything to HTTPS. But I'm still not seeing a high number of connections. From the first command, I have 28 connections right now, but Munin is showing about 100 Apache processes, roughly double the number that I had at this time on May 30. And using the second command, the IP with the highest number of connections is a local IP, with 13 connections (pretty much what I would expect). I even blocked all non-US IP addresses in CSF (firewall) using CC_ALLOW_FILTER (only allowing US), but it had no noticeable impact on the problem.
0
GOT

June 02, 2019 02:50
You might want to read the docs on that csf filter. If my memory serves me, I don't think it works like you're expecting. As for Munin, I would not necessarily use that for real-time diagnostics. ps axf|grep httpd|wc will give you a live count of Apache processes. From your numbers it doesn't sound like an attack, but you should look at your general Apache settings. I believe the default max children/servers is 150, and if you are exceeding that then pages won't load. I would also look at Apache status in WHM, because sometimes your eyes can show you things that just getting numbers from commands doesn't reveal.
0
GoWilkes

June 02, 2019 05:42
This is what I was going by on CC_ALLOW_FILTER:
"An alternative to CC_ALLOW is to only allow access from the following countries but still filter based on the port and packets rules. All other connections are dropped"
And this: crybit.com/block-whole-countries-csf/
I used the command you posted (ps axf|grep httpd|wc) and this was the result:
46 366 3106
There wasn't a column header, though, so I'm not sure what I'm looking at here. It's 1:30am here right now, and the 46 matches what Munin shows for the current number of processes, but I'm not sure what the 366 or 3106 represent. Regardless, I would usually have 46 processes at peak time, not at 1:30. It should be more like 15-20 right now.
"From your numbers it doesn't sound like an attack but you should look at your general Apache settings. I believe the default max children/servers is set to 150 by default and if you are exceeding that then pages won't load"
You're right, and that turned out to be why my site was freezing up. Raising the number stopped it from freezing, but I have no clue why it increased in the first place :-(
"I would also look at Apache status in whm because sometimes your eyes can show you things that just getting numbers from commands doesn't reveal."
Possibly because of the increase I made on Max Clients and Server Limit, but I do have about 100 of these:
::1 myservername.com OPTIONS * HTTP/1.0
I'm guessing that's normal, though... 100+/- free slots?
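On the three numbers: with no options, wc prints the line, word, and byte counts of its input, so in "46 366 3106" only the first figure is a count of matching lines. That pipeline also counts the grep process itself whenever it shows up in the ps listing. A small sketch of a count that leaves the helper out is below; count_httpd is a hypothetical name, and it simply filters ps output supplied on stdin.

```shell
# Count lines of `ps` output that mention httpd, excluding the
# grep helper's own entry (hypothetical helper, reads stdin).
count_httpd() {
    # -c prints the number of matching lines; "|| true" keeps the
    # exit status at zero even when the count is 0.
    grep 'httpd' | grep -vc 'grep' || true
}

# Example:
#   ps ax | count_httpd
# With procps, `ps -C httpd -o pid= | wc -l` selects by command name
# and should give a comparable figure without any grep at all.
```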
0
MaFt

June 03, 2019 06:19
I'm following this as I've seen exactly the same thing. For years my sites averaged 4-6 Entry Processes, and suddenly on Friday around 4-5pm UK time I was hitting "resource limit is reached" errors, as these are limited to 20 on this server. I'm a reseller, though, and have no control over the limits. I've managed to minimise this by shutting down 1 site completely and using Cloudflare's "I'm under attack" mode to reduce the number of visitors. Not ideal, though, as it's meant a 40% loss of income over the weekend compared to normal, but at least the sites are online.
The hosts are being painfully slow and keep saying they'll increase the limits. They still haven't. They also haven't responded to my main query as to why the sites in question, with no changes at my end, are suddenly being reported as using a lot more processes than previously. Looking at the cPanel "concurrent usage" logs for 30 days you can see the sudden spike from Friday.
It seems very weird that the only similar thing I can find is this post, where the same issue also started on Friday, at around the same time (assuming the original poster is in the US). I'm hopeful my hosts can find out what's going on and I'll certainly report back here if they find anything out.
0
GoWilkes

June 03, 2019 18:14
You're right, MaFt, I'm in eastern US. That's too much to be a coincidence, I think. I ran ClamAV and rkhunter, and neither found anything, so I'm ruling out a virus on my end.
Right now (roughly 2pm EST) I have 101 busy Apache servers, but only 46 connections. The IP with the most connections has 13, which is reasonable, so I think that I can rule out a DDoS attack. My RAM usage is high, too; I'm usually at around 3G at this time of day, but it's currently over 4G (I have 4G of RAM, so it's maxing out). My CPU load is fine, though: 0.87, and since I have 2 CPUs a load of 2 would be a normal-high.
MaFt, there's no excuse for your host to be dragging their feet on increasing the limits. It literally takes 30 seconds, and the restart of Apache might have a downtime of less than 1 second. It doesn't solve the problem, but it definitely helps with the symptom (and should bring your revenue back on track).
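Since RAM is maxing out while the process count has doubled, a rough per-process memory figure can help judge how many Apache children 4G can actually hold. The following is only a sketch: summarize_rss is a hypothetical helper that reads one resident-set size in KiB per line, and because RSS counts shared pages once per process, the total will overstate real usage.

```shell
# Summarize resident memory from a list of RSS values (KiB, one per
# line) on stdin. Prints process count, total MB, and average MB.
summarize_rss() {
    awk '
        NF { total += $1; n++ }
        END {
            if (n == 0) { print "processes=0"; exit }
            # %d truncates to whole megabytes.
            printf "processes=%d total_mb=%d avg_mb=%d\n", n, total / 1024, total / n / 1024
        }
    '
}

# Example (procps ps; "rss=" suppresses the header line):
#   ps -C httpd -o rss= | summarize_rss
```

Dividing the memory left over after MySQL and the OS by the average gives a ballpark upper bound for MaxClients, which is a gentler limit than letting the box swap.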
0
cPanelMichael

June 03, 2019 19:40
Hello Everyone, Can anyone affected by this issue verify if the Prefork MPM is enabled? You can execute the following command to check: rpm -qa|grep mpm
If so, verify if any recent entries like the one below exist in /usr/local/apache/logs/error_log: AH00144: couldn't grab the accept mutex
Thank you.
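If the rpm query prints nothing, the running binary can also be asked which MPM it was built with, since `httpd -V` reports a "Server MPM" line. The sketch below wraps both checks; the function names are hypothetical, and the log search matches on the message text rather than the AH00144 code on the assumption that older Apache builds log the message without an AH prefix.

```shell
# Extract the MPM name from `httpd -V` output supplied on stdin.
mpm_from_httpd_v() {
    awk -F': *' '/^Server MPM/ { print tolower($2) }'
}

# Count error-log lines that report the accept-mutex failure.
# $1: path to the Apache error log.
count_mutex_errors() {
    if [ ! -r "$1" ]; then
        echo "cannot read $1" >&2
        return 1
    fi
    # "|| true" keeps the exit status at zero when the count is 0.
    grep -c "couldn't grab the accept mutex" "$1" || true
}

# Example (paths as mentioned in this thread):
#   /usr/local/apache/bin/httpd -V | mpm_from_httpd_v
#   count_mutex_errors /usr/local/apache/logs/error_log
```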
0
GoWilkes

June 04, 2019 02:29
I SSH'ed in to my server as root via PuTTY, ran rpm -qa|grep mpm, and basically nothing happened. It ran for about 2 seconds, then just gave me the prompt again. In /usr/local/apache/conf/httpd.conf, though, the only reference to prefork is here:
Timeout 60
TraceEnable Off
ServerSignature Off
ServerTokens ProductOnly
FileETag None
StartServers 15
MinSpareServers 10
MaxSpareServers 20
MinSpareServers 10
MaxSpareServers 20
ServerLimit 256
MaxClients 150
MaxRequestsPerChild 10000
KeepAlive On
KeepAliveTimeout 5
MaxKeepAliveRequests 100
I checked my error_log anyway, but didn't find any reference to "mutex". The oldest entry was May 31, about 12 hours before this problem began the first time. I looked through it, and don't see any errors other than requests for pages that don't exist, and a handful of errors that I see all the time and don't understand, but I doubt that they're related to this:
RewriteOptions: MaxRedirects option has been removed in favor of the global LimitInternalRecursion directive and will be ignored.
Hostname X provided via SNI and hostname example.com provided via HTTP are different
Thanks, Michael!
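In the settings pasted above, ServerLimit is 256 while MaxClients is 150, and some directives appear twice. A quick way to see the effective process limits at a glance is to pull just those directives out of the file. This is a minimal sketch, with show_prefork_limits as a made-up name; it only reads the file and does not follow Include directives.

```shell
# Print the process-limit directives from an Apache config file as
# name=value pairs, one per line, with duplicates collapsed.
# $1: path to httpd.conf
show_prefork_limits() {
    grep -E '^[[:space:]]*(StartServers|MinSpareServers|MaxSpareServers|ServerLimit|MaxClients|MaxRequestsPerChild)[[:space:]]' "$1" |
        awk '{ print $1 "=" $2 }' |
        sort -u
}

# Example:
#   show_prefork_limits /usr/local/apache/conf/httpd.conf
```

If the same directive shows up with two different values, that is worth chasing before tuning anything else.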
0
dalem

June 04, 2019 14:03
Are you running a lot of WordPress sites? What you are describing sounds like just your run-of-the-mill Layer 7 attack, which happens 24/7, 365 days a year, non-stop from bots: the typical wp-login & xmlrpc attacks. I have noticed that some of the bots have a new plan: instead of rapid-fire brute force, they connect and reconnect, or do one in-and-out and switch to a new IP, which allows them to not get banned as easily. So we did not notice right away what was going on. A good custom mod security rule stops them in their tracks.
One of our servers has been acting up as you described a couple of times a day, and we realized one of our clients' multiple Magento installs was getting hammered. We added a mod security rule and all is well now (well, almost: as soon as all the botnet IPs get banned). We also realized that for some reason our WordPress mod security rule was not working, which did not help.
0
MaFt

June 04, 2019 14:40
I have 2 WordPress installs on the hosting I mentioned in my reply. Can you expand on what the "good mod security rule" would be?
0
dalem

June 04, 2019 16:04
One that blocks bots ("connections with no referrer"). Like this one
xmlrpc is the same rule: just change the wp-login.php to xmlrpc and change the mod security ID. Then set up your firewall to ban them. The time to ban will be entirely up to you and is server specific. We have ours set to block permanently after 1 hit, as the more WordPress sites on a server, the more connections there will be. Make sure it works as expected, as different server setups seem to behave differently.
0
cPanelMichael

June 04, 2019 16:12
Hello @GoWilkes, Thank you for sharing the additional information. The issue reported on this thread does not appear related to the case quoted below, but feel free to test out the temporary workaround if the affected system uses the Prefork MPM to see if it has any impact on the reported issue:
Internal case EA-8508 was recently opened to address an issue where an update to the ea-apr RPM led to instability on some systems using the Prefork MPM. The temporary workaround for affected systems is to execute the following command:
echo "Mutex sysvsem" >> /etc/apache2/conf.modules.d/000_mod_mpm_prefork.conf; /scripts/rebuildhttpdconf; /scripts/restartsrv_httpd --hard
Note the above command includes a restart of the Apache service. We're tentatively planning to publish a fix for this case in the next EasyApache 4 release (you can follow the EA4 Change Log support ticket so we can rule out any issues with cPanel & WHM? Post the ticket number here and I'll link this thread to it. Thank you.
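Because the workaround chains its steps with semicolons, the rebuild and hard restart run even when the echo fails, for instance when the prefork config file does not exist. A more defensive variant might check for the file first and avoid appending the directive twice. This is a sketch only, and append_mutex_directive is a hypothetical helper, not a cPanel script.

```shell
# Append "Mutex sysvsem" to the given config file, but only if the
# file exists and does not already contain that exact line.
# $1: path to the prefork MPM config file
append_mutex_directive() {
    conf="$1"
    if [ ! -f "$conf" ]; then
        echo "skip: $conf not found" >&2
        return 1
    fi
    if grep -qx 'Mutex sysvsem' "$conf"; then
        echo "unchanged"
    else
        echo 'Mutex sysvsem' >> "$conf"
        echo "appended"
    fi
}

# Example: && stops the chain if the file is missing, so Apache is
# only rebuilt and restarted when the directive is in place.
#   append_mutex_directive /etc/apache2/conf.modules.d/000_mod_mpm_prefork.conf \
#       && /scripts/rebuildhttpdconf \
#       && /scripts/restartsrv_httpd --hard
```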
0
dalem

June 04, 2019 16:22
PS: this was just a guess as to what your issue is. On our server it was definitely the issue. You can do a quick check and see how many foreign IPs are brute forcing:
grep -ir wp-login.php /var/log/apache2/domlogs
grep -ir wp-admin /var/log/apache2/domlogs
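The grep commands list every matching line; to see which addresses are responsible, the hits can be grouped per client IP. Below is a minimal sketch that assumes the domlogs use the common/combined access log layout (client IP in field 1, request path in field 7). count_wp_hits_by_ip is a made-up name.

```shell
# Count requests for wp-login.php or xmlrpc.php per client IP.
# Takes log files as arguments, or reads stdin when none are given.
count_wp_hits_by_ip() {
    awk '
        $7 ~ /wp-login\.php|xmlrpc\.php/ { hits[$1]++ }
        END { for (ip in hits) print hits[ip], ip }
    ' "$@" | sort -k1,1nr -k2,2
}

# Example:
#   count_wp_hits_by_ip /var/log/apache2/domlogs/* | head
```

A long tail of addresses with one or two hits each would fit the connect-once-and-rotate pattern described earlier in the thread, while a handful of lines overall points elsewhere.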
0
GoWilkes

June 05, 2019 07:12
Michael, it turns out that I don't have Prefork, after all. This was the result when I ran the commands you gave:
-bash: /etc/apache2/conf.modules.d/000_mod_mpm_prefork.conf: No such file or directory
Built /usr/local/apache/conf/httpd.conf OK
Waiting for "httpd"httpd"
Service Status
httpd (/usr/local/apache/bin/httpd -k start) is running as root with PID 4923 (pidfile+/proc check method).
Startup Log
[Wed Jun 05 02:50:50 2019] [error] VirtualHost *:443 -- mixing * ports and non-* ports with a NameVirtualHost address is not supported, proceeding with undefined results
Log Messages
[Wed Jun 05 02:50:51 2019] [notice] ModSecurity for Apache/2.9.0 (http://www.modsecurity.org/) configured.
[Wed Jun 05 02:50:51 2019] [notice] suEXEC mechanism enabled (wrapper: /usr/local/apache/bin/suexec)
httpd restarted successfully.
I'm a tad concerned about the error message, considering that all of the accounts on the server were created with WHM and I haven't manually edited httpd.conf in years... probably not since I got this server, honestly. All of my sites seem to be running so I don't think it's a fatal error, but I definitely wasn't expecting it!
@dalem, that was a great thought, but unfortunately not my issue :-( My log files were at: /usr/local/apache/domlogs/[USERNAME]/[DOMAIN.COM] I already test for references to wp-admin and wp-login via PHP and block IPs, but not at the firewall, so it was an idea! But I only had 5 references to wp-login, and 2 to wp-admin. So that wasn't the culprit, either.
@GOT, just FYI, it looks like CC_ALLOW_FILTER isn't blocking non-US IPs the way I'd hoped, so you could be right on that one. I was manually adding RIPE, APNIC, and LACNIC IP ranges but removed them in favor of CC_ALLOW_FILTER a few days ago. I didn't notice an increase in processes or anything, but I just now looked and saw that I have 7 RIPE connections.
But anyway... no change on my end. I still have almost double the number of processes, my RAM usage is off the charts, etc. I'm at a complete loss.
0
dalem

June 05, 2019 15:10
"My log files were at: /usr/local/apache/domlogs/[USERNAME]/[DOMAIN.COM]"

You are still running EasyApache 3 (EOL); best to think about upgrading to EasyApache 4.
0
GoWilkes

June 06, 2019 01:30
I am... I'm procrastinating for 2 reasons:
1. I always wait until the last minute for software updates, to let everyone else figure out the bugs before I deal with them; and
2. Nothing in the documentation has commented on potential downtime while waiting for it to update, so I'm waiting for a time when I have a few hours to possibly wait, and then another few hours to sort out bugs before the next business day.
0
