Troubleshooting suggestions for repeated site timeouts
I use a service to monitor the sites on my VPS for uptime, and lately I have been getting notifications several times a day that all of the sites on the server are down. A few minutes later (usually 5-20 min) I get a new notification saying they are back up. This has been happening often enough that I have been trying to catch it in the act. When I have been able to, it appears that users just get a basic timeout error when visiting any of the sites. Yet while the sites are down I can log into WHM and see that it is as fast and responsive as normal. The load averages on the server are usually 2 or less, and all of the services appear to be up and running.
What should I do to troubleshoot this? If it were a once-in-a-while thing I would just chalk it up to an automatic update, but it has been happening a lot more often lately. Which logs should I be looking at, and what entries should I be looking for in them? Are there other things in WHM I should be checking?
Check the web server's error log to see whether it is reaching its limits. For Apache:

    # grep MaxRequestWorkers /etc/apache2/logs/error_log
There are quite a few lines like that:

    [Fri Sep 08 08:47:41.731700 2023] [mpm_prefork:error] [pid 24144] AH00161: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting
    [Wed Sep 13 22:06:27.464856 2023] [mpm_prefork:error] [pid 20648] AH00161: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting
    [Wed Sep 20 21:34:09.125013 2023] [mpm_prefork:error] [pid 20710] AH00161: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting

but none from today or yesterday, which is when the problems are occurring most frequently. It just now went down again.
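To see at a glance which days actually hit the limit, the grep above can be extended to count AH00161 entries per day. This is only a sketch: the function name `count_limit_hits_per_day` is made up for illustration, and it assumes the timestamp layout shown in the log lines quoted above.

```shell
# Count "server reached MaxRequestWorkers" (AH00161) entries per day.
# Assumes lines start like: [Fri Sep 08 08:47:41.731700 2023] ...
# Prints "YEAR MONTH DAY COUNT", in the order the days appear in the log.
count_limit_hits_per_day() {
  awk '
    /AH00161/ {
      year = $5
      sub(/\]/, "", year)            # "2023]" -> "2023"
      day = year " " $2 " " $3       # e.g. "2023 Sep 08"
      if (!(day in hits)) order[++n] = day
      hits[day]++
    }
    END { for (i = 1; i <= n; i++) print order[i], hits[order[i]] }
  ' "$1"
}

# Usage (path taken from the command earlier in this thread):
# count_limit_hits_per_day /etc/apache2/logs/error_log
```

If the days with outages do not show up in this list, the error log may have been rotated, or Apache may only log the message once per start, so an empty result for today does not by itself rule out worker exhaustion.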
Based on your suggestion I did some searching and found a cPanel page talking about this. I used a different command:

    apachectl status

and got this:

    Current Time: Saturday, 23-Sep-2023 22:47:37 UTC
    Restart Time: Wednesday, 20-Sep-2023 21:29:14 UTC
    Parent Server Config. Generation: 2
    Parent Server MPM Generation: 1
    Server uptime: 3 days 1 hour 18 minutes 22 seconds
    Server load: 1.14 1.46 1.73
    Total accesses: 1967288 - Total Traffic: 102.3 GB - Total Duration: 5033614027
    CPU Usage: u22.76 s117.01 cu11213.8 cs3099.42 - 5.48% CPU load
    7.45 requests/sec - 406.4 kB/second - 54.5 kB/request - 2558.66 ms/request
    150 requests currently being processed, 0 idle workers

    RKRRRRRRRRRRRWRRKRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
    RRRRRRRRRRRRRRRRRRCRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
    RRRRRRRRRRRRRRRRRRRRRR

    Scoreboard Key:
    "_" Waiting for Connection, "S" Starting up, "R" Reading Request,
    "W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
    "C" Closing connection, "L" Logging, "G" Gracefully finishing,
    "I" Idle cleanup of worker, "." Open slot with no current process

I guess I have no empty slots? I also found this:

    egrep 'MaxRequestWorkers|ServerLimit' /etc/apache2/conf/httpd.conf

which gave me:

    ServerLimit 256
    MaxRequestWorkers 150

And according to free I have this for available memory:

            total   used   free   shared   buff/cache   available
    Mem:    8.0G    6.8G   319M   999M     863M         1.1G
    Swap:   8.0G    331M   7.7G
    Total:  16G     7.2G   8.0G

I went ahead and changed MaxRequestWorkers to 175 and restarted Apache. The workers filled back up within just a few minutes, and now free memory reports as zero. Any ideas what I should do? Am I just getting hit with too many requests? The server has run for years with very few problems, and the sites are typically very low usage.
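Two things in that output can be turned into numbers. The sketch below tallies the scoreboard by worker state, and gives a rough ceiling for MaxRequestWorkers from a memory budget. Both function names (`tally_scoreboard`, `estimate_max_workers`) are hypothetical helpers, not part of Apache or cPanel, and the memory formula is a common rule of thumb rather than anything stated in this thread.

```shell
# Tally worker states from an Apache scoreboard string (the R/K/W/_ block
# printed by "apachectl status"). Prints one "STATE COUNT" pair per line.
tally_scoreboard() {
  printf '%s\n' "$1" | awk '
    {
      n = length($0)
      for (i = 1; i <= n; i++) {
        ch = substr($0, i, 1)
        if (ch != " " && ch != "\t") count[ch]++
      }
    }
    END { for (ch in count) print ch, count[ch] }
  ' | LC_ALL=C sort
}

# Rough ceiling for MaxRequestWorkers: the memory you are willing to give
# Apache divided by the average size of one worker, both in kB.
# This is an estimate only; it ignores memory shared between workers.
estimate_max_workers() {
  budget_kb=$1
  avg_worker_kb=$2
  if [ "$avg_worker_kb" -le 0 ]; then
    echo "average worker size must be positive" >&2
    return 1
  fi
  echo $(( budget_kb / avg_worker_kb ))
}

# Example with made-up numbers: a 4 GB budget and 40 MB per worker.
# estimate_max_workers 4194304 40960
```

A scoreboard that is almost entirely "R" (Reading Request) often means clients are holding connections open without finishing their requests, though that is an interpretation and not something confirmed here. For the second function, the average worker size could be measured with something like `ps -C httpd -o rss=`, assuming the Apache processes are named `httpd` on your system. If the estimate comes out below the current MaxRequestWorkers, raising the setting is likely to push the server into swap rather than help.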
I would run the following command to see what Apache is doing in real time:

    apachectl fullstatus

This will show you all the connections being handled by Apache, and you should be able to see whether the same IP or network is making many requests to your server in an attack. If it is, I would recommend installing Apache's mod_evasive to help stop small attacks and see if that resolves the issue.
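With 150 or more rows in the fullstatus output, spotting a repeat offender by eye is tedious. A minimal sketch for ranking client addresses follows. The function name `top_clients` is made up for illustration, and it simply pulls anything that looks like an IPv4 address out of the text, so IPv6 clients are missed and any other IPv4-looking text in the output is counted too.

```shell
# Rank IPv4 addresses found on stdin by how often they appear.
# Prints "COUNT ADDRESS", busiest first; optional argument limits the rows.
top_clients() {
  grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' |
    awk '{ seen[$0]++ } END { for (ip in seen) print seen[ip], ip }' |
    sort -rn |
    head -n "${1:-10}"
}

# Usage:
# apachectl fullstatus | top_clients 10
```

If one address or a handful of addresses in the same network hold most of the slots, that supports the attack theory. If the slots are spread evenly over many unrelated addresses, a per-IP limit like mod_evasive is less likely to help.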