Tracking down source of load spike
I got an email alert that my server load was 75+ :-O By the time I got to my PC to see what was up, though, the load was already back to normal.
The email includes an apachestatus.html attachment, in which I see:
0-0 | 11008 | 0/166/33797 | R | 75.74 | 0 | 2453 | 547158 | 0.0 | 1.97 | 409.81 | 172.70.134.134 | h2 | example.com:443 | [1/0] read: stream 0, |
1-0 | 13430 | 0/4/37803 | _ | 2.10 | 0 | 243 | 670625 | 0.0 | 0.01 | 456.70 | 123.45.67.89 | http/1.1 | example.com:443 | GET /legit_page HT |
Above that were 24 more connections, all with what appears to be a load of 75+.
Am I right that the connection to /legit_page had a normal load of 2.10, but the Cloudflare connection (172.70.134.134) stuck in "read: stream" suddenly caused it to spike to 75.74?
If so, any suggestions on tracking down what could have caused it to spike?
-
It is very difficult to tell after the fact. You can try looking in /var/cpanel/dcpumon to see if there are any 'top' snapshots from that period.
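In case the directory does exist on your install, something like this would list the most recent snapshots (the path is the cPanel default and an assumption on my part; it may simply not be present, as you found):

```shell
#!/bin/sh
# List recent dcpumon 'top' snapshots, newest first.
# /var/cpanel/dcpumon is the cPanel default path (an assumption here);
# the directory may not exist on every server.
SNAPDIR=/var/cpanel/dcpumon
if [ -d "$SNAPDIR" ]; then
  ls -lt "$SNAPDIR" | head -20
else
  echo "no dcpumon snapshots at $SNAPDIR"
fi
```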
-
Unfortunately, I don't see a dcpumon... neither a file nor a directory :-/
I installed sys-snap.pl, though, and that has a lot of data! I'm sure it would be helpful if I knew what to look for! LOL
Just a few minutes ago my load was at about 70, so I restarted Apache and it went back down. I looked at the log from sys-snap at that time, and I have this:
20231204 13 44 Load Average: 62.25 35.52 15.91 1/641 7680
Memory Usage:
MemTotal: 8388608 kB
MemFree: 1268296 kB
MemAvailable: 5930512 kB
Cached: 3041276 kB
Buffers: 0 kB
Active: 3702112 kB
Inactive: 1576520 kB
Active(anon): 1453272 kB
Inactive(anon): 903460 kB
Active(file): 2248840 kB
Inactive(file): 673060 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 2097152 kB
SwapFree: 1727392 kB
Dirty: 624 kB
Writeback: 0 kB
AnonPages: 2356732 kB
Shmem: 119376 kB
Slab: 1773032 kB
SReclaimable: 1740316 kB
SUnreclaim: 32716 kB
Virtual Memory Stats:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 369760 1267708 0 4781592 0 0 21 14 0 1 4 1 96 0 0
3 0 369760 1255240 0 4781540 0 0 4 1588 0 1299 3 0 97 0 0
3 0 369760 1249200 0 4781492 0 0 0 1052 0 1136 2 0 98 0 0
0 0 369760 1242272 0 4781428 0 0 8 1592 0 1040 1 0 99 0 0
1 0 369760 1235608 0 4781372 0 0 44 2916 0 1130 1 0 99 0 0
2 0 369760 1226260 0 4781348 0 0 20 1832 0 1566 1 0 99 0 0
1 0 369760 1220480 0 4781280 0 0 4 1308 0 883 1 0 99 0 0
0 0 369760 1215608 0 4781260 0 0 8 1328 0 1073 2 0 98 0 0
0 0 369760 1193308 0 4781236 0 0 16 1352 0 1396 1 0 98 0 0
0 0 369760 1202988 0 4781248 0 0 12 532 0 1260 1 0 99 0 0
Under "Process list", the highest was /usr/sbin/mysqld at 13.9% of CPU and 16.4% of memory (fairly normal). Then I had a few httpd processes that were a little high:
nobody 7326 8.7 0.8 2999384 74072 ? Sl 13:44:20 00:00:00 \_ /usr/sbin/httpd -k start
nobody 7340 3.5 0.7 1817680 66856 ? Sl 13:44:20 00:00:00 \_ /usr/sbin/httpd -k start
nobody 7353 2.9 0.7 2278488 60524 ? Sl 13:44:20 00:00:00 \_ /usr/sbin/httpd -k start
nobody 7356 9.0 0.8 2933848 71864 ? Sl 13:44:20 00:00:01 \_ /usr/sbin/httpd -k start
nobody 7393 6.8 0.8 3261528 67744 ? Sl 13:44:20 00:00:00 \_ /usr/sbin/httpd -k start
nobody 7618 9.8 0.8 2999384 74424 ? Sl 13:44:21 00:00:00 \_ /usr/sbin/httpd -k start
Just before those heavy httpd processes, though, I had this:
root 7321 0.0 0.0 12744 2740 ? S 13:44:19 00:00:00 \_ /usr/local/cpanel/bin/splitlogs --dir=/etc/apache2/logs/domlogs --main=my.server.com --suffix=-bytes_log
root 7322 0.0 0.0 12744 2584 ? S 13:44:19 00:00:00 \_ /usr/local/cpanel/bin/splitlogs --dir=/etc/apache2/logs/domlogs --main=my.server.com --mainout=/etc/apache2/logs/access_log
nobody 7323 0.0 0.3 272936 26464 ? S 13:44:19 00:00:00 \_ /usr/sbin/httpd -k start
root 7324 2.6 0.1 53268 13864 ? S 13:44:19 00:00:00 \_ /usr/local/cpanel/3rdparty/bin/perl /usr/local/cpanel/bin/leechprotect
Could splitlogs or leechprotect be the source of the problem?
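For reference, here's how I've been pulling the load lines out of the sys-snap data to find the spike intervals. A minimal sketch using the line format shown above; the printf sample stands in for the real log files, whose path depends on where sys-snap.pl was set up:

```shell
#!/bin/sh
# Filter sys-snap "Load Average" lines for intervals above a threshold.
# The printf lines are sample data in the format shown above; against a
# real install you would grep the sys-snap log directory instead.
THRESHOLD=50
printf '%s\n' \
  '20231204 13 44 Load Average: 62.25 35.52 15.91 1/641 7680' \
  '20231204 13 45 Load Average: 3.10 12.00 14.00 1/600 7700' |
  awk -v t="$THRESHOLD" '$6+0 > t { print $1, $2":"$3, "load", $6 }'
# prints: 20231204 13:44 load 62.25
```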
-
It's incredibly unlikely for either of those to be the source of the issue.
sys-snap only works if it is installed during the time the issue happens - you can't install it after the problem has already happened and expect to get useful data.
I find the sar logs in /var/log/sa to be the most helpful when trying to track down load issues after they have happened.
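For reference, the sysstat files in /var/log/sa are named saDD by day of month, and `sar -q` replays the run-queue length and load-average history. A sketch (the day number is only an example):

```shell
#!/bin/sh
# Replay load-average history for a given day from the sysstat archive.
# sa files are named by day of month; DAY=04 is just an example.
DAY=04
F=/var/log/sa/sa$DAY
if command -v sar >/dev/null 2>&1 && [ -f "$F" ]; then
  sar -q -f "$F"
else
  echo "no sar data for day $DAY on this machine"
fi
```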
-
sys-snap only works if it is installed during the time the issue happens - you can't install it after the problem has already happened and expect to get useful data.
Yeah, but that's not an issue in this case. I'm getting these spikes about 6-8 times a day! In fact, just a few minutes after I made the previous post, I had yet another spike. And again, splitlogs was running at the exact same time.
Interesting to note that access_log finally rotated at 7:49pm, and I haven't had a load spike since. But that also means that it took at least 6 hours for the 10MB log to rotate.
I remember having some sort of problem with log rotation several years ago, and the recommendation then was to lower the rotation threshold so that rotation wouldn't take so long to complete. I've had it set to 10MB ever since, and there's no practical reason it should take more than a few seconds to rotate a 10MB file.
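To check whether any logs are actually sitting past that threshold, I can use something like this (the domlogs path is cPanel's default and an assumption on my part):

```shell
#!/bin/sh
# List Apache domlogs larger than the 10 MB rotation threshold.
# /etc/apache2/logs/domlogs is cPanel's default domlog location.
DOMLOGS=/etc/apache2/logs/domlogs
if [ -d "$DOMLOGS" ]; then
  find "$DOMLOGS" -type f -size +10M -exec ls -lh {} +
else
  echo "no domlogs directory at $DOMLOGS"
fi
```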
-
I agree, but if the I/O was THAT high, it could take a while. I would reach out to the host to have them check the hardware on the machine - since there haven't been any changes to the server, failing hardware could be an issue.
-
FWIW, there don't appear to be any hardware issues. And I think I've ruled out splitlogs and leechprotect as the cause.
I started another thread on this, so this is mainly posted for posterity. Best I can tell it's "fixed" by enabling RLimitMEM. I think that's more of a Band-aid than a fix, but without being able to find the source of the problem this is the closest fix I've found.
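For anyone landing here via search: RLimitMEM takes a soft and an optional hard limit in bytes, and it restricts processes forked from Apache children (CGI scripts and the like) rather than the httpd workers themselves. A sketch of the directive, with 512 MB picked purely as an example value:

```apache
# httpd.conf (or an include file) - cap address space for processes
# forked by Apache children; 512 MB soft and hard, chosen as an example.
RLimitMEM 536870912 536870912
```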