Tracking down source of load spike

6 comments

  • vanessa

    It is very difficult to tell after the fact. You can try looking in /var/cpanel/dcpumon to see if there are any 'top' snapshots from that period.
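
    If there are snapshots in there, something like this would let you skim them (a rough sketch only; the exact layout varies by cPanel version, so adjust the path to what the listing shows):

    # List any saved 'top' output and when each file was written
    ls -lR /var/cpanel/dcpumon

    # Then open the file(s) closest to the spike and look at the top CPU consumers
    less /var/cpanel/dcpumon/<file-from-the-listing>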

  • GoWilkes

    Unfortunately, I don't see a dcpumon... neither a file nor a directory :-/

    I installed sys-snap.pl, though, and that has a lot of data! I'm sure it would be helpful if I knew what to look for! LOL

    Just a few minutes ago my load was at about 70, so I restarted Apache and it went back down. I looked at the log from sys-snap at that time, and I have this:

    20231204 13 44 Load Average: 62.25 35.52 15.91 1/641 7680

    Memory Usage:

    MemTotal:        8388608 kB
    MemFree:         1268296 kB
    MemAvailable:    5930512 kB
    Cached:          3041276 kB
    Buffers:               0 kB
    Active:          3702112 kB
    Inactive:        1576520 kB
    Active(anon):    1453272 kB
    Inactive(anon):   903460 kB
    Active(file):    2248840 kB
    Inactive(file):   673060 kB
    Unevictable:           0 kB
    Mlocked:               0 kB
    SwapTotal:       2097152 kB
    SwapFree:        1727392 kB
    Dirty:               624 kB
    Writeback:             0 kB
    AnonPages:       2356732 kB
    Shmem:            119376 kB
    Slab:            1773032 kB
    SReclaimable:    1740316 kB
    SUnreclaim:        32716 kB

    Virtual Memory Stats:

    procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
     0  0 369760 1267708      0 4781592    0    0    21    14    0    1  4  1 96  0  0
     3  0 369760 1255240      0 4781540    0    0     4  1588    0 1299  3  0 97  0  0
     3  0 369760 1249200      0 4781492    0    0     0  1052    0 1136  2  0 98  0  0
     0  0 369760 1242272      0 4781428    0    0     8  1592    0 1040  1  0 99  0  0
     1  0 369760 1235608      0 4781372    0    0    44  2916    0 1130  1  0 99  0  0
     2  0 369760 1226260      0 4781348    0    0    20  1832    0 1566  1  0 99  0  0
     1  0 369760 1220480      0 4781280    0    0     4  1308    0  883  1  0 99  0  0
     0  0 369760 1215608      0 4781260    0    0     8  1328    0 1073  2  0 98  0  0
     0  0 369760 1193308      0 4781236    0    0    16  1352    0 1396  1  0 98  0  0
     0  0 369760 1202988      0 4781248    0    0    12   532    0 1260  1  0 99  0  0

    Under "Process list", the highest was /usr/sbin/mysqld at 13.9% of CPU and 16.4% of memory (fairly normal). Then I had a few httpd processes that were a little high:

    nobody                7326  8.7  0.8 2999384 74072 ?       Sl   13:44:20 00:00:00  \_ /usr/sbin/httpd -k start
    nobody                7340  3.5  0.7 1817680 66856 ?       Sl   13:44:20 00:00:00  \_ /usr/sbin/httpd -k start
    nobody                7353  2.9  0.7 2278488 60524 ?       Sl   13:44:20 00:00:00  \_ /usr/sbin/httpd -k start
    nobody                7356  9.0  0.8 2933848 71864 ?       Sl   13:44:20 00:00:01  \_ /usr/sbin/httpd -k start
    nobody                7393  6.8  0.8 3261528 67744 ?       Sl   13:44:20 00:00:00  \_ /usr/sbin/httpd -k start
    nobody                7618  9.8  0.8 2999384 74424 ?       Sl   13:44:21 00:00:00  \_ /usr/sbin/httpd -k start

    Just before those heavy httpd processes, though, I had this:

    root                  7321  0.0  0.0  12744  2740 ?        S    13:44:19 00:00:00  \_ /usr/local/cpanel/bin/splitlogs --dir=/etc/apache2/logs/domlogs --main=my.server.com --suffix=-bytes_log
    root                  7322  0.0  0.0  12744  2584 ?        S    13:44:19 00:00:00  \_ /usr/local/cpanel/bin/splitlogs --dir=/etc/apache2/logs/domlogs --main=my.server.com --mainout=/etc/apache2/logs/access_log
    nobody                7323  0.0  0.3 272936 26464 ?        S    13:44:19 00:00:00  \_ /usr/sbin/httpd -k start
    root                  7324  2.6  0.1  53268 13864 ?        S    13:44:19 00:00:00  \_ /usr/local/cpanel/3rdparty/bin/perl /usr/local/cpanel/bin/leechprotect

    Could splitlogs or leechprotect be the source of the problem?
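
    For anyone reading later, this is roughly how I'm searching the sys-snap output (a sketch only; the snapshot directory below is an assumption, so point it at wherever your copy of sys-snap actually writes):

    # Assumed snapshot location; adjust to your sys-snap output directory
    SNAPDIR=/root/system-snapshot

    # Find snapshots where the 1-minute load average hit 20 or more
    grep -rE "Load Average: ([2-9][0-9]|[1-9][0-9]{2})" "$SNAPDIR"

    # See which snapshot files also caught splitlogs or leechprotect running
    grep -rlE "splitlogs|leechprotect" "$SNAPDIR"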

  • cPRex Jurassic Moderator

    It's incredibly unlikely for either of those to be the source of the issue.

    sys-snap only works if it is installed during the time the issue happens - you can't install it after the problem has already happened and expect to get useful data.

    I find the sar logs in /var/log/sa to be the most helpful when trying to track down load issues after they have happened.
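
    For example (standard sysstat commands, nothing cPanel-specific; the file name tracks the day of the month, so December 4th would be sa04):

    sar -q -f /var/log/sa/sa04    # run queue and load averages through the day
    sar -u -f /var/log/sa/sa04    # CPU usage; a high %iowait points at disk
    sar -b -f /var/log/sa/sa04    # overall disk I/O activity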

  • GoWilkes

    sys-snap only works if it is installed during the time the issue happens - you can't install it after the problem has already happened and expect to get useful data.

    Yeah, but that's not an issue in this case. I'm getting these spikes about 6-8 times a day! In fact, just a few minutes after I made the previous post, I had yet another spike, and again, splitlogs was running at the exact same time.

    Interesting to note that access_log finally rotated at 7:49pm, and I haven't had a load spike since. But that also means that it took at least 6 hours for the 10MB log to rotate.

    I remember several years ago having some sort of problem with log rotation, and the recommendation was to lower the threshold setting so that rotation wouldn't take so long to complete. I've had it set to 10MB ever since, and there's no practical reason that rotating a 10MB file should take more than a few seconds.
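
    For reference, this is how I'm double-checking the current threshold. Treat it as a sketch: I'm assuming the setting is persisted in /var/cpanel/cpanel.config, and the exact key name may differ between cPanel versions.

    grep -i rotatelogs /var/cpanel/cpanel.config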

  • cPRex Jurassic Moderator

    I agree, but if the I/O was THAT high, it could take a while.  I would reach out to the host to have them check the hardware on the machine - since there haven't been any changes to the server, failing hardware could be an issue.
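
    If the host wants something concrete to look at, these are the usual first checks (standard sysstat and smartmontools commands, not cPanel-specific; adjust the device name to match your disk):

    iostat -x 1 5          # per-device %util and await; sustained high values suggest an I/O bottleneck
    smartctl -a /dev/sda   # SMART health and error counters for the drive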

  • GoWilkes

    FWIW, there don't appear to be any hardware issues. And I think that I've ruled out splitlogs and leechprotect as issues.

    I started another thread on this, so this is mainly posted for posterity. Best I can tell, it's "fixed" by enabling RLimitMEM. I think that's more of a Band-Aid than a fix, but without being able to find the source of the problem, it's the closest thing to a fix I've found.
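
    For anyone who lands here later: RLimitMEM is a stock Apache directive (it limits the memory of processes that Apache's children spawn). I set it through WHM rather than by hand, but purely as an illustration (the include path below is an assumption, and the 512 MB figure is just an example), the raw directive amounts to something like:

    # Soft and hard memory limits, in bytes (512 MB here), then reload Apache
    echo "RLimitMEM 536870912 536870912" >> /etc/apache2/conf.d/includes/pre_main_global.conf
    /scripts/restartsrv_httpd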
