Symptoms
A notification similar to this is received:
The process “$PROCESS_NAME” stalled while it ran for user “$USERNAME”
The process has run for 2 days, 6 hours, and 10.67 seconds.
At the time that the system generated this notification, the process had a PID of 25542, and it consumed 0.05% of the system's memory and 7% of the system's CPU time.
The server's current health statistics are:
Notification Type stall
Server host.domain.com
Primary IP Address 127.0.0.1
Service cpuwatch
Memory Information
Used 6.97 GB
Available 55.79 GB
Installed 62.76 GB
Load Information 2.90 2.55 2.59
Uptime 30 days, 16 hours, 24 minutes, and 36 seconds
Other symptoms may appear depending on the situation. Generally, this happens when a process stops or is killed without any warning, and the CPUWatch process is listed in the output of the ps command.
Description
This notification means that a process has been using too much CPU time and, as a result, the CPUWatch service has stopped or stalled it. Most often, the process belongs to the cPanel automated backup service. If so, you can confirm whether the process was killed by reviewing the server backup logs.
Where are the cPanel Backup logs?
You can also confirm whether the process has been stopped or stalled by using strace. Take the $PROCESS_NAME from the notification above and search for the PID of the process like this:
[root@server ~]cPs# ps aux | grep -Ei "($PROCESS_NAME|cpuwatch)" | grep -iv grep
$USER  16458  0.0  0.0  262864  27328  ?  TN  Oct05  0:05  pkgacct - $USER - av: 4 - create tar stream
root   25541  0.0  0.0    4356    704  ?  SN  Oct05  0:04  /usr/local/cpanel/bin/cpuwatch 0.8750 --report-fd 8 /usr/local/cpanel/bin/pkgacct $USER /backup/2020-09-29/accounts backup
The PID of the process is 16458, so you can now attach strace to it and see its current state:
strace -p 16458
If the process has been programmatically stopped by CPUWatch, you will see the following in the strace output:
strace: Process 16458 attached
--- stopped by SIGSTOP ---
If CPUWatch stalls or curtails the process, you will see something similar to this:
strace: Process 16458 attached
wait4(10141,
This process is simply waiting and not doing anything else.
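As a lighter-weight cross-check, the kernel's view of the process state can be read directly. The following is a minimal sketch, assuming a Linux server with /proc mounted; proc_state and is_stopped are hypothetical helper names, not cPanel tools. A state of T corresponds to the leading T in the ps STAT column and indicates that the process was stopped by a signal such as SIGSTOP.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of cPanel); assumes Linux with /proc mounted.

# Print the one-letter kernel state for a PID (for example R, S, D, T, or Z).
proc_state() {
    local pid="$1"
    awk '/^State:/ { print $2 }' "/proc/${pid}/status" 2>/dev/null
}

# Succeed only when the process is in state T (stopped by a signal).
is_stopped() {
    [ "$(proc_state "$1")" = "T" ]
}

# Example, using the PID from the ps output above:
#   is_stopped 16458 && echo "process 16458 is stopped"
```

A process that is merely sleeping or waiting on a child reports S rather than T, which matches the wait4 case shown above.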
The coefficient that determines when CPUWatch stops a process is built into its Perl module. When a process exceeds that value, CPUWatch sends the process a SIGSTOP signal, which stalls it. To see the built-in coefficient value, run the following grep command:
[root@server ~]cPs# grep -i PERCENT_LOAD /usr/local/cpanel/Cpanel/Cpu*
$PERCENT_LOAD_TO_START_CURTAILMENT = 0.875;
Based on this output, when CPUWatch launches and watches a process, it stops the process once it reaches 87.5% of the server's CPU capacity. This matches the 0.8750 argument passed to cpuwatch in the ps output above.
You can also review the server's load averages to confirm whether the system has recently experienced periods of high load, which indicate heavier CPU usage.
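The load averages can be put next to the CPU count for a quick reading. This is a rough sketch, assuming a Linux server with /proc/loadavg and the coreutils nproc command; load_summary is a hypothetical helper name. The per-CPU figure is only a general indicator of pressure. How CPUWatch itself weighs load against its coefficient is not described here, so the numbers should not be treated as its exact trigger.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of cPanel); assumes Linux and coreutils nproc.

# Print the 1, 5, and 15 minute load averages, the CPU count,
# and the 1 minute load divided by the CPU count.
load_summary() {
    local cpus
    cpus="$(nproc)"
    awk -v n="$cpus" '{
        printf "load1=%s load5=%s load15=%s cpus=%d load1_per_cpu=%.2f\n",
               $1, $2, $3, n, $1 / n
    }' /proc/loadavg
}

# Example:
#   load_summary
```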
Workaround
You can raise the CPU limit for CPUWatch by increasing the "Extra CPUs for server load" value under the "Stats and Logs" tab in "Home / Server Configuration / Tweak Settings". For more information, see this page and search for "Extra CPUs for server load":
https://docs.cpanel.net/whm/server-configuration/tweak-settings/
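To check the current value from the command line, a key can be read from the cPanel configuration file. The sketch below makes two assumptions that this article does not state and that should be verified against the documentation linked above: that Tweak Settings are stored in /var/cpanel/cpanel.config, and that this setting uses the key extracpus. read_config_value is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of cPanel).
# ASSUMPTIONS: the default path and the "extracpus" key name are not confirmed
# by this article; check them against the Tweak Settings documentation.

# Print the value for KEY from a key=value file; fail if the key is absent.
read_config_value() {
    local key="$1" file="${2:-/var/cpanel/cpanel.config}"
    awk -F= -v k="$key" '
        $1 == k { print $2; found = 1 }
        END     { exit !found }
    ' "$file"
}

# Example (assumed key name):
#   read_config_value extracpus
```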
After making the change, stop the running process and restart it so that it picks up the new limit. When the process restarts, it uses the extra CPU value that you defined. You can stop a process by running this command, where $PID is the process ID from the ps command shown above:
kill -9 $PID
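Because SIGKILL cannot be undone, it can help to confirm that the PID still belongs to the expected process before sending it. This is a minimal sketch, assuming a Linux server with /proc mounted; kill_if_matches is a hypothetical helper name, not a cPanel tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of cPanel); assumes Linux with /proc mounted.

# Send SIGKILL to PID only if its command line contains EXPECTED.
# Returns 1 if the PID does not exist and 2 if the command line does not match.
kill_if_matches() {
    local pid="$1" expected="$2" args
    [ -r "/proc/${pid}/cmdline" ] || return 1
    # Arguments in /proc/PID/cmdline are NUL-separated; turn them into spaces.
    args="$(tr '\0' ' ' < "/proc/${pid}/cmdline")"
    case "$args" in
        *"$expected"*)
            kill -9 "$pid"
            ;;
        *)
            echo "PID ${pid} is running '${args}', not '${expected}'; not killing" >&2
            return 2
            ;;
    esac
}

# Example, using the PID and process name from the ps output above:
#   kill_if_matches 16458 pkgacct
```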