Symptoms
A notification similar to this is received:
The process “$PROCESS_NAME” stalled while it ran for user “$USERNAME”
The process has run for 2 days, 6 hours, and 10.67 seconds.
At the time that the system generated this notification, the process had a PID of 25542, and it consumed 0.05% of the system's memory and 7% of the system's CPU time.
The server's current health statistics are:
Notification Type stall
Server host.domain.com
Primary IP Address 127.0.0.1
Service cpuwatch
Memory Information
Used 6.97 GB
Available 55.79 GB
Installed 62.76 GB
Load Information 2.90 2.55 2.59
Uptime 30 days, 16 hours, 24 minutes, and 36 seconds
Other symptoms may appear depending on the situation. Generally, a process stops or is killed without any warning, and the cpuwatch process is listed in the output of the ps command.
Description
This means that a running process has been using too much CPU time and, as a result, the CPUWatch (AKA CPUPower) service has stopped or stalled it. You can confirm this with the following steps. First, take the $PROCESS_NAME from the notification above and search for the PID of the process:
ps aux | grep -Ei "($PROCESS_NAME|cpuwatch)" | grep -iv grep
$USER 16458 0.0 0.0 262864 27328 ? TN Oct05 0:05 pkgacct - $USER - av: 4 - create tar stream
root 25541 0.0 0.0 4356 704 ? SN Oct05 0:04 /usr/local/cpanel/bin/cpuwatch 0.8750 --report-fd 8 /usr/local/cpanel/bin/pkgacct $USER /backup/2020-09-29/accounts backup
The PID of the process is 16458, so you can now attach strace to the process and see its current state:
strace -p 16458
strace: Process 16458 attached
--- stopped by SIGSTOP ---
As you can see, the process has been programmatically stopped by CPUWatch. In other situations, CPUWatch might stall or curtail the process instead, in which case the strace output looks similar to this:
strace -p 10141
strace: Process 10141 attached
wait4(10141,
The process is simply waiting and not doing anything else.
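If strace is not available, the process state can also be read from ps. The sketch below is one way to do that. The function name is_stopped is made up for illustration and is not part of cPanel. It relies on the ps STAT value starting with T for a stopped process, which matches the TN state shown for the pkgacct process in the ps output above.

```shell
# Hypothetical helper (not part of cPanel): succeeds if the given PID is
# currently in the stopped state, i.e. its ps STAT value starts with "T".
is_stopped() {
    pid=$1
    # "stat=" prints the state column without a header line.
    state=$(ps -o stat= -p "$pid" 2>/dev/null | tr -d ' ')
    case $state in
        T*) return 0 ;;   # stopped, for example by SIGSTOP
        *)  return 1 ;;   # any other state, or no such process
    esac
}
```

For example, `is_stopped 16458 && echo "stopped"` could be used for the PID found earlier.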
The coefficient that determines when CPUWatch stops a process is built into the Perl module. When a process exceeds that value, CPUWatch sends it a SIGSTOP signal and the process stalls. You can view the built-in coefficient like this:
grep -i PERCENT_LOAD /usr/local/cpanel/Cpanel/Cpu*
$PERCENT_LOAD_TO_START_CURTAILMENT = 0.875;
This means that when CPUWatch launches and watches a process, it stops the process at 87.5% of the server's CPU power. The same value appears as the 0.8750 argument passed to cpuwatch in the ps output above.
You can also look at the load average information on the server to confirm whether the system has recently experienced periods of high load. Please refer to these articles for more information:
How to diagnose high loads with "sar" command
How to diagnose high loads with "top" command
How to diagnose high I/O and high load with iotop
How to diagnose high loads with the "uptime" command
How to diagnose high loads with the "iostat" command
How to diagnose high loads with the "ps" command
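As a quick first check before working through those articles, the load averages from the notification can be put in proportion to the number of CPU cores. The helper below is only an illustrative sketch. The name load_exceeds and the idea of comparing load per core against a threshold are assumptions made for this example, and it does not describe how CPUWatch itself makes its decision.

```shell
# Hypothetical helper: succeeds if LOAD / CORES is greater than THRESHOLD.
# Usage: load_exceeds LOAD CORES THRESHOLD
load_exceeds() {
    awk -v load="$1" -v cores="$2" -v limit="$3" \
        'BEGIN { exit !((load / cores) > limit) }'
}

# Example with live values on Linux (1-minute load average and core count):
# load_exceeds "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)" 0.875 && echo "high load"
```

With the 1-minute load of 2.90 from the sample notification and a hypothetical 4-core server, `load_exceeds 2.90 4 0.875` fails, because 2.90 / 4 is below 0.875.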
Workaround
You can change the CPUWatch limit with the "Extra CPUs for server load" setting under the "Stats and Logs" tab in "WHM >> Tweak Settings". For more information, search for "Extra CPUs for server load" on this page:
https://docs.cpanel.net/whm/server-configuration/tweak-settings/
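To see the currently configured value from the command line, the setting can be read from the cPanel configuration file. This is a sketch under two assumptions that this article does not state: that Tweak Settings values are stored in /var/cpanel/cpanel.config, and that this setting uses the key extracpus. Verify both on your server before relying on it. The function name get_extracpus is hypothetical.

```shell
# Hypothetical helper: print the value of the assumed "extracpus" key.
# Usage: get_extracpus [CONFIG_FILE]
get_extracpus() {
    config=${1:-/var/cpanel/cpanel.config}   # assumed default location
    # Lines are expected in key=value form; print only the value part.
    sed -n 's/^extracpus=//p' "$config"
}
```

The helper only reads the file. Changes should still be made through WHM as described above.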
After you have made the change, a process that is already running does not pick up the new limit. To use it, stop the running process and restart it; when it restarts, it uses the extra CPU value that you defined. You can stop a process with this command, where $PID is the process ID from the ps command above:
kill -9 $PID
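Note that kill -9 ends the process immediately and gives it no chance to clean up. Where that matters, a gentler sequence can be tried first. The following is a minimal sketch, not a cPanel-provided tool. The function name stop_gracefully and the 5-second default wait are arbitrary choices for illustration. Because a process stopped with SIGSTOP does not act on SIGTERM until it is resumed, the sketch sends SIGCONT after SIGTERM. CPUWatch may stop the process again, which is why SIGKILL remains the fallback.

```shell
# Hypothetical helper: ask a process to exit, then fall back to SIGKILL.
# Usage: stop_gracefully PID [WAIT_SECONDS]
stop_gracefully() {
    pid=$1
    wait_secs=${2:-5}
    # If the signal cannot be sent, the process is gone or not ours to signal.
    kill -TERM "$pid" 2>/dev/null || return 0
    # Resume a stopped process so that it can act on the pending SIGTERM.
    kill -CONT "$pid" 2>/dev/null || true
    while [ "$wait_secs" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # process has exited
        sleep 1
        wait_secs=$((wait_secs - 1))
    done
    kill -KILL "$pid" 2>/dev/null || true        # last resort, same as kill -9
    return 0
}
```

For example, `stop_gracefully 16458 10` would wait up to 10 seconds before resorting to SIGKILL.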