Server started crashing at the same time after recent update

tomfra

April 04, 2025 21:53

OK, this one is strange so bear with me...

A few days ago, on the night of April 2nd, a legacy cPanel server crashed/got stuck, and it now seems to crash at the very same time once a day, at or around 22:30 CEST/20:30 UTC.

No new installation from our side that could have caused it. The server does not crash per se: it still responds to ping, but it is not possible to log in via SSH, websites are not accessible, and the load starts going up. The "top" command did not reveal any particular process to be the issue.

Server is running on:

CloudLinux v7.9.0 STANDARD vmware

cPanel version: 110.5.55

Server is considered "legacy" and cannot be updated to new cPanel so please do not suggest that as a solution.

--------------------------

I checked what the last Yum update that day was:

/var/log/yum.log
Apr 02 02:54:25 Updated: cpanel-plugin-common-1.9.0-2.6.1.cpanel.noarch
Apr 02 02:54:26 Updated: cpanel-plugin-components-1.4.4-1.2.1.cpanel.noarch
Apr 02 02:54:28 Updated: cpanel-sitejet-plugin-3.1.3-1.2.1.cpanel.noarch
Apr 02 02:54:29 Updated: lvemanager-xray-1.0-9.el7.cloudlinux.noarch

Cron log seems normal:

/var/log/cron
Apr 4 22:30:01 vps1 CROND[492476]: (root) CMD (/usr/bin/test -e /etc/cpanel-dovecot-solrdisable || /usr/local/cpanel/3rdparty/scripts/cpanel_dovecot_solr_commit)
Apr 4 22:30:01 vps1 CROND[492483]: (root) CMD (/usr/bin/flock -n /var/run/cloudlinux_processpaneluser.cronlock /usr/sbin/processpaneluserspackages)
Apr 4 22:30:01 vps1 CROND[492480]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Apr 4 22:30:01 vps1 CROND[492482]: (root) CMD (/usr/bin/flock -n /var/run/cloudlinux_lve_manager.cronlock /usr/share/l.v.e-manager/cpanel/hooks/l.v.e-manager_postupcp_hook.sh --cronjob)
Apr 4 22:30:01 vps1 CROND[492485]: (root) CMD ( bash -c "sleep $((RANDOM % 60))" ; /opt/imunify360/venv/share/imunify360/scripts/check-detached.py > /dev/null 2>&1 || :)
Apr 4 22:30:01 vps1 CROND[492492]: (root) CMD (/usr/bin/flock -n /var/run/python_cllib_detector_cronlock /usr/share/python-cllib/detector.py)
Apr 4 22:30:01 vps1 CROND[492496]: (root) CMD (/usr/bin/flock -n /var/run/cloudlinux_cl-quota.cronlock /usr/bin/cl-quota -YC)
Apr 4 22:30:01 vps1 CROND[492498]: (root) CMD (/usr/bin/flock -n /var/run/cloudlinux_cache_php.cronlock /usr/share/l.v.e-manager/utils/cache_phpdata.py)
Apr 4 22:30:01 vps1 CROND[492499]: (root) CMD (/usr/bin/flock -n /var/run/cloudlinux_cagefs_exclude_users.cronlock /usr/share/cagefs/exclude_users_cleaner.py)
Apr 4 22:30:01 vps1 CROND[492515]: (root) CMD (/usr/bin/flock -n /var/run/edition_watcher.cronlock /usr/sbin/cloudlinux-edition-watcher check &> /dev/null)
Apr 4 22:30:01 vps1 CROND[492507]: (root) CMD (/usr/local/cpanel/scripts/dcpumon-wrapper >/dev/null 2>&1)
Apr 4 22:30:01 vps1 CROND[492508]: (root) CMD (/usr/bin/flock -n /var/run/cloudlinux-xray-plugin-check.cronlock /usr/share/lvemanager-xray/plugins/install-xray-plugin.py --check > /dev/null 2>&1)
Apr 4 22:30:01 vps1 CROND[492512]: (root) CMD (/usr/bin/flock -n /var/run/cloudlinux_cl-syncpkg.cronlock /usr/bin/cl-syncpkgs)
Apr 4 22:30:01 vps1 CROND[492509]: (root) CMD (/usr/bin/flock -n /var/run/detect_edition_changes.cronlock /usr/share/l.v.e-manager/detect_edition_changes.py > /dev/null 2>&1)
Apr 4 22:30:01 vps1 CROND[492513]: (root) CMD (/usr/local/maldetect/maldet --mkpubpaths >> /dev/null 2>&1)
Apr 4 22:30:01 vps1 CROND[492514]: (root) CMD (/usr/sbin/imunify-notifier -update-cron)
Apr 4 22:30:01 vps1 CROND[492510]: (root) CMD (/usr/bin/flock -n /var/run/get_panel_users.cronlock /usr/sbin/getpaneluserscount &> /dev/null)
Apr 4 22:30:01 vps1 CROND[492516]: (root) CMD (/usr/bin/flock -n /var/run/cloudlinux_panel-detect.cronlock /usr/bin/package_reinstaller.py check)
Apr 4 22:30:01 vps1 CROND[492518]: (root) CMD (/usr/bin/flock -n /var/run/cloudlinux_kill_php_script.cronlock bash /usr/sbin/kill_php_script)
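
Since that is a long list to compare by eye, here is a minimal Python sketch for counting which commands cron launched in a given minute, so the 22:30 set can be compared with an uneventful minute. It assumes Python 3.6+ is available on the box and that every line follows the format shown above; the function name `commands_at` is made up for illustration and is not part of any cPanel or CloudLinux tool.

```python
import re
from collections import Counter

# Matches lines such as:
# "Apr 4 22:30:01 vps1 CROND[492480]: (root) CMD (/usr/lib64/sa/sa1 1 1)"
CRON_LINE = re.compile(
    r"^\w{3}\s+\d{1,2}\s+(?P<hour>\d{2}):(?P<minute>\d{2}):\d{2}\s+\S+\s+"
    r"CROND\[\d+\]:\s+\((?P<user>[^)]+)\)\s+CMD\s+\((?P<cmd>.*)\)\s*$"
)

def commands_at(lines, hour, minute):
    """Count the commands cron started during the given hour:minute."""
    counts = Counter()
    for line in lines:
        match = CRON_LINE.match(line)
        if match is None:
            continue  # not a CROND "CMD" line
        if int(match.group("hour")) == hour and int(match.group("minute")) == minute:
            counts[match.group("cmd").strip()] += 1
    return counts

# Two lines copied from the log excerpt above.
sample = [
    "Apr 4 22:30:01 vps1 CROND[492480]: (root) CMD (/usr/lib64/sa/sa1 1 1)",
    "Apr 4 22:30:01 vps1 CROND[492496]: (root) CMD (/usr/bin/flock -n "
    "/var/run/cloudlinux_cl-quota.cronlock /usr/bin/cl-quota -YC)",
]

for command, count in commands_at(sample, 22, 30).items():
    print(count, command)
```

To run it against the real log, pass `open("/var/log/cron")` instead of `sample`. Commands that show up at 22:30 but not at, say, 22:20 would be the first ones to look at.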

-----------------

Is there any known recent cPanel update that could have caused this problem? Also, I disabled ClamAV in WHM but still see clamscan quite often in the process list. It takes a lot of CPU and memory, so perhaps that is the problem?

Comments

19 comments

Andrew

April 06, 2025 15:00
Load might be going up because external connections are being cut off. Do you have KVM (console) access? If so, could you log in from the console? Are you able to ping external addresses from the server?

Andrew N. - cPanel Plesk VMWare Certified Professional
Do you need immediate assistance? 20 minutes response time!* Open a ticket
EmergencySupport - Professional Server Management and One-time Services
0
tomfra

April 06, 2025 15:12
I will check it using the VM console, good idea! It's a cloud server so I do have access to it.

Honestly, it behaves almost as if it was DDoSed at the same time each day, but when I restart the cloud server it all goes back to normal, so that probably does not make sense?
0
Andrew

April 06, 2025 18:16
Yeah I'm pretty sure it's something else.

1
tomfra

April 06, 2025 21:41
Just happened again. I was not at the PC and did not catch it in time, but after about an hour the server load was over 60, and in the KVM I could only see something like 2 lines of:

[ . ]

Nothing more. Will need to catch it when it starts.
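
One way to catch the start without sitting at the console is to leave a small logger running before 22:30. The following is only a rough Python sketch, assuming Python 3.6+ on the server; the names `format_sample` and `record` and the log path in the usage note are made up for illustration. It appends the load averages to a file and forces each line to disk so the last readings survive a reset.

```python
import os
import time
from datetime import datetime

def format_sample(now, load):
    """Render one log line: timestamp plus the 1/5/15-minute load averages."""
    one, five, fifteen = load
    stamp = now.isoformat(timespec="seconds")
    return f"{stamp} load={one:.2f} {five:.2f} {fifteen:.2f}"

def record(path, samples, interval):
    """Append `samples` load readings to `path`, one every `interval` seconds."""
    with open(path, "a") as log:
        for index in range(samples):
            log.write(format_sample(datetime.now(), os.getloadavg()) + "\n")
            log.flush()
            os.fsync(log.fileno())  # push the line to disk before the next sleep
            if index < samples - 1:
                time.sleep(interval)
```

For example, `record("/root/loadwatch.log", samples=360, interval=10)` started at 22:00 would cover one hour in 10-second steps. If the disk itself is what hangs, the `fsync` call will block, and the gap in timestamps is then a hint in its own right.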
0
Andrew

April 07, 2025 09:22
I only see one dot in your response. Did you try hitting enter or space? Sometimes the console is blank until you hit some keys.

0
tomfra

April 07, 2025 13:59
In the console there was a "[" character followed by a dot and a closing "]" bracket.
0
tomfra

April 07, 2025 20:41
[Screenshot of the KVM console; image not preserved.]
0
tomfra

April 07, 2025 20:42
This is what I was able to see in the KVM console when the load started going up. I was able to ping external IPs from the server using the already running SSH session, but I was unable to open a new session, and on previous occasions the existing one also stopped working after a while.
0
tomfra

April 07, 2025 20:45
I was also unable to log in as root via the console; it was simply stuck after I entered my password.
0
Andrew

April 08, 2025 09:00
Based on the screenshot, the server was already stopping when you opened the console, as services were being shut down. It might also be worth asking your server provider if they are somehow rebooting the server due to abuse, excessive usage, etc.

0
tomfra

April 08, 2025 15:45
They did not mention anything. I will try to catch the KVM output even before the crash today. It started about 2 minutes later than the day before, so it is not an exact time, but within +/- 3 minutes so far.
0
mtindor

April 08, 2025 15:56
Just curious... Do you have backup tasks running each day? And if so, when do they start/end? Do the box crashes happen during those backup events? Wondering if there are some disk assets going south on you.
0
tomfra

April 08, 2025 16:13
That is actually something I checked first. There is a backup task, but at 6:30 local time, and this crash happens around 22:30 local time. But I will have to ask the datacenter if they perhaps added something else from their side. This problem seems to be external, as otherwise something would have to be recorded in the server logs, in my opinion.
0
tomfra

April 08, 2025 16:18
But perhaps the time displayed for the backup is incorrect or in a different timezone, as that would explain a lot. Will have to ask them.
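
To see how far off the displayed timezone would have to be, here is a quick Python sketch of the arithmetic. It assumes CEST is a fixed UTC+2 and uses the approximate 22:30 crash time; the helper name `matching_utc_offset` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# CEST is UTC+2; a fixed offset avoids needing a timezone database.
CEST = timezone(timedelta(hours=2), "CEST")

def matching_utc_offset(event, shown_hour, shown_minute):
    """UTC offset (hours) at which a clock reads shown_hour:shown_minute at `event`."""
    utc = event.astimezone(timezone.utc)
    shown = shown_hour * 60 + shown_minute
    actual = utc.hour * 60 + utc.minute
    diff_hours = (shown - actual) / 60
    # Fold into the range [-12, +12) so that -14 h is reported as +10 h.
    return (diff_hours + 12) % 24 - 12

crash = datetime(2025, 4, 8, 22, 30, tzinfo=CEST)  # approximate crash time
print(crash.astimezone(timezone.utc).strftime("%H:%M UTC"))  # 20:30 UTC
print(matching_utc_offset(crash, 6, 30))  # 10.0
```

So a backup shown as 06:30 would only line up with the crash if that time were displayed in a zone around UTC+10, which is worth ruling out with the datacenter.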
0
tomfra

April 08, 2025 20:37
OK, just happened again at 22:32 local time, was watching the console, nothing at all recorded there, just the standard screen that's normally there. Will have to contact the datacenter, I am pretty sure they messed up something with the node. Has to be that.
0
tomfra

April 09, 2025 17:06
I communicated with the datacenter support and they told me our server has a very old version of VMware Tools installed, from 2014, so I upgraded it. Will see if it helped or not in a few hours...
0
tomfra

April 09, 2025 20:40
OK, it seems the VMware Tools upgrade indeed solved the problem!
0
Andrew

April 10, 2025 11:54
Great, I'm happy this is finally figured out, good job :)

0
tomfra

April 10, 2025 21:39
Thank you for the support, appreciate it!
0
