Server started crashing at the same time after recent update
OK, this one is strange so bear with me...
A few days ago, on the night of April 2nd, a legacy cPanel server crashed (or got stuck), and it now seems to crash once a day at almost exactly the same time, at or around 22:30 CEST / 20:30 UTC.
We did not install anything new that could have caused it. The server does not crash per se: it still responds to ping, but it is not possible to log in via SSH, websites are not accessible, and the load starts going up. The "top" command did not reveal any particular process as the culprit.
Server is running on:
CloudLinux v7.9.0 STANDARD vmware
cPanel version: 110.5.55
Server is considered "legacy" and cannot be updated to new cPanel so please do not suggest that as a solution.
--------------------------
I checked the last yum updates from that day:
/var/log/yum.log
Apr 02 02:54:25 Updated: cpanel-plugin-common-1.9.0-2.6.1.cpanel.noarch
Apr 02 02:54:26 Updated: cpanel-plugin-components-1.4.4-1.2.1.cpanel.noarch
Apr 02 02:54:28 Updated: cpanel-sitejet-plugin-3.1.3-1.2.1.cpanel.noarch
Apr 02 02:54:29 Updated: lvemanager-xray-1.0-9.el7.cloudlinux.noarch
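For anyone repeating this check, here is a minimal Python sketch that pulls the entries for a single day out of a yum log. It assumes the line layout shown above (month, day, time, action, package). The function name yum_events_on is made up for illustration and is not part of yum or cPanel.

```python
import re

# Matches lines such as:
# "Apr 02 02:54:25 Updated: cpanel-plugin-common-1.9.0-2.6.1.cpanel.noarch"
YUM_LINE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2}) (?P<day>\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<action>[A-Za-z-]+): (?P<package>\S+)$"
)


def yum_events_on(lines, month, day):
    """Return (time, action, package) tuples logged on the given month and day."""
    events = []
    for line in lines:
        m = YUM_LINE.match(line.strip())
        if m and m.group("month") == month and int(m.group("day")) == day:
            events.append((m.group("time"), m.group("action"), m.group("package")))
    return events
```

To run it against the real file, something like `with open("/var/log/yum.log") as fh: print(yum_events_on(fh, "Apr", 2))` should do. The log carries no year, so on an old server the same month and day may match entries from several years.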
Cron log seems normal:
/var/log/cron
Apr 4 22:30:01 vps1 CROND[492476]: (root) CMD (/usr/bin/test -e /etc/cpanel-dovecot-solrdisable || /usr/local/cpanel/3rdparty/scripts/cpanel_dovecot_solr_commit)
Apr 4 22:30:01 vps1 CROND[492483]: (root) CMD (/usr/bin/flock -n /var/run/cloudlinux_processpaneluser.cronlock /usr/sbin/processpaneluserspackages)
Apr 4 22:30:01 vps1 CROND[492480]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Apr 4 22:30:01 vps1 CROND[492482]: (root) CMD (/usr/bin/flock -n /var/run/cloudlinux_lve_manager.cronlock /usr/share/l.v.e-manager/cpanel/hooks/l.v.e-manager_postupcp_hook.sh --cronjob)
Apr 4 22:30:01 vps1 CROND[492485]: (root) CMD ( bash -c "sleep $((RANDOM % 60))" ; /opt/imunify360/venv/share/imunify360/scripts/check-detached.py > /dev/null 2>&1 || :)
Apr 4 22:30:01 vps1 CROND[492492]: (root) CMD (/usr/bin/flock -n /var/run/python_cllib_detector_cronlock /usr/share/python-cllib/detector.py)
Apr 4 22:30:01 vps1 CROND[492496]: (root) CMD (/usr/bin/flock -n /var/run/cloudlinux_cl-quota.cronlock /usr/bin/cl-quota -YC)
Apr 4 22:30:01 vps1 CROND[492498]: (root) CMD (/usr/bin/flock -n /var/run/cloudlinux_cache_php.cronlock /usr/share/l.v.e-manager/utils/cache_phpdata.py)
Apr 4 22:30:01 vps1 CROND[492499]: (root) CMD (/usr/bin/flock -n /var/run/cloudlinux_cagefs_exclude_users.cronlock /usr/share/cagefs/exclude_users_cleaner.py)
Apr 4 22:30:01 vps1 CROND[492515]: (root) CMD (/usr/bin/flock -n /var/run/edition_watcher.cronlock /usr/sbin/cloudlinux-edition-watcher check &> /dev/null)
Apr 4 22:30:01 vps1 CROND[492507]: (root) CMD (/usr/local/cpanel/scripts/dcpumon-wrapper >/dev/null 2>&1)
Apr 4 22:30:01 vps1 CROND[492508]: (root) CMD (/usr/bin/flock -n /var/run/cloudlinux-xray-plugin-check.cronlock /usr/share/lvemanager-xray/plugins/install-xray-plugin.py --check > /dev/null 2>&1)
Apr 4 22:30:01 vps1 CROND[492512]: (root) CMD (/usr/bin/flock -n /var/run/cloudlinux_cl-syncpkg.cronlock /usr/bin/cl-syncpkgs)
Apr 4 22:30:01 vps1 CROND[492509]: (root) CMD (/usr/bin/flock -n /var/run/detect_edition_changes.cronlock /usr/share/l.v.e-manager/detect_edition_changes.py > /dev/null 2>&1)
Apr 4 22:30:01 vps1 CROND[492513]: (root) CMD (/usr/local/maldetect/maldet --mkpubpaths >> /dev/null 2>&1)
Apr 4 22:30:01 vps1 CROND[492514]: (root) CMD (/usr/sbin/imunify-notifier -update-cron)
Apr 4 22:30:01 vps1 CROND[492510]: (root) CMD (/usr/bin/flock -n /var/run/get_panel_users.cronlock /usr/sbin/getpaneluserscount &> /dev/null)
Apr 4 22:30:01 vps1 CROND[492516]: (root) CMD (/usr/bin/flock -n /var/run/cloudlinux_panel-detect.cronlock /usr/bin/package_reinstaller.py check)
Apr 4 22:30:01 vps1 CROND[492518]: (root) CMD (/usr/bin/flock -n /var/run/cloudlinux_kill_php_script.cronlock bash /usr/sbin/kill_php_script)
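All of the entries above start in the same second, so it may help to compare how many jobs fire at 22:30 with other minutes of the day. Below is a rough Python sketch that counts cron job starts per minute. It assumes the CROND line layout shown above, and the helper name jobs_per_minute is only illustrative.

```python
import re
from collections import Counter

# Matches lines such as:
# "Apr 4 22:30:01 vps1 CROND[492476]: (root) CMD (/usr/lib64/sa/sa1 1 1)"
CRON_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}):\d{2} \S+ CROND\[\d+\]: "
    r"\((?P<user>[^)]+)\) CMD \((?P<cmd>.*)\)$"
)


def jobs_per_minute(lines):
    """Count cron job starts per 'Mon D HH:MM' bucket."""
    counts = Counter()
    for line in lines:
        m = CRON_LINE.match(line.strip())
        if m:
            # Collapse the double space syslog uses for single-digit days.
            stamp = " ".join(m.group("stamp").split())
            counts[stamp] += 1
    return counts
```

Calling `jobs_per_minute(open("/var/log/cron")).most_common(10)` would list the busiest minutes. A high count by itself does not prove anything, since many of these jobs are short-lived, but an unusual spike would be a place to start looking.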
-----------------
Is there any known recent cPanel update that could have caused this problem? Also, I disabled Clam in WHM but still see clamscan among the running processes quite often. It takes a lot of CPU and memory, so perhaps that is the problem?
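To put numbers on the clamscan suspicion, a small Python sketch like the following can list matching processes and their resident memory by reading /proc directly. It assumes a Linux /proc layout with per-process comm and status files. The function name find_processes is hypothetical.

```python
import os


def find_processes(name, proc_root="/proc"):
    """Return sorted (pid, rss_kib) pairs for processes whose command name is `name`."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo" or "self"
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "comm")) as fh:
                if fh.read().strip() != name:
                    continue
            rss_kib = 0  # kernel threads have no VmRSS line
            with open(os.path.join(base, "status")) as fh:
                for line in fh:
                    if line.startswith("VmRSS:"):
                        rss_kib = int(line.split()[1])
                        break
        except OSError:
            continue  # the process exited while we were reading it
        found.append((int(entry), rss_kib))
    return sorted(found)
```

Running `find_processes("clamscan")` a few times around 22:30 would show whether the scanner is actually active and how much memory it holds when the load starts climbing.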
-
Load might be going up because external connections are being cut. Do you have KVM (console) access? If so, could you log in from the console? Are you able to ping external addresses from the server?
Andrew N. - cPanel Plesk VMWare Certified Professional
Do you need immediate assistance? 20 minutes response time!* Open a ticket
EmergencySupport - Professional Server Management and One-time Services
-
I will check it using the VM console, good idea! It's a cloud server, so I do have access to it.
Honestly, it behaves almost as if it were DDoSed at the same time each day, but when I restart the cloud server everything goes back to normal, so that probably does not make sense?
-
Yeah I'm pretty sure it's something else.
Andrew N. - cPanel Plesk VMWare Certified Professional
-
It just happened again. I was not at the PC and did not catch it in time, but after about an hour the server load was over 60 and in the KVM console I could only see something like 2 lines of:
[ . ]
Nothing more. Will need to catch it when it starts.
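One way to catch the start without sitting at the console is to log the load to disk every few seconds, so the last lines before the hang survive a reset. Here is a minimal Python sketch of that idea. It assumes Linux's /proc/loadavg format. The function names and any log path you pass in are illustrative only.

```python
import os
import time


def parse_loadavg(text):
    """Parse /proc/loadavg contents into (1min, 5min, 15min, running, total)."""
    one, five, fifteen, procs, _last_pid = text.split()
    running, total = procs.split("/")
    return float(one), float(five), float(fifteen), int(running), int(total)


def record_sample(log_path, loadavg_path="/proc/loadavg"):
    """Append one timestamped load sample to log_path and force it to disk."""
    with open(loadavg_path) as fh:
        one, five, fifteen, running, total = parse_loadavg(fh.read())
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    line = f"{stamp} load={one:.2f}/{five:.2f}/{fifteen:.2f} procs={running}/{total}\n"
    with open(log_path, "a") as out:
        out.write(line)
        out.flush()
        os.fsync(out.fileno())  # push the line to disk before a possible hang
    return line
```

Calling record_sample in a loop with a `time.sleep(10)` between calls, started a few minutes before 22:30, should show when the load begins to climb. The sa1 entry in the cron list also suggests sysstat is already collecting data, so `sar -q` for that time window may be worth a look as well.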
-
I only see one dot in your response. Did you try hitting Enter or Space? Sometimes the console is blank until you hit a key.
Andrew N. - cPanel Plesk VMWare Certified Professional
-
In the console there was a "[" character followed by a dot and a closing "]" bracket.
-
This is what I was able to see in the KVM console when the load started going up. I was able to ping external IPs from the server using an already running SSH session, but I was unable to open a new one, and on previous occasions the existing session also stopped working after a while.
-
I was also unable to log in as root via the console; it simply got stuck after I entered my password.
-
Based on the screenshot, the server was already shutting down when you opened the console, as services were being stopped. It might also be worth asking your server provider whether they are rebooting the server for some reason, such as abuse or excessive usage.
Andrew N. - cPanel Plesk VMWare Certified Professional
-
They did not mention anything. I will try to catch the KVM output before the crash today. It started about 2 minutes later than the day before, so the time is not exact, but it has been within +/- 3 minutes so far.
-
Just curious... Do you have backup tasks running each day? If so, when do they start and end? Do the crashes happen during those backups? I'm wondering if there are some disk assets going south on you.
-
That is actually the first thing I checked. There is a backup task, but it runs at 6:30 local time and the crash happens around 22:30 local time. I will have to ask the datacenter whether they added something else on their side. The problem seems to be external; otherwise, in my opinion, something would have to be recorded in the server logs.
-
But perhaps the time displayed for the backup is incorrect or in a different timezone, which would explain a lot. I will have to ask them.
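As a quick sanity check on the timezone idea, this Python sketch works out which UTC offset the backup panel would have to use for a displayed 06:30 to be the same moment as 22:30 CEST (UTC+2). It is plain clock arithmetic that ignores dates and DST changes, and the function name reconciling_offsets is made up for illustration.

```python
def reconciling_offsets(displayed_hhmm, observed_hhmm, observed_utc_offset_hours):
    """Return UTC offsets (in hours) under which the displayed wall-clock time
    is the same moment as the observed local time."""

    def minutes(hhmm):
        hours, mins = hhmm.split(":")
        return int(hours) * 60 + int(mins)

    # Observed event expressed as minutes after midnight UTC (may be negative).
    observed_utc = minutes(observed_hhmm) - observed_utc_offset_hours * 60
    # Offset of the displayed clock from UTC, folded into one day.
    diff = (minutes(displayed_hhmm) - observed_utc) % (24 * 60)
    candidates = [diff / 60, diff / 60 - 24]
    # Real-world UTC offsets fall between -12 and +14 hours.
    return [c for c in candidates if -12 <= c <= 14]


print(reconciling_offsets("06:30", "22:30", 2))  # [10.0]
```

So the two times would only line up if the backup schedule were shown in a UTC+10 zone, which is something the datacenter should be able to confirm or rule out.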
-
OK, it just happened again at 22:32 local time. I was watching the console and nothing at all was recorded there, just the standard screen that is normally shown. I will have to contact the datacenter; I am pretty sure they messed something up with the node. It has to be that.
-
I talked to the datacenter support and they told me our server has a very old version of VMware Tools installed, from 2014, so I upgraded it. We will see in a few hours whether it helped...
-
OK, it seems the VMware Tools upgrade did indeed solve the problem!
-
Great, I'm happy this is finally figured out, good job :)
Andrew N. - cPanel Plesk VMWare Certified Professional
-
Thank you for the support, appreciate it!