MySQL crashes when doing backups

Richard Edwards

July 16, 2018 15:10

Hi, My server has around 15 sites on it, 8gb of memory and only a handful of users at any one time. Yet every time it does backups, it seems to kill MySQL. I can never restart mysql, I have to go into terminal and stop it dead entirely before it will come back up, as well as kill all backup processes 'pkgacct'. The backups are not huge, and there is plenty of space on the disk. This happens more and more often and now it is happening every time backups run. This is the 'failed' error cPanel sends when I kill mysql:

Cpanel::Exception::Timeout/(XID uhdf6d) The system failed to lock the file "/var/run/restartsrv_mysql" after 197 seconds.
at /usr/local/cpanel/Cpanel/Exception/CORE.pm line 336.
Cpanel::Exception::create("Timeout", "The system failed to lock the file \x{e2}\x{80}\x{9c}[_1]\x{e2}\x{80}\x{9d} after [quant,_2"..., ARRAY(0x247b700)) called at /usr/local/cpanel/Cpanel/Exception.pm line 61
Cpanel::Exception::__ANON__(__CPANEL_HIDDEN__, __CPANEL_HIDDEN__..., ARRAY(0x247b700)) called at /usr/local/cpanel/Cpanel/SafeFile.pm line 729
Cpanel::SafeFile::_timeout_exception("/var/run/restartsrv_mysql", 197) called at /usr/local/cpanel/Cpanel/SafeFile.pm line 761
Cpanel::SafeFile::_die_if_file_is_flocked_cuz_already_waited_a_while("/var/run/restartsrv_mysql", 197) called at /usr/local/cpanel/Cpanel/SafeFile.pm line 829
Cpanel::SafeFile::_lock_wait("/var/run/restartsrv_mysql") called at /usr/local/cpanel/Cpanel/SafeFile.pm line 355
Cpanel::SafeFile::_safelock("/var/run/restartsrv_mysql") called at /usr/local/cpanel/Cpanel/SafeFile.pm line 558
Cpanel::SafeFile::_safe_open(undef, ">>", "/var/run/restartsrv_mysql", CODE(0xb8ee48), "safeopen") called at /usr/local/cpanel/Cpanel/SafeFile.pm line 115
Cpanel::SafeFile::safeopen(undef, ">>", "/var/run/restartsrv_mysql") called at /usr/local/cpanel/Cpanel/RestartSrv/Lock.pm line 25
Cpanel::RestartSrv::Lock::new("Cpanel::RestartSrv::Lock", "mysql") called at /usr/local/cpanel/Cpanel/ServiceManager/Base.pm line 183
Cpanel::ServiceManager::Base::lock(Cpanel::ServiceManager::Services::Mysql=HASH(0x2137ae8)) called at /usr/local/cpanel/Cpanel/ServiceManager/Base.pm line 459
Cpanel::ServiceManager::Base::BUILD(Cpanel::ServiceManager::Services::Mysql=HASH(0x2137ae8), HASH(0xbac398)) called at /usr/local/cpanel/3rdparty/perl/526/lib64/perl5/cpanel_lib/x86_64-linux-64int/Cpanel/Class/Object.pm line 49
Cpanel::Class::Object::BUILDALL(Cpanel::ServiceManager::Services::Mysql=HASH(0x2137ae8), HASH(0xbac398)) called at /usr/local/cpanel/3rdparty/perl/526/lib64/perl5/cpanel_lib/x86_64-linux-64int/Cpanel/Class/Object.pm line 41
Cpanel::Class::Object::new("Cpanel::ServiceManager::Services::Mysql", "service", "mysql") called at /usr/local/cpanel/Cpanel/ServiceManager.pm line 99
Cpanel::ServiceManager::new("Cpanel::ServiceManager", "service", "mysql") called at bin/restartsrv_base.pl line 79
scripts::restartsrv_base::__ANON__() called at /usr/local/cpanel/3rdparty/perl/526/lib64/perl5/cpanel_lib/Try/Tiny.pm line 99
eval {...} called at /usr/local/cpanel/3rdparty/perl/526/lib64/perl5/cpanel_lib/Try/Tiny.pm line 90
Try::Tiny::try(CODE(0xb6f018), Try::Tiny::Catch=REF(0xb43fb0)) called at bin/restartsrv_base.pl line 125
scripts::restartsrv_base::run("--check", "--notconfigured-ok") called at bin/restartsrv_base.pl line 33

The subprocess "/usr/local/cpanel/scripts/restartsrv_mysql" reported error number 255 when it ended.

I have had to disable backups as it seems to be the cause based on the times and the fact they seem to be running whenever this happens. When MySQL hits this issue, it just locks up and no sites are able to connect to it, sending all of my clients sites down. There are no other heavy use processes going on, MySQL is at the top using 5% CPU and 3.4gb of memory - nothing the server cannot handle. Why do the backups keep killing my databases? When it restarts, almost every database the backups touch have tables which need repairing - not uncommon, but when mysql has been up for only 2 days since it last crashed, I wouldn't expect that. My clients are very concerned and I need to know why backups are doing this - or why something else in cPanel is causing this. Datacenter checked hardware and it is all ok.

Comments

8 comments

Richard Edwards

July 16, 2018 20:14
Can't edit, but here is the 'back up process' failures reports: pixelhero (XID upvwwc) "/usr/local/cpanel/bin/pkgacct" reported error code "1" when it ended: northantsdecorat (XID feaqud) "/usr/local/cpanel/bin/pkgacct" reported error code "1" when it ended: thatlittleagency (XID 9crkwa) "/usr/local/cpanel/bin/pkgacct" reported error code "1" when it ended: recruitmentevent (XID reb9bx) "/usr/local/cpanel/bin/pkgacct" reported error code "1" when it ended:
I killed the process after this last one as it was when my clients noticed their sites were down. [2018-07-16 16:56:05 +0100] thatlittleagency_wp: mysqldump: Got error: 1049: "Unknown database 'thatlittleagency_wp'" when selecting the database warn [pkgacct] thatlittleagency_wp: mysqldump: Got error: 1049: "Unknown database 'thatlittleagency_wp'" when selecting the database [2018-07-16 16:56:05 +0100] thatlittleagency_wp: mysqldump: Got error: 1049: "Unknown database 'thatlittleagency_wp'" when selecting the database warn [pkgacct] thatlittleagency_wp: mysqldump: Got error: 1049: "Unknown database 'thatlittleagency_wp'" when selecting the database [2018-07-16 16:56:05 +0100] thatlittleagency_wp: mysqldump: Got error: 1049: "Unknown database 'thatlittleagency_wp'" when selecting the database warn [pkgacct] thatlittleagency_wp: mysqldump: Got error: 1049: "Unknown database 'thatlittleagency_wp'" when selecting the database
That database exists, I assume by this point MySQL had crashed
0
cPanelLauren

July 17, 2018 17:59
Hi @Richard Edwards Is there anything notable logged to the mysql error logs? The location of this is noted in /etc/my.cnf under log-error for example mine: [root@server mysql]# grep log-error /etc/my.cnf log-error=/var/log/mysqld.log
It may also be helpful to check /var/log/messages
for any relevant output during the time of the backups as well. Thanks!
0
Richard Edwards

July 18, 2018 06:58
Hi, So with backups turned off, this is still happening. Still usually in the morning when I used to run backups but must just be a co-incidence. It died again at 7:40am, here are the logs. The site 'carousel' is a wordpress site, the user who viewed it was the client so nothing unusual. 2018-07-18 7:04:33 140575962253056 [Warning] Aborted connection 44677 to db: 'carousellights_wp' user: 'carousellights_wp' host: 'localhost' (Got an error reading communication packets) 180718 7:16:05 [ERROR] mysqld got signal 7 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. To report this bug, see https://mariadb.com/kb/en/reporting-bugs We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. Server version: 10.2.16-MariaDB key_buffer_size=134217728 read_buffer_size=131072 max_used_connections=130 max_threads=153 thread_count=124 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467246 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x7fda3401bd68 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x7fd96bfb3d30 thread_stack 0x49000 2018-07-18 7:35:08 139627844749504 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2018-07-18 7:35:08 139627844749504 [Note] InnoDB: Uses event mutexes 2018-07-18 7:35:08 139627844749504 [Note] InnoDB: Compressed tables use zlib 1.2.7 2018-07-18 7:35:08 139627844749504 [Note] InnoDB: Using Linux native AIO 2018-07-18 7:35:08 139627844749504 [Note] InnoDB: Number of pools: 1 2018-07-18 7:35:08 139627844749504 [Note] InnoDB: Using SSE2 crc32 instructions 2018-07-18 7:35:08 139627844749504 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M 2018-07-18 7:35:08 139627844749504 [Note] InnoDB: Completed initialization of buffer pool 2018-07-18 7:35:08 139627076548352 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority(). 2018-07-18 7:35:08 139627844749504 [Note] InnoDB: Highest supported file format is Barracuda. 2018-07-18 7:35:08 139627844749504 [Note] InnoDB: Starting crash recovery from checkpoint LSN=217525724982 2018-07-18 7:40:03 139627844749504 [Note] InnoDB: 128 out of 128 rollback segments are active. 2018-07-18 7:40:03 139627844749504 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1" 2018-07-18 7:40:03 139627844749504 [Note] InnoDB: Creating shared tablespace for temporary tables 2018-07-18 7:40:03 139627844749504 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the filefull; Please wait ... 2018-07-18 7:40:03 139627844749504 [Note] InnoDB: File './ibtmp1' size is now 12 MB. 2018-07-18 7:40:03 139627844749504 [Note] InnoDB: Waiting for purge to start 2018-07-18 7:40:03 139627844749504 [Note] InnoDB: 5.7.22 started; log sequence number 217525724991 2018-07-18 7:40:03 139626434000640 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool 2018-07-18 7:40:03 139627844749504 [Note] Plugin 'FEEDBACK' is disabled. 2018-07-18 7:40:03 139627844749504 [Note] Recovering after a crash using tc.log 2018-07-18 7:40:03 139627844749504 [Note] Starting crash recovery... 2018-07-18 7:40:03 139627844749504 [Note] Crash recovery finished. 2018-07-18 7:40:03 139627844749504 [Note] Server socket created on IP: '::'. 2018-07-18 7:40:03 139627710367488 [Note] /usr/sbin/mysqld (initiated by: unknown): Normal shutdown 2018-07-18 7:40:06 139627844749504 [Note] Reading of all Master_info entries succeded 2018-07-18 7:40:06 139627844749504 [Note] Added new Master_info '' to hash table 2018-07-18 7:40:06 139627844749504 [Note] /usr/sbin/mysqld: ready for connections. Version: '10.2.16-MariaDB' socket: '/var/lib/mysql/mysql.sock' port: 3306 MariaDB Server 2018-07-18 7:40:06 139627710367488 [Note] Event Scheduler: Purging the queue. 0 events 2018-07-18 7:40:06 139626954086144 [Note] InnoDB: FTS optimize thread exiting. 2018-07-18 7:40:06 139627710367488 [Note] InnoDB: Starting shutdown... 2018-07-18 7:40:06 139626434000640 [Note] InnoDB: Buffer pool(s) load completed at 180718 7:40:06 2018-07-18 7:40:06 139626434000640 [Note] InnoDB: Dumping buffer pool(s) to /var/lib/mysql/ib_buffer_pool 2018-07-18 7:40:06 139626434000640 [Note] InnoDB: Instance 0, restricted to 2048 pages due to innodb_buf_pool_dump_pct=25 2018-07-18 7:40:06 139626434000640 [Note] InnoDB: Buffer pool(s) dump completed at 180718 7:40:06 2018-07-18 7:40:07 139627710367488 [Note] InnoDB: Shutdown completed; log sequence number 217525725028 2018-07-18 7:40:07 139627710367488 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1" 2018-07-18 7:40:07 139627710367488 [Note] /usr/sbin/mysqld: Shutdown complete 2018-07-18 7:40:08 140500380170432 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2018-07-18 7:40:08 140500380170432 [Note] InnoDB: Uses event mutexes 2018-07-18 7:40:08 140500380170432 [Note] InnoDB: Compressed tables use zlib 1.2.7 2018-07-18 7:40:08 140500380170432 [Note] InnoDB: Using Linux native AIO 2018-07-18 7:40:08 140500380170432 [Note] InnoDB: Number of pools: 1 2018-07-18 7:40:08 140500380170432 [Note] InnoDB: Using SSE2 crc32 instructions 2018-07-18 7:40:08 140500380170432 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M 2018-07-18 7:40:08 140500380170432 [Note] InnoDB: Completed initialization of buffer pool 2018-07-18 7:40:08 140499609212672 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority(). 2018-07-18 7:40:08 140500380170432 [Note] InnoDB: Highest supported file format is Barracuda. 2018-07-18 7:40:09 140500380170432 [Note] InnoDB: 128 out of 128 rollback segments are active. 2018-07-18 7:40:09 140500380170432 [Note] InnoDB: Creating shared tablespace for temporary tables 2018-07-18 7:40:09 140500380170432 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the filefull; Please wait ... 2018-07-18 7:40:09 140500380170432 [Note] InnoDB: File './ibtmp1' size is now 12 MB. 2018-07-18 7:40:09 140500380170432 [Note] InnoDB: Waiting for purge to start 2018-07-18 7:40:09 140500380170432 [Note] InnoDB: 5.7.22 started; log sequence number 217525725028 2018-07-18 7:40:09 140499302209280 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool 2018-07-18 7:40:09 140500380170432 [Note] Plugin 'FEEDBACK' is disabled. 2018-07-18 7:40:09 140499302209280 [Note] InnoDB: Buffer pool(s) load completed at 180718 7:40:09 2018-07-18 7:40:09 140500380170432 [Note] Server socket created on IP: '::'. 2018-07-18 7:40:09 140500380170432 [Note] Reading of all Master_info entries succeded 2018-07-18 7:40:09 140500380170432 [Note] Added new Master_info '' to hash table 2018-07-18 7:40:09 140500380170432 [Note] /usr/sbin/mysqld: ready for connections. Version: '10.2.16-MariaDB' socket: '/var/lib/mysql/mysql.sock' port: 3306 MariaDB Server 2018-07-18 7:40:15 140499926767360 [Note] Detected table cache mutex contention at instance 1: 54% waits. Additionaltable cache instance activated. Number of instances after activation: 2.
This is the messages log around the time mysql got signal 7 at 7:16: Jul 18 07:15:32 server kernel: INFO: task php-cgi:21973 blocked for more than 120 seconds. Jul 18 07:15:32 server kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jul 18 07:15:32 server kernel: php-cgi D ffff88020e6efe40 0 21973 16287 0x00000080 Jul 18 07:15:32 server kernel: ffff880157727b20 0000000000000082 ffff88001b1bdc00 ffff880157727fd8 Jul 18 07:15:33 server kernel: ffff880157727fd8 ffff880157727fd8 ffff88001b1bdc00 ffff880157727c48 Jul 18 07:15:34 server kernel: ffff880157727c50 7fffffffffffffff ffff88001b1bdc00 ffff88020e6efe40 Jul 18 07:15:34 server kernel: Call Trace: Jul 18 07:15:34 server kernel: [] schedule+0x29/0x70 Jul 18 07:15:34 server kernel: [] schedule_timeout+0x209/0x2d0 Jul 18 07:15:34 server kernel: [] ? wake_up_process+0x23/0x40 Jul 18 07:15:35 server kernel: [] ? wake_up_worker+0x24/0x30 Jul 18 07:15:35 server kernel: [] wait_for_completion+0x116/0x170 Jul 18 07:15:35 server kernel: [] ? wake_up_state+0x20/0x20 Jul 18 07:15:36 server kernel: [] call_usermodehelper_exec+0x169/0x1a0 Jul 18 07:15:37 server kernel: [] do_coredump+0x5e5/0xb10 Jul 18 07:15:37 server kernel: [] ? __sigqueue_free.part.11+0x33/0x40 Jul 18 07:15:44 server kernel: [] ? __dequeue_signal+0x13c/0x220 Jul 18 07:15:46 server kernel: [] get_signal_to_deliver+0x1c7/0x6d0 Jul 18 07:15:46 server kernel: [] do_signal+0x57/0x6c0 Jul 18 07:15:46 server kernel: [] ? is_prefetch.isra.17+0xe4/0x215 ... Jul 18 07:16:09 server kernel: sd 0:1:0:0: [sda] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Jul 18 07:16:09 server kernel: sd 0:1:0:0: [sda] Sense Key : Medium Error [current] Jul 18 07:16:09 server kernel: sd 0:1:0:0: [sda] Add. Sense: Unrecovered read error Jul 18 07:16:09 server kernel: sd 0:1:0:0: [sda] CDB: Read(10) 28 00 00 09 ae 80 00 00 40 00 Jul 18 07:16:10 server kernel: blk_update_request: critical medium error, dev sda, sector 634496 Jul 18 07:16:10 server kernel: Read-error on swap-device (8:0:634504) Jul 18 07:16:10 server kernel: Read-error on swap-device (8:0:634512) Jul 18 07:16:10 server kernel: Read-error on swap-device (8:0:634520) Jul 18 07:16:11 server kernel: Read-error on swap-device (8:0:634528) Jul 18 07:16:11 server kernel: Read-error on swap-device (8:0:634536) Jul 18 07:16:11 server kernel: Read-error on swap-device (8:0:634544) Jul 18 07:16:11 server kernel: Read-error on swap-device (8:0:634552) Jul 18 07:16:11 server kernel: Read-error on swap-device (8:0:634560) Jul 18 07:16:12 server kernel: type=1701 audit(1531894565.339:104808): auid=4294967295 uid=1042 gid=1040 ses=4294967295 pid=21855 comm="php-cgi" reason="memory violation" sig=7 Jul 18 07:16:19 server kernel: sd 0:1:0:0: [sda] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Jul 18 07:16:19 server kernel: sd 0:1:0:0: [sda] Sense Key : Medium Error [current] Jul 18 07:16:19 server kernel: sd 0:1:0:0: [sda] Add. Sense: Unrecovered read error Jul 18 07:16:19 server kernel: sd 0:1:0:0: [sda] CDB: Read(10) 28 00 00 7e 36 c0 00 00 40 00 Jul 18 07:16:19 server kernel: blk_update_request: critical medium error, dev sda, sector 8271552 Jul 18 07:16:20 server kernel: Read-error on swap-device (8:0:8271560) Jul 18 07:16:20 server kernel: Read-error on swap-device (8:0:8271568) Jul 18 07:16:21 server kernel: Read-error on swap-device (8:0:8271576) Jul 18 07:16:21 server kernel: Read-error on swap-device (8:0:8271584) Jul 18 07:16:21 server kernel: Read-error on swap-device (8:0:8271592) Jul 18 07:16:22 server kernel: Read-error on swap-device (8:0:8271600) Jul 18 07:16:22 server kernel: Read-error on swap-device (8:0:8271608) Jul 18 07:16:22 server kernel: Read-error on swap-device (8:0:8271616)
0
cPanelLauren

July 18, 2018 12:40
Hi @Richard Edwards Based on those errors I'm curious to know when is the last time you ran an fsck on the device? This sounds indicative of a failure or a bad block that may need to be mapped out. I would suggest rebooting and running an fsck at the very least on the swap device. Ultimately though this sounds like something that would best be suited for a system administrator. If you don't have one you might find one here: System Administration Services | cPanel Forums . Thanks!
0
Richard Edwards

July 18, 2018 13:11
Thank you, I believe I need to unmount the drives to do that, so have asked the sys admins at the datacentre to do it for me and will let you know if it helps Thanks
0
cPanelLauren

July 18, 2018 14:08
Hi @Richard Edwards Good plan, please let us know if the issue persists once they run the fsck. Thanks!
0
Richard Edwards

July 18, 2018 14:16
They have advised that fsck can also do more damage than it fixes - you should say the same if you are suggesting it to your users! They will consider it, but are investigating the server before resorting to doing that. If they do, hopefully it does not kill the drive as it looks in a bad state to me. The other option is they will set up a new box for me, but I will need to migrate all my stuff - is there a way to clone everything from 1 cpanel to another? I want to avoid doing it account by account and manually updating configs as that would be a huge downtime while I do it.
0
cPanelLauren

July 18, 2018 14:40
They have advised that fsck can also do more damage than it fixes - you should say the same if you are suggesting it to your users!They will consider it, but are investigating the server before resorting to doing that. If they do, hopefully it does not kill the drive as it looks in a bad state to me.

I'm not sure where that came from as that's not entirely true the only instance in which an fsck would actually be dangerous is if the underlying hardware is damaged, none the less this is why a suggestion to contact a system administrator was made.
The other option is they will set up a new box for me, but I will need to migrate all my stuff - is there a way to clone everything from 1 cpanel to another? I want to avoid doing it account by account and manually updating configs as that would be a huge downtime while I do it.

There is no native option to clone a cPanel server, you can use the transfer tool in the event the drive is functioning. Information on the Transfer Tool can be found in the documentation here: Transfer Tool - Version 72 Documentation - cPanel Documentation Thanks!
0

Please sign in to leave a comment.

Comments

Didn't find what you were looking for?