
Notification: DegradedArray event on /dev/md/

Comments


  • kodeslogic
    This means one of your hard disks has failed. It needs to be replaced, and the new disk then re-added to the RAID array. This is a critical task that should be handled by an expert: if it is not done carefully you may lose all your data. Before proceeding, take a complete backup of your server to remote storage (not on the same server), so that you can recover from the backup if anything goes wrong. Another workaround would be to build a new server and migrate everything to it as soon as possible.
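For reference, the replace-and-re-add procedure kodeslogic describes usually follows a pattern like the sketch below. The device names are placeholders (/dev/sdX for the failed disk, /dev/sdY for the surviving disk; /dev/md0 and /dev/md1 are the array names that appear later in this thread), not commands to paste verbatim, and nothing like this should be run without a verified off-server backup.

    # 1. Mark the failed disk's partitions as faulty and remove them from the arrays
    mdadm --manage /dev/md0 --fail /dev/sdX1 --remove /dev/sdX1
    mdadm --manage /dev/md1 --fail /dev/sdX2 --remove /dev/sdX2

    # 2. Physically replace the disk, then copy the partition layout from the healthy disk
    #    (sfdisk -d works for MBR disks; GPT disks need sgdisk instead)
    sfdisk -d /dev/sdY | sfdisk /dev/sdX

    # 3. Add the new partitions back so the RAID1 mirrors start rebuilding
    mdadm --manage /dev/md0 --add /dev/sdX1
    mdadm --manage /dev/md1 --add /dev/sdX2

    # 4. Watch the rebuild progress
    cat /proc/mdstat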
  • stokmu
    [QUOTE=kodeslogic]
    This means one of your hard disks has failed. It needs to be replaced, and the new disk then re-added to the RAID array. This is a critical task that should be handled by an expert: if it is not done carefully you may lose all your data. Before proceeding, take a complete backup of your server to remote storage (not on the same server), so that you can recover from the backup if anything goes wrong. Another workaround would be to build a new server and migrate everything to it as soon as possible.
    [/QUOTE]

    Thanks for your answer. This problem appeared after the server was previously rebooted. So is the only solution to migrate to a new server? Is there no other option?
  • kodeslogic
    Can you share the output of:
    # lsblk
    # blkid
  • stokmu
    [QUOTE=kodeslogic]
    Can you share the output of:
    # lsblk
    # blkid
    [/QUOTE]

    [QUOTE]
    [root@mail ~]# lsblk
    NAME    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
    sda       8:0    0 931.5G  0 disk
    ├─sda1    8:1    0  1000M  0 part
    ├─sda2    8:2    0 922.7G  0 part
    └─sda3    8:3    0   7.8G  0 part  [SWAP]
    sdb       8:16   0 931.5G  0 disk
    ├─sdb1    8:17   0  1000M  0 part
    │ └─md0   9:0    0 999.4M  0 raid1 /boot
    ├─sdb2    8:18   0 922.7G  0 part
    │ └─md1   9:1    0 922.6G  0 raid1 /
    └─sdb3    8:19   0   7.8G  0 part  [SWAP]
    [root@mail ~]# blkid
    /dev/sda1: UUID="00864c51-051f-6429-0abc-ac124afc2aa7" UUID_SUB="1963219b-22c3-1673-0cc3-51c924f99d3e" LABEL="m1605.contaboserver.net:0" TYPE="linux_raid_member"
    /dev/sda2: UUID="535e3104-f569-a8f2-c0dc-a29a02206f71" UUID_SUB="98c382ec-e450-5151-37b0-b71ce51502f7" LABEL="m1605.contaboserver.net:1" TYPE="linux_raid_member"
    /dev/sda3: UUID="78620e94-3df9-41fe-a2d6-c885c31362a6" TYPE="swap"
    /dev/sdb1: UUID="00864c51-051f-6429-0abc-ac124afc2aa7" UUID_SUB="e41ab881-a5ab-a9aa-154e-143786dcbd89" LABEL="m1605.contaboserver.net:0" TYPE="linux_raid_member"
    /dev/sdb2: UUID="535e3104-f569-a8f2-c0dc-a29a02206f71" UUID_SUB="20621aa3-3212-f432-a1c8-aa555ee03375" LABEL="m1605.contaboserver.net:1" TYPE="linux_raid_member"
    /dev/sdb3: UUID="ebf37afa-33be-4730-8eba-f4e9c8560fed" TYPE="swap"
    /dev/md0: UUID="77e66859-119a-4858-9237-2fa5a7a5c6f2" TYPE="ext4"
    /dev/md1: UUID="e747843e-5acc-4e5a-af3b-133cc021cb65" TYPE="ext4"
    [root@mail ~]#
    [/QUOTE]
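The blkid output above still shows linux_raid_member signatures on /dev/sda1 and /dev/sda2, while lsblk lists only the sdb partitions as backing md0 and md1. Assuming the standard mdadm tooling that produced this layout, the array state and the leftover superblocks on sda can be inspected directly; a minimal sketch (not commands anyone in the thread asked for):

    # Show which member devices each array currently considers active or missing
    mdadm --detail /dev/md0
    mdadm --detail /dev/md1

    # Inspect the RAID superblocks that blkid still reports on the sda partitions
    mdadm --examine /dev/sda1
    mdadm --examine /dev/sda2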
  • kodeslogic
    It seems there is a problem with the /dev/sda drive. Run a self-test on /dev/sda using smartmontools and share the complete output:
    # smartctl -l selftest /dev/sda
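For anyone following along, the self-test whose log kodeslogic asks for is started with smartmontools as sketched below; an extended test on a roughly 1 TB disk can take several hours, and the log is read back afterwards with the command quoted above.

    # Start an extended (long) offline self-test on the suspect disk
    smartctl -t long /dev/sda

    # Read back the self-test log once the test has finished
    smartctl -l selftest /dev/sda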
  • stokmu
    [QUOTE=kodeslogic]
    It seems there is a problem with the /dev/sda drive. Run a self-test on /dev/sda using smartmontools and share the complete output:
    # smartctl -l selftest /dev/sda
    [/QUOTE]

    Output:
    [QUOTE]
    smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-693.21.1.el7.x86_64] (local build)
    Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

    === START OF READ SMART DATA SECTION ===
    SMART Self-test log structure revision number 1
    Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Extended offline    Completed without error       00%       5375        -
    # 2  Extended offline    Completed without error       80%       5372        -
    # 3  Short offline       Completed without error       00%      20113        -
    [/QUOTE]
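Since the self-test log above reports no errors, it may also be worth looking at the overall health verdict and the raw SMART attributes (reallocated and pending sectors in particular); a minimal sketch using standard smartctl options:

    # Overall SMART health assessment
    smartctl -H /dev/sda

    # Vendor attributes: watch Reallocated_Sector_Ct, Current_Pending_Sector, Offline_Uncorrectable
    smartctl -A /dev/sda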
  • andrew.n
    @stokmu: can you paste the output of "cat /proc/mdstat"? I'm pretty sure the RAID array got degraded due to the reboot and that it is/was rebuilding. Depending on the size of the array, this could take from a couple of hours to a couple of days. Furthermore, the array is automatically checked every Sunday to make sure it is in a healthy state (this is run from cron).
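As andrew.n notes, rebuild progress shows up in /proc/mdstat. The weekly job he refers to is, on RHEL/CentOS systems, normally the raid-check consistency check shipped with the mdadm package; the paths below are the usual ones on CentOS 7 but may differ on other distributions.

    # Current array state and any rebuild/resync progress
    cat /proc/mdstat

    # What the kernel is doing with each array right now (idle, check, resync, recover)
    cat /sys/block/md0/md/sync_action
    cat /sys/block/md1/md/sync_action

    # Where the scheduled weekly check usually lives on RHEL/CentOS
    cat /etc/cron.d/raid-check
    cat /etc/sysconfig/raid-check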
  • kodeslogic
    Is this server unmanaged or managed by your server provider? If it is a managed one, your server provider should take care of this for you. If it is an unmanaged server, the first thing you should do is take a complete backup to a remote location and reach out to an experienced system administrator.
  • stokmu
    [QUOTE=andrew.n]
    @stokmu: can you paste the output of "cat /proc/mdstat"? I'm pretty sure the RAID array got degraded due to the reboot and that it is/was rebuilding. Depending on the size of the array, this could take from a couple of hours to a couple of days. Furthermore, the array is automatically checked every Sunday to make sure it is in a healthy state (this is run from cron).
    [/QUOTE]

    Output:
    [QUOTE]
    Personalities : [raid1]
    md1 : active raid1 sdb2[1]
          967412736 blocks super 1.2 [2/1] [_U]
          bitmap: 8/8 pages [32KB], 65536KB chunk

    md0 : active raid1 sdb1[1]
          1023424 blocks super 1.2 [2/1] [_U]
          bitmap: 1/1 pages [4KB], 65536KB chunk

    unused devices: <none>
    [/QUOTE]

    And then, what should I do after this? Thanks.
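For readers not used to /proc/mdstat, here is an annotated copy of the key lines from the output above (the annotations are added here and are not part of the original output):

    md1 : active raid1 sdb2[1]    <- only sdb2 is listed as a member; sda2 is missing
          ... [2/1] [_U]          <- 2 devices expected, 1 active; "_" marks the missing
                                     slot and "U" the working sdb member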
  • kodeslogic
    It is not in a rebuilding state. If it is a managed server, your server provider should help you with this. Make sure you take a complete backup, to be on the safe side, before they perform any action.
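If the provider or an administrator concludes that /dev/sda itself is healthy (its self-test above passed) and chooses to reattach it rather than replace it, the re-attach step typically looks like the sketch below. This is a sketch under that assumption, for this thread's layout, to be run only after a verified off-server backup and ideally by whoever manages the server.

    # Re-attach the existing sda partitions to their arrays; mdadm will resync them
    mdadm --manage /dev/md0 --add /dev/sda1
    mdadm --manage /dev/md1 --add /dev/sda2

    # Monitor the recovery until both arrays show [UU]
    cat /proc/mdstat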
  • andrew.n
    @kodeslogic is right, it is not rebuilding. I also suggest you consult with a cPanel certified system administrator to whom you can give access to the server and who can advise on the best way to move forward.
