
Notification: DegradedArray event on /dev/md/

Comments


  • kodeslogic
    This means one of your hard disks has failed. It needs to be replaced, and the new disk then re-added to the RAID array. This is a critical task that should be handled by an expert: if it is not done carefully you may lose all your data. Before proceeding, take a complete backup of your server to remote storage (not on the same server), so that you can recover from the backup if anything goes wrong. Another workaround would be to build a new server and migrate everything to it as soon as possible.
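For reference, the replace-and-re-add procedure kodeslogic describes usually follows a pattern like the sketch below. The device names are placeholders (/dev/sdX for the failed disk, /dev/sdY for the surviving disk; /dev/md0 and /dev/md1 are the array names that appear later in this thread), not commands to paste verbatim, and nothing like this should be run without a verified off-server backup.

    # 1. Mark the failed disk's partitions as faulty and remove them from the arrays
    mdadm --manage /dev/md0 --fail /dev/sdX1 --remove /dev/sdX1
    mdadm --manage /dev/md1 --fail /dev/sdX2 --remove /dev/sdX2

    # 2. Physically replace the disk, then copy the partition layout from the healthy disk
    #    (sfdisk -d works for MBR disks; GPT disks need sgdisk instead)
    sfdisk -d /dev/sdY | sfdisk /dev/sdX

    # 3. Add the new partitions back so the RAID1 mirrors start rebuilding
    mdadm --manage /dev/md0 --add /dev/sdX1
    mdadm --manage /dev/md1 --add /dev/sdX2

    # 4. Watch the rebuild progress
    cat /proc/mdstat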
  • stokmu
    [QUOTE=kodeslogic]
    This means one of your hard disks has failed. It needs to be replaced, and the new disk then re-added to the RAID array. This is a critical task that should be handled by an expert: if it is not done carefully you may lose all your data. Before proceeding, take a complete backup of your server to remote storage (not on the same server), so that you can recover from the backup if anything goes wrong. Another workaround would be to build a new server and migrate everything to it as soon as possible.
    [/QUOTE]

    Thanks for your answer. This problem appeared after the server was previously rebooted. So is the only solution to migrate to a new server? Is there no other option?
  • kodeslogic
    Can you share the output of:
    # lsblk
    # blkid
  • stokmu
    [QUOTE=kodeslogic]
    Can you share the output of:
    # lsblk
    # blkid
    [/QUOTE]

    [QUOTE]
    [root@mail ~]# lsblk
    NAME    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
    sda       8:0    0 931.5G  0 disk
    ├─sda1    8:1    0  1000M  0 part
    ├─sda2    8:2    0 922.7G  0 part
    └─sda3    8:3    0   7.8G  0 part  [SWAP]
    sdb       8:16   0 931.5G  0 disk
    ├─sdb1    8:17   0  1000M  0 part
    │ └─md0   9:0    0 999.4M  0 raid1 /boot
    ├─sdb2    8:18   0 922.7G  0 part
    │ └─md1   9:1    0 922.6G  0 raid1 /
    └─sdb3    8:19   0   7.8G  0 part  [SWAP]
    [root@mail ~]# blkid
    /dev/sda1: UUID="00864c51-051f-6429-0abc-ac124afc2aa7" UUID_SUB="1963219b-22c3-1673-0cc3-51c924f99d3e" LABEL="m1605.contaboserver.net:0" TYPE="linux_raid_member"
    /dev/sda2: UUID="535e3104-f569-a8f2-c0dc-a29a02206f71" UUID_SUB="98c382ec-e450-5151-37b0-b71ce51502f7" LABEL="m1605.contaboserver.net:1" TYPE="linux_raid_member"
    /dev/sda3: UUID="78620e94-3df9-41fe-a2d6-c885c31362a6" TYPE="swap"
    /dev/sdb1: UUID="00864c51-051f-6429-0abc-ac124afc2aa7" UUID_SUB="e41ab881-a5ab-a9aa-154e-143786dcbd89" LABEL="m1605.contaboserver.net:0" TYPE="linux_raid_member"
    /dev/sdb2: UUID="535e3104-f569-a8f2-c0dc-a29a02206f71" UUID_SUB="20621aa3-3212-f432-a1c8-aa555ee03375" LABEL="m1605.contaboserver.net:1" TYPE="linux_raid_member"
    /dev/sdb3: UUID="ebf37afa-33be-4730-8eba-f4e9c8560fed" TYPE="swap"
    /dev/md0: UUID="77e66859-119a-4858-9237-2fa5a7a5c6f2" TYPE="ext4"
    /dev/md1: UUID="e747843e-5acc-4e5a-af3b-133cc021cb65" TYPE="ext4"
    [root@mail ~]#
    [/QUOTE]
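The blkid output above still shows linux_raid_member signatures on /dev/sda1 and /dev/sda2, while lsblk lists only the sdb partitions as backing md0 and md1. Assuming the standard mdadm tooling that produced this layout, the array state and the leftover superblocks on sda can be inspected directly; a minimal sketch (not commands anyone in the thread asked for):

    # Show which member devices each array currently considers active or missing
    mdadm --detail /dev/md0
    mdadm --detail /dev/md1

    # Inspect the RAID superblocks that blkid still reports on the sda partitions
    mdadm --examine /dev/sda1
    mdadm --examine /dev/sda2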
  • kodeslogic
    It seems there is a problem with the /dev/sda drive. Run a self-test on /dev/sda using smartmontools and share the complete output:
    # smartctl -l selftest /dev/sda
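For anyone following along, the self-test whose log kodeslogic asks for is started with smartmontools as sketched below; an extended test on a roughly 1 TB disk can take several hours, and the log is read back afterwards with the command quoted above.

    # Start an extended (long) offline self-test on the suspect disk
    smartctl -t long /dev/sda

    # Read back the self-test log once the test has finished
    smartctl -l selftest /dev/sda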
  • stokmu
    [QUOTE=kodeslogic]
    It seems there is a problem with the /dev/sda drive. Run a self-test on /dev/sda using smartmontools and share the complete output:
    # smartctl -l selftest /dev/sda
    [/QUOTE]

    Output:
    [QUOTE]
    smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-693.21.1.el7.x86_64] (local build)
    Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

    === START OF READ SMART DATA SECTION ===
    SMART Self-test log structure revision number 1
    Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Extended offline    Completed without error       00%       5375        -
    # 2  Extended offline    Completed without error       80%       5372        -
    # 3  Short offline       Completed without error       00%      20113        -
    [/QUOTE]
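Since the self-test log above reports no errors, it may also be worth looking at the overall health verdict and the raw SMART attributes (reallocated and pending sectors in particular); a minimal sketch using standard smartctl options:

    # Overall SMART health assessment
    smartctl -H /dev/sda

    # Vendor attributes: watch Reallocated_Sector_Ct, Current_Pending_Sector, Offline_Uncorrectable
    smartctl -A /dev/sda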
  • andrew.n
    @stokmu: can you paste the output of "cat /proc/mdstat"? I'm pretty sure the RAID array got degraded due to the reboot and that it is/was rebuilding. Depending on the size of the array, this could take from a couple of hours to a couple of days. Furthermore, the array is automatically checked every Sunday to make sure it is in a healthy state (this is run from cron).
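As andrew.n notes, rebuild progress shows up in /proc/mdstat. The weekly job he refers to is, on RHEL/CentOS systems, normally the raid-check consistency check shipped with the mdadm package; the paths below are the usual ones on CentOS 7 but may differ on other distributions.

    # Current array state and any rebuild/resync progress
    cat /proc/mdstat

    # What the kernel is doing with each array right now (idle, check, resync, recover)
    cat /sys/block/md0/md/sync_action
    cat /sys/block/md1/md/sync_action

    # Where the scheduled weekly check usually lives on RHEL/CentOS
    cat /etc/cron.d/raid-check
    cat /etc/sysconfig/raid-check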
  • kodeslogic
    Is this server unmanaged or managed by your server provider? If it is a managed one, your server provider should take care of this for you. If it is an unmanaged server, the first thing you should do is take a complete backup to a remote location and reach out to an experienced system administrator.
  • stokmu
    [QUOTE=andrew.n]
    @stokmu: can you paste the output of "cat /proc/mdstat"? I'm pretty sure the RAID array got degraded due to the reboot and that it is/was rebuilding. Depending on the size of the array, this could take from a couple of hours to a couple of days. Furthermore, the array is automatically checked every Sunday to make sure it is in a healthy state (this is run from cron).
    [/QUOTE]

    Output:
    [QUOTE]
    Personalities : [raid1]
    md1 : active raid1 sdb2[1]
          967412736 blocks super 1.2 [2/1] [_U]
          bitmap: 8/8 pages [32KB], 65536KB chunk

    md0 : active raid1 sdb1[1]
          1023424 blocks super 1.2 [2/1] [_U]
          bitmap: 1/1 pages [4KB], 65536KB chunk

    unused devices: <none>
    [/QUOTE]

    And then, what should I do after this? Thanks.
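For readers not used to /proc/mdstat, here is an annotated copy of the key lines from the output above (the annotations are added here and are not part of the original output):

    md1 : active raid1 sdb2[1]    <- only sdb2 is listed as a member; sda2 is missing
          ... [2/1] [_U]          <- 2 devices expected, 1 active; "_" marks the missing
                                     slot and "U" the working sdb member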
  • kodeslogic
    It is not in a rebuilding state. If it is a managed server, your server provider should help you with this. Make sure you take a complete backup, to be on the safe side, before they perform any action.
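If the provider or an administrator concludes that /dev/sda itself is healthy (its self-test above passed) and chooses to reattach it rather than replace it, the re-attach step typically looks like the sketch below. This is a sketch under that assumption, for this thread's layout, to be run only after a verified off-server backup and ideally by whoever manages the server.

    # Re-attach the existing sda partitions to their arrays; mdadm will resync them
    mdadm --manage /dev/md0 --add /dev/sda1
    mdadm --manage /dev/md1 --add /dev/sda2

    # Monitor the recovery until both arrays show [UU]
    cat /proc/mdstat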
  • andrew.n
    @kodeslogic is right, it is not rebuilding. I also suggest you consult with a cPanel certified system administrator to whom you can give access to the server and who can advise on the best way to move forward.
