Yeah… try to mitigate this in the future. We were basically doing a reshape to grow the array using these commands:

# mdadm --add /dev/md0 /dev/sdf1
# mdadm --add /dev/md0 /dev/sdg1
# mdadm --grow /dev/md0 --raid-devices=9
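While the grow is running, you can keep an eye on the reshape from another terminal. A minimal sketch (assuming the same /dev/md0 as above; the five-second interval is just a convenient choice):

# watch -n 5 cat /proc/mdstat
# mdadm --detail /dev/md0 | grep -i reshape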

So what initially happened is that the box lost power, and when the power came back, it automatically restarted…
When it came back up, mdadm actually assembled the array in auto-read-only mode and already saw it with the new number of devices:

root@whyte:/home/god# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : active (auto-read-only) raid6 sdd1[4] sdi1[3] sdh1[2] sdj1[0] sde1[1] sdb1[5] sdc1[6] sdg1[8] sdf1[7]
      8790405888 blocks super 1.2 level 6, 64k chunk, algorithm 2 [9/9] [UUUUUUUUU]

unused devices: <none>

root@whyte:/home/god# mdadm -D /dev/md127
/dev/md127:
        Version : 1.2
  Creation Time : Sun Nov 3 12:37:08 2013
     Raid Level : raid6
     Array Size : 8790405888 (8383.18 GiB 9001.38 GB)
  Used Dev Size : 2930135296 (2794.39 GiB 3000.46 GB)
   Raid Devices : 9
  Total Devices : 9
    Persistence : Superblock is persistent

    Update Time : Sat Nov 16 13:24:05 2013
          State : clean
 Active Devices : 9
Working Devices : 9
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

  Delta Devices : 4, (5->9)

           Name : psoa:0
           UUID : 91f3fbda:3e3d14b6:3bb9aea0:9769ab89
         Events : 6198

    Number   Major   Minor   RaidDevice State
       0       8      145        0      active sync   /dev/sdj1
       1       8       65        1      active sync   /dev/sde1
       2       8      113        2      active sync   /dev/sdh1
       3       8      129        3      active sync   /dev/sdi1
       4       8       49        4      active sync   /dev/sdd1
       8       8       97        5      active sync   /dev/sdg1
       7       8       81        6      active sync   /dev/sdf1
       6       8       33        7      active sync   /dev/sdc1
       5       8       17        8      active sync   /dev/sdb1
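Auto-read-only is just md being careful: the array won't write anything, including reshape progress, until something opens it read-write. We ended up stopping and re-assembling it (below), but I believe simply flipping it back to read-write should also have let the reshape continue. A hedged sketch, untested here:

# mdadm --readwrite /dev/md127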

However, it didn't seem to have resumed the reshape. At this point, I really had no clue what to do. I figured we would have to somehow start the entire reshape process over. But for kicks, I tried stopping the array and re-assembling it just to see what would happen:

mdadm -S /dev/md127
mdadm --assemble /dev/md1 --uuid=91f3fbda:3e3d14b6:3bb9aea0:9769ab89
mdadm: /dev/md1 has been started with 9 drives.

root@whyte:~# mdadm -D /dev/md1
/dev/md1:
        Version : 1.2
  Creation Time : Sun Nov 3 12:37:08 2013
     Raid Level : raid6
     Array Size : 8790405888 (8383.18 GiB 9001.38 GB)
  Used Dev Size : 2930135296 (2794.39 GiB 3000.46 GB)
   Raid Devices : 9
  Total Devices : 9
    Persistence : Superblock is persistent

    Update Time : Sat Nov 16 13:36:36 2013
          State : clean, reshaping
 Active Devices : 9
Working Devices : 9
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

 Reshape Status : 55% complete
  Delta Devices : 4, (5->9)

           Name : psoa:0
           UUID : 91f3fbda:3e3d14b6:3bb9aea0:9769ab89
         Events : 6200

    Number   Major   Minor   RaidDevice State
       0       8      145        0      active sync   /dev/sdj1
       1       8       65        1      active sync   /dev/sde1
       2       8      113        2      active sync   /dev/sdh1
       3       8      129        3      active sync   /dev/sdi1
       4       8       49        4      active sync   /dev/sdd1
       8       8       97        5      active sync   /dev/sdg1
       7       8       81        6      active sync   /dev/sdf1
       6       8       33        7      active sync   /dev/sdc1
       5       8       17        8      active sync   /dev/sdb1

cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid6 sdj1[0] sdb1[5] sdc1[6] sdf1[7] sdg1[8] sdd1[4] sdi1[3] sdh1[2] sde1[1]
      8790405888 blocks super 1.2 level 6, 64k chunk, algorithm 2 [9/9] [UUUUUUUUU]
      [===========>.........]  reshape = 55.8% (1635690368/2930135296) finish=229.0min speed=94192K/sec
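If you want extra confirmation that the kernel actually picked the reshape back up (beyond the mdstat progress bar above), the kernel log is another place to look. A quick sketch; the exact message wording varies by kernel version:

# dmesg | grep -i reshape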

So apparently, it knew where it left off (or just rescanned to see how much had been completed) and resumed from there (I remember it being at around 50% before the power loss).
This is actually really awesome.
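As I understand it, it doesn't actually have to rescan anything: with 1.x superblocks, md periodically checkpoints the reshape position into the member superblocks, so a fresh assemble can pick up from the last checkpoint. You can peek at what's recorded on a member with --examine (a sketch against one of my members; look for the reshape-position and delta-devices lines in the output):

# mdadm --examine /dev/sdb1 | grep -iE 'reshape|delta'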
