
7-mode "disk replace start" command never starts

DRhodes

I am trying to replace a disk that is still active in an aggregate, but the physical latch that retains the disk has been damaged so the only thing keeping it in is friction.  We have a new disk to swap in, but I do not want to pull an active disk.  

 

I have tried running |disk replace start 1c.27 1c.29| and I get a prompt "really replace disk 1c.27 with 1c.29?"  No matter what I respond with, nothing happens.  If I run a |disk replace stop 1c.27| it tells me the disk isn't being replaced.

 

1c.27 and 1c.29 are both in Pool 0 on the same controller.  Both are 10k rpm disks of the same size, and they are even on the same shelf.  I haven't gotten any errors about a disk mismatch, but I tried adding -m to the command anyway and still get nothing.

 

What am I doing wrong?  Am I responding to the "really replace?" prompt incorrectly?  I have tried Y, Yes, True, and just hitting Enter.  The documentation I have found for the command doesn't say anything about the confirmation query.



Ontapforrum

Hi,

 

I hope I understood your query. The 'disk replace' command logically copies data from one disk to another (for example, to swap mismatched disks out of a RAID group), i.e. from the active RAID group disk to a spare disk that is already available to the system. Once the data is copied, the old disk is marked as a spare. The disk replace command is not a 'soft signal' to eject a drive (I am assuming this is what you meant?). Sorry if that's not what you meant.

 

That was my first assumption. Now the second: if you believe that disk '1c.27' has failed, then you need not do anything; the data will be reconstructed onto a spare disk (provided a spare was available). All you have to do is replace the faulty disk with the new disk, and once it is assigned to the node (automatically or manually), it will become a new spare.

 

If you run this command, what status do you get?
7-mode>aggr status -r <aggr_where_failed-disk_belong>

 

What each status means:
degraded = The aggregate contains at least one RAID group with single disk failure that is not being reconstructed.
copying = The aggregate is currently the target aggregate of an active copy operation.
normal = The aggregate and all of its RAID groups are functional.
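
For anyone who wants to check these states from a saved copy of the output rather than by eye, here is a minimal Python sketch. It assumes the "RAID group <path> (<state>, ...)" header format seen in the aggr status -r output posted later in this thread; the function name raid_group_states is made up for illustration, and the thread does not confirm which state word a running disk replace would show on that line.

```python
import re

# Matches header lines such as:
#   RAID group /aggr3_SAS/plex0/rg0 (normal, block checksums)
# The format is assumed from the output pasted in this thread.
RAID_GROUP_RE = re.compile(r"^RAID group (\S+) \(([^)]*)\)")


def raid_group_states(output):
    """Map each RAID group path to the first state word in its header line."""
    states = {}
    for line in output.splitlines():
        match = RAID_GROUP_RE.match(line.strip())
        if match:
            path, flags = match.groups()
            # The first comma-separated word is the state (e.g. "normal").
            states[path] = flags.split(",")[0].strip()
    return states


sample = """\
Aggregate aggr3_SAS (online, raid_dp) (block checksums)
Plex /aggr3_SAS/plex0 (online, normal, active, pool0)
RAID group /aggr3_SAS/plex0/rg0 (normal, block checksums)
"""

print(raid_group_states(sample))
```

Anything other than "normal" in the result would be a reason to look more closely at that RAID group before pulling a disk.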

 

If the disk is stuck for mechanical reasons and cannot be removed, I would suggest raising a ticket with NetApp (though that depends on whether your hardware is still under support warranty).

 

Thanks!

DRhodes

I don't have a "Failed" disk.  I have a disk that was physically damaged when another device was being removed from the same rack and impacted it.  The retention lock (the lever you "click" into place) broke, so the drive is currently only being held in by friction and gravity.  Eventually this drive may work its way loose, so we are replacing it before we have a failure.

 

I am trying to run the disk replace command to remove this drive from the aggregate before physically removing it and replacing it with a new, undamaged drive.

 

As I said before, I am running the command and nothing is happening after I try to answer the confirmation dialogue. 

Ontapforrum

Interesting. I have very rarely come across a situation where you have to deal with a physically/cosmetically broken but internally OK disk. I get your point.

 

Could you give this output:

filer>aggr status -s

filer>aggr status -f

 

Plus, could you show us the exact command you are using for disk replace? An actual screenshot from the filer would do well.

filer> disk replace start -f <disk_name> <spare_disk_name>

 

Thanks!

DRhodes

I figured it must be rare because I can find very few references to the command actually being used, and no mention of what the proper response to the "really replace disk ...." prompt is.  I copied and pasted the output below.  1c.27 is the disk that is being replaced.  1c.29 is the spare disk I am replacing it with.  Same size, rpm, and pool, and it's even on the same shelf.  No error messages saying it cannot be done.  

 

Pioneer*> disk replace start 1c.27 1c.29
*** You are about to copy and replace the following file system disk ***
Disk /aggr3_SAS/plex0/rg0/1c.27

RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
data 1c.27 1c 1 11 FC:B 0 FCAL 10000 272000/557056000 274845/562884296
***
Really replace disk 1c.27 with 1c.29? Y

<<15 minutes later>>

Pioneer*>disk replace stop 1c.27
disk replace: Disk is not being replaced.

 

Pioneer*> aggr status -s

Pool1 spare disks (empty)

Pool0 spare disks

RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
Spare disks for block checksum
spare 1c.29 1c 1 13 FC:B 0 FCAL 10000 272000/557056000 280104/573653840
spare 2c.19 2c 1 3 FC:A 0 FCAL 10000 272000/557056000 280104/573653840
spare 2d.77 2d 4 13 FC:A 0 ATA 7200 423111/866531584 423889/868126304
Pioneer*>

 


Pioneer*> aggr status -r
Aggregate aggr3_SAS (online, raid_dp) (block checksums)
Plex /aggr3_SAS/plex0 (online, normal, active, pool0)
RAID group /aggr3_SAS/plex0/rg0 (normal, block checksums)

RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
dparity 2c.16 2c 1 0 FC:A 0 FCAL 10000 272000/557056000 274845/562884296
parity 1c.17 1c 1 1 FC:B 0 FCAL 10000 272000/557056000 280104/573653840
data 2c.18 2c 1 2 FC:A 0 FCAL 10000 272000/557056000 274845/562884296
data 1c.28 1c 1 12 FC:B 0 FCAL 10000 272000/557056000 274845/562884296
data 1c.20 1c 1 4 FC:B 0 FCAL 10000 272000/557056000 274845/562884296
data 1c.21 1c 1 5 FC:B 0 FCAL 10000 272000/557056000 280104/573653840
data 1c.22 1c 1 6 FC:B 0 FCAL 10000 272000/557056000 280104/573653840
data 2c.24 2c 1 8 FC:A 0 FCAL 10000 272000/557056000 274845/562884296
data 2c.25 2c 1 9 FC:A 0 FCAL 10000 272000/557056000 274845/562884296
data 2c.26 2c 1 10 FC:A 0 FCAL 10000 272000/557056000 274845/562884296
data 1c.27 1c 1 11 FC:B 0 FCAL 10000 272000/557056000 274845/562884296

Aggregate agg1_ATA (online, raid_dp) (block checksums)
Plex /agg1_ATA/plex0 (online, normal, active, pool0)
RAID group /agg1_ATA/plex0/rg0 (normal, block checksums)

RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
dparity 1a.80 1a 5 0 FC:B 0 ATA 7200 423111/866531584 423889/868126304
parity 2d.16 2d 1 0 FC:A 0 ATA 7200 423111/866531584 423889/868126304
data 2d.64 2d 4 0 FC:A 0 ATA 7200 423111/866531584 423889/868126304
data 1a.81 1a 5 1 FC:B 0 ATA 7200 423111/866531584 423889/868126304
data 2d.65 2d 4 1 FC:A 0 ATA 7200 423111/866531584 423889/868126304
data 1a.82 1a 5 2 FC:B 0 ATA 7200 423111/866531584 423889/868126304
data 2d.66 2d 4 2 FC:A 0 ATA 7200 423111/866531584 423889/868126304
data 1a.83 1a 5 3 FC:B 0 ATA 7200 423111/866531584 423889/868126304
data 2d.67 2d 4 3 FC:A 0 ATA 7200 423111/866531584 423889/868126304
data 1a.84 1a 5 4 FC:B 0 ATA 7200 423111/866531584 423889/868126304
data 2d.68 2d 4 4 FC:A 0 ATA 7200 423111/866531584 423889/868126304
data 1a.85 1a 5 5 FC:B 0 ATA 7200 423111/866531584 423889/868126304
data 2d.69 2d 4 5 FC:A 0 ATA 7200 423111/866531584 423889/868126304
data 1a.86 1a 5 6 FC:B 0 ATA 7200 423111/866531584 423889/868126304

RAID group /agg1_ATA/plex0/rg1 (normal, block checksums)

RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
dparity 2d.70 2d 4 6 FC:A 0 ATA 7200 423111/866531584 423889/868126304
parity 1a.87 1a 5 7 FC:B 0 ATA 7200 423111/866531584 423889/868126304
data 2d.71 2d 4 7 FC:A 0 ATA 7200 423111/866531584 423889/868126304
data 1a.88 1a 5 8 FC:B 0 ATA 7200 423111/866531584 423889/868126304
data 2d.72 2d 4 8 FC:A 0 ATA 7200 423111/866531584 423889/868126304
data 1a.89 1a 5 9 FC:B 0 ATA 7200 423111/866531584 423889/868126304
data 2d.73 2d 4 9 FC:A 0 ATA 7200 423111/866531584 423889/868126304
data 1a.90 1a 5 10 FC:B 0 ATA 7200 423111/866531584 423889/868126304
data 2d.74 2d 4 10 FC:A 0 ATA 7200 423111/866531584 423889/868126304
data 1a.91 1a 5 11 FC:B 0 ATA 7200 423111/866531584 423889/868126304
data 2d.75 2d 4 11 FC:A 0 ATA 7200 423111/866531584 423889/868126304
data 1a.92 1a 5 12 FC:B 0 ATA 7200 423111/866531584 423889/868126304
data 2d.76 2d 4 12 FC:A 0 ATA 7200 423111/866531584 423889/868126304
data 1a.93 1a 5 13 FC:B 0 ATA 7200 423111/866531584 423889/868126304

Aggregate aggr2_ATA (online, raid_dp) (block checksums)
Plex /aggr2_ATA/plex0 (online, normal, active, pool0)
RAID group /aggr2_ATA/plex0/rg0 (normal, block checksums)

RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
dparity 2a.17 2a 1 1 FC:A 0 FCAL 15000 418000/856064000 420584/861357448
parity 1d.16 1d 1 0 FC:B 0 FCAL 15000 418000/856064000 420584/861357448
data 2a.26 2a 1 10 FC:A 0 FCAL 15000 418000/856064000 420584/861357448
data 1d.21 1d 1 5 FC:B 0 FCAL 15000 418000/856064000 420584/861357448
data 2a.29 2a 1 13 FC:A 0 FCAL 15000 418000/856064000 420584/861357448
data 2a.23 2a 1 7 FC:A 0 FCAL 15000 418000/856064000 420584/861357448
data 1d.25 1d 1 9 FC:B 0 FCAL 15000 418000/856064000 420584/861357448
data 1d.22 1d 1 6 FC:B 0 FCAL 15000 418000/856064000 420584/861357448
data 1d.20 1d 1 4 FC:B 0 FCAL 15000 418000/856064000 420584/861357448
data 2a.19 2a 1 3 FC:A 0 FCAL 15000 418000/856064000 420584/861357448
data 1d.28 1d 1 12 FC:B 0 FCAL 15000 418000/856064000 420584/861357448
data 1d.18 1d 1 2 FC:B 0 FCAL 15000 418000/856064000 420156/860480768
data 2a.24 2a 1 8 FC:A 0 FCAL 15000 418000/856064000 420584/861357448


Pool1 spare disks (empty)

Pool0 spare disks

RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
Spare disks for block checksum
spare 1c.29 1c 1 13 FC:B 0 FCAL 10000 272000/557056000 280104/573653840
spare 2c.19 2c 1 3 FC:A 0 FCAL 10000 272000/557056000 280104/573653840
spare 2d.77 2d 4 13 FC:A 0 ATA 7200 423111/866531584 423889/868126304

Broken disks

RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
bad label 2a.27 2a 1 11 FC:A 0 FCAL 15000 418000/856064000 420584/861357448  << I know about this.  It is on my list of things to fix>>

Partner disks

RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
partner 2d.42 2d 2 10 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.45 2d 2 13 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.61 2d 3 13 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.57 2d 3 9 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.59 2d 3 11 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.58 2d 3 10 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.53 2d 3 5 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.17 2d 1 1 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.52 2d 3 4 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.48 2d 3 0 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.50 2d 3 2 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.54 2d 3 6 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.56 2d 3 8 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.51 2d 3 3 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.55 2d 3 7 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.60 2d 3 12 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.49 2d 3 1 FC:A 0 ATA 7200 0/0 423889/868126304
partner 1a.107 1a 6 11 FC:B 0 ATA 7200 0/0 423889/868126304
partner 1a.98 1a 6 2 FC:B 0 ATA 7200 0/0 423889/868126304
partner 1a.108 1a 6 12 FC:B 0 ATA 7200 0/0 423889/868126304
partner 1a.102 1a 6 6 FC:B 0 ATA 7200 0/0 423889/868126304
partner 1a.104 1a 6 8 FC:B 0 ATA 7200 0/0 423889/868126304
partner 2d.100 2d 6 4 FC:A 0 ATA 7200 0/0 423889/868126304
partner 1a.101 1a 6 5 FC:B 0 ATA 7200 0/0 423889/868126304
partner 1a.106 1a 6 10 FC:B 0 ATA 7200 0/0 423889/868126304
partner 1a.96 1a 6 0 FC:B 0 ATA 7200 0/0 423889/868126304
partner 1a.99 1a 6 3 FC:B 0 ATA 7200 0/0 423889/868126304
partner 1a.103 1a 6 7 FC:B 0 ATA 7200 0/0 423889/868126304
partner 1a.97 1a 6 1 FC:B 0 ATA 7200 0/0 423889/868126304
partner 1a.109 1a 6 13 FC:B 0 ATA 7200 0/0 423889/868126304
partner 1a.105 1a 6 9 FC:B 0 ATA 7200 0/0 423889/868126304
partner 2d.26 2d 1 10 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.23 2d 1 7 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.18 2d 1 2 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.21 2d 1 5 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.22 2d 1 6 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.20 2d 1 4 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.38 2d 2 6 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.32 2d 2 0 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.37 2d 2 5 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.33 2d 2 1 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.40 2d 2 8 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.43 2d 2 11 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.39 2d 2 7 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.25 2d 1 9 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.36 2d 2 4 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.28 2d 1 12 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.27 2d 1 11 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.19 2d 1 3 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.35 2d 2 3 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.44 2d 2 12 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.34 2d 2 2 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.29 2d 1 13 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.24 2d 1 8 FC:A 0 ATA 7200 0/0 423889/868126304
partner 2d.41 2d 2 9 FC:A 0 ATA 7200 0/0 423889/868126304
partner 1c.23 1c 1 7 FC:B 1 FCAL 10000 0/0 274845/562884296
Pioneer*>
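
To double-check by script that the source disk and the chosen spare line up, the rows of the pasted output can be parsed with a short Python sketch like the one below. The matching criteria (same pool, same type, same RPM, spare at least as large) are only the ones mentioned in this thread, not ONTAP's actual spare-selection rules, and the names Disk, parse_disks and compatible are hypothetical.

```python
from typing import Dict, NamedTuple


class Disk(NamedTuple):
    role: str
    device: str
    pool: str
    dtype: str
    rpm: int
    used_mb: int


# Only single-word roles are handled; rows such as "bad label" are skipped.
ROLES = {"dparity", "parity", "data", "spare"}


def parse_disks(output: str) -> Dict[str, Disk]:
    """Parse disk rows: role device HA shelf bay chan pool type rpm used phys."""
    disks = {}
    for line in output.splitlines():
        f = line.split()
        if len(f) == 11 and f[0] in ROLES:
            used_mb = int(f[9].split("/")[0])  # "272000/557056000" -> 272000
            disks[f[1]] = Disk(f[0], f[1], f[6], f[7], int(f[8]), used_mb)
    return disks


def compatible(source: Disk, spare: Disk) -> bool:
    """Assumed criteria from this thread, not ONTAP's real matching logic."""
    return (
        spare.role == "spare"
        and spare.pool == source.pool
        and spare.dtype == source.dtype
        and spare.rpm == source.rpm
        and spare.used_mb >= source.used_mb
    )


# Rows copied from the output above.
sample = """\
data 1c.27 1c 1 11 FC:B 0 FCAL 10000 272000/557056000 274845/562884296
spare 1c.29 1c 1 13 FC:B 0 FCAL 10000 272000/557056000 280104/573653840
spare 2d.77 2d 4 13 FC:A 0 ATA 7200 423111/866531584 423889/868126304
"""

disks = parse_disks(sample)
print(compatible(disks["1c.27"], disks["1c.29"]))  # FCAL 10000 spare in pool 0
print(compatible(disks["1c.27"], disks["2d.77"]))  # ATA 7200 spare
```

On these rows the first check passes and the second does not, which agrees with 1c.29 being a reasonable target for 1c.27.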

 

Ontapforrum

When you start the disk replace process, do you see anything recorded in the 'messages' log? Or does it show 'copying' in the aggr status -r output? I have attached one such example. It's weird not to get any output. I will go through the output you have shared later. Please see the attached notepad.

DRhodes

After looking at the attachment, I went back and tried the command again.  That attachment is the first time I have seen any mention about the acknowledgement prompt or how it is supposed to be answered, and the first time I have seen the expected output listed.  

 

All I had ever gotten before was the acknowledgement request and then a return to the command prompt.  

 

I ran the command again and answered the query with a lowercase y.  This time, it gave the expected output, and after some time it now shows 1c.29 as an aggregate disk and 1c.27 as a spare that needs to be zeroed.  

 

Thank you. 
