Active IQ Unified Manager Discussions
I tested the "Diagnosis" option of Performance Advisor and found that a manually failed disk did not appear in the diagnosis.
I manually failed a disk on the filer to degrade its performance, then started copying data to the volume that the disk belonged to. Latency increased considerably. I then ran a PA diagnosis on the filer for the period during which the disk was failed and data was being copied. The diagnosis showed neither a failed disk nor a reconstructing disk. Why?
stx601na08> disk fail 0b.29
*** You are about to prefail the following file system disk, ***
*** which will eventually result in it being failed ***
Disk /test_aggr1/plex0/rg0/0b.29
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
data 0b.29 0b 1 13 FC:B - ATA 7200 423111/866531584 423889/868126304
***
Really prefail disk 0b.29? y
disk fail: The following disk was prefailed: 0b.29
Disk 0b.29 has been prefailed. Its contents will be copied to a
replacement disk, and the prefailed disk will be failed out.
stx601na08> sysconfig -r
Aggregate test_aggr1 (online, raid4) (block checksums)
Plex /test_aggr1/plex0 (online, normal, active)
RAID group /test_aggr1/plex0/rg0 (normal)
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
parity 0a.29 0a 1 13 FC:A - ATA 7200 423111/866531584 423889/868126304
data 0b.29 0b 1 13 FC:B - ATA 7200 423111/866531584 423889/868126304 (prefail, copy in progress)
-> copy 0c.112 0c 7 0 FC:B - ATA 7200 423111/866531584 635858/1302238304 (copy 0% completed)
Hi Muhammad,
I suspect that the way you failed the disk does not trigger any events.
With "disk fail <disk>" you do not fail the disk in a way that causes a reconstruct.
As you can see from the command output, the disk is "pre-failed".
filer> disk fail 0a.18
*** You are about to prefail the following file system disk, ***
*** which will eventually result in it being failed ***
Disk /aggr0/plex0/rg0/0a.18
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
dparity 0a.18 0a 1 2 FC:A - FCAL 10000 136000/278528000 137422/281442144
***
Really prefail disk 0a.18?
In this state, ONTAP just tries to copy all readable data from this disk to a spare disk.
Also, your sysconfig -r output shows a copy operation, not a reconstruct.
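To illustrate the difference, here is a minimal Python sketch that scans `sysconfig -r` output and labels each disk as copying, reconstructing or normal. The "copy" annotations are taken from the output posted in this thread. The assumption that a reconstructing disk carries the word "reconstruction" in its status annotation (for example "(reconstruction 5% completed)") is not from this thread, and the function names are hypothetical.

```python
import re

# Matches a trailing parenthesised status annotation such as
# "(prefail, copy in progress)" or "(copy 0% completed)".
STATUS_RE = re.compile(r"\(([^()]*)\)\s*$")

# RAID roles seen in this thread, plus "->" for the copy-target line.
DISK_LINE_PREFIXES = ("parity", "dparity", "data", "->")


def classify_disk_line(line: str) -> str:
    """Classify one `sysconfig -r` disk line as 'copy', 'reconstruct' or 'normal'.

    The 'copy' markers come from the output in this thread; the
    'reconstruct' marker is an ASSUMPTION about ONTAP's wording.
    """
    match = STATUS_RE.search(line.rstrip())
    if match is None:
        return "normal"
    status = match.group(1).lower()
    if "reconstruct" in status:
        return "reconstruct"
    if "copy" in status:
        return "copy"
    return "normal"


def classify_output(text: str) -> dict:
    """Map each disk device name in the output to its classification."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] in DISK_LINE_PREFIXES:
            # For "-> copy 0c.112 ..." the device is the third field;
            # otherwise it is the second field ("data 0b.29 ...").
            device = fields[2] if fields[0] == "->" else fields[1]
            result[device] = classify_disk_line(line)
    return result
```

Run against the `sysconfig -r` output from the original post, both 0b.29 and its copy target 0c.112 come out as "copy" and nothing comes out as "reconstruct", which matches the explanation above.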
If you really want to force a reconstruct of a RAID group, use the command "disk fail -i <disk>".
This fails the disk immediately. No data copy is involved; the disk's contents are completely reconstructed
onto a new spare disk.
Compare the following output to the one above carefully.
filer> disk fail -i 0a.18
*** You are about to fail the following file system disk ***
Disk /aggr0/plex0/rg0/0a.18
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
dparity 0a.18 0a 1 2 FC:A - FCAL 10000 136000/278528000 137422/281442144
***
Really fail disk 0a.18?
Now PerfAdvisor should show the reconstruct event.
regards, Niels
Please note that 'Performance Diagnosis' is performed and verified against the last configuration available to DFM.
The rule 'Disk Reconstruction in Progress' uses the configuration of the disks in the RAID group.
The RAID configuration is updated when diskmon runs (every 4 hours by default). That is why you did not get the expected message: the diagnosis was not performed against the latest configuration.
It is recommended that the DFM server be set as the SNMP trap receiver. The storage system generates a 'disk:failed' trap; when the DFM server receives it, it invokes diskmon, which updates the (RAID) configuration.
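A minimal Python sketch of this timing argument, assuming scheduled diskmon runs happen at a fixed 4-hour interval counted from a hypothetical first run, and assuming that a received 'disk:failed' trap refreshes the configuration at the moment it arrives. The function names and the example times are hypothetical; the sketch only models which cached configuration a diagnosis would see.

```python
from datetime import datetime, timedelta
from typing import Optional

DISKMON_INTERVAL = timedelta(hours=4)  # default interval quoted in the thread


def last_refresh_before(t: datetime, first_run: datetime,
                        interval: timedelta = DISKMON_INTERVAL,
                        trap_time: Optional[datetime] = None) -> datetime:
    """Return the most recent RAID-config refresh at or before time t.

    Scheduled diskmon runs are modelled at first_run + k * interval (k >= 0).
    ASSUMPTION: a 'disk:failed' trap received at trap_time triggers an
    immediate diskmon run at that moment.
    """
    if t < first_run:
        raise ValueError("t precedes the first diskmon run")
    k = (t - first_run) // interval  # number of whole intervals elapsed
    refresh = first_run + k * interval
    if trap_time is not None and refresh < trap_time <= t:
        refresh = trap_time
    return refresh


def diagnosis_sees_failure(failure_time: datetime, diagnosis_time: datetime,
                           first_run: datetime,
                           trap_time: Optional[datetime] = None) -> bool:
    """True if the cached config used by the diagnosis was refreshed after the failure."""
    refresh = last_refresh_before(diagnosis_time, first_run, trap_time=trap_time)
    return refresh >= failure_time
```

For example, with diskmon first running at 00:00, a disk failed at 01:00 and a diagnosis at 02:00, the cached configuration is still the one from 00:00, so the sketch returns False. If the trap for the failure was received at 01:00, it returns True.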
The disk monitor running as part of Operations Manager generates the
disk failed event, and the disk monitoring interval is 4 hours by default.
It looks like the disk monitor did not run in your case, so
Operations Manager did not generate the disk failed event, and that's why
PA diagnosis did not show it.
You can force the disk monitor (and all other monitors) to run on your
filer with the following command:
dfm host discover <host-name-or-id>
Then check whether the disk failed event is generated by Operations
Manager and also shows up in PA.
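If you want to script this step, here is a minimal Python sketch that wraps the `dfm host discover` command quoted above. It assumes `dfm` is on the PATH of the machine running the script (the DFM server). The function names and the injectable `runner` parameter, which exists only so the command can be exercised without a DFM server, are hypothetical; no options beyond those in the command above are used.

```python
import subprocess
from typing import List


def discover_command(host: str) -> List[str]:
    """Build the argv that forces all monitors to run for one storage system."""
    if not host or host.strip() != host:
        raise ValueError("host must be a non-empty name or ID without surrounding spaces")
    return ["dfm", "host", "discover", host]


def force_discover(host: str, runner=subprocess.run) -> int:
    """Run `dfm host discover <host>` and return the process exit code.

    `runner` defaults to subprocess.run; pass a stand-in for testing.
    """
    completed = runner(discover_command(host), check=False)
    return completed.returncode
```

Passing the arguments as a list (rather than a shell string) avoids quoting problems with host names; the exit code is returned as-is, since its meaning is whatever `dfm` defines.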
Regards
Harish