Performance Diagnosis in Performance Advisor didn't show failed disk
2010-06-07
07:50 AM
I tested the "Diagnosis" option of Performance Advisor and found that a manually failed disk did not appear in the diagnosis.
I manually failed a disk on the filer to degrade its performance, then started copying data to the volume that the disk was part of. Latency increased considerably. I then ran a PA diagnosis on the filer for the period during which the disk was failed and the data was being copied. Why didn't the diagnosis show a failed or reconstructing disk?
stx601na08> disk fail 0b.29
*** You are about to prefail the following file system disk, ***
*** which will eventually result in it being failed ***
Disk /test_aggr1/plex0/rg0/0b.29
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
data 0b.29 0b 1 13 FC:B - ATA 7200 423111/866531584 423889/868126304
***
Really prefail disk 0b.29? y
disk fail: The following disk was prefailed: 0b.29
Disk 0b.29 has been prefailed. Its contents will be copied to a
replacement disk, and the prefailed disk will be failed out.
stx601na08> sysconfig -r
Aggregate test_aggr1 (online, raid4) (block checksums)
Plex /test_aggr1/plex0 (online, normal, active)
RAID group /test_aggr1/plex0/rg0 (normal)
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
parity 0a.29 0a 1 13 FC:A - ATA 7200 423111/866531584 423889/868126304
data 0b.29 0b 1 13 FC:B - ATA 7200 423111/866531584 423889/868126304 (prefail, copy in progress)
-> copy 0c.112 0c 7 0 FC:B - ATA 7200 423111/866531584 635858/1302238304 (copy 0% completed)
1 ACCEPTED SOLUTION
Hi Muhammad,
I suspect that the way you failed the disk does not trigger any events.
With "disk fail <disk>" you do not fail the disk in a way that causes a reconstruct.
As you can see from the command output, the disk is "pre-failed".
filer> disk fail 0a.18
*** You are about to prefail the following file system disk, ***
*** which will eventually result in it being failed ***
Disk /aggr0/plex0/rg0/0a.18
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
dparity 0a.18 0a 1 2 FC:A - FCAL 10000 136000/278528000 137422/281442144
***
Really prefail disk 0a.18?
In this state, ONTAP just tries to copy all readable data from this disk to a spare disk.
Also your output from sysconfig -r shows a copy operation, not a reconstruct.
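To make the copy-versus-reconstruct distinction easier to spot, here is a minimal Python sketch that scans `sysconfig -r` text for the two status annotations quoted in this thread: "(prefail, copy in progress)" and "(copy N% completed)". The function name and the state labels are hypothetical, and the field positions are assumed from the sample output above, so treat it as an illustration rather than a complete parser.

```python
import re

# Status annotations exactly as they appear in the sysconfig -r output quoted above.
PREFAIL_RE = re.compile(r"\(prefail, copy in progress\)")
COPY_RE = re.compile(r"\(copy (\d+)% completed\)")


def find_prefail_copies(sysconfig_output):
    """Return (device, state) tuples for disks taking part in a prefail copy.

    Assumed layout (taken from the sample output in this thread):
      "<raid-role> <device> ... (prefail, copy in progress)"
      "-> copy <device> ... (copy N% completed)"
    """
    results = []
    for line in sysconfig_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        if PREFAIL_RE.search(line):
            # Second field is the device name of the prefailed disk.
            results.append((fields[1], "prefail-copy-source"))
            continue
        match = COPY_RE.search(line)
        if match and fields[0] == "->":
            # Third field is the spare disk receiving the copy.
            results.append((fields[2], "copy-target %s%%" % match.group(1)))
    return results


if __name__ == "__main__":
    sample = (
        "parity 0a.29 0a 1 13 FC:A - ATA 7200 423111/866531584 423889/868126304\n"
        "data 0b.29 0b 1 13 FC:B - ATA 7200 423111/866531584 423889/868126304 (prefail, copy in progress)\n"
        "-> copy 0c.112 0c 7 0 FC:B - ATA 7200 423111/866531584 635858/1302238304 (copy 0% completed)\n"
    )
    for device, state in find_prefail_copies(sample):
        print(device, state)
```

If this returns any entries, the RAID group is running a disk copy and not a reconstruct, which matches what the output in the original post shows.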
If you really want to force a reconstruct of a RAID group, use the command "disk fail -i <disk>".
This will fail the disk immediately. No data copy is involved. The disk is completely reconstructed
onto a new spare disk.
Compare the following output to the one above carefully.
filer> disk fail -i 0a.18
*** You are about to fail the following file system disk ***
Disk /aggr0/plex0/rg0/0a.18
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
dparity 0a.18 0a 1 2 FC:A - FCAL 10000 136000/278528000 137422/281442144
***
Really fail disk 0a.18?
Now PerfAdvisor should show the reconstruct event.
regards, Niels
3 REPLIES 3
Please note that 'Performance Diagnosis' is performed against the last configuration available to DFM.
The rule 'Disk Reconstruction in Progress' depends on the configuration of the disks in the RAID group.
The RAID configuration is updated when diskmon runs (every 4 hours by default). Hence you would not have received the appropriate error message, because the diagnosis was not performed against the latest configuration.
It is recommended that the DFM server be set as the SNMP trap receiver. The storage system generates a 'disk:failed' trap which, when received by the DFM server, invokes diskmon. diskmon then updates the (RAID) configuration.
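The timing argument can be sketched in a few lines of Python. This only models the reasoning above (diagnosis uses the configuration from the last diskmon run, and diskmon runs every 4 hours by default); the function names and the example timestamps are hypothetical and nothing here queries DFM.

```python
from datetime import datetime, timedelta

# Default disk monitor interval mentioned in this thread.
DEFAULT_DISKMON_INTERVAL = timedelta(hours=4)


def diagnosis_sees_failure(last_diskmon_run, disk_failed_at):
    """True if the cached RAID configuration was refreshed at or after the failure."""
    return last_diskmon_run >= disk_failed_at


def next_scheduled_refresh(last_diskmon_run, interval=DEFAULT_DISKMON_INTERVAL):
    """Time of the next scheduled diskmon run, assuming no trap triggers it earlier."""
    return last_diskmon_run + interval


if __name__ == "__main__":
    # Hypothetical times for illustration only.
    last_run = datetime(2010, 6, 7, 6, 0)
    failed_at = datetime(2010, 6, 7, 7, 30)
    print("visible to diagnosis:", diagnosis_sees_failure(last_run, failed_at))
    print("next scheduled refresh:", next_scheduled_refresh(last_run))
```

With these example times the configuration predates the failure, so a diagnosis run before the next refresh would not see the disk state change unless a trap or a manual run updates the configuration first.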
The disk monitor running as part of Operations Manager generates the
disk failed event and the disk monitoring interval is 4 hours by default.
It looks like the disk monitor did not run in your case, so
Operations Manager did not generate the disk failed event, and that's why
PA diagnosis did not show this event.
You can forcefully run the disk monitor (and all other monitors) on your
filer with the following command:
dfm host discover <host-name-or-id>
and then see if the disk failed event gets generated by Operations
Manager and shows up in PA too.
Regards
Harish
