3 FC disks failed at the same time?

oweinmann
3,501 Views

Hi, we have a FAS3020c, and this morning at around a quarter to 6 DFM reported that the partner node was dead. I ran into the office and checked the lights on the hard disks in the shelves. No amber lights! So I thought it was not a hard disk problem. I tried a cf giveback on the takeover node, and it reported two disks missing on the partner node. I checked again and all lights were green. I forced a giveback and noticed that the complete aggr0 was broken.

vol status -f showed no broken disks

I turned off the disk shelf, and then 3 FC disks were reported as broken. How likely is it that 3 disks fail at the same time? I mean, there wasn't even a notification from DFM that one of the disks had failed.
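
As a rough back-of-the-envelope check on that question, here is a minimal sketch that assumes the disks fail independently of each other. The 3% annual failure rate, the 14 disks per shelf, and the one-hour window are hypothetical values picked for illustration; none of them come from this system.

```python
from math import comb

def prob_at_least_k(n, k, p):
    """P(at least k of n independent disks fail), each with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

AFR = 0.03                  # assumed annual failure rate per disk (hypothetical)
DISKS = 14                  # assumed number of disks in the shelf (hypothetical)
HOURS_PER_YEAR = 24 * 365

# Per-disk chance of failing within one given hour, assuming a constant rate.
p_hour = AFR / HOURS_PER_YEAR

# Chance that 3 or more disks in the shelf fail within that same hour.
p_triple = prob_at_least_k(DISKS, 3, p_hour)
print(f"{p_triple:.2e}")
```

Under these assumptions the result is vanishingly small, which would suggest looking for a common cause (loop, cabling, adapter, shelf) rather than three unrelated disk failures.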

Could this be a firmware bug?

Any ideas on how to recover some of the data from the broken aggregate?

Best Regards,

Oliver


oweinmann

OK, I rebooted the partner node, reconnected all shelf cables, and did a cf giveback -f.

I unfailed all 3 failed disks and brought the aggregate online again.

So far it is running, and I can migrate all VMs to a new datastore on a different storage system. I have no clue what happened on the FAS, but I will move all data off it and not use it anymore.

oweinmann

In the messages log, these two errors are displayed before the filer switched to its partner node and declared itself dead:

Wed Feb  6 05:43:46 CET [DS-SAN-01: fci.adapter.error:warning]: Fibre Channel adapter driver encountered error "Bad response entry" on adapter 0a.

Wed Feb  6 05:43:46 CET [DS-SAN-01: fci.adapter.error:warning]: Fibre Channel adapter driver encountered error "Revisited response entry" on adapter 0a.

I can't find any good information on these two errors. Any clues?
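
For anyone who wants to check how often such entries appear and on which adapter, here is a minimal sketch that pulls fci.adapter.error lines out of a messages log. The regular expressions are only inferred from the two lines quoted above and are an assumption, not an official log format; the function name is hypothetical.

```python
import re
from collections import Counter

# Line layout inferred from the two log lines quoted above (an assumption).
LINE_RE = re.compile(
    r'^(?P<ts>\w{3} \w{3}\s+\d+ \d{2}:\d{2}:\d{2} \w+) '
    r'\[(?P<host>[^:\]]+): (?P<event>[\w.]+):(?P<sev>\w+)\]: '
    r'(?P<msg>.*)$'
)
ADAPTER_RE = re.compile(
    r'encountered error "(?P<err>[^"]+)" on adapter (?P<adapter>\w+)'
)

def fc_adapter_errors(lines):
    """Return (timestamp, adapter, error) for each fci.adapter.error line."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("event") != "fci.adapter.error":
            continue
        a = ADAPTER_RE.search(m.group("msg"))
        if a:
            hits.append((m.group("ts"), a.group("adapter"), a.group("err")))
    return hits

# The two lines from the post; in practice, iterate over a copy of the log file.
sample = [
    'Wed Feb  6 05:43:46 CET [DS-SAN-01: fci.adapter.error:warning]: '
    'Fibre Channel adapter driver encountered error "Bad response entry" '
    'on adapter 0a.',
    'Wed Feb  6 05:43:46 CET [DS-SAN-01: fci.adapter.error:warning]: '
    'Fibre Channel adapter driver encountered error "Revisited response entry" '
    'on adapter 0a.',
]

hits = fc_adapter_errors(sample)
per_adapter = Counter(adapter for _, adapter, _ in hits)
print(per_adapter)
```

If all hits land on a single adapter, that would at least be consistent with a problem on that one loop or HBA rather than with the individual disks.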

bsti

I've not run into that particular issue.  Sounds like potential bad hardware.  Have you opened a case with support on this?

oweinmann

Hi,

The system is out of warranty, so I don't think support will give me any hints on this issue.

bsti

Gah.. Sorry I don't have a better answer for you.  The only two things I can think of are:

1)  Bad hardware

2)  Driver or software bug

It really sounds like #1 to me. You may be able to verify by booting into the diagnostics on the controller and running a full diag against the FC adapters. Below is the guide for the diag tools:

https://library.netapp.com/ecm/ecm_get_file/ECMP1112531

Hope that helps.
