ONTAP Hardware

Flash Pool failed! NetApp FAS2240 - 7-mode

r0b0tman
10,100 Views

Hi NetApp Community,

I have a NetApp FAS2240 running Data ONTAP 8.2 in 7-Mode. We seem to have hit the following bug, which affects SSDs that have been powered on for 70,000 hours (about 8 years). The SSDs that were configured as the Flash Pool cache have failed because their firmware was not updated.

https://mysupport.netapp.com/site/bugs-online/product/ONTAP/BURT/1335350
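As a quick sanity check on the "70,000 hours / 8 years" figure above, here is a minimal sketch of the conversion. The function name `power_on_years` is hypothetical and only for illustration; the 70,000-hour threshold is the number quoted in the post, not something read from the drive.

```python
# Convert SSD power-on hours to years (illustrative helper, not a NetApp tool).
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365.25  # average year length, including leap years


def power_on_years(hours: float) -> float:
    """Return the number of years represented by a power-on hour count."""
    return hours / (HOURS_PER_DAY * DAYS_PER_YEAR)


# Threshold quoted in the post; comes out just under 8 years.
print(f"{power_on_years(70_000):.2f} years")
```

The same helper can be fed the power-on hours reported by a drive to estimate how close it is to the threshold.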

 

The failed SSDs have been replaced with reconditioned SSDs from a third-party supplier, but we are now getting plex errors.

Does anyone know how to resolve this issue?

 

[Screenshots attached: r0b0tman_1-1662117994703.png, r0b0tman_0-1662117974875.png]

 

1 ACCEPTED SOLUTION

SpindleNinja
10,041 Views

Flash Pool doesn't work that way. The data lives on the Flash Pool RG.

"Data inserted into the cache by using the write caching policy exists only in cache; there is no copy in HDDs. Flash Pool cache is RAID protected."

via - https://docs.netapp.com/us-en/ontap/pdfs/sidebar/Flash_Pool_caching_policies_and_SSD_partitioning.pdf

Also to note: you can't remove the Flash Pool RG from the aggr without destroying the aggr.
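To illustrate why losing the SSD RAID group takes the whole aggregate down, here is a toy model (a sketch only, not ONTAP code). It assumes an aggregate is usable only while every one of its RAID groups stays within its parity tolerance. The class, field names, and disk counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RaidGroup:
    name: str
    media: str          # "HDD" or "SSD"
    disks: int          # total disks in the group (hypothetical numbers below)
    failed: int         # disks currently failed
    parity_disks: int   # failures tolerated: 2 for RAID-DP, 1 for RAID4

    def survives(self) -> bool:
        # A RAID group is lost once failures exceed its parity protection.
        return self.failed <= self.parity_disks


def aggregate_online(raid_groups) -> bool:
    # Write-cached blocks exist only on the SSD RAID group, so the aggregate
    # needs every RAID group, HDD and SSD alike, to be intact.
    return all(rg.survives() for rg in raid_groups)


hdd_rg = RaidGroup("rg0", "HDD", disks=16, failed=0, parity_disks=2)
ssd_rg = RaidGroup("rg1", "SSD", disks=4, failed=4, parity_disks=1)

print(aggregate_online([hdd_rg, ssd_rg]))  # False: the SSD RG is gone
```

In this model, healthy HDDs do not help once the SSD RAID group exceeds its tolerance, which matches the behaviour described in the quote above.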

13 REPLIES

Ontapforrum
10,042 Views

Have you tried logging a case? You may only get limited support, considering the ONTAP version on your end-of-life filer. I don't know the answer to this issue, but just a thought: if this is a Flash Pool, can you not destroy it and rebuild it, considering it's a cache for the data aggregate?

r0b0tman
9,955 Views

Thanks for the reply. I was thinking of this as an option myself, but I cannot find the commands to do this.

SpindleNinja
10,025 Views

This issue/KB/BURT came up the other day with a fellow A-Teamer, and I went looking through the BURT. Unfortunately, it didn't look like there was a way to reset or revive the drives in the dead RAID group.

There is most likely data on the SSDs that you replaced. If you do plan to send the system out for recovery, they will most likely want those too.

r0b0tman
9,955 Views

Hi SpindleNinja,

Thanks for the replies. Most of the documentation about Flash Pool states that it is not possible to disable Flash Pool without destroying the aggregate. However, the KB article below mentions that it is possible to disable Flash Pool, although it seems this would need to be done by NetApp Support.

We may need to log a one-off support call with NetApp to resolve this.

https://kb.netapp.com/?title=Advice_and_Troubleshooting%2FData_Storage_Software%2FONTAP_OS%2FCan_Flash_Pool_be_disabled_or_removed_from_an_aggregate%2...

The only other thing that I can think of would be to reinitialise the Flash Pool, but I have seen no information on how to do this from a maintenance mode boot.

Ontapforrum
9,946 Views

Yes, log a case with NetApp. Do let us know what they suggest for your case.

SpindleNinja
9,926 Views

Yup, per the KB, L2 support is required for the procedure.

Re the reinitialize comment: are you trying to just get an aggr online to use at this point, or still trying to recover it? The "disabling" won't help with the recovery part, unfortunately.

If you just want a new aggr to work off of, you can try to delete the offline/dead one and start over after zeroing the drives.

r0b0tman
9,925 Views

Hi SpindleNinja,

We are trying to bring the aggregate back online as we need to recover data from it.

Is there any way to initialize the Flash Pool RAID group in maintenance mode?

SpindleNinja
9,922 Views

Not that I'm aware of.

Typically, though, any initialize zeroes out the drives, and to attempt any recovery you'll need to keep the original drives intact as they are.

What's the aggr's status? I'm assuming it's showing offline.

If it's showing as online, restricted, or degraded, you can try wafliron. (This is typically done under the recommendation of support, and I think this system and ONTAP version are currently out of support in general.)

https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Software/ONTAP_OS/Overview_of_wafliron

Note: I've never heard of it being used in the context of fixing or recovering a hybrid aggr, though.

There might be some companies out there that specialize in data recovery of this sort. Let me see if I can get a name.

SpindleNinja
9,904 Views

Give these folks a look - https://www.ontrack.com/en-gb

r0b0tman
9,872 Views

Thanks for the link for recovery.

The aggregate is currently offline as one of the plexes has an error.

DingLongFei
6,149 Views

Hi, have you resolved this problem?

Thanks a lot.

AlexDawson
6,052 Views

The only option is to engage a third-party data recovery company such as Kroll OnTrack. If the Flash Pool SSDs fail, the whole aggregate fails.
