When a heavy load with a 4 KB access size (Queue=128 / Random=100% / Write=100%) is applied, two drives in a RAID 6 group fail.
This happens every time, and the failed drives cannot be reused.
Firmware: 08.62.00.02.001
*8.53.x does not have this failure.
Does anyone know about this issue?
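For anyone trying to reproduce the workload described above, here is a rough sketch of how those parameters might map onto a load generator. The thread does not say which tool was used, so fio, the libaio engine, the job name, the runtime, and the target path are all assumptions for illustration. The script only builds and prints the command line. Running that command against a real device overwrites its data.

```python
import shlex
from typing import List


def build_fio_args(target: str, runtime_s: int = 600) -> List[str]:
    """Build an fio command line approximating the workload in the post.

    Assumption: fio is the load generator; the original poster's tool is unknown.
    """
    return [
        "fio",
        "--name=raid6-4k-randwrite",  # hypothetical job name
        f"--filename={target}",       # hypothetical target; data on it is overwritten
        "--rw=randwrite",             # Random=100%, Write=100%
        "--bs=4k",                    # 4 KB access size
        "--iodepth=128",              # Queue=128
        "--ioengine=libaio",          # assumed async engine so the queue depth takes effect
        "--direct=1",                 # bypass the host page cache
        "--time_based",
        f"--runtime={runtime_s}",     # assumed duration in seconds
    ]


if __name__ == "__main__":
    # Print the command instead of running it.
    print(shlex.join(build_fio_args("/dev/EXAMPLE_LUN")))
```

The function returns an argument list rather than a shell string so it can be passed to `subprocess.run` without quoting problems, if you do decide to run it on a test volume.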
Hello Soichi,
Were you able to open a support case for this?
We will need to look at a full support bundle and trace buffers collection after the failure to understand why the drives are failing during your IO test.
Can you clarify your question? Or are you reporting a bug that needs to be looked at?
It sounds like you should open a technical case with NetApp so that it can be investigated.
Thanks! Yes.
I have opened a support request now.
I posted here to see whether anyone already knew about this problem.
Thanks!
Hi NetApp_RZ,
Thanks for your reply.
I have now opened a support case via IBM Support (IBM MCC).
This product is sold by Lenovo, so the case must go through IBM Support.
I will post whatever details I am allowed to disclose at a later date.
Thanks!
Soichi,
Thanks for the update.
I was looking in our system for a possible case about this issue, as I am also curious about what the drives are doing to hit the fail criteria during that IO test.
It could be timeouts or aborts, but it could be other things too.
Either way, IBM/Lenovo does have escalation paths into NetApp should a deeper look at the issue be needed, so I will keep an eye on this thread.
Thanks 🙂
I got an answer from support.
The cause of this failure appears to be the SSD firmware.
After updating the SSD firmware to LE03, the failure no longer occurred.
The failed drives were running firmware version LE01.
Details of the fix list are generally not disclosed by drive vendors.
If you are using the PX05SVB080 or PX05SVB160, I recommend updating the firmware right away.
Thanks!
Thank you so much for the update Soichi,
Very glad to hear the issue is resolved now.
Now that I know which drive model it was, I checked whether NetApp also deploys the same drive, and we do.
For both of those drives we have also moved from MS01 to MS03, and the public information we have for MS03 also mentions slow performance and drive failure issues.
My understanding is that Lenovo's firmware versions are the same as ours, with just the first two letters changed from either MS or NE to LE.
https://mysupport.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=1249633
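If you need to check a number of drives, the prefix relationship described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the version format (two letters followed by two digits) is inferred from the examples in this thread (LE01, LE03, MS01, MS03), the function names are hypothetical, and treating every revision below 03 as affected is an assumption, since the thread only confirms that 01 failed and 03 did not.

```python
import re
from typing import Tuple

KNOWN_PREFIXES = ("MS", "NE", "LE")  # prefixes mentioned in the thread
FIXED_REVISION = 3                   # assumption: revision 03 and later carry the fix


def parse_firmware(version: str) -> Tuple[str, int]:
    """Split a version such as 'LE01' into ('LE', 1)."""
    match = re.fullmatch(r"([A-Z]{2})(\d{2})", version.strip().upper())
    if match is None or match.group(1) not in KNOWN_PREFIXES:
        raise ValueError(f"unrecognized firmware version: {version!r}")
    return match.group(1), int(match.group(2))


def to_lenovo(version: str) -> str:
    """Map an MS/NE/LE version to its Lenovo-style LE equivalent."""
    _, revision = parse_firmware(version)
    return f"LE{revision:02d}"


def needs_update(version: str) -> bool:
    """True if the revision is older than the one reported as fixed."""
    _, revision = parse_firmware(version)
    return revision < FIXED_REVISION


if __name__ == "__main__":
    for fw in ("LE01", "MS01", "MS03"):
        print(fw, "->", to_lenovo(fw), "update needed:", needs_update(fw))
```

Always confirm against the vendor's own firmware notes before acting on a result from a script like this.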