ONTAP Discussions

Fault LED Disk Shelf DS4243 / vpd 16 and vpd 17 mismatch

UTRAUTMANN

Hello,

On my 7-Mode lab system I got this message: [BGNA108061:monitor.shelf.warning:warning]: Fault reported on disk storage shelf attached to channel 0a. Check fans, power supplies, disks, and temperature sensors

The fault LED is also lit on shelf ID 75.


Following this KB article (https://kb.netapp.com/Advice_and_Troubleshooting%2FData_Storage_Systems%2FFAS_Systems%2FDisk_shelf_reports_the_critical_warning__%22non-critical_statu...), my results are:

BGNA108061*> sasadmin expander_cli 0a.75 "vpd midplane_compare"
Mismatch detected
VPD 16 - 0180: 01 01 02 01 01 01 01 85 02 01 01 01 01 01 02 01
VPD 17 - 0180: 01 01 02 01 01 01 01 05 02 01 01 01 01 01 02 01

midplane_compare test completed. There were mismatches
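To pin down exactly which byte differs, the two rows in that output can be compared byte by byte. Below is a minimal Python sketch that works only on the pasted text. It assumes that the `0180` label is a hexadecimal offset for a 16-byte row, so the 8th byte sits at offset 0x187. `parse_rows` and `diff_pages` are hypothetical helpers written for illustration. They are not part of `sasadmin` or any NetApp tool, and they don't tell you which page is correct.

```python
import re

# Output pasted from: sasadmin expander_cli 0a.75 "vpd midplane_compare"
SAMPLE = """\
Mismatch detected
VPD 16 - 0180: 01 01 02 01 01 01 01 85 02 01 01 01 01 01 02 01
VPD 17 - 0180: 01 01 02 01 01 01 01 05 02 01 01 01 01 01 02 01
"""

# Matches lines of the form "VPD <page> - <row offset>: <hex bytes...>"
ROW_RE = re.compile(r"^VPD\s+(\d+)\s+-\s+([0-9A-Fa-f]+):\s+((?:[0-9A-Fa-f]{2}\s*)+)$")


def parse_rows(text):
    """Return {(page, row_offset): [byte, ...]} for every VPD row in the output."""
    rows = {}
    for line in text.splitlines():
        m = ROW_RE.match(line.strip())
        if m:
            page, row, data = m.groups()
            # Assumption: the row label is a hex offset.
            rows[(int(page), int(row, 16))] = [int(b, 16) for b in data.split()]
    return rows


def diff_pages(rows, page_a=16, page_b=17):
    """List (offset, byte_a, byte_b) for every byte that differs between two VPD pages."""
    diffs = []
    for (page, row), data_a in sorted(rows.items()):
        if page != page_a:
            continue
        data_b = rows.get((page_b, row))
        if data_b is None:
            continue  # the other page has no matching row to compare against
        for i, (a, b) in enumerate(zip(data_a, data_b)):
            if a != b:
                diffs.append((row + i, a, b))
    return diffs


if __name__ == "__main__":
    for offset, a, b in diff_pages(parse_rows(SAMPLE)):
        print(f"offset 0x{offset:04X}: VPD 16 = 0x{a:02X}, VPD 17 = 0x{b:02X}, XOR = 0x{a ^ b:02X}")
```

For the output above, this prints one line: `offset 0x0187: VPD 16 = 0x85, VPD 17 = 0x05, XOR = 0x80`. The two pages differ in a single bit. This thread doesn't document what that byte means.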

 

The KB article refers you to NetApp Support, but this is a lab system and it is out of support.

Can anyone advise me how to fix the VPD mismatch?


Thanks in advance for your help.

Bye, Udo

1 ACCEPTED SOLUTION

AlexDawson

Hi there!

Unfortunately, we can't assist with this because the system is out of support. The support process involves collecting logs, running the case number and logs through an internal tool, or consulting engineering directly. None of those options are available for a system that isn't under support.

Powering off the system and shelves entirely, waiting 5 minutes, and powering everything back on may clear the error. Otherwise, you could power down and move the disks and IOMs to another shelf, since the fault is in the backplane, not in the IOMs or drives.

As @GidonMarcus says, the VPD data can be copied from a "good" shelf. However, deciding what counts as "good", and which shelf is a viable source to copy from, isn't straightforward, even for us. I've done some crazy things, but I wouldn't do this without support guidance. I can't stress this enough to anyone else who finds this post: contact support for assistance. Don't try to fix it with these instructions on the assumption that you can contact support if it doesn't work.


3 REPLIES

maffo

Hi Udo,

A VPD mismatch is an annoyance, but the shelf will keep working. Otherwise, the fix would be to replace the chassis.

GidonMarcus

Hi,

THIS SHOULD NOT BE APPLIED OR TESTED ON PRODUCTION SYSTEMS. CONTACT NETAPP SUPPORT SO THEY CAN GIVE YOU ACCURATE AND SAFE GUIDELINES. THIS CAN TECHNICALLY BRICK YOUR HARDWARE, SO YOU'RE BETTER OFF JUST LEAVING THE WARNING ALONE (the warning usually doesn't cause any actual harm).


Now that the disclaimer is out of the way:

If you have a few other working shelves of the same model and firmware version, you can check them to see which of the two VPD lines is correct. Use sasadmin expander_cli <HBA port.shelf ID> 'vpd 16' (and 'vpd 17'). Copy the output to Notepad, or use grep if you are running from Linux, and look at row 0180 (where your mismatch is).

If all of them show the same value, it should be OK to copy the correct VPD on your shelf over the corrupted one on the same shelf.
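If you've saved the `vpd 16` or `vpd 17` output from several shelves, the "do they all show the same value" check can be scripted instead of eyeballed. Below is a minimal Python sketch. It assumes that each dump prints rows as `0180: xx xx ...`, which is a guess, because the raw dump format isn't shown in this thread. `extract_row` and `consensus` are hypothetical names. The sketch only reports whether the shelves agree. It doesn't decide whether copying is safe.

```python
import re

# Assumed dump layout: "<4-hex-digit row offset>: <hex bytes separated by spaces>"
ROW_RE = re.compile(r"^\s*([0-9A-Fa-f]{4}):\s+((?:[0-9A-Fa-f]{2}\s*)+)$")


def extract_row(dump, row="0180"):
    """Return the hex bytes on the given row of one VPD dump, or None if absent."""
    for line in dump.splitlines():
        m = ROW_RE.match(line)
        if m and m.group(1).upper() == row.upper():
            return [b.upper() for b in m.group(2).split()]
    return None


def consensus(dumps, row="0180"):
    """Return the row's bytes if every shelf reports identical values, else None."""
    values = [extract_row(text, row) for text in dumps.values()]
    if not values or any(v is None for v in values):
        return None  # a shelf is missing the row, so there is nothing to compare
    distinct = {tuple(v) for v in values}
    return list(distinct.pop()) if len(distinct) == 1 else None


if __name__ == "__main__":
    # Illustrative placeholder dumps, NOT real VPD contents.
    dumps = {
        "shelf A": "0180: 00 11 22 33 44 55 66 77 88 99 AA BB CC DD EE FF\n",
        "shelf B": "0180: 00 11 22 33 44 55 66 77 88 99 aa bb cc dd ee ff\n",
    }
    print(consensus(dumps))
```

Hex case is normalized before comparing, so `aa` and `AA` count as the same byte. A `None` result means that at least one shelf disagrees or lacks the row.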

I don't have the expander_cli commands for copying at hand. I'd assume that typing help or ?, or leaving the command blank, will list the available commands.

Again, NetApp Support could give better advice and might not do it this way. They have a description of each line in the VPD, the expected values, and guidelines on what can be copied (most of the corrupted lines) and what can't (lines containing serial numbers, incorrect shelf IDs, etc.).


Good luck.

Gidi Marcus (Linkedin) - Storage and Microsoft technologies consultant - Hydro IT LTD - UK
