FAS2050 storage controller not communicating

Hello All,

We're currently experiencing issues with our FAS2050 storage controller. I could not find any documentation on the FAS2050, presumably because it is already End-of-Life. Our FAS2050 supports our VMware VDI infrastructure (Horizon 7/vCenter 6.5) by storing the profiles for our VDI customers. It runs two storage controllers: one holds the user profiles and the other stores miscellaneous files.

Over the weekend, we had to perform a power cycle of our server room. In preparation, we were in the process of shutting down our VDI environment when suddenly all power to our servers and to the FAS2050 was cut, causing a hard shutdown.

After rebooting, our VMware/VDI environment came back up without too many issues, but one of the two storage controllers, the one holding our VDI profiles, didn't come up as expected. There is an amber light on the exclamation point LED on the front of the chassis, as well as green and amber lights on the NICs in the back (on the bottom module; the top module only has green NIC lights).

We utilize NetApp OnCommand System Manager to view the storage locations for our profiles and now we can't discover or add that location back in. 

I've been scouring the interwebs for any documentation on the FAS2050 that's not about upgrading it to newer versions.

Re: FAS2050 storage controller not communicating

What do you see on the console of the controller in question? What does “cf status” on the good controller say?

 

The only hardware-specific documentation is related to parts replacement; everything else is the same, so just use the Data ONTAP manuals for your version.

Re: FAS2050 storage controller not communicating

Hi there! Yes, you're right that this system is now end of support from us, sorry. 

 

Can you connect via RLM or serial cable (RJ-45 9600 8N1)? What does the controller say? With any luck, it will just have "Waiting for giveback" scrolling on the console, and you can just type "cf giveback" into the other controller.

 

This would suggest that when one node came up successfully, it took over for the other one, but didn't have failover set up properly. This recent thread goes through the process of validating HA for a system of that era - https://community.netapp.com/t5/FAS-and-V-Series-Storage-Systems-Discussions/Restart-the-Controllers-7-mode-HA/td-p/149941

 

Please share the output of the serial console and we'll see what to do next.

Re: FAS2050 storage controller not communicating

We're running Data ONTAP 7.3.1.1. In FilerView, the status page shows: "This node has taken over. /vol/vol_mb1_db1 is full (using or reserving 100% of space and 0% of inodes, using 100% of reserve)." I'm very much a beginner in the storage world, so I'm not sure if the volume being full would have any impact on the takeover process.

Re: FAS2050 storage controller not communicating

Edit to the recent post above:

The status was taken from the operational controller. 

 

Unfortunately, I don't have a working serial cable, so I'm going to try to PuTTY into it. I'll post any results from my findings.

 

Thank you guys for your time and assistance with this.

 

-Nick

Re: FAS2050 storage controller not communicating

Unfortunately, we were not able to console into the controller that is down. We don't have a working serial cable to physically console into the system, so we had to resort to PuTTY, which wouldn't work due to network issues.

 

On the operational controller, after logging in, the prompt displayed:

"controllerhostname"(takeover)>

 

We ran a cf status and got the following result:

"partner has been taken over by <controllerhostname>"

 

The resource owned by the downed controller, in this case, is the volume containing our VDI profiles. VDI users are still unable to connect to their profiles, which leads us to believe that the takeover process didn't complete as intended, even though the status might suggest otherwise.

 

Would a restart of the FAS chassis be recommended, or would we need to manually run the takeover commands instead? Thank you again for your help.

 

-Nick

Re: FAS2050 storage controller not communicating

Hi there,

 

The serial cables are the same pinout as the light blue Cisco serial cables, if that helps.

 

If you don't have any other option, just run "cf giveback" on the surviving node and hope for the best. It does a number of checks before giving it back entirely, and it sounds like you're having an outage right now, so it won't get any worse.

 

The volume being full is bad, but I believe it wouldn't cause this behavior.

 

Hope this helps!

Re: FAS2050 storage controller not communicating

Alex,

 

I ran the cf giveback command and got the following output:

 

survivingnode (takeover)> cf giveback
survivingnode (takeover)> Wed Aug 28 13:03:50 EDT [survivingnode(takeover): cf.misc.operatorGiveback:info]: Cluster monitor: giveback initiated by operator
Wed Aug 28 13:03:50 EDT [survivingnode(takeover): disk.failed.abortGiveback:warning] Failed disk 0a.77 should be removed before the giveback command is invoked.
Wed Aug 28 13:03:50 EDT [survivingnode(takeover): cf.rsrc.givebackVeto:error] Cluster monitor: disk check : giveback cancelled due to active state
Wed Aug 28 13:04:00 EDT [survivingnode(takeover): cifs.server.infoMsg:info] CIFS: Warning for server \\DC-001: Connection terminated.
Wed Aug 28 13:04:00 EDT [survivingnode(takeover): cifs.server.errorMsg:error]: CIFS: Error for server \\DC-001: Error while negotiating protocol with server STATUS_IO_TIMEOUT.
Wed Aug 28 13:04:00 EDT [survivingnode(takeover): cifs.server.infoMsg:info] CIFS: Warning for server \\DC-002: Connection terminated.
Wed Aug 28 13:04:00 EDT [survivingnode(takeover): cifs.server.errorMsg:error]: CIFS: Error for server \\DC-002: Error while negotiating protocol with server STATUS_IO_TIMEOUT.
Wed Aug 28 13:04:00 EDT [survivingnode(takeover): cifs.server.infoMsg:info] CIFS: Warning for server \\DC-003: Connection terminated.
Wed Aug 28 13:04:00 EDT [survivingnode(takeover): cifs.server.errorMsg:error]: CIFS: Error for server \\DC-003: Error while negotiating protocol with server STATUS_IO_TIMEOUT.
Wed Aug 28 13:08:56 EDT [survivingnode(takeover): asup.smtp.host:info]: Autosupport cannot connect to host xxx.xxx.xxx.xxx (Network comm problem) for message: DISK FAILED
Wed Aug 28 13:08:56 EDT [survivingnode(takeover): asup.smtp.unreach:error]: Autosupport mail was not sent because the system cannot reach any of the mail hosts frmo the auto.support.mailhost option. (DISK FAILED)

 

One thing to note: prior to running the cf giveback command above, we power cycled the entire chassis to see if everything would come up okay, but no luck.
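
For anyone reading through the giveback output above, here is a minimal Python sketch that picks out the lines explaining why the giveback was cancelled. The names `LINE_RE`, `parse_events`, and `giveback_blockers` are hypothetical, and the line format is inferred only from the output pasted in this post, not from any NetApp specification, so treat it as an illustration rather than a supported tool.

```python
import re

# Matches lines shaped like the ones pasted above, for example:
#   Wed Aug 28 13:03:50 EDT [node(takeover): disk.failed.abortGiveback:warning] text
# The colon after the closing bracket is optional because the pasted
# output shows both forms. This pattern is an assumption based on this post.
LINE_RE = re.compile(
    r"\[(?P<node>[^:\]]+): (?P<event>[\w.]+):(?P<severity>\w+)\]:?\s*(?P<message>.*)"
)


def parse_events(console_text):
    """Return one dict (node, event, severity, message) per matching line."""
    events = []
    for line in console_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            events.append(match.groupdict())
    return events


def giveback_blockers(events):
    """Keep warning/error events whose event name mentions giveback."""
    return [
        e for e in events
        if "giveback" in e["event"].lower()
        and e["severity"] in ("warning", "error")
    ]


# Three lines copied from the output in this post.
sample = """\
Wed Aug 28 13:03:50 EDT [survivingnode(takeover): cf.misc.operatorGiveback:info]: Cluster monitor: giveback initiated by operator
Wed Aug 28 13:03:50 EDT [survivingnode(takeover): disk.failed.abortGiveback:warning] Failed disk 0a.77 should be removed before the giveback command is invoked.
Wed Aug 28 13:03:50 EDT [survivingnode(takeover): cf.rsrc.givebackVeto:error] Cluster monitor: disk check : giveback cancelled due to active state
"""

for event in giveback_blockers(parse_events(sample)):
    print(event["severity"], event["event"], "->", event["message"])
```

Run against the sample, this leaves only the `disk.failed.abortGiveback` and `cf.rsrc.givebackVeto` lines, which point at failed disk 0a.77 as the reason the giveback was vetoed. The CIFS and AutoSupport messages are filtered out because their event names do not mention giveback.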

Re: FAS2050 storage controller not communicating

I would get access to the second controller at this point, either via ssh into the SP/BMC or via a console cable, to see what it's displaying.

 

Are there any amber lights on any of the disks, or any ports on the rear showing offline?

Re: FAS2050 storage controller not communicating


@Nicolas_ja wrote:


Wed Aug 28 13:03:50 EDT [survivingnode(takeover): disk.failed.abortGiveback:warning] Failed disk 0a.77 should be removed before the giveback command is invoked.


Drive 77 should be the left-most drive of the 4th shelf along (see https://library.netapp.com/ecm/ecm_download_file/ECMP1112854). It should have an orange LED on it.

 

Follow the advice in the message (remove the failed disk), then try "cf giveback" again.
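
The arithmetic behind "drive 77 is in the 4th shelf" can be sketched in a few lines of Python. This assumes the external shelf uses DS14-style loop addressing, where the device ID equals 16 times the shelf ID plus the bay number and bays run from 0 to 13; that assumption has not been confirmed for this particular system, and `locate_disk` is a hypothetical helper name.

```python
def locate_disk(disk_name):
    """Split a disk name such as '0a.77' into (adapter, shelf_id, bay).

    Assumption: DS14-style addressing, device ID = 16 * shelf ID + bay,
    with valid bays numbered 0 through 13.
    """
    adapter, device = disk_name.split(".")
    shelf_id, bay = divmod(int(device), 16)
    if bay > 13:
        raise ValueError(f"{disk_name}: bay {bay} is outside the 0-13 range")
    return adapter, shelf_id, bay


print(locate_disk("0a.77"))  # ('0a', 4, 13)
```

Under that assumption, 0a.77 is on adapter 0a, shelf ID 4, bay 13, which matches the shelf named in the reply above.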
