FAS and V-Series Storage Systems Discussions

Re: FAS2050 storage controller not communicating

One should never perform a giveback without console access to the controller that was taken over, especially in a case like this. If “cf status” does not show the partner as “ready for giveback”, either the partner did not boot or there is a communication issue. Blindly performing a giveback in this state can easily result in an outage and data loss.

A console connection (either direct or via RLM/SP/BMC) is really a must when doing any maintenance on NetApp systems.

Re: FAS2050 storage controller not communicating

A fair comment. If at all possible, check the partner console. But you also shouldn’t run production workloads on unsupported hardware, or operate them without out-of-band access and regular takeover/giveback tests during ONTAP upgrades. Yet here we are.

Re: FAS2050 storage controller not communicating

Unfortunately, we do not have any replacement disks lying around to replace the failed disk (0a.77). I was hoping to run the following command to force a giveback, but the command line did not recognize it:

storage failover giveback -ofnode <nodename>

Is it possible to just remove the disk and leave the bay empty?

Sincerely,

Nick

Re: FAS2050 storage controller not communicating

Hi there,

That command is for modern (clustered) ONTAP and won't work on your system. On Data ONTAP 7-Mode, which is what a FAS2050 runs, the equivalent is cf giveback.

Yes, you can remove that drive and leave the slot empty. For airflow management, though, it is better to leave the drive in the slot, just unseated.

Re: FAS2050 storage controller not communicating

Hello Alex,

We removed the failed hard drive, but when we attempted the giveback again, it reported yet another failed drive blocking the process. Since multiple drives in the chassis have amber lights, we expected a cascade of failed-drive errors. We decided not to keep pulling failed drives out of the chassis and left them in place, ultimately proceeding with the cf giveback -f command, which completed successfully.

Now, "cf status" returns "cluster enabled, partner is up". Even so, we still cannot ping the controller or add it in NetApp OnCommand System Manager by its IP or hostname.

I feel like it's something small that we're missing to get it communicating properly.

-Nick

Re: FAS2050 storage controller not communicating

Do you have a good backup of all the data on this 2050? Multiple drive failures are not good.

As for what's wrong with the other node: what happens if you console in to it? Does anything display? Are there any lights on the rear? There is a "!" indicator on each controller; is either of those lit up?

Re: FAS2050 storage controller not communicating

We do not have a backup.

At the start of this thread, we couldn't physically console into the down storage controller (fas01) or connect to it with PuTTY. After recently acquiring a laptop, we're now able to console in and log in to fas01. There was an amber light on the "!" indicator (back of the chassis) for the down controller (fas01), but after a reboot (described below), the amber light went away.

Once consoled in, we're greeted with the login prompt as normal. Running cf status returns "cluster enabled, fas02 is up". Interestingly, running the same cf status command on fas02 (the working storage controller) returns "cluster enabled, partner is up". So fas02 does not seem to see fas01's hostname, but fas01 can see fas02's hostname.

We rebooted fas01 and, after consoling back into it, saw several log messages; shortened versions are below:

  • Interconnect link is UP
  • Connection for cfo_rv failed
  • broken disks errors (as expected)
    • Broken Disk 0b.80 Shelf ? Bay ? detected prior to assimilation. It should be removed
    • Broken Disk 0a.87 Shelf ? Bay ? detected prior to assimilation. It should be removed

After running sysconfig and ifconfig on fas01, we found that it had lost the two IPs assigned to the e0a and e0b interfaces. We ran the same commands on fas02, and ifconfig -a listed the two missing IPs. For example:

(on fas02) ifconfig -a

e0a: ...partner inet 1.2.3.4 (redacted actual IP) (not in use)

e0b: ...partner inet 1.2.3.5 (redacted actual IP) (not in use)

I take it the IPs above are the ones originally assigned to those interfaces before fas01 took the initial power hit (which caused this whole issue). We used ifconfig to assign those IPs back to the e0a and e0b interfaces.

After assigning the IPs to the interfaces, we can ping our domain controller and several DNS servers, but we can't ping individual PCs. We then used the route add command to add the default gateway for good measure (assuming that route was also lost).
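Addresses and routes set with ifconfig and route add at the console only change the running configuration. On Data ONTAP 7-Mode they are normally applied at boot from /etc/rc, so if fas01's copy of that file is damaged or out of date, the IPs would be lost again on the next reboot. A minimal sketch of what the relevant lines might look like, assuming the addresses from the ifconfig -a output above; the netmask, the gateway (1.2.3.1), and the partner settings are hypothetical placeholders, and the real values should come from the original file or from fas02's configuration:

```
# Hypothetical /etc/rc excerpt for fas01 (placeholder netmask/gateway; compare with fas02's /etc/rc)
hostname fas01
# "partner e0a" (if your original file uses it) names the partner interface whose address this port hosts during takeover
ifconfig e0a 1.2.3.4 netmask 255.255.255.0 partner e0a
ifconfig e0b 1.2.3.5 netmask 255.255.255.0 partner e0b
# Default route with metric 1; the gateway address here is a placeholder
route add default 1.2.3.1 1
routed on
```

rdfile /etc/rc prints the current file on the console. Be careful with wrfile /etc/rc: it replaces the entire file with what you type, so copy the existing contents first.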

Re: FAS2050 storage controller not communicating

Sounds like the giveback wasn't fully finished?

Do the /etc/rc and /etc/hosts files look good, and do they match on both controllers?

How are your aggrs looking? (aggr status)