Solved: Re: FAS 2240 reset w/o saving data & general NetApp terminology

Martins_Rubenis · ‎2016-08-26

Hello.

This thread should be easy for people who have been working with NetApp for some time. For me, however, this stuff is very new and frustrating.
Ok, here is the deal.

We got a NetApp FAS 2240-2 system. The guys who worked with this device previously just removed it from their environment without resetting anything. Now the hardware is mounted in our rack and I'm trying to get the software to do the right thing, but I got lost in the terminology, NetApp documentation, cables and CLI commands.

So can anyone please help me with this?

Can you please point out where I could find an explanation of the terminology NetApp uses? There are nodes, storage processors, controller modules, wrench ports, clusters, ONTAP, 7-Mode, etc. I have worked with EMC previously, and it seems that they name things a little differently.

Now, moving on to more technical stuff. How do I need to wire this beast? I have connected the power, the SAS connections between nodes (I hope I'm using the right name here), and the ACP cables between nodes. Then on one node I have connected the management port to the management switch and the data ports to the data switch. Do I need to do this with the other node as well? In the setup guide, there are management and data connections to the switch from only one node. This got me a bit confused.

Now, after the cabling was done, I connected to the device through the serial port, got into the boot menu, reset both nodes and did the initial config on both nodes through the terminal. After this I ran System Setup from my Windows laptop. It discovered both nodes and showed them in the list, but setup gave me an error which stated "too few cluster ports configured" on both nodes. What should I do about this error? Maybe my cabling is wrong? Or have I made mistakes when resetting or doing the initial config?

Please help me with these questions, as the hardware has been sitting idle for too long. And huge thanks in advance for the answers. I thought about posting these three questions separately, but then again, they are of the same kind and from the same case.

Martins_Rubenis · ‎2016-08-30

Hi, the results are not that great. At first I got further down the configurator, but then got stuck at the same "too few cluster nodes" error.
You know, I almost got happy.

Anyway, someone stated earlier that those 10G mezzanine cards are mandatory for cluster setup. We have 8G FC cards there, and they do not show up in the network port show output.

I have to ask: do I really need those cards in order to run this system? Why do I even need an external connection between the nodes if they sit in the same chassis? Don't they have internal connections? I don't even need this freaking cluster; just failover would be fine. Actually, if the hardware had come with only one node, I would have been done days ago. Now I'm wiping disks each time I have to try something new and reinitialising the cluster setup, because there is no way to delete it, as far as I have searched. And in the end, even after wiping and reinitialising, after the final reboot it picks up some kind of old configuration for the SPs (from where??) and I don't know how to fix that. BTW, here is part of the output after "system reset" on that node.

Aug 30 14:22:26 [localhost:cf.nm.nicReset:warning]: HA interconnect: Initiating soft reset on card 0 due to rendezvous reset.
Aug 30 14:22:26 [localhost:cf.rv.notConnected:error]: HA interconnect: Connection for 'cfo_rv' failed.
add host 127.0.10.1: gateway 127.0.20.1
Aug 30 14:22:28 [localhost:cf.fm.notkoverClusterDisable:warning]: Failover monitor: takeover disabled (restart)
Aug 30 14:22:28 [localhost:kern.syslog.msg:notice]: The system was down for 136 seconds
Aug 30 14:22:28 [localhost:cf.fsm.takeoverOfPartnerDisabled:error]: Failover monitor: takeover of partner disabled (Controller Failover takeover disabled).

Aug 30 14:22:29 [localhost:snmp.agent.msg.access.denied:warning]: Permission denied for SNMPv3 requests from root. Reason: Password is too short (SNMPv3 requires at least 8 characters).
Aug 30 14:22:29 [localhost:cf.nm.nicTransitionDown:warning]: HA interconnect: Link down on NIC 0.
Aug 30 14:22:29 [localhost:clam.invalid.config:warning]: Local node (name=unknown, id=0) is in an invalid configuration for providing CLAM functionality. CLAM cannot determine the identity of the HA partner.
Ipspace "acp-ipspace" created
Aug 30 14:23:00 [localhost:monitor.globalStatus.critical:CRITICAL]: Controller failover partner unknown. Controller failover not possible.

Maybe experts can make some sense out of this....
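
For anyone skimming console output like the above, here is a minimal Python sketch that splits each line into timestamp, event name and severity, then keeps only the error and critical entries. The line layout (`[host:event.name:severity]: message`) is inferred purely from the sample in this post, and the names `parse_line` and `serious` are hypothetical helpers, not part of any NetApp tool.

```python
import re
from typing import NamedTuple, Optional

# Layout inferred from the console output in this thread (an assumption):
#   "Aug 30 14:22:26 [host:event.name:severity]: message"
LINE_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"\[(?P<host>[^:\]]+):(?P<event>[^:\]]+):(?P<severity>[^:\]]+)\]:\s*"
    r"(?P<message>.*)$"
)


class LogEntry(NamedTuple):
    timestamp: str
    host: str
    event: str
    severity: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Return a LogEntry, or None for lines that are not in the bracketed format."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return LogEntry(
        m["timestamp"], m["host"], m["event"], m["severity"].lower(), m["message"]
    )


def serious(lines, levels=("error", "critical")):
    """Keep only parsed entries whose severity is in `levels`."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e is not None and e.severity in levels]


# A few lines copied from the output above.
SAMPLE = """\
Aug 30 14:22:26 [localhost:cf.rv.notConnected:error]: HA interconnect: Connection for 'cfo_rv' failed.
add host 127.0.10.1: gateway 127.0.20.1
Aug 30 14:22:28 [localhost:cf.fm.notkoverClusterDisable:warning]: Failover monitor: takeover disabled (restart)
Aug 30 14:22:28 [localhost:cf.fsm.takeoverOfPartnerDisabled:error]: Failover monitor: takeover of partner disabled (Controller Failover takeover disabled).
Aug 30 14:23:00 [localhost:monitor.globalStatus.critical:CRITICAL]: Controller failover partner unknown. Controller failover not possible.
"""

if __name__ == "__main__":
    for entry in serious(SAMPLE.splitlines()):
        print(entry.timestamp, entry.event, "-", entry.message)
```

Filtering this way leaves only the HA interconnect and failover entries, all of which say that the node cannot reach or identify its HA partner.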

SeanHatfield · ‎2016-08-30

Lots of good questions.

The FCP adapters don't show in the network port show output. Try "fcp adapter show" instead.

There are internal connections between the nodes in that chassis, but they are used for the HA Interconnect, not the cluster network. Using redundant external 10GbE ports for the cluster network is consistent across all of the platforms. It enables a cluster to scale nondisruptively by adding additional HA pairs.

To be in a supported config, you need the mezz cards. Whoever had it previously was running cluster mode, so they probably ignored the supported topologies and used two 1GbE ports for the cluster network at the CLI.

There is more to wiping the nodes than running option 4. As you've noticed, some of the config is preserved elsewhere. You also need to run a wipeconfig. See this KB:

https://kb.netapp.com/support/index?page=content&id=1014631&actp=search&viewlocale=en_US&searchid=1472570906636

The HA errors you are seeing are probably transient during boot. Once both nodes are joined to the cluster you should be able to enable HA, or troubleshoot the interconnect.

You said earlier you have the wrench ports cross connected. There are two types:

The "locked wrench port": connects internally to the e0P port (Private network), also called the "ACP" port. It is used as an "Alternate Control Path" when external disk shelves are connected. If you look closely, it has a padlock in the middle of the icon by the port. When there are no external shelves those are cross connected between the nodes to close that loop.

The "wrench port" is a shared management port used by the onboard e0M interface (Management network) and the internal Service Processor (SP). This port should be connected to either your management network or your data network if you don't use a separate management network. On the 2240, it's a 10/100 port.

Note that when you see SP in a NetApp context, it is referring to the out-of-band Service Processor on the node. Another vendor uses that acronym to refer to the Storage Processor, which we call a Node. Different vocabulary, overloaded acronyms.

By the way, which version of ONTAP is it running? It should post the version early in the boot process, or you can run 'version' at the cluster shell command line.
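
To illustrate what a "too few cluster ports configured" complaint is likely checking, here is a rough Python sketch of the rule described above: each node needs a redundant pair of ports assigned to the cluster network. This is not the setup tool's actual logic. The minimum of two ports per node, the role labels, the node names and the port layout are all assumptions made for the example.

```python
from dataclasses import dataclass

# Assumption: one redundant pair of cluster ports per node.
MIN_CLUSTER_PORTS = 2


@dataclass(frozen=True)
class Port:
    node: str
    name: str
    role: str  # hypothetical labels: "cluster", "data", "node-mgmt"


def nodes_short_of_cluster_ports(ports, minimum=MIN_CLUSTER_PORTS):
    """Return the sorted names of nodes with fewer than `minimum` cluster ports."""
    counts = {}
    for port in ports:
        counts.setdefault(port.node, 0)  # make sure every node is counted
        if port.role == "cluster":
            counts[port.node] += 1
    return sorted(node for node, count in counts.items() if count < minimum)


# Hypothetical layout: node-01 gives up two onboard ports to the cluster
# network, node-02 still has every onboard port in the data role.
EXAMPLE_PORTS = [
    Port("node-01", "e0a", "cluster"),
    Port("node-01", "e0b", "cluster"),
    Port("node-01", "e0c", "data"),
    Port("node-01", "e0d", "data"),
    Port("node-01", "e0M", "node-mgmt"),
    Port("node-02", "e0a", "data"),
    Port("node-02", "e0b", "data"),
    Port("node-02", "e0c", "data"),
    Port("node-02", "e0d", "data"),
    Port("node-02", "e0M", "node-mgmt"),
]

if __name__ == "__main__":
    print(nodes_short_of_cluster_ports(EXAMPLE_PORTS))
```

In this made-up layout only node-02 is flagged. FC adapters never enter the count, since they are not network ports at all.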

If this post resolved your issue, help others by selecting ACCEPT AS SOLUTION or adding a KUDO.

aborzenkov · ‎2016-08-30

It is technically possible to use 1GbE ports for the cluster interconnect, but this is not a supported configuration, and you should understand that it limits the total throughput that can be achieved. As this is not supported, it is not offered by the setup tool either. For testing purposes you can build such a cluster using the CLI.

Martins_Rubenis · ‎2016-08-30

Hi, thanks for the input!

So please correct me if I am wrong. The only supported option is to have a cluster (it was mentioned earlier that 7-Mode will be EOL soon). To have a legit cluster, one has to have a mezzanine 10GbE NIC (that exact one). If you have FC cards stuck in there, you can go and throw them out, because they will not do. So what is the point of not including the 10G mezz cards by default, if there is no other option but to use them? Because for now it seems that the setup we have is useless and we cannot do anything to fit within the supported boundaries, right?

SeanHatfield · ‎2016-08-31

When that platform was shipping, it could be ordered with empty expansion slots, with FC cards, or with 10GbE cards, depending on the use case. The FC card variant would have been ordered for a 7-Mode FC SAN use case. If you want to put it back into production, you could either order 10GbE cards or convert it back to 7-Mode.

Martins_Rubenis · ‎2016-09-01

Hello,

I ended up using those data ports for cluster communication. It is a shame that only two are left for actual data transmission. I created an NFS share for my ESXi hosts, and now it can use only one port for that share. What a shame.

Naveenpusuluru · ‎2016-09-02

You can create VLANs on top of that interface and use the same port to connect to another network.

Martins_Rubenis · ‎2016-09-05

Hi, my storage network is physically separated from everything else. In the storage network I have only storage and ESXi hosts, so using a VLAN is not an option there.
For now, I do not need anything other than the NFS share. As far as I have read, NFS has better performance over an IP network. And it would be nice if I could use all four free ports across both nodes to give access to this one NFS share. For now, I cannot aggregate those two ports on both nodes; it says that there are no available ports for aggregation.