ONTAP Discussions
We recently had an issue where our 'hw_assist' IPs were on a network that experienced some downtime, which (possibly) caused our filers to panic and then reboot.
We're still investigating the coredump, but in the meantime we want to connect our filers (filerA and filerB) directly to each other on an unused onboard port (e0a, since e0M isn't in use) and use that for the 'cf.hw_assist.partner.address' IP.
I've already configured e0a on filerA as: 172.16.3.111/24 and e0a on filerB as: 172.16.3.113/24. Here is how cf.hw_assist is configured on both systems:
filerA> ifconfig e0a
e0a: flags=0x6f48867<UP,BROADCAST,RUNNING,MULTICAST,TCPCKSUM,NOWINS> mtu 1500
inet 172.16.3.111 netmask 0xffffff00 broadcast 172.16.3.255
ether 00:a0:98:0d:eb:30 (auto-1000t-fd-up) flowcontrol full
filerA> ping 172.16.3.111
172.16.3.111 is alive
filerA> options cf.hw_assist
cf.hw_assist.enable on
cf.hw_assist.partner.address 172.16.3.113
cf.hw_assist.partner.port 4444
filerA> cf hw_assist status
Local Node(filerA) Status:
Active: filerA monitoring alerts from partner(filerB)
port 4444 IP address 172.16.3.111
Missed keep alive alert from partner(filerB).
Last keep alive alert received on
Tue Oct 4 16:57:20 PDT 2011
Partner Node(filerB) Status:
Active: filerB monitoring alerts from partner(filerA)
port 4444 IP address 172.16.3.113
filerA> cf hw_assist test
cf hw_assist Error: No response from partner(filerB), timed out.
filerA> rlm status
Remote LAN Module Status: Online
Part Number: 110-00057
Revision: F0
Serial Number: 48XXXX
Firmware Version: 3.0
Mgmt MAC Address: 00:A0:98:10:0C:2B
Ethernet Link: up
Using DHCP: no
IPv4 configuration:
IP Address: 10.100.1.111
Netmask: 255.255.255.0
Gateway: 10.100.1.2
filerB> ifconfig e0a
e0a: flags=0x6f48867<UP,BROADCAST,RUNNING,MULTICAST,TCPCKSUM,NOWINS> mtu 1500
inet 172.16.3.113 netmask 0xffffff00 broadcast 172.16.3.255
ether 00:a0:98:10:2d:d0 (auto-1000t-fd-up) flowcontrol full
filerB> ping 172.16.3.113
172.16.3.113 is alive
filerB> options cf.hw_assist
cf.hw_assist.enable on
cf.hw_assist.partner.address 172.16.3.111
cf.hw_assist.partner.port 4444
filerB> cf hw_assist status
Local Node(filerB) Status:
Active: filerB monitoring alerts from partner(filerA)
port 4444 IP address 172.16.3.113
Missed keep alive alert from partner(filerA).
Last keep alive alert received on
Tue Oct 4 18:09:02 PDT 2011
Partner Node(filerA) Status:
Active: filerA monitoring alerts from partner(filerB)
port 4444 IP address 172.16.3.111
filerB> cf hw_assist test
cf hw_assist Error: No response from partner(filerA), timed out.
filerB> rlm status
Remote LAN Module Status: Online
Part Number: 110-00057
Revision: F0
Serial Number: 48XXXX
Firmware Version: 3.0
Mgmt MAC Address: 00:A0:98:0F:8C:15
Ethernet Link: up
Using DHCP: no
IPv4 configuration:
IP Address: 10.100.1.113
Netmask: 255.255.255.0
Gateway: 10.100.1.2
Cluster is currently enabled and up and RLM is configured. Any ideas as to why the 'cf hw_assist test' fails? I've set the e0a interface to be trusted. We're running DOT 8.0.1 7-mode.
Hw_assist requires connectivity between the filer head on one side and the partner's RLM on the other. A direct connection between two onboard ports is therefore not going to work, because neither RLM is on that link. You would need to use a small switch to connect the two RLMs and the two dedicated ports together.
Thanks, sounds like the proper way to move forward is:
- rewire / reconfigure e0a (on both filers) with a unique address in 10.100.1.0/24 and put it on VLAN1 (same as the RLMs)
- update cf.hw_assist.partner.address
- run 'cf hw_assist test' again
I'll give this a shot and report back.
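As a quick sanity check on the plan above, here is a minimal Python sketch that tests whether a filer's hw_assist interface and the partner's RLM sit in the same IP network. The function name `same_subnet` and the address 10.100.1.121 are hypothetical, used only for illustration; the other addresses come from the output earlier in this thread. It assumes there is no router between the two networks, which is the situation described here.

```python
import ipaddress


def same_subnet(iface_cidr, rlm_ip):
    """Return True if rlm_ip falls inside the network of iface_cidr."""
    network = ipaddress.ip_interface(iface_cidr).network
    return ipaddress.ip_address(rlm_ip) in network


# Current setup from the thread: filerA e0a versus filerB's RLM.
print(same_subnet("172.16.3.111/24", "10.100.1.113"))  # False

# Hypothetical new e0a address on the RLM network (10.100.1.0/24).
print(same_subnet("10.100.1.121/24", "10.100.1.113"))  # True
```

With the original 172.16.3.0/24 addressing the check fails, which matches the timeout seen in 'cf hw_assist test'.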
I just thought I would add to this post to save people some time resolving hw_assist timeout issues with a Service Processor (SP):
First, check the SP speed and duplex: type 'sp status' and confirm that the SP has negotiated 100Mb / full duplex. If it has not, reconfigure the network switch ports the SP is connected to as auto / auto (speed / duplex).
Once this is done, type 'sp status' again to confirm 100Mb / full duplex. If the output still shows 100Mb / half duplex, type 'sp reboot', then use 'sp status' to confirm that the reboot has completed and that speed and duplex are set correctly.
Another cause of timeout messages is an SP that has not been configured properly. One symptom is an SP prompt without a hostname, i.e. 'SP>'. The SP prompt should be 'SP hostname>'.
To fix this issue use the following commands:
sp status
options sp.setup off
sp setup (using info from sp status)
cf hw_assist test
cf hw_assist status
I hope this helps.
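For anyone who wants to script the final 'cf hw_assist status' check, the following is a rough Python sketch that scans captured status text for the "Missed keep alive alert" line shown earlier in this thread. The function name `missed_keepalive_partners` is hypothetical, and the sample text is copied from the filerA output above; how you capture the output (SSH, console log, etc.) is left open, and the exact wording may differ on other Data ONTAP versions.

```python
import re

# Matches lines such as: "Missed keep alive alert from partner(filerB)."
MISSED_RE = re.compile(r"Missed keep alive alert from partner\(([^)]+)\)")


def missed_keepalive_partners(status_text):
    """Return the partner names reported as having missed keep-alive alerts."""
    return MISSED_RE.findall(status_text)


# Sample taken from the 'cf hw_assist status' output posted in this thread.
sample = """Local Node(filerA) Status:
Active: filerA monitoring alerts from partner(filerB)
port 4444 IP address 172.16.3.111
Missed keep alive alert from partner(filerB).
Partner Node(filerB) Status:
Active: filerB monitoring alerts from partner(filerA)
port 4444 IP address 172.16.3.113
"""

print(missed_keepalive_partners(sample))  # ['filerB']
```

An empty list would suggest that no missed keep-alive was reported in the captured text.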