2013-08-30 04:05 PM
I've opened a few cases with NetApp and worked with our networking department several times throughout the year. We still cannot get two of our eight controllers to work with DFM when it tries to do NDMP on port 10000.
I have had DFM set up with my eight controllers for quite some time now, mainly to collect stats. This has worked fine, though I seem to have some SNMP discrepancies (on some controllers SNMP works over the main 20Gb connection and not the e0M connection, while others are just the opposite and only work on e0M, but this is beside the point, I think).
Now I want to use SnapVault via data protection in NMC. I went through the NDMP password setup on all eight hosts. Two of them fail when DFM tries to do an NDMP ping on port 10000, while the other six work fine. These two controllers are an HA pair (FAS3270 model). Yes, NDMP is enabled, as is SnapVault. No, there are no firewall issues. I can see all firewall rejections using Splunk and there are none. This was also verified by our networking dept, many times.
The command dfm host diag (run against either of the two IPs for each controller) reports this error:
NDMP Ping Failed (Is NDMP (SnapVault) enabled?)
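For anyone checking several controllers at once, here is a rough Python sketch that runs dfm host diag per host and flags the ones showing the error above. It assumes the dfm CLI is on the PATH of the machine running the script and that the failure text appears verbatim in the command output; the function names are made up for illustration and are not part of DFM.

```python
import subprocess

# Text taken from the error shown above; matching on it is an assumption
# about how the failure appears in the diag output.
NDMP_FAILURE = "NDMP Ping Failed"


def ndmp_ping_failed(diag_output: str) -> bool:
    """Return True if the diag output contains the NDMP ping failure text."""
    return NDMP_FAILURE in diag_output


def run_host_diag(host: str) -> str:
    """Run 'dfm host diag <host>' and return its combined stdout and stderr."""
    result = subprocess.run(
        ["dfm", "host", "diag", host],
        capture_output=True,
        text=True,
        check=False,  # a failed diag should still be inspected, not raise
    )
    return result.stdout + result.stderr


def failing_hosts(hosts, diag=run_host_diag):
    """Return the hosts whose diag output reports an NDMP ping failure."""
    return [h for h in hosts if ndmp_ping_failed(diag(h))]
```

Passing a list of controller names or IPs to failing_hosts() returns only those that report the NDMP ping failure; the diag parameter lets you substitute saved output if you cannot run dfm directly.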
NetApp tech support's final test was an NDMP copy between the two controllers, which works. That verifies that NDMP is working on both controllers.
I really need to use SnapVault to back up about 20TB of vital data that is currently not being backed up (I am snapmirroring it, but to the same aggr that the SnapVault copies will be on).
I have tried setting various ndmpd options, such as setting the preferred interface to the 20Gb connection that the traffic should be traversing, and then to the e0M interface.
I have tried using a port other than 10000, but after about 6 weeks of NetApp tech support looking into it, they found that port 10000 is hard-coded in DFM. So that whole attempt was a waste.
I am really desperate for a solution. Any ideas AT ALL are greatly appreciated. Thanks.
2013-09-18 01:42 AM
Do you already have a solution for this behaviour?
I have the same issue.
2013-09-19 08:35 AM
Hi Bart. Nope, nothing new to report on this. I have realized that I don't really have to get these 2 controllers working on port 10000 because I'm going to snapmirror the data to a controller that is working on port 10000 and snapvault the replicated data. I just need to update from DOT 18.104.22.168 to 8.1 on an HA pair because it's failing with a 32-bit to 64-bit volume error message on the snapmirror process.
Let me know if you figure anything out.
UPDATE: I do still want to get this working because I believe I am wasting space by putting the snapmirror and snapvault copies of the data onto the same aggr. The initial snapmirror transfer is still running, so I don't know how much actual space will be used once it is deduped and everything, but it reports that my 41TB aggr has 52TB of data committed to it, while the source data only occupies 21TB. I have never used snapvault before, so I have plenty to learn. I sure would like to be able to snapvault directly from source to destination, though, and still no one has been able to figure it out.
2014-02-13 03:01 PM
Were you able to resolve this NDMP issue? Please let me know. I have the same issue. Thanks!
2015-03-26 10:01 AM
Sorry, I didn't see your reply from over a year ago, but no, I never really solved the NDMP issue because it was out of my control and had something to do with our Juniper core switches.
I made a new VM on a different subnet and port 10000 is working from it, so I just now (like an hour ago) got my very first SnapVault job to succeed after 3 years of on-and-off attempts to get it working. I should have made a new VM a long time ago, or just re-IP'd my existing one. Everything I could control was configured correctly, and our virtual firewall was not the problem. It's something with the core switches, and the people here who manage those are apparently not that good at it. For example, they updated the core switches and managed to kill every connection on my filers all at the same time; we had NFS traffic down for hours, so our VMs on NFS datastores were freaking out. They also couldn't get SNMP traffic to work from my management server to the storage devices. Luckily, the VLAN my new VM is on is configured correctly on our switches, so everything is working (except SNMP is 90% broken again, but I believe that's our virtual firewall, not the switches, and it was working a few days ago).
Anyway, I'd say if you're having issues getting NDMP to connect, turn on the option ndmpd.connectlog.enabled so you can watch NDMP connection attempts, then run a dfm host diag command. If you see nothing logged when it tries to connect, you know it's got to be a network issue. Then, to test port 10000 easily from any Windows box, you can run telnet [your storage system IP here] 10000 and it will attempt to connect on port 10000. Of course, you'll need to enable telnet on your SS so telnet will connect and you will know port 10000 works.
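If you'd rather not rely on telnet, the same port test can be sketched in a few lines of Python. This is only a minimal example: it opens a plain TCP connection to port 10000 and reports whether the connect succeeded, so it tells you the port is reachable but does not speak NDMP or check credentials. The function name and the 5-second timeout are my own choices, not anything from DFM or Data ONTAP.

```python
import socket


def port_open(host: str, port: int = 10000, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake;
        # the 'with' block closes the socket again straight away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

For example, port_open("192.0.2.10") (a placeholder address, substitute your storage system IP) returns False when something between the management server and the controller drops or rejects the connection. Running it from VMs on different subnets is a quick way to see whether the problem follows the network path, as it did here.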