
Cannot copy very specific files to NFS mount/share

AleksandarPeev

Hi everyone,

I have the strangest behaviour on a V-series appliance.

 

First, some background on my infrastructure setup.

I have an old SUSE 9 server that is an NFS client of shares on a FAS2020 appliance. This appliance is being retired and the shares are being moved to a new V-series NetApp appliance. I have copied most of the share content onto the new appliance via SMB, using a third-party server.

 

I then mounted the new NFS shares on the SUSE server. Reading and writing to the mounts works with no issues ... except when I copy certain XML files that have been generated by our ERP system.

The strange thing is that I can copy most of the XML files without any issue; only certain XML files make the copy hang. What is worse, the whole mount then hangs too, i.e. the whole ERP/finance system hangs. The server itself remains fully manageable, but disk I/O operations hang. I can recover the system by forcibly unmounting the NFS share, but the copy either does not finish or finishes with errors.

 

I blamed this behaviour on a firewall, but I was proven an idiot by the network guys: the issue persists even when the server and the appliance are directly connected (no firewall in between).

 

I even captured a network trace of the issue (see attached picture). It shows that the NetApp is refusing to service the write request, so the client just retransmits for eternity ... thus a hang.

 

The strange thing is ... the whole process works and has worked for a decade now on the old FAS2020 appliance.

 

Both appliances are being accessed via NFS v3... so no difference there.

 

I would be grateful for any help at this point!

 

Oh and I have forgotten to mention this experiment that I performed.

I have tried zipping the XML files and then copying them onto the NFS share ... works with no problem.

Unzipping the files straight to the new destination ... hangs!

 

I guess the V-series appliance just hates the content of my XML files.

 

 

 

1 ACCEPTED SOLUTION

paul_stejskal

Never mind. In the traces you are using UDP. We have a couple of specific bugs: https://mysupport.netapp.com/site/bugs-online/product/ONTAP/BURT/1196031 and https://mysupport.netapp.com/site/bugs-online/product/ONTAP/BURT/1384969.

We'd recommend switching to TCP for NFS traffic, as NFS over UDP has multiple issues that likely won't be fixed until ONTAP 9.9+ (9.9.1 likely).

 

You can upgrade to a release between the two bugs, but that's a narrow window. I would suggest disabling the cluster firewall for now:

::> system services firewall show
::> system services firewall modify -node *  -enabled false
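
For the switch to TCP recommended above, here is a minimal client-side sketch for checking which transport each NFS mount currently uses. It assumes a Linux client whose /proc/mounts lists the transport either as `proto=udp` (newer kernels) or as a bare `udp` flag (older kernels, which may include the one on SUSE 9). The function name `nfs_transport` and the filer name and export path in the comments are hypothetical.

```shell
#!/bin/sh
# Print "<mountpoint> <transport>" for every NFS entry read from stdin.
# Input is expected in /proc/mounts format:
#   device mountpoint fstype options dump pass
nfs_transport() {
    awk '$3 == "nfs" || $3 == "nfs4" {
        proto = "unknown"
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) {
            if (opts[i] == "udp" || opts[i] == "proto=udp") proto = "udp"
            if (opts[i] == "tcp" || opts[i] == "proto=tcp") proto = "tcp"
        }
        print $2, proto
    }'
}

# Usage on the client:
#   nfs_transport < /proc/mounts
#
# Remounting over TCP (filer name and export path are placeholders):
#   umount /mnt/nas04
#   mount -t nfs -o nfsvers=3,tcp,hard,intr filer:/vol/share /mnt/nas04
```

If a mount shows up as `udp`, the same options would also need to go into /etc/fstab so the change survives a reboot.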


15 REPLIES

AlexDawson
11,982 Views

"I guess the V-series appliance just hates the content of my XML files."

 

Yep, I guess it does. There shouldn't be anything that causes that behaviour. I assume you don't have an FPolicy virus scanner in there either?

It doesn't contain the EICAR string by any chance?

AleksandarPeev

No, not really. It is a bog-standard XML file containing a financial transaction.

I have visually inspected the "good" XML files and the "bad" XML files ... nothing sticks out.

Naturally, I cannot upload them to an online virus scanner (sensitive data and all), but my local Sophos AV does not find anything.

By the way, how do I check whether I actually have an FPolicy virus scanner? And even if I did have one, would it prevent the write on the fly?

 

Thanks for replying!

AlexDawson

OK, I'd try a binary search next: split the file in half with a text editor, then try putting each half on the share. If one half fails, split that half again and retry. Repeat until you find out whether there is a string that can magically make ONTAP refuse to write a file, and report back here.
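
One step of that binary search can also be scripted rather than done in a text editor. The sketch below is only an illustration: the function name `split_half` and the `.a`/`.b` suffixes are made up here, and it assumes a `head` and `tail` that accept `-c` (GNU coreutils does).

```shell
#!/bin/sh
# Split FILE into two halves by byte count: FILE.a gets the first half
# (rounded up), FILE.b gets the remainder. The original is left untouched.
split_half() {
    file=$1
    size=$(wc -c < "$file")
    half=$(( (size + 1) / 2 ))
    head -c "$half" "$file" > "$file.a"
    tail -c "$(( size - half ))" "$file" > "$file.b"
}

# Usage:
#   split_half bad.xml
#   cp bad.xml.a /mnt/nas04/sharepoint-ns04   # does this half copy?
#   cp bad.xml.b /mnt/nas04/sharepoint-ns04   # does this one?
# Then run split_half again on whichever half fails.
```

Concatenating the two halves with `cat` reproduces the original file, so nothing is lost between steps.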

paul_stejskal

Where was the packet trace taken? Try to get one from ONTAP and see if it is even getting the call to begin with.

AleksandarPeev

The packet trace was taken as a tcpdump from the source server.
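
For reference, a client-side capture like this can be restricted to the NFS conversation with a pcap filter. The sketch below is an assumption about how one might do it, not the exact command used here: the helper name `nfs_capture_filter`, the interface `eth0`, and the address are placeholders, and it assumes NFS on the standard port 2049. The extra fragment test is there because with NFS over UDP a large request can be split into IP fragments, and only the first fragment carries the port number.

```shell
#!/bin/sh
# Build a pcap filter for NFS traffic to or from one host.
# Non-first IP fragments have no UDP header, so a plain "port 2049"
# filter would miss them; the ip[6:2] test matches any packet with a
# non-zero fragment offset.
nfs_capture_filter() {
    printf 'host %s and (port 2049 or (ip[6:2] & 0x1fff != 0))\n' "$1"
}

# Usage (as root; interface and address are placeholders):
#   tcpdump -i eth0 -s 0 -w /tmp/nfs.pcap "$(nfs_capture_filter 192.0.2.10)"
```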

 

Is there a way to take a packet trace on a V-series appliance?
If not, I would have to set up a network tap and sniff there. That is doable, but I would have to engage my network guys ... after blaming their devices for interfering and proving myself wrong.

 

Any ideas how to perform this?

AleksandarPeev

Hi Paul,

 

The NetApp is indeed receiving the network packets, and the tcpdump taken on the filer side shows the same traffic.

 

I have posted the screenshot of the filtered conversation between the filer and the server.

paul_stejskal

So, there are two options I can see. One is that this is a new bug we haven't found yet, so you'll need to open a case to get it fixed. The other is to upgrade and see if the issue is resolved. I'd open a case either way so we can identify the specific bug or open a new one. Support may ask for debug sktrace logs. Something is broken in ONTAP here.

 

Please reply with the case number and I can follow up internally once opened. Also please provide both packet traces.

AleksandarPeev

Hi Paul,

 

I have opened a support case for the issue. The case number is 2008742244.

It has been open for almost 8 hours now, but the status is still Unassigned.

 

I can see that it has been passed around by the support people, but I have thus far no feedback from them.

 

Thanks for looking into this!

 
 

paul_stejskal

Acknowledged. I have followed up and added this thread to the case notes internally.

 

Please go ahead and upload your traces to https://upload.netapp.com/sg and enter your case number. This assumes both were captured at the same time; if they were not, please recapture them if it is not too much trouble.

AleksandarPeev

Hi again,

 

That is a great idea, and I just tried it. I regret to say the results are inconclusive.

I split the "bad" file into 3 files and copied them over ... no issues. I then tried copying the offending file as a whole ... hangs.

Here is the shell output for posterity:

 

db03:/home/zope/transfer/split # split -b 500 split.xml
db03:/home/zope/transfer/split # ls
. .. split.xml xaa xab xac
db03:/home/zope/transfer/split # cp xaa /mnt/nas04/sharepoint-ns04
db03:/home/zope/transfer/split # cp xab /mnt/nas04/sharepoint-ns04
db03:/home/zope/transfer/split # cp xac /mnt/nas04/sharepoint-ns04
db03:/home/zope/transfer/split # cp split.xml /mnt/nas04/sharepoint-ns04
cp: closing `/mnt/nas04/sharepoint-ns04/split.xml': Input/output error

(The I/O error is an effect of me forcibly unmounting the NFS share; otherwise it just hangs.)

 

I don't know what to make of the results of this test.
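
One reading of this result: since `split -b 500` produced three pieces, the file is between 1001 and 1500 bytes, and every 500-byte piece copied fine, so the content of any single piece does not seem to be the trigger. Whether size is the trigger is only an assumption, but it can be tested with neutral content. The sketch below uses made-up helper names (`make_probe`, `probe_sizes`) and writes its probe files to /tmp. Since a hang takes the whole mount with it, it should only be pointed at a share where that is acceptable.

```shell
#!/bin/sh
# Create FILE ($2) containing exactly SIZE ($1) bytes of the letter "x".
make_probe() {
    head -c "$1" /dev/zero | tr '\0' 'x' > "$2"
}

# Copy probe files of the given sizes to DEST, announcing each size before
# the copy so the last line printed identifies the size that hangs.
probe_sizes() {
    dest=$1
    shift
    for size in "$@"; do
        echo "trying $size bytes"
        make_probe "$size" "/tmp/probe_$size.dat" || return 1
        cp "/tmp/probe_$size.dat" "$dest/" || return 1
    done
}

# Usage:
#   probe_sizes /mnt/nas04/sharepoint-ns04 500 1000 1100 1200 1300 1400 1500
```

If probes of every size copy cleanly, size alone is not the cause and the content-based binary search is worth continuing.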

 

paul_stejskal

You didn't say whether it is ONTAP 9.2+, 9.1 or older, or 7-Mode. Search the KB site for "pktt" and "tcpdump" and you'll find the appropriate articles (tcpdump applies to ONTAP 9.2+).

 

AleksandarPeev

It is ONTAP 9.3.

 

Thanks for the information!

 

paul_stejskal

You're welcome. If it is making it to ONTAP, please take a look because we may have to enable some debugging. That seems odd. I suspect the packet is never making it to ONTAP.

AleksandarPeev

I checked ... I have no fpolicies defined.
