I am seeing some very strange behaviour on a V-series appliance.
First, some background on my infrastructure is in order.
I have an old SUSE 9 server that is a client for NFS shares on a FAS2020 appliance. This appliance is being retired, and the shares are being moved to a new V-series NetApp appliance. I copied most of the shares' content onto the new appliance over SMB, using a third-party server.
I then mounted the new NFS shares on the SUSE server, and reading from and writing to the mounts works with no issues... except when I copy certain XML files generated by our ERP system.
The strange thing is that I can copy most of the XML files without any issue; only certain ones hang. Worse, the whole mount then hangs too, i.e. the whole ERP/finance system hangs. The server itself remains fully manageable, but disk I/O operations hang. I can recover the system by forcibly unmounting the NFS share, but the copy then either never finishes or finishes with errors.
I blamed this behaviour on a firewall, but the network guys proved me an idiot: the issue persists even when the server and the appliance are directly connected (no firewall in between).
I even captured the traffic (see attached picture), and it shows the NetApp refusing to service the write request, which ends in endless retransmissions... hence the hang.
The strange thing is that the whole process works, and has worked for a decade now, on the old FAS2020 appliance.
Both appliances are accessed via NFSv3, so there is no difference there.
I would be grateful for any help at this point!
Oh, and I forgot to mention an experiment I performed.
I tried zipping the XML files and then copying the archive onto the NFS share... that works with no problem.
Unzipping the files directly to the new destination... hangs!
I guess the V-series appliance just hates the content of my XML files.
OK, I'd try a binary search next: split the file in half with a text editor, then try putting each half on the share. If one half fails, split that half in half and try again. Repeat until you find out whether there is a string that can magically make ONTAP refuse to write a file, then report back here.
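If splitting by hand gets tedious, the same bisection can be scripted. The following is a minimal sketch, not a tested tool. `bisect_trigger` and `writes_ok` are hypothetical names: `writes_ok` is a test you would supply yourself that writes the given bytes to a scratch file on the share and returns True if the write completes. Since a failing write can hang the whole mount, as described above, it is safer to point that test at a separate test export rather than the production ERP mount. The `min_size` cut-off is also an arbitrary choice.

```python
from typing import Callable


def bisect_trigger(data: bytes,
                   writes_ok: Callable[[bytes], bool],
                   min_size: int = 64) -> bytes:
    """Narrow `data` down to a small slice that still fails to write.

    `writes_ok` is a caller-supplied (hypothetical) test that returns True
    when writing the given bytes to the share succeeds.
    """
    if writes_ok(data):
        raise ValueError("the full file writes fine; nothing to bisect")
    while len(data) > min_size:
        mid = len(data) // 2
        first, second = data[:mid], data[mid:]
        if not writes_ok(first):
            data = first          # trigger is somewhere in the first half
        elif not writes_ok(second):
            data = second         # trigger is somewhere in the second half
        else:
            # Both halves write fine on their own: the trigger straddles
            # the split point (or needs both halves), so stop narrowing here.
            break
    return data


if __name__ == "__main__":
    # Offline demo with a fake test that "fails" whenever a marker string
    # is present, standing in for a real write to the NFS share.
    sample = b"<row>" * 300 + b"<MAGIC/>" + b"<row>" * 1700
    culprit = bisect_trigger(sample, lambda chunk: b"<MAGIC/>" not in chunk)
    print(len(culprit), culprit)
```

Each round halves the remaining data, so even a multi-megabyte XML file needs only a few dozen trial writes. If the loop stops early because both halves succeed, the bytes around the split point are the ones to look at.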
The packet trace was captured with tcpdump on the source server.
Is there a way to take a packet trace on the V-series appliance itself? If not, I would have to set up a network tap and sniff there. That is doable, but I would have to engage my network guys... after blaming their devices for interfering and being proven wrong.
So, I see two options. One is that this is a new bug we haven't found yet, in which case you'll need to open a case to get it fixed. The other is to upgrade and see whether the issue is resolved. I'd open a case either way, so we can either identify the specific bug or file a new one. Support may ask for debug sktrace logs. Something is broken in ONTAP here.
Please reply with the case number once it's opened, and I can follow up internally. Also, please provide both packet traces.