Hi experts,

As per https://docs.netapp.com/us-en/ontap-sanhost/nvme_rhel_95.html, ONTAP supports NVMe multipath with the round-robin policy. Is it safe to use? Can data corruption happen when the host detects an error on one path and resubmits I/Os on another path?

Here is what I found when reading the Linux code. When the NVMe host driver detects a command timeout (either an admin or an I/O command), it triggers error recovery and attempts to resubmit the I/Os on another path. Taking the nvme_tcp driver as an example, the nvme_tcp_timeout function is called when any command times out:

nvme_tcp_timeout -> nvme_tcp_error_recovery -> nvme_tcp_error_recovery_work -> nvme_tcp_teardown_io_queues -> nvme_cancel_tagset

nvme_cancel_tagset completes the inflight requests on the failed path and then calls nvme_failover_req to resubmit them on a different path. There is no wait time before the I/O is resubmitted. This means that the controller on the old path may not have fully cleaned up the pending requests, potentially leading to data corruption on the NVMe namespace.

For example, consider the following scenario:

1. The host sends IO1 to path1, but then encounters a timeout for either IO1 or a previous request (e.g., a keep-alive or I/O timeout). This triggers error recovery, and IO1 is retried on path2, which succeeds.
2. After that, the host sends IO2 to the same LBA on path2, which also succeeds.
3. Meanwhile, IO1 on path1 has not been aborted and continues to execute.

Ultimately, the data written by IO2 gets overwritten by the residual IO1, leading to potential data corruption.

I noticed that the NVMe Base Specification 2.1, section "9.6 Communication Loss Handling," provides a good description of this scenario. It introduces the concept of Command Quiesce Time (CQT), which allows a cleanup period for outstanding commands on the controller. Implementing CQT could potentially resolve this issue.
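To make the ordering problem in the scenario above concrete, here is a minimal toy simulation. It is only a sketch and does not model the Linux driver or ONTAP: the names simulate, quiesce_wait and stale_delay are hypothetical, time is counted in abstract ticks, and the quiesce period is modelled simply as a host-side wait that is assumed to be long enough for the old path to finish or drop its pending command.

```python
LBA = 100  # the single logical block both writes target (arbitrary value)


def simulate(quiesce_wait: int, stale_delay: int = 5) -> str:
    """Replay the three-step scenario and return the final content of LBA.

    stale_delay:  tick at which IO1, still pending on path1, finally executes.
    quiesce_wait: ticks the host waits after the timeout before resubmitting
                  on path2 (0 = immediate failover, as described in the post).
    """
    namespace = {}  # toy namespace shared by both paths: lba -> data

    # Each event is (tick, description, data written to LBA).
    events = [
        (stale_delay, "IO1 executes late on path1", "IO1-data"),
        (quiesce_wait + 1, "IO1 retried on path2", "IO1-data"),
        (quiesce_wait + 2, "IO2 written on path2", "IO2-data"),
    ]

    # Apply the writes in time order; sorted() is stable, so ties keep list order.
    for tick, description, data in sorted(events, key=lambda e: e[0]):
        namespace[LBA] = data

    return namespace[LBA]


if __name__ == "__main__":
    # Immediate failover: the stale IO1 lands after IO2 and wins.
    print("no wait      ->", simulate(quiesce_wait=0))
    # Waiting longer than the stale command can survive: IO2 is the last write.
    print("with quiesce ->", simulate(quiesce_wait=10))
```

With quiesce_wait=0 the late IO1 is applied last, so the block ends up holding IO1's data even though the host believes IO2 succeeded. With a wait longer than stale_delay the retry and IO2 are both applied after the old command, so IO2's data survives.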
I have a bucket named xyz that contains the following objects:

a/b/obj1.doc
a/b/obj2.doc
a/b/c/obj3.doc
a/b/c/obj4.doc

After deleting all 4 objects, the folders a, b and c are not deleted; they remain, empty. Why are the folders not deleted when there are no objects left?
We have 4 NetApps, all on version 9.16. These are government systems, so we have no way to upload logs or configs; we'll have to do this the hard way.

3 of the 4 simply will not send syslogs to the internal syslog server. As far as we can tell, the 1 NetApp that is working is configured exactly the same way as the 3 that are not. We have been through several guides and posts on the Google machine and have come up empty on everything we tried.

- We have a working filter, tested with the 'event filter test' command
- We have valid syslog destinations in IP address format (though, just to be 100% sure, we also have DNS configured and working)
- The syslog destinations have the correct filters applied
- We can generate a test event and see it in the ONTAP event log (we've been using monitor.volume.nearfull)
- Using event history show -destination syslog_1 (syslog_1 being our defined destination), we see absolutely nothing
- This is confirmed by a tcpdump on the syslog server itself, which sees no packets
- It's as if the syslog service never gets notified that it needs to send a syslog
- We can ping and traceroute the syslog IP address (and even the DNS name) from the ONTAP CLI

At this point, we're down to a suggestion to log in to the systemshell and reset notifyd. We are, however, pretty nervous about doing so, and since 3 of our 4 devices don't work, it seems like this is not the right thing to be mucking with.

Does anyone have anything on this topic? Syslog configurations are usually pretty darn simple, and ONTAP 9 doesn't really seem to be any different. Is there some obscure option, somewhere, that needs to be enabled?
Hi, in order to facilitate a migration across a large number of clients, I would like to know whether it is possible to mount a volume at different locations at the same time. The volume is mounted through the old path on 100+ clients, and it is not really easy to synchronize the change on all clients at the same time.
Question as above. I'm facing issues joining my SVM to a Windows 2003 AD domain for a CIFS share. I read in another thread that it's possible to do so if my volume is not a FlexGroup volume: https://community.netapp.com/t5/ONTAP-Discussions/Windows-2003-access-CIFS-share-folder/m-p/161382

I would appreciate any help, thanks. @parisi, are you able to advise on how you created a non-FlexGroup volume on NetApp 9.12?