ONTAP Discussions
I attempted a file snap restore this morning to restore a virtual server.
After running the command I went to view the folder contents using the vSphere GUI. It displayed "Searching datastore..." and never returned a listing.
I logged in to a vSphere host through PuTTY and was able to see the file there, so I assumed the restore was still in progress. Eventually other volumes (vSphere and Oracle) started to become inaccessible on the filer; some, but not all, displayed as inactive in vCenter. A co-worker who is the principal administrator failed the filer over and rebooted it.
This fixed the problem until they failed it back, at which point the restore apparently started back up. I deleted the file in the PuTTY session and the symptoms went away.
The volume that file resides on does have a SnapMirror relationship, and a scheduled replication was attempted during the restore. The transfer failed, which I guess is normal, and replication completed after I deleted the file.
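For anyone wondering later, a single-file SnapRestore on 7-Mode is normally run from the filer console along these lines; the volume name, snapshot name and file path here are only placeholders, not the exact ones from my environment:
filer> snap list vmware_vol
(pick the Snapshot copy to restore from)
filer> snap restore -t file -s nightly.0 /vol/vmware_vol/myvm/myvm-flat.vmdk
(add -r /vol/vmware_vol/restore/myvm-flat.vmdk to restore the file to a different path instead of in place)
filer> snapmirror status vmware_vol
(shows whether a scheduled SnapMirror transfer is running against the same volume)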
Should performing a file snap restore cause this? I'm not clear from reading the Data Protection documentation, which I've pasted below.
Prerequisites for using SnapRestore
You must meet certain prerequisites before using SnapRestore.
• SnapRestore must be licensed on your storage system.
• There must be at least one Snapshot copy on the system that you can select to revert.
• The volume to be reverted must be online.
• The volume to be reverted is not being used for data replication.
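As an aside, each of those prerequisites can be checked from the 7-Mode console before you start; "myvol" below is just an example volume name, and the exact output varies by release:
filer> license
(confirms the snaprestore license is installed)
filer> snap list myvol
(lists the Snapshot copies you could revert to)
filer> vol status myvol
(confirms the volume is online)
filer> snapmirror status myvol
(shows whether the volume is currently part of a replication transfer)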
General cautions for using SnapRestore
Before using SnapRestore, ensure that you understand the following facts.
• SnapRestore overwrites all data in the file or volume. After you use SnapRestore to revert to a
selected Snapshot copy, you cannot undo the reversion.
• If you revert to a Snapshot copy created before a SnapMirror Snapshot copy, Data ONTAP can no
longer perform an incremental update of the data using the snapmirror update command.
However, if there is any common Snapshot copy (SnapMirror Snapshot copy or other Snapshot
copy) between the SnapMirror source and SnapMirror destination, then you should use the
snapmirror resync command to resynchronize the SnapMirror relationship.
If there is no common Snapshot copy between the SnapMirror source and SnapMirror destination,
then you should reinitialize the SnapMirror relationship.
• Between the time you enter the snap restore command and the time when reversion is completed,
Data ONTAP stops deleting and creating Snapshot copies.
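Regarding the SnapMirror caution above: once you have reverted past the common SnapMirror Snapshot copy, the choice between resync and reinitialize looks roughly like this when run on the destination filer (the source and destination names are illustrative):
dstfiler> snapmirror resync -S srcfiler:src_vol dst_vol
(re-establishes the relationship from the newest common Snapshot copy, if one still exists)
dstfiler> snapmirror initialize -S srcfiler:src_vol dst_vol
(full baseline transfer, only needed when no common Snapshot copy remains; the destination volume must be restricted first)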
I was hoping someone would have replied to your posting. I am experiencing the same issue. I am on Data ONTAP 7.3.2 and use VSC 2.1.1 to do my backups. I am currently only testing this, as it has not worked properly for me yet. Basically I have three NFS volumes, one for config files and two others for VMDKs. I am able to do a backup perfectly fine, but when I try to restore that snapshot it saturates the whole interface that connects to the SAN and causes the volumes to become disconnected in vSphere. This was for a 25 GB VMDK and it wasn't even finished after 45 minutes. I had to reboot my SAN (FAS2020) in order to recover the volumes and get the restore to stop.
I had to reboot because I couldn't find an actual way to cancel the restore. Does anyone know how to do that? Also, why would this crash my whole connection? There are no requirements indicating it needs its own NIC for the transfers.
The restore shouldn't have any impact on an interface, as a snapshot restore is done entirely internally. Basically, the inode pointers are changed to point to the file blocks as they were at the time of the snapshot, rather than the current file blocks.
Any update on this strange behaviour? I had the same issue with ONTAP 8.1.4P1 (7-Mode) on a FAS6240 (not a little one, as you can see).
It started after an SMO restore operation on a cloned volume for which single file restore was needed/selected. The database has a lot of data files (a very big DB)! People started complaining, especially about CIFS access. When I looked at the stats later, there were latency issues for CIFS and iSCSI (FCP and NFS didn't suffer from it). Strange, because the Oracle environment runs over NFS. Killing the SMO process and later halting the host didn't solve anything. Only after offlining the volume did the controller behave normally again. Unfortunately, when I online it again, the single-file SnapRestore starts again :(.
Why is there such an impact? How can I stop this without destroying my volume?
SMO Log:
...
--[ INFO] SMO-07200: Beginning restore of database "NGDB"
...
-[ INFO] SD-00010: Beginning single file restore of file(s) [/ngdbhome/ngdb/DATA/CTX/Ctx07.dbf, /ngdbhome/ngdb/DATA/CciLob/22/CciLobData22.dbf5, /ngdbhome/ngdb/DATA/CciLob/22/CciLobData22.dbf6, /ngdbhome/ngdb/DATA/CciLob/22/CciLobData22.dbf3,
...
Messages log:
...
Thu Dec 4 16:38:07 CET [NETAPPXX:wafl.sfsr.done:notice]: Single-file snaprestore of inode 26253 (snapid 19, volume Test_CCIv36_NGDB_Data_clone) to inode 9681 has completed.
Thu Dec 4 16:38:07 CET [NETAPPXX:wafl.scan.start:info]: Starting redirect on volume Test_CCIv36_NGDB_Data_clone.
Thu Dec 4 16:42:05 CET [NETAPPXX:cifs.oplock.break.timeout:warning]: CIFS: An oplock break request to station 10.230.128.31() for filer NETAPPXX, share rmgmailarchive01indexes$, file \Indexes02\166741376BDAE5A4F9BFB6D82329C4E7B_5316\live\log.sqlt has timed out.
Thu Dec 4 16:53:08 CET [NETAPPXX:wafl.sfsr.done:notice]: Single-file snaprestore of inode 13360 (snapid 19, volume Test_CCIv36_NGDB_Data_clone) to inode 2037 has completed.
Thu Dec 4 16:53:16 CET [NETAPPXX:wafl.sfsr.done:notice]: Single-file snaprestore of inode 23958 (snapid 19, volume Test_CCIv36_NGDB_Data_clone) to inode 24675 has completed.
...
There was not much disk activity, but the filer was doing lots of WAFL_Ex(Kahu):
CPU NFS CIFS HTTP Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk OTHER FCP iSCSI FCP kB/s iSCSI kB/s
in out read write read write age hit time ty util in out in out
99% 722 5593 0 7793 6263 14818 175412 243976 0 0 0s 99% 68% : 32% 0 1438 40 3194 113031 299 133
99% 1102 5684 0 7760 8756 24136 16296 47 0 0 0s 99% 0% - 14% 0 650 324 3549 350 1936 101
99% 912 5241 0 6804 9035 13149 9500 0 0 0 0s 100% 0% - 9% 0 645 6 4084 237 33 0
99% 1030 3820 0 5412 6953 8243 6785 0 0 0 0s 100% 0% - 9% 0 536 26 5784 500 89 65
99% 3901 4294 0 8571 192951 12448 19609 63 0 0 0s 99% 0% - 19% 5 298 73 2183 181 1681 0
99% 732 4715 0 6133 5367 25579 17283 0 0 0 4 100% 0% - 10% 357 296 33 3735 200 87 178
99% 1184 5176 0 7527 6355 26430 86744 0 0 0 4 100% 0% - 18% 166 950 51 1825 62865 194 65
99% 1169 5427 0 7852 8028 19993 89951 47 0 0 4 100% 0% - 16% 1 1245 10 2806 78600 37 0
99% 2263 5952 0 9059 62261 19515 84106 0 0 0 4 100% 0% - 15% 0 832 12 1521 74958 39 0
98% 5364 4812 0 10627 205358 18857 41395 16 0 0 4 99% 0% - 17% 6 395 50 1609 20012 1524 0
99% 2919 5799 0 9701 17548 26488 192051 269 0 0 4 99% 44% Tn 18% 0 877 106 2021 60060 10643 0
ANY1+ ANY2+ ANY3+ ANY4+ ANY5+ ANY6+ ANY7+ ANY8+ AVG CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 Network Protocol Cluster Storage Raid Target Kahuna WAFL_Ex(Kahu) WAFL_XClean SM_Exempt Cifs Exempt Intr Host Ops/s CP
100% 47% 24% 12% 7% 4% 3% 2% 26% 23% 25% 26% 24% 26% 31% 26% 29% 36% 0% 0% 9% 7% 4% 44% 79%( 55%) 0% 0% 9% 8% 14% 1% 7893 0%
100% 54% 34% 23% 16% 11% 7% 5% 32% 31% 30% 36% 34% 35% 33% 31% 28% 30% 0% 0% 12% 20% 6% 42% 100%( 58%) 7% 0% 12% 15% 14% 1% 7709 21%
100% 95% 78% 59% 42% 30% 20% 14% 56% 51% 50% 56% 61% 49% 53% 51% 78% 34% 0% 0% 22% 85% 6% 25% 161%( 74%) 25% 0% 14% 53% 23% 1% 10204 100%
100% 54% 30% 18% 11% 6% 4% 3% 29% 32% 26% 31% 27% 28% 30% 30% 30% 26% 0% 0% 9% 22% 5% 34% 91%( 65%) 4% 0% 11% 19% 12% 1% 8147 100%
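In case it helps anyone else chasing the same symptoms: the two outputs above look like sysstat -x and sysstat -M, and the single-file snaprestore work itself can be watched from the console while it runs. These are standard 7-Mode commands, but treat the exact output as version-dependent, and the volume name is just the one from my logs:
filer> priv set advanced
filer*> wafl scan status Test_CCIv36_NGDB_Data_clone
(lists the running WAFL scanners, including the redirect scan that starts after each single-file snaprestore completes)
filer*> priv set admin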
Is your datastore accessed via iSCSI, FCP or NFS?
You say you attempted a snap restore; is the snap you're restoring from a standard scheduled volume snapshot?
What command did you use to restore?
Should anyone ever stumble upon this article in a frantic panic trying to get their filer to respond to NFS/CIFS/iSCSI requests while a single-file snap-restore is in progress (this, by the way, is what VSC uses for VM restores by default on NFS, which blows my mind): you can cancel the single-file snap-restore process by deleting the destination file or directory you are restoring to. If you are doing an in-place restore, delete the file or folder you are trying to restore and manually copy the wanted version back out of the volume's ".snapshot" directory.
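To make that concrete: from an NFS client (or the ESXi shell) with the datastore mounted, the cancel-and-copy-back looks something like the following. The datastore, VM and snapshot names are only placeholders, and this assumes the volume's nosnapdir option is off so the .snapshot directory is visible:
# deleting the destination file stops the in-flight single-file snaprestore
rm /vmfs/volumes/nfs_datastore/myvm/myvm-flat.vmdk
# then pull the wanted version back out of the snapshot directory yourself
cp /vmfs/volumes/nfs_datastore/.snapshot/nightly.0/myvm/myvm-flat.vmdk /vmfs/volumes/nfs_datastore/myvm/myvm-flat.vmdk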
We had this same issue. We kicked off a VM restore using VSC, and the filer ground to a halt; VM datastores went offline, as did iSCSI etc., and as such most major applications went offline.
Unfortunately we didn't find this thread while it was happening, so we didn't know to delete the destination folder, and while faffing around trying to find a way to recover we essentially ended up waiting it out, a 4.5-hour outage.
It turns out this KB covers this behaviour: https://kb.netapp.com/support/index?page=content&id=2023372&locale=us
The upshot is, NetApp provide this full VM recovery option in their tool, but in short: don't use it. Do it the long way: mount a snapshot and drag the VM out manually. There's no 'fix' aside from going to clustered ONTAP or using a better backup and recovery product.
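For what it's worth, on an NFS datastore the 'long way' can be as simple as mounting the Snapshot copy read-only as its own datastore and copying the VM folder out through the datastore browser. On a reasonably recent ESXi host that is roughly the following; the filer, volume and snapshot names are placeholders:
esxcli storage nfs add -H filer01 -s /vol/vm_vol/.snapshot/nightly.0 -v vm_vol_snap
# copy the VM folder out of the read-only vm_vol_snap datastore, then clean up:
esxcli storage nfs remove -v vm_vol_snap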