We have installed a new 4-node Windows Server 2012 Hyper-V cluster on a UCS system with NetApp storage. All nodes boot from iSCSI, and 2 LUNs are connected over iSCSI as datastores (where the VHDs live). The Data ONTAP version is 8.1.1P1. The LUNs are connected with the built-in Microsoft iSCSI Initiator, without SnapDrive.
When we copy large files across LUNs, or cancel such a copy, the cluster nodes sometimes lose their connectivity to the CSV. As a result, many (sometimes all) of the VMs on the cluster crash. As long as we don't copy big files on the LUNs, we don't have these problems. We have also seen file locks after cancelling a copy.
The boot LUNs for the cluster nodes are 40 GB MBR disks and are configured with the LUN type "Windows". The two datastore LUNs (5 TB SAS and 5 TB SATA) are configured with the LUN type "Windows_GPT".
We are not storage specialists (we focus on Windows), and we did not set up this configuration ourselves. However, we suspect that a wrong LUN type could be causing our problems. In our opinion, the LUN type should be "Windows_2008" in our case, since as far as we know that type also applies to Windows Server 2012.
Can somebody please confirm whether this could be the cause of the problems we are facing?
What else could cause these connection timeouts? The quorum disk and the hosts themselves don't have this problem, even though they are also on iSCSI (the hosts boot from it). The only difference between those LUNs and the datastore LUNs is the LUN type.
What's the technical difference between the LUN types "Windows_GPT" and "Windows_2008"? I have heard that "Hyper_V" is the same as "Windows_2008". The white papers say that I should use "Windows_2008" instead of "Hyper_V" because we don't use SnapDrive.
Did you ever get a resolution to this issue? We are experiencing exactly the same issue in our environment. We use HP servers with local disks for the OS, an HP switch, and Windows 2012 Hyper-V with SnapDrive. We see exactly the same symptoms: the CSV LUNs drop, files get locked, etc. when performing large copies. We are also running Data ONTAP 8.1.
We are also experiencing the same problem. When we make large copies or delete a large VHDX, the CSV hangs and becomes unresponsive. We have 8 nodes in our cluster running on HP blades with Flex-10 Virtual Connect, using MPIO, with 10 Gbps iSCSI adapters in the controller and the latest Data ONTAP.
We are using the Hyper_V LUN type, as directed by a NetApp engineer.
I think you could be hitting any of several issues:
1) Check your CSV redirected I/O bandwidth. Do the problems occur when the CSV goes into redirected I/O mode? Even 2012 still does this occasionally (although no longer during backups).
2) We had our LUNs go offline because of deduplication: some of our old FAS2040s could not handle the dedupe change rate, so we had to turn it off. We could tell the LUNs were about to go offline because DFM alerted us to high CPU utilization.
3) Make sure you have all of the latest hotfixes for Hyper-V; check out this link:
4) Turn off ODX (Offloaded Data Transfer); it could be causing your issues.
5) Verify that iSCSI is configured properly. There is too much to cover here, but check things like: the iSCSI adapters are disabled for cluster communication in Failover Cluster Manager, MPIO is set up properly, and you have the Data ONTAP 8.1.3 upgrade deployed for 2012 (to allow SCSI UNMAP).
I have spent months designing and building a new 2012 cluster (on Fibre Channel/FCoE with 10 Gb CNAs, because iSCSI is horrid), and there are a LOT more complexities compared to 2008; the newness factor is real. There are many little "gotcha" bugs out there, so keep an eye on Aidan Finn's site too.
You want to select the correct LUN type to avoid the alignment problem. A "Windows" type LUN is aligned for Windows 2003, and "Windows_GPT" was added to support GPT volumes, which are aligned slightly differently. Since then, Microsoft has updated the OS (2008 and newer) to respect alignment for both MBR and GPT volumes. So going forward, ALWAYS use "Windows_2008" as the LUN type to stay aligned. When Hyper-V started to grow in popularity, NetApp introduced a new LUN type for Hyper-V, which happens to match the Windows_2008 LUN type. The ONLY difference between the Windows_2008 and Hyper_V LUN types is that our SnapManager for Hyper-V code can quickly identify Hyper-V LUNs and enable VSS backups.
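Alignment comes down to simple offset arithmetic: if a partition does not start on a storage block boundary, each 4 KB block the OS writes straddles two blocks on the controller, so one write can turn into extra back-end I/O. The sketch below assumes a 4 KB storage block (WAFL's block size) and uses the well-known Windows default starting sectors (63 on Windows 2003 MBR, 2048 on 2008 and newer); it does not read anything from a real system and does not model how each NetApp LUN type compensates for the offset.

```python
# Minimal sketch: check whether a partition's starting offset is a whole
# multiple of the storage block size. The 4096-byte block size is an
# assumption (WAFL's 4 KB block); the start sectors are Windows defaults.

SECTOR_SIZE = 512        # bytes per logical sector
WAFL_BLOCK_SIZE = 4096   # bytes per storage block (assumed 4 KB)


def partition_offset(start_sector: int, sector_size: int = SECTOR_SIZE) -> int:
    """Return the partition's starting offset in bytes."""
    return start_sector * sector_size


def is_aligned(offset_bytes: int, block_size: int = WAFL_BLOCK_SIZE) -> bool:
    """A partition is aligned when its offset is a whole multiple of the block size."""
    return offset_bytes % block_size == 0


if __name__ == "__main__":
    # Windows Server 2003 MBR default: first partition starts at sector 63.
    legacy = partition_offset(63)     # 32256 bytes
    # Windows Server 2008 and newer default: first partition starts at 1 MiB.
    modern = partition_offset(2048)   # 1048576 bytes
    for label, offset in (("2003 MBR", legacy), ("2008+", modern)):
        print(f"{label}: offset={offset} aligned={is_aligned(offset)}")
```

The 2003 default offset of 32256 bytes leaves a 3584-byte remainder against a 4 KB block, which is exactly the kind of offset the older LUN types had to account for.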
As for bandwidth: once your LUN types are correct, you should see a significant boost in performance as well as in your deduplication rate. Operations that use clones, such as ODX copies, will also speed up significantly (which is why turning off ODX was suggested above).
So again: if you are using Windows 2008 or newer (even with GPT disks), select the "Windows_2008" LUN type. If the LUN is going to hold Hyper-V VMs, select "Hyper_V", but only because it will let you deploy SnapManager for Hyper-V later; otherwise the LUN could use "Windows_2008" without any alignment problems.
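The rules of thumb above can be written down as a small decision function. This is only a sketch of the advice in this thread, not a NetApp tool: the function name and parameters are hypothetical, and it returns the lowercase ostype names as ONTAP lists them (windows, windows_gpt, windows_2008, hyper_v).

```python
def recommended_lun_type(windows_year: int, gpt: bool = False,
                         hyper_v_with_smhv: bool = False) -> str:
    """Pick a LUN type following the rules of thumb in this thread (hypothetical helper).

    windows_year: Windows Server release year, e.g. 2003, 2008, 2012.
    gpt: True if the disk is partitioned as GPT (only matters before 2008).
    hyper_v_with_smhv: True if the LUN will hold Hyper-V VMs and you want to
        be able to deploy SnapManager for Hyper-V later.
    """
    if windows_year >= 2008:
        # 2008 and newer align MBR and GPT partitions properly,
        # so the partition style does not change the answer.
        return "hyper_v" if hyper_v_with_smhv else "windows_2008"
    if hyper_v_with_smhv:
        raise ValueError("Hyper-V requires Windows Server 2008 or newer")
    # Windows Server 2003: the legacy types differ only by partition style.
    return "windows_gpt" if gpt else "windows"


if __name__ == "__main__":
    # The datastore LUNs from the original question: Windows Server 2012, GPT,
    # no SnapManager for Hyper-V.
    print(recommended_lun_type(2012, gpt=True))  # windows_2008
```

By these rules, the 40 GB MBR boot LUNs from the original question would also map to windows_2008, since the partition style only matters on Windows 2003.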