We recently switched from iSCSI to NPIV on our virtual (VMware) Exchange 2003 server. Everything works perfectly so far, except that the RDMs created by SnapDrive are physical. This is a problem since it's not possible to VMotion a VM with physical RDMs. We are running 2 ESX vSphere servers in a cluster. One is hosting the Exchange 2003 VM. I need to reboot this ESX host, but I can't, because the Exchange VM can't be VMotioned off it due to the physical RDMs.
OK, I checked, and one of the ESX hosts can't see the 2 LUNs that are connected to the Exchange VM. I wonder how I can get this working, because looking at FilerView, I can see that SnapDrive has automatically created an initiator group for the Exchange VM:
Ironically, I find it frustrating that this standard isn't used on all platforms. We are in the process of converting our SQL cluster to use one igroup per node. Apparently, the Microsoft team didn't check with the VMware team on making best practices consistent. Grrr...
I checked the doc on page 109 and yes, it makes sense to make an igroup for each ESX cluster.
We have an igroup for each ESX server. This is not really a problem as long as we assign the same LUN ID to the LUNs. The thing I don't understand is: how can the ESX server that is currently hosting the VM see the LUNs while the other ESX server can't?
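For anyone running one igroup per ESX server, here is a minimal sketch of how the "same LUN ID in every igroup" rule could be checked. It assumes the mappings have already been collected by hand into a dictionary. The igroup names, LUN paths and IDs below are made up for illustration and do not come from this setup.

```python
def inconsistent_lun_ids(mappings):
    """Find LUNs whose LUN ID differs between igroups.

    mappings: {igroup_name: {lun_path: lun_id}}
    Returns {lun_path: {igroup_name: lun_id}} for the LUNs that disagree.
    """
    by_lun = {}
    for igroup, luns in mappings.items():
        for path, lun_id in luns.items():
            by_lun.setdefault(path, {})[igroup] = lun_id
    # A LUN is consistent when every igroup maps it with the same ID.
    return {path: ids for path, ids in by_lun.items()
            if len(set(ids.values())) > 1}


# Hypothetical example data (names and IDs are assumptions).
example = {
    "esx1": {"/vol/exch/db.lun": 0, "/vol/exch/log.lun": 1},
    "esx2": {"/vol/exch/db.lun": 0, "/vol/exch/log.lun": 2},
}

if __name__ == "__main__":
    for path, ids in inconsistent_lun_ids(example).items():
        print(path, ids)
```

With the example data only the log LUN is reported, because esx1 and esx2 map it with different IDs.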
I checked zoning and there is nothing that would keep the ESX host from seeing it. OK, so I checked the LUN mappings in FilerView and it is only mapped to the WWN generated by ESX for the VM. There is no initiator group mapped to this LUN for either ESX host. I guess adding both ESX servers breaks NPIV?
OK, the problem is now solved. To me it seems that SnapDrive can't really do NPIV without manually creating the correct igroups beforehand. When I installed SnapDrive, it only created 2 igroups with the ESX WWPNs on the filer. I would have expected it to create the igroups with the WWPNs of the V-PORT that was generated when activating NPIV for the VM. This was not the case. Since the LUNs were now mapped to only the WWPNs of one of the ESX hosts in the cluster, they could not be taken over by the other ESX host. I have now manually created the igroup with all the WWPNs of the ESX hosts and the generated V-PORT WWPNs of the VM. I mapped the igroup to the LUNs and powered on my virtual machine again.
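To illustrate why the second host could not take over, here is a small toy model of LUN masking: an initiator only sees a LUN if one of its WWPNs is a member of an igroup that the LUN is mapped to. This is only a sketch. The WWPNs are taken from the igroup listing in this thread, but which HBA WWPNs belong to which ESX host is a guess based on their suffixes, and the igroup name `esx_a` and the LUN paths are hypothetical.

```python
def visible_luns(initiator_wwpns, igroups, lun_maps):
    """Return the sorted LUN paths that a set of initiator WWPNs can see.

    igroups:  {igroup_name: set of member WWPNs}
    lun_maps: {lun_path: list of igroup names the LUN is mapped to}
    """
    wwpns = {w.lower() for w in initiator_wwpns}
    return sorted(
        lun for lun, groups in lun_maps.items()
        if any(wwpns & igroups[g] for g in groups)
    )


# Assumption: grouping of HBA WWPNs per host is guessed from the suffixes.
ESX_A = {"21:00:00:1b:32:9a:90:29", "21:01:00:1b:32:ba:90:29"}
ESX_B = {"21:00:00:1b:32:9a:42:de", "21:01:00:1b:32:ba:42:de"}
VPORTS = {"28:37:00:0c:29:00:00:16", "28:37:00:0c:29:00:00:17"}

# Hypothetical LUN paths, for illustration only.
LUNS = ["/vol/exch/db.lun", "/vol/exch/log.lun"]

# Before: the LUNs are mapped only to an igroup holding one host's WWPNs.
before_igroups = {"esx_a": ESX_A}
before_maps = {lun: ["esx_a"] for lun in LUNS}

# After: one igroup with both hosts' HBAs and the VM's V-PORT WWPNs.
after_igroups = {"GEDACV2_NPIV": ESX_A | ESX_B | VPORTS}
after_maps = {lun: ["GEDACV2_NPIV"] for lun in LUNS}

if __name__ == "__main__":
    print("ESX B before:", visible_luns(ESX_B, before_igroups, before_maps))
    print("ESX B after: ", visible_luns(ESX_B, after_igroups, after_maps))
```

In this model the second host sees nothing before the change and both LUNs afterwards, which matches the behaviour described above.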
From the filer console I can now see that the V-PORT initiators are logged in:
    GEDACV2_NPIV (FCP):
        OS Type: vmware
        Member: 21:01:00:1b:32:ba:90:29 (logged in on: vtic, 0d)
        Member: 21:00:00:1b:32:9a:90:29 (logged in on: vtic, 0c)
        Member: 21:01:00:1b:32:ba:42:de (logged in on: vtic, 0d)
        Member: 21:00:00:1b:32:9a:42:de (logged in on: 0c, vtic)
        Member: 28:37:00:0c:29:00:00:19 (not logged in)
        Member: 28:37:00:0c:29:00:00:18 (not logged in)
        Member: 28:37:00:0c:29:00:00:17 (logged in on: vtic, 0c)
        Member: 28:37:00:0c:29:00:00:16 (logged in on: vtic, 0d)
        Member: 28:37:00:0c:29:00:00:15 (not logged in)
        ALUA: Yes
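If you want to check this without reading the listing by eye, here is a rough sketch that parses text in the format above and reports which V-PORT WWPNs are logged in. It assumes the listing has been pasted into a string. It also assumes that the VM's V-PORT WWPNs can be recognised by the VMware OUI `00:0c:29` inside the address, as they appear in this listing.

```python
import re

# The igroup listing from this thread, pasted as text.
IGROUP_TEXT = """
GEDACV2_NPIV (FCP):
    OS Type: vmware
    Member: 21:01:00:1b:32:ba:90:29 (logged in on: vtic, 0d)
    Member: 21:00:00:1b:32:9a:90:29 (logged in on: vtic, 0c)
    Member: 21:01:00:1b:32:ba:42:de (logged in on: vtic, 0d)
    Member: 21:00:00:1b:32:9a:42:de (logged in on: 0c, vtic)
    Member: 28:37:00:0c:29:00:00:19 (not logged in)
    Member: 28:37:00:0c:29:00:00:18 (not logged in)
    Member: 28:37:00:0c:29:00:00:17 (logged in on: vtic, 0c)
    Member: 28:37:00:0c:29:00:00:16 (logged in on: vtic, 0d)
    Member: 28:37:00:0c:29:00:00:15 (not logged in)
    ALUA: Yes
"""

# Matches "Member: <8-byte WWPN> (<status text>)"; works whether the
# members are on separate lines or all on one line.
MEMBER_RE = re.compile(
    r"Member:\s*((?:[0-9a-f]{2}:){7}[0-9a-f]{2})\s*\(([^)]*)\)",
    re.IGNORECASE,
)


def parse_members(text):
    """Return {wwpn: True if logged in, else False}."""
    members = {}
    for wwpn, status in MEMBER_RE.findall(text):
        logged_in = status.strip().lower().startswith("logged in")
        members[wwpn.lower()] = logged_in
    return members


def is_vport(wwpn):
    # Assumption: V-PORT WWPNs carry the VMware OUI, as in this listing.
    return "00:0c:29" in wwpn.lower()


def logged_in_vports(text):
    """Return the sorted V-PORT WWPNs that are currently logged in."""
    return sorted(w for w, ok in parse_members(text).items()
                  if ok and is_vport(w))


if __name__ == "__main__":
    for wwpn in logged_in_vports(IGROUP_TEXT):
        print("V-PORT logged in:", wwpn)
```

For the listing above this reports the two V-PORT WWPNs ending in 16 and 17. The other three V-PORT members are in the igroup but not logged in.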
I'm not sure if the OS type is correct but SnapDrive used the same type when automatically creating the initiators.
Yes, it's really hard to find any good documentation on how to set up NPIV. The best documents I found were from Brocade or QLogic. There is nothing in the SnapDrive documentation regarding this. Even in TR-3740 (NetApp VMware Best Practices) there is nothing mentioned about NPIV. I'm really disappointed about this.
I'm still not 100% sure if it is really using NPIV now, because SnapDrive still only shows the WWPNs of the ESX HBAs and not the virtual N-PORT WWPNs of the VM.