ONTAP Discussions

ONTAP Select : Cannot create

llyrice
16,154 Views

 

Hello,

 

With ONTAP Select Deploy 2.3 and 2.4, I get an error when creating a cluster: the node cannot be created, and Deploy reports the error shown below.

I tried both the web GUI and the CLI.

 

How can I solve this issue?

 

 

ONTAP Select Deploy event:

 

ClusterNodeCreateFailed: One of the node create operations failed during cluster "CLU_TEST" create.

NodeCreateFailed: Node "CLU_TEST-node1" create failed: Cannot create VM 'CLU_TEST-node1' (errType=InvalidRequest).

 

 

sdotadmin_server.log :

 

2017-06-09 09:16:40,046|DEBUG response body: {"code":56,"details":"Cannot create VM 'CLU_TEST-node1' (errType=InvalidRequest)","type":"VmCreateOvfErr"}
2017-06-09 09:16:40,060|ERROR |client_api_helper.py|183:vm_create| Error: NodeCreateFailed: Node "CLU_TEST-node1" create failed: Cannot create VM 'CLU_TEST-node1' (errType=InvalidRequest).
2017-06-09 09:16:46,431|DEBUG response body: {"code":13,"details":"<vmname> invalid - no vm named 'CLU_TEST-node1' on host 'nte-esx01.mqt02.mqt'","type":"InvalidArg"}
2017-06-09 09:16:46,844|ERROR |cluster_tasks.py|519:_create_nodes| Cluster [CLU_TEST]: one or more of node create tasks failed, initiating rollback
2017-06-09 09:16:46,845|ERROR |cluster_tasks.py|520:_create_nodes| Error: ClusterNodeCreateFailed: One of the node create operations failed during cluster "CLU_TEST" create.
2017-06-09 09:16:46,845|ERROR |cluster_tasks.py|277:create_cluster| Cluster [CLU_TEST]: cluster create failed
2017-06-09 09:16:46,849| INFO |cluster_tasks.py|292:create_cluster| initiating rollback of failed cluster (CLU_TEST)
2017-06-09 09:16:46,856|DEBUG |cluster_tasks.py|331:delete_cluster| Delete cluster "CLU_TEST"
2017-06-09 09:16:46,885| INFO |cluster_tasks.py|421:delete_cluster| Cluster [CLU_TEST]: all nodes in cluster are deleted, deleting cluster db
2017-06-09 09:16:46,886| INFO |cluster_tasks.py|427:delete_cluster| cluster (CLU_TEST) deleted
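
For anyone triaging a log like this one, here is a minimal Python sketch that splits each line into fields and pulls the error "type" out of the JSON response bodies. The line layout is an assumption inferred only from the excerpt above (it is not taken from NetApp documentation), and the names `parse_line` and `error_types` are hypothetical helpers, not part of any Deploy tooling.

```python
import json
import re

# Assumed layout, inferred only from the excerpt above:
#   <timestamp>|<LEVEL> [|<file>|<line>:<function>|] <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\|\s*(?P<level>[A-Z]+)\s*"
    r"(?:\|(?P<file>[^|]+)\|(?P<line>\d+):(?P<func>[^|]+)\|)?\s*(?P<msg>.*)$"
)
BODY_PREFIX = "response body: "

# Two lines copied from the log excerpt in this post.
SAMPLE = """\
2017-06-09 09:16:40,046|DEBUG response body: {"code":56,"details":"Cannot create VM 'CLU_TEST-node1' (errType=InvalidRequest)","type":"VmCreateOvfErr"}
2017-06-09 09:16:46,844|ERROR |cluster_tasks.py|519:_create_nodes| Cluster [CLU_TEST]: one or more of node create tasks failed, initiating rollback
""".splitlines()


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def error_types(lines):
    """Collect the "type" field of every JSON response body found in the lines."""
    types = []
    for entry in filter(None, map(parse_line, lines)):
        if entry["msg"].startswith(BODY_PREFIX):
            body = json.loads(entry["msg"][len(BODY_PREFIX):])
            types.append(body.get("type"))
    return types


if __name__ == "__main__":
    print(error_types(SAMPLE))
    print(parse_line(SAMPLE[1])["func"])
```

Reading the response bodies in order makes it easier to see that the first failure reported here is the VM create (VmCreateOvfErr); the later InvalidArg entry is logged after that, when the VM is looked up by name.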

 

 

 

Regards,

 

Cyrille

1 ACCEPTED SOLUTION

Reg
15,971 Views

As I understand from NetApp, this bug will be fixed in a later release. In the meantime, as I mentioned, you can use the workaround to get the cluster created with a distributed vSwitch.

 

Create a dummy test port group in the standard vSwitch; it should be the only port group on the standard vSwitch, with nothing else. Then create your distributed vSwitch with NICs and port groups as usual and create the cluster. It should work. After the cluster is created, you can delete the dummy test port group from the standard vSwitch.


18 REPLIES

Reg
15,851 Views

Is the vSwitch configured at the ESXi level a standard or a distributed one? If distributed, there is a bug that prevents the deployment, and it fails at this same step. The workaround is to create a dummy VLAN port group in the standard vSwitch, without any NICs or any other port groups, and then retry the deployment.

 

If it is standard, then perhaps there is some other issue that needs to be reviewed.

llyrice
15,808 Views

Hello,

 

It was a distributed vSwitch. I changed my environment to use a supported hardware server. On the new ESX server, with a standard vSwitch, I can create the cluster and deploy the node.

 

Thank you for your answer and for letting me know about this bug, which I will probably run into in my customer's environment.

 

Cyrille

dangreau
14,941 Views

Hi @Reg . 

I think I am hitting the distributed vSwitch incompatibility bug (ONTAP Select node deployment fails whatever cluster size or deployment model I choose). The problem is that all my VMware clusters use only distributed vSwitches, so deploying a real standard vSwitch instead, with real NICs and real port groups, is not an option for us.

I've read that you suggest using a "dummy/fake" standard vSwitch (without any physical NICs) as a workaround, but I don't understand exactly what I need to do. Could you please explain the workaround in a little more detail?

 

Some details:

1. My ONTAP Deploy VM is hosted on a vSphere 6.0 cluster with only distributed vSwitches (2 on each cluster).

2. The ONTAP Select VMs are planned to be hosted on 2 different vSphere 6.0 clusters, again with only distributed vSwitches (2 on each cluster, with 2*40Gb NICs on each DVS in a LACP LAG uplink).

 

Thanks a lot for your help. 

Regards

Reg
14,905 Views

Hi Pierre,

The issue is resolved in Deploy version 2.6. Regarding the workaround that I mentioned, all you have to do is create a standard vSwitch using the create option and accept the default options it gives you, without adding any port groups, VMkernel ports, NICs, etc.

Also, before you create the cluster, are you able to run the networking test "network connectivity-test" from the CLI? I believe this option is also available in the GUI with Deploy version 2.6.

 

regards,

Reg

dangreau
14,873 Views

Hi @Reg

 

I've tried with ONTAP Select/Deploy 2.6 without any better result. The cluster creation fails at the same point with the same message: "... create failed. Reason: Error during configuration of vnics:  InvalidRequestFault."

 

I've checked log files:

1. sdotadmin-server.log does not give me any more information about the cause

 

2. esxadmin.log gives me this:

 

[INFO] [Thu Nov 23 05:43:31 2017] [8815] Configuring Network ...
[INFO] [Thu Nov 23 05:43:31 2017] [8815] Error encountered while creating VM
[INFO] [Thu Nov 23 05:43:31 2017] [8815] [13] InvalidArg: <scsiC:N> invalid - Invalid/missing disk 'scsi0:0'
[ERROR] [Thu Nov 23 05:43:31 2017] [8815] [51] Err: Error during configuration of vnics: InvalidRequestFault
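
When reading an excerpt like this, it can help to find the first coded fault for each process ID, because the line logged at ERROR level is not necessarily the first thing that went wrong. Below is a minimal Python sketch; the line layout, and the reading of the second bracketed number as an error code, are assumptions inferred from the excerpt only, and `first_fault_per_pid` is a hypothetical helper name.

```python
import re

# Assumed layout, inferred from the excerpt:
#   [LEVEL] [timestamp] [pid] [optional numeric code] message
ESX_RE = re.compile(
    r"^\[(?P<level>[A-Z]+)\] \[(?P<ts>[^\]]+)\] \[(?P<pid>\d+)\] "
    r"(?:\[(?P<code>\d+)\] )?(?P<msg>.*)$"
)

# Lines copied from the esxadmin.log excerpt in this post.
SAMPLE = [
    "[INFO] [Thu Nov 23 05:43:31 2017] [8815] Configuring Network ...",
    "[INFO] [Thu Nov 23 05:43:31 2017] [8815] Error encountered while creating VM",
    "[INFO] [Thu Nov 23 05:43:31 2017] [8815] [13] InvalidArg: <scsiC:N> invalid - Invalid/missing disk 'scsi0:0'",
    "[ERROR] [Thu Nov 23 05:43:31 2017] [8815] [51] Err: Error during configuration of vnics: InvalidRequestFault",
]


def first_fault_per_pid(lines):
    """Map each PID to the (code, message) of its first line that carries a code."""
    faults = {}
    for line in lines:
        match = ESX_RE.match(line.strip())
        if not match or match["code"] is None:
            continue
        # setdefault keeps the earliest coded line for each PID.
        faults.setdefault(int(match["pid"]), (int(match["code"]), match["msg"]))
    return faults


if __name__ == "__main__":
    for pid, (code, msg) in first_fault_per_pid(SAMPLE).items():
        print(pid, code, msg)
```

On the four lines above, the earliest coded entry for PID 8815 is the InvalidArg about disk 'scsi0:0', which is written at INFO level just before the vnics ERROR.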

 

 

I've tested with external storage (SAN) disks and with local DAS disks, with the same failure. I've also tested each case (external vs. local DAS) with a 2-node HA cluster and a 1-node cluster (both in evaluation mode), with the same failure.

I've tested on 2 different clusters, both configured with 1 distributed vSwitch and 2 distributed port groups (one for external and one for internal), as well as the same 2*40Gb LACP LAG uplink.

I've also run an "ONTAP network connectivity check" (in quick mode), which succeeded.

 

 

All these failures may be related to my VMware infrastructure, but as I'm time-constrained, I don't have much time to troubleshoot with NetApp. Also, I do not want to make any modifications to my VMware cluster, as I think it is up to the ONTAP Select appliance to adapt transparently to the underlying VMware hosting infrastructure, not the opposite.

 

So I will give up on ONTAP Select and use a more robust and mature software solution (such as a much more classical Linux box with SMB + NFS + DRBD + Pacemaker). I will lose some interesting features (especially SnapMirror), but I will save a lot of time (as well as money), and I can work around the missing features with technical alternatives.

 

Thanks anyway for taking time to help me. 

 

Regards, Pierre.

Reg
14,857 Views

Hi Pierre,

Perhaps I could log on myself and assist you. Is sdotadmin-server.log reporting the same errors as before, when you initially raised the case? Also, are you able to provide the commands that you executed for the local DAS and for the external disk array, and screenshots of the local DAS datastores and external array datastores?

 

I guess the problem is that it cannot recognise the disk, which could be an issue with the configuration or with the commands you ran.

 

ONTAP Select is good, depending on what you want to achieve and which feature sets you want. If it is normal file sharing, snapshots and replication, there are other SDS offerings from different vendors too, each with their own pros and cons.

 

regards,

Reg

dangreau
14,847 Views

@Reg

"Is the sdotadmin-server.log reporting the same errors as previous when you initially raised the case?"

==> Yes

 

 

"Also, are you able to provide the commands that you have executed for local DAS and for external disk array? And screenshots of the local DAS datastores and external array datastores."

==> Sorry, but I don't have any screenshots, and no commands either, as I used the ONTAP Deploy GUI to create the cluster.

But I can send you an AutoSupport if you want to investigate a little further.

 

 

I'm sure ONTAP Select is good and works as expected in some shops. The problem is that it seems incompatible with my technical environment. We have owned many NetApp filers for a very long time and have a strong, long-standing relationship with NetApp, and I know that it will probably take much more time to make ONTAP Select work in our technical environment than to use another solution.

It is a very pragmatic approach, justified by the time constraints of the project I'm working on.

 

Regards. Pierre. 

 

Reg
14,780 Views

OK, understood. Send me the AutoSupport and a screenshot of the datastore or disk that you are pointing scsi0:0 to for use as the storage pool. In the vSphere client: Configuration --> Storage --> Datastore.

 

Show me the output of the following commands from the Deploy command line:

host show-all

license show-all

cluster show-all

 

Assuming you are setting up a multi-node cluster (a 2-node cluster), set up the cluster using the command line. Do you have the option to use local storage within the ESXi hosts? If so, that will be easy, as external storage is only supported for single-node clusters.

 

Here is an example of the commands. Follow the exact syntax. Note the double \ in the username. I've used FQDN DNS names, so you will have to create DNS entries for the names below:

OTSnode1.com.org and OTSnode2.com.org are your ESXi physical hosts

OTScluster01 --> OTS cluster name

otsclusn01 --> OTS VM residing on ESXi node1

otsclusn02 --> OTS VM residing on ESXi node2

 

host add --host-id OTSnode1.com.org --username <domainname\\domainusername> --password Pass@#123 --vcenter vCenter.com.org

host add --host-id OTSnode2.com.org --username <domainname\\domainusername> --password Pass@#123 --vcenter vCenter.com.org

 

 

Run the host show-all command after each of the commands above to verify that the entry has been created. The status should show as authenticated.

 

If you are using eval licenses and want to configure a standard OTS configuration that uses 4 vCPUs and 16 GB of RAM, use the small instance type, as selected below. Otherwise, for premium, which is 8 vCPUs and 64 GB, a license is perhaps required and the instance type will be medium.

Replace Location1 with the site under which the ESXi hosts are registered in vCenter. The storage pool name is the name of the datastore as you see it in vCenter. Replace the internal and external network names with the port groups you have created on the distributed vSwitch. The OTS management network is usually the external network, unless you have created a separate one for it.

 

host configure --host-id OTSnode1.com.org --location Location1 --storage-pool DSotsnode01 --eval --internal-network ONTAP-Internal --management-network ONTAP-External --data-network ONTAP-External --instance-type small

host configure --host-id OTSnode2.com.org --location Location1 --storage-pool DSotsnode02 --eval --internal-network ONTAP-Internal --management-network ONTAP-External --data-network ONTAP-External --instance-type small
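
Since the two host configure lines differ only in the host name and the storage pool, one way to avoid copy-paste typos is to generate them from a small table. The Python sketch below is only a string builder: it uses exactly the options shown in the commands above and does not talk to Deploy at all. `host_configure_cmd` is a hypothetical helper, and the default values are the placeholder names from this post.

```python
def host_configure_cmd(host_id, storage_pool, location="Location1",
                       internal_net="ONTAP-Internal",
                       external_net="ONTAP-External",
                       instance_type="small", evaluation=True):
    """Build one 'host configure' line using only the options shown in this post."""
    parts = ["host", "configure",
             "--host-id", host_id,
             "--location", location,
             "--storage-pool", storage_pool]
    if evaluation:
        parts.append("--eval")
    parts += ["--internal-network", internal_net,
              # As in the post, management and data share the external port group.
              "--management-network", external_net,
              "--data-network", external_net,
              "--instance-type", instance_type]
    return " ".join(parts)


# Host name -> datastore used as the storage pool (placeholder names from the post).
HOSTS = {
    "OTSnode1.com.org": "DSotsnode01",
    "OTSnode2.com.org": "DSotsnode02",
}

if __name__ == "__main__":
    for host_id, pool in HOSTS.items():
        print(host_configure_cmd(host_id, pool))
```

The printed lines can then be pasted into the Deploy CLI one at a time, with host show-all run after each.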

 

 

Run the host show-all command after each of the commands above to verify the entry. The status should change from authenticated to configured.

You can also run the license show-all command to verify the output.

 

Run the network connectivity check before running the cluster create.

 

Finally, run the cluster create command as below, which will install ONTAP 9.2:

 

cluster create --name OTScluster01 --admin-password netapp123 --cluster-mgmt-ip 10.118.111.103 --netmask 255.255.255.224 --gateway 10.118.111.97 --ontap-image-version 9.2 --node-names otsclusn01 otsclusn02 --node-mgmt-ips 10.118.111.101 10.118.111.102 --node-hosts OTSnode1.com.org OTSnode2.com.org --node-mirrors otsclusn02 otsclusn01
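
Before running a cluster create like the one above, it may be worth checking that the cluster management IP and the node management IPs all fall inside the subnet defined by the gateway and netmask. A minimal sketch using Python's standard ipaddress module follows; the addresses are the example values from the command, and `addresses_outside_subnet` is a hypothetical helper name.

```python
import ipaddress


def addresses_outside_subnet(netmask, gateway, addresses):
    """Return the subnet implied by gateway/netmask and the addresses not in it."""
    # strict=False lets a host address (the gateway) stand in for the network address.
    network = ipaddress.ip_network(f"{gateway}/{netmask}", strict=False)
    outside = [a for a in addresses if ipaddress.ip_address(a) not in network]
    return network, outside


if __name__ == "__main__":
    network, outside = addresses_outside_subnet(
        "255.255.255.224",
        "10.118.111.97",
        ["10.118.111.103", "10.118.111.101", "10.118.111.102"],
    )
    print(network)   # 10.118.111.96/27
    print(outside)   # []
```

An empty list means every address is on the same /27 as the gateway, as is the case for the example values.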

 

Run the cluster show-all command to verify the status. The same status can be checked through the GUI.

 

 

 

dangreau
14,036 Views

Thanks a lot @Reg I will try this and send you all the information. 

 

Before I start the test, I notice some differences:

  • First, I've only tested ONTAP Select deployment with the ONTAP Deploy GUI. I've never tested with the CLI.
  • Second, in the CLI commands you provided, I don't see any disk space allocated for the ONTAP Select nodes. When I tested in the GUI, I always provided a "storage pool capacity" limit on the "add host" screen (the screen where I select the host and instance type).
  • Last, the last time I tested a GUI deployment, I used image version 9.3 P1.

 

Last point: I will only run the requested test with a single-node cluster on external storage, as I've used my local DAS datastore for another purpose (my open-source NFS HA box).

 

Regards. 

 

dangreau
14,027 Views

I've tried to create a single-node eval cluster with the CLI, but the result is exactly the same as with the GUI: the cluster creation fails.  😞

 

You will find all the screenshots and the requested information in your inbox.

 

Regards. 

 

Pierre. 

dangreau
13,984 Views

Well, here is a short status update on my problem after investigations with @Reg (thanks a lot to him).

 

For the moment it seems simply impossible to make ONTAP Select work in my environment 😞 Each time the result is the same: node deployment fails with the same message: "Node create failed. Reason: Error during configuration of vnics: InvalidRequestFault"

 

Each time, esxadmin.log shows the following messages:

[INFO] [Tue Nov 21 08:53:32 2017] [8377] Error encountered while creating VM
[INFO] [Tue Nov 21 08:53:32 2017] [8377] [13] InvalidArg: <scsiC:N> invalid - Invalid/missing disk 'scsi0:0'
[ERROR] [Tue Nov 21 08:53:32 2017] [8377] [51] Err: Error during configuration of vnics: InvalidRequestFault
[INFO] [Tue Nov 21 08:54:16 2017] [8391] Marked system disks independent-persistent
[INFO] [Tue Nov 21 08:54:16 2017] [8391] Configuring Network ...
[INFO] [Tue Nov 21 08:54:16 2017] [8391] Error encountered while creating VM
[INFO] [Tue Nov 21 08:54:17 2017] [8391] [13] InvalidArg: <scsiC:N> invalid - Invalid/missing disk 'scsi0:0'

 

And each time, the VM (ONTAP Select node) seems to be deployed in ESX (at least, it exists):

[Attached screenshot: DeployedVM.PNG]

 

 

What we have tested so far:

  • We have tested ONTAP Deploy 2.5 and 2.6 without any improvement
  • We have tested 2-node HA and single-node creation without any improvement
  • We have tested with a local datastore and a SAN datastore without any improvement
  • We have tested from the GUI and from the CLI without any improvement

 

Our environment is :

  • vsphere version = 6.0.0 build 4510822
  • Each tested host has 2 distributed vSwitches: one with 2*40G uplinks and one with 2*10G uplinks. Both sets of uplinks are configured in LACP mode
  • The ONTAP Select networks use 2 distributed port groups: one for the external/management network and one for the internal network (only used for the HA pair). Both distributed port groups come from the same distributed vSwitch (the one using the 2*40G LAG)
  • The local datastore tested comes from 2 local disks in a RAID 1 logical volume managed by a Smart Array P840ar
  • The SAN datastore tested comes from an external LUN accessed over Fibre Channel

 

Has anyone got a new idea I could try before I definitively give up on ONTAP Select?

 

Thanks.

 

Pierre.

 

Reg
13,968 Views

Hi Pierre,

You have probably checked the compatibility matrix for the hardware you have. I don't think it should be an issue.

 

Just to eliminate any problems...

 

a) Have you done any tests with the storage pool on local datastores (rather than the Pure FC storage), for both single-node and multi-node? External datastores are only supported for single-node.

b) Are you able to test with a standard vSwitch?

c) If there are other VMs on the same ESXi server, are you able to test a multi-node cluster with any VMs on it, both nodes with local datastores for the storage pool?

 

Before you attempt any of these: if you see an undeleted cluster, forcefully delete it using the CLI -force option.

 

regards,

Reg

Reg
13,968 Views

Also, deregister any host entries after the forced cluster delete...

dangreau
13,953 Views

Reg, 

 

you have checked the compatibility matrix of the hardware you have.

==> Our hardware is fully supported by VMware, HPE (we're using DL360G9 with a fully supported SPP), Pure Storage (for the SAN datastore) and Cisco (we're using MDS 9148S SAN switches).

I should also point out that our VMware infrastructure is a production one, hosting production VMs (as well as some test ones), so I cannot change anything on it (and I would not want to, as I consider that it is up to the appliance to adapt to our infrastructure, not the opposite).

 


Have you done any tests with the storage pool being on local datastores

==> Yes, as I indicated before, I've tested an ONTAP Select single-node eval cluster deployment (with ONTAP Deploy 2.5 and 2.6) on a local datastore (provided by a Smart Array P840ar RAID 1 logical volume), with exactly the same failure (and error message).

 

 

Are you able to test with a standard vSwitch?

==> No, this is impossible, as our VMware design is based only on DVS and DPG. We have no usable standard switch.

 

 

Also, deregister any host entries post force cluster delete...

 ==> As you can see in the screenshots I sent you by email, I always start cluster creation from scratch (without any cluster or host configured).

 

 

Regards. 

 

Pierre. 

Reg
11,339 Views

Hi Pierre,

Sorry, I can't think of anything else. I'm not sure whether the eval license has something to do with it.

 

Your errors say that it can't find the disk to create the VMs on, which is the storage pool name you specified in the host configure command.

Instead of the Pure Storage storage pool that you specified in host configure, do you have the command output and logs for a storage pool on the local ESXi disk group? Does that also fail with the same log messages about not being able to find the disk? Sorry to reiterate, but I just wanted to double-check...

 

regards,

Reg

dangreau
11,306 Views

Hi Reg,

 

I've done a new test: an ONTAP Select single-node eval cluster created on a local DAS datastore (2 local SAS drives in a RAID 1 logical volume managed by a Smart Array P840ar), and the result is exactly the same as with a SAN datastore (same error message).

 

Extract of esxadmin.log:

[INFO] [Mon Nov 27 09:02:17 2017] [8833] Marked system disks independent-persistent
[INFO] [Mon Nov 27 09:02:17 2017] [8833] Configuring Network ...
[INFO] [Mon Nov 27 09:02:17 2017] [8833] Error encountered while creating VM
[INFO] [Mon Nov 27 09:02:18 2017] [8833] [13] InvalidArg: <scsiC:N> invalid - Invalid/missing disk 'scsi0:0'
[ERROR] [Mon Nov 27 09:02:18 2017] [8833] [51] Err: Error during configuration of vnics: InvalidRequestFault
[ERROR] [Mon Nov 27 09:02:39 2017] [8825] [13] InvalidArg: <scsiC:N> invalid - Invalid/missing disk 'scsi0:0'

 

 

You will find all the CLI output and logs in your inbox.

 

Regards

 

Pierre.

dangreau
11,278 Views

As this thread is marked "solved" and my problem is not, I have created a new thread here: https://community.netapp.com/t5/Data-ONTAP-Discussions/ONTAP-Select-2-6-cluster-create-failed-Invalid-missing-disk-scsi0-0/m-p/136295#M30012

 

 

==> So I won't post on this thread anymore (except to indicate the solution, if one is found!).

 

 
