ONTAP Simulator Fixes - MEGA Thread

Hi All,

After running the simulator for many years I have experienced a wide range of scenarios that required intervention. Many of the scenarios in this mega thread are based on a mix of my own experience, NetApp PS engineers' knowledge and other community postings... oh, and of course the instruction manuals! The goal of this thread is to bring it all together and show you the best practices, which will in many cases prevent frustration further down the line. I hope you find this useful - I've tried to credit the sources as best I can remember.

I would recommend that anyone using the simulator follow sections 1 and 2 as a best practice.

A few details about my setup:

  1. VMware ESXi 5.5
  2. ONTAP Simulator cDOT 8.2.1 (2 nodes)

Assumptions:

I assume that you have already uploaded the relevant images to your datastores by extracting the .tar file from the support.netapp.com link and using the VMware datastore browser - yes, it's like watching paint dry, and the "upload minutes left" may as well be a random number generator.

Mega Thread Contents - ensure you do not boot your images until 1.3 - correctly configuring the second node.

  1. Installation
    1. Getting the ESX 5.5 environment to see the VMDK disks (Multi-extent module)
    2. Add the missing serial ports to node1 and node2
    3. Correctly configuring the second node / Fixing an incorrect serial number
  2. Prevent the root volume and aggregate from filling up (mroot)
  3. Fixing a full root volume or aggregate
    1. Unlocking the diag user
  4. Adding and Removing additional disks to the simulator
  5. Miscellaneous Errors
    1. Unable to find a slot for PCI bridge #0. Remove devices occupying the primary PCI bus from the virtual machine

1.1  Getting the ESX 5.5 environment to see the VMDK disks (Multi-extent module)

You may notice that when you try to boot a freshly downloaded simulator VM in ESX 5.1 or above you receive an error message similar to this:

VMware ESX cannot find the virtual disk '/vmfs/volumes/DataONTAP-sim.vmdk' or similar. Don't worry! This isn't a problem with the image. ESX 5.1 and above no longer loads the required multiextent module by default (earlier releases did). Grab yourself a copy of your favourite SSH tool and run the below:

To load the vmkernel multiextent module:

          1. Open a console to the ESXi host. For more information, see Using Tech Support Mode in ESXi 4.1 and ESXi 5.x (http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&docTypeID=DT_KB_1_1&externalId=1017910)

          2. Run this command to load the multiextent module:

                      # vmkload_mod multiextent

     3. On the command line of the ESX host, browse to the folder containing your NetApp simulator vmdk disks. (example: cd /vmfs/volumes/<datastorename>/vsim_esx-01/)

     4. Clone each disk, then swap the clone in for the original:

     a. Run one of the following against BOTH DataONTAP-sim.vmdk and DataONTAP.vmdk (I opted for thin disks in my case)

      For a thick vmdk disk:

          # vmkfstools -i VM-name.vmdk <VM-name-new-disk>.vmdk -d zeroedthick

     For a thin vmdk disk:

          # vmkfstools -i VM-name.vmdk <VM-name-new-disk>.vmdk -d thin

     Example: # vmkfstools -i DataONTAP-sim.vmdk DataONTAP-sim-fixed.vmdk -d thin

          # vmkfstools -i DataONTAP.vmdk DataONTAP-fixed.vmdk -d thin

     b. Delete the original disks once the clones have completed successfully (we won't be needing the original vmdk disks now):

                    # vmkfstools -U DataONTAP-sim.vmdk

                    # vmkfstools -U DataONTAP.vmdk

     c. Rename each cloned disk to the original disk name (the simulator will expect the original filenames):

                    # vmkfstools -E DataONTAP-sim-fixed.vmdk DataONTAP-sim.vmdk

                    # vmkfstools -E DataONTAP-fixed.vmdk DataONTAP.vmdk

     5. Run this command to unload the multiextent module:

                    # vmkload_mod -u multiextent

That's it! At this point you will have the disks correctly formatted for ESX 5.1 and above. DON'T START YOUR SIMULATORS YET!!! Go to step 1.2.
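
For anyone who would rather review the whole sequence before typing it, here is a minimal Python sketch that only builds the list of commands from section 1.1 as text. It does not run anything on the host. The function name vmdk_conversion_plan and the "-fixed" suffix for the temporary clone are my own choices (the suffix mirrors the example above), and it handles one disk at a time rather than cloning both first, which should be equivalent.

```python
def vmdk_conversion_plan(disks, disk_format="thin"):
    """Return the ESXi shell commands from section 1.1 as a list of strings.

    Nothing is executed here; the list is meant to be reviewed and then
    pasted into an SSH session on the ESXi host.
    """
    if disk_format not in ("thin", "zeroedthick"):
        raise ValueError("disk_format must be 'thin' or 'zeroedthick'")

    plan = ["vmkload_mod multiextent"]  # load the module before touching the disks
    for disk in disks:
        if not disk.endswith(".vmdk"):
            raise ValueError(f"expected a .vmdk descriptor name, got {disk!r}")
        stem = disk[: -len(".vmdk")]
        fixed = f"{stem}-fixed.vmdk"  # temporary clone name (illustrative choice)
        plan.append(f"vmkfstools -i {disk} {fixed} -d {disk_format}")  # clone
        plan.append(f"vmkfstools -U {disk}")  # delete the original
        plan.append(f"vmkfstools -E {fixed} {disk}")  # rename the clone back
    plan.append("vmkload_mod -u multiextent")  # unload the module when finished
    return plan


if __name__ == "__main__":
    for command in vmdk_conversion_plan(["DataONTAP-sim.vmdk", "DataONTAP.vmdk"]):
        print(command)
```

Run it on your workstation, check the printed commands against the steps above, and only then paste them into the ESXi console from the simulator folder.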

1.2 - Add the missing serial ports to node1 and node2

I noticed that the installation manuals mention configuring the serial ports on the VMs. I looked once, then twice... It appears that they are not actually in the VM configuration on the ESX image (right click the VM and click edit settings). Again, an easy fix:

There are a total of two serial ports to be added to each simulator:

Adding the first console serial port: (console)

  1. On the hardware tab click 'Add...'
  2. Select 'Serial Port'
  3. Select 'Use Named Pipe' and specify the following: \\.\pipe\vsim-cm-cons  (replace vsim-cm with the name of the simulator folder for your environment)
  4. Select 'Server' for Near End setting
  5. Select 'A Process' for Far End setting (I believe this is application in earlier versions of ESX)
  6. Click finish


Adding the second serial port: (gdb)

  1. On the hardware tab click 'Add...'
  2. Select 'Serial Port'
  3. Select 'Use Named Pipe' and specify the following: \\.\pipe\vsim-cm-gdb  (replace vsim-cm with the name of the simulator folder for your environment)
  4. Select 'Server' for Near End setting
  5. Select 'A Process' for Far End setting (I believe this is application in earlier versions of ESX)
  6. Click finish

1.3 Correctly configuring the second node / Fixing an incorrect serial number


This is an important step if you are going to use a second node. If you are going to use a single node then you can skip this section. I have also used this to fix the serial number on nodes that don't seem to match the licence files provided on the support.netapp.com site.

  1. Open a console window to your second node and power it on. Press the space bar when you see the message "Hit [Enter] to boot immediately, or any other key for the command prompt. Booting in 10 seconds..."
  2. You should see the VLOADER prompt. Type the following to change the System ID and serial number for the node (I have used the default second node serial number - check your downloaded licences.txt file):
    1. setenv SYS_SERIAL_NUM 4034389-06-2
    2. setenv bootarg.nvram.sysid 4034389062
    3. boot
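
In the example above the sysid is simply the serial number with the hyphens removed (4034389-06-2 becomes 4034389062). Here is a small Python sketch that builds the three VLOADER lines on that assumption. I have only checked the pattern against this one example, so treat it as an assumption and compare the result with your licences.txt file. The function names are made up for illustration.

```python
def sysid_from_serial(serial):
    """Derive a sysid by stripping hyphens from the serial number.

    Assumption: this mirrors the single example in section 1.3
    (4034389-06-2 -> 4034389062); it is not a documented rule.
    """
    digits = serial.replace("-", "")
    if not digits.isdigit():
        raise ValueError(f"unexpected characters in serial number {serial!r}")
    return digits


def vloader_commands(serial):
    """Return the lines to type at the VLOADER prompt for this serial number."""
    return [
        f"setenv SYS_SERIAL_NUM {serial}",
        f"setenv bootarg.nvram.sysid {sysid_from_serial(serial)}",
        "boot",
    ]


if __name__ == "__main__":
    for line in vloader_commands("4034389-06-2"):
        print(line)
```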

2. Prevent the root volume and aggregate from filling up (mroot)

Sooner or later in your simulator's lifetime the root volume (mroot) is likely to fill up; I have run into this many times. Symptoms range from warning messages like the one below to the root volume going offline:

The root volume (/mroot) is dangerously low on space (<10MB). To make space available, delete old Snapshot copies, delete unneeded files, and/or expand the root volume's capacity.  After enough space is made available, reboot this controller.  If needed, contact support personnel for assistance.

A simple best practice can prevent this scenario. You will need to run the following on each node of your simulator.

First, enter the node-level command prompt by typing node run local

  1. Disable root aggregate snapshots and delete any existing snapshots
    1. snap sched -A aggr0 0 0 0
    2. Now check for any existing snapshots on the root aggr with snap list -A aggr0 command
    3. If any snapshots (e.g. hourly.0) exist, delete them with this command: snap delete -A aggr0 snapname
  2. Disable root volume snapshots and delete any existing snapshots
    1. snap sched vol0 0 0 0
    2. Now check for any existing snapshots on the root vol with snap list vol0 command
    3. If any snapshots (e.g. hourly.0) exist, delete them with this command: snap delete vol0 snapname
  3. Next we will enable snapshot autodelete for both the root aggr and root volume just in case a manual snapshot is generated by accident:
    1. snap autodelete -A aggr0 on
    2. snap autodelete vol0 on

At this point you have now prevented snapshots from taking your root volume offline - this was the most common scenario I have encountered when using the simulator and planning upfront will prevent this from happening to you.

3. Fixing a full root volume or aggregate

If you read the above section and thought "That would have been nice to know before my root aggr/vol filled", then fear not, here is how you can fix it. Once it is fixed, I recommend running through the steps in section 2 to prevent this scenario from happening again.

On the affected node(s) run the following commands:


1. Grow the root aggregate by adding 1 disk (node run local aggr add aggr0 1)

2. Grow the root volume to 1g (node run local vol size vol0 1g)

3. Navigate to systemshell for the node (systemshell -node cluster-name-01)

4. Login as Diag user (Read Section 3.1)

5. Navigate to each of the following directories and clear all the log files by running rm *.* in it: /mroot/etc/log and /mroot/etc/log/mlog

You have now successfully emptied all the log files from the root volume and added more capacity to the volume. Reboot your simulator nodes to ensure all services now boot correctly.

3.1 Unlocking the diag user

Perform the following steps to unlock and login as the diag user:

1. Unlock the "diag" user and assign it a password:

       > security login unlock -username diag

       > security login password -username diag

       Please enter a new password: <password>

       Please enter it again: <password>

2. Log in to the system shell using the diag user account:

       > set -privilege advanced

       *> systemshell local

       login: diag

       password: <password>


4 - Adding and Removing additional disks to the simulator

Many people don't know that they can add more disks to the simulator. Here are the instructions taken from the Simulate ONTAP guide:

Limitations:

You can have a maximum of four simulated disk shelves with 14 disk drives per shelf, for a total of 56 drives per simulator.

Each simulated drive is limited to 9 GB. Note: The simulator image comes pre-configured with 28 1 GB disks; 14 each on simulated disk shelves 0 and 1. Simulated disk shelves 2 and 3 are not populated.

You can configure up to a maximum of 220 GB total space for each Simulate ONTAP node.

You can create 64-bit aggregates, but they are limited to a maximum of 9 GB per simulated disk drive.
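
To see how these limits interact, here is a rough Python sketch of the arithmetic. It assumes the 220 GB limit counts the 28 pre-configured 1 GB disks and that disk sizes are their nominal sizes. Neither is spelled out in the guide, so treat the results as an upper bound and leave headroom. The names are made up for illustration.

```python
MAX_DRIVES = 4 * 14   # four simulated shelves of 14 drives each
MAX_TOTAL_GB = 220    # per-node space limit quoted above
MAX_DISK_GB = 9       # largest simulated drive


def max_additional_disks(disk_gb, existing_disks=28, existing_disk_gb=1):
    """Upper bound on how many extra disks of disk_gb GB fit on one node.

    Assumption: the 220 GB limit includes the pre-configured disks.
    """
    if not 0 < disk_gb <= MAX_DISK_GB:
        raise ValueError(f"disk size must be between 1 and {MAX_DISK_GB} GB")
    free_slots = MAX_DRIVES - existing_disks
    free_gb = MAX_TOTAL_GB - existing_disks * existing_disk_gb
    return max(0, min(free_slots, free_gb // disk_gb))


if __name__ == "__main__":
    # Filling every slot with 9 GB drives would need far more than 220 GB.
    print("56 x 9 GB =", MAX_DRIVES * MAX_DISK_GB, "GB")
    for size in (1, 4, 9):
        print(f"{size} GB disks that still fit:", max_additional_disks(size))
```

In other words, the drive count is the binding limit for 1 GB disks, while the 220 GB limit is the binding one for 9 GB disks.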

Adding Disks

1. Unlock the "diag" user and assign it a password:

       > security login unlock -username diag

       > security login password -username diag

       Please enter a new password: <password>

       Please enter it again: <password>

2. Log in to the system shell using the diag user account:

       > set -privilege advanced

       *> systemshell local

       login: diag

       password: <password>

3. Add the directory with the simulator disk tools to the path:

       % setenv PATH "${PATH}:/usr/sbin"

       % echo $PATH

4. Go to the simulated devices directory:

       % cd /sim/dev

       % ls ,disks/

At this point you will see a number of files which represent the simulated disks.  Notice that these files start with "v0." and "v1.". That means the disks are attached to adapters 0 and 1, and if you count the disk files you'll see that there are 14 of them on each adapter. This is similar to the DS14 shelf topology with each shelf attached to its own adapter.

5. Add two more sets of 14 disks to the currently unused adapters 2 and 3:

       % vsim_makedisks -h

       % sudo vsim_makedisks -n 14 -t 23 -a 2

       % sudo vsim_makedisks -n 14 -t 23 -a 3

       % ls ,disks/

The first invocation of the command prints usage information. The remaining two commands tell the simulated disk creation tool to create 14 additional disks ("-n 14") of type 23 ("-t 23") on adapters 2 and 3 ("-a 2" and "-a 3"). As you can see from the output of vsim_makedisks -h, type 23 disks are 1GB disks. You can add a different size and type of disk using the number that corresponds to the disk type.  Note that Data ONTAP 8.1.1 supports simulated disks up to 9GB (type 36 and 37), but make sure you have the space to add such large disks. (I used type 36)

Word of warning - do NOT try to fill the VMDK - there are some overheads involved, so try to keep some breathing space (credit to https://communities.netapp.com/docs/DOC-17354)


6. Now we're done with the system shell. We need to reverse some of the earlier steps and reboot the simulator so that it sees the new disks:

       % exit

       *> security login lock -username diag

       *> system node reboot local

       Warning: Are you sure you want to reboot the node? {y|n}: y

7. After the reboot completes, log back in and take ownership of all the disks. The example below is for a brand new system where all disks except those in the root aggregate are currently unowned.

Substitute the name of the node for <nodename> in the commands below:

       > storage disk show

       > storage disk modify -disk <nodename>:v4.* -owner <nodename>

       14 entries were modified.

       > storage disk modify -disk <nodename>:v5.* -owner <nodename>

       14 entries were modified.

       > storage disk modify -disk <nodename>:v6.* -owner <nodename>

       14 entries were modified.

       > storage disk modify -disk <nodename>:v7.* -owner <nodename>

       14 entries were modified.

       > storage disk show

You should now see 56 disks of 1GB each listed in the simulator. The disks should be listed as already zeroed and ready to use inside an aggregate.

Removing Disks (Credit to: https://communities.netapp.com/docs/DOC-17354)

1. Remove the simulated disk from Data ONTAP by entering the command "disk simpull <disk_name>".  For example:

        user-vsim1> disk simpull v5.32

2. Delete the disk file from the FreeBSD directory “/sim/dev/,disks/”.

  a. Follow the steps to log into the system shell using the diag user account.

  b. From "/sim/dev" enter "ls ,disks/,pulled/" to view all the disks that have been pulled. For example:

        % ls ,disks/,pulled/

         v1.32:NETAPP__:VD-1000MB-FZ-520:14143313:2104448

  c. To remove the disk file that corresponds to Data ONTAP disk "v5.32", subtract 4 from the first number, so that would be "v1.32....". That is the disk file you need to delete. For example:

         % sudo rm ,disks/,pulled/v1.32:NETAPP__:VD-1000MB-FZ-520:14143313:2104448

3. Follow the steps to return to the Data ONTAP prompt.
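
The "subtract 4" rule is easy to get wrong by hand, so here is a small Python sketch of the mapping from a Data ONTAP disk name to the matching file in ,disks/,pulled/. The offset of 4 comes from the steps above; the function names are made up for illustration, and the sketch only works on names you pass in, it does not touch the simulator.

```python
ADAPTER_OFFSET = 4  # Data ONTAP adapter number minus 4 gives the file's adapter number


def disk_file_prefix(ontap_disk):
    """Map a Data ONTAP disk name such as 'v5.32' to its file prefix 'v1.32'."""
    adapter, sep, device = ontap_disk.partition(".")
    if not (adapter.startswith("v") and adapter[1:].isdigit() and sep and device):
        raise ValueError(f"unexpected disk name {ontap_disk!r}")
    file_adapter = int(adapter[1:]) - ADAPTER_OFFSET
    if file_adapter < 0:
        raise ValueError(f"{ontap_disk!r} has no matching simulated disk file")
    return f"v{file_adapter}.{device}"


def matching_pulled_files(ontap_disk, filenames):
    """Pick the pulled disk files that belong to the given Data ONTAP disk."""
    prefix = disk_file_prefix(ontap_disk) + ":"  # the colon stops v1.3 matching v1.32
    return [name for name in filenames if name.startswith(prefix)]


if __name__ == "__main__":
    listing = ["v1.32:NETAPP__:VD-1000MB-FZ-520:14143313:2104448"]
    print(matching_pulled_files("v5.32", listing))
```

Paste the output of ls ,disks/,pulled/ into the listing and double-check the match before running the rm.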

5.0 Miscellaneous Errors

This section details any seemingly random errors I have come across when using the simulators.

5.1 Unable to find a slot for PCI bridge #0. Remove devices occupying the primary PCI bus from the virtual machine

I don't know how I encountered this error, but it was a simple one to fix:

  1. Open the .VMX file for the VM and remove the line pciBridge0.present = "true"
  2. You should now be able to boot the VM
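
If you prefer not to edit the file by hand, here is a minimal Python sketch of the same change applied to the text of a .vmx file. It drops any line whose key is pciBridge0.present and leaves everything else alone. The function name and the sample settings are made up for illustration, the key is compared case-insensitively as a precaution, and you should keep a backup copy of the original .vmx before writing the result back.

```python
def drop_pci_bridge0(vmx_text):
    """Return vmx_text without the pciBridge0.present line."""
    kept = []
    for line in vmx_text.splitlines(keepends=True):
        key = line.split("=", 1)[0].strip().lower()
        if key == "pcibridge0.present":
            continue  # this is the line the fix in 5.1 removes
        kept.append(line)
    return "".join(kept)


if __name__ == "__main__":
    # Hypothetical excerpt of a .vmx file, for demonstration only.
    sample = (
        'displayName = "vsim-example"\n'
        'pciBridge0.present = "true"\n'
        'memsize = "1600"\n'
    )
    print(drop_pci_bridge0(sample), end="")
```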

That's all from me for now - I will continue to add any new scenarios that I encounter - if you got this far thanks for reading, I hope this thread has been useful!


Re: ONTAP Simulator Fixes - MEGA Thread

Great article... was exactly what I needed when I tried to start the cDOT 8.2.1 sim on vSphere 5.5.  However, I couldn't get past the step of deleting the original "DataONTAP-sim.vmdk" file (vmkfstools -U DataONTAP-sim.vmdk).  I kept getting errors that the file was still in use:

/vmfs/volumes/b37cb99b-3bcbb796/vsim_cDOT821_drnode1/vsim_esx-cm # vmkfstools -U DataONTAP-sim.vmdk -v 1

OBJLIB-FILEBE :FileBEUnlink : Failed to unlink the file './DataONTAP-s001.vmdk' : 1048580

DISKLIB-LIB   : Cannot remove extent `./DataONTAP-s001.vmdk': Device or resource busy

DISKLIB-LIB   : Failed to delete disk 'DataONTAP-sim.vmdk' or one of its components: Device or resource busy

Failed to delete virtual disk: Device or resource busy (1048585).

Wondering if you've seen this?

One note: I do copy the .tgz to the datastore (NFS from a NetApp in this case), then I untar it there -- much faster.

(Details: ESXi 5.5.0, 1331820, vsim cm 8.2.1)


Re: ONTAP Simulator Fixes - MEGA Thread

Convert the vmdk before adding the vmx to inventory. 

Re: ONTAP Simulator Fixes - MEGA Thread

Update: I was able to make this work by using the "mv" command (rename) instead of deleting the original DataONTAP-sim.vmdk file.  Found the hint in this posting: https://communities.netapp.com/message/124069#124069   Thanks!

Re: ONTAP Simulator Fixes - MEGA Thread

Great information.  There was one other bit needed to make the serial console work.

At the VLOADER> prompt on each node, set the following options.

set autoboot_delay 20
set comconsole_speed 115200
set console comconsole,vidconsole

I found this in the attached recipe, which I came across long ago. It is for ONTAP 7, but it applies to the cDOT simulator as well.

Re: ONTAP Simulator Fixes - MEGA Thread

Thanks for this post.

Just wondering, is the creation of the serial ports necessary for the cluster to be working? I can't see their purpose, but I'm assuming that the cluster is only "happening" through the network interfaces, which could be a wrong assumption.

Thanks!

Re: ONTAP Simulator Fixes - MEGA Thread

I have an ONTAP 8.3 sim running with no serial ports.  I don't know how they'd even be used when the sim is running on an ESX blade farm.

Re: ONTAP Simulator Fixes - MEGA Thread

Interesting, thanks. I wonder now, are your sim instances attached to a virtual switch with no physical devices attached? Because that was the only way for my nodes to join each other.

Re: ONTAP Simulator Fixes - MEGA Thread

Yes.  We run several hundred VMs, and have three virtual switches in the ESX blade farm for the different subnets the VMs connect to.

Re: ONTAP Simulator Fixes - MEGA Thread

In a big cluster vidconsole is probably going to be easier to manage.  If you have a VSPC deployed that would be ideal.  The basic serial over ethernet functionality would require you to know which host the sim is running on.  If you have DVAdmin (from an Edge deployment) that can do it too.