Simulator Discussions

Simulator dies at the weekend

bob_lansley

Yes I know it sounds melodramatic but it's the only way to describe this.

 

For the past couple of weeks I've been playing around with the ONTAP 9 simulator. I upgraded it to 9.1 using option 1 from the boot menu immediately after deploying the 9.0 OVA file.

 

I am running this on a VMware ESXi 5.5 host with 1GbE network connectivity.

 

It works fine all week, but when I come in on Monday it has died, with at least one of the nodes showing [nodename: callhome.root.vol.recovery.reqd:EMERGENCY] Call home for ROOT VOLUME NOT WORKING PROPERLY: RECOVERY REQUIRED

 

At the top of the console there is also a reference to "the system was down for 346 seconds", suggesting it may have rebooted.

 

This happens just after midnight on Sat night / Sun morning.

 

I can't access either host to get to the log files (unless you guys know a way to do that).

 

Any thoughts on whether this is an ONTAP issue or some externally (VMware?) triggered event?

 

I took VMware snapshots on Friday so I can recover, but it's a PITA to have to do this every Monday!

8 REPLIES

NAYABSK

Hi Bob, 

 

Thank you for writing to the NetApp community. This is a known issue with the ONTAP simulator, so the weekend is not really to blame here. The root volume created in the SIM is tiny (around 900 MB), so in your case it may have taken a week before it was flooded with logs and reached 95%. You may want to delete any existing snapshots on vol0 and zero out all the snapshot schedules.

 

Log in to the console and run

 

 

node run local

Disable the vol0 snapshot schedule

 

 

snap sched vol0 0 0 0

Delete all snapshots on vol0

 

snap delete -a vol0

And zero the snap reserve

 

snap reserve vol0 0

Also disable snapshots at the aggregate level

 

snap sched -A aggr0 0 0 0

Delete any snapshots at the aggregate level

 

snap delete -a -A aggr0

Try to grow vol0 using the vol size command, or, if there is not enough space on the aggregate, consider adding a spare disk to the root aggregate aggr0.
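
For example (a rough sketch, not from the original reply; the amount is only illustrative, so first check how much free space df -A reports for the aggregate), from the same nodeshell:

df -A aggr0
vol size vol0 +1g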

 

After this you should not be running out of space on vol0 🙂
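
(A suggested check, not part of the original reply: from the same nodeshell, these should now show plenty of free space and no leftover snapshots.)

df vol0
snap list vol0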

 

 

Thanks,

Nayab

 

 

bob_lansley

Thanks, I'll give that a try!

NAYABSK

Hi Bob, 

 

 

Did you get time to try clearing up some space on vol0 and booting the simulator up again?

 

 

Thanks,

Nayab

bob_lansley

Hi Nayab

 

Yes, thanks, I cleared up the vol0 volumes and modified the snap schedules. Looks OK so far; I'll see how things are after the weekend, as that is when it tended to fail before.

 

Cheers,

 

Bob

NAYABSK

Hey Bob, 

 

 

Thanks for the update. Do let us know if there are any further issues and we will be happy to assist you 🙂. Hope you are having a great time playing around with the ONTAP simulator 🙂

 

By the way, did you know you can also create SSDs on the simulator? If not, please do follow the link below; hope you like experimenting 🙂

 

 

How to create SSD's On NetApp SIM

 

 

Thanks,

Nayab

Attilio

Hello, I've installed sim 9.1 on my VMware 6.0 U2 host too, and after some days I found both of my nodes with the following error message:

 

***********************
**  SYSTEM MESSAGES  **
***********************

The contents of the root volume may have changed and the local management
configuration may be inconsistent and/or the local management databases may be
out of sync with the replicated databases. This node is not fully operational.
Contact support personnel for the root volume recovery procedures. 

 

Space on my vol0 is probably not the problem in my situation:

 

cluster199::> aggr show
                                                                      
Warning: Only local entries can be displayed at this time.


Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
--------- -------- --------- ----- ------- ------ ---------------- ------------
aggr0      36.74GB   33.56GB    9% online       1 cluster199-01    raid_dp,
                                                                   normal

 

cluster199::> aggr show
                                                                      
Warning: Only local entries can be displayed at this time.


Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
--------- -------- --------- ----- ------- ------ ---------------- ------------
aggr0_cluster199_02_0
           36.74GB   33.56GB    9% online       1 cluster199-02    raid_dp,
                                                                   normal

 

*****************

Licenses are missing too...

 

cluster199::system license> show

Error: show failed: No Base License installed in the cluster. Install a Base License, then try the operation
       again.

 

 

Many thanks for your help

 

Regards.

 

Attilio

Attilio

Solved with:

 

  1. To unset the recovery flag and boot normally, start by bringing the system to a halt.
  2. Bring the node to a halt:
    ::*> halt -node <node name>
    (system node halt)
  3. Unset the recovery flag at the loader prompt and boot the node back up.
    LOADER-B*> unsetenv bootarg.init.boot_recovery
    LOADER-B*> boot_ontap

*************************

 

cluster199::> cluster show
Node                  Health  Eligibility
--------------------- ------- ------------
cluster199-01         true    true
cluster199-02         true    true
2 entries were displayed.

 

cluster199::*> cluster ring show
Node      UnitName Epoch    DB Epoch DB Trnxs Master    Online
--------- -------- -------- -------- -------- --------- ---------
cluster199-01
          mgmt     2        2        113      cluster199-01
                                                        master
cluster199-01
          vldb     2        2        1        cluster199-01
                                                        master
cluster199-01
          vifmgr   2        2        60       cluster199-01
                                                        master
cluster199-01
          bcomd    2        2        7        cluster199-01
                                                        master
cluster199-01
          crs      2        2        1        cluster199-01
                                                        master
cluster199-02
          mgmt     2        2        113      cluster199-01
                                                        secondary
cluster199-02
          vldb     2        2        1        cluster199-01
                                                        secondary
cluster199-02
          vifmgr   2        2        60       cluster199-01
                                                        secondary
cluster199-02
          bcomd    2        2        7        cluster199-01
                                                        secondary
cluster199-02
          crs      2        2        1        cluster199-01
                                                        secondary
10 entries were displayed.

 

Regards.

 

Attilio

kushneo

I have read in some articles that this is a very common issue for the NetApp simulator (I'm using NetApp DOT 8.3.2-cm), since it has a very small root volume. As soon as the root volume gets filled with dump files, logs and snapshots, it crashes (I'm not sure that's the correct word though). So when it happens, these are the steps I normally follow to get my simulator up again:
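
(A hedged aside before the steps, not from the original post: this lists the core files that are eating the space and that step 1 will delete.)

>system node coredump show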

 

Step 1

 

>system node coredump delete clus-01 *

 

 

Step 2

And then I follow the steps in this link: http://www.cosonok.com/2014/12/root-volume-not-working-properly.html

(copy & paste the link if it's not redirecting)

 

At this point your sim might start showing the interfaces again; if not, go to step 3.

 

Step 3

Then, at the boot loader:

 

LOADER-B*> unsetenv bootarg.init.boot_recovery
LOADER-B*> boot_ontap

 

Step 4

At the end of the above steps the system message stopped appearing for me, but it still wouldn't show the interfaces, so to bring the interfaces up what I did is:

 

>set diag
>net port mod -node <nodename> -port * -up-admin true
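
(A suggested check, not from the original post: these should then confirm the ports and LIFs are back up.)

>net port show
>net int show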

 

That’s it, it should work now 🙂

 

** I'm still a newbie to NetApp so sometimes my above explanations about this issue might not be accurate 😉

 

 
