Simulator Discussions

Simulator dies at the weekend

bob_lansley
15,644 Views

Yes, I know it sounds melodramatic, but it's the only way to describe this.

 

For the past couple of weeks I've been playing around with the ONTAP 9 simulator. I upgraded it to 9.1 using option 1 from the boot menu immediately after deploying the 9.0 OVA file.

 

I am running this on a VMware ESXi 5.5 host with 1GbE network connectivity.

 

It works fine all week, but when I come in on Monday it has died, with at least one of the nodes showing [nodename:callhome.root.vol.recovery.reqd:EMERGENCY] Call home for ROOT VOLUME NOT WORKING PROPERLY: RECOVERY REQUIRED

 

At the top of the console there is also a message that "the system was down for 346 seconds", suggesting it may have rebooted.

 

This happens just after midnight on Saturday night / Sunday morning.

 

I can't access either host to get to the log files (unless you guys know how).

 

Any thoughts on whether this is an ONTAP issue or some externally (VMware?) triggered event?

 

I took VMware snapshots on Friday, so I can recover, but it's a PITA to have to do this every Monday!

8 REPLIES

NAYABSK
15,616 Views

Hi Bob, 

 

Thank you for writing to the NetApp community. This is a known issue with the ONTAP simulator, and the weekend is not to blame. The root volume created in the SIM is tiny, around 900MB, so in your case it may have taken a week before it got flooded with logs and reached 95% full. You may want to delete any existing snapshots on vol0 and set all the snapshot schedules to zero.

 

Log in to the console and run:

 

 

node run local
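Once you're in the nodeshell, you can first check how full vol0 actually is before deleting anything (just a quick read-only check, assuming the default vol0 name):

df -h vol0
snap list vol0

df shows the used space and the snapshot reserve, and snap list shows which snapshots are consuming it.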

Disable the vol0 snapshot schedule:

 

 

snap sched vol0 0 0 0

Delete all snapshots on vol0

 

snap delete -a vol0

And zero snap reserve 

 

snap reserve vol0 0

Also disable the snapshots at aggregate level 

 

snap sched -A aggr0 0 0 0

Delete any snapshots at aggr level 

 

snap delete -a -A aggr0

Try to grow vol0 using the vol size command, or, if you don't have enough space in the aggregate, consider adding a spare disk to the root aggregate aggr0.
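For instance, something like this (just a sketch; the +1g increment and the single-disk add are placeholder values, and the two aggregate commands assume you've exited the nodeshell back to the clustershell first):

vol size vol0 +1g

or, from the clustershell:

storage aggregate show-spare-disks
storage aggregate add-disks -aggregate aggr0 -diskcount 1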

 

After this you should not be running out of space on vol0 🙂

 

 

Thanks,

Nayab

 

 

bob_lansley
15,558 Views

Thanks, I'll give that a try!

NAYABSK
15,518 Views

Hi Bob, 

 

 

Did you get time to try clearing up some space on vol0 and booting up the simulator again?

 

 

Thanks,

Nayab

bob_lansley
15,499 Views

Hi Nayab

 

Yes, thanks, I cleaned up the vol0 volumes and modified the snap schedules. Looks OK so far; I'll see how things are after the weekend, as that was when it tended to fail before.

 

Cheers,

 

Bob

NAYABSK
15,483 Views

Hey Bob, 

 

 

Thanks for the update. Do let us know if you have any further issues and we will be happy to assist you 🙂. Hope you're having a great time playing around with the ONTAP simulator 🙂

 

By the way, did you know you can also create SSDs on the simulator? If not, please follow the link below. Hope you like experimenting 🙂

 

 

How to create SSD's On NetApp SIM

 

 

Thanks,

Nayab

Attilio
14,893 Views

Hello, I've installed the 9.1 simulator on my VMware 6.0 U2 host too, and after a few days I found both of my nodes showing the following error message:

 

***********************
**  SYSTEM MESSAGES  **
***********************

The contents of the root volume may have changed and the local management
configuration may be inconsistent and/or the local management databases may be
out of sync with the replicated databases. This node is not fully operational.
Contact support personnel for the root volume recovery procedures. 

 

The space on my vol0 is probably not the problem in my situation:

 

cluster199::> aggr show
                                                                      
Warning: Only local entries can be displayed at this time.


Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
--------- -------- --------- ----- ------- ------ ---------------- ------------
aggr0      36.74GB   33.56GB    9% online       1 cluster199-01    raid_dp,
                                                                   normal

 

cluster199::> aggr show
                                                                      
Warning: Only local entries can be displayed at this time.


Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
--------- -------- --------- ----- ------- ------ ---------------- ------------
aggr0_cluster199_02_0
           36.74GB   33.56GB    9% online       1 cluster199-02    raid_dp,
                                                                   normal

 

*****************

Licenses are missing too...

 

cluster199::system license> show

Error: show failed: No Base License installed in the cluster. Install a Base License, then try the operation
       again.

 

 

Many thanks for your help

 

Regards.

 

Attilio

Attilio
14,882 Views

Solved with:

 

  1. To unset the recovery flag and boot normally, start by bringing the system to a halt.
  2. Bring the node to a halt:
    ::*> halt -node <node name>
    (system node halt)
  3. Unset the recovery flag at the loader prompt and boot the node back up.
    LOADER-B*> unsetenv bootarg.init.boot_recovery
    LOADER-B*> boot_ontap
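(If you want to double-check before booting, printenv at the loader should show that the recovery flag is no longer set; this is only a sanity check, using the same variable name as in the step above.)

    LOADER-B*> printenv bootarg.init.boot_recovery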

*************************

 

cluster199::> cluster show
Node                  Health  Eligibility
--------------------- ------- ------------
cluster199-01         true    true
cluster199-02         true    true
2 entries were displayed.

 

cluster199::*> cluster ring show
Node      UnitName Epoch    DB Epoch DB Trnxs Master    Online
--------- -------- -------- -------- -------- --------- ---------
cluster199-01
          mgmt     2        2        113      cluster199-01
                                                        master
cluster199-01
          vldb     2        2        1        cluster199-01
                                                        master
cluster199-01
          vifmgr   2        2        60       cluster199-01
                                                        master
cluster199-01
          bcomd    2        2        7        cluster199-01
                                                        master
cluster199-01
          crs      2        2        1        cluster199-01
                                                        master
cluster199-02
          mgmt     2        2        113      cluster199-01
                                                        secondary
cluster199-02
          vldb     2        2        1        cluster199-01
                                                        secondary
cluster199-02
          vifmgr   2        2        60       cluster199-01
                                                        secondary
cluster199-02
          bcomd    2        2        7        cluster199-01
                                                        secondary
cluster199-02
          crs      2        2        1        cluster199-01
                                                        secondary
10 entries were displayed.

 

Regards.

 

Attilio

kushneo
13,757 Views

I have read in some articles that this is a very common issue for the NetApp simulator (I'm using NetApp DOT 8.3.2-cm), since it has a very small root volume. As soon as the root volume gets filled with dump files, logs and snapshots, it crashes (I'm not sure that's the correct word, though). When that happens, these are the steps I normally follow to get my simulator up again:

 

Step 1

 

>system node coredump delete clus-01 *
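(Before the delete, I sometimes list the cores first to see what's actually there; show is read-only, and clus-01 is just my node name, so substitute your own.)

>system node coredump show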

 

 

Step 2

Then I follow the steps in this link: http://www.cosonok.com/2014/12/root-volume-not-working-properly.html

(copy & paste the link if it's not redirecting)

 

At this point your SIM might start showing the interfaces again; if not, go to Step 3.

 

Step 3

Then, at the boot loader:

 

LOADER-B*> unsetenv bootarg.init.boot_recovery
LOADER-B*> boot_ontap

 

Step 4

At the end of the above steps the system message stopped appearing for me, but it still wouldn't show the interfaces, so to bring the interfaces up what I did is:

 

>set diag
>net port mod -node <nodename> -port * -up-admin true
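To confirm the ports actually came up, you can check them afterwards (still at diag level; <nodename> is a placeholder as above):

>net port show -node <nodename>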

 

That’s it, it should work now 🙂

 

** I'm still a newbie to NetApp, so my explanations about this issue might not be entirely accurate 😉

 

 
