Why doesn't ONTAP 8.1 simulator recover from crashes properly?

reide · ‎2011-12-19

I've noticed a disappointing feature of the ONTAP 8.1X45 7-mode simulator: it doesn't handle crashes well (or at all, for that matter). If the 7-mode vsim is powered off abruptly for whatever reason (power outage, VM powered off, etc.), it cannot recover. When it reboots, it goes into endless panic dumps (see screenshot). I lose my vsim configuration along with all my volumes, SV relationships, demos, etc. This is sad because the ONTAP 7.3 simulator handled this situation beautifully. I would run it in a Linux OS with a journaled file system, and both Linux and the 7.3 simulator would recover nicely.

Can anyone explain why it doesn't recover successfully from crashes? Is there anything I can do to ensure it DOES recover from crashes? They're pretty common when running vsims on a laptop.

reide · ‎2011-12-19

Here is another user who appears to be experiencing the exact same issue. He has even tried to bring the aggregate online after the crash, but it reports that WAFL is inconsistent and needs to be checked.

https://communities.netapp.com/message/68433#68433

kborovik · ‎2011-12-21

Reid,

I do not have the answer for you with regard to SIM8.1 crashes, but I can attest that SIM801 recovers gracefully. This is one of the reasons I run SIM801 and did not switch to SIM8.1.

reide · ‎2011-12-21

Interesting. I quickly re-installed the 8.0.1 7-mode sim and manually powered it off. Did it twice in fact. It does recover successfully!

I'll have to reach out to the ONTAP simulator development team and ask what is up with the 8.1 7-mode simulator and how they can fix it. This issue cannot continue.

kborovik · ‎2012-01-09

Dear DataONTAP SIM Santa,

Please, please, release SIM for ESXi with a SCSI controller and an easy System ID change procedure. And if you throw a bit of performance gain to us, simple mortals, we promise to sacrifice an EMC array every quarter in your name!

Thank you Dear DataONTAP SIM Santa.

julianwood · ‎2012-01-10

Hopefully we don't have to wait for SIM Santa's next run; next Christmas is far too long to wait!

buckmaster60 · ‎2012-03-22

Any word from the NetApp dev team on fixing the kernel crashes with v8.1? Also see https://communities.netapp.com/thread/19658 for a related report. It would be nice to have a stable playground for SRM testing and the new vCenter v4 plugin.

markdevries · ‎2012-04-08

Damn. Suffered a power outage last night, whole neighborhood went black, and my 8.1 sim is ruined as well:

Bad Volinfo/Fsinfo magic for aggregate 'aggr0'

Neither volinfo block of aggregate aggr0 is valid OR the fsinfo block is i ...

PANIC: root aggregate or volume was taken offline in SK process rc on release 8.1RC3X18 ...

Dumping to core disk.

Could someone from NetApp please let us know if they consider/acknowledge this as a problem that needs fixing?

I don't think it's reasonable to expect everyone to run the simulator on equipment that has production-quality redundant no-break power feeds, etc.

And as such I don't think it's unreasonable to expect the sim to be a little bit more resilient to accidents like this.

scottgelb · ‎2012-04-08

I create a VMware snapshot, then restore it when that happens... not an ideal workaround, but it works well and lets me have a baseline to set up my VSIMs the way I want and revert back after giving a demo or test.
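
For anyone who wants to script the snapshot-and-revert workaround, here is a minimal sketch. It assumes the vsim runs under VMware Workstation or Fusion and that VMware's `vmrun` utility is on the PATH; the .vmx path and the snapshot name are hypothetical placeholders, not anything from this thread. The script only builds and prints the commands, so nothing touches a VM until you run them yourself.

```python
from typing import List

# Assumption: VMware's vmrun utility is installed and on PATH.
VMRUN = "vmrun"

def snapshot_cmd(vmx_path: str, name: str, host_type: str = "ws") -> List[str]:
    """Build the vmrun command that takes a named snapshot of the VM."""
    return [VMRUN, "-T", host_type, "snapshot", vmx_path, name]

def revert_cmd(vmx_path: str, name: str, host_type: str = "ws") -> List[str]:
    """Build the vmrun command that reverts the VM to a named snapshot."""
    return [VMRUN, "-T", host_type, "revertToSnapshot", vmx_path, name]

if __name__ == "__main__":
    # Hypothetical path and snapshot name; substitute your own.
    vmx = "/vms/vsim-8.1/vsim.vmx"
    baseline = "vsim-baseline"

    # Take the baseline once the vsim is configured the way you want it.
    print(" ".join(snapshot_cmd(vmx, baseline)))
    # After a crash leaves the sim in a panic loop, go back to the baseline.
    print(" ".join(revert_cmd(vmx, baseline)))
```

Taking the baseline while the vsim is cleanly shut down is probably the safest choice, since the snapshot then holds no in-flight WAFL state.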

JWHITE_COMPUNET · ‎2012-07-19

I have run into the same thing with the 8.1.1 simulator, so the issue hasn't been resolved yet. It's no small amount of work that goes into configuring a multi-node cluster, especially if you expand disk capacity, etc.

I would like to hear if there is an advanced scanning feature we can use to correct this, such as wafliron.

onavatte · ‎2012-08-27

Hello,

Actually, I have a similar issue... but on a physical filer (not a simulator)!

The symptoms are exactly the ones described here: https://communities.netapp.com/message/68433#68433

The system is a V3140 on ONTAP 8.1 with back-end disks on an IBM DS5300.

Some of our filers have their aggr0 restricted. The hidden wafliron option on the special boot menu seems to do nothing, and at reboot the message "root aggregate or volume was taken offline in SK process rc on release" appears.

For the non-root restricted aggregates, the "aggr wafliron start <aggr>" command fails with the message:

toaster*> aggr wafliron start a_toto_00

aggr wafliron start a_toto_00: Neither fsinfo block of aggregate 'a_toto_00' is valid.

toaster*> Mon Aug 27 14:18:04 EDT [toaster:wafl.volinfo.fsinfo.error:ALERT]: Bad Volinfo/Fsinfo magic for aggregate 'a_toto_00'

Mon Aug 27 14:18:04 EDT [toto:wafl.iron.mount.inconsistent.fail:info]: Wafliron could not mark volume a_toto_00 inconsistent as this operation doesn't apply to aggregates/traditional volumes.

A case is open with N... IBM actually but it should be at the NetApp engineers level now.

Could it be an 8.1 bug?