Simulator Discussions

8.3 Simulator core dumps, then Cluster LIF is lost...

oroemisc

I needed to set up the 8.3 simulator from scratch today because last week's setup suddenly went into a core dump while I was entering licenses. After the reboot the Cluster Management LIF was gone. I decided to set up the whole thing again, and after one hour of uptime (licenses already entered, nothing else set up so far) it core-dumped again. And again the Cluster LIF was gone. This makes it unusable so far.

 

I use VMware Workstation 8.0.1 on Win7; the cDOT 8.2 and 7dot simulators have been running fine for a long time.

 

Anyone with similar behaviour?

 

Regards from NetApp Hamburg

Oli

13 REPLIES

shatfield

is vol0 full?

oroemisc

Hi,

 

vol0 was at 95%. I added some disks, now it's at 35%, and the LIF is there. But it still core-dumped after just some basic commands. Very unstable, in fact the most unstable simulator I have had so far.

 

2015-01-07 cdot sim dump.png

shatfield

Did you expand the volume as well, or just add the disks to the aggr? The panic is in VHA_DISK_MAIN. VHA is Virtual Host Adapter, which is the data disk model used in the standard vsims, so it's related to the FOD simdisk code. Hence I'm still thinking it's a vol-full or related problem with the simdisk subsystem.

 

I'd rebuild with 4 GB sim disks. Seems stable enough at that size.

 

 

 

oroemisc

I have now expanded the volume vol0 as well. It had 280 MB left, now it has 1.3 GB left. Then I tried to form an SSD pool (I had added SSDs before), and it core-dumped again. Maybe this is not working?

shatfield

Creating SSD pools doesn't appear to work, and attempting it does seem to intermittently crash the sim.

 

I get a debug message in the log and a timeout failure on the job.

The debug event reads:

raid.debug: raid_lm_disk_partition_create: v2.16 state: RAID_LM_FSM_NEEDS_CNTR_ASSIM

 

Maybe it's expecting the partner to be present.

oroemisc

OK, thanks. So I'll stop working on that.

papadopoulosa

 

 

Same here.  After trying to add an SSD pool on both nodes of my vDOT cluster, node 01 keeps failing and rebooting.  However, when it comes up, the cluster LIF is inaccessible.

I have to log in and restart node 02, and then the cluster LIF comes back up.

 

I also do not see any pools having been created.

 

I just tried adding two more vdisks to aggr0, I'll report back if it makes any difference.

 

cap1.png

 

Adding disks made no difference at all. Any clues?

shatfield

I tried it in 8.3RC2, and it still doesn't work, but it doesn't crash the sim either.  

 

Now it errors out sharing the first disk in the list, and leaves a phantom record in the storage pool list.  

 

Better than a panic.

 

shatfield

Never mind. It still crashed, it just took a while.

 

Recovered by deleting the affected simdisk and the ,reservations file.

 

Working theory is VHA diskmodel simdisks don't survive being partitioned.  

papadopoulosa

Not sure if it is relevant, but I went back to a snapshot taken before adding the SSD pools, and everything is fine. (Except I couldn't restore with SMVI, because the vSIM disks are IDE.)

 

So it appears that as long as you don't try to use FlashPools, it may be okay.  FlashPools won't buy anything on the simulator, but it would have been nice to test it. 

 

BTW, I found one more issue with the vSIM cluster. If you lose node 01, node 02 will not take over the cluster management port; it appears to develop into a split brain. The same thing happens if you boot both nodes simultaneously. My solution is to bring up node 01 first, then node 02.

shatfield

Who's got epsilon? 

 

If you halt the one with epsilon, or neither has epsilon, that's the behaviour I would expect.

 

Back on the ADP front, I tried partitioning a simdisk by hand.  Partitioning succeeds but then it gets marked as unowned, and ownership assignment fails.  Which leads back to thinking the vha simulated disk code doesn't know how to cope with a partitioned disk file.   

 

papadopoulosa

So if I understand correctly, on the two node vSIM cluster, if 01 has the epsilon, and 01 crashes, node 02 does not take over the epsilon?

 

I noticed that there is a new vSIM download;  the previous was marked 8.3 RC1, this one is marked as 8.3.  Do you think any of the SSD issues have been addressed?

shatfield

Right, which is why in a 2-node cluster you set cluster ha modify -configured true. That disables the epsilon mechanism and uses good old HA.

 

A closer functional model would be a 4 node vsim cluster.  There you can lose any 1 node, even the one with epsilon, and maintain quorum.  The vserver root vols would also need to be on surviving nodes, because we don't have SFO either in this type of simulation.
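
For anyone following the quorum reasoning, here is a toy sketch of the rule being described: a strict majority of nodes keeps quorum, and on an exact tie the side holding epsilon wins. This is a simplified illustration only, not how ONTAP implements it; the function name and parameters are made up for this example, and the cluster HA mode for 2-node clusters is deliberately not modelled.

```python
def has_quorum(total_nodes: int, healthy_nodes: int, epsilon_is_healthy: bool) -> bool:
    """Toy quorum check (hypothetical helper, not an ONTAP API).

    A strict majority of healthy nodes always keeps quorum.
    On an exact half/half split, only the side with epsilon keeps it.
    """
    if healthy_nodes * 2 > total_nodes:
        return True
    if healthy_nodes * 2 == total_nodes:
        return epsilon_is_healthy
    return False


if __name__ == "__main__":
    # (total nodes, healthy nodes, is the epsilon holder among the healthy ones?)
    cases = [
        (2, 1, False),  # 2-node sim, the node with epsilon is down
        (2, 1, True),   # 2-node sim, the node without epsilon is down
        (4, 3, False),  # 4-node sim, the epsilon holder is down
        (4, 2, False),  # 4-node sim, two nodes down including the epsilon holder
    ]
    for total, healthy, eps in cases:
        print(total, healthy, eps, has_quorum(total, healthy, eps))
```

Under this model a 2-node cluster without cluster HA only survives losing the node that does not hold epsilon, while a 4-node cluster survives the loss of any single node.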

 

The downloads are still RC1.  I've tried it on RC2 and ssd pools still didn't work.  I would be really surprised if the vha code gets any attention before GA. 
