Simulator Discussions
I installed a 2-node cluster in simulator 9.8.
The systems went down ungracefully, and now after startup the nodes report the following:
node-01:
nac101::> node run local
Type 'exit' or 'Ctrl-D' to return to the CLI
nac101-01> vol status vol0
Volume State Status Options
vol0 online raid_dp, flex root, nvfail=on, space_slo=none
64-bit
Volume UUID: b33f166a-7c4a-4fbe-9d97-bb7e60809652
Containing aggregate: 'aggr0_nac101'
nac101-01> df -g vol0
Filesystem total used avail capacity Mounted on
/vol/vol0/ 0GB 0GB 0GB 82% /vol/vol0/
/vol/vol0/.snapshot 0GB 0GB 0GB 263% /vol/vol0/.snapshot
node-02:
***********************
** SYSTEM MESSAGES **
***********************
CRITICAL. This node is not healthy because the root volume is low on space
(<10MB). The node can still serve data, but it cannot participate in cluster
operations until this situation is rectified. Free space using the nodeshell or
contact technical support for assistance.
Internal error. Cannot open corrupt replicated database. Automatic recovery
attempt has failed or is disabled. Check the event logs for details. This node
is not fully operational. Contact support personnel for the root volume recovery
procedures.
nac101::> node run local
Type 'exit' or 'Ctrl-D' to return to the CLI
nac101-02> df -g vol0
Filesystem total used avail capacity Mounted on
/vol/vol0/ 0GB 0GB 0GB 100% /vol/vol0/
/vol/vol0/.snapshot 0GB 0GB 0GB 541% /vol/vol0/.snapshot
nac101-02> vol status vol0
Volume State Status Options
vol0 online raid_dp, flex root, nvfail=on, space_slo=none
64-bit
Volume UUID: 0c714e69-f7c9-4590-847c-a5f6a4b27677
Containing aggregate: 'aggr0_nac102'
Please advise how to recover, and how to gracefully shut down simulator node 1 and node 2.
Hello,
Most likely something is filling up the root volume, so you will either have to clear it out or grow the volume. You can review the KB below to help you check what is filling the root volume.
As for doing a graceful shutdown, you can perform a takeover and giveback one node at a time.
Thanks
Thank you for your response
I can't access the KB article; opening the link ends in: "You do not have permission to view this page."
Peter
Platform Engineer
The shutdown situation ....
I've set up a simulator cluster in my personal lab, which I want to shut down at the end of business.
Shutting down the simulator nodes/cluster damages the cluster database and system disks, so any time I want to work with the simulator I have to rebuild the nodes and cluster from scratch, which takes a lot of time.
Is there another way?
Hello,
Sorry, I missed that you are working with a simulator.
For the shutdown, you will have to shut down the simulator properly to avoid data loss, since the non-volatile RAM is simulated and is not persistent. To do so, you can do either of the following:
1. You can issue a shutdown guest from VMware.
2. You can issue the halt command from the CLI, wait until it is complete, then turn the VM off manually.
As for the space issue, you will need to allocate more space to the root volume or check what is eating the space. For example, check whether any core dumps are saved for that node and remove them, or delete any snapshots of the root volume:
cluster::> system node coredump show
cluster::> system node coredump delete-all
To check for snapshots on vol0 and delete them:
cluster::> system node run -node <node name> -command "snap list vol0"
cluster::> system node run -node <node name> -command "snap delete -a vol0"
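If it helps to see at a glance which part of vol0 is the problem, here is a minimal Python sketch that reads pasted nodeshell `df` output and flags a snapshot area that has grown past its reserve. A `.snapshot` capacity above 100% means the snapshots have spilled over into the active file system. The function names `parse_df` and `snapshot_overflow` are made up for this illustration, and the sample text is the node-02 output from the first post. Since `df -g` rounds everything on a small root volume to 0GB, only the capacity column is used; a finer unit such as `df -k` should show the actual sizes.

```python
def parse_df(output):
    """Map each filesystem path in nodeshell df output to its capacity percent."""
    capacities = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with a /vol/ path; the header row does not.
        if len(fields) >= 5 and fields[0].startswith("/vol/"):
            capacities[fields[0]] = int(fields[4].rstrip("%"))
    return capacities


def snapshot_overflow(capacities):
    """Return snapshot paths whose usage exceeds the snapshot reserve (over 100%)."""
    return [
        path
        for path, percent in capacities.items()
        if path.endswith(".snapshot") and percent > 100
    ]


# Sample copied from the node-02 output earlier in this thread.
SAMPLE = """\
Filesystem total used avail capacity Mounted on
/vol/vol0/ 0GB 0GB 0GB 100% /vol/vol0/
/vol/vol0/.snapshot 0GB 0GB 0GB 541% /vol/vol0/.snapshot
"""

if __name__ == "__main__":
    caps = parse_df(SAMPLE)
    for path, percent in caps.items():
        print(f"{path}: {percent}%")
    for path in snapshot_overflow(caps):
        print(f"snapshots over reserve: {path}")
```

In this sample the snapshot area is at 541%, which points at the snapshots rather than core dumps as the main consumer of root volume space.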
Thanks
Thank you, this helped to make the node available for login again; the cluster, however, still does not respond.
There were no core dumps, but there were snapshots, and I deleted them all.
After the reboot I can log in to the node management IP, but what is holding up the cluster management service is:
***********************
** SYSTEM MESSAGES **
***********************
Internal error. Cannot open corrupt replicated database. Automatic recovery
attempt has failed or is disabled. Check the event logs for details. This node
is not fully operational. Contact support personnel for the root volume recovery
procedures.
The recovery was successful.
When I log in to the node:
***********************
** SYSTEM MESSAGES **
***********************
CRITICAL. This node is not healthy because the root volume is low on space
(<10MB). The node can still serve data, but it cannot participate in cluster
operations until this situation is rectified. Free space using the nodeshell or
contact technical support for assistance.
It looks like I'm closing in on the final healthy setup of the cluster.
Hello,
It seems that something is eating space on the root volume, or your root volume is too small.
Common practice is to add another disk to the aggregate, grow the root volume, and turn off snapshots.
cluster> node run local
node> snap delete -a vol0
node> vol options vol0 nosnap on
node> ctrl+D
cluster> reboot
Reboot the simulator and that should do the trick.
Thanks
Take a look at this link:
I did everything that was advised, but the simulators kept falling into a stalled state over and over, with the same root volume space and database recovery events.
I installed simulator 9.8 on both VMware Workstation and ESXi 7.0, with the same issues on both.
After installing simulator 9.7, everything is OK.