Simulator Discussions

Root volume is damaged




I have a two-node cluster (vsim 8.3) running on VMware Fusion. While doing some exercises I changed the MTU on my cluster ports using network port broadcas.... After a few minutes one of the nodes went down, and when it came back up the following message appeared.



The contents of the root volume may have changed and the local management
configuration may be inconsistent and/or the local management databases may be
out of sync with the replicated databases. This node is not fully operational.
Contact support personnel for the root volume recovery procedures


After a while the other node showed the same message.


I cannot see my root volume in the clustershell, and I cannot use any of the vol/volume subcommands....


Has anyone had a similar problem with vsim 8.3?

How can I recover or repair my root volumes in the clustershell?


If I enter the nodeshell, I have no option to create additional volumes using the vol command...


I'm stuck..


Thank you



Re: Root volume is damaged


You can connect to the node management IP and run

::> node run local


This will drop you down to the nodeshell, and you should be able to manage the volumes 7-Mode style.


cmemile01::*> node run local
Type 'exit' or 'Ctrl-D' to return to the CLI
cmemile01-01> vol status
         Volume State      Status            Options
        vol0_n1 online     raid4, flex       root, nosnap=on, nvfail=on
                           64-bit


What do you get running "::> cluster show" from the cluster shell?

If you get RPC errors and the cluster cannot communicate over the interconnect, the RDB is out of sync; try to undo the changes you made to the MTU.


::> net port modify -node cmemile01-01 -port e0a -mtu 1500


Another option is to redeploy the sims and start fresh.



Re: Root volume is damaged



I can access the nodeshell and the output looks OK:


n2> vol status vol0
         Volume State      Status            Options
           vol0 online     raid_dp, flex     root, nvfail=on
                Volume UUID: b450b0c7-fe1f-4d49-adf5-160776524770
                Containing aggregate: 'aggr0_n2'


vol0 has no space issues:

n2> df -g vol0
Filesystem               total       used      avail capacity  Mounted on
/vol/vol0/                 7GB        1GB        5GB      19%  /vol/vol0/


n2> vol
The following commands are available; for more information
type "vol help <command>"
autosize            lang                options             size
container           offline             restrict            status
destroy             online


I cannot create new volumes inside the nodeshell.


clu1::> aggr show -fields has-mroot

Warning: Only local entries can be displayed at this time.

aggregate has-mroot
--------- ---------
aggr0_n2  true
n2aggr    false
2 entries were displayed.


The volume subcommands are not available:

clu1::*> volume ?

Error: "" is not a recognized command

clu1::*> volume


Only the node LIF is available:

clu1::*> net int show
  (network interface show)
            Logical    Status     Network            Current       Current Is
Vserver     Interface  Admin/Oper Address/Mask       Node          Port    Home
----------- ---------- ---------- ------------------ ------------- ------- ----
            c1n2mgmt   up/up                         n2            e0c     true


The cluster commands are not available:


clu1::*> cluster setup

Error: command failed: Exiting the cluster setup wizard. The root volume is damaged. Contact support personnel for the
root volume recovery procedures. Run the "cluster setup" command after the recovery procedures are complete.


I changed the MTU on both nodes in the cluster, but something caused the cluster configuration to be lost. I cannot figure it out!


clu1::*> net port show
  (network port show)
                                                            Speed (Mbps)
Node   Port      IPspace      Broadcast Domain Link  MTU    Admin/Oper
------ --------- ------------ ---------------- ----- ------ ------------
       e0a       Cluster      -                up    1500   auto/1000
       e0b       Cluster      -                up    1500   auto/1000
       e0c       Default      -                up    1500   auto/1000
       e0d       -            -                up    1500   auto/1000
       e0e       -            -                up    9000   auto/1000
5 entries were displayed.


At the advanced or diagnostic privilege level, I cannot run cluster ring show. It's as if the cluster services are not available!

Never seen this before.



Re: Root volume is damaged


I think you've hit on an interesting scenario. Jumbo frames don't work on VMware Fusion, and by attempting to use jumbo frames on the cluster network something has gone wrong with the RDBs.

Did they panic and reboot?

Do you know which one had epsilon?  


If this post resolved your issue, help others by selecting ACCEPT AS SOLUTION or adding a KUDO.

Re: Root volume is damaged


That's what I noticed. All five RDBs are gone, but somehow the clustershell remains!

No panic, but node2 (the one with the epsilon role) rebooted.





Re: Root volume is damaged


I tried to break one on purpose last night, but it's still up this morning. Which version are you using specifically: 8.3GA, RC1? My test box was 8.3.1.

I modified the MTU with:

network port broadcast-domain modify -broadcast-domain Cluster -ipspace Cluster -mtu 9000

Cluster ping identifies the MTU problem, but operationally it's all still working:

cluster1::*> cluster ping -node tst831-02
Host is tst831-02
Getting addresses from network interface table...
Cluster tst831-02_clus1 tst831-02 e0a 
Cluster tst831-02_clus2 tst831-02 e0b  
Cluster tst831_clus1    tst831-01 e0a 
Cluster tst831_clus2    tst831-01 e0b 
Local =
Remote =
Cluster Vserver Id = 4294967293
Ping status:
Basic connectivity succeeds on 4 path(s)
Basic connectivity fails on 0 path(s)
Detected 1500 byte MTU on 4 path(s):
    Local to Remote
    Local to Remote
    Local to Remote
    Local to Remote
Larger than PMTU communication fails on 4 path(s):
    Local to Remote
    Local to Remote
    Local to Remote
    Local to Remote
RPC status:
2 paths up, 0 paths down (tcp check)
2 paths up, 0 paths down (udp check)

Re: Root volume is damaged


Only the node management LIFs and a clustershell (via direct SSH to the node) are available.

There are no cluster LIFs. I get a clustershell when I log on directly to the node using SSH.

cluster show and cluster ring show are not available.



The nodeshell does not give me the option to create new volumes. I could reinstall a new two-node cluster, but I would like to repair vol0 if there is a way.


Thank you

Re: Root volume is damaged


OK. We need to try to revive the RDBs. Shut down the non-epsilon node, halt the epsilon node, and boot to the loader. Unset the boot_recovery bootarg and see if it comes back up.


unsetenv bootarg.init.boot_recovery

If the epsilon node comes back up, make sure the cluster ports are at MTU 1500, then try to bring up the other node.
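Putting those steps together, the full sequence might look like this (a sketch only: the node names are placeholders, and the loader prompt and boot command can differ between vsim versions):

```
clu1::> system node halt -node <non-epsilon-node>
clu1::> system node halt -node <epsilon-node>

(interrupt the epsilon node's next boot to drop to the loader prompt)

VLOADER> unsetenv bootarg.init.boot_recovery
VLOADER> boot_ontap
```

Once the epsilon node is healthy and the cluster ports are back at MTU 1500, repeat the loader steps on the second node.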




Re: Root volume is damaged




Removing the recovery flag "unsetenv bootarg.init.boot_recovery" did the trick.


I did it first on the node with the epsilon role. When it came back online, the RDBs started one after another, beginning with mgmt, vifmgr, bcomd... and so on... of course after a few RPC connect errors... The vldb took a while to start, but it could fetch all the info about my volumes again 🙂


Then I ran it on the other node... Now my node management and clustershell, as well as OnCommand System Manager, are fully functional again...


What I would like to know, if you can assist me:


1) What did unsetenv bootarg.init.boot_recovery do to the boot process and logs in order to bypass a set of configurations and try to launch the RDBs again?

2) What triggered the continuation of my "cluster inconsistency" after I changed the MTU settings back, right after the node with epsilon rebooted?


Thank you so much for your professional help.


// Bash


Re: Root volume is damaged


That flag gets set when a condition is detected that casts doubt on the state of the root volume. In the simulator this usually happens when the root volume gets full, but I've also seen it after some kinds of panics. Once set, it stays set until you unset it. This makes sure that someone diagnoses the condition that led to it being set in the first place before the node is placed back into service.
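Incidentally, you can check whether the flag is currently set from the loader prompt before clearing it; printenv is a standard loader command, though the exact prompt varies by simulator version:

```
VLOADER> printenv bootarg.init.boot_recovery
VLOADER> unsetenv bootarg.init.boot_recovery
VLOADER> boot_ontap
```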


This particular message seems to indicate that the RDB and the local management DBs didn't match at boot time, possibly from a poorly timed panic. The simulated NVRAM has its limits.


If you encounter this in real life you should contact support.



Re: Root volume is damaged


I thought I'd be lucky like you when I read the last remedy with the 'unsetenv' command, but my vsims still come back with the same errors: Recovery Required!

I wonder if there are additional flags that can fix the corrupted vol0?

I had those sims and a Windows DC installed on VMware Workstation, and apparently Microsoft decided to apply some Windows patches and rebooted the machine. This caused the sims to boot up in an unstable state, as you described. Prior to that, I had already taken care of both vol0 sizes (53% available), and this together with the 'unsetenv' command did not help to get my cDOT 8.3.2 sims back to full function as I had them running yesterday.


Re: Root volume is damaged


In your case the file system is actually damaged.


Recovering from that is more involved. If any one node has a good root vol, you may be able to reseed the failed nodes, but if they are all damaged you may need to restore the last good system configuration backup.


This can happen during a dirty shutdown because, by default, the simulated NVRAM operates in a non-persistent mode. If you have the sims running on SSD you could change that, but if they are on HDD the performance impact could be significant.
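If a configuration backup does exist, a recovery along these lines may be possible (a sketch only; the system configuration recovery commands and their arguments differ between ONTAP releases, so check the man pages for your version before running them):

```
clu1::> set -privilege advanced
clu1::*> system configuration backup show
clu1::*> system configuration recovery cluster recreate -from backup
```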





Re: Root volume is damaged


Yep, no backup, no snapshots taken. I already created a new pair, and I'll make sure to suspend them next time I walk away...

Thanks for confirming the cause.

Re: Root volume is damaged


I also have to use this occasionally (after lab power failures). In my case (single-node cluster vsims) it seems to take 10 to 20 minutes for the full recovery to take place after the system boots. I'm wondering if there is anything one can look at (a log file, a show command, etc.) that gives a clue as to the RDB rebuild progress. I just run 'network interface show' until I see all the LIFs show up again.
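One candidate for watching the rebuild, mentioned earlier in this thread: at advanced privilege, cluster ring show reports the state of each replication ring, so you can watch the individual RDB units (mgmt, vldb, vifmgr, bcomd) come back online instead of polling the LIFs. A sketch, assuming the command is reachable once the node is booted; the exact output columns vary by version, but the Online column should move from offline to master or secondary as each unit recovers:

```
clu1::> set -privilege advanced
clu1::*> cluster ring show
```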

Re: Root volume is damaged


SOLVED!!! I had a single-node cluster (vsim 9.0) running on VMware Workstation 11.


I interrupted the normal boot into the VLOADER and ran the command "unsetenv bootarg.init.boot_recovery".


Rebooted and the problem was solved. Cluster now healthy. Cheers.



Re: Root volume is damaged


Hi, this is Jai. My root volume is damaged and I can't recover it from the error. I used "unsetenv bootarg.init.boot_recovery", but this command is not working; I am getting the same problem again and again.



Thanks in advance.
