Simulator Discussions
I have installed two instances of the ESX version of the 8.3 Simulator, and within a few days I run into space problems on the root volume. See the messages below from the console. I have also run into this same problem with instances installed in ESX in a Partner's lab. It appears that there are logs or some other file(s) continually being created and filling up the root volume. The Partner dug a little deeper and reports that it looks like the Simulator is built on a Linux image, and that it is the Linux image that is actually having the space problem.
CONSOLE OUTPUT:
login as: admin
Using keyboard-interactive authentication.
Password:
***********************
** SYSTEM MESSAGES **
***********************
CRITICAL: This node is not healthy because the root volume is low on space
(<10MB). The node can still serve data, but it cannot participate in cluster
operations until this situation is rectified. Free space using the nodeshell or
contact technical support for assistance.
cdot83::>
VM Characteristics: See Attachment.
Rbrinson -
Yes, the sims run out of space given the default root volume size.
Common practice is to add another disk to the aggregate, grow the root volume, and turn off snapshots.
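For reference, a rough sketch of that sequence from the node shell, assuming the default aggr0/vol0 names and that a spare virtual disk is available in your sim (check with aggr status -s first):

cluster::> node run local
node> aggr status -s
node> aggr add aggr0 1
node> vol size vol0 +1g
node> snap sched vol0 0 0 0
node> aggr options aggr0 nosnap on

The first two commands list the spares and add one disk to the root aggregate; the rest grow vol0 and stop snapshots from eating the reclaimed space. Sizes and names here are illustrative, not definitive.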
I hope this response has been helpful to you.
At your service,
Eugene E. Kashpureff, Sr.
Independent NetApp Consultant http://www.linkedin.com/in/eugenekashpureff
Senior NetApp Instructor, IT Learning Solutions http://sg.itls.asia/netapp
(P.S. I appreciate 'kudos' on any helpful posts.)
At this point, I cannot issue any meaningful commands to make this happen. Every command I issue seems to get thwarted because databases such as the VLDB are offline. Can you give me some additional guidance on what commands to issue to accomplish your suggestion?
Rbrinson -
Ouch! It may be easiest to reinstall the sim.
Can you delete any snapshots on the root vol?
Will it let you unlock the diag user and drop down into the system shell?
You could then try deleting log files...?
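For anyone stuck at this point, getting into the system shell usually looks roughly like the following, assuming the cluster shell still responds and the diag account exists but is locked:

cluster::> set diag
cluster::*> security login unlock -username diag
cluster::*> security login password -username diag
cluster::*> systemshell -node local

If the cluster shell itself is refusing commands because the databases are offline, this may fail too, in which case the loader/nodeshell route described later in the thread is the fallback.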
Thanks - we removed a bunch of log files, which resolved the issue temporarily. However, the same problem came back within a couple of days.
The fundamental problem is the root aggregate and root volume are too small.
If you can get some files cleaned off and get back into the cluster shell, you can increase its size with the following procedure.
I wrote this from the cluster shell perspective as part of a larger document, but if you can't get back into the cluster shell, the node shell equivalents should work just as well.
Steps:
And here's the quick&dirty node shell variant:
From the problem node's console:
You may or may not need a second reboot to remove the recovery flag in the loader. If required it will tell you when you log in from the node shell.
After a clean reboot, go back and disable aggr snaps and vol snaps on the root, delete any existing snaps, and clean out old logs and asup files in the mroot.
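Putting those steps together, the node shell variant typically amounts to something like this (vol0/aggr0 names assumed; this is a sketch, not the exact listing from the larger document):

node> snap delete -a vol0
node> snap sched vol0 0 0 0
node> aggr options aggr0 nosnap on
node> vol size vol0 +1g
node> reboot

Then, if the login banner still complains after the first reboot, reboot again and stop at the loader to clear the recovery flag:

VLOADER> unsetenv bootarg.init.boot_recovery
VLOADER> boot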
Good day,
I am having the same issue, but when trying to add a disk to my root aggregate I get 'ERROR: database is not open'.
Can someone please help me with this?
Thank you.
Are you using the nodeshell commands (run local, etc) from the post above?
I think in that case, you will need to get into the systemshell as the diag user and then go to /mroot/etc and remove the log directory recursively (rm -rf /mroot/etc/log). Once this is done, do a df -h . on the /mroot directory and note the decreasing usage. Once it drops below 100%, exit the shell and reboot. It should come back up; then add a disk to the aggregate and space to the volume vol0 as previously mentioned.
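As a sketch, assuming the diag user is already unlocked, the cleanup session would look something like:

cluster::> set diag
cluster::*> systemshell -node local
% df -h /mroot
% rm -rf /mroot/etc/log
% df -h /mroot
% exit

Running df before and after lets you confirm the usage is actually dropping before you reboot.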
You need to delete the snapshots for the root volume:
cluster> node run local
node> snap delete -a vol0
node> vol options vol0 nosnap on
node> ctrl+D
cluster> reboot
reboot the simulator and that should do the trick.
thank you, the only easy and concise solution listed here!
Hi
I have a similar issue with a strange twist. I also get the low-space problem, however when I query the root aggregate size, it shows 3.38GB available out of a total of 4.17GB. Only 19% used, so it's strange that it's complaining about space when there seems to be loads free. I cleared all my snapshots and changed the snap sched vol0 0 0 0 etc. a while ago when I added extra disks and expanded the space on aggr0, and I'm still getting the same problem. All aggregates on my second node are showing a status of 'unknown'.
Help!!!
When you added disks to aggr0, did you also expand the size of vol0?
What does df say?
df -h /vol/vol0
Hi
Thanks for the quick reply. I added two disks but only expanded by 1g at the time, as I thought it would be plenty considering that I disabled the snapshot schedule etc. Interestingly, I just ran the command you suggested and get a response of 'Error: show failed: Database is not open'.
Database not open means you are at the cluster shell, but the databases are offline because the root vol is full. Try it from the node shell on the node with the error condition:
run local df -h vol0
OK... that makes sense. Running it locally gives /vol/vol0: total 807MB, used 797MB, free 10MB, capacity 99%.
How do I go about extending the vol out from the root aggregate?
Sounds like you've got plenty of free space in the aggr, so try:
vol size vol0 +2g
Again, that's at the node shell. Then reboot. It'll still give you an error. Then reboot again, but stop at the loader and clear the root recovery flag:
VLOADER> unsetenv bootarg.init.boot_recovery
VLOADER> boot
This time it should come up clean.
Ok...that fixed it. Node 2 root vol is back online.
Thanks for your help, much appreciated.
Cheers
Having the same issue and can't add disks. I get this error message: 'Command failed: database is not open'.