Active IQ Unified Manager Discussions

Need some help!!

ARMYofONE

I am not a Storage Administrator; I work on Red Hat servers and systems. However, since I am "IT," it is now in my lane to work these issues. I was experiencing issues with snapshots building up and reaching 600-800% on all my /vol .snapshot directories. I have since enabled autodelete on the volumes where it was set to off, and I deleted ALL snapshots.

 

On my SC1, aggr0 is at 95% and aggr1 at 100%,

 

and on my SC2, aggr0 is at 95%, aggr1 at 98%, and aggr2 at 97%.

 

The /.snapshot usage on all the aggregates is 0%.

 

My version is NetApp Release 8.2.2 7-Mode, with DS4243 and DS4246 disk shelves. Everything is via the command line, since I cannot seem to get the GUI to work via Firefox.

 

Any help is greatly appreciated.

 

I do see SIS errors where SIS is requesting 4 KB of space and only 1 KB is available. Like I said, I am new to NetApp. The case number for this is 2007726187, but since I am DoD it is assigned to a secured team for support. I do not think I need onsite engineer support from NetApp, though; I never needed it for HP, Windows, VMware, Red Hat, Pure Storage, etc.

 

Thanks


JamesIlderton

For the GUI issue, do you have OnCommand System Manager installed on your admin station?  That version of ONTAP did not have an embedded GUI, but if you install it locally you can add the filers and manage them via your browser.

 

Along the same lines as the other replies, please look at the output of these commands:

'snap sched -A' (this will show the scheduled snapshots for all aggregates, along with the retention)

'snap reserve -A' (this will show any space reservation for snapshots at the aggregate level)

'snap list -A' (this will show any existing snapshots at the aggregate level)

'snap sched -V' (this will show the scheduled snapshots for all volumes, along with the retention)

'snap reserve -V' (this will show any space reservation for snapshots at the volume level)

'snap list -V' (this will show any existing snapshots at the volume level)

 

What you should have for maximum available space is 0% snap reserve across the board, no snapshots scheduled or existing on the aggregates, and hopefully minimal snapshots on the volumes.  Also, when you list the volume snapshots you should be able to identify the ones created by the schedules by their names (hourly, nightly, and weekly, with an ordinal number after to indicate the generation).  Additional snapshots you see may be manually created or created by other tools (such as SnapManager).
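
For reference, scheduled snapshots in the 'snap list' output look something like this (an illustrative sketch with an assumed volume name and made-up values, just to show the shape):

snap list vol1

  %/used       %/total  date          name
----------  ----------  ------------  --------
  2% ( 2%)    1% ( 1%)  Mar 10 08:00  hourly.0
  5% ( 3%)    3% ( 2%)  Mar 09 20:00  hourly.1
 12% ( 8%)    7% ( 4%)  Mar 09 00:00  nightly.0

Any name that doesn't follow the hourly.N/nightly.N/weekly.N pattern was created manually or by another tool.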

 

Also, if you have block LUNs on these systems, they may be thick provisioned as well.  You can run 'lun show -v' to list all of your LUNs; look for the Space Reservation attribute and make sure it shows Disabled so they are thin.
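
A sketch of the part to look for (assumed LUN path and size, just to show the shape of the output):

lun show -v
        /vol/vol1/lun0    100g (107374182400)  (r/w, online, mapped)
                Space Reservation: enabled

If a LUN shows Space Reservation: enabled, running 'lun set reservation /vol/vol1/lun0 disable' (same assumed path) should switch it to thin; double-check the syntax on your ONTAP version first.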

 

If everything is already configured properly and you did remove all the excess snapshots (all of them, period), then you may be in a situation where you have to migrate some data off quickly.  You can likely expand your aggregates with some additional disk shelves, but that's not typically a quick decision unless you happen to have some lying around unused.

 

Once you get past this initial emergency space issue, do make sure you set up some monitoring and alerting to ensure you don't get into this situation again.  NetApp has some good tools, and you should be able to use OnCommand Core (previously named DFM) to monitor these 7-Mode systems.

 

Hopefully these help, and good luck!

ARMYofONE

@JamesIlderton wrote:

For the GUI issue, do you have OnCommand System Manager installed on your admin station?  That version of ONTAP did not have an embedded GUI, but if you install it locally you can add the filers and manage them via your browser.

 

I have tried and failed. There are dependencies that must be installed on our RHEL 6 system that we currently do not have. I get servlet errors when I try to go to my NetApp IP address.


 

JamesIlderton

This is the link to OnCommand System Manager for Linux:

https://mysupport.netapp.com/NOW/download/software/systemmgr_lin/3.1.3/

 

Or, if you have a Windows system (or VM) with network access to the NetApp, use the Windows version:

https://mysupport.netapp.com/NOW/download/software/systemmgr_win/3.1.3/

 

I use the Windows version and know I have to ensure I have both the 32- and 64-bit versions of Java installed, but otherwise it's pretty easy and lightweight.  But there's really nothing in the GUI that you can't do in the CLI (at least in your older version of ONTAP; the newest builds have some pretty cool workflows in the embedded System Manager).

ARMYofONE

@JamesIlderton wrote:

For the GUI issue, do you have OnCommand System Manager installed on your admin station?  That version of ONTAP did not have an embedded GUI, but if you install it locally you can add the filers and manage them via your browser.

 

Along the same lines as the other replies, please look at the output of these commands:

'snap sched -A' (this will show the scheduled snapshots for all aggregates, along with the retention)

set to 0

'snap reserve -A' (this will show any space reservation for snapshots at the aggregate level)

all set to 0

'snap list -A' (this will show any existing snapshots at the aggregate level)

All are deleted

 

'snap sched -V' (this will show the scheduled snapshots for all volumes, along with the retention)

snap sched set to 0 2 6@8,12,16,20

 

'snap reserve -V' (this will show any space reservation for snapshots at the volume level)

snap reserve set to 5% on all Volumes

 

'snap list -V' (this will show any existing snapshots at the volume level)

Shows all of the snapshots from the schedule above for each volume.

 

What you should have for maximum available space is 0% snap reserve across the board, no snapshots scheduled or existing on the aggregates, and hopefully minimal snapshots on the volumes.  Also, when you list the volume snapshots you should be able to identify the ones created by the schedules by their names (hourly, nightly, and weekly, with an ordinal number after to indicate the generation).  Additional snapshots you see may be manually created or created by other tools (such as SnapManager).

So the volumes should not be set to 5% snap reserve?

 

 

 

Also, if you have block LUNs on these systems, they may be thick provisioned as well.  You can run 'lun show -v' to list all of your LUNs; look for the Space Reservation attribute and make sure it shows Disabled so they are thin.

lun show -v returns a blank command line

 

If everything is already configured properly and you did remove all the excess snapshots (all of them, period), then you may be in a situation where you have to migrate some data off quickly.  You can likely expand your aggregates with some additional disk shelves, but that's not typically a quick decision unless you happen to have some lying around unused.

Well, since my CPUs are spiking high when I run sysstat -M, I do think that adding an additional shelf may crash this large NetApp. We have moved all mission systems and data over to a VERY old NetApp shelf that is much smaller, and things are running excellent there: no issues and no aggregates filling up.

 

The data currently on there is nowhere near its capacity. We also zip and compress data off to another NetApp disk shelf monthly.

 

Once you get past this initial emergency space issue, do make sure you set up some monitoring and alerting to ensure you don't get into this situation again.  NetApp has some good tools, and you should be able to use OnCommand Core (previously named DFM) to monitor these 7-Mode systems.

Does this install on a Red Hat system?

JamesIlderton

So your aggregate snapshot settings look good, and your snap schedule for volumes seems to be the default.  As for the snap reserve on volumes, it's a default that everyone has different opinions on, but in your current situation I'd recommend setting them all to 0.  This will release any reserve space not used by snapshots and will allow ONTAP to delete snapshots to free space before taking your volumes offline due to getting full.
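
If you go that route, it's one command per volume; a sketch (volname is a placeholder for each of your volumes):

snap reserve volname 0

Afterwards, 'df -g' should show the reclaimed reserve as additional available space in each volume.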

 

The blank response from 'lun show' tells us you only have NAS volumes; no block LUNs have been provisioned (those would be Fibre Channel or iSCSI disks presented to hosts).  No need to worry about that then.

 

So you have already moved (not just copied) mission-critical data to another NetApp system?  Keep in mind that each shelf is not a NetApp filer/controller, and most systems can support many shelves; it's not the capacity usage that ties up the CPU (well, except during large volume deduplication runs) but rather the IOPS and throughput requests from hosts.

 

And yes, OnCommand Core can be installed on a Linux system (https://mysupport.netapp.com/NOW/download/software/occore_lin/5.2.4/) and you then connect to it via a browser.

 

It sounds like you may need either a partner or a NetApp SE to help you review your system further and see what needs to be done to handle the capacity of data you have on there.  One thing I do know is that ONTAP 8.2.2 will hit the maximum five years of support in September 2019, so you'll have to upgrade at least the software by then to continue to get official support from NetApp.

JGPSHNTAP

aggr status -s (show spares)

 

disk show -n (show unowned)

 

df -Ag (show aggregates in GB)

 

 

ARMYofONE

@JGPSHNTAP wrote:

aggr status -s (show spares)

sc1 has an FSAS and a SAS spare (I just recently replaced those two drives because they were bad)

sc2 has no spares

disk show -n (show unowned)

No unassigned disks.

df -Ag shows aggr in gb

sc1

Aggregate          total      used     avail   capacity
aggr0             3346GB    3190GB     155GB        95%
aggr0/.snapshot        0         0         0         0%

aggr1            60240GB   60240GB         0       100%
aggr1/.snapshot        0         0         0         0%

  

sc2

Aggregate          total      used     avail   capacity
aggr0             3346GB    3190GB     155GB        95%
aggr0/.snapshot        0         0         0         0%

aggr1            63587GB   62028GB    1559GB        98%
aggr1/.snapshot        0         0         0         0%

aggr2            53547GB   51915GB    1631GB        97%
aggr2/.snapshot        0         0         0         0%

 

 


 

JGPSHNTAP

Ok, bad news. Since you don't have spares and your aggrs are at or near 100%, you are up **** creek. You have no outs here, especially if all your vols are already thin provisioned.

 

If your vols aren't thin provisioned, you might be able to squeeze out enough time to get out of this mess; otherwise, the system will be full in no time and you will be out of luck... Sorry.

bsnyder27

@JGPSHNTAP wrote:

If your vols aren't thin provisioned, you might be able to squeeze out enough time to get out of this mess; otherwise, the system will be full in no time and you will be out of luck... Sorry.


Definitely something to check on again, I agree. I don't see any mention of checking the space guarantee option on the volumes in the data aggrs. This could potentially give you a lot of aggr space back if your volumes are not thin provisioned.

 

As JGPSHNTAP noted early on, you'll want to run SSH commands to obtain this information. Probably something with 'vol status' and grep for the word 'guarantee'. If you see a lot of guarantee=volume, then you have the ability to thin provision them.
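
A hedged sketch of what that could look like from your RHEL box (filer hostname assumed; the 7-Mode CLI itself has no grep, so pipe the SSH output through your local grep):

ssh root@filer1 "vol status -v" | grep -i guarantee

Each volume's options should include either guarantee=volume (thick) or guarantee=none (thin).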

 

And to explain the option if you're unfamiliar: volume guarantee grants all of the volume's configured size up front, so a 1 TB volume takes 1 TB from the aggregate it's contained in. Guarantee none only takes the volume's used space from the aggregate, so that same 1 TB volume with only 50 GB in use takes just 50 GB from the aggregate.

 

Just be sure to keep a close eye on your aggregates when doing this for obvious reasons.

ARMYofONE

Trying to figure out the commands to check that. I also have SIS on the volumes and aggrs, but NetApp support said that running inline compression and deduplication on HDDs is not good.

 

Tried to grep but no joy.

 

bsnyder27

Just a simple 'vol status' from the CLI on each node should dump out the status for all volumes, and you can then look line by line to see what guarantee is set to for each volume.

JGPSHNTAP

Do me a favor, run this:

 

df -Vg

 

and then pick one volume, run vol options (volumename), and paste the output here
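
For reference, 'vol options' prints a comma-separated list of option=value pairs; the token that matters here is the guarantee setting (a sketch with most options elided):

vol options volumename
nosnap=off, nosnapdir=off, ..., guarantee=volume, ...

guarantee=volume means the volume is thick provisioned; guarantee=none means thin.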

ARMYofONE

Volume              total      used      avail   capacity   Mounted on
/vol/events1      58368GB   46480GB    11887GB        80%   /vol/events1

 

The other volume, vol0, is at 0%,

and events2 is at 88%.

 

temp_storage is at 195. All are mounted under /vol, and the .snapshots are all at 0%. This is my Storage Controller 2.

JGPSHNTAP

Ok, somewhat of a good sign here.

 

Seems you are thick.

 

Run this command for me and paste the output here:

 

aggr show_space -g

 

 

ARMYofONE

Got it into an attached PDF for you. Hopefully it attaches!!

ARMYofONE


I did label them Storage Controller 1 and Storage Controller 2 on the paper.

JGPSHNTAP

Ok, I confirmed you are 100% thick.

 

You can thin provision the volumes in order to keep the cluster up and come up with a plan to migrate or get more space, but I would talk to mgmt first

 

the command to thin them is

 

vol options volname guarantee none
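
For example, using one of the volume names from earlier in the thread, thin one volume and then confirm the aggregate actually got space back:

vol options events1 guarantee none
df -Ag

The avail column for that volume's aggregate should grow by roughly the volume's reserved-but-unwritten space.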

 

The danger in going thin is that if the data grows beyond the space you actually have, you will take the system down for sure.

 

Good Luck

ARMYofONE

Ok, I will check and see what I can do. Thanks. I had thought that deleting the large snapshots from the volumes would free space on the aggregates, since both volumes had snapshots that were not set to autodelete, and the .snapshot usage on the volume side was in the 600-800% range for each /vol/whatever/.snapshot/. But even after I deleted the snapshots, the aggregates did not relinquish any of the space.

JGPSHNTAP

That doesn't matter: if you are thick provisioned and 100% full, then deleting snaps only frees space inside the volumes, not in the aggregates.

 

So, like I said, you can either thin the vols or reduce the size of the vols.

 

ARMYofONE

Thanks. Currently having the developers delete lots of old files and other things.
