Data Infrastructure Management Software Discussions


Need some help!!

I am not a Storage Administrator; I work on Red Hat servers and systems. However, since I am "IT," it is now in my lane to work these issues. I was experiencing issues with snapshots building up to 600-800% on all my /vol .snapshot areas. I have since turned autodelete on for the volumes that had it set to off, and I deleted ALL snapshots.

 

On my SC1, aggr0 is at 95% and aggr1 is at 100%,

 

and on my SC2 aggr0 is at 95%, aggr1 98% and aggr2 97%.

 

All the aggregates' /.snapshot usage is 0%.

 

My version is NetApp Data ONTAP 8.2.2 7-Mode, on DS4243 and DS4246 disk shelves. I am working from the command line only, since I cannot seem to get the GUI to work via Firefox.

 

Any help is greatly appreciated.

 

I do see SIS errors where SIS is requesting 4 KB of space and only 1 KB is available. Like I said, I am new to NetApp. The case number for this is 2007726187, but since I am DoD it is assigned to a secured team for support. I do not think I need onsite engineer support from NetApp, though; I never needed it for HP, Windows, VMware, Red Hat, Pure Storage, etc.

 

Thanks

37 REPLIES

Re: Need some help!!

Hi,

Your aggregates are full / almost full and this is never good on a NetApp system.

Aggregate-level snapshots are at 0%, but that doesn't mean your volume-level snapshots are not using any space. You can look at the df output to see space utilisation broken down for each volume and its snapshot area (you can post it here if it isn't very long).
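
For example (standard 7-Mode commands, run on each controller):

df -h   (shows every volume and its .snapshot usage in human-readable units)
df -Ah   (the same view, but for the aggregates)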

SIS is a deduplication process, which most likely can't complete due to lack of space within the aggregate.
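
To see what deduplication is doing, something like this should be enough:

sis status   (state and progress of dedupe on every SIS-enabled volume)
df -s   (space saved by dedupe, per volume)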


Re: Need some help!!

Ok, let me see if I can try to assist you.

 

A 100% full aggr is very bad.

 

Are all your vols' snapshot reserves set to zero, or do you have a reserve?  Are all your volumes thin provisioned?

Why do you have so many snaps? 

 

SIS is your dedup, not snaps.

 

snap reserve

snap sched

vol status (look for guarantee = none)

 

You're going to need to SSH into it to see some data
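
For example, from an SSH session on each controller:

snap reserve   (snapshot reserve for every volume)
snap sched   (snapshot schedules for every volume)
vol status -v   (verbose status for every volume; look for guarantee=none)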


Re: Need some help!!

df shows

 

All volume .snapshots are at 0%. These used to be over 400, 500, and 600 percent, but I did delete them all. Could they still be stored on the aggregates?

 

sc1 /vol/vol0/ 0% capacity

sc1 /vol/vol0/.snapshot is 0%

sc1 /vol/data/ 73%

sc1 /vol/data/.snapshot 0%

 

sc2 /vol/vol0 0%

sc2 /vol/vol0/.snapshot 0%

sc2 /vol/events1/ 80%

sc2 /vol/events1/.snapshot 0%

sc2 /vol/events2/ 65%

sc2 /vol/events2/.snapshot 1%

sc2 /vol/temp_storage/ 19%

sc2 /vol/temp_storage/.snapshot 0%

 

Hope this helps

 


Re: Need some help!!

Are all your vols' snapshot reserves set to zero, or do you have a reserve?  Are all your volumes thin provisioned? Not sure on that at all.

 

Why do you have so many snaps? Unknown. I could not tell you at all.

 

SIS is your dedup, not snaps.

 

snap reserve set to 5% on all

snap sched set to 0 2 6@8,12,16,20

vol status (look for guarantee = none) shows

sc1 vol0             raid_dp, flex, 64-bit                 root, create_ucode=on
sc1 data             raid_dp, flex, sis, 64-bit            create_ucode=on, convert_ucode=on

sc2 vol0             raid_dp, flex, 64-bit                 root, create_ucode=on
sc2 events1          raid_dp, flex, degraded, sis, 64-bit  create_ucode=on, convert_ucode=on
sc2 events2          raid_dp, flex, sis, 64-bit            create_ucode=on, convert_ucode=on
sc2 temp_storage     raid_dp, flex, sis, 64-bit            create_ucode=on, convert_ucode=on

All of them say online.


Re: Need some help!!

Deleting all your snaps is not a good idea, especially if you don't know exactly what you are doing on NetApp.

 

I would only delete snapshots to save production, which it sounds like you might have done inadvertently.

 

After deleting your snaps, are your aggrs still 100%?

 

Also do

priv set adv

sis stat
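
If the abbreviation doesn't take, the full privilege commands should (sis stat is an advanced/diag-level command, so drop back to admin afterwards):

priv set advanced
sis stat
priv set admin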

 

 


Re: Need some help!!

Yeah, I was told to run snap delete -a, I think, and delete all snapshots.

 

Aggregates are still 95-100%. I believe the aggregates were not that high prior to deleting the volume snapshots.

 

priv set adv did not work, but priv set diag did.

 

sis stat shows the following

sc1 /vol/data            40TB allocated      Saving 143GB      %Saved is 0%

sc2 /vol/events1         45TB allocated      Saving 361GB      %Saved is 0%
sc2 /vol/events2         27TB allocated      Saving 2848GB     %Saved is 9%
sc2 /vol/temp_storage    911GB allocated     Saving 1566GB     %Saved is 63%


Re: Need some help!!

Make sure your snapshots are off at the aggr level

 

snap sched -A
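
If that shows a schedule, setting it to zero per aggregate should turn them off (aggr0 here is just one of your aggregate names; repeat for each):

snap sched -A aggr0 0 0 0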

 

 


Re: Need some help!!

They are currently on. But snap list -A shows 2% and 3% max. Is it possible when I deleted the volume snapshots that were over 500, 600 and 700% that they are somehow still being stored at the aggregate level?

 


Re: Need some help!!

I am unsure how to turn off the snapshots on the Aggregates.

 

Also, SIS keeps getting an error, but NetApp tech support said I should never use disk compression or inline deduplication since I have spinning HDDs and not SSDs.

 

I do believe inline compression is on though, not sure about inline deduplication.


Re: Need some help!!

For the GUI issue, do you have OnCommand System Manager installed on your admin station?  That version of ONTAP did not have an embedded GUI, but if you install it locally you can add the filers and manage them via your browser.

 

Along the same lines as the other replies, please look at the output of these commands:

'snap sched -A' (this will show the scheduled snapshots for all aggregates, along with the retention)

'snap reserve -A' (this will show any space reservation for snapshots at the aggregate level)

'snap list -A' (this will show any existing snapshots at the aggregate level)

'snap sched -V' (this will show the scheduled snapshots for all volumes, along with the retention)

'snap reserve -V' (this will show any space reservation for snapshots at the volume level)

'snap list -V' (this will show any existing snapshots at the volume level)

 

What you should have for maximum available space would be 0% snap reserve across the board, no snapshots scheduled/existing on the aggregates, and hopefully minimal snapshots on the volumes.  Also, when you list the volume snapshots, you should be able to identify snapshots created by the schedules by their names (hourly, daily and weekly, with an ordinal number after to indicate the generation).  Additional snapshots you see may be manually created or created by other tools (such as SnapManager).
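
Getting there would look something like this (aggr0 and data are just example names from your earlier output; repeat for each aggregate and volume, and only delete snapshots you are certain you no longer need):

snap sched -A aggr0 0 0 0   (no scheduled aggregate snapshots)
snap reserve -A aggr0 0   (no aggregate snap reserve)
snap delete -A -a aggr0   (remove any existing aggregate snapshots)
snap reserve data 0   (no volume snap reserve)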

 

Also, if you have block LUNs on these systems, they may also be thick provisioned.  You can run 'lun show -v' to list all of your LUNs and look for the attribute Space Reservation and make sure they show Disabled so they are thin.
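
For example (the LUN path below is hypothetical; use whatever paths lun show -v returns):

lun show -v
lun set reservation /vol/data/lun0 disable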

 

If everything is already configured properly and you did remove all the excess snapshots (all of them, period), then you may be in a situation where you have to migrate some data off quickly.  You can likely expand your aggregates with some additional disk shelves, but that's not typically a quick decision unless you happen to have some lying around unused.
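
If you do end up with spare disks or shelves, growing an aggregate is a one-liner (aggr1 and the disk count are just examples, and the disks must already be owned by that controller):

aggr add aggr1 4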

 

Once you get past this initial emergency space issue, do make sure you set up some monitoring and alerting to ensure you don't get into this situation again.  NetApp has some good tools, and you should be able to use OnCommand Core (previously named DFM) to monitor these 7-Mode systems.

 

Hopefully these help, and good luck!


Re: Need some help!!

BTW, ONTAP 8.2.2 didn't support any inline efficiencies, so you likely only have post-process deduplication (SIS).


Re: Need some help!!

Forget about SIS.  You don't have enough headroom in aggr or vol for metadata.  That's not your issue.

 

So what is your current status?  How much room is in your aggr?

Turn off snaps below:

snap sched -A 0

 

They are usually useless.

 

But what is your current space status, that's more important.

 

Also, is your system SATA/SAS or flash, b/c inline dedupe and inline compression aren't really good on the first two.

 

Also, you are on 7-Mode so things are different anyway, and your SIS output shows you aren't saving much.

 

Are these LUNS or Files?
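
A quick way to tell (standard 7-Mode commands; whichever of these returns entries is what the system is serving):

lun show   (lists any LUNs; empty output means no block LUNs)
exportfs   (lists NFS exports)
cifs shares   (lists CIFS shares)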

 


Re: Need some help!!


So what is your current status?  How much room is in your aggr?

SC1 my aggr0 is at 95% and aggr1 100% 

and on my SC2 aggr0 is at 95%, aggr1 98% and aggr2 97%.

 

All the aggregates' /.snapshot usage is 0%.

 

Turn off snaps below:

snap sched -A 0

Done

 

They are usually useless.

 

But what is your current space status, that's more important.

 

df -A shows the following:

sc1
Aggregate            kbytes         used           avail        capacity
aggr0                3509265040     3345887780     163377260    95%
aggr0/.snapshot      0              0              0            0%
aggr1                63166770432    63166770432    0            100%
aggr1/.snapshot      0              0              0            0%

sc2
Aggregate            kbytes         used           avail        capacity
aggr0                3509265040     33458240384    163366964    95%
aggr0/.snapshot      0              0              0            0%
aggr1                66676035480    65041293188    1634742292   98%
aggr1/.snapshot      0              0              0            0%
aggr2                56148240384    54437240556    1710999828   97%
aggr2/.snapshot      0              0              0            0%

 

Also, is your system SATA/SAS or flash, b/c inline dedupe and inline compression aren't really good on the first two.

Definitely not flash.

 

 

Are these LUNS or Files?

Unsure how to check for this. I am all Red Hat systems; I want to say files, but I do not know how to check.

 


 


Re: Need some help!!

For the GUI issue, do you have OnCommand System Manager installed on your admin station?  That version of ONTAP did not have an embedded GUI, but if you install it locally you can add the filers and manage them via your browser.

 

Along the same lines as the other replies, please look at the output of these commands:

'snap sched -A' (this will show the scheduled snapshots for all aggregates, along with the retention)

set to 0

'snap reserve -A' (this will show any space reservation for snapshots at the aggregate level)

all set to 0

'snap list -A' (this will show any existing snapshots at the aggregate level)

All are deleted

 

'snap sched -V' (this will show the scheduled snapshots for all volumes, along with the retention)

snap sched set to 0 2 6@8,12,16,20

 

'snap reserve -V' (this will show any space reservation for snapshots at the volume level)

snap reserve set to 5% on all Volumes

 

snap list -V (this will show any existing snapshots at the volume level)

 shows all from the schedule above for each volume

 

What you should have for maximum available space would be 0% snap reserve across the board, no snapshots scheduled/existing on the aggregates, and hopefully minimal snapshots on the volumes.  Also, when you list the volume snapshots, you should be able to identify snapshots created by the schedules by their names (hourly, daily and weekly, with an ordinal number after to indicate the generation).  Additional snapshots you see may be manually created or created by other tools (such as SnapManager).

So the volumes should not be set to 5% snap reserve?

 

 

 

Also, if you have block LUNs on these systems, they may also be thick provisioned.  You can run 'lun show -v' to list all of your LUNs and look for the attribute Space Reservation and make sure they show Disabled so they are thin.

lun show -v returns a blank command line

 

If everything is already configured properly and you did remove all the excess snapshots (all of them, period), then you may be in a situation where you have to migrate some data off quickly.  You can likely expand your aggregates with some additional disk shelves, but that's not typically a quick decision unless you happen to have some lying around unused.

Well, since my CPUs are spiking high when I run sysstat -M, I do think that adding an additional shelf may crash this large NetApp. We have moved all mission systems and data over to a VERY old NetApp shelf that is much smaller, and things are running excellently there: no issues and no aggregates filling up.

 

The data currently on there is nowhere near capacity. We also zip and compress data monthly off to another NetApp disk shelf.

 

Once you get past this initial emergency space issue, do make sure you set up some monitoring and alerting to ensure you don't get into this situation again.  NetApp has some good tools, and you should be able to use OnCommand Core (previously named DFM) to monitor these 7-Mode systems.

Does this install on a Red Hat system?


Re: Need some help!!


@JamesIlderton wrote:

For the GUI issue, do you have OnCommand System Manager installed on your admin station?  That version of ONTAP did not have an embedded GUI, but if you install it locally you can add the filers and manage them via your browser.

 

I have tried and failed. There are dependencies that must be installed on our RHEL6 system that we currently do not have. I get servlet errors when I try to go to my NetApp IP address.


 
