Data Backup and Recovery
Initially I got errors (0xc00408d3, 0xc00402c2, and 0x00402206) during a daily snapshot of a SQL database through SnapManager for SQL. I called NetApp support many times without much help, and it has been 3 weeks since the initial call. I really need help getting this issue solved. I deleted all backups through SnapManager for SQL but saw no change in the volume's available size. I also deleted some snapshots through FilerView and still saw no change in the volume's available size.
What have I not done right?
I have had this problem before and had to grow the volume in order for it to sort itself out. Once it is running again you can shrink it back. {FlexVol assumed} The problem is with how LUNs use the reserve. The snaps go into the volume's data space first, then fill up the reserve space.
If you connect to the console you can use the command
df -V -g
To view all the volumes on the filer and how the space {in GB} is used. Use the command
df -A -g
To see how much space you have in the aggregates. Once aggregates are above 80% full you are heading for pain, but that's another story...
vol size {volname} +5g
Will grow the volume by 5 GB
vol size {volname} -5g
Will shrink it back by 5 GB
Good luck
Are you saying that I have to increase the Volume for it to clean up (shrink) itself back? I'm new to NetApp SAN. I just increased the volume by 10 GB. How long will it take to shrink itself back? How do I know that increasing by 10 GB is enough or not?
Thanks,
I increased the volume by 10GB. Now I can make a snapshot without any error. I deleted about 2/3 of all snapshots. It's been over 5 hours, but the volume consumption has not gone down.
This can happen in two scenarios that I have seen thus far:
The latter seems to be the most probable. To confirm this theory, try these two commands:
In the "aggr" command, the output is self-explanatory. In the latter, if you see ANY volume that has guarantee set to "none", that volume can be sized above and beyond the physical confines of the aggregate.
Please confirm.
All volume status show "guarantee"
volume=guarantee.
What version of ONTAP? Do the numbers in "aggr show_space" look as expected?
I think the version is 7.2.4L1. I expected the log volume (the trouble volume) to consume less space, since I deleted about 2/3 of the snapshots within the volume. It's been over 5 hours, but the volume consumption has not gone down. I also increased the volume by 10GB. Now I can make a snapshot without any error, but I don't see any change in the volume consumption.
Check space reservations on your luns. If you have space reservations enabled, the luns will require twice their size in the volume for lun snapshots. There is a whole section on space reservations in the ONTAP documentation which will do you more justice than I can do in this little box 😉
Here's a link with respect to your ONTAP release:
http://now.netapp.com/NOW/knowledge/docs/ontap/rel724L1/html/ontap/bsag/4cr-f3.htm
The command "lun show -v" will tell you if space reservation is enabled on your luns.
Let me know if this seems like the issue; I'm now suspecting it is.
It's typically best practice to enable space reservations on luns for the reasons cited in the documentation link I referenced. There is also a concept called thin provisioning which, among other things, disables space reservations. Each has its advantages and drawbacks, which I recommend you become intimately familiar with before changing anything. Here is a link on Thin Provisioning:
http://www.netapp.com/us/library/technical-reports/tr-3483.html
Just for further clarification. This is about volumes with luns, right? And the problem is that you are deleting data inside your luns (from the host) and some lun snapshots and not getting space back on the storage system?
Please clarify.
1. I used FilerView. I went to Volumes -> Snapshots -> Manage, selected two thirds of the snapshots within the trouble volume, and deleted them.
2. I also used SnapManager for SQL Server -> Action -> Delete Backup... -> selected the LUN within the trouble volume, selected delete oldest backups in excess of "1", then deleted. (I'm not really sure whether this method deletes data inside the LUN or not. A NetApp support person told me that it deletes the snapshots, which didn't make sense to me.)
Data inside the lun can only be managed from the host that is using the lun.
But I think the remaining disconnect is in what you are expecting to see and not seeing it. Perhaps there is no space to be recovered at all. You can take 200 snapshots of your lun this very minute and take up hardly any space at all beyond what you are taking up. Then delete them once more and not see any gains.
Do understand that with luns, that first snapshot will take up exactly the space of the lun. So a 300GB lun with snapshots will require a +600GB volume with space reservations. In having a 300GB lun in a 550GB volume, you WILL NOT be able to take a snapshot and you WILL get a space error.
But do the math and look at the volume, lun and snapshots carefully and consider what I just said. Is there really space to recover?
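To make that math concrete, here is a rough sketch of the sizing rule described above. It assumes the overwrite reserve is 100% of the lun size (the "twice their size" case); the function names and the `fractional_reserve_pct` parameter are made up for illustration and are not ONTAP commands.

```python
def min_volume_gb(lun_gb, fractional_reserve_pct=100):
    """Smallest volume (GB) that holds the lun plus its overwrite reserve.

    fractional_reserve_pct is a hypothetical parameter for this sketch;
    100 corresponds to the 'twice the lun size' rule quoted in the thread.
    """
    return lun_gb + lun_gb * fractional_reserve_pct / 100


def can_snapshot(volume_gb, lun_gb, fractional_reserve_pct=100):
    """True if the volume is large enough for a first lun snapshot."""
    return volume_gb >= min_volume_gb(lun_gb, fractional_reserve_pct)


needed = min_volume_gb(300)           # the 300GB lun from the example
fits_in_550 = can_snapshot(550, 300)  # the 550GB volume from the example
print(needed, fits_in_550)
```

This ignores the space the snapshots themselves consume as data changes, so treat the result as a lower bound rather than a sizing recommendation.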
See if these links help:
http://now.netapp.com/NOW/knowledge/docs/ontap/rel724L1/html/ontap/onlinebk/2snap8.htm
https://now.netapp.com/Knowledgebase/solutionarea.asp?id=kb39654
http://now.netapp.com/NOW/knowledge/docs/ontap/rel724L1/html/ontap/bsag/4cr-f5.htm
I think I'm getting close to the solution. Thanks for your advice and info. But I still can't determine which snapshot consumes the most space on the volume. When I deleted the snapshot with 28% used and 15% total, the 28% used and 15% total moved to the previous snapshot. Do I have to delete ALL snapshots in order to free up the space? The following are the results of the df and snap list commands.
bossana>df
/vol/mexico_log2/ 73400320 71608900 1791420 98% /vol/mexico_log2/
/vol/mexico_log2/.snapshot 0 12355224 0 ---% /vol/mexico_log2/.snapshot
-----------------------------------------------------------------------------------------------------------
BOSSANA> snap list mexico_log2
Volume mexico_log2
working...
%/used %/total date name
---------- ---------- ------------ --------
0% ( 0%) 0% ( 0%) Sep 08 12:57 sqlsnap__mexico_09-08-2008_12.56.24
28% (28%) 15% (15%) Aug 19 01:15 sqlsnap__mexico_08-19-2008_01.15.00__daily
29% ( 2%) 16% ( 1%) Aug 18 01:15 sqlsnap__mexico_08-18-2008_01.15.00__daily
29% ( 0%) 16% ( 0%) Aug 17 12:30 sqlsnap__mexico_08-17-2008_12.30.00__weekly
29% ( 0%) 16% ( 0%) Aug 17 01:15 sqlsnap__mexico_08-17-2008_01.15.00__daily
29% ( 0%) 16% ( 0%) Aug 16 01:15 sqlsnap__mexico_08-16-2008_01.15.00__daily
30% ( 0%) 16% ( 0%) Aug 15 01:15 sqlsnap__mexico_08-15-2008_01.15.00__daily
30% ( 1%) 17% ( 0%) Aug 14 01:15 sqlsnap__mexico_08-14-2008_01.15.00__daily
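For anyone reading these numbers later, here is a small sketch that parses the output above. It assumes, as I read the ONTAP documentation, that the first figure in each column is cumulative and the figure in parentheses is for that snapshot alone, and that df reports sizes in KB. The helper names are hypothetical; nothing here runs on the filer.

```python
import re

# Rows copied from the snap list output in the post above.
SNAP_LIST = """\
0% ( 0%) 0% ( 0%) Sep 08 12:57 sqlsnap__mexico_09-08-2008_12.56.24
28% (28%) 15% (15%) Aug 19 01:15 sqlsnap__mexico_08-19-2008_01.15.00__daily
29% ( 2%) 16% ( 1%) Aug 18 01:15 sqlsnap__mexico_08-18-2008_01.15.00__daily
29% ( 0%) 16% ( 0%) Aug 17 12:30 sqlsnap__mexico_08-17-2008_12.30.00__weekly
29% ( 0%) 16% ( 0%) Aug 17 01:15 sqlsnap__mexico_08-17-2008_01.15.00__daily
29% ( 0%) 16% ( 0%) Aug 16 01:15 sqlsnap__mexico_08-16-2008_01.15.00__daily
30% ( 0%) 16% ( 0%) Aug 15 01:15 sqlsnap__mexico_08-15-2008_01.15.00__daily
30% ( 1%) 17% ( 0%) Aug 14 01:15 sqlsnap__mexico_08-14-2008_01.15.00__daily
"""

# Figures copied from the df output in the post above (KB).
VOLUME_TOTAL_KB = 73400320
SNAPSHOT_USED_KB = 12355224

SNAP_ROW = re.compile(
    r"^\s*(\d+)%\s*\(\s*(\d+)%\)\s+(\d+)%\s*\(\s*(\d+)%\)\s+"
    r"(\w{3} \d{2} \d{2}:\d{2})\s+(\S+)\s*$"
)


def parse_snap_list(text):
    """Return one dict per snapshot row, newest first as listed."""
    rows = []
    for line in text.splitlines():
        m = SNAP_ROW.match(line)
        if m:
            rows.append({
                "used_cum": int(m.group(1)),   # cumulative %/used
                "used_own": int(m.group(2)),   # this snapshot's %/used
                "total_cum": int(m.group(3)),  # cumulative %/total
                "total_own": int(m.group(4)),  # this snapshot's %/total
                "date": m.group(5),
                "name": m.group(6),
            })
    return rows


rows = parse_snap_list(SNAP_LIST)
biggest = max(rows, key=lambda r: r["used_own"])
snap_pct_of_volume = 100 * SNAPSHOT_USED_KB / VOLUME_TOTAL_KB
snap_gib = SNAPSHOT_USED_KB / (1024 * 1024)

print("largest own share:", biggest["name"])
print("snapshots as % of volume:", round(snap_pct_of_volume))
print("snapshot space in GiB:", round(snap_gib, 2))
```

The df snapshot figure divided by the volume size lines up with the last cumulative %/total value, which suggests the two commands are describing the same space.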
There are two ONTAP commands that allow you to see that:
snap delta
snap reclaimable
Play with those and see how you can determine what you are looking for.
Lemme know .....
Thanks for all your help. Those commands and article links you gave me were very useful. They led me to my solution. And I'd like to emphasize that this is only my solution; it may not suit others. I simply turned on SNAP AUTODELETE for the trouble volume and it started deleting snapshots until it reached 20% free space (the default). Since this SQL server is for research, not production, I can take more risks. I need to learn more about managing space on a NetApp SAN for the future. BTW I set up a nightly backup job on the SQL server for all databases. Even if I lose this server, I still have backups of all databases.
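For readers wondering what "deleting snapshots until it reached 20% free space" amounts to, here is a toy model of that behaviour. It is not ONTAP's actual algorithm: it assumes oldest-first deletion and that each snapshot frees a fixed, known amount when deleted, and the snapshot names and sizes are invented for illustration.

```python
def autodelete(total_gb, used_gb, snapshots, target_free_pct=20):
    """Delete oldest snapshots first until free space reaches the target.

    snapshots: list of (name, reclaimable_gb) tuples, oldest first.
    Returns (deleted_names, used_gb_after).
    """
    deleted = []
    remaining = list(snapshots)
    while remaining and 100 * (total_gb - used_gb) / total_gb < target_free_pct:
        name, reclaimable_gb = remaining.pop(0)  # oldest remaining snapshot
        used_gb -= reclaimable_gb
        deleted.append(name)
    return deleted, used_gb


# Hypothetical 70 GB volume with 68 GB used and four snapshots.
snaps = [("snap_a", 2), ("snap_b", 1), ("snap_c", 10), ("snap_d", 1)]
deleted, used_after = autodelete(70, 68, snaps)
print(deleted, used_after)
```

In practice the space a snapshot frees depends on which blocks it shares with its neighbours, so real results will differ from this simple model.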
Awesome. You are quite welcome. Glad to hear this is resolved for you.