ONTAP Discussions

please explain to me the snap list on the source filer in snapmirror

netappmagic

vol1 is 100% full again. My question is not about solving the volume-full issue, but about understanding SnapMirror at a more granular level. Please see the following outputs on the source filer "filer2". It seems to me that the snapshot created by SnapMirror is 1911GB, and the rest of the space is taken by the volume (filesystem) itself. Could anybody please explain to me in detail the output of "snap list vol1"?

-  what exactly does the snapshot include: a complete copy of vol1, plus all snapshots since the first full copy? Why do I have to keep the full copy on the source filer after it has already been copied over to the DR site?

- has this listed snapshot already been copied to drfiler1, or just the full set of snapshots?

- is there any way to list the data in detail, showing which is the full copy of the volume and which are the snapshots, and when each snapshot was taken?

thanks for your help!

filer2> df -rg vol1
Filesystem               total       used      avail   reserved  Mounted on
/vol/vol1/       8193GB     8148GB        0GB        0GB  /vol/vol1/
/vol/vol1/.snapshot        0GB     1911GB        0GB        0GB  /vol/vol1/.snapshot
filer2> snap list vol1
Volume vol1
working...

  %/used       %/total  date          name
----------  ----------  ------------  --------
23% (23%)   23% (23%)  Oct 09 06:00  drfiler(0151735037)_vol1.806 (snapmirror)

28 REPLIES

aborzenkov

Snapshot unique data consists of blocks that were deleted or overwritten since the snapshot was taken. I do not think you can see whether a snapshot was fully transferred in the snap list output - you need to check snapmirror status. Indirectly, though: while a SnapMirror transfer is in progress you see two snapshots on the source, so since you see only one we may assume the transfer finished (but whether successfully or not we do not know).
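
For illustration only - while a transfer is in progress, snap list on the source would show something like this (hypothetical output; the .807 snapshot name just assumes the next one in the sequence):

filer2> snap list vol1
Volume vol1
working...

  %/used       %/total  date          name
----------  ----------  ------------  --------
 0% ( 0%)    0% ( 0%)  Oct 09 06:30  drfiler(0151735037)_vol1.807 (snapmirror)
23% (23%)   23% (23%)  Oct 09 06:00  drfiler(0151735037)_vol1.806 (snapmirror)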

netappmagic

Okay. Where could you see "two snapshots" on the source? I only see one.

The snapmirror relationship was established a month ago. I can only see one line in the output of "snap list". So, does this 1911GB snapshot include all changes/overwrites since the first initialization?

aborzenkov

A snapshot by definition cannot include any changes "since". Its content is frozen at the moment the snapshot is created.

netappmagic

Okay, understood. Thanks for the clarifications. I am sorry, but I still could not fully resolve the questions in my mind. This snapmirror has been scheduled once every half hour on the destination, and it seems to be working according to "snapmirror status".

So, does the following line from "snap list" contain every single snapshot (understood that each is a frozen image copy) since the first initialization? Is that why it is so big, at 1911GB? If all snapshots have already been transferred to the destination (I guess this is how the destination keeps the original copy and all changes made on the source), why do we need to keep all these snapshots on the source?

Thanks for your patience.

filer2> snap list vol1
Volume vol1
working...

  %/used       %/total  date          name
----------  ----------  ------------  --------
23% (23%)   23% (23%)  Oct 09 06:00  drfiler(0151735037)_vol1.806 (snapmirror)

aborzenkov

SnapMirror does not keep all snapshots. It needs only one snapshot - the latest - as a baseline. During the next update it creates a new snapshot and transfers the difference between the baseline and the current snapshot. After that, the old baseline is removed and the last transferred snapshot becomes the new baseline.
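
You can see which snapshot is currently serving as the baseline with snapmirror status -l on the destination. A sketch of the relevant fields, with hypothetical values based on the names in this thread:

drfiler1> snapmirror status -l vol1
Source:            netapp2:vol1
Destination:       drfiler1:vol1
Status:            Idle
State:             Snapmirrored
Lag:               00:30:12
Base Snapshot:     drfiler1(0151735037)_vol1.806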

billshaffer

You are incorrect in assuming the snapshot shown in "snap list" contains every single snapshot since initialization.  On initialization, a "base" snapshot is taken at the source, and that point-in-time data is copied over to the destination volume.  This initialization data includes all existing snapshots, including the base that was just taken.  At this point you've got a "Snapmirrored" relationship.  Remember that, at the time of creation, a snapshot takes no real space, since it just contains pointers back to the original data.  Only as that original data changes are the snapshot's blocks consumed.

When you update the existing relationship, a "differential" snap is taken at the source.  These two source snaps, plus the destination snaps (copied over during the last transfer), are used to compute the data that has changed since the last transfer.  These differences are copied over to the destination volume - including the snap just taken - and then the base snapshot at the source is removed, since the newer snapshot now holds the point-in-time reference image, and it now becomes the base for the next update.  This is what aborzenkov is referring to when he says that you see two source snapshots while a transfer is in progress.
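
To tie that to the commands (both run on the destination filer; a sketch using the filer and volume names from this thread):

drfiler1> snapmirror initialize -S netapp2:vol1 drfiler1:vol1
drfiler1> snapmirror update drfiler1:vol1

initialize does the one-time full baseline transfer; each subsequent update creates the new source snapshot, sends the differences, and then retires the old base.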

So your 1911G snapshot essentially contains the _changes_ to the volume between when the snapshot was taken and now - _not_ the changes since initialization.

You say that the snapmirror has been scheduled for every 30 minutes - but this snapshot is from Oct 9.  The date of the snapshots will update with a successful transfer (and the size will also go down), so I think something is wrong with the schedule.  What does "snapmirror status" show as the lag time?  Will you post the "snap list" output of the destination volume too?

Bill

netappmagic

Hi Bill,

Thank you so much for such detailed explanations, which cleared up quite a few confusions in my mind.

As you indicated, there should be something wrong with the snapmirror, and now I feel the snapshot should not be so big (1911GB). The total volume size is about 8TB.

I am sorry, but I have not accurately stated my situation:

-     a) The snapmirror for this volume is scheduled as follows, not every half hour as I said earlier:
netapp2:vol1 drfiler1:vol1 - 0-59/59 * * *

How should this schedule be interpreted? Does the update start every hour? Could this schedule be the reason the volume fills up every month or so?

-     b) The output of snap list you see here was from 10/09, when the volume got full; I broke the snapmirror off on that day.

The following are the outputs you asked for; again, they are from 10/09:

drfiler1> snap list vol1
Volume vol1
working...

  %/used       %/total  date          name
----------  ----------  ------------  --------
  0% ( 0%)    0% ( 0%)  Oct 09 06:00  drfiler1(0151735037)_vol1.806
  0% ( 0%)    0% ( 0%)  Oct 09 05:59  drfiler1(0151735037)_vol1.805

drfiler1> df -rg vol1
Filesystem               total       used      avail   reserved  Mounted on
/vol/vol1/       8193GB     8135GB       57GB        0GB  /vol/vol1/
/vol/vol1/.snapshot        0GB        0GB        0GB        0GB  /vol/vol1/.snapshot

drfiler1> snapmirror status vol1
Snapmirror is on.
Source                 Destination              State          Lag        Status
netapp2:vol1  drfiler1:vol1  Broken-off     126:33:39  Idle

Thanks again for your patience.

billshaffer

0-59/59 would, to me, indicate that it's going to update at the 59th minute of every hour, though it's kind of a convoluted way to get there.  But the fact that your destination snapshots are a minute apart seems to say it's running every minute, which is a bit extreme.  You should be able to see in /etc/messages and/or /etc/log/snapmirror what it's trying to do and what it's saying.
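
For example, you can read those logs directly on the filer with rdfile (which just prints a file's contents); the entries should show each transfer attempt and any errors:

drfiler1> rdfile /etc/log/snapmirror
drfiler1> rdfile /etc/messages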

Try changing your schedule to 59 * * * (which will run at 1 minute before each hour), resync the relationship, and see what happens.  You will need to grow the source so it can write a new snapshot, but after the sync it should remove that 1911G one.
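
Concretely, that would mean editing the /etc/snapmirror.conf entry on the destination (the fields are source, destination, arguments, then minute/hour/day-of-month/day-of-week) and then resyncing - a sketch:

netapp2:vol1 drfiler1:vol1 - 59 * * *

drfiler1> snapmirror resync -S netapp2:vol1 drfiler1:vol1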

Bill

netappmagic

Got your point.

I could not grow the source, since the aggregate where the volume is located is completely full.

Could I remove the 1911G snapshot first? It seems to me that this snapshot may already be corrupted. If I can, do I then have to reinitialize, or could I resync?

billshaffer

The snapshot itself can't be corrupted.  It can point to data that is corrupted somehow, but that is not the snapshot's fault.

You can remove the existing snapshot, but then you'll have to reinitialize, which will do a full data transfer.  But if that's your only option, then you're kind of stuck.  Any chance of removing some of the "live" filesystem?  Temp files or something?
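
If you do go that route, the sequence would be roughly as follows (snapshot name taken from your snap list output; note the destination volume has to be restricted before the initialize):

filer2> snap delete vol1 drfiler(0151735037)_vol1.806
drfiler1> vol restrict vol1
drfiler1> snapmirror initialize -S netapp2:vol1 drfiler1:vol1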

Bill

netappmagic

Okay. One thing I am not so sure of: is it possible for the snapshot to take up so much space, 1911GB? How can the snapshot be so big? It seems unlikely that so much data changed between two snapshots.

billshaffer

Snapshot space is determined solely by change rate.  2TB of change in an 8TB volume in 5 days _seems_ a bit high, but like I said before - barring a bug or something, a snapshot just can't be corrupt - the fact that it is that big pretty much tells you that that much data has changed.  A lot depends on the application.  I've seen databases chew up snapshot space pretty quickly.  If you're doing LUNs, and do a format on the host, that is all going to show as changed data, too.

And remember - the size represents the data change on the live filesystem between now and when the snapshot was taken - the multiple snapshots are only used to compute what needs to be sent to the destination.
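
If you want to see the change rate directly, snap delta reports the blocks changed between each snapshot and the active filesystem - hypothetical output using your snapshot name:

filer2> snap delta vol1
Volume vol1
working...

From Snapshot                 To                   KB changed  Time        Rate (KB/hour)
----------------------------- -------------------- ----------- ----------- --------------
drfiler(0151735037)_vol1.806  Active File System    2003828736  5d 02:10      16401421.3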

Bill

netappmagic

Hi Bill,

There is one issue left in my mind.

>So your 1911G snapshot essentially contains the _changes_ to the volume between when the snapshot was taken and now -_not_ the changes since initialization.

We had a schedule that updated the volume on the destination every minute (I know now it is too extreme). As I understand it, each update also triggers a snapshot. So this 1911GB would be the result of a snapshot taken within a single minute! That much data change in one minute seems impossible.

>2TB change in 8TB in 5 days _seems_ a bit high

I don't know where you got "5 days" from.

At this point, I should tell you what the volume is for.  This volume is presented to a Windows server as a share and is used for Acronis backup: 2-week retention, full backups on two Sundays, and incrementals on the other days.  Does this tell you something?

Thank you!

aborzenkov

Your schedule is every hour, not every minute. And even in this case, the next scheduled SnapMirror transfer is skipped if the previous one did not complete.

Backup applications can delete large amount of expired data in short time.

aborzenkov

In your original post the snapshot is dated Oct 09, and the post itself was made Oct 14. That's 5 days, not one hour. You need to check whether your snapmirror is running - billshaffer already told you that.

netappmagic

No, it is not 5 days. I explained that already.

Though I posted the message on 10/14, the snapmirror had already been broken off by me on 10/9 because the volume filled up on 10/9. So all the outputs reflect the situation before 10/9.

netappmagic

Hi aborzenkov,

>Backup applications can delete large amount of expired data in short time.

Does that mean the deletion of a large amount of expired data could cause such a large snapshot (1911GB) in an hour? I thought that if the data is gone, we no longer need to track it, and so the snapshot should not be that large. I am still trying to logically explain why we have such a large snapshot in an hour (thanks for pointing this out).

Please let me know. Thank you!

billshaffer

netappmagic:

You're saying that the output you've posted is from commands run Oct 9th, not yesterday?  If so, I misunderstood; I thought those were current figures.

The bottom line, though, is that whatever size the snapshot is (or was) is the amount of data that has been overwritten/changed.  Period.  Backup applications are known for high change rates.  If your snapshot is that big, it means yes, your backup application is changing that much data.  As far as tracking the data goes - once the changes have been replicated, the large snapshots will be deleted.  But that replication has to happen for the snapshots to cycle, so until then the changes are still "tracked" in the snapshot.

Does that make sense?
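
As a side note, if you ever want to know how much space deleting a particular snapshot would actually free, there is snap reclaimable (it can take a while to run) - hypothetical output:

filer2> snap reclaimable vol1 drfiler(0151735037)_vol1.806
Processing (Press Ctrl-C to exit) ...
snap reclaimable: Approximately 2003828736 Kbytes would be freed.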

Bill

netappmagic

Hi Bill,

Yes, it does make sense. I guess, I would have to accept this size of the snapshot.

I have two more quick questions:

a) Since I don't have any space left in the aggregate, I would have to remove the snapshot, let the backup keep going, and then reinitialize the snapmirror - right?

b) If I add FC drives to this SATA aggregate, would that be all right? Any performance issues?

Thank you!

billshaffer

Without growing the aggr, yes, you would either have to remove the snapshot and reinitialize, or remove volume data, which should let you resync.  If you've got some backup data that is close to expiry, could you expire that out a bit early?

Having different drive types/sizes in an aggregate is generally frowned upon; you can introduce performance issues.  That being said, I've seen it done, and adding faster disk to slower disk is probably better than adding slower disk to faster disk.  I would avoid it if you can, but it is certainly possible.
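
Before adding anything, it's worth checking what the aggregate currently holds - aggr status -r shows each RAID group with the type and size of every disk in it (aggr1 below stands in for your actual aggregate name):

filer2> aggr status -r aggr1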

Bill
