ONTAP Discussions

performance issue with snapmirror and snapshots on target aggregate

Stefan-Reitmeier

Hi all,

 

Does anyone here have experience with SnapMirror and larger amounts of data?

At the moment we SnapMirror about 100 TB of data, distributed over about 10 volumes, to a SATA aggregate on a second filer with 85 4 TB SATA disks (5 RAID groups of 17 disks each). The source is a FAS8040 and the target is a FAS8020, both running cDOT 8.3P1. We have already moved all other workload off the target aggregate, so it hosts only SnapMirror targets.

On the source side we take one snapshot per day and keep 14 snapshots. SnapMirror runs once per day. Based on the snapshots, I would say the daily change rate is 2-2.5 TB across all volumes.

 

SnapMirror itself works fine and finishes in less than 2-3 h, but container block reclamation and deswizzling are totally killing the aggregate on the target side. We see a continuous read load of 30 MB, and disk utilization for all disks except the parity disks is 90-100%.
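To put the numbers above side by side, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 TB = 1,000,000 MB) and that the "30 MB" read load means 30 MB per second sustained around the clock; neither is stated explicitly in the post, and the function names are just illustrative.

```python
# Rough arithmetic on the figures quoted in the post.
# Assumptions (not from the post): decimal units, and "30 MB" = 30 MB/s
# sustained for a full 24 hours.

MB_PER_TB = 1_000_000
SECONDS_PER_DAY = 86_400


def transfer_rate_mb_s(changed_tb: float, hours: float) -> float:
    """Average throughput needed to move changed_tb within the given hours."""
    return changed_tb * MB_PER_TB / (hours * 3600)


def daily_read_tb(rate_mb_s: float) -> float:
    """Data read in one day at a constant rate in MB/s."""
    return rate_mb_s * SECONDS_PER_DAY / MB_PER_TB


def change_fraction(changed_tb: float, total_tb: float) -> float:
    """Daily change as a fraction of the total replicated data."""
    return changed_tb / total_tb


if __name__ == "__main__":
    # Slowest and fastest cases from "2-2.5 TB in 2-3 h".
    print(f"transfer, 2.0 TB in 3 h: {transfer_rate_mb_s(2.0, 3):.0f} MB/s")
    print(f"transfer, 2.5 TB in 2 h: {transfer_rate_mb_s(2.5, 2):.0f} MB/s")
    print(f"background reads per day: {daily_read_tb(30):.2f} TB")
    print(f"daily change: {change_fraction(2.0, 100):.1%} to "
          f"{change_fraction(2.5, 100):.1%} of the data set")
```

Under these assumptions the transfer itself averages well over 100 MB/s, while the background work keeps the disks busy at a much lower throughput. That pattern would be consistent with many small random reads rather than sequential streaming.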

At first we planned 4-hourly snapshots, but that is just not possible. For now we have disabled deswizzling, and with luck the load on the target aggregate drops during the night, just before the next SnapMirror kicks in.

 

We are quite new to NetApp, but it sounds ridiculous that you need so much I/O for just a plain replication and some snapshots.

Do you have any experience with snapshots and SnapMirror on SATA disks? Snapshots and SnapMirror on NetApp seem very resource-demanding to me. Creating a snapshot is super efficient and instant, but as soon as a snapshot has to be deleted, container block reclamation kicks in and consumes a large amount of disk resources. The same goes for SnapMirror: it is really cool and stable, but with large data sets the deswizzling of the logical-to-physical block mapping hits the performance of the SnapMirror target heavily.

 

 

Best wishes,

 

Stefan

 

 

1 ACCEPTED SOLUTION

Darkstar

This is not related to reallocation. It's simply the deswizzler, which has to run through the full volume and update the PVBN references. That involves a lot of metadata reads, which can result in severe cache thrashing. PAM cards on the SnapMirror destination help A LOT with such a workload.

 

An alternative would be to disable the deswizzler, but then access to the secondary data will potentially be slow(er), because every read of a block has to go through the VVBN->PVBN mapping, which is one metadata file. That usually isn't a big deal, but in a DR or migration scenario, when the destination becomes active at some point, you probably don't want that extra level of indirection. You can restart the deswizzler manually later, but then again it will take a loooong time to complete.
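The trade-off described here can be illustrated with a small toy model. This is only a sketch of the idea of an extra VVBN->PVBN lookup per read and of a scanner that walks the whole volume once to store the PVBNs directly. Every name in it (ToyVolume, container_map, direct_pvbn, map_lookups) is hypothetical, and it says nothing about how ONTAP actually implements this.

```python
# Toy model of the indirection only; all names are hypothetical and this
# is not ONTAP's implementation.

class ToyVolume:
    def __init__(self, container_map, disk):
        self.container_map = container_map  # VVBN -> PVBN ("one metadata file")
        self.disk = disk                    # PVBN -> block contents
        self.direct_pvbn = {}               # VVBN -> PVBN, filled by deswizzle()
        self.map_lookups = 0                # how often the container map was read

    def read(self, vvbn):
        """Read a block, consulting the container map only if needed."""
        pvbn = self.direct_pvbn.get(vvbn)
        if pvbn is None:
            # Not deswizzled yet: pay for the extra metadata lookup.
            self.map_lookups += 1
            pvbn = self.container_map[vvbn]
        return self.disk[pvbn]

    def deswizzle(self):
        """Walk the full volume once and record every PVBN directly."""
        for vvbn, pvbn in self.container_map.items():
            self.map_lookups += 1
            self.direct_pvbn[vvbn] = pvbn


def demo():
    vol = ToyVolume({0: 17, 1: 42, 2: 5}, {17: "a", 42: "b", 5: "c"})
    before = [vol.read(v) for v in (0, 1, 2)]  # one map lookup per read
    lookups_before = vol.map_lookups
    vol.deswizzle()                            # one pass over every block
    lookups_scan = vol.map_lookups - lookups_before
    after = [vol.read(v) for v in (0, 1, 2)]   # no further map lookups
    lookups_after = vol.map_lookups - lookups_before - lookups_scan
    return before, after, lookups_before, lookups_scan, lookups_after


if __name__ == "__main__":
    print(demo())
```

In the toy model the scan costs one metadata read per block of the volume, regardless of how many blocks are ever read afterwards. On a large, rarely read destination the scan therefore dominates, while skipping it moves the cost to every later read.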


6 REPLIES

deepuj

Hi Stefan,

 

A couple of questions from my side:

 

1) Is the disk utilization always high, or only during the SnapMirror schedule?
2) Are any volume reallocation schedules running?
3) Is aggregate free space reallocation "ON" on the DR site aggregate?
4) Are you taking any backups from the target site?

 

 

Thanks


Stefan-Reitmeier

Hi deepuj,

 

Thanks for your answer.

Disk utilization is high after SnapMirror, when block reclamation and deswizzling kick in. Depending on the size of the deleted snapshots and of the SnapMirror transfer, it sometimes takes until the next SnapMirror run, and then we have continuously high disk load for multiple days.

Reallocate is not running. We use aggregate reallocate no_redirect on the source volumes.

Backups are only taken on source side.

 

 

Best wishes,

 

Stefan

 

 

Stefan-Reitmeier

 

 

 

 

RPHELANIN

What's the replication schedule? Deswizzling may not have completed between mirrors...

Stefan-Reitmeier

Hi 

 

 

 

 

 
