Snapvault Performance

Hello,

I have a customer with 15 million files hosted on a FAS3240. We want to back up these files to a FAS2040. Do you recommend splitting the files across different qtrees and starting the backup streams in parallel in order to get better performance? Or do you recommend leaving all the files in the same qtree?

If I understand SnapVault correctly, each backup has to parse the file system to find the modified files in order to identify the blocks to send. So the number of files inside the qtree could be a problem.

Thank you for your help.

Best regards,

nd

Re: Snapvault Performance

Hi,

Instinctively I would rather use multiple qtrees; however, I am not sure whether it will actually help.

During a SnapVault update you have to parse through these 15 million files regardless of whether it is a single stream or multiple streams, so it will consume a lot of CPU cycles anyway.
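To illustrate the point about file count, here is a toy Python model of a per-file change scan. It is only a sketch and not how SnapVault is implemented: `FileEntry`, `scan_for_changes`, `scan_by_qtree`, the paths and the logical timestamps are all made up for the example. It shows that splitting the same files across qtrees leaves the total number of entries examined unchanged; only the way the work is divided changes.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    path: str
    qtree: str
    modified_at: int  # hypothetical logical timestamp, not a real ONTAP field


def scan_for_changes(entries, last_backup_at):
    """Visit every entry and collect the ones modified since the last backup."""
    examined = 0
    changed = []
    for entry in entries:
        examined += 1
        if entry.modified_at > last_backup_at:
            changed.append(entry.path)
    return examined, changed


def scan_by_qtree(entries, last_backup_at):
    """Run the same scan once per qtree, as separate streams would."""
    by_qtree = {}
    for entry in entries:
        by_qtree.setdefault(entry.qtree, []).append(entry)
    return {
        qtree: scan_for_changes(group, last_backup_at)
        for qtree, group in by_qtree.items()
    }


# 10,000 made-up files spread over 4 qtrees; 1 file in 1,000 has changed.
entries = [
    FileEntry(
        path=f"/vol/data/q{i % 4}/file{i}",
        qtree=f"q{i % 4}",
        modified_at=200 if i % 1000 == 0 else 50,
    )
    for i in range(10_000)
]

single_examined, single_changed = scan_for_changes(entries, last_backup_at=100)
per_qtree = scan_by_qtree(entries, last_backup_at=100)
split_examined = sum(examined for examined, _ in per_qtree.values())

print(single_examined, split_examined, len(single_changed))
```

In this model both approaches examine all 10,000 entries to find the same 10 changed files, so any gain from multiple streams would have to come from running them in parallel or at different times, not from doing less work.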

If you can split the data and then stage the updates of the different qtrees, that would definitely help.

Regards,

Radek

Re: Snapvault Performance

Just curious, doesn't SnapVault use volume-level (block-level) snapshots to perform backups? Does it even iterate over files per se? I'm not an NFS guy, so perhaps that works differently. I'd like to correct my understanding on this if I'm wrong, thanks!

Re: Snapvault Performance

SnapVault transfers block-level changes, indeed, but it first parses the file system to check which files have changed since the last update.

http://www.netapp.com/us/library/technical-reports/tr-3487.html, Chapter 6.10:

For this example, suppose that you have two data sets, both 10GB in size. The first data set, dataset1, has approximately a million small files, and the second data set, dataset2, has five files, all 2GB in size. During the baseline transfer, dataset1 requires more CPU usage on the primary or requires a longer transfer time than dataset2.
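A quick calculation makes the contrast in the TR example concrete. This is only arithmetic on the figures quoted above, assuming decimal units (1 GB = 10^9 bytes) and exactly one million files for dataset1; the variable names are mine.

```python
GB = 10**9  # assumption: decimal gigabytes

dataset_size_bytes = 10 * GB

dataset1_files = 1_000_000  # "approximately a million small files"
dataset2_files = 5          # five files of 2 GB each

# Average file size in each data set.
dataset1_avg_bytes = dataset_size_bytes / dataset1_files
dataset2_avg_bytes = dataset_size_bytes / dataset2_files

# How many more files have to be handled for the same amount of data.
file_count_ratio = dataset1_files // dataset2_files

print(f"dataset1 average file size: {dataset1_avg_bytes / 1000:.0f} KB")
print(f"dataset2 average file size: {dataset2_avg_bytes / GB:.0f} GB")
print(f"dataset1 has {file_count_ratio:,} times as many files")
```

Both data sets hold the same 10 GB, but dataset1 has 200,000 times as many files to process, at roughly 10 KB each.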

Regards,
Radek

Re: Snapvault Performance

Okay, that makes sense, but it leads me to ask: what does the ONTAP OS consider a file? In my environment we're all FCP, so the only "file" I think ONTAP sees is the LUN file (not the NTFS files within the LUN). So this really doesn't apply in FCP LUN cases. Is that correct, in your understanding?

The poster above, I think we can assume, is using NFS, so ONTAP sees each individual file hosted in his qtrees. So in his case performance on the primary is a concern. However, he didn't say for certain (he may be FCP).

I think your initial answer is correct, but I'm taking this opportunity to make sure I have my facts on this correct, so I appreciate your responses. 

Re: Snapvault Performance

Yes, this is correct: a LUN is just a single file from the ONTAP (and SnapVault) perspective.

Re: Snapvault Performance

For best SnapVault performance you should split the data into volumes instead of qtrees. What I have seen in the past is that even for an empty qtree in a volume with a lot of files in other qtrees, the SnapVault transfer takes a relatively long time, and you will see a considerable number of transferred KB in the snapvault status -l output.

Re: Snapvault Performance

That's an interesting observation. Maybe someone from NetApp can give us more insight into why this happens?

If splitting data into volumes is indeed beneficial, then this should be highlighted in the SnapVault best practices TR.

Re: Snapvault Performance

Thank you for your answers.

I will try to run some tests.

A friend tells me that on his production platform, with 8 million files in one volume hosted on a FAS3140, the SnapVault backup update takes 24 hours to complete.

In a block environment (iSCSI or FCP), you generally have at most a few hundred LUNs per controller, so SnapVault is not a problem there. The performance challenge with SnapVault is in HFC (High File Count) environments.

Thanks.

Re: Snapvault Performance

I can confirm that it takes a long time when transferring a lot of small files.

We have a volume which stores more than 40 million files. Although the files are spread across 50 qtrees, the SnapVault job takes up to 24 hours to complete.
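For a rough sense of scale, the two 24-hour reports in this thread can be turned into an average per-file rate. This is only back-of-the-envelope arithmetic: it assumes the whole file count is processed in exactly 24 hours, it says nothing about how much data actually changed, and the labels and the `files_per_second` helper are made up for the example.

```python
SECONDS_PER_HOUR = 60 * 60

# File counts and durations as reported in this thread.
reports = {
    "8 million files, one volume": (8_000_000, 24),
    "40 million files, 50 qtrees": (40_000_000, 24),
}


def files_per_second(file_count, hours):
    """Average number of files handled per second over the whole update."""
    return file_count / (hours * SECONDS_PER_HOUR)


for label, (file_count, hours) in reports.items():
    rate = files_per_second(file_count, hours)
    print(f"{label}: about {rate:.0f} files per second")
```

That works out to roughly 93 and 463 files per second respectively. The two systems are different, so the numbers are not directly comparable, but both are low enough that file count rather than data volume looks like the limiting factor.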

In our environment, where we have a "Primary > Mirror > Backup" relationship, this is a big problem, as the SnapMirror jobs have to wait until the backup has finished.

I'm currently planning to use a different method to back up this volume.