ONTAP Discussions

How to see deduplicated files

christoph_2

Hello,

As part of my bachelor's thesis (a university project), I want to find redundant files. I read about NetApp's deduplication procedure and thought it could help me with this. I did some research, but I couldn't find a list that shows the shared bytes together with the files they belong to.

Our company uses a FAS2552 storage system. I have access through several interfaces:

  • Web application “NetApp OnCommand System Manager 3.1.2”
  • PowerShell Toolkit 4.0
  • Operating system Data ONTAP 8.2.2 7-Mode, via SSH/PuTTY

I know there should be metadata on each aggregate, but I don't know how to view it. Is it possible to read it?

Everything I found only reports the space savings as percentages, but that is not what I need.

Is there any way to get a list of the shared bytes together with the files they belong to?

shamz

Hi,

I'm not sure what you mean by de-duplicated files. De-duplication in ONTAP doesn't really care about files at all, although there are probably blocks from some file types that are better not de-duplicated because of historical problems with them.

You need to read the documentation and frame your questions within the concepts and terminology of how de-duplication works in NetApp's ONTAP.

christoph_2

Hey, thank you for your reply, shamz. I'm sorry for my poor English and the unclear question. I saw your criticism, so I have updated the question in the hope that it is clearer now.

GidonMarcus

NetApp and most other storage vendors do de-duplication and compression at the block level. For de-duplication, the way inodes work already provides the mechanism for pointing a file at the same block as another file (https://en.wikipedia.org/wiki/Inode_pointer_structure ). All the vendor has left to do is find the duplicate blocks. Working at the block level also gives more consistent performance and does not require the process to touch the actual files, which would mean handling locks, being protocol-aware, and accepting security risks such as http://www.theregister.co.uk/2016/05/12/popular_zip_tool_7zip_pwned_pain_flows_to_top_security_software_tools/ .
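To make the block-level idea concrete, here is a minimal Python sketch of a hypothetical toy volume (an illustration under simplifying assumptions, not how ONTAP or WAFL is actually implemented): each "inode" is just a list of pointers into a pool of 4 KiB blocks, and a post-process pass fingerprints the blocks and repoints duplicate pointers at a single shared copy. The ToyVolume class, the SHA-256 fingerprint and the 4 KiB block size are assumptions for illustration; a production implementation would typically also compare candidate blocks byte by byte rather than trusting the hash alone.

```python
import hashlib

BLOCK_SIZE = 4096  # assumption: 4 KiB blocks, chosen to mirror WAFL's block size


class ToyVolume:
    """Hypothetical toy model: each 'inode' is a list of pointers (indices)
    into one shared pool of physical blocks. Not ONTAP's real on-disk format."""

    def __init__(self):
        self.blocks = []   # physical blocks, addressed by list index
        self.inodes = {}   # file name -> list of block indices

    def write_file(self, name, data):
        # Split the data into fixed-size blocks; each block is stored as a new
        # physical copy, as if no deduplication had run yet.
        pointers = []
        for offset in range(0, len(data), BLOCK_SIZE):
            self.blocks.append(data[offset:offset + BLOCK_SIZE])
            pointers.append(len(self.blocks) - 1)
        self.inodes[name] = pointers

    def deduplicate(self):
        # Post-process pass: fingerprint every physical block, keep the first
        # block seen for each fingerprint, and repoint all inode pointers at it.
        # (This sketch trusts SHA-256 alone; no byte-by-byte verification.)
        canonical = {}  # fingerprint -> index of the block that is kept
        remap = {}      # old block index -> kept block index
        for index, block in enumerate(self.blocks):
            fingerprint = hashlib.sha256(block).digest()
            remap[index] = canonical.setdefault(fingerprint, index)
        for name, pointers in self.inodes.items():
            self.inodes[name] = [remap[p] for p in pointers]

    def blocks_in_use(self):
        # Physical blocks still referenced by at least one inode.
        return {p for pointers in self.inodes.values() for p in pointers}


vol = ToyVolume()
vol.write_file("a.bin", b"x" * 8192)                # two identical "x" blocks
vol.write_file("b.bin", b"x" * 4096 + b"y" * 4096)  # one "x" block, one "y" block
print(len(vol.blocks_in_use()))  # 4
vol.deduplicate()
print(len(vol.blocks_in_use()))  # 2
print(vol.inodes)  # {'a.bin': [0, 0], 'b.bin': [0, 3]}
```

The file contents are never re-read as files during the dedup pass; only the block pointers change. Going the other way, from a shared block back to the list of files that use it, means walking every inode's pointer list.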

 

Good luck with your work, and feel free to ask more.

Gidi

Gidi Marcus (Linkedin) - Storage and Microsoft technologies consultant - Hydro IT LTD - UK

christoph_2

Thank you for your answer, GidonMarcus!

I have added the principles of inodes to my work, thank you! I noticed that my question was unclear, so I have updated it.

At the very least, I need to know which files share bytes. Do you know of any way to get a list of these files?

GidonMarcus (accepted solution)

Hi

To my understanding, not without scanning the inode tree yourself. (You could maybe reverse-engineer and use the existing hash index, but I guess that's not going to help.)
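If you do want to try the "scan it yourself" route without touching ONTAP's internal metadata, one rough approximation is to read the files through an ordinary CIFS/NFS mount and fingerprint them in 4 KiB blocks yourself. This minimal Python sketch hashes each file's blocks with SHA-256, records which files contain each block content, and counts the distinct block contents each pair of files has in common. The mount point, the 4 KiB block size and the function names are assumptions for illustration. It only finds candidates for sharing: it cannot tell you whether ONTAP actually deduplicated those blocks (for example, in 7-Mode deduplication is scoped to a single FlexVol volume, and blocks are only shared after a dedup scan has run on it). It also has to open every file, which is exactly the overhead that block-level deduplication avoids.

```python
import hashlib
import os
from collections import defaultdict

BLOCK_SIZE = 4096  # assumption: 4 KiB, to line up with WAFL's block size


def block_digests(path):
    """Yield a SHA-256 digest for each BLOCK_SIZE chunk of one file."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(BLOCK_SIZE)
            if not chunk:
                break
            yield hashlib.sha256(chunk).digest()


def files_sharing_blocks(root):
    """Return {(path_a, path_b): n}, where n is the number of distinct block
    contents both files contain. These are only candidates: the result says
    nothing about whether ONTAP actually shares those blocks on disk."""
    owners = defaultdict(set)  # block digest -> paths of files containing it
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                digests = set(block_digests(path))
            except OSError:
                continue  # unreadable file (permissions, locks): skip it entirely
            for digest in digests:
                owners[digest].add(path)
    shared = defaultdict(int)
    for paths in owners.values():
        ordered = sorted(paths)
        # Count this block content once for every pair of files holding it.
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                shared[(first, second)] += 1
    return dict(shared)
```

Called as files_sharing_blocks("/mnt/share") (a hypothetical mount path), it returns a dict keyed by pairs of file paths. Very common blocks, such as all-zero blocks, are contained in many files, so the number of pairs grows quickly; filtering those out is an obvious refinement.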

But this is not something I can see any storage admin ever needing to do, so it doesn't exist out of the box.

BTW, I'm not NetApp-badged, but I don't think anyone could help you further with this, as the terms of use prohibit a lot of the steps needed to do something like that: https://library.netapp.com/ecm/ecm_get_file/ECMP1512260

Gidi

Gidi Marcus (Linkedin) - Storage and Microsoft technologies consultant - Hydro IT LTD - UK