2016-10-23 10:31 PM - edited 2016-11-02 12:16 AM
As part of my bachelor thesis (a university project) I want to identify redundant files. I read about NetApp's deduplication procedure and thought it could help me. I did some research, but I couldn't find a way to get a list that shows the shared bytes together with the related files.
Our company uses a FAS2552 server. I have access through various interfaces.
I know there should be metadata on each aggregate, but I don't know how to view it. Is it possible to read this metadata?
Everything I have found so far is a percentage of the space saved, but that is not what I need.
Is there really no way to get a list of the shared bytes with the related files?
2016-10-24 12:06 AM - last edited on 2016-10-24 01:08 PM by GaryDM
I'm not sure what you mean by de-duplicated files. De-duplication in ONTAP doesn't really care about files at all, although there are probably blocks from some file types that it is not a good idea to de-duplicate because of historical problems with them.
You need to read the documentation and frame your questions within the terminology of how de-duplication works on NetApp's ONTAP.
2016-10-24 04:23 AM - last edited on 2016-10-24 01:12 PM by GaryDM
NetApp and most other storage vendors do de-duplication and compression at the block level. For de-duplication, the way inodes work already provides the mechanism to map more than one file to the same block (https://en.wikipedia.org/wiki/Inode_pointer_structure ). All the vendor has left to do is find the duplicate blocks. Working at the block level also gives more consistent performance and does not require the process to touch the actual files (which would require handling locks and being protocol aware, and would impose security risks like http://www.theregister.co.uk/2016/05/12/popular_zip_tool_7zip_pwned_pain_flows_to_top_security_software_tools/ ).
Good luck with your work. And feel free to ask more.
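To make the block-level idea concrete, here is a toy sketch in Python. It is only an illustration of the principle, not how ONTAP is implemented: the `BlockStore` class, the 4096-byte block size and the use of SHA-256 as a fingerprint are all assumptions made for this example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


class BlockStore:
    """Toy model: files are lists of block pointers, blocks are stored once."""

    def __init__(self):
        self.blocks = {}  # fingerprint -> block bytes (one physical copy)
        self.files = {}   # file name -> list of fingerprints ("pointers")

    def write(self, name, data):
        pointers = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            fingerprint = hashlib.sha256(block).hexdigest()
            # Store the block only if this content has not been seen before.
            self.blocks.setdefault(fingerprint, block)
            pointers.append(fingerprint)
        self.files[name] = pointers

    def read(self, name):
        # Follow the pointers to reassemble the file content.
        return b"".join(self.blocks[fp] for fp in self.files[name])

    def logical_blocks(self):
        return sum(len(pointers) for pointers in self.files.values())

    def physical_blocks(self):
        return len(self.blocks)


store = BlockStore()
store.write("a.bin", b"A" * 8192 + b"B" * 4096)
store.write("b.bin", b"A" * 4096 + b"C" * 100)
print(store.logical_blocks(), "logical blocks,",
      store.physical_blocks(), "physical blocks")
```

Note that the store never asks which file a block came from when it looks for duplicates. The file names only appear in the pointer lists, which is why the savings are normally reported per volume rather than per file.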
2016-11-02 12:20 AM - edited 2016-11-02 12:22 AM
Hey, thank you for your reply, shamz. I am really sorry for my bad English and the unclear question. I have read your criticism of it. I have updated the question in the hope that it is clearer now.
2016-11-02 12:30 AM
Thank you for your answer, GidonMarkus!
I have added the principles of inodes to my thesis. That is fine, thank you! I noticed that my question was unclear, so I have updated it now.
At the very least I need to know which files share bytes. Do you know of any way to get a list of these files?
2016-11-02 05:31 AM
To my understanding, not without scanning the inode tree yourself. (You could maybe reverse engineer and use the existing hash index, but I guess that's not going to help.)
But this is something I don't see any storage admin ever needing to do. Therefore it doesn't exist out of the box.
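If you do end up scanning yourself, a rough client-side approximation could look like the sketch below. It does not read any ONTAP metadata or the inode tree: it simply walks a mounted share, fingerprints every fixed-size block and reports fingerprints that occur in more than one file. The function name `find_shared_blocks`, the 4096-byte block size and SHA-256 are assumptions for this example, and the result shows identical content, which may differ from what the storage system has actually shared.

```python
import hashlib
import os
from collections import defaultdict

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def find_shared_blocks(root, block_size=BLOCK_SIZE):
    """Return {fingerprint: [(path, block number), ...]} for blocks whose
    content appears in more than one file below `root`."""
    index = defaultdict(set)  # fingerprint -> set of (path, block number)
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                with open(path, "rb") as handle:
                    block_no = 0
                    while True:
                        block = handle.read(block_size)
                        if not block:
                            break
                        fingerprint = hashlib.sha256(block).hexdigest()
                        index[fingerprint].add((path, block_no))
                        block_no += 1
            except OSError:
                continue  # skip files that cannot be read
    # Keep only fingerprints that are referenced by at least two files.
    return {
        fingerprint: sorted(locations)
        for fingerprint, locations in index.items()
        if len({path for path, _ in locations}) > 1
    }
```

Calling `find_shared_blocks` with the path of a mounted export returns, for each shared fingerprint, the files and block offsets that contain it. Be aware that this reads every byte over the network and keeps the whole index in memory, so it is only practical for a limited data set.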