Typically we've seen dedupe savings of around 10% on Exchange 2003 and 2007. Your Exchange environment seems a little unusual though, and the savings might well be higher, especially if you've stored everything in one volume. You can test your potential savings using a couple of different methods: 1) Use the Space Savings Estimation Tool (SSET) - ask your NetApp SE for a copy. 2) Make a clone of the Exchange vol(s) using FlexClone and dedupe the clone. Measure the savings with df -s and decide if you want to implement. Use an eval license for FlexClone if you don't have a license and just want to try it out (a rough command sketch is below). Hope that helps - DrDedupe
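For option 2, a rough command sketch (classic 7-Mode syntax; the volume and clone names are just examples, and the notes in parentheses are descriptions, not part of the commands):

vol clone create exch_clone -b exch_vol   (create a FlexClone of the Exchange volume)
sis on /vol/exch_clone   (enable dedupe on the clone)
sis start -s /vol/exch_clone   (scan and dedupe the data already in the clone)
sis status /vol/exch_clone   (wait until the operation shows Idle)
df -s /vol/exch_clone   (report the space saved and %saved)
vol offline exch_clone   (take the clone offline once you have your numbers)
vol destroy exch_clone   (destroy the clone)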
If you want to test your Exchange environment for dedupe savings before you install dedupe, you can use the free Space Savings Estimation Tool (SSET) crawler to predict the savings without actually deduping anything. Your NetApp SE (or authorized partner SE) can help install and run the SSET tool for you. Or... if you have a FlexClone license, you can create an Exchange clone and dedupe just the clone volume, measure the savings on the clone with df -s, then destroy the clone. Finally, you can just turn on dedupe for the Exchange volume and watch what happens. Worst case is that you don't like the savings and you just turn dedupe off (sis off, sis undo - see the sketch below). DrDedupe
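If you do end up backing dedupe out, a minimal sketch of the off/undo path would be something like this - the volume name is just an example, and on some ONTAP releases sis undo is only available at advanced or diag privilege:

priv set advanced   (raise privilege so sis undo is available, if your release requires it)
sis off /vol/exch01   (stop fingerprinting new writes)
sis undo /vol/exch01   (re-expand previously deduplicated blocks; this consumes the saved space again)
priv set admin   (drop back to normal privilege)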
There is at least one internal doc I know of that states this is a supported config. I'm not sure why this is not covered in our external docs, but I will speak to the Marketing folks and try to get it added. Your best bet is to open a support call and have someone from NetApp Support email you to confirm this is supported. Hope that helps - DrDedupe
Hi Reinoud, good comments. Yes, I agree that RFPs can be cumbersome, but as organizations start using them they eventually develop a standard process and things get streamlined. One very important point you bring out is the scoring system. I am not sure the vendors need to know which areas have the most weight, but the requestor should always know how they are going to score before they send out the RFP. I've seen the scoring done a couple of different ways - some, like you described, assign a weight to each section, while others score each question (say from 1-5) and then apply a multiplier to the most important questions - sort of like giving those questions a higher degree of difficulty, to use a sports analogy. I wonder what other types of RFP scoring systems people use? Larry
During my time in the storage industry I've seen four common ways people buy storage: 1) They call their favorite vendor and ask for a quote. 2) They call their favorite vendor's competitor and ask for a better price. 3) They call three vendors (to make the boss happy?) and ask them all to bid the same configuration. 4) They run a formal RFP process and open it to all bidders. I've seen options 1-3 used much more than #4, I guess because storage buyers believe they know exactly what they want and who they want it from, and the only thing that might change their mind is a rock-bottom price. How do you purchase your storage? Do RFPs make sense? Are they too much work? Too subjective? I'm interested to know if you fall into any of the above categories, or if you buy storage some other way. Thanks, Larry
Hi Marco, having more than 255 references to a deduped block should not cause the clone process to run as slowly as you've described below. I suspect there is something else going on; I'll check with our developers and let you know if I come up with anything. In the meantime, I suggest you open a case with NetApp Support so they can diagnose it too. Regards, Larry
Hi, on average we see around 30-35% savings on user files with dedupe. What you are seeing might be as good as it gets, but if you are taking hourly snaps, this might also be the cause of the lower rate. Refer to TR-3505 in the netapp.com library - it's our dedupe bible. What the TR says, in a nutshell, is that if you take a snap and then dedupe, no duplicate blocks will be freed until that snapshot expires, and if you continue snapping every hour before dedupe can finish, you'll never get out of this cycle. So here's something to try. If you can, give dedupe an extra hour to complete by skipping one snap each night, say your 1am snap - i.e. schedule sis start to run after your midnight snap, then resume your hourly snaps at 2am (see the schedule sketch below). Skipping one hourly snap might be all you need. After 15 days (once all your old snaps roll off), check your dedupe savings again and let us know if they went up. BTW, I am pretty sure you can also run sis status -l and the begin/end times will tell you how long the last dedupe run took on the CIFS vol; this could also help identify whether the snaps are colliding with the sis operation. Hope that helps, DrDedupe
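As a concrete sketch of the scheduling piece - the volume name is just an example, and the schedule string should be adapted to your own snap schedule:

sis config -s sun-sat@0 /vol/cifs01   (kick off dedupe at midnight every night, giving it until the 2am snap to finish)
sis config /vol/cifs01   (confirm the schedule that is set)
sis status -l /vol/cifs01   (the begin/end times of the last run show how long dedupe actually took)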
In a recent thread, Wayne McCormick asked a question about potential dedupe savings on Oracle databases. Dean Brock responded that NetApp Engineering had run tests and observed about 15% savings, but didn't think that amount of savings was all that interesting. Do any other community folks want to give their opinion on this? How much savings is "good"? What amount of savings would trigger you to want to run dedupe? Thanks, DrDedupe
Yes, you should of course see 65% savings on the new LUN just like you did on the old LUN. Your LUN settings seem correct. Question: did you run the "sis on" command on the volume before you did the migration, or run "sis start -s" on the volume after the migration? You'll need to have done one or the other to get the full dedupe savings (see the sketch below). DrDedupe.
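If neither was done, a minimal sketch of catching the new volume up after the fact (volume name is just an example):

sis on /vol/newlunvol   (enable dedupe on the destination volume)
sis start -s /vol/newlunvol   (scan the data already in the volume and dedupe it)
df -s /vol/newlunvol   (check the savings once the run completes)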
Hi Leif, By default, reallocate runs at the logical level, and any blocks that have been deduped will be skipped. You can also run reallocate with the -p option, which forces reallocate to run at the physical (PVBN) level. In this case, WAFL will try to put the deduped blocks in optimal order, but this might be difficult to do in heavily deduped volumes. In either case, reallocate does not rehydrate deduped data, so space savings are unaffected. We don't have an official policy on running reallocate on sis volumes, but IMO I wouldn't bother - I don't think it's really going to give you any substantial gain (a quick sketch of both forms is below). If anyone has before-and-after performance experience running reallocate on a deduped volume, please feel free to share your experience. Larry
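For reference, a quick sketch of the two forms - the volume name is just an example:

reallocate on   (enable reallocation scans on the controller)
reallocate measure /vol/vol1   (measure the current layout and report an optimization value)
reallocate start /vol/vol1   (logical-level reallocation; deduplicated blocks are skipped)
reallocate start -p /vol/vol1   (physical-level reallocation that preserves the dedupe savings)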
AJ, In addition to Rick's comments, there is an Oracle 11g best practices guide available here - Oracle Advanced Compression is also discussed in that doc. Larry
Hi Radek - let's break down what's happening during the pre- and post-deduplication stages; this should help explain the performance impact. Remember that NetApp deduplication on FAS and V-Series systems involves two steps: 1) enable dedupe on a volume (sis on), then at some point 2) dedupe the data in that volume (sis start).

When you 'sis on' a volume, the behavior of that volume changes. Every time it notices a block write request coming in, the sis process makes a call to Data ONTAP to get a copy of the fingerprint for that block so that it can store the fingerprint in its catalog file. This request interrupts the write string and results in a 7% performance penalty for all writes into any volume with sis enabled. We know it's 7% because we measured it in our labs, and lab machines don't lie - however, every customer I've spoken to says they can't tell the difference. I guess we humans aren't quite so precise.

Now, at some point you'll want to dedupe the volume using the 'sis start' command. As sis goes through the process of comparing fingerprints, validating data, and deduping blocks that pass the validation phase, in the end all we are really doing is adjusting some inode metadata to say "hey, remember that data that used to be here? Well, it's over there now." Nothing about the basic data structure of the WAFL file system has changed, except that you are traversing a different path in the file structure to get to your desired data block. Like going to the grocery store: you can take Elm Street or Oak Street, and depending on traffic either way might get you there faster. That's why NetApp dedupe *usually* has no perceivable impact on read performance - all we've done is redirect some block pointers. Accessing your data might go a little faster, a little slower, or more likely not change at all - it all depends on the pattern of the file system data structure and the pattern of requests coming from the application. Larry
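To make the two steps concrete, a minimal lifecycle sketch (volume name is just an example):

sis on /vol/vol1   (step 1: start fingerprinting new writes to the volume)
sis start /vol/vol1   (step 2: kick off a dedupe run against the logged fingerprints)
sis status /vol/vol1   (watch progress; the volume shows Idle when the run is done)
df -s /vol/vol1   (see how much space the run reclaimed)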
These are all good questions, and topics regularly being discussed inside NetApp these days. Scalability of deduplication and continual performance improvements are of course both key objectives for our development team. While we can't disclose specific roadmap items on an open forum like this community, NDA roadmap presentations are available to NetApp customers and prospects via your NetApp account team. Larry
FYI, here is a paper that goes into more detail about setting up NetApp storage with Hyper-V: http://media.netapp.com/documents/tr-3733.pdf Larry
Andrew, In general, the answer is yes - a volume that has been deduped should not show any appreciable read performance degradation. Since WAFL is a random-layout file system, deduplication merely re-randomizes the data blocks. Also remember that NetApp dedupe does not use containers or lookup tables to rehydrate data; we just redirect the existing block pointer metadata. Having said that, I have seen a few cases where read performance degraded, but this is unusual and not predictable - it all depends on the block layout pattern and the pattern of read requests. And as I mentioned earlier, you can always undo dedupe if you don't like the results. Another point worth mentioning is using dedupe together with the Performance Acceleration Module (PAM). PAM is dedupe-aware, so you can actually improve read performance after dedupe with this combination. We've done some tests (and I think published them) that show a dramatic improvement in VDI "boot storm" response times as a result of dedupe and PAM. What has your experience been? Larry
I can't give you an exact date or release for full integration of dedupe and snapshots, but I will say this is a high priority for our development team.
Andrew, There are three key factors that affect the performance impact of dedupe: 1) the NetApp FAS or V-Series model, 2) the amount of duplicate data in the volume, and 3) other processes the system is servicing during the dedupe process. If we look at a typical scenario (impossible, I know, but bear with me) - let's say we have a FAS3070, a 1TB volume with 5% duplicate data, and the system is fairly quiet. This would be a typical setting for running dedupe overnight on a regular basis. I would expect this system to complete dedupe in less than an hour and have no impact on workloads (since there aren't any running). On the other hand, if we have a FAS2050, 90% duplicate data, and the system is running at peak load, the dedupe process will take many hours and you will likely see some performance degradation resulting from dedupe. The problem is that there are too many variables for us to give an exact number. Instead, we recommend two things: 1) if your application or system is extremely performance-sensitive, don't run dedupe; 2) if you are concerned that dedupe will create an excessive performance penalty, run a POC first. Also, remember that you can easily turn off dedupe and/or "undo" dedupe if you don't like the results you get (one way to gauge the impact on your own system is sketched below). Hope that helps, Larry
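One simple way to gauge the impact on your own gear, assuming you have a test volume to experiment with (names are examples):

sysstat -x 5   (baseline CPU and disk utilization before starting dedupe)
sis start /vol/testvol   (start the dedupe run)
sysstat -x 5   (watch utilization while the run is active and compare to the baseline)
sis status -l /vol/testvol   (after it finishes, the begin/end times show how long the run took)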
Hi Andrew - Deleting snapshots is not a requirement, but yes, it will yield the best space savings. Alternatively, you can keep the old snapshots and just wait for them to roll off, at which time your space will be reclaimed. Going forward, best practice is to dedupe first, then take your snaps. Depending on the volume's growth rate, it might be easier just to run dedupe on weekends when perhaps no snapshots are scheduled (a schedule sketch is below). Also, just an FYI - Provisioning Manager 3.8 (just released) automates the process of scheduling dedupe and also shows stats on dedupe space savings. Larry
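For the weekend-only approach, a minimal sketch would be (volume name is an example; adjust the schedule string to your environment):

sis config -s sat@0 /vol/vol1   (run dedupe only at midnight on Saturdays)
sis config /vol/vol1   (confirm the schedule took effect)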
Hi Gregory, yeah, that's a little confusing; let me explain. The 17TB refers to the maximum "logical" size the 2050 volume can grow to. Simple example - I have a 1TB database that I copy 17 times into a volume, deduping after each copy is made. Eventually I will have 17TB of database data squeezed into a 1TB physical volume - that's as big as "sis" will let the volume grow on a FAS2050. Make sense? Also, as mentioned in the other replies, in 7.3 the volume limits increased; see TR-3505 in the Docs section of this community for more on the new volume sizes.
Thought you'd never ask. Follow this link and look under "Related Content" to see some examples of organizations that had game-changing experiences as a result of storage efficiency. New case studies are appearing all the time, so check back often. It appears that storage efficiency is really catching on!
Wanted to let the community know that there is a storage efficiency webcast coming up this Thursday; click here to register. During this webcast, Jason Bane of Va Credit Union will describe how he was able to downsize his storage environment without sacrificing performance, then Joel Reich will explore many of the myths around storage efficiency in SAN environments. Questions will be fielded by an expert panel during this one-hour live event. See you there!
To me it's all about the measurement. To steal a phrase from a pretty good book (Purple Cow), "if you measure it, it will improve." Otherwise, how do you know how efficient something is unless you measure it and compare it to something that is inefficient? The question becomes: how exactly do you measure storage efficiency?

To me, it's simple. Just compare the storage you bought from your friendly storage vendor (aka Raw Capacity) to the storage you can actually use to run your business (aka Allocated Capacity). Most people call this utilization, and I've seen vendors fight over whose utilization is higher - hey, we are at 70% and you are only at 60%! Thinking more about this, I began to wonder if this is an old way of thinking and a new standard is in order - out with the old and in with the new!

Anyway, there are lots of new technologies out there that make the storage you present "appear" larger than it physically is. Ones that come to mind are compression, deduplication, and thin provisioning, but I'm sure there are others. So why not take all these smart features and use them to drive utilization to 100% and beyond? What a concept - buy 10TB and actually use 10TB. So here's the simple formula for storage efficiency I'd recommend: Storage Efficiency = Allocated Capacity / Raw Capacity. Allocated Capacity is the capacity apparent to applications and users, while Raw Capacity is the manufacturer's stated capacity including system reserves. How high can you go? 100%, 150%, even 200% storage efficiency? Let the measuring begin! (A quick worked example is below.)
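As a quick worked example with made-up numbers: suppose you buy 10TB raw, and with dedupe and thin provisioning you can present 12TB of allocated capacity to your applications and users. Then:

Storage Efficiency = Allocated Capacity / Raw Capacity = 12TB / 10TB = 120%

By the same formula, a traditional system that only lets you allocate 6TB of that 10TB would score 60%.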
Hi Jeff, it depends on the version of ONTAP. With 7.3.1, when you run 'sis stop' we set a fence (checkpoint), and when you restart we pick up where we left off. For versions before 7.3.1, we start all the way back at the beginning - but if not much time has elapsed between the 'stop' and the 'start' commands, we'll quickly get back to the point we left off, because the volume will not have changed much and we will run through the already-processed fingerprints quickly. Hope that helps - DrDedupe
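For reference, the stop/restart sequence itself is just (volume name is an example):

sis stop /vol/vol1   (halt the running dedupe operation)
sis status /vol/vol1   (confirm the operation is no longer active)
sis start /vol/vol1   (restart; on 7.3.1 and later this picks up from the checkpoint)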
Hi Will, What we mean by "not being supported" in an active-active configuration is that if a takeover occurs while dedupe is running, dedupe will stop. I don't think this is a big deal; it just means that you need to restart it manually after giveback, or just wait until the next scheduled dedupe process kicks in. Here is how I described it in the Dedupe FAQ: 27. Will deduplication work in a clustered (CFO) environment? Yes, the deduplication operation will suspend during takeover and can be resumed after giveback. During the takeover period, deduplication will continue to create and log digital fingerprints as new writes take place on the volume. Note: deduplication must be licensed on all CFO cluster nodes. Does that help? DrDedupe
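So after giveback, the manual restart is simply (volume name is an example):

sis status /vol/vol1   (verify the interrupted operation is idle after giveback)
sis start /vol/vol1   (resume deduplication on the volume)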
Hi Adrian - Each time you take a volume snapshot, every block in that volume is locked by the snapshot and cannot be freed by deduplication (i.e. released to the free pool) until the snapshot expires, even if the block is a duplicate block. There are two ways to get around this: 1) dedupe first, then take your snapshot (see the sketch below); 2) wait for the snapshots to expire, at which time you'll get your space savings back. It's important to note that you can't "hurt" anything with any combination of dedupe and snapshots; it's just a matter of when you see your savings. There is much more detail in TR-3505; this doc can be found here in the community or in the library on netapp.com. Hope that helps - DrDedupe
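A minimal sketch of the dedupe-then-snapshot ordering (volume and snapshot names are just examples):

sis start /vol/vol1   (dedupe the volume first)
sis status /vol/vol1   (wait for the operation to show Idle)
snap create vol1 post_dedupe_snap   (then take the snapshot, which now locks only already-deduped blocks)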