Exchange does have single instance storage within a database, but its effectiveness diminishes as the number of databases in an organization increases. In my experience with production data, you should expect 3-10% savings from deduplication (with the vast majority hovering at 4-5%), which generally means it isn't worth doing.
Everything "depends". What are your users like, how much mail do they send/receive a day (change rate), what are they sending? (large attachments?)
The way ESE (Jet) places items into the database, and later defragments those pages, means that a 1MB attachment stored in two databases may not deduplicate much at all.
Exchange 2010 is a different beast (no single instance storage, and page zeroing is on by default). Worst case, if you run deduplication daily you should be able to recover your change rate, and after more definitive testing with the RTM bits we may see more. Expect guidance for Exchange 2010 soon.
All the customer data we have seen aligns with what Rob has said. In a 2003/2007 environment we don't see over 10%. With that kind of return, it really doesn't buy you anything.
As far as Exchange 2010 is concerned, Andrew is correct: there is no more SIS, and there are also some additional changes that may result in better dedupe numbers and a much better return.
Has anyone here tested ASIS behavior on "majority whitespace" edb files?
Our scenario: two years ago we did no spam filtering (technically, we tagged and forwarded all spam, which was only about 50% effective). Today we reject 88% of all incoming messages. Our 4TB .edb files produce only about 750GB of backup data. Our assumption is that the discrepancy is mostly wasted space, since customers now receive and retain much less mail (we remove unchanged items under a retention policy that varies from group to group; the point being that no one can keep mail in their primary inbox forever).
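As a rough back-of-envelope check (a sketch using the figures quoted above; the assumption that backup size approximates the live data in the database is ours):

```python
# Rough estimate of whitespace in an .edb file, assuming the backup size
# approximates the actual live data in the database.
edb_size_gb = 4 * 1024   # 4 TB allocated .edb file
backup_gb = 750          # data actually captured by backup

whitespace_gb = edb_size_gb - backup_gb
whitespace_pct = 100 * whitespace_gb / edb_size_gb

print(f"Estimated whitespace: {whitespace_gb} GB ({whitespace_pct:.0f}%)")
```

On those numbers, roughly 82% of the allocated file would be whitespace, which is why the A-SIS question is interesting here.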
Could NetApp deduplication reduce our storage allocation closer to actual utilization?
This is a challenging question to answer, because people typically defragment the Exchange environment (from within Exchange), which spreads the data across the .edb file (or so I am told), so it is hard to judge how much whitespace within the file could actually be deduplicated. As said above, the savings we have seen are around the 10-12% mark, but there is roughly a 5-7% metadata overhead, so the net savings could fall below 10% (assuming metadata isn't already included in the initial percentage).

What you could try is getting help from a friendly local SE or partner and using FlexClone: clone the volume, run dedupe on the clone, and you should see the actual savings you would get on the real volume. Once the test is done you can destroy the FlexClone without affecting the primary data. This would of course take resources to run, so it's worth doing during a quiet period.
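To make the net-savings point concrete, a quick sketch using the figures above (10-12% gross savings, 5-7% metadata overhead; the 1 TB volume size is hypothetical, and this assumes the gross figure does not already include metadata):

```python
# Net dedupe savings = gross savings minus metadata overhead, assuming the
# gross figure does NOT already include the metadata cost.
volume_gb = 1000  # hypothetical 1 TB volume, purely for illustration

cases = {"worst": (10, 7), "best": (12, 5)}  # (gross %, metadata %) from above
net = {}
for label, (gross_pct, meta_pct) in cases.items():
    net[label] = gross_pct - meta_pct
    print(f"{label} case: gross {gross_pct}% - metadata {meta_pct}% "
          f"= net {net[label]}% ({volume_gb * net[label] // 100} GB "
          f"freed on a {volume_gb} GB volume)")
```

Even in the best case the net return lands in the single digits, which is the crux of the "doesn't buy you anything" argument.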
Something else to consider: what would dedupe actually give you? It may well be that the number of spindles you have is there to deliver the I/O requirements of Exchange, so the freed space couldn't be used by another application anyway, though you could of course use it for Snapshots.