You will perhaps get a good deal of differing opinions here, but some fundamental common sense is probably going to get you the farthest in the long run. 'A-SIS' is a data manipulation (filesystem) tool that is primarily concerned with removing (consolidating) duplicate blocks from the filesystem. Even if it has been refined over time to try to reduce fragmentation, the tool's main goal is to optimize space savings. How performance is affected is largely going to be the result of how deduplication affects seek times and the number of reads needed to access the requested blocks. This is relatively easy to deduce from a simple understanding of disk-based hard drives. 'sis' is a tool with a specific use in mind. Like most tools, you can try to use it for other things, but the results may be suboptimal. (You can use a knife to loosen a screw, but you might ruin the screw or the knife in doing so.)
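To make the seek-time point concrete, here is a toy sketch in Python. It is not how A-SIS or WAFL actually lays out blocks; the block addresses and the `count_seeks` helper are made up purely for illustration. It just counts how often a sequential read of a file has to jump to a non-adjacent physical block.

```python
def count_seeks(physical_addresses):
    """Count jumps to a non-adjacent block during a sequential read.

    Hypothetical helper: 'physical_addresses' is the list of physical
    block addresses backing a file, in logical (read) order.
    """
    seeks = 0
    for prev, cur in zip(physical_addresses, physical_addresses[1:]):
        if cur != prev + 1:
            seeks += 1
    return seeks


# Before dedupe: the file sits in 8 contiguous blocks (made-up addresses).
before = list(range(100, 108))

# After dedupe: two of its blocks were duplicates of blocks that already
# existed elsewhere (addresses 10 and 13), so the file now points there.
after = [100, 101, 10, 103, 104, 13, 106, 107]

print(count_seeks(before))  # contiguous layout
print(count_seeks(after))   # layout with shared blocks elsewhere on disk
```

Two freed blocks cost four extra jumps in this made-up layout. Whether that matters in practice depends on how much of the read is served from cache, which is the next point.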
There is one complicating/mediating factor here as well: system memory. The larger the system memory, the more easily the system can cache frequently accessed blocks without having to access the disks. This is also why PAM-II cards can have an amplified advantage on de-duplicated data. The 2050, unfortunately, isn't going to have many advantages here.
We might, then, deduce that 'sis' isn't the right tool for filesystems (flexvols) that require optimal access times. We can probably reasonably estimate a number of scenarios where 'sis' would be useful and some where it wouldn't be. The key here, to reiterate, is to segregate the data in a way that will make these decisions more clear-cut.
1) VMware volumes: datasets that are highly duplicate, relatively static, and require little or only slow access: system "C:" drives, for example. The similarity of the data here suggests keeping the C: drives together in volumes of their own (without pagefile data).
2) VMware volumes: datasets that are moderately duplicate and require moderate access times.
3) CIFS/NFS data that is moderately duplicate and requires moderate/slow access times. The sizes of the file systems here can result in significant savings in terms of GB for normal user data.
Conversely, there are probably many datasets where 'sis' is going to give minimal savings or sub-optimal performance.
1) Datasets with files that have random and/or unique content: seismic data, encrypted files, compressed files, swap files, certain application data.
2) Datasets with application data that has optimized internal data structures or that requires fast access times: databases.
3) Datasets that are too small for significant savings.
4) Datasets that are dynamic and require fast access.
The common sense comes, then, in using the tool for what it was meant for. Segregate the data into reasonably good sets (flexvols): those where 'sis' can be used with success and those where it shouldn't be used at all. In the end, the goal for most IT operations isn't to save file system blocks at all costs, i.e. without considering performance.
There are other maintenance routines that can help with access times, like the use of 'reallocate', but one needs to read the docs and use a little common sense here too. Normal fragmentation will affect most filesystems over time, but that isn't a situation that is limited to de-duplicated filesystems.
Not sure where this came from, but you may say that yes, your first dedupe scan will show the most dramatic savings, whilst subsequent scans will simply have fewer duplicated blocks to reduce (or none, if you don't add any new data in the meantime).
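A toy model in Python shows why. This is only a sketch of post-process block dedupe in general, not NetApp's implementation: the `ToyVolume` class, the 4096-byte block size and the use of SHA-256 as a fingerprint are all assumptions made for illustration.

```python
import hashlib


class ToyVolume:
    """Hypothetical volume: writes never dedupe inline, scans do it later."""

    def __init__(self):
        self.physical = {}  # physical block id -> block contents
        self.logical = []   # logical block index -> physical block id
        self._next_id = 0

    def write(self, data, block_size=4096):
        # Every written block gets its own physical block at first.
        for off in range(0, len(data), block_size):
            self.physical[self._next_id] = data[off:off + block_size]
            self.logical.append(self._next_id)
            self._next_id += 1

    def dedupe_scan(self):
        """Consolidate identical blocks; return how many were freed."""
        kept = {}   # fingerprint -> physical id that is kept
        remap = {}  # freed physical id -> surviving physical id
        for pid, block in self.physical.items():
            fp = hashlib.sha256(block).digest()
            if fp in kept:
                remap[pid] = kept[fp]
            else:
                kept[fp] = pid
        for pid in remap:
            del self.physical[pid]
        self.logical = [remap.get(pid, pid) for pid in self.logical]
        return len(remap)


vol = ToyVolume()
vol.write(b"A" * 4096 * 10 + b"B" * 4096 * 5)  # 15 blocks, 2 distinct
first = vol.dedupe_scan()    # big win on the first pass
second = vol.dedupe_scan()   # nothing new was written since
vol.write(b"A" * 4096 * 2 + b"C" * 4096)       # a little new data
third = vol.dedupe_scan()    # only the new duplicates are freed
print(first, second, third)
```

The first scan frees 13 of the 15 blocks, the second frees none, and the third frees only the two new blocks that duplicate existing data.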
Disaster recovery from a dedupe environment can become tricky to problematic.
It's a bit of an urban myth in my opinion. It probably refers to the fact that deduped data on the filer will get re-hydrated when backed up to tape. If you rely on NDMP to tape for DR, then you will need more physical capacity on your tapes than on your filer. Then, if you do a restore from tape, it might turn out that you won't be able to restore everything in one go because of the lack of space on the filer: you have to restore in chunks, running dedupe scans in the meantime. I think it is a bit of a trade-off though: without dedupe these issues go away, but you need more disks and/or can keep less data.
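The restore-in-chunks arithmetic can be sketched in a few lines of Python. The numbers and the `plan_restore` helper are made up for illustration, and the model assumes every chunk shrinks by the same savings ratio after its scan, which is a simplification (real savings depend on what duplicates what across the whole volume).

```python
def plan_restore(chunks_logical_gb, filer_capacity_gb, savings_ratio):
    """Simulate restoring re-hydrated chunks with a dedupe scan after each.

    Hypothetical helper. Returns the space used on the filer after each
    chunk has been restored and scanned; raises ValueError if a chunk
    does not fit in its re-hydrated form.
    """
    used = 0.0
    used_after_each_scan = []
    for size in chunks_logical_gb:
        if used + size > filer_capacity_gb:
            raise ValueError("chunk of %s GB does not fit" % size)
        used += size                  # data lands re-hydrated
        used -= size * savings_ratio  # scan reclaims the duplicates
        used_after_each_scan.append(used)
    return used_after_each_scan


# Made-up example: 1000 GB of logical data on tape, 50% dedupe savings,
# 600 GB of usable space on the filer.
try:
    plan_restore([1000], 600, 0.5)
except ValueError as err:
    print("one go:", err)

print("in chunks:", plan_restore([200] * 5, 600, 0.5))
```

In this example the tapes have to hold the full 1000 GB, the single-pass restore fails, and five 200 GB chunks fit because each scan frees room for the next chunk.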