I understand that this is a chicken-and-egg dilemma.
There will be some performance impact, but you could try a staged approach: dedupe only certain volumes first, do some alignment, then dedupe a few more volumes, and so on. A sketch of this follows below.
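Roughly, the staged approach on a 7-mode controller looks like this, using the standard sis and sysstat commands (the volume names here are hypothetical):

    # Check CPU load first; per TR-3505 (quoted below), the write
    # penalty is negligible when utilization is around 50% or lower
    filer> sysstat -x 1

    # Enable and run dedupe on one volume at a time
    filer> sis on /vol/vm_datastore1
    filer> sis start -s /vol/vm_datastore1   # -s also scans existing data

    # Poll until the scan finishes before touching the next volume
    filer> sis status /vol/vm_datastore1

    # ...do your alignment work on that volume, then move on
    filer> sis on /vol/vm_datastore2
    filer> sis start -s /vol/vm_datastore2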
This is what TR-3505 says about performance:
Write Performance to a Deduplicated Volume
The impact of deduplication on the write performance of a system is a function of the hardware platform that is being used, as well as the amount of load that is placed on the system.
If the load on a system is low (that is, systems in which CPU utilization is around 50% or lower), there is a negligible difference in performance when writing data to a deduplicated volume, and there is no noticeable impact on other applications running on the system. On heavily loaded systems, however, that are close to saturation, the impact on write performance can be expected to be around 15% for most NetApp systems. The performance impact is more noticeable on higher-end systems than on lower-end systems; on the FAS6080, it can be as much as 35%. The higher degradation is usually experienced in association with random writes.
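One practical consequence of the above: since the penalty scales with load, it also helps to keep the dedupe scans themselves out of peak hours. A minimal sketch using the stock sis scheduler (volume name again hypothetical):

    # Run the dedupe scan at 11pm every night instead of on the
    # default schedule, keeping the scan itself off peak hours
    filer> sis config -s sun-sat@23 /vol/vm_datastore1

    # Verify the configured schedule
    filer> sis config /vol/vm_datastore1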
Read Performance from a Deduplicated Volume
When data is read from a deduplication-enabled volume, the impact on the read performance varies depending on the difference between the deduplicated block layout compared to the original block layout. There is minimal impact on random reads.
Because deduplication alters the data layout on the disk, it can affect the performance of sequential read applications such as dump source, qtree SnapMirror or SnapVault source, SnapVault restore, and other sequential read-heavy applications. This impact is more noticeable in Data ONTAP releases earlier than Data ONTAP 7.2.6 and Data ONTAP 7.3.1 with data sets that contain blocks with repeating patterns (such as applications that preinitialize data blocks to a value of zero). Data ONTAP 7.2.6 and Data ONTAP 7.3.1 have specific optimizations, referred to as intelligent cache, that improve the performance of these workloads to be close to the performance of nondeduplicated data sets. This is useful in many scenarios, and especially in virtualized environments. In addition, the Performance Acceleration Modules (PAM and PAM II) are also deduplication aware, and they use intelligent caching.
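Given that caveat about releases earlier than 7.2.6 / 7.3.1, it is worth confirming your controller's version before counting on the intelligent-cache behavior, and checking the realized savings afterwards. For example (output elided, volume name hypothetical):

    # Confirm the release is 7.2.6 / 7.3.1 or later
    filer> version

    # After the scan completes, report per-volume space savings
    filer> df -s /vol/vm_datastore1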