I know NetApp dedup, which takes place on the storage. What about backup software, for instance CommVault deduplication? Does it take place on the hosts? Can it also run on the storage? What is the preference for using one over the other?
I have no experience or knowledge with CommVault, but in general, deduplication in backup software is processed on the backup server hosts.
Data Domain Boost is a unique case. DD is a deduplication storage appliance, so data is deduplicated on the DD by default. But with backup software that works in conjunction with the DD Boost plugin, dedup is processed on the host side.
The question you pose in the initial post may be misleading - you seem to be asking to compare deduplication in a general purpose storage device (NetApp) with deduplication in a dedicated backup/archive system (Commvault). Unless you are thinking of using the NetApp storage as the target for your backups, the original question doesn't really ask for a meaningful comparison - kinda like asking to compare an engine's fuel efficiency features on a lawn mower and a farm tractor. Both could be used to mow lawns, but that isn't what the farm tractor is specifically designed for nor tuned for.
So if I may - let's back up a step and make some observations. The original question, along with some followup comments, is looking to compare deduplication in three separate classes of product, which I will identify as general purpose storage (e.g. NetApp), dedicated appliances (e.g. Data Domain), and backup/archive systems (e.g. CommVault).
From the top - NetApp storage is designed first and foremost as a general purpose storage system. All its features are designed to serve that goal first - provide access to any data using any (well, a lot of) protocols. Do the read/write thing well. Storage efficiency features (dedup, etc.) are bonuses that increase the value proposition and administrative convenience when integrating NetApp FAS storage into application environments. Dedup is present, and while very nice to have and useful, it is scoped to the unit of storage allocation - a volume - which makes sense for data with a specific purpose. Hence, if you were to take a shared volume and make a second copy of its data onto another volume, NetApp native dedup would not see the two copies in the same manner and you would get no deduplication between them (of course there are other space-efficient ways to create the same effect, but this discussion focuses on dedup).
Dedicated deduplicating appliances like Data Domain (and IBM ProtecTIER, among others) are designed specifically around data storage efficiency. Whether inline or post-process, the appliance ingests data and runs it all through the deduplication engine at a global level, across the whole appliance. Using my example from above - you would certainly get the full effect of deduplication if and when the data is copied to the appliance. Some appliances are limited in that they present as a backup/archive target only, for instance as a virtual tape library, whereas others advertise multiple protocols through which data can be loaded or accessed. In as much as they offer file-level access, they could be used in a general purpose file storage mode if one likes. However, such devices are not designed as primary storage first - rather, they are designed as targets for fixed-term storage of data, with general storage access bolted on as a convenience. But they will get better dedup results due to the global view within the appliance. Typical modern appliances dedup on ingest automatically, and can be limited in how fast they can process incoming data.
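The difference between volume-scoped dedup and an appliance's global dedup comes down to the scope of the fingerprint index. Here is a toy Python sketch of that idea - the fixed 4 KB block size, SHA-256 fingerprints, and dict-based index are illustrative assumptions, not how any particular vendor implements it:

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size for this toy example


def dedup_write(data: bytes, index: dict) -> int:
    """Write data block-by-block against a fingerprint index.

    Returns the number of blocks actually stored (i.e. blocks whose
    fingerprint was not already present in the index).
    """
    stored = 0
    for off in range(0, len(data), BLOCK_SIZE):
        block = data[off:off + BLOCK_SIZE]
        fp = hashlib.sha256(block).hexdigest()
        if fp not in index:      # new fingerprint: keep the block
            index[fp] = block
            stored += 1
    return stored


# Ten blocks, but only seven distinct patterns among them.
payload = b"".join(bytes([i % 7]) * BLOCK_SIZE for i in range(10))

# Volume-scoped dedup: each volume has its own fingerprint index, so
# copying the data to a second volume stores all unique blocks again.
vol_a, vol_b = {}, {}
first = dedup_write(payload, vol_a)            # 7 unique blocks stored
copy_per_volume = dedup_write(payload, vol_b)  # 7 stored again - no sharing

# Appliance-style global dedup: one index for everything, so the
# second copy of the same data stores nothing new.
global_index = {}
dedup_write(payload, global_index)
copy_global = dedup_write(payload, global_index)

print(first, copy_per_volume, copy_global)  # 7 7 0
```

The copy onto a second "volume" consumes as much space as the original, while the same copy against a global index consumes none - which is exactly why the appliance wins on dedup ratio.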
Deduplication within backup software environments raises the bar yet again. Typical backup systems (CommVault, Avamar, TSM, NetBackup, to name a few) that support deduplication do so on an enterprise scale. That is, data to be deduplicated is hashed and keyed, either on ingest or post-process, then all future data written to the system is processed at the client end to see if the newly minted chunk of data even has to be sent. This process leverages latent compute power across the entire backup realm/domain/whatever to process the dedup, and doesn't even bother to send duplicate data to the designated storage. Of course, in a world where much of the infrastructure is virtualized, this can still produce a big impact on actual compute needs - backup is always a bother that way.
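That source-side flow - hash on the client, ask the server whether the fingerprint is already known, and transmit only what is new - can be sketched as a toy in Python. This is not CommVault's (or anyone's) actual protocol; the chunk size, the `BackupServer` class, and the hash choice are all illustrative assumptions:

```python
import hashlib

CHUNK = 4096  # assumed fixed chunk size for this toy example


class BackupServer:
    """Stand-in for the backup server's global fingerprint store."""

    def __init__(self):
        self.store = {}  # fingerprint -> chunk data

    def has(self, fp: str) -> bool:
        return fp in self.store

    def put(self, fp: str, chunk: bytes) -> None:
        self.store[fp] = chunk


def client_backup(data: bytes, server: BackupServer) -> int:
    """Source-side dedup: the client hashes each chunk and only sends
    chunks whose fingerprint the server has never seen.

    Returns the number of chunks actually transmitted.
    """
    sent = 0
    for off in range(0, len(data), CHUNK):
        chunk = data[off:off + CHUNK]
        fp = hashlib.sha256(chunk).hexdigest()
        if not server.has(fp):   # only new data crosses the wire
            server.put(fp, chunk)
            sent += 1
    return sent


server = BackupServer()
monday = b"".join(bytes([i]) * CHUNK for i in range(5))  # 5 unique chunks
tuesday = monday + b"x" * CHUNK                          # one chunk appended

sent_day1 = client_backup(monday, server)   # full backup: 5 chunks sent
sent_day2 = client_backup(tuesday, server)  # only the 1 new chunk is sent
print(sent_day1, sent_day2)  # 5 1
```

The second backup transmits a single chunk even though the dataset is larger than the first - the hash lookup at the client end is what spares both the network and the target storage.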
Now - consider a place where everything is integrated, talks to one another where it can, and shares information when it can. You have some user data on a NetApp Filer volume that has some level of deduplication. You use CommVault or TSM or whatever with global dedup turned on for a large pool of data. However, a client has to analyze the data, and that client isn't running natively on the Filer. So any data to be processed is rehydrated (un-deduplicated) at the Filer, sent to an agent box, then processed for dedup again, either by the backup software or perhaps by a backup appliance target set up as the storage behind a backup storage pool. It may be that the ultimate backup setup does a better job deduping, but there are multiple touches where data gets rehydrated along the way.
This last paragraph leads us back to where we started - what, exactly, about the deduplication in all these different types of systems do you want to compare?