OK, in our case what I found is that we had a POC virtualized Commvault system that came online in the last 3 weeks. The POC system was running on volumes that were being deduplicated. We don't know whether it kept generating too much data for the flashcache card to handle or whether it threw bad data at the flashcache card. Flashcache caches metadata and random reads. Deduplication relies on that metadata, so it's going to use the flashcache heavily. My guess was that since we had multiple volumes that Commvault logically virtualized as one volume, there were a lot of random reads, and in combination with those volumes being set to dedup, that must have done something negative to the flashcache card.
We did notice that only 1 of the nodes was being affected, so I was able to rule out the new data warehouse since it was running on the 2nd node.
So maybe some of my questions will help you find out which system is causing your issue:
- Are you running everything off of 1 node or 2 nodes?
- Does the WAFL error state that it's coming from one node or both nodes?
- Is your backup system on separate disk or inside the NetApp SAN?
- Are you deduplicating every volume? Including SQL data volumes?
- Are all your volumes running deduplication on the default sun-sat@0 schedule?
The things I did to bring down the performance impact as much as possible were the following:
- Made sure not to deduplicate SQL databases/logs (not worth the processing compared to the low space savings)
- Moved all non-production deduplicated volumes to run at a different time on the weekend with a qos-policy of background
- Changed some of the deduplicated volumes to the automatic schedule or disabled deduplication on them. I became picky about what should be deduplicated.
- Moved the POC Commvault test to a separate SAN so it doesn't touch the flashcache
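If it helps, here is a rough sketch of the triage I applied, written as a small Python function. It is only an illustration of the decision rules in the list above, not anything NetApp provides: the `Volume` class, the workload labels ("sql", "backup-poc", "general") and the "sat@22" weekend slot are all made-up placeholders, so swap in whatever matches your environment.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    name: str
    workload: str     # hypothetical labels: "sql", "backup-poc", "general"
    production: bool


def plan_dedup(vol: Volume) -> dict:
    """Suggest a dedup setting for one volume, following the list above."""
    if vol.workload == "sql":
        # SQL databases/logs: low space savings for the processing cost.
        return {"dedup": False, "reason": "low savings on SQL data/logs"}
    if vol.workload == "backup-poc":
        # The POC backup test gets moved off this SAN entirely.
        return {"dedup": False, "reason": "move to a separate SAN"}
    if not vol.production:
        # Non-production: weekend run at background priority.
        # "sat@22" is an assumed example slot, not a recommendation.
        return {"dedup": True, "schedule": "sat@22", "qos_policy": "background"}
    # Remaining production volumes: let the system pick when to run.
    return {"dedup": True, "schedule": "auto"}


if __name__ == "__main__":
    volumes = [
        Volume("sql_data", "sql", True),
        Volume("cv_poc", "backup-poc", False),
        Volume("dev_share", "general", False),
        Volume("prod_share", "general", True),
    ]
    for v in volumes:
        print(v.name, plan_dedup(v))
```

The point of writing it down this way is just to force a decision per volume instead of leaving everything on the default sun-sat@0 schedule.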
A drastic thing you can do if you need to buy time is to disable deduplication across all the volumes for now while you track down which system is generating all the data. Hope this helps.