"He was told by a VMware engineer that if you run Windows defragmentation on the virtual server while also using de-duplication on the storage, it can cause corruption of data on the virtual server and/or the VMDK."
Can anyone provide some insight into the risks/benefits of OS-level defrag vs. reallocation vs. other options?
Can anybody point me to any relevant documentation on this? Is there an official NetApp position?
If you have virtualized any servers in your shop, are your sysadmins performing OS-level defrags on their mapped 'drives'? Is the practice useful, useless or dangerous?
Grr....I had a super long response worked up and the Communities timed out my session (I'd been active just 5 minutes ago so not sure what happened).
In short, you don't want to do this....has to do with temporal locality....here's an explanation I sent to a customer recently.
To answer your question, I'm not finding any "good canned document" on this, unfortunately. I did check with <high-level NetApp engineer> who confirmed that OS-level defrag (i.e. defrag inside Windows of its NTFS volumes) isn't needed. The main reason is that when Windows writes blocks, it assumes it's writing to a physical disk, where having those blocks out of order causes speed issues. When Windows is writing to a NetApp, WAFL (the NetApp's file system) actually controls where the blocks get written -- Windows has no idea where the blocks are on the actual physical disks inside the NetApp (since the NetApp controls the physical disks and presents a virtual disk up to Windows, VMware, etc.).
Note: in this regard NetApp is different from other SANs/storage arrays which do actually map LUNs/volumes directly to physical disks (for those there could be a benefit to Diskeeper, etc.).
To put it another way....
Is there any benefit in running a Windows-level defrag tool? No.
Will it absolutely hurt anything? Not really but.....
Will it create larger NetApp snapshots? Absolutely -- as you'll be rewriting many of the blocks inside Windows that wouldn't be touched normally.
Will it create a LOT of disk traffic between the Windows server (or VMware server if a Windows VM) and potentially slow down other people/servers using the NetApp? Yes as well.
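To put rough numbers on the snapshot point above, here is a toy Python model, not anything NetApp-specific. It assumes a 4 KB block size, a made-up LUN of 100,000 blocks, and made-up change rates (2% for a normal day, 60% for a defrag pass). `snapshot_delta_kb` is a hypothetical helper that simply counts the distinct blocks rewritten after the snapshot was taken.

```python
# Toy model: every block the guest rewrites after a snapshot needs new space
# on the array, because the snapshot still holds the old copy.
# All numbers below are assumptions chosen for illustration only.

BLOCK_SIZE_KB = 4  # assumed block size


def snapshot_delta_kb(rewritten_blocks):
    """Space (KB) tied up by blocks rewritten since the snapshot was taken."""
    return len(set(rewritten_blocks)) * BLOCK_SIZE_KB


# Hypothetical LUN of 100,000 blocks.
normal_day = range(0, 2_000)    # guest changes 2% of its blocks
defrag_run = range(0, 60_000)   # defrag relocates 60% of its blocks

normal_kb = snapshot_delta_kb(normal_day)
defrag_kb = snapshot_delta_kb(defrag_run)

print(f"normal day : {normal_kb / 1024:.1f} MB of changed blocks")
print(f"defrag run : {defrag_kb / 1024:.1f} MB of changed blocks")
```

The actual ratio depends entirely on how fragmented the guest file system is and how much the defrag tool decides to move, so treat the 2% and 60% figures as placeholders.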
The closest thing we could find publicly available (checking with NetApp) is actually this document focused on Exchange but applicable for defrag/snapshot impact....see the "Online defragmentation" section.
Now, there is one command called "reallocate" on the NetApp side which can help somewhat with data layout but it works transparently to any servers accessing the NetApp -- I'll write more on that tomorrow.
Side-note: if you haven't looked at it already, do be sure to install the NetApp Host Utilities for VMware inside each of your ESX hosts.
According to TR-3749 (Jan 2010), NetApp and vSphere storage best practices:
“DISK DEFRAGMENTATION UTILITIES
Virtual machines stored on NetApp storage arrays should not use disk defragmentation utilities as the WAFL file system is designed to optimally place and access data at level below the GOS file system. Should you be advised by a software vendor to run disk defragmentation utilities inside of a VM, please contact the NetApp Global Support Center prior to initiating this activity.”
I would tend to think this applies to non-VMs on NetApp storage as well.
We have an ongoing debate in our office about whether we should do this or not. It seems like anyone you talk to will give you a different answer. I argue NO, you should not. A few of the cons:
Significant growth of snapshots
Deduplicated data may have changed from the filer's perspective when the next dedup job runs, resulting in the need to "re-dedup"
Frequently accessed blocks are stored in cache, and defragging accesses many otherwise unaccessed blocks. The caching algorithm will then be fooled into thinking that a dormant block of data is actually active, and store it in cache
Increase in disk I/O traffic
WAFL optimises placement of blocks anyway. Obviously there is no single physical disk corresponding to your guest partition.
The question is whether there is a measurable overhead within the Windows guest that would be reduced by defrag, and whether the benefit outweighs the chaos on the storage side caused by defragging.
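The caching point in the list above can be sketched with a plain LRU cache in Python. This is only an illustration: real array caches are not simple LRU, and the cache size, the "hot" working set of 100 blocks, and the 1,000 cold blocks touched by the defrag pass are all assumptions. `LRUCache` and `hot_hit_rate` are hypothetical names.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal least-recently-used block cache (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)       # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.blocks[block] = True
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # evict least recently used


def hot_hit_rate(cache, hot_blocks):
    """Read the hot set once and return the fraction served from cache."""
    start = cache.hits
    for block in hot_blocks:
        cache.read(block)
    return (cache.hits - start) / len(hot_blocks)


hot = list(range(100))             # assumed working set of the application
cache = LRUCache(capacity=200)     # assumed cache size, in blocks

hot_hit_rate(cache, hot)           # warm the cache
before = hot_hit_rate(cache, hot)  # hot set is fully cached

for cold_block in range(1000, 2000):  # defrag pass reads 1,000 dormant blocks
    cache.read(cold_block)

after = hot_hit_rate(cache, hot)   # hot set has been pushed out

print(f"hit rate before defrag scan: {before:.0%}")
print(f"hit rate after defrag scan:  {after:.0%}")
```

In this toy setup the hot set is fully evicted by the scan. A smarter cache would resist that to some degree, but the extra reads still compete for the same space.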
Yes....great data there. I've seen reallocate really help as well (the most dramatic example was a bunch of GroupWise servers on FC disk where reallocate cut latencies by more than half). I'm still wrestling with when it makes sense to use "reallocate" vs. "reallocate -p" (whether there is any difference in how long it takes to run, how it impacts speed, exactly how much -p helps with snapshot deltas, etc.).
Re: Defrag or No? Windows guest OS defrag w/in FC LUN
That I can answer: reallocate -p has zero impact on volume snapshot size.
I'd add interaction with VSM to your list. Data transferred after reallocate will be defragmented, but after reallocate -p it will not (at least if I correctly understand how it works). This may need to be taken into account if the destination is often used for tasks like backup verification.
I'm not a storage expert. I just maintain a set of virtualized SQL Servers using NetApp LUNs (iSCSI/SnapDrive/SnapManager...).
I noticed that data updates are written to free blocks, meaning the original block is not updated but kept, since it is referenced by snapshots made earlier. So, may I conclude fragmentation is inherent to NetApp? May I conclude Windows defrag might cause volumes to run out of space? May I conclude that (in case we have enough free space in the volume) the chance that less physical I/O is initiated after defrag is negligible, or even that in some cases the number of physical I/Os might increase? May I conclude Windows will initiate fewer I/Os since it thinks the data is sequential, but the resulting number of I/Os on the NetApp is unpredictable? May I conclude that the SQL command "set statistics io on" does not tell me the truth about the number of physical reads executed on the NetApp (or any other disk virtualisation/SAN system), only the number of physical I/Os that Windows or SQL Server thinks have to be done?
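The "updates go to free blocks" behaviour described above can be sketched as a toy write-anywhere volume in Python. This is a simplified model for illustration, not how WAFL is actually implemented: `WriteAnywhereVolume` is a hypothetical class, the allocator just hands out the next free physical block, and only one snapshot is tracked.

```python
class WriteAnywhereVolume:
    """Toy model: every write lands in the next free physical block."""

    def __init__(self):
        self.next_free = 0
        self.block_map = {}      # logical block -> physical block (active data)
        self.snapshot_map = {}   # same mapping, frozen at snapshot time

    def write(self, logical):
        # Never overwrite in place: allocate a fresh physical block.
        self.block_map[logical] = self.next_free
        self.next_free += 1

    def snapshot(self):
        self.snapshot_map = dict(self.block_map)

    def physical_blocks_in_use(self):
        active = set(self.block_map.values())
        held = set(self.snapshot_map.values())
        return len(active | held)


vol = WriteAnywhereVolume()
for logical in range(8):          # initial sequential write of an 8-block file
    vol.write(logical)
vol.snapshot()
used_before = vol.physical_blocks_in_use()

for logical in (2, 5):            # guest updates two blocks "in place"
    vol.write(logical)

layout = [vol.block_map[logical] for logical in range(8)]
used_after = vol.physical_blocks_in_use()

print("physical layout of logical blocks 0..7:", layout)
print("physical blocks in use before/after:", used_before, used_after)
```

In this model the guest still sees blocks 0..7 as contiguous, while the physical layout has two of them relocated and the snapshot keeps the two old copies. A guest-level defrag is just more writes from the array's point of view, so it consumes more blocks without the guest having any say over where they land.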
Anyway, I wonder what defrag means in a RAID setup. Isn't it so that 1 byte might be spread over 8 physical disks ... or that 8 bytes might be spread over 8 physical disks?
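On the striping question, a generic block-striped layout can be sketched in Python. This assumes plain striping across 8 data disks with a 64 KB stripe unit and no parity; both numbers and the `locate` helper are assumptions for illustration, and NetApp's own RAID layout is not modelled here. In this kind of scheme data is distributed in chunks rather than individual bytes, so a single byte lives on exactly one disk.

```python
STRIPE_UNIT = 64 * 1024  # assumed stripe unit (chunk) size in bytes
DATA_DISKS = 8           # assumed number of data disks


def locate(offset):
    """Map a logical byte offset to (disk index, byte offset on that disk)."""
    chunk = offset // STRIPE_UNIT      # which chunk of the logical volume
    disk = chunk % DATA_DISKS          # chunks rotate across the disks
    stripe = chunk // DATA_DISKS       # full stripes before this chunk
    return disk, stripe * STRIPE_UNIT + offset % STRIPE_UNIT


for offset in (0, 7, STRIPE_UNIT, 8 * STRIPE_UNIT):
    disk, disk_offset = locate(offset)
    print(f"logical offset {offset:>7} -> disk {disk}, offset {disk_offset}")
```

With these assumed numbers, bytes 0 through 65535 all sit on disk 0, and a contiguous read only touches all 8 disks once it exceeds 7 stripe units.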
When I read this, I start to wonder whether SQL Server index rebuilds might no longer be best practice, since they will have the same effect on snapshots as Windows defrag. May I conclude that we benefit in HA, DR and fast restore, but that we should review best practices regarding I/O optimisation?