So I've been watching our disk space shrink over time, and I ran an IOmeter test on one of our virtual machines whose VMDK resides in a VMFS datastore on a VMware-formatted LUN in a flexvol. The results were abysmal for random operations, far lower than what even lesser arrays deliver.
This is on a new Dell PowerEdge R610 host (dual 6-core 2.8 GHz, 96 GB RAM) with a dual-port QLogic 2462, running ESXi 4.1 Update 1 with ALUA-enabled igroups and MPIO settings configured by the Virtual Storage Console, connected through Cisco MDS 9124s. Everything between the host and the storage is very capable, so results like the ones I've seen should not be happening.
I decided to do a wafl scan measure_layout on one of my VMFS datastores to see what was going on: the ratio was 4.93 against an expected 1.32.
Based on that information, I decided to perform a volume reallocation using the reallocate start -f -p /vol/[volname] command.
It completed after a number of hours (the datastore is a terabyte in size) and I reran the wafl scan measure_layout command, and the ratio had dropped to 2.93. Progress!
I then ran a reallocate measure -o /vol/[volname] and it gave me a result of 2, which should be fairly optimized.
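To put the measure_layout numbers quoted above in perspective, here is a small Python sketch of the arithmetic. It only uses the three figures from this post (4.93, 2.93 and 1.32). It assumes the expected ratio of 1.32 still applies after the reallocation, which the post does not actually state, and the helper names are made up for illustration.

```python
# Ratios reported by `wafl scan measure_layout` in this thread.
before = 4.93    # measured before the reallocation
after = 2.93     # measured after `reallocate start -f -p`
expected = 1.32  # "expected" value reported with the first measurement
                 # (assumed unchanged afterwards; the post does not say)


def relative_drop(old: float, new: float) -> float:
    """Fractional reduction going from old to new."""
    return (old - new) / old


def times_expected(measured: float, baseline: float) -> float:
    """How many times larger the measured ratio is than the baseline."""
    return measured / baseline


drop = relative_drop(before, after)
print(f"ratio dropped by {drop:.1%}")
print(f"before: {times_expected(before, expected):.2f}x expected")
print(f"after:  {times_expected(after, expected):.2f}x expected")
```

Under that assumption the ratio fell by roughly 40%, from about 3.7 times the expected value to about 2.2 times, so the layout improved but is still well above the expected figure.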
All's good at this point, except that all of the volumes in the aggregate containing the VMFS datastore are showing "redirect" status, and even the aggregate itself is showing "redirect". The VMFS datastore itself is still showing "active_redirect" status.
I ran a wafl scan status command and now it's showing all volumes on the filer in question (even volumes in another aggregate) as performing "active bitmap rearrangement".
The other aggregate and its volumes do not show "redirect" status.
My questions are:
Will the "redirect" and "active_redirect" statuses go away once the wafl scan is completed, since the system shows no reallocations running?
Can I run more volume reallocations even if the volume (and the containing aggregate) itself is in "redirect" status?
Should I reallocate all of the volumes in this aggregate (since it's at 81% used <gasp>) then perform an aggregate reallocation to ensure volumes are spread evenly across all spindles and free space is available in large, contiguous chunks?
I too would like more info on the "redirect" status. In my case the reallocate -A scan completed successfully; however, the aggregate and both volumes contained within have an added "redirect" status. Neither, however, has an "active_redirect" status.
Can someone point us in the right direction? Thank you.
I'm not 100% sure, but I think it may be due to the use of the `-p` option. As per the man page:
The -p option requests that reallocation of user data take place on the physical blocks in the aggregate, but the logical block locations within a flexible volume are preserved. This option may only be used with flexible volumes, or files/LUNs within flexible volumes.
The physical blocks have been moved but the logical blocks remain the same. I imagine the redirect status exists because a redirection is taking place between the logical block locations and the updated physical block locations.
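To make that guess a bit more concrete, here is a toy Python sketch of the general idea of keeping logical block numbers fixed while the physical placement changes. It is purely illustrative: the `ToyVolume` class, its methods and the block numbers are all made up, and it says nothing about how WAFL actually implements `-p` or the redirect status.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVolume:
    """Toy model only: maps logical block numbers to physical block numbers."""
    block_map: dict[int, int] = field(default_factory=dict)

    def read(self, logical: int) -> int:
        """Return the physical block currently backing a logical block."""
        return self.block_map[logical]

    def reallocate_physical(self, new_layout: dict[int, int]) -> None:
        """Move data to new physical blocks; logical numbers do not change."""
        for logical, physical in new_layout.items():
            if logical not in self.block_map:
                raise KeyError(f"unknown logical block {logical}")
            self.block_map[logical] = physical


# Hypothetical volume whose three logical blocks are scattered on disk.
vol = ToyVolume({0: 907, 1: 12, 2: 455})
logical_before = sorted(vol.block_map)

# "Reallocate" onto contiguous physical blocks.
vol.reallocate_physical({0: 100, 1: 101, 2: 102})
logical_after = sorted(vol.block_map)

print(logical_before == logical_after)       # logical view is unchanged
print([vol.read(n) for n in logical_after])  # physical placement is new
```

In this toy, anything that remembered an old physical location (907, 12, 455) would have to go back through the map to find the data again, which is the kind of extra lookup I am guessing the "redirect" status refers to.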
Another snippet from the man page:
Using the -p option may cause a performance degradation reading older snapshots if the volume has significantly changed after reallocation has been performed. Examples of reading a snapshot include reading files in the .snapshot directory, accessing a LUN backed by a snapshot, or reading a qtree snapmirror (QSM) destination. When whole-volume reallocation is performed with the -p option and snapshots exist an extra redirection step is performed to eliminate this degradation.
active_redirect can have a performance impact on the volume. It usually clears up on its own when the redirect scanner runs on the volume after the reallocation. If it does not, here is what I run to resolve it: