Auto-tiering for LUNs? (C-mode)

JACEK_HIRSZTRITT · 2013-06-10

Hi everyone,

I'm looking for a workflow that does simple auto-tiering, for LUNs only, between aggregates built from different kinds of disks.

Let's say we have three aggregates, one per disk type: SATA 7.2k, SAS 10k, and SAS 15k.

I'm looking for a script that would do more or less this:

1) collect LUN usage statistics during the day

after that, at night:

2) take the busiest 20% of LUNs and 'vol move' them to the SAS 15k aggregate, if not already there (I assume the configuration is one LUN per volume)

3) take the least busy 40% of LUNs and 'vol move' them to the SATA 7.2k aggregate, if not already there

4) take the rest and 'vol move' them to the SAS 10k aggregate, if not already there
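The selection in steps 2 to 4 could be sketched roughly as below. This is only a Python illustration of the ranking logic, not a WFA workflow: the function name `assign_tiers`, the tier labels, and the idea of feeding it one average-IOPS figure per LUN are assumptions made for the example, and how the statistics are collected is left open.

```python
def assign_tiers(lun_iops, top_pct=20, bottom_pct=40):
    """Map each LUN to a target tier by rank of its daytime average IOPS.

    lun_iops: dict of LUN name -> average IOPS (hypothetical input format).
    Group sizes are rounded down, so small sets may have empty top/bottom groups.
    """
    ranked = sorted(lun_iops, key=lun_iops.get, reverse=True)  # busiest first
    n = len(ranked)
    n_top = n * top_pct // 100        # busiest 20% by default
    n_bottom = n * bottom_pct // 100  # least busy 40% by default

    tiers = {}
    for i, lun in enumerate(ranked):
        if i < n_top:
            tiers[lun] = "SAS_15K"
        elif i >= n - n_bottom:
            tiers[lun] = "SATA_7_2K"
        else:
            tiers[lun] = "SAS_10K"
    return tiers
```

With ten LUNs this puts the two busiest on SAS 15k, the four least busy on SATA, and the remaining four on SAS 10k.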

I'm new to WFA, and my first question is: is it possible to build such a script? If yes, have any of you come across something like that (not necessarily the same)?

Regards,
Jack

goodrum · 2013-06-11

TLDR: This is possible but requires some additional 'components'.

It is possible to do this, and it is not a bad idea. The first challenge is to identify the LUN performance criteria and determine the thresholds. You could use Performance Advisor to gather this information, since WFA can access that system as a data source. That gives you the data needed to make the determinations. The next step is to create Finders that determine which LUNs fall into each 'grouping'. These Finders are then used in the workflows to generate a Repeat Row for the moves.

Now create three individual workflows (technically this could be one, but you might want to 'move' more frequently depending on needs and type). The idea is to have a Repeat Row whose Finder returns the list of LUNs matching a 'condition'. For each LUN, determine whether the volume containing it is already on an aggregate in the 'group list' of correct aggregates. That list could be based on a UM Resource Group, on name, or on disk type (since the latter is cached, it might be the easiest). Use a No-Op Cluster command to look up this value, and set it to disable the command if 'not' found. Then use a second No-Op Cluster command to find a 'suitable' aggregate based on the correct Finder.

The next step is to perform the Volume Move. Set up the command using the information found so far. The last step is to configure the Advanced tab of the second No-Op and of the Volume Move command so that they execute only if the volume is not in the right aggregate (that is, if the first No-Op found nothing). This helps prevent errant moves.
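Outside WFA, the "only move if the volume is not already in the right aggregate" check could be sketched like this. It is a rough Python illustration under assumptions: the `Volume` record, the `plan_moves` helper, and the vserver, volume, and aggregate names are hypothetical, and the generated strings follow the clustered Data ONTAP `volume move start` command form, which should be verified against your release before use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    vserver: str
    name: str
    aggregate: str  # aggregate currently hosting the volume


def plan_moves(volumes, tier_aggregates, target_aggregate):
    """Build move commands only for volumes that sit outside the wanted tier.

    tier_aggregates: set of aggregate names that already count as the right tier.
    target_aggregate: destination picked for volumes that need to move.
    """
    commands = []
    for vol in volumes:
        if vol.aggregate in tier_aggregates:
            continue  # already in the right tier, so no move is planned
        commands.append(
            f"volume move start -vserver {vol.vserver} "
            f"-volume {vol.name} -destination-aggregate {target_aggregate}"
        )
    return commands
```

Returning the commands as a list, rather than running them, makes it easy to review the planned moves before anything is executed.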

Rinse and repeat based on your criteria. You should end up with three workflows, one per disk type. Once this is done, you will want to execute these workflows remotely on a schedule. Take a look at the REST API guide for guidance on execution, but in a nutshell, you would create a PowerShell script that runs as a scheduled task and executes the workflows.

Theoretically this will work, though I have not built it on my end.
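As a very rough sketch of what such a scheduled script might do, the snippet below builds (but does not send) one HTTP request per workflow. It is written in Python rather than PowerShell purely for illustration. Everything specific in it is an assumption to check against the REST API guide: the `/rest/workflows/{uuid}/jobs` path, the XML body, the use of basic authentication, and the host name and UUID shown in the usage note are placeholders, not confirmed WFA details.

```python
import base64
import urllib.request


def build_execute_request(base_url, workflow_uuid, user, password,
                          jobs_path="/rest/workflows/{uuid}/jobs"):
    """Prepare a POST request that would start one workflow run.

    ASSUMPTION: jobs_path and the XML body are placeholders; take the real
    endpoint and payload from the WFA REST API guide.
    """
    url = base_url.rstrip("/") + jobs_path.format(uuid=workflow_uuid)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    body = b"<workflowInput/>"  # ASSUMPTION: minimal payload with no user inputs
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/xml",
        },
    )
```

A scheduled job would call this once per tier workflow, for example `build_execute_request("https://wfa.example.com", "<workflow-uuid>", "user", "secret")`, and pass each request to `urllib.request.urlopen`.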

Jeremy Goodrum, NetApp

The Pirate

Twitter: @virtpirate

Blog: www.virtpirate.com

MACIEJ_FEDOROWICZ · 2013-06-13

But how can I get performance statistics for LUNs (e.g. total IOPS)?

In the performance/cm_performance schemes from Performance Advisor there are stats only for the aggregate, volume, and disk objects.

Aggregate-level counters:

1. aggregate:cp_read_blocks - Number of blocks transferred for consistency-point read operations per second

2. aggregate:cp_reads - Number of read operations to the aggregate per second during consistency-point processing

3. aggregate:total_transfers - Total number of disk operations serviced by the aggregate per second

4. aggregate:user_read_blocks - Number of blocks transferred for user read operations per second

5. aggregate:user_reads - Number of user read operations performed by the aggregate per second

6. aggregate:user_write_blocks - Number of blocks transferred for user write operations per second

7. aggregate:user_writes - Number of user write operations performed by the aggregate per second

Calculated counters at aggregate level:

8. aggregate:avg_disk_busy - Average utilization of the disks of the aggregate (in percent)

9. aggregate:avg_volume_latency - Average response time for any operation on any volume of the aggregate (in microseconds)

10. aggregate:avg_disk_user_read_latency - Average response time for reading a block of user data from disks of an aggregate (in milliseconds)

11. aggregate:avg_disk_user_write_latency - Average response time to write a block of user data onto disks of an aggregate (in milliseconds)

What about LUN stats? (both 7-mode and CM)

Best regards

merick · 2013-06-11

While this is possible, I don't know that it would net you the desired results. Any form of post-process data tiering moves the data too late: if you move data based on old access patterns, you are assuming that those access patterns will remain consistent. It probably makes more sense to put your performance-sensitive data on your best-performing tier, monitor access using OnCommand Unified Manager, and then do one-time vol moves to place the data where it needs to be if you see a historical trend of low or high IO. With some of our flash technology (FlashCache, FlashPool, FlashAccel), even data on SATA is accelerated on demand, at the time of access. There isn't a need to mess with complicated data movement policies with NetApp.