Reporting host % utilization in DWH

DSS_JBARB · 2013-11-27

Hello,

I'm trying to find a way of showing actual storage utilization at the server level. In the Storage and Storage Pool Capacity data mart, I'm able to report used/unused at the storage level, but I don't see the ability to drop in the host.

I'm in Storage Capacity at the moment. Physical Usable and Provisioned Usable Capacity show the same numbers. Although we have several servers that are nearing capacity limits, I don't believe that the servers are 100% utilized across the board. What I want to be able to report is how much capacity the customer is actually using. For example, are all servers actually using 100GB, or are they only using 10GB of that 100GB, and can we reclaim storage that isn't being actively used?

Help please! I've been looking around at the other data marts, and at the possibility of simply running a report from OCI itself. Any and all suggestions are welcome.

Thank you and Happy Thanksgiving!

Julia

moechnig · 2013-11-27

It sounds as if you're looking for the utilization level of SAN host filesystems, like you'd see in the output of "df" on a Unix machine. If that's the case, you're not going to find that in OCI by default: since we don't instrument the SAN hosts, we don't collect that data. There is a way to import it from any other database that might provide it, via the FSLU data source. This associates capacity numbers with mount points or drive letters on the server, but does not tie them back to the storage LUN(s) underlying the mount point.
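Since OCI doesn't collect host filesystem usage, the numbers have to come from the host side. As a minimal sketch (this is not the FSLU import format, which isn't described in this thread), a df-style collector using only Python's standard library could look like the following; the list of mount points is an assumption you would fill in per host.

```python
import shutil
import socket


def collect_mount_usage(mount_points):
    """Return one (host, mount, capacity_bytes, used_bytes) row per mount point.

    A df-style sketch using the standard library only. The row layout is
    hypothetical and is NOT the FSLU data source format.
    """
    host = socket.gethostname()
    rows = []
    for mount in mount_points:
        usage = shutil.disk_usage(mount)  # named tuple: total, used, free (bytes)
        rows.append((host, mount, usage.total, usage.used))
    return rows


if __name__ == "__main__":
    for row in collect_mount_usage(["/"]):
        print(row)
```

On Windows you would pass drive roots such as `"C:\\"` instead of `"/"`.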

DSS_JBARB · 2013-11-27

Thanks, James. That is essentially what I'd like to get at, but at the server level. I don't need to get down into each file system. It just seems that if I can get used vs. unused at the storage level, why couldn't I get it at the server level? I can also report volume unused, which I think will get me close (?), but I was hoping for something a bit more clear-cut. I'll keep working with this and see what I can come up with.
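Once per-mount numbers exist (from FSLU or any other source), getting a server-level figure is just a rollup: sum used and capacity per host, then divide. The sketch below assumes hypothetical (host, mount, capacity_gb, used_gb) rows and mirrors the "10GB of that 100GB" example. Note that it says nothing about which LUN backs which mount.

```python
from collections import defaultdict


def server_utilization(rows):
    """Roll per-mount rows up to one utilization figure per server.

    rows: iterable of (host, mount, capacity_gb, used_gb) tuples (hypothetical layout).
    Returns {host: (capacity_gb, used_gb, percent_used)}.
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # host -> [capacity, used]
    for host, _mount, capacity, used in rows:
        totals[host][0] += capacity
        totals[host][1] += used
    result = {}
    for host, (capacity, used) in totals.items():
        pct = 100.0 * used / capacity if capacity else 0.0
        result[host] = (capacity, used, pct)
    return result


# Hypothetical sample data: two servers, each presented 100GB in total.
sample = [
    ("srv01", "C:", 60.0, 6.0),
    ("srv01", "D:", 40.0, 4.0),
    ("srv02", "/data", 100.0, 95.0),
]
for host, (cap, used, pct) in sorted(server_utilization(sample).items()):
    print(f"{host}: {used:.0f} of {cap:.0f} GB used ({pct:.0f}%)")
```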

Thanks again!

ostiguy · 2013-11-27

Hey Julia,

There are a couple aspects here:

At the storage level, you may have:

storage pool unused - you have an aggregate, thin provisioning or some other type of pool where there is free space that you can allocate more volumes from.

Volume unused - you may have carved volumes that are neither mapped nor masked to hosts; because they are currently unused, they can be presented to hosts. Volume unused is becoming less common as the world has moved to thin provisioning, where you tend to carve exactly what you need. Before thin provisioning, on platforms like Symmetrix and HDS USP you tended to have an array that was pre-carved, and it was up to you to hand out the slices of Wonder Bread.
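To make the "volume unused" definition concrete, here is a small sketch that totals carved volumes that are neither mapped nor masked. The `Volume` record and its fields are hypothetical stand-ins for whatever columns your DWH query returns. Pool unused, by contrast, comes straight from the pool's own free-space figure.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    name: str
    capacity_gb: float
    mapped: bool  # presented on an array port (hypothetical field)
    masked: bool  # masked to at least one host initiator (hypothetical field)


def volume_unused_gb(volumes):
    """Capacity of carved volumes that are neither mapped nor masked to any host."""
    return sum(v.capacity_gb for v in volumes if not v.mapped and not v.masked)


vols = [
    Volume("v1", 100.0, mapped=True, masked=True),
    Volume("v2", 50.0, mapped=False, masked=False),  # pre-carved, never handed out
    Volume("v3", 25.0, mapped=True, masked=False),
]
print(volume_unused_gb(vols))
```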

On a volume level, OCI knows the capacity of the volume - this is the capacity as seen by the host. For many thin provisioning platforms, OCI will also know the "consumed capacity" of the volume - how much space the array has allocated to the volume as a result of host writes, plus overhead.

The tricky part is that consumed capacity may have absolutely nothing to do with what the end user perceives as file system utilization. This is because a lot of file systems were not thin-provisioning friendly. An example is NTFS (the Windows standard for a decade), but others are similarly afflicted. Here is the issue - NTFS has a bias toward directing new writes to never-used blocks on the file system, on the presumption that writing to dirty blocks (blocks that are free but have previously been written to) may cause fragmentation and therefore poor performance (again, this was somewhat presuming a world of spinning disk).

Anyhow, imagine this is the application behavior:

100GB volume seen in OCI, 0GB "consumed capacity" in OCI.

Nightly, the host application writes 10GB, parses it looking for a needle in the haystack, and deletes it.

If you are the application owner, at any point in time, you are likely to see either 0GB or 10GB for used capacity on that file system.

However, due to how NTFS tends to behave, the first 10GB of writes caused 10GB to be allocated to the volume, so OCI consumed capacity was 10GB.

Deleting that 10GB will not change that consumed capacity value - historically, hosts had no way to tell a storage array to deallocate those blocks that are no longer in use.

The second night, when 10GB is again written, NTFS has a bias against block reuse, so another 10GB will be allocated from the array.

So, as you can see, thin provisioned volumes have a tendency to become "thick", or fully allocated, over time. To counter this behavior, there has been a lot of work in the industry on getting file systems and storage arrays to communicate and be more thin-provisioning friendly. However, we are still in a world where what the array has had to allocate to the host may have little to no correlation with what the end user / application owner sees for file system utilization on the host accessing the volume.
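The nightly write-then-delete scenario can be reproduced with a toy model. This is a simplification, not how NTFS actually allocates: it works at 1GB granularity, assumes the file system always prefers never-written blocks, and assumes the array allocates a block on first write. The `unmap` flag stands in for the kind of host-to-array deallocation hint (for example SCSI UNMAP or ATA TRIM) that the industry work mentioned above is about.

```python
from collections import deque


def simulate(volume_gb=100, nightly_write_gb=10, nights=12, unmap=False):
    """Toy model of a thin volume under a write-then-delete nightly job.

    Assumptions (a simplification, not real NTFS behavior):
    - 1GB granularity: one "block" is 1GB;
    - the array allocates a block the first time the host writes it;
    - the file system always prefers never-written blocks over freed ones;
    - deletes free blocks only in the file system, unless unmap=True, in
      which case the host also tells the array to deallocate them.
    Returns a list of (night, array_consumed_gb) measured after the delete.
    """
    never_written = deque(range(volume_gb))  # fresh blocks, in allocation order
    freed_dirty = deque()                    # free in the FS but written before
    array_allocated = set()                  # blocks the array has backed
    history = []
    for night in range(1, nights + 1):
        scratch = []
        for _ in range(nightly_write_gb):
            # Bias: take a fresh block if any remain, otherwise reuse a freed one.
            block = never_written.popleft() if never_written else freed_dirty.popleft()
            scratch.append(block)
            array_allocated.add(block)
        # The application deletes its scratch data; the FS marks the blocks free.
        freed_dirty.extend(scratch)
        if unmap:
            # Host-to-array deallocation hint for the freed blocks.
            array_allocated.difference_update(scratch)
        history.append((night, len(array_allocated)))
    return history


for night, consumed in simulate():
    print(f"night {night:2d}: file system 0-10GB used, array consumed {consumed}GB")
```

Without `unmap`, consumed capacity climbs by 10GB a night and sits at the full 100GB from night 10 onward, while the file system never shows more than 10GB used. With `unmap=True` it drops back to 0 after each delete.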

With all that said, OCI 6.4.1 has extended its model of a volume to support written capacity - some thin provisioning arrays seem to know how much data has actually been written to thin volumes.

See my post here on that topic - https://communities.netapp.com/message/115931#115931

DSS_JBARB · 2013-12-02

This is helpful! But, let's say the Written Capacity field is blank. What does that mean?

I'm beginning to think that I will not be able to report at the server level, only at the array level. I can report on "Used" and "Unused" capacity for the storage pool, but I can't drop in host information, only storage. Even so, is that an accurate report of what is truly being used? With regard to capacity planning, is there a good way to determine usage so that we can report whether or not we are able to reclaim unused storage?

Thanks again!

Julia

ostiguy · 2013-12-02

If written is blank, you probably are looking at a thick provisioned array, and you are back in the classic difficulty where the array doesn't know what has been written to it. The only way to get this data is from the host, which is painful.
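The answer above implies a simple preference order per volume: use written capacity where the array reports it, fall back to consumed capacity on thin volumes, and otherwise treat the volume as unknown (likely thick). A minimal sketch of that fallback, assuming a blank report field is represented as `None`:

```python
from typing import Optional, Tuple


def best_utilization_metric(
    capacity_gb: float,
    written_gb: Optional[float],
    consumed_gb: Optional[float],
) -> Tuple[str, Optional[float], Optional[float]]:
    """Return (metric_name, gb, percent_of_capacity) for one volume.

    A blank field from the report is represented as None (an assumption).
    """
    if capacity_gb <= 0:
        raise ValueError("volume capacity must be positive")
    if written_gb is not None:
        # Closest array-side figure to what the host actually wrote.
        return ("written", written_gb, 100.0 * written_gb / capacity_gb)
    if consumed_gb is not None:
        # Can overstate current use: includes blocks the host may since have freed.
        return ("consumed", consumed_gb, 100.0 * consumed_gb / capacity_gb)
    # Neither figure: likely thick, so only host-side data can answer.
    return ("unknown", None, None)


print(best_utilization_metric(100.0, None, 90.0))
```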

It goes back to the earlier point though - what the host says is used cannot be used to determine how much storage you need to buy. Your hosts' file systems may be thin provisioning unfriendly, and as such, what the host shows for utilization may have low correlation with what is required, or will be required over time on the back end.

If the goal is to drive higher storage utilization, here are some high level thoughts for using OCI to do so:

#1

Note - this presumes that all FC switches are being discovered, to ensure all FC paths are being built.

Look at the unused masked volumes Vulnerability in the OCI client (you can also work with this data in the DWH) - are there large numbers of volumes that are reclaimable? Have the operations team spot check volumes, and plan reclamation.
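The same check can be run against rows pulled from the DWH. In the sketch below the column names are hypothetical, and a volume counts as a candidate when it is masked to a host but has no discovered path. That result is only trustworthy if all FC switches are being discovered.

```python
def reclaim_candidates(volumes, min_capacity_gb=0.0):
    """Masked volumes with no discovered FC path, largest first.

    volumes: iterable of dicts with hypothetical keys
    'name', 'capacity_gb', 'masked' (bool) and 'paths' (int).
    """
    hits = [
        v for v in volumes
        if v["masked"] and v["paths"] == 0 and v["capacity_gb"] >= min_capacity_gb
    ]
    return sorted(hits, key=lambda v: v["capacity_gb"], reverse=True)


sample = [
    {"name": "vol_a", "capacity_gb": 500.0, "masked": True, "paths": 0},
    {"name": "vol_b", "capacity_gb": 200.0, "masked": True, "paths": 4},
    {"name": "vol_c", "capacity_gb": 750.0, "masked": True, "paths": 0},
    {"name": "vol_d", "capacity_gb": 100.0, "masked": False, "paths": 0},
]
for v in reclaim_candidates(sample, min_capacity_gb=100.0):
    print(v["name"], v["capacity_gb"])
```

The output is a spot-check list for the operations team, not a list of volumes that are safe to delete.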

#2

Performance driven - do you own the Perform module? Are you collecting volume and/or switch performance data? It is possible to use performance data to find volumes that are idle from a performance standpoint - if a volume truly has no workload, it may be reclaimable.
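One way to define "idle" is that peak IOPS over the observation window never exceeds a small threshold. Peak is used rather than average so that a short burst of real activity still disqualifies the volume. The sketch below assumes per-volume IOPS samples have already been exported. The 1 IOPS threshold is an arbitrary illustrative value, and the window presumably needs to be long enough to cover monthly or quarterly jobs.

```python
def idle_volumes(samples, max_peak_iops=1.0):
    """Return names of volumes whose peak IOPS never exceeded the threshold.

    samples: {volume_name: [iops, iops, ...]} over the observation window
    (hypothetical export). Volumes with no samples are NOT reported as idle,
    since missing data is not evidence of no workload.
    """
    idle = []
    for name, series in samples.items():
        if series and max(series) <= max_peak_iops:
            idle.append(name)
    return sorted(idle)


samples = {
    "vol_a": [0.0, 0.0, 0.5],    # effectively no workload
    "vol_b": [0.0, 250.0, 0.0],  # quiet most of the time, but not idle
    "vol_c": [],                 # no performance data collected
}
print(idle_volumes(samples))
```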

#3

What could I thin provision that isn't thin provisioned today? I would work with my data to find where I am and am not thin provisioning today, and where I could. Do I own any storage platforms where I can convert a volume from thick to thin non-intrusively?
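A first cut at "where could I thin provision" is to total thick capacity per array and split it by whether the platform can convert thick to thin non-disruptively. Both the volume fields and the set of convertible arrays are hypothetical inputs you would build from your own inventory.

```python
from collections import defaultdict


def thin_opportunities(volumes, convertible_arrays):
    """Sum thick-provisioned capacity per array.

    Returns (easy, hard): easy = arrays listed in convertible_arrays
    (assumed to support non-disruptive thick-to-thin conversion),
    hard = everything else. Volume dicts use hypothetical keys
    'array', 'thin' (bool) and 'capacity_gb'.
    """
    easy = defaultdict(float)
    hard = defaultdict(float)
    for v in volumes:
        if v["thin"]:
            continue  # already thin, nothing to convert
        bucket = easy if v["array"] in convertible_arrays else hard
        bucket[v["array"]] += v["capacity_gb"]
    return dict(easy), dict(hard)


volumes = [
    {"array": "A", "thin": False, "capacity_gb": 100.0},
    {"array": "A", "thin": True, "capacity_gb": 50.0},
    {"array": "B", "thin": False, "capacity_gb": 200.0},
]
print(thin_opportunities(volumes, {"A"}))
```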

DSS_JBARB · 2013-12-03

We have a mix of thin provisioned and non-thin-provisioned.

We may have to look at utilization from the Performance module - whether the customer is actually accessing and using the server, versus whether there is unused capacity we could reclaim. We're moving to a SaaS provider over the next year, and applications are going to have to start paying for storage. We also want to be able to independently verify that what the SaaS provider is charging is accurate (as far as possible). We're also planning to monitor availability and whether the provider is meeting current SLAs.
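Independently checking the provider's bill comes down to comparing billed capacity against your own measurement per application, within some tolerance. Which measured figure to use (allocated, consumed, or written) depends on what the contract bills for, which isn't stated here. The application names, numbers, and 2% tolerance below are all hypothetical.

```python
def check_bill(billed_gb, measured_gb, tolerance_pct=2.0):
    """Compare provider-billed capacity with independently measured capacity.

    billed_gb, measured_gb: {application: gb} (hypothetical inputs).
    Returns {application: finding}.
    """
    findings = {}
    for app, billed in billed_gb.items():
        measured = measured_gb.get(app)
        if measured is None:
            findings[app] = "no independent measurement"
            continue
        if measured == 0:
            findings[app] = "ok" if billed == 0 else "billed for capacity not measured"
            continue
        diff_pct = 100.0 * (billed - measured) / measured
        if abs(diff_pct) <= tolerance_pct:
            findings[app] = "ok"
        elif diff_pct > 0:
            findings[app] = f"billed {diff_pct:.1f}% above measured"
        else:
            findings[app] = f"billed {-diff_pct:.1f}% below measured"
    return findings


billed = {"erp": 102.0, "crm": 150.0, "web": 50.0}
measured = {"erp": 100.0, "crm": 100.0}
for app, finding in sorted(check_bill(billed, measured).items()):
    print(app, finding)
```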

Thanks for bearing with my questions! This was good. I'm still working my way through this, and sometimes I need to talk through what I see.