Total Used vs. Total Physical Used

TMADOCTHOMAS · ‎2017-09-25

I recently attached a previously used additional SSD shelf to an AFF HA pair. It has different disk sizes than the existing SSDs, so I created a new stack and aggregate. Everything appears normal. I then moved several volumes to this aggregate.

Here's where things get strange.

At least two of the volumes are using >1TB of space, but the new aggregate shows only 613GB of used space, even after several days.

To pick one as an example, the cithqvmccpp_01p volume shows 1.28TB used in System Manager and 58.21GB of snapshot space used, leaving 938.3GB available. I also show 745.91GB of deduplication savings and 5.33GB of compression savings. The volume is thin provisioned.

The command line shows the following:

cithqnacl01p::> volume show-space -volume cithqvmccpp_01p

Vserver : cithqnaccpp01p
Volume : cithqvmccpp_01p

Feature                          Used       Used%
-------------------------------- ---------- ------
User Data                        1.27TB     57%
Filesystem Metadata              683.8MB    0%
Inodes                           16KB       0%
Deduplication                    1.57GB     0%
Snapshot Spill                   58.16GB    3%
Performance Metadata             256.0MB    0%

Total Used                       1.33TB     59%

Total Physical Used              269.8GB    12%

Why and how is "total physical used" so much less than "total used"? If "total physical" is supposed to take into account what is actually used in a thin provisioned volume, it is inaccurate. As already noted, the volume properties show 1.28TB used, not 269.8GB. I should note there is a LUN in the volume where the data resides, but the LUN itself is not thin provisioned.

Any help in deciphering this would be greatly appreciated!

colsen · ‎2017-09-25

Probably need to look at your volume and aggregate efficiencies to sleuth this out. For example, at an aggregate level:

Cluster1::> aggregate show-space -aggregate ssd_aggr1_na_l3_06
(storage aggregate show-space)

Aggregate Name: ssd_aggr1_na_l3_06
Volume Footprints: 48.39TB
Volume Footprints Percent: 74%
Total Space for Snapshot Copies in Bytes: 0B
Space Reserved for Snapshot Copies: 0%
Aggregate Metadata: 0B
Aggregate Metadata Percent: 0%
Total Used: 42.22TB
Total Used Percent: 64%
Size: 65.62TB
Snapshot Reserve Unusable: -
Snapshot Reserve Unusable Percent: -
Total Physical Used Size: 41.97TB
Physical Used Percentage: 64%

Cluster1::> aggr show -fields data-compaction-space-saved -aggregate ssd_aggr1_na_l3_06
aggregate          data-compaction-space-saved
------------------ ---------------------------
ssd_aggr1_na_l3_06 7.08TB

We've got ~48TB of volume data on this aggregate, but compaction is saving us ~7TB, so we are only punching ~41TB of blocks. Then you need to look at compression and dedupe at the volume level.
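As a rough check of that arithmetic, here is a minimal Python sketch using the two figures from the output above. The variable names are illustrative only, and the result is an estimate: it lands in the same range as the reported Total Physical Used (41.97TB) but is not expected to match it exactly.

```python
# Figures copied from the aggregate output above, in TB.
volume_footprints_tb = 48.39
compaction_saved_tb = 7.08

# Footprint minus compaction savings approximates the blocks actually written.
estimated_physical_tb = volume_footprints_tb - compaction_saved_tb
print(f"{estimated_physical_tb:.2f} TB")
```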

NetApp does all sorts of things with efficiency on SSD - check out the following:

https://docs.netapp.com/ontap-9/index.jsp?topic=%2Fcom.netapp.doc.dot-cm-vsmg%2FGUID-9C88C1A6-990A-4826-83F8-0C8EAD6C3613.html

Hope that helps,

Chris

TMADOCTHOMAS · ‎2017-09-25

Thank you Colsen! I should have clarified that we are on 8.3.2P10, not yet 9.x, so compaction doesn't apply to us yet :\.

TMADOCTHOMAS · ‎2017-09-25

Well, this is interesting.

After poking around a bit, I've found something. All of my LUNs have space-reserve set to "enabled", so no thin provisioning of the LUNs even though the volumes are thin provisioned. However, there is another field called "space-reserve-honored", and it is set to false for every LUN. Needless to say, I didn't deliberately set this. It seems to indicate that every LUN IS thin provisioned, which would explain my situation. The only question is why? Perhaps this is a default on AFF systems when thin provisioning. Also: if the LUNs are thin provisioned, why isn't this reflected in System Manager volume utilization totals? Any ideas?

sgrant · ‎2017-09-27

So I think you've a couple of queries here:

For the volume show-space output we are seeing different figures, with Total Used at 1.33TB but Total Physical Used at 269.8GB.

The difference arises because, for a space-reserved LUN, the Total Used figure includes the provisioned size of the LUN. However, since the containing volume is thin provisioned, Total Physical Used shows only the space actually being consumed by the client. These figures suggest that the client filesystem is only about 20% full.
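The two accounting views can be sketched in a few lines of Python. This is a simplified model, not how ONTAP computes its figures: the `Lun` class and both functions are hypothetical, the User Data and Total Physical Used values from the output above stand in for LUN size and written data, and deduplication and compression savings (which also lower the physical figure) are ignored.

```python
from dataclasses import dataclass

GB_PER_TB = 1024  # binary units, assumed here


@dataclass
class Lun:
    size_gb: float       # provisioned LUN size
    written_gb: float    # blocks the host has actually written
    space_reserve: bool  # True = space-reserved ("thick") LUN


def logical_used_gb(lun: Lun) -> float:
    """Space charged to the volume: full size if reserved, else written blocks."""
    return lun.size_gb if lun.space_reserve else lun.written_gb


def physical_used_gb(lun: Lun) -> float:
    """Space consumed in the aggregate when the volume is thin provisioned."""
    return lun.written_gb


# 1.27TB User Data and 269.8GB Total Physical Used, taken from the output above.
lun = Lun(size_gb=1.27 * GB_PER_TB, written_gb=269.8, space_reserve=True)
fill_ratio = physical_used_gb(lun) / logical_used_gb(lun)
print(f"{fill_ratio:.0%}")  # roughly one fifth
```

With `space_reserve=False` the two functions return the same value in this model, which is the "figures match" case for a fully thin provisioned setup.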

Second, your query about the space-reserve-honored field. This is not configurable and is only available as a search field for the lun show command. It identifies those LUNs where the LUN space reservation will (true) or will not (false) be met by the containing volume. In your case the LUN is thick provisioned but the containing volume is thin, so the space reservation will not be met by the volume and we get the "false" value.
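That rule can be written down as a small truth function. The sketch below is only a reading of the explanation above, not ONTAP's actual logic: the function name is hypothetical, and it assumes the volume's space guarantee is either "volume" (thick) or "none" (thin).

```python
def space_reserve_honored(lun_space_reserve: bool, volume_guarantee: str) -> bool:
    """Simplified rule: the reservation is honored only when the LUN asks for
    one and the containing volume guarantees its space."""
    return lun_space_reserve and volume_guarantee == "volume"


print(space_reserve_honored(True, "none"))    # thick LUN in a thin volume
print(space_reserve_honored(True, "volume"))  # thick LUN in a thick volume
```

The first call corresponds to the situation in this thread and prints False; the second prints True.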

To confirm: your LUNs are thick provisioned by virtue of the space-reserve enabled setting (the default). However, since the containing volume is thin provisioned (also the default), the LUN space cannot be guaranteed. This is only an issue if the underlying aggregate runs out of space.

Hopefully this clarifies the figures for you.

Cheers,
Grant.

TMADOCTHOMAS · ‎2017-09-27

sgrant,

That actually does help explain things. Thank you. That leaves one question: in System Manager, when looking at a thin provisioned volume containing a LUN, the "available space" field implies that the LUN is thick provisioned and that the space is not available. However, as we just discussed in this thread, this is not actually the case.

For example: System Manager shows a 2.25TB volume with 1.46TB of used space. The volume show-space command shows 67.5GB of used space - a huge discrepancy. The System Manager aggregate view calculates available space on the aggregate based on the 67.5GB used space figure, not the 1.46TB. To me this is very misleading - there is a lot more free space on the volume than System Manager shows. Any thoughts on this?

sgrant · ‎2017-09-28

Yep, I can see how these mismatched figures cause confusion; it means you need to manage capacity at both the volume and aggregate level.

It can be easier to run a completely thin provisioned environment, both volume and LUN. That way all space figures show only the consumed blocks in the aggregate, i.e. you only need to manage capacity at the aggregate level.

Combine this with volume autogrow (and optionally snapshot autodelete). As long as you take an active role in monitoring free space on the aggregate and have policies in place to address the lack of space once the nearly full thresholds are breached, you can be confident that volume writes won't fail due to lack of space - copied from ONTAP concepts > Storage efficiency > Thin Provisioning 🙂
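The monitoring side of that advice could look something like the Python sketch below. It is illustrative only: the function name is hypothetical, and the 95% and 98% thresholds are placeholder assumptions, so substitute whatever nearly-full and full thresholds are configured on your own system. The sample call uses the Total Physical Used and Size figures from the aggregate output earlier in the thread.

```python
def aggregate_status(used_tb: float, size_tb: float,
                     nearly_full_pct: float = 95.0,
                     full_pct: float = 98.0) -> str:
    """Classify an aggregate by how full it is (thresholds are assumptions)."""
    used_pct = 100.0 * used_tb / size_tb
    if used_pct >= full_pct:
        return "full"
    if used_pct >= nearly_full_pct:
        return "nearly full"
    return "ok"


# 41.97TB physical used out of 65.62TB is about 64%, well below both thresholds.
print(aggregate_status(41.97, 65.62))
```

Anything other than "ok" is the cue to act, for example with a volume move or additional disks.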

If you are enabling thin LUNs, there is another setting to be aware of: the space-allocation option. This enables SCSI thin provisioning, which provides:

- Automatic host-side space management, i.e. it frees previously deleted blocks in the client filesystem that ONTAP is not aware of. There is no need to run the SnapDrive Space Reclaimer anymore.
- Notification to the host when a LUN runs out of space while keeping the LUN online. The LUN no longer goes offline if the volume fills; it remains available to the host as read-only.

The host OS needs to be able to support this feature and you need to take the LUN offline to make the change...so disruptive.

See Automatic host-side space management with SCSI thinly provisioned LUNs

Hope this helps.

Cheers,

Grant.

TMADOCTHOMAS · ‎2017-09-28

Thanks sgrant. We actually do have thin provisioning enabled across the board since our whole cluster is now on AFF. I didn't realize until this thread that doing this "forces" the LUNs to also be thin provisioned. We also use autogrow across the board. I just hate looking at a volume in System Manager and not seeing accurate figures for free space, but thems are the breaks :). Maybe it is fixed/improved in 9.x?

The automatic host side management looks interesting. I will check into that. I assume it is 9.x only, but we will be upgrading in the next few months so that should not be an issue.

sgrant · ‎2017-09-28

Hi, sorry, I think I need to clarify that unless you actually set space reserved to false, the LUN is still thick provisioned. For both FAS and AFF the default setting when you create a LUN is thick provisioned, since otherwise write operations to that LUN might fail due to insufficient disk space.

It is just that the volume is thin provisioned and therefore the blocks have not actually been reserved in the aggregate, which is why you have the two different space figures. If you set space reserved to false, both the volume and aggregate space figures will match(ish).

Thanks,

Grant.

TMADOCTHOMAS · ‎2017-09-28

@sgrant wrote:
If you set space reserved to false, both the volume and aggregate space figures will match(ish).

Interesting. So if I'm understanding correctly, it's effectively set to false anyway since we thin provision the volumes. So nothing would actually change if we changed this setting to false, other than correcting the System Manager discrepancy. Does that sound right?

sgrant · ‎2017-09-28

Effectively yes. I believe it is easier to manage a totally thin provisioned environment, so long as you have the policies in place to deal with a shortage of space, since when an aggregate says it's short of space it really is short of space...volume move or new disks.

It will not work for environments that do not actively manage their storage or that absolutely require guarantees. Otherwise, thin provisioning, and consequently over-provisioning, allows maximum use of the resources.

Cheers,

Grant.

TMADOCTHOMAS · ‎2017-09-29

Very helpful information sgrant! Thanks very much!