ONTAP Discussions

DS4243 (2TB SAS x 24)...only yielding 21TB?!?

bobby_gillette

Created an aggregate with the default raid group size of 16, showing 22 data disks, 4 parity... yet the aggregate is only showing 21TB. I'm aware of WAFL overhead, size averaging, etc., all reducing space, but somehow I figured I'd be able to present more than 21TB. One thing I'm curious about: I'm showing 8 spares in vol status -s...

Any ideas?

1 ACCEPTED SOLUTION

fjohn

Let's step through it. One thing not mentioned is the Data ONTAP version, and whether this is a 32-bit or 64-bit aggregate.

Start with the drive. The 2TB figure is in base 10, as used by disk drive suppliers. The first step is to convert from base 10 to base 2, and this is the same across all storage vendors: 2,000,000,000,000 bytes = 1.819 TB. We've lost nearly 10% off the top.
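A quick sketch of that conversion in Python. The only assumption is that the filer reports sizes in binary units (1 TB = 2^40 bytes), while the vendor counts 1 TB = 10^12 bytes:

```python
# Convert a drive's marketed (base-10) capacity into base-2 units.
DECIMAL_TB = 10**12   # 1 TB as drive vendors count it
BINARY_TB = 2**40     # 1 TB as the filer reports it (assumed binary)

def marketed_to_binary_tb(marketed_tb: float) -> float:
    """Return the base-2 size of a drive sold as `marketed_tb` decimal TB."""
    return marketed_tb * DECIMAL_TB / BINARY_TB

drive = marketed_to_binary_tb(2)
loss_pct = (1 - drive / 2) * 100   # share of the marketed size "lost" to units

print(f"2 TB drive = {drive:.3f} binary TB, {loss_pct:.1f}% lost in the conversion")
```

This prints 1.819 TB and about 9.1%, which is the "nearly 10% off the top".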

Next comes parity overhead. Since you chose the default RAID group size of 16, you get one full RAID group of 16 drives (the other 8 don't make up a whole RAID group, although you can add drives to the aggregate later to fill out a partial one). Given 24 drives, I would personally have gone with a RAID group size like 22. The max RAID group size for SAS on RAID-DP, without overriding it, is 28. With a RAID group size of 16, you have 2 parity spindles and 14 data spindles; with 22, you have 2 parity spindles and 20 data spindles. So it's either 25.466 TB (with 8 spares) or 36.38 TB (with two spares). Since both of these are above 16TB, I'll assume large (64-bit) aggregates in ONTAP 8.0 or 8.0.1 7-Mode.
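The parity step can be sketched like this. It mirrors the reply's simplification that only one full RAID group is built and the leftover drives stay as spares, and it uses the 1.819 TB per drive from the base-2 conversion above; `raid_dp_layout` is just a hypothetical helper name:

```python
# Data spindles and pre-overhead capacity for one RAID-DP group.
PARITY_PER_RG = 2   # RAID-DP: one parity plus one double-parity disk per group
DRIVE_TB = 1.819    # binary TB per 2 TB drive, from the base-2 conversion

def raid_dp_layout(total_drives: int, rg_size: int):
    """One full RAID group of rg_size; the remaining drives are spares."""
    data = rg_size - PARITY_PER_RG
    spares = total_drives - rg_size
    return data, spares, round(data * DRIVE_TB, 3)

for rg in (16, 22):
    data, spares, tb = raid_dp_layout(24, rg)
    print(f"RG size {rg}: {data} data, {spares} spares, {tb} TB")
```

This reproduces the 25.466 TB (14 data, 8 spares) and 36.38 TB (20 data, 2 spares) figures.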

Since these are SAS drives, they are formatted with 520 bytes per sector. The extra 8 bytes in each sector are used to store checksum data. If these were SATA, the sector size would be 512 bytes and the checksums would take up additional blocks. They're not SATA, they're SAS, so no loss here.

Another thing that happens is that drives are sourced from more than one vendor. Due to slight differences in geometry, and hence in the number of sectors, drives are typically "right-sized" so that they are interchangeable across the vendors they are sourced from. This typically consumes about 2% of the space, and that's true across the storage industry. 25.466 becomes ~24.95, and 36.38 becomes ~35.65.

After that, we reserve 10% of the space for WAFL to do its thing: you pay 10% of space to optimize write performance. How much does that buy you? Check out http://blogs.netapp.com/efficiency/2011/02/flash-cache-doesnt-cache-writes-why.html, where I present the results of 100% random-write workload tests over time. That leaves you with 22.45 TB or 32.085 TB.

Last but not least, there is a default 5% aggregate reserve taken from the usable space. If you are not using MetroCluster or synchronous SnapMirror, you can remove the reserve to recoup that 5% (see the link). With the aggregate reserve, 22.45 TB becomes about 21.33 TB, which is what you obtained, and 32.085 TB becomes about 30.48 TB.
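Here is a rough sketch chaining the last three steps, starting from the 25.466 TB and 36.38 TB figures from the parity step. The 2%, 10% and 5% factors are the reply's rules of thumb, not exact ONTAP accounting, and the results differ from the text in the second decimal only because the reply rounds at each step:

```python
# Overhead chain from the reply: right-sizing, WAFL reserve, aggregate reserve.
RIGHT_SIZE = 0.98     # ~2% lost to right-sizing (rule of thumb)
WAFL_RESERVE = 0.10   # 10% WAFL reserve
AGGR_RESERVE = 0.05   # default 5% aggregate reserve

def usable_tb(raw_data_tb: float, keep_aggr_reserve: bool = True) -> float:
    tb = raw_data_tb * RIGHT_SIZE      # right-sizing
    tb *= 1 - WAFL_RESERVE             # WAFL reserve
    if keep_aggr_reserve:
        tb *= 1 - AGGR_RESERVE         # aggregate reserve, removable without MetroCluster/sync SnapMirror
    return tb

for raw in (25.466, 36.38):
    print(f"{raw} TB -> {usable_tb(raw):.2f} TB with reserve, "
          f"{usable_tb(raw, keep_aggr_reserve=False):.2f} TB without")
```

The RAID group size of 16 lands at about 21.3 TB (what you saw), while 22 with the reserve removed gives about 32.09 TB.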

In light of this, I'd recommend using a RAID group size of 22 and removing the aggregate reserve (unless you are using MetroCluster or synchronous SnapMirror). This would give you 32.085 TB in an aggregate of 22 spindles, plus two hot spares, for a total of 24 drives.

I hope that helps explain where the space goes.

JohnFul


11 replies

giuliano

Run the "aggr status -r" command.

Are there any disks in a "pending" state for the aggregate / raid group?

If there are, then they should be in the process of re-zeroing (re-formatting), and will be added to the aggregate once the process completes.

Re-zeroing 2TB disks can take many hours, so you might not see them complete re-zeroing until tomorrow.

Disks can end up in this state if they were added to an aggregate and the aggregate was then destroyed; before the disks can be reused, they need to be re-zeroed.

You can manually force a re-zeroing of un-zeroed disks by issuing the "disk zero spares" command, and monitor with "aggr status -r" or "aggr status -s" for the spares.

-Giuliano

vmsjaak13

There's no such thing as 2TB SAS. These are SATA drives !!

The maximum raidgroup size for SATA drives in Data ONTAP 8.0.1 is 20 (18D + 2P).

You could go with 1 raidgroup of 20 disks (18D + 2P) plus 4 spare disks (the downside here is RAID rebuild time),

or: 2 raidgroups of 11 (9D + 2P) with 2 spares.

In both cases you're left with 18 data disks, which equates to about 24TB of usable space after you've removed the 5% aggregate snap reserve.
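A rough check of that estimate, as a sketch. It assumes the per-disk right-sized capacity of 1695466 MB that aggr status -r reports for these drives later in the thread, binary units (1 MB = 2^20 bytes), a 10% WAFL reserve and the default 5% aggregate reserve:

```python
# Rough usable space for a given number of data disks.
RIGHT_SIZED_MB = 1695466        # per-disk "Used" size from aggr status -r
MB_PER_TB = 1024 * 1024         # binary units, as ONTAP reports them

def usable_tb(data_disks: int, wafl=0.10, aggr_reserve=0.05) -> float:
    raw = data_disks * RIGHT_SIZED_MB / MB_PER_TB
    return raw * (1 - wafl) * (1 - aggr_reserve)

# Both the 1x20 and the 2x11 layouts leave 18 data disks.
print(f"{usable_tb(18):.1f} TB")
```

This comes out at roughly 24.9 TB, in the same ballpark as the "about 24TB" above.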

This scenario covers just 1 full DS4243 shelf, Data ONTAP 8.0.1 and a 64-bit aggregate.

Regards,

Niek

fjohn

"There's no such thing as 2TB SAS. These are SATA drives !!"

Why, you are absolutely right, vmsjaak13. In getting carried away with the nuances of where the space goes, I overlooked the obvious, which was staring me in the face.

Apart from the interface and random-I/O performance (IOPS), the main difference between SAS and SATA is that SAS drives are formatted at 520 bytes per sector and SATA drives at 512 bytes per sector. For SAS, the checksum information is stored in the extra 8 bytes per sector. For SATA, the checksum information has to be stored somewhere else, which uses blocks. For spindles of the same size, you'll have about 10% less usable space per data spindle with SATA.
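As a sanity check, here is a sketch that assumes block checksums on 512-byte-sector drives spend one sector out of every nine on checksum data (eight data sectors per 4 KB block plus one checksum sector). That ratio is an assumption, not something stated in this thread, but applied to a raw 2 TB drive it lands within about 45 MB of the 1695466 MB "Used" size that aggr status -r reports for these disks:

```python
# Raw size of a 2 TB SATA drive versus its size after block checksums,
# ASSUMING one 512-byte checksum sector per eight data sectors (4 KB block).
RAW_BYTES = 2 * 10**12
MB = 2**20                      # ONTAP's "MB" in aggr status -r output
REPORTED_USED_MB = 1695466      # "Used" column for these disks

raw_mb = RAW_BYTES / MB
after_bcs_mb = raw_mb * 8 / 9   # 8 of every 9 sectors hold data
overhead_pct = (1 - after_bcs_mb / raw_mb) * 100

print(f"raw {raw_mb:.0f} MB, after checksums {after_bcs_mb:.0f} MB "
      f"({overhead_pct:.1f}% overhead), reported {REPORTED_USED_MB} MB")
```

The resulting ~11% overhead is consistent with the "about 10%" figure above.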

The max spindles per RAID-DP RAID group for SATA is 16 in a 32-bit aggregate; however, I believe it goes up to 20 or so in a 64-bit aggregate. This is clearly a 64-bit aggregate, because it is beyond the max size for a 32-bit aggregate.

JohnFul

bobby_gillette

Just a couple of things... they are indeed SATA drives in a SAS enclosure (no idea why, but that's what they ordered!)

NetApp Release 8.0.1RC2 7-Mode, 64-bit aggregate

aggr status BSAS1 -r
Aggregate BSAS1 (online, raid_dp) (block checksums)
  Plex /BSAS1/plex0 (online, normal, active)
    RAID group /BSAS1/plex0/rg0 (normal)

      RAID Disk Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      --------- ------          ------------- ---- ---- ---- ----- --------------    --------------
      dparity   2b.00.0         2b    0   0   SA:B   -  BSAS  7200 1695466/3472315904 1695759/3472914816
      parity    2b.00.1         2b    0   1   SA:B   -  BSAS  7200 1695466/3472315904 1695759/3472914816
      data      2b.00.2         2b    0   2   SA:B   -  BSAS  7200 1695466/3472315904 1695759/3472914816
      data      2b.00.3         2b    0   3   SA:B   -  BSAS  7200 1695466/3472315904 1695759/3472914816
      data      2b.00.4         2b    0   4   SA:B   -  BSAS  7200 1695466/3472315904 1695759/3472914816
      data      2b.00.5         2b    0   5   SA:B   -  BSAS  7200 1695466/3472315904 1695759/3472914816
      data      2b.00.6         2b    0   6   SA:B   -  BSAS  7200 1695466/3472315904 1695759/3472914816
      data      2b.00.7         2b    0   7   SA:B   -  BSAS  7200 1695466/3472315904 1695759/3472914816
      data      2b.00.8         2b    0   8   SA:B   -  BSAS  7200 1695466/3472315904 1695759/3472914816
      data      2b.00.9         2b    0   9   SA:B   -  BSAS  7200 1695466/3472315904 1695759/3472914816
      data      2b.00.10        2b    0   10  SA:B   -  BSAS  7200 1695466/3472315904 1695759/3472914816
      data      2b.00.11        2b    0   11  SA:B   -  BSAS  7200 1695466/3472315904 1695759/3472914816
      data      2b.00.12        2b    0   12  SA:B   -  BSAS  7200 1695466/3472315904 1695759/3472914816
      data      2b.00.13        2b    0   13  SA:B   -  BSAS  7200 1695466/3472315904 1695759/3472914816

    RAID group /BSAS1/plex0/rg1 (normal)

      RAID Disk Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      --------- ------          ------------- ---- ---- ---- ----- --------------    --------------
      dparity   2b.00.16        2b    0   16  SA:B   -  BSAS  7200 1695466/3472315904 1695759/3472914816
      parity    2b.00.17        2b    0   17  SA:B   -  BSAS  7200 1695466/3472315904 1695759/3472914816
      data      2b.00.18        2b    0   18  SA:B   -  BSAS  7200 1695466/3472315904 1695759/3472914816
      data      2b.00.19        2b    0   19  SA:B   -  BSAS  7200 1695466/3472315904 1695759/3472914816
      data      2b.00.20        2b    0   20  SA:B   -  BSAS  7200 1695466/3472315904 1695759/3472914816
      data      2b.00.21        2b    0   21  SA:B   -  BSAS  7200 1695466/3472315904 1695759/3472914816
      data      2b.00.22        2b    0   22  SA:B   -  BSAS  7200 1695466/3472315904 1695759/3472914816
      data      2b.00.23        2b    0   23  SA:B   -  BSAS  7200 1695466/3472315904 1695759/3472914816

If I'm not mistaken, the largest raidgroup size for BSAS is 20, but if I had gone with that I wouldn't have had enough disks to create my second raidgroup (rg0=20 data + 2 parity, only 2 disks remain). For some reason, when I created the aggregate it grabbed some of the ATA FC disks (smaller capacity, of course), so I destroyed the aggregate and re-created it specifying the disk count and size (number@size). I got rid of the snap reserve for the aggregate, and now I'm up to:

aggr show_space BSAS1 -h
Aggregate 'BSAS1'

Total space    WAFL reserve    Snap reserve    Usable space       BSR NVLOG           A-SIS          Smtape
       29TB          2980GB             0KB            26TB             0KB             0KB             0KB

This aggregate does not contain any volumes

Aggregate                       Allocated            Used           Avail
Total space                           0KB             0KB            26TB
Snap reserve                          0KB           256KB             0KB
WAFL reserve                       2980GB            61MB          2980GB
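The show_space numbers can be cross-checked with a short sketch. It assumes ONTAP's "TB"/"GB" are binary units, counts the 18 data disks listed in the aggr status -r output (12 in rg0, 6 in rg1), and uses the per-disk 1695466 MB right-sized capacity from that output:

```python
# Cross-check aggr show_space -h for 18 data disks (12 in rg0 + 6 in rg1).
RIGHT_SIZED_MB = 1695466      # per-disk "Used" size from aggr status -r
DATA_DISKS = 12 + 6           # data disks in rg0 and rg1
WAFL_RESERVE = 0.10

total_gb = DATA_DISKS * RIGHT_SIZED_MB / 1024
wafl_gb = total_gb * WAFL_RESERVE
usable_gb = total_gb - wafl_gb   # aggregate snap reserve was set to 0

print(f"total {total_gb / 1024:.1f} TB, WAFL reserve {wafl_gb:.0f} GB, "
      f"usable {usable_gb / 1024:.1f} TB")
```

This gives about 29.1 TB total, a 2980 GB WAFL reserve and 26.2 TB usable, matching the 29TB / 2980GB / 26TB reported above.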

I'm pretty sure that's as good as it's going to get... now to go tell the customer (who requested the order) that they've asked for 28TB of allocations on a 26TB aggregate

Thanks for all the replies guys! Going to try and assign some points where I can for your responses...

Bobby

bobby_gillette

Update:

Ended up destroying the aggregate to reduce the number of spares from 8 to 4 (went with the max raidgroup size of 20 for BSAS). It's zeroing disks now, so it'll be a while before I know what I'll end up with. I'm thinking it'll be:

24 disks - 4 spares - 2 parity = 18 data spindles x 1.819 TB = ~32.7TB before the WAFL reservation. Hoping to end up around 30TB usable...

Sound about right?

vmsjaak13

RAID Disk Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
--------- ------          ------------- ---- ---- ---- ----- --------------    --------------
dparity   2b.00.16        2b    0   16  SA:B   -  BSAS  7200 1695466/3472315904 1695759/3472914816

1695466MB is the right-sized capacity.

So instead of 1.819 TB per disk, it should be 1655GB (1695466 MB / 1024).
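A small sketch of that correction, assuming binary units throughout: it compares the 1.819 TB per disk used in the estimate above with the right-sized 1695466 MB, then redoes the 18-data-disk estimate before the WAFL reserve:

```python
# Per-disk comparison: base-2 size of a 2 TB drive vs. its right-sized size.
RAW_TB = 2 * 10**12 / 2**40          # ~1.819 binary TB
RIGHT_SIZED_GB = 1695466 / 1024      # ~1655.7 GB, from aggr status -r

raw_gb = RAW_TB * 1024
lost_gb = raw_gb - RIGHT_SIZED_GB    # right-sizing loss per disk

print(f"raw {raw_gb:.0f} GB, right-sized {RIGHT_SIZED_GB:.0f} GB, "
      f"{lost_gb:.0f} GB lost per disk")
print(f"18 data disks: {18 * RAW_TB:.1f} TB estimated vs "
      f"{18 * RIGHT_SIZED_GB / 1024:.1f} TB right-sized")
```

So each disk gives up roughly 207 GB to right-sizing, and the pre-WAFL total drops from about 32.7 TB to about 29.1 TB.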

bobby_gillette

Right sizing... that's rough seeing almost 400GB per disk being lost, let alone explaining it to the customer. Ah the joys of being a storage admin eh? Thanks for the information guys!

bjornkoopmans

Hi,

Not wanting to disregard the input of others before me, I'd suggest a simpler approach to your question of where the storage went:

24 disks - 4 parity - 8 spares = 12 data disks

12 disks x 2 TB = 24 TB

Remove WAFL reserve, rightsizing, overhead, etc. and you'll end up with roughly 21 TB.

So no surprise here. 🙂

Regards, Bjorn

bobby_gillette

Using the max raidgroup size of 20, I'm still left with 4 spares:

NETAPP> vol status -s

Spare disks

RAID Disk       Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
---------       ------          ------------- ---- ---- ---- ----- --------------    --------------
Spare disks for block or zoned checksum traditional volumes or aggregates
spare           0a.29           0a    1   13  FC:A   -  ATA   7200 847555/1735794176 847827/1736350304
spare           0a.45           0a    2   13  FC:A   -  ATA   7200 847555/1735794176 847827/1736350304
spare           0a.61           0a    3   13  FC:A   -  ATA   7200 847555/1735794176 847827/1736350304
spare           0a.77           0a    4   13  FC:A   -  ATA   7200 847555/1735794176 847827/1736350304
spare           0a.93           0a    5   13  FC:A   -  ATA   7200 847555/1735794176 847827/1736350304
spare           0a.109          0a    6   13  FC:A   -  ATA   7200 847555/1735794176 847827/1736350304
spare           2b.00.20        2b    0   20  SA:B   -  BSAS  7200 1695466/3472315904 1695759/3472914816
spare           2b.00.21        2b    0   21  SA:B   -  BSAS  7200 1695466/3472315904 1695759/3472914816
spare           2b.00.22        2b    0   22  SA:B   -  BSAS  7200 1695466/3472315904 1695759/3472914816
spare           2b.00.23        2b    0   23  SA:B   -  BSAS  7200 1695466/3472315904 1695759/3472914816

Had I gone with a smaller raidgroup size of 16, would I have had fewer spares and, by extension, more available storage?

Trying to get my head around this, just struggling a little bit.

vmsjaak13

Hello Bobby,

AFAIK this is what you are working with:

  • 24x 2TB SATA drives
  • Maximum number of SATA disks in a raidgroup: 20

Obviously you can go with 1 RG in the aggregate. This leaves you 4 spares.
You can also opt for 2 RGs in the aggregate.
Best practice says: balance the RGs, i.e. 2 RGs of 11 disks.

Now your question: an RG of 16. This leaves 8 disks, from which you can create an RG of 7. This leaves 1 spare.

Let's write it out:

RG=20: 18 data disks, 2 parity disks, 4 spares

RG=11 + RG=11: 9+9 (18) data disks, 2+2 (4) parity disks, 2 spares

RG=16 + RG=7: 14+5 (19) data disks, 2+2 (4) parity disks, 1 spare.

19 data disks! But the raidgroups are very unbalanced.

I would not go for the third option.

But there's one more option: 2 RGs of 12+11 disks.

This is still unbalanced, but only by 1 disk, which isn't considered bad practice.

Let's write it out:

RG=12 + RG=11: 10+9 (19) data disks, 2+2 (4) parity disks, 1 spare.

If your goal is maximizing the amount of usable space, go for 12+11+1.
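The layouts above can be compared with a short sketch. The 20-disk SATA maximum is the figure quoted in this thread for ONTAP 8.0.1 with a 64-bit aggregate, not something the code verifies, and `evaluate` is a hypothetical helper name:

```python
# Compare the RAID-DP layouts discussed above for 24 SATA disks.
TOTAL_DISKS = 24
MAX_RG_SIZE = 20      # SATA limit quoted in this thread (ONTAP 8.0.1, 64-bit)
PARITY_PER_RG = 2     # RAID-DP: parity + dparity in every RAID group

def evaluate(rg_sizes):
    """Return (data, parity, spares, imbalance) for a list of RAID group sizes."""
    if any(size > MAX_RG_SIZE for size in rg_sizes):
        raise ValueError("RAID group larger than the SATA maximum")
    if sum(rg_sizes) > TOTAL_DISKS:
        raise ValueError("not enough disks for this layout")
    parity = PARITY_PER_RG * len(rg_sizes)
    data = sum(rg_sizes) - parity
    spares = TOTAL_DISKS - sum(rg_sizes)
    imbalance = max(rg_sizes) - min(rg_sizes)   # disks between largest and smallest RG
    return data, parity, spares, imbalance

for layout in ([20], [11, 11], [16, 7], [12, 11]):
    data, parity, spares, imbalance = evaluate(layout)
    name = " + ".join(f"RG={size}" for size in layout)
    print(f"{name}: {data} data, {parity} parity, {spares} spare(s), imbalance {imbalance}")
```

Both 16+7 and 12+11 reach 19 data disks with 1 spare, but 12+11 differs by only one disk between groups, which is why it's the better choice.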

Regards,

Niek
