Active IQ Unified Manager Discussions

Configuring Aggregates and hot spares

Cola123

I'm having a bit of trouble wrapping my head around aggregates and hot spares. Hopefully someone will be kind and help, as I'm having trouble finding the info I need in the support doco online.

 

We have 11 960GB SSDs in our NetApp AFF A200.

 

When I go to create an aggregate, it makes me choose between the disks attached to each node (each node having roughly 4.4TB of raw disk available).

 

Do the nodes in an HA pair always control exactly half the disks? Does that mean you can't then create a single aggregate that takes up the total raw disk space within the chassis?

 

Anyway, if I choose to create an aggregate on one of these nodes, it lets me make an aggregate of up to 11 disks. If I do this, are there any hot spares? Or do I need to choose 10 or 9 disks so as to leave a hot spare or two available for the aggregate?

 

I can't find anywhere in OnCommand System Manager where you can configure disks as hot spares.

 

Thanks in advance,

 


joele

I'd be interested to see how the A200 is set up and configured, and whether or not ADP (advanced drive partitioning) was set up. I'm also curious about the 11-drive count - typically the A200 ships with 12 or 24 drives.

 

Can you paste the output of 'sysconfig -r' here? Feel free to contact me directly if it's any easier.

Cola123

Oops, I lied - it is 12 disks, but when I go to create an aggregate it only lets me select 11 disks?

 

I don't know if ADP is turned on or not.

 

I SSH'ed into the AFF but can't seem to run sysconfig -r; it's not a recognised command.

 

I'm unable to post the config info anyway as it's an air-gapped system. But if you could tell me what I should be looking for, I can report back.

 

 

joele

OK, so it sounds like System Manager is doing its job. ONTAP isn't supposed to let you exhaust all of your drives - it should maintain an absolute minimum of one spare data drive or partition.

 

If the cluster is air-gapped, I'm assuming AutoSupport is disabled/non-functional as well?

 

I should have been clearer on the command in ONTAP - old 7-Mode habits die hard. Try running this instead:

 

system node run -node * -command "sysconfig -r"

I know you won't be able to copy and paste the output, but can you count how many total devices (parity, dparity, and spares included) there are? Also take a look at the device names - do they look like typical drive strings, or do they end in '.P1' or '.P2' by any chance?
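If the node shell route is awkward, the same information should also be visible from the cluster shell. Assuming a reasonably recent ONTAP 9 release (exact output varies by version), something like the following should show whether the drives are partitioned and what is currently held as spare:

storage disk show -partition-ownership
storage aggregate show-spare-disks

The first lists each drive with its partition ownership, so partitioned drives are obvious; the second lists what ONTAP currently treats as spare drives or partitions, per node.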

AlexDawson

Hi there,

 

On an All-SSD or AFF system, the SSDs should be split into three partitions - one root partition (R1) and two data partitions (D1, D2). This can be modified, so if the system is not fresh from the factory, more investigation is required.

 

The root partitions and their spares are shared between the two controllers, and in AFF configs, normally one controller will take D1 and the other will take D2. One partition is always set aside/hidden from each node by the GUI, so that if an SSD fails, it can be rebuilt into the spare. From the command line, this can be overridden. Drives/partitions do not need to be marked as hot spares - in case of failure, ONTAP will use any available partition of appropriate size to rebuild onto.
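As a quick sanity check (this assumes cluster shell and ONTAP 9 syntax; adjust for your release, and the node name is a placeholder), each node should report at least one spare data partition with:

storage aggregate show-spare-disks -original-owner <node-name>

If either node shows nothing spare, that's worth investigating before building the data aggregates.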

 

If all D1 and all D2 partitions are assigned to one controller (active/passive), a single data aggregate of about 5.9TB is created. If they are split, each controller will have an aggregate of about 2.97TB, which times two is again about 5.9TB (active/active).

 

With an active/passive config, fewer CPU cores are available, but you can run the controller to 90% CPU safely. With active/active, controller CPU should be kept under 50% to allow failover for software updates etc. If you have a single workload, active/passive is probably better. If you have multiple workloads, active/active may be better.
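If it helps to see it concretely, creating the two aggregates for the active/active layout from the cluster shell might look roughly like this - the aggregate names, node names and disk count are only examples and should be checked against your own spare counts first:

storage aggregate create -aggregate N1_aggr1_ssd960 -node <node1-name> -diskcount 11
storage aggregate create -aggregate N2_aggr1_ssd960 -node <node2-name> -diskcount 11

On a partitioned AFF system the count refers to data partitions rather than whole drives, so leaving one partition unselected per node keeps a spare available, as described above.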

 

Hope this helps!

Cola123

OK so,...

 

There is an aggr0 which seems to exist on each controller - I presume this is the root aggregate that contains system files and other stuff?

 

So I would then go and create one additional aggregate for data on each of the nodes, and select 11 disks (the maximum allowable), because the 12th disk will be the hot spare?

 

Happy to have an aggregate for each node... I don't currently see a need or advantage to having one aggregate for the entire capacity, as long as, if one SP goes down, the other service processor can take control of that aggregate. (It's my understanding that this is what happens in a clustered HA pair.)

 

Thanks!

 

S. 

 

 

AlexDawson

Yes, aggr0 is the root aggregate, created from the root partitions on disks. I usually rename them to "N1_aggr0" and "N2_aggr0", but there's no specific need to. I usually name data aggregates "Nx_aggr1ssd3840" or similar - including the home node, numeric identifier of the aggr, drive type and capacity in GB in the name.

 

And yes, that's how you'd create the two data aggregates, and yes, failover of data aggregates between nodes is automatic (not the SP... we use that term for an OOB management component of each node, but I understand the EMC terminology too :). For access via CIFS/NFS, the IP (LIF) moves between nodes; for iSCSI/FC, the LIF stays on the node and ALUA multipathing takes care of it from the host side.
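If you prefer the command line for the renames and the failover checks, something along these lines should work (the aggregate and field names are examples, and the current root aggregate name is a placeholder):

storage aggregate rename -aggregate <current-root-aggr-name> -newname N1_aggr0
storage failover show
network interface show -fields home-node,curr-node,failover-policy

The first renames a root aggregate, the second confirms takeover is possible in each direction, and the third shows where each LIF currently lives and how it is set to behave on failover.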

 

Hope this helps! Feel free to click "Kudos" or "Accept as solution" if you wish.

Cola123

Done!

Thanks mate.
