
Adding shelves / disks to a FAS2050C

strattonfinance

Hi all,

Currently we have a FAS2050C (running DOT 7.3.4) with 20 x internal 15k RPM 144GB SAS drives (part number X286) and no external shelves. To make best use of our limited spindles, we operate the two controllers as a quasi active / passive pair, with 17 disks assigned to controller #1 and the remaining 3 disks assigned to controller #2. All 17 disks assigned to controller #1 are in a single RAID-DP RAID group / aggregate, and support our production workload. The 3 disks assigned to controller #2 are in a single RAID-4 RAID group / aggregate, and do nothing except sit there waiting to take over from controller #1 if required.

We're about to add two DS14MK4 shelves to the filer, both fully populated with 14 x 15k RPM 144GB FC drives (part number X278A-R5), and I'm hoping for some assistance with a few questions:

1) Once the extra shelves are installed we want to assign more disks to the second controller and start using it "properly", rather than just having it available as a "passive" partner. This decision was taken because we run into a CPU bottleneck far more often than we run into a disk I/O bottleneck, so we figure that the extra disks will be best used if the load is spread across both controllers.

This leads to my first question: are there any guidelines / best practices about how to best allocate disks to each controller, both in a general sense and also taking into account the two (slightly) different disk formats and different disk interfaces? i.e. put all 20 internal disks on one controller and all 28 external disks on the other, or split them exactly down the middle (half of each shelf + half of the internal disks per controller), etc.?

2) If the "best practice" is to put all internal disks on one controller and all external disks on the other (which I'm guessing might be the case), this is going to require us to move the root volume for controller #2 onto the new disks. From reading NetApp documentation I believe the process to do this is:

  - attach new shelves, assign new disks to controller #2, create new RAID groups / aggregate(s)

  - create a new root volume on the new disks

  - use ndmpcopy to copy /etc from the old root volume to the new root volume

  - set the root volume option on the new root volume

  - reboot

Is that all that needs to happen / is that all correct? And once I've done this, I should then be able to destroy the old root volume / underlying aggregate / 3-disk RAID-4 RAID group, and reassign these three disks to controller #1 to add to its RAID group / aggregate?
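
For reference, I'm expecting the commands involved to look roughly like the following (the aggregate / volume names and sizes are just placeholders I've made up, not our real ones):

    aggr create aggr1 -t raid_dp -r 14 14       # new aggregate on the newly assigned disks
    vol create vol0new aggr1 20g                # new root volume on that aggregate
    ndmpcopy /vol/vol0/etc /vol/vol0new/etc     # copy the configuration (ndmpd needs to be on)
    vol options vol0new root                    # mark the new volume as the root volume
    reboot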

3) I understand that after adding more disks to an existing aggregate we should manually run a WAFL reallocation against any volumes contained in that aggregate. Given that we use dedupe and extensive snapshots, will this reallocation interfere with existing deduped data and/or snapshots in any way? i.e. will it "un-dedupe" our volumes, or cause snapshot sizes to blow out, as part of the process?

4) The shelves / disks that we are adding are used - any suggestions for methods / tools to run appropriate burn-in testing before we put them into production?

5) Any other gotchas I should be aware of during the process of adding these extra shelves / disks?

Thanks all, appreciate any information / advice.

Cheers,

Matt


aborzenkov

1. NetApp recommends allocating the whole shelf to a head. That is not an absolute, set-in-stone requirement. Mixing FC and SAS in one aggregate is supported. In any case – the first priority is the sizing requirement – how many disks your application needs.

2. Yes, the procedure to move the root volume is correct. I'd use vol copy for the whole root volume, but that is my personal preference (rough sketch at the end of this post).

3. To be honest, I never actually understood how reallocation is supposed to interoperate with A-SIS; hopefully someone can chime in here.

4. I am not aware of any tests except the built-in diagnostics. But it runs offline, so you lose redundancy for the whole test duration. Another possibility would be to create an aggregate, destroy it, and start zeroing the spares. It is not a real burn-in, but at least it puts a write load on all the disks.
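
On point 2: a vol copy based root volume move would look roughly like this (volume names are just examples; the destination has to be restricted and at least as large as the source):

    vol create vol0new aggr1 20g      # destination volume, sized like the current root
    vol restrict vol0new
    vol copy start vol0 vol0new
    vol online vol0new
    vol options vol0new root
    reboot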

strattonfinance

> 1. NetApp recommends allocating the whole shelf to a head. That is not an absolute, set-in-stone requirement. Mixing FC and SAS in one aggregate is supported. In any case – the first priority is the sizing requirement – how many disks your application needs.

We're a relatively small business, so no single application has huge requirements for either performance or size. The FAS2050 is providing storage to our VMware farm, which runs about 35 low-to-mid load VMs. This means we can afford to lay out our storage according to general best practice guidelines and then balance performance / size requirements of applications across the two controllers by simply moving VMs between them as needed.

Given the above, if NetApp recommend allocating whole shelves to a head then it sounds like we are best off allocating both new shelves to controller #2, and all internal disks to controller #1.
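
If that's the route we take, I assume the assignment itself is just a matter of something like the following once the shelves are cabled up (the disk name and controller name below are placeholders for whatever the new shelves actually present as):

    disk show -n                            # list the unassigned disks on the new shelves
    disk assign 1a.16 -o <controller2>      # repeat for each disk to be owned by controller #2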

> 2. Yes, the procedure to move the root volume is correct. I'd use vol copy for the whole root volume, but that is my personal preference.

Thanks for the info - I'll look into vol copy for moving the root volume. Is there any particular reason vol copy is better than ndmpcopy? (ndmpcopy is what's recommended in the NetApp docs.)

> 3. To be honest, I never actually understood how reallocation is supposed to interoperate with A-SIS; hopefully someone can chime in here.

I've been researching this further since my OP.

Based on what I've read so far, it sounds like reallocation will completely ignore deduplicated blocks (i.e. will not move them) unless you run reallocate with the "-p" switch. In that case, reallocate will physically redistribute a volume across all disks in the aggregate, but won't change the logical layout of the volume.

Likewise, I think that reallocate with the "-p" switch has no effect on snapshots, but reallocate without the "-p" switch will cause greatly increased snapshot space usage.

I think what we need to do is expand the aggregate, run "reallocate -p" against each volume to spread the data out across the new disks, and then run "reallocate -A" against the aggregate to optimise free space.
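
In command terms I believe that translates to something like this (the volume and aggregate names are placeholders, not our real ones):

    reallocate on                              # enable reallocation scans
    reallocate start -f -p /vol/vm_datastore1  # one-off physical-only reallocation, run per volume
    reallocate start -A aggr0                  # free space reallocation on the expanded aggregate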

I'm certainly not 100% on all of this though, so if anyone else can clarify that would be much appreciated.

> 4. I am not aware of any tests except the built-in diagnostics. But it runs offline, so you lose redundancy for the whole test duration.

Which diagnostics are you referring to here? Sorry, my NetApp knowledge is a bit patchy in places.

> Another possibility would be to create an aggregate, destroy it, and start zeroing the spares. It is not a real burn-in, but at least it puts a write load on all the disks.

We'll definitely zero all the new disks before we use them.

I was thinking we could try running a disk burn-in / testing tool on a separate box and pointing it at the new disks over NFS, but I'm not sure if we'll be able to generate sufficient load to do proper testing this way. Hence I was hoping for something built into DOT.

Thanks for the response / info, greatly appreciated.

Cheers,

Matt

aborzenkov

Is there any particular reason vol copy is better than ndmpcopy?

Probably not. Actually, ndmpcopy is definitely more flexible.

Which diagnostics are you referring to here?

http://now.netapp.com/NOW/knowledge/docs/hardware/NetApp/diag/html/index.htm

We'll definitely zero all the new disks before we use them.

New disks come pre-zeroed, so to actually zero them you'll first need to "dirty" them.
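
Something like this would dirty them and then re-zero them (the aggregate name is just an example):

    aggr create testaggr 28      # temporary aggregate across the new disks
    aggr offline testaggr
    aggr destroy testaggr        # the disks become dirty spares
    disk zero spares             # zero them again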

strattonfinance

> Probably not. Actually, ndmpcopy is definitely more flexible.

OK, will stick with ndmpcopy then 🙂

> http://now.netapp.com/NOW/knowledge/docs/hardware/NetApp/diag/html/index.htm

Thanks for the link, I have some reading to do...

> New disks are prezeroed so to actually zero them you'll first need to "dirty" them.

Thanks for the tip. As I mentioned in my OP, the shelves / disks are actually used / refurb items, but I would assume that the vendor zeroed them, and hence we will need to "dirty" them again as you suggest.

Cheers,

Matt

radek_kubka

OK, will stick with ndmpcopy then :-)

I will add my quick 2 cents as well.

Actually I have had situations in the past where ndmpcopy didn't work as expected and plainly missed some files / directories when moving the root vol - it manifested itself e.g. in the form of FilerView refusing to work.

Interestingly enough, vol copy always worked without a hitch, and no one was able to give me a credible explanation why.

Regards,
Radek

strattonfinance

> Actually I have had situations in the past where ndmpcopy didn't work as expected and plainly missed some files / directories when moving the root vol - it manifested itself e.g. in the form of FilerView refusing to work.

> Interestingly enough, vol copy always worked without a hitch, and no one was able to give me a credible explanation why.

Sounds like more votes in the "vol copy" column than the "ndmpcopy" column, so I might go with "vol copy" after all.

Thanks for the info.
