WAFL questions - blocks and stripes

LAURENTDONGRADI · 2013-02-24

Hello, I am curious about details of WAFL beyond what is in http://www.netapp.com/us/library/white-papers/wp_3002.html.

Here is my first question

For example, I could not find, or could not understand, how the 4 KB WAFL blocks are written in a RAID group made of more than 8 data disks. Let's say a DP RG made of 12 disks.

If all the data disks in an aggregate are used simultaneously when writing to a volume, and the commit from NVRAM sends a bunch of blocks at once, how are they written to the physical disks?

4 KB on each disk?
Or a total stripe width of 4 KB spread among the 10 data disks (in my 12-disk DP RG)?

Second question

Consider an aggregate containing 2 RGs, one made of 12x1TB SATA disks and the other made of 12x2TB SATA disks. With DP, I would have 10x1TB data disks and 10x2TB data disks.

I picture a volume as being spread over all the data disks in an aggregate. Does this mean such a volume is assigned twice as much capacity on the 2TB disks as on the 1TB disks?

By analogy with RAID 0, where the stripe unit size is the same for all the disks, you must use disks of the same capacity.

Here, with a NetApp aggregate, I have 2 RAID groups, so I'm OK with the stripe unit size in each RG (is it 4 KB? is it 512 bytes?), but how is the additional capacity of the 2TB disks used?

Do the 2TB disks use a stripe unit size twice as big as the 1TB disks?

Let me put this differently: when the flush comes from NVRAM and the data is written to the physical disks:

1 stripe is sent to rgA (1TB disks)
1 stripe is sent to rgB (2TB disks), and its stripe unit (SU) size is twice as big as the SU size in rgA

Or

1 stripe is sent to rgA (1TB disks)
2 stripes are sent to rgB (2TB disks)
and the SU size are identical in rgA and rgB

Thanks to the experts who can enlighten me.

LAURENTDONGRADI · 2013-02-26

I found the answer to my first question here: http://www.netapp.com/us/system/pdf-reader.aspx?m=tr-3001.pdf&cc=us

But Figure 3 seems to show that a stripe is 4K wide, whereas the explanation that comes just after it says 4 KB is the stripe unit, or even worse, the size of the block sent to each disk:

Figure 3 shows how the storage appliance's RAID4 disk array is divided into stripes, each one of which contains one 4 KB block on the parity disk along with one 4KB block from each of the data disks. It is convenient to think of a disk block consisting of a very large number of bits, 32,768 bits. For each bit position in the block, the bit on the parity disk is determined by the EXCLUSIVE-OR of the corresponding bits on the data disks.
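To make the parity rule in the quoted passage concrete, here is a minimal Python sketch. It is only an illustration of bitwise XOR parity over 4 KB blocks, not NetApp code: the function names and the 3-data-disk stripe are assumptions made for the example.

```python
import os
from functools import reduce

BLOCK_SIZE = 4096  # 4 KB = 32,768 bits, as in the quoted passage


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity_block(data_blocks: list) -> bytes:
    """Parity for one stripe: XOR of the block at the same position on each data disk."""
    return reduce(xor_blocks, data_blocks)


def rebuild_block(surviving_blocks: list, parity: bytes) -> bytes:
    """Recover the one missing data block from the survivors and the parity block."""
    return reduce(xor_blocks, surviving_blocks, parity)


# Hypothetical stripe with 3 data disks, one 4 KB block from each.
stripe = [os.urandom(BLOCK_SIZE) for _ in range(3)]
parity = parity_block(stripe)

# Pretend the second data disk failed and rebuild its block.
rebuilt = rebuild_block([stripe[0], stripe[2]], parity)
print(rebuilt == stripe[1])
```

Each stripe therefore holds one 4 KB block per disk, and the parity block has the same 4 KB size as the data blocks it protects.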

LAURENTDONGRADI · 2013-03-25

Answer came from gurus.

4k is the block size that WAFL uses to write on each disk.

Flush comes when NVRAM is 50% full or every 10s.

The flush comes from RAM and not from NVRAM, which is only used for journaling, in case of HA failover or MetroCluster.
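The flush trigger described above can be written as a simple predicate. This is only a sketch of the rule as stated in this post (50% full or 10 seconds elapsed); the function and parameter names are hypothetical and do not correspond to any ONTAP interface.

```python
FLUSH_INTERVAL_SECONDS = 10.0  # "every 10s", as stated in the post
FLUSH_FILL_RATIO = 0.5         # "NVRAM is 50% full", as stated in the post


def should_flush(nvram_used_bytes: int, nvram_size_bytes: int,
                 seconds_since_last_flush: float) -> bool:
    """Return True when either trigger from the post is met."""
    half_full = nvram_used_bytes >= FLUSH_FILL_RATIO * nvram_size_bytes
    timer_expired = seconds_since_last_flush >= FLUSH_INTERVAL_SECONDS
    return half_full or timer_expired


print(should_flush(100, 1024, 3.0))   # neither trigger met
print(should_flush(512, 1024, 3.0))   # half-full trigger
print(should_flush(100, 1024, 10.0))  # timer trigger
```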

Flushed data, parity, and double parity are then sent to the disks in the RG and written sequentially, 4k on each disk, one after the other. The size of the disks in other RGs does not matter.
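For my first question, the arithmetic can be sketched like this. It is a deliberately simplified model: it assumes blocks are placed round-robin, one 4 KB block per data disk per stripe, which is my reading of the answer above and not a description of the real WAFL write allocator. The function names are hypothetical.

```python
BLOCK_KB = 4          # WAFL block size written to each disk
PARITY_DISKS_DP = 2   # RAID-DP: one parity disk plus one double-parity disk


def data_per_stripe_kb(total_disks: int) -> int:
    """Data carried by one stripe of a RAID-DP RG: 4 KB on each data disk."""
    return (total_disks - PARITY_DISKS_DP) * BLOCK_KB


def place_blocks(num_blocks: int, data_disks: int) -> list:
    """Assumed round-robin placement: block i goes to (stripe, data disk)."""
    return [divmod(i, data_disks) for i in range(num_blocks)]


# The 12-disk DP RG from the question: 10 data disks, so 40 KB of data per stripe.
print(data_per_stripe_kb(12))

# 25 blocks flushed at once span 3 stripes under this model.
placement = place_blocks(25, 10)
print(placement[0], placement[10], placement[24])
```

So a stripe is not 4 KB in total; under this model it is 4 KB per disk, and the stripe unit is the same 4 KB whatever the disk size.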

With 14x1TB + 4x2TB in the same RG, this is ugly: the 2TB disks will act as 1TB disks, and 4TB of raw capacity is lost.
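The 4TB figure can be checked with a few lines of Python. The sketch assumes, as stated above, that every disk in a mixed RG is treated as if it had the size of the smallest disk; the function name is hypothetical, and the numbers are raw capacity before parity.

```python
def mixed_rg_capacity(disk_sizes_tb: list) -> tuple:
    """Return (usable raw TB, lost raw TB) when all disks act as the smallest one."""
    effective_size = min(disk_sizes_tb)
    usable_raw = effective_size * len(disk_sizes_tb)
    lost_raw = sum(disk_sizes_tb) - usable_raw
    return usable_raw, lost_raw


# 14x1TB + 4x2TB in the same RG
usable, lost = mixed_rg_capacity([1] * 14 + [2] * 4)
print(usable, lost)
```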

When there are 5 or more 2TB disks (14x1TB + 5x2TB in the same RG), WAFL will automatically reorganize the disks: a new RAID-DP RG will be created with the 5x2TB disks, and data will be transferred and reorganized over the following days or weeks, so this is transparent to users and admins. In the end, performance and capacity will be optimized.

saranraj456 · 2014-08-23

How is the stripe size calculated?

For example, say an aggregate has multiple RGs. Is the stripe size calculated across the RGs, or is it bound to a single RG?

Saran