ONTAP Discussions

Aggregate creation failed, but data partitions remain

xf5543

Hi all,

I tried to create a data aggregate on a FAS2720. The cluster is still in its factory state: cluster setup is complete, but no data aggregates have been configured yet, so only the root aggregates exist.

I tried to create the first data aggregate (using root-data partitioning), but the job failed:

NETAPP1::storage aggregate> create -aggregate aggr1_NETAPP1_01 -diskcount 20 -disktype FSAS -node NETAPP1-01 -raidtype raid_dp -maxraidsize 20

Info: The layout for aggregate "aggr1_NETAPP1_01" on node "NETAPP1-01" would be:

      First Plex

        RAID Group rg0, 20 disks (block checksum, raid_dp)
                                                            Usable Physical
          Position   Disk                      Type           Size     Size
          ---------- ------------------------- ---------- -------- --------
          shared     2.10.7                    FSAS              -        -
          shared     2.10.9                    FSAS              -        -
          shared     2.10.5                    FSAS         3.54TB   3.54TB
          shared     2.10.3                    FSAS         3.54TB   3.54TB
          shared     2.11.1                    FSAS         3.54TB   3.64TB
          shared     2.12.1                    FSAS         3.54TB   3.64TB
          shared     2.13.1                    FSAS         3.54TB   3.64TB
          shared     2.11.3                    FSAS         3.54TB   3.64TB
          shared     2.12.3                    FSAS         3.54TB   3.64TB
          shared     2.13.3                    FSAS         3.54TB   3.64TB
          shared     2.11.5                    FSAS         3.54TB   3.64TB
          shared     2.12.5                    FSAS         3.54TB   3.64TB
          shared     2.13.5                    FSAS         3.54TB   3.64TB
          shared     2.11.7                    FSAS         3.54TB   3.64TB
          shared     2.12.7                    FSAS         3.54TB   3.64TB
          shared     2.13.7                    FSAS         3.54TB   3.64TB
          shared     2.11.9                    FSAS         3.54TB   3.64TB
          shared     2.12.9                    FSAS         3.54TB   3.64TB
          shared     2.13.9                    FSAS         3.54TB   3.64TB
          shared     2.11.11                   FSAS         3.54TB   3.64TB

      Aggregate capacity available for volume use would be 57.35TB.

      The following disks would be partitioned: 2.11.11, 2.13.9, 2.12.9, 2.11.9, 2.13.7, 2.12.7, 2.11.7, 2.13.5, 2.12.5, 2.11.5, 2.13.3, 2.12.3, 2.11.3, 2.13.1, 2.12.1,
      2.11.1.

Do you want to continue? {y|n}: y
[Job 10101] creating aggregate aggr1_NETAPP1_01 ...
Error: command failed: [Job 10101] Job failed: Failed to create aggregate "aggr1_NETAPP1_01" on "NETAPP1-01". Reason: ZSM - failed, status code = 571, extra =
       Timeout: Operation "ksmfRawZapi_iterator::get_imp()" took longer than 110 seconds to complete [from mgwd on node "NETAPP1-01" (VSID: -3) to kernel at
       127.0.0.1], took 110.002s, max 110s [127.0.0.1:951].
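
The 57.35TB figure in the layout preview can be sanity-checked with simple arithmetic. Eighteen of the 20 rows list 3.54TB usable (the two shelf-10 disks show "-"), which is consistent with RAID-DP spending two partitions on parity. A minimal Python sketch, assuming (this is not stated in the post) that ONTAP withholds a WAFL reserve of about 10% from the data capacity:

```python
# Consistency check of the "57.35TB available" figure in the layout preview.
# Assumption (not from the post): ONTAP withholds a WAFL reserve of about 10%.
DATA_PARTITIONS = 18          # rows in the preview that list 3.54TB usable
USABLE_TB_PER_PARTITION = 3.54
WAFL_RESERVE_FRACTION = 0.10  # assumed, not confirmed by the post


def estimate_available_tb(data_partitions: int, usable_tb: float, reserve: float) -> float:
    """Data-partition capacity minus an assumed reserve fraction."""
    return data_partitions * usable_tb * (1.0 - reserve)


estimate = estimate_available_tb(DATA_PARTITIONS, USABLE_TB_PER_PARTITION, WAFL_RESERVE_FRACTION)
print(f"{estimate:.2f}TB")  # prints 57.35TB
```

This is only a consistency check on the preview, not a description of how ONTAP computes the value.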

Searching for the error, I found only a KB article, and it offers no solution other than contacting NetApp Support.

Running "storage aggregate show" and "storage aggregate show-status" confirms that the aggregate I tried to create does not exist:

NETAPP1::storage aggregate> show

Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
--------- -------- --------- ----- ------- ------ ---------------- ------------
aggr0_NETAPP1_01
           159.9GB    7.75GB   95% online       1 NETAPP1-01       raid_dp,
                                                                   normal
aggr0_NETAPP1_02
           159.9GB    7.75GB   95% online       1 NETAPP1-02       raid_dp,
                                                                   normal
2 entries were displayed.

NETAPP1::storage aggregate> show-status

Owner Node: NETAPP1-01
 Aggregate: aggr0_NETAPP1_01 (online, raid_dp) (block checksums)
  Plex: /aggr0_NETAPP1_01/plex0 (online, normal, active, pool0)
   RAID Group /aggr0_NETAPP1_01/plex0/rg0 (normal, block checksums)
                                                              Usable Physical
     Position Disk                        Pool Type     RPM     Size     Size Status
     -------- --------------------------- ---- ----- ------ -------- -------- ----------
     shared   2.10.3                       0   FSAS    7200  93.52GB   3.64TB (normal)
     shared   2.10.5                       0   FSAS    7200  93.52GB   3.64TB (normal)
     shared   2.10.7                       0   FSAS    7200  93.52GB   3.64TB (normal)
     shared   2.10.9                       0   FSAS    7200  93.52GB   3.64TB (normal)

Owner Node: NETAPP1-02
 Aggregate: aggr0_NETAPP1_02 (online, raid_dp) (block checksums)
  Plex: /aggr0_NETAPP1_02/plex0 (online, normal, active, pool0)
   RAID Group /aggr0_NETAPP1_02/plex0/rg0 (normal, block checksums)
                                                              Usable Physical
     Position Disk                        Pool Type     RPM     Size     Size Status
     -------- --------------------------- ---- ----- ------ -------- -------- ----------
     shared   2.10.2                       0   FSAS    7200  93.52GB   3.64TB (normal)
     shared   2.10.4                       0   FSAS    7200  93.52GB   3.64TB (normal)
     shared   2.10.6                       0   FSAS    7200  93.52GB   3.64TB (normal)
     shared   2.10.8                       0   FSAS    7200  93.52GB   3.64TB (normal)
8 entries were displayed.

Running "storage aggregate show-spare-disks" shows that the disks are now root-data partitioned, with 0B of local data capacity left on the new partitions (before the creation attempt they were unpartitioned spares):

NETAPP1::storage aggregate> show-spare-disks -owner-name NETAPP1-01

Original Owner: NETAPP1-01
 Pool0
  Spare Pool

                                                             Usable Physical
 Disk             Type   Class          RPM Checksum           Size     Size Status
 ---------------- ------ ----------- ------ -------------- -------- -------- --------
 2.12.11          FSAS   capacity      7200 block            3.63TB   3.64TB zeroed
 2.13.11          FSAS   capacity      7200 block            3.63TB   3.64TB zeroed
 2.10.1           SSD    solid-state      - block            3.49TB   3.49TB zeroed
 2.10.11          SSD    solid-state      - block            3.49TB   3.49TB zeroed

Original Owner: NETAPP1-01
 Pool0
  Root-Data Partitioned Spares
                                                              Local    Local
                                                               Data     Root Physical
 Disk             Type   Class          RPM Checksum         Usable   Usable     Size Status
 ---------------- ------ ----------- ------ -------------- -------- -------- -------- --------
 2.11.1           FSAS   capacity      7200 block                0B  93.52GB   3.64TB zeroed
 2.11.3           FSAS   capacity      7200 block                0B  93.52GB   3.64TB zeroed
 2.11.5           FSAS   capacity      7200 block                0B  93.52GB   3.64TB zeroed
 2.11.7           FSAS   capacity      7200 block                0B  93.52GB   3.64TB zeroed
 2.11.9           FSAS   capacity      7200 block                0B  93.52GB   3.64TB zeroed
 2.11.11          FSAS   capacity      7200 block                0B  93.52GB   3.64TB zeroed
 2.12.1           FSAS   capacity      7200 block                0B  93.52GB   3.64TB zeroed
 2.12.3           FSAS   capacity      7200 block                0B  93.52GB   3.64TB zeroed
 2.12.5           FSAS   capacity      7200 block                0B  93.52GB   3.64TB zeroed
 2.12.7           FSAS   capacity      7200 block                0B  93.52GB   3.64TB zeroed
 2.12.9           FSAS   capacity      7200 block                0B  93.52GB   3.64TB zeroed
 2.13.1           FSAS   capacity      7200 block                0B  93.52GB   3.64TB zeroed
 2.13.3           FSAS   capacity      7200 block                0B  93.52GB   3.64TB zeroed
 2.13.5           FSAS   capacity      7200 block                0B  93.52GB   3.64TB zeroed
 2.13.7           FSAS   capacity      7200 block                0B  93.52GB   3.64TB zeroed
 2.13.9           FSAS   capacity      7200 block                0B  93.52GB   3.64TB zeroed
20 entries were displayed.

However, "storage disk show" lists the disks in question as shared and as belonging to the aggregate that does not exist (because its creation failed):

NETAPP1::storage aggregate> storage disk show -owner NETAPP1-01
                     Usable           Disk    Container   Container
Disk                   Size Shelf Bay Type    Type        Name      Owner
---------------- ---------- ----- --- ------- ----------- --------- --------

Info: This cluster has partitioned disks. To get a complete list of spare disk capacity use "storage aggregate show-spare-disks".
2.10.1               3.49TB    10   1 SSD     spare       Pool0     NETAPP1-01
2.10.3               3.63TB    10   3 FSAS    shared      aggr0_NETAPP1_01, aggr1_NETAPP1_01 NETAPP1-01
2.10.5               3.63TB    10   5 FSAS    shared      aggr0_NETAPP1_01, aggr1_NETAPP1_01 NETAPP1-01
2.10.7               3.63TB    10   7 FSAS    shared      aggr0_NETAPP1_01, aggr1_NETAPP1_01 NETAPP1-01
2.10.9               3.63TB    10   9 FSAS    shared      aggr0_NETAPP1_01, aggr1_NETAPP1_01 NETAPP1-01
2.10.11              3.49TB    10  11 SSD     spare       Pool0     NETAPP1-01
2.11.1               3.63TB    11   1 FSAS    shared      aggr1_NETAPP1_01 NETAPP1-01
2.11.3               3.63TB    11   3 FSAS    shared      aggr1_NETAPP1_01 NETAPP1-01
2.11.5               3.63TB    11   5 FSAS    shared      aggr1_NETAPP1_01 NETAPP1-01
2.11.7               3.63TB    11   7 FSAS    shared      aggr1_NETAPP1_01 NETAPP1-01
2.11.9               3.63TB    11   9 FSAS    shared      aggr1_NETAPP1_01 NETAPP1-01
2.11.11              3.63TB    11  11 FSAS    shared      aggr1_NETAPP1_01 NETAPP1-01
2.12.1               3.63TB    12   1 FSAS    shared      aggr1_NETAPP1_01 NETAPP1-01
2.12.3               3.63TB    12   3 FSAS    shared      aggr1_NETAPP1_01 NETAPP1-01
2.12.5               3.63TB    12   5 FSAS    shared      aggr1_NETAPP1_01 NETAPP1-01
2.12.7               3.63TB    12   7 FSAS    shared      aggr1_NETAPP1_01 NETAPP1-01
2.12.9               3.63TB    12   9 FSAS    shared      aggr1_NETAPP1_01 NETAPP1-01
2.12.11              3.63TB    12  11 FSAS    spare       Pool0     NETAPP1-01
2.13.1               3.63TB    13   1 FSAS    shared      aggr1_NETAPP1_01 NETAPP1-01
2.13.3               3.63TB    13   3 FSAS    shared      aggr1_NETAPP1_01 NETAPP1-01
2.13.5               3.63TB    13   5 FSAS    shared      aggr1_NETAPP1_01 NETAPP1-01
2.13.7               3.63TB    13   7 FSAS    shared      aggr1_NETAPP1_01 NETAPP1-01
2.13.9               3.63TB    13   9 FSAS    shared      aggr1_NETAPP1_01 NETAPP1-01
2.13.11              3.63TB    13  11 FSAS    spare       Pool0     NETAPP1-01
24 entries were displayed.
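
The inconsistency can be spotted mechanically by comparing the container names in "storage disk show" with the aggregates that "storage aggregate show" reports. The Python sketch below does this on a few rows copied from the output above. It assumes the whitespace-separated column layout seen in this thread (disk, size, shelf, bay, type, container type, container name(s), owner); that layout is an assumption and may differ between ONTAP releases, and the function name is hypothetical:

```python
import re

# Sample rows copied from the "storage disk show -owner NETAPP1-01" output above.
DISK_SHOW = """\
2.10.1               3.49TB    10   1 SSD     spare       Pool0     NETAPP1-01
2.10.3               3.63TB    10   3 FSAS    shared      aggr0_NETAPP1_01, aggr1_NETAPP1_01 NETAPP1-01
2.11.1               3.63TB    11   1 FSAS    shared      aggr1_NETAPP1_01 NETAPP1-01
2.12.11              3.63TB    12  11 FSAS    spare       Pool0     NETAPP1-01
"""

# Aggregate names reported by "storage aggregate show" above.
KNOWN_AGGREGATES = {"aggr0_NETAPP1_01", "aggr0_NETAPP1_02"}

# Data rows start with a dotted disk name such as 2.11.1.
DISK_ROW = re.compile(r"^\d+\.\d+\.\d+\s")


def unknown_containers(disk_show: str, known: set[str]) -> dict[str, list[str]]:
    """Map each disk to the container names it lists that are not known aggregates."""
    result: dict[str, list[str]] = {}
    for line in disk_show.splitlines():
        if not DISK_ROW.match(line):
            continue  # skip headers, separators and "Info:" lines
        tokens = line.split()
        if tokens[5] not in ("shared", "aggregate"):
            continue  # spares list a pool name, not an aggregate
        # Tokens between the container type and the owner hold the container name(s).
        names = [n.strip() for n in " ".join(tokens[6:-1]).split(",") if n.strip()]
        missing = [n for n in names if n not in known]
        if missing:
            result[tokens[0]] = missing
    return result


for disk, names in unknown_containers(DISK_SHOW, KNOWN_AGGREGATES).items():
    print(disk, "->", ", ".join(names))
```

Run against the full output, every FSAS disk except the two whole spares would be flagged with aggr1_NETAPP1_01.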

As a result, I cannot retry the aggregate creation:

NETAPP1::storage aggregate> create -aggregate aggr1_NETAPP1_01 -diskcount 20 -disktype FSAS -node NETAPP1-01 -raidtype raid_dp -maxraidsize 20

Error: command failed: Aggregate creation would fail for aggregate "aggr1_NETAPP1_01" on node "NETAPP1-01". Reason: 20 disks needed, but not enough matching disks
       are available.

That is to be expected, since the disks' data partitions are still claimed by the aggregate whose creation failed.
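
A simplified count illustrates why the retry is rejected: once the 0B data partitions are excluded, only the two whole FSAS spares have any data capacity left. The sketch below uses a subset of the "show-spare-disks" rows above and deliberately ignores ONTAP's real disk-selection rules (for example, it does not consider the data partitions on the shelf-10 root disks, which "storage disk show" also lists under aggr1_NETAPP1_01). The helper name is hypothetical:

```python
# Subsets of the "storage aggregate show-spare-disks -owner-name NETAPP1-01" output above.
WHOLE_DISK_SPARES = """\
 2.12.11          FSAS   capacity      7200 block            3.63TB   3.64TB zeroed
 2.13.11          FSAS   capacity      7200 block            3.63TB   3.64TB zeroed
 2.10.1           SSD    solid-state      - block            3.49TB   3.49TB zeroed
 2.10.11          SSD    solid-state      - block            3.49TB   3.49TB zeroed
"""
PARTITIONED_SPARES = """\
 2.11.1           FSAS   capacity      7200 block                0B  93.52GB   3.64TB zeroed
 2.12.1           FSAS   capacity      7200 block                0B  93.52GB   3.64TB zeroed
 2.13.1           FSAS   capacity      7200 block                0B  93.52GB   3.64TB zeroed
"""
DISKS_REQUESTED = 20  # -diskcount 20 in the create command


def fsas_with_data_capacity(rows: str) -> list[str]:
    """Return FSAS spares whose usable (local data) size, the 6th column, is not 0B."""
    found = []
    for line in rows.splitlines():
        tokens = line.split()
        if len(tokens) < 6:
            continue  # blank or malformed line
        disk, disk_type, data_usable = tokens[0], tokens[1], tokens[5]
        if disk_type == "FSAS" and data_usable != "0B":
            found.append(disk)
    return found


candidates = fsas_with_data_capacity(WHOLE_DISK_SPARES) + fsas_with_data_capacity(PARTITIONED_SPARES)
print(f"{len(candidates)} FSAS spares with data capacity, {DISKS_REQUESTED} requested: {candidates}")
```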

How do I get rid of the data partitions so that the disks become spares again? Note: I don't want to destroy the root partitions on the four disks in shelf 10, because they belong to the root aggregate.

1 ACCEPTED SOLUTION

xf5543

A quick update: with assistance from NetApp Technical Support, I was able to resolve this using the following steps:

NETAPP1::> rows 0

NETAPP1::> set diag

Warning: These diagnostic commands are for use by NetApp personnel only.
Do you want to continue? {y|n}: y

NETAPP1::*> debug vreport show
aggregate Differences:

Name             Reason   Attributes
--------         -------  ---------------------------------------------------
aggr1_NETAPP1_01(12345678-abcd-efab-cdef-0123456789ab) Present in WAFL Only
                          Node Name: NETAPP1-01
                          Aggregate UUID: 12345678-abcd-efab-cdef-0123456789ab
                          Aggregate State: online
                          Aggregate Raid Status: raid_dp
                          Aggregate HA Policy: sfo
                          Is Aggregate Root: false
                          Is Composite Aggregate: false


NETAPP1::*> debug vreport fix -aggregate aggr1_NETAPP1_01 -type aggregate -object aggr1_NETAPP1_01(12345678-abcd-efab-cdef-0123456789ab)

NETAPP1::*> debug vreport show
This table is currently empty.

Info: WAFL and VLDB volume/aggregate records are consistent.

After running these commands, the missing aggregate (which had existed in WAFL but not in the VLDB) shows up as expected.
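
The fix above amounts to reading the "Present in WAFL Only" row of "debug vreport show" and passing the aggregate name and UUID back to "debug vreport fix". The Python sketch below extracts both from the report text and prints the command in exactly the form used above, for review rather than automatic execution (these are diag-level commands, and the post ran them with NetApp Support's assistance). The regex, the restriction to the "Present in WAFL Only" reason, and the function name are assumptions based on the single example in this thread:

```python
import re

# The "debug vreport show" output from the post (attribute lines trimmed).
VREPORT_SHOW = """\
aggregate Differences:

Name             Reason   Attributes
--------         -------  ---------------------------------------------------
aggr1_NETAPP1_01(12345678-abcd-efab-cdef-0123456789ab) Present in WAFL Only
                          Node Name: NETAPP1-01
"""

# An aggregate row: name(uuid) followed by the reason text.
ROW = re.compile(r"^(?P<name>\S+)\((?P<uuid>[0-9a-f-]{36})\)\s+(?P<reason>.+)$")


def aggregate_fix_commands(report: str) -> list[str]:
    """Build 'debug vreport fix' commands, in the post's form, for WAFL-only aggregates."""
    commands = []
    for line in report.splitlines():
        match = ROW.match(line)
        if match is None or match["reason"].strip() != "Present in WAFL Only":
            continue  # only the case shown in this thread is handled
        name, uuid = match["name"], match["uuid"]
        commands.append(
            f"debug vreport fix -aggregate {name} -type aggregate -object {name}({uuid})"
        )
    return commands


for command in aggregate_fix_commands(VREPORT_SHOW):
    print(command)
```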