
INFO - Strange behavior in OM with regard to disk count

Greetings all. I ran into something odd, and it seems to have corrected itself. Below is an email I wrote to my account team yesterday describing what I saw; today, however, all the counts are correct.

Operations Manager (3.8) is reporting inconsistent disk counts, and I think it is partially double-counting MPHA disks. The example below shows spare disks, but cluster-wide, OM says one node has 239 disks and the other has 168. 168 is correct: each node has 12 shelves of 1 TB disks (14 per shelf), so I should always see 168 disks. 239 makes no sense because it is neither 168 (each disk counted once) nor 336 (each disk counted twice).

Of the 12 controllers under my watch, two nodes in two separate clusters are reporting false numbers: one on 7.3.2P4 and one on 8.0 GA.

What is interesting is what happens when I inventory just the spare disks. Each node has 4 spares, so if OM were reporting both disk paths on an MPHA system, I should see 8 disks for a node, not 6.
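The counting argument above can be made explicit. This is a minimal sketch, assuming 14 disks per shelf (168 / 12) and that every MPHA disk is visible on exactly two paths; `classify_count` is a hypothetical helper, not part of OM or ONTAP.

```python
def classify_count(reported: int, physical: int) -> str:
    """Compare a reported disk count with the physical disk count.

    On an MPHA system each disk is visible on two paths, so a tool that
    counted every path would report exactly 2 * physical.
    """
    if reported == physical:
        return "correct (one entry per disk)"
    if reported == 2 * physical:
        return "double-counted (one entry per path)"
    return "inconsistent (neither per-disk nor per-path)"


SHELVES_PER_NODE = 12
DISKS_PER_SHELF = 14  # assumption: derived from 168 disks / 12 shelves
physical_disks = SHELVES_PER_NODE * DISKS_PER_SHELF  # 168

# Cluster-wide totals reported by OM for the two nodes.
for node, reported in [("node A", 239), ("node B", 168)]:
    print(node, reported, classify_count(reported, physical_disks))

# Spare inventory: 4 physical spares on the node, 6 entries listed by OM.
print("spares", 6, classify_count(6, 4))
```

Both 239 and 6 come out as "inconsistent", which matches the observation that OM is neither counting each disk once nor cleanly counting every path.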

Has anyone here seen this behavior before?

########################

START OF EMAIL

########################

I am seeing something odd: the disks on SJCDBFILER01 A/B are not being reported correctly.

Now that we have ONTAP 8 running with 1 TB drives (and soon 600 GB FCAL), I have noticed areas in OM's capacity reporting where 1 TB drives are missing, even though OM reports their size correctly (847827 MB = one 1 TB NetApp drive). I have also noticed odd disk totals: one controller is shown with 239 1 TB drives and its partner with 168.
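The 847827 MB figure can be reproduced from the block counts in the `vol status -s` output later in this email. This is a minimal sketch, assuming the `blks` column counts 512-byte sectors and the `MB` column uses 2**20 bytes (the numbers bear this out: 2048 such blocks make one MB); `blocks_to_mb` is a hypothetical helper.

```python
BYTES_PER_BLOCK = 512       # assumption: "blks" are 512-byte sectors
BYTES_PER_MB = 1024 * 1024  # assumption: the "MB" column uses 2**20 bytes


def blocks_to_mb(blocks: int) -> int:
    """Convert a block count to whole megabytes, truncating any remainder."""
    return blocks * BYTES_PER_BLOCK // BYTES_PER_MB


phys_blocks = 1736350304  # "Phys (MB/blks)" column for each 1 TB spare
used_blocks = 1735794176  # "Used (MB/blks)" column for each 1 TB spare

print(blocks_to_mb(phys_blocks))  # 847827
print(blocks_to_mb(used_blocks))  # 847555
# The same physical capacity expressed in decimal gigabytes:
print(round(phys_blocks * BYTES_PER_BLOCK / 1e9, 1))  # 889.0
```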

eBay’s SRM is reporting the same thing, and I know that SJCDBFILER01-A does not have 239 disks: 1) no single controller has 239 disks, and 2) the partner counts all 168 of its own. Keep in mind that these are MPHA systems, so I wondered whether disks were showing up twice, but that is not exactly the case.

I started by looking at the spare disk count, and it gets interesting:

sjcdbfiler01-B> vol status -s
Spare disks

RAID Disk       Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
---------       ------  ------------- ---- ---- ---- ----- --------------    --------------
Spare disks for block or zoned checksum traditional volumes or aggregates
spare           0f.41   0f    2   9   FC:A   -  ATA   7200 847555/1735794176 847827/1736350304
spare           0g.43   0g    2   11  FC:A   -  ATA   7200 847555/1735794176 847827/1736350304
spare           0h.49   0h    3   1   FC:A   -  ATA   7200 847555/1735794176 847827/1736350304
spare           6a.19   6a    1   3   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304 (not zeroed)

sjcdbfiler01-A*> vol status -s
Spare disks

RAID Disk       Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
---------       ------  ------------- ---- ---- ---- ----- --------------    --------------
Spare disks for block or zoned checksum traditional volumes or aggregates
spare           0b.41   0b    2   9   FC:A   -  ATA   7200 847555/1735794176 847827/1736350304
spare           0d.49   0d    3   1   FC:A   -  ATA   7200 847555/1735794176 847827/1736350304
spare           5a.19   5a    1   3   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
spare           5c.43   5c    2   11  FC:B   -  ATA   7200 847555/1735794176 847827/1736350304

Both controllers report four spare disks each, which is correct, and each spare has a second path:

[root@sjcfilermon01 ~]# ssh sjcdbfiler01-am storage show disk -p | grep 0b.41
0b.41    A    5b.41      B     2    9
[root@sjcfilermon01 ~]# ssh sjcdbfiler01-am storage show disk -p | grep 0b.49
0b.49    A    5d.49      B     3    1
[root@sjcfilermon01 ~]# ssh sjcdbfiler01-am storage show disk -p | grep 5a.19
5a.19    B    0a.19      A     1    3
[root@sjcfilermon01 ~]# ssh sjcdbfiler01-am storage show disk -p | grep 5c.43
5c.43    B    0c.43      A     2   11
[root@sjcfilermon01 ~]# ssh sjcdbfiler01-bm storage show disk -p | grep 0g.43
0g.43    A    6c.43      B     2   11
[root@sjcfilermon01 ~]# ssh sjcdbfiler01-bm storage show disk -p | grep 0h.49
0h.49    A    6d.49      B     3    1
[root@sjcfilermon01 ~]# ssh sjcdbfiler01-bm storage show disk -p | grep 6a.19
6a.19    B    0e.19      A     1    3
[root@sjcfilermon01 ~]# ssh sjcdbfiler01-bm storage show disk -p | grep 0f.41
0f.41    A    6b.41      B     2    9
[root@sjcfilermon01 ~]#
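Checking each spare by hand with `grep` works, but the primary/secondary pairing can also be collected for every disk at once. This is a minimal sketch, assuming the `storage show disk -p` rows have the six whitespace-separated columns seen above (primary path, port, secondary path, port, shelf, bay); `parse_paths` is a hypothetical helper, and the sample rows are copied from the sjcdbfiler01-A checks above.

```python
def parse_paths(output: str) -> dict[str, str]:
    """Map each primary disk path to its secondary path.

    Expects rows shaped like: PRIMARY PORT SECONDARY PORT SHELF BAY.
    Header, separator, and single-path rows (not six fields, or a
    non-numeric shelf column) are skipped.
    """
    pairs = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 6 or not fields[4].isdigit():
            continue
        primary, _port_a, secondary, _port_b, _shelf, _bay = fields
        pairs[primary] = secondary
    return pairs


sample = """\
0b.41    A    5b.41      B     2    9
0b.49    A    5d.49      B     3    1
5a.19    B    0a.19      A     1    3
5c.43    B    0c.43      A     2   11
"""

paths = parse_paths(sample)
for primary, secondary in paths.items():
    print(f"{primary} <-> {secondary}")
```

Feeding it the full, un-grepped output of `storage show disk -p` would give the path pairs for every disk on the node, which is what a per-disk (rather than per-path) count needs.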

MPHA is correct, but Operations Manager differs: it shows 6 spare disks for 01-A. Some spare disks appear twice, once under each path, but not all of them. I suspect that sjcdbfiler01-a and phxdbfiler03-a are reporting extra drives in OM. I am going to report this to the DFM forum and may have to open a case with support.

Operations Manager: http://sjcfilermon01.sjc.ebay.com:8080/dfm/report/view/disks-spare/72?group=1145

sjcdbfiler01-A spare disk 0b.41 A90A NETAPP X269_HGEMI01TSSX ATA 2 9   847827
sjcdbfiler01-A spare disk 0d.49 A90A NETAPP X269_HGEMI01TSSX ATA 3 1   847827
sjcdbfiler01-A spare disk 5a.19 A90A NETAPP X269_HGEMI01TSSX ATA 1 3   847827
sjcdbfiler01-A spare disk 5b.41 A90A NETAPP X269_HGEMI01TSSX ATA 2 9   847827
sjcdbfiler01-A spare disk 5c.43 A90A NETAPP X269_HGEMI01TSSX ATA 2 11  847827
sjcdbfiler01-A spare disk 5d.49 A90A NETAPP X269_HGEMI01TSSX ATA 3 1   847827
sjcdbfiler01-B spare disk 0f.41 A90A NETAPP X269_HGEMI01TSSX ATA 2 9   847827
sjcdbfiler01-B spare disk 0g.43 A90A NETAPP X269_HGEMI01TSSX ATA 2 11  847827
sjcdbfiler01-B spare disk 0h.49 NA01 NETAPP X269_SMOOS01TSSX ATA 3 1   847827
sjcdbfiler01-B spare disk 6a.19 A90A NETAPP X269_HGEMI01TSSX ATA 1 3   847827
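OM lists both 0b.41 and 5b.41 for sjcdbfiler01-A, and `storage show disk -p` shows these are two paths to the same disk. This is a minimal sketch of collapsing such entries to one name per physical disk, assuming a secondary path is replaced by its primary whenever the pair is known; `collapse_paths` is a hypothetical helper, and the sample uses only pairs copied from the output above.

```python
def collapse_paths(listed: list[str], pairs: dict[str, str]) -> set[str]:
    """Return one canonical name per physical disk.

    A path that appears as a secondary in `pairs` is replaced by its
    primary path; paths not found in `pairs` are kept unchanged.
    """
    secondary_to_primary = {sec: pri for pri, sec in pairs.items()}
    return {secondary_to_primary.get(path, path) for path in listed}


# Primary -> secondary pairs taken from `storage show disk -p`.
pairs = {"0b.41": "5b.41", "5a.19": "0a.19"}
# A subset of the OM spare entries for sjcdbfiler01-A.
om_entries = ["0b.41", "5a.19", "5b.41"]

unique = collapse_paths(om_entries, pairs)
print(len(om_entries), "entries ->", len(unique), "disks:", sorted(unique))
```

With a complete path map for the node, the number of distinct names in OM's list would be expected to match the four spares that `vol status -s` reports.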

One last note: we may want to consider upgrading to OM 4.0.x (I will look into what “x” is) to keep up with OM releases and future ONTAP versions.

Re: INFO - Strange behavior in OM with regard to disk count

Have you upgraded to 4.0? If yes, are you still seeing this problem?

-Abhi