Greetings all ... I ran into something odd, and it seems to have corrected itself. Below is an email I wrote to my account team yesterday; it explains what I saw then, but today all the counting is correct.
Operations Manager (3.8) is reporting inconsistent disk numbers, and I think what is happening is that it is partially counting MPHA disks. In the example below I am showing spare disks, but on a cluster-wide inspection OM says one node has 239 disks and the other node 168. 168 is correct: each node has 12 shelves of 1 TB disks (14 drives per shelf), so I should always see 168 disks. 239 makes no sense because it is neither 168 disks nor 336 (each disk counted twice).
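To make the arithmetic explicit, here is a quick sanity check; the 14-drives-per-shelf figure is inferred from 168 / 12 (consistent with DS14-class shelves), not something OM reports:

```python
# Sanity check on the disk counts per node.
shelves = 12
drives_per_shelf = 14          # inferred: 168 / 12 = 14 (DS14-class shelf)

expected = shelves * drives_per_shelf   # correct per-node total
double_counted = 2 * expected           # what counting every MPHA path would show

print(expected, double_counted)         # 168 336
# OM's figure of 239 matches neither, so only SOME disks are counted twice.
```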
Of the 12 controllers under my watch, two nodes in two separate clusters are reporting false numbers: one on 7.3.2P4 and one on 8.0 GA.
What is interesting is that when I inventory just the spare disks, the count is still wrong in an odd way: if OM were reporting both disk paths on an MPHA system, I should see 8 disks (4 spares x 2 paths), not 6.
Has anyone here seen this behavior before?
########################
START OF EMAIL
########################
I am seeing something weird going on; it seems that the disks on SJCDBFILER01 A/B are not reporting correctly.
Now that we have Data ONTAP 8 running and are using 1 TB drives (and soon 600 GB FC-AL), I have noticed areas of OM capacity reporting where 1 TB drives are missing, even though OM reports the correct sizes (847827 MB = 1 TB NetApp drive). I have also noticed interesting disk totals that show one controller with 239 1 TB drives and its partner with 168.
eBay’s SRM is reporting the same thing, and I know that SJCDBFILER01-A does not have 239 disks: 1) no single controller here has 239 disks, and 2) the partner counts all 168 of its disks. Keep in mind that these are MPHA systems, so I wondered whether disks were showing up twice, but that is not exactly the case.
I start by looking at the spare disk count, and it gets interesting:
sjcdbfiler01-B> vol status -s
Spare disks
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
Spare disks for block or zoned checksum traditional volumes or aggregates
spare 0f.41 0f 2 9 FC:A - ATA 7200 847555/1735794176 847827/1736350304
spare 0g.43 0g 2 11 FC:A - ATA 7200 847555/1735794176 847827/1736350304
spare 0h.49 0h 3 1 FC:A - ATA 7200 847555/1735794176 847827/1736350304
spare 6a.19 6a 1 3 FC:B - ATA 7200 847555/1735794176 847827/1736350304 (not zeroed)
sjcdbfiler01-A*> vol status -s
Spare disks
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
Spare disks for block or zoned checksum traditional volumes or aggregates
spare 0b.41 0b 2 9 FC:A - ATA 7200 847555/1735794176 847827/1736350304
spare 0d.49 0d 3 1 FC:A - ATA 7200 847555/1735794176 847827/1736350304
spare 5a.19 5a 1 3 FC:B - ATA 7200 847555/1735794176 847827/1736350304
spare 5c.43 5c 2 11 FC:B - ATA 7200 847555/1735794176 847827/1736350304
Both controllers report four spare disks, which is correct. Verifying both MPHA paths for each spare:
[root@sjcfilermon01 ~]# ssh sjcdbfiler01-am storage show disk -p | grep 0b.41
0b.41 A 5b.41 B 2 9
[root@sjcfilermon01 ~]# ssh sjcdbfiler01-am storage show disk -p | grep 0b.49
0b.49 A 5d.49 B 3 1
[root@sjcfilermon01 ~]# ssh sjcdbfiler01-am storage show disk -p | grep 5a.19
5a.19 B 0a.19 A 1 3
[root@sjcfilermon01 ~]# ssh sjcdbfiler01-am storage show disk -p | grep 5c.43
5c.43 B 0c.43 A 2 11
[root@sjcfilermon01 ~]# ssh sjcdbfiler01-bm storage show disk -p | grep 0g.43
0g.43 A 6c.43 B 2 11
[root@sjcfilermon01 ~]# ssh sjcdbfiler01-bm storage show disk -p | grep 0h.49
0h.49 A 6d.49 B 3 1
[root@sjcfilermon01 ~]# ssh sjcdbfiler01-bm storage show disk -p | grep 6a.19
6a.19 B 0e.19 A 1 3
[root@sjcfilermon01 ~]# ssh sjcdbfiler01-bm storage show disk -p | grep 0f.41
0f.41 A 6b.41 B 2 9
[root@sjcfilermon01 ~]#
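The point of the greps above is that each `storage show disk -p` line names both paths (primary and secondary) for one physical disk, so unique shelf/bay pairs give the true disk count. A minimal sketch of that dedup logic, fed with the 01-A lines pasted above:

```python
# Each line of `storage show disk -p` is: primary port secondary port shelf bay.
# One line = one physical disk with two paths, so count unique (shelf, bay).
sample_lines = [
    "0b.41  A  5b.41  B  2  9",
    "0b.49  A  5d.49  B  3  1",
    "5a.19  B  0a.19  A  1  3",
    "5c.43  B  0c.43  A  2  11",
]

physical = set()
for line in sample_lines:
    primary, _, secondary, _, shelf, bay = line.split()
    physical.add((shelf, bay))   # one entry per physical disk, either path

print(len(physical))  # 4 -- matches the four spares vol status reports
```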
MPHA is correct, but Operations Manager tells a different story: it shows 6 spare disks for 01-A. It is displaying spare disks twice... almost. I suspect that sjcdbfiler01-a and phxdbfiler03-a are reporting extra drives in OM. I am going to report this to the DFM forum and may have to open a case with support.
Operations Manager: - http://sjcfilermon01.sjc.ebay.com:8080/dfm/report/view/disks-spare/72?group=1145
sjcdbfiler01-A spare disk 0b.41 A90A NETAPP X269_HGEMI01TSSX ATA 2 9 847827
sjcdbfiler01-A spare disk 0d.49 A90A NETAPP X269_HGEMI01TSSX ATA 3 1 847827
sjcdbfiler01-A spare disk 5a.19 A90A NETAPP X269_HGEMI01TSSX ATA 1 3 847827
sjcdbfiler01-A spare disk 5b.41 A90A NETAPP X269_HGEMI01TSSX ATA 2 9 847827
sjcdbfiler01-A spare disk 5c.43 A90A NETAPP X269_HGEMI01TSSX ATA 2 11 847827
sjcdbfiler01-A spare disk 5d.49 A90A NETAPP X269_HGEMI01TSSX ATA 3 1 847827
sjcdbfiler01-B spare disk 0f.41 A90A NETAPP X269_HGEMI01TSSX ATA 2 9 847827
sjcdbfiler01-B spare disk 0g.43 A90A NETAPP X269_HGEMI01TSSX ATA 2 11 847827
sjcdbfiler01-B spare disk 0h.49 NA01 NETAPP X269_SMOOS01TSSX ATA 3 1 847827
sjcdbfiler01-B spare disk 6a.19 A90A NETAPP X269_HGEMI01TSSX ATA 1 3 847827
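A rough illustration of the double-count (not how OM works internally): deduplicate the OM rows above by node, shelf, and bay. The 01-A rows for 5b.41 and 5d.49 land on the same shelf/bay as 0b.41 and 0d.49 — they are the secondary MPHA paths of disks already listed.

```python
from collections import defaultdict

# (node, device, shelf, bay) transcribed from the OM spare-disk report above.
rows = [
    ("01-A", "0b.41", 2, 9),  ("01-A", "0d.49", 3, 1),
    ("01-A", "5a.19", 1, 3),  ("01-A", "5b.41", 2, 9),
    ("01-A", "5c.43", 2, 11), ("01-A", "5d.49", 3, 1),
    ("01-B", "0f.41", 2, 9),  ("01-B", "0g.43", 2, 11),
    ("01-B", "0h.49", 3, 1),  ("01-B", "6a.19", 1, 3),
]

unique = defaultdict(set)
for node, _device, shelf, bay in rows:
    unique[node].add((shelf, bay))   # collapse both paths to one physical slot

print({node: len(slots) for node, slots in unique.items()})
# {'01-A': 4, '01-B': 4} -- 01-A's 6 OM rows are really 4 physical spares
```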
One last note: we may want to consider an upgrade to OM 4.0.x ( I will look into what “x” is ) to ensure we keep up with OM releases and future Data ONTAP versions.