Greetings all ... I ran into something odd, and it seems to have corrected itself. Below is an email I wrote to my account team yesterday; it explains what I saw then, but today all the counting is correct.
Operations Manager (3.8) is reporting inconsistent disk numbers, and I think what is happening is that it is partially counting MPHA disks. In the example below I am showing spare disks, but on a cluster-wide inspection OM says one node has 239 disks and the other node 168. 168 is correct: each node has 12 shelves of 1 TB disks (14 drives per shelf), so I should always see 168 disks. 239 makes no sense because it is neither 168 disks nor 336 (each disk counted twice).
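To make the arithmetic explicit, here is a quick sanity check; the 14-drives-per-shelf figure is inferred from 168 / 12 (consistent with DS14-class shelves), not something OM reports:

```python
# Sanity check on the disk counts per node.
shelves = 12
drives_per_shelf = 14          # inferred: 168 / 12 = 14 (DS14-class shelf)

expected = shelves * drives_per_shelf   # correct per-node total
double_counted = 2 * expected           # what counting every MPHA path would show

print(expected, double_counted)         # 168 336
# OM's figure of 239 matches neither, so only SOME disks are counted twice.
```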
Of the 12 controllers under my watch, two nodes in two separate clusters are reporting false numbers: one on 7.3.2P4 and one on 8.0 GA.
What is interesting is that when I inventory just the spare disks, the count is still wrong in an odd way: if OM were reporting both disk paths on an MPHA system, I should see 8 disks (4 spares x 2 paths), not 6.
Has anyone here seen this behavior before?
########################
START OF EMAIL
########################
I am seeing something weird going on; it seems that the disks on SJCDBFILER01 A/B are not reporting correctly.
Now that we have Data ONTAP 8 running and are using 1 TB drives (and soon 600 GB FC-AL), I have noticed areas of OM capacity reporting where 1 TB drives are missing, even though OM reports the correct sizes (847827 MB = 1 TB NetApp drive). I have also noticed interesting disk totals that show one controller with 239 1 TB drives and its partner with 168.
eBay’s SRM is reporting the same thing, and I know that SJCDBFILER01-A does not have 239 disks: 1) no single controller here has 239 disks, and 2) the partner counts all 168 of its disks. Keep in mind that these are MPHA systems, so I wondered whether disks were showing up twice, but that is not exactly the case.
I start by looking at the spare disk count, and it gets interesting:
sjcdbfiler01-B> vol status -s
Spare disks
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
Spare disks for block or zoned checksum traditional volumes or aggregates
spare 0f.41 0f 2 9 FC:A - ATA 7200 847555/1735794176 847827/1736350304
spare 0g.43 0g 2 11 FC:A - ATA 7200 847555/1735794176 847827/1736350304
spare 0h.49 0h 3 1 FC:A - ATA 7200 847555/1735794176 847827/1736350304
spare 6a.19 6a 1 3 FC:B - ATA 7200 847555/1735794176 847827/1736350304 (not zeroed)
sjcdbfiler01-A*> vol status -s
Spare disks
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
Spare disks for block or zoned checksum traditional volumes or aggregates
spare 0b.41 0b 2 9 FC:A - ATA 7200 847555/1735794176 847827/1736350304
spare 0d.49 0d 3 1 FC:A - ATA 7200 847555/1735794176 847827/1736350304
spare 5a.19 5a 1 3 FC:B - ATA 7200 847555/1735794176 847827/1736350304
spare 5c.43 5c 2 11 FC:B - ATA 7200 847555/1735794176 847827/1736350304
Both controllers report four spare disks, which is correct. Verifying both MPHA paths for each spare:
[root@sjcfilermon01 ~]# ssh sjcdbfiler01-am storage show disk -p | grep 0b.41
0b.41 A 5b.41 B 2 9
[root@sjcfilermon01 ~]# ssh sjcdbfiler01-am storage show disk -p | grep 0b.49
0b.49 A 5d.49 B 3 1
[root@sjcfilermon01 ~]# ssh sjcdbfiler01-am storage show disk -p | grep 5a.19
5a.19 B 0a.19 A 1 3
[root@sjcfilermon01 ~]# ssh sjcdbfiler01-am storage show disk -p | grep 5c.43
5c.43 B 0c.43 A 2 11
[root@sjcfilermon01 ~]# ssh sjcdbfiler01-bm storage show disk -p | grep 0g.43
0g.43 A 6c.43 B 2 11
[root@sjcfilermon01 ~]# ssh sjcdbfiler01-bm storage show disk -p | grep 0h.49
0h.49 A 6d.49 B 3 1
[root@sjcfilermon01 ~]# ssh sjcdbfiler01-bm storage show disk -p | grep 6a.19
6a.19 B 0e.19 A 1 3
[root@sjcfilermon01 ~]# ssh sjcdbfiler01-bm storage show disk -p | grep 0f.41
0f.41 A 6b.41 B 2 9
[root@sjcfilermon01 ~]#
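The point of the greps above is that each `storage show disk -p` line names both paths (primary and secondary) for one physical disk, so unique shelf/bay pairs give the true disk count. A minimal sketch of that dedup logic, fed with the 01-A lines pasted above:

```python
# Each line of `storage show disk -p` is: primary port secondary port shelf bay.
# One line = one physical disk with two paths, so count unique (shelf, bay).
sample_lines = [
    "0b.41  A  5b.41  B  2  9",
    "0b.49  A  5d.49  B  3  1",
    "5a.19  B  0a.19  A  1  3",
    "5c.43  B  0c.43  A  2  11",
]

physical = set()
for line in sample_lines:
    primary, _, secondary, _, shelf, bay = line.split()
    physical.add((shelf, bay))   # one entry per physical disk, either path

print(len(physical))  # 4 -- matches the four spares vol status reports
```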
MPHA is correct, but Operations Manager tells a different story: it shows 6 spare disks for 01-A. It is displaying spare disks twice... almost. I suspect that sjcdbfiler01-a and phxdbfiler03-a are reporting extra drives in OM. I am going to report this to the DFM forum and may have to open a case with support.
Operations Manager: - http://sjcfilermon01.sjc.ebay.com:8080/dfm/report/view/disks-spare/72?group=1145
sjcdbfiler01-A spare disk 0b.41 A90A NETAPP X269_HGEMI01TSSX ATA 2 9 847827
sjcdbfiler01-A spare disk 0d.49 A90A NETAPP X269_HGEMI01TSSX ATA 3 1 847827
sjcdbfiler01-A spare disk 5a.19 A90A NETAPP X269_HGEMI01TSSX ATA 1 3 847827
sjcdbfiler01-A spare disk 5b.41 A90A NETAPP X269_HGEMI01TSSX ATA 2 9 847827
sjcdbfiler01-A spare disk 5c.43 A90A NETAPP X269_HGEMI01TSSX ATA 2 11 847827
sjcdbfiler01-A spare disk 5d.49 A90A NETAPP X269_HGEMI01TSSX ATA 3 1 847827
sjcdbfiler01-B spare disk 0f.41 A90A NETAPP X269_HGEMI01TSSX ATA 2 9 847827
sjcdbfiler01-B spare disk 0g.43 A90A NETAPP X269_HGEMI01TSSX ATA 2 11 847827
sjcdbfiler01-B spare disk 0h.49 NA01 NETAPP X269_SMOOS01TSSX ATA 3 1 847827
sjcdbfiler01-B spare disk 6a.19 A90A NETAPP X269_HGEMI01TSSX ATA 1 3 847827
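A rough illustration of the double-count (not how OM works internally): deduplicate the OM rows above by node, shelf, and bay. The 01-A rows for 5b.41 and 5d.49 land on the same shelf/bay as 0b.41 and 0d.49 — they are the secondary MPHA paths of disks already listed.

```python
from collections import defaultdict

# (node, device, shelf, bay) transcribed from the OM spare-disk report above.
rows = [
    ("01-A", "0b.41", 2, 9),  ("01-A", "0d.49", 3, 1),
    ("01-A", "5a.19", 1, 3),  ("01-A", "5b.41", 2, 9),
    ("01-A", "5c.43", 2, 11), ("01-A", "5d.49", 3, 1),
    ("01-B", "0f.41", 2, 9),  ("01-B", "0g.43", 2, 11),
    ("01-B", "0h.49", 3, 1),  ("01-B", "6a.19", 1, 3),
]

unique = defaultdict(set)
for node, _device, shelf, bay in rows:
    unique[node].add((shelf, bay))   # collapse both paths to one physical slot

print({node: len(slots) for node, slots in unique.items()})
# {'01-A': 4, '01-B': 4} -- 01-A's 6 OM rows are really 4 physical spares
```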
One last note: we may want to consider an upgrade to OM 4.0.x ( I will look into what “x” is ) to ensure we keep up with OM releases and future Data ONTAP versions.