Incorrect number of disks showing on operations manager

Hi all,

I have a weird issue with Operations Manager: the total number of disks it shows is incorrect. My filer has only 28 disks assigned, but Operations Manager shows 50. When I view the report of all disks, it shows duplicated disks. For example, shelf 1, bay 2 appears twice, but under two different disk names: 0a.18 and 0d.18.

I attached a screen capture. As shown, most of the disks appear twice.

I ran the "aggr status -r" command on my filer, and it shows the correct number of disks:

Aggregate aggr0 (online, raid_dp) (block checksums)
  Plex /aggr0/plex0 (online, normal, active, pool0)
    RAID group /aggr0/plex0/rg0 (normal)

      RAID Disk Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      --------- ------  ------------- ---- ---- ---- ----- --------------    --------------
      dparity   0d.16   0d    1   0   FC:B   0  ATA   7200 847555/1735794176 847827/1736350304
      parity    0d.17   0d    1   1   FC:B   0  ATA   7200 847555/1735794176 847827/1736350304
      data      0a.18   0a    1   2   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304
      data      0a.19   0a    1   3   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304
      data      0d.20   0d    1   4   FC:B   0  ATA   7200 847555/1735794176 847827/1736350304
      data      0a.21   0a    1   5   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304
      data      0d.22   0d    1   6   FC:B   0  ATA   7200 847555/1735794176 847827/1736350304
      data      0d.32   0d    2   0   FC:B   0  ATA   7200 847555/1735794176 847827/1736350304
      data      0d.33   0d    2   1   FC:B   0  ATA   7200 847555/1735794176 847827/1736350304
      data      0a.34   0a    2   2   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304
      data      0a.35   0a    2   3   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304
      data      0d.36   0d    2   4   FC:B   0  ATA   7200 847555/1735794176 847827/1736350304
      data      0a.37   0a    2   5   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304

Aggregate aggr1 (online, raid_dp) (block checksums)
  Plex /aggr1/plex0 (online, normal, active, pool0)
    RAID group /aggr1/plex0/rg0 (normal)

      RAID Disk Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      --------- ------  ------------- ---- ---- ---- ----- --------------    --------------
      dparity   0b.48   0b    3   0   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304
      parity    0b.49   0b    3   1   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304
      data      0c.50   0c    3   2   FC:B   0  ATA   7200 847555/1735794176 847827/1736350304
      data      0c.51   0c    3   3   FC:B   0  ATA   7200 847555/1735794176 847827/1736350304
      data      0b.52   0b    3   4   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304
      data      0b.53   0b    3   5   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304
      data      0c.54   0c    3   6   FC:B   0  ATA   7200 847555/1735794176 847827/1736350304
      data      0b.64   0b    4   0   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304
      data      0c.65   0c    4   1   FC:B   0  ATA   7200 847555/1735794176 847827/1736350304
      data      0c.66   0c    4   2   FC:B   0  ATA   7200 847555/1735794176 847827/1736350304
      data      0b.67   0b    4   3   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304
      data      0c.68   0c    4   4   FC:B   0  ATA   7200 847555/1735794176 847827/1736350304
      data      0c.69   0c    4   5   FC:B   0  ATA   7200 847555/1735794176 847827/1736350304
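To double-check the disk count in output like the above without counting rows by hand, here is a minimal Python sketch. It assumes the column layout shown in this thread (RAID role, device, HA port, shelf, bay, ...) and only recognizes "dparity", "parity" and "data" rows, so spare disks are not counted. It is an illustration, not a NetApp tool.

```python
import re
from collections import Counter

# Excerpt of the "aggr status -r" output above (only a few rows kept for brevity).
SAMPLE = """\
Aggregate aggr0 (online, raid_dp) (block checksums)
      dparity   0d.16   0d    1   0   FC:B   0  ATA   7200 847555/1735794176 847827/1736350304
      parity    0d.17   0d    1   1   FC:B   0  ATA   7200 847555/1735794176 847827/1736350304
      data      0a.18   0a    1   2   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304
Aggregate aggr1 (online, raid_dp) (block checksums)
      dparity   0b.48   0b    3   0   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304
      parity    0b.49   0b    3   1   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304
"""

# A disk row starts with the RAID role, then device, HA port, shelf and bay.
ROW = re.compile(r"^\s*(dparity|parity|data)\s+(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s")
AGGR = re.compile(r"^Aggregate\s+(\S+)")


def disks_per_aggregate(text):
    """Return (disk rows per aggregate, set of distinct (shelf, bay) locations)."""
    counts = Counter()
    shelf_bays = set()
    aggr = None
    for line in text.splitlines():
        m = AGGR.match(line)
        if m:
            aggr = m.group(1)  # rows that follow belong to this aggregate
            continue
        m = ROW.match(line)
        if m and aggr:
            counts[aggr] += 1
            shelf_bays.add((int(m.group(4)), int(m.group(5))))
    return counts, shelf_bays


counts, locations = disks_per_aggregate(SAMPLE)
print(dict(counts), len(locations))
```

Fed the full output from this post instead of the excerpt, the same function would find 13 disks in aggr0 and 13 in aggr1, all at distinct shelf/bay positions.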

Does anyone have an idea about this? Is it a problem with the disk configuration or disk assignment on my filer? I have other filers in Operations Manager, but only this filer reports the wrong number of disks.

Thanks,

Jason

Re: Incorrect number of disks showing on operations manager

Hi Jason,

Which version of Operations Manager are you running, and does this one filer have software-based disk ownership / Multipath HA configured? Are your other filers configured for Multipath HA?

Richard

Re: Incorrect number of disks showing on operations manager

Hi Richard,

Operations Manager is running 3.7.1.

It should be software-based disk ownership, as I see this message from my filer:

Fri Jan 15 21:38:22 GMT [diskown.isEnabled:info]: software ownership has been enabled for this system

Or is there a command I can use to check whether it is software-based disk ownership or Multipath HA?

Thanks,
Jason

Re: Incorrect number of disks showing on operations manager

You can show disk ownership using:

disk show -v

You can see if you have two paths to each disk using:

storage show disk -p

I think you might be experiencing this bug:

http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=364066

because it sounds similar to a problem I've seen.

The bug is listed as fixed in 3.7D13, which I know was released after 3.7.1, so the fix may not be included in your current release.

Re: Incorrect number of disks showing on operations manager

Bugs Online also says it's fixed in 3.8.1, which is expected in a couple of weeks. It is also a GA release.

Regards

adai

Re: Incorrect number of disks showing on operations manager

"disk show -v" returns the correct disk ownership, and "storage show disk -p" shows both a primary and a secondary path to every disk.
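To illustrate why two paths per disk can roughly double a disk count, here is a hypothetical Python sketch (it is not how Operations Manager works internally). It assumes DS14-style FC loop IDs, where shelf N uses IDs 16*N through 16*N+13, so the loop ID after the dot maps to (shelf, bay) with divmod; this matches the "aggr status -r" output earlier in the thread (0a.18 is shelf 1, bay 2). It also assumes shelf IDs are unique across all loops, which holds for the shelves 1 to 4 shown there. A report keyed on device name counts 0a.18 and 0d.18 separately, while keying on physical location counts them once.

```python
def shelf_bay(device):
    """Map an FC device name such as '0a.18' to (shelf, bay).

    Assumption: DS14-style loop IDs (shelf N uses IDs 16*N .. 16*N+13).
    """
    _port, loop_id = device.split(".")
    return divmod(int(loop_id), 16)


def count_disks(device_names):
    """Return (entries keyed by device/path name, distinct physical disks).

    Assumption: shelf IDs are unique across loops, so (shelf, bay)
    identifies one physical disk regardless of which port it is seen on.
    """
    by_path = len(set(device_names))
    physical = len({shelf_bay(d) for d in device_names})
    return by_path, physical


# Hypothetical report in which each disk is listed once per path (A and B side).
report = ["0d.16", "0a.16", "0a.18", "0d.18"]
print(count_disks(report))
```

Here the four path entries collapse to two physical disks, which is the same pattern Jason describes for shelf 1, bay 2.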

I will check the bug and see whether an upgrade fixes it. Thanks.

Re: Incorrect number of disks showing on operations manager

What is your diskmon monitoring interval?

According to the workaround in Bugs Online, setting this interval to more than 1 hour overcomes this issue.

If it is currently less than 60 minutes, can you try setting it to 75 or 90 minutes?

Regards

adai

Re: Incorrect number of disks showing on operations manager

That definitely shows that you have a multipath-enabled system. Have you noticed any recent changes in your filer's multipath status, such as multipath changing to single path? Check your logs for missing-disk warnings.

There is a workaround mentioned in the bug report. Try following it.

Thanks

Daniel

Re: Incorrect number of disks showing on operations manager

Even if you follow the workaround by setting diskmoninterval to 1 hour 15 minutes, you may still see duplicate entries in reports for a window of at least 1 hour.

A few internal BURTs were filed on disk multipathing in Operations Manager. All of these BURT fixes have gone into DFM 3.8.1 and DFM 4.0.

These DFM versions provide a complete solution for all your disk multipathing problems.

Regards,

Saravanan

Re: Incorrect number of disks showing on operations manager

Hi all,

I just found that my Operations Manager is now returning the correct number of disks. I didn't deliberately do anything to fix it. What I did was a cluster failover, booting the controller into maintenance mode, and checking all the disk ownership. I don't know whether these actions are related to the fix.

But anyway, thanks everyone for your suggestions.

Thanks,

Jason