Active IQ Unified Manager Discussions

DFM is not picking all the events from Filer

MIP_GMT01

I have recently noticed that DFM does not pick up and log all events from one or more filers. For example, in the messages file I can see that several LUNs in volume dbaclu01pr_uk went offline this morning, but DFM did not pick this up, so no alert was generated. Any ideas, guys?

The following is from the messages file; attached is a screenshot of the DFM events view for this filer.

Thu Jul 18 08:53:10 BST [FASNOD11PR:wafl.write.fail.spcres:warning]: Write failed to file with space reservations due to lack of disk space in volume dbaclu01pr_uk (guarantee disabled, inode 30996, offset 33481302016, len 65536). 

Thu Jul 18 08:53:10 BST [FASNOD11PR:callhome.tgt.lun.nospc:CRITICAL]: Call home for LUN OUT OF SPACE 

Thu Jul 18 08:53:10 BST [FASNOD11PR:scsitarget.lun.noSpace:error]: LUN '/vol/dbaclu01pr_uk/dbaclu01pr_uk_tlogs/dbaclu01pr_uk_tlogs.lun' has run out of space. 

Thu Jul 18 08:53:10 BST [FASNOD11PR:lun.offline:warning]: LUN /vol/dbaclu01pr_uk/dbaclu01pr_uk_tlogs/dbaclu01pr_uk_tlogs.lun has been taken offline 

Thu Jul 18 08:53:15 BST [FASNOD11PR:wafl.write.fail.spcres:warning]: Write failed to file with space reservations due to lack of disk space in volume dbaclu01pr_uk (guarantee disabled, inode 100, offset 343216128, len 4096). 

Thu Jul 18 08:53:15 BST [FASNOD11PR:scsitarget.lun.noSpace:error]: LUN '/vol/dbaclu01pr_uk/dbaclu01pr_uk_mntpnt/dbaclu01pr_uk_mntpnt.lun' has run out of space. 

Thu Jul 18 08:53:15 BST [FASNOD11PR:lun.offline:warning]: LUN /vol/dbaclu01pr_uk/dbaclu01pr_uk_mntpnt/dbaclu01pr_uk_mntpnt.lun has been taken offline 

Thu Jul 18 08:53:21 BST [FASNOD11PR:wafl.write.fail.spcres:warning]: Write failed to file with space reservations due to lack of disk space in volume dbaclu01pr_uk (guarantee disabled, inode 28602, offset 3222380544, len 4096). 

Thu Jul 18 08:53:21 BST [FASNOD11PR:scsitarget.lun.noSpace:error]: LUN '/vol/dbaclu01pr_uk/dbaclu01pr_uk_user/dbaclu01pr_uk_user.lun' has run out of space. 

Thu Jul 18 08:53:21 BST [FASNOD11PR:lun.offline:warning]: LUN /vol/dbaclu01pr_uk/dbaclu01pr_uk_user/dbaclu01pr_uk_user.lun has been taken offline 

Thu Jul 18 08:55:01 BST [FASNOD11PR:wafl.vol.autoSize.fail:info]: Unable to grow volume 'trpapp01dr_applog' to recover space: Volume cannot be grown beyond maximum growth limit 

Thu Jul 18 08:55:30 BST [FASNOD11PR:wafl.write.fail.spcres:warning]: Write failed to file with space reservations due to lack of disk space in volume dbaclu01pr_uk (guarantee disabled, inode 19236, offset 3156412416, len 31232). 

Thu Jul 18 08:55:30 BST [FASNOD11PR:scsitarget.lun.noSpace:error]: LUN '/vol/dbaclu01pr_uk/dbaclu01pr_uk_system/dbaclu01pr_uk_system.lun' has run out of space. 

Thu Jul 18 08:55:30 BST [FASNOD11PR:lun.offline:warning]: LUN /vol/dbaclu01pr_uk/dbaclu01pr_uk_system/dbaclu01pr_uk_system.lun has been taken offline 
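To see exactly which filer events DFM would need to have picked up, it can help to pull the event name, severity and message out of each messages-file line. Here is a minimal Python sketch. It assumes every relevant line follows the `Day Mon DD HH:MM:SS TZ [host:event:severity]: message` layout shown above. `FilerEvent` and `parse_messages` are hypothetical names and are not part of DFM:

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple


class FilerEvent(NamedTuple):
    timestamp: str  # left as text, e.g. "Thu Jul 18 08:53:10 BST"
    host: str       # controller name, e.g. "FASNOD11PR"
    event: str      # event name, e.g. "lun.offline"
    severity: str   # e.g. "info", "warning", "error", "CRITICAL"
    message: str


# Assumed layout: "<Day> <Mon> <DD> <HH:MM:SS> <TZ> [<host>:<event>:<severity>]: <message>"
LINE_RE = re.compile(
    r"^(?P<timestamp>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \w+) "
    r"\[(?P<host>[^:\]]+):(?P<event>[^:\]]+):(?P<severity>[^:\]]+)\]: "
    r"(?P<message>.*?)\s*$"
)


def parse_messages(lines: Iterable[str]) -> list[FilerEvent]:
    """Return a FilerEvent for every line matching the assumed layout; other lines are skipped."""
    events = []
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            events.append(FilerEvent(**match.groupdict()))
    return events


if __name__ == "__main__":
    sample = [
        "Thu Jul 18 08:53:10 BST [FASNOD11PR:scsitarget.lun.noSpace:error]: LUN "
        "'/vol/dbaclu01pr_uk/dbaclu01pr_uk_tlogs/dbaclu01pr_uk_tlogs.lun' has run out of space. ",
        "",
        "Thu Jul 18 08:53:10 BST [FASNOD11PR:lun.offline:warning]: LUN "
        "/vol/dbaclu01pr_uk/dbaclu01pr_uk_tlogs/dbaclu01pr_uk_tlogs.lun has been taken offline ",
    ]
    events = parse_messages(sample)
    # How many times each event name occurred
    print(Counter(e.event for e in events))
    # The LUN-offline events that DFM would be expected to report
    for e in events:
        if e.event == "lun.offline":
            print(e.timestamp, e.host, e.message)
```

In practice you would pass it the lines of a local copy of the filer's messages file. You could then compare the lun.offline and scsitarget.lun.noSpace entries against the events DFM actually shows for that filer.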


kryan

DFM/UM should have detected and logged an event for an offline volume.

I recommend that you open a support case to determine the cause of the monitoring failure.

Troubleshooting this will require the following; it would be good to have it ready when you open the case:

1) "dfm report view events" CLI output

2) "dfm report view events-history" CLI output

3) DFMDC

4) "dfm host diag CONTROLLER" CLI output, run against the affected controller
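If you expect to collect items 1, 2 and 4 more than once, you can script them. Here is a minimal Python sketch. It assumes it runs on the DFM server, that the `dfm` CLI is on the PATH, and that each command prints its report to standard output. `build_commands`, `collect` and the output file names are hypothetical. DFMDC is a separate collector and is left out:

```python
import subprocess
from pathlib import Path
from typing import Callable


def build_commands(controller: str) -> dict[str, list[str]]:
    """Map an output file name to the dfm command line whose output it should hold."""
    return {
        "events.txt": ["dfm", "report", "view", "events"],
        "events-history.txt": ["dfm", "report", "view", "events-history"],
        f"host-diag-{controller}.txt": ["dfm", "host", "diag", controller],
    }


def collect(
    controller: str,
    outdir: Path,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> list[Path]:
    """Run each dfm command and save its stdout plus stderr to a file in outdir.

    `runner` defaults to subprocess.run; it is a parameter only so the
    function can be exercised without a real DFM installation.
    """
    outdir.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, argv in build_commands(controller).items():
        result = runner(argv, capture_output=True, text=True)
        path = outdir / filename
        path.write_text(result.stdout + result.stderr)
        written.append(path)
    return written
```

For example, `collect("FASNOD11PR", Path("dfm-case"))` would run the three commands and write their output to files in a `dfm-case` directory, ready to attach to the case.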

DFM will not log volume autosize events unless you have followed the steps in this KB:

How to configure DFM Alerts for Volume Autosize, Maxdirsize Reach, or Snapshot Autodeletion

Since I notice you are using the legacy OM UI, I want to point out that all versions of DFM prior to 5.0 will reach end of support on 12/01/2013:

https://support.netapp.com/info/web/ECMP1147223.html

Thanks,

Kevin

arunchak

Hi,

Can you paste the "dfm host diag <filer ip>" output here?

Also, can you check the dfmmonitor log (/opt/NTAPdfm/log/dfmmonitor.log) for errors?
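Here is a quick Python sketch for that check, filtered to one controller. It assumes only that the log is line-oriented text and treats any line containing "error" (case-insensitive) as a hit. `find_errors` is a hypothetical helper, and the default path is the one quoted above:

```python
import sys
from pathlib import Path
from typing import Iterator, Optional

DEFAULT_LOG = Path("/opt/NTAPdfm/log/dfmmonitor.log")


def find_errors(log_path: Path, host: Optional[str] = None) -> Iterator[tuple[int, str]]:
    """Yield (line number, line) for lines containing 'error', case-insensitively.

    If `host` is given, a line must also contain that host name (case-insensitive).
    """
    with log_path.open(encoding="utf-8", errors="replace") as log:
        for lineno, line in enumerate(log, start=1):
            lowered = line.lower()
            if "error" not in lowered:
                continue
            if host and host.lower() not in lowered:
                continue
            yield lineno, line.rstrip("\n")


if __name__ == "__main__":
    # Usage: python find_errors.py [LOG_PATH] [HOST]
    path = Path(sys.argv[1]) if len(sys.argv) > 1 else DEFAULT_LOG
    host = sys.argv[2] if len(sys.argv) > 2 else None
    if path.is_file():
        for lineno, line in find_errors(path, host):
            print(f"{lineno}: {line}")
    else:
        print(f"Log file not found: {path}", file=sys.stderr)
```

Saved as find_errors.py (any name will do), it could be run as `python find_errors.py /opt/NTAPdfm/log/dfmmonitor.log FASNOD11PR` to show only the error lines that mention that controller.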

Thanks,

Arun
