Active IQ Unified Manager Discussions
Hi
I am experiencing issues with qtree monitoring. Our application teams rely on alerts being cleared once they free up space in a qtree. I was running DFM 4.0.2 on a physical server with qtreeMonInterval set to 30 minutes, but recently upgraded to DFM 5.1 on a VM. I logged the issue with NetApp, and they suggested setting the values back to their defaults, which will not work in our environment. The other suggestions I applied were:
dfm option set snapDeltaMonitorEnabled=no
dfm option set is EnableWGChecks=No
The server is performing fine, so I don't think there is a CPU or memory bottleneck:
Relevant details from DFM diag:
Management Station
Version 5.1.0.15008 (5.1)
Executable Type 64-bit
Serial Number 1-50-000225
Edition Standard edition of DataFabric Manager server
Data ONTAP Operating Mode 7-Mode
Administrator Name
Host Name
Host IP Address
Host Full Name
Node limit 250 (currently managing 17)
Operating System Microsoft Windows Server 2008 R2 Service Pack 1 (Build 7601) x64 based
CPU Count 4
System Memory 16383 MB (load: 22%)
Installation Directory C:/Program Files/NetApp/DataFabric Manager/DFM
20.6 GB free (51.4%)
Perf Data Directory e:\perfdata
Data Export Directory C:/Program Files/NetApp/DataFabric Manager/DFM/dataExport
Database Backup Directory e:/data
Reports Archival Directory e:\reports
Database Directory e:/data
256 GB free (63.9%)
Database Log Directory e:/data
256 GB free (63.9%)
Licensed Features DataFabric Manager server: installed
Object Counts
Object Type Count
Administrator 31
Aggregate 57
App Policy 2
Array LUNs 0
Backup jobs started within last week 0
Clusters 0
Configuration 6
Datasets containing generic app objects 0
Datasets containing generic app objects and with storage service attached 0
Datasets with application policy attached 0
Datasets with storage service attached 0
Disks 2112
DP Managed OSSV Rels 0
DP Managed SnapMirror Rels 0
DP Managed SnapVault Rels 0
DP Policy 37
DP Schedule 19
DP Throttle 1
FCP Target 32
Host 32
Initiator Group 58
Interface 163
Lun Path 295
Mgmt Station 1
Mirror 57
Mount jobs started within last week 0
Network 10
OSSV Directory 30
OSSV Hosts 2
OSSV Rels 0
Primary Storage Systems 2
Prov Policy 16
Qtree 5384
Qtrees in DP Managed QtreeSnapMirror Rel 0
Qtrees in DP Managed SnapVault Rel 0
Qtrees in QtreeSnapMirror Rel 10
Qtrees in SnapVault Rel 0
QuotaUser 21194
report schedule 6
Resource Group 121
Resource Pool 2
Restore jobs started within last week 0
Role 34
schedule 5
Secondary Storage Systems 2
SnapMirror Rels 59
SnapVault Rels 0
Storage Service 16
Unmount jobs started within last week 0
UserQuota 15683
vFilers 13
Virtual Servers 0
Volume 552
Volumes in DP Managed VolumeSnapMirror Rel 0
Volumes in VolumeSnapMirror Rel 104
Zapi Hosts 17
Monitoring Timestamps
Timestamp Name      Status  Interval    Default     Last Updated  Status  Error if older than ...
ccTimestamp         Normal  4 hours     4 hours     28 Feb 06:40
cfTimestamp         Normal  5 minutes   5 minutes   28 Feb 10:40  Normal  28 Feb 10:35
clusterTimestamp    Error   Off         15 minutes  Not updated (clusterMonInterval is set to Off)
cpuTimestamp        Normal  5 minutes   5 minutes   28 Feb 10:40  Normal  28 Feb 10:35
dfTimestamp         Normal  30 minutes  30 minutes  28 Feb 10:39  Normal  28 Feb 10:10
diskTimestamp       Normal  4 hours     4 hours     28 Feb 10:40  Normal  28 Feb 06:41
envTimestamp        Normal  5 minutes   5 minutes   28 Feb 10:41  Normal  28 Feb 10:36
fsTimestamp         Normal  15 minutes  15 minutes  28 Feb 10:40  Normal  28 Feb 10:26
hostPingTimestamp   Normal  1 minute    1 minute    28 Feb 10:41  Normal  28 Feb 10:40
ifTimestamp         Normal  15 minutes  15 minutes  28 Feb 10:41  Normal  28 Feb 10:26
licenseTimestamp    Normal  4 hours     4 hours     28 Feb 10:38  Normal  28 Feb 06:41
lunTimestamp        Normal  30 minutes  30 minutes  28 Feb 10:38  Normal  28 Feb 10:11
opsTimestamp        Normal  10 minutes  10 minutes  28 Feb 10:41  Normal  28 Feb 10:31
qtreeTimestamp      Error   1 hour      8 hours     28 Feb 09:41
rbacTimestamp       Normal  1 day       1 day       27 Feb 18:11  Normal  27 Feb 10:41
userQuotaTimestamp  Normal  1 day       1 day       28 Feb 10:22  Normal  27 Feb 10:41
sanhostTimestamp    Error   6 hours     5 minutes   28 Feb 04:41
snapmirrorTimestamp Error   2 hours     30 minutes  28 Feb 10:33  Normal  28 Feb 08:41
snapshotTimestamp   Normal  30 minutes  30 minutes  28 Feb 10:38  Normal  28 Feb 10:11
statusTimestamp     Normal  10 minutes  10 minutes  28 Feb 10:41  Normal  28 Feb 10:31
sysInfoTimestamp    Normal  1 hour      1 hour      28 Feb 10:38  Normal  28 Feb 09:41
svTimestamp         Error   6 hours     30 minutes  28 Feb 08:29  Normal  28 Feb 04:41
svMonTimestamp      Normal  8 hours     8 hours     28 Feb 07:14  Normal  28 Feb 02:41
xmlQtreeTimestamp   Error   1 hour      8 hours     28 Feb 10:32  Normal  28 Feb 09:41
vFilerTimestamp     Normal  1 hour      1 hour      28 Feb 10:32  Normal  28 Feb 09:41
vserverTimestamp    Error   Off         1 hour      Not updated (vserverMonInterval is set to Off)
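In the rows above, the first Status column reads Error for every monitor whose Interval differs from its Default, so it seems to flag a non-default setting rather than a failed sample. A minimal Python sketch for pulling those rows out of a saved copy of the diag output follows. The sample rows are copied from the table; the regular expression and the function name `non_default_intervals` are illustrative assumptions, not part of DFM.

```python
import re

# A few rows copied from the Monitoring Timestamps table above.
ROWS = """\
dfTimestamp Normal 30 minutes 30 minutes 28 Feb 10:39 Normal 28 Feb 10:10
qtreeTimestamp Error 1 hour 8 hours 28 Feb 09:41
sanhostTimestamp Error 6 hours 5 minutes 28 Feb 04:41
xmlQtreeTimestamp Error 1 hour 8 hours 28 Feb 10:32 Normal 28 Feb 09:41
clusterTimestamp Error Off 15 minutes Not updated ( clusterMonInterval is set to Off )
"""

# An interval is either "Off" or "<number> <unit>" (assumed from the rows shown).
DURATION = r"(?:Off|\d+ (?:minute|hour|day)s?)"
ROW_RE = re.compile(rf"^(\w+)\s+(\w+)\s+({DURATION})\s+({DURATION})\b")


def non_default_intervals(text):
    """Return (name, interval, default) for rows where the two differ."""
    result = []
    for line in text.splitlines():
        match = ROW_RE.match(line.strip())
        if match is None:
            continue  # header or unrecognised line
        name, _status, interval, default = match.groups()
        if interval != default:
            result.append((name, interval, default))
    return result


for name, interval, default in non_default_intervals(ROWS):
    print(f"{name}: interval {interval}, default {default}")
```

Applied to the full table, this would list the monitors that have been tuned away from their defaults, which is a useful starting point when comparing a 5.1 installation with the old 4.0.2 settings.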
Database
monitordb.db 1.71 GB
dbFileVersion 10
ConnCount 46 connections
MaxCacheSize 8388384 KBytes
CurrentCacheSize 1623080 KBytes
PeakCacheSize 1623080 KBytes
PageSize 8192 Bytes
DP Job Information
Job State Count
Jobs Running 0
Jobs Completed Total 1
Jobs Aborted Total 0
Jobs Aborting Total 0
Jobs Completed Today 0
Jobs Aborted Today 0
Jobs Aborting Today 0
Dataset Protection Status
Protection State Count
Protected 0
Unprotected 0
Event Counts
Table Count
Events 212990
Current Events 201401
Abnormal Events 5931
Event Type Counts
Event Type Count
userquota.kbytes 36898
userquota.kbytes.soft.limit 36090
userquota.files.soft.limit 36090
userquota.files 36090
sm.delete 9259
qtree.kbytes 7754
qtree.files 6844
user.email 5998
perf:cifs:cifs_latency 5064
qtree.growthrate 4404
Version 5.1.0.15008 (5.1)
Excerpts from the dfmmonitor.log
Feb 28 08:46:58 [dfmwatchdog: INFO]: [5108:0x500]: dbsrv11 up 1.0 days, mem = 1.13 GB, cpu = 28.9%, db = 1.71 GB, log = 11.6 MB
Feb 28 09:17:31 [dfmwatchdog: INFO]: [5108:0x500]: dbsrv11 up 1.0 days, mem = 1.15 GB, cpu = 18.4%, db = 1.71 GB, log = 13.1 MB
Feb 28 09:35:12 [dfmwatchdog: INFO]: [5108:0x500]: dfmmonitor up 1.0 days, mem = 142 MB, cpu = 27.0%
Feb 28 09:40:08 [dfmwatchdog: INFO]: [5108:0x500]: dfmmonitor up 1.0 days, mem = 93.7 MB, cpu = 27.1%
Feb 28 09:41:16 [dfmwatchdog: INFO]: [5108:0x500]: dfmmonitor up 1.0 days, mem = 104 MB, cpu = 26.9%
Feb 28 09:45:30 [dfmwatchdog: INFO]: [5108:0x500]: dfmmonitor up 1.1 days, mem = 117 MB, cpu = 5.5%
Feb 28 09:47:10 [dfmwatchdog: INFO]: [5108:0x500]: dbsrv11 up 1.1 days, mem = 1.15 GB, cpu = 25.2%, db = 1.71 GB, log = 14.6 MB
Feb 28 10:11:00 [dfmwatchdog: INFO]: [5108:0x500]: dfmmonitor up 1.1 days, mem = 78.9 MB, cpu = 29.5%
Feb 28 10:11:48 [dfmwatchdog: INFO]: [5108:0x500]: dfmmonitor up 1.1 days, mem = 88.8 MB, cpu = 26.7%
Feb 28 10:12:14 [dfmwatchdog: INFO]: [5108:0x500]: dfmmonitor up 1.1 days, mem = 99.7 MB, cpu = 27.0%
Feb 28 10:12:30 [dfmwatchdog: INFO]: [5108:0x500]: dfmmonitor up 1.1 days, mem = 113 MB, cpu = 28.8%
Feb 28 10:17:37 [dfmwatchdog: INFO]: [5108:0x500]: dfmmonitor up 1.1 days, mem = 136 MB, cpu = 12.2%
Feb 28 10:17:53 [dfmwatchdog: INFO]: [5108:0x500]: dbsrv11 up 1.1 days, mem = 1.15 GB, cpu = 42.0%, db = 1.71 GB, log = 16.2 MB
Feb 28 10:18:08 [dfmwatchdog: INFO]: [5108:0x500]: dfmmonitor up 1.1 days, mem = 104 MB, cpu = 17.0%
Feb 28 10:18:45 [dfmwatchdog: INFO]: [5108:0x500]: dfmmonitor up 1.1 days, mem = 116 MB, cpu = 4.5%
Feb 28 10:19:54 [dfmwatchdog: INFO]: [5108:0x500]: dfmmonitor up 1.1 days, mem = 129 MB, cpu = 5.8%
Feb 28 10:31:03 [dfmwatchdog: INFO]: [5108:0x500]: dfmmonitor up 1.1 days, mem = 115 MB, cpu = 26.1%
Feb 28 10:33:47 [dfmwatchdog: INFO]: [5108:0x500]: dfmmonitor up 1.1 days, mem = 134 MB, cpu = 28.9%
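To check the claim that there is no CPU or memory bottleneck, the watchdog lines can be summarised per process. The following is a rough Python sketch, assuming the line layout seen in the excerpt above. The regular expression and the helper names `parse_watchdog` and `peak_cpu` are hypothetical, and the sample lines are copied from the log.

```python
import re

# Lines copied from the dfmmonitor.log excerpt above.
LOG = """\
Feb 28 09:45:30 [dfmwatchdog: INFO]: [5108:0x500]: dfmmonitor up 1.1 days, mem = 117 MB, cpu = 5.5%
Feb 28 09:47:10 [dfmwatchdog: INFO]: [5108:0x500]: dbsrv11 up 1.1 days, mem = 1.15 GB, cpu = 25.2%, db = 1.71 GB, log = 14.6 MB
Feb 28 10:17:53 [dfmwatchdog: INFO]: [5108:0x500]: dbsrv11 up 1.1 days, mem = 1.15 GB, cpu = 42.0%, db = 1.71 GB, log = 16.2 MB
Feb 28 10:33:47 [dfmwatchdog: INFO]: [5108:0x500]: dfmmonitor up 1.1 days, mem = 134 MB, cpu = 28.9%
"""

# Pattern assumed from the excerpt; other log lines are simply skipped.
LINE_RE = re.compile(
    r"^(\w{3} +\d+ [\d:]+) \[dfmwatchdog: \w+\]: \[[^\]]+\]: "
    r"(\w+) up [\d.]+ days, mem = ([\d.]+) (MB|GB), cpu = ([\d.]+)%"
)


def parse_watchdog(text):
    """Extract time, process name, memory (MB) and CPU % from each line."""
    samples = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue
        when, process, mem, unit, cpu = match.groups()
        mem_mb = float(mem) * (1024 if unit == "GB" else 1)
        samples.append(
            {"time": when, "process": process, "mem_mb": mem_mb, "cpu_pct": float(cpu)}
        )
    return samples


def peak_cpu(samples):
    """Return the highest CPU percentage seen for each process."""
    peaks = {}
    for sample in samples:
        name = sample["process"]
        peaks[name] = max(peaks.get(name, 0.0), sample["cpu_pct"])
    return peaks


print(peak_cpu(parse_watchdog(LOG)))
```

Running this over the whole log file rather than the four sample lines would show whether dfmmonitor or dbsrv11 ever stays near a CPU ceiling around the times the qtree samples are due.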
Although the status of xmlQtreeTimestamp is showing Normal, the DFM graph is way off.
Hi
Do you monitor your appliances with SNMPv1 or SNMPv3?
I recommend using SNMPv3:
Set SNMPv3 as the preferred SNMP version, as this improves response times for SNMP communication between DFM and the storage systems.
----> https://kb.netapp.com/library/CUSTOMER/solutions/1013266_%20OC5_Oct17_2012.pdf
How to set it up:
https://communities.netapp.com/docs/DOC-9314
regards
Thomas
I changed the monitoring to use SNMPv3 yesterday and left it overnight to see if it made a difference, but I'm still seeing large lags on the xmlQtreeTimestamp. The weird thing is that the timestamp is not consistent across the controllers, and it seems to be in sync with the userQuotaTimestamp.
I'm sure something has changed in v5.1 as I don't recall this issue happening in 4.0.2. We are getting apps teams contacting us on a daily basis now to acknowledge a qtree alert as they have performed housekeeping on the share and are still receiving the alert a few hours later.
Any help appreciated
Hi
Just thought I would post an update to this issue. I've rolled back to using DFM 4.0.2 on a VM and have not seen any delays with the qtree monitoring samples, so something must have changed in v5.1. Unless someone identifies the actual issue, I don't think I will be in a hurry to move to v5.1.