Active IQ Unified Manager Discussions
Howdy-
We're using DFM and Protection Manager to manage about 20 filers with SnapMirror replication and OSSV. Most of the time the processor is more or less pinned, so the server is difficult to manage. DFM database backups fail as a matter of course, and reporting is very slow.
The server is a Windows Server 2003 VM on ESX with 4 vCPUs and 4 GB of RAM. The management database is about 1.4 GB, which seems larger than the typical sizes referenced on the NOW site.
1) Does anyone think 1.4 GB is large for the DFM monitor db?
2) Has anyone had a similar experience with performance?
Thanks,
Scott
Hi Scott,
I think you are hitting the following bug.
http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=301280
Kindly upgrade to 3.7D4 or later. My suggestion would be to upgrade to 3.8.1, which is the current GA release.
Regards
adai
Thanks for the post. That's great information, but unfortunately I double-checked my version and it's 3.7.1:
3.7.1.6014 (3.7.1)
Scott
Since the post said 3.7, that was my first take. Since you are on 3.7.1, it's not that product bug.
It looks like your VM is not able to handle the load. As evilensky suggested, can you check your ESX host and the performance of this VM?
Regards
adai
Message was edited by: Adaikkappan Arumugam (changed the title to reflect the correct version, i.e. 3.7.1)
What does esxtop say about CPU scheduling efficiency? For workloads that are poorly threaded, adding vCPUs can actually increase physical CPU contention, which leads to poor performance for the virtual machine:
Additional instrumentation from outside the VM would probably help. I might be chasing rabbits, but in my experience multiple vCPUs always raise eyebrows.
http://communities.vmware.com/docs/DOC-5240
Thanks for your thoughts, folks. In troubleshooting this, we did try dropping the CPUs one by one to see what would happen.
Basically, it got slower.
When I run esxtop I don't see a %CSTP counter, but I do see that it's the busiest guest on the ESX host, which is a pretty good trick considering the other guests. None of the other boxes are complaining or slow, either.
Is 1.4 GB large for a DFM db? I'm looking for tuning options to see how much of this can be spooled to RAM or if there are other steps that can be taken.
I wonder if there are DFM tasks that can be disabled, deprioritized, or rescheduled.
Have you changed any default monitoring interval options?
Can you paste the output of dfm diag, especially the object counts and monitoring interval sections?
Regards
adai
We added a disk and CPU counter and lowered some of the retention schedules. In truth, the performance issue predates the new counters, but I'm open. If we have to get rid of them, then so be it.
Details attached - excerpts below.
Thanks again. I really appreciate you (both) digging into this with me.
Scott
Management Station
Version                    3.7.1.6014 (3.7.1) 
                           15.5 GB free (51.7%) 
Licensed Features          Operations Manager: installed
                           Protection Manager: installed
 
Object Counts
Object Type                    Count
Administrator                  6 
Aggregate                      28 
Configuration                  1 
Data Set                       28 
Directory                      89 
Disk                           565 
DP Policy                      38 
DP Schedule                    55 
DP Throttle                    4 
Host                           72 
Initiator Group                58 
Interface                      118 
Lun Path                       325 
Mgmt Station                   1 
Mirror                         92 
Network                        15 
OSSV Directory                 944 
OSSV Hosts                     27 
Primary Storage Systems        3 
Qtree                          154 
report schedule                1 
Resource Group                 39 
Resource Pool                  7 
Role                           27 
schedule                       2 
Secondary Storage Systems      18 
SnapMirror Rels                204 
SnapVault Rels                 89 
Storage Set                    66 
UserQuota                      0 
vFilers                        0 
Volume                         490 
Zapi Hosts                     44
Monitoring Timestamps
Timestamp Name       Interval     Default      Last Updated Error if older than ...
cacheTimestamp       5 minutes   5 minutes      16 Mar 14:05 
ccTimestamp          2 hours     4 hours        16 Mar 12:10 
cfTimestamp          2 minutes   5 minutes    16 Mar 14:10 Normal 16 Mar 14:08 
cpuTimestamp         5 minutes   5 minutes    16 Mar 14:10 Normal 16 Mar 14:05 
dfTimestamp          15 minutes  30 minutes   16 Mar 14:09 Normal 16 Mar 13:55 
diskTimestamp        2 hours     4 hours      16 Mar 14:04 Normal 16 Mar 12:10 
envTimestamp         5 minutes   5 minutes    16 Mar 14:10 Normal 16 Mar 14:05 
fcTimestamp          5 minutes   5 minutes    16 Mar 14:10 Normal 16 Mar 14:05 
fsTimestamp          15 minutes  15 minutes   16 Mar 14:10 Normal 16 Mar 13:55 
hostPingTimestamp    1 minute    1 minute     16 Mar 14:10 Normal 16 Mar 14:09 
ifTimestamp          5 minutes   15 minutes   16 Mar 14:10 Normal 16 Mar 14:05 
licenseTimestamp     4 hours     4 hours      16 Mar 13:41 Normal 16 Mar 10:10 
lunTimestamp         30 minutes  30 minutes   16 Mar 14:10 Normal 16 Mar 13:40 
opsTimestamp         10 minutes  10 minutes   16 Mar 14:10 Normal 16 Mar 14:00 
qtreeTimestamp       8 hours     8 hours        16 Mar 06:10 
rbacTimestamp        1 day       1 day        16 Mar 12:18 Normal 15 Mar 14:10 
userQuotaTimestamp   1 day       1 day        16 Mar 14:07 Normal 15 Mar 14:10 
sanhostTimestamp     5 minutes   5 minutes    16 Mar 14:10 Normal 16 Mar 14:05 
snapmirrorTimestamp  10 minutes  30 minutes   16 Mar 14:10 Normal 16 Mar 14:00 
snapshotTimestamp    30 minutes  30 minutes   16 Mar 13:59 Normal 16 Mar 13:40 
statusTimestamp      5 minutes   10 minutes   16 Mar 14:10 Normal 16 Mar 14:05 
sysInfoTimestamp     15 minutes  1 hour       16 Mar 14:10 Normal 16 Mar 13:55 
svTimestamp          30 minutes  30 minutes   16 Mar 14:10 Normal 16 Mar 13:40 
svMonTimestamp       8 hours     8 hours      16 Mar 07:05 Normal 16 Mar 06:10 
xmlQtreeTimestamp    8 hours     8 hours      16 Mar 14:09 Normal 16 Mar 06:10 
vFilerTimestamp      1 hour      1 hour         16 Mar 13:10
Database
monitordb.db               1.75 GB 
dbFileVersion              9
ConnCount                  33 connections 
MaxCacheSize               392184 KBytes 
CurrentCacheSize           350280 KBytes 
PeakCacheSize              392184 KBytes 
PageSize                   8192 Bytes
Logs
discovery      247 KB 16 Mar 14:04 
DFMMonitor     2.34 MB 16 Mar 14:00 
DFMEvent       1.07 MB 16 Mar 14:03 
DFMServer      1.84 MB 16 Mar 13:56 
DFMScheduler   401 KB 16 Mar 09:00 
DFMWatchDog    300 KB 16 Mar 14:10 
dfm            587 KB 16 Mar 14:10 
sybase         9.23 MB 16 Mar 14:10 
pingmon        264 KB 15 Mar 16:34 
audit          613 KB 16 Mar 14:10
Services
sql        Normal Started 
http       Normal Started 
eventd     Normal Started 
monitor    Normal Started 
scheduler  Normal Started 
server     Normal Started 
watchdog   Normal Started
Time Since Confirmed Alive
Eventd     6 seconds 
Monitor    3 seconds 
Scheduler  15 seconds 
Server     14 seconds 
Watchdog   3 seconds
You are running the following monitors more aggressively than their default values.
Can you set them back to the defaults and see whether CPU utilization is still very high?
Go to Web UI Control Center->Options->Monitoring, set the following to blank values, and update.
ccTimestamp
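The comparison above can be done mechanically. Below is a minimal Python sketch, assuming the row layout of the Monitoring Timestamps table Scott pasted from dfm diag (monitor name, then interval and default, each as `<number> <unit>`). It parses those rows and lists every monitor whose interval is shorter than its default. The regex, `UNIT_MINUTES`, and the function names are hypothetical helpers for illustration, not part of DFM.

```python
import re

# Minutes per unit as the units appear in the dfm diag table (assumed set).
UNIT_MINUTES = {
    "minute": 1, "minutes": 1,
    "hour": 60, "hours": 60,
    "day": 1440, "days": 1440,
}

# Hypothetical row pattern: name, "<n> <unit>" interval, "<n> <unit>" default.
ROW = re.compile(r"^(\w+Timestamp)\s+(\d+)\s+(\w+)\s+(\d+)\s+(\w+)")


def to_minutes(value: str, unit: str) -> int:
    """Convert a pair such as ('2', 'hours') to minutes."""
    return int(value) * UNIT_MINUTES[unit]


def aggressive_monitors(diag_text: str) -> list[tuple[str, int, int]]:
    """Return (name, interval_min, default_min) for monitors faster than default."""
    flagged = []
    for line in diag_text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # header or any non-table line
        name, iv, iu, dv, du = m.groups()
        interval, default = to_minutes(iv, iu), to_minutes(dv, du)
        if interval < default:
            flagged.append((name, interval, default))
    return flagged


# Rows copied from the dfm diag output in this thread.
SAMPLE = """\
Timestamp Name       Interval     Default      Last Updated Error if older than ...
cacheTimestamp       5 minutes   5 minutes      16 Mar 14:05
ccTimestamp          2 hours     4 hours        16 Mar 12:10
cfTimestamp          2 minutes   5 minutes    16 Mar 14:10 Normal 16 Mar 14:08
cpuTimestamp         5 minutes   5 minutes    16 Mar 14:10 Normal 16 Mar 14:05
dfTimestamp          15 minutes  30 minutes   16 Mar 14:09 Normal 16 Mar 13:55
diskTimestamp        2 hours     4 hours      16 Mar 14:04 Normal 16 Mar 12:10
envTimestamp         5 minutes   5 minutes    16 Mar 14:10 Normal 16 Mar 14:05
fcTimestamp          5 minutes   5 minutes    16 Mar 14:10 Normal 16 Mar 14:05
fsTimestamp          15 minutes  15 minutes   16 Mar 14:10 Normal 16 Mar 13:55
hostPingTimestamp    1 minute    1 minute     16 Mar 14:10 Normal 16 Mar 14:09
ifTimestamp          5 minutes   15 minutes   16 Mar 14:10 Normal 16 Mar 14:05
licenseTimestamp     4 hours     4 hours      16 Mar 13:41 Normal 16 Mar 10:10
lunTimestamp         30 minutes  30 minutes   16 Mar 14:10 Normal 16 Mar 13:40
opsTimestamp         10 minutes  10 minutes   16 Mar 14:10 Normal 16 Mar 14:00
qtreeTimestamp       8 hours     8 hours        16 Mar 06:10
rbacTimestamp        1 day       1 day        16 Mar 12:18 Normal 15 Mar 14:10
userQuotaTimestamp   1 day       1 day        16 Mar 14:07 Normal 15 Mar 14:10
sanhostTimestamp     5 minutes   5 minutes    16 Mar 14:10 Normal 16 Mar 14:05
snapmirrorTimestamp  10 minutes  30 minutes   16 Mar 14:10 Normal 16 Mar 14:00
snapshotTimestamp    30 minutes  30 minutes   16 Mar 13:59 Normal 16 Mar 13:40
statusTimestamp      5 minutes   10 minutes   16 Mar 14:10 Normal 16 Mar 14:05
sysInfoTimestamp     15 minutes  1 hour       16 Mar 14:10 Normal 16 Mar 13:55
svTimestamp          30 minutes  30 minutes   16 Mar 14:10 Normal 16 Mar 13:40
svMonTimestamp       8 hours     8 hours      16 Mar 07:05 Normal 16 Mar 06:10
xmlQtreeTimestamp    8 hours     8 hours      16 Mar 14:09 Normal 16 Mar 06:10
vFilerTimestamp      1 hour      1 hour         16 Mar 13:10
"""

if __name__ == "__main__":
    for name, interval, default in aggressive_monitors(SAMPLE):
        print(f"{name}: every {interval} min (default {default} min)")
```

Run against the full table, this flags eight monitors: ccTimestamp plus cfTimestamp, dfTimestamp, diskTimestamp, ifTimestamp, snapmirrorTimestamp, statusTimestamp, and sysInfoTimestamp.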
I'm gonna say that fixed it. Setting the CC (Conformance Checking) interval to a larger value means the task runs less frequently and doesn't tie up the processor as often. While we were at it, we set some other tasks to run less frequently too. Thanks for the help, I really appreciate it.
Scott
Replying via email clipped off part of my post.
These monitors are also running more frequently than their defaults:
cfTimestamp            Cluster Failover Monitoring Interval
dfTimestamp            Disk Free Space Monitoring Interval
diskTimestamp          Disk Monitoring Interval
ifTimestamp            Interface Monitoring Interval
snapmirrorTimestamp    SnapMirror Monitoring Interval
statusTimestamp        Global Status Monitoring Interval
sysInfoTimestamp       System Information Monitoring Interval
Bring them back to default values.
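The effect Scott describes follows from simple arithmetic: a monitor with an interval of t minutes runs 1440 / t times a day. The short Python sketch below applies that to the eight intervals from the dfm diag output that are shorter than their defaults. The per-run cost of each monitor is not known here, so run counts are only a rough proxy for load; `INTERVALS`, `runs_per_day`, and `saved_runs` are hypothetical names used for illustration.

```python
MINUTES_PER_DAY = 24 * 60

# (current interval, default interval) in minutes, from the dfm diag table.
INTERVALS = {
    "ccTimestamp": (120, 240),
    "cfTimestamp": (2, 5),
    "dfTimestamp": (15, 30),
    "diskTimestamp": (120, 240),
    "ifTimestamp": (5, 15),
    "snapmirrorTimestamp": (10, 30),
    "statusTimestamp": (5, 10),
    "sysInfoTimestamp": (15, 60),
}


def runs_per_day(interval_minutes: int) -> int:
    """Number of monitor runs in 24 hours at a fixed interval."""
    return MINUTES_PER_DAY // interval_minutes


def saved_runs(intervals: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Runs per day avoided, per monitor, by reverting to the default interval."""
    return {
        name: runs_per_day(cur) - runs_per_day(default)
        for name, (cur, default) in intervals.items()
    }


if __name__ == "__main__":
    for name, saved in saved_runs(INTERVALS).items():
        cur, default = INTERVALS[name]
        print(f"{name}: {runs_per_day(cur)}/day now, "
              f"{runs_per_day(default)}/day at default ({saved} fewer)")
```

For ccTimestamp this is 12 runs a day at 2 hours versus 6 at the 4-hour default. Short-interval monitors such as cfTimestamp (720 versus 288 runs a day) account for most of the avoided runs, but whether that translates into CPU savings depends on how expensive each monitor's run is, which this sketch does not model.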
Regards
adai