Active IQ Unified Manager Discussions
Howdy-
We're using DFM and Protection Manager to manage about 20 filers with SnapMirror replication and OSSV. Most of the time the processor is more or less pinned, so the server is difficult to manage. DFM database backups fail as a matter of course. Reporting is very slow.
The box is installed on a Windows 2k3 ESX VM with 4 vCPUs and 4 GB of RAM. The management database is about 1.4 GB, which seems larger than the typical sizes referenced at the NOW site.
1) Anyone think 1.4 GB is large for the DFM monitor db?
2) Anyone have similar experience with performance?
Thanks,
Scott
Hi Scott,
I think you are hitting the following bug.
http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=301280
Kindly upgrade to 3.7D4 or later. My suggestion would be to upgrade to 3.8.1, which is the current GA release.
Regards
adai
Thanks for the post. That's great information but unfortunately I double checked my version and it's 3.7.1...
3.7.1.6014 (3.7.1)
Scott
Since the post said 3.7, that was my first take. Since you are on 3.7.1, it's not that product bug.
It looks like your VM is not able to handle the load. As evilensky suggested, can you check your ESX host and the performance of this VM?
Regards
adai
Message was edited by: Adaikkappan Arumugam. Changed the title to reflect the correct version (i.e., 3.7.1).
What does esxtop say about CPU scheduling efficiency? For poorly threaded workloads, adding vCPUs can actually increase physical CPU contention and degrade the virtual machine's performance.
Additional instrumentation from outside the VM would probably help. I might be chasing rabbits, but multiple vCPUs always raise eyebrows based on past experience.
http://communities.vmware.com/docs/DOC-5240
Thanks for your thoughts, folks. In troubleshooting this we did try dropping the CPUs one by one to see what would happen.
Basically, it got slower.
When I run esxtop I don't see a %CSTP counter, but I do see that it's the busiest guest on the ESX host, which is a pretty good trick considering the other guests. None of the other boxes are complaining or slow, either.
Is 1.4 GB large for a DFM db? I'm looking for tuning options to see how much of this can be spooled to RAM or if there are other steps that can be taken.
I wonder if there aren't DFM tasks that can be disabled, de-prioritized, or scheduled.
Have you changed any of the default monitoring interval options?
Can you paste a copy of the dfm diag output, especially the object counts and monitoring interval parts?
Regards
adai
We added a disk and CPU counter and lowered some of the retention schedules. In truth, the performance issue predates the new counters, but I'm open. If we have to get rid of them, then so be it.
Details attached - excerpts below.
Thanks again. I really appreciate you (both) digging into this with me.
Scott
Management Station
Version 3.7.1.6014 (3.7.1)
15.5 GB free (51.7%)
Licensed Features Operations Manager: installed
Protection Manager: installed
Object Counts
Object Type Count
Administrator 6
Aggregate 28
Configuration 1
Data Set 28
Directory 89
Disk 565
DP Policy 38
DP Schedule 55
DP Throttle 4
Host 72
Initiator Group 58
Interface 118
Lun Path 325
Mgmt Station 1
Mirror 92
Network 15
OSSV Directory 944
OSSV Hosts 27
Primary Storage Systems 3
Qtree 154
report schedule 1
Resource Group 39
Resource Pool 7
Role 27
schedule 2
Secondary Storage Systems 18
SnapMirror Rels 204
SnapVault Rels 89
Storage Set 66
UserQuota 0
vFilers 0
Volume 490
Zapi Hosts 44
Monitoring Timestamps
Timestamp Name       Interval    Default     Last Updated  Status  Error if older than ...
cacheTimestamp       5 minutes   5 minutes                         16 Mar 14:05
ccTimestamp          2 hours     4 hours                           16 Mar 12:10
cfTimestamp          2 minutes   5 minutes   16 Mar 14:10  Normal  16 Mar 14:08
cpuTimestamp         5 minutes   5 minutes   16 Mar 14:10  Normal  16 Mar 14:05
dfTimestamp          15 minutes  30 minutes  16 Mar 14:09  Normal  16 Mar 13:55
diskTimestamp        2 hours     4 hours     16 Mar 14:04  Normal  16 Mar 12:10
envTimestamp         5 minutes   5 minutes   16 Mar 14:10  Normal  16 Mar 14:05
fcTimestamp          5 minutes   5 minutes   16 Mar 14:10  Normal  16 Mar 14:05
fsTimestamp          15 minutes  15 minutes  16 Mar 14:10  Normal  16 Mar 13:55
hostPingTimestamp    1 minute    1 minute    16 Mar 14:10  Normal  16 Mar 14:09
ifTimestamp          5 minutes   15 minutes  16 Mar 14:10  Normal  16 Mar 14:05
licenseTimestamp     4 hours     4 hours     16 Mar 13:41  Normal  16 Mar 10:10
lunTimestamp         30 minutes  30 minutes  16 Mar 14:10  Normal  16 Mar 13:40
opsTimestamp         10 minutes  10 minutes  16 Mar 14:10  Normal  16 Mar 14:00
qtreeTimestamp       8 hours     8 hours                           16 Mar 06:10
rbacTimestamp        1 day       1 day       16 Mar 12:18  Normal  15 Mar 14:10
userQuotaTimestamp   1 day       1 day       16 Mar 14:07  Normal  15 Mar 14:10
sanhostTimestamp     5 minutes   5 minutes   16 Mar 14:10  Normal  16 Mar 14:05
snapmirrorTimestamp  10 minutes  30 minutes  16 Mar 14:10  Normal  16 Mar 14:00
snapshotTimestamp    30 minutes  30 minutes  16 Mar 13:59  Normal  16 Mar 13:40
statusTimestamp      5 minutes   10 minutes  16 Mar 14:10  Normal  16 Mar 14:05
sysInfoTimestamp     15 minutes  1 hour      16 Mar 14:10  Normal  16 Mar 13:55
svTimestamp          30 minutes  30 minutes  16 Mar 14:10  Normal  16 Mar 13:40
svMonTimestamp       8 hours     8 hours     16 Mar 07:05  Normal  16 Mar 06:10
xmlQtreeTimestamp    8 hours     8 hours     16 Mar 14:09  Normal  16 Mar 06:10
vFilerTimestamp      1 hour      1 hour                            16 Mar 13:10
Database
monitordb.db 1.75 GB
dbFileVersion 9
ConnCount 33 connections
MaxCacheSize 392184 KBytes
CurrentCacheSize 350280 KBytes
PeakCacheSize 392184 KBytes
PageSize 8192 Bytes
Logs
discovery 247 KB 16 Mar 14:04
DFMMonitor 2.34 MB 16 Mar 14:00
DFMEvent 1.07 MB 16 Mar 14:03
DFMServer 1.84 MB 16 Mar 13:56
DFMScheduler 401 KB 16 Mar 09:00
DFMWatchDog 300 KB 16 Mar 14:10
dfm 587 KB 16 Mar 14:10
sybase 9.23 MB 16 Mar 14:10
pingmon 264 KB 15 Mar 16:34
audit 613 KB 16 Mar 14:10
Services
sql Normal Started
http Normal Started
eventd Normal Started
monitor Normal Started
scheduler Normal Started
server Normal Started
watchdog Normal Started
Time Since Confirmed Alive
Eventd 6 seconds
Monitor 3 seconds
Scheduler 15 seconds
Server 14 seconds
Watchdog 3 seconds
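On the question of how much of the database could be held in RAM, the Database figures above allow a rough check. This is only a sketch: it assumes the sizes are in binary units (1 GB = 1024 * 1024 KB) and that MaxCacheSize is the ceiling for the database cache. The function name cache_coverage is made up for illustration and is not part of DFM.

```python
# Figures copied from the "Database" excerpt of the dfm diag output above.
DB_SIZE_GB = 1.75          # monitordb.db
MAX_CACHE_KB = 392184      # MaxCacheSize
CURRENT_CACHE_KB = 350280  # CurrentCacheSize

# Assumption: binary units, so 1 GB = 1024 * 1024 KB.
KB_PER_GB = 1024 * 1024


def cache_coverage(cache_kb, db_size_gb):
    """Return the fraction of the database file that a cache of this size could hold."""
    return cache_kb / (db_size_gb * KB_PER_GB)


print(f"MaxCacheSize covers {cache_coverage(MAX_CACHE_KB, DB_SIZE_GB):.1%} of monitordb.db")
print(f"CurrentCacheSize covers {cache_coverage(CURRENT_CACHE_KB, DB_SIZE_GB):.1%} of monitordb.db")
```

With these numbers the cache ceiling works out to roughly a fifth of the database file, so most reads that miss the cache go to disk. Whether that is the bottleneck here is a separate question.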
You are running the following monitors more aggressively than their default values.
Can you set them back to the defaults and see whether CPU utilization is still very high?
In the web UI, go to Control Center->Options->Monitoring, set the following to blank values, and update.
ccTimestamp
I'm gonna say that fixed it. Setting the CC (Conformance Checking) to a bigger number means that the task runs less frequently and doesn't consume the processor as often. While we were at it we set some other tasks to run less frequently. Thanks for the help, I really appreciate it.
Scott
Replying via mail clipped off part of the post.
These monitors are also running more frequently than their defaults.
cfTimestamp: Cluster Failover Monitoring Interval
dfTimestamp: Disk Free Space Monitoring Interval
diskTimestamp: Disk Monitoring Interval
ifTimestamp: Interface Monitoring Interval
snapmirrorTimestamp: SnapMirror Monitoring Interval
statusTimestamp: Global Status Monitoring Interval
sysInfoTimestamp: System Information Monitoring Interval
Bring them back to default values.
Regards
adai
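For anyone who wants to repeat this check on their own dfm diag output, the comparison of Interval against Default can be sketched in a few lines. This is a minimal sketch, not a DFM tool: the rows are typed in by hand from the Monitoring Timestamps excerpt above (only a subset is shown), and the helper names to_minutes and aggressive_monitors are made up for illustration.

```python
import re

# (name, configured interval, default interval), copied by hand from the
# "Monitoring Timestamps" excerpt in this thread. Only a subset is listed.
ROWS = [
    ("ccTimestamp", "2 hours", "4 hours"),
    ("cfTimestamp", "2 minutes", "5 minutes"),
    ("cpuTimestamp", "5 minutes", "5 minutes"),
    ("dfTimestamp", "15 minutes", "30 minutes"),
    ("diskTimestamp", "2 hours", "4 hours"),
    ("ifTimestamp", "5 minutes", "15 minutes"),
    ("snapmirrorTimestamp", "10 minutes", "30 minutes"),
    ("statusTimestamp", "5 minutes", "10 minutes"),
    ("sysInfoTimestamp", "15 minutes", "1 hour"),
]

UNIT_MINUTES = {"minute": 1, "hour": 60, "day": 1440}


def to_minutes(text):
    """Convert strings such as '15 minutes' or '1 hour' to a number of minutes."""
    match = re.fullmatch(r"(\d+) (minute|hour|day)s?", text.strip())
    if match is None:
        raise ValueError(f"unrecognized interval: {text!r}")
    return int(match.group(1)) * UNIT_MINUTES[match.group(2)]


def aggressive_monitors(rows):
    """Return (name, factor) for monitors that run more often than their default."""
    result = []
    for name, interval, default in rows:
        current, standard = to_minutes(interval), to_minutes(default)
        if current < standard:
            # factor = how many times more often the monitor runs than by default
            result.append((name, standard / current))
    return result


for name, factor in aggressive_monitors(ROWS):
    print(f"{name}: runs {factor:g}x as often as the default")
```

On the rows above this flags ccTimestamp plus the seven monitors listed in this post, and leaves cpuTimestamp alone because its interval matches the default.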