Active IQ Unified Manager Discussions
Hi All,
I am currently setting up Ops Mgr to just monitor our filers (no protection set up) and have successfully got it alerting us for volume full events. I've set up an alert using snapmirror:out of date and have configured the discovered retention policies on our nightly mirrors to 24 hours.
All worked well on the first overnight run, in that it reported a lot of the mirrors as out of date as it was in the process of updating them (This was only to prove the alert. Once all is working OK, I'll exclude the monitoring between midnight and 06:00). I acknowledged the alerts but on the next run I didn't receive any alerts at all concerning the snapmirrors and there are no entries in the event log.
I also tested this out on a couple of hourly mirrors by changing the lag in the retention policy down to, say, 20 mins and it triggered an alert once only.
Any suggestions? Is there anything that I am missing?
As stated before, we are only using Ops Mgr for monitoring and are still relying on snapmirror.conf for scheduling. Obviously, alerts that only trigger once are not much good...
Thanks,
Simon.
Hi Simon,
Once you acknowledge an event, it is no longer treated as a violation, so it will not be alerted on again until there is a state change.
You can also configure an alarm in Operations Manager to repeat by specifying the "repeat-notify" option.
-KJag
Thanks kjag, that's how we have our volume full alerts set up: to repeat every 15 mins until acknowledged.
The issue here is that we can acknowledge them and they will then trigger again the next time the volume reaches the threshold, but the same process doesn't seem to work for the snapmirror:out of date alert. The only difference between the two is that we are using vol almost full, which has a severity of warning, as opposed to snapmirror:out of date, which has a severity of error, but this shouldn't matter IMO.
Hi Simon,
As kjag said: did the state change from SnapMirror out of date to nearly out of date or date OK? After you acknowledged the event, did the state of your SnapMirror relationship ever change to any of the states below,
other than out of date?
[root@ ~]# dfm eventtype list | grep -i sm.lag
snapmirror:date-ok Normal sm.lag
snapmirror:deleted Information sm.lag
snapmirror:nearly-out-of-date Warning sm.lag
snapmirror:out-of-date Error sm.lag
[root@ ~]#
If not, then no new event, and therefore no alert, will be triggered until there is a state change.
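As a rough illustration of that behaviour, here is a minimal Python sketch. It is not how DFM is implemented; the two thresholds are made-up values standing in for the lag thresholds configured in Operations Manager, and `classify` and `events_for` are hypothetical helper names. It only shows the idea that an event is raised when the state changes, not on every monitoring run.

```python
# Hypothetical thresholds in hours; the real values come from the lag
# thresholds configured in Operations Manager.
NEARLY_OUT_OF_DATE_HOURS = 18.0
OUT_OF_DATE_HOURS = 24.0


def classify(lag_hours):
    """Map a lag value to one of the sm.lag event names listed above."""
    if lag_hours >= OUT_OF_DATE_HOURS:
        return "snapmirror:out-of-date"
    if lag_hours >= NEARLY_OUT_OF_DATE_HOURS:
        return "snapmirror:nearly-out-of-date"
    return "snapmirror:date-ok"


def events_for(lag_samples):
    """Return the events raised for a series of lag samples.

    An event is raised only when the classified state differs from the
    previous one, so a relationship that stays out of date raises nothing new.
    """
    events = []
    previous = None
    for lag in lag_samples:
        state = classify(lag)
        if state != previous:
            events.append(state)
            previous = state
    return events


if __name__ == "__main__":
    # Lag climbs past the threshold, stays there, recovers, then climbs again:
    # the second out-of-date event appears only because date-ok was seen first.
    print(events_for([2.0, 25.0, 26.0, 27.0, 1.0, 25.0]))
    # The monitor never sees the lag recover: only one out-of-date event.
    print(events_for([25.0, 26.0, 30.0]))
```

In this model, a second out-of-date event can only appear after the monitor has recorded a different state in between.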
Regards
adai
Thanks, that's helped make things clearer.
I'm not getting any snapmirror:date-ok event in the logs once the snapmirror is updated and the lag goes back to normal, hence the alarm is not triggered when/if the snapmirror lags again.
Any ideas?
Simon.
Hi Simon,
Can you get the output of the following CLI commands for the SnapMirror relationship that is lagging?
dfm report view events
dfm report view events-history
dfm host diag <filer id/ip> (for both the source and destination filers of the SnapMirror relationship)
dfm version (to see which version of DFM is running)
Regards
adai
Adai,
Below is the output as requested. I've used just one vol as an example but it's the same result on all our SnapMirror relationships:
dfm report view events 5053
Severity    Event ID Event                   Triggered    Ack'ed By Ack'ed       Source ID Source
----------- -------- ----------------------- ------------ --------- ------------ --------- --------------------------
Error       18042    SnapMirror: Out of Date 24 Jan 01:56                        5053      DRFILER1:/vol_vm_mobapps1d
Information 17493    SnapMirror: Discovered  19 Jan 12:09                        5053      DRFILER1:/vol_vm_mobapps1d
dfm report view events-history 5053
Severity    Event ID Event                         Triggered    Ack'ed By Ack'ed       Deleted By Deleted      Source ID Source
----------- -------- ----------------------------- ------------ --------- ------------ ---------- ------------ --------- --------------------------
Error       18042    SnapMirror: Out of Date       24 Jan 01:56                                                5053      DRFILER1:/vol_vm_mobapps1d
Normal      17494    SnapMirror: Date Ok           19 Jan 12:09                                                5053      DRFILER1:/vol_vm_mobapps1d
Information 17493    SnapMirror: Discovered        19 Jan 12:09                                                5053      DRFILER1:/vol_vm_mobapps1d
Normal      16602    Volume Space Reserve OK       19 Jan 12:04                                                5053      DRFILER1:/vol_vm_mobapps1d
Normal      16601    Volume Next Snapshot Possible 19 Jan 12:04                                                5053      DRFILER1:/vol_vm_mobapps1d
Normal      16600    Volume First Snapshot OK      19 Jan 12:04                                                5053      DRFILER1:/vol_vm_mobapps1d
Normal      16599    Inodes Utilization Normal     19 Jan 12:04                                                5053      DRFILER1:/vol_vm_mobapps1d
Normal      16598    Volume Space Normal           19 Jan 12:04                                                5053      DRFILER1:/vol_vm_mobapps1d
Normal      15067    Scheduled Snapshots Enabled   19 Jan 12:03                                                5053      DRFILER1:/vol_vm_mobapps1d
Normal      15066    Volume Online                 19 Jan 12:03                                                5053      DRFILER1:/vol_vm_mobapps1d
You will notice that the only Snapmirror: Date OK event for this vol is from when it was initially discovered.
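For anyone wanting to run the same check across many relationships, the following Python sketch parses saved events-history text and lists the sources whose most recent SnapMirror lag event is not Date Ok. It rests on some assumptions: the row layout matches the output above with the Ack'ed and Deleted columns empty, a higher Event ID means a newer event, and `missing_recovery` is a hypothetical helper name rather than anything provided by DFM.

```python
import re

# Matches rows such as:
#   Error 18042 SnapMirror: Out of Date 24 Jan 01:56 5053 DRFILER1:/vol_vm_mobapps1d
# Assumption: the Ack'ed/Deleted columns are empty, so they collapse to whitespace.
ROW = re.compile(
    r"^(?P<severity>\w+)\s+(?P<event_id>\d+)\s+(?P<event>.+?)\s+"
    r"(?P<triggered>\d{1,2} \w{3} \d{2}:\d{2})\s+"
    r"(?P<source_id>\d+)\s+(?P<source>\S+)$"
)

# Three rows copied from the events-history output above.
SAMPLE = """\
Error 18042 SnapMirror: Out of Date 24 Jan 01:56 5053 DRFILER1:/vol_vm_mobapps1d
Normal 17494 SnapMirror: Date Ok 19 Jan 12:09 5053 DRFILER1:/vol_vm_mobapps1d
Information 17493 SnapMirror: Discovered 19 Jan 12:09 5053 DRFILER1:/vol_vm_mobapps1d
"""


def missing_recovery(report_text):
    """Return sources whose newest SnapMirror lag event is not 'Date Ok'.

    Assumption: a higher Event ID means a newer event.
    """
    newest = {}  # source -> (event_id, event name)
    for line in report_text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # header, separator, or unrecognised row
        event = match["event"]
        # Only lag-related SnapMirror events matter here.
        if not event.startswith("SnapMirror:") or event == "SnapMirror: Discovered":
            continue
        event_id = int(match["event_id"])
        source = match["source"]
        if source not in newest or event_id > newest[source][0]:
            newest[source] = (event_id, event)
    return sorted(
        source for source, (_, event) in newest.items()
        if event != "SnapMirror: Date Ok"
    )


if __name__ == "__main__":
    print(missing_recovery(SAMPLE))
```

Run against the three sample rows, this reports DRFILER1:/vol_vm_mobapps1d, because the newest lag event for that source is still Out of Date.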
dfm host diag filer1 - This is the source filer
Network Connectivity
IP Address xxx.xxx.xxx.xxx
Network xxx.xxx.xxx.xxx/16 (last searched 24 Jan 10:51)
DNS Aliases FILER1.simsl.com
DNS Addresses xxx.xxx.xxx.xxx
SNMP Version in Use SNMPv1
SNMPv1 Passed (78 ms)
SNMP Community public
SNMP sysName FILER1.simsl.com
SNMP sysObjectID .1.3.6.1.4.1.789.2.3 (Clustered Filer)
SNMP productId 1573839544
SNMPv3 Failed: No SNMPv3 username specified.
SNMPv3 Auth Protocol
SNMPv3 Privacy Enabled No
SNMPv3 Username
ICMP Echo Passed (0 ms)
HTTP Passed (0 ms)
NDMP Ping Passed (port 10000, 0 ms)
NDMP Connect Passed (1437 ms)
NDMP MD5 Passwd Check Passed
RSH Skipped (rshBinary is empty in global option)
SSH Failed: Login not set for storage system FILER1.simsl.com (3673).
RLM Skipped (hostLogin and hostRLMAddress are empty)
XML Skipped (hostLogin is empty)
Host Details
According to: DataFabric Manager server Host
Host Name FILER1.simsl.com FILER1.simsl.com
System ID 1573839544 1573839544
Model FAS3240 FAS3240
Type Clustered Storage System Clustered Storage System
OS Version 8.0.2 7-Mode 8.0.2 7-Mode
Revisions 350,8.0.1,2.1.1 350,8.0.1,2.1.1
Monitoring Timestamps
Timestamp Name Status Interval Default Last Updated Status Error if older than ...
ccTimestamp Normal 4 hours 4 hours 24 Jan 06:52
cfTimestamp Normal 5 minutes 5 minutes 24 Jan 10:51 Normal 24 Jan 10:47
clusterTimestamp Normal 15 minutes 15 minutes 24 Jan 10:37
cpuTimestamp Normal 5 minutes 5 minutes 24 Jan 10:49 Normal 24 Jan 10:47
dfTimestamp Error 15 minutes 30 minutes 24 Jan 10:45 Normal 24 Jan 10:37
diskTimestamp Normal 4 hours 4 hours 24 Jan 10:50 Normal 24 Jan 06:52
envTimestamp Normal 5 minutes 5 minutes 24 Jan 10:51 Normal 24 Jan 10:47
fsTimestamp Normal 15 minutes 15 minutes 24 Jan 10:44 Normal 24 Jan 10:37
hostPingTimestamp Normal 1 minute 1 minute 24 Jan 10:51 Normal 24 Jan 10:51
ifTimestamp Normal 15 minutes 15 minutes 24 Jan 10:44 Normal 24 Jan 10:37
licenseTimestamp Normal 4 hours 4 hours 24 Jan 10:41 Normal 24 Jan 06:52
lunTimestamp Normal 30 minutes 30 minutes 24 Jan 10:39 Normal 24 Jan 10:22
opsTimestamp Normal 10 minutes 10 minutes 24 Jan 10:50 Normal 24 Jan 10:42
qtreeTimestamp Normal 8 hours 8 hours 24 Jan 10:39 Normal 24 Jan 02:52
rbacTimestamp Normal 1 day 1 day 24 Jan 10:33 Normal 23 Jan 10:52
userQuotaTimestamp Normal 1 day 1 day 23 Jan 10:52
sanhostTimestamp Normal 5 minutes 5 minutes 24 Jan 10:47
snapmirrorTimestamp Error 5 minutes 30 minutes 24 Jan 10:51 Normal 24 Jan 10:47
snapshotTimestamp Normal 30 minutes 30 minutes 24 Jan 10:43 Normal 24 Jan 10:22
statusTimestamp Normal 10 minutes 10 minutes 24 Jan 10:40 Error 24 Jan 10:42
sysInfoTimestamp Normal 1 hour 1 hour 24 Jan 10:06 Normal 24 Jan 09:52
svTimestamp Normal 30 minutes 30 minutes 24 Jan 10:41 Normal 24 Jan 10:22
svMonTimestamp Normal 8 hours 8 hours 24 Jan 02:52
xmlQtreeTimestamp Normal 8 hours 8 hours 24 Jan 02:52
vFilerTimestamp Normal 1 hour 1 hour 24 Jan 10:08 Normal 24 Jan 09:52
vserverTimestamp Normal 1 hour 1 hour 24 Jan 09:52
Performance Advisor Checklist
perfAdvisorEnabled Passed
hostType Passed
hostRevision Passed
hostLogin Failed (hostLogin is empty)
perfAdvisorTransport Passed
dfm host diag filer2 - this is the destination filer
Network Connectivity
IP Address xxx.xxx.xxx.xxx
Network xxx.xxx.xxx.xxx/16 (last searched 24 Jan 10:54)
DNS Aliases FILER2.simsl.com
DNS Addresses xxx.xxx.xxx.xxx
SNMP Version in Use SNMPv1
SNMPv1 Passed (78 ms)
SNMP Community public
SNMP sysName FILER2.simsl.com
SNMP sysObjectID .1.3.6.1.4.1.789.2.3 (Clustered Filer)
SNMP productId 1573766421
SNMPv3 Failed: No SNMPv3 username specified.
SNMPv3 Auth Protocol
SNMPv3 Privacy Enabled No
SNMPv3 Username
ICMP Echo Passed (0 ms)
HTTP Passed (0 ms)
NDMP Ping Passed (port 10000, 0 ms)
NDMP Connect Passed (1437 ms)
NDMP MD5 Passwd Check Passed
RSH Skipped (rshBinary is empty in global option)
SSH Failed: Login not set for storage system FILER2.simsl.com (3675).
RLM Skipped (hostLogin and hostRLMAddress are empty)
XML Skipped (hostLogin is empty)
Host Details
According to: DataFabric Manager server Host
Host Name FILER2.simsl.com FILER2.simsl.com
System ID 1573766421 1573766421
Model FAS3240 FAS3240
Type Clustered Storage System Clustered Storage System
OS Version 8.0.2 7-Mode 8.0.2 7-Mode
Revisions 350,8.0.1,2.1.1 350,8.0.1,2.1.1
Monitoring Timestamps
Timestamp Name Status Interval Default Last Updated Status Error if older than ...
ccTimestamp Normal 4 hours 4 hours 24 Jan 06:54
cfTimestamp Normal 5 minutes 5 minutes 24 Jan 10:53 Normal 24 Jan 10:49
clusterTimestamp Normal 15 minutes 15 minutes 24 Jan 10:39
cpuTimestamp Normal 5 minutes 5 minutes 24 Jan 10:50 Normal 24 Jan 10:49
dfTimestamp Error 15 minutes 30 minutes 24 Jan 10:44 Normal 24 Jan 10:39
diskTimestamp Normal 4 hours 4 hours 24 Jan 09:02 Normal 24 Jan 06:54
envTimestamp Normal 5 minutes 5 minutes 24 Jan 10:50 Normal 24 Jan 10:49
fsTimestamp Normal 15 minutes 15 minutes 24 Jan 10:38 Warning 24 Jan 10:39
hostPingTimestamp Normal 1 minute 1 minute 24 Jan 10:54 Normal 24 Jan 10:53
ifTimestamp Normal 15 minutes 15 minutes 24 Jan 10:44 Normal 24 Jan 10:39
licenseTimestamp Normal 4 hours 4 hours 24 Jan 09:02 Normal 24 Jan 06:54
lunTimestamp Normal 30 minutes 30 minutes 24 Jan 10:47 Normal 24 Jan 10:24
opsTimestamp Normal 10 minutes 10 minutes 24 Jan 10:49 Normal 24 Jan 10:44
qtreeTimestamp Normal 8 hours 8 hours 24 Jan 04:26 Normal 24 Jan 02:54
rbacTimestamp Normal 1 day 1 day 23 Jan 12:08 Normal 23 Jan 10:54
userQuotaTimestamp Normal 1 day 1 day 23 Jan 10:54
sanhostTimestamp Normal 5 minutes 5 minutes 24 Jan 10:49
snapmirrorTimestamp Error 5 minutes 30 minutes 24 Jan 10:50 Normal 24 Jan 10:49
snapshotTimestamp Normal 30 minutes 30 minutes 24 Jan 10:47 Normal 24 Jan 10:24
statusTimestamp Normal 10 minutes 10 minutes 24 Jan 10:47 Normal 24 Jan 10:44
sysInfoTimestamp Normal 1 hour 1 hour 24 Jan 10:07 Normal 24 Jan 09:54
svTimestamp Normal 30 minutes 30 minutes 24 Jan 10:37 Normal 24 Jan 10:24
svMonTimestamp Normal 8 hours 8 hours 24 Jan 02:54
xmlQtreeTimestamp Normal 8 hours 8 hours 24 Jan 02:54
vFilerTimestamp Normal 1 hour 1 hour 24 Jan 10:07 Normal 24 Jan 09:54
vserverTimestamp Normal 1 hour 1 hour 24 Jan 09:54
Performance Advisor Checklist
perfAdvisorEnabled Passed
hostType Passed
hostRevision Passed
hostLogin Failed (hostLogin is empty)
perfAdvisorTransport Passed
dfm version
dfbm.exe 5.0.0.7636 (5.0)
dfdrm.exe 5.0.0.7636 (5.0)
dfpm.exe 5.0.0.7636 (5.0)
dfm.exe 5.0.0.7636 (5.0)
dfmcheck.exe 5.0.0.7636 (5.0)
dfmconfig.exe 5.0.0.7636 (5.0)
dfmconsole.exe 5.0.0.7636 (5.0)
dfmmonitor.exe 5.0.0.7636 (5.0)
dfmperf.exe 5.0.0.7636 (5.0)
dfmscheduler.exe 5.0.0.7636 (5.0)
dfmserver.exe 5.0.0.7636 (5.0)
dfmwatchdog.exe 5.0.0.7636 (5.0)
eventd.exe 5.0.0.7636 (5.0)
grapher.exe 5.0.0.7636 (5.0)
Hope this helps!
Regards,
Simon.
Hi Simon,
There is an out of date event on 24th Jan.
Error | 18042 | SnapMirror: Out of Date | 24 Jan 01:56 | 5053 | DRFILER1:/vol_vm_mobapps1d |
I also see that you have changed the SnapMirror monitoring interval from the default of 30 minutes to 5 minutes. Please reset it to the default; SnapMirror monitoring is a heavy monitor and does a lot of work.
This shows up as the Error status for SnapMirror monitoring in the dfm host diag output:
snapmirrorTimestamp Error 5 minutes 30 minutes 24 Jan 10:50 Normal 24 Jan 10:49
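If you want to spot such changes quickly, here is a small Python sketch that scans saved dfm host diag text for monitors whose Interval differs from the Default. It assumes the row layout shown in the Monitoring Timestamps section above (name, status, interval, default); `non_default_intervals` is a hypothetical helper name. In the output posted earlier, dfTimestamp also shows 15 minutes against a default of 30 minutes, so the sketch would flag that one too.

```python
import re

# Matches the start of rows such as:
#   snapmirrorTimestamp Error 5 minutes 30 minutes 24 Jan 10:50 Normal 24 Jan 10:49
# Assumption: columns are name, status, interval, default, in that order.
ROW = re.compile(
    r"^(?P<name>\w+Timestamp)\s+(?P<status>\w+)\s+"
    r"(?P<interval>\d+ \w+)\s+(?P<default>\d+ \w+)\b"
)

# Three rows copied from the destination filer's diag output above.
SAMPLE = """\
cpuTimestamp Normal 5 minutes 5 minutes 24 Jan 10:50 Normal 24 Jan 10:49
dfTimestamp Error 15 minutes 30 minutes 24 Jan 10:44 Normal 24 Jan 10:39
snapmirrorTimestamp Error 5 minutes 30 minutes 24 Jan 10:50 Normal 24 Jan 10:49
"""


def non_default_intervals(diag_text):
    """Return (name, interval, default) for monitors not at their default."""
    changed = []
    for line in diag_text.splitlines():
        match = ROW.match(line.strip())
        if match and match["interval"] != match["default"]:
            changed.append((match["name"], match["interval"], match["default"]))
    return changed


if __name__ == "__main__":
    for name, interval, default in non_default_intervals(SAMPLE):
        print(f"{name}: interval {interval}, default {default}")
```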
Regards
adai
adai,
As you will see from the event history, this is the first occurrence of this event. There is no subsequent date OK event.
I've set the snapmirror monitor back to 30 minutes as suggested and dfm diag is now showing as normal. I changed one of the hourly policies to trigger an event if the lag goes over 30 mins, ran a manual snapmirror update and then let the lag go over 30 mins. No events, either Snapmirror:out of date or date ok, were written to the log.
Going back to the original volume, you can see from the below that the volume is showing as out of date in the first screenshot of the Volume Details whereas the second screenshot of the retention policy shows a lag of 13.99 hours which is nowhere near the 1 day threshold.
Thanks,
Simon