Active IQ Unified Manager Discussions
Hi All,
I am currently setting up Ops Mgr to just monitor our filers (no protection set up) and have successfully got it alerting us for volume full events. I've set up an alert using snapmirror:out of date and have configured the discovered retention policies on our nightly mirrors to 24 hours.
All worked well on the first overnight run, in that it reported a lot of the mirrors as out of date as it was in the process of updating them (This was only to prove the alert. Once all is working OK, I'll exclude the monitoring between midnight and 06:00). I acknowledged the alerts but on the next run I didn't receive any alerts at all concerning the snapmirrors and there are no entries in the event log.
I also tested this out on a couple of hourly mirrors by changing the lag in the retention policy down to, say, 20 mins and it triggered an alert once only.
Any suggestions? Is there anything that I am missing?
As stated before, we are only using Ops Mgr for monitoring and are still relying on snapmirror.conf for scheduling. Obviously, alerts that only trigger once are not much good...
Thanks,
Simon.
Hi Simon,
Once you acknowledge the events, they are no longer treated as violations, and they will not be alerted again until there is a state change.
Also, in Operations Manager an alarm can be configured to repeat by specifying the "repeat-notify" option.
-KJag
Thanks kjag, that's how we have our volume full alerts set up: they repeat every 15 minutes until acknowledged.
The issue here is that we can acknowledge them and they will trigger again the next time the volume reaches the threshold, but the same process doesn't seem to work for the snapmirror:out of date alert. The only difference between the two is that vol almost full has a severity of warning, whereas snapmirror:out of date has a severity of error, but this shouldn't matter IMO.
Hi Simon,
As kjag said, did the state change from snapmirror out of date to nearly out of date or date ok? After you acknowledged the event, did the state of your snapmirror ever change to any of the states below
other than out of date?
[root@ ~]# dfm eventtype list | grep -i sm.lag
snapmirror:date-ok Normal sm.lag
snapmirror:deleted Information sm.lag
snapmirror:nearly-out-of-date Warning sm.lag
snapmirror:out-of-date Error sm.lag
[root@ ~]#
If not, then no new event, and therefore no alert, will be triggered until there is a state change.
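The behaviour described above can be illustrated with a small sketch. This is not DFM code; it is a hypothetical model (the function name `events_on_state_change` and the poll sequences are made up for illustration) of a monitor that raises an event only when the observed state differs from the previous one, using the state names from the `dfm eventtype list` output.

```python
from typing import Iterable, List, Optional, Tuple


def events_on_state_change(observations: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (poll_index, state) for every poll whose state differs from the previous poll.

    Hypothetical model only: it assumes one event per state transition,
    as described in the thread, and nothing else about DFM internals.
    """
    events: List[Tuple[int, str]] = []
    previous: Optional[str] = None
    for index, state in enumerate(observations):
        if state != previous:
            events.append((index, state))
            previous = state
    return events


# The relationship goes out of date and the recorded state never returns to date-ok:
stuck = ["date-ok", "out-of-date", "out-of-date", "out-of-date"]
# The relationship recovers between lagging periods:
recovering = ["date-ok", "out-of-date", "date-ok", "out-of-date"]

print(events_on_state_change(stuck))       # two events in total
print(events_on_state_change(recovering))  # one event per transition, four in total
```

In this model a second out-of-date event can only appear if a date-ok (or other) state is recorded in between, which is why the missing date-ok event matters.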
Regards
adai
Thanks, that has helped make things clearer.
I'm not getting any snapmirror:date-ok event in the logs once the snapmirror is updated and the lag goes back to normal, hence the alarm is not triggered when/if the snapmirror lags again.
Any ideas?
Simon.
Hi Simon,
Can you get the output of the following CLI commands for the snapmirror relationship that is lagging?
dfm report view events
dfm report view events-history
dfm host diag <filer id/ip> for the source and destination filers of the snapmirror relationship
dfm version, to know which version of dfm is running
Regards
adai
Adai,
Below is the output as requested. I've used just one vol as an example, but it's the same result on all our SnapMirror relationships:
dfm report view events 5053
Severity    Event ID Event                   Triggered    Ack'ed By Ack'ed       Source ID Source
----------- -------- ----------------------- ------------ --------- ------------ --------- --------------------------
Error       18042    SnapMirror: Out of Date 24 Jan 01:56                        5053      DRFILER1:/vol_vm_mobapps1d
Information 17493    SnapMirror: Discovered  19 Jan 12:09                        5053      DRFILER1:/vol_vm_mobapps1d
dfm report view events-history 5053
Severity    Event ID Event                         Triggered    Ack'ed By Ack'ed       Deleted By Deleted      Source ID Source
----------- -------- ----------------------------- ------------ --------- ------------ ---------- ------------ --------- --------------------------
Error       18042    SnapMirror: Out of Date       24 Jan 01:56                                                5053      DRFILER1:/vol_vm_mobapps1d
Normal      17494    SnapMirror: Date Ok           19 Jan 12:09                                                5053      DRFILER1:/vol_vm_mobapps1d
Information 17493    SnapMirror: Discovered        19 Jan 12:09                                                5053      DRFILER1:/vol_vm_mobapps1d
Normal      16602    Volume Space Reserve OK       19 Jan 12:04                                                5053      DRFILER1:/vol_vm_mobapps1d
Normal      16601    Volume Next Snapshot Possible 19 Jan 12:04                                                5053      DRFILER1:/vol_vm_mobapps1d
Normal      16600    Volume First Snapshot OK      19 Jan 12:04                                                5053      DRFILER1:/vol_vm_mobapps1d
Normal      16599    Inodes Utilization Normal     19 Jan 12:04                                                5053      DRFILER1:/vol_vm_mobapps1d
Normal      16598    Volume Space Normal           19 Jan 12:04                                                5053      DRFILER1:/vol_vm_mobapps1d
Normal      15067    Scheduled Snapshots Enabled   19 Jan 12:03                                                5053      DRFILER1:/vol_vm_mobapps1d
Normal      15066    Volume Online                 19 Jan 12:03                                                5053      DRFILER1:/vol_vm_mobapps1d
You will notice that the only Snapmirror: Date OK event for this vol is from when it was initially discovered.
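For anyone wanting to run the same check across many relationships, here is a rough sketch that scans pasted events-history rows and reports whether a Date Ok event was logged after the latest Out of Date event. It rests on assumptions: the row layout matches the output above with the Ack'ed and Deleted columns empty, and a higher event ID means a later event. The helper names `parse_rows` and `recovered_since_last_out_of_date` are made up for illustration and are not part of dfm.

```python
import re
from typing import List, Tuple

# Assumed row layout: severity, event ID, event name, "DD Mon HH:MM", source ID, source.
ROW = re.compile(
    r"^(\w+)\s+(\d+)\s+(.+?)\s+(\d{1,2} \w{3} \d{2}:\d{2})\s+(\d+)\s+(\S+)$"
)


def parse_rows(text: str) -> List[Tuple[int, str]]:
    """Extract (event_id, event_name) from each data row; header and ruler lines do not match."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append((int(match.group(2)), match.group(3)))
    return rows


def recovered_since_last_out_of_date(rows: List[Tuple[int, str]]) -> bool:
    """True if a Date Ok event has a higher ID than the newest Out of Date event.

    Assumption: event IDs increase over time.
    """
    out_ids = [eid for eid, name in rows if name == "SnapMirror: Out of Date"]
    if not out_ids:
        return True  # never went out of date, so nothing to recover from
    latest = max(out_ids)
    return any(name == "SnapMirror: Date Ok" and eid > latest for eid, name in rows)


sample = """\
Error 18042 SnapMirror: Out of Date 24 Jan 01:56 5053 DRFILER1:/vol_vm_mobapps1d
Normal 17494 SnapMirror: Date Ok 19 Jan 12:09 5053 DRFILER1:/vol_vm_mobapps1d
Information 17493 SnapMirror: Discovered 19 Jan 12:09 5053 DRFILER1:/vol_vm_mobapps1d
"""

rows = parse_rows(sample)
print(recovered_since_last_out_of_date(rows))  # False for the rows above
```

A result of False matches what is visible in the history: the only Date Ok event predates the Out of Date event.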
dfm host diag filer1 - This is the source filer
Network Connectivity
IP Address             xxx.xxx.xxx.xxx
Network                xxx.xxx.xxx.xxx/16 (last searched 24 Jan 10:51)
DNS Aliases            FILER1.simsl.com
DNS Addresses          xxx.xxx.xxx.xxx
SNMP Version in Use    SNMPv1
SNMPv1                 Passed (78 ms)
SNMP Community         public
SNMP sysName           FILER1.simsl.com
SNMP sysObjectID       .1.3.6.1.4.1.789.2.3 (Clustered Filer)
SNMP productId         1573839544
SNMPv3                 Failed: No SNMPv3 username specified.
SNMPv3 Auth Protocol
SNMPv3 Privacy Enabled No
SNMPv3 Username
ICMP Echo              Passed (0 ms)
HTTP                   Passed (0 ms)
NDMP Ping              Passed (port 10000, 0 ms)
NDMP Connect           Passed (1437 ms)
NDMP MD5 Passwd Check  Passed
RSH                    Skipped (rshBinary is empty in global option)
SSH                    Failed: Login not set for storage system FILER1.simsl.com (3673).
RLM                    Skipped (hostLogin and hostRLMAddress are empty)
XML                    Skipped (hostLogin is empty)
Host Details
According to:   DataFabric Manager server       Host
Host Name       FILER1.simsl.com               FILER1.simsl.com
System ID       1573839544                     1573839544
Model           FAS3240                        FAS3240
Type            Clustered Storage System       Clustered Storage System
OS Version      8.0.2 7-Mode                   8.0.2 7-Mode
Revisions       350,8.0.1,2.1.1                350,8.0.1,2.1.1
Monitoring Timestamps
Timestamp Name       Status   Interval     Default      Last Updated     Status   Error if older than ...
ccTimestamp          Normal   4 hours      4 hours                                24 Jan 06:52
cfTimestamp          Normal   5 minutes    5 minutes    24 Jan 10:51     Normal   24 Jan 10:47
clusterTimestamp     Normal   15 minutes   15 minutes                             24 Jan 10:37
cpuTimestamp         Normal   5 minutes    5 minutes    24 Jan 10:49     Normal   24 Jan 10:47
dfTimestamp          Error    15 minutes   30 minutes   24 Jan 10:45     Normal   24 Jan 10:37
diskTimestamp        Normal   4 hours      4 hours      24 Jan 10:50     Normal   24 Jan 06:52
envTimestamp         Normal   5 minutes    5 minutes    24 Jan 10:51     Normal   24 Jan 10:47
fsTimestamp          Normal   15 minutes   15 minutes   24 Jan 10:44     Normal   24 Jan 10:37
hostPingTimestamp    Normal   1 minute     1 minute     24 Jan 10:51     Normal   24 Jan 10:51
ifTimestamp          Normal   15 minutes   15 minutes   24 Jan 10:44     Normal   24 Jan 10:37
licenseTimestamp     Normal   4 hours      4 hours      24 Jan 10:41     Normal   24 Jan 06:52
lunTimestamp         Normal   30 minutes   30 minutes   24 Jan 10:39     Normal   24 Jan 10:22
opsTimestamp         Normal   10 minutes   10 minutes   24 Jan 10:50     Normal   24 Jan 10:42
qtreeTimestamp       Normal   8 hours      8 hours      24 Jan 10:39     Normal   24 Jan 02:52
rbacTimestamp        Normal   1 day        1 day        24 Jan 10:33     Normal   23 Jan 10:52
userQuotaTimestamp   Normal   1 day        1 day                                  23 Jan 10:52
sanhostTimestamp     Normal   5 minutes    5 minutes                              24 Jan 10:47
snapmirrorTimestamp  Error    5 minutes    30 minutes   24 Jan 10:51     Normal   24 Jan 10:47
snapshotTimestamp    Normal   30 minutes   30 minutes   24 Jan 10:43     Normal   24 Jan 10:22
statusTimestamp      Normal   10 minutes   10 minutes   24 Jan 10:40     Error    24 Jan 10:42
sysInfoTimestamp     Normal   1 hour       1 hour       24 Jan 10:06     Normal   24 Jan 09:52
svTimestamp          Normal   30 minutes   30 minutes   24 Jan 10:41     Normal   24 Jan 10:22
svMonTimestamp       Normal   8 hours      8 hours                                24 Jan 02:52
xmlQtreeTimestamp    Normal   8 hours      8 hours                                24 Jan 02:52
vFilerTimestamp      Normal   1 hour       1 hour       24 Jan 10:08     Normal   24 Jan 09:52
vserverTimestamp     Normal   1 hour       1 hour                                 24 Jan 09:52
Performance Advisor Checklist
perfAdvisorEnabled     Passed
hostType               Passed
hostRevision           Passed
hostLogin              Failed (hostLogin is empty)
perfAdvisorTransport   Passed
dfm host diag filer2 - this is the destination filer
Network Connectivity
IP Address             xxx.xxx.xxx.xxx
Network                xxx.xxx.xxx.xxx/16 (last searched 24 Jan 10:54)
DNS Aliases            FILER2.simsl.com
DNS Addresses          xxx.xxx.xxx.xxx
SNMP Version in Use    SNMPv1
SNMPv1                 Passed (78 ms)
SNMP Community         public
SNMP sysName           FILER2.simsl.com
SNMP sysObjectID       .1.3.6.1.4.1.789.2.3 (Clustered Filer)
SNMP productId         1573766421
SNMPv3                 Failed: No SNMPv3 username specified.
SNMPv3 Auth Protocol
SNMPv3 Privacy Enabled No
SNMPv3 Username
ICMP Echo              Passed (0 ms)
HTTP                   Passed (0 ms)
NDMP Ping              Passed (port 10000, 0 ms)
NDMP Connect           Passed (1437 ms)
NDMP MD5 Passwd Check  Passed
RSH                    Skipped (rshBinary is empty in global option)
SSH                    Failed: Login not set for storage system FILER2.simsl.com (3675).
RLM                    Skipped (hostLogin and hostRLMAddress are empty)
XML                    Skipped (hostLogin is empty)
Host Details
According to:   DataFabric Manager server       Host
Host Name       FILER2.simsl.com               FILER2.simsl.com
System ID       1573766421                     1573766421
Model           FAS3240                        FAS3240
Type            Clustered Storage System       Clustered Storage System
OS Version      8.0.2 7-Mode                   8.0.2 7-Mode
Revisions       350,8.0.1,2.1.1                350,8.0.1,2.1.1
Monitoring Timestamps
Timestamp Name       Status   Interval     Default      Last Updated     Status   Error if older than ...
ccTimestamp          Normal   4 hours      4 hours                                24 Jan 06:54
cfTimestamp          Normal   5 minutes    5 minutes    24 Jan 10:53     Normal   24 Jan 10:49
clusterTimestamp     Normal   15 minutes   15 minutes                             24 Jan 10:39
cpuTimestamp         Normal   5 minutes    5 minutes    24 Jan 10:50     Normal   24 Jan 10:49
dfTimestamp          Error    15 minutes   30 minutes   24 Jan 10:44     Normal   24 Jan 10:39
diskTimestamp        Normal   4 hours      4 hours      24 Jan 09:02     Normal   24 Jan 06:54
envTimestamp         Normal   5 minutes    5 minutes    24 Jan 10:50     Normal   24 Jan 10:49
fsTimestamp          Normal   15 minutes   15 minutes   24 Jan 10:38     Warning  24 Jan 10:39
hostPingTimestamp    Normal   1 minute     1 minute     24 Jan 10:54     Normal   24 Jan 10:53
ifTimestamp          Normal   15 minutes   15 minutes   24 Jan 10:44     Normal   24 Jan 10:39
licenseTimestamp     Normal   4 hours      4 hours      24 Jan 09:02     Normal   24 Jan 06:54
lunTimestamp         Normal   30 minutes   30 minutes   24 Jan 10:47     Normal   24 Jan 10:24
opsTimestamp         Normal   10 minutes   10 minutes   24 Jan 10:49     Normal   24 Jan 10:44
qtreeTimestamp       Normal   8 hours      8 hours      24 Jan 04:26     Normal   24 Jan 02:54
rbacTimestamp        Normal   1 day        1 day        23 Jan 12:08     Normal   23 Jan 10:54
userQuotaTimestamp   Normal   1 day        1 day                                  23 Jan 10:54
sanhostTimestamp     Normal   5 minutes    5 minutes                              24 Jan 10:49
snapmirrorTimestamp  Error    5 minutes    30 minutes   24 Jan 10:50     Normal   24 Jan 10:49
snapshotTimestamp    Normal   30 minutes   30 minutes   24 Jan 10:47     Normal   24 Jan 10:24
statusTimestamp      Normal   10 minutes   10 minutes   24 Jan 10:47     Normal   24 Jan 10:44
sysInfoTimestamp     Normal   1 hour       1 hour       24 Jan 10:07     Normal   24 Jan 09:54
svTimestamp          Normal   30 minutes   30 minutes   24 Jan 10:37     Normal   24 Jan 10:24
svMonTimestamp       Normal   8 hours      8 hours                                24 Jan 02:54
xmlQtreeTimestamp    Normal   8 hours      8 hours                                24 Jan 02:54
vFilerTimestamp      Normal   1 hour       1 hour       24 Jan 10:07     Normal   24 Jan 09:54
vserverTimestamp     Normal   1 hour       1 hour                                 24 Jan 09:54
Performance Advisor Checklist
perfAdvisorEnabled     Passed
hostType               Passed
hostRevision           Passed
hostLogin              Failed (hostLogin is empty)
perfAdvisorTransport   Passed
dfm version
dfbm.exe 5.0.0.7636 (5.0)
dfdrm.exe 5.0.0.7636 (5.0)
dfpm.exe 5.0.0.7636 (5.0)
dfm.exe 5.0.0.7636 (5.0)
dfmcheck.exe 5.0.0.7636 (5.0)
dfmconfig.exe 5.0.0.7636 (5.0)
dfmconsole.exe 5.0.0.7636 (5.0)
dfmmonitor.exe 5.0.0.7636 (5.0)
dfmperf.exe 5.0.0.7636 (5.0)
dfmscheduler.exe 5.0.0.7636 (5.0)
dfmserver.exe 5.0.0.7636 (5.0)
dfmwatchdog.exe 5.0.0.7636 (5.0)
eventd.exe 5.0.0.7636 (5.0)
grapher.exe 5.0.0.7636 (5.0)
Hope this helps!
Regards,
Simon.
Hi Simon,
There is an out of date event on 24 Jan:
Error 18042 SnapMirror: Out of Date 24 Jan 01:56 5053 DRFILER1:/vol_vm_mobapps1d
I also see that you have changed the snapmirror monitoring interval from the default of 30 minutes to 5 minutes. Please reset it to the default; the snapmirror monitor is heavy and does a lot of work.
The dfm host diag output shows an Error status for snapmirror monitoring:
snapmirrorTimestamp Error 5 minutes 30 minutes 24 Jan 10:50 Normal 24 Jan 10:49
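A quick way to spot every monitor whose interval has been changed from its default is to compare the Interval and Default columns of the Monitoring Timestamps section. The sketch below is only illustrative: it assumes the columns in the pasted `dfm host diag` output are separated by two or more spaces, that the third and fourth fields are Interval and Default as the header suggests, and the function name `non_default_intervals` is made up.

```python
import re
from typing import List, Tuple


def non_default_intervals(diag_text: str) -> List[Tuple[str, str, str]]:
    """Return (monitor, interval, default) for rows where the interval differs from the default.

    Assumes columns are separated by runs of two or more spaces.
    """
    changed = []
    for line in diag_text.splitlines():
        fields = re.split(r"\s{2,}", line.strip())
        # Data rows start with a name such as "snapmirrorTimestamp"; the header row does not.
        if len(fields) >= 4 and fields[0].endswith("Timestamp"):
            name, _status, interval, default = fields[:4]
            if interval != default:
                changed.append((name, interval, default))
    return changed


# Three rows copied from the source filer's diag output earlier in the thread.
sample = """\
Timestamp Name       Status   Interval     Default      Last Updated     Status   Error if older than ...
cfTimestamp          Normal   5 minutes    5 minutes    24 Jan 10:51     Normal   24 Jan 10:47
dfTimestamp          Error    15 minutes   30 minutes   24 Jan 10:45     Normal   24 Jan 10:37
snapmirrorTimestamp  Error    5 minutes    30 minutes   24 Jan 10:51     Normal   24 Jan 10:47
"""

for name, interval, default in non_default_intervals(sample):
    print(f"{name}: interval {interval}, default {default}")
```

On the rows shown, this flags dfTimestamp as well as snapmirrorTimestamp, since both list an interval shorter than the default.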
Regards
adai
adai,
As you will see from the event history, this is the first occurrence of this event. There is no subsequent date OK event.
I've set the snapmirror monitor back to 30 minutes as suggested and dfm host diag is now showing as normal. I changed one of the hourly policies to trigger an event if the lag goes over 30 mins, ran a manual snapmirror update and then let the lag go over 30 mins. No events, either snapmirror:out of date or date ok, were written to the log.
Going back to the original volume, you can see from the below that the volume is showing as out of date in the first screenshot (Volume Details), whereas the second screenshot (the retention policy) shows a lag of 13.99 hours, which is nowhere near the 1 day threshold.
 
Thanks,
Simon