Active IQ Unified Manager Discussions

DFM / Filer volume name mismatch

glen_eustace

While recovering from an issue, we renamed a volume on the filer from Log00 to Log0. After two weeks DFM still hasn't detected the change and thinks there is a volume named Log00. How can I force DFM to update the volume name? In the past, when we renamed volumes, DFM picked up the change within about 15 minutes. The wrongly named volume has broken the SnapMirror relationships in this dataset.

1 ACCEPTED SOLUTION

adaikkap

Hi Glen,

This could even be a bug: the rename from Log00 to Log0 is not being picked up by DFM. What version of ONTAP are these systems running? Can you set SNMP version 3 as the preferred version for communication between DFM and the filer? Let's see if that resolves it.

dfm host set <filerip/id> prefsnmpVersion=3

This is supported only on Data ONTAP 7.3 or later, and the password must be at least 8 characters long.

Also, Glen, to answer your question: deleting the volume in DFM does not delete it on the filer, unless you do it from the NMC dataset->Provisioning page.

Regards

adai


10 REPLIES

agireesh

Please check the "Last Updated" timestamp for the "fs" monitor in the output of the "dfm host diag <host-name-or-ip-address>" command.

If the "Last Updated" field has not been updated, you can run the monitor with the "dfm host discover <host-name-or-ip-address>" command to discover the renamed volume.

You can also check the "dfmmonitor.log" file to debug the issue.

arunchak

The path for dfmmonitor.log is /opt/NTAPdfm/log/dfmmonitor.log on Linux, with the equivalent path on Windows. Tail the log while you run the discovery to see whether fsmon is updating properly.

-Arun

glen_eustace

I am seeing huge numbers of SNMP timeouts in the log, yet according to the host diag output the monitored state of each filer is good. We have 14 controllers and only 4 of them are reporting the errors. I can find no configuration differences between those with timeouts and those that are OK.

Dec 15 08:10:29 [DFMMonitor:DEBUG]: [3436:0x184c]: snmpwalk 130.123.96.32 (On table having the attribute dfFileSys ) : timed out after 25 seconds (read 0 rows).

Dec 15 08:10:54 [DFMMonitor:DEBUG]: [3436:0x184c]: snmpwalk 130.123.96.32 (On table having the attribute volName ) : timed out after 25 seconds (read 0 rows).

Dec 15 08:10:58 [DFMMonitor:DEBUG]: [3436:0x2464]: snmpwalk 130.123.96.39 (On table having the attribute dfFileSys ) : timed out after 25 seconds (read 0 rows).

Dec 15 08:11:09 [DFMMonitor:DEBUG]: [3436:0x2e14]: snmpwalk 130.123.96.32 (On table having the attribute aggrName ) : timed out after 25 seconds (read 0 rows).

Dec 15 08:11:19 [DFMMonitor:DEBUG]: [3436:0x184c]: snmpwalk 130.123.96.32 (On table having the attribute volName ) : timed out after 25 seconds (read 0 rows).

Dec 15 08:11:23 [DFMMonitor:DEBUG]: [3436:0x2464]: snmpwalk 130.123.96.39 (On table having the attribute dfFileSys ) : timed out after 25 seconds (read 0 rows).

Dec 15 08:11:34 [DFMMonitor:DEBUG]: [3436:0x2e14]: snmpwalk 130.123.96.32 (On table having the attribute aggrName ) : timed out after 25 seconds (read 0 rows).

Dec 15 08:11:44 [DFMMonitor:DEBUG]: [3436:0x184c]: snmpwalk 130.123.96.32 (On table having the attribute volName ) : timed out after 25 seconds (read 0 rows).

Dec 15 08:11:48 [DFMMonitor:DEBUG]: [3436:0x2464]: snmpwalk 130.123.96.39 (On table having the attribute dfFileSys ) : timed out after 25 seconds (read 0 rows).

Dec 15 08:11:59 [DFMMonitor:DEBUG]: [3436:0x2e14]: snmpwalk 130.123.96.32 (On table having the attribute aggrName ) : timed out after 25 seconds (read 0 rows).

Dec 15 08:12:13 [DFMMonitor:DEBUG]: [3436:0x2464]: snmpwalk 130.123.96.39 (On table having the attribute volName ) : timed out after 25 seconds (read 0 rows).

Dec 15 08:12:38 [DFMMonitor:DEBUG]: [3436:0x2464]: snmpwalk 130.123.96.39 (On table having the attribute volName ) : timed out after 25 seconds (read 0 rows).

Dec 15 08:13:03 [DFMMonitor:DEBUG]: [3436:0x2464]: snmpwalk 130.123.96.39 (On table having the attribute volName ) : timed out after 25 seconds (read 0 rows).

Dec 15 08:14:10 [DFMMonitor:DEBUG]: [3436:0x1c24]: snmpwalk 130.123.96.39 (dfFileSys): timed out after 25 seconds (read 0 entries).
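To see which filers and which SNMP tables account for the timeouts, the log can be summarised with a short script. The following is only a sketch, assuming the log lines look exactly like the excerpt above; the function names and the regular expression are illustrative and not part of DFM.

```python
import re
from collections import Counter

# Matches both forms seen in the excerpt:
#   snmpwalk <host> (On table having the attribute <attr> ) : timed out after N seconds
#   snmpwalk <host> (<attr>): timed out after N seconds
# Assumption: other DFM versions may word these lines differently.
TIMEOUT_RE = re.compile(
    r"snmpwalk\s+(?P<host>\S+)\s+"
    r"\((?:On table having the attribute\s+)?(?P<attr>\w+)\s*\)\s*:"
    r"\s*timed out after (?P<secs>\d+) seconds"
)


def count_snmp_timeouts(lines):
    """Count timeout lines, keyed by (host, attribute)."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(match.group("host"), match.group("attr"))] += 1
    return counts


def timeouts_per_host(counts):
    """Collapse the (host, attribute) counts into a per-host total."""
    per_host = Counter()
    for (host, _attr), n in counts.items():
        per_host[host] += n
    return per_host


def report(path):
    """Print per-host timeout totals for a log file, busiest host first."""
    with open(path, errors="replace") as fh:
        counts = count_snmp_timeouts(fh)
    for host, n in timeouts_per_host(counts).most_common():
        print(host, n)
```

Calling report("/opt/NTAPdfm/log/dfmmonitor.log") (the Linux path mentioned earlier in the thread) would list the hosts with the most timeouts first, which makes it easier to compare the affected controllers against the healthy ones.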

columbus_admin

Glen,

If you cannot get it to update, and believe the problem is in DFM, you can do what I have to do with Protection Manager volumes that get moved. DFM does not use the FSID in combination with the filer sys_id or serial number to create unique records, so if I migrate a volume from one aggr to another, Protection Manager goes crazy and I end up with volume_1, volume_2, etc.

In these cases I do the following:

  • Get the DFM object ID by running dfm volume list filer_name:/volume_name. The ID is the first column.
  • Shut down everything but the database service.
  • Run dfm volume delete -f ID_number_from_list_output.
  • Restart the other DFM services and do a rescan, and all the volumes sort themselves out.

As your case is different from mine, you may have to search on both volume names... but hopefully this will help.

- Scott

glen_eustace

I have been considering dfm volume delete but wasn't sure whether DFM would attempt to delete the volume on the filer as well. That would be catastrophic if it did and it succeeded!! This is a volume with the logs for 120+ production SQL databases!!

columbus_admin

I can understand that!  It only removes the volume from DFM, no connection back to the filer.  I have to do it every time I migrate from one aggr to another now.

- Scott

glen_eustace

I am still nervous about the delete. I'll wait and see if the others have any comment about the SNMP errors first. In your case the volume has actually moved, so an attempt to delete would fail. In my case the volume is still in place (with the same ID, I believe), so even though the name is different it is probably the same object and an attempt to delete may actually succeed.

agireesh

Hi Glen,

Do you have sufficient free space on the server where the DFM server is installed? Sometimes the monitors stop working due to lack of space.

Can you check the field below in the output of the "dfm version" command? If free space is less than 10%, the monitor will not be able to collect data from the storage system.

Installation Directory       /opt/NTAPdfm
                             26.7 GB free (70.5%)
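The same 10% rule can be checked independently of DFM. This is a minimal sketch, assuming the install directory shown above and that free space on the filesystem holding it is the figure that matters; the helper names are hypothetical, and this is not how DFM itself computes the value.

```python
import shutil


def free_percent(path):
    """Return free space on the filesystem containing `path`, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def monitors_at_risk(path, threshold_percent=10.0):
    """True if free space is below the threshold quoted in this thread (10%)."""
    return free_percent(path) < threshold_percent
```

For example, monitors_at_risk("/opt/NTAPdfm") would return False for the 70.5% free figure shown in the sample output.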

glen_eustace

We are running DFM 4.0.1 and the ONTAP version is 8.0.2 (7-Mode).

Enabling SNMPv3 resulted in a failure. After a little investigation, I discovered that someone had turned the Windows Firewall back on 😞

Turning it off again gave me success on host diag, and the errant volume has now been synced and has the correct name. I am going to leave SNMPv3 on.

Checking the logs reveals things are much happier 🙂 Several other volumes that had been manipulated on the filers have also been 'repaired'.

Thanks heaps for your assistance (as always).
