Thank you for the workaround. I'll await the customer's feedback. Can we get public reports for BURTs 732015 and 770193, please? The customer has no visibility into these right now.

Thanks,
Chris Pozezanac
Technical Account Manager - Credit Suisse
NetApp
Hi, Customer is looking to pull information that may not be available in the OCUM database into a 3rd-party tool for reporting. Does the OCUM API have the ability to query the APIs of the controllers it monitors? For example, if I want to pull SnapMirror schedules for 50 systems, I don't want my 3rd-party tool to have to log in to every controller; I'd rather query OCUM and have it retrieve the information for me. Can this be done for OCUM 5.2 and 6.x? Thanks.
Thanks Shailaja. We would be looking at both 7M and cDOT. Where can I find instructions on how to set up the dictionary entries? What would be the best document to start with?
Customer is looking to enhance their existing workflows by incorporating performance metrics to help them decide where to place a new workload in their environment. Ideally we would be able to leverage IOPS and latency at the controller level. Can someone point me to any documentation or guides we have that could help the customer determine what counters are available, and whether those can be implemented as trending over a specified period of time?
Hi Adai, One more question. The customer wants to report on spare disks and "unowned" disks. What field or fields would help us identify spares and unowned disks? Thanks, Chris
Working with a customer to build some SQL queries for reporting against a DFM 4.0.2 instance. We are joining the disk, plex, raid group, and aggregate views to tie disks back to their aggregates; however, about 80 of 300 aggregates are not reporting anything due to a break in the chain, showing NULL values for plex, RG, or aggr. Here's the query:

select distinct
    a.aggrid,
    o.objName,
    a.aggrstoragesystemid,
    d.disktype,
    p.plexaggrid,
    r.rgplexid,
    d.diskrgid
from aggregateview a
left join objectview o on a.aggrid = o.objid
left join plexview p on a.aggrid = p.plexaggrid
left join raidgroupview r on p.plexid = r.rgplexid
left join diskview d on r.rgid = d.diskrgid

Could this be a permissions issue between the DFM host and the host login credentials for each storage system? Any guidance would be welcome. Thanks, Chris
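Since I can't share the customer database, here is a minimal self-contained sketch (SQLite, with simplified stand-in tables rather than the real DFM views) showing how one missing plex row makes the rest of a LEFT JOIN chain come back NULL, plus a diagnostic query to isolate the aggregates where the chain breaks:

```python
import sqlite3

# Stand-in schema (NOT the real DFM views) to show how a missing
# intermediate row (here, the plex) nulls out the rest of the chain.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE aggregateview (aggrid INTEGER);
CREATE TABLE plexview      (plexid INTEGER, plexaggrid INTEGER);
CREATE TABLE raidgroupview (rgid INTEGER, rgplexid INTEGER);
CREATE TABLE diskview      (diskid INTEGER, diskrgid INTEGER);

-- aggr 1 has a full chain; aggr 2 has no plex row at all
INSERT INTO aggregateview VALUES (1), (2);
INSERT INTO plexview      VALUES (10, 1);
INSERT INTO raidgroupview VALUES (100, 10);
INSERT INTO diskview      VALUES (1000, 100);
""")

rows = cur.execute("""
SELECT a.aggrid, p.plexid, r.rgid, d.diskid
FROM aggregateview a
LEFT JOIN plexview p      ON a.aggrid = p.plexaggrid
LEFT JOIN raidgroupview r ON p.plexid = r.rgplexid
LEFT JOIN diskview d      ON r.rgid   = d.diskrgid
ORDER BY a.aggrid
""").fetchall()
print(rows)  # aggr 2 comes back as (2, None, None, None)

# Diagnostic: aggregates with no plex row at all, i.e. the first
# broken link in the chain.
missing = cur.execute("""
SELECT a.aggrid FROM aggregateview a
LEFT JOIN plexview p ON a.aggrid = p.plexaggrid
WHERE p.plexid IS NULL
""").fetchall()
print(missing)
```

Running the same "first broken link" check against each view in turn (plex, then raid group, then disk) would narrow down which monitor's data is missing for those 80 aggregates.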
Hi, Is there a mechanism to report on when the last RAID scrub completed on an aggregate? Within ONTAP 7-Mode I can run the following:

rlawson-vsim2> aggr scrub status -v
aggr scrub: status of /aggr0/plex0/rg0 :
        Scrub is not active.
        Last full scrub completed: Tue Feb 22 08:43:41 PST 2011

Is there a way to report on this through Ops Mgr 4.x or UM 5.x? Are there any alerting mechanisms that could tell me when a scrub has completed?
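Absent a built-in report, one stopgap is to collect the CLI output (e.g. via rsh or a scheduled script) and parse out the timestamps. A sketch, assuming the output format shown above is stable; `last_scrubs` is a hypothetical helper of my own:

```python
import re
from datetime import datetime

# Sample text in the shape of `aggr scrub status -v` output above
# (assumed stable across 7-Mode releases -- verify on your version).
sample = """\
aggr scrub: status of /aggr0/plex0/rg0 :
        Scrub is not active.
        Last full scrub completed: Tue Feb 22 08:43:41 PST 2011
aggr scrub: status of /aggr1/plex0/rg0 :
        Scrub is not active.
        Last full scrub completed: Wed Feb 23 09:15:02 PST 2011
"""

def last_scrubs(text):
    """Map each raid group path to its last-completed scrub time."""
    result = {}
    current = None
    for line in text.splitlines():
        m = re.match(r"aggr scrub: status of (\S+)", line)
        if m:
            current = m.group(1)
            continue
        m = re.search(
            r"Last full scrub completed: \w+ (\w+ +\d+ [\d:]+) \w+ (\d{4})",
            line)
        if m and current:
            # Drop weekday and timezone; parse "Feb 22 08:43:41 2011"
            result[current] = datetime.strptime(
                f"{m.group(1)} {m.group(2)}", "%b %d %H:%M:%S %Y")
    return result

print(last_scrubs(sample))
```

Comparing each timestamp against a cutoff (say, 30 days) would give a crude "scrub overdue" alert until something native exists.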
Customer would like to restrict access for users who are not logged into the system. They are concerned about alarm emails sent to large distribution lists, and they want to prevent users from making changes on the system when they are not logged in. I would think they could do this by removing privileges from the "Everyone" account, but what would our recommendation be for admins who want to lock the system down for users who are not logged in? What roles/capabilities would be required to present a screen with no visibility into the Operations Manager GUI for users who are not logged in? They would like to restrict visibility into system names, reports, events, etc.
This is logged under BURT # 385906. A workaround can be found here: http://support.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=385906 The issue is fixed in the following releases: 4.0.2D6, 4.0.2D9, 5.0D2 Note: 5.0 does not include the fix, so if you are upgrading from 4.0.x, be sure to upgrade to 5.0D2.
I want to confirm I have the expected behavior of LUN usage from a host perspective in a thin-provisioned environment. Assumptions:

- LUN is thin provisioned with no fractional reserve and no volume guarantees
- No snapshots present

My LUN usage on the storage system will continue to grow until it reaches 100%. The host side will see usage based on filesystem usage (i.e., when a file is deleted, space becomes available). The storage-system LUN usage will go down if a space reclamation tool - such as SnapDrive - is enabled; otherwise the LUN usage will stay at 100%. Is this correct?
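To make the accounting concrete, here is a toy model of that behavior (pure illustration, not ONTAP's actual implementation; SnapDrive/SCSI UNMAP reclamation is reduced to a single `reclaim()` step):

```python
# Toy model of thin-LUN space accounting. The array only sees writes,
# so a host-side delete frees filesystem space but not array space
# until a reclamation pass (e.g. SnapDrive space reclamation) runs.
class ThinLun:
    def __init__(self, size_blocks):
        self.size = size_blocks
        self.array_used = set()   # blocks ever written (array's view)
        self.fs_used = set()      # blocks holding live data (host's view)

    def write(self, block):
        self.array_used.add(block)
        self.fs_used.add(block)

    def delete(self, block):
        self.fs_used.discard(block)   # host frees it; array is unaware

    def reclaim(self):
        self.array_used &= self.fs_used  # punch holes for dead blocks

lun = ThinLun(100)
for b in range(80):       # host writes 80 blocks
    lun.write(b)
for b in range(40):       # host deletes half of them
    lun.delete(b)

print(len(lun.fs_used), len(lun.array_used))  # host sees 40, array sees 80
lun.reclaim()
print(len(lun.array_used))                    # after reclamation: 40
```

The model matches your description: without reclamation, array-side usage only ratchets up toward 100% no matter how much the host deletes.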
Hi - Does System Manager 2.0R1 support viewing LUNs in vFiler-owned volumes? If not, are there plans to add this functionality? My customer wants to move to System Manager and consolidate their MultiStore management in a single interface. This is a big selling point for them because they can already manage their CIFS and NFS objects, but they have to turn to the CLI to manage vFiler LUNs with the existing FilerView interface. Thanks.
Customer is asking me if we can monitor free space in a volume via SNMP. They are using a tool called PRTG to monitor their environment.
Does anyone know if ONTAP's SNMP capability allows them to view free space in a volume?
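For reference, a sketch of what a PRTG custom sensor would poll. The OIDs below are my reading of the NetApp MIB (enterprise 789, df table at .1.3.6.1.4.1.789.1.5.4) and should be verified against the netapp.mib that ships with the customer's ONTAP release; `percent_free` is just a hypothetical helper:

```python
# Candidate OIDs from the NetApp MIB's df table (verify against the
# netapp.mib for your ONTAP release before wiring into PRTG).
DF_TABLE = "1.3.6.1.4.1.789.1.5.4.1"
OIDS = {
    "dfFileSys":     DF_TABLE + ".2",  # filesystem (volume) name
    "dfKBytesTotal": DF_TABLE + ".3",
    "dfKBytesAvail": DF_TABLE + ".5",
}

# A sensor would walk these columns, roughly:
#   snmpwalk -v2c -c <community> <filer> 1.3.6.1.4.1.789.1.5.4.1.2
# Here we just compute percent-free from values such a walk might return.
def percent_free(total_kb, avail_kb):
    if total_kb == 0:
        return 0.0
    return round(100.0 * avail_kb / total_kb, 1)

sample = {"/vol/vol0/": (104857600, 52428800)}  # hypothetical walk result
for vol, (total, avail) in sample.items():
    print(vol, percent_free(total, avail), "% free")
```

One caveat worth checking in the MIB: the plain 32-bit KB counters can wrap on very large volumes, and the MIB defines additional high/low counter variants for that case.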
How are inode full and almost full thresholds determined in Operations Manager 4.0 events? Is it possible to modify a volume's inode full or almost full threshold? Is there a global option available? I could not find either in my research. Thanks, Chris
Adai: There are no NDMP sessions running on either the host or the destination system. Pete: I was able to find the process running for job #103885 and kill it. I was then able to successfully run an on-demand job to give us a backup. I will submit a BURT and provide you with the database and BURT # offline. Thanks for your help! Chris
We have a Protection Manager dataset that has failed for 7 days. Each day the job says it is waiting on a previous job, #103885. The Protection Manager GUI indicates job #103885 completed successfully. I had the option to cancel job #103885, but 2 hours after issuing the cancellation through the GUI, the job still indicates it is processing the abort. Is there a backdoor method to kill the job? I need help understanding what caused this job to hang and how to kill it. I do not want to restart the dfm services.

Here is a snapvault status from one of the qtrees in the dataset. The filer sees the relationship as a normal "snapvaulted" state:

[root@trulsut0001 ~]# rsh truanap0004 snapvault status -l /vol/dfpm_tr02_vol19_SV_1236712977_2946468784/gpho_1107
Snapvault secondary is ON.
Source:                 172.21.124.11:/vol/tr02_vol19/gpho.1107
Destination:            truanap0004:/vol/dfpm_tr02_vol19_SV_1236712977_2946468784/gpho_1107
Status:                 Idle
Progress:               -
State:                  Snapvaulted
Lag:                    176:10:39
Mirror Timestamp:       Wed Jul 14 05:02:33 EDT 2010
Base Snapshot:          truanap0004(0118064306)_dfpm_tr02_vol19_SV_1236712977_2946468784-base.0
Current Transfer Type:  -
Current Transfer Error: -
Contents:               Replica
Last Transfer Type:     Update
Last Transfer Size:     12 KB
Last Transfer Duration: 00:26:45
Last Transfer From:     172.21.124.11:/vol/tr02_vol19/gpho.1107

[pozezac@trulvop0001 ~]$ dfpm job list 103885
Job Id Job State     Job Description
------ ------------- ------------------------------------------------------------
103885 aborting      Back up data from node Primary data to node Backup of dataset tr02_vol19 (26919) with daily retention
[pozezac@trulvop0001 ~]$

Here are the first and last events from job 103885 that PM claims is still running:

Job Id: 103885
Job State: aborting
Job Description: Back up data from node Primary data to node Backup of dataset tr02_vol19 (26919) with daily retention
Job Type: remote_backup
Job Status: success
Bytes Transferred: 79045287936
Dataset Name: tr02_vol19
Dataset Id: 26919
Object Name: tr02_vol19
Object Id: 26919
Policy Name: Back up_05:00
Policy Id: 33831
Started Timestamp: 14 Jul 2010 05:00:06
Abort Requested Timestamp: 21 Jul 2010 09:49:36
Completed Timestamp:
Submitted By: dfmscheduler
Destination Node Id: 2
Destination Node Name: Backup
Source Node Id: 1
Source Node Name: Primary data
Job progress messages:

Event Id: 16482760
Event Status: normal
Event Type: job-start
Job Id: 103885
Timestamp: 14 Jul 2010 05:00:06
Message:
Error Message:

Event Id: 16574211
Event Status: warning
Event Type: job-progress
Job Id: 103885
Timestamp: 21 Jul 2010 09:49:37
Message:
Error Message: Received request to abort job.

Here are the first and last messages from the next job, which was the first failure for this dataset:

[pozezac@trulvop0001 ~]$ dfpm job details 103969
Job Id: 103969
Job State: completed
Job Description: Create local backup on node 'Primary data' of dataset 'tr02_vol19' (26919) with daily retention
Job Type: local_backup
Job Status: failure
Bytes Transferred: 0
Dataset Name: tr02_vol19
Dataset Id: 26919
Object Name: tr02_vol19
Object Id: 26919
Policy Name: Back up_05:00
Policy Id: 33831
Started Timestamp: 15 Jul 2010 00:01:08
Abort Requested Timestamp:
Completed Timestamp: 15 Jul 2010 01:01:09
Submitted By: dfmscheduler
Source Node Id: 1
Source Node Name: Primary data
Job progress messages:

Event Id: 16489784
Event Status: normal
Event Type: job-start
Job Id: 103969
Timestamp: 15 Jul 2010 00:01:08
Message:
Error Message:

Event Id: 16489785
Event Status: normal
Event Type: job-progress
Job Id: 103969
Timestamp: 15 Jul 2010 00:01:08
Message: Waiting for job 103885 to finish
Error Message:

Event Id: 16490974
Event Status: error
Event Type: job-progress
Job Id: 103969
Timestamp: 15 Jul 2010 01:01:09
Message:
Error Message: tr02_vol19: Timed out while waiting for protection job "Back up data from node Primary data to node Backup of dataset tr02_vol19 (26919) with daily retention" (103885) to finish.

Event Id: 16490975
Event Status: error
Event Type: job-end
Job Id: 103969
Timestamp: 15 Jul 2010 01:01:09
Message:
Error Message:
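In the meantime, here is a sketch of how jobs stuck like this could be flagged automatically by parsing the job-detail output (field names are taken from the output above; the `parse_job`/`stuck_aborting` helpers and the one-hour grace period are my own assumptions, not part of the dfpm tooling):

```python
from datetime import datetime, timedelta

# Sample text in the "Key: Value" shape of the job detail output above.
sample = """\
Job Id: 103885
Job State: aborting
Job Status: success
Started Timestamp: 14 Jul 2010 05:00:06
Abort Requested Timestamp: 21 Jul 2010 09:49:36
Completed Timestamp:
"""

def parse_job(text):
    """Split 'Key: Value' lines into a dict (first colon wins)."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields

def stuck_aborting(fields, now, grace=timedelta(hours=1)):
    """True if an abort was requested over `grace` ago and never completed."""
    if fields.get("Job State") != "aborting":
        return False
    requested = fields.get("Abort Requested Timestamp", "")
    if not requested or fields.get("Completed Timestamp"):
        return False
    when = datetime.strptime(requested, "%d %b %Y %H:%M:%S")
    return now - when > grace

job = parse_job(sample)
print(stuck_aborting(job, datetime(2010, 7, 21, 12, 0, 0)))  # abort ~2h old
```

A cron job running this over `dfpm job list -A` output could raise an alert instead of letting dependent backups queue up for a week.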
Pete: In our case, we have existing relationships and don't want to go through the hassle of changing the SnapMirror schedule for each and every one. I did try to import the relationships into datasets for monitoring only, but I discovered that when I get an alert for a dataset lag, it doesn't specify which SnapMirror relationship is lagging. This requires quite a bit of investigation if multiple relationships are in a dataset. If the SnapMirror lag errors for datasets specified the offending relationship, managing the relationships at the dataset level would have been an option for us. Thanks, Chris
Thank you Adai. This is very helpful. I have 76 VSM relationships that were discovered and assigned to 59 replication policies. Does this mean I need to assign a lag warning/error threshold manually for each and every policy? What would be the impact of assigning all of the relationships to a single replication policy?
I upgraded to Ops Mgr 4.0 and I'm using the "External Relationships" tab to monitor SnapMirror relationships as Shiva suggested. I wanted to modify the SnapMirror lag error and warning thresholds, but I'm not seeing the options in 'dfm option list'. The online man pages show that the options are:

snapmirrorLagErrorThreshold: value (as a length of time) above which the snapmirror lag will generate an error event.
snapmirrorLagWarningThreshold: value (as a length of time) above which the snapmirror lag will generate a warning event.

However, they do not show up in my 'dfm option list' output:

[root@trulvop0001 ~]# dfm option list | grep snap
snapmirrorMonInterval           30 minutes
snapshotDiscoveryEventsEnabled  No
snapshotMonInterval             30 minutes
snapvaultMonInterval            30 minutes
[root@trulvop0001 ~]# dfm option list snapmirrorLagErrorThreshold
Error: There is no snapmirrorLagErrorThreshold option.
[root@trulvop0001 ~]# dfm option list snapmirrorLagWarningThreshold
Error: There is no snapmirrorLagWarningThreshold option.

Where can I adjust these thresholds?
I am in the process of importing ~40 VSM relationships into Protection Manager 3.8.1 for monitoring only. I created an empty dataset and a policy with no schedule. For some reason many of the relationships are not importing and are being removed from the "External Relationships" tab. Is the reason for this logged somewhere? I checked the dfpm.log and didn't see anything. I am also seeing that the datasets report a protection status of "Baseline Failure: All baseline transfers failed because the dataset is non-conformant". However, the dataset itself reports as conformant and the transfers are working. What causes this error and how can I clear it?