Active IQ Unified Manager Discussions

Performance Advisor not running scripts setup under Alarms

BARUNAGARWAL
3,538 Views

Hi Gurus,

We have recently started using NetApp Performance Advisor (PA) and have found it very helpful. Digging a bit deeper into PA revealed the Thresholds and Alarms features and the ability to integrate Perfstat with PA.

Of course, I went through the post at https://communities.netapp.com/people/fenton/blog/2011/02/22/diy-guide-to-integrating-perfstat-to-performance-advisor which gives the details on how to integrate. There are supposedly some batch files and a simple Perl script to run Perfstat, but I am not able to get to those files, so I wrote my own Perl script based on the suggestions in that post.

My DFM is a Linux server and I am accessing PA from the Windows-based jumphost where we have NetApp Management Console installed.

My problem is that PA does not seem to run the script. I tried two different approaches:

1) I wrote a simple batch script to call the Perl script (I tried a shell script as well). The batch script is placed in a local directory on the jumphost. It uses plink to log into the DFM server and run the Perl script. The Perl script and the perfstat binary are placed on the remote DFM server. When I run the batch script manually I can initiate perfstat data collection, but Performance Advisor does not seem to even run the batch script. So the question is: does PA expect the script to be on the DFM server to begin with?

2) The second approach was to pass the location of the Perl script on the DFM server directly to Performance Advisor, but that doesn't seem to work either.

I have also tried simple scripts which just create a file saying it was created by PA at <date/time>, to check whether PA is accessing the scripts at all. When I test the alarm, PA says the script has been started and then closes the pop-up, but no file appears. Is there a log file where PA logs its activities?
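For anyone wanting to repeat this check, here is a minimal sketch of such a marker script. It is written in Python rather than the Perl and batch scripts described above, and the marker file name and location are hypothetical; choose a directory that the account running the alarm script can write to. It appends one line per invocation, recording the time, the user, the working directory and any arguments, which helps show on which host and as which user the script actually ran.

```python
import datetime
import os
import sys
import tempfile

# Hypothetical marker location (an assumption, not a PA convention).
MARKER = os.path.join(tempfile.gettempdir(), "pa_alarm_marker.log")


def write_marker(path=MARKER, argv=None, now=None):
    """Append one line describing this invocation and return that line."""
    argv = sys.argv[1:] if argv is None else argv
    now = now or datetime.datetime.now()
    # USER is set on Linux, USERNAME on Windows; fall back if neither exists.
    user = os.environ.get("USER") or os.environ.get("USERNAME") or "unknown"
    line = "created by PA at {} user={} cwd={} args={}\n".format(
        now.isoformat(timespec="seconds"), user, os.getcwd(), argv
    )
    with open(path, "a") as fh:
        fh.write(line)
    return line


if __name__ == "__main__":
    write_marker()
```

If the marker file never appears on either the jumphost or the DFM server after testing the alarm, the script is probably not being launched at all, rather than failing partway through.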

We do have a separate user for Performance Advisor with the following roles: GlobalAlarm, GlobalDelete, GlobalDFMCore, GlobalPerfManagement, GlobalPerfThreshTemplate, GlobalRead, GlobalWrite. All other functionality of PA works just fine.

Any help or push in the right direction would be great.

Thanks

3 REPLIES

JGPSHNTAP

Just a word of caution about kicking off perfstat from alarm triggers. I've done this in the past from Windows, but it really depends on which counter you use to kick off the perfstat. For example, if you use volume:nfs_write_latency and a large performance problem causes write latency across multiple volumes, perfstat will try to kick off X times. I think you need system-level counters for this to be effective.

As for your setup, mine was a little different: all Windows, on a single host, so it was a little easier.

BARUNAGARWAL

Thanks for the heads up. I do understand the problem with multiple perfstat kick-offs. I have configured the alarm for only one volume, whose latency I monitor, and have set a very high threshold. We have been asked to collect perfstat data when the problem occurs, but our last 3 attempts did not catch it because the high latency is intermittent. The only other option would be some kind of alerting with a manual kick-off of perfstat, but that takes away the brilliance of integration and automation.

I will keep digging, but I am sure someone out there must have worked on this type of setup.

JGPSHNTAP

Like I said, I've done it for volume-based counters on a Windows box, but it's more trouble than it's worth.

For example, you can have multi-level alerting set up where at x amount of latency it kicks off perfstat. Let's say you set your perfstat for 6x5. That takes a long time, and while it is still running your volume could again get into the state where it tries to kick off another perfstat.
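One common way to guard against overlapping runs is a lock file in the wrapper that the alarm calls. The sketch below is in Python and is not the script mentioned in this thread; the lock path, the two-hour staleness limit and the function names are all assumptions chosen for illustration. The lock is created atomically with `O_CREAT | O_EXCL`, so a second trigger that arrives while a collection is still running skips instead of starting another one.

```python
import os
import subprocess
import tempfile
import time

# Hypothetical lock location and staleness limit (both are assumptions).
LOCK = os.path.join(tempfile.gettempdir(), "perfstat_alarm.lock")
MAX_AGE = 2 * 60 * 60  # seconds; an older lock is treated as left by a crashed run


def try_acquire(lock_path=LOCK, max_age=MAX_AGE, now=None):
    """Return True if this call created the lock, False if a recent run holds it."""
    now = time.time() if now is None else now
    try:
        if now - os.path.getmtime(lock_path) > max_age:
            os.remove(lock_path)  # discard a stale lock
    except FileNotFoundError:
        pass  # no lock present, or another process removed it first
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as fh:
        fh.write(str(os.getpid()))
    return True


def run_once(command, lock_path=LOCK, max_age=MAX_AGE):
    """Run command unless another run holds the lock; return exit code or None."""
    if not try_acquire(lock_path, max_age):
        return None  # a collection is already in progress, skip this trigger
    try:
        return subprocess.call(command)
    finally:
        os.remove(lock_path)
```

The alarm script would then call `run_once([...])` with whatever command line starts your perfstat collection. The staleness limit should be longer than the longest collection you expect, otherwise a long run could have its lock discarded while it is still working.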

If you have an operations staff, I would make sure you set up multi-level alerting and then have the ops staff kick off the perfstat once they get an email for that specific volume.

I can try to dig up my perfstat script, but in my opinion you are swimming upriver, especially with volume-level counters.

What are your counters and thresholds?
