Data Infrastructure Insights

Cloud Insights - Failure to collect performance data from ONTAP cluster

alexandre_oliveira

Hello,

We are monitoring four different clusters using Cloud Insights. We are now receiving the following error for one of them:

 

perf-object-get-instances(Object : wafl) failed: System busy: 10 requests on table "perf_object_get_instances" have been pending for 2209502 seconds. The last completed call took 0 seconds.
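For anyone triaging the same message, here is a minimal Python sketch that pulls the numbers out of the error text and converts the pending time into days. The regular expression and the function name `parse_busy_error` are assumptions fitted to this single message, not a documented ONTAP error format.

```python
import re

# Error text copied from the post above.
ERROR = (
    'perf-object-get-instances(Object : wafl) failed: System busy: '
    '10 requests on table "perf_object_get_instances" have been pending '
    'for 2209502 seconds. The last completed call took 0 seconds.'
)

# Hypothetical pattern, fitted to this one message only.
PATTERN = re.compile(
    r'(?P<count>\d+) requests on table "(?P<table>[^"]+)" '
    r'have been pending for (?P<seconds>\d+) seconds'
)


def parse_busy_error(text):
    """Return (request_count, table, pending_days), or None if no match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    pending_days = int(match.group("seconds")) / 86400  # 86400 seconds per day
    return int(match.group("count")), match.group("table"), pending_days


result = parse_busy_error(ERROR)
if result is not None:
    count, table, days = result
    print(f"{count} requests on {table} pending for about {days:.1f} days")
```

For this message the pending time works out to roughly 25.6 days, which looks more like a stuck request queue than a short busy period, though that reading is only an inference from the number.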

 

I found this KB article, but it didn't help: https://kb.netapp.com/Advice_and_Troubleshooting/Cloud_Services/Cloud_Insights/Performance_poll_failure_due_to_internal_error_for_ONTAP_data_collector....

 

I was able to run the "statistics lif show" command without any problems on the cluster.

 

The ACQ log shows only this:

ERROR [DSM-pool-3-thread-3167/NetApp-MC-CJ [storageperformance]] com.onaro.sanscreen.acquisition.framework.datasource.BaseDataSource (DataSourceErrorException.java:194) - NetApp-MC-CJ [Error retrieving data] - Performance sample failed to send any data to server on 279 consecutive updates. ([Device name ]: Performance sample failed to send any data to server on 279 consecutive updates.)
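As a rough way to track how long the collector has been failing, the sketch below extracts the data source name and the consecutive-failure count from a log line like the one above. The regular expression and both function names are hypothetical and fitted to this single line. The poll interval is not stated in this thread, so it is a required argument rather than a default.

```python
import re

# Log line copied from the post above (trailing parenthetical omitted).
ACQ_LINE = (
    'ERROR [DSM-pool-3-thread-3167/NetApp-MC-CJ [storageperformance]] '
    'com.onaro.sanscreen.acquisition.framework.datasource.BaseDataSource '
    '(DataSourceErrorException.java:194) - NetApp-MC-CJ [Error retrieving data] - '
    'Performance sample failed to send any data to server on 279 consecutive updates.'
)

# Hypothetical pattern, fitted to this one log line only.
FAILURE_RE = re.compile(
    r'- (?P<source>\S+) \[Error retrieving data\] - .*?'
    r'on (?P<count>\d+) consecutive updates'
)


def parse_acq_failure(line):
    """Return (data_source_name, consecutive_failures), or None if no match."""
    match = FAILURE_RE.search(line)
    if match is None:
        return None
    return match.group("source"), int(match.group("count"))


def estimated_gap_hours(count, poll_interval_seconds):
    """Approximate hours without data, given the configured poll interval."""
    return count * poll_interval_seconds / 3600


print(parse_acq_failure(ACQ_LINE))
```

Passing the count to `estimated_gap_hours` together with the performance poll interval configured on the data collector gives an approximate duration for the gap.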

Any ideas about what can cause this problem?

 

Thanks!

5 REPLIES

tahmad

The CMD process might be hung, so you may have to restart it. As per the KB article you linked, you need to contact NetApp Support to proceed with these steps, as they require diag-level access.

Miles_Kniep

Hi @alexandre_oliveira, do you have any other software monitoring ONTAP as well? In a few cases where many tools were monitoring the same cluster, I've found that ONTAP does not allocate sufficient priority to respond to all API requests (understandably, since ONTAP prioritizes user data access). It might be worth checking whether this improves if you:

  • Disable performance monitoring of ONTAP from other applications
    OR
  • Increase the performance monitoring interval from other applications
    OR
  • If you're on a more recent ONTAP release (9.9 and up), you could instead use the ONTAP Cloud Connection data collector - this means CI doesn't have to query anything against the cluster, and instead ONTAP feeds the data directly into CI as a push mechanism.

Do you see any related errors in the ONTAP EMS or mgwd logs by chance?

alexandre_oliveira

Hello,
They are still running ONTAP 9.7, so the Cloud Connection data collector is not an option yet.

They also have AIQ Unified Manager, which is running without any problems.

I opened a support case; let's see what Support says.

Thanks for the feedback!

tahmad

Would you mind sharing the case number, @alexandre_oliveira?

alexandre_oliveira

Sure! It is 2009123293.
