
OCUM/AIQ LUN list API call from NetApp Harvest fails often

Grox80

Hello community,

 

For some time I have been trying to solve an issue with the API call for the LUN capacity list from NetApp Harvest to Active IQ.

 

Observed behaviour:

At least once a day the LUN capacity call from Harvest to AIQ ends with an error, and it stays broken until the workaround is applied.

The call ends with the generic error 13001 (confirmed by a NetApp engineer).

Zexplorer "lun-iter" calls end with error 13001 as well. During this time "dfm lun list" also fails, with an "internal error" message.

If I lower "max-records", some iterations work until one hits error 13001. When I look for the "next-tag" that fails, it is a different tag every day. The broken tag stays the same until the workaround is applied.

When I tried to skip the broken tag, usually some other tag failed as well.

All other API calls, such as "aggr-iter" or "volume-iter", work all the time.

 

Only ONTAPI calls are affected.

The REST API and "um lun list" work all the time.
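
A minimal sketch of the tag walk described above, written in Python rather than the Perl of the real test script. It follows the returned tag page by page and reports the tag at which the first failure occurs. Everything here is an assumption for illustration: `fetch_page`, `ApiError`, `walk_pages` and `make_fake_backend` are hypothetical names, and the fake backend uses a numeric offset as its tag, whereas the real tags are opaque strings returned by the server.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical page fetcher: takes (tag, max_records), returns (records, next_tag).
# In a real setup it would wrap the "lun-iter" call; that wrapper is not shown here.
FetchPage = Callable[[Optional[str], int], Tuple[List[str], Optional[str]]]


class ApiError(Exception):
    """Stand-in for a failed API call that carries an error number such as 13001."""

    def __init__(self, errno: int, reason: str) -> None:
        super().__init__(f"{errno}: {reason}")
        self.errno = errno


def walk_pages(
    fetch_page: FetchPage, max_records: int = 100
) -> Tuple[List[str], Optional[str], Optional[ApiError]]:
    """Follow next-tag values until the listing ends or a call fails.

    Returns (records, failing_tag, error). On success failing_tag and error
    are both None. If the very first call fails, failing_tag is None but
    error is set.
    """
    records: List[str] = []
    tag: Optional[str] = None
    while True:
        try:
            page, next_tag = fetch_page(tag, max_records)
        except ApiError as exc:
            return records, tag, exc
        records.extend(page)
        if next_tag is None:
            return records, None, None
        tag = next_tag


def make_fake_backend(total: int, broken_offset: Optional[int]) -> FetchPage:
    """Build an in-memory backend that fails when asked for one specific tag."""

    def fetch(tag: Optional[str], max_records: int) -> Tuple[List[str], Optional[str]]:
        start = 0 if tag is None else int(tag)
        if broken_offset is not None and start == broken_offset:
            raise ApiError(13001, "internal error")
        end = min(start + max_records, total)
        page = [f"lun{i}" for i in range(start, end)]
        return page, (str(end) if end < total else None)

    return fetch


if __name__ == "__main__":
    records, failing_tag, error = walk_pages(make_fake_backend(250, 200), 100)
    print(len(records), failing_tag, error)
```

Recording the failing tag together with the number of records read before it makes it easier to compare runs from different days and to see whether the break moves with "max-records".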

 

Workaround:

Run an AIQ MySQL backup followed by a reboot:

mysqldump --single-transaction --hex-blob --all-databases | 7za a -si "/data/ocum-backup/backup.bak";rm -f /data/ocum-backup/backup.bak; reboot

 

Environment:

VMware running OVA template. Installed as AIQ 9.6, updated to latest version.

A700 hosted NFS volume as a datastore. Native 40Gb network.

Controllers and ESXi hosts are not overutilized.

AIQ appliance resources were increased to 8 vCPUs and 24 GB of RAM.

Three clusters are monitored by OCUM/AIQ.

The NetApp cluster management addresses, OCUM/AIQ and the Harvest/Graphite server are located in the same subnet.

Currently we have about 900 R/W LUNs (a single LUN per volume, mounted via iSCSI) and about the same number of SnapMirror copies.

 

Checked:

"df -hTP" and "df -iTP" show enough capacity on all partitions.

NetApp Harvest log files: by default only the "internal error" message is logged.

The stopwatch time of a failing API call is under 1 s.

OCUM/AIQ logs were checked by a NetApp engineer: no relevant error was found.

 

Steps taken so far to try to solve this issue:

NetApp Harvest: detailed error message saved to the default log file.

NetApp Harvest: stopwatch time of the API call logged if an error occurs.

NetApp Harvest: max records decreased to 100 (default is 1000).

NetApp Harvest: timeout for the API call increased to 300 s (default is 60 s).

Test script created (Perl based) to find broken tags.

NetApp case created to investigate OCUM/AIQ, so far without resolution.

NetApp case: performance data collection disabled on AIQ to free up resources on the AIQ appliance.
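
The stopwatch and detailed-error steps above can be sketched as a small wrapper. This is an illustration only, not Harvest's own code: `timed_call` and the logger name `api-timing` are hypothetical. The reason for measuring is that a call failing in under 1 s while the timeout is 300 s suggests the server is returning the error itself and the client is not timing out.

```python
import logging
import time
from typing import Any, Callable, Tuple

log = logging.getLogger("api-timing")  # hypothetical logger name


def timed_call(
    name: str, func: Callable[..., Any], *args: Any, **kwargs: Any
) -> Tuple[Any, float]:
    """Run func and return (result, elapsed_seconds).

    On failure, log the elapsed time and the full error, then re-raise so
    the caller still sees the original exception.
    """
    start = time.monotonic()
    try:
        result = func(*args, **kwargs)
    except Exception as exc:
        elapsed = time.monotonic() - start
        log.error("%s failed after %.3f s: %r", name, elapsed, exc)
        raise
    elapsed = time.monotonic() - start
    log.debug("%s succeeded in %.3f s", name, elapsed)
    return result, elapsed
```

The wrapper takes any callable, so it can be placed around whichever function issues the API request without changing that function.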

 

Many thanks for any ideas and advice.

Regards.


ttran

Hi Grox80,

 

Error 13001 is definitely a generic communication error, as you might have suspected. It typically appears when there is an API communication issue between Harvest and the SDK. I didn't see which version of Harvest you are running. Some of these communication issues have been resolved in Harvest 1.6.1.

 

Harvest 1.6.1 

 

Regards,

 

Team NetApp


Grox80

Hi @ttran,

 

Many thanks for your response.

 

Currently I am using Harvest 1.6.

I would like to try 1.6.1, but unfortunately I am unable to find a working link for Harvest 1.6.1, neither in the attached discussion (link expired) nor in the ToolChest.

 

Regards.

 

 
