Software Development Kit (SDK) and API Discussions

Ontap SDK: volume-get-iter ZAPI returns erroneous next-tag



Hi folks, I am one of the developers of NetApp Harvest. We rely heavily on ZAPI to collect performance and capacity counters from ONTAP hosts. Recently we developed a plugin that collects counters from the volume-get-iter ZAPI. In my own test environment everything seemed fine, but one of our users reported that the plugin keeps hanging in the background.


I did a little debugging and experimented with the max-records attribute. It turns out that sometimes (seemingly depending on the max-records value and the number of volumes in the cluster) ZAPI returns a next-tag that is exactly the same as the one we got from the previous batch request, so the program ends up in an infinite loop.
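Independent of the root cause, a client can protect itself against this kind of hang. The following is a minimal sketch, not Harvest code: `iter_pages`, `fetch_page` and `max_batches` are hypothetical names, and `fetch_page` stands in for whatever sends one batch request and returns the records plus the next-tag (or None when there is nothing more to fetch).

```python
from typing import Callable, Iterator, List, Optional, Tuple

# One batch: (records, next_tag); next_tag is None on the last batch.
Page = Tuple[List[str], Optional[str]]


def iter_pages(fetch_page: Callable[[Optional[str]], Page],
               max_batches: int = 1000) -> Iterator[List[str]]:
    """Yield batches until no next-tag is returned; abort if a tag repeats."""
    seen_tags = set()
    tag = None  # the first request carries no tag
    for _ in range(max_batches):
        records, next_tag = fetch_page(tag)
        yield records
        if not next_tag:
            return  # server says we have everything
        if next_tag in seen_tags:
            # Same tag handed out twice: following it would loop forever.
            raise RuntimeError('next-tag repeated, aborting pagination')
        seen_tags.add(next_tag)
        tag = next_tag
    raise RuntimeError('exceeded max_batches, aborting pagination')
```

With a guard like this the plugin would fail fast with a clear error instead of polling the host indefinitely.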


Is this a known issue? Has anyone else faced this? It seems like a similar issue was reported previously. I get this issue with an ONTAP 9.6P3 release. ONTAP 9.7 seems to be fine (although this could just be the number of volumes).


For illustration, this is the relevant part of the function that I'm running:


import sys
import time

def collect_counters():
    """
    Collect counter data from Ontap host. We request the Zapi object
    "volume-get-iter" and store only the counters that are in the two global
    lists volume_space_counters and volume_sis_counters (check the top of this

        data:       triple nested dict: svm => volume => counter => value
    """
    data = {}

    # Get Zapi connection to Cluster
    SDK, zapi = connect_zapi(params)

    # Construct Zapi request
    request = SDK.NaElement('volume-get-iter')
    request.child_add_string('max-records', 150)

    desired_attributes = SDK.NaElement('desired-attributes')
    request.child_add(desired_attributes)

    volume_attributes = SDK.NaElement('volume-attributes')
    desired_attributes.child_add(volume_attributes)
    volume_attributes.child_add_string('volume-id-attributes', '')
    volume_attributes.child_add_string('volume-space-attributes', '')
    volume_attributes.child_add_string('volume-sis-attributes', '')
    volume_attributes.child_add_string('volume-snapshot-attributes', '')

    next_tag = 'initial'
    api_time = 0

    # Continue as long as we get a "next tag" in previous request
    while next_tag:

        if next_tag != 'initial':
            request.child_add_string('tag', next_tag)

        start = time.time()

        # Send request to server
        try:
            response = zapi.invoke_elem(request)
        except Exception as ex:
            logger.error('[collect_counters] Exception while sending ZAPI '
                         'request: {}'.format(ex))
            sys.exit(1)

        t = time.time() - start
        api_time += t

        # Check the results
        if response.results_status() != 'passed':
            logger.error('[collect_counters] ZAPI request failed: {}'.format(
                response.results_reason()))
            sys.exit(1)

        num_records = response.child_get_int('num-records')

        # No point to continue if no data available
        if not num_records:
            logger.info('[collect_counters] No counter data, stopping session')
            sys.exit(0)

        # This will be None if we got everything already
        next_tag_tmp = response.child_get_string('next-tag')

        # DEBUG
        # Compare the newly received next-tag to the previous one
        # to see if we keep getting the same tag.
        tag_compare = '<NONE>'
        if next_tag_tmp:
            tag_compare = '<SAME>' if next_tag_tmp == next_tag else '<NEW>'
        next_tag = next_tag_tmp

        logger.debug('[collect_counters] Batch API time: {}s. Num records={}. '
                     'Next tag=[{}]'.format(round(t, 2), num_records, tag_compare))

        # Extract instances
        try:
            instances = response.child_get('attributes-list').children_get()
        except (NameError, AttributeError) as ex:
            logger.error('[collect_counters] Extracting results failed:'
                         ' {}'.format(ex))
            sys.exit(1)
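One thing that may be worth ruling out: the loop above reuses the same request object and adds a 'tag' child on every iteration. If child_add_string appends a new child rather than replacing an existing one (an assumption that should be checked against the SDK source), the request would carry several tags from the third batch onward. The sketch below uses a hypothetical stand-in class, FakeElement, not the real NaElement, purely to illustrate that accumulation and the alternative of building a fresh request per batch. It is a guess at a possible cause, not a confirmed diagnosis.

```python
class FakeElement:
    """Hypothetical stand-in for an SDK request element (not the real NaElement)."""

    def __init__(self, name):
        self.name = name
        self.children = []

    def child_add_string(self, name, value):
        # Assumption: adding a child appends; it never replaces a same-named child.
        self.children.append((name, str(value)))

    def tags(self):
        # All 'tag' values currently attached to this request
        return [value for name, value in self.children if name == 'tag']


def build_request(max_records, tag=None):
    """Build a fresh request for one batch, with at most one tag."""
    request = FakeElement('volume-get-iter')
    request.child_add_string('max-records', max_records)
    if tag:
        request.child_add_string('tag', tag)
    return request


# Reusing one request object across batches: the tags pile up.
reused = FakeElement('volume-get-iter')
for tag in ('t1', 't2', 't3'):
    reused.child_add_string('tag', tag)
reused_tags = reused.tags()

# Building a new request per batch: only the latest tag is sent.
fresh_tags = build_request(150, 't3').tags()
```

Under that assumption, a run that needs at most two batches would never send more than one tag, while a run with more batches would.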



When I set max-records to 500, the plugin runs fine, but when I set it to 150, it ends up in an infinite API loop:



$ python extension/
[2020-01-30 17:46:33,487] [INFO] Started extension in foreground mode. Log messages will be forwarded to console
[2020-01-30 17:46:33,487] [DEBUG] Started new session. Will poll host [Cuba] for volume capacity counters
[2020-01-30 17:46:33,717] [DEBUG] [connect_zapi] Created ZAPI with host [Cuba:443], Release=NetApp Release 9.6P3: Sun Sep 22 08:26:36 UTC 2019
[2020-01-30 17:46:35,301] [DEBUG] [collect_counters] Batch API time: 1.58s. Num records=150. Next tag=[<NEW>]
[2020-01-30 17:46:36,749] [DEBUG] [collect_counters] Batch API time: 1.44s. Num records=150. Next tag=[<NEW>]
[2020-01-30 17:46:40,162] [DEBUG] [collect_counters] Batch API time: 3.4s. Num records=150. Next tag=[<SAME>]
[2020-01-30 17:46:41,903] [DEBUG] [collect_counters] Batch API time: 1.73s. Num records=150. Next tag=[<SAME>]
[2020-01-30 17:46:43,368] [DEBUG] [collect_counters] Batch API time: 1.45s. Num records=150. Next tag=[<SAME>]
[2020-01-30 17:46:45,132] [DEBUG] [collect_counters] Batch API time: 1.75s. Num records=150. Next tag=[<SAME>]



Same script tested against ONTAP 9.7 with no issues:

sanjunipero>$ python extension/
[2020-01-30 17:46:27,751] [INFO] Started extension in foreground mode. Log messages will be forwarded to console
[2020-01-30 17:46:27,752] [DEBUG] Started new session. Will poll host [jamaica] for volume capacity counters
[2020-01-30 17:46:27,975] [DEBUG] [connect_zapi] Created ZAPI with host [jamaica:443], Release=NetApp Release 9.7: Thu Jan 09 17:11:21 UTC 2020
[2020-01-30 17:46:28,684] [DEBUG] [collect_counters] Batch API time: 0.71s. Num records=76. Next tag=[<NONE>]
[2020-01-30 17:46:28,691] [DEBUG] [collect_counters] Collected 1292 counters for 76 volumes
[2020-01-30 17:46:28,694] [DEBUG] Ending session. Runtime: 0.94s. API time: 0.71s [75.16%]