Software Development Kit (SDK) and API Discussions

Re: Malformed XML exceptions (how to handle)

aashray

And which SDK version are you using? Then I can test and find a solution using the same SDK.


explorenetapp

Hi Aashray,

The SDK version is 4.0.

Prasanna

aashray

Prasanna, I'd recommend downloading the SDK available at http://support.netapp.com/NOW/cgi-bin/software and trying the same code with it. Let me know if that solves the issue.

explorenetapp

Hi Aashray,

Thanks for the response. I'll try the SDK you pointed me to and let you know how it goes.

Prasanna

aashray

Could you share your complete code for the aggr-list-info call that doesn't seem to be working?

explorenetapp

Hi Aashray,

Thanks for the response. I need to check whether we can share the code and will get back to you on this by next week. I also have a question about verifying the API calls with the newer version you pointed me to, which I understand is SDK 5.0. Is verifying API call execution with apitest.exe from SDK 5.0 equivalent to verifying it with the 5.0 version of manageontap.jar in the Java code?

Regards,

Prasanna

aashray

apitest is a command-line utility for testing APIs, suitable for beginner-level API users.

Verifying with apitest should be equivalent to verifying with manageontap.jar.

Your error could be related to passing incorrect XML parameters; that's why I asked for the code. You could also test your APIs in ZEDI, which comes as part of the package you have. It generates complete code, which should be correct. The code I gave in my first comment is a complete working example from ZEDI, so you could refer to that, or share your code with me so that we can solve this.

-Aashray

POOJA_HP_2013

Hi Aashray,

Thanks for your response.

I am Prasanna's colleague and will be taking this up from here. As Prasanna mentioned, we might not be able to share the exact piece of code, but we do have the apitest and Z-Explorer outputs. All the APIs were run using the 4.0 and 5.0 SDK versions. Please find them attached to this post.

apitest_50_Raw Outputs.zip - contains API outputs, run using SDK 5.0

apitest_Raw_Outputs.zip - contains API outputs, run using SDK 4.0

ZExplorerOutputs.zip - contains Z-Explorer outputs

For most of the APIs, the output appears to be truncated. For some of them, though the output looks OK, I observed the following line appended at the end:

     "<results reason="debugging bypassed xml parsing" status="failed" errno="13001"/>"

I'm not sure what this refers to.

Any help would be appreciated!

Regards,

Pooja

coon

In the raw output (I opened aggr-space-list-info_raw_50.log), it stops at:

allocated>2331906048</volume-al

A few suggestions...

  1. Can you run a packet trace from Data ONTAP (pktt start all -i X.X.X.X -d /vol/volume), where X.X.X.X is the IP address of the host you're running apitest from and volume is an actual volume on the controller with enough space to capture a packet trace? Issue the zapi over HTTP (not HTTPS, as that complicates reading a packet trace), then stop the packet trace (pktt stop all). One command and one SDK version that returns this error is enough. Then attach the packet trace from the controller here, along with the apitest output. I'm interested to see whether this is the same data that left the controller onto the network.
  2. I'd also caution against programmatically using only the -info API calls when -iter and -next APIs exist for the same information. As the number of resources of any type (volumes, aggregates, shares/exports, LUNs, whatever) on a system grows, I've seen complications arise from grabbing one large bucket of output with -info when -iter and -next would be better. For this type of call (using aggr-space-list-info as an example), you'd do better to use aggr-list-info to build an array of the aggregates and then call aggr-space-list-info for each aggregate. Please let me know whether calling each aggregate individually resolves this error.
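The segmentation approach in point 2 can be sketched in Java, the language used later in this thread. This is a minimal sketch under stated assumptions, not the NetApp SDK: `ZapiCaller` is a hypothetical interface standing in for whatever sends one raw ZAPI request (and adds any envelope the transport needs), and the reply layout it parses (`<aggr-info>` elements with a direct `<name>` child) and the `<aggregate>` input of aggr-space-list-info are assumptions to check against the API documentation for your Data ONTAP release.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical transport: sends one ZAPI request body and returns the raw <results> XML. */
interface ZapiCaller {
    String call(String requestXml) throws Exception;
}

class PerAggregateSpaceQuery {

    // Parses a raw ZAPI reply with the JDK's DOM parser.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Text of the first direct child element named `tag`, or null if absent.
    // Direct children only, so nested <name> elements (e.g. plexes) are ignored.
    static String childText(Element parent, String tag) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && tag.equals(n.getNodeName())) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }

    // Step 1: build the list of aggregate names from aggr-list-info.
    static List<String> listAggregateNames(ZapiCaller caller) throws Exception {
        Document doc = parse(caller.call("<aggr-list-info/>"));
        List<String> names = new ArrayList<>();
        NodeList infos = doc.getElementsByTagName("aggr-info");
        for (int i = 0; i < infos.getLength(); i++) {
            String name = childText((Element) infos.item(i), "name");
            if (name != null) {
                names.add(name);
            }
        }
        return names;
    }

    // Step 2: one aggr-space-list-info call per aggregate, so each reply stays small.
    static Map<String, String> spacePerAggregate(ZapiCaller caller) throws Exception {
        Map<String, String> rawReplies = new LinkedHashMap<>();
        for (String aggr : listAggregateNames(caller)) {
            // Aggregate names are assumed to be plain identifiers, so no XML escaping here.
            String request = "<aggr-space-list-info><aggregate>" + aggr
                    + "</aggregate></aggr-space-list-info>";
            rawReplies.put(aggr, caller.call(request));
        }
        return rawReplies;
    }

    public static void main(String[] args) throws Exception {
        // Fake caller standing in for a real storage system.
        ZapiCaller fake = req -> req.startsWith("<aggr-list-info")
                ? "<results status=\"passed\"><aggregates>"
                        + "<aggr-info><name>aggr0</name></aggr-info>"
                        + "<aggr-info><name>aggr1</name></aggr-info>"
                        + "</aggregates></results>"
                : "<results status=\"passed\"/>";
        System.out.println(spacePerAggregate(fake).keySet()); // prints [aggr0, aggr1]
    }
}
```

The design choice is simply to keep each reply scoped to one aggregate; the names-only list call and the per-aggregate calls each return far less data than a single unscoped aggr-space-list-info.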

POOJA_HP_2013

Thanks, coon, for the quick response!

Please find the packet trace output attached to this post.

I'll check which of our calls have corresponding -iter and -next APIs and try to implement them the way you suggested, where we haven't done so already.

Regards,

Pooja

coon

Did you include the output from the apitest command? I'm just looking for a unique string that will help me chase it down in the packet trace.

coon

It looks like any reply that hits the MTU size (1514-byte frames) gets truncated XML. A colleague pointed out that the TCP PSH flag also tells the recipient to go ahead and process the data rather than wait for more.

I'll have to defer to the developers on how to address that, or whether it's already been addressed somewhere.

coon

I filed all the details under bug 728756. The quick workarounds would be jumbo frames (9k MTU) or the segmentation I mentioned before (build an array of the aggregates, then query each aggregate individually).

It'll take about a day for this link to work, but you can subscribe to it here: http://support.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=728756

Note: I filed this mainly for tracking purposes. One of the other developers may know whether we've already addressed it somewhere; 7.3 code isn't exactly new, and ONTAP 8.x has been around for a while now.

POOJA_HP_2013

Thanks Coon for the response.

I cross-checked our code, and we already implement the segmentation approach suggested in your previous post. Here is the list of APIs for which we use -iter and related APIs:

quota-report

qtree-list-

aggr-list-info

perf-object-get-instances

perf-object-instance-list-info

volume-list-info

cifs-share-list

Regarding jumbo frames, could you please be a little more descriptive? I am not aware of how to implement this in the code to avoid the truncated output. Could you please provide some sample code that shows how to use jumbo frames?

Thanks & Regards,

Pooja

coon

Jumbo frames are a network configuration on the NICs of the storage system, the network in between, and the host issuing the API calls. We're investigating this further, but in the packet trace you sent me, these problems seem to happen only when a response is larger than 1 MTU/MSS (the network interface PDU size). In your packet trace, Data ONTAP has a 1500 MTU set, and when the zapi response is bigger than 1 MTU/MSS, the output is truncated. Jumbo frames (9k MTU) are fairly well known now, even though they are technically a nonstandard MTU. They would increase the amount of data Data ONTAP could send back in a single frame to about 9k.
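To make the MTU/MSS arithmetic concrete: the 1514-byte frames in the trace are a 1500-byte MTU plus the 14-byte Ethernet header, and with minimal 20-byte IPv4 and TCP headers each segment carries at most 1460 bytes of the HTTP/ZAPI reply. The small Java sketch below assumes no VLAN tag and no IP or TCP options (TCP timestamps, for example, would take another 12 bytes per segment).

```java
/**
 * Back-of-the-envelope frame and segment sizes for IPv4 over Ethernet.
 * Assumes no VLAN tag and no IP or TCP options.
 */
class SegmentMath {
    static final int ETHERNET_HEADER = 14; // destination MAC + source MAC + EtherType
    static final int IPV4_HEADER = 20;     // minimum IPv4 header
    static final int TCP_HEADER = 20;      // minimum TCP header

    // Frame size as a capture tool typically reports it (FCS not included).
    static int frameSize(int mtu) {
        return mtu + ETHERNET_HEADER;
    }

    // Largest TCP payload (MSS) one segment can carry at this MTU.
    static int mss(int mtu) {
        return mtu - IPV4_HEADER - TCP_HEADER;
    }

    // Segments needed to carry a reply of `bytes` bytes (ceiling division).
    static int segmentsNeeded(int bytes, int mtu) {
        int payload = mss(mtu);
        return (bytes + payload - 1) / payload;
    }

    public static void main(String[] args) {
        System.out.println(frameSize(1500));            // 1514
        System.out.println(mss(1500));                  // 1460
        System.out.println(mss(9000));                  // 8960
        System.out.println(segmentsNeeded(4000, 1500)); // 3
        System.out.println(segmentsNeeded(4000, 9000)); // 1
    }
}
```

Under these assumptions a 4,000-byte reply needs three segments at a 1500-byte MTU but only one at a 9000-byte jumbo MTU. A reply larger than about 8,960 bytes would still span several segments, so jumbo frames raise the threshold rather than remove it.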

Changing the network path between the host issuing the API calls and Data ONTAP may not be easy.

The easier option would be to write the code to ask for smaller chunks of data: build the object list within your code and then query individual object details, instead of asking for all of them at once (as in the aggr-space-list-info query you provided the output for).

You called aggr-space-list-info with no aggregate value (effectively requesting the entire list). If you issued separate calls instead (aggr-space-list-info for aggr1, aggr-space-list-info for aggr2, and so on), I believe each response could fit within a single MTU, which would work around the problem without infrastructure changes.

coon

If I were programming this, I'd probably also write into my code a routine that takes any result that looks like this:

     "<results reason="debugging bypassed xml parsing" status="failed" errno="13001"/>"

and returns a "response too large" error. That would let me catch when it happens and identify alternative ways to gather the same information in smaller chunks.
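A minimal Java sketch of such a routine, assuming you have the raw response text before it reaches an XML parser (the thread does not establish whether NaServer.invokeElem exposes it). `ResponseTooLargeException` and `BypassedParseGuard` are hypothetical names, and treating errno 13001 as "response too large" is the workaround heuristic described above, not documented SDK behavior.

```java
import java.util.regex.Pattern;

/** Hypothetical exception type for this sketch; it is not part of the NetApp SDK. */
class ResponseTooLargeException extends Exception {
    ResponseTooLargeException(String message) {
        super(message);
    }
}

class BypassedParseGuard {
    // A <results> element carrying errno="13001" (attributes in any order), as in the
    // "debugging bypassed xml parsing" line seen in the apitest output.
    private static final Pattern BYPASSED_PARSE =
            Pattern.compile("<results\\b[^>]*\\berrno\\s*=\\s*\"13001\"[^>]*>");

    static boolean isBypassedParse(String rawResponse) {
        return rawResponse != null && BYPASSED_PARSE.matcher(rawResponse).find();
    }

    // Returns the raw response unchanged, or throws if it carries the 13001 marker.
    static String checked(String apiName, String rawResponse) throws ResponseTooLargeException {
        if (isBypassedParse(rawResponse)) {
            throw new ResponseTooLargeException(apiName
                    + ": errno 13001 (debugging bypassed xml parsing), treating as response too large;"
                    + " retry with smaller per-object or -iter queries");
        }
        return rawResponse;
    }

    public static void main(String[] args) {
        String tail = "<results reason=\"debugging bypassed xml parsing\" status=\"failed\" errno=\"13001\"/>";
        System.out.println(isBypassedParse(tail));                          // true
        System.out.println(isBypassedParse("<results status=\"passed\"/>")); // false
    }
}
```

A regular expression is used instead of a DOM parser because the marker can follow truncated XML, which a strict parser would reject before the check could run.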

This is completely separate from the discussion we are having within NetApp about why it happens and whether/how to address it; it's just a way to work around the issue for now.

POOJA_HP_2013

Hi Coon,

All the Malformed XML errors we have reported were found by our customers. I tried to reproduce them on our set-ups but couldn't. Could you please let me know if there is any way to reproduce this error? That might help us debug the issue.

"If I were programming this, I'd probably also write in to my code a routine that simply takes any result that looks like this:

     "<results reason="debugging bypassed xml parsing" status="failed" errno="13001"/>""

I observed this in the apitest output for some of the APIs. However, since we have never hit malformed XML errors in our set-ups, we aren't sure how the ONTAPI SDK we use in our code handles it.

Here's a code snippet of how we use the APIs to get the output (as NaElement):

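import netapp.manage.NaElement;
import netapp.manage.NaServer;

NaElement elemIn = new NaElement(api); // api is "volume-list-info", "nfs-exportfs-list-rules", etc.
elemIn.addNewChild(params1_name, params1_value); // one addNewChild call per input parameter

NaServer naServer = new NaServer(address, 1, 0); // address: storage system host name or IP
NaElement elemOut = naServer.invokeElem(elemIn); // reply returned as an NaElement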

I am not sure how "naServer.invokeElem(elemIn)" would return the output if it contains "<results reason="debugging bypassed xml parsing" status="failed" errno="13001"/>". If it is returned as part of the parsed XML output, we could add a check in our code, as you suggested.

Regards,

Pooja

POOJA_HP_2013

Hi Coon,

Any updates on my previous post?

I went through the exceptions we have encountered so far in customers' environments again, and here are some observations:

  1. The Malformed XML error is thrown from the NaServer.java class (which is not part of the SE code), so there isn't much the SE side can do to handle it if it's caused by truncated data. The exception is thrown before we even get the response back from NaServer:

               "netapp.manage.NaProtocolException: Malformed XML
                         at netapp.manage.NaServer.invokeElem(NaServer.java:644)"

  2. Here are some of the APIs that throw the exception:

          API: aggr-list-info

     As you suggested, we get the aggregate names using this API and then iterate through the list using aggr-space-list-info. However, here the exception is thrown while getting the aggregate names themselves. Any suggestions on how this could be avoided, if it can be?

         API: disk-list-info

     Here, we first try to get the disk drives using the disk-list-info API. When that fails, we retry using the CLI, but that throws the exception again.

          API: volume-list-info

     Here, we first try to get all the volumes using the volume-list-info API and, if that fails, retry with "<volume-list-info-iter-next>", but this also throws the Malformed XML error, as shown below:

         [2013-03-09 11:43:54 Streamer-17             ] .NetAppNativeMethod(tapp.NetAppPlexProvider) Protocol Exception while processing: <volume-list-info><verbose>true</verbose></volume-list-info>

          [2013-03-09 11:43:54 Streamer-17             ] ntapiDataCollection(tapp.NetAppPlexProvider) NaProtocolException getting plexes for: 0118043593.  Retrying using iterator.

          [2013-03-09 11:44:33 Streamer-17             ] .NetAppNativeMethod(tapp.NetAppPlexProvider) Making ONTAPI call:(10.35.10.36): <volume-list-info-iter-next><maximum>10</maximum><tag>28429880056655277</tag></volume-list-info-iter-next>

          [2013-03-09 11:45:42 Streamer-17             ] .NetAppNativeMethod(tapp.NetAppPlexProvider) Protocol Exception while processing: <volume-list-info-iter-next><maximum>10</maximum><tag>28429880056655277</tag></volume-list-info-iter-next>

          [2013-03-09 11:46:42 Streamer-17             ] .NetAppPlexProvider(tapp.NetAppPlexProvider) Can't enumerate APPIQ_NetAppPlex

netapp.manage.NaProtocolException: Malformed XML
          at netapp.manage.NaServer.invokeElem(NaServer.java:644)

         API: snapshot-list-info
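The -iter pattern behind the volume-list-info-iter-next call in the log above can be sketched as a start/next/end loop that keeps every reply small from the first call, instead of only falling back to it after the bulk call fails. A minimal Java sketch: `IterCaller` is a hypothetical transport interface, and the element names used (the `tag` returned by volume-list-info-iter-start, the `maximum`/`tag` inputs and `records`/`volume-info` outputs of volume-list-info-iter-next, and volume-list-info-iter-end) are assumptions based on the 7-mode iterator APIs; verify them against the ZAPI documentation for your release.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical transport: sends one ZAPI request body and returns the raw <results> XML. */
interface IterCaller {
    String call(String requestXml) throws Exception;
}

class VolumeIterator {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Text of the first element named `tag` anywhere in the document, or null.
    static String firstText(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    // Text of the first direct child element named `tag`, or null.
    static String childText(Element parent, String tag) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && tag.equals(n.getNodeName())) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }

    // Pages through volumes with at most `maximum` records per iter-next reply.
    static List<String> listVolumeNames(IterCaller caller, int maximum) throws Exception {
        String tag = firstText(parse(caller.call("<volume-list-info-iter-start/>")), "tag");
        if (tag == null) {
            throw new IllegalStateException("volume-list-info-iter-start returned no tag");
        }
        List<String> names = new ArrayList<>();
        try {
            while (true) {
                String request = "<volume-list-info-iter-next><maximum>" + maximum
                        + "</maximum><tag>" + tag + "</tag></volume-list-info-iter-next>";
                Document page = parse(caller.call(request));
                String records = firstText(page, "records");
                if (records == null || Integer.parseInt(records) == 0) {
                    break; // no more records: the iterator is exhausted
                }
                NodeList infos = page.getElementsByTagName("volume-info");
                for (int i = 0; i < infos.getLength(); i++) {
                    String name = childText((Element) infos.item(i), "name");
                    if (name != null) {
                        names.add(name);
                    }
                }
            }
        } finally {
            try {
                // Release the iterator on the storage system, whether or not paging succeeded.
                caller.call("<volume-list-info-iter-end><tag>" + tag + "</tag></volume-list-info-iter-end>");
            } catch (Exception cleanupFailure) {
                // Best effort: a failed iter-end must not hide the original error.
            }
        }
        return names;
    }
}
```

Calling `VolumeIterator.listVolumeNames(caller, 2)` requests at most two volumes per reply. The log shows iter-next failing even with `<maximum>10</maximum>`, so a smaller maximum may be worth trying; whether it avoids the truncation on the affected systems is untested.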

         

coon

Pooja,

We are still discussing this issue. Can you say whether there is any consistency in the Data ONTAP versions that encounter this error? I recall seeing somewhere (perhaps in the case notes) that this was primarily a 7.3.x issue.

POOJA_HP_2013

The customer encountered these issues with the manageontap 4.1 jar. However, we also tried 5.0 R1 and still got the errors.

Regards,

Pooja

coon

Apologies, I was asking if there is any commonality to the Data ONTAP version that is receiving these queries.
