ONTAP Rest API Discussions

REST API returns inconsistent clone parent info

mbuchanan
2,016 Views

We have a set of training environments that are FlexClones of "gold" master volumes.  We refresh them nightly by destroying the old clones and creating new ones.  The refresh process has three parallel threads of execution, one for each geographical region's training environments.  Occasionally we see the REST API return an empty clone.parent_snapshot property for a clone volume ("VolA") when one thread queries the volumes on the filer while another thread is destroying or creating VolA.

 

Below is an example of this behavior where process 2011662 queries the volumes on the SVM while process 2011635 destroys volilnx002_trnaok_clone_e418_v0:

2022/08/16 23:30:07 waflhook(2011662): Querying LVM on host trnt13
2022/08/16 23:30:10 waflhook(2011635): Removing filer vspsunepictrn01 volume(s) volilnx002_trnaok_clone_e418_v0
2022/08/16 23:30:16 waflhook(2011635): Deleting filer vspsunepictrn01 snapshot e418-v0 on volume(s) volilnx002_trnaok_clone_e415_v1247
2022/08/16 23:30:20 waflhook(2011662): Clone volume with undef parent snapshot

This is the record returned by the API for volilnx002_trnaok_clone_e418_v0:

 

{
  "num_records": 1,
  "records": [
    {
      "uuid": "4f7574a7-1d1e-11ed-9f74-00a098cd8ee3",
      "name": "volilnx002_trnaok_clone_e418_v0",
      "clone": {
        "parent_volume": {
          "uuid": "1b084d37-1cbc-11ed-9f74-00a098cd8ee3",
          "_links": {
            "self": {
              "href": "/api/storage/volumes/1b084d37-1cbc-11ed-9f74-00a098cd8ee3"
            }
          },
          "name": "volilnx002_trnaok_clone_e415_v1247"
        },
        "parent_snapshot": {}
      },
      "_links": {
        "self": {
          "href": "/api/storage/volumes/4f7574a7-1d1e-11ed-9f74-00a098cd8ee3"
        }
      }
    }
  ],
  "_links": {
    "self": {
      "href": "/api/storage/volumes?start.uuid=4e0511fc-1b8a-11ed-9f74-00a098cd8ee3&fields=clone.parent_volume%2Cclone.parent_snapshot&max_records=1"
    },
    "next": {
      "href": "/api/storage/volumes?start.uuid=4f7574a7-1d1e-11ed-9f74-00a098cd8ee3&fields=clone.parent_volume%2Cclone.parent_snapshot&max_records=1"
    }
  }
}

 

Is this the expected behavior from the API?  Should we simply ignore volumes with parent_volume set but parent_snapshot undefined?  Should we avoid querying volumes on an SVM or cluster while another process could be creating or destroying volumes?
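In case it helps frame the question, the defensive handling we're considering looks roughly like the sketch below (illustrative Python against the /api/storage/volumes query shown above; our actual client is Perl, and the host, credentials, and SVM name are placeholders): treat a clone record whose parent_volume is populated but whose parent_snapshot is empty as in-flight, and skip or re-query it.

# Illustrative sketch only -- the real client is Perl/REST::Client.
# Host, credentials, and SVM name are placeholders.
import requests

def list_clone_info(host, auth, svm):
    resp = requests.get(
        f"https://{host}/api/storage/volumes",
        params={
            "svm.name": svm,
            "fields": "clone.parent_volume,clone.parent_snapshot",
        },
        auth=auth,
        verify=False,
    )
    resp.raise_for_status()
    return resp.json().get("records", [])

def stable_clones(records):
    """Yield only records that do not look 'in flight' (parent volume set
    but parent snapshot still empty, as in the response above)."""
    for rec in records:
        clone = rec.get("clone", {})
        if clone.get("parent_volume") and not clone.get("parent_snapshot"):
            continue  # probably being created/destroyed; skip or re-query later
        yield rec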


mbuchanan
1,924 Views

Here is an example of the API returning a clone volume with an empty parent snapshot while the clone is being created.  The API returns the partially-formed volume volilnx002_trnstl_clone_e417_v1253 in response to process 200876's query while process 190557 is creating the clone.

 

2022/08/19 12:01:20 waflhook(200876): Querying LVM on host trnt12
2022/08/19 12:01:24 waflhook(190557): Creating filer vspsunepictrn01 volume clone volilnx002_trnstl_clone_e417_v1253 from volume volilnx002_trnstl_clone_e408_v158 snapshot e417-v1253
2022/08/19 12:01:31 waflhook(200876): Clone volume with undef parent snapshot

This is the response returned to process 200876:

{
  "_links": {
    "self": {
      "href": "/api/storage/volumes?start.uuid=8b5fa9b5-899d-483b-a55c-effb1634ffa8&max_records=1&fields=clone.parent_volume%2Cclone.parent_snapshot"
    },
    "next": {
      "href": "/api/storage/volumes?start.uuid=8ebdb3e6-1fe0-11ed-9f74-00a098cd8ee3&max_records=1&fields=clone.parent_volume%2Cclone.parent_snapshot"
    }
  },
  "num_records": 1,
  "records": [
    {
      "uuid": "8ebdb3e6-1fe0-11ed-9f74-00a098cd8ee3",
      "name": "volilnx002_trnstl_clone_e417_v1253",
      "clone": {
        "parent_volume": {
          "name": "volilnx002_trnstl_clone_e408_v158",
          "uuid": "a1e40781-1b29-11ed-9f74-00a098cd8ee3",
          "_links": {
            "self": {
              "href": "/api/storage/volumes/a1e40781-1b29-11ed-9f74-00a098cd8ee3"
            }
          }
        },
        "parent_snapshot": {}
      },
      "_links": {
        "self": {
          "href": "/api/storage/volumes/8ebdb3e6-1fe0-11ed-9f74-00a098cd8ee3"
        }
      }
    }
  ]
}

 

RobertSimac
1,637 Views

Matt, not sure if it's applicable to your problem, but I have recently observed some multi-threading related issues, which I tested and attributed to the use of the ONTAP REST Python library's "with rest_connection:" syntax in my code.

 

I have posted about it here: https://community.netapp.com/t5/ONTAP-Rest-API-Discussions/Python-with-context-manager-and-rest-apis-multi-threading/m-p/442026/highlight/true#M468

 

If you are using Python -and- the 'with' syntax, the problem may be the threads stepping on each other and peeking into another thread's connection data... causing random and otherwise unexplainable problems...

 

Note this is observed only for the ONTAP REST Python library, and only for the 'with rest_connection:' syntax. No other languages have similar syntax, AFAIK, so the problem is most likely limited to Python...
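To make the pattern concrete, here is a minimal, hypothetical sketch of what I mean (cluster names and credentials are made up; my understanding is that entering the 'with' block stores the active connection in module-level state shared by every thread in the process):

# Hypothetical sketch of the problematic pattern -- cluster names and
# credentials are placeholders.  As I understand it, "with HostConnection(...)"
# sets a process-wide default connection, so two threads entering/exiting
# these blocks concurrently can end up issuing requests against the other
# thread's cluster.
import threading
from netapp_ontap import HostConnection
from netapp_ontap.resources import Volume

def list_volumes(cluster):
    with HostConnection(cluster, username="admin", password="***", verify=False):
        # May silently use the other thread's connection if that thread
        # swapped the default in the meantime.
        return [vol.name for vol in Volume.get_collection()]

threads = [threading.Thread(target=list_volumes, args=(c,))
           for c in ("cluster-a", "cluster-b")]
for t in threads:
    t.start()
for t in threads:
    t.join()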

mbuchanan
1,629 Views

Thanks, Robert.  I am using Perl, with a homegrown NetApp REST client class built on top of the REST::Client module, and what I called "threads" in my original post are actually implemented as processes forked from a common parent but in separate address spaces.  I am not familiar enough with Python's thread implementation to say whether our problems are related.

 

NetApp assigned BURT 1503570 to track my problem.  It may not be public, because I was not able to find it with a search.  I worked around my problem by adding some locking to my NetApp Perl modules.  Specifically, I created a lock for each vserver.  The volume query takes the lock in shared mode, and volume create and destroy operations take that lock in exclusive mode.  That prevents the query from getting partial information on volumes being created or destroyed.  It's not perfect because it protects my code only from itself, not from other clients that could be touching the volumes.  My hope is that the fix for the BURT will handle this locking on the API server side so the client doesn't need to consider it.
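For anyone who wants the flavor of the workaround, it amounts to something like the sketch below (shown in Python for brevity even though my real modules are Perl; the lock-file path is arbitrary): an advisory per-vserver lock file, taken shared by the query path and exclusive by clone create/destroy.

# Illustrative Python equivalent of the Perl locking described above.
# Assumes a POSIX host; the lock-file path is arbitrary.
import fcntl
from contextlib import contextmanager

@contextmanager
def vserver_lock(svm_name, exclusive=False):
    # One advisory lock file per vserver.  Queries take it shared; clone
    # create/destroy operations take it exclusive, so a query never sees a
    # half-built or half-deleted volume from one of our own processes.
    path = f"/var/lock/waflhook-{svm_name}.lock"
    with open(path, "a") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)

# Query path:
#   with vserver_lock("vspsunepictrn01"):
#       records = query_volumes(...)
# Create/destroy path:
#   with vserver_lock("vspsunepictrn01", exclusive=True):
#       destroy_clone(...)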

RobertSimac
1,625 Views

Thanks, Matt, for the detailed description. Based on it, I am rather sure our problems are not related: your code runs as separate processes, unable to affect each other (at least on the client side), while my Python code runs as Python threads, sharing one address space and using the problematic 'with' syntax. But it is always good to double-check.. Thanks.
