OpenStack Discussions

Copy Offload Utility Image cloning unsuccessful

jonmills

Good morning, this is my first post to this forum. I'm tasked with building a large OpenStack Queens private cloud at NASA Goddard Space Flight Center. We purchased a FAS 2650 cluster running clustered ONTAP 9.3P4, with about 70TB of usable SSD storage. OpenStack is running well, and I'm just trying to establish tighter integration between Cinder/Glance and the NFS backend, to take advantage of features like Copy Offload. I'm having limited success there, though.

Basically, I can get it to work, but only when Glance & Cinder are hosted from the same flexvol. So, it's working right now, but this is not how I wanted to lay out my storage. I wanted Glance & Cinder at least in separate flexvols, and preferably balanced with one on one filer head node and the other on the second filer head node.

Current situation is this:

filer2:/images (contains both Glance, Cinder images -- copy offload works great)

Situations where copy offload fails:

 

1) filer1:/openstack/glance and filer2:/openstack/cinder (where 'openstack' is just a junction point in the export namespace).

 

This fails badly. First off, the driver doesn't seem to understand junction paths, and it ends up thinking that 'openstack' is a flexvol, when in fact it's just a directory (a mountpoint) in the namespace. You get stuff like this:

 

NaApiError: NetApp API failed. Reason - 13040:Volume "openstack" does not exist in Vserver "openstack". Reason: entry doesn't exist.
2018-08-29 09:37:27.965 32295 ERROR cinder.volume.drivers.netapp.dataontap.nfs_cmode Traceback (most recent call last):

But regardless, the driver cannot find the source file (meaning the glance image):

Discover file retries exhausted.
2018-08-29 09:23:29.326 32295 INFO
cinder.volume.drivers.netapp.dataontap.nfs_base
[req-4c82cd34-bc97-4820-be4c-dd134fcc727f
5cc84caf09f814620d952c9aa399c0ecaf8dbfa538c11143a6ed24bd06ece0f3
6e6d8ff081014c679f18ad4b818ffd4c - - -] Image cloning unsuccessful for
image 4b8371ec-e7ef-4081-ba36-629eb5a18957. Message: NetApp API failed.
Reason - 14935:Clone operation failed to start: Source file does not exist..

 

2) filer2:/images/glance and filer2:/images/cinder, where 'images' is a flexvol and 'glance' and 'cinder' are qtrees in that same flexvol. Fails with cinder volume agent saying it can't find the volume, or the volume doesn't exist, because the driver doesn't appear to understand what a qtree is.

 

3) filer1:/glance and filer2:/cinder, where 'glance' and 'cinder' are just flexvols, but hosted on different filer heads of the same ONTAP cluster. Same problem: the driver cannot find the source file (the Glance image), like:

 

Discover file retries exhausted.
2018-08-29 09:23:29.326 32295 INFO
cinder.volume.drivers.netapp.dataontap.nfs_base
[req-4c82cd34-bc97-4820-be4c-dd134fcc727f
5cc84caf09f814620d952c9aa399c0ecaf8dbfa538c11143a6ed24bd06ece0f3
6e6d8ff081014c679f18ad4b818ffd4c - - -] Image cloning unsuccessful for
image 4b8371ec-e7ef-4081-ba36-629eb5a18957. Message: NetApp API failed.
Reason - 14935:Clone operation failed to start: Source file does not exist..

 

4) filer2:/glance and filer2:/cinder, where 'glance' and 'cinder' are separate flexvols in the same aggr on the same filer head. This kinda sorta works, but the API still fails, and it falls back to doing what looks like a literal Unix 'mv' command. It also takes a long time because there's a timeout waiting on the API failure. It looks like this:

 

cinder.volume.drivers.netapp.dataontap.nfs_base
[req-03f1f945-cd36-417a-a378-4669b40e43c4
5cc84caf09f814620d952c9aa399c0ecaf8dbfa538c11143a6ed24bd06ece0f3
6e6d8ff081014c679f18ad4b818ffd4c - - -] Discover file retries exhausted.
2018-08-29 12:42:53.993 23332 WARNING
cinder.volume.drivers.netapp.dataontap.nfs_base
[req-03f1f945-cd36-417a-a378-4669b40e43c4
5cc84caf09f814620d952c9aa399c0ecaf8dbfa538c11143a6ed24bd06ece0f3
6e6d8ff081014c679f18ad4b818ffd4c - - -] Exception moving file
/var/lib/cinder/mnt/5905fe460e578283c8496e21d8b3a6da/25b9beee-3812-4941-946c-bb7f753c85f5.
Message - Unexpected error while running command.
Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf mv
/var/lib/cinder/mnt/5905fe460e578283c8496e21d8b3a6da/25b9beee-3812-4941-946c-bb7f753c85f5
/var/lib/cinder/mnt/5905fe460e578283c8496e21d8b3a6da/img-cache-58aaa550-ff19-4372-bb93-634af3715dd5
Exit code: 1
Stdout: u''
Stderr: '/bin/mv: failed to access
\xe2\x80\x98/var/lib/cinder/mnt/5905fe460e578283c8496e21d8b3a6da/img-cache-58aaa550-ff19-4372-bb93-634af3715dd5\xe2\x80\x99:
Not a directory\n': ProcessExecutionError: Unexpected error while
running command.
2018-08-29 12:42:53.995 23332 INFO
cinder.volume.drivers.netapp.dataontap.nfs_base
[req-03f1f945-cd36-417a-a378-4669b40e43c4
5cc84caf09f814620d952c9aa399c0ecaf8dbfa538c11143a6ed24bd06ece0f3
6e6d8ff081014c679f18ad4b818ffd4c - - -] Performing post clone for
volume-5fe2827f-e865-49f4-bf64-d32419cfc2fc


In all cases, Glance has been configured to expose multiple location metadata as per this post by D. Cain:
https://community.netapp.com/t5/OpenStack-Discussions/Copy-offload-unsuccessful-Source-host-details-not-found/m-p/108795#M240.
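
For readers who have not seen that setup, here is a rough sketch of what "exposing multiple location metadata" usually involves. It is written from memory of the NetApp deployment documentation, not taken from this thread, so treat every name in it as an assumption to verify: the glance-api.conf options `show_multiple_locations`, `show_image_direct_url` and `filesystem_store_metadata_file`, and the JSON keys `id`, `share_location`, `mountpoint` and `type`. The helper name and the example address and paths are hypothetical.

```python
import json


def build_store_metadata(share_host, export_path, mountpoint, store_id="NetAppNFS"):
    """Build the metadata entry describing where the Glance store lives on NFS.

    Assumed format (verify against your Glance/NetApp docs): a JSON object
    with an NFS URL in "share_location" and the local mount in "mountpoint".
    """
    return {
        "id": store_id,  # hypothetical label
        "share_location": "nfs://{}{}".format(share_host, export_path),
        "mountpoint": mountpoint,
        "type": "nfs",
    }


if __name__ == "__main__":
    # Example values only; substitute the LIF address and export you mount.
    entry = build_store_metadata("192.0.2.10", "/glance", "/var/lib/glance/images")
    # The output would be saved to the file named by
    # filesystem_store_metadata_file in glance-api.conf (assumed option name).
    print(json.dumps(entry, indent=4))
```

The point of the file is that Cinder can learn which NFS export backs an image, which the copy offload path needs in order to locate the source file on the filer.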

 

And here is my ontap backend config section from cinder.conf:

 

[gpc-xxxxx]
volume_backend_name = gpc-xxxxx
volume_driver = cinder.volume.drivers.netapp.common.NetAppDriver
trace_flags = method,api
netapp_api_trace_pattern = ^(?!(perf)).*$
netapp_storage_family = ontap_cluster
netapp_copyoffload_tool_path = /etc/cinder/na_copyoffload_64
nas_secure_file_operations = True
nas_secure_file_permissions = True
netapp_server_hostname = openstack
netapp_vserver = openstack
expiry_thres_minutes = 1
thres_avl_size_perc_stop = 100
thres_avl_size_perc_start = 100
netapp_server_port = 443
netapp_storage_protocol = nfs
nfs_shares_config = /etc/cinder/nfs_shares
nfs_mount_attempts = 3
nfs_mount_options = 'rw,noatime,vers=4.0,rsize=65536,wsize=65536,hard,proto=tcp,timeo=600,retrans=2,sec=sys,lookupcache=pos'
netapp_login = cinder
netapp_password = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
netapp_transport_type = https
image_volume_cache_enabled = False


There are no unix permissions issues, no nfs export issues, and no API auth issues. The API and/or copy offload utility just fails to "find" the glance image unless those images are inside the same flexvol as the Cinder volume backend that the driver is pointed at.

 

Any assistance, or tips, would be hugely appreciated!

 

Cheers,

 

Jonathan Mills


nareshkumarg

OpenStack works well with hostnames; I saw no issues with them (tested with the master branch).
OpenStack handles both hostnames and IP addresses gracefully.

 

Jonmills, I tested the issues you reported against the master branch:
1) filer1:/openstack/glance and filer2:/openstack/cinder (where 'openstack' is just a junction point in the export namespace).
This doesn't work and fails with error : NaApiError: NetApp API failed. Reason - 13040:Volume "openstack" does not exist in Vserver "openstack". Reason: entry doesn't exist.

Test 1: Works fine with a mountpoint in the Cinder NFS export (lifip:/openstack/cindrel4, where openstack is just a mountpoint and not a flexvol).
Mounts:
lifip:/openstack/cindrel4 on /opt/stack/data/cinder/mnt/3c2486470393d6f3fc95899c0572cdca type nfs4
lifip:/glance on /opt/stack/data/glance/images type nfs4
lifip:/ on /mnt/rootvolume type nfs4

root@scspa0547852001:~# ls -l /mnt/rootvolume/
total 20
drwxr-xr-x 2 stack stack 4096 Dec 17 03:53 cindrel
drwxr-xr-x 2 stack stack 4096 Dec 17 00:47 cindrel2
drwxr-xr-x 2 stack stack 4096 Dec 17 03:49 glance
drwxrwxrwx 3 stack stack 4096 Dec 17 00:55 openstack --> this is just a directory
drwxr-xr-x 3 nobody 4294967294 4096 Dec 14 05:05 openstackk

Copy offload works fine without issues. Refer to i2v1.log:
DEBUG oslo_concurrency.processutils [None req-e843f0e1-1259-4d47-b1e9-3322849d3f28 demo None] CMD "/etc/cinder/copyoffload/na_copyoffload_64 10.61.65.231 10.61.65.231 /glance/f3887838-fa33-48ae-b34f-c17ece14f855 /openstack/cindrel4/02a50221-a62c-41b5-a308-9e4243a5f0f5" returned: 0 in 2.044s {{(pid=3218) execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:409}}

Test 2: Doesn't work with a mountpoint in the Cinder NFS export (lifip:/openstack/cindrel4, where openstack is just a mountpoint and not a flexvol)
together with a mountpoint in the Glance export path (lifip:/glancedir/glancevol2, where glancedir is just a mountpoint and not a flexvol).
Mounts:
lifip:/openstack/cindrel4 on /opt/stack/data/cinder/mnt/3c2486470393d6f3fc95899c0572cdca type nfs4
lifip:/glancedir/glancevol2 on /opt/stack/data/glance/images type nfs4

root@scspa0547852001:~# ls -ltr /mnt/rootvolume/
total 24
drwxr-xr-x 3 nobody 4294967294 4096 Dec 14 05:05 openstackk
drwxr-xr-x 2 stack stack 4096 Dec 17 00:47 cindrel2
drwxrwxrwx 3 stack stack 4096 Dec 17 00:55 openstack
drwxr-xr-x 2 stack stack 4096 Dec 17 03:53 cindrel
drwxr-xr-x 3 stack stack 4096 Dec 17 04:21 glancedir --> this is just a directory
drwxr-xr-x 2 stack stack 4096 Dec 17 08:02 glance

Copy offload returns error 2 and falls back to a normal copy. Refer to i5v1.log:
Error: DEBUG oslo_concurrency.processutils [None req-40c66156-dbe0-4aac-8ecf-eb247eeb810f demo None] CMD "/etc/cinder/copyoffload/na_copyoffload_64 10.61.65.231 10.61.65.231 /glance/17cc39c1-a60f-4d58-b6a1-6d8ab358a5dd /openstack/cindrel4/1f1785c3-f23c-499d-8184-698ca6739c75" returned: 2 in 0.030s {{(pid=6924) execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:409}}

In neither of the above tests do we see the error you reported: "NaApiError: NetApp API failed. Reason - 13040:Volume "openstack" does not exist in Vserver "openstack". Reason: entry doesn't exist.".
The error in test 2 looks like an error in the na_copyoffload_64 tool and not an OpenStack issue.
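
When comparing runs like Test 1 and Test 2, it can help to pull the copy offload invocation and its exit status out of the cinder-volume debug log. The following is a minimal sketch based only on the log lines quoted in this thread. The argument order (source address, destination address, source path, destination path) is inferred from those lines, not from the tool's documentation, and the function name is hypothetical.

```python
import re

# Terminal color codes, either as a real ESC byte or pasted as the text "^[".
ANSI_RE = re.compile(r"(?:\x1b|\^\[)\[[0-9;]*m")
# Shape of the oslo_concurrency debug line quoted in this thread.
CMD_RE = re.compile(r'CMD "(?P<cmd>[^"]+)" returned: (?P<rc>\d+) in (?P<secs>[\d.]+)s')


def parse_copyoffload(line):
    """Return details of a na_copyoffload call found in a log line, else None."""
    match = CMD_RE.search(ANSI_RE.sub("", line))
    if match is None or "na_copyoffload" not in match.group("cmd"):
        return None
    parts = match.group("cmd").split()
    if len(parts) != 5:  # tool + 4 arguments (assumed layout)
        return None
    _tool, src_ip, dst_ip, src_path, dst_path = parts
    return {
        "src_ip": src_ip,
        "dst_ip": dst_ip,
        "src_path": src_path,
        "dst_path": dst_path,
        "returncode": int(match.group("rc")),
        "seconds": float(match.group("secs")),
    }


if __name__ == "__main__":
    sample = ('DEBUG oslo_concurrency.processutils [None req-x demo None] CMD '
              '"/etc/cinder/copyoffload/na_copyoffload_64 10.61.65.231 10.61.65.230 '
              '/glance/img /cindrel/vol" returned: 0 in 2.072s')
    print(parse_copyoffload(sample))
```

A nonzero `returncode` (2 in Test 2) marks the runs where the tool itself failed and Cinder fell back to a normal copy.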

 

2) filer2:/images/glance and filer2:/images/cinder,
where 'images' is a flexvol and 'glance' and 'cinder' are qtrees in that same flexvol.
Fails with cinder-volume agent saying it can't find the volume or the volume doesn't exist, because the driver doesn't appear to understand what a qtree is.

A local test yielded the same error as yours. Refer to i6v1.log:
WARNING cinder.volume.drivers.netapp.dataontap.nfs_base [-] Exception during cache cleaning lifip:/images/cinder. Message - Volume /images/cinder not found .: NetAppDriverException: Volume /images/cinder not found.

Cinder is not supported with qtrees. This is not a defect but a request for an unsupported feature.

 

3) filer1:/glance and filer2:/cinder, where 'glance' and 'cinder' are just flexvols, but hosted on different filer heads of the same ontap cluster. Same problem, driver cannot find source file glance image
I tested this in my local environment with the following mount configuration, where the Cinder and Glance volumes are mounted using two different LIFs:
lifip:/glance on /opt/stack/data/glance/images type nfs4
lifip_2:/cindrel on /opt/stack/data/cinder/mnt/482e2a043ff27f331eee6ee7d86a5ae6 type nfs4

Copy offload works fine. Refer to i7v1.log:
DEBUG oslo_concurrency.processutils [None req-254581e6-016c-4a5d-8b97-4f3bb1123533 demo None] CMD "/etc/cinder/copyoffload/na_copyoffload_64 10.61.65.231 10.61.65.230 /glance/77c78a9a-a1d8-4375-8e6b-d42c2a33465f /cindrel/35f27f25-a2d9-4456-b055-fda0bfc22c33" returned: 0 in 2.072s {{(pid=18181) execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:409}}

 

4) filer2:/glance and filer2:/cinder, where 'glance' and 'cinder' are separate flexvols in the same aggr on the same filer head. This kinda sorta works, but the API still fails, and it falls back to doing what looks like a literal Unix 'mv' command. It also takes a long time because there's a timeout waiting on the API failure. It looks like this:
No, we don't have this issue. The tests above show that.

 

OpenStack works well with hostnames in the latest version.
It seems that OpenStack has been patched to handle hostname resolution.
The issue was observed in the Queens release, and I think it is safe to say that it is not observed in Rocky (R) and Stein (S).

jonmills

There was never a hostname resolution problem.

 

The problem was that the NetApp driver hashes the pool_name (which is just the form of the NFS export, like host:/export) to derive a long string that names the Cinder directory under the /var/lib/cinder/mnt mountpoint. In some parts of the NetApp driver it was hashing hostname:/export/path, and in other parts it was hashing ipaddress:/export/path, so it came up with different hashes and thus different directory names. The result was a 'does not exist' error: the driver looked for the path under the wrong name and did not find it. The simple fix for me was to drop all the hostnames and use the IP address of the filer controller in question wherever the export or pool_name is specified.
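
The mismatch described above can be illustrated with a small sketch. It assumes the mount directory name is the MD5 hex digest of the share string, which is consistent with the 32-character directory names in the logs earlier in this thread but is not taken from the driver source. The function names and the example hostname and address are hypothetical. The second helper flags share entries (for example, lines from the nfs_shares file) that use a hostname rather than an IP address.

```python
import hashlib
import ipaddress


def mount_dir_name(share):
    """Assumed naming scheme: MD5 hex digest of the 'host:/export' string."""
    return hashlib.md5(share.encode("utf-8")).hexdigest()


def host_is_ip(share):
    """True if the host part of 'host:/export' is an IP literal."""
    host = share.split(":/", 1)[0].strip("[]")
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def shares_using_hostnames(lines):
    """Return the share entries whose host part is a name, not an IP address."""
    flagged = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        share = line.split()[0]  # ignore any trailing mount options
        if not host_is_ip(share):
            flagged.append(share)
    return flagged


if __name__ == "__main__":
    by_name = "filer2:/cinder"      # hypothetical hostname form
    by_ip = "192.0.2.20:/cinder"    # hypothetical IP form of the same export
    # Same export, two spellings, two different directory names.
    print(mount_dir_name(by_name))
    print(mount_dir_name(by_ip))
    print(shares_using_hostnames([by_name, by_ip]))
```

Because the two spellings of one export produce unrelated digests, any code path that uses the other spelling looks in a directory that was never mounted. Using IP addresses everywhere removes the second spelling.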
