SRM SRA 2.1 test-failover-start cloning failure

In the interests of helping others, I thought it worth posting the following.

We have just upgraded from vSphere 4.1 to 5.1 and, along with that, upgraded SRM to 5.1. It came as a surprise that a test failover would fail when the SRA tried to clone the datastore volumes. After much investigation, the issue turned out to be as follows:

1. The actual cloning of the snapshots worked as expected.

2. The SRA then attempted to export the clone with a long list of IPv4 and IPv6 addresses for all of the ESXi hosts.

3. The export would succeed, but the entry in /etc/exports was truncated partway through an address, at approximately character 4112 (which looks suspiciously like a 4k buffer somewhere :-( )

4. The SRA would then read the export rules back in and hit an XML parsing error, which it logged as a failure and passed back to SRM. SRM in turn reported this as a failure to clone the volume.
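
To illustrate how a long host list can end up cut in the middle of an address, here is a rough Python sketch. It is not the SRA's code: the export path, the sample addresses, the rw= option layout and the 4096-character limit are all assumptions chosen for illustration (we observed the cut at roughly character 4112, and we have not confirmed where the limit actually lives).

```python
# Illustrative only: path, addresses and the 4096 limit are assumptions,
# not values taken from the SRA or from ONTAP.
LIMIT = 4096

def build_host_list(n_hosts):
    """Return one made-up IPv4 and one bracketed IPv6 entry per host (n_hosts <= 254)."""
    hosts = []
    for i in range(n_hosts):
        hosts.append(f"192.168.0.{i + 1}")
        hosts.append(f"[fe80::250:56ff:fe00:{i:04x}]")
    return hosts

def first_broken_entry(prefix, hosts, limit=LIMIT):
    """Return the entry that a cut at `limit` characters would split, or None.

    The line is assumed to be prefix + ':'.join(hosts). A cut that falls
    exactly on an entry boundary does not split an entry, so it returns None.
    """
    pos = len(prefix)
    for host in hosts:
        end = pos + len(host)
        if pos < limit < end:
            return host
        pos = end + 1  # step over the ':' separator
    return None

prefix = "/vol/srm_clone -sec=sys,rw="   # hypothetical export rule prefix
hosts = build_host_list(128)
line = prefix + ":".join(hosts)
broken = first_broken_entry(prefix, hosts)  # entry split by the cut, if any
truncated = line[:LIMIT]                    # what would be left after the cut
```

With a second list of the same hosts for root access, the line would reach the limit with roughly half as many hosts.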

We could find no easy solution to this issue. Reducing the number of IP addresses or turning off IPv6 is not an option, so we have ended up hacking the code in failover.pl. We did consider filtering out the [fe80::] addresses, which possibly should be included, but in the end did the following:

1. We have broken the loop that iterates over the IP list so that the lists of rwhosts and roothosts no longer have the individual IP addresses in them.

2. We have added our ESX NFS network to the ontap_config.txt options read-write-hosts and root-hosts.

3. We have broken the check in the checkexports subroutine to return true when matching these 'extra hosts'.

The exports line now only contains the network rather than individual hosts and the corruption no longer occurs.
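
Because the per-host check is now bypassed, it is worth confirming separately that every ESXi NFS address really does fall inside the network that was exported. A minimal sketch using Python's standard ipaddress module follows; the function name and the sample addresses are hypothetical, and this is not part of the SRA.

```python
import ipaddress

def uncovered_hosts(host_ips, networks):
    """Return the host addresses that are not inside any of the given networks."""
    nets = [ipaddress.ip_network(n) for n in networks]
    missing = []
    for ip in host_ips:
        addr = ipaddress.ip_address(ip)
        # An IPv4 address is never reported as inside an IPv6 network, and vice versa.
        if not any(addr in net for net in nets):
            missing.append(ip)
    return missing

# Made-up example: one host sits outside the exported NFS network.
esx_ips = ["10.10.1.5", "10.10.1.77", "10.10.2.9"]
exported = ["10.10.1.0/24"]
missing = uncovered_hosts(esx_ips, exported)
```

An empty result means the single network entry covers every host that the original per-host list would have named.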

This solution is not ideal, as I don't like changing vendor-supplied code, but getting a solution that worked quickly was more important at this stage.

Re: SRM SRA 2.1 test-failover-start cloning failure

Glen,

What version of ONTAP was this?

Re: SRM SRA 2.1 test-failover-start cloning failure

8.1.2P4

Re: SRM SRA 2.1 test-failover-start cloning failure

7-mode or cDOT?

Re: SRM SRA 2.1 test-failover-start cloning failure

Sorry, I should have said in my first reply - 7-mode