SolidFire and HCI

SolidFire volume replication

CarlosBaptista
2,342 Views

Hi all,

 

I've created 5 volume replications between 2 nodes of an HCI cluster. The original volumes have been around for some time and each has about 10TB of data. How can I monitor the percentage of data that has already been replicated, or at least check whether replication has started and is running without errors? The only statuses shown in the web interface are "PausedDisconnected" and "ResumingConnected", which suggests that nothing is actually replicating.

 

Thanks

Carlos Baptista


elementx
2,278 Views

Hi Carlos,

 

> I've created 5 volume replications between 2 nodes of an HCI cluster. 

 

Do you mean between two storage clusters in an HCI cluster? You can't replicate within one and the same NetApp HCI storage cluster; you can snapshot & clone volumes in the same cluster, but for replication - sync or async - you need 2 separate storage clusters.

 

If you have 1 cluster, then that can't be set up. Assuming you have 2 clusters, I think replication cannot start because of MTU or some other problem.

 

Can you try to create 2 junk vols (one 1GiB on one cluster, another 1GiB on the other cluster) and see if those empty volumes can be replicated? If you have a network problem then that will likely fail as well.

 

You'd need MVIP to MVIP connectivity (so CL1NODES->CL2MVIP:443, and CL2NODES->CL1MVIP:443) and the same for SVIP. MTU 9000 is used on the SVIP network, unlike the MVIP network, where the MTU can be smaller, such as the standard 1500.
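To make the connectivity check above concrete, here is a minimal Python sketch, not an official NetApp tool. It tests whether a TCP port such as the remote MVIP's 443 can be opened, and computes the largest ping payload that fits in one unfragmented IPv4 packet for a given MTU (the MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header). The function names are hypothetical and the address in the usage comment is a documentation placeholder. It has to be run from a host on the relevant network to mean anything.

```python
import socket

IP_HEADER = 20    # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IP_HEADER - ICMP_HEADER


def tcp_reachable(host, port, timeout=3.0, connect=socket.create_connection):
    """Return True if a TCP connection to host:port can be opened.

    `connect` is injectable so the function can be exercised without a network.
    """
    try:
        conn = connect((host, port), timeout)
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False
    conn.close()
    return True


def check_targets(targets, **kwargs):
    """Map each (host, port) pair to its True/False reachability."""
    return {(h, p): tcp_reachable(h, p, **kwargs) for h, p in targets}


# Usage (hypothetical address; replace with the remote cluster's MVIP):
#   check_targets([("192.0.2.10", 443)])
#   max_ping_payload(9000)  # payload size for a don't-fragment ping on the storage network
```

A successful TCP connection only shows that the port is open. Whether jumbo frames pass end to end still needs a don't-fragment ping with the computed payload size between the storage networks.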

 

Other than that possibility, are both clusters running the same major version of SolidFire (Element OS)?

CarlosBaptista
2,267 Views

Hi,

Thanks for the reply.

I apologize, but I explained the problem incorrectly.
Yes, there are 2 HCI clusters and the volumes are replicating between the two clusters.
The versions are slightly different: one runs 12.3 and the other 12.7.
But is there any way to see if replication is happening, what percentage has already replicated, or the errors that exist?

Carlos

elementx
2,264 Views

Most likely nothing has been replicated; my guess is that replication can't even start.

It's resuming because it's set up to replicate, but every time it tries it gets disconnected as no connection can be established. The messages could be more verbose, but they are documented (one example below).

The solution is in my earlier comment - make sure the network is configured properly.

You can find more in the TR & KB although my earlier comment contains the key details.

Minor versions of SolidFire should be compatible for replication purposes.

 

1) KB - https://kb.netapp.com/onprem/solidfire/Element_OS/Remote_replication_volume_pairs_in_pausedDisconnected_status

2) TR - https://www.netapp.com/media/10607-tr4741.pdf

elementx
1,723 Views

I'm going to self-accept my answer as the solution because the KB and TR explain the volume replication workflow, errors and status.

 

As for watching "replication progress", the volumeStats object has a writeBytes member, which is "The total cumulative bytes written to the volume since the creation of the volume". That ought to be increasing while a volume is being written to as a replicationTarget.

https://docs.netapp.com/us-en/element-software/api/reference_element_api_volumestats.html#object-members
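A rough Python sketch of that idea: sample writeBytes for the target volume twice and see whether it grows. It assumes the Element JSON-RPC endpoint is at `https://<MVIP>/json-rpc/<version>` with basic authentication, and that `GetVolumeStats` returns a `volumeStats` object as in the API reference linked above. The helper names, the default API version and the `verify_tls` switch are illustrative assumptions, so check them against your cluster before relying on this.

```python
import base64
import json
import ssl
import urllib.request


def build_request(volume_id, request_id=1):
    """JSON-RPC body for the Element API GetVolumeStats method."""
    return {"method": "GetVolumeStats",
            "params": {"volumeID": volume_id},
            "id": request_id}


def extract_write_bytes(response):
    """Pull writeBytes out of a decoded GetVolumeStats response."""
    return response["result"]["volumeStats"]["writeBytes"]


def write_rate(first_bytes, second_bytes, seconds):
    """Average bytes per second written between two samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (second_bytes - first_bytes) / seconds


def get_write_bytes(mvip, user, password, volume_id,
                    api_version="12.3", verify_tls=True):
    """Query the destination cluster once and return writeBytes.

    The URL layout and api_version default are assumptions; adjust as needed.
    """
    url = f"https://{mvip}/json-rpc/{api_version}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        url,
        data=json.dumps(build_request(volume_id)).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
    )
    ctx = ssl.create_default_context()
    if not verify_tls:
        # Only for clusters with self-signed certificates you already trust.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(req, timeout=30, context=ctx) as resp:
        return extract_write_bytes(json.load(resp))
```

Calling `get_write_bytes` twice a minute apart and passing both values to `write_rate` gives an average write rate for the target volume. A rate of zero while the pair is in PausedDisconnected would be consistent with replication not having started. This shows activity only, not a percentage complete.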

 

elementx
1,659 Views

Bonus content.

https://scaleoutsean.github.io/2024/06/14/netapp-solidfire-replication-monitoring.html

 

TL;DR: for the initial sync, Running Tasks at the destination reports on progress, speed, time left, etc.

Once it's synced, Async Delay is shown at the Source for Async mode.

Sync mode by definition can't have any replication delay; if there were one, you would see errors.

 
