ONTAP Discussions
Our customer is in the last stages of putting their brand-new StorageGRID environment into use with FabricPool (source = A400). They are asking a few very tricky questions that the StorageGRID documentation does not seem to cover. All of the questions relate to the resiliency of the environment (A400 + StorageGRID), specifically the side effects of possible network failures between the A400 and StorageGRID.
As you might guess, our customer is rightfully wary of using FabricPool with a StorageGRID instance without a good understanding of how ONTAP handles the various conditions involving the network and the load balancers.
I'm surprised to see a lack of responses to this question. Maybe it was poorly worded. I was expecting some input, especially for question #3. I'll be patient on this one. 🙂
1. When the object store is unavailable, data tiering from the ONTAP performance tier to the capacity tier is suspended, and data cannot be retrieved from the capacity tier.
Reasons for the object store to become unavailable:
There is no response from the capacity tier for a thousand consecutive S3 operations.
There is no response from the capacity tier for two minutes.
There are continuous request timeouts (10 seconds) to the capacity tier.
Regarding client retries:
SMB client: retries are client dependent.
NFS client: retries after 5 seconds, then hangs until connectivity is reestablished.
SAN client: the application might need to be restarted so that the read can be retried.
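To make the three unavailability conditions in point 1 easier to reason about, here is a minimal Python sketch of a health tracker that applies them. This is not ONTAP code or an ONTAP API: the class `ObjectStoreHealth` and its methods are hypothetical, and the way the conditions are combined is an assumption (a request slower than 10 seconds is simply counted as a failed operation, since the reply does not say how many timeouts count as "continuous").

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds quoted in the reply above. How ONTAP evaluates them
# internally is not stated there, so the logic below is an assumption.
MAX_CONSECUTIVE_FAILURES = 1000   # consecutive S3 operations without a response
MAX_SILENCE_SECONDS = 120.0       # two minutes without any response
REQUEST_TIMEOUT_SECONDS = 10.0    # per-request timeout


@dataclass
class ObjectStoreHealth:
    """Hypothetical tracker for capacity-tier reachability (illustration only)."""
    consecutive_failures: int = 0
    last_response_at: float = 0.0

    def record(self, now: float, latency: Optional[float]) -> None:
        """Record one S3 operation; latency is None if no response arrived."""
        if latency is not None and latency <= REQUEST_TIMEOUT_SECONDS:
            # A timely response resets both conditions.
            self.consecutive_failures = 0
            self.last_response_at = now
        else:
            # Assumption: a timeout is treated like a missing response.
            self.consecutive_failures += 1

    def is_unavailable(self, now: float) -> bool:
        """True if either the failure count or the silence window is exceeded."""
        return (
            self.consecutive_failures >= MAX_CONSECUTIVE_FAILURES
            or now - self.last_response_at >= MAX_SILENCE_SECONDS
        )
```

In this model a short network blip (a few failed requests, well under two minutes) would not flip the state, while a sustained outage would. Whether ONTAP behaves exactly this way should be confirmed with NetApp support.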
2. As long as there is no risk of filling up the performance tier, writes will not be affected.
3. Regarding the primary and secondary load balancers, I don't have exact timeouts. The links below may help.
StorageGRID load balancer
Third-party and global load balancers
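For point 2, a rough way to judge the risk of filling the performance tier while tiering is suspended is to divide the free capacity by the sustained write rate. The sketch below is only back-of-the-envelope arithmetic; the function name and the example figures are hypothetical and do not come from ONTAP or from the customer's environment.

```python
def hours_until_full(free_bytes: float, write_bytes_per_sec: float) -> float:
    """Estimate hours until the performance tier fills with tiering suspended.

    Assumes a constant net write rate and ignores deletes, snapshots,
    and storage efficiency, so treat the result as a rough bound only.
    """
    if write_bytes_per_sec <= 0:
        return float("inf")  # no net growth, so the tier never fills
    return free_bytes / write_bytes_per_sec / 3600.0


# Hypothetical example: 20 TiB free, 200 MiB/s of sustained new writes.
estimate = hours_until_full(20 * 2**40, 200 * 2**20)
print(f"Roughly {estimate:.1f} hours of headroom")
```

With those example figures the estimate comes out at about 29 hours, which is the kind of number to compare against the expected duration of a network outage.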
Now we're talking! Thanks for the heads up! Food for thought, certainly.