This is less a question than a write-up of my experience with high metadata consumption on some nodes in a multi-country stretched NetApp StorageGRID installation. The grid uses only SG5812 appliances, which come with 64GB RAM, of which I believe 61GB is actually usable. This means the metadata limit maxes out at 3TB per node in total, and according to NetApp recommendations I am supposed to use only 1.32TB of that capacity.

If the grid is stretched and the total number of nodes differs between the two countries, for example one country has 6 nodes across 3 sites and the other has 3 nodes per site in a 3-site solution, then metadata usage is bound to grow high in the sites with just 3 nodes. While digging to understand how things work at the StorageGRID Cassandra level, a part I wasn't fully versed in, I found some good documentation: (https://www.netapp.com/video/z0pro187-d8/best-practices-and-advice-for-designing-a-storagegrid-deployment-1351-2/#:~:text=This means that the grid,storage nodes worth of metadata)

So I added virtual metadata-only nodes, and as part of the grid expansion the metadata was distributed to the new nodes. However, the compaction jobs didn't remove the now-unneeded content from the metadata layer on the StorageGRID appliance nodes in all the 3-node sites. The compaction jobs kept looping over and over, and I triggered some manually on nodes that had none running, but none of this helped. So I kept reading generic Cassandra documentation and the StorageGRID advanced training documentation, which mentions the "nodetool cleanup" command. In this case, the nodes were unable to clean up what was supposed to be cleaned up, even after the rebalance.
I chose to go down this path after running the "nodetool status" command, which showed an extreme imbalance in the Load value on the nodes where cleanup was supposed to happen, even though their ownership was around 70%, as it was supposed to be. So this was clearly data that needed cleaning up. I ran "nodetool cleanup" on the nodes one by one, and this brought down the metadata consumption on the existing physical appliances in the 3-node site after the metadata-only nodes had been added. According to support this is not a recommended operation, but these appliances seem underpowered: the CPU is busy most of the time, and the small amount of RAM leaves no room to increase the metadata capacity. Most of the high CPU usage comes from Cassandra read operations, which also seem to contribute to some network retransmissions.

Another thing I noticed is that the kind of data stored on these appliances matters. A very large number of small objects creates a large metadata load, and you will hit this problem earlier than you otherwise would. So the solution is to keep all of your sites at the same or a similar node count, so that metadata is distributed evenly, and to catch high metadata consumers and see whether they can be reconfigured to write larger objects.

I am writing this here because I had a lot of trouble with this issue and found no article that helped me navigate it.
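The comparison of Load against ownership described above can be scripted. What follows is a minimal Python sketch, not a StorageGRID or NetApp tool: it parses text in the usual "nodetool status" layout and flags nodes whose Load per percent of ownership is well above the median for the listed nodes. The sample output, the addresses, the host IDs, the `factor` threshold and the function names are all invented for illustration.

```python
import re
from statistics import median

# Hypothetical sample in the usual "nodetool status" layout; all values are invented.
SAMPLE = """\
Datacenter: site-a
==================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns (effective)  Host ID  Rack
UN  10.0.0.11  2.1 TiB    256     60.2%             aaaa-01  rack1
UN  10.0.0.12  2.0 TiB    256     59.8%             aaaa-02  rack1
UN  10.0.0.21  650.5 GiB  256     60.1%             aaaa-03  rack1
UN  10.0.0.22  700.0 GiB  256     59.9%             aaaa-04  rack1
UN  10.0.0.23  680.0 GiB  256     60.0%             aaaa-05  rack1
"""

# Multipliers for the unit prefix; both "GiB" and "GB" style labels are accepted.
UNIT = {"bytes": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

# Matches a node row: state, address, load value, load unit, tokens, ownership.
NODE_LINE = re.compile(
    r"^([UD][NLJM])\s+(\S+)\s+([\d.]+)\s+(bytes|[KMGT])i?B?\s+(\d+)\s+([\d.]+)%"
)


def parse_status(text):
    """Return one dict per node row; rows that do not match are skipped."""
    nodes = []
    for line in text.splitlines():
        m = NODE_LINE.match(line.strip())
        if m:
            nodes.append({
                "state": m.group(1),
                "address": m.group(2),
                "load_bytes": float(m.group(3)) * UNIT[m.group(4)],
                "owns_pct": float(m.group(6)),
            })
    return nodes


def overloaded(nodes, factor=2.0):
    """Addresses whose load per ownership percent exceeds factor x the median."""
    ratios = {
        n["address"]: n["load_bytes"] / n["owns_pct"]
        for n in nodes
        if n["owns_pct"] > 0
    }
    if not ratios:
        return []
    typical = median(ratios.values())
    return [addr for addr, ratio in ratios.items() if ratio > factor * typical]


if __name__ == "__main__":
    for address in overloaded(parse_status(SAMPLE)):
        print("Load looks high for its ownership:", address)
```

A flagged node only says that Load and ownership disagree; whether a cleanup is the right response is still a decision to take with support.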
I recently installed and configured two StorageGRID grids at two distant sites on the same network. I federated the two grids and the connection works in both directions, but when I put an object into the local bucket from the same tenant, it does not replicate to the bucket with the same name at the remote site. Both sites show the same error: Cross-grid replication requests are pending because a resource is unavailable. Failed to send cross-grid replication request from source bucket 'pilot' to destination bucket 'pilot'. Error code: DestinationRequestError. Detail: InternalError. Can someone comment on this?
Hello there, we have production ONTAP clusters configured with StorageGRID for moving old CIFS data and snapshots. They are filling up and are now at 99% usage. We also use some StorageGRID space as object storage for our backup software's long-term copies, but we have lost connections lately, which we believe may also be related to the usage. We are wondering whether we have to take any action in this situation, such as pausing new data to StorageGRID, freeing up some space, or anything else. Please share your experience. Thank you.
Hi, we have an old NetApp FAS2650 running ONTAP 9.11.1P20 (yes, it is out of support) with an S3 object server running on it for testing. It ran fine for a couple of weeks until I noticed today that the LIF was not on its home node. After I reverted the LIF to its home node, the S3 service stopped working. Nothing is listening on port 443 anymore. I tried disabling/enabling the object-store-server, stopping/starting the vserver, migrating the LIF back to the other node, and adjusting the service-policy, all with no luck.

S3 object-store-server
mycluster::> object-store-server show
(vserver object-store-server show)
Vserver: mycluster-vs1
Object Store Server Name: mycluster-vs1.example.com
Administrative State: up
HTTP Enabled: false
Listener Port For HTTP: 80
HTTPS Enabled: true
Secure Listener Port For HTTPS: 443
Certificate for HTTPS Connections: mycluster-vs1.example.com
Comment: Netapp S3 playground

Interfaces and service-policy
mycluster::> network interface show -vserver mycluster-vs1
            Logical    Status     Network            Current       Current Is
Vserver     Interface  Admin/Oper Address/Mask       Node          Port    Home
----------- ---------- ---------- ------------------ ------------- ------- ----
mycluster-vs1
            vs1data1   up/up      10.0.77.88/24      mycluster-01  a0a-123 true
            vs1data2   up/up      10.0.88.99/24      mycluster-01  a0a-456 true
2 entries were displayed.
mycluster::> network interface show -vserver mycluster-vs1 -lif * -fields service-policy,services
vserver lif service-policy services
---------- -------- ------------------ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
mycluster-vs1 vs1data1 default-data-files data-core,data-nfs,data-cifs,data-flexcache,data-fpolicy-client,management-dns-client,management-ad-client,management-ldap-client,management-nis-client,data-dns-server
mycluster-vs1 vs1data2 my-s3-service data-core,data-s3-server
2 entries were displayed.

No TCP/s3 service listening on LIF vs1data2 port 443.
mycluster::> network connections listening show -vserver mycluster-vs1
Vserver Name Interface Name:Local Port Protocol/Service
---------------- ------------------------------------- -----------------------
Node: mycluster-05
mycluster-vs1 vs1data1:4049 UDP/rquota
mycluster-vs1 vs1data1:2050 TCP/fcache
mycluster-vs1 vs1data1:111 TCP/port-map
mycluster-vs1 vs1data1:111 UDP/port-map
mycluster-vs1 vs1data1:4046 TCP/sm
mycluster-vs1 vs1data1:4046 UDP/sm
mycluster-vs1 vs1data1:4045 TCP/nlm-v4
mycluster-vs1 vs1data1:4045 UDP/nlm-v4
mycluster-vs1 vs1data1:2049 TCP/nfs
mycluster-vs1 vs1data1:2049 UDP/nfs
mycluster-vs1 vs1data1:635 TCP/mount
mycluster-vs1 vs1data1:635 UDP/mount
12 entries were displayed.

I presume it's a bug in this old ONTAP code. Any idea how to revive it without destroying the buckets and users?
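The check done by eye above (is anything listening on vs1data2 port 443?) can be repeated after each attempted fix with a small script. This is only a sketch in Python that parses pasted output of "network connections listening show"; the function names are invented, and the sample rows are copied from the output in this post.

```python
import re

# Rows copied from the listening output in the post above.
SAMPLE = """\
Vserver Name Interface Name:Local Port Protocol/Service
Node: mycluster-05
mycluster-vs1 vs1data1:4049 UDP/rquota
mycluster-vs1 vs1data1:2050 TCP/fcache
mycluster-vs1 vs1data1:111 TCP/port-map
mycluster-vs1 vs1data1:111 UDP/port-map
mycluster-vs1 vs1data1:4046 TCP/sm
mycluster-vs1 vs1data1:4046 UDP/sm
mycluster-vs1 vs1data1:4045 TCP/nlm-v4
mycluster-vs1 vs1data1:4045 UDP/nlm-v4
mycluster-vs1 vs1data1:2049 TCP/nfs
mycluster-vs1 vs1data1:2049 UDP/nfs
mycluster-vs1 vs1data1:635 TCP/mount
mycluster-vs1 vs1data1:635 UDP/mount
"""

# vserver, then "lif:port", then "PROTO/service".
ROW = re.compile(r"^(\S+)\s+(\S+):(\d+)\s+(TCP|UDP)/(\S+)$")


def parse_listeners(text):
    """Return (vserver, lif, port, protocol, service) for every data row."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            rows.append((m.group(1), m.group(2), int(m.group(3)), m.group(4), m.group(5)))
    return rows


def is_listening(rows, lif, port, protocol="TCP"):
    """True if some row shows the given LIF listening on the port and protocol."""
    return any(r[1] == lif and r[2] == port and r[3] == protocol for r in rows)


if __name__ == "__main__":
    rows = parse_listeners(SAMPLE)
    print("vs1data2:443 listening:", is_listening(rows, "vs1data2", 443))
```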
I tested with three StorageGRID SG5612 appliances and two SG100s. I want to use a UPS to safely shut down StorageGRID from a Linux server in the event of a power outage (log in via ssh and run a command). Then, when power is restored, I want to check whether the nodes start automatically without anyone going to the machine room. To achieve this, I used sshpass to run commands remotely from the Linux server, and I was able to shut the nodes down successfully.

Command execution example:

■ Stop the StorageGRID (gateway node) service from a Linux machine via ssh connection.
sshpass -p bycast ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null admin@192.168.0.88 "echo bycast | sudo -S service servermanager stop"

■ Shut down StorageGRID (gateway node) from a Linux machine via ssh connection.
sshpass -p bycast ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null admin@192.168.0.88 "echo bycast | sudo -S shutdown -h now"

I confirmed the shutdown using the above method and removed the power cable. Then I plugged the power cable back in. Of the two SG100s, one started automatically when I connected the power cable, but the other did not. Which is the correct behavior? Also, is there a setting to make it start automatically (something that can be set from the BIOS or the management GUI)? By the way, the storage nodes (SG5612) started automatically when the power cable was connected, so there seemed to be no need to operate the power switch in the machine room. My goal is to have both SG100s start automatically when power is restored. Thank you.
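For a UPS hook, the two commands above can be wrapped in a small script so that the quoting stays correct and the order (stop the service, then shut down) is the same on every node. This is a rough Python sketch under several assumptions, not a NetApp-recommended procedure: the function names are invented, `sshpass -e` is used so the SSH password comes from the SSHPASS environment variable instead of the command line, and the sudo password is fed to `sudo -S` over standard input. Whether standard input reaches the remote sudo through sshpass in your environment should be verified before relying on it. The script defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess

# Same ssh options as in the commands above.
SSH_OPTS = ["-o", "StrictHostKeyChecking=no", "-o", "UserKnownHostsFile=/dev/null"]


def build_ssh_command(host, remote_command, user="admin"):
    """Return the argv list for one remote command.

    sshpass -e takes the SSH password from the SSHPASS environment variable.
    """
    return ["sshpass", "-e", "ssh", *SSH_OPTS, f"{user}@{host}", remote_command]


def shutdown_plan(hosts):
    """For each host: stop the servermanager service, then shut the node down."""
    plan = []
    for host in hosts:
        plan.append(build_ssh_command(host, "sudo -S service servermanager stop"))
        plan.append(build_ssh_command(host, "sudo -S shutdown -h now"))
    return plan


def run_plan(plan, sudo_password, dry_run=True):
    """Print each command (dry run) or execute it, passing the sudo password on stdin."""
    for argv in plan:
        if dry_run:
            print(shlex.join(argv))
            continue
        # No check=True: ssh may exit non-zero when the node drops the connection.
        result = subprocess.run(argv, input=sudo_password + "\n", text=True)
        print(f"{argv[-2]}: exit code {result.returncode}")


if __name__ == "__main__":
    # 192.168.0.88 is the gateway node address used in the post.
    run_plan(shutdown_plan(["192.168.0.88"]), sudo_password="", dry_run=True)
```

Building the command as an argument list avoids the curly quotes and the long dash that break the commands when they are copied from a formatted document.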