Solved: Cluster upgrade sequence

Novicer · 2021-03-31

Hello, I have a NetApp MetroCluster that looks like the one below. I plan to upgrade ONTAP from 9.6x to 9.7x using the "cluster image update -pause-after all" option.

Will the upgrade pick the nodes in sequence according to the HA setup? For example, will it pick netapp01-01 first, then netapp01-02, then netapp01-03, and finally netapp01-04?

Will this always be the sequence? Is it possible to control the sequence so that we don't unknowingly upgrade a node that has workload on it?

I would like to move the workload manually between the nodes once two nodes are upgraded, so that the customer faces minimal impact. Kindly advise.

netapp01::> storage failover show
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
netapp01-01    netapp01-02    true     Connected to netapp01-02
netapp01-02    netapp01-01    true     Connected to netapp01-01
netapp01-03    netapp01-04    true     Connected to netapp01-04
netapp01-04    netapp01-03    true     Connected to netapp01-03
4 entries were displayed.
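
For anyone who wants to confirm which nodes are HA partners before planning an upgrade order, here is a minimal Python sketch that groups the rows of the "storage failover show" output above into HA pairs. The function name ha_pairs and the regular expression are illustrative assumptions, not NetApp tooling; the sample rows are copied from the output in this post.

```python
import re

# Rows copied from the "storage failover show" output in this thread.
SAMPLE = """\
netapp01::> storage failover show
Node Partner Possible State Description
netapp01-01 netapp01-02 true Connected to netapp01-02
netapp01-02 netapp01-01 true Connected to netapp01-01
netapp01-03 netapp01-04 true Connected to netapp01-04
netapp01-04 netapp01-03 true Connected to netapp01-03
4 entries were displayed.
"""

# A data row is: node, partner, takeover-possible flag, free-text description.
# Header, separator, prompt and footer lines do not have true/false in column 3.
ROW = re.compile(r"^(\S+)\s+(\S+)\s+(true|false)\s+(.*)$")


def ha_pairs(text):
    """Return the unique (node, partner) pairs in order of first appearance."""
    pairs = []
    seen = set()
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # skip prompt, header, separator and footer lines
        pair = tuple(sorted((match.group(1), match.group(2))))
        if pair not in seen:
            seen.add(pair)
            pairs.append(pair)
    return pairs


if __name__ == "__main__":
    for node_a, node_b in ha_pairs(SAMPLE):
        print(f"HA pair: {node_a} <-> {node_b}")
```

This only describes the HA pairing; it says nothing about the order in which an automated update will actually process the nodes.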

maffo · 2021-04-01

"-pause-after all" will just pause at every step of the upgrade process, as mentioned in the manual page:
[-pause-after {none|all}] - Update Pause
Specifies that the update should pause at each predefined pause points (for example, after validation, after download to the boot device, after takeover, and after giveback) during the update.

I think you probably want "cluster image update -nodes <node>" to force the update on a specific node, or "-force-rolling true" for a rolling upgrade, which will do one HA pair at a time.

Novicer · 2021-04-01

Thanks for the response. I am doing a cluster upgrade for a MetroCluster. The customer can't afford any sort of downtime. So I was wondering: by enabling the "-pause-after all" option, will I be able to upgrade just two nodes of the 4-node cluster, then fail back all resources to the upgraded nodes, and then continue with the other two nodes?

Do you think this is doable? Or what would be the recommended way to upgrade a MetroCluster without any downtime at all, ideally with no connection or session drops for NFS and CIFS?

maffo · 2021-04-01

As I mentioned before, "-pause-after all" will just pause the upgrade at every step; it will not achieve what you want (controlling which node is updated and in which order). To do that, the command has the "-nodes <node>" option.

I don't know of a way to do the upgrade without any sort of connection drop for CIFS clients, as takeover/giveback and aggregate relocation would cause client disconnection, even if only for a brief moment. I also don't think you can complete a volume move if any client is still connected to the volume's share.

Novicer · 2021-04-05

Thanks for the response. I did indeed go through the documentation for the cluster image update command and found the -nodes option. But what struck me was that it specifically says this:

"[-nodes {<nodename>|local}, ...] - Node
Specifies the nodes that are to be updated. This parameter is not supported for updates of MetroCluster configurations and for two-stage upgrades."

Link to the documentation

Mine is a 4-node MetroCluster. This restriction doesn't seem to make sense to me, given that high availability is the very reason one would deploy a MetroCluster. I would appreciate it if you could shed some light on this.
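
To make the restriction quoted above concrete, here is a small Python sketch that assembles a "cluster image update" command line and refuses to combine -nodes with a MetroCluster configuration. The helper name build_update_command, the metrocluster flag, the "-version" parameter and the version string "9.7" are assumptions for illustration and do not appear in this thread; check them against the command reference for your ONTAP release before relying on them.

```python
def build_update_command(version, nodes=None, pause_after=None, metrocluster=False):
    """Assemble a 'cluster image update' command string (illustrative only).

    version      -- target image version (assumed '-version' parameter)
    nodes        -- optional list of node names for '-nodes'
    pause_after  -- optional 'none' or 'all' for '-pause-after'
    metrocluster -- hypothetical flag: True if this is a MetroCluster config
    """
    if nodes and metrocluster:
        # Per the quoted man page text, -nodes is not supported for
        # updates of MetroCluster configurations.
        raise ValueError("-nodes is not supported for MetroCluster updates")

    parts = ["cluster image update", "-version", version]
    if nodes:
        parts += ["-nodes", ",".join(nodes)]
    if pause_after is not None:
        if pause_after not in ("none", "all"):
            raise ValueError("pause_after must be 'none' or 'all'")
        parts += ["-pause-after", pause_after]
    return " ".join(parts)


if __name__ == "__main__":
    # Node names taken from the output earlier in the thread.
    print(build_update_command("9.7", nodes=["netapp01-01"], pause_after="all"))
```

The sketch only builds a string; it does not connect to a cluster or validate that the nodes exist.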

maffo · 2021-04-05

Apologies, but I believe there might be a bit of confusion here.
This output does not show a MetroCluster but rather a 4-node cluster:

netapp01::> storage failover show
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
netapp01-01    netapp01-02    true     Connected to netapp01-02
netapp01-02    netapp01-01    true     Connected to netapp01-01
netapp01-03    netapp01-04    true     Connected to netapp01-04
netapp01-04    netapp01-03    true     Connected to netapp01-03

A MetroCluster consists of two separate ONTAP clusters deployed at separate sites, with data and configuration synchronized between them.

MetroCluster upgrades are usually done via switchover/switchback (one site at a time); more information is available in our documentation at https://docs.netapp.com/ontap-9/topic/com.netapp.doc.dot-cm-ug-rdg/GUID-590B351F-BDDF-41B5-B0F1-391BEC7542E1.html

Regardless of the upgrade method used, CIFS/SMB clients will be disconnected unless they are SMB 3.0 clients using continuously available (CA) shares (e.g. MS-SQL or Hyper-V). Unfortunately, this is a limitation of the protocol itself and not of ONTAP or the MetroCluster configuration.
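
The rule in the last paragraph can be written as a simple check, sketched here in Python. The function name and its two inputs are assumptions for illustration; in practice the negotiated dialect and the share's CA property would come from the storage system's session and share listings, which this sketch does not query.

```python
def smb_session_survives_failover(dialect, continuously_available):
    """Rough rule of thumb from the post above (illustrative only).

    dialect                -- negotiated SMB dialect as a string, e.g. "2.1" or "3.0"
    continuously_available -- True if the share has the CA property set
    """
    major = int(dialect.split(".")[0])
    # Only SMB 3.x sessions on continuously available shares are expected
    # to ride through takeover/giveback without the client being disconnected.
    return major >= 3 and continuously_available


if __name__ == "__main__":
    for dialect, ca in [("2.1", False), ("3.0", False), ("3.0", True)]:
        outcome = "stays connected" if smb_session_survives_failover(dialect, ca) else "is disconnected"
        print(f"SMB {dialect}, CA share={ca}: client {outcome}")
```

This encodes only the statement made in this thread and ignores other factors, such as client-side support for the feature.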