Virtualization Articles and Resources
Introduction:
This script checks a specified cluster for the items in the "Steps for preparing for a major upgrade" section. The items that are covered are the ones that can be addressed prior to the actual software image update. These are outlined roughly on pages 31-54 in the guide. Based upon the output of the script you can make the necessary adjustments in the cluster to ensure a successful upgrade.
Step 1: Configure PowerShell Environment
Step 2: Download and Prepare to Run Script
Step 3: Run Script (See Details Below)
Resources:
The Clustered Data ONTAP 8.3.2 upgrade procedure, outlined in several major sections, can be found here.
If you are using Windows 2012 R2 or Windows 8.1 and using the logging functionality, there is a possibility it might not log. Microsoft describes the issue and has a hotfix available here:
I had an incident where the script appeared to hang at random, but it would continue to run when I pressed "Enter".
I just wanted to share this with everyone.
Mcgue,
I believe ONTAP 8.3.1 has a bug fix for the intercluster LIF being in an ifgrp. I am getting warnings from your script, but when I follow the link to the KB you reference, the NetApp KB now says "Fixed in 8.3.1". So is it safe to ignore this warning now?
Thanks, this script is awesome.
smcdonnell,
You are exactly right. That issue is resolved in 8.3.1 and 8.3P2. For the moment I left that in there just in case someone was reviewing either 8.3 or 8.3P1 for an upgrade. I've got a version updated for the changes in 8.3.1 ready to roll, and as soon as it becomes our recommended release (see https://kb.netapp.com/support/index?page=content&id=7010163) then I'll certainly update it here.
Thanks so much for the feedback too!
Makes sense, though I ran the Upgrade Advisor from our latest AutoSupport and the upgrade options only show 8.3.1 available in the "Upgrade To: " drop-down field. Just an FYI.
Thanks for the heads-up on the Upgrade Advisor. I opened a case with our web site support team to see what happened to the 8.3 options.
Hello,
First off, thank you for this script, it will be very valuable to have this information prior to upgrading our customer!
At the moment, I am looking for another set of eyes to see what I am missing. I have gone through the instructions one at a time through the beginning of step 3. The script will run, but the screen only shows red output and the output file is not generated. Attached is the screenshot from the first attempt. The second attempt was with only .\83UpgradeCheck.ps1. I also tried to access the help with get-help .\83UpgradeCheck.ps1 -full; the screenshot for this is also attached below. Any direction is much appreciated!!
-Brandon
Brandon,
Thanks for the screenshot. The issue you are facing is about how the file is saved. Sometimes the browser will replace the escape character with these unknown characters. Try saving the script file different ways from this page. Try a right click save as. Or open it and copy the entire contents into a text editor or ISE and save it. Let me know how that goes.
The script is now loaded, took a few tries...... So now when I run the script it gives me only one error. The NetApp PowerShell toolbox download from the 'toolchest' has 4.0.0 as the only option. I worked with our RSE to get a link to the 3.2.1 version, however it redirects to the 4.0.0 download. Getting closer?
UPDATE**
I was able to clear a path for the script to run. The 'NetAppDocs-Lite' module was also removed from the 'installed programs' page. This didn't change the script issue... yet. Only after searching 'computer' for 'NetAppDocs-Lite' did I find a folder in the C:\Windows\System32\WindowsPowerShell\v1.0\Modules\ location. After this folder was deleted, the script began to run as expected. NOTE: there is NO 'NetAppDocs-Lite' entry in C:\Windows\System32\WindowsPowerShell\v1.0\Modules\ ...... seems so obvious now! 😉
Big thanks to mcgue for assisting with this issue!!
-B
Mcgue
I just ran your newly updated script, and now I have a new error. See below:
Now checking subnet 10.100.210.0/24
This subnet is in use by a single port
Port a0a-3010 on node somenode-01 has an MTU value of 9000
***** Error Found *****
Subnet 10.100.210.0/24 contains port a0a-3010 on node somenode-01 has an MTU value of 9000 which has a MTU of 9000 and does not match at least one other port in the subnet
Port a0a-3010 on node somenode-02 has an MTU value of 9000
***** Error Found *****
Subnet 10.100.210.0/24 contains port a0a-3010 on node somenode-02 has an MTU value of 9000 which has a MTU of 9000 and does not match at least one other port in the subnet
I ran a network port show, as per the KB you reference, but all the MTUs are correct! The error message even says they are both 9000.
Is there something else I can check, or is this a bug?
Thanks again for the amazing script!!
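For anyone who wants to double-check a report like this by hand, here is a minimal sketch of a per-subnet MTU comparison. It is not taken from 83UpgradeCheck.ps1; the function name Get-MtuMismatch and the property names are hypothetical, and the rows are typed in by hand from the output above rather than read from the cluster. A subnet is flagged only when its ports report more than one distinct MTU value.

```powershell
# Illustrative rows only: copied by hand from the output quoted above,
# not read from a cluster. Property names are assumptions for this sketch.
$ports = @(
    [pscustomobject]@{ Subnet = '10.100.210.0/24'; Node = 'somenode-01'; Port = 'a0a-3010'; Mtu = 9000 }
    [pscustomobject]@{ Subnet = '10.100.210.0/24'; Node = 'somenode-02'; Port = 'a0a-3010'; Mtu = 9000 }
)

# Hypothetical helper: returns one object per subnet whose ports disagree on MTU.
function Get-MtuMismatch {
    param([object[]]$Ports)
    foreach ($group in ($Ports | Group-Object -Property Subnet)) {
        # Collect the distinct MTU values seen in this subnet
        $distinct = @($group.Group | Select-Object -ExpandProperty Mtu -Unique)
        if ($distinct.Count -gt 1) {
            [pscustomobject]@{ Subnet = $group.Name; MtuValues = $distinct }
        }
    }
}

$mismatches = @(Get-MtuMismatch -Ports $ports)
"Subnets with mismatched MTU values: $($mismatches.Count)"
```

With the two ports shown, both at 9000, no subnet is returned, which is the result you would expect from a check of this kind.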
smcdonnell,
I sent you a private message in the forum if you would please check. Will post the fix once we verify too.
Hello,
This script is very helpful. Thanks!
I'm trying to make sense of why the errors are being triggered below. I don't see any of these errors being discussed in the networking consideration article:
https://library.netapp.com/ecmdocs/ECMP12458273/html/GUID-EFDA729F-AA1D-45D9-9CCC-D47E2C40D316.html
***** Error Found *****
SVM1 LIF does not have a home node that matches the first node to be upgraded of NODEA
***** Error Found *****
SVM1 LIF does not have auto-revert enabled
Now checking failover group a0a-01
***** Error Found *****
The failover group must contain exactly 2 nodes and a0a-01 contains NODEA NODEB NODEC NODED
Thanks.
Hi ssikorsk,
I sent you a message. Hope it helps.
Hello all,
Having an issue with the -RunSection & -SkipSection flags for the script.
Using the -RunSection 17 flag, the script runs without any Section output.
Using the -SkipSection 17 flag, the script runs with EVERY section output, including 17.
Anyone run into this too, and have any input on it?
Thanks!!
-Brandon
Thanks for the input, Brandon. These parameters take a comma-separated list of the section names (not numbers) that you want to run or skip. The current full list of sections appears in one of the examples in the help output, and I'm including it here too:
Model
ONTAPVersion
Snapshot
Sysstat
SQLorHyperV1
SQLorHyperV2
Failover
HA
Nodes
RDB
Quorum
Disks
Aggregates
Volumes
Space1
Space2
Network
MTU
SAN
Time
Jobs
UNIX
Netgroup
Guarantee
Firewall
HomePort
Rebalance
FlexCache
LoadSharing
32BitFull
32BitAggr
32BitSnap
SVMRunning
InfiniteVolume
ExternalServer
ExportRule1
ExportRule2
UpgradeCautions
RouteCheck
VMAlignCheck
LUNHungCheck
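To illustrate why a number such as 17 matches nothing, here is a minimal sketch of name-based section filtering. It is not the script's actual implementation; the function Select-UpgradeCheckSection is hypothetical, and only four names from the list above are used to keep it short.

```powershell
# Hypothetical helper: not taken from 83UpgradeCheck.ps1.
function Select-UpgradeCheckSection {
    param(
        [string[]]$AllSections,
        [string[]]$RunSection = @(),
        [string[]]$SkipSection = @()
    )
    $selected = $AllSections
    if ($RunSection.Count -gt 0) {
        # Keep only the sections that were named explicitly
        $selected = @($selected | Where-Object { $RunSection -contains $_ })
    }
    if ($SkipSection.Count -gt 0) {
        # Drop any section that was named for skipping
        $selected = @($selected | Where-Object { $SkipSection -notcontains $_ })
    }
    return ,$selected
}

# A short subset of the section names listed above
$all = 'Network', 'MTU', 'SAN', 'Time'

$byName   = Select-UpgradeCheckSection -AllSections $all -RunSection 'SAN', 'MTU'
$runBy17  = Select-UpgradeCheckSection -AllSections $all -RunSection '17'
$skipBy17 = Select-UpgradeCheckSection -AllSections $all -SkipSection '17'
```

Under this assumption, running by name selects MTU and SAN, running "17" selects nothing because no section has that name, and skipping "17" leaves every section in place, which matches the behaviour Brandon described.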
Perfect, thank you!
-RunSection SAN is what we were looking for.
"Verify that each host is configured with the correct number of direct and indirect paths, and that each host is connected to the correct LIFs"
Can anyone confirm whether a cluster administrator account is required to run the commands? I'm hoping this can work with a read-only account. I understand PowerShell must be launched with a Windows administrator account, but I'm curious whether the cluster account can be read-only.
Thanks,
Josh
Hello @jwolf,
It may, theoretically, be possible to not be an administrator, but you would need to check each of the NPTK commands used, cross reference that against the APIs used, then check against the permissions needed for that action.
That being said, there's a multitude of commands (like GetClusterRingResults) which use Invoke-NcSsh to execute commands at the diag level, which implies that an account with significant privileges is being used.
Hope that helps.
Andrew
I am also a little confused about the section
The failover group must contain exactly 2 nodes and MGNT contains
Where does it say that a failover group can only contain 2 nodes? It normally has all the nodes in the cluster.
Cheers
Joe
Joe-Pollard,
Thank you for the feedback. The two node requirement for a failover group is covered in section 6a here:
https://library.netapp.com/ecmdocs/ECMP12458273/html/GUID-E5EBF14B-9DEA-4835-B8AB-659BF5E64997.html
The goal is to ensure that the LIF used to access external servers is fully functional throughout an upgrade, regardless of any other issues that might come up.
It is important to know that only one LIF per SVM needs to be configured like this, and it only needs to be configured that way for the actual upgrade. Keeping failover groups optimal on an ongoing basis is covered here: https://library.netapp.com/ecmdocs/ECMP1636021/html/GUID-94D4F8DD-3437-4474-81A6-9DB6FF6A244D.html
Thanks for the fantastic script... If you could include VLAN tagging check on all nodes, we can avoid any outage during the lif migration during the node reboot.
Thanks for the feedback dbalaatt. So there is some checking now for failover group targets for each interface, but I'm not sure that is exactly what you are looking for. Can you give me more details or an example you saw?
Hello,
While running this script on a 4 node cluster I am getting this error message:
***** Error Found *****
The failover group must contain exactly 2 nodes and a0a-212_failover contains avnacluster01-01 avnacluster01-02 avnacluster01-03 avnacluster01-04
Why would this be a requirement for a failover group to have exactly 2 nodes on a 4 node cluster?
Thanks,
Paul.
pogranovich,
This is a requirement only for the upgrade and only for one LIF per SVM. It is outlined in the Upgrade Guide. I'll change the wording on the next release to make it more clear. There is no reason to change any other LIFs to have any different failover group members. See section 6a in the following link for the failover group member requirement for the external server accessing LIF:
https://library.netapp.com/ecmdocs/ECMP12458273/html/GUID-E5EBF14B-9DEA-4835-B8AB-659BF5E64997.html
We check for external server connectivity, but recently I had an issue where SVM mgmt was pinging the DNS servers, so we went ahead with the upgrade. However, there was a firewall issue between SVM mgmt and the DC servers, which created issues for CIFS authentication. Is there any way we can include this in the checks?
PS. Great script and thank you for it!
DA,
I'm not sure I follow exactly what happened post upgrade. Can you send me more details as to what you experienced?
DA, I'd like to know what happened as well. mcgue's script is telling us that "Test connections failed to at least one server type..." and I'm trying to figure out how to remedy this. We've got data LIFs configured for the SVM, but no SVM mgmt LIFs. The system is a switchless pair of 8040s running CDOT 8.2.3P2. Any help is appreciated. (I should add that in the previous step all external servers respond to ping, so external servers are accessible via the SVM data LIF.) More details below:
asdfsdf,
Check the NIS servers that are defined for that SVM. Likely they are not actually NIS servers but DNS or AD servers in your environment, and so they do not respond to NIS requests. If you have NIS servers defined and don't actually need NIS services, then you can remove them and re-run the script.
BTW - great username!
Thanks mcgue.
Under which 'section' of the script do these tests run?
I'm trying to find the right one to use the -RunSection flag with, tried a few and no luck yet. I don't like having to sit through the entire output when these are the last tests that are failing.
Please excuse the username, my right hand was probably occupied by a sandwich or a delicious cookie when I set up the account 😉
If you look at the script output in the bottom summary section there will be a line fully populated with the -RunSection sections that had either a warning or error during that script run. Then you can make changes, and re-run it testing only the failed sections. Also, the section you are referring to is ExternalServer.
Hi mcgue, works like a charm now. Initially I had an issue with the "This script requires Data ONTAP PowerShell Toolkit 3.2.1 or higher." message (I had all up-to-date components installed...). The workaround was to comment out your entire toolkit version check.
It works great now! For those experiencing this issue, my solution was to comment out lines 554 through to 568.
Cheers.
-Pash.
Glad you were able to get it working with the workaround. You might have an old version of the PowerShell toolkit out there. Look above at some of the comments from brandon_lee for some helpful ideas on cleaning up the previous versions.
Hello,
Thank you for this script, very helpful! I am running this against a 6 node cluster and keep getting this message no matter which node I choose to upgrade first: "clesan01_mgmt LIF does not have a home node that matches the first node to be upgraded of cle-nacl01-06". The home node this LIF resides on is Node 3.
Thanks
Kathy
I get this error on a few of the Vservers we have, and I don't know if it's something I can correct by manually migrating the LIF to the first node to be upgraded, or whether I have to create another LIF on the first node and blow the one below away, as it's on node 4?
netapp001vse01_CIF_DR_mgmt LIF does not have a home node that matches the first node to be upgraded of netapp001-01
@MAXWELL_CROOK The warning is reminding you that when you go to do the upgrade you want to make certain that all the SVMs management LIFs have their home node set to the same node, and that this is the first node you're going to upgrade.
If you do this for each of the SVM mgmt LIFs:
net int modify -vserver xxxx -lif llll -home-node nnnnn -home-port pppp
net int revert -vserver xxxx -lif llll
You'll resolve this particular warning; there is no need to destroy and re-create the LIFs.
Note: The PowerShell script is doing the checks that are outlined in the ONTAP Upgrade and Revert/Downgrade Guide. If you haven't already done so, I strongly recommend reading through that guide prior to doing your upgrade. It has more details on all the checks and the upgrade process itself.
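If there are many SVM mgmt LIFs to re-home, the modify/revert pair from this reply can be generated as text and reviewed before pasting it into the cluster shell. This is only a sketch: Format-LifHomeCommand is a hypothetical helper, it does not connect to a cluster, and the Vserver and port values in the usage line are placeholders that you would replace with your own.

```powershell
# Hypothetical helper: only formats text, it does not connect to a cluster.
function Format-LifHomeCommand {
    param(
        [Parameter(Mandatory)][string]$Vserver,
        [Parameter(Mandatory)][string]$Lif,
        [Parameter(Mandatory)][string]$HomeNode,
        [Parameter(Mandatory)][string]$HomePort
    )
    # Same two commands as in the reply above, with the placeholders filled in
    "net int modify -vserver $Vserver -lif $Lif -home-node $HomeNode -home-port $HomePort"
    "net int revert -vserver $Vserver -lif $Lif"
}

# 'svm1' and 'e0c' are placeholder values, not taken from the thread
$cmds = @(Format-LifHomeCommand -Vserver 'svm1' -Lif 'netapp001vse01_CIF_DR_mgmt' -HomeNode 'netapp001-01' -HomePort 'e0c')
$cmds
```

Generating the text first keeps the actual change a deliberate, manual step on the cluster.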
Hi, thanks for the feedback. Why do you need to move all of the home nodes to node 1 (the first node to upgrade)? When that node goes down, all the management goes down with it (well, fails over to node 2), so it seems an odd ask. I have gone through the upgrade document and it's not clear to me why you'd do this. This and the routing changes are quite a lot to get your head around for someone who's mainly SAN experienced 😞
When you have upgraded to 8.3.2+, CDOT changes how it handles its management (AD/NIS/LDAP/DNS/iSNS/etc.) network traffic so that, instead of being specific to the node and going out the node management LIF, it's handled at the SVM level through the management LIF.
You put the management LIFs on the first node to be upgraded (despite the fact that you're going to reboot it right away) so that when it has rebooted and comes up as the only node on the new version, these SVMs all have a valid management LIF that can be used for this network traffic. You'll also note that one of the checks has you ensure that, for the first node, you have a failover group that is different than normal as well. That's to ensure that these LIFs move back to that first node, and not elsewhere in the cluster where you're not yet at the new version. This is covered in a bit more detail in the "SVM networking considerations for major upgrades" section of the upgrade doc, along with the rest of the prerequisites.
You're absolutely right, it's a lot to take in when planning your first 8.2 to 8.3 upgrade. Keep asking questions though, you're on the right track.
I ran the script against a single-node cluster, but some of the errors make me think the script isn't aware that some checks don't apply. Is this true?
Version 2.3 is forthcoming with these changes:
Can you please update the script to correct the false failover group report?
VIRTUALGEEK2,
Can you try with the 2.4 version updated in August and let me know if you are seeing the same issue?
@mcgue. You noted the v2.3 changes in an earlier comment, but I don't see a similar post for v2.4. What changed in 2.4? Thanks in advance.
Good eye, wertk! We did some testing with 2.3 and found that one of the output lines regarding changing a failover group was worded awkwardly. I figured I would update to 2.4 since technically it was a change, but it didn't change any of the actual tests. Hopefully this is the last of any updates to this script.
First, let me say AWESOME script! This is very helpful.
Also, I noticed a warning at the end of the script that stated that the RTF and HTML files were limited by PowerShell to 9999 lines. I was able to overcome this limitation by entering these commands prior to running this script.
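# Enlarge the console buffer so more output lines are retained
$pshost = Get-Host
$pswindow = $pshost.UI.RawUI
$newsize = $pswindow.BufferSize
$newsize.Height = 32766   # buffer height, in lines
$newsize.Width = 160      # buffer width, in characters
$pswindow.BufferSize = $newsize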
Change the buffer size numbers to suit your environment. It looks like 32766 is the limit for the buffer height on Windows 7.
Hope this helps someone!
Thanks, Jefferyh, for the tip! Good to know. That change is worthwhile, as the HTML and RTF output is definitely a lot friendlier to use.
I had a few people ask if this script would be useful in doing an ANDU from 8.3 to 9.1. There are a few sections that would be helpful. I put a new article together here: http://community.netapp.com/t5/Data-ONTAP-Discussions/Upgrading-Clustered-Data-ONTAP-8-3x-to-ONTAP-9-1-Using-Automated-Nondisruptive/td-p/128391
Hi McGue.
WHAT A FANTASTIC SCRIPT!!!
Thanks so much for writing and sharing this with us!
Very, very thankful for it. I will use it in an upgrade from 8.2 -> 8.3.2 this week on 4 clusters.