How to Check Data ONTAP 8.3.2 Upgrade Requirements Using a PowerShell Script

by Frequent Contributor ‎2015-06-03 01:10 PM - edited ‎2016-08-16 06:47 AM

Introduction:

This script checks a specified cluster against the items in the "Steps for preparing for a major upgrade" section of the upgrade guide. It covers the items that can be addressed before the actual software image update, outlined roughly on pages 31-54 of the guide. Based on the script's output, you can make the necessary adjustments in the cluster to ensure a successful upgrade.

 

Step 1: Configure PowerShell Environment

  • .NET Framework 4.5 or later installed.
  • PowerShell 4.0 or later installed (it must be installed after .NET Framework 4.5 or later).
  • NetApp PowerShell Toolkit 3.2.1 or later installed.

Step 2: Download and Prepare to Run Script

  • Save the attached file with a ".ps1" extension. The current version, 2.4, was uploaded on August 16, 2016.
  • Start Windows PowerShell with the "Run as Administrator" option.
  • Set the execution policy so that PowerShell is allowed to run scripts (this script is currently unsigned).
  • More help on these requirements can be found here.
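
The prerequisites from Steps 1 and 2 can be confirmed from the PowerShell session itself. The following is a minimal sketch and not part of the attached script. It assumes the toolkit registers its module under the name DataONTAP and that the script was saved as 83UpgradeCheck.ps1 in the current directory.

```powershell
# Show the installed PowerShell version (Step 1 calls for 4.0 or later).
$PSVersionTable.PSVersion

# List any installed copies of the NetApp PowerShell Toolkit and their versions.
# Assumption: the toolkit's module name is DataONTAP.
Get-Module -ListAvailable -Name DataONTAP | Select-Object Name, Version, ModuleBase

# Show the execution policy that applies at each scope.
Get-ExecutionPolicy -List

# The script is unsigned, so one option is to relax the policy for this session only.
Set-ExecutionPolicy -Scope Process -ExecutionPolicy RemoteSigned

# A file saved from a browser is marked as downloaded; clear that mark so that
# RemoteSigned allows it to run. The file name here is an assumed save location.
Unblock-File -Path .\83UpgradeCheck.ps1
```

Changing the policy at the Process scope leaves the machine-wide setting untouched, which is usually the least intrusive choice for a one-off check.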

Step 3: Run Script (See Details Below)

  • Find more information about the script by issuing the following in PowerShell:
    get-help .\83UpgradeCheck.ps1 -full
  • Syntax:
    .\83UpgradeCheck.ps1 [-HyperV] [-FCP] [-ISCSI] [-NoHyperV] [-NoFCP] [-NoISCSI]
    [[-Cluster] <String>] [[-Username] <String>] [[-Password] <String>] [[-RunSection] <String[]>]
    [[-SkipSection] <String[]>] [[-OutputFile] <String>] [-PostCheck] [-ShowVersion] [<CommonParameters>]
  • Here is a brief explanation of the optional parameters. See the get-help output above for more detail:
    HyperV:  Cluster has Hyper-V over SMB in use
    FCP:  Cluster has FCP in use
    ISCSI:  Cluster has iSCSI in use
    NoHyperV:  Cluster does not have Hyper-V over SMB in use
    NoFCP:  Cluster does not have FCP in use
    NoISCSI:  Cluster does not have iSCSI in use
    Cluster:  The cluster management LIF used for the connection
    Username:  The username used to connect to the cluster
    Password:  The password used to connect to the cluster (note: this is passed in clear text)
    RunSection:  One or more specific sections of the script to run
    SkipSection:  One or more specific sections of the script to skip
    OutputFile:  A file name to which the script results are written
    PostCheck:  Run the checks intended for after the 8.3.2 upgrade
    ShowVersion:  Show the version of the script
  • The script output is color coded as follows:
    Purple - Section headers, tells you what is being checked
    Red - The check failed
    Green - The check succeeded
    Yellow - User interaction is needed
    Blue - At the completion of the script shows count of errors and warnings
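
As an illustration of how the parameters above fit together, here is a hedged sketch of two typical runs. The cluster name, username, and output file name are hypothetical placeholders; only the parameter names come from the syntax shown above.

```powershell
# Pre-upgrade check of a cluster that serves iSCSI but not FCP or Hyper-V over SMB.
# cluster1-mgmt, admin and the output file name are hypothetical placeholders.
.\83UpgradeCheck.ps1 -Cluster cluster1-mgmt -Username admin -ISCSI -NoFCP -NoHyperV -OutputFile cluster1-precheck

# After the upgrade, reuse the same connection parameters with -PostCheck.
.\83UpgradeCheck.ps1 -Cluster cluster1-mgmt -Username admin -PostCheck
```

-Password is left out here because it is passed in clear text. How the script behaves when a parameter is omitted should be confirmed in the get-help output.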

Resources:

  1.  The Clustered Data ONTAP 8.3.2 upgrade process, outlined in several major sections, can be found here.

Comments
Frequent Contributor

If you are using Windows 2012 R2 or Windows 8.1 with the logging functionality, the script might not write its log. Microsoft describes the issue and has a hotfix available here:

https://support.microsoft.com/en-us/kb/3014136

Member

I had an incident where the script would appear to hang at random, but it would continue to run if I pressed "Enter".

I just wanted to share this with everyone.

 Mcgue,

 

I believe ONTAP 8.3.1 has a bug fix for the intercluster LIF being in an ifgrp. I am getting warnings from your script, but when I follow the link to the KB you reference, the NetApp KB now says "Fixed in 8.3.1". So is it safe to ignore this warning now?

 

Thanks, this script is awesome.

Frequent Contributor

smcdonnell,

 

You are exactly right.  That issue is resolved in 8.3.1 and 8.3P2.  For the moment I left that in there just in case someone was reviewing either 8.3 or 8.3P1 for an upgrade.  I've got a version updated for the changes in 8.3.1 ready to roll, and as soon as it becomes our recommended release (see https://kb.netapp.com/support/index?page=content&id=7010163) then I'll certainly update it here.  

 

Thanks so much for the feedback too!

 

Makes sense, though I ran the Upgrade Advisor from our latest AutoSupport and the upgrade options only show 8.3.1 available in the "Upgrade To:" drop-down field. Just an FYI.

Frequent Contributor

Thanks for the heads-up on the Upgrade Advisor.  I opened a case with our web site support team to see what happened to the 8.3 options.

Member

Hello,

 

First off, thank you for this script, it will be very valuable to have this information prior to upgrading our customer!

 

At the moment, I am looking for another set of eyes to see what I am missing.  I have gone through the instructions one at a time through the beginning of Step 3.  The script runs, but the screen only shows red output and the output file is not generated.  Attached is the screenshot from the first attempt.  The second attempt was with only .\83UpgradeCheck.ps1 and no parameters.  I also tried to access the help with get-help .\83UpgradeCheck.ps1 -full; the screenshot for this is attached below as well.  Any direction is much appreciated!!

 

 

-Brandon

upgrade_check_script01.PNG

 

upgrade_check_script04.PNG

Frequent Contributor

Brandon,

 

Thanks for the screenshot.  The issue you are facing is about how the file is saved.  Sometimes the browser will replace the escape character with these unknown characters.  Try saving the script file different ways from this page.  Try a right click save as.  Or open it and copy the entire contents into a text editor or ISE and save it.  Let me know how that goes.

Member

The script is now loaded; it took a few tries.  Now when I run the script it gives me only one error.  The NetApp PowerShell Toolkit download from the ToolChest has 4.0.0 as the only option.  I worked with our RSE to get a link to the 3.2.1 version, but it redirects to the 4.0.0 download.  Getting closer?

 

 

upgrade_check_script05.PNG

Member

UPDATE**

 

I was able to clear a path for the script to run.  The 'NetAppDocs-Lite' module was removed from the 'installed programs' page, but that alone did not fix the script issue.  Searching the computer for 'NetAppDocs-Lite' turned up a leftover folder in the C:\Windows\System32\WindowsPowerShell\v1.0\Modules\ location.  After this folder was deleted, the script began to run as expected.  NOTE: there is now no 'NetAppDocs-Lite' entry in C:\Windows\System32\WindowsPowerShell\v1.0\Modules\.  Seems so obvious now!
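
For anyone chasing the same symptom, the leftover-module search described above can also be done from PowerShell. This is a minimal sketch, assuming the toolkit module is named DataONTAP and that stray folders have names starting with "NetApp"; adjust both to match your system.

```powershell
# List every copy of the toolkit module PowerShell can find, with its version and folder.
# Assumption: the toolkit's module name is DataONTAP.
Get-Module -ListAvailable -Name DataONTAP | Select-Object Name, Version, ModuleBase

# Look for leftover NetApp-related folders on each module search path.
$env:PSModulePath.Split([IO.Path]::PathSeparator) |
    Where-Object { $_ -and (Test-Path $_) } |
    ForEach-Object { Get-ChildItem -Path $_ -Directory -Filter 'NetApp*' }
```

If more than one version or an unexpected folder shows up, that is a candidate for the kind of cleanup described in this comment.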

 

Big thanks to mcgue for assisting with this issue!!

 

-B

 

upgrade_check_script06.PNG

Mcgue

 

I just ran your newly updated script, and now I have a new error. See below:

 

Now checking subnet 10.100.210.0/24
This subnet is in use by a single port
Port a0a-3010 on node somenode-01 has an MTU value of 9000
***** Error Found *****
Subnet 10.100.210.0/24 contains port a0a-3010 on node somenode-01 has an MTU value of 9000 which has a MTU of 9000 and does not match at least one other port in the subnet
Port a0a-3010 on node somenode-02 has an MTU value of 9000
***** Error Found *****
Subnet 10.100.210.0/24 contains port a0a-3010 on node somenode-02 has an MTU value of 9000 which has a MTU of 9000 and does not match at least one other port in the subnet

 

I ran a network port show, as per the KB you reference, but all the MTUs are correct! The error message even says they are both 9000.

 

Is there something else I can check, or is this a bug?

 

Thanks again for the amazing script!!

Frequent Contributor

 smdonnell,

 

I sent you a private message in the forum if you would please check.  Will post the fix once we verify too.

New Contributor

Hello,

 

This script is very helpful.  Thanks!

 

I'm trying to make sense of why the errors are being triggered below.  I don't see any of these errors being discussed in the networking consideration article:

https://library.netapp.com/ecmdocs/ECMP12458273/html/GUID-EFDA729F-AA1D-45D9-9CCC-D47E2C40D316.html

 

 

 

***** Error Found *****                                                         
SVM1 LIF does not have a home node that matches the first node to be upgraded of NODEA
                                                             
***** Error Found *****                                                         
SVM1 LIF does not have auto-revert enabled                              

Now checking failover group a0a-01                                            
***** Error Found *****                                                         
The failover group must contain exactly 2 nodes and a0a-01 contains NODEA NODEB NODEC NODED

 

Thanks.

Member

Hi ssikorsk,

I sent you a message. Hope it helps.

Member

Hello all,

 

Having an issue with the -RunSection  &   -SkipSection flags for the script.

 

Using the -RunSection 17 flag, the script runs without any Section output.

Using the -SkipSection 17 flag, the script runs with EVERY section output, including 17.

 

Anyone run into this too, and have any input on it?

 

Thanks!!

 

-Brandon

 

Frequent Contributor

Thanks for the input, Brandon.  These parameters take a comma-separated list of the names of the sections you want to run or skip.  The current full list of sections appears in one of the examples in the help output, and I'm including it here too:

 

Model
ONTAPVersion
Snapshot
Sysstat
SQLorHyperV1
SQLorHyperV2
Failover
HA
Nodes
RDB
Quorum
Disks
Aggregates
Volumes
Space1
Space2
Network
MTU
SAN
Time
Jobs
UNIX
Netgroup
Guarantee
Firewall
HomePort
Rebalance
FlexCache
LoadSharing
32BitFull
32BitAggr
32BitSnap
SVMRunning
InfiniteVolume
ExternalServer
ExportRule1
ExportRule2
UpgradeCautions
RouteCheck
VMAlignCheck
LUNHungCheck
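
To illustrate the comma-separated form, here is a hedged sketch using section names from the list above. The cluster name and username are hypothetical placeholders. A number such as 17 is not a section name, which would explain why "-RunSection 17" produced no section output and "-SkipSection 17" skipped nothing.

```powershell
# Run only the network-related checks (section names come from the list above).
# cluster1-mgmt and admin are hypothetical placeholders.
.\83UpgradeCheck.ps1 -Cluster cluster1-mgmt -Username admin -RunSection Network,MTU,SAN

# Run everything except the two Space sections.
.\83UpgradeCheck.ps1 -Cluster cluster1-mgmt -Username admin -SkipSection Space1,Space2
```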

Member

Perfect Thank you! 

 

-RunSection SAN   is what we were looking for.

 

"Verify that each host is configured with the correct number of direct and indirect paths, and that each host is connected to the correct LIFs"

Member

Can anyone confirm whether a cluster administrator account is required to run the commands? I'm hoping this can work with a read-only account. I understand PowerShell must be launched with a Windows administrator account, but I'm curious whether the cluster account can be read-only.

 

Thanks,

Josh

Frequent Contributor

Does section 6.8 in the below link help you?  

 

http://www.netapp.com/us/media/tr-4475.pdf

 

Extraordinary Contributor

Hello @jwolf,

 

It may, theoretically, be possible to not be an administrator, but you would need to check each of the NPTK commands used, cross reference that against the APIs used, then check against the permissions needed for that action.

 

That being said, there's a multitude of commands (like GetClusterRingResults) which use Invoke-NcSsh to execute commands at the diag level, which implies that an account with significant privileges is being used.

 

Hope that helps.

 

Andrew

Member

Hi @asulliva and @mcgue,

 

Thanks for the info! This is exactly what I needed. This is a fantastic tool.

 

Cheers,

Josh

Member

I am also a little confused about the section

The failover group must contain exactly 2 nodes and MGNT contains


Where does it say that a failover group can only contain 2 nodes? It normally has all the nodes in the cluster.

 

Cheers

 

Joe

Frequent Contributor

Joe-Pollard,

 

Thank you for the feedback.  The two node requirement for a failover group is covered in section 6a here:

 

https://library.netapp.com/ecmdocs/ECMP12458273/html/GUID-E5EBF14B-9DEA-4835-B8AB-659BF5E64997.html

 

The goal is to ensure that external server accessing LIF is fully functional throughout an upgrade regardless of any other issues that might come up.

 

It is important to know that only one LIF per SVM needs to be configured like this, and it only needs to be configured that way for the actual upgrade.  Keeping failover groups optimal on an ongoing basis is covered here:  https://library.netapp.com/ecmdocs/ECMP1636021/html/GUID-94D4F8DD-3437-4474-81A6-9DB6FF6A244D.html

Thanks for the fantastic script. If you could include a VLAN tagging check on all nodes, we could avoid any outage during the LIF migration when a node reboots.

Frequent Contributor

Thanks for the feedback dbalaatt.  So there is some checking now for failover group targets for each interface, but I'm not sure that is exactly what you are looking for.  Can you give me more details or an example you saw?

Hello,

 

While running this script on a 4 node cluster I am getting this error message:

 

***** Error Found *****
The failover group must contain exactly 2 nodes and a0a-212_failover contains avnacluster01-01 avnacluster01-02 avnacluster01-03 avnacluster01-04

 

Why would this be a requirement for a failover group to have exactly 2 nodes on a 4 node cluster?

 

Thanks,

Paul.

Frequent Contributor

pogranovich,

 

This is a requirement only for the upgrade and only for one LIF per SVM.  It is outlined in the Upgrade Guide.  I'll change the wording on the next release to make it more clear.  There is no reason to change any other LIFs to have any different failover group members.  See section 6a in the following link for the failover group member requirement for the external server accessing LIF:

 

https://library.netapp.com/ecmdocs/ECMP12458273/html/GUID-E5EBF14B-9DEA-4835-B8AB-659BF5E64997.html

New Contributor

 

We check for external server connectivity, but I recently had an issue where the SVM mgmt LIF could ping the DNS servers, so we went ahead with the upgrade. However, there was a firewall issue between the SVM mgmt LIF and the DC servers, which created issues for CIFS authentication. Is there any way we can include this in the checks?

 

PS. Great script and thank you for it!

Frequent Contributor

 DA,

 

I'm not sure I follow exactly what happened post upgrade.  Can you send me more details as to what you experienced?

DA, I'd like to know what happened as well. mcgue's *awesome* script is telling us that "Test connections failed to at least one server type..." and I'm trying to figure out how to remedy this. We've got data LIFs configured for the SVM, but no SVM mgmt LIFs. The system is a switchless pair of 8040s running CDOT 8.2.3P2. Any help is appreciated.  (I should add that in the previous step all external servers respond to ping, so external servers are accessible via the SVM data LIF.)  More details below:

 

error

 

 

Frequent Contributor

asdfsdf,

 

Check the NIS servers that are defined for that SVM.  Likely they are not actually NIS servers but DNS or AD servers in your environment, and thus do not respond to NIS requests.  If you have NIS servers defined and don't actually need NIS services, you can remove them and re-run the script.

 

BTW - great username!

Thanks mcgue.

 

Under which 'section' of the script do these tests run?

 

I'm trying to find the right one to use the -RunSection flag with, tried a few and no luck yet.  I don't like having to sit through the entire output when these are the last tests that are failing.

 

 

Please excuse the username; my right hand was probably occupied by a sandwich or a delicious cookie when I set up the account.

Frequent Contributor

If you look at the script output in the bottom summary section there will be a line fully populated with the -RunSection sections that had either a warning or error during that script run.  Then you can make changes, and re-run it testing only the failed sections.  Also, the section you are referring to is ExternalServer.

Hi mcgue, it works like a charm now.  Initially I had an issue with the "This script requires Data ONTAP PowerShell Toolkit 3.2.1 or higher." message, even though I had all up-to-date components installed.  The workaround was to comment out your entire toolkit version check.

 

It works great now!  For those experiencing this issue, my solution was to comment out lines 554 through to 568.  Use this advice at your own risk.

 

Cheers.

-Pash.

Frequent Contributor

Glad you were able to get it working with the workaround.  You might have an old version of the PowerShell toolkit out there.  Look above at some of the comments from brandon_lee for some helpful ideas on cleaning up the previous versions.

Hello,

Thank you for this script, it is very helpful!  I am running this against a 6 node cluster and keep getting this message no matter which node I choose to upgrade first: "clesan01_mgmt LIF does not have a home node that matches the first node to be upgraded of cle-nacl01-06".  The home node this LIF resides on is Node 3.

Thanks

Kathy

I get this error on a few of our Vservers, and I don't know whether I can correct it by manually migrating the LIF to the first node to be upgraded, or whether I have to create another LIF on the first node and blow away the one below, as it's on node 4.

 

netapp001vse01_CIF_DR_mgmt LIF does not have a home node that matches the first node to be upgraded of netapp001-01

 

 

New Contributor

@MAXWELL_CROOK  The warning is reminding you that when you do the upgrade, you want to make certain that all the SVM management LIFs have their home node set to the same node, and that this is the first node you're going to upgrade.

 

If you do this for each of the SVM mgmt LIFs:

     net int modify -vserver xxxx -lif llll -home-node nnnnn -home-port pppp

     net int revert -vserver xxxx -lif llll     

you'll resolve this particular warning; there is no need to destroy and re-create the LIFs.

 

Note: The PowerShell script performs the checks that are outlined in the ONTAP Upgrade and Revert/Downgrade Guide.  If you haven't already done so, I strongly recommend reading through that guide before doing your upgrade.  It has more details on all the checks and on the upgrade process itself.

 

 

 

 

Hi, thanks for the feedback. Why do you need to move all of the home nodes to node 1 (the first node to upgrade)? When that node goes down, all the management goes down with it (well, fails over to node 2), so it seems an odd ask. I have gone through the upgrade document and it's not clear to me why you'd do this. This, plus the routing changes, is quite a lot to get your head around for someone who's mainly SAN experienced.

New Contributor

Once you have upgraded to 8.3.2+, CDOT changes how it handles its management network traffic (AD/NIS/LDAP/DNS/iSNS/etc.): instead of being specific to the node and going out the node management LIF, it's handled at the SVM level through the SVM's management LIF.

 

You put the management LIFs on the first node to be upgraded (despite the fact that you're going to reboot it right away) so that when it has rebooted and comes up as the only node on the new version, these SVMs all have a valid management LIF that can be used for this network traffic.  You'll also note that one of the checks has you make sure that, for the first node, you have a failover group that is different from normal.  That's to ensure that these LIFs move back to that first node, and not elsewhere in the cluster where you're not yet at the new version.  This is covered in a bit more detail in the "SVM networking considerations for major upgrades" section of the upgrade doc, along with the rest of the prerequisites.

 

You're absolutely right, it's a lot to take in when planning your first 8.2 to 8.3 upgrade.  Keep asking questions though, you're on the right track.

 

 

 

 

New Contributor

I ran the script against a single-node cluster, but some of the errors make me think the script isn't aware that some checks don't apply.  Is this true?  

Frequent Contributor

Version 2.3 is forthcoming with these changes:

 

  • Don’t run certain sections if the cluster is already on 8.3. 
  • Single node cluster check changes.
  • Fixed an issue with RTF/HTML outputs with PowerShell 5.
  • Check MTU for unused failover groups.
  • Show a summary of good LUNs in SAN section.
  • Change to network section to better handle failover groups in 8.3+
  • Added sleep to Invoke-NcSSH cmdlet runs to allow for session to complete.
  • Allow only IP based LIFs to be selected for ping tests

Can you please update the script to correct the false failover group report?

Frequent Contributor

VIRTUALGEEK2,

 

Can you try with the 2.4 version updated in August and let me know if you are seeing the same issue?

New Contributor

@mcgue.  You noted the v2.3 changes in an earlier comment, but I don't see a similar post for v2.4.  What changed in 2.4?  Thanks in advance.

Frequent Contributor

Good eye, wertk!  We did some testing with 2.3 and found that one of the output lines about changing a failover group was worded awkwardly.  I figured I would bump the version to 2.4 since technically it was a change, but it didn't change any of the actual tests.  Hopefully this is the last of the updates to this script.

Member

First, let me say AWESOME script!  This is very helpful.

 

Also, I noticed a warning at the end of the script that stated that the RTF and HTML files were limited by PowerShell to 9999 lines.  I was able to overcome this limitation by entering these commands prior to running this script.  

 

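# Get the console host's raw UI object.
$pshost = Get-Host
$pswindow = $pshost.UI.RawUI

# Copy the current buffer size, enlarge it, and assign it back.
$newsize = $pswindow.BufferSize
$newsize.Height = 32766
$newsize.Width = 160
$pswindow.BufferSize = $newsize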

 

Change the buffer size numbers to suit your environment.  It looks like 32766 is the limit for the buffer height on Windows 7.

 

Hope this helps someone!

 

Frequent Contributor

Thanks for the tip, Jefferyh!  Good to know. That change is worthwhile, as the HTML and RTF outputs are definitely a lot friendlier to use.

Frequent Contributor

I had a few people ask if this script would be useful in doing an ANDU from 8.3 to 9.1.  There are a few sections that would be helpful.  I put a new article together here:  http://community.netapp.com/t5/Data-ONTAP-Discussions/Upgrading-Clustered-Data-ONTAP-8-3x-to-ONTAP-9-1-Using-Automated-Nondisruptive/td-p/128391

Hi McGue.

 

WHAT A FANTASTIC SCRIPT!!!

 

Thanks so much for writing and sharing this with us!

I'm very thankful for it and will use it in an upgrade from 8.2 -> 8.3.2 this week on 4 clusters.

 

 
