Solved: Preventative Maintenance best practice on NetApp Storage

Harvey_Mulenga · ‎2020-07-01

What is the best way, and what are the steps, to carry out preventive maintenance on NetApp storage?

SpindleNinja · ‎2020-07-01

Monitor Active IQ for bugs/patches/etc.

Install Unified Manager and set up alerting. Or take it further and deploy Cloud Insights.

Set up customer alerting.

paul_stejskal · ‎2020-07-06

Also, I'd recommend a regular patch cycle. You don't want to be forced to jump several ONTAP releases just because you're hitting end-of-support dates. Most customers stay a major release or two back (currently 9.5/9.6, latest P release). I cannot make a specific recommendation, but generally the best releases are listed here: https://kb.netapp.com/Support_Bulletins/Customer_Bulletins/SU2

Harvey_Mulenga · ‎2020-07-09

Thank you all for your posts. To rephrase my question: what needs to be done when a client has a regular maintenance contract with you for their storage? Do you have to regularly shut down their storage, blow the dust out of the machines, and patch them? Or do you just monitor and patch when the need arises?

SpindleNinja · ‎2020-07-09

You should never have to shut down storage for a code or firmware update (unless it's a single-node cluster). You do need to make sure that NAS LIFs fail over correctly and that hosts can fail over correctly when SAN fails over.
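To illustrate the LIF check, here is a minimal Python sketch. It assumes you have already exported the LIF list into simple records; the `Lif` type, its fields, and the node names are hypothetical and are not an ONTAP API.

```python
from typing import List, NamedTuple


class Lif(NamedTuple):
    """Hypothetical record for one logical interface (not an ONTAP data type)."""
    name: str
    home_node: str
    current_node: str


def lifs_still_on(node: str, lifs: List[Lif]) -> List[str]:
    """Names of LIFs currently hosted on `node`.

    After that node has been taken over, this list should be empty.
    """
    return [lif.name for lif in lifs if lif.current_node == node]


def lifs_not_home(lifs: List[Lif]) -> List[str]:
    """Names of LIFs that are not on their home node.

    After giveback, this list should be empty again.
    """
    return [lif.name for lif in lifs if lif.current_node != lif.home_node]
```

During a takeover of a node, `lifs_still_on` for that node should return nothing; once the giveback is done, `lifs_not_home` should return nothing.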

As far as scheduling with the client goes, it's both. If something is critical and/or they're hitting it, patch it. The recent SP/BMC bug is a good example. Otherwise, keeping them on a 6 or 12 month patch/update cycle is pretty normal. Another reason to update is when they want to try new features in a newer version of ONTAP.
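The scheduling rule above can be sketched in Python as follows. This is only an illustration under assumptions: the `Advisory` record and the function names are made up for the example, and the 6 or 12 month cycle is the one mentioned in this post.

```python
import calendar
from dataclasses import dataclass
from datetime import date


@dataclass
class Advisory:
    """Hypothetical record for a bug or patch notice."""
    title: str
    critical: bool        # rated critical
    client_hits_it: bool  # the client is actually running into it


def patch_now(advisory: Advisory) -> bool:
    """Patch out of cycle if the issue is critical and/or the client is hitting it."""
    return advisory.critical or advisory.client_hits_it


def next_scheduled_update(last_update: date, cycle_months: int = 12) -> date:
    """Date of the next routine update on a 6 or 12 month cycle."""
    if cycle_months not in (6, 12):
        raise ValueError("cycle_months must be 6 or 12")
    month_index = last_update.month - 1 + cycle_months
    year = last_update.year + month_index // 12
    month = month_index % 12 + 1
    # Clamp the day for shorter months (e.g. 31 Aug + 6 months -> 28 Feb).
    day = min(last_update.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)
```

For example, a client last updated on 2020-07-09 on a 6 month cycle would be due again on 2021-01-09, unless `patch_now` returns `True` for an advisory before then.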

Mjizzini · ‎2020-07-09

Most maintenance should be performed non-disruptively.

Even when a node needs to be halted or rebooted, perform a takeover of that node first so the partner can carry the load until the maintenance is done.

Note that:

CIFS and NFSv4 sessions are terminated, and clients have to reconnect to the partner once the LIFs have been migrated.
The surviving node does the work on behalf of its partner and has no redundancy during the takeover. Therefore, to avoid impacting performance, takeover/giveback should be done during a maintenance window only.
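The two notes above could be turned into a small pre-check, sketched here in Python. Everything in it is an assumption for illustration: the session inventory format, the function names, and the partner health flag are hypothetical and would come from your own monitoring, not from an ONTAP API.

```python
from datetime import datetime
from typing import Iterable, List, Tuple

# Protocols whose sessions are terminated on takeover, per the notes above.
SESSION_BREAKING = {"cifs", "nfsv4"}


def clients_to_notify(sessions: Iterable[Tuple[str, str]]) -> List[str]:
    """Clients that will have to reconnect after the takeover.

    `sessions` is a hypothetical inventory of (client_name, protocol) pairs.
    """
    return sorted({client for client, proto in sessions
                   if proto.lower() in SESSION_BREAKING})


def takeover_allowed(now: datetime, window_start: datetime,
                     window_end: datetime, partner_healthy: bool) -> bool:
    """Only allow a takeover inside the maintenance window and with a healthy
    partner, since the surviving node has no redundancy while it carries both loads."""
    return partner_healthy and window_start <= now < window_end
```

With this, you would warn the clients returned by `clients_to_notify` ahead of time and start the takeover only when `takeover_allowed` is true.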