2009-01-21 12:38 PM
We have two 3020(c) clusters that were recently upgraded from OnTap 7.2.3 to OnTap 7.2.6.1. One cluster seems healthy, and we've seen no crashes or failures on either of its nodes. The second cluster has experienced two failures: each one followed a quota assertion error, and each time the cluster failed over. After a chat with NetApp tech support, we determined that our issue will be addressed in a P2 patch release of OnTap that should be out next week.
This is all fine, BUT my concern is that there is an underlying problem with the node that keeps crashing at random. Looking through the messages file on that node, it appears that someone tried to install the 7261_q tarball instead of the 7261_e tarball. This produced a download failure, which I'd expect. However, since we install with the 'install_netapp' script, which unarchives the tarball and installs it under /etc, I'm curious: could extracting the wrong version of OnTap, even though the faulty image was never downloaded to the filer, affect the normal functioning of a filer node?
In other words, is this quota error and panic a red herring, and is there in fact corruption on the node that keeps crashing? If so, is there any way to work around the issue short of reinstalling the filer from scratch? Note that in this cluster, filer2 (the healthier of the two nodes) reports correct kernel, backup, and diag versions. filer1 (the unhealthy one) reports the correct kernel version, but no longer displays the backup and diag versions at all.
2009-01-21 01:58 PM
I would not expect that to be the problem. Bottom line, the only platform-specific difference in the distribution should be the code that goes onto the CF card (i.e. the ONTAP kernel). The rest of the tarball is the same. The download command caught the error and rightfully saved your controller from becoming a doorstop at the next reboot.
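If you want to check that point for yourself, one approach is to extract both tarballs (for example the 7261_e and 7261_q builds mentioned in the question) into two separate directories on a workstation and compare them file by file. The Python sketch below does that with SHA-256 digests. It is a generic directory comparison, not a NetApp tool; it assumes you have already unpacked both archives locally, and the script name `compare_dists.py` is just a hypothetical placeholder.

```python
import hashlib
import os
import sys


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root):
    """Map each file's path (relative to root) to its content digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digests[os.path.relpath(full, root)] = file_digest(full)
    return digests


def compare_trees(root_a, root_b):
    """Return (only_in_a, only_in_b, differing) as sorted lists of relative paths."""
    a = tree_digests(root_a)
    b = tree_digests(root_b)
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    differing = sorted(p for p in set(a) & set(b) if a[p] != b[p])
    return only_a, only_b, differing


# Only runs when invoked as: python3 compare_dists.py DIR_A DIR_B
if __name__ == "__main__" and len(sys.argv) == 3:
    only_a, only_b, differing = compare_trees(sys.argv[1], sys.argv[2])
    for label, paths in (("only in A", only_a), ("only in B", only_b), ("differ", differing)):
        for p in paths:
            print(f"{label}: {p}")
```

If the platform-specific part really is just the kernel, the "differ" and "only in" lines should be confined to the kernel files; the exact file names inside the tarballs are not given in this thread.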
The quota issue you describe sounds like a real bug in that release of code. I've heard of quota issues in that particular release, and especially since a P release to fix it is coming out, I don't think it's a symptom of having loaded a different distribution onto the controller.
2009-01-21 10:49 PM
Do the "software install" and/or "software update" commands check that the package is the correct version before actually installing it in the root volume?
2009-01-22 12:44 PM
I don't believe it will stop the bits from being loaded onto the root volume. I have to admit I've never tried it, but I believe the answer is no.
I do know, however, that the download command, whether run manually or via the software command, will stop it, and that is the critical piece. The kernel is the
only real difference between the builds; if you really look at it, the rest is mostly text files.
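Since the wrong build may still get unpacked onto the root volume before the download step ever runs, a cheap extra safeguard is to check the file name before running install_netapp at all. The Python sketch below is hypothetical: the `<release>_<platform letter>` naming pattern is inferred only from the two names in this thread (7261_e and 7261_q), and the expected values "7261" and "e" come from the question, where the 3020 needed the _e build. It inspects only the name, not the contents, so it complements rather than replaces the check that download performs.

```python
import os
import re
import sys

# Hypothetical pattern inferred from "7261_e" / "7261_q" in this thread:
# a release number, an underscore, one platform letter, optional extension(s).
NAME_RE = re.compile(r"^(?P<release>\d+)_(?P<platform>[a-z])(?:\..*)?$")


def check_tarball(path, expected_release, expected_platform):
    """Return (release, platform) parsed from the file name, or raise ValueError on a mismatch."""
    name = os.path.basename(path)
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"{name}: does not look like <release>_<platform>")
    if m.group("release") != expected_release:
        raise ValueError(f"{name}: release {m.group('release')}, expected {expected_release}")
    if m.group("platform") != expected_platform:
        raise ValueError(f"{name}: platform '{m.group('platform')}', expected '{expected_platform}'")
    return m.group("release"), m.group("platform")


# Only runs when invoked with exactly one argument (the tarball path).
if __name__ == "__main__" and len(sys.argv) == 2:
    try:
        # Expected values taken from the question: the 3020 needed the 7261_e build.
        check_tarball(sys.argv[1], "7261", "e")
    except ValueError as err:
        sys.exit(f"refusing to install: {err}")
    print("name check passed; run install_netapp as usual")
```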