DOT 8.1 stability

jimmy_henry · ‎2012-06-25

Hi

We are deploying DOT 8.1 on a mix of FAS3240, 2240 and 2020, mainly for the snapmirror compatibility between 32-bit and 64-bit aggregates. The systems will be used for CIFS/vFilers and VMware but need to be very stable, as they are critical to serving the users. Initially, we were planning to deploy everything on DOT 8.0 and wait until 8.1 is stable enough before upgrading. However, the new 2240s are all on 8.1 and it may be more of an issue to downgrade to 8.0.

Is it advisable to deploy on 8.1 now for all, or is it better to wait until it is stable enough? Are there any known issues with DOT 8.1 re: CIFS, vFilers and VMware on FCP?

Thanks for your comments

Jimmy

scottgelb · ‎2012-06-25

For the 2240 you can't go below 8.1 and for the 2020 you can't go above 7.3. So you are in a position without a lot of choices except on the 3240, but it likely makes sense to match 8.1 on it to the 2240s (especially if the 3240 is the VSM target you have no other choice) and the 2020 on 7.3 can be a source but not a target.

We have been deploying a lot of 2240s and (knock on wood) 8.1 has been very stable (more so than the 8.0 releases last year in my experience), but it is always good to check the release advisor and bug reports on each release (and release comparison). You mentioned vFilers too... 8.1 adds back data motion for vFilers (in 7.3 and 8.1 but not 8.0) and SMB 2 which may also be of benefit.

pascalduk · ‎2012-06-27

Running ONTAP 8.1 on 3 MetroClusters and a couple of standard HA pairs for almost 2 months now. No issues at all.

jimmy_henry · ‎2012-07-10

Thanks a lot. Have deployed 8.1 and happy with it so far.

radek_kubka · ‎2012-07-10

Hi,

This just popped up in this thread:

https://communities.netapp.com/thread/22676

Some less lucky people hit this bug in 8.1, which apparently is fixed in 8.1.1 only:

http://support.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=510586

Regards,
Radek

jimmy_henry · ‎2012-07-10

Thanks. This seems scary enough to stay on 8.1! We're not comfortable upgrading to 8.1.1 for now (too new). Has anyone hit bug 510586? Is there any specific condition under which it happens?

Regards

Jimmy

radek_kubka · ‎2012-07-10

The bug description specifically talks about "fragmented data layout", so it looks like 8.1 works just fine (judging by the existing, unaffected deployments) unless this condition is met.

david_veness · ‎2012-07-21

We experienced bug 510586 in our environment. We deleted an old NFS volume (>12 months) on one of our 3140s and it sent the WAFL scans into meltdown. It kept the CPU and disks at 100% utilisation for nearly 24 hrs, causing major outages to nearly 100 VMs. Very nasty indeed. We are now on 8.1.1RC1 across our environment and it appears to be stable so far.

Thanks
David

jimmy_henry · ‎2012-07-22

It seems quite serious. We have to go for either 8.1 or 8.1.1RC, but the thought of going for an RC version on a critical deployment is quite scary. 8.1P1 is currently available but doesn't fix bug 510586. Does anyone know whether there is an upcoming 8.1P2 that will address it, and an estimated availability date?

Alternatively, when is 8.1.1 going GA? Many thanks

jimmy_henry · ‎2012-07-30

Just a quick update: DOT 8.1P2 is now released and fixes this bug.

mtj_proact · ‎2012-10-02

We hit this bug on several MetroClusters and received a special D patch, 8.0.2P6D12, that solves it. The problem I have is that it sounds like it will only be fixed in 8.1 releases (8.1P2, 8.1P3, 8.1.1RC1, 8.1.1, 8.1.1P1) and not in 8.0.X versions like 8.0.3P4 or 8.0.4RC1, and we will not go to the 8.1 versions at the moment. Does anyone know if bug 510586 will be fixed in the 8.0.X versions?

Any comments from NetApp on this would be most welcome.

ANDREW_GALLANT · ‎2012-07-16

I know I am coming in a bit late on this one, but about a month ago we upgraded our 3240s to 8.1. It was a total disaster. All 9 of our 3240 towers had issues with back-to-back copies starting. The towers would stop serving data. NetApp ended up upgrading our 9 towers to 3270.

The issue with 8.1 is that the kernel has a larger footprint, leaving less memory for system functions. The buffers fill up and the filer stops serving data until the buffers are flushed to disk. This creates an endless loop: VMware queues up requests, and when the filer becomes available again it is flooded with the queued data, which starts the buffer overrun again. PAM cards in the 3240s made the issue worse; we had to pull all of our PAM cards out. The issue also happened on our 3170 towers, but in that case the towers that had PAM cards in them did not have the buffer overrun issue.

8.1P1 is in RC and will be GA in a few weeks. I would wait until 8.1P1 comes out (it may be P2 where the issue is addressed). As for reverting, it was not an option: you have to delete all of your snapshots. I reverted a tower that had one volume on it and it still took an hour for the HA pair.

In NetApp's defence: 1. Our towers were really busy to begin with. 2. NetApp did everything they could to correct the issue, which resulted in all of our 3240 heads being changed out for 3270 heads.

jimmy_henry · ‎2012-07-16

This sounds scary. Do you know if the issue occurs only when upgrading? In our case, it'll be a fresh installation and new data will be snapmirrored to the 3240. Does this help in any way?

We will also be on VMware via FC and CIFS mainly + PAM.

I would prefer to wait for 8.1P1, but I am not sure it will arrive in time, as we have already deployed remote sites that will be snapmirrored onto the 3240 on 8.1.

Any comments from NetApp on this would be most welcome.

ANDREW_GALLANT · ‎2012-07-16

It is an 8.1 issue. Upgrade or new install, it will happen. The kernel in 8.1 is bloated, so it lowers the capacity of the filer. If you are doing a new install, then you will be less likely to hit it because you will see the alarms go off beforehand. If you have an existing loaded filer, then you won't know if you hit it until after you upgrade. Now that I think about it, I think the issue is addressed in P2.

jimmy_henry · ‎2012-07-16

Just checked and 8.1P1 is available. Is it recommended to go for 8.1P1 then?

Among the bugs fixed is http://support.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=542889 - is that the one related to the issue you had?

aborzenkov · ‎2012-07-16

I suspect 8.1.1 was meant. P-releases do not have RCs at all.

ANDREW_GALLANT · ‎2012-07-16

http://support.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=510586 was my bug. I am sure I hit the other one too.

radek_kubka · ‎2012-07-16

Interesting story. Scary, but interesting.

If you hit bug 510586, then you should be looking at 8.1.1 I believe, not 8.1P1.

pascalduk · ‎2012-07-30

That bug is also fixed in 8.1P2, which was released last week.