You need to run mbralign and mbrscan against the *-flat.vmdk; that's why you see this error. For anyone having trouble running mbralign or mbrscan for Windows or Linux, I've written a step-by-step article on how to do so: http://www.sysadmintutorials.com/tutorials/netapp/netapp-how-to-use-mbralign-to-correct-misalignment/
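For reference, a rough sketch of how the scan and realign are invoked from the ESX service console; the datastore and VM names here are hypothetical, and the tools must be run against the -flat.vmdk while the VM is powered off:

    # scan the flat file to check partition alignment (hypothetical paths)
    ./mbrscan /vmfs/volumes/datastore1/myvm/myvm-flat.vmdk

    # realign the same flat file; mbralign keeps a backup copy you can remove afterwards
    ./mbralign /vmfs/volumes/datastore1/myvm/myvm-flat.vmdk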
Sounds good. In our situation, completing scrubs on the aggregate and clearing the RLW_Upgrading process didn't make any difference to the performance, and upgrading to 8.1.1RC1 didn't either. Only after installing the PAM cards did we see some difference. With the same load and the same utilization going from 8.0.2 to 8.1 to 8.1.1, we only saw problems in 8.1 and above.
Hi Craig, do you plan on upgrading to this release soon? I would be interested to hear whether it resolves the issues. Also, what version of ONTAP are you running now?
Thanks for the reply. According to NetApp there are specific RAID group sizes you should use depending on the disks in your system; see the image below. Another note on adding disks to an aggregate: we have been told to add either the full RAID group size or at least half of it at a time. For example, with a RAID group size of 16, it is recommended to add 16 disks or 8 disks at a time, not 1 or 2. Also remember you must keep 2 spare disks per disk type.
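As a rough sketch, the 7-Mode commands for this would look something like the following (the aggregate name is hypothetical; check your spares first):

    # confirm you have enough spares of the right disk type
    aggr status -s

    # set the RAID group size on the aggregate (16 recommended here for 600GB SAS)
    aggr options aggr1 raidsize 16

    # add half a RAID group (8 disks) in one go rather than 1 or 2 at a time
    aggr add aggr1 8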
Another update: we have now had the system running with the 256GB Flash Cache modules for a few days. The system seems to be responding much faster, though we have been advised that the modules will need to be increased to 512GB. We will also be adding another shelf of 24 x 600GB SAS, setting a RAID group size of 16 (which is recommended for a 3240 with 600GB SAS), creating a new aggregate, and migrating data over to it. NetApp also identified a server that was absolutely hammering the system; we have moved it to a single physical server running on its own local storage, which has also improved performance. We will then be able to destroy the existing aggregates and add those existing 600GB disks into the new aggregate. This will take some time, but I'll post updates along the way. If anyone has questions, don't hesitate to ask.
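For anyone curious, creating the new aggregate with that RAID group size would look roughly like this in 7-Mode (the name, disk count, and size spec are illustrative; verify the disk-size syntax against your version's man page):

    # new 64-bit aggregate, RAID-DP, raidsize 16, built from 24 of the new 600GB disks
    aggr create aggr_new -B 64 -t raid_dp -r 16 24@600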
A quick update: we have upgraded to ONTAP 8.1.1RC and still face the same performance problems. We have some 256GB Flash Cache cards on loan that we put into the system last night. It's still a bit early to tell, but so far today everything seems to be running OK. We haven't tried to run a dedupe yet; once we do, I'll update the post here.
I'm surprised Data ONTAP 8.1GA hasn't been removed from the downloads section with all the problems it's causing everyone. We are upgrading this Sunday night, so I will definitely let you know.
Just thought I'd update everyone on this issue. We have been working with NetApp for almost a month now, with their highest-level engineers. They have basically told us that the 8.1GA code has issues that can seriously degrade system performance to the point where the system is almost unusable. There is no workaround on ONTAP 8.1; the fix is to upgrade to 8.1.1RC1. There is a public BURT that has been released, which you can read here: http://support.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=510586 We are just waiting on some more test results and then we will be upgrading to 8.1.1RC1. I will write back to let everyone know if performance returns to normal.
Hi, can you please post the following output: jump into priv set diag and run sysstat -M 1 for about 20 seconds during times of high latency or slowness. Do you monitor the network interfaces on the NetApp? Can you let us know what the utilization is, please? Is it set up as a vif?
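For reference, the capture would look something like this on the console (run it during the slow period and press Ctrl-C after about 20 seconds; the vif name is hypothetical):

    priv set diag
    sysstat -M 1        # per-CPU utilization, one line per second
    priv set admin

    # network side: per-interface counters and the vif layout
    ifstat -a
    vif status vif0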
The ANY1+ counter is not your actual CPU utilization; you are better off using sysstat -M 1 to see the utilization on each CPU. ANY1+ shows the percentage of time that at least one of your CPUs was busy, ANY2+ the percentage of time that at least two were busy, and so on. In our case, after the aggregates completed a full scrub, we noticed a drop in CPU utilization. We have also turned off all dedupe jobs for the time being; previously, if a dedupe job kicked in before the scrub had completed, the system was almost unusable. Hope this helps you and others.
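If it helps, disabling or de-scheduling dedupe per volume looks roughly like this in 7-Mode (the volume name is hypothetical; worth double-checking the sis config syntax against your version's man page):

    # check which volumes have dedupe enabled and their schedules
    sis status
    sis config

    # either disable dedupe on the volume entirely...
    sis off /vol/vol1

    # ...or keep it enabled but remove the schedule so it only runs manually
    sis config -s - /vol/vol1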
When you type aggr scrub status <aggr name> -v, if you see the aggr or RAID groups as (suspended) and the scrub is already a few percent done, then use aggr scrub resume <aggr name> so it resumes from where it left off; it should take less time. I think if you type aggr scrub start <aggr name>, it will start the scrub process from the beginning again. That is not a bad thing, but depending on the size of your aggregates and the utilization of the filer, it could take quite a few hours. The good thing is, if you are seeing some impact from the command, you can suspend it and resume it later.
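Putting those together, the scrub workflow looks like this (aggregate name hypothetical):

    # check progress and whether the scrub is suspended
    aggr scrub status aggr1 -v

    # pick up a suspended scrub from where it stopped
    aggr scrub resume aggr1

    # pause it again if the filer gets busy
    aggr scrub suspend aggr1

    # or restart a scrub from the beginning
    aggr scrub start aggr1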
If you have one aggregate with 32-bit and another with 64-bit, and your root vol sits on the 32-bit aggr, one option is to use SnapMirror in 8.1 to mirror the root volume to the 64-bit aggr, set the new volume as root, and then either reboot or fail the filer over and back to put it in place.
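A minimal sketch of that root-volume move in 7-Mode, assuming purely illustrative volume and filer names (vol0, newroot, aggr64, filer1), that SnapMirror is licensed, and that snapmirror.access permits the transfer:

    # create and restrict the destination volume on the 64-bit aggregate
    vol create newroot aggr64 250g
    vol restrict newroot

    # baseline and final update of the volume SnapMirror
    snapmirror initialize -S filer1:vol0 filer1:newroot
    snapmirror update -S filer1:vol0 filer1:newroot

    # cut over: quiesce, break, mark the new volume as root, then reboot
    snapmirror quiesce newroot
    snapmirror break newroot
    vol options newroot root
    reboot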
Hi, actually the scrub does have a relationship with RLW. I wouldn't wait for the scrub to finish on its own; it will probably take months, because it only runs in the background when the filer has low utilization. On ours, I found a low-utilization period and resumed the scrub manually with aggr scrub resume <aggr name>. Once the scrub completed, the RLW_Upgrading status changed to RLW_ON. With all these bugs, I'm wondering if ONTAP 8.1 was released a bit too early?
Hi, just out of interest: if you enter priv set diag and type aggr status <aggr name> -v, can you see RLW_ON or RLW_Upgrading? Also, if you type aggr scrub status -v, when was the last time your aggregates completed a full scrub?
Hi Satinder, thanks for your reply. I use VSC to scan the datastores, and it tells me the VMs with certain offsets, for example Group 1 - VMs with an offset of 1, Group 2 - VMs with an offset of 2, and so on. I have this information, but do you have the CLI command to create, for example, a LUN with an offset of 1?
Hi, creating optimized LUNs in VSC to quickly fix misaligned VMs is great, but not so great when you are using SnapVault and there is no option to place the LUN in a qtree. Does anyone know if it's possible to create these optimized LUNs via the CLI, so they can then be placed in a qtree?
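For comparison, creating a standard (non-optimized) LUN inside a qtree from the CLI is straightforward; what I can't find documented is a supported offset option for lun create, which is what the VSC-optimized LUNs would need. The names and size below are hypothetical:

    # qtree on the datastore volume so SnapVault can pick it up
    qtree create /vol/vmfs_vol/q_datastore1

    # standard VMware-type LUN created inside that qtree
    lun create -s 500g -t vmware /vol/vmfs_vol/q_datastore1/lun0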
This is what I got back from NetApp: "What it does is add additional data into the parity block that is stored when a write is completed, to protect against uncommon disk malfunctions where the system is not able to detect that data wasn’t actually written to disk." But you're right, I can't find any information on RLW in the 8.1 documentation. Luckily Scott pointed out the previous forum post above.
After resuming the scrub on the aggregate last night, it is now complete and the aggregate no longer shows rlw_upgrading. It is now showing rlw_on, which means the upgrade has completed. I will try one more aggregate on a different filer just to make sure this is the resolution, and I'll post back my results.
I'm going to test this tonight on a 2040 filer that we have: I'm going to resume the aggr scrub and see what the status reports back once it's finished. It's at 94% now. For anyone interested, the CLI to resume the scrub on an aggregate is aggr scrub resume <aggr_name>, followed by aggr scrub status -v to check the status. Fingers crossed this is it. Will write back once it's finished the scrub.
Hey Scott, thanks for the reply. Yes, I could open the forums link and it's exactly the same problem. If I type vol scrub status -v at the CLI I can see the following:

vol scrub: status of /aggr3/plex0/rg0 : Current scrub is 2% complete (suspended). First full scrub not yet completed.
vol scrub: status of /aggr0/plex0/rg0 : Scrub is not active. Last full scrub completed: Sun Jun 17 06:36:30 EST 2012
vol scrub: status of /aggr1/plex0/rg0 : Current scrub is 25% complete (suspended). Last full scrub completed: Sun Jun 10 05:15:30 EST 2012
vol scrub: status of /aggr1/plex0/rg1 : Current scrub is 26% complete (suspended). Last full scrub completed: Sun May 20 06:32:16 EST 2012
vol scrub: status of /aggr2/plex0/rg0 : Current scrub is 21% complete (suspended). Last full scrub completed: Sun May 6 02:22:28 EST 2012

The only aggregate that no longer has the rlw_upgrading status is aggr0, and as you can see from the status above, it is also the only one not in the middle of a scrub. Could this be the reason the other aggregates are reporting rlw_upgrading, that they have not completed a full scrub since the upgrade to Data ONTAP 8.1?
Hi, we recently upgraded ONTAP from version 8.0.2 to 8.1. We read through the release notes, Upgrade Advisor, and the upgrade notes, and proceeded with the upgrade, which was quite smooth, BUT..

Nowhere in the release notes or 8.1 documentation does it mention that after the upgrade a background process runs that can dramatically degrade the performance of your filer. If anyone from NetApp reads this, can you please ask for this caveat to be added to the release notes and Upgrade Advisor.

Right after upgrading, a background process begins that is entitled rlw_upgrading. RLW is short for RAID Protection Against Lost Writes, new functionality added in Data ONTAP 8.1. To see this process you need to be in priv set diag and then run aggr status <aggr_name> -v.

The issue is, while this process is running, if your dedupe jobs kick in the CPU skyrockets to 99% and filer latency goes through the roof. The only way to run the filer acceptably is to either disable all dedupe or set all dedupe schedules to manual.

The problem is, this background process has been running for the last 3 weeks on one filer and the last 2 weeks on another. I have a case open with NetApp at the moment, but I was wondering if anyone else has experience with this, or any recommendations/commands that would let us see how long this process has left to complete, because no one seems to know much about this process or function. For the last 2-3 weeks we have not been able to run any deduplication without severely impacting filer latency.
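For anyone who wants to check their own system, the status check looks like this (the aggregate name is hypothetical; rlw_upgrading or rlw_on appears in the aggregate status output at the diag level):

    priv set diag

    # look for rlw_upgrading or rlw_on in the output
    aggr status aggr1 -v

    # see whether a full scrub has completed since the upgrade
    aggr scrub status -v

    priv set admin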
Hi, I heard on the grapevine that there is a command we can enter on our FAS3240 that will show the expected performance improvement from Flash Cache before installing the module. Does anyone know what the command is?
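I believe the feature being described is Predictive Cache Statistics (PCS), which emulates a Flash Cache module in software. A rough sketch, assuming the option names from the PCS documentation (worth verifying against your ONTAP version before running):

    # enable PCS emulation and set the size of the cache you want to model
    options flexscale.enable pcs
    options flexscale.pcs_size 256GB

    # after letting it warm up under normal load, view the predicted hit rates
    stats show -p flexscale-access

    # turn the emulation off when done
    options flexscale.enable off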
Hi Scott, just thought I'd check in and see if you are now running ONTAP 8.1, and if so, do you know whether the snapvault update command exists within its interactive shell or via rsh?
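For context, this is the 7-Mode syntax I'm hoping still works in 8.1, run on the SnapVault secondary (the path is hypothetical):

    # push a manual update of a vaulted qtree from the secondary
    snapvault update /vol/sv_backups/qtree1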