Netapp FAS vs EMC VNX

dejanliuit · ‎2011-04-10

Hi.

This year we have to decide if we should keep our IBM N-series 6040 (Netapp 3140) stretch metrocluster, upgrade to Netapp 3200 (or rather the IBM alternative) or move to another manufacturer.

And from what I can see, EMC VNX is one of the most serious alternatives. My boss agrees, so we have arranged a meeting with EMC to hear about their storage solutions, including ATMOS.

So, I would like to hear from you other guys what I should be aware of if we decide to go EMC VNX instead of keeping the Netapp/IBM track.

It could be implementation-wise or things like "hidden" costs, e.g. volume-based licensing.

I'm having trouble finding EMC manuals to see what can be done and what can't.

Our CIO has set up one big goal for the future user/file storage: the storage has to cost at most as much as it would if you went and bought a new Netgear/DLink NAS (with mirrored disks) every year.

This would mean that $/MB for the system has to be as close as possible to this goal. Today the cost is at least ten times higher.
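To make that goal concrete, the comparison can be reduced to cost per usable GB per year. A minimal Python sketch, assuming made-up placeholder prices and capacities (none of them are real quotes) and a hypothetical helper `cost_per_gb_year`:

```python
def cost_per_gb_year(purchase_price, yearly_support, usable_gb, lifetime_years):
    """Total cost of ownership per usable GB per year."""
    total_cost = purchase_price + yearly_support * lifetime_years
    return total_cost / (usable_gb * lifetime_years)


# Placeholder numbers only, chosen for illustration.
# A small mirrored NAS that is replaced every year:
consumer_nas = cost_per_gb_year(
    purchase_price=500, yearly_support=0, usable_gb=2_000, lifetime_years=1
)
# A central storage system kept for four years with a support contract:
central_storage = cost_per_gb_year(
    purchase_price=400_000, yearly_support=40_000, usable_gb=100_000, lifetime_years=4
)

print(f"consumer NAS:    {consumer_nas:.2f} per GB per year")
print(f"central storage: {central_storage:.2f} per GB per year")
print(f"ratio:           {central_storage / consumer_nas:.1f}x")
```

The point of keeping support and lifetime as separate inputs is that a higher initial investment can still win if the yearly cost per GB ends up low enough.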

Unless we come close to that, we will have a hard time convincing the professors with their own funding to store their files in our storage instead of running to the nearest hardware store to buy a small NAS (or two) for their research team.

It's called "academic freedom" working at a university...

The initial investment might be a little higher, but the storage volume cost has to be as low as possible.

Today we have basic NFS/CIFS volumes, SATA for file and FC for Vmware/MSSQL/Exchange 2007.

No addon licenses except DFM/MPIO/SnapDrive. Blame the resellers for not being able to convince us why we needed Snap support for MSSQL/Exchange.

We went without Operations Manager for more than two years and have yet to implement it, as it was only recently purchased.

Tiering on Netapp is a story in itself.

Until a year ago our system was IOPS saturated during daytime on the SATA disks and I had to reschedule backups to less frequent full backups (TSM NDMP backup) to avoid having 100% disk load 24/7.

So the obvious solution would be PAM and statistics show that it (512GB) would catch 50-80% of the reads.
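As a rough sanity check on what a 50-80% read hit rate would mean for the disks, here is a minimal Python sketch. The peak read IOPS figure is a placeholder, not a measurement from our filer, and `disk_read_iops` is a hypothetical helper; it also ignores writes, which a read cache does not absorb.

```python
def disk_read_iops(total_read_iops, cache_hit_ratio):
    """Read IOPS that still reach the disks after the cache absorbs its share."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return total_read_iops * (1.0 - cache_hit_ratio)


peak_read_iops = 10_000  # placeholder value, for illustration only

for hit_ratio in (0.5, 0.8):
    remaining = disk_read_iops(peak_read_iops, hit_ratio)
    print(f"hit ratio {hit_ratio:.0%}: about {remaining:.0f} read IOPS left on disk")
```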

But our FAS is fully configured with FC and cluster interconnect cards so there is no expansion slot left for PAM.

So to install PAM we have to upgrade the filer, with all the costs associated BEFORE getting in the PAM.

So the combination of lack of tiering and huge upgrade steps makes this a very expensive story.

What really bugs me is that we have a few TB of Fibre Channel storage available that could be used for tiering.

And a lot of the VM images data would be able to go down to SATA from FC.

EMC does it, HP (3Par) does it, Dell Compellent does it, Hitachi does it ...

But Netapp doesn't implement it, despite having an excellent WAFL that, with a "few" modifications, should have been able to support it years ago.

Things we require are

* Quotas

* Active Directory security groups support (NTFS security style)

* Automatic failover to remote storage mirror, e.g. during an unscheduled power failure (we seem to have at least one a year on average).

Things we are going to require soon due to amount of data

* Remote disaster recovery site, sync or async replication.

Things that would be very useful

* Multi-domain support (multiple AD/Kerberos domains)

* deduplication

* compression

* tiering (of any kind)

So I've tried to set up a number of good/bad things I know and what I've seen so far.

What I like with Netapp/Ontap

* WAFL with its possibilities and being very dynamic

* You can choose security style (UNIX/NTFS/Mixed) which is good as we are a mixed UNIX/Windows site.

Things I dislike with Netapp/Ontap/IBM

* No tiering (read my comment below)

* Large (read: expensive) upgrade steps for e.g. a memory or CPU upgrade in the controllers

* Licenses are bound to the controller size and essentially have to be repurchased during an upgrade (this I'm told by the IBM reseller)

* You can't revert a disk-add operation to an aggregate

* I feel great discomfort when switching the cluster, as you first shut down the service to TRY to bring it up on the other node, never being sure it will work.

* Crazy pricing policy by IBM (don't ask)

* A strong feeling that, as an IBM N-series customer, we are essentially a second-rate Netapp user.

Things I like so far with VNX from what I can see

* Does most, if not everything, that our FAS does and more.

* Much better VMware integration, compared to the Netapp VMware plugin that I tried a couple of times and then dropped.

* FAST Tiering

* Much easier smaller upgrades of CPU/memory with Blades

I have no idea regarding negative sides, but being an EMC customer earlier I know they can be expensive, especially after 3 years.

That might counter our goal of keeping the storage costs down.

I essentially like Netapp/Ontap/FAS, but there are a lot of things (in my view) speaking against it right now, with Ontap losing its technological edge.

Yes, we are listening to EMC/HP/Dell and others to hear what they have to say.

I hope I didn't rant too much.

henrypan1 · ‎2011-04-10

dejan,

Please drop me an email to Henry.pan@ironmountain.com, so I could share my storage selection story with you.

Good luck

Henry

radek_kubka · ‎2011-04-11

Hi Henry,

Any chances you can post some of your key thoughts here as well?

Regards,
Radek

henrypan1 · ‎2011-04-11

Not yet Radek

Thanks

Henry

radek_kubka · ‎2011-04-11

Hi,

Interesting stuff - I am really keen to see how other members view this.

A few quick thoughts from me:

> Things I like so far with VNX from what I can see
>
> * Does most, if not everything, that our FAS does and more.

Actually it doesn't do everything - e.g. there is nothing even remotely resembling MetroCluster functionality you are utilising at the moment (unless I am missing some new EMC functionality I haven't heard about before)

Regarding EMC VNX 'unification' - have you seen this: http://communities.netapp.com/people/radek.kubka/blog/2011/02/11/objects-in-the-mirror-are-closer-than-they-appear?

> * Much better VMware integration, compared to the Netapp VMware plugin that I tried a couple of times and then dropped.

Have you seen this in action? Personally I haven't (other than a few slides), so I am somewhat skeptical about how slick this EMC / VMware integration is. I am using NetApp VSC, and although it isn't perfect, it is very useful in my opinion; most of the time it genuinely does what it says on the tin.

> * FAST Tiering

Always read the small print. FAST works on 1GB sub-LUN granularity - that's a *lot* of data to move around! Also it doesn't work on NFS datastores (EMC offers archiving functionality for file shares, which isn't feasible for moving VMDK files around). Also, interestingly enough, EMC introduced so-called FAST Cache, which in essence is the same approach as NetApp Flash Cache - which makes me think this approach may actually be more feasible than sub-LUN tiering.

> * Much easier smaller upgrades of CPU/memory with Blades

I am not familiar with their upgrade process, but again: always read the small print to double-check how convenient this upgrade will be in a real-life scenario! E.g. I know for a fact there is no upgrade path from VNXe to VNX (which will not necessarily apply to your case).

Regards,
Radek

dejanliuit · ‎2011-04-11

FAST not being able to work with NFS is news to me. Good to know and I will ask EMC when we meet them tomorrow. FAST Cache sounds interesting.

Regarding the VMware plugin I have been checking out what Chad Sakac is presenting on his blog, but of course I will demand a real-life view of it ASAP.

Also, I will ask if the blade upgrades really provide scalability. You might be limited to one CIFS/NFS server only being able to use one blade anyway, and not scale with the number of blades.

Regarding the tiering and movement size, I have been looking at Dell Compellent and am about to visit HP to see 3Par in action.

At least Compellent can move smaller blocks, but it is missing the NAS functionality that we require (unless combined with Microsoft Storage Server solution).

Unless Dell presents ExaNet soon, I'm not sure it would be the way to go for our end-user files. HP seems to have Windows and Samba-based NAS gateways to the 3Par system.

I'm still not impressed with Netapp's total dismissal of tiering without PAM. They (tiering and PAM) could easily live side by side, both adding value to the Netapp system.

At least in my case the PAM requirement actually is one of the reasons that make us look the other way instead of keeping the current setup and adding life to the system.

radek_kubka · ‎2011-04-11

> I'm still not impressed with Netapp's total dismissal of tiering without PAM. They (tiering and PAM) could easily live side by side, both adding value to the Netapp system.

That's actually a very interesting topic in itself. I've mentioned something along these lines a few times to different NetApp folks and the answer usually was: "we don't need automated tiering as caching is better". That's arguably not true in every case (e.g. a random-write-heavy workload), yet EMC following in NetApp's footsteps with their FAST Cache proves the point that the Flash Cache / PAM II concept is doing its job well in many situations.

That being said, from a marketing standpoint having a feature (automated block movement between tiers), even if not using it, is almost always better than not having it!

mooi · ‎2011-04-11

Well Dejan, look at it this way: tiering is like vMotion in VMware. It is a nice function to have, but if you do not tune it correctly you will get too much data movement at the FAST tiering layer, which will cause serious performance issues. I think you might reconsider the need for FAST tiering; FAST Cache is the better option in the long run. In ONTAP 8 you also have the option of using DataMotion, which allows you to move data from FC/SAS without impacting the production system. The only drawback is that it is a manual job.

The thing that impresses me is Unisphere, which provides a unified view and makes replication, provisioning, backup and DR easy to execute. The only thing that worries me is the cost that is going to hit me as my data grows. There is no saying whether the price of the data protection suite follows the controller price or is tiered by raw capacity.

Looking at deduplication, it only works for NAS and not for SAN, which does not impress me, although this is arguable because most consolidation happens on the NAS side, where most of the files are stored. In ONTAP 8 you are able to turn on compression as well; take note that it does have a performance impact on the vol/LUN you turn it on for.

Correct me if I am wrong, I just skimmed through the VNX documentation, but I don't think they have a MetroCluster solution. They may have MirrorView or SnapView as the closest equivalents.

dejanliuit · ‎2011-04-11

I agree that tiering used inappropriately can worsen the situation, as a lot of data will be doing ping-pong "in transfer" between tiers.

Also, FAST without FASTCache could leave you with quite a slow system until the next rearrangement of the tiers, when the accessed data is moved up; by then the target data might be "uninteresting" until next week, month etc...

Looking at the FASTCache more closely it seems to be the right combination with tiering on SAS/SATA level.

FASTCache operates at 64K block granularity, which is a decent size compared to FAST's 1GB granularity.

Now compare that to DataMotion's granularity of a whole volume, never less than a few hundred GB and probably getting closer to TB size.

And you don't move e.g. the volume with the whole Sharepoint database disk to a slower tier just because 90% is static data.
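To illustrate why the granularity matters, here is a minimal Python sketch that counts how much data has to be relocated when every unit containing a hot spot is moved as a whole. The 64K, 1GB and whole-volume unit sizes are the ones discussed above; the hot-spot layout and the volume size are invented for illustration, and `bytes_to_move` is a hypothetical helper, not how any of these products actually decide what to move.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def bytes_to_move(hot_offsets, unit_size):
    """Bytes relocated if every unit that contains a hot offset is moved whole."""
    units = {offset // unit_size for offset in hot_offsets}
    return len(units) * unit_size


# Invented layout: 1000 small hot spots, one every 3 GiB, on a 3000 GiB volume.
volume_size = 3000 * GIB
hot_offsets = [i * 3 * GIB for i in range(1000)]

for name, unit_size in (
    ("64K blocks", 64 * KIB),
    ("1GB slices", 1 * GIB),
    ("whole volume", volume_size),
):
    moved = bytes_to_move(hot_offsets, unit_size)
    print(f"{name:>12}: {moved / GIB:10.2f} GiB moved")
```

With scattered hot spots like these, the coarser the unit, the more cold data gets dragged along with the hot data.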

So DataMotion is more useful (for me) in situations where I wish to rebalance aggregates or move data off aggregates to replace aging/small disks.

What does worry me regarding any kind of tiering is e.g. the location of a file's metadata.

Much, if not most, of the pressure on my SATA disks is metadata access.

I'm a little worried that metadata will "pull along" data, filling the cache, depending on the actual location of metadata on e.g. VNX.

Netapp PAM can be configured to cache only metadata, but I haven't seen any setting to prioritize metadata when the PAM content is replaced.

EMC FASTCache "detailed review" paper : http://www.emc.com/collateral/software/white-papers/h8046-clariion-celerra-unified-fast-cache-wp.pdf

Message was edited by: dejan-liuit Added EMC link

vmsjaak13 · ‎2011-04-11

Just my biased 2 cents 🙂

NetApp (FAS32xx) pros:

* Room for 50% more IO cards compared to FAS3140 / N6040

* 6Gbps SAS with new DS2246 shelf (remember 4x 6Gbps lanes in a SAS cable)

* vfilers for multiple domains

* Free deduplication

* Free compression (inline, as with almost all things inline, it can impact performance)

* Free My Autosupport/upgrade advisor (only available for N Series customers if they go through IBM support)

* Flash Cache is dedup aware, works at 4KB block level, set it & forget it.

* Plenty of snapmanagers to pick from.

* IBM always lags behind with new software versions for the N Series: snapmanagers, ontap versions

EMC VNX cons:

* Deduplication at the file level, not block level

* Deduplication only on CIFS/NFS, but not for: VMware/Oracle over NFS.

* No metrocluster functionality

* Different replication techniques for block/file, each with its own limitations !!

* Big performance impact with RAID 6 over RAID 5.

* VNX means: multiple operating systems to learn !

* FAST cache can have a big performance impact (moving data around)

* Unisphere is nice, but ask EMC what other software you might need, based on features you want to use.

* Entry level VNX5100 is FC only, and you can't upgrade.

In the end a consumer grade NAS will always have a much lower price per GB.

But I don't think I need to elaborate on why you don't want that in an enterprise environment.

Hope this helps.

Regards,

Niek

radek_kubka · ‎2011-04-11

Hi Niek,

> * FAST cache can have a big performance impact (moving data around)

Just to be pedantic - I think this bullet point should say "FAST can have a big performance impact".

There are two different things with very confusing names:

- FAST as automated sub-LUN tiering which moves data around (http://uk.emc.com/about/glossary/fast.htm)

- FAST Cache which in essence mimics NetApp Flash Cache / PAM II, with no data movement, just dynamic caching (http://uk.emc.com/about/glossary/fast-cache.htm)

Regards,

Radek

vmsjaak13 · ‎2011-04-11

Hello Radek,

you're correct !

Thanks for the clarification.

Regards,

Niek

mooi · ‎2011-04-11

Yup Agreed

mooi · ‎2011-04-11

I agree with niek.

Look at the VNX: it is no different from a Celerra box with the functionality of CLARiiON and Avamar added on. But they did not blend it well enough, and there is too much software you need to learn.

SnapSure for NAS

SnapView for block

Replicator for iSCSI

Whereas with NetApp, one simple snapshot covers it all. More technology may be involved when it comes to replication and database-consistency-aware software. You can look at the videos published on YouTube; they might look easy, but when the actual setup and performance tuning come in, you notice that it is a nightmare.

NetApp technology is easier to set up and the learning cycle is shorter compared to VNX.

thomas_glodde · ‎2011-04-11

Hi there,

im regularly involved in pre-sales and 9 out of 10 times the customer chooses a netapp solution over an existing emc solution. and that 1 customer who chooses the emc solution has to do so because he was forced to by his boss. but lets stop the political stuff, go to the facts:

the vnx series is nothing else than a rebranded clariion/celerra nas-head combination which emc has been selling for ages, its old wine in new bottles. you still have to work on different layers of operating systems if the navisphere gui fails to give you the specific wizard-driven task you need.

netapp has transparent cluster failover in a metro cluster environment, vnx doesnt. we have post-processed dedup over all primary data as well as optional inline compression over all primary data. we have proper thin provisioning as well as up to 255 snapshots, even integrated in the windows previous versions client. we dont hassle around with linux or windows ce or whatever, we "talk" cifs natively with proper acl integration as well as v1-4 nfs and we even support both, meaning we map windows to unix users and vice versa.

the thing about qnap/dlink nas stuff, you really dont want to go down that road. we are talking about netapp ENTERPRISE storage for a reason. we talk about 520-byte formatted hard disks or 8+1 sector checksumming, we talk about constant scrubbing and self-aware fast raid rebuilds and a 24x7 4h service level, none of which those small nas can provide. and have 10+ users working on a 3 disk nas, try to reach 50mb+ file transfers using smb2 and have up to 255 snaps per volume. if they are not aware of these security and reliability features, let them have their nas and one day it will crash on them and all their data is lost.

if you are not happy with ibm, is there any reason why you wouldnt buy directly from netapp and its partners to have a proper FAS3240AE and not an N Series?

Regards

Thomas

dejanliuit · ‎2011-04-12

> netapp has transparent cluster failover in a metro cluster environment.

Yes, but I had a bad experience with a failover that didn't work during a power failure. This time it was due to a mixture of the human factor and incomplete signaling on the main distribution powerboard (fixed after the incident).

We did buy metrocluster to handle that kind of situation, just to find out the hard way that when we completely lost power to one of the datacenters, the one thing we missed checking when we bought the equipment, metrocluster didn't kick in!

Instead that half simply stopped working, just to do a failover (!) the moment we got power back. So I had to do another failover to get it back to normal. And I seriously dislike the failover procedure as it is today, as I can't check anything before Ontap stops the service on the working (redundancy) node.

I had expected the other node to kick in and take over when we lost power; instead the complete virtual system stopped for 2h. Not good PR for either IBM/Netapp or the virtual system.

Later I found out that it is by design not to fail over in case a whole datacenter is lost; you have to do a manual forced cluster failover, including an uncomfortable failback afterwards.

VNX not having anything close to metrocluster is good to know. I will ask them how they handle situations like that.

> if you are not happy with ibm, is there any reason why you wouldnt buy directly from netapp and its partners to have a proper FAS3240AE and not an N Series?

Well, we do have a sizable investment in the IBM N-series, and while I really feel that moving to "pure" Netapp would provide us with better support, earlier access to code etc... it would mean replacing all the hardware, due to support contract reasons.

I doubt I could convince any boss to make that investment, unless Netapp steps in with a sizable buyback. But it will be up for discussion, as IBM has not been on the list of cleared companies for storage resale to the Swedish government (including universities) for the last 3 years (public tender reasons, nothing strange about that, Hitachi is missing too).

thomas_glodde · ‎2011-04-12

> netapp has transparent cluster failover in a metro cluster environment.

there is a transparent failover IF PROPERLY CONFIGURED 😉 we strongly suggest our customers follow the given best practices and we actually plan and roll out these practices with them, eg setting proper time outs, installing host utils etc.

for your total dr scenario, a netapp MC cannot automatically handle a site disaster where the complete datacenter loses power, you have to do a "cf forcetakeover -d" then. there are a few caveats we lead our customers around, so you have just been improperly advised ;-( we have several big stretch/fabric metroclusters which takeover/giveback within 10-15 seconds without any system going down.

> if you are not happy with ibm, is there any reason why you wouldnt buy directly from netapp and its partners to have a proper FAS3240AE and not an N Series?

ok, seems like a political/sales issue, you might be able to solve it with your local netapp sales representative or at least i'd stick with ibm before buying an emc machine

good luck mate! 😉

urbanhaas · ‎2011-07-01

EMC doesn't have Metrocluster in the VNX, but offers VPLEX Metro as an equivalent configuration. VNX + VPLEX can be the same cost as a NetApp Metrocluster. With their 5.0 code, they have transparent failover (no command similar to "cf takeover" is needed), if you install a witness at a third site running in a VM or standalone server.

It would be good for NetApp to offer similar witness support to handle the total datacenter failure/split-brain scenario. I'm currently comparing NetApp MetroCluster and EMC VPLEX Metro in my own blog http://dctools.blogspot.com.
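For readers unfamiliar with the idea, the role of a third-site witness can be sketched as a simple decision rule. This is a generic tiebreaker illustration in Python, with hypothetical names, and not the actual logic of VPLEX Witness or of any NetApp product:

```python
def should_take_over(peer_reachable, witness_reachable, witness_sees_peer):
    """Decide whether the local site may take over automatically.

    Generic tiebreaker rule, for illustration only.
    """
    if peer_reachable:
        # Nothing has failed, so there is nothing to take over.
        return False
    if not witness_reachable:
        # We may be the isolated side; taking over here risks split-brain.
        return False
    # The witness is the tiebreaker: take over only if it also lost the peer.
    return not witness_sees_peer


# Whole remote datacenter lost power: peer gone for us and for the witness.
print(should_take_over(peer_reachable=False, witness_reachable=True, witness_sees_peer=False))
# Only the inter-site link failed: the witness still sees the peer.
print(should_take_over(peer_reachable=False, witness_reachable=True, witness_sees_peer=True))
```

Without the third vantage point, a site cannot tell a dead peer from a dead link, which is why a forced manual takeover is the safe default in a two-site setup.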

No one ever has it all. EMC's VPLEX will rely on VNX or RecoverPoint to do snapshots. NetApp has snapshots nicely integrated into one package. NetApp doesn't offer redundant nodes at each datacenter, EMC VPLEX does. NetApp MetroCluster will have storage traffic trombone, EMC VPLEX will offer local access at each site.

The point is, no one vendor has everything. Most are wearing blinders to what others can do and to their own limitations. I would love NetApp to offer sub-LUN tiering within an aggregate. I would love NPIV-style virtual target FC ports into a vFiler. I would love NetApp to offer a web-based GUI to all the administrative commands people use (vFiler...).

VNX may have two different OSes for block and NAS, but customers don't usually see it or have to learn it, as Unisphere covers that up.

I am a huge NetApp fan and sell a lot of NetApp boxes. They work well. They offer some of the richest functionality, but the user interface (GUI, web-based) is often lacking in NetApp's best features.

aborzenkov · ‎2011-07-01

NetApp does provide witness support (MetroCluster tiebreaker); in the past it was separate solution (I believe integrated with OM); today it is offered as part of ApplianceWatch PRO. See as example http://communities.netapp.com/servlet/JiveServlet/downloadBody/6314-102-1-9571/Partner%20Academy%20Workshop%20MetroCluster%20June%202010.pptx or http://communities.netapp.com/servlet/JiveServlet/download/49558-22659/ApplianceWatchPROBestPracticesGuide.pdf

Unfortunately it is very poorly documented and marketed; the only available link is NetApp-internal, and beyond that there are a couple of paragraphs in the ApplianceWatch PRO documentation and whatever you can find on community or KB sites.

You mention in your blog that NetApp MetroCluster needs 4 FC connections – do you count backend only? Because MC requires 2 ISLs; 4 can be used but is optional.

I wonder how VPLEX implements simultaneous write support on both sites without introducing read latency for local access (due to necessity to verify that data had not been changed remotely).

Mathias_Robichon · ‎2011-11-07

> Yes, but I had a bad experience with failover that didn't work during a power failure.

False. MetroCluster has an inbox solution to fail over during a power outage. If the management cards are properly configured, the takeover is automatic during a MetroCluster rack power failure (please, read the documentation).

Also, if the network goes down at the same time, a UPS solution (about 800e per MetroCluster head) can resolve the problem (again, read the documentation, it's documented).

Finally, if MetroCluster is used with "complicated" inter-links (like DWDM), a third referee could be used (like Tie Breaker) and NetApp provides some solutions like this.

Regards,

Mathias

dejanliuit · ‎2011-11-07

Well, we did have the problem. I had more people commenting that it should work, but unfortunately the setup was initially done by a Netapp consultant (not a partner consultant, but a Netapp techie) and this is our production environment, so I can't touch it very much to resolve the problem.

Anyway, we went for neither the upgrade nor VNX. Instead we are looking at cloud solutions for file storage and DAS for our Exchange 2010.

Only virtualization will be left when MSSQL 2012 with DAS-support for availability clusters is released.

And then we will have another look at if the N-series is worth the maintenance and expansion-cost.