2010-08-19 12:52 PM
We have a FAS3160 cluster on which we would like to use NFS to host several VM sessions.
Of the two filers, one has significantly more IOPS than the other (Filer 1 in the image).
The one with the lower IOPS has more network activity (Filer 2 in the image).
CPU activity is almost identical between the two, so I haven't included that information.
Of the two filers, which would be preferred?
We have available disk that can be used on either filer, along with additional network ports on both filers.
Thanks in advance,
2010-08-24 01:53 PM
That is pretty normal. The thing about an "op" is that it's not fixed in size: a tiny cached metadata op counts as one op, and so does a large write. Thus, it's not unusual to see high ops with low network throughput and lower ops with higher network throughput. If I'm doing 32K writes, 10 "ops" is 320K on the network (plus some overhead). But if my metadata lookup is 1K, it would take 320 ops to generate the same amount of network throughput. That's 32X the ops for the same amount of network throughput.
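To make the arithmetic concrete, here is a minimal Python sketch. The function name and the 32K/1K op sizes are just illustrative (the sizes come from the example above, and protocol overhead is ignored); it simply counts how many ops of a given size are needed to move the same number of bytes.

```python
KIB = 1024


def ops_for_throughput(target_bytes: int, op_size_bytes: int) -> int:
    """Ops of a fixed size needed to move target_bytes (rounded up)."""
    if op_size_bytes <= 0:
        raise ValueError("op size must be positive")
    return -(-target_bytes // op_size_bytes)  # ceiling division


# Ten 32K writes put 320K on the wire (overhead ignored).
target = 10 * 32 * KIB

large_ops = ops_for_throughput(target, 32 * KIB)  # 32K writes
small_ops = ops_for_throughput(target, 1 * KIB)   # 1K metadata lookups
ratio = small_ops / large_ops

print(large_ops, small_ops, ratio)
```

The same byte count shows up as very different op counts depending on op size, which is why ops and network throughput can move in opposite directions.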
As to which is better, I'm not sure you're asking the right question. In general, large ops perform better than small ones, assuming you are going to disk, but what counts as better, and can you even control it, given that users tend to do what they will do? Sometimes you can get them to change their applications, but that can be a tough road. The stat I'd rather track is latency. When I think performance, I always start there, since that will dictate the user experience more than anything else. And as a storage vendor, I care about the latency between the request coming in and the reply going out. If that is good and the end-user latency is bad, then it's probably something outside of my control and I have to bring in the network or host people to figure out the problem. In my view, in-box latency at or under 10ms is pretty good for most environments. That's not to say it won't occasionally go above that, but if it's at that level or below, most users don't complain about performance. This is, of course, a rule of thumb, so feel free to establish your own threshold based on your environment, but if you don't have one, 10ms is probably a good starting point. So I start with controller latency; then, if that's not acceptable, I look to other components for a bottleneck. That can be cache, disk utilization, CPU, or network, and probably a couple of things I haven't thought of at this moment, but hopefully you get the idea.
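As a rough illustration of using latency as the first check, here is a small Python sketch. Everything in it is an assumption for illustration: the function name, the choice of median as the "typical" value, and the sample numbers are made up; only the 10ms starting threshold comes from the rule of thumb above. How you collect the controller latency samples depends on your own tooling.

```python
from statistics import median


def latency_summary(samples_ms, threshold_ms=10.0):
    """Summarize controller latency samples against a threshold.

    samples_ms: latency readings in milliseconds (hypothetical input).
    threshold_ms: rule-of-thumb limit; 10ms is only a starting point.
    """
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    typical = median(samples_ms)
    over = sum(1 for s in samples_ms if s > threshold_ms)
    return {
        "median_ms": typical,
        "fraction_over": over / len(samples_ms),
        # Occasional spikes are tolerated; judge on the typical value.
        "acceptable": typical <= threshold_ms,
    }


# Made-up readings: one spike above 10ms, the rest below.
samples = [4.0, 6.5, 8.0, 12.0, 7.5]
summary = latency_summary(samples)
print(summary)
```

If `acceptable` comes back false, that is the cue to go looking at cache, disk utilization, CPU, or network for the bottleneck.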
2010-08-25 02:01 PM
Thanks for the detailed response.
I probably should have stated the question in a different way. If you had a choice between two heads to host several VM sessions via NFS, which one would be preferred: the one with the higher ops and lower network utilization, or the one with the higher network utilization and lower ops?
2010-08-25 02:31 PM
If I have to choose, I typically prefer higher throughput over higher ops. In general, larger ops perform better than lots of little ones, especially if you are hitting disk rather than cache. But like I said, sometimes you don't get to control that, so I tend to rely on latency as my real guide.
2010-08-27 12:07 AM
In many VM environments, you typically need lots of IOPS, because the IO generated by the different guests is mostly random IO with a small block size. You typically also see that the actual throughput is very low because of this.
So you need to choose the head that can offer the most IOPS possible. If the number of spindles is comparable between the two heads, and one head is already serving a high number of IOPS, it would probably be better to choose the other head (assuming it still has more spare capacity to serve IOPS than the busy head).
If the disk configuration between the heads is different (say FC versus SATA, or many more spindles on one head), you need to choose the head that can offer the most IOPS (largest number of spindles in the aggregate or fastest disk technology).
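One way to picture the comparison is as a back-of-the-envelope headroom estimate, sketched below in Python. This is only a sketch under loud assumptions: the `Head` class, the per-spindle IOPS figure, the spindle counts, and the current load numbers are all hypothetical, and the model ignores cache, RAID overhead, and read/write mix. Real sizing should use measured figures for your disks.

```python
from dataclasses import dataclass


@dataclass
class Head:
    name: str
    spindles: int            # data disks in the aggregate (assumed)
    iops_per_spindle: float  # assumed per-disk figure, not a spec value
    current_iops: float      # load the head is already serving

    def headroom(self) -> float:
        """Rough spare IOPS: raw spindle capacity minus current load."""
        return self.spindles * self.iops_per_spindle - self.current_iops


def pick_head(heads):
    """Return the head with the most estimated spare IOPS."""
    return max(heads, key=lambda h: h.headroom())


# Comparable spindle counts, but filer1 is already the busier head.
heads = [
    Head("filer1", spindles=48, iops_per_spindle=150.0, current_iops=5000.0),
    Head("filer2", spindles=48, iops_per_spindle=150.0, current_iops=1500.0),
]
best = pick_head(heads)
print(best.name, best.headroom())
```

With equal spindles the less busy head wins; with different disk types or spindle counts, the per-spindle figure and count change the answer.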
2010-08-27 06:27 AM
Thank you for your reply Karl,
This particular situation has been a bit difficult to quantify. Watching the ops and throughput stats for the past couple of days, at least in our setup/configuration I have to agree with Adam.
With regard to NFS and VMware specifically, we are experiencing very low ops and more throughput on a FAS2020 that we use exclusively with NFS for test and dev VM sessions.
Take this with a grain of salt as everyone's environment is different and based on the type of VM sessions that are being hosted I would guess the results could be quite different.
2010-08-27 10:29 AM
It is interesting that you are seeing higher throughput than ops on your VMs. At any rate, here is a little thing I put together for monitoring NFS performance in VMware environments.