ONTAP Discussions


CIFS performance struggling

We've got an ageing IBM N6250 (a rebadged FAS3250, I think) running 7-Mode 8.2.1.

 

We want to extract a lot of data from this filer using CIFS - we've got 200T+ in about 44 shares on a two-headed controller.

 

Our problem is that the file copies just aren't going fast enough. Individually they work fine, and if I set off a copying job it works well for a while. But it will then collapse - the transfers simply sit there waiting, going very slowly indeed: 15 seconds to shift a 40K file.

 

The copying job is in PowerShell on a Win2016 VM, doing Copy-Item \\filer\share\file destination. There is 10GbE between the filer and the VM.
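The job is roughly this shape, with a bit of per-file timing added so the slow spells show up in a log (paths and the threshold here are placeholders, not the real ones):

```powershell
# Sketch of the copy loop: log any file that takes more than a few
# seconds, so the go-slow periods can be lined up against filer stats.
$files = Get-ChildItem \\filer\share -Recurse -File
foreach ($f in $files) {
    $t = Measure-Command { Copy-Item $f.FullName -Destination D:\extract }
    if ($t.TotalSeconds -gt 5) {
        "$(Get-Date -Format s) $($f.FullName) took $($t.TotalSeconds)s" |
            Add-Content C:\logs\slow-copies.log
    }
}
```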

 

What sort of things can cause that? I realise this is a vague question, so please ask me for further information and I'll try and supply it.

 

cheers,

clive

17 Replies

Re: CIFS performance struggling

I'd take a storage-side packet trace, open it in Wireshark, and see where the delays are happening. It's likely not the storage side if transfers are normally fast, but without some basic perf info it's hard to say.

 

I'd run stats start cifs, let it run for 5 minutes, then stats stop. That will tell you the latency at the controller level. If you can, check the volume level too with stats start volume, then stats stop.
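On 7-Mode that looks something like this at the console (from memory, so double-check against your 8.2.1 syntax):

```
filer> stats start cifs
  ... wait about 5 minutes ...
filer> stats stop
```

The volume-level check is the same shape: stats start volume, wait, stats stop.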


Re: CIFS performance struggling

CIFS stats, nice and easy:

 

Head 1:

 

cifs:cifs:instance_name:cifs
cifs:cifs:node_name:
cifs:cifs:node_uuid:
cifs:cifs:cifs_ops:3835/s
cifs:cifs:cifs_latency:1.34ms
cifs:cifs:cifs_read_ops:2095/s
cifs:cifs:cifs_write_ops:1156/s

 

Head 2:

cifs:cifs:instance_name:cifs
cifs:cifs:node_name:
cifs:cifs:node_uuid:
cifs:cifs:cifs_ops:114/s
cifs:cifs:cifs_latency:9.87ms
cifs:cifs:cifs_read_ops:39/s
cifs:cifs:cifs_write_ops:0/s

 

Head 2 is going way slower.

 

Vol stats are, um, a little more verbose - 3.5 MB and 4.0 MB of output on the two heads. read_latency is highest on the CIFS volumes being read (bulk data, on slower disk), but there's no instantly obvious difference between the two heads.

 

So I think it is somewhere inside the N-Series/NetApp itself.

 

cheers,

clive

 

 


Re: CIFS performance struggling

9 ms isn't bad. I'd expect something higher if it's freezing for a few seconds. I think a packet trace is the next step, to confirm the latency on the network.


Re: CIFS performance struggling

Those stats weren't gathered while it was stuck; it's still working at the moment.

 

The freeze, when it happened, wasn't a few seconds - it was up to a few hours.

 

 

 

 


Re: CIFS performance struggling

It's now slowed down a bit - not frozen, but still not shovelling data that quickly (about 125 Mb/s in total).

 

cifs:cifs:instance_name:cifs
cifs:cifs:node_name:
cifs:cifs:node_uuid:
cifs:cifs:cifs_ops:454/s
cifs:cifs:cifs_latency:12.95ms
cifs:cifs:cifs_read_ops:268/s
cifs:cifs:cifs_write_ops:7/s

 

It'll go up again in due course.

 

 


Re: CIFS performance struggling

Alright, then those stats will need to be gathered while it is stuck. Five minutes is long enough to confirm whether there is a latency issue.


Re: CIFS performance struggling

A couple of minutes while it's quite slow:


cifs:cifs:instance_name:cifs
cifs:cifs:node_name:
cifs:cifs:node_uuid:
cifs:cifs:cifs_ops:300/s
cifs:cifs:cifs_latency:16.95ms
cifs:cifs:cifs_read_ops:161/s
cifs:cifs:cifs_write_ops:4/s

 

Compared with the other head, which is showing 1-2 ms latency, that doesn't seem great.


Re: CIFS performance struggling

It could be outliers. You might try disabling vscan or fpolicy as a test, but without a support case/perfstat I can't really tell you more.


Re: CIFS performance struggling

Harvest may be helpful here if you haven't set it up already. https://nabox.org/


Re: CIFS performance struggling

Outliers? Can you tell me more about this? I know what an outlier is in statistics; I just wanted to know what you mean with regard to my problem.

 

I'm pretty sure we've got no vscan on there, and I've made sure to exclude it from the servers I'm working with. (The data I'm copying does potentially include malware and I want to retain it perfectly - I am scanning it later in the process and flagging it, but that's after it's off the NetApp.)

 

We're also not using fpolicy - these CIFS shares are used by a document management system where the permissions are stored in a database and handled by the application. End users have no access to these shares (or indeed any on these filers).

 

The higher latency does seem to correspond to the lower performance - ie when it's working well, the latency is low.

 

Obvious question : could it simply be sheer load on the filer?


Re: CIFS performance struggling

You could check whether you have a performance issue on the NetApp side:

Diagnosing a performance issue

 

You can also open a technical case with the support team for further troubleshooting.

 

A packet trace between the filer and the client is needed. Below are the instructions:

node run -node <node_name> pktt start e0a -i <ip_addr> -i <ip_addr> -d /etc/crash

How to capture packet traces (PKTT) on Data ONTAP 8 Cluster-Mode systems 
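Since this system is 7-Mode rather than Cluster-Mode, the trace would be run directly at the controller console; from memory (so verify the options on your release) something like:

```
filer> pktt start e0a -d /etc/crash -i <client_ip>
  ... reproduce the slow copy ...
filer> pktt stop e0a
```

The -i filter limits the trace to the client's traffic; the resulting .trc file lands in /etc/crash and opens in Wireshark.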

 


Re: CIFS performance struggling

I'd have opened a support case already if I could 🙂 It's IBM N-Series, and NetApp refuse to take our money to support it.


Re: CIFS performance struggling

Does IBM not have an active contract? Just curious.

 

I'm trying to think how we could check. We could just do a basic system check with a few commands:

priv set admin; stats start volume; priv set diag; statit -b; sysstat -c 30 -M 1; sysstat -c 30 -x 1; stats stop; statit -e

 

That should give some output; if you can attach it here, feel free. The volume names can be omitted, but substituting something like "vol1" and "vol2" would help - then you can map our references to vol1 and vol2 back to your real names.

 

If you want, you can send the file and a link via PM if security is an issue. A couple of other commands if you do:

wafl_susp -z

Wait 5 minutes

wafl_susp -w

 

That should get most of it.


Re: CIFS performance struggling

Apologies for not getting back to this sooner.

 

IBM abandoned their partnership with NetApp a few years ago, and support went with it. Which is a pity, because despite being 6 or so years old, the hardware is still pretty effective (as is the 9-year-old backup filer, though that one does struggle a bit with load). But I reckon NetApp think the sort of people who buy their kit will be willing to pay the money to keep newer hardware.

 

I will look at the detailed logging you've suggested, though it might not be instant. However, I am starting to wonder if it's something as simple as load. I keep finding other heavy loads on the disks holding the CIFS volumes (a SCOM agent going mental on a couple of servers, a big database on SATA disk when it should probably be on SAS, a very busy database I can probably move to an SSD-based SAN), and I'm slowly working my way through them.

 

Nobody's yet mentioned cifs.per_client_stats.enable 🙂 That was on, and I turned it off a few days ago (before writing this post). I think I might no longer be getting the dead stops - just go-slows now.
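For anyone finding this thread later, turning it off is a one-liner at the 7-Mode console (the option name is as on our 8.2.1 - check yours):

```
filer> options cifs.per_client_stats.enable off
```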

 

Thanks for sticking with it so far.

 

cheers,

clive


Re: CIFS performance struggling

A little more data - not the data I've been asked for, but I thought it might be interesting.

 

If I create a LUN on the same disks as the CIFS volume on each controller, it seems to perform at the same speed on both.

 

If I snapmirror to a second NetApp (we've got an old FAS3250 too, also unsupported by NetApp, this time due to being second hand), onto fast disk on the second one, a mirror initialize of 50GB or so runs at the same speed from both controllers - about 7 minutes, roughly 1.3-1.4 Gb/s over the network.
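For the record, the test was along these lines, run from the destination filer (the filer and volume names here are placeholders, not our real ones):

```
dstfiler> snapmirror initialize -S srcfiler:srcvol dstfiler:dstvol
dstfiler> snapmirror status
```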

 

That seems to eliminate networking and the disks as the source of the problem, and points to the protocol itself.

 

 
