Network and Storage Protocols
Hello NetApp community. I am reaching out to get some input on a CIFS performance issue we are experiencing, from the perspective of the end user: opening a folder can take upwards of 10 minutes in some cases. The end-user environment is predominantly Mac with just a handful of Windows 7/10 machines.
The volume in question is 13 TB total, of which 11 TB is used, with 114,646,233 inodes consumed (about 30% of the inode limit).
I mounted this on my test machine, and there is a spiderweb of directories and subdirectories upon subdirectories, which have even more subdirectories and files; some of these take minutes to ls -la, and hundreds of directories/files are listed. It's a bit of a mess. I was thinking about performing a reallocate on the volume, but it would probably take days just to finish the first step (measure), and it seems to impact the filer pretty heavily while scanning. I was also thinking about increasing the tcp_window_size, but that requires a restart of CIFS, which is pretty disruptive, so I would need a maintenance window for it. The current setting is:
ourfiler*> options cifs.tcp_window_size
cifs.tcp_window_size 64240
We have a dual 10G LACP connection for data. The filer is an FAS8020 running NetApp Release 8.2.5 7-Mode: Wed Jul 19 03:55:53 PDT 2017.
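For reference, here is what I believe the 7-Mode commands for both ideas would look like; the volume name and window size are placeholders, not values from this system. The measure job can be aborted if it hurts the filer, and the window-size change only applies to sessions established after CIFS restarts:

ourfiler*> reallocate measure -o /vol/yourvol       # one-shot measure-only pass (hypothetical volume name)
ourfiler*> reallocate status -v                     # check progress and the measured optimization level
ourfiler*> reallocate stop /vol/yourvol             # abort the scan if the filer is suffering
ourfiler*> options cifs.tcp_window_size <new_size>  # placeholder value, to be decided
ourfiler*> cifs terminate                           # disconnects all CIFS sessions
ourfiler*> cifs restart                             # new window size applies to new sessions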
Some information about our CIFS environment:
nss2-ord> stats show cifs
cifs:cifs:instance_name:cifs
cifs:cifs:node_name:
cifs:cifs:node_uuid:
cifs:cifs:cifs_ops:2427/s
cifs:cifs:cifs_latency:0.09ms
cifs:cifs:cifs_read_ops:20/s
cifs:cifs:cifs_write_ops:0/s
cifs:cifs:cifs_dlc_ops:0
cifs:cifs:cifs_dlc_ops_queued:0
We have audit logging off, and /etc is not filling up.
nss2-ord> options cifs
cifs.AD.retry_delay 0
cifs.LMCompatibilityLevel 1
cifs.W2K_password_change off
cifs.W2K_password_change_interval 4w
cifs.W2K_password_change_within 1h
cifs.audit.account_mgmt_events.enable off
cifs.audit.autosave.file.extension
cifs.audit.autosave.file.extension.nanosecond_precision off
cifs.audit.autosave.file.limit 0
cifs.audit.autosave.onsize.enable off
cifs.audit.autosave.onsize.threshold 75%
cifs.audit.autosave.ontime.enable off
cifs.audit.autosave.ontime.interval 1d
cifs.audit.enable off
cifs.audit.file_access_events.enable on
cifs.audit.liveview.allowed_users
cifs.audit.liveview.enable off
cifs.audit.logon_events.enable on
cifs.audit.logsize 1048576
cifs.audit.nfs.enable off
cifs.audit.nfs.filter.filename
cifs.audit.saveas /etc/log/adtlog.evt
cifs.bypass_traverse_checking on
cifs.client.dup-detection ip-address
cifs.comment
cifs.enable_share_browsing on
cifs.gpo.enable off
cifs.gpo.trace.enable off
cifs.grant_implicit_exe_perms off
cifs.guest_account
cifs.home_dir.generic_share_access_level 3
cifs.home_dir.generic_share_access_warn on
cifs.home_dir_namestyle
cifs.home_dirs_public_for_admin on
cifs.idle_timeout 900
cifs.ipv6.enable off
cifs.max_mpx 253
cifs.ms_snapshot_mode xp
cifs.netbios_aliases
cifs.netbios_over_tcp.enable on
cifs.nfs_root_ignore_acl off
cifs.oplocks.enable on
cifs.oplocks.opendelta 0
cifs.per_client_stats.enable off
cifs.perfmon.allowed_users
cifs.perm_check_ro_del_ok off
cifs.perm_check_use_gid on
cifs.preserve_unix_security off
cifs.restrict_anonymous 0
cifs.restrict_anonymous.enable off
cifs.save_case on
cifs.scopeid
cifs.search_domains
cifs.show_dotfiles on
cifs.show_snapshot off
cifs.shutdown_msg_level 2
cifs.sidcache.enable on
cifs.sidcache.lifetime 1440
cifs.signing.enable off
cifs.smb2.enable on
cifs.smb2.signing.max_threads 5
cifs.smb2.signing.multiprocessing default
cifs.smb2.signing.required off
cifs.smb2_1.branch_cache.enable off
cifs.smb2_1.branch_cache.hash_time_out 3600 (value might be overwritten in takeover)
cifs.smbx_signing_required off
cifs.snapshot_file_folding.enable off
cifs.symlinks.cycleguard on
cifs.symlinks.enable on
cifs.trace_dc_connection off
cifs.trace_login off
cifs.universal_nested_groups.enable on
cifs.widelink.ttl 10m
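One thing I notice in this dump is that cifs.per_client_stats.enable is off. I could enable it temporarily and use cifs top to see whether a few clients are generating most of the load (a sketch; I understand per-client stats add some overhead, so it should be turned back off afterwards):

ourfiler*> options cifs.per_client_stats.enable on
ourfiler*> cifs top -n 10 -s ops                    # ten busiest clients by operation count
ourfiler*> options cifs.per_client_stats.enable off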
nss2-ord> sysstat -x 1
CPU NFS CIFS HTTP Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk OTHER FCP iSCSI FCP kB/s iSCSI kB/s
in out read write read write age hit time ty util in out in out
71% 343 8928 0 9271 2106 59835 1652 0 0 0 19 98% 0% - 10% 0 0 0 0 0 0 0
72% 255 7589 0 7844 1866 61397 1420 24 0 0 19 98% 0% - 9% 0 0 0 0 0 0 0
65% 234 6501 0 6735 1577 46512 13396 13704 0 0 19 98% 58% 3 19% 0 0 0 0 0 0 0
67% 385 7070 0 7455 1726 48205 1652 0 0 0 19 97% 0% - 15% 0 0 0 0 0 0 0
64% 259 7270 0 7535 1764 53944 1648 32 0 0 19 97% 0% - 9% 6 0 0 0 0 0 0
57% 216 8213 0 8429 1886 48937 1816 0 0 0 19 97% 0% - 8% 0 0 0 0 0 0 0
The highest spike in Disk Out I have seen is 51%. Cache hits are nice and high, which is great; the filer is very busy in general, though.
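Since CPU sits in the 57-72% range, I could also look at a per-processor breakdown; as far as I know, sysstat -M in advanced privilege shows per-CPU and per-domain utilization (a sketch, to be used carefully on a busy filer):

ourfiler*> priv set advanced
ourfiler*> sysstat -M 1                             # per-CPU / per-domain utilization, 1-second interval
ourfiler*> priv set admin                           # return to admin privilege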
I am thinking the spiderweb of directories, which goes 15 to 20 levels deep, is the cause of this, but I wanted to get some ideas on what more I can test on the filer end and on possible tuning changes.
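On the client side, my plan for testing the deep-directory theory is to time enumeration at increasing depths from the test machine where the volume is mounted (paths here are placeholders):

# time a listing at each nesting level; repeat to separate cold vs. cached results
time ls -la /mnt/cifsvol > /dev/null
time ls -la /mnt/cifsvol/level1 > /dev/null
time ls -la /mnt/cifsvol/level1/level2 > /dev/null
# count entries in a slow directory to correlate width with listing time
find /mnt/cifsvol/level1/level2 -maxdepth 1 | wc -l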
Thanks for looking.
Just a note: someone else set this all up and I inherited it, so I am learning as I work.
Is this bad in the cifs domaininfo output?
Preferred Addresses:
None
I am assuming it will fall back to the favored addresses, but is there a time cost associated with that?
cifs domaininfo
NetBIOS Domain: CNxxxxxxxxxx
Windows Domain Name: cnxxxxxxxxxxx
Domain Controller Functionality: Windows 2012 R2
Domain Functionality: Windows 2008 R2
Forest Functionality: Windows 2008 R2
Filer AD Site: ord
Current Connected DCs: \\ORD-DC301 and \\ORD-DC302
Total DC addresses found: 7
Preferred Addresses:
None
Favored Addresses:
10.110.24.11 ORD-DC302 PDC
10.110.24.10 PDC
Other Addresses:
10.28.24.8 PDC
10.28.24.9 PDC
10.32.8.191 PDC
10.24.7.8 PDC
10.24.7.9 PDC
Connected AD LDAP Server: \\ord-dc301.cnxxxx
Preferred Addresses:
None
Favored Addresses:
10.110.24.10
ord-dc301.cnxxxx
10.110.24.11
ord-dc302.cnxxxxx
Other Addresses:
10.28.24.8
sj2-dc301.cnxxxxx
10.28.24.9
sj2-dc302.cnxxxx
10.32.8.191
wl-dc202.cnxxxx
10.24.7.8
ams-dc201.cnxxxx
10.24.7.9
ams-dc202.cnxxxx
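If pinning the filer to the local DCs would help, I believe 7-Mode supports an explicit preferred DC list; something like this, using the favored addresses above (domain name masked as in the output):

ourfiler*> cifs prefdc add CNxxxxxxxxxx 10.110.24.10 10.110.24.11
ourfiler*> cifs prefdc print                        # verify the preferred DC list
ourfiler*> cifs resetdc                             # drop and re-establish DC connections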
I personally don't think the problem is related to the TCP window size or the Windows domain configuration. I have often seen this issue in volumes that have multiple nesting levels of folders and subfolders, each of them also containing many files.
To confirm that the issue is caused by the layout of the data in that volume, could you create a test volume with a bunch of random files/folders in it and see how long it takes to open them and list their contents?
I would suggest creating a layout similar to the following: a few (~50) text files in the root of the volume; one folder (A) containing another few files; one folder (B) containing some files and one subfolder (A), which in turn contains a few files; one folder (C) containing some files and two subfolders (A and B) containing other files; and so on.
As you keep adding new folders with multiple subfolders and nesting levels, you should see the directory listing operations taking more time to complete.
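A minimal sketch of building such a layout from a client with the test volume mounted (the mount point is hypothetical, and I have simplified the A/B/C scheme into a single chain of increasing depth, which is the part that matters for the test):

#!/bin/sh
# build an increasingly nested layout under the mounted test volume
MNT=/mnt/testvol                              # hypothetical mount point
for i in $(seq 1 50); do touch "$MNT/root_file_$i.txt"; done
d="$MNT"
for name in A B C D E; do                     # each folder goes one level deeper
  d="$d/$name"
  mkdir -p "$d"
  for i in $(seq 1 20); do touch "$d/file_$i.txt"; done
done
# then compare listing times at each depth, e.g.:
time ls -la "$MNT/A/B/C/D/E" > /dev/null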