Network and Storage Protocols
Hello NetApp community. I am reaching out to get some input on a CIFS performance issue we are experiencing, from the perspective of the end user: opening a folder can take upwards of 10 minutes in some cases. The end-user environment is predominantly Mac with just a handful of Windows 7/10 machines.
The volume in question is 13 TB total, of which 11 TB is used, with 114,646,233 inodes consumed (about 30% of the inode limit).
I mounted this on my test machine, and there is a spiderweb of directories and subdirectories upon subdirectories, which have even more subdirectories and files; some of these take minutes to ls -la, and hundreds of directories/files are listed. It's a bit of a mess. I was thinking about performing a reallocate on the volume, but it would probably take days just to finish the first step (measure), and it seems to impact the filer pretty heavily while scanning. I was also thinking about increasing the tcp_window_size, but that requires a restart of CIFS, which is pretty disruptive, so I would need a maintenance window for it. The current setting is:
ourfiler*> options cifs.tcp_window_size
cifs.tcp_window_size 64240
We have a dual 10G LACP connection for data. The filer is an FAS8020 running NetApp Release 8.2.5 7-Mode: Wed Jul 19 03:55:53 PDT 2017.
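For reference, here is what I believe the 7-Mode commands for both ideas would look like; the volume name and window size are placeholders, not values from this system. The measure job can be aborted if it hurts the filer, and the window-size change only applies to sessions established after CIFS restarts:

ourfiler*> reallocate measure -o /vol/yourvol       # one-shot measure-only pass (hypothetical volume name)
ourfiler*> reallocate status -v                     # check progress and the measured optimization level
ourfiler*> reallocate stop /vol/yourvol             # abort the scan if the filer is suffering
ourfiler*> options cifs.tcp_window_size <new_size>  # placeholder value, to be decided
ourfiler*> cifs terminate                           # disconnects all CIFS sessions
ourfiler*> cifs restart                             # new window size applies to new sessions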
Some information about our CIFS environment:
nss2-ord> stats show cifs
cifs:cifs:instance_name:cifs
cifs:cifs:node_name:
cifs:cifs:node_uuid:
cifs:cifs:cifs_ops:2427/s
cifs:cifs:cifs_latency:0.09ms
cifs:cifs:cifs_read_ops:20/s
cifs:cifs:cifs_write_ops:0/s
cifs:cifs:cifs_dlc_ops:0
cifs:cifs:cifs_dlc_ops_queued:0
We have audit logging off, and /etc is not filling up.
nss2-ord> options cifs
cifs.AD.retry_delay 0
cifs.LMCompatibilityLevel 1
cifs.W2K_password_change off
cifs.W2K_password_change_interval 4w
cifs.W2K_password_change_within 1h
cifs.audit.account_mgmt_events.enable off
cifs.audit.autosave.file.extension
cifs.audit.autosave.file.extension.nanosecond_precision off
cifs.audit.autosave.file.limit 0
cifs.audit.autosave.onsize.enable off
cifs.audit.autosave.onsize.threshold 75%
cifs.audit.autosave.ontime.enable off
cifs.audit.autosave.ontime.interval 1d
cifs.audit.enable off
cifs.audit.file_access_events.enable on
cifs.audit.liveview.allowed_users
cifs.audit.liveview.enable off
cifs.audit.logon_events.enable on
cifs.audit.logsize 1048576
cifs.audit.nfs.enable off
cifs.audit.nfs.filter.filename
cifs.audit.saveas /etc/log/adtlog.evt
cifs.bypass_traverse_checking on
cifs.client.dup-detection ip-address
cifs.comment
cifs.enable_share_browsing on
cifs.gpo.enable off
cifs.gpo.trace.enable off
cifs.grant_implicit_exe_perms off
cifs.guest_account
cifs.home_dir.generic_share_access_level 3
cifs.home_dir.generic_share_access_warn on
cifs.home_dir_namestyle
cifs.home_dirs_public_for_admin on
cifs.idle_timeout 900
cifs.ipv6.enable off
cifs.max_mpx 253
cifs.ms_snapshot_mode xp
cifs.netbios_aliases
cifs.netbios_over_tcp.enable on
cifs.nfs_root_ignore_acl off
cifs.oplocks.enable on
cifs.oplocks.opendelta 0
cifs.per_client_stats.enable off
cifs.perfmon.allowed_users
cifs.perm_check_ro_del_ok off
cifs.perm_check_use_gid on
cifs.preserve_unix_security off
cifs.restrict_anonymous 0
cifs.restrict_anonymous.enable off
cifs.save_case on
cifs.scopeid
cifs.search_domains
cifs.show_dotfiles on
cifs.show_snapshot off
cifs.shutdown_msg_level 2
cifs.sidcache.enable on
cifs.sidcache.lifetime 1440
cifs.signing.enable off
cifs.smb2.enable on
cifs.smb2.signing.max_threads 5
cifs.smb2.signing.multiprocessing default
cifs.smb2.signing.required off
cifs.smb2_1.branch_cache.enable off
cifs.smb2_1.branch_cache.hash_time_out 3600 (value might be overwritten in takeover)
cifs.smbx_signing_required off
cifs.snapshot_file_folding.enable off
cifs.symlinks.cycleguard on
cifs.symlinks.enable on
cifs.trace_dc_connection off
cifs.trace_login off
cifs.universal_nested_groups.enable on
cifs.widelink.ttl 10m
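One thing I notice in this dump is that cifs.per_client_stats.enable is off. I could enable it temporarily and use cifs top to see whether a few clients are generating most of the load (a sketch; I understand per-client stats add some overhead, so it should be turned back off afterwards):

ourfiler*> options cifs.per_client_stats.enable on
ourfiler*> cifs top -n 10 -s ops                    # ten busiest clients by operation count
ourfiler*> options cifs.per_client_stats.enable off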
nss2-ord> sysstat -x 1
CPU NFS CIFS HTTP Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk OTHER FCP iSCSI FCP kB/s iSCSI kB/s
in out read write read write age hit time ty util in out in out
71% 343 8928 0 9271 2106 59835 1652 0 0 0 19 98% 0% - 10% 0 0 0 0 0 0 0
72% 255 7589 0 7844 1866 61397 1420 24 0 0 19 98% 0% - 9% 0 0 0 0 0 0 0
65% 234 6501 0 6735 1577 46512 13396 13704 0 0 19 98% 58% 3 19% 0 0 0 0 0 0 0
67% 385 7070 0 7455 1726 48205 1652 0 0 0 19 97% 0% - 15% 0 0 0 0 0 0 0
64% 259 7270 0 7535 1764 53944 1648 32 0 0 19 97% 0% - 9% 6 0 0 0 0 0 0
57% 216 8213 0 8429 1886 48937 1816 0 0 0 19 97% 0% - 8% 0 0 0 0 0 0 0
The highest spike in Disk Out I have seen is 51%. Cache hits are nice and high, which is great; the filer is very busy in general, though.
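Since CPU sits in the 57-72% range, I could also look at a per-processor breakdown; as far as I know, sysstat -M in advanced privilege shows per-CPU and per-domain utilization (a sketch, to be used carefully on a busy filer):

ourfiler*> priv set advanced
ourfiler*> sysstat -M 1                             # per-CPU / per-domain utilization, 1-second interval
ourfiler*> priv set admin                           # return to admin privilege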
I am thinking the spiderweb of directories, which goes 15 to 20 levels deep, is the cause of this, but I wanted to get some ideas on what more I can test on the filer end and on possible tuning changes.
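On the client side, my plan for testing the deep-directory theory is to time enumeration at increasing depths from the test machine where the volume is mounted (paths here are placeholders):

# time a listing at each nesting level; repeat to separate cold vs. cached results
time ls -la /mnt/cifsvol > /dev/null
time ls -la /mnt/cifsvol/level1 > /dev/null
time ls -la /mnt/cifsvol/level1/level2 > /dev/null
# count entries in a slow directory to correlate width with listing time
find /mnt/cifsvol/level1/level2 -maxdepth 1 | wc -l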
Thanks for looking.
Just a note: someone else set this all up and I inherited it, so I am learning as I work.
Is this bad in the cifs domaininfo output?
Preferred Addresses:
None
I am assuming it will fall back to the favored addresses, but is there a time cost associated with that?
cifs domaininfo
NetBIOS Domain: CNxxxxxxxxxx
Windows Domain Name: cnxxxxxxxxxxx
Domain Controller Functionality: Windows 2012 R2
Domain Functionality: Windows 2008 R2
Forest Functionality: Windows 2008 R2
Filer AD Site: ord
Current Connected DCs: \\ORD-DC301 and \\ORD-DC302
Total DC addresses found: 7
Preferred Addresses:
None
Favored Addresses:
10.110.24.11 ORD-DC302 PDC
10.110.24.10 PDC
Other Addresses:
10.28.24.8 PDC
10.28.24.9 PDC
10.32.8.191 PDC
10.24.7.8 PDC
10.24.7.9 PDC
Connected AD LDAP Server: \\ord-dc301.cnxxxx
Preferred Addresses:
None
Favored Addresses:
10.110.24.10
ord-dc301.cnxxxx
10.110.24.11
ord-dc302.cnxxxxx
Other Addresses:
10.28.24.8
sj2-dc301.cnxxxxx
10.28.24.9
sj2-dc302.cnxxxx
10.32.8.191
wl-dc202.cnxxxx
10.24.7.8
ams-dc201.cnxxxx
10.24.7.9
ams-dc202.cnxxxx
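If pinning the filer to the local DCs would help, I believe 7-Mode supports an explicit preferred DC list; something like this, using the favored addresses above (domain name masked as in the output):

ourfiler*> cifs prefdc add CNxxxxxxxxxx 10.110.24.10 10.110.24.11
ourfiler*> cifs prefdc print                        # verify the preferred DC list
ourfiler*> cifs resetdc                             # drop and re-establish DC connections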
I personally don't think the problem is related to the TCP window size or the Windows domain configuration. I have often seen this issue in volumes that have multiple nesting levels of folders and subfolders, each of them also containing many files.
To confirm that the issue is caused by the layout of the data in that volume, could you create a test volume with a bunch of random files/folders in it and see how long it takes to open them and list their contents?
I would suggest creating a layout similar to the following: a few (~50) text files in the root of the volume; one folder (A) containing another few files; one folder (B) containing some files and one subfolder (A), which in turn contains a few files; one folder (C) containing some files and two subfolders (A and B) containing other files; and so on.
As you keep adding new folders with multiple subfolders and nesting levels, you should see the directory listing operations taking more time to complete.
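A minimal sketch of building such a layout from a client with the test volume mounted (the mount point is hypothetical, and I have simplified the A/B/C scheme into a single chain of increasing depth, which is the part that matters for the test):

#!/bin/sh
# build an increasingly nested layout under the mounted test volume
MNT=/mnt/testvol                              # hypothetical mount point
for i in $(seq 1 50); do touch "$MNT/root_file_$i.txt"; done
d="$MNT"
for name in A B C D E; do                     # each folder goes one level deeper
  d="$d/$name"
  mkdir -p "$d"
  for i in $(seq 1 20); do touch "$d/file_$i.txt"; done
done
# then compare listing times at each depth, e.g.:
time ls -la "$MNT/A/B/C/D/E" > /dev/null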