Hello NetApp community. I'm reaching out for input on a CIFS performance issue we are experiencing from the end-user perspective: in some cases it takes upwards of 10 minutes just to open a folder. The end-user environment is predominantly Mac, with just a handful of Windows 7/10 machines.
The volume in question is 13T total, of which 11T is used, with 114,646,233 inodes used (about 30% of the inode capacity).
I mounted this on my test machine and there is a spiderweb of directories and subdirectories upon subdirectories, which have even more subdirectories and files... some of these take minutes to ls -la, and hundreds of directories/files are listed. It's a bit of a mess. I was thinking about running a reallocate on the volume, but it would probably take days just to finish the first step (measure), and it seems to hit the filer pretty hard while scanning. I was also thinking about increasing the tcp_window_size, but that requires a CIFS restart, which is pretty disruptive, so I would need a maintenance window for it. Current settings:
nss2-ord> stats show cifs
cifs:cifs:instance_name:cifs
cifs:cifs:node_name:
cifs:cifs:node_uuid:
cifs:cifs:cifs_ops:2427/s
cifs:cifs:cifs_latency:0.09ms
cifs:cifs:cifs_read_ops:20/s
cifs:cifs:cifs_write_ops:0/s
cifs:cifs:cifs_dlc_ops:0
cifs:cifs:cifs_dlc_ops_queued:0

nss2-ord> options cifs
cifs.AD.retry_delay 0
cifs.LMCompatibilityLevel 1
cifs.W2K_password_change off
cifs.W2K_password_change_interval 4w
cifs.W2K_password_change_within 1h
cifs.audit.account_mgmt_events.enable off
cifs.audit.autosave.file.extension
cifs.audit.autosave.file.extension.nanosecond_precision off
cifs.audit.autosave.file.limit 0
cifs.audit.autosave.onsize.enable off
cifs.audit.autosave.onsize.threshold 75%
cifs.audit.autosave.ontime.enable off
cifs.audit.autosave.ontime.interval 1d
cifs.audit.enable off
cifs.audit.file_access_events.enable on
cifs.audit.liveview.allowed_users
cifs.audit.liveview.enable off
cifs.audit.logon_events.enable on
cifs.audit.logsize 1048576
cifs.audit.nfs.enable off
cifs.audit.nfs.filter.filename
cifs.audit.saveas /etc/log/adtlog.evt
cifs.bypass_traverse_checking on
cifs.client.dup-detection ip-address
cifs.comment
cifs.enable_share_browsing on
cifs.gpo.enable off
cifs.gpo.trace.enable off
cifs.grant_implicit_exe_perms off
cifs.guest_account
cifs.home_dir.generic_share_access_level 3
cifs.home_dir.generic_share_access_warn on
cifs.home_dir_namestyle
cifs.home_dirs_public_for_admin on
cifs.idle_timeout 900
cifs.ipv6.enable off
cifs.max_mpx 253
cifs.ms_snapshot_mode xp
cifs.netbios_aliases
cifs.netbios_over_tcp.enable on
cifs.nfs_root_ignore_acl off
cifs.oplocks.enable on
cifs.oplocks.opendelta 0
cifs.per_client_stats.enable off
cifs.perfmon.allowed_users
cifs.perm_check_ro_del_ok off
cifs.perm_check_use_gid on
cifs.preserve_unix_security off
cifs.restrict_anonymous 0
cifs.restrict_anonymous.enable off
cifs.save_case on
cifs.scopeid
cifs.search_domains
cifs.show_dotfiles on
cifs.show_snapshot off
cifs.shutdown_msg_level 2
cifs.sidcache.enable on
cifs.sidcache.lifetime 1440
cifs.signing.enable off
cifs.smb2.enable on
cifs.smb2.signing.max_threads 5
cifs.smb2.signing.multiprocessing default
cifs.smb2.signing.required off
cifs.smb2_1.branch_cache.enable off
cifs.smb2_1.branch_cache.hash_time_out 3600 (value might be overwritten in takeover)
cifs.smbx_signing_required off
cifs.snapshot_file_folding.enable off
cifs.symlinks.cycleguard on
cifs.symlinks.enable on
cifs.trace_dc_connection off
cifs.trace_login off
cifs.universal_nested_groups.enable on
cifs.widelink.ttl 10m
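To put numbers on the slow ls -la from the client side, here is a rough sketch that times directory enumeration at each depth level over the mount. The mount point and depth limit are placeholders for whatever you use on your test machine, and it assumes GNU date (for %N nanoseconds):

```shell
#!/bin/sh
# Time `ls -la` in every directory down to a given depth and print
# depth / elapsed ms / path, sorted by depth. Requires GNU date (%N).
time_listings() {
  root=$1; maxd=${2:-4}
  find "$root" -maxdepth "$maxd" -type d | while read -r d; do
    rel=${d#"$root"}
    # depth = number of path separators below the root
    depth=$(printf '%s' "$rel" | tr -cd '/' | wc -c)
    t0=$(date +%s%N)
    ls -la "$d" > /dev/null
    t1=$(date +%s%N)
    printf '%s\t%s ms\t%s\n' "$depth" $(( (t1 - t0) / 1000000 )) "$d"
  done | sort -n
}
# e.g.: time_listings /mnt/cifs_test 6   # mount point is hypothetical
```

If the per-listing time climbs sharply with depth or with entry count, that points at the namespace layout rather than the network settings.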
The highest spike in Disk Out I have seen is 51%... cache hits are nice and high, which is great. The filer is very busy in general, though.
I suspect the spiderweb of directories, which goes 15 to 20 levels deep, is the cause of this, but I wanted to get some ideas on what more I can test on the filer end and what tuning might be worth trying.
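To back up that suspicion with data, a quick survey over the mounted volume can show how directories are distributed by depth and which ones are widest (the mount point below is whatever path you use on the test machine):

```shell
#!/bin/sh
# Survey a directory tree: count directories at each depth, then list
# the ten directories holding the most entries.
survey_tree() {
  root=$1
  echo '== directories per depth =='
  find "$root" -type d | awk -F/ '{print NF}' | sort -n | uniq -c
  echo '== 10 widest directories (entries, path) =='
  find "$root" -type d -exec sh -c \
    'printf "%s\t%s\n" "$(ls -1A "$1" | wc -l)" "$1"' _ {} \; | sort -rn | head
}
# e.g.: survey_tree /mnt/cifs_test   # mount point is hypothetical
```

Note that on a tree with 100M+ inodes the full find itself will take a long time, so you may want to scope it to a known-slow subtree first.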
Just a note: someone else set this all up and I inherited it, so I am learning as I go.
Is this bad in the cifs domaininfo output?
Preferred Addresses: None
I am assuming it will fall back to the Favored Addresses, but is there a time cost associated with that?
NetBIOS Domain: CNxxxxxxxxxx
Windows Domain Name: cnxxxxxxxxxxx
Domain Controller Functionality: Windows 2012 R2
Domain Functionality: Windows 2008 R2
Forest Functionality: Windows 2008 R2
Filer AD Site: ord

Current Connected DCs: \\ORD-DC301 and \\ORD-DC302
Total DC addresses found: 7
Preferred Addresses: None
Favored Addresses:
  10.110.24.11 ORD-DC302 PDC
  10.110.24.10 PDC
Other Addresses:
  10.28.24.8 PDC
  10.28.24.9 PDC
  10.32.8.191 PDC
  10.24.7.8 PDC
  10.24.7.9 PDC
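For reference, if pinning DC selection turns out to matter, I believe 7-mode can define preferred DCs with cifs prefdc (syntax I would verify against our ONTAP release first; <domain> stands in for our redacted domain name):

```
nss2-ord> cifs prefdc add <domain> 10.110.24.11 10.110.24.10
nss2-ord> cifs prefdc print
```

My understanding is that "Preferred Addresses: None" just means none were explicitly configured, and the filer then works from the favored list it discovered.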
I personally don't think the problem is related to the TCP window size or the Windows domain configuration. I have often seen this issue on volumes with many levels of nested folders and subfolders, each of them also containing many files.
To confirm the issue is caused by the layout of the data in that volume, could you create a test volume with a bunch of files/folders in it and see how long it takes to open them/list the contents?
I would suggest creating a layout similar to the following: a few (~50) text files in the root of the volume; 1 folder (A) containing another few files; 1 folder (B) containing some files and one subfolder (A) which in turn contains a few files; 1 folder (C) containing some files and two subfolders (A and B) containing more files; and so on. As you keep adding new folders with multiple subfolders and deeper nesting levels, you should see the directory listing operations taking longer to complete.
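The layout above can be generated with a small recursive script; the folder and file names are illustrative, and note that the directory count roughly doubles with each top-level folder you add, so keep the first argument modest:

```shell
#!/bin/sh
# Build the nested test layout: folder i holds a few files plus copies of
# the structures of folders 1..i-1, so width and nesting grow together.
build() {  # build <path> <index> <files-per-dir>
  mkdir -p "$1"
  for f in $(seq "$3"); do : > "$1/file$f.txt"; done
  for j in $(seq $(( $2 - 1 ))); do build "$1/sub$j" "$j" "$3"; done
}
make_tree() {  # make_tree <root> <top-level-folders> <files-per-dir>
  mkdir -p "$1"
  for f in $(seq "$3"); do : > "$1/file$f.txt"; done
  for i in $(seq "$2"); do build "$1/dir$i" "$i" "$3"; done
}
# e.g.: make_tree /mnt/test_vol/tree 10 50   # paths are hypothetical
```

Then time ls -la at each level as you grow the tree and see whether the latency scales with nesting the way the production volume does.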