Network and Storage Protocols
Hi,
Can somebody explain how we should troubleshoot high CIFS utilisation? (It was only for a short time.)
Kind Regards
Christian
FAS3270, ONTAP 8.1.1P1, 7-Mode
Aggregate: 26 disks (24 data disks), mirrored
sysstat -M 1
ANY1+ ANY2+ ANY3+ ANY4+ AVG CPU0 CPU1 CPU2 CPU3 Network Protocol Cluster Storage Raid Target Kahuna WAFL_Ex(Kahu) WAFL_XClean SM_Exempt Cifs Exempt Intr Host Ops/s CP
100% 43% 14% 3% 44% 26% 24% 27% 99% 20% 0% 0% 3% 7% 0% 5% 18%( 16%) 1% 0% 99% 6% 10% 8% 5449 100%
100% 36% 10% 3% 40% 22% 18% 21% 99% 17% 0% 0% 3% 4% 0% 3% 16%( 14%) 0% 0% 98% 6% 7% 6% 4844 100%
99% 31% 10% 3% 39% 20% 16% 21% 98% 16% 0% 0% 2% 3% 0% 3% 17%( 15%) 0% 0% 98% 2% 7% 6% 5066 20%
100% 33% 7% 1% 39% 21% 15% 21% 99% 19% 0% 0% 1% 2% 0% 3% 16%( 14%) 0% 0% 99% 1% 8% 8% 6071 0%
100% 35% 8% 2% 40% 22% 17% 22% 98% 20% 0% 0% 2% 2% 0% 3% 16%( 15%) 0% 0% 98% 1% 8% 8% 6199 0%
100% 33% 7% 1% 39% 20% 15% 20% 99% 18% 0% 0% 2% 2% 0% 2% 15%( 14%) 0% 0% 99% 1% 7% 8% 5664 0%
100% 34% 8% 2% 39% 21% 16% 21% 98% 19% 0% 0% 2% 2% 0% 3% 17%( 15%) 0% 0% 98% 1% 7% 8% 5999 0%
99% 32% 9% 2% 39% 21% 16% 20% 97% 17% 0% 0% 3% 3% 0% 3% 17%( 15%) 0% 0% 97% 1% 7% 7% 5548 0%
100% 34% 8% 2% 40% 22% 16% 21% 99% 18% 0% 0% 2% 2% 0% 3% 17%( 15%) 0% 0% 99% 1% 8% 9% 5798 0%
100% 35% 8% 2% 40% 22% 17% 22% 98% 20% 0% 0% 2% 2% 0% 3% 17%( 16%) 0% 0% 98% 1% 8% 8% 6438 0%
100% 35% 9% 2% 40% 23% 17% 22% 99% 20% 0% 0% 2% 2% 0% 3% 18%( 16%) 0% 0% 99% 1% 8% 8% 6260 0%
100% 30% 8% 2% 38% 20% 15% 19% 99% 16% 0% 0% 2% 3% 0% 3% 15%( 14%) 0% 0% 99% 1% 7% 7% 4879 0%
100% 40% 16% 5% 43% 26% 23% 25% 98% 15% 0% 0% 4% 8% 0% 4% 22%( 19%) 3% 0% 98% 6% 7% 6% 4829 80%
99% 37% 11% 3% 41% 24% 20% 23% 97% 18% 0% 0% 3% 4% 0% 3% 17%( 16%) 0% 0% 97% 5% 8% 8% 5563 100%
100% 33% 9% 2% 39% 21% 16% 21% 99% 17% 0% 0% 2% 3% 0% 3% 17%( 15%) 0% 0% 99% 2% 7% 7% 5199 41%
99% 31% 8% 2% 38% 20% 15% 20% 98% 17% 0% 0% 2% 3% 0% 3% 15%( 14%) 0% 0% 98% 1% 7% 7% 5336 0%
99% 38% 12% 3% 42% 26% 19% 26% 98% 19% 0% 0% 2% 3% 0% 3% 22%( 19%) 0% 0% 98% 2% 9% 9% 5466 0%
89% 39% 11% 2% 40% 31% 25% 39% 66% 26% 0% 0% 2% 3% 0% 13% 25%( 22%) 0% 0% 66% 3% 11% 11% 9329 0%
100% 36% 10% 3% 41% 24% 19% 23% 99% 20% 0% 0% 2% 3% 0% 3% 18%( 16%) 0% 0% 99% 2% 9% 10% 5582 0%
100% 35% 10% 3% 41% 25% 18% 23% 99% 20% 0% 0% 2% 2% 0% 3% 18%( 16%) 0% 0% 99% 1% 10% 9% 5470 0%
sysstat -x 1
CPU NFS CIFS HTTP Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk OTHER FCP iSCSI FCP kB/s iSCSI kB/s
in out read write read write age hit time ty util in out in out
98% 1968 1129 0 3097 2532 19118 9648 104 0 0 13 98% 5% : 26% 0 0 0 0 0 0 0
98% 4389 1325 0 5714 3396 24294 5764 0 0 0 13 94% 0% - 15% 0 0 0 0 0 0 0
97% 4709 1543 0 6340 2148 26073 4244 16 0 0 13 97% 0% - 11% 88 0 0 0 0 0 0
99% 4642 1100 0 5742 6089 24659 3168 48 0 0 13 98% 0% - 8% 0 0 0 0 0 0 0
99% 5207 1146 0 6353 2271 28075 3256 0 0 0 13 97% 0% - 9% 0 0 0 0 0 0 0
98% 4184 1288 0 5472 3539 22340 1076 0 0 0 13 99% 0% - 4% 0 0 0 0 0 0 0
80% 3630 4912 0 8799 3093 93962 11748 48 0 0 13 99% 0% - 24% 257 0 0 0 0 0 0
87% 4561 2668 0 7233 2438 27895 4444 16 0 0 13 99% 0% - 6% 4 0 0 0 0 0 0
99% 4748 1207 0 5956 3303 24560 3884 0 0 0 13 95% 0% - 15% 1 0 0 0 0 0 0
99% 4878 919 0 5797 5352 25890 4420 48 0 0 13 96% 0% - 16% 0 0 0 0 0 0 0
99% 5088 964 0 6052 6467 24895 28548 57496 0 0 13 94% 94% Tf 25% 0 0 0 0 0 0 0
99% 4808 1152 0 5960 3232 23969 8560 50340 0 0 13 94% 100% :f 17% 0 0 0 0 0 0 0
98% 4504 1260 0 5776 1905 24054 2240 35892 0 0 13 95% 85% : 13% 12 0 0 0 0 0 0
99% 4813 1001 0 5815 17678 23785 4300 1040 0 0 13 94% 0% - 10% 1 0 0 0 0 0 0
99% 4891 1016 0 5907 3490 24335 3068 0 0 0 13 96% 0% - 15% 0 0 0 0 0 0 0
99% 4909 894 0 5803 1946 24525 2380 48 0 0 13 95% 0% - 16% 0 0 0 0 0 0 0
99% 4999 1049 0 6048 1860 24571 2624 0 0 0 13 95% 0% - 14% 0 0 0 0 0 0 0
vfiler run filer2 cifs stat
===== filer2
reject 0 0%
mkdir 1191 0%
rmdir 3514 0%
open 0 0%
create 0 0%
close 1098191575 14%
X&close 0 0%
flush 1448 0%
X&flush 0 0%
delete 469821 0%
rename 58209 0%
NTRename 0 0%
getatr 93 0%
setatr 0 0%
read 0 0%
X&read 0 0%
write 480477 0%
X&write 0 0%
lock 0 0%
unlock 0 0%
mknew 0 0%
chkpth 93 0%
exit 0 0%
lseek 0 0%
lockread 0 0%
X&lockread 0 0%
writeunlock 0 0%
readbraw 0 0%
writebraw 0 0%
writec 0 0%
gettattre 0 0%
settattre 0 0%
lockingX 6244353 0%
IPC 273028 0%
open2 0 0%
find_first2 220063067 3%
find_next2 1115069 0%
query_fs_info 9568326 0%
query_path_info 1858137663 24%
set_path_info 4373384 0%
query_file_info 738040511 10%
set_file_info 4693818 0%
create_dir2 0 0%
Dfs_referral 15075284 0%
Dfs_report 0 0%
echo 567655 0%
writeclose 0 0%
openX 2192983 0%
readX 299922070 4%
writeX 44670227 1%
findclose 0 0%
tcon 0 0%
tdis 675448 0%
negprot 8330 0%
login 556585 0%
logout 277584 0%
tconX 522951 0%
dskattr 2 0%
search 0 0%
fclose 5349 0%
NTCreateX 1154096683 15%
NTTransCreate 3704 0%
NTTransIoctl 188499179 2%
NTTransNotify 137639 0%
NTTransSetSec 0 0%
NTTransQuerySec 885926601 12%
NTNamedPipeMulti 0 0%
NTCancel CN 4810 0%
NTCancel Other 4 0%
SMB2Echo 0 0%
SMB2Negprot 54196 0%
SMB2TreeConnnect 209860 0%
SMB2TreeDisconnect 168462 0%
SMB2Login 53867 0%
SMB2Create 582571786 8%
SMB2Read 161692733 2%
SMB2Write 51193975 1%
SMB2Lock 11395836 0%
SMB2Unlock 11390614 0%
SMB2OplkBrkAck 0 0%
SMB2ChgNfy 356167 0%
SMB2CLose 127057533 2%
SMB2Flush 357578 0%
SMB2Logout 49025 0%
SMB2Cancel 120985 0%
SMB2IPCCreate 491354 0%
SMB2IPCRead 496976 0%
SMB2IPCWrite 491921 0%
SMB2QueryDir 50818664 1%
SMB2QueryFileBasicInfo 1808 0%
SMB2QueryFileStndInfo 1291 0%
SMB2QueryFileIntInfo 14414472 0%
SMB2QueryFileEAInfo 15456638 0%
SMB2QueryFileFEAInfo 538 0%
SMB2QueryFileModeInfo 0 0%
SMB2QueryAltNameInfo 0 0%
SMB2QueryFileStreamInfo 1575674 0%
SMB2QueryNetOpenInfo 1337388 0%
SMB2QueryAttrTagInfo 1 0%
SMB2QueryAccessInfo 0 0%
SMB2QueryFileUnsupported 0 0%
SMB2QueryFileInvalid 1996 0%
SMB2QueryFSVolInfo 22701940 0%
SMB2QueryFSSizeInfo 336641 0%
SMB2QueryFSDevInfo 0 0%
SMB2QueryFSAttrInfo 22701937 0%
SMB2QueryFSFullSzInfo 61881 0%
SMB2QueryFSObjIdInfo 0 0%
SMB2QueryFSInvalid 0 0%
SMB2QuerySecurityInfo 6763117 0%
SMB2SetBasicInfo 1173575 0%
SMB2SetRenameInfo 126134 0%
SMB2SetFileLinkInfo 0 0%
SMB2SetFileDispInfo 2226284 0%
SMB2SetFullEAInfo 0 0%
SMB2SetModeInfo 0 0%
SMB2SetAllocInfo 1383904 0%
SMB2SetEOFInfo 1008637 0%
SMB2SetUnsupported 0 0%
SMB2SetInfoInvalid 0 0%
SMB2SetSecurityInfo 4418 0%
SMB2FsctlPipeTransceive 591374 0%
SMB2FsctlPipePeek 0 0%
SMB2FsctlEnumSnapshots 2666 0%
SMB2FsctlDfsReferrals 261445 0%
SMB2FsctlSetSparse 0 0%
SMB2FsctlSecureShare 0 0%
SMB2FsctlFileUnsupported 780930 0%
SMB2FsctlIpcUnsupported 51594 0%
cancel lock 0
wait lock 0
copy to align 349756
alignedSmall 63670910
alignedLarge 583299431
alignedSmallRel 0
alignedLargeRel 0
FidHashAllocs 5996
TidHashAllocs 112
UidHashAllocs 0
mbufWait 0
nbtWait 0
pBlkWait 0
BackToBackCPWait 0
cwaWait 0
short msg prevent 1062
multipleVCs 277495
SMB signing 0
mapped null user 0
PDCupcalls 0
nosupport 0
read pipe busy 0
write pipe busy 0
trans pipe busy 0
read pipe broken 0
write pipe broken 0
trans pipe broken 0
queued writeraw 0
nbt disconnect 53094
smb disconnect 3818
dup disconnect 204
OpLkBkXorBatchToL2 2942053
OpLkBkXorBatchToNone 288
OpLkBkL2ToNone 3088456
OpLkBkNoBreakAck 1
OpLkBkNoBreakAck95 0
OpLkBkNoBreakAckNT 1
OpLkBkIgnoredAck 1601112
OpLkBkWaiterTimedOut 0
OpLkBkDelayedBreak 0
SharingErrorRetries 4561
FoldAttempts 0
FoldRenames 0
FoldRenameFailures 0
FoldOverflows 0
FoldDuplicates 0
FoldWAFLTooBusy 0
NoAllocCredStat 0
RetryRPCcollision 0
TconCloseTID 0
GetNTAPExtAttrs 0
SetNTAPExtAttrs 0
SearchBusy 0
ChgNfyNoMemory 0
ChgNfyNewWatch 82017
ChgNfyLastWatch 81117
UsedMIDTblCreated 750
UnusedMIDTblCreated 330
InvalidMIDRejects 0
SMB2InvalidSignature 65
SMB2DurableCreateReceived 582558319
SMB2DurableCreateSucceeded 33008891
SMB2DurableReclaimReceived 6744
SMB2DurableReclaimSucceeded 1156
SMB2DurableHandlePreserved 20678
SMB2DurableHandlePurged 35
SMB2DurableHandleExpired 19487
SMB2FileDirInfo 7
SMB2FileFullDirInfo 506146
SMB2FileIdFullDirInfo 0
SMB2FileBothDirInfo 14726173
SMB2FileIdBothDirInfo 16305769
SMB2FileNamesInfo 3528940
SMB2FileDirUnsupported 0
SMB2QueryInfo 85398010
SMB2SetInfo 5922955
SMB2Ioctl 1688009
SMB2RelatedCompRequest 57443936
SMB2UnRelatedCompRequest 0
SMB2FileRequest 1088913892
SMB2PipeRequest 3308848
SMB2_1_LeaseBreaks 455412
SMB2_1_LeaseUpgrades 54940
SMB2_1_LeaseBreakExcuses 11514840
SMB2_1_LeaseBreakAckTimeouts 8885
SMB2_1_HandleLeaseBreaks 13486
SMB2_1_LeaseBreaksToNone 14774
SMB2_1_LeaseBreakAcksIgnored 1
SMB2nosupport 333079
Max Multiplex = 162, Max pBlk Exhaust = 15103, Max pBlk Reserve Exhaust = 15033
Max FIDs = 58672, Max FIDs on one tree = 10590
Max Searches on one tree = 10, Max Core Searches on one tree = 0
Max sessions = 279
Max trees = 1782
Max shares = 48
Max session UIDs = 45, Max session TIDs = 369
Max locks = 102228
Max credentials = 287
Max group SIDs per credential = 165
Max pBlks = 1024 Current pBlks = 1024 Num Logons = 0
Max reserved pBlks = 32 Current reserved pBlks = 32
Max gAuthQueue depth = 3
Max gSMBBlockingQueue depth = 49
Max gSMBTimerQueue depth = 6
Max gSMBAlfQueue depth = 2
Max gSMBRPCWorkerQueue depth = 50
Max gOffloadQueue depth = 2
Local groups: builtins = 6, user-defined = 1, SIDs = 6
RPC group count = 10, RPC group active count = 1
Max Watched Directories = 2059, Current Watched Directories = 1197
Max Pending ChangeNotify Requests = 1979, Current Pending ChangeNotify Requests = 1089
Max Pending DeleteOnClose Requests = 3072, Current Pending DeleteOnClose Requests = 3
Checklist for troubleshooting CIFS issues
• Use "sysstat –x 1" to determine how many CIFS ops/s and how much CPU is being utilized
• Check /etc/messages for any abnormal messages, especially for oplock break timeouts
• Use "perfstat" to gather data and analyze (note information from "ifstat", "statit", "cifs stat", and "smb_hist", messages, general cifs info)
• "pktt" may be necessary to determine what is being sent/received over the network
• "sio" should / could be used to determine how fast data can be written/read from the filer
• Client troubleshooting may include review of event logs, ping of filer, test using a different filer or Windows server
• If it is a network issue, check "ifstat –a", "netstat –in" for any I/O errors or collisions
• If it is a gigabit issue check to see if the flow control is set to FULL on the filer and the switch
• On the filer if it is one volume having an issue, do "df" to see if the volume is full
• Do "df –i" to see if the filer is running out of inodes
• From "statit" output, if it is one volume that is having an issue check for disk fragmentation
• Try the "netdiag –dv" command to test filer side duplex mismatch. It is important to find out what the benchmark is and if it’s a reasonable one
• If the problem is poor performance, try a simple file copy using Explorer and compare it with the application's performance. If they are both the same, the issue is probably not the application. Rule out client problems and make sure it is tested on multiple clients. If it is an application performance issue, get all the details about:
◦ The version of the application
◦ What specifics of the application are slow, if any
◦ How the application works
◦ Whether it is equally slow when using another Windows server over the network
◦ The recipe for reproducing the problem in a NetApp lab
• If the slowness only happens at certain times of the day, check whether those times coincide with other heavy activity on the filer, such as SnapMirror, Snapshots, dump, etc. If normal file reads/writes are slow:
◦ Check duplex mismatch (both client side and filer side)
◦ Check whether oplocks are being used, or whether they have been turned off
◦ Check if there is an anti-virus application running on the client. This can cause performance issues, especially when copying multiple small files
◦ Check "cifs stat" to see if the Max Multiplex value is near the cifs.max_mpx option value. Common situations where this may need to be increased are when the filer is being used by a Windows Terminal Server or any other kind of server that might have many users opening new connections to the filer (see "What is CIFS Max Multiplex?")
◦ Check the value of OpLkBkNoBreakAck in "cifs stat". Non-zero numbers indicate oplock break timeouts, which cause performance problems (a quick way to repeat these last two checks is sketched after this list)
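To make those last two checks easy to repeat, here is a minimal sketch run from a Linux/UNIX admin host over ssh (the same access method as the "ssh vihsdn03 ..." example later in this thread); "filer" is a placeholder name and the grep filtering happens on the admin host:
# Sketch only - compare the observed Max Multiplex against the configured limit,
# and pull the oplock-break and pBlk counters in one go
ssh filer options cifs.max_mpx                  # configured multiplex limit
ssh filer cifs stat | grep "Max Multiplex"      # observed peak, plus the pBlk exhaust counters
ssh filer cifs stat | grep "OpLkBkNoBreakAck"   # non-zero values point to oplock break timeouts
ssh filer cifs stat | grep "pBlk"               # pBlk pool sizes and exhaust/wait counters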
Check these as well:
1. Enabling SMB 2 on the controller and using SMB 2 enabled clients would give better performance than SMB 1.
2. Make sure there is not much latency between the domain controllers and the controller.
3. Make sure that no stale DCs are listed under the preferred DCs (a few example commands are sketched below).
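A few example commands for the three checks above (just a sketch, run from an admin host; "filer" and "EXAMPLE.DOM" are placeholders):
ssh filer options cifs.smb2.enable          # whether SMB 2 is enabled on the controller
ssh filer cifs domaininfo                   # which domain controller is currently in use and its state
ssh filer cifs prefdc print                 # the preferred DC lists - look for stale entries here
ssh filer cifs prefdc delete EXAMPLE.DOM    # drops the preferred-DC list for that domain so it can be re-added cleanly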
What's cifs.tcp_window_size set to?
Max Multiplex = 162, Max pBlk Exhaust = 15103, Max pBlk Reserve Exhaust = 15033
Looks like you're having the same Max pBlk Exhaust issue that we are having on our new 6290c 8.1.3P1 7-Mode filer serving Windows home directories. We previously had the same issue on a 3160.
When pBlk usage gets too high, CIFS stops serving data, which has caused us to fail over 4 times in the past 2 years now. Four P1 cases, no root cause.
Max pBlk Exhaust can be caused by a few known problems, including vscan and slow Active Directory. We don't have a virus scanner on. A rough way to keep an eye on the counters is sketched below.
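A loop like this on an admin host can track the pool over time (just a sketch: it assumes passwordless ssh to the filer, and "filer" is a placeholder):
# Poll the pBlk counters and the auth/offload queue depths every 5 minutes
while true; do
    date
    ssh filer cifs stat | egrep "pBlk|gAuthQueue|gOffloadQueue"
    sleep 300
done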
Sure would like to get some community help with this issue...
Hi, we are also seeing pBlk exhaustion on our FAS6240AE with ONTAP 8.1.4 (SMB 1.x only).
After a reboot/upgrade of the filer we don't see this problem for about 3-4 weeks, but then the counter starts rising faster and faster until we see disruption on the CIFS shares once it reaches very high numbers.
Sun Feb 9 13:17:16 CET [filer:cifs.stats.pBlkExhaust:info]: CIFS: All CIFS control blocks for the STANDARD pool are in use. The request for a new control block can not be granted.
Mon Feb 10 09:22:30 CET [filer:ems.engine.suppressed:debug]: Event 'cifs.stats.pBlkExhaust' suppressed 5 times in last 72314 seconds.
Mon Feb 10 09:22:30 CET [filer:cifs.stats.pBlkExhaust:info]: CIFS: All CIFS control blocks for the STANDARD pool are in use. The request for a new control block can not be granted.
Mon Feb 10 12:05:34 CET [filer:cifs.stats.pBlkExhaust:info]: CIFS: All CIFS control blocks for the STANDARD pool are in use. The request for a new control block can not be granted.
Mon Feb 10 16:02:14 CET [filer:cifs.stats.pBlkExhaust:info]: CIFS: All CIFS control blocks for the STANDARD pool are in use. The request for a new control block can not be granted.
We also monitored the other queues like offload (virus scanning) and auth (AD), but don't see any high numbers there.
So it looks like the cause of this pBlk exhaustion lies somewhere in Data ONTAP itself (CIFS file folding, ...).
Here is also a short overview of the cifs stat output on the filer:
Max Multiplex = 46, Max pBlk Exhaust = 20, Max pBlk Reserve Exhaust = 0
Max pBlks = 1152 Current pBlks = 1152 Num Logons = 121
Max reserved pBlks = 32 Current reserved pBlks = 32
Max gAuthQueue depth = 2
Max gOffloadQueue depth = 2
In my opinion a reboot/takeover would help, but only for about 3-5 weeks, and then the problem (the pBlk exhaustion counter rising) will start again 😞
We have also opened a NetApp support case, but the problem looks far more complex and has been unsolved for weeks.
So I am interested whether any of you have seen this issue and know a solution.
Best Regards,
klaus
What about the "options cifs.smb" output?
SMB 2.x is completely disabled, as we had many different issues with it in the past (pBlk, durable handles, SMB 2.1 access issues, ...)
[root@mucsx003 20140211-14:18:03]# ssh vihsdn03 options cifs.smb
cifs.smb2.durable_handle.enable off
cifs.smb2.enable off
cifs.smb2.signing.required off
cifs.smb2_1.branch_cache.enable off
cifs.smb2_1.branch_cache.hash_time_out 3600 (value might be overwritten in takeover)
Hi DanPancamo,
Looks like you are seeing the same issues as we see on our FAS6240AE:
Max Multiplex = 162, Max pBlk Exhaust = 15103, Max pBlk Reserve Exhaust = 15033
but no real clue why this happens.
Looks like it starts some weeks after an update/reboot; the counters then keep rising until EndOfCifsService 😞
But it doesn't look connected to AD or vscan issues, as we also closely monitor the other counters.
Any update, if you see progress in this case, would be appreciated.
Best Regards,
Klaus
Same here... We are having the same problem, and the worst part is that we never know when it is going to hit again. We experienced the problem a few days ago, and it hit us again last night. As a workaround we terminate CIFS and restart it, but that is not a solution. We host 2500+ virtual desktops, and when this issue hits, all the desktops freeze because the user profiles and "user data" drives are served from CIFS. It seems this is a long-standing problem with no solution so far.
Running Data ONTAP 8.1.2P4 on FAS3240AE
Max Multiplex = 49, Max pBlk Exhaust = 411091, Max pBlk Reserve Exhaust = 68563
Max FIDs = 52522, Max FIDs on one tree = 730
Max Searches on one tree = 15, Max Core Searches on one tree = 0
Max sessions = 5686
Instance exhaust_mem_ exhaust_mem_ max_auth_qle max_offload_
vfiler0 409962 68563 2 2
vfiler0 409962 68563 2 2
vfiler0 409962 68563 2 2
vfiler0 409962 68563 2 2
vfiler0 409962 68563 2 2
vfiler0 409962 68563 2 2
vfiler0 409962 68563 2 2
vfiler0 409962 68563 2 2
Added to the list as well..
Max Multiplex = 33, Max pBlk Exhaust = 53221316, Max pBlk Reserve Exhaust = 157
We can't determine whether it's deswizzling of volumes causing the CPU to spike to 100% or whether it's vscan. We disabled vscan and we think we were still exhausting pBlks. Running 8.1.3P1.
Any ideas?
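One way to narrow that down (a sketch only; these are advanced-privilege commands, so use them with care) is to check the WAFL scanners and the vscan state directly on the console:
filer> priv set advanced
filer*> wafl scan status        # lists the active WAFL scanners per volume, including deswizzling
filer*> vscan                   # shows whether virus scanning is actually on and which scanners are registered
filer*> priv set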
We have this problem also. The first time, we thought it was a DC problem and we used a preferred DC to resolve it, but we have had this problem again...
NetApp Release 8.1.3P2 7-Mode
Max Multiplex = 177, Max pBlk Exhaust = 46878, Max pBlk Reserve Exhaust = 0
Dear all,
So sorry for replying late...
Around the time I posted on this thread (around March 08), I also contacted domain experts in the NetApp technical support team regarding this issue. They told me that this is due to SMB 2.1 lease issues in specific Data ONTAP 7-Mode releases. They then advised me to disable SMB 2.1 on the filer. The way to do so is:
priv set -q diag; setflag smb_enable_2_1 0; priv set;
Note that the above is NOT persistent across system reboots, so you need to add the same commands to the /etc/rc file to make it persistent.
The setting only takes effect for new CIFS sessions, so in order to apply it to existing sessions you need to restart the CIFS service. A sketch of both steps follows.
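For example (the /etc/rc lines simply mirror the command above; "cifs terminate"/"cifs restart" are disruptive, so do this in a maintenance window):
# Lines to append to /etc/rc so the flag survives a reboot
priv set -q diag
setflag smb_enable_2_1 0
priv set
# Then, to apply it to existing sessions, restart CIFS:
filer> cifs terminate
filer> cifs restart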
At a later date (25 March 2014), NetApp released a KB article on the same issue.
Hope this helps... It is always a pleasure to share knowledge and experiences.
^^ This is a good find. I see that part of it was fixed in 8.1.4, but the other bug is scheduled to be fixed in later releases:
http://support.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=798842
I have the same issue using ONTAP 7.3.3 without SMB 2.1; have you heard anything about older ONTAP releases having this issue?