ONTAP Hardware
Hi Team,
I am facing a performance issue in one of our NetApp clusters.
I have checked and found that the filer holding a large number of files/directories (millions of files) has slow response times compared to the filer with fewer files.
We recently purchased a FAS3270 which again holds millions of files and is also responding slowly.
I have a few questions in mind:
Does this issue come from the total number of files stored on a filer, or from the number of files actually being accessed in memory?
In fact, I increased the read cache by installing a 520 GB Flash Cache module, but it did not help.
Is there any way to see how many files are actually in use out of the total available?
Does anyone know the root cause of this issue?
Please help me resolve this issue; it is creating a big impact in our environment.
Best Regards
Pulkit Kaul
Do you see a lot of "other" IOPs? Is the Flash Cache in the default mode (data and metadata)? It might help to set it to metadata-only mode if there are a lot of directory and metadata lookups. A perfstat would be interesting to have analyzed to see the performance.
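For what it's worth, here is a minimal sketch of how the caching mode is normally switched on a 7-Mode controller with Flash Cache; the flexscale option names below are the standard ones, but verify them on your ONTAP release before changing anything:

  options flexscale.enable on                (Flash Cache caching enabled)
  options flexscale.normal_data_blocks off   (metadata-only mode; the default "on" caches normal user data as well)
  options flexscale.lopri_blocks off         (keep low-priority blocks, e.g. sequential reads, out of the cache)
  stats show -p flexscale-access             (watch the hit/miss rates after the change)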
Hi, I have tried both modes but still have not had any success.
I have generated the perfstat output, which clearly shows that the number of files on filer dlhfs15 is much higher than on filer bnrfs04.
Below is the analysis which NetApp has provided:
Hello Pulkit,
I checked both perfstats and, as suspected, the high file count is the main issue. Below is the detailed analysis.
Volume latency:
volume:fs04ag1v1:avg_latency:18.53us - BNRFS04
volume:fs15ag5v2:avg_latency:311.90us - DLHFS15 (the latency on this volume is far higher than on BNRFS04)

nfsv3:nfs:nfsv3_op_latency.readdirplus:64.66us - BNRFS04
nfsv3:nfs:nfsv3_op_latency.readdirplus:6041.57us - DLHFS15
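For reference, these values use ONTAP's object:instance:counter notation, so an individual counter can be re-read from the 7-Mode CLI with the stats command, for example (instance name taken from above):

  stats show volume:fs15ag5v2:avg_latency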
BNRFS04
Server nfs V3: (2824081 calls)
null        2         0%
getattr     2210077   78%
setattr     24892     1%
lookup      241781    9%
access      115522    4%
readlink    0         0%
read        24381     1%
write       106985    4%
create      66943     2%
mkdir       6200      0%
symlink     298       0%
mknod       0         0%
remove      18853     1%
rmdir       412       0%
rename      2238      0%
link        0         0%
readdir     0         0%
readdir+    5440      0%
fsstat      55        0%
fsinfo      2         0%
pathconf    0         0%
DLHFS15
Server nfs V3: (1309331 calls)
null        38        0%
getattr     379717    29%
setattr     43240     3%
lookup      110999    8%
access      127401    10%
readlink    516       0%
read        114686    9%
write       190382    15%
create      46584     4%
mkdir       5460      0%
symlink     454       0%
mknod       0         0%
remove      105022    8%
rmdir       27123     2%
rename      16870     1%
link        1268      0%
readdir     252       0%
readdir+    138377    11%
fsstat      881       0%
fsinfo      56        0%
pathconf    5         0%
commit      0         0%
nfsv3:nfs:nfsv3_op_latency.readdir:0us - BNRFS04 directory scanning
nfsv3:nfs:nfsv3_op_latency.readdirplus:64.66us - BNRFS04 directory scanning
nfsv3:nfs:nfsv3_op_latency.readdir:1132.03us - DLHFS15 directory scanning
nfsv3:nfs:nfsv3_op_latency.readdirplus:6041.57us - DLHFS15 directory scanning
This workload does a lot of directory scanning (readdirplus is 11% of the operations), and the latency for it on DLHFS15 is much higher than on BNRFS04.
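For reference, the 11% figure follows directly from the per-operation counts above: DLHFS15 served 138377 readdirplus calls out of 1309331 total, which is about 10.6%. Weighted by latency the gap is even larger: roughly 138377 × 6041.57us ≈ 836 seconds of cumulative readdirplus service time in the DLHFS15 sample, versus about 5440 × 64.66us ≈ 0.35 seconds on BNRFS04.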
BNRFS04
Filesystem          iused     ifree     %iused  Mounted on
/vol/fs04ag1v1/     934210    30942479  3%      /vol/fs04ag1v1/
/vol/fs04ag1v2/     3611657   28265032  11%     /vol/fs04ag1v2/
/vol/fs04ag2v1/     4833894   27042795  15%     /vol/fs04ag2v1/
/vol/vol0/          224695    7229982   3%      /vol/vol0/
Total number of files: 9604456
DLHFS15
Filesystem          iused     ifree     %iused  Mounted on
/vol/fs15ag6v1/     18732711  41267274  31%     /vol/fs15ag6v1/
/vol/fs15ag2v1/     36341917  25534764  59%     /vol/fs15ag2v1/
/vol/fs15ag5v1/     18479935  13396754  58%     /vol/fs15ag5v1/
/vol/fs15ag3v1/     2270413   37729577  6%      /vol/fs15ag3v1/
/vol/fs15ag6v2/     5215281   26661408  16%     /vol/fs15ag6v2/
/vol/fs15ag2v2/     9713544   43286444  18%     /vol/fs15ag2v2/
/vol/fs15ag5v2/     21013190  10863499  66%     /vol/fs15ag5v2/
/vol/fs15ag1v1/     23117514  27013924  46%     /vol/fs15ag1v1/
/vol/fs15ag4v1/     13323666  18553023  42%     /vol/fs15ag4v1/
/vol/vol0/          7319      3451513   0%      /vol/vol0/
/vol/fs15ag4v2/     11803990  19325591  38%     /vol/fs15ag4v2/
/vol/fs15ag1v3/     12170159  19706530  38%     /vol/fs15ag1v3/
/vol/fs15ag1v2/     18837357  21162633  47%     /vol/fs15ag1v2/
Total number of files: 191026996
The number of files in BNRFS04 is 9604456, whereas in DLHFS15 it is 191026996. This is the root cause. The solution for this problem is a system with more memory. We have already tried this with different PAM configurations on the FAS6040, but its caching is not helping with your workload. This is the configuration that we tried for PAM in your environment.
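For reference, the per-volume inode counts quoted above are in df -i format; on a 7-Mode filer something like the following shows how many files (inodes) are in use versus still available on a given volume (the volume name is just an example taken from the listing above):

  df -i /vol/fs15ag5v2    (reports iused / ifree / %iused for the volume)
  maxfiles fs15ag5v2      (reports the volume's current inode limit; the same command can also raise it)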
But I just want to confirm why Flash Cache is not useful in this type of environment.
Also, I believe the total number of files on the filer doesn't matter by itself. What matters is the number of files (inodes) for which lookup requests are generated.
Does anyone know how we can find the number of lookup requests handled by the filer?
Please comment on that.
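For reference, one way to sample this on a 7-Mode filer is sketched below, assuming the standard nfsstat counters (the same counters the per-operation tables above were taken from); the exact behaviour is worth confirming on your release:

  nfsstat -z       (zero the per-operation NFS counters)
  (wait for a representative interval)
  nfsstat          (re-read the counters; the lookup and readdir+ columns show how many lookup and directory-scan requests arrived in the interval)
  stats show nfsv3:nfs:nfsv3_op_latency   (average latency per NFSv3 operation, as quoted in the analysis above)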
Hi guys,
Still waiting for a reply.
Can someone please comment on this?
Best Regards
Pulkit Kaul.
This sounds like a support-related question. Performance issues are rather difficult to resolve via the communities; I suggest you open a case with NetApp Technical Support.
Regards,
Christine