ONTAP Hardware

Performance issue caused by a high number of files on the filer

pulkitkaul
5,125 Views

Hi Team,

I am facing a performance issue in one of our NetApp clusters.

I have checked and found that filers holding a large number of files/directories (millions of files) have slower response times than filers holding fewer files.

We recently purchased a FAS 3270, which also holds millions of files and is responding slowly.

I have a few questions:

Is this issue caused by the total number of files stored on a filer, or by the number of files held in memory?

In fact, I increased the read cache by installing a 520 GB Flash Cache module, but it did not help.

Is there any way to see how many of the total files are actually being used?

Does anyone know the root cause of this issue?

Please help me resolve this issue, which has a big impact on our environment.

Best Regards

Pulkit Kaul


scottgelb

Do you see a lot of "other" IOPs? Is the Flash Cache in default mode (data and metadata)? It might help to set it to metadata-only mode if there are a lot of directory and metadata lookups. It would be interesting to have a perfstat analyzed to see the performance.

pulkitkaul

Hi, I have tried both modes but still have not had any success.

I have generated the perfstat output, which clearly shows that the number of files on filer dlhfs15 is much higher than on filer bnrfs04.

Below is the analysis that NetApp provided:

Hello Pulkit,

I checked both perfstats and, as suspected, the high file
count is the main issue. Below is the detailed analysis.

Volume

volume:fs04ag1v1:avg_latency:18.53us - BNRFS04

volume:fs15ag5v2:avg_latency:311.90us - DLHFS15 (the latency on this volume is way higher than on BNRFS04)

nfsv3:nfs:nfsv3_op_latency.readdirplus:64.66us - BNRFS04

nfsv3:nfs:nfsv3_op_latency.readdirplus:6041.57us - DLHFS15

BNRFS04

Server nfs V3: (2824081 calls)
null         getattr      setattr      lookup       access       readlink     read
2 0%         2210077 78%  24892 1%     241781 9%    115522 4%    0 0%         24381 1%
write        create       mkdir        symlink      mknod        remove       rmdir
106985 4%    66943 2%     6200 0%      298 0%       0 0%         18853 1%     412 0%
rename       link         readdir      readdir+     fsstat       fsinfo       pathconf
2238 0%      0 0%         0 0%         5440 0%      55 0%        2 0%         0 0%

DLHFS15

Server nfs V3: (1309331 calls)
null         getattr      setattr      lookup       access       readlink     read
38 0%        379717 29%   43240 3%     110999 8%    127401 10%   516 0%       114686 9%
write        create       mkdir        symlink      mknod        remove       rmdir
190382 15%   46584 4%     5460 0%      454 0%       0 0%         105022 8%    27123 2%
rename       link         readdir      readdir+     fsstat       fsinfo       pathconf
16870 1%     1268 0%      252 0%       138377 11%   881 0%       56 0%        5 0%
commit
0 0%
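For anyone who wants to compare the two operation mixes, here is a minimal Python sketch. It assumes the call counts are copied by hand from the nfsstat output quoted above; the `share` helper and the choice of which operations count as "metadata" are my own illustrative assumptions, not NetApp definitions.

```python
# Per-operation NFSv3 call counts copied from the nfsstat output quoted above.
BNRFS04 = {
    "null": 2, "getattr": 2210077, "setattr": 24892, "lookup": 241781,
    "access": 115522, "readlink": 0, "read": 24381,
    "write": 106985, "create": 66943, "mkdir": 6200, "symlink": 298,
    "mknod": 0, "remove": 18853, "rmdir": 412,
    "rename": 2238, "link": 0, "readdir": 0, "readdir+": 5440,
    "fsstat": 55, "fsinfo": 2, "pathconf": 0,
}
DLHFS15 = {
    "null": 38, "getattr": 379717, "setattr": 43240, "lookup": 110999,
    "access": 127401, "readlink": 516, "read": 114686,
    "write": 190382, "create": 46584, "mkdir": 5460, "symlink": 454,
    "mknod": 0, "remove": 105022, "rmdir": 27123,
    "rename": 16870, "link": 1268, "readdir": 252, "readdir+": 138377,
    "fsstat": 881, "fsinfo": 56, "pathconf": 5, "commit": 0,
}

# Assumption: these operations are treated as "metadata" for this comparison.
METADATA_OPS = ("getattr", "lookup", "access", "readdir", "readdir+")
DIRECTORY_SCAN_OPS = ("readdir", "readdir+")


def share(counts, ops):
    """Return the percentage of all calls made up by the given operations."""
    total = sum(counts.values())
    return 100.0 * sum(counts[op] for op in ops) / total


for name, counts in (("BNRFS04", BNRFS04), ("DLHFS15", DLHFS15)):
    print(name, "total calls:", sum(counts.values()))
    print("  lookup calls:", counts["lookup"])
    print("  directory scans: %.1f%%" % share(counts, DIRECTORY_SCAN_OPS))
    print("  metadata ops:    %.1f%%" % share(counts, METADATA_OPS))
```

Note that the per-operation lookup count is already part of this output (the `lookup` column), so the number of lookup requests served by each filer can be read directly from nfsstat.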

nfsv3:nfs:nfsv3_op_latency.readdir:0us - BNRFS04 directory scanning

nfsv3:nfs:nfsv3_op_latency.readdirplus:64.66us - BNRFS04 directory scanning

nfsv3:nfs:nfsv3_op_latency.readdir:1132.03us - DLHFS15 directory scanning

nfsv3:nfs:nfsv3_op_latency.readdirplus:6041.57us - DLHFS15 directory scanning

This build does a lot of directory scanning (11% of the operations are directory scans), and the latency for these operations on DLHFS15 is much higher than on BNRFS04.
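To put the directory-scan latencies into proportion, a small Python sketch like the one below can help. The numbers are taken from the counters and nfsstat output in this post; multiplying calls by average latency is only a rough estimate and assumes both counters cover the same sampling interval, which the post does not confirm. The helper names are hypothetical.

```python
# Average NFSv3 directory-scan latencies in microseconds, from the counters above.
LATENCY_US = {
    "BNRFS04": {"readdir": 0.0, "readdirplus": 64.66},
    "DLHFS15": {"readdir": 1132.03, "readdirplus": 6041.57},
}

# readdir+ call counts from the nfsstat output for each filer.
READDIRPLUS_CALLS = {"BNRFS04": 5440, "DLHFS15": 138377}


def slowdown(op):
    """Ratio of DLHFS15 latency to BNRFS04 latency, or None if the baseline is zero."""
    baseline = LATENCY_US["BNRFS04"][op]
    if baseline == 0:
        return None  # BNRFS04 reported 0us (no readdir calls), so no ratio exists
    return LATENCY_US["DLHFS15"][op] / baseline


def readdirplus_seconds(filer):
    """Rough total time spent in readdir+ (calls x average latency), in seconds."""
    return READDIRPLUS_CALLS[filer] * LATENCY_US[filer]["readdirplus"] / 1e6


print("readdirplus slowdown: %.1fx" % slowdown("readdirplus"))
for filer in ("BNRFS04", "DLHFS15"):
    print("%s: about %.1f s spent in readdir+" % (filer, readdirplus_seconds(filer)))
```

The point of the estimate is that DLHFS15 is slower per call and also issues far more readdir+ calls, so the two effects multiply.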

BNRFS04

Filesystem             iused      ifree  %iused  Mounted on
/vol/fs04ag1v1/       934210   30942479      3%  /vol/fs04ag1v1/
/vol/fs04ag1v2/      3611657   28265032     11%  /vol/fs04ag1v2/
/vol/fs04ag2v1/      4833894   27042795     15%  /vol/fs04ag2v1/
/vol/vol0/            224695    7229982      3%  /vol/vol0/

Total number of files 9604456

DLHFS15

Filesystem             iused      ifree  %iused  Mounted on
/vol/fs15ag6v1/     18732711   41267274     31%  /vol/fs15ag6v1/
/vol/fs15ag2v1/     36341917   25534764     59%  /vol/fs15ag2v1/
/vol/fs15ag5v1/     18479935   13396754     58%  /vol/fs15ag5v1/
/vol/fs15ag3v1/      2270413   37729577      6%  /vol/fs15ag3v1/
/vol/fs15ag6v2/      5215281   26661408     16%  /vol/fs15ag6v2/
/vol/fs15ag2v2/      9713544   43286444     18%  /vol/fs15ag2v2/
/vol/fs15ag5v2/     21013190   10863499     66%  /vol/fs15ag5v2/
/vol/fs15ag1v1/     23117514   27013924     46%  /vol/fs15ag1v1/
/vol/fs15ag4v1/     13323666   18553023     42%  /vol/fs15ag4v1/
/vol/vol0/              7319    3451513      0%  /vol/vol0/
/vol/fs15ag4v2/     11803990   19325591     38%  /vol/fs15ag4v2/
/vol/fs15ag1v3/     12170159   19706530     38%  /vol/fs15ag1v3/
/vol/fs15ag1v2/     18837357   21162633     47%  /vol/fs15ag1v2/

Total number of files 191026996
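The totals above can be re-derived from the per-volume inode counts. The following Python sketch assumes the rows are copied by hand from the inode usage output in this post; `total_used` and `percent_used` are hypothetical helper names used only for illustration.

```python
# (volume, iused, ifree) rows copied from the inode usage output quoted above.
BNRFS04 = [
    ("/vol/fs04ag1v1/", 934210, 30942479),
    ("/vol/fs04ag1v2/", 3611657, 28265032),
    ("/vol/fs04ag2v1/", 4833894, 27042795),
    ("/vol/vol0/", 224695, 7229982),
]
DLHFS15 = [
    ("/vol/fs15ag6v1/", 18732711, 41267274),
    ("/vol/fs15ag2v1/", 36341917, 25534764),
    ("/vol/fs15ag5v1/", 18479935, 13396754),
    ("/vol/fs15ag3v1/", 2270413, 37729577),
    ("/vol/fs15ag6v2/", 5215281, 26661408),
    ("/vol/fs15ag2v2/", 9713544, 43286444),
    ("/vol/fs15ag5v2/", 21013190, 10863499),
    ("/vol/fs15ag1v1/", 23117514, 27013924),
    ("/vol/fs15ag4v1/", 13323666, 18553023),
    ("/vol/vol0/", 7319, 3451513),
    ("/vol/fs15ag4v2/", 11803990, 19325591),
    ("/vol/fs15ag1v3/", 12170159, 19706530),
    ("/vol/fs15ag1v2/", 18837357, 21162633),
]


def total_used(rows):
    """Sum the used inodes (iused) over all volumes of a filer."""
    return sum(iused for _, iused, _ in rows)


def percent_used(iused, ifree):
    """Share of a volume's inodes that are in use, as a percentage."""
    return 100.0 * iused / (iused + ifree)


print("BNRFS04 files:", total_used(BNRFS04))
print("DLHFS15 files:", total_used(DLHFS15))
print("ratio: %.1fx" % (total_used(DLHFS15) / total_used(BNRFS04)))

# List the DLHFS15 volumes from fullest to emptiest by inode usage.
for volume, iused, ifree in sorted(
    DLHFS15, key=lambda row: percent_used(row[1], row[2]), reverse=True
):
    print("%-18s %5.1f%%" % (volume, percent_used(iused, ifree)))
```

The sums agree with the totals stated in the analysis (9604456 and 191026996), so DLHFS15 holds roughly twenty times as many files as BNRFS04.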

The number of files on BNRFS04 is 9604456, whereas on DLHFS15 it is
191026996. This is the root cause. The solution for this problem is to have a
system with more memory. We have already tried different PAM configurations on
the FAS6040, but its caching is not helping with your workload. This is the PAM
configuration that we tried in your environment:

  1. flexscale.enable              on   (same value in local+partner recommended)
  2. flexscale.lopri_blocks        off  (same value in local+partner recommended)
  3. flexscale.normal_data_blocks  off  (same value in local+partner recommended)
  4. flexscale.pcs_high_res        off  (same value in local+partner recommended)
  5. flexscale.pcs_size            0GB  (same value in local+partner recommended)
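As far as I recall from NetApp's Flash Cache documentation for 7-Mode, the combination of `flexscale.normal_data_blocks` and `flexscale.lopri_blocks` selects the caching mode, and the settings listed above correspond to metadata-only caching. The Python sketch below encodes that mapping; treat the mapping itself as an assumption to verify against the documentation for your Data ONTAP release, and note that `flash_cache_mode` is a hypothetical helper, not a NetApp tool.

```python
def flash_cache_mode(options):
    """Classify a Flash Cache configuration from its flexscale.* option values.

    Assumed mapping (verify against NetApp documentation):
      normal_data_blocks=on,  lopri_blocks=off -> normal user data caching (default)
      normal_data_blocks=on,  lopri_blocks=on  -> low-priority data caching
      normal_data_blocks=off, lopri_blocks=off -> metadata only
    """
    if options.get("flexscale.enable") != "on":
        return "disabled"
    normal = options.get("flexscale.normal_data_blocks") == "on"
    lopri = options.get("flexscale.lopri_blocks") == "on"
    if normal and lopri:
        return "low-priority data caching"
    if normal:
        return "normal user data caching (default)"
    if not lopri:
        return "metadata only"
    return "unrecognized combination"


# The option values that were tried, as listed above.
TRIED = {
    "flexscale.enable": "on",
    "flexscale.lopri_blocks": "off",
    "flexscale.normal_data_blocks": "off",
    "flexscale.pcs_high_res": "off",
    "flexscale.pcs_size": "0GB",
}

print(flash_cache_mode(TRIED))
```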

But I just want to confirm why Flash Cache is not useful in this type of environment.

Also, I believe the number of files on the filer does not matter. What matters is the number of files or inodes for which lookup requests are generated.

Does anyone know how to find the number of lookup requests generated on the filer?

Please comment on that.

pulkitkaul

Hi Guys,

I am still waiting for a reply.

Can someone please comment on this?

Best Regards

Pulkit Kaul.

christin

This sounds like a support-related question. Performance-related issues are rather difficult to resolve via the communities, so I suggest you open a case with NetApp Technical Support.

Regards,

Christine
