

Java file-list-directory-iter performance

michael_england

I'm running a Java web service that queries file attributes (creation, modification, and access times) using "file-list-directory-iter". It recurses into every directory it finds, spread across multiple threads. The process works well, but it puts quite a heavy load on the filer. Here's the code snippet:

 

// Build and send one listing request for this directory.
request = new NaElement("file-list-directory-iter");
request.addNewChild("path", "/vol/" + volumeName + path);
request.addNewChild("max-records", "65536");
response = server.invokeElem(request);

// Guard against a response without attributes-list (assumed possible, e.g. for an empty result).
NaElement fileInfo = response.getChildByName("attributes-list");
if (fileInfo != null) {
    List<NaElement> fileList = fileInfo.getChildren();
    for (NaElement element : fileList) {
        String fileType = element.getChildContent("file-type");
        String fileName = element.getChildContent("name");
        if ("directory".equalsIgnoreCase(fileType)) {
            // Skip self, parent, and snapshot entries; queue everything else for recursion.
            boolean skip = fileName.equals(".") || fileName.equals("..") || fileName.equals(".snapshot");
            if (!skip) {
                directoryList.addDirectory(fileName);
            }
        } else if ("file".equalsIgnoreCase(fileType)) {
            // Collect size and timestamp attributes for regular files.
            MyFile myFile = new MyFile();
            myFile.setFileName(fileName);
            myFile.setFileSize(Long.parseLong(element.getChildContent("file-size")));
            myFile.setBytesUsed(Long.parseLong(element.getChildContent("bytes-used")));
            myFile.setAccessTime(Long.parseLong(element.getChildContent("accessed-timestamp")));
            myFile.setCreateTime(Long.parseLong(element.getChildContent("creation-timestamp")));
            myFile.setModifiedTime(Long.parseLong(element.getChildContent("modified-timestamp")));
            directoryList.addFile(myFile);
        }
    }
}

The problem I'm seeing is that the load on the filer is about 2 IOPs for every file and directory. On a simple test directory with 60,000 files, that means 120,000 IOPs. It's nice and quick, but if I increase the worker thread count I can completely saturate all CPUs on a 6240 for a while, which isn't ideal.
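One way to keep the worker count from saturating the filer might be to cap how many listing calls are in flight at any moment. The following is only a minimal sketch of that idea, not the SDK's API: `DirectoryLister` and `ThrottledWalker` are hypothetical names, and `listSubdirectories` stands in for a single file-list-directory-iter call. It walks the tree one level at a time and uses the thread pool size as the concurrency cap.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for one file-list-directory-iter call on a single directory. */
interface DirectoryLister {
    /** Returns the full paths of the subdirectories directly under path. */
    List<String> listSubdirectories(String path) throws Exception;
}

final class ThrottledWalker {
    /**
     * Visits every directory under root, one tree level at a time, with at most
     * maxConcurrentCalls listing calls running at once. Returns all visited paths.
     */
    static List<String> walk(DirectoryLister lister, String root, int maxConcurrentCalls)
            throws Exception {
        // The pool size is the cap on simultaneous listing calls.
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrentCalls);
        List<String> visited = new ArrayList<>();
        try {
            List<String> level = Collections.singletonList(root);
            while (!level.isEmpty()) {
                List<Callable<List<String>>> calls = new ArrayList<>();
                for (String dir : level) {
                    calls.add(() -> lister.listSubdirectories(dir));
                }
                visited.addAll(level);
                // invokeAll blocks until every call for this level has finished.
                List<String> next = new ArrayList<>();
                for (Future<List<String>> result : pool.invokeAll(calls)) {
                    next.addAll(result.get());
                }
                level = next;
            }
        } finally {
            pool.shutdown();
        }
        return visited;
    }
}
```

Calling `ThrottledWalker.walk(lister, "/vol/" + volumeName, 4)` would then allow no more than four listing calls at a time. This trades wall-clock time for a lower peak load; it does not change the number of IOPs per file.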

 

I've also created a PowerShell script that does similar work, but over a CIFS connection rather than through the API. It needs about 1 IOP for every 10 files, so roughly 6,000 IOPs in total for 60,000 files, which is much more efficient. The problem is that PowerShell spends minutes churning through the data. I'd rather use the Java code because it offers more flexibility (security, NFS or CIFS) and I can run it as a REST-based web service rather than a Windows-only script. However, I'm not sure it will scale without seriously impacting performance on the arrays once it is deployed widely.
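To put the two approaches side by side, here is the arithmetic as a small sketch. The per-entry ratios are the rough figures observed above, not measured constants, and `IopEstimate` is just an illustrative name.

```java
/** Back-of-the-envelope IOP estimate using the ratios observed in this post. */
final class IopEstimate {
    /** Total IOPs for a listing, given the entry count and an observed IOPs-per-entry ratio. */
    static long totalIops(long entries, double iopsPerEntry) {
        return Math.round(entries * iopsPerEntry);
    }

    public static void main(String[] args) {
        long entries = 60_000;
        long api = totalIops(entries, 2.0);   // about 2 IOPs per entry via file-list-directory-iter
        long cifs = totalIops(entries, 0.1);  // about 1 IOP per 10 files via CIFS
        System.out.println("API IOPs:  " + api);
        System.out.println("CIFS IOPs: " + cifs);
        System.out.println("Ratio:     " + (api / cifs) + "x");
    }
}
```

For the 60,000-file test directory this gives 120,000 versus 6,000 IOPs, so the API path costs roughly 20 times as many IOPs for the same listing.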

 

Has anyone else worked with file-list-directory-iter, or is there a way to improve the Java code so that it has less impact on the array?
