<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Java file-list-directory-iter performance in Software Development Kit (SDK) and API Discussions</title>
    <link>https://community.netapp.com/t5/Software-Development-Kit-SDK-and-API-Discussions/Java-file-list-directory-iter-performance/m-p/100731#M1374</link>
    <description></description>
    <pubDate>Thu, 05 Jun 2025 05:04:38 GMT</pubDate>
    <dc:creator>michael_england</dc:creator>
    <dc:date>2025-06-05T05:04:38Z</dc:date>
    <item>
      <title>Java file-list-directory-iter performance</title>
      <link>https://community.netapp.com/t5/Software-Development-Kit-SDK-and-API-Discussions/Java-file-list-directory-iter-performance/m-p/100731#M1374</link>
      <description>&lt;P&gt;I'm running a Java web service that queries file attributes (mainly creation, modification, and access times) using "file-list-directory-iter". It runs recursively for each directory it finds, spread across multiple threads. The process works well, but it puts a heavy load on the filer. Here's the code snippet:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;request = new NaElement("file-list-directory-iter");
request.addNewChild("path", "/vol/"+volumeName+path);
request.addNewChild("max-records", "65536");
response = server.invokeElem(request);
NaElement fileInfo = response.getChildByName("attributes-list");
List&amp;lt;NaElement&amp;gt; fileList = fileInfo.getChildren();
for (NaElement element : fileList) {
    String fileType = element.getChildContent("file-type");
    String fileName = element.getChildContent("name");
    // Constant-first comparison avoids a NullPointerException if file-type is missing
    if ("directory".equalsIgnoreCase(fileType)) {
        if (fileName.equals(".") || fileName.equals("..") || fileName.equals(".snapshot")) {
            // Skip the self, parent, and snapshot entries; they must not be recursed into
        } else {
            // Record the subdirectory so it can be listed recursively
            directoryList.addDirectory(fileName);
        }
    } else if ("file".equalsIgnoreCase(fileType)) {
        // Regular file: copy the size and timestamp attributes from the response
        MyFile myFile = new MyFile();
        myFile.setFileName(fileName);
        myFile.setFileSize(Long.parseLong(element.getChildContent("file-size")));
        myFile.setBytesUsed(Long.parseLong(element.getChildContent("bytes-used")));
        myFile.setAccessTime(Long.parseLong(element.getChildContent("accessed-timestamp")));
        myFile.setCreateTime(Long.parseLong(element.getChildContent("creation-timestamp")));
        myFile.setModifiedTime(Long.parseLong(element.getChildContent("modified-timestamp")));
        directoryList.addFile(myFile);
    }
}&lt;/PRE&gt;&lt;P&gt;The problem I'm seeing is that the load on the filer is about 2 IOPs for every file and directory. This means that a simple test directory with 60,000 files takes about 120,000 IOPs. It's nice and quick, but if I increase the worker thread count I can completely saturate all CPUs on a 6240 for a while, which isn't ideal.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I've also written a PowerShell script that does similar work over a CIFS connection rather than through the API. It seems to need about 1 IOP for every 10 files, so about 6,000 IOPs in total for 60,000 files, which is much more efficient. The problem is that PowerShell spends minutes churning through the data. I'd rather use the Java code because it offers more flexibility (security, NFS or CIFS) and I can run it as a REST-based web service rather than a Windows-only script, but I'm not sure it will scale without seriously impacting performance on the arrays.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Has anyone else worked with file-list-directory-iter, or is there a way to improve the Java code so that it has less impact on the array?&lt;/P&gt;</description>
      <pubDate>Thu, 05 Jun 2025 05:04:38 GMT</pubDate>
      <guid>https://community.netapp.com/t5/Software-Development-Kit-SDK-and-API-Discussions/Java-file-list-directory-iter-performance/m-p/100731#M1374</guid>
      <dc:creator>michael_england</dc:creator>
      <dc:date>2025-06-05T05:04:38Z</dc:date>
    </item>
  </channel>
</rss>

