Tech ONTAP Blogs

FPolicy utilization and performance with NetApp Cloud Insights and Storage Workload Security

Kendall
NetApp

What is Storage Workload Security and why is it important?

While ransomware and cyberattacks may be top of mind for many, few would disagree that file auditing is indispensable in any SMB or NFS environment, and it is in these areas that Cloud Insights Storage Workload Security (SWS) is unmatched. SWS, known for its impressive anti-ransomware features, also offers user-centric, fully searchable auditing alongside forensic analysis and reporting. It goes well beyond ransomware detection and automated response, providing invaluable insights into user behavior and file actions. As a SaaS-based solution, SWS also delivers immediate value to businesses without any management overhead.

FPolicy, as many are aware, is the incredibly powerful framework behind many NetApp and 3rd party management and reporting capabilities for ONTAP. As is always the case, this level of inspection comes at a price, and, especially at scale, FPolicy must be managed to ensure that it performs properly. SWS relies on FPolicy data, and in conversations with customers there is occasionally some trepidation about adding another FPolicy-based product to the environment. To address those concerns, this article discusses three key steps to managing and maintaining FPolicy utilization using NetApp Cloud Insights.

  1. Monitor FPolicy performance using a Cloud Insights dashboard
  2. Set up log alerts for precursors to potential FPolicy performance issues
  3. Use lightweight tools such as Storage Workload Security to accomplish critical tasks with negligible FPolicy performance impact

NetApp Cloud Insights FPolicy dashboards

NetApp Cloud Insights offers detailed metrics and visualizations for FPolicy, such as the number of file operations monitored, the number of notifications sent to third-party applications, and the time taken to process these notifications. If there is a sudden spike in file operations or a slowdown in processing times, Cloud Insights can alert you, allowing you to investigate and resolve the issue promptly. By quickly identifying anything that affects your storage system's performance, this monitoring helps ensure that FPolicy continues to operate efficiently.
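Cloud Insights performs this detection for you. Purely to illustrate what flagging a sudden spike means, the following minimal Python sketch assumes you have exported a series of per-interval samples, such as FPolicy notification processing times or monitored file-operation counts. The function name `find_spikes`, the five-sample window, and the 2x factor are hypothetical choices for illustration, not Cloud Insights settings or its actual algorithm.

```python
from statistics import mean


def find_spikes(samples: list[float], window: int = 5, factor: float = 2.0) -> list[int]:
    """Return the indices of samples that exceed `factor` times the mean of
    the preceding `window` samples (a simple rolling-baseline spike test).

    `window` and `factor` are hypothetical tuning knobs for illustration.
    """
    spikes = []
    for i in range(window, len(samples)):
        baseline = mean(samples[i - window:i])  # average of the previous window
        if baseline > 0 and samples[i] > factor * baseline:
            spikes.append(i)
    return spikes


if __name__ == "__main__":
    # Made-up per-minute processing times (ms), for illustration only
    print(find_spikes([2, 2, 3, 2, 2, 2, 9, 2]))
```

For the made-up series above, the function returns `[6]`: the jump to 9 ms stands out against a baseline of about 2.2 ms, while the return to 2 ms afterwards does not.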

Cloud Insights can track FPolicy latency and correlate it with spikes in IOPS or cluster CPU utilization, so you can quickly and accurately home in on potential issues. In a few short steps you can be on your way with a basic, fully customizable dashboard (below): log in to your Cloud Insights tenant, click Explore -> All Dashboards, select + From Gallery at the top right of the screen, and then select ONTAP – FPolicy Troubleshooting.

[Screenshot: ONTAP – FPolicy Troubleshooting dashboard]

From here you can easily and immediately identify any offenders and address them appropriately.

FPolicy EAGAIN write errors

So now you have your FPolicy dashboards ready to go and set up the way you like them. What's next? Fortunately, Cloud Insights has us covered there as well. Identifying and solving an issue quickly is great, but what if you could predict potential issues and avoid the angry phone call altogether?

EAGAIN errors, which indicate that a resource is temporarily unavailable, can be a valuable early warning sign of potential FPolicy performance issues. EAGAIN is a standard UNIX error code, typically returned when an operation cannot complete immediately because a limit has been reached, such as a full buffer. In the context of FPolicy, EAGAIN errors might occur if the FPolicy server or the external FPolicy engine is overloaded and cannot accept more data. This could happen, for example, if there is a sudden surge in file operations that need to be monitored, if the SVM TCP send buffer becomes full, or if the FPolicy engine is slow to process notifications.
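To make the buffer-full case concrete, here is a minimal, generic Python sketch of how EAGAIN surfaces on POSIX systems. It is not ONTAP code: a non-blocking socket stands in for the SVM's connection, and a peer that never reads stands in for an overloaded FPolicy server. Once the kernel buffers fill, the next write fails with EAGAIN, which Python raises as `BlockingIOError`. On Linux, `socket.socketpair()` returns connected UNIX-domain sockets rather than TCP, but the back-pressure behavior is analogous.

```python
import errno
import socket


def fill_send_buffer() -> tuple[int, int]:
    """Write to a non-blocking socket whose peer never reads until the kernel
    refuses more data. Returns (bytes_accepted, errno_seen)."""
    sender, receiver = socket.socketpair()  # receiver plays the "slow server"
    sender.setblocking(False)               # fail instead of waiting when full
    chunk = b"x" * 4096
    accepted = 0
    try:
        while True:
            try:
                accepted += sender.send(chunk)
            except BlockingIOError as exc:
                # Buffers are full: the kernel reports EAGAIN (EWOULDBLOCK)
                return accepted, exc.errno
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    accepted, err = fill_send_buffer()
    print(f"accepted {accepted} bytes, then errno={errno.errorcode[err]}")
```

Nothing is wrong with the connection itself; the write is simply refused until the reader catches up, which is why a burst of these errors is a sign of back-pressure rather than outright failure.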

Monitoring for EAGAIN errors can therefore help predict FPolicy performance issues. An increase in these errors could indicate that the FPolicy system is under stress and might soon struggle to keep up with the workload, which could lead to slower response times and the aforementioned phone call.

In addition to monitoring FPolicy performance on Cloud Insights dashboards, we can create a log monitor alert that acts as our proverbial canary in the coal mine. When it encounters multiple EAGAIN log messages, Cloud Insights sends an alert indicating that FPolicy is approaching the point where it affects system performance, so we can begin investigating the cause before we experience the symptoms. The steps to add this alert are:

  1. Log in to Cloud Insights. Under Observability, select Alerts -> Manage Monitors.
  2. At the top right of the window, click +Monitor -> Log Monitor.
  3. Set the log source to logs.netapp.ems. Click the plus sign next to filter by, type “message”, and select it from the search list. Then type “eagain” as the search parameter in the box to the right.
  4. Enter ems.cluster_name in Group By to quickly identify the cluster that logged the message.
  5. Several alert behaviors are available; a good starting point is a warning at 3 occurrences within 5 minutes.
  6. Complete the rest of the form according to your preferences and policies.
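Cloud Insights evaluates the monitor for you, but the logic these settings describe is easy to state. The following minimal Python sketch assumes EMS records arrive in timestamp order and reduces each one to a hypothetical `EmsRecord` with a time, a cluster name (standing in for ems.cluster_name), and a message. It filters on messages containing eagain, groups by cluster, and warns once a cluster logs 3 matches within 5 minutes. It illustrates the form settings only; it is not how Cloud Insights implements log monitors.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class EmsRecord:
    """Hypothetical, simplified view of one EMS log entry."""
    timestamp: datetime
    cluster_name: str  # stands in for ems.cluster_name
    message: str       # stands in for the EMS message field


class EagainMonitor:
    """Warn when one cluster logs `threshold` or more EAGAIN messages
    within `window` (defaults mirror the suggested 3 in 5 minutes)."""

    def __init__(self, threshold: int = 3, window: timedelta = timedelta(minutes=5)):
        self.threshold = threshold
        self.window = window
        self._hits = defaultdict(deque)  # cluster name -> timestamps of matches

    def observe(self, record: EmsRecord) -> bool:
        """Feed one record (in timestamp order); return True if a warning fires."""
        # Filter: message contains "eagain", case-insensitively
        if "eagain" not in record.message.lower():
            return False
        # Group By: keep a separate sliding window per cluster
        hits = self._hits[record.cluster_name]
        hits.append(record.timestamp)
        # Discard matches older than the window
        while record.timestamp - hits[0] > self.window:
            hits.popleft()
        return len(hits) >= self.threshold


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0)
    monitor = EagainMonitor()
    for minute in (0, 1, 3):
        record = EmsRecord(t0 + timedelta(minutes=minute), "cluster1", "hypothetical EAGAIN message")
        print(minute, monitor.observe(record))
```

In the usage example, the third matching record (at minute 3) is the one that returns True. Grouping by cluster matters: three EAGAIN messages spread across three different clusters never trigger the warning.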

An ounce of prevention

Now that we are ready to predict potential issues and troubleshoot them should they arise, let's focus on using efficient tooling to minimize the burden on FPolicy in the first place. Native and 3rd party FPolicy-based tools should be rolled out with the overall performance of the ONTAP landscape in mind. Running FPolicy synchronously, for example, can increase service times, because each file operation must wait for the FPolicy server to respond before ONTAP completes it. Likewise, 3rd party tools should be managed to ensure they do not induce performance issues.
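A back-of-the-envelope model helps show why. The following Python sketch is a deliberate simplification with hypothetical names and parameters: in synchronous mode the client's operation is held until the FPolicy server replies, so the round trip and the server's processing time add to every operation, while in asynchronous mode they do not. The second function shows a catch that applies to both modes: if notifications arrive faster than the server can process them (modeled here, as an assumption, as a single worker needing a fixed time per notification), a backlog builds, which is the kind of condition that can fill send buffers and produce EAGAIN errors.

```python
from dataclasses import dataclass


@dataclass
class FpolicyCostModel:
    """Toy per-operation latency model; all parameters are hypothetical inputs."""
    base_io_ms: float     # time to serve the file operation itself
    round_trip_ms: float  # network round trip between the SVM and the FPolicy server
    server_ms: float      # FPolicy server processing time per notification

    def client_latency_ms(self, synchronous: bool) -> float:
        if synchronous:
            # The operation is held until the FPolicy server responds
            return self.base_io_ms + self.round_trip_ms + self.server_ms
        # Asynchronous: the notification is sent, but the operation does not wait
        return self.base_io_ms


def backlog_after(seconds: float, notifications_per_s: float, server_ms: float) -> float:
    """Notifications left unprocessed after `seconds`, assuming (hypothetically)
    a single worker that needs `server_ms` per notification."""
    capacity_per_s = 1000.0 / server_ms
    return max(0.0, (notifications_per_s - capacity_per_s) * seconds)


if __name__ == "__main__":
    model = FpolicyCostModel(base_io_ms=0.5, round_trip_ms=0.2, server_ms=1.0)
    print("sync :", model.client_latency_ms(synchronous=True), "ms")
    print("async:", model.client_latency_ms(synchronous=False), "ms")
    print("backlog after 10 s at 1500 notifications/s:", backlog_after(10, 1500, 1.0))
```

With these made-up numbers, a server that handles 1,000 notifications per second keeps up with 500 per second indefinitely but falls 5,000 notifications behind after 10 seconds at 1,500 per second. Asynchronous mode hides server latency from clients; it does not remove the need for the server to keep pace.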

Storage Workload Security operates in an impressively lightweight, asynchronous manner, ensuring minimal impact on FPolicy system performance. It also generates invaluable, immutable records that are retained for 13 months, providing a substantial archive for data auditing and forensics, and it applies cloud-based AI/ML user behavior analytics to that data to provide early detection of insider threats.

Storage Workload Security is a turnkey solution, integrated with Cloud Insights, that is designed to make the most efficient possible use of FPolicy data. When FPolicy is heavily used within an environment, it is always important to monitor its performance and, where practical, predict potential issues. The best defense is a good offense: SaaS-based tools such as SWS provide not only an excellent line of defense against insider threats and ransomware but also an extremely powerful and lightweight forensic and auditing capability.

For more information and a free trial of Cloud Insights and Storage Workload Security, visit https://cloud.netapp.com.
