2 weeks ago
For some weeks I have observed that many Brocade switches in several of my environments are failing performance collection. The error shown in the datasource and the log file is:
"Failed to get performance for switch x.x.x.x VF 120 Error indication in response: A no access error occurred.
"Access Control List on SNMP may be configured and Acquisition Unit server's ip is not permitted by it, so it is not allowed to make SNMP requests."
Quick help on this is much appreciated.
2 weeks ago
Have virtual fabrics been enabled for the first time on these switches?
The error message is basically stating that OCI is being denied when it tries to obtain performance statistics on ports in VF 120.
You will need to be using SNMPv3 to collect performance from these switches. It is impossible to obtain statistics on ports in non-default VFs via SNMP v2.
If you are using SNMPv3, I would recommend looking at which username the datasource uses and how that user is configured on the switch. It is possible that someone built an RBAC-style, least-privilege user account, and that VF 120 was introduced later without being added to the list of VFs that your datasource's user account is permitted to access.
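To see which switch and VF combinations are being denied before comparing them against the user's permitted VF list, the failures can be pulled out of the log. A minimal sketch in Python, assuming the log lines look like the message quoted in the first post; the function name `affected_vfs` and the sample addresses are made up for illustration.

```python
import re
from collections import defaultdict

# Pattern based on the message quoted in this thread; the real log may differ.
NO_ACCESS = re.compile(
    r"Failed to get performance for switch (?P<switch>\S+) VF (?P<vf>\d+)"
)

def affected_vfs(lines):
    """Map each switch address to the sorted list of VF IDs that failed."""
    found = defaultdict(set)
    for line in lines:
        match = NO_ACCESS.search(line)
        if match:
            found[match.group("switch")].add(int(match.group("vf")))
    return {switch: sorted(vfs) for switch, vfs in found.items()}

if __name__ == "__main__":
    # Hypothetical sample lines; replace with the lines of your own log file.
    sample = [
        "Failed to get performance for switch 192.0.2.10 VF 120 Error indication in response: A no access error occurred.",
        "Failed to get performance for switch 192.0.2.11 VF 120 Error indication in response: A no access error occurred.",
    ]
    for switch, vfs in affected_vfs(sample).items():
        print(switch, vfs)
```

If every failing entry is a non-default VF, that would fit the theory that the SNMP user simply has no access to those VFs.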
2 weeks ago
I checked with the customer and they say that nothing has been changed on the switches.
BTW, on another switch this is the error. I am sure it has nothing to do with the SNMP version or user:
2017-07-10 13:18:09,614 ERROR [com.netapp.oci.platform.common.interfaces.session.SessionCache] Session Cache - Failed to communicate with the server ( PerformanceApiRemote) - unrecoverable error: Failed to store performance samples for dataSource: #181, type: port, key: 20:00:00:0D:EC:3A:99:C0cause: class com.datastax.driver.core.exceptions.WriteTimeoutException:Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)
java.lang.RuntimeException: Failed to store performance samples for dataSource: #181, type: port, key: 20:00:00:0D:EC:3A:99:C0cause: class com.datastax.driver.core.exceptions.WriteTimeoutException:Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)
2 weeks ago
Those "Failed to store..." messages correlate strongly with Cassandra problems. They indicate that Acquisition did all of its work to obtain and process the data, but the Server told Acquisition that it was unable to store the data, so the performance poll functionally failed because the data had nowhere to go.
There are some Cassandra problems that only impact 1-n datasources, whereas there can also be systemic Cassandra problems where all datasources' performance packages fail with similar insertion messages.
Systemic Cassandra problems tend to occur on systems that are undersized, or that have virtual memory problems due to small, fixed-size Windows paging files. We strongly recommend that OCI servers have their paging files set to Windows-managed sizing.
It sounds like you may be experiencing one of the 1-n problems, but I'd like to see a bit more.
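A rough way to tell the 1-n case from the systemic case is to count the store failures per datasource in the server log. A minimal sketch in Python, assuming the lines follow the format of the error quoted earlier in this thread; the function name `store_failures_by_datasource` is made up for illustration, and you would feed it the lines of your own log file.

```python
import re
from collections import Counter

# Pattern based on the error quoted in this thread; adjust if your log differs.
STORE_FAILURE = re.compile(
    r"Failed to store performance samples for dataSource: #(?P<ds>\d+)"
)

def store_failures_by_datasource(lines):
    """Count 'Failed to store performance samples' lines per datasource ID."""
    counts = Counter()
    for line in lines:
        match = STORE_FAILURE.search(line)
        if match:
            counts[int(match.group("ds"))] += 1
    return counts
```

Note that the quoted log shows the same failure on both the ERROR line and the exception line, so the counts are only meaningful relative to each other. A handful of datasource IDs dominating the counts would point to the 1-n case, while failures spread across nearly all datasources would suggest a systemic problem.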
If this OCI instance is sending OCI ASUP, could you PM me the site name?
The OCI server has some cassandra-client.log files. If you zip them up and email them to me, I can take a look. However, there is a chance that the root cause, or the point at which the problem started, won't be captured, as these logs can roll over on large systems.