Active IQ Unified Manager Discussions
Collection on one of my 8.3.2P2 clusters stopped with the errors below logged. All other clusters seem to be fine. Has anyone seen this?
[2016-06-21 11:36:00] [NORMAL ] Poller status: status, secs=14400, api_time=8170, plugin_time=274, metrics=1978019, skips=587, fails=0
[2016-06-21 13:00:42] [WARNING] [nic_common] plugin failed to compile: Illegal division by zero at /opt/netapp-harvest/plugin/cdot-nic-common line 86.
[2016-06-21 13:00:42] [ERROR ] [nic_common] Restarting netapp-worker as an attempt to clear issue
[2016-06-21 13:00:42] [NORMAL ] WORKER STARTED [Version: 1.2.2] [Conf: netapp-harvest.conf] [Poller: ntap-cla01]
[2016-06-21 13:00:42] [NORMAL ] [main] Poller will monitor a [FILER] at [192.168.94.1:443]
[2016-06-21 13:00:42] [NORMAL ] [main] Poller will use [password] authentication with username [netapp-harvest] and password [**********]
[2016-06-21 13:00:43] [NORMAL ] [main] Collection of system info from [192.168.94.1] running [NetApp Release 8.3.2P2] successful.
[2016-06-21 13:00:43] [NORMAL ] [main] Using best-fit collection template: [cdot-8.3.0.conf]
[2016-06-21 13:00:43] [NORMAL ] [main] Using graphite_root [netapp.perf.springfield.ntap-cla01]
[2016-06-21 13:00:43] [NORMAL ] [main] Using graphite_meta_metrics_root [netapp.poller.perf.springfield.ntap-cla01]
[2016-06-21 13:00:43] [NORMAL ] [smb2:node] Collection of object not enabled; skipping
[2016-06-21 13:00:43] [NORMAL ] [smb2:vserver] Collection of object not enabled; skipping
[2016-06-21 13:00:43] [NORMAL ] [main] Startup complete. Polling for new data every [60] seconds.
[2016-06-21 13:02:39] [WARNING] [nic_common] plugin failed to compile: Illegal division by zero at /opt/netapp-harvest/plugin/cdot-nic-common line 86.
No, and I was about to push P2 to a DEV cluster. Was this running fine against P1?
It's working on other 8.3.2P2 clusters, and it had been working fine on this one for at least 2 weeks after we upgraded. Not sure why it stopped collecting.
FYI, in order to pull back any metrics I had to comment out these lines in "/opt/netapp-harvest/plugin/cdot-nic-common"
my $rx_pct = sprintf ("%.2f", $h{$start}{$port}{rx_bytes_per_sec} / $link_speed * 100 );
my $tx_pct = sprintf ("%.2f", $h{$start}{$port}{tx_bytes_per_sec} / $link_speed * 100 );
my $pct = sprintf ("%.2f", $tx_pct);
$pct = sprintf ("%.2f", $rx_pct) if ($rx_pct > $tx_pct);
push @emit_items, "$start.$port.rx_pct_util $rx_pct $timestamp";
push @emit_items, "$start.$port.tx_pct_util $tx_pct $timestamp";
push @emit_items, "$start.$port.link_pct_util $pct $timestamp";
I realize this is not a solution, but I need to collect something vs nothing and as I said, I only experienced this on one cluster. The others are fine. And it had been working previously after 8.3.2P2 upgrade. It's a 14 node NFS cluster. After a certain date, collection failed with [WARNING] [nic_common] plugin failed to compile: Illegal division by zero at /opt/netapp-harvest/plugin/cdot-nic-common.
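The division by zero comes from dividing by $link_speed when it is 0. As a rough illustration only (written in Python rather than the plugin's Perl, and using hypothetical function and argument names that are not part of Harvest), a guard that skips the utilization metrics for a port with no usable link speed, instead of failing the whole plugin, might look like this:

```python
from typing import Optional


def pct_util(bytes_per_sec: float, link_speed: float) -> Optional[str]:
    """Return utilization as a 2-decimal string, or None if link_speed is unusable.

    link_speed must be in the same unit as bytes_per_sec.
    Hypothetical helper for illustration, not part of Harvest.
    """
    if not link_speed or link_speed <= 0:
        # A port that reports no speed: skip it rather than divide by zero.
        return None
    return "%.2f" % (bytes_per_sec / link_speed * 100)


def port_metrics(rx: float, tx: float, link_speed: float) -> dict:
    """Build rx/tx/link utilization metrics, omitting them when they cannot be computed."""
    rx_pct = pct_util(rx, link_speed)
    tx_pct = pct_util(tx, link_speed)
    if rx_pct is None or tx_pct is None:
        return {}
    # Link utilization is the larger of the two directions, as in the plugin lines above.
    link_pct = rx_pct if float(rx_pct) > float(tx_pct) else tx_pct
    return {"rx_pct_util": rx_pct, "tx_pct_util": tx_pct, "link_pct_util": link_pct}
```

With this shape, the other nic_common metrics for that port can still be emitted while only the percentage metrics are dropped.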
Hi @dlmaldonado
There appears to be an issue with the link_speed counter value on some interface(s) on your cluster. My guess is either something changed in 8.3.2P2, or after the upgrade/reboot some unused interface didn't get a value set as it should (which could also be a new behavior in 8.3.2P2).
Can you restart the poller in verbose mode, wait for 5 minutes, and then restart it again in normal mode:
/opt/netapp-harvest/netapp-manager -restart -poller <clustername> -v
<wait 5 minutes>
/opt/netapp-harvest/netapp-manager -restart -poller <clustername>
Then provide the logfile in /opt/netapp-harvest/log/<poller>_netapp-harvest.log
From that log I can see what the incoming link_speed values are and hopefully explain why it's not working as it should.
I will also send you a private message in case you prefer to share the logs privately.
Cheers,
Chris Madden
Storage Architect, NetApp EMEA (and author of Harvest)
Blog: It all begins with data
If this post resolved your issue, please help others by selecting ACCEPT AS SOLUTION or adding a KUDO or both!
Hi
We are using Harvest to collect performance data from cDOT 8.2.3 and 8.3 clusters.
On Aug 23 we upgraded from 8.2.3 to 8.2.4P4, and after the upgrade the same error occurred.
In the netapp-dashboard-cluster dashboard in Grafana, eth port utilization is shown as greater than 3000 percent.
dlmaldonado wrote
"/opt/netapp-harvest/plugin/cdot-nic-common"
--------------------------------------------------------------------------------------------------------
my $rx_pct = sprintf ("%.2f", $h{$start}{$port}{rx_bytes_per_sec} / $link_speed * 100 );
my $tx_pct = sprintf ("%.2f", $h{$start}{$port}{tx_bytes_per_sec} / $link_speed * 100 );
my $pct = sprintf ("%.2f", $tx_pct);
$pct = sprintf ("%.2f", $rx_pct) if ($rx_pct > $tx_pct);
push @emit_items, "$start.$port.rx_pct_util $rx_pct $timestamp";
push @emit_items, "$start.$port.tx_pct_util $tx_pct $timestamp";
push @emit_items, "$start.$port.link_pct_util $pct $timestamp";
--------------------------------------------------------------------------------------------------------
How should this code be fixed?
Hi @hashiya1112
Actually, we resolved this offline. One of the ports had link up but at 10Mbit, and the plugin logic was not able to convert this correctly. I have added a fix and it will ship in the next Harvest release on the toolchest. In the meantime, perhaps you can find the port(s) that are online at 10Mbit and set them to 100Mbit or faster?
Cheers,
Chris Madden
Hi Chris
Thank you for the reply.
There is an offline port, but none of the online ports is at 10Mbit.
Should I change it with the network port modify command?
I tried editing nic_common as follows:
{
    $link_speed = 1.25 if ($h{$start}{$port}{link_speed} == 10000000 ); #10Mbit
    $link_speed = 12.5 if ($h{$start}{$port}{link_speed} == 100000000 ); #100Mbit
    $link_speed = 125 if ($h{$start}{$port}{link_speed} == 1000000000 ); #1Gbit
    $link_speed = 1250 if ($h{$start}{$port}{link_speed} == 10000000000 ); #10Gbit
}
elsif ($connection{normalized_xfer} eq 'kb_per_sec')
{
    $link_speed = 1250 if ($h{$start}{$port}{link_speed} == 10000000 ); #10Mbit
    $link_speed = 12500 if ($h{$start}{$port}{link_speed} == 100000000 ); #100Mbit
    $link_speed = 125000 if ($h{$start}{$port}{link_speed} == 1000000000 ); #1Gbit
    $link_speed = 1250000 if ($h{$start}{$port}{link_speed} == 10000000000 ); #10Gbit
}
elsif ($connection{normalized_xfer} eq 'b_per_sec')
{
    $link_speed = 1250000 if ($h{$start}{$port}{link_speed} == 10000000 ); #10Mbit
    $link_speed = 12500000 if ($h{$start}{$port}{link_speed} == 100000000 ); #100Mbit
    $link_speed = 125000000 if ($h{$start}{$port}{link_speed} == 1000000000 ); #1Gbit
    $link_speed = 1250000000 if ($h{$start}{$port}{link_speed} == 10000000000 ); #10Gbit
}
elsif ($connection{normalized_xfer} eq 'gb_per_sec')
{
    $link_speed = .00125 if ($h{$start}{$port}{link_speed} == 10000000 ); #10Mbit
    $link_speed = .0125 if ($h{$start}{$port}{link_speed} == 100000000 ); #100Mbit
    $link_speed = .125 if ($h{$start}{$port}{link_speed} == 1000000000 ); #1Gbit
    $link_speed = 1.25 if ($h{$start}{$port}{link_speed} == 10000000000 ); #10Gbit
}
The error no longer appears, but the calculated eth port utilization percentage became strange:
e0M (the node management port) shows 3820 percent utilization,
and e0M is a 100Mbit port.
Hmm....
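As a sanity check on the 3820 percent figure: a 100Mbit port can carry at most 12.5 MB/s, so that reading would imply roughly 477.5 MB/s on e0M, which the link cannot physically carry. A small Python sketch of that arithmetic (variable names are illustrative only, and MB is taken as 10^6 bytes to match the table above):

```python
LINK_SPEED_BITS = 100_000_000  # 100Mbit port such as e0M
capacity_mb_per_sec = LINK_SPEED_BITS / 8 / 1_000_000  # line rate in MB/s

reported_pct = 3820
implied_mb_per_sec = reported_pct / 100 * capacity_mb_per_sec

# A real port cannot exceed its line rate, so a value above 100 percent
# points at the inputs or the calculation rather than at actual traffic.
plausible = implied_mb_per_sec <= capacity_mb_per_sec
```

So the number is better read as a symptom of bad input values or a bad conversion than as a genuinely saturated port.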
Hi @hashiya1112
Maybe give this a try:
my $link_speed = 1;
if ($connection{normalized_xfer} eq 'mb_per_sec')
{
    $link_speed = $h{$start}{$port}{link_speed} / 8000000;
}
elsif ($connection{normalized_xfer} eq 'kb_per_sec')
{
    $link_speed = $h{$start}{$port}{link_speed} / 8000;
}
elsif ($connection{normalized_xfer} eq 'b_per_sec')
{
    $link_speed = $h{$start}{$port}{link_speed} / 8;
}
elsif ($connection{normalized_xfer} eq 'gb_per_sec')
{
    $link_speed = $h{$start}{$port}{link_speed} / 8000000000;
}
next if ($link_speed == 1); # Skip posting utilization if we couldn't normalize
If you still see strange utilization values, check higher in this post for instructions on how to collect the logs needed to understand what is happening, and send me those logs in a private message.
Cheers,
Chris Madden
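The formula-based conversion in the reply above replaces the per-speed lookup with a single division, so any reported link speed (not only the four listed values) can be normalized. A Python sketch of the same arithmetic, assuming link_speed is reported in bits per second; the dictionary and function names are made up for illustration, and unlike the Perl snippet this sketch also treats a zero speed as not normalizable:

```python
# Divisor that turns bits/s into the unit named by normalized_xfer.
DIVISORS = {
    "b_per_sec": 8,
    "kb_per_sec": 8_000,
    "mb_per_sec": 8_000_000,
    "gb_per_sec": 8_000_000_000,
}


def normalize_link_speed(link_speed_bits, normalized_xfer):
    """Return link speed in the normalized_xfer unit, or None if it cannot be normalized."""
    divisor = DIVISORS.get(normalized_xfer)
    if divisor is None or not link_speed_bits:
        return None  # unknown unit, or a port reporting no speed
    return link_speed_bits / divisor
```

For the four speeds in the earlier lookup table this gives the same numbers, for example 12.5 for a 100Mbit port in mb_per_sec.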
Hi Chris
Thanks for the code!!
We are also using Harvest in another project,
where we run cDOT 8.2.3P3 and cDOT 8.2.4P4.
In that case:
cp /opt/netapp-harvest/plugin/cdot-nic-common /opt/netapp-harvest/plugin/cdot-nic-common-8.2.4
vi /opt/netapp-harvest/plugin/cdot-nic-common-8.2.4
------------------Fix to this code-------------------------
my $link_speed = 1;
if ($connection{normalized_xfer} eq 'mb_per_sec')
{
    $link_speed = $h{$start}{$port}{link_speed} / 8000000;
}
elsif ($connection{normalized_xfer} eq 'kb_per_sec')
{
    $link_speed = $h{$start}{$port}{link_speed} / 8000;
}
elsif ($connection{normalized_xfer} eq 'b_per_sec')
{
    $link_speed = $h{$start}{$port}{link_speed} / 8;
}
elsif ($connection{normalized_xfer} eq 'gb_per_sec')
{
    $link_speed = $h{$start}{$port}{link_speed} / 8000000000;
}
next if ($link_speed == 1); # Skip posting utilization if we couldn't normalize
-------------------------------------------
cp /opt/netapp-harvest/template/default/cdot-8.2.0.conf /opt/netapp-harvest/template/default/cdot-8.2.4.conf
vi /opt/netapp-harvest/template/default/cdot-8.2.4.conf
'nic_common' =>
{
    counter_list => [ qw(node_name node_uuid instance_name rx_bytes_per_sec tx_bytes_per_sec link_speed link_up_to_downs ) ],
    graphite_leaf => 'node.{node_name}.eth_port.{instance_name}',
    plugin => 'cdot-nic-common-8.2.4',
    enabled => '1'
},
Should I create a new cdot-8.2.4.conf and a new cdot-nic-common-8.2.4 like this, and point the plugin entry at the new file,
keeping them separate from the original nic_common file?
I ask because the eth port utilization percentage also became strange on cDOT 8.2.3P3.
Regards.
Hi @hashiya1112
I think the issue is related to bug 915637. The counters in nic_common that track tx/rx are stored as 32-bit (4-byte) numbers, which means they roll over quite frequently, and that can distort what is displayed. New 64-bit (8-byte) counters were added in 8.2.4 and 8.3.2, and Harvest v1.3 will include this fix. I will contact you offline to provide a patch in the meantime.
Cheers,
Chris Madden
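To illustrate why narrow counters matter here: assuming the rx/tx byte counters are 32 bits wide (an assumption about the counter width, not something the logs in this thread show), a busy port wraps them faster than the 60-second poll interval, so a rate computed from two samples can be wrong. A Python sketch with illustrative names:

```python
COUNTER_BITS = 32  # assumed counter width
COUNTER_MAX = 2 ** COUNTER_BITS


def seconds_to_wrap(bytes_per_sec: float) -> float:
    """Time for a byte counter of COUNTER_BITS to roll over at a steady rate."""
    return COUNTER_MAX / bytes_per_sec


def delta_with_single_wrap(prev: int, curr: int) -> int:
    """Counter delta between two samples, correct only if at most one wrap occurred."""
    return (curr - prev) % COUNTER_MAX


# 1Gbit line rate is 125,000,000 bytes/s: the counter wraps in about 34 seconds,
# so more than one wrap can hide inside a single 60-second poll.
wrap_1g = seconds_to_wrap(125_000_000)
```

A single wrap between polls can be corrected with the modulo shown, but once two or more wraps fit in one interval the true delta cannot be recovered from the samples, which is why wider counters are the real fix.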