NetApp Harvest error: [nic_common] plugin failed to compile
2016-06-23
06:41 AM
9,615 Views
Collection on one of my 8.3.2P2 clusters stopped, with the errors below logged. All other clusters seem to be fine. Has anyone seen this?
[2016-06-21 11:36:00] [NORMAL ] Poller status: status, secs=14400, api_time=8170, plugin_time=274, metrics=1978019, skips=587, fails=0
[2016-06-21 13:00:42] [WARNING] [nic_common] plugin failed to compile: Illegal division by zero at /opt/netapp-harvest/plugin/cdot-nic-common line 86.
[2016-06-21 13:00:42] [ERROR ] [nic_common] Restarting netapp-worker as an attempt to clear issue
[2016-06-21 13:00:42] [NORMAL ] WORKER STARTED [Version: 1.2.2] [Conf: netapp-harvest.conf] [Poller: ntap-cla01]
[2016-06-21 13:00:42] [NORMAL ] [main] Poller will monitor a [FILER] at [192.168.94.1:443]
[2016-06-21 13:00:42] [NORMAL ] [main] Poller will use [password] authentication with username [netapp-harvest] and password [**********]
[2016-06-21 13:00:43] [NORMAL ] [main] Collection of system info from [192.168.94.1] running [NetApp Release 8.3.2P2] successful.
[2016-06-21 13:00:43] [NORMAL ] [main] Using best-fit collection template: [cdot-8.3.0.conf]
[2016-06-21 13:00:43] [NORMAL ] [main] Using graphite_root [netapp.perf.springfield.ntap-cla01]
[2016-06-21 13:00:43] [NORMAL ] [main] Using graphite_meta_metrics_root [netapp.poller.perf.springfield.ntap-cla01]
[2016-06-21 13:00:43] [NORMAL ] [smb2:node] Collection of object not enabled; skipping
[2016-06-21 13:00:43] [NORMAL ] [smb2:vserver] Collection of object not enabled; skipping
[2016-06-21 13:00:43] [NORMAL ] [main] Startup complete. Polling for new data every [60] seconds.
[2016-06-21 13:02:39] [WARNING] [nic_common] plugin failed to compile: Illegal division by zero at /opt/netapp-harvest/plugin/cdot-nic-common line 86.
11 REPLIES
No, and I was about to push P2 to a DEV cluster. Was this running fine against P1?
It's working on our other 8.3.2P2 clusters, and it had been working fine on this one for at least 2 weeks after we upgraded. Not sure why it stopped collecting.
FYI, in order to pull back any metrics at all, I had to comment out these lines in "/opt/netapp-harvest/plugin/cdot-nic-common":
my $rx_pct = sprintf ("%.2f", $h{$start}{$port}{rx_bytes_per_sec} / $link_speed * 100 );
my $tx_pct = sprintf ("%.2f", $h{$start}{$port}{tx_bytes_per_sec} / $link_speed * 100 );
my $pct = sprintf ("%.2f", $tx_pct);
$pct = sprintf ("%.2f", $rx_pct) if ($rx_pct > $tx_pct);
push @emit_items, "$start.$port.rx_pct_util $rx_pct $timestamp";
push @emit_items, "$start.$port.tx_pct_util $tx_pct $timestamp";
push @emit_items, "$start.$port.link_pct_util $pct $timestamp";
I realize this is not a solution, but I need to collect something rather than nothing and, as I said, I only experienced this on one cluster. The others are fine, and this one had been working after the 8.3.2P2 upgrade. It's a 14-node NFS cluster. After a certain date, collection failed with: [WARNING] [nic_common] plugin failed to compile: Illegal division by zero at /opt/netapp-harvest/plugin/cdot-nic-common.
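As an alternative to commenting the lines out entirely, the division could be guarded so that only ports without a usable link speed are skipped. The following is a minimal sketch under stated assumptions, not the fix that later shipped in Harvest: `pct_util` and the `%ports` sample data are hypothetical, and the link speed is assumed to be already normalized to the same unit as the byte rates.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Harvest): returns utilization as a
# percentage string, or undef when the link speed is missing or not positive.
sub pct_util {
    my ($bytes_per_sec, $link_speed) = @_;
    return undef unless defined $link_speed && $link_speed > 0;
    return sprintf("%.2f", $bytes_per_sec / $link_speed * 100);
}

# Made-up sample data: one healthy 1Gbit port and one port reporting speed 0.
my %ports = (
    e0a => { rx_bytes_per_sec => 62_500_000, tx_bytes_per_sec => 12_500_000, link_speed => 125_000_000 },
    e0b => { rx_bytes_per_sec => 0,          tx_bytes_per_sec => 0,          link_speed => 0 },
);

my @emit_items;
for my $port (sort keys %ports) {
    my $p      = $ports{$port};
    my $rx_pct = pct_util($p->{rx_bytes_per_sec}, $p->{link_speed});
    my $tx_pct = pct_util($p->{tx_bytes_per_sec}, $p->{link_speed});

    # Skip only this port's utilization instead of dying on a zero divisor.
    next unless defined $rx_pct && defined $tx_pct;

    my $pct = $rx_pct > $tx_pct ? $rx_pct : $tx_pct;
    push @emit_items, "$port.rx_pct_util $rx_pct";
    push @emit_items, "$port.tx_pct_util $tx_pct";
    push @emit_items, "$port.link_pct_util $pct";
}

print "$_\n" for @emit_items;
```

With this shape, the port that reports a zero link speed simply produces no utilization metrics, while the other ports keep reporting.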
dlmaldonado has accepted the solution
Hi @dlmaldonado
There appears to be an issue with the link_speed counter value on some interface(s) on your cluster. My guess is either something changed in 8.3.2P2, or after the upgrade/reboot some unused interface didn't get a value set as it should (which could also be a new behavior in 8.3.2P2).
Can you restart the poller in verbose mode, wait 5 minutes, and then restart it again in normal mode?
/opt/netapp-harvest/netapp-manager -restart -poller <clustername> -v
<wait 5 minutes>
/opt/netapp-harvest/netapp-manager -restart -poller <clustername>
Then provide the logfile in /opt/netapp-harvest/log/<poller>_netapp-harvest.log
From that log I can see what the incoming link_speed values are and hopefully explain why it's not working as it should.
I will also send you a private message in case you prefer to share the logs privately.
Cheers,
Chris Madden
Storage Architect, NetApp EMEA (and author of Harvest)
Blog: It all begins with data
If this post resolved your issue, please help others by selecting ACCEPT AS SOLUTION or adding a KUDO or both!
Hi,
We are using Harvest to collect performance data from cDOT 8.2.3 and 8.3.
On Aug 23 we upgraded from 8.2.3 to 8.2.4P4, and after the upgrade the same error occurred.
In the netapp-dashboard-cluster dashboard in Grafana, eth port utilization is shown as greater than 3000 percent.
dlmaldonado posted these lines from "/opt/netapp-harvest/plugin/cdot-nic-common":
--------------------------------------------------------------------------------------------------------
my $rx_pct = sprintf ("%.2f", $h{$start}{$port}{rx_bytes_per_sec} / $link_speed * 100 );
my $tx_pct = sprintf ("%.2f", $h{$start}{$port}{tx_bytes_per_sec} / $link_speed * 100 );
my $pct = sprintf ("%.2f", $tx_pct);
$pct = sprintf ("%.2f", $rx_pct) if ($rx_pct > $tx_pct);
push @emit_items, "$start.$port.rx_pct_util $rx_pct $timestamp";
push @emit_items, "$start.$port.tx_pct_util $tx_pct $timestamp";
push @emit_items, "$start.$port.link_pct_util $pct $timestamp";
--------------------------------------------------------------------------------------------------------
How should this code be fixed?
Hi @hashiya1112
Actually, we resolved this offline. One of the ports was link up but at 10Mbit, and the plugin logic was not able to convert this correctly. I have added a fix, and it will ship in the next Harvest release on the toolchest. In the meantime, perhaps you can find the port(s) that are online at 10Mbit and set them to 100Mbit or faster?
Cheers,
Chris Madden
Storage Architect, NetApp EMEA (and author of Harvest)
Hi Chris,
Thank you for the reply.
There is an offline port, but none of the online ports is at 10Mbit.
Should I change it with the network port modify command?
I tried editing nic_common as follows:
{
    $link_speed = 1.25 if ($h{$start}{$port}{link_speed} == 10000000 ); #10Mbit
    $link_speed = 12.5 if ($h{$start}{$port}{link_speed} == 100000000 ); #100Mbit
    $link_speed = 125 if ($h{$start}{$port}{link_speed} == 1000000000 ); #1Gbit
    $link_speed = 1250 if ($h{$start}{$port}{link_speed} == 10000000000 ); #10Gbit
} elsif ($connection{normalized_xfer} eq 'kb_per_sec') {
    $link_speed = 1250 if ($h{$start}{$port}{link_speed} == 10000000 ); #10Mbit
    $link_speed = 12500 if ($h{$start}{$port}{link_speed} == 100000000 ); #100Mbit
    $link_speed = 125000 if ($h{$start}{$port}{link_speed} == 1000000000 ); #1Gbit
    $link_speed = 1250000 if ($h{$start}{$port}{link_speed} == 10000000000 ); #10Gbit
} elsif ($connection{normalized_xfer} eq 'b_per_sec') {
    $link_speed = 1250000 if ($h{$start}{$port}{link_speed} == 10000000 ); #10Mbit
    $link_speed = 12500000 if ($h{$start}{$port}{link_speed} == 100000000 ); #100Mbit
    $link_speed = 125000000 if ($h{$start}{$port}{link_speed} == 1000000000 ); #1Gbit
    $link_speed = 1250000000 if ($h{$start}{$port}{link_speed} == 10000000000 ); #10Gbit
} elsif ($connection{normalized_xfer} eq 'gb_per_sec') {
    $link_speed = .00125 if ($h{$start}{$port}{link_speed} == 10000000 ); #10Mbit
    $link_speed = .0125 if ($h{$start}{$port}{link_speed} == 100000000 ); #100Mbit
    $link_speed = .125 if ($h{$start}{$port}{link_speed} == 1000000000 ); #1Gbit
    $link_speed = 1.25 if ($h{$start}{$port}{link_speed} == 10000000000 ); #10Gbit
}
The error no longer appears, but the calculated eth port utilization has become strange: e0M (the node management port) shows 3820 percent utilization, even though e0M is a 100Mbit port.
Hmm....
Hi @hashiya1112
Maybe give this a try:
my $link_speed = 1;
if ($connection{normalized_xfer} eq 'mb_per_sec') {
    $link_speed = $h{$start}{$port}{link_speed} / 8000000;
} elsif ($connection{normalized_xfer} eq 'kb_per_sec') {
    $link_speed = $h{$start}{$port}{link_speed} / 8000;
} elsif ($connection{normalized_xfer} eq 'b_per_sec') {
    $link_speed = $h{$start}{$port}{link_speed} / 8;
} elsif ($connection{normalized_xfer} eq 'gb_per_sec') {
    $link_speed = $h{$start}{$port}{link_speed} / 8000000000;
}
next if ($link_speed == 1); # Skip posting utilization if we couldn't normalize
If you still see odd utilization values, check higher in this thread for instructions on how to collect the logs needed to understand what is happening, and send me those logs in a private message.
Cheers,
Chris Madden
Storage Architect, NetApp EMEA (and author of Harvest)
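The divisors in the snippet posted above convert the link_speed counter, reported in bits per second, into the poller's normalized transfer unit. As far as the lines quoted in this thread show, a link_speed of 0 would normalize to 0 and not to the sentinel value 1, so a later division could still fail. The sketch below illustrates the same conversion with a lookup table and an explicit check; `normalize_link_speed` and `%divisor` are hypothetical names, not Harvest's actual code.

```perl
use strict;
use warnings;

# Divisor from bits per second to each normalized_xfer unit, using the
# same decimal factors as the snippet in the post above.
my %divisor = (
    b_per_sec  => 8,
    kb_per_sec => 8_000,
    mb_per_sec => 8_000_000,
    gb_per_sec => 8_000_000_000,
);

# Hypothetical helper: returns the link speed in the given unit, or undef
# if the unit is unknown or the reported speed is missing or not positive.
sub normalize_link_speed {
    my ($bits_per_sec, $unit) = @_;
    return undef unless defined $bits_per_sec && $bits_per_sec > 0;
    return undef unless defined $unit && exists $divisor{$unit};
    return $bits_per_sec / $divisor{$unit};
}

# A 100Mbit port such as e0M, expressed in MB/s.
my $e0m = normalize_link_speed(100_000_000, 'mb_per_sec');
print "100Mbit in mb_per_sec: $e0m\n";

# A port that reports no speed yields undef, so the caller can skip it.
my $unset = normalize_link_speed(0, 'mb_per_sec');
print "zero link_speed: ", (defined $unset ? $unset : 'skip'), "\n";
```

Returning undef instead of a numeric sentinel avoids confusing a real normalized value with the "could not normalize" case.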
Hi Chris,
Thanks for the code!!
We are also using Harvest in another project, which runs cDOT 8.2.3P3 and cDOT 8.2.4P4.
In that case, should I do the following?
cp /opt/netapp-harvest/plugin/cdot-nic-common /opt/netapp-harvest/plugin/cdot-nic-common-8.2.4
vi /opt/netapp-harvest/plugin/cdot-nic-common-8.2.4
------------------Change to this code-------------------------
my $link_speed = 1;
if ($connection{normalized_xfer} eq 'mb_per_sec') {
    $link_speed = $h{$start}{$port}{link_speed} / 8000000;
} elsif ($connection{normalized_xfer} eq 'kb_per_sec') {
    $link_speed = $h{$start}{$port}{link_speed} / 8000;
} elsif ($connection{normalized_xfer} eq 'b_per_sec') {
    $link_speed = $h{$start}{$port}{link_speed} / 8;
} elsif ($connection{normalized_xfer} eq 'gb_per_sec') {
    $link_speed = $h{$start}{$port}{link_speed} / 8000000000;
}
next if ($link_speed == 1); # Skip posting utilization if we couldn't normalize
-------------------------------------------
cp /opt/netapp-harvest/template/default/cdot-8.2.0.conf /opt/netapp-harvest/template/default/cdot-8.2.4.conf
vi /opt/netapp-harvest/template/default/cdot-8.2.4.conf
'nic_common' =>
{
    counter_list => [ qw(node_name node_uuid instance_name rx_bytes_per_sec tx_bytes_per_sec link_speed link_up_to_downs ) ],
    graphite_leaf => 'node.{node_name}.eth_port.{instance_name}',
    plugin => 'cdot-nic-common-8.2.4',
    enabled => '1'
},
That is, should I create a new cdot-8.2.4.conf template and a separate cdot-nic-common-8.2.4 plugin, leaving the original nic_common file untouched?
I ask because with the modified plugin the eth port utilization calculation became strange on cDOT 8.2.3P3....
Regards.
Hi @hashiya1112
I think the issue is related to bug 915637. The counters in nic_common that track tx/rx are stored as 4-byte (32-bit) numbers, which means they roll over quite frequently, and that can distort what is displayed. New 8-byte (64-bit) counters were added in 8.2.4 and 8.3.2, and Harvest v1.3 will include this fix. I will contact you offline to provide a patch in the meantime.
Cheers,
Chris Madden
Storage Architect, NetApp EMEA (and author of Harvest)
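To see why frequent rollover matters, it helps to estimate how quickly such a counter can wrap. The sketch below is only illustrative arithmetic: it assumes the counters in question are 4-byte (32-bit) byte counters that wrap at 2**32, that the port runs at full line rate, and that polling happens every 60 seconds as in the poller log earlier in the thread. `seconds_to_wrap` is a hypothetical helper.

```perl
use strict;
use warnings;

# Assumption: a 32-bit counter of bytes, which wraps after 2**32 bytes.
my $WRAP_BYTES = 2**32;

# Seconds until the counter wraps if the port runs at full line rate.
sub seconds_to_wrap {
    my ($link_bits_per_sec) = @_;
    my $bytes_per_sec = $link_bits_per_sec / 8;
    return $WRAP_BYTES / $bytes_per_sec;
}

my $poll_interval = 60;    # seconds between polls (from the log above)

for my $speed (100_000_000, 1_000_000_000, 10_000_000_000) {
    my $secs = seconds_to_wrap($speed);
    printf "%.0f bit/s: wraps after %.1f s at line rate, up to %.2f wraps per poll\n",
        $speed, $secs, $poll_interval / $secs;
}
```

Under these assumptions a 10Gbit port can wrap the counter roughly every 3.4 seconds, so the difference between two samples taken 60 seconds apart no longer reflects the real byte count.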
