
Harvest / Graphite Outlier Issue


Hi All,

 

I'm having a strange recurring issue where absurdly high metric values are being written to my Harvest/Graphite instance. I don't see any unusual messages in the logs for the time frames when it occurs.

 

Has anyone seen anything similar? I have two clusters, and it is happening on both. Below are two screenshots: the first is a 12-hour view and the second is a 60-day view. It almost looks as if the outliers keep getting higher over time, but I'm not sure whether that is just some sort of rollup artifact in Graphite.
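On the rollup question: Graphite downsamples older data according to the aggregationMethod set in storage-aggregation.conf (average by default; other options include sum, max, min, and last). The Python sketch below uses made-up numbers and a hypothetical `rollup` helper to show what happens to a single bogus sample when 60 points are collapsed into one. It ignores xFilesFactor and render-time consolidation, so treat it as an illustration, not a model of this particular instance.

```python
from statistics import mean

# Hypothetical fine-grained samples: a steady value of 500 plus one bogus spike.
# These numbers are made up for illustration, not read from the screenshots.
raw = [500.0] * 59 + [5_000_000.0]

# Simplified stand-ins for some of Graphite's aggregationMethod choices.
AGGREGATORS = {"average": mean, "sum": sum, "max": max}

def rollup(points, factor, method):
    """Collapse every `factor` consecutive points into one using `method`.
    Hypothetical helper: ignores xFilesFactor and None handling."""
    agg = AGGREGATORS[method]
    return [agg(points[i:i + factor]) for i in range(0, len(points), factor)]

for method in AGGREGATORS:
    coarse = rollup(raw, 60, method)
    print(f"{method:>7}: peak after rollup = {max(coarse):,.0f}")
```

With these numbers the spike is diluted to 83,825 under average, kept at 5,000,000 under max, and inflated to 5,029,500 under sum, so a sum-aggregated series could make older outliers look larger than recent ones.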

 

I would love to be able to use this data; really, the only reason I created the QoS policies was to collect it, but with these outliers it's almost impossible to make sense of.

 

[Screenshot: 12-hour view, SVM QoS policy group]

 

[Screenshot: 60-day history]

 

Edit: it looks like I can partly work around this issue by using the removeAboveValue(100000) or removeAbovePercentile(99.6) functions. However, that doesn't change the fact that this erroneous data is being written to Graphite.

 

[Screenshot: workaround]
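To make the difference between the two workaround functions concrete, here is a small Python sketch that mimics their filtering on a plain list of points. It is an approximation written for illustration, not graphite-web's code: `remove_above_value`, `remove_above_percentile`, `percentile`, and the sample series are all hypothetical, and the percentile uses a simple nearest-rank rule that may differ slightly from Graphite's own calculation.

```python
import math

def remove_above_value(points, n):
    """Replace every point greater than n with None (a gap),
    similar in spirit to Graphite's removeAboveValue(seriesList, n)."""
    return [None if p is not None and p > n else p for p in points]

def percentile(points, n):
    """Nearest-rank n-th percentile over the non-None points (approximation)."""
    vals = sorted(p for p in points if p is not None)
    if not vals:
        return None
    rank = max(1, math.ceil(n / 100 * len(vals)))
    return vals[rank - 1]

def remove_above_percentile(points, n):
    """Replace every point above the series' n-th percentile with None,
    similar in spirit to Graphite's removeAbovePercentile(seriesList, n)."""
    cutoff = percentile(points, n)
    return [None if p is not None and p > cutoff else p for p in points]

# Hypothetical series: 299 plausible samples plus one absurd spike at index 150.
series = [500.0 + (i % 7) for i in range(299)]
series.insert(150, 9.2e12)

print(remove_above_value(series, 100000)[150])     # None: spike blanked
print(remove_above_percentile(series, 99.6)[150])  # None: spike above 99.6th pct
```

One caveat the sketch makes visible: at 99.6 the percentile variant can only drop roughly the top 0.4% of points in the window. On a short window (say six points) the 99.6th percentile is the spike itself, so nothing is removed, whereas a fixed ceiling like 100000 drops the spike regardless of window length.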