3 weeks ago - last edited 3 weeks ago
I'm having a strange recurring issue where absurdly high metric values are being written to my Harvest/Graphite instance. I don't see any unusual messages in the logs for the timeframes where it occurs.
Has anyone seen anything similar? I have two clusters and it is happening with both. I have two screencaps below: the first is a 12-hour view and the second is a 60-day view. It almost looks as if the outliers are getting higher and higher over time, but I'm not sure whether that is just some sort of rollup issue with Graphite.
I would love to be able to use this data. Really, the only reason I created the QoS policies was for this purpose, but the graphs are almost impossible to read with these outliers.
edit: it looks like I can partly work around this issue by using the "removeAboveValue(100000)" or "removeAbovePercentile(99.6)" functions. However, that only hides the outliers at render time; it doesn't change the fact that this erroneous data is being written to Graphite.
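
For anyone wondering what those two functions actually do to a series, here is a rough Python sketch of my understanding, not Graphite's own code. The helper names and the sample values are made up for illustration, and the percentile uses a simple nearest-rank rule, which is an assumption and may not match Graphite's calculation exactly. The point is that values above the cutoff are replaced with None in the rendered output, and the stored datapoints are left untouched.

```python
import math
from typing import List, Optional

Series = List[Optional[float]]


def remove_above_value(series: Series, limit: float) -> Series:
    """Replace every value strictly above `limit` with None (a gap in the graph)."""
    return [v if v is None or v <= limit else None for v in series]


def remove_above_percentile(series: Series, n: float) -> Series:
    """Replace values above the n-th percentile with None.

    Assumption: nearest-rank percentile over the non-None values.
    """
    present = sorted(v for v in series if v is not None)
    if not present:
        return list(series)
    rank = max(1, math.ceil(n / 100.0 * len(present)))
    cutoff = present[rank - 1]
    return remove_above_value(series, cutoff)


# Hypothetical datapoints: normal values plus one absurd spike.
sample: Series = [120.0, 95.0, None, 4.2e9, 110.0]
filtered = remove_above_value(sample, 100000)
```

With the sample above, the spike becomes a gap and the other points pass through unchanged, which is why the graphs look sane again even though the bad datapoint is still in the Whisper file.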