After some more digging, I found this in "/opt/graphite/storage/log/webapp/error.log":
[Wed Dec 23 13:30:59 2015] [error] mod_wsgi (pid=5773): Target WSGI script '/opt/graphite/conf/graphite.wsgi' cannot be loaded as Python module.
[Wed Dec 23 13:30:59 2015] [error] mod_wsgi (pid=5773): Exception occurred processing WSGI script '/opt/graphite/conf/graphite.wsgi'.
which is suggestive of a "wrong version of python" issue.
# yum info mod_wsgi
Loaded plugins: product-id, security, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
rhel-6-updates | 2.9 kB 00:00
Installed Packages
Name : mod_wsgi
Arch : x86_64
Version : 3.2
Release : 7.el6
Size : 177 k
Repo : installed
From repo : rhel-6-updates
Summary : A WSGI interface for Python web applications in Apache
URL : http://modwsgi.org
License : ASL 2.0
Description : The mod_wsgi adapter is an Apache module that provides a WSGI compliant
: interface for hosting Python based web applications within Apache. The
: adapter is written completely in C code against the Apache C runtime and
: for hosting WSGI applications within Apache has a lower overhead than using
: existing WSGI adapters for mod_python or CGI.
# yum deplist mod_wsgi
...
dependency: libpython2.6.so.1.0()(64bit)
So it appears that wsgi is correctly matched with python.
If I try to run it directly...
# python --version
Python 2.6.6
# python /opt/graphite/conf/graphite.wsgi
/usr/lib/python2.6/site-packages/django/conf/__init__.py:75: DeprecationWarning: The ADMIN_MEDIA_PREFIX setting has been removed; use STATIC_URL instead.
"use STATIC_URL instead.", DeprecationWarning)
but no "cannot load" error.
A more comprehensive extract from that error file is:
[Wed Dec 23 13:30:59 2015] [error] /usr/lib/python2.6/site-packages/django/conf/__init__.py:75: DeprecationWarning: The ADMIN_MEDIA_PREFIX setting has been removed; use STATIC_URL instead.
[Wed Dec 23 13:30:59 2015] [error] "use STATIC_URL instead.", DeprecationWarning)
[Wed Dec 23 13:30:59 2015] [error] mod_wsgi (pid=5773): Target WSGI script '/opt/graphite/conf/graphite.wsgi' cannot be loaded as Python module.
[Wed Dec 23 13:30:59 2015] [error] mod_wsgi (pid=5773): Exception occurred processing WSGI script '/opt/graphite/conf/graphite.wsgi'.
[Wed Dec 23 13:30:59 2015] [error] Traceback (most recent call last):
[Wed Dec 23 13:30:59 2015] [error] File "/opt/graphite/conf/graphite.wsgi", line 25, in <module>
[Wed Dec 23 13:30:59 2015] [error] import graphite.metrics.search
[Wed Dec 23 13:30:59 2015] [error] File "/opt/graphite/webapp/graphite/metrics/search.py", line 6, in <module>
[Wed Dec 23 13:30:59 2015] [error] from graphite.storage import is_pattern, match_entries
[Wed Dec 23 13:30:59 2015] [error] File "/opt/graphite/webapp/graphite/storage.py", line 9, in <module>
[Wed Dec 23 13:30:59 2015] [error] from graphite.remote_storage import RemoteStore
[Wed Dec 23 13:30:59 2015] [error] File "/opt/graphite/webapp/graphite/remote_storage.py", line 8, in <module>
[Wed Dec 23 13:30:59 2015] [error] from graphite.util import unpickle
[Wed Dec 23 13:30:59 2015] [error] File "/opt/graphite/webapp/graphite/util.py", line 82, in <module>
[Wed Dec 23 13:30:59 2015] [error] defaultUser = User.objects.create_user('default','default@localhost.localdomain',randomPassword)
[Wed Dec 23 13:30:59 2015] [error] File "/usr/lib/python2.6/site-packages/django/contrib/auth/models.py", line 160, in create_user
[Wed Dec 23 13:30:59 2015] [error] user.save(using=self._db)
[Wed Dec 23 13:30:59 2015] [error] File "/usr/lib/python2.6/site-packages/django/db/models/base.py", line 463, in save
[Wed Dec 23 13:30:59 2015] [error] self.save_base(using=using, force_insert=force_insert, force_update=force_update)
[Wed Dec 23 13:30:59 2015] [error] File "/usr/lib/python2.6/site-packages/django/db/models/base.py", line 551, in save_base
[Wed Dec 23 13:30:59 2015] [error] result = manager._insert([self], fields=fields, return_id=update_pk, using=using, raw=raw)
[Wed Dec 23 13:30:59 2015] [error] File "/usr/lib/python2.6/site-packages/django/db/models/manager.py", line 203, in _insert
[Wed Dec 23 13:30:59 2015] [error] return insert_query(self.model, objs, fields, **kwargs)
[Wed Dec 23 13:30:59 2015] [error] File "/usr/lib/python2.6/site-packages/django/db/models/query.py", line 1593, in insert_query
[Wed Dec 23 13:30:59 2015] [error] return query.get_compiler(using=using).execute_sql(return_id)
[Wed Dec 23 13:30:59 2015] [error] File "/usr/lib/python2.6/site-packages/django/db/models/sql/compiler.py", line 912, in execute_sql
[Wed Dec 23 13:30:59 2015] [error] cursor.execute(sql, params)
[Wed Dec 23 13:30:59 2015] [error] File "/usr/lib/python2.6/site-packages/django/db/backends/sqlite3/base.py", line 344, in execute
[Wed Dec 23 13:30:59 2015] [error] return Database.Cursor.execute(self, query, params)
[Wed Dec 23 13:30:59 2015] [error] IntegrityError: column username is not unique
[Wed Dec 23 13:31:04 2015] [error] mod_wsgi (pid=5771): Target WSGI script '/opt/graphite/conf/graphite.wsgi' cannot be loaded as Python module.
[Wed Dec 23 13:31:04 2015] [error] mod_wsgi (pid=5771): Exception occurred processing WSGI script '/opt/graphite/conf/graphite.wsgi'.
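The traceback above ends with sqlite refusing to insert a second row with an existing username. As a rough illustration of that class of error only, here is a minimal sketch using Python's standard sqlite3 module. The table below is a simplified, hypothetical stand-in and not Django's real schema, and the exact error text differs between sqlite versions.

```python
import sqlite3


def insert_default_user_twice():
    """Insert the same username twice; return the IntegrityError that results."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical, simplified table: only the UNIQUE username column matters here.
    conn.execute(
        "CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT UNIQUE)"
    )
    conn.execute("INSERT INTO auth_user (username) VALUES (?)", ("default",))
    try:
        # The second insert violates the UNIQUE constraint on username.
        conn.execute("INSERT INTO auth_user (username) VALUES (?)", ("default",))
    except sqlite3.IntegrityError as exc:
        return exc
    finally:
        conn.close()
    return None


if __name__ == "__main__":
    print(repr(insert_default_user_twice()))
```

So the "cannot be loaded as Python module" message here is a consequence of an exception raised while the script was being imported, which would not by itself point to a Python version mismatch.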
There are also a bunch of these:
[Wed Dec 23 15:24:29 2015] [error] No handlers could be found for logger "cache"
[Wed Dec 23 15:24:29 2015] [error] No handlers could be found for logger "cache"
[Wed Dec 23 15:24:29 2015] [error] No handlers could be found for logger "cache"
[Wed Dec 23 15:24:29 2015] [error] No handlers could be found for logger "cache"
[Wed Dec 23 15:24:29 2015] [error] No handlers could be found for logger "cache"
and the timing of these is closest to matching when the problem occurs in the web UI.
Ok, now this is a bit more curious:
whisper.CorruptWhisperFile: Unable to read header (/opt/graphite/storage/whisper/netapp/perf7/xxx/wafl/cp_phase_times/P2V_SNAP.wsp)
... and there are quite a few of those! And they're all zero-sized files... likely a result of /opt running out of space!
How do I regrow those files? Just remove the empty ones and restart netapp-harvest?
It seems likely...
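For anyone wanting to locate the empty files first, a minimal sketch (assuming the whisper tree lives under /opt/graphite/storage/whisper, as in the path above; the function name is my own) could look like the following. It only lists the files; deleting them is left as a separate, deliberate step.

```python
import os
import sys


def find_empty_whisper_files(root):
    """Yield the paths of zero-byte .wsp files found anywhere under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".wsp"):
                continue
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) == 0:
                yield path


if __name__ == "__main__":
    # Default root is an assumption taken from the paths seen in the logs.
    root = sys.argv[1] if len(sys.argv) > 1 else "/opt/graphite/storage/whisper"
    for path in find_empty_whisper_files(root):
        print(path)
```

Reviewing the printed list before removing anything is a cheap safeguard against deleting a file that is merely new rather than truncated.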
After I removed all of the zero-sized files and restarted all of the services, the files were recreated (now that there was plenty of disk space):
# tail /opt/graphite/storage/log/carbon-cache/carbon-cache-a/creates.log
31/12/2015 18:09:10 :: creating database file /opt/graphite/storage/whisper/netapp/perf7/XXX/avg_latency.wsp (archive=[(60, 50400), (300, 28800), (900, 37920), (3600, 43800)] xff=0.5 agg=average)
31/12/2015 18:12:13 :: new metric netapp.perf7.XXX.write_align_histo.4 matched schema netapp.perf
31/12/2015 18:12:13 :: new metric netapp.perf7.XXX.write_align_histo.4 matched aggregation schema default_average
31/12/2015 18:12:13 :: creating database file /opt/graphite/storage/whisper/netapp/perf7/XXX/write_align_histo/4.wsp (archive=[(60, 50400), (300, 28800), (900, 37920), (3600, 43800)] xff=0.5 agg=average)
31/12/2015 18:15:08 :: new metric netapp.perf7.XXX.write_ops matched schema netapp.perf
31/12/2015 18:15:08 :: new metric netapp.perf7.XXX.write_ops matched aggregation schema default_average
31/12/2015 18:15:08 :: creating database file /opt/graphite/storage/whisper/netapp/perf7/XXX/write_ops.wsp (archive=[(60, 50400), (300, 28800), (900, 37920), (3600, 43800)] xff=0.5 agg=average)
31/12/2015 18:15:09 :: new metric netapp.perf7.XXX.write_data matched schema netapp.perf
31/12/2015 18:15:09 :: new metric netapp.perf7.XXX.write_data matched aggregation schema default_average
31/12/2015 18:15:09 :: creating database file /opt/graphite/storage/whisper/netapp/perf7/XXX/write_data.wsp (archive=[(60, 50400), (300, 28800), (900, 37920), (3600, 43800)] xff=0.5 agg=average)
... and now every row *except* "TOP VOLUMES DRILLDOWN" from the dashboard page is ok. On "TOP VOLUMES", there are still 4 with the red triangle, including "Read Latency".
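As an aside on the creates.log output: each archive=(seconds_per_point, points) pair implies a retention period, and together they determine how large each recreated .wsp file is. The sketch below works through that arithmetic. The size estimate assumes whisper's usual on-disk layout (16-byte header, 12 bytes per archive entry, 12 bytes per data point), so treat it as an approximation rather than a guarantee.

```python
# Archive definitions copied from the creates.log lines above.
ARCHIVES = [(60, 50400), (300, 28800), (900, 37920), (3600, 43800)]

SECONDS_PER_DAY = 86400.0


def retention_days(archives):
    """Return (seconds_per_point, points, days_retained) for each archive."""
    return [(sec, pts, sec * pts / SECONDS_PER_DAY) for sec, pts in archives]


def estimated_file_bytes(archives):
    """Estimate the .wsp size, assuming a 16-byte header, 12 bytes per
    archive entry and 12 bytes per data point (an assumption, see above)."""
    total_points = sum(pts for _sec, pts in archives)
    return 16 + 12 * len(archives) + 12 * total_points


if __name__ == "__main__":
    for sec, pts, days in retention_days(ARCHIVES):
        print("%5ds x %6d points = %7.1f days" % (sec, pts, days))
    print("estimated bytes per file: %d" % estimated_file_bytes(ARCHIVES))
```

Under those assumptions each metric costs a little under 2 MB, which adds up quickly across many metrics and may help explain how /opt filled up.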