Sorry it took me some time to get back to your query. We usually use the terms 'dblade' and 'nblade' only when talking about the internals of cDOT; I generally don't use them when monitoring resource metrics. When we say 'dblade' (data blade), we are simply talking about the resources specific to the node, whether those are capacity/utilization or performance-related metrics.
I haven't used or seen Grafana/Graphite for a couple of years now, so I can't comment on whether anything named dblade stats or a dblade report can be pulled from it. To me, anything like node-level disk utilization, aggregate disk utilization, reads from HDD, node-level CPU, etc. is what I would map to node/dblade stats.
Now, if I simply say 'node-level statistics' instead of dblade, there are many stats that I generally retrieve from the command line, such as CPU, memory, IOPS, statit (WAFL statistics), etc. If you want, I can give you the commands that I personally use to extract this information. These are simple, common commands available at the node level. However, I always begin an investigation with the 'QoS' command for real-time data and use OCUM to look at the trends, and then narrow down based on that output.
In general, the statistics available via the 'QoS Statistics' command from the clustershell and via the OCUM Performance GUI tool are good enough to narrow down the source of the problem, and based on that I would logically drill down to the lowest level, i.e. the disk.
At the end of the day, there are many bottlenecks that can come up, especially in ONTAP systems. There are plenty of paths in the network, cluster connections, CPU, disk, and various layers in between. Hence, whatever file system (WAFL) you are dealing with, it comes down to the overall infrastructure.
Thanks Ontapforrum for the detailed explanation. We do have all the commands and tools you have mentioned. Grafana is something new that we implemented, and we want to use it for our deep-dive performance monitoring, but a couple of views are missing in it: D-blade stats and headroom of the controllers. If these could be viewed, it would make life easier for us when deciding whether to buy more headroom or not. I know we can still do it now using the current views, but that leads to more questions when we present those views to the business, as they might ask where this IO is coming from, who is doing it, etc.
Generally I rely on OCUM and PowerShell, and for headroom I simply run a PowerShell script to identify 'optimal point utilization vs current value' etc. It gives you an overall view quickly without breaking your head. I guess, until there is a way to show them in a single-pane graph, it will be a manual task.
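For anyone wondering what the 'optimal point utilization vs current value' comparison amounts to, here is a rough sketch of the arithmetic. It is not the PowerShell script mentioned above and it is written in Python for illustration only. The class and function names (NodeUtilization, headroom_pct, summarize) are hypothetical, the node names and percentages are made-up sample values, and it assumes you have already collected both utilization figures per node from your own tooling.

```python
from dataclasses import dataclass


@dataclass
class NodeUtilization:
    """Hypothetical record: both values are percentages gathered elsewhere."""
    name: str
    optimal_point_pct: float  # utilization at the optimal point
    current_pct: float        # current utilization


def headroom_pct(node: NodeUtilization) -> float:
    """Remaining headroom in percentage points (negative means over the optimal point)."""
    return node.optimal_point_pct - node.current_pct


def summarize(nodes):
    """Return (name, headroom, status) rows, tightest node first."""
    rows = []
    for node in sorted(nodes, key=headroom_pct):
        remaining = headroom_pct(node)
        status = "over optimal point" if remaining < 0 else "ok"
        rows.append((node.name, remaining, status))
    return rows


if __name__ == "__main__":
    # Made-up sample values, not taken from any real cluster.
    sample = [
        NodeUtilization("node-01", optimal_point_pct=70.0, current_pct=55.0),
        NodeUtilization("node-02", optimal_point_pct=65.0, current_pct=80.0),
    ]
    for name, remaining, status in summarize(sample):
        print(f"{name}: headroom {remaining:+.1f} points ({status})")
```

Sorting by remaining headroom puts the most constrained node at the top, which is usually the first thing the business asks about. The same rows could be pushed to whatever backend feeds Grafana if you want them on one pane, but that part depends entirely on your setup.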