
Disk Busy 100% without much I/O activity


Hello everybody,

 

Could someone help us diagnose the root cause of a problem with our FAS3170? It runs ONTAP 8.1.2 7-Mode with a single shelf of 1TB SATA disks, configured as one aggregate. Aggregate utilization is below 45%, and it holds several thin-provisioned volumes, each less than 70% full.

 

The problem is that from time to time the disks in the aggregate are 99-100% busy (see below) and access times hit 300-700ms (see attached screenshots), which makes all the VMs running from these volumes pretty much unusable.

 

disk:50000C90:001D0E64:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:disk_busy:8%

disk:50000C90:001D2A34:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:disk_busy:11%

disk:50000C90:001D1F40:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:disk_busy:100%

disk:50000C90:001DD5DC:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:disk_busy:100%

disk:50000C90:001D0D84:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:disk_busy:99%

disk:50000C90:001D1E78:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:disk_busy:100%

disk:50000C90:001D1B9C:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:disk_busy:100%

disk:50000C90:001DE5F8:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:disk_busy:99%

disk:50000C90:001DFDCC:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:disk_busy:94%

disk:50000C90:001D0C98:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:disk_busy:97%

disk:50000C90:001DE004:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:disk_busy:94%

disk:50000C90:001DEF90:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:disk_busy:100%

disk:50000C90:001DD5A8:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:disk_busy:100%

disk:50000C90:001DE208:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:disk_busy:100%

disk:50000C90:001DDE1C:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:disk_busy:0%

disk:50000C90:001DD560:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:disk_busy:0%

disk:50000C90:001D29D0:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:disk_busy:100%

disk:50000C90:001D0CF8:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:disk_busy:100%

disk:50000C90:001D0E18:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:disk_busy:97%

disk:50000C90:001D1F60:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:disk_busy:99%

disk:50000C90:001D185C:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:disk_busy:99%

disk:50000C90:001D1FAC:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:disk_busy:97%
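
For anyone who wants to summarise a counter dump like the one above, here is a minimal Python sketch. It assumes every line has the shape `disk:<id>:disk_busy:<N>%` exactly as pasted; the function names `parse_disk_busy` and `summarize` and the 90% threshold are my own illustrative choices, not anything provided by ONTAP.

```python
import re

# Matches counter lines of the form "disk:<id>:disk_busy:<N>%".
LINE_RE = re.compile(r"^disk:(?P<disk>.+):disk_busy:(?P<busy>\d+)%\s*$")


def parse_disk_busy(text):
    """Return {disk_id: busy_percent} for every disk_busy line in text."""
    result = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            result[match.group("disk")] = int(match.group("busy"))
    return result


def summarize(busy, threshold=90):
    """Split disks into 'hot' (>= threshold) and 'idle' (exactly 0%)."""
    hot = sorted(d for d, pct in busy.items() if pct >= threshold)
    idle = sorted(d for d, pct in busy.items() if pct == 0)
    return {"total": len(busy), "hot": hot, "idle": idle}


# Four lines copied from the output in this post.
SAMPLE = """\
disk:50000C90:001D0E64:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:disk_busy:8%
disk:50000C90:001D1F40:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:disk_busy:100%
disk:50000C90:001DDE1C:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:disk_busy:0%
disk:50000C90:001D0E18:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:disk_busy:97%
"""

if __name__ == "__main__":
    summary = summarize(parse_disk_busy(SAMPLE))
    print(f"{len(summary['hot'])} of {summary['total']} disks are at or above 90% busy")
    print(f"{len(summary['idle'])} disks are idle")
```

Run against the full listing, this makes it quick to see whether the load is spread across the whole aggregate or concentrated on a few disks, and which disks are not taking I/O at all.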

 

I need just one thing: to find out what causes the 100% disk busy, using ONTAP's own or third-party tools. I turned off dedup, and there is no SnapMirror activity, no WAFL scans, no reallocation, and no high IOPS from the VMs or anywhere else. I still have no f-ing idea why the disks are so busy.

 

Any help? The tickets we opened with NetApp support were useless: they told us that the system is too old and that this kind of behaviour is expected with SATA shelves...

Re: Disk Busy 100% without much I/O activity

If you are willing to collect a perfstat file (while the disks are at 100%) and send it to me, I might be able to help.

 

Re: Disk Busy 100% without much I/O activity

Sure, do you need them with some specific settings?

Re: Disk Busy 100% without much I/O activity

I just sent you a private message with the details.

 

 

Re: Disk Busy 100% without much I/O activity

Hello,

 

Did you find a solution for this issue? We have the same problem with a FAS3250, 4TB disks, ONTAP 8.1.4P8 7-Mode. Disk utilization runs really high each morning for approx. 5 hours.

 

Regards

SaPu

Re: Disk Busy 100% without much I/O activity

Are you checking it with sysstat? It reports only the busiest disk, just as it reports only the busiest CPU.

 

Go to advanced mode (priv set advanced), then start the statit collection:

*> statit -b

Wait for one minute, then stop the collection and dump the data:

*> statit -e

Return to admin mode:

*> priv set admin

 

Now check the disk utilization in the disk/aggregate section of the output, which lists the values for every disk.

Refer to the document below if you need help running these stats.

 

https://kb.netapp.com/support/index?page=content&id=1014701

Re: Disk Busy 100% without much I/O activity

"each morning for approx. 5 hours" sounds very much like disk scrub.

Re: Disk Busy 100% without much I/O activity

+1 for disk scrub. The default duration is 6 hours.

Re: Disk Busy 100% without much I/O activity

Well, the best solution is to add more disks... In our case it was determined that we simply did not have enough disks. After we moved Flash Cache cards over from another filer, the situation got better.