We have a process of using RMAN to restore a DB to a VM server by reading a backup image stored on a NFS share, then after the successful restoration, back the DB up to the same NFS share.
During the restoration, both latency and IOPS on this NFS share were very high, 25 ms/op and 1400 IOPS. During the backup, the latency was only about 1-2 ms/op, and about the same IOPS. Throughputs are about the same in both case as well.
I thought the RMAN read and write both is sequential I/O, I cannot explain why the read would cause such high latency.
It is actually CVO in Azure, in a 10x1TB HDD aggregate.
Yes, there is bad performance impact.
I didn't mention these details because my question was just simply about read(restoration) and write(backup) on the same environment/aggregate/filer(cvo), why was there very high latency for reading, but very low latency for writing and how to explain this?
Ah, that explains a lot. The IOPS limits on the underlying disks for CVO are based on bandwidth, not actual IO operations. It's fairly easy to size CVO for the day-to-day needs of an Oracle database, but when it's time for backups, you can easily overwhelm those disks. It's not a CVO limitation, the same thing happens with databases directly on Azure or AWS drives. 5000 IOPS with an 8K block size might be plenty for normal database operations, but an RMAN operation can consume all 5K IOPS and starve the database.
The best option is really to stop moving backup data around and rely on snapshots for in-place backup and restore instead. If that's not an option, you might be stuck increasing the IOPS capabilities of the backend disks to handle the streaming backup workloads.
Also, AWS and Azure (not sure about Google) have burst credits. When those expire, that hurts as it's like being double degraded in a regular FAS HDD array in terms of performance. You have to stop i/o on the aggregate so it can refill the credits.
1) Yes SnapCenter or any product that supports SnapDiff APIs and Oracle.
2) Getting an aggregate with drives big enough that burst credits are not a thing and enough disks to do so.
3) Writes are cached in RAM and NVLOG written to a temporary disk, so latency is usually less for writes than reading from spinning media. However, we are given a symptom with no data to review, so it is hard to say. If you really want a deeper dig, and it's not past 28 days I'd open a perf case and we can see if my theory is correct.
I've seen synthetic IO tests with FlashCache on CVO, and the results are amazing, as they should be. FlashCache was a miracle for database workloads when if first came out (I was in NetApp PS at the time) because it brought down the average latency of HDD's. It works the same with CVO. The backend drives for AWS and Azure native storage are flash, but it's still a shared resource and it's not nearly as fast as an actual on-premises all-flash array. FlashCache on that block of NVMe storage on the CVO instance works the same - it brings down the average latency.
I don't think there's a way to monitor the burst credits, but they publish the math of how it's calculated. You'll exhaust the burst credits pretty quickly with backups, so it's probably not going to help with that scenario. I did some tests a few years back where the burst credits were really confusing my results until I figured out what was happening.
With respect to snapshots, check out TR-4591 which includes some material on how to use plain snapshots right on ONTAP itself. SnapCenter is often best option, but not always. What you want is a snapshot. There are multiple ways to get it.
FlashCache is helpful for Databasee workload or random read, as you pointed out. But, after I sent out previous message, I am thinking it might be not helpful for RMAN backup or restore because they are sequential read/write. Right?
It is interesting to know that NetApp native snapshots can be used for Oracle backup. My understanding was different. I will be looking into that TR document you are referring to. Thanks
FlashCache actually might still help with the backup situation.
The basic problem here is the QoS limit that the cloud providers placed on their storage. They call it IOPS, but it's not, it's bandwidth. We've seen a lot of customers run into this exact problem.
Let's say you had an ancient HDD array. You could do backups AND the usual database random IO pretty easily because those large-block backup IO's are easy for a storage array to process. They're nice big IO's, and the array can do readahead. When we used to size for HDD database storage, we'd always focus on the random IOPS because that controlled the number of drives that goes into the solution. The sequential IO, like backups and many reporting operations, was almost free. You pay for random IO, and we throw in the sequential IO for at no cost. If you looked at the numbers, a typical HDD array might be able to provide 250MB/sec of random IOPS, but could easily do 2.5GB/sec of large-block sequential IOPS.
Public cloud storage doesn't give you that "free" sequential IO. They have strict bandwidth controls, and the result is that customers are often surprised that everything about their database works just fine with the exception of the backup or maybe that one late-night report that included a full table scan. The day-to-day random IOPS fit within the capabilities of the backend storage, but the sequential IO work slams into the limits relatively easily.
FlashCache ought to ease the pressure there because the IOPS serviced by the FlashCache layer won't touch the backend disks. I'd recommend limiting the number of RMAN channels too, because some IO will still need to reach those backend disks.
IOPS should refer to individual IO operations. A typical ERP database might reach 10K IOPS during random IO operations. That means 10,000 discrete, individual 8K block operations per second. If you do the math, that's also about 80MB/sec.
When the database does a full table scan or an RMAN backup, the IO's are normally in 1MB chunks. That may get broken down by the OS, but it's trying to do 1MB IO's. That means only 80 IOPS will consume 80MB/sec of bandwidth.
The end result is a true IOPS-based QoS control will throttle IO at a much lower total bandwidth than sequential IO. That's normally okay, because storage arrays have an easier time with large-block IO. It's okay to allow a host to consume lots of bandwidth.
It's easy to make a sizing error with databases when QoS is involved. It happens on-premises too. A storage admin won't realize that during the day they need 10K IOPS at an 8K IO size (80MB/sec) but at night they need 800IOPS at a 1MB size (800MB/sec). If you have a pure bandwidth QoS limit, like 100MB/sec, you'll be safely under the limit most of the time when the database is doing normal random IO tasks, but sometimes those late-night reports and RMAN backups slam into that 100MB/sec limit.
That's why you really, really need snapshots with a database in the public cloud. It's the only way to avoid bulk data movement. If you have to size for RMAN bulk data transfers, you end up with storage that is 10X more powerful and expensive than you need outside of the backup window.
One neat thing one of our customers did to fix the Oracle full table scan problem in the public cloud was Oracle In-Memory. They spent more for the In-Memory licenses, and they spent more for the RAM in the VM, but the result was dramatically less load on the storage system. That saved money, but more importantly, they were able to meet their performance targets in the public cloud. It's a perfectly obvious use case for In-Memory, it was just nice to see proof that it worked as predicted.
You're correct. They've normalized a bandwidth limit.
If you read carefully, you'll see statements like a volume offers "100 IOPS per GB (8K IO size)" which really means 800KB/GB and then they divided by 8K to get 100 IOPS. They could have described it as 200 IOPS per GB (4K IO size) if they wanted to. It's the same thing.
Normally this is all pretty unimportant, but with databases you have a mix of IO types. The random IO's are a lot more work to process, so it's nice to be able to limit actual IOPS. In addition, a small number of huge-block sequential IO's can consume a lot of bandwidth, so it's nice to be able to limit bandwidth independently.