Tech ONTAP Blogs
I've seen an extraordinary amount of interest in the C-Series systems for all sorts of workloads. I'm the Big Scary Enterprise Workload guy, which means I want proof before I recommend anything to a customer. So, I reached out to the Workload Engineering team and got some realistic test data that demonstrates what it can do.
If you want to read all the low-level details about the A-Series and C-Series architectures and their performance characteristics, there are a couple of hyperlinks below.
Before I get into the numbers, I wanted to recap my personal view of "What is C-Series?", or to phrase it better…
For a long time, solid-state storage arrays just used "Flash". For a brief time, it appeared the solid-state market was going to divide into Flash media and 3D XPoint media, which was a much faster solid-state technology, but 3D XPoint couldn't break out of its niche.
Instead, we've seen the commercial Flash market divide into SLC/TLC drives and QLC drives. Without going into details, SLC/TLC are the faster and more durable options aimed at high-speed storage applications, whereas QLC is somewhat slower and less durable* but also less expensive, and is aimed at capacity-centric, less IO-intensive storage applications.
*Note: Don't get misled on this topic. The durability difference between TLC and QLC might be important if you're purchasing a drive for your personal laptop, but the durability is essentially identical once the drives are managed by ONTAP. ONTAP RAID technology is still there protecting data against media failures. Furthermore, ONTAP WAFL technology distributes inbound write data to free blocks across multiple drives, which minimizes overwrites of the individual cells within each drive and maximizes the drive's useful life. NetApp support agreements that cover drive failures also include drive replacement for SSDs that have exhausted their write cycles.
The result of these market changes is that NetApp now offers the A-Series for high-speed, latency-sensitive databases and IO-intensive VMware estates, while the C-Series is for less latency-sensitive, more capacity-centric workloads.
That's easy enough to understand in principle, but it's not enough for DBAs, virtualization admins, and storage admins to make decisions. They want to see the numbers…
What makes this graph so compelling is its simplicity.
The workload is an Oracle database issuing a roughly 80%/20% random-read, random-write split. We graph total IOPS against read latency because read latency is normally the most important factor with real-world workloads.
The reason we use Oracle databases is twofold. First, we've got thousands and thousands of controllers servicing Oracle databases, so it's an important market for us. Second, an Oracle database is an especially brutal, latency-sensitive workload with lots of interdependencies between different IO operations. If anything is wrong with storage behavior, you should see it in the charts. You can also extrapolate these results to lots of different workloads beyond databases.
We are also graphing the latency as seen from the database itself. The timing is based on the elapsed time between the application submitting an IO and the IO being reported as complete. This means we're measuring the entire storage path from the OS, through the network, into the storage system, back across the network to the OS, and up to the application layer.
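To make "latency as seen from the application" concrete, here's a minimal sketch of that kind of measurement. This is not the actual test harness (that was a real Oracle database); the file path, block size, and sample count are placeholders, and a real test would also bypass the host page cache (for example, with O_DIRECT).

```python
# Illustrative only: times random 8KiB reads from the application's point of
# view (submit the IO, wait for completion, record elapsed time).
import os, random, time, statistics

PATH = "/mnt/ontap_lun/testfile"   # hypothetical datafile on the array
BLOCK = 8192                        # typical database block size
SAMPLES = 10_000

fd = os.open(PATH, os.O_RDONLY)     # a real harness would add O_DIRECT to
size = os.fstat(fd).st_size         # bypass the host page cache
latencies = []

for _ in range(SAMPLES):
    offset = random.randrange(0, size - BLOCK, BLOCK)   # aligned random offset
    start = time.perf_counter()
    os.pread(fd, BLOCK, offset)     # submit the read and wait for completion
    latencies.append((time.perf_counter() - start) * 1e6)  # microseconds

os.close(fd)
print(f"p50 read latency: {statistics.median(latencies):.0f} µs")
print(f"p99 read latency: {statistics.quantiles(latencies, n=100)[98]:.0f} µs")
```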
I'd also like to point out that the performance of the A-Series is amazing. 900K IOPS without even breaching 1ms of latency runs circles around the competition, but I've posted about that before. This post focuses on the C-Series.
Note: These tests all used ONTAP 9.13.1, which includes some significant performance improvements for both A-Series and C-Series.
Obviously write latency is also important, but the A-Series and C-Series both use the same write logic from the point of view of the host. Write operations commit to the mirrored NVRAM journal. Once the data is in NVRAM, the write is acknowledged and the host continues. The write to the drive layer comes much later.
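If it helps to visualize that write path, here's a purely conceptual sketch (not ONTAP code, just the idea): acknowledge the host as soon as the data is in the mirrored journal, and destage to the drives later, which is the only step where the drive type matters.

```python
# Conceptual sketch of journaled write acknowledgment. The data structures
# and function names here are made up for illustration.
from collections import deque

nvram_journal = deque()   # stand-in for the mirrored NVRAM journal

def handle_write(block_id, data):
    nvram_journal.append((block_id, data))   # 1. commit to NVRAM (and its mirror)
    return "ACK"                             # 2. acknowledge the host immediately

def destage_to_drives(drives):
    # 3. Much later, a background process writes journaled data to the SSDs.
    #    The host never waits on this step, which is why TLC vs QLC has little
    #    effect on host-visible write latency.
    while nvram_journal:
        block_id, data = nvram_journal.popleft()
        drives[block_id] = data
```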
Want proof?
This is a graph of total IOPS versus the write latency. Note that the latency is reported in microseconds.
You can see your workload's write latency is mostly unaffected by the choice of A-Series or C-Series. As the IOPS increase toward the saturation point, the write latency on C-Series increases more quickly than A-Series as a result of the somewhat slower media in use, but keep this in perspective. Most real-world workloads run as expected so long as write latency remains below 500µs. Even 1ms of write latency is not necessarily a problem, even with databases.
These tests used 10TB of data within the database (the database itself was larger, but we're accessing 10TB during the test runs). This means the test results above do include some cache hits on the controller, which reflects how storage is used in the real world. There will be some benefit from caching, but nearly all IO in these tests is being serviced from the actual drives.
We also run these tests with storage efficiency features enabled, using a working set with a reasonable level of compressibility. Unrealistically compressible test data skews efficiency and caching behavior in the same way that a tiny, unrealistically cacheable working set skews results.
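As a rough illustration of what "a reasonable level of compressibility" means, here's a small sketch that builds a block which is partly repetitive and partly random. The 50/50 split is an arbitrary placeholder, not the actual test parameter.

```python
# Illustrative only: build test data that compresses somewhat, like real
# application data, rather than being all zeros or pure random bytes.
import os, zlib

def make_block(size=8192, compressible_fraction=0.5):
    """Return a block that is part repeating pattern, part random bytes."""
    patterned = int(size * compressible_fraction)
    return b"A" * patterned + os.urandom(size - patterned)

block = make_block()
ratio = len(block) / len(zlib.compress(block))
print(f"approximate compression ratio: {ratio:.1f}:1")
```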
The reason I want to point this out is that customers considering C-Series need to understand that not all IO latency is affected. The higher latencies only appear with read operations that actually require a disk IO. Reads that can be serviced by onboard cache should be measurable in microseconds, as is true with the A-Series. This is important because all workloads, especially databases, include hot blocks of data that require ultra-low latency. Access times for cached blocks should be essentially identical between A-Series and C-Series.
Sequential IO is much less affected by the drive type than random IO. The reason is that sequential IO involves both readahead and larger blocks. That means the storage system can start performing read IO to the drives before the host even requests the data, and there are far fewer (but larger) IO operations happening on the backend drives.
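Here's the back-of-the-envelope math on "fewer but larger" IOs. The throughput and block sizes below are typical examples, not the exact sizes ONTAP uses internally.

```python
# Same bandwidth, very different backend IO counts.
throughput_mb_s = 2_000              # hypothetical 2GB/sec of table-scan traffic

random_io_size_kb = 8                # typical random database read
sequential_io_size_kb = 1_024        # typical large readahead IO

random_iops = throughput_mb_s * 1_024 / random_io_size_kb
sequential_iops = throughput_mb_s * 1_024 / sequential_io_size_kb

print(f"{random_iops:,.0f} random 8KB IOs/sec vs "
      f"{sequential_iops:,.0f} sequential 1MB IOs/sec for the same bandwidth")
```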
On the whole, if you're doing sequential IO you should see comparable performance with A-Series, C-Series and even old FAS arrays if they have enough drives.
We compared A-Series to C-Series and saw a peak sequential read throughput of about 30GB/sec with the A-Series and about 27GB/sec with the C-Series. These numbers were obtained using synthetic tools. It's difficult to perform a truly representative sequential IO test from a database because of the configuration requirements. You'd need an enormous number of FC adapters to run a single A800 or C800 controller to its limit, and it's difficult to get a database to try to read 30GB/sec in the first place.
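To put that adapter comment in perspective, here's a rough estimate assuming about 3.2GB/sec of usable throughput per 32Gb FC port; that per-port figure is an approximation for illustration, not a measured value from these tests.

```python
# Rough illustration of why a single host struggles to drive 30GB/sec.
target_gb_s = 30
usable_gb_s_per_32g_fc_port = 3.2    # approximate usable throughput per port

ports_needed = target_gb_s / usable_gb_s_per_32g_fc_port
print(f"roughly {ports_needed:.0f} fully driven 32Gb FC ports "
      f"to sustain {target_gb_s}GB/sec")
```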
As a practical matter, few workloads are purely sequential IO, but tasks such as Oracle RMAN backups or database full table scans should perform about the same on both A-Series and C-Series. The limiting factor should normally be the available network bandwidth, not the storage controller.
It's all ONTAP; it's just about the media. If you have a workload that genuinely requires consistent latency down in the hundreds of microseconds, then choose A-Series. If your workload can accept 2ms (or so) of read latency (and remember, cache hits and write IO are much faster), then look at C-Series.
As a general principle, think about whether your workload is about computational speed or is about the end-user experience. A bank performing end-of-day account reconciliation probably needs the ultra-low latency of the A-Series. In contrast, a CRM database is usually about the end users. If you're updating customer records, you probably don't care if it takes an extra 2ms to retrieve a block of data that contains the customer contact information.
You can also build mixed clusters and tier your databases between A-Series and C-Series as warranted. It's all still ONTAP, and you can nondisruptively move your workloads between controllers.
Finally, the C-Series is an especially good option for upgrading legacy spinning-drive and hybrid arrays. Spinning-drive latencies are typically around 8-10ms, which means C-Series is about 4X to 5X faster in terms of latency. If you're looking at raw IOPS, there's no comparison. A spinning drive saturates around 120 IOPS/drive. You would need well over 6,000 spinning drives to reach the 800,000 IOPS delivered by just 24 QLC drives as shown in these tests. It's a huge improvement in terms of performance, power/cooling requirements, and cost, and it comes at a lower price point than arrays using SLC or TLC drives.
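Here's that drive-count math spelled out, using the numbers from the paragraph above:

```python
# Spinning drives needed to match the IOPS measured from 24 QLC drives.
target_iops = 800_000        # IOPS delivered by 24 QLC drives in these tests
hdd_iops = 120               # rough saturation point of one spinning drive

hdd_count = target_iops / hdd_iops
print(f"~{hdd_count:,.0f} spinning drives to match {target_iops:,} IOPS")
# → on the order of 6,700 drives, versus 24 QLC SSDs
```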
If you want to learn more, we published the following two technical reports today:
Oracle Performance on AFF A-Series and C-Series
Virtualized Oracle Performance on AFF A-Series and C-Series
If you're wondering how A-Series and C-Series compare under virtualization, here it is. It's not easy building a truly optimized 4-node virtualized RAC environment, and we probably could have tuned this better to reduce the overhead from ESX and VMDKs, but the results are still outstanding and, more importantly, consistent with the bare-metal tests. The latency is higher with C-Series, but the IOPS levels are comparable to A-Series.
We also did this test with FCP, not NVMe/FC, because most VMware customers are using traditional FCP. The protocol change is the primary reason for the lower maximum IOPS level.