Regarding compression, one of our Partners has received the following reply from one of their customers:
"I was happily reading through the acceptance document until I found that the compressed data bypasses the cache cards (PAM). This seems a reason not to implement it. Your thoughts would be of interest: have you enabled compression for any other customer, and if so, what were the results?"
From the Data ONTAP Testing FAQ I also found:
Question: I have a PAM (or Flash Cache) card in my system. Will compression have any effect, either positive or negative, on the performance of the card?
Answer: Any blocks that contain compressed data bypass the PAM or Flash Cache card. This means that these blocks do not benefit from the performance boost that the card would otherwise provide. Any uncompressed data, such as data in volumes that do not have compression enabled, will still benefit from the card.
Can anybody provide a helpful reply to the above comments?
I actually wasn't aware of this, but one immediate comment:
Make sure compression is not confused with de-duplication: the latter does work (very well) with Flash Cache.
Is there a technical case for using compression in this particular solution, or would de-duplication deliver the required storage savings?
The customer is already using de-duplication, but as they have the compression license, they were wondering whether they should implement it.
While they accept that compression will have a small performance hit, they don't want to use it if the hit is significant, and bypassing the PAM cards could certainly make it significant.
It would be good to go back to them with a positive view.
Actually, I haven't found any reference stating that PAM does not cache compressed data.
On the contrary, every interop table covering PAM or compression lists them as compatible.
Is there any TR or white paper that states this?
TR-3958, page 22, states the following (highlighted in red):
7.6 PAM AND FLASH CACHE CARDS
In environments with high amounts of shared blocks that are read repeatedly, the PAM or Flash Cache card can significantly reduce the number of disk reads, thus improving the read performance. The PAM or Flash Cache card does not increase performance of the compression engine or of reading compressed data. The amount of performance improvement with the PAM or Flash Cache card depends on the amount of shared blocks, the access rate, the active dataset size, and the data layout. The PAM or Flash Cache card has provided significant performance improvements in VMware® VDI environments. These advantages are further enhanced when combined with shared block technologies, such as NetApp deduplication or NetApp FlexClone® technology.
So, according to NetApp documentation, PAM/Flash Cache does indeed bypass compressed data.
This could also be interpreted as PAM bypassing only the uncompressed (decompressed) data. That PAM does not increase compression performance is understandable: there are no repeated reads here. When you read compressed data, NetApp has to uncompress it before delivering it to the client. So this phrase could simply mean that NetApp does not attempt to cache this (temporary) uncompressed data.
But I agree that it should be clarified to avoid confusion.
Thanks for your question. You are correct: blocks on disk that contain compressed data will not be stored on the Flash Cache card. The Flash Cache card is used to increase random read performance. By its nature, compression is sequential, so this data would not be stored in Flash Cache even apart from the fact that it is compressed. Any blocks on disk that do not contain compressed data, as well as metadata, will continue to use the Flash Cache card.
Yes, regardless of how this is set, any data that exists on disk in compressed form will bypass the Flash Cache card. Uncompressed data and metadata will still use the Flash Cache card.
Is there any particular reason for this behavior?
Compressed data might be frequently accessed, so keeping the compressed or, even better, rehydrated (uncompressed) data in the Flash Cache card could give a significant performance boost.
I feel that the PAM behavior should be user-controlled, as you already let users do today on a per-volume basis (right?).
Anyway, I just noticed that the Data ONTAP 8.1 RC documentation (Storage Management Guide, pages 246-247) states that compression is compatible with, among others:
Performance Acceleration Module or Flash cache cards
Yes, compression is compatible with the PAM and Flash Cache cards. That was true in Data ONTAP 8.0.x, as it is now in Data ONTAP 8.1. Compression works with the Flash Cache card in Data ONTAP 8.1 the same way it did in Data ONTAP 8.0.x: compressed blocks bypass the card.
I realize there is some confusion here. Maybe you could clarify this.
When we speak about compression, we have original blocks (let's call them A), blocks that contain compressed data and are physically stored on disk (let's call them B), and transient blocks that contain data decompressed from B during a read (let's call them C).
So: are none of A, B, and C cached in PAM, or are some types cached?
Sorry, I am a little confused by your example. Let me see if I can clarify a little. Let's say that you have blocks on disk that contain compressed data, which we will call A. We also have blocks on disk that contain uncompressed data, which we will call B. When we read A into memory, it will be uncompressed in memory and remain there, uncompressed, until memory is full, at which point it will be flushed. When we read B into memory, it will remain in memory until memory is full, at which point it will be moved into the Flash Cache card. Remember that not all data is compressible, so a compressed volume may contain both compressed and uncompressed data.
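The read-and-evict behavior described above can be sketched as a toy model in Python: main memory is an LRU buffer, and Flash Cache acts as a victim cache that receives blocks evicted from memory, except blocks whose on-disk form is compressed, which are simply flushed. This is a minimal illustration under stated assumptions, not Data ONTAP's implementation: the `BufferCache` class, its tiny slot counts, and the choice to keep a block in flash after a flash hit are all hypothetical.

```python
from collections import OrderedDict


class BufferCache:
    """Hypothetical toy model: LRU main memory backed by a Flash Cache victim cache.

    Not Data ONTAP code; it only encodes the eviction rule described in the thread.
    """

    def __init__(self, memory_slots, flash_slots):
        self.memory = OrderedDict()  # block_id -> True if the block is compressed on disk
        self.flash = OrderedDict()   # block_ids held in the (modeled) Flash Cache, LRU order
        self.memory_slots = memory_slots
        self.flash_slots = flash_slots
        self.disk_reads = 0

    def read(self, block_id, compressed_on_disk):
        """Read a block and return where it was served from: 'memory', 'flash' or 'disk'."""
        if block_id in self.memory:
            self.memory.move_to_end(block_id)
            return "memory"
        if block_id in self.flash:
            # Assumption: a flash hit copies the block into memory and leaves it in flash.
            self.flash.move_to_end(block_id)
            source = "flash"
        else:
            self.disk_reads += 1
            source = "disk"
        self._insert_into_memory(block_id, compressed_on_disk)
        return source

    def _insert_into_memory(self, block_id, compressed_on_disk):
        self.memory[block_id] = compressed_on_disk
        if len(self.memory) <= self.memory_slots:
            return
        victim, victim_compressed = self.memory.popitem(last=False)  # evict LRU block
        if victim_compressed:
            # Data read from compressed blocks is flushed, never moved to Flash Cache.
            return
        # Uncompressed data is moved into Flash Cache on eviction from memory.
        self.flash[victim] = True
        self.flash.move_to_end(victim)
        if len(self.flash) > self.flash_slots:
            self.flash.popitem(last=False)  # evict LRU block from flash


if __name__ == "__main__":
    cache = BufferCache(memory_slots=2, flash_slots=4)
    cache.read("A", compressed_on_disk=True)    # compressed on disk
    cache.read("B", compressed_on_disk=False)   # uncompressed on disk
    cache.read("X", compressed_on_disk=False)   # evicts A: flushed, not cached in flash
    cache.read("Y", compressed_on_disk=False)   # evicts B: moved into flash
    print(cache.read("B", compressed_on_disk=False))  # flash
    print(cache.read("A", compressed_on_disk=True))   # disk
```

In this model, re-reading B after it leaves memory is served from the modeled Flash Cache, while re-reading A goes back to disk, which is the performance concern the customer raised.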
I hope this helps,
TME - Storage Efficiency
Sorry for not being clear. What I had in mind was the lifecycle of the data.
1. The client writes data to NetApp. The data arrives as uncompressed blocks, which I called "A" in my example.
2. NetApp compresses the incoming data, resulting in new blocks ("B" in my example). "B" is written to disk.
3. The client makes a read request for data in blocks "B". "B" is fetched from disk into memory and decompressed, resulting in blocks "C" (or A').
4. The requested blocks from "C" are sent to the client.
I hope this clarifies it.
So you are saying that, in the steps above, "C" will never enter PAM. Is that right? But what about "B"? I would expect those blocks to be retained for future reads.
Correct. Since the data on disk is compressed, even its uncompressed version in memory will not be written to the Flash Cache card. In your example, if A were written to disk uncompressed, it could be written to PAM. Blocks B and C in your example will bypass the Flash Cache card. Blocks A and C will remain in main memory to service additional reads until main memory flushes this data.
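The A/B/C lifecycle discussed above (A written, compressed into B on disk, decompressed into a transient C on read) can be illustrated with a small Python sketch. It uses the standard `zlib` module purely as a stand-in; Data ONTAP's own compression algorithm and on-disk layout are not described in this thread. The function names, the 4 KB block size, the "store uncompressed if compression saves nothing" rule, and `flash_cache_eligible` (which simply encodes the statement that both B and C bypass Flash Cache) are hypothetical.

```python
import zlib

BLOCK_SIZE = 4096  # assumption: 4 KB logical blocks


def write_blocks(blocks_a):
    """Steps 1-2: take uncompressed client blocks "A", return (stored, compressed).

    Assumption: if compression saves no space, the data is stored uncompressed,
    mirroring the point that not all data is compressible.
    """
    raw = b"".join(blocks_a)
    blob_b = zlib.compress(raw)
    if len(blob_b) >= len(raw):
        return raw, False
    return blob_b, True


def read_blocks(stored, compressed, wanted):
    """Steps 3-4: fetch "B", decompress into transient "C", return the requested blocks."""
    data_c = zlib.decompress(stored) if compressed else stored
    blocks_c = [data_c[i:i + BLOCK_SIZE] for i in range(0, len(data_c), BLOCK_SIZE)]
    return [blocks_c[i] for i in wanted]


def flash_cache_eligible(compressed_on_disk):
    """Hypothetical rule from the thread: compressed data (B) and its decompressed copy (C) bypass Flash Cache."""
    return not compressed_on_disk


if __name__ == "__main__":
    blocks_a = [bytes([i]) * BLOCK_SIZE for i in range(8)]  # highly compressible "A"
    stored_b, compressed = write_blocks(blocks_a)
    requested = read_blocks(stored_b, compressed, [2])     # transient "C", then serve block 2
    print(compressed, requested == [blocks_a[2]])          # True True
    print(flash_cache_eligible(compressed))                # False
```

The decompressed "C" exists only inside `read_blocks`; under the rule stated in the thread, neither it nor the stored "B" would be admitted to Flash Cache.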
While I do understand that compressed data never touches PAM, I would like to know why this design decision was made as a general policy for every volume containing compressed data.
Actually, I cannot understand why compressed data is not cached in PAM. It could potentially save more disk reads compared with uncompressed data.
I can understand why transient decompressed data is not put into PAM; this is basically a CPU-versus-space decision. But not for compressed blocks...
Some very valid points and questions on this thread. I just wanted to quickly highlight a couple of things:
* As mentioned in the TR, compression provides huge space savings for certain workloads. However, as with any compression solution, the performance impact needs to be accounted for.
* Integration with flash is definitely an appealing use case, but we need to evaluate where flash is used (in which workloads) and whether compression is the right fit for those workloads. From this thread, it looks like there are some use cases that could benefit, and that is something we will have to address soon. As cache technology becomes more pervasive in the storage and host layers, we can expect better synergy with storage efficiency (including compression). The current goal is to provide the best savings across a wide range of use cases in a way that complements customers' initial storage requirements and reduces TCO without negatively affecting system behavior.
NetApp has led from the front when it comes to offering the best efficiency (storage and operational) and will continue to do so. If you need more insight into our SE offerings and where we are headed, I will be more than happy to engage with you and your customer on a 1:1 basis.