Warming of Folder in Capacity Tier

Markus1 · ‎2024-02-21

Hello,

we are currently using AWS FSx for NetApp ONTAP and have a specific use case that we don't have a good solution for yet.

We have tiering enabled and tier data that is older than 2 weeks to the capacity tier. That works perfectly fine for most of our data, but certain data needs to be accessed by an application roughly a year after it was created. The access takes (as expected) roughly 3x as long as it would if the data were in the performance tier. This really impacts our business.

We are not able to put that data on a dedicated volume without tiering. But we do know the folders in scope of the application beforehand and would like to "pre-warm" them before they are accessed. Is there a way to achieve this, even if it's just a script? How can we make sure that we don't do sequential reads? From what I read, sequential reads do not result in the data ending up in the performance tier. Is this correct?

Any help here is highly appreciated.

Best,

Markus

elementx · ‎2024-02-21

If the app accesses it for read-only processing, I'd copy data to a temp folder (with tiering), process it with the app and then delete the temp folder. If writes are needed, output results to the original folder.

I wouldn't want to rewarm old data if it's just for a one-off job, when copying can do it.

Markus1 · ‎2024-02-21

Hi elementx,

thank you for your reply 🙂 I really appreciate it. I forgot to mention that the data has to stay in its original place for the application to access it. Would copying the data to a temp folder even warm up the original data?

Best,
Markus

elementx · ‎2024-02-21

I don't know your policy, but copying data creates new file(s) (although the blocks are identical to the cold ones).

You can copy 1 file with cold blocks to a temp folder on the same volume and check where it's loading from if you read the copied file the next day. If you can read it faster than the original from which it was copied, then that may be a good approach.
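To make the "read it faster than the original" comparison concrete, here is a minimal timing sketch. It only measures how long a front-to-back read of one file takes from the client. The helper name `timed_read` and the 1 MiB chunk size are illustrative choices, not anything defined by ONTAP or FSx, and a client-side cache (for example the NFS page cache) can make a recently read file look faster than the storage really is.

```python
import time


def timed_read(path, chunk_size=1024 * 1024):
    """Read a file from start to end; return (bytes_read, elapsed_seconds).

    chunk_size is an arbitrary illustrative value (1 MiB), not a tuned setting.
    """
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total, elapsed
```

Calling `timed_read` once on the original file and once on the copy (ideally from a client that has not read either file recently) gives two durations to compare. Keep in mind that timing the original is itself a read of the cold data.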

Markus1 · ‎2024-02-21

Understood 🙂 I was asking more whether the original data (where I am copying from) is also warmed up during a copy process, because this is the data I need warmed up so the application can access it more quickly when necessary.

elementx · ‎2024-02-21

The original data are read sequentially, as you established in the question, so they should not be warmed, but there are some conditions and it depends on policies. See TR-4598, for example.

Markus1 · ‎2024-02-22

Thank you very much again!!

Yeah, I have also read that in the AWS documentation for the service we are using. I was not sure, though, whether a folder copy is a sequential or non-sequential read and, even if it is sequential, whether changing the cloud-retrieval policy (https://kb.netapp.com/onprem/ontap/dm/FabricPool/How_to_move_the_data_from_capacity_tier_to_performance_tier_in_FabricPool) to "on-read" would help.

The best-case solution for us would be to promote only a specific folder to the performance tier instead of a whole volume (with cloud-retrieval-policy -> promote). Hence the idea to write a script or something that basically implements such a feature, e.g. by copying the data to some place to warm it up the day before the business application wants to access it.
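A minimal sketch of what such a script could look like, assuming the volume is mounted on a client and that non-sequential client reads are what brings cold blocks back (which, as discussed above, depends on the tiering policy and the cloud-retrieval policy). It walks a folder and reads every file in fixed-size chunks in shuffled order instead of front to back. The function names and the 4 MiB chunk size are hypothetical choices for illustration, and whether ONTAP actually classifies this access pattern as random is not something the script can guarantee.

```python
import os
import random

# Assumed read size for illustration; not an ONTAP-defined or tuned value.
CHUNK_SIZE = 4 * 1024 * 1024


def prewarm_file(path, chunk_size=CHUNK_SIZE):
    """Read every chunk of one file in shuffled order; return bytes read."""
    size = os.path.getsize(path)
    offsets = list(range(0, size, chunk_size))
    random.shuffle(offsets)  # avoid a front-to-back (sequential) pattern
    total = 0
    with open(path, "rb") as f:
        for offset in offsets:
            f.seek(offset)
            total += len(f.read(chunk_size))
    return total


def prewarm_folder(root, chunk_size=CHUNK_SIZE):
    """Pre-read all regular files below root; return total bytes read."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += prewarm_file(os.path.join(dirpath, name), chunk_size)
    return total
```

The idea would be to call `prewarm_folder` on each folder in scope the day before the application runs and then verify on a small test folder, by timing the application's access, that the data really is served faster afterwards.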

I am not sure how likely a future change of the cooling period to more than the currently possible 183 days is.

elementx · ‎2024-02-22

I don't know if they plan to add admin commands that could retrieve individual folders or files, but it seems to me that admin commands deal only with blocks and aren't aware of upper-layer constructs such as a "volume" or "filesystem".

Personally, I would not want to use that approach (frequent use of admin commands to get things done seems like trouble waiting to happen), but no doubt some users would not mind.

There's something similar in 9.14.1 that doesn't require constant case-by-case recalls:

https://docs.netapp.com/us-en/ontap/fabricpool/enable-disable-aggressive-read-ahead-task.html

I don't know what kind of application or data set size we're talking about, but copying files to a temp directory sounds like the simplest idea possible if the read-ahead thing from 9.14.1 fails to help.