Solved: configuration backup is very large and contains netboot

tom_dewit · ‎2021-05-03

Hi all,

I noticed that on one node of a cluster (ONTAP 9.7), the system configuration backup was much larger than on the others (1.5 GB instead of the usual 50 MB on the other nodes). I see that the complete netboot directory is contained in this backup (1.4 GB in size). How can I exclude this directory from the configuration backup, or can I even just delete it from the root volume?
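One way to confirm which directory dominates a backup like this is to total the size of each top-level entry in a copy of it. The following is only a minimal Python sketch, assuming the backup archive has already been extracted to a local directory on a workstation; `dir_sizes`, `largest`, and the example path are hypothetical names for illustration, not ONTAP tools or commands.

```python
import os


def dir_sizes(root):
    """Return {top-level entry name: total bytes} for the entries directly under root."""
    totals = {}
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            size = 0
            # Walk the subtree and add up regular file sizes (symlinks are skipped).
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    path = os.path.join(dirpath, name)
                    if not os.path.islink(path):
                        size += os.path.getsize(path)
            totals[entry.name] = size
        elif entry.is_file(follow_symlinks=False):
            totals[entry.name] = entry.stat(follow_symlinks=False).st_size
    return totals


def largest(totals, n=5):
    """Return the n largest (name, bytes) pairs, biggest first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Example use (the path is an assumption, replace it with your own):
# for name, size in largest(dir_sizes("/tmp/extracted_backup")):
#     print(f"{size / 1024**2:10.1f} MB  {name}")
```

An entry such as netboot showing up at the top of that list with over a gigabyte would match what is described above.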

Every time the configuration backup is taken, I get slow CP warnings for my root aggregate, so I would like to ease the stress by shrinking the config backup.

There used to be a KB describing this I think, but it seems to have disappeared. It was called 'How to find what is filling a node's root volume'

Thanks in advance

Grtz,

Tom

Mjizzini · ‎2021-05-04

How to find what is filling a node's root volume

tom_dewit · ‎2021-05-04

I don't have access to that link.

I was able to solve it myself by deleting the netboot directory from systemshell. However, I would like this KB to be accessible again as this problem is not so uncommon.

paul_stejskal · ‎2021-05-04

There are diagnostic-level commands in that article that are not to be performed without NetApp guidance. However, we will see what we can do to make it accessible, or to put some process in place.

tom_dewit · ‎2021-05-04

Hi Paul,

I understand that NetApp would want to shield those articles from customers, but they should be accessible to SSC/ASP NetApp partners (and they were accessible in the past).

Tom

paul_stejskal · ‎2021-05-04

Fixed.

tom_dewit · ‎2021-05-04

Thanks Paul. One suggestion: could you add an extra bullet point saying that if a netboot folder is present under /mroot/etc, this folder can safely be deleted from the root volume? That was what was causing my cluster configuration backups to be huge.

paul_stejskal · ‎2021-05-04

Good point. I added a note to step 5. Let me know if that works.

tom_dewit · ‎2021-05-04

Not to split hairs, but it would be easier to add it as a numbered point like where you are deleting trace files in point 3. We would be deleting the whole netboot folder if it exists and not just a large file that would happen to be in that folder (that is where you have put the note now).

But thanks for the clarification ... it would at least have helped me solve this issue more quickly.

loganb · ‎2021-05-04

Hello Tom,

I'm curious about how a netboot directory ended up in /etc.

Do you have the full path, or can you provide the serial number? (If AutoSupport is on, I can check.)
Adding a note to look for a netboot dir may not be worthwhile: this might be rare, possibly a one-off for this case.

We will review the KBs we have and clean them up.

FYI, we also have this KB, which is better suited to config backup failures: https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Software/ONTAP_OS/Cluster_configuration_backup_creation_failed

In your case, though, the backup was not failing; you just noticed a size difference compared to the other nodes. We would need to create something new for this scenario.

We appreciate the feedback.

tom_dewit · ‎2021-05-04

Logan,

It may be more interesting to make a separate article for that problem and link to it from both of the current KBs. I actually think such an article may already have existed, or still does, but I don't have access to it. The netboot folder affects both the root volume usage and the configuration backup size.

The controller I had removed this netboot directory on was sysID 0538175320. It was recently headswapped to a FAS8300 and netbooted at that time, so that may explain something about how it ended up being there.

It may be rare, but not that rare: I am sure of at least 3 occasions on which I had this issue, including this one.

paul_stejskal · ‎2021-05-04

The node shell rm command doesn't remove directories, just individual files. That is why I put the note in step 5.

tom_dewit · ‎2021-05-04

To remove the directory you need to use ‘rm -rf’.

JustSomeGuy · ‎2021-08-26

This is the first link proposed when searching for "netapp configuration backup is very large". Coincidentally we also just went through a head swap, are using the 8300 hardware, and are running 9.7. In my case I'm trying to perform a root volume migration over to another set of disks. The process times out and I notice the cp.toolong warnings in the event log during the backup. Even though the root migration fails, the backup does complete about 2 hours later and is about 6.8GB in size.

I don't know what all gets captured in that node backup, but that seems excessive compared to what I've read about in other posts / sites. Already tried deleting core dumps and disabling snapshots, but it doesn't appear to be a space issue. When I look at etc/log/mlog through the SPI interface I can quickly tally up about 9GB just between mgwd, messages, and sktrace log files.
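A tally like that can be scripted once the file names and sizes have been copied out of the listing. This is just a rough Python sketch: the `tally_by_prefix` helper and the sample names and sizes are made up for illustration, and it assumes log names start with the component name followed by a dot.

```python
from collections import defaultdict


def tally_by_prefix(files):
    """Sum sizes per log family, where the family is the text before the first dot.

    files: iterable of (file_name, size_in_bytes) pairs.
    """
    totals = defaultdict(int)
    for name, size in files:
        prefix = name.split(".", 1)[0]
        totals[prefix] += size
    return dict(totals)


# Hypothetical sample data, not taken from a real system.
sample = [
    ("mgwd.log", 3),
    ("mgwd.log.0001", 4),
    ("messages.log", 2),
]
print(tally_by_prefix(sample))  # {'mgwd': 7, 'messages': 2}
```

Grouping by family makes it easier to see whether one component's logs account for most of the total.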

I do have a couple cases open with NetApp support on the issue, but I wish I had access to the KB referenced here. Might provide some extra insight into what's going on.

loganb · ‎2021-08-26

That's pretty large.

The steps to review the files and directories in root require diagnostic access, which prevents us from adding those steps publicly. You can damage your config by deleting the wrong files.

The key here is to identify what is consuming the extra space, then work with engineering to add a check or fix to prevent this from occurring in the future.

Can you provide the case #? I can track it.

Thank you

JustSomeGuy · ‎2021-08-26

Thanks for the information. I sent you the case numbers a moment ago.