Efficiency Errors on DR Volumes

TMADOCTHOMAS

Hello,

Over the past six-plus months I have noticed an ever-increasing number of volumes on our DR cluster generating volume efficiency errors. I don't understand this, as I don't run efficiency jobs in DR. The destination volumes of course include whatever savings were obtained on the source, and I do have compaction enabled on the destination, but there are no scheduled efficiency jobs running.

I am looking at an example now. It says an efficiency job started last night at 10:02:03 and ended at 10:02:21 with a Failure because "Operation was stopped". Changelog usage is 0%. Stale Fingerprint Percent is 1.

Has anyone else encountered this before, or does anyone have any ideas? It doesn't appear to be a significant problem, but my main goal is to reduce the alert noise so I can focus on legitimate issues.

jcolonfzenpr

Are there any errors in the sis or snapmirror_[audit,error] logs on the destination?

https://<cluster-mgmt-LIF>/spi/

https://docs.netapp.com/ontap-9/topic/com.netapp.doc.dot-cm-sag/GUID-E593FD00-D062-4649-853A-4409E282FA12.html

Hope this helps.

Jonathan Colón | Blog | Linkedin

TMADOCTHOMAS

Thanks @jcolonfzenpr, I hadn't thought of that!

Here is a clean log for one volume's sis job, no errors:

[sid: 0] Info (sis start vault)
[sid: 1614593546] Begin (sis start)
[sid: 1614593546] Processing transfer data logs (94314583 log entries)
[sid: 1614593546] Generating transfer change logs (82548031 log entries)
[sid: 1614593546] Sort (82548031 fp entries)
[sid: 1614593546] Dedup Pass1 (69042 dup entries)
[sid: 1614593546] Dedup Pass2 (1774 dup entries)
[sid: 1614593546] Sharing (0 return status)
[sid: 1614593546] Stats (blks gathered 0,finger prints sorted 1671058788,dups found 69042,new dups found 1774,blks deduped 0,finger prints checked 0,finger prints deleted 0)
[sid: 1614593546] End (330192124 KB)

Then, here's another from a different time with the error:

[sid: 0] Info (sis start vault)
[sid: 1614720003] Begin (sis start)
[sid: 1614720003] Processing transfer data logs (86548136 log entries)
[sid: 0] Info (sis stop vault)
[sid: 0] Info (Dedupe operation is pausing)
[sid: 1614720003] Stats (blks gathered 0,finger prints sorted 0,dups found 0,new dups found 0,blks deduped 0,finger prints checked 0,finger prints deleted 0)
[sid: 0] Error (Operation was stopped )

As you can see, there is no real explanation: the job just pauses and then stops. The email alert is sent when it sees "Operation was stopped". Any ideas?
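
To cut down on alert noise, one option is to post-process the sis log instead of alerting on every "Operation was stopped" line. The following is only a rough Python sketch, assuming the log lines have exactly the shape shown in the two excerpts above. The function name `classify_sis_runs` and the outcome labels are hypothetical, and treating a run that follows `sis stop vault` as "stopped on request" is an assumption drawn from these excerpts alone, not from ONTAP documentation.

```python
import re

# Matches lines such as "[sid: 1614720003] Begin (sis start)".
LINE_RE = re.compile(r"^\[sid: (\d+)\] (\w+) \((.*)\)\s*$")


def classify_sis_runs(log_text):
    """Return a list of (sid, outcome) tuples, one per 'Begin' line.

    outcome is one of 'completed', 'stopped_on_request', 'error', 'unknown'.
    These labels are illustrative and are not ONTAP terminology.
    """
    runs = []
    current = None          # sid of the run currently in progress
    stop_requested = False  # saw "Info (sis stop vault)" during this run
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # skip blank or unrecognized lines
        sid, kind, detail = match.group(1), match.group(2), match.group(3).strip()
        if kind == "Begin":
            if current is not None:
                runs.append((current, "unknown"))  # previous run never closed
            current, stop_requested = sid, False
        elif kind == "Info" and detail == "sis stop vault":
            stop_requested = True
        elif kind == "End" and current is not None:
            runs.append((current, "completed"))
            current = None
        elif kind == "Error" and current is not None:
            if stop_requested and detail == "Operation was stopped":
                runs.append((current, "stopped_on_request"))
            else:
                runs.append((current, "error"))
            current = None
    if current is not None:
        runs.append((current, "unknown"))
    return runs
```

An alerting script could then suppress runs labeled `stopped_on_request` and only raise the ones labeled `error` or `unknown`. This hides the symptom rather than explaining why the stop is issued, so it is a workaround at best.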

jcolonfzenpr

If you have many volumes with the same behavior, I think it's better to open a case.

Can you provide me this info:

volume efficiency show -volume <volume with error> -vserver <vserver> -instance

TMADOCTHOMAS

Here's an example of one that alerted this weekend. Unfortunately I can't open a case, as this is our DR system, which we only have under third-party support.

Vserver Name: <vserver>
Volume Name: <volume>
Volume Path: <path>
State: Enabled
Status: Idle
Progress: Idle for 02:01:50
Type: Snapvault
Schedule: -
Efficiency Policy Name: -
Blocks Skipped Sharing: 0
Last Operation State: Success
Last Success Operation Begin: Mon Mar 08 05:47:13 2021
Last Success Operation End: Mon Mar 08 05:49:59 2021
Last Operation Begin: Mon Mar 08 05:47:13 2021
Last Operation End: Mon Mar 08 05:49:59 2021
Last Operation Size: 751.4MB
Last Operation Error: -
Changelog Usage: 0%
Logical Data Size: 5.24TB
Logical Data Limit: 640TB
Logical Data Percent: 1%
Queued Job: -
Stale Fingerprint Percentage: 1
Compression: false
Inline Compression: false
Constituent Volume: false
Inline Dedupe: false
Data Compaction: true
Cross Volume Inline Deduplication: false
Cross Volume Background Deduplication: false
Extended Compressed Data: true
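
When checking many volumes, output like the above can be turned into a dictionary and tested programmatically. This is a minimal Python sketch, assuming each line has the `Name: value` layout shown above. Both function names are hypothetical, and the check for "no schedule" simply mirrors the `-` values seen in this one example.

```python
def parse_instance_output(text):
    """Parse 'Key: value' lines from the output above into a dict.

    Only the first colon separates key from value, so timestamps such as
    'Mon Mar 08 05:47:13 2021' stay intact in the value.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # no colon on this line, nothing to parse
        fields[key.strip()] = value.strip()
    return fields


def has_no_schedule(fields):
    """True when neither a schedule nor an efficiency policy is set."""
    return (
        fields.get("Schedule") == "-"
        and fields.get("Efficiency Policy Name") == "-"
    )
```

For the volume above, `has_no_schedule` would return True, which matches the statement that no efficiency jobs are scheduled on the DR side.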

jcolonfzenpr

Sorry for asking so many questions, but can you validate this?

In your last comment I saw:

  • Extended Compressed Data: true
    • The KB "SnapMirror storage efficiency configurations and behavior" says:
      • Since extended compressed data includes adaptive compression, enabling on a SnapMirror destination volume will result in an LRE transfer.
      • Logical Replication (LRE) - Source-side storage efficiency savings are not maintained by SnapMirror but can be re-gained at the destination

I have doubts about this and how it relates to the reported sis errors; maybe someone from NetApp can take a look and contribute!

TMADOCTHOMAS

No apologies needed @jcolonfzenpr, this is very helpful! I searched through the log but didn't find a case where LRE was in use, including for several of the volumes that have been alerting. Having said that, I think you may be on to something. We only recently upgraded to ONTAP 9.5 on both source and destination, and that may have had some kind of impact regarding the new "extended compression" setting.

Do you know how to determine whether a volume is configured to use "extended compression"? I know the one I showed you revealed there is "extended compressed" data on the target volume, but does that mean it is configured to "run" on that volume, or just that this type of data is present (i.e. as transferred from the source)?

I don't know if I mentioned that our source is an AFF system, which may help make sense of this.

The second KB says "If a source volume uses Extended compressed data, the destination must be running ONTAP 9.5 or later for SnapMirror to maintain storage efficiency savings (LRSE), regardless of the destination's storage efficiency settings." We upgraded both systems to 9.5 on the same day, so LRSE should be in use, and I show that it is.

It also says, "Since extended compressed data includes adaptive compression, enabling on a SnapMirror destination volume will result in an LRE transfer." (emphasis mine)

The key question here is: what does it mean by "enabling"? I couldn't find a setting that deliberately enables or disables this feature. Any additional help will be greatly appreciated!

jcolonfzenpr
I think Extended Compressed Data is related to adaptive compression.

Can you provide this info from both source and destination?

volume efficiency show <volume with problem> -fields compression,state,compression-type,policy

Also, make sure you are on the latest patch release of the version you are running.

TMADOCTHOMAS

Thanks @jcolonfzenpr. Results:

On the production system:

vserver   volume   state   policy        compression-type compression
--------- -------- ------- ------------- ---------------- -----------
<vserver> <volume> Enabled eff_1000pm_01 adaptive         true

On the DR system:

vserver   volume   state   policy        compression-type compression
--------- -------- ------- ------------- ---------------- -----------
<vserver> <volume> Enabled -             adaptive         false
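
To run this comparison across many volumes, the two outputs can be diffed by field. The sketch below is an illustration only, assuming whitespace-separated `-fields` output with a header row, a dashed separator row, and values that contain no spaces. The names `parse_fields_table` and `diff_rows` are hypothetical.

```python
def parse_fields_table(text):
    """Parse whitespace-separated tabular output into a list of dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        # Skip the dashed separator row under the header.
        if set("".join(ln.split())) == {"-"}:
            continue
        values = ln.split()
        if len(values) != len(header):
            raise ValueError("unexpected column count: %r" % ln)
        rows.append(dict(zip(header, values)))
    return rows


def diff_rows(source, destination, ignore=("vserver", "volume")):
    """Return {field: (source_value, destination_value)} where they differ."""
    return {
        key: (source[key], destination.get(key))
        for key in source
        if key not in ignore and source[key] != destination.get(key)
    }
```

Applied to the two rows above, the only differences reported are `policy` (`eff_1000pm_01` on production, `-` on DR) and `compression` (`true` on production, `false` on DR).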

jcolonfzenpr

I think your configuration is OK.

Which ONTAP 9.5 patch release are you running?

TMADOCTHOMAS

Thanks @jcolonfzenpr. As of a couple of weeks ago we are on ONTAP 9.5P16.

TMADOCTHOMAS

Hi @jcolonfzenpr, I was curious whether you had any other suggestions. I would love to hear anyone else's comments as well! I'm at a loss as to why this keeps happening.

jcolonfzenpr

I sent you a PM.

Hope this helps.

TMADOCTHOMAS

Thanks, I saw that! I don't have bandwidth at the moment to work through joining the tool you mentioned but will see if I can do that sometime soon.
