WFA issue with cache updates

dcornely1

Hello,

I'm running the latest version of WFA (v2.1.0.70.32) in my environment and I've come across an issue I thought wouldn't exist because I'm using certified commands. The issue is that WFA does not appear to be aware of changes it has made itself until the OnCommand cache updates occur. Here is the scenario:


Step 1)

I'm creating a new CDOT export policy, export rule, and volume. This flow works without issue and is called ss_cdot_CreateNFSVolumeWithExport.

Step 2)

Before WFA and/or OnCommand has had a chance to learn about the change from the step 1 workflow via scheduled cache updates, I attempt to run a workflow that creates a new rule in the policy created in step 1. This fails and will continue to fail until WFA's cache is updated from OnCommand and it learns about the new policy.

This flow is called ss_cdot_CreateExportRule.

All the commands in both flows are certified so I had thought that would avoid this issue.  I had originally been using a modified No-Op command in the first create flow for the reasons behind this post but even after removing that single non-certified command the problem remains.  The only thing I can think of is that I created these flows in WFA v2.0 and recently upgraded to v2.1.

I'm either missing something or have encountered a bug in WFA regarding export policies and cache awareness of them, although I'm leaning towards an error I made somewhere but haven't found it yet.  I'm attaching both flows in the hopes that they will reveal where I've tripped up.  Hopefully it's something simple, thanks in advance.

-Dave

abhit

Francois:

Do you see the Volume which you are trying to delete in the DB?

You can use a DB viewer tool and see if the volume is present in the DB.

Abhi

francoisbnc

abhit,

I ran the filter test "Find volume by name in a given array" after a data source acquire.

The query returns the volume, but only when "use reservation data in test" is unchecked.

francoisbnc

Hi Anil,

Attached is the simple workflow. Note that my custom command is in it, but disabled.

This morning, after one night, the situation was back to normal and the workflow ran fine.

DFM view:

before clone:

root@gdc01093# dfm volume list |grep test

1761 gdc01148:/test_clone                 Flexible     64_bit     No

after clone:

root@gdc01093# dfm volume list |grep test

1764 gdc01148:/test_clone                 Flexible     64_bit     No                     

After the acquire, WFA pulls the old volume definition (1761), because dfm discovery had not been launched.

Could this be an isolated incident?

Regards,

ag
NetApp

Francois,

It could be a one-off issue. Keep a watch and if it happens again, we will raise an internal bug and address it.

Thanks,

Anil

francoisbnc

This is not a one-off. I can reproduce it, now with just a data source acquire.

It seems that the acquire either cleans the cache for the volume concerned or changes the reference back to the old cloned volume from the previous round.

Francois

abhit

Are you using your modified commands, where you have written the reservation/congruence scripts?

It is most likely an issue with the congruence script.

The congruence script is running late or not cleaning the reservation.

Congruence cleans the reserved entry when the entry is available in DB.

Hence when you uncheck "use reservation data" you are getting the correct result.
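
To illustrate why unchecking "use reservation data" changes the result, here is a minimal Python sketch. It is not WFA code: the `Reservation` class, the `run_filter` function and the set-based cache are hypothetical names chosen for illustration. The only behavior modeled is the one described in this thread, namely that a filter overlays reservations that congruence has not yet cleared on top of the cached data.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    """Hypothetical stand-in for a WFA reservation on a volume."""
    action: str            # "create" or "remove"
    volume: str
    cleared: bool = False  # set once the congruence test has passed


def run_filter(cache, reservations, use_reservation_data):
    """Return the volume names a filter would see.

    cache: set of volume names from the last acquire.
    reservations: pending changes made by earlier workflow runs.
    """
    visible = set(cache)
    if not use_reservation_data:
        return visible
    for r in reservations:
        if r.cleared:
            continue  # cleared reservations no longer affect results
        if r.action == "create":
            visible.add(r.volume)
        elif r.action == "remove":
            visible.discard(r.volume)
    return visible


# The cache already reports the volume, but a "remove" reservation
# for the same name has not been cleared yet.
cache = {"test_clone"}
pending = [Reservation("remove", "test_clone")]

with_reservations = run_filter(cache, pending, use_reservation_data=True)
without_reservations = run_filter(cache, pending, use_reservation_data=False)
print(sorted(with_reservations))     # []
print(sorted(without_reservations))  # ['test_clone']
```

In this model the volume only "disappears" while the uncleared reservation is overlaid on the cache, which matches the difference seen between the two filter test settings.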

Regards

Abhi

francoisbnc

This also happens with the certified commands "clone volume" and "remove volume".

After an acquire, the volume disappears from the query when the cache is used.

After waiting for a complete cycle of OCUM discovery and WFA acquire, the volume comes back again.

François

abhit

The congruence script, in this case, cleans up after a complete cycle of OCUM discovery and WFA acquire.

That is the expected behavior in WFA.

Regards

Abhi

francoisbnc
Any update? I see the same behavior with version 4.2.0.0.1. It's a pity, because I'm sure that with some reflection we could have an elegant fix.

yannb

Just to be clear, if the "ITS Clone volume on Last Snapshot" command does not have a congruence test, you will have reservation issues.

Now, you said it is happening with certified commands as well? Can you provide a workflow that demonstrates the issue, using only certified commands?

francoisbnc

Hello Yann,

Indeed, I had forgotten the congruence test in my custom command, but I experience the same thing with only certified commands. Is it working for you?

My example is attached.

yannb

Did some basic tests.

The first thing that hit me is that, as in a lot of situations, you need to run:

options licensed_feature.multistore.enable on

Did you do that? Otherwise the delete volume command never finds the volume.

After that it almost looked like it was working, but I had one situation where, indeed, it looked like the volume created by the clone wasn't found. I need to start over now that I have enabled multistore and do some more tests.

francoisbnc

For testing I used a FAS3160, which does not support the licensed_feature.multistore.enable option.

So I had to add the multistore license via license add XXXXX.

Despite this change, and after multiple tests, the problem is still the same: the congruence test won't work for me, which is annoying.

Thanks.

francoisbnc

Hello Yann,

 

I'm reviving this old thread and wondering if you found a workaround for this, because in version 3.1 the problem is still there and it's really annoying for me.

 

Best,

François

adaikkap

Hi Francois,

Please add your case to RFE 746319 so that it gets prioritized.

BTW, can you provide some more details on your custom command?

Regards

adai

francoisbnc

Hello adai,

The custom command is like Clone Volume, but it clones from the last snapshot and uses a specific name suffix.

I use this for cloning SQL environments snapshotted with SMSQL using the unique naming convention (-UN).

Regards,

François

ag
NetApp

Francois and Yann,

You guys have actually uncovered a limitation in WFA.

I was able to reproduce the issue of the "clone volume workflow" where you remove a volume and clone it again.

I dug deeper and here is what I found:

     1. There are two commands that create reservations here:

          Remove volume (if exists)

          Create Clone

     2. An important point here is that the volume that gets deleted and the clone being created have the same name.

     3. The workflow runs perfectly fine the first time and reservations are created. At this point, "cache updated" is set to NO for both reservations, which is expected. Now if you go ahead and run data acquisition on DFM and cache acquisition in WFA, the reservation for "create clone" is cleared ("cache updated" is set to YES). But the reservation for "remove volume" is not cleared ("cache updated" is still NO). This is what you guys observed.

     4. Why is that happening?

There are two conflicting reservations here. One reservation thinks the "test_clone" volume does not exist (REMOVE VOLUME). The other reservation thinks the "test_clone" volume exists and was newly created (CREATE CLONE).

Now, when the cache acquisition happens in WFA, the congruence test for "create clone" checks whether the newly reported data contains the "test_clone" volume. It does exist, so the reservation is CLEARED. No problems here.

The congruence test for "remove volume" expects that the "test_clone" volume is NOT present in the cache, because it deleted the volume itself. But, surprise!! It is present, so it thinks that OCUM has not yet reported that the volume is deleted, and "cache updated" remains NO. It will stay NO until the default period of 4 hours expires (because OCUM will never report that the volume does not exist), which is why Francois was able to come back the next day and execute it successfully.

     5. Coming to the weird filter results.

When you run a filter with "Use reservation" checked, the filter takes the WFA cache, merges it with the reservation data and provides the result.

Essentially, what is happening here is that the filter takes the "test_clone" volume from the WFA cache and applies any reservations (remember, the "remove volume" reservation is not cleared yet). The net effect is that the volume is removed from the result.

If you uncheck "Use reservation" in the filter, the "test_clone" volume is taken from the WFA cache but the "remove volume" reservation is NOT applied. Therefore you will see the volume in the result.

     6. How can we fix this?

This can happen to any object (not just volumes) if a workflow creates conflicting reservations.

To decide upon a fix and provide a workaround, I need to understand the use case here.

Why is this workflow being used exactly?

Are there similar workflows that are being used?

If not, what other workflows are being run?

How frequently will this workflow need to be run?

It will be helpful if I can get answers to those questions.
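
The sequence in points 3 and 4 can be sketched in Python. This is a simplified model, not WFA's implementation: `Reservation`, `congruence_passed`, `acquire` and the `EXPIRY_HOURS` constant are hypothetical names. The only rules encoded are the ones described above: a create reservation clears when the volume is in the newly acquired data, a remove reservation clears when the volume is absent, and an uncleared reservation lapses after the default 4 hours.

```python
from dataclasses import dataclass

EXPIRY_HOURS = 4  # default reservation lifetime mentioned above


@dataclass
class Reservation:
    action: str            # "create" or "remove"
    volume: str
    age_hours: float = 0.0
    cache_updated: bool = False


def congruence_passed(res, acquired):
    """True when the acquired data agrees with the reservation."""
    if res.action == "create":
        return res.volume in acquired
    return res.volume not in acquired  # "remove"


def acquire(reservations, acquired, hours_elapsed):
    """Simulate one cache acquisition; return reservations still active."""
    active = []
    for res in reservations:
        res.age_hours += hours_elapsed
        if congruence_passed(res, acquired):
            res.cache_updated = True
        if not res.cache_updated and res.age_hours < EXPIRY_HOURS:
            active.append(res)
    return active


# The workflow removed "test_clone" and then cloned a new "test_clone".
reservations = [
    Reservation("remove", "test_clone"),
    Reservation("create", "test_clone"),
]
acquired = {"test_clone"}  # OCUM now reports the new clone

active = acquire(reservations, acquired, hours_elapsed=0.5)
print([(r.action, r.cache_updated) for r in reservations])
# [('remove', False), ('create', True)]
print([r.action for r in active])  # ['remove']

# Much later the stuck reservation has aged out.
active = acquire(active, acquired, hours_elapsed=4)
print([r.action for r in active])  # []
```

Because the new clone reuses the removed volume's name, the "remove" congruence check in this model can never pass, so the reservation only goes away by expiry.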

Thanks,

Anil
